When you create a histogram, you need to divide the data set into separate groups. However, some statistical data may be right on the borderline between two groups. What do you do in these situations?
Take a look at the following table showing Best Actress Academy Award winners from 1928 to 1935:
Year | Winner | Age | Movie |
---|---|---|---|
1928 | Janet Gaynor | 22 | Sunrise |
1929 | Mary Pickford | 37 | Coquette |
1930 | Norma Shearer | 30 | The Divorcee |
1931 | Marie Dressler | 62 | Min and Bill |
1932 | Helen Hayes | 32 | The Sin of Madelon Claudet |
1933 | Katharine Hepburn | 26 | Morning Glory |
1934 | Claudette Colbert | 31 | It Happened One Night |
1935 | Bette Davis | 27 | Dangerous |
Did you notice that one actress’s age lies right on a borderline? Norma Shearer was 30 years old in 1930 when she won the Oscar for The Divorcee. Now, say you divide the age groups in the histogram into 5-year segments (20–25, 25–30, 30–35, and so on). Would you place her in the 25–30 age group (the lower bar) or the 30–35 age group (the upper bar)?
Either choice is acceptable: you can put all the borderline points into their respective lower bars or put all of them into their respective upper bars. The important thing is to pick one direction and apply it consistently to every data point.
The histogram in this example follows the convention of putting all borderline values into their respective upper bars, which places Norma Shearer's age in the third bar, the 30–35 age group. It is common practice to make the bar intervals left-inclusive (that is, each bar includes its left endpoint but not its right), just as this example histogram does. Hence, this bar contains the age of 30, but not 35.
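To make the two conventions concrete, here is a minimal Python sketch that counts the ages from the table under each rule. The function name `bin_counts`, its `left_inclusive` parameter, and the choice of bin edges running from 20 to 65 are assumptions made for illustration; they are not taken from any particular statistics package.

```python
from bisect import bisect_left, bisect_right

# Ages from the table above, in year order (1928 to 1935).
ages = [22, 37, 30, 62, 32, 26, 31, 27]

# Assumed bin edges for 5-year bars: 20, 25, 30, ..., 65.
edges = list(range(20, 70, 5))


def bin_counts(values, edges, left_inclusive=True):
    """Count how many values fall into each bar.

    left_inclusive=True  -> each bar is [left, right): borderline values go up.
    left_inclusive=False -> each bar is (left, right]: borderline values go down.
    Values that fall outside every bar under the chosen rule are skipped.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        if left_inclusive:
            i = bisect_right(edges, v) - 1  # a value equal to an edge moves to the upper bar
        else:
            i = bisect_left(edges, v) - 1   # a value equal to an edge stays in the lower bar
        if 0 <= i < len(counts):
            counts[i] += 1
    return counts


upper = bin_counts(ages, edges, left_inclusive=True)
lower = bin_counts(ages, edges, left_inclusive=False)
print(upper)  # [1, 2, 3, 1, 0, 0, 0, 0, 1]
print(lower)  # [1, 3, 2, 1, 0, 0, 0, 0, 1]
```

Only the second and third bars differ between the two results, because 30 is the only age that sits exactly on an edge. Both versions still account for all eight winners, which is why either convention works as long as it is applied to every borderline value.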