Chapter 2 Displaying and Describing Categorical Data
Graphs for Categorical Variables
We will focus on two types of visual representation:
1. Pie charts
2. Bar graphs
Both display categorical data, so both are built from counts in categories: we graph either raw counts (frequencies) or percentages (relative frequencies).
Important Note: For all graphs, be sure to label everything clearly.
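The distinction between frequency and relative frequency can be sketched in a few lines of Python. This is only an illustration: the function name `frequency_table` and the sample observations are invented here and do not come from the slides.

```python
from collections import Counter


def frequency_table(observations):
    """Map each category to (count, relative frequency)."""
    counts = Counter(observations)
    total = sum(counts.values())
    return {category: (count, count / total) for category, count in counts.items()}


# Hypothetical observations, made up purely for illustration.
sample = ["red", "blue", "red", "white", "blue", "red", "black", "white"]
table = frequency_table(sample)

for category, (count, relative) in table.items():
    # Print the raw count next to the relative frequency as a percentage.
    print(f"{category:<6} {count:>3} {relative:>7.1%}")
```

Either column could be graphed; the relative frequencies always add up to 1 (that is, 100%).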
Pie Charts
Example: You sit on an overpass and record the color of the first 100 cars you see. The results are as follows:

color    frequency
red      15
blue     21
green    18
white    22
black    19
other    5

Construct a pie chart to illustrate how these cars are distributed among the colors.
How We Construct Pie Charts
What are the important things to keep in mind?
1. The sections must add up to 100%.
2. The sections must be in proper size relation to one another.
To accomplish the latter, we use central angles.
Definition: A central angle is an angle whose vertex is the center of the circle and whose sides are radii of the circle.
Central Angles
So how do we find the central angle associated with a section of the pie chart?
Central Angle Calculation: To find the central angle (in degrees), multiply the relative frequency by 360.

color    frequency    central angle
red      15           0.15 × 360 = 54
blue     21           75.6
green    18           64.8
white    22           79.2
black    19           68.4
other    5            18
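As a quick check on the table, the same calculation can be sketched in Python. The function name `central_angles` is an assumption made for this example; the frequencies are the car counts from the overpass example.

```python
def central_angles(frequencies):
    """Central angle in degrees for each category: relative frequency times 360."""
    total = sum(frequencies.values())
    return {
        category: round(count / total * 360, 1)
        for category, count in frequencies.items()
    }


car_colors = {"red": 15, "blue": 21, "green": 18, "white": 22, "black": 19, "other": 5}
angles = central_angles(car_colors)

for color, angle in angles.items():
    print(f"{color:<6} {angle:>6.1f}")
```

The angles add up to 360 degrees, which is a useful sanity check before drawing the chart.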
The Resulting Pie Chart
[Figure: pie chart of car colors. Red 15%, Blue 21%, Green 18%, White 22%, Black 19%, Other 5%.]
Drawbacks to Pie Charts
1. We must use relative frequencies.
2. It is just as easy to read the frequency table as the pie chart.
3. Only good for categorical variables.
4. Not easy to compare two variables.
5. Easy to manipulate.
6. Be careful that all percentages are calculated the same way (i.e., with the same denominator).
Another Pie Chart Example
Example: The following is a breakdown of the solid waste that made up America's garbage in 2000. Values given represent millions of tons.

Material          Weight
Food              25.9
Glass             12.8
Metal             18.0
Paper             86.7
Plastics          24.7
Rubber            15.8
Wood              12.7
Yard Trimmings    27.7
Other             7.5

Create a pie chart to represent this data.
Solution
We can't make a pie chart with this data, at least not yet. What do we need? The relative frequencies:

Material          Weight    Relative Frequency
Food              25.9      11.2%
Glass             12.8      5.5%
Metal             18.0      7.8%
Paper             86.7      37.4%
Plastics          24.7      10.7%
Rubber            15.8      6.8%
Wood              12.7      5.5%
Yard Trimmings    27.7      11.9%
Other             7.5       3.2%
Total             231.9
Solution
Now we can find the central angles and create our pie chart.

Material          Weight    Relative Frequency    Central Angle
Food              25.9      11.2%                 40.3
Glass             12.8      5.5%                  19.8
Metal             18.0      7.8%                  28.1
Paper             86.7      37.4%                 134.6
Plastics          24.7      10.7%                 38.5
Rubber            15.8      6.8%                  24.5
Wood              12.7      5.5%                  19.8
Yard Trimmings    27.7      11.9%                 42.8
Other             7.5       3.2%                  11.5
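The two steps (weights to relative frequencies, then relative frequencies to central angles) can be sketched as follows. The function name `pie_slices` is hypothetical. Two assumptions are made to match the table: the denominator is the total of 231.9 printed on the slide, and each angle is computed from the percentage after it has been rounded to one decimal place.

```python
def pie_slices(values, total):
    """Map each label to (percent, central angle in degrees).

    The percent is rounded to one decimal place first, and the angle is
    then computed from that rounded percent.
    """
    slices = {}
    for label, value in values.items():
        percent = round(value / total * 100, 1)
        angle = round(percent / 100 * 360, 1)
        slices[label] = (percent, angle)
    return slices


waste = {
    "Food": 25.9, "Glass": 12.8, "Metal": 18.0, "Paper": 86.7, "Plastics": 24.7,
    "Rubber": 15.8, "Wood": 12.7, "Yard Trimmings": 27.7, "Other": 7.5,
}

# Assumption: use the total printed on the slide (231.9) as the denominator.
slices = pie_slices(waste, total=231.9)

for material, (percent, angle) in slices.items():
    print(f"{material:<15} {percent:>5.1f}% {angle:>6.1f}")
```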
[Figure: pie chart of solid waste by material, with sections for Paper, Metal, Glass, Food, Plastics, Other, Yard Trimmings, Wood, and Rubber.]
Bar Graphs
Bar graphs give us essentially the same information as pie charts, with a couple of advantages:
1. We can use raw frequencies, since all that matters is the relative size of the rectangles.
2. We can compare multiple variables.
Important: The bars must all be of the same width.
The Good and the Not-So-Good
- Generally used for categorical variables.
- Bars can be vertical or horizontal.
- We cannot analyze the distribution, because the classes are not necessarily in numerical order.
- Can be used for comparisons.
Bar Graph Example
Example: The growth of the US population age 65 and over, as a percent of the total population, is given in the table. Create a bar graph to represent this data.

Year    Percent
1900    4.1
1910    4.3
1920    4.7
1930    5.5
1940    6.9
1950    8.1
1960    9.2
1970    9.8
1980    11.3
1990    12.5
2000    12.4
2010    13.2
2020    16.5
2030    20.0
Here's the Graph
[Figure: bar graph titled "Age of Seniors by Decade"; x-axis: Year (1900 to 2030, by decade); y-axis: Percent.]
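For readers without plotting software, a rough horizontal bar graph can be sketched in plain text. This is only an illustration: the function name `text_bar_graph` and the choice of two `#` characters per percentage point are assumptions made here. Every bar is one text line tall, so the bars all have the same width and only their lengths differ.

```python
def text_bar_graph(data, scale=2):
    """Return one labeled text bar per category; each unit is `scale` characters."""
    label_width = max(len(str(label)) for label in data)
    lines = []
    for label, value in data.items():
        bar = "#" * round(value * scale)
        lines.append(f"{str(label):<{label_width}} | {bar} {value}")
    return lines


percent_65_and_over = {
    1900: 4.1, 1910: 4.3, 1920: 4.7, 1930: 5.5, 1940: 6.9, 1950: 8.1, 1960: 9.2,
    1970: 9.8, 1980: 11.3, 1990: 12.5, 2000: 12.4, 2010: 13.2, 2020: 16.5, 2030: 20.0,
}

lines = text_bar_graph(percent_65_and_over)
print("\n".join(lines))
```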
Note
Notice that we can't do much analysis here other than see which class has the most. We don't even have to put the bars in any particular order; if we ordered them by size, we'd have a Pareto chart. But since order does not matter, we cannot talk about the distribution the same way we will be able to for quantitative variables.
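Ordering the bars by size, as in a Pareto chart, amounts to sorting the categories by frequency from largest to smallest. A minimal sketch, using the car color counts from the earlier example (the function name `pareto_order` is made up for this illustration):

```python
def pareto_order(frequencies):
    """Return (category, count) pairs sorted from largest count to smallest."""
    return sorted(frequencies.items(), key=lambda item: item[1], reverse=True)


car_colors = {"red": 15, "blue": 21, "green": 18, "white": 22, "black": 19, "other": 5}
ordered = pareto_order(car_colors)

for color, count in ordered:
    print(f"{color:<6} {count}")
```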
Comparisons Using Bar Graphs
Example: Create a bar graph for the given causes of death and analyze the results. Values given are the number per 100,000 people.

Cause of Death    1970    1980    1990    2000
Cardiovascular    640     509     387     318
Cancer            199     208     216     201
Accidents         62      46      36      34
And Our Graph
[Figure: bar graph titled "Causes of Death"; x-axis: Year (1970, 1980, 1990, 2000); y-axis: Number of Deaths (per 100,000); legend: Cardiovascular, Cancer, Accidents.]
Analysis?
Analysis
- Cancer and accident death rates each change comparatively little from decade to decade.
- Cardiovascular disease deaths decrease each decade and are approaching the level of cancer deaths.
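The second observation can be checked numerically by looking at the gap between cardiovascular and cancer death rates in each year. A small sketch, using the values from the table (the helper name `gap_by_year` is invented for this example):

```python
years = [1970, 1980, 1990, 2000]
deaths_per_100k = {
    "Cardiovascular": [640, 509, 387, 318],
    "Cancer": [199, 208, 216, 201],
    "Accidents": [62, 46, 36, 34],
}


def gap_by_year(first, second):
    """Difference between two series, year by year."""
    return [a - b for a, b in zip(first, second)]


gaps = gap_by_year(deaths_per_100k["Cardiovascular"], deaths_per_100k["Cancer"])

for year, gap in zip(years, gaps):
    print(year, gap)
```

The gap shrinks in every decade (441, 301, 171, 117), which is what "approaching the level of cancer deaths" describes.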
Segmented Bar Graphs
Usage: Segmented bar graphs are best used to show the cumulative effect of a categorical variable.
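One way to read "cumulative" here: in a segmented bar, each segment is stacked on top of the previous ones, so the top of each segment is a running total. The sketch below assumes that reading and reuses the 1970 causes-of-death values purely as an illustration; the function name `segment_tops` is hypothetical.

```python
from itertools import accumulate


def segment_tops(segment_heights):
    """Height at which each stacked segment ends (a running total)."""
    return list(accumulate(segment_heights))


# 1970 values per 100,000: cardiovascular, cancer, accidents.
deaths_1970 = [640, 199, 62]
tops = segment_tops(deaths_1970)
print(tops)
```

The last value is the full height of the bar, that is, the combined total of all segments.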
Contingency Tables
Definition: Contingency tables are another way to display data. They differ from frequency tables in that they classify the data by two categorical variables at once, so the counts for each category of one variable are distributed across the categories of the other.
Contingency tables look like charts with values based on different conditions. We often see these broken out by gender and by whether or not the people have a particular characteristic.
Contingency Table Example
Example: Suppose the following data was collected from voters leaving a polling station during the 2008 Presidential election. People were asked how they identified themselves and for which candidate they voted. Counts are shown with column percentages in parentheses.

              Strong Dem  Weak Dem  Ind Dem  Ind     Ind Repub  Weak Repub  Strong Repub  Row Total
McCain        4           17        15       18      69         104         164           389
              (2.6)       (14.9)    (11.7)   (40.2)  (79.5)     (89.6)      (97.0)        (49.1)
Obama         136         95        104      25      12         12          5             390
              (97.4)      (85.1)    (83.1)   (57.6)  (14.2)     (10.4)      (3.0)         (49.2)
Other         0           0         7        1       6          0           0             13
              (0.0)       (0.0)     (5.2)    (2.3)   (6.4)      (0.0)       (0.0)         (1.7)
Column Total  140         111       126      44      87         116         169           792
              (100)       (100)     (100)    (100)   (100)      (100)       (100)         (100)
Now the Questions
Using the table above:
1. What percent of those who identify themselves as Independent Democrats voted for Obama?
2. What percent of those who identify themselves as Weak Republicans voted for McCain?
3. What percent of people identify themselves as Independent?
What If We Went The Other Way?
What percent of McCain voters consider themselves Weak Republicans?
The percentages in the table are based on the column totals. What must we consider to find our answer? The row totals:
104 / 389 ≈ 26.7%
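The same row-based calculation can be sketched in Python. The names `MCCAIN_COUNTS`, `MCCAIN_ROW_TOTAL`, and `row_percent` are invented for this illustration, and the denominator is assumed to be the row total of 389 as printed in the table.

```python
# Counts from the McCain row of the table.
MCCAIN_COUNTS = {
    "Strong Dem": 4, "Weak Dem": 17, "Ind Dem": 15, "Ind": 18,
    "Ind Repub": 69, "Weak Repub": 104, "Strong Repub": 164,
}

# Assumption: use the row total exactly as printed in the table.
MCCAIN_ROW_TOTAL = 389


def row_percent(count, row_total):
    """Percent of a row's total that falls in one cell, to one decimal place."""
    return round(count / row_total * 100, 1)


weak_repub_share = row_percent(MCCAIN_COUNTS["Weak Repub"], MCCAIN_ROW_TOTAL)
print(f"{weak_repub_share}%")  # prints 26.7%
```

The key design point is the denominator: dividing by the row total answers "what percent of McCain voters", while dividing by the column total answers "what percent of Weak Republicans".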