Chapter 2 Displaying and Describing Categorical Data


Graphs for Categorical Variables

We will focus on two types of visual representation:

1. Pie charts
2. Bar graphs

Both display categorical data, so both are built from counts in categories: we graph either the raw counts (frequencies) or the percentages (relative frequencies).

Important note: For all graphs, be sure to label everything clearly.

Pie Charts

Example: You sit on an overpass and record the color of the first 100 cars you see. The results are as follows:

Color    Frequency
red      15
blue     21
green    18
white    22
black    19
other    5

Construct a pie chart to illustrate the distribution of colors among these cars.

How We Construct Pie Charts

What are the important things to keep in mind?

1. The sections must add up to 100%.
2. The sections must be in proper size relation to one another.

To accomplish the latter, we use central angles.

Definition: The central angle is the angle whose vertex is the center of the circle and whose rays are radii of the circle.

Central Angles

So how do we find the central angle associated with a section of the pie chart?

Central Angle Calculation: To find the central angle, multiply the relative frequency by 360.

Color    Frequency    Central angle (degrees)
red      15           0.15 × 360 = 54
blue     21           75.6
green    18           64.8
white    22           79.2
black    19           68.4
other    5            18
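The central-angle rule is plain arithmetic, so it is easy to check by machine. Below is a minimal Python sketch, assuming the frequencies are stored in a dictionary; the names `counts` and `angles` are illustrative and do not come from the original slides. It also confirms that the slices fill the whole circle.

```python
# Frequencies of car colors from the overpass example (first 100 cars).
counts = {"red": 15, "blue": 21, "green": 18, "white": 22, "black": 19, "other": 5}

total = sum(counts.values())  # 100 cars in all

# Central angle = relative frequency x 360 degrees.
angles = {color: n / total * 360 for color, n in counts.items()}

for color, angle in angles.items():
    rel = counts[color] / total
    print(f"{color:<6} {counts[color]:>3}   {rel:.2f} x 360 = {angle:.1f}")

# Together the slices must make up the whole circle.
print(f"Sum of angles: {sum(angles.values()):.1f}")
```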

The Resulting Pie Chart

[Figure: Pie chart of the car colors, with slices Red 15%, Blue 21%, Green 18%, White 22%, Black 19%, and Other 5%.]

Drawbacks to Pie Charts

1. We must use relative frequencies.
2. It is just as easy to read the frequency table as the pie chart.
3. They are only suitable for categorical variables.
4. It is not easy to compare two variables.
5. They are easy to manipulate.
6. We must be careful that all percentages are calculated the same way (i.e., with the same denominator).

Another Pie Chart Example

Example: The following is a breakdown of the solid waste that made up America's garbage in 2000. Values are given in millions of tons.

Material          Weight
Food              25.9
Glass             12.8
Metal             18.0
Paper             86.7
Plastics          24.7
Rubber            15.8
Wood              12.7
Yard Trimmings    27.7
Other             7.5

Create a pie chart to represent this data.

Solution

We can't make a pie chart with this data, at least not yet. What do we need? The relative frequencies: each material's weight divided by the total weight.

Material          Weight    Relative Frequency
Food              25.9      11.2%
Glass             12.8      5.5%
Metal             18.0      7.8%
Paper             86.7      37.4%
Plastics          24.7      10.7%
Rubber            15.8      6.8%
Wood              12.7      5.5%
Yard Trimmings    27.7      11.9%
Other             7.5       3.2%
Total             231.8     100.0%

Now we can find the central angles and create our pie chart.

Material          Weight    Relative Frequency    Central Angle (degrees)
Food              25.9      11.2%                 40.3
Glass             12.8      5.5%                  19.8
Metal             18.0      7.8%                  28.1
Paper             86.7      37.4%                 134.6
Plastics          24.7      10.7%                 38.5
Rubber            15.8      6.8%                  24.5
Wood              12.7      5.5%                  19.8
Yard Trimmings    27.7      11.9%                 42.8
Other             7.5       3.2%                  11.5
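The whole pipeline, from raw weights to central angles, fits in a few lines of Python. This is a hedged sketch with illustrative variable names. Summing the nine weights gives 231.8 million tons. The sketch computes each angle two ways: from the relative frequency rounded to 0.1%, as the table does, and from the unrounded fraction.

```python
# Solid waste in America, 2000, in millions of tons (from the table above).
weights = {
    "Food": 25.9, "Glass": 12.8, "Metal": 18.0, "Paper": 86.7,
    "Plastics": 24.7, "Rubber": 15.8, "Wood": 12.7,
    "Yard Trimmings": 27.7, "Other": 7.5,
}

total = sum(weights.values())  # about 231.8 million tons

rel_pct = {}       # relative frequency, as a percentage rounded to 0.1
from_rounded = {}  # the table's method: rounded percentage x 360
exact = {}         # unrounded fraction x 360

for material, w in weights.items():
    rel_pct[material] = round(100 * w / total, 1)
    from_rounded[material] = rel_pct[material] / 100 * 360
    exact[material] = w / total * 360

print(f"{'Material':<15}{'Weight':>7}{'Rel. freq.':>11}{'Angle':>7}{'Exact':>7}")
for material, w in weights.items():
    print(f"{material:<15}{w:>7.1f}{rel_pct[material]:>10.1f}%"
          f"{from_rounded[material]:>7.1f}{exact[material]:>7.1f}")
print(f"{'Total':<15}{total:>7.1f}")
```

Because the table rounds the percentages before multiplying by 360, some of its angles differ from the unrounded ones by up to about 0.2 degrees (Food is 40.3 versus 40.2, Yard Trimmings 42.8 versus 43.0). Computing the angles from the unrounded fraction avoids stacking two roundings.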

[Figure: Pie chart of America's 2000 solid waste by material; Paper is the largest slice, at about 37%.]

Bar Graphs

Bar graphs give us essentially the same information as a pie chart, with a couple of advantages:

1. We can use raw frequencies, since all that matters is the relative size of the bars.
2. We can compare multiple variables.

Important: The bars must all be the same width.

The Good and the Not-So-Good

- Generally used for categorical variables
- Bars can be vertical or horizontal
- Cannot be used to analyze the shape of a distribution, because the categories are not necessarily in any numerical order
- Can be used for comparisons

Bar Graph Example

Example: The table gives the percentage of the US population age 65 and over, by decade. Create a bar graph to represent this data.

Year    Percent age 65 and over
1900    4.1
1910    4.3
1920    4.7
1930    5.5
1940    6.9
1950    8.1
1960    9.2
1970    9.8
1980    11.3
1990    12.5
2000    12.4
2010    13.2
2020    16.5
2030    20.0
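A plotting library such as matplotlib would normally draw this graph. To stay dependency-free, here is a hedged text-only sketch that prints one horizontal bar per year; `text_bar_graph` and its `scale` parameter are hypothetical names chosen for illustration. Each bar occupies exactly one text row, so all bars have the same width, and only their lengths carry the data.

```python
# Percent of the US population age 65 and over, by year (table above).
seniors = {
    1900: 4.1, 1910: 4.3, 1920: 4.7, 1930: 5.5, 1940: 6.9,
    1950: 8.1, 1960: 9.2, 1970: 9.8, 1980: 11.3, 1990: 12.5,
    2000: 12.4, 2010: 13.2, 2020: 16.5, 2030: 20.0,
}


def text_bar_graph(data, scale=2.0):
    """Return one horizontal bar per category.

    Every bar is one text row (so all bars share the same width), and its
    length is proportional to the value: `scale` characters per unit.
    """
    lines = []
    for label, value in data.items():
        bar = "#" * round(value * scale)
        lines.append(f"{label} | {bar} {value}")
    return lines


print("Age of Seniors by Decade (percent)")
for line in text_bar_graph(seniors):
    print(line)
```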

Here's the Graph

[Figure: Bar graph titled "Age of Seniors by Decade", with Year (1900 to 2030) on the horizontal axis and Percent (0 to 20) on the vertical axis.]

Note: We can't do much analysis here other than see which class has the most. We don't even have to put the bars in any particular order; if we ordered them by size, we would have a Pareto chart. But since order does not matter, we cannot talk about the distribution the same way we will be able to for quantitative variables.
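Putting the bars in order by size is just a sort. A minimal sketch using the car-color counts from the pie-chart example. The running cumulative percentage is an addition for illustration, included because Pareto charts commonly draw it as a line over the bars; the slides do not discuss it.

```python
# Car-color frequencies from the overpass example.
counts = {"red": 15, "blue": 21, "green": 18, "white": 22, "black": 19, "other": 5}
total = sum(counts.values())

# Pareto order: categories sorted from most to least frequent.
pareto = sorted(counts.items(), key=lambda item: item[1], reverse=True)

running = 0
for color, n in pareto:
    running += n
    print(f"{color:<6}{n:>4}   cumulative {100 * running / total:5.1f}%")
```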

Comparisons Using Bar Graphs

Example: Create a bar graph for the given causes of death and analyze the results. Values given are the number of deaths per 100,000 people.

Cause of Death    1970    1980    1990    2000
Cardiovascular    640     509     387     318
Cancer            199     208     216     201
Accidents         62      46      36      34

And Our Graph

[Figure: Bar graph titled "Causes of Death", with Year (1970, 1980, 1990, 2000) on the horizontal axis, Number of Deaths (per 100,000) from 0 to 600 on the vertical axis, and a legend for Cardiovascular, Cancer, and Accidents.]

Analysis?

Analysis

1. Cancer and accident death rates change comparatively little from decade to decade: cancer stays between 199 and 216 per 100,000, and accidents drift down from 62 to 34.
2. Cardiovascular deaths decrease every decade (from 640 to 318 per 100,000) and are approaching the level of cancer deaths.
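Both observations can be checked directly from the table. A short sketch (variable names are illustrative) computes the decade-to-decade change in the cardiovascular rate and the gap between cardiovascular and cancer deaths.

```python
# Deaths per 100,000 people, from the causes-of-death table.
rates = {
    "Cardiovascular": [640, 509, 387, 318],
    "Cancer": [199, 208, 216, 201],
    "Accidents": [62, 46, 36, 34],
}
years = [1970, 1980, 1990, 2000]

cardio = rates["Cardiovascular"]

# Change from each decade to the next; negative means the rate fell.
changes = [later - earlier for earlier, later in zip(cardio, cardio[1:])]
print("Cardiovascular change per decade:", changes)

# Gap between the cardiovascular and cancer death rates in each year.
gaps = [c - k for c, k in zip(cardio, rates["Cancer"])]
for year, gap in zip(years, gaps):
    print(year, "gap to cancer:", gap)
```

Every change is negative and the gap shrinks from 441 to 117 per 100,000, which is what "decreases every decade and is approaching the level of cancer deaths" means in numbers.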

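Segmented Bar Graphs

Usage: Segmented bar graphs are best used to show the cumulative effect of a categorical variable: each bar is divided into segments, one per category, stacked so that together they make up the whole bar (often 100%).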
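One way to see this is to turn each year of the causes-of-death table into a single segmented bar. The sketch below treats the three listed causes as the whole; other causes of death are not in the table, so the percentages are shares of these three only. `segments` is a hypothetical helper name.

```python
# Deaths per 100,000 people, from the causes-of-death table.
rates = {
    "Cardiovascular": [640, 509, 387, 318],
    "Cancer": [199, 208, 216, 201],
    "Accidents": [62, 46, 36, 34],
}
years = [1970, 1980, 1990, 2000]


def segments(i):
    """Share of bar i taken by each cause; the segments stack to 100%."""
    column = {cause: values[i] for cause, values in rates.items()}
    bar_total = sum(column.values())
    return {cause: 100 * v / bar_total for cause, v in column.items()}


for i, year in enumerate(years):
    parts = segments(i)
    print(year, "  ".join(f"{cause} {pct:.1f}%" for cause, pct in parts.items()))
```

The cardiovascular segment shrinks from about 71% of the bar in 1970 to about 57.5% in 2000.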

Contingency Tables

Definition: Contingency tables are another way to display data. They differ from frequency tables in that they cross-classify two categorical variables: each cell counts the individuals who fall into one category of the first variable and one category of the second.

Contingency tables look like grids whose values depend on two conditions at once. We often see them broken out by gender and by whether or not people have a particular characteristic.
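In code, the difference between the two kinds of table is just what gets counted: a frequency table counts values of one variable, while a contingency table counts pairs of values. A minimal sketch with Python's `collections.Counter`; the six voter records are hypothetical, made up only to show the mechanics, and are not the survey data in the example that follows.

```python
from collections import Counter

# Hypothetical records, one per voter: (party identification, vote).
# Made up for illustration only.
records = [
    ("Strong Dem", "Obama"), ("Strong Dem", "Obama"),
    ("Ind", "McCain"), ("Ind", "Obama"),
    ("Strong Repub", "McCain"), ("Strong Repub", "McCain"),
]

# Frequency table: counts for one variable.
vote_freq = Counter(vote for _, vote in records)

# Contingency table: counts for each (party ID, vote) pair.
cells = Counter(records)

parties = ["Strong Dem", "Ind", "Strong Repub"]
votes = ["McCain", "Obama"]

print(f"{'':<8}" + "".join(f"{p:>14}" for p in parties) + f"{'Total':>7}")
for v in votes:
    row = [cells[(p, v)] for p in parties]  # Counter gives 0 for unseen pairs
    print(f"{v:<8}" + "".join(f"{n:>14}" for n in row) + f"{sum(row):>7}")
```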

Contingency Table Example

Example: Suppose the following data were collected from voters leaving a polling station during the 2008 Presidential election. People were asked how they identified themselves and for which candidate they voted. Each cell gives a count, with the column percentage in parentheses.

              Strong   Weak     Ind      Ind      Ind      Weak     Strong   Row
              Dem      Dem      Dem               Repub    Repub    Repub    Total
McCain        4        17       15       18       69       104      164      389
              (2.6)    (14.9)   (11.7)   (40.2)   (79.5)   (89.6)   (97.0)   (49.1)
Obama         136      95       104      25       12       12       5        390
              (97.4)   (85.1)   (83.1)   (57.6)   (14.2)   (10.4)   (3.0)    (49.2)
Other         0        0        7        1        6        0        0        13
              (0.0)    (0.0)    (5.2)    (2.3)    (6.4)    (0.0)    (0.0)    (1.7)
Column Total  140      111      126      44       87       116      169      792
              (100)    (100)    (100)    (100)    (100)    (100)    (100)    (100)

Now the Questions

1. What percent of those who identify themselves as Independent Democrats voted for Obama?
2. What percent of those who identify themselves as Weak Republicans voted for McCain?
3. What percent of people identify themselves as Independent?

What If We Went the Other Way?

What percent of McCain voters consider themselves Weak Republicans?

The percentages in the table are based on the column totals, so they cannot answer this directly. What must we consider to find our answer? The row totals:

104 / 389 ≈ 26.7%
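The key point is that one cell count can be divided by two different totals, and each division answers a different question. A minimal sketch using the totals exactly as printed in the table, as the slide does; the variable names are illustrative. One caution: the printed totals and percentages do not always match the cell counts exactly (the McCain cells as printed sum to 391, not 389, and 104/116 gives 89.7% where the table shows 89.6%), so small differences from the printed values should be expected.

```python
# Numbers read from the contingency table.
mccain_weak_repub = 104  # cell: McCain voters who identify as Weak Republicans
mccain_total = 389       # row total: all McCain voters, as printed
weak_repub_total = 116   # column total: all Weak Republicans, as printed


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100 * part / whole


# Row percentage: of the McCain voters, what share are Weak Republicans?
row_pct = percent(mccain_weak_repub, mccain_total)

# Column percentage: of the Weak Republicans, what share voted for McCain?
col_pct = percent(mccain_weak_repub, weak_repub_total)

print(f"Weak Republicans among McCain voters: {row_pct:.1f}%")  # the slide's question
print(f"McCain voters among Weak Republicans: {col_pct:.1f}%")  # the reverse question
```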