CHAPTER 2 Modeling Distributions of Data 2.1 Describing Location in a Distribution The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers
Describing Location in a Distribution
Learning Objectives
After this section, you should be able to:
- Percentiles: find and interpret the percentile of an individual value within a distribution of data.
- Cumulative relative frequency graph: estimate percentiles and individual values using a cumulative relative frequency graph.
- Z-score: find and interpret the standardized score (z-score) of an individual value within a distribution of data.
- Describe the effect of adding, subtracting, multiplying by, or dividing by a constant on the shape, center, and spread of a distribution of data.
Measuring Position: Percentiles
One way to describe the location of a value in a distribution is to tell what percent of observations are less than it. The pth percentile of a distribution is the value with p percent of the observations less than it.

Example
Jenny earned a score of 86 on her test. How did she perform relative to the rest of the class? Michael got a 73 on his test. How did he perform? And whose score is more unusual?

  6 | 7
  7 | 2334
  7 | 5777899
  8 | 00123334
  8 | 569
  9 | 03
  Key: 6 | 7 represents a score of 67.

Jenny's score was greater than 21 of the 25 observations. Since 21 of the 25, or 84%, of the scores are below hers, Jenny is at the 84th percentile in the class's test score distribution.
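The percentile calculation in the example can be checked with a few lines of Python. This is a minimal sketch: the `scores` list is transcribed from the stemplot, and `percentile_of` is a hypothetical helper name that applies the definition above (percent of observations strictly less than the value).

```python
# Test scores read off the stemplot in the example (25 students).
scores = [67,
          72, 73, 73, 74,
          75, 77, 77, 77, 78, 79, 79,
          80, 80, 81, 82, 83, 83, 83, 84,
          85, 86, 89,
          90, 93]

def percentile_of(value, data):
    """Percent of observations strictly less than `value`."""
    below = sum(1 for x in data if x < value)
    return 100 * below / len(data)

jenny = percentile_of(86, scores)    # 21 of 25 scores are below 86
michael = percentile_of(73, scores)  # 2 of 25 scores are below 73
print(f"Jenny: {jenny:.0f}th percentile, Michael: {michael:.0f}th percentile")
```

Because the definition counts only scores strictly below the value, the second student who also scored 73 does not count toward Michael's percentile.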
Practice: Wins in Major League Baseball
The stemplot below shows the number of wins for each of the 30 Major League Baseball teams in 2012.

Problem: Find the percentiles for the following teams:
(a) The Minnesota Twins, who won 66 games.
(b) The Washington Nationals, who won 98 games.
(c) The Texas Rangers and Baltimore Orioles, who both won 93 games.
Cumulative Relative Frequency Graphs
A cumulative relative frequency graph displays the cumulative relative frequency of each class of a frequency distribution, which makes it a convenient tool for estimating percentiles.

Age of First 44 Presidents When They Were Inaugurated

Age     Frequency   Relative frequency   Cumulative frequency   Cumulative relative frequency
40-44    2           2/44 =  4.5%          2                      2/44 =   4.5%
45-49    7           7/44 = 15.9%          9                      9/44 =  20.5%
50-54   13          13/44 = 29.5%         22                     22/44 =  50.0%
55-59   12          12/44 = 27.3%         34                     34/44 =  77.3%
60-64    7           7/44 = 15.9%         41                     41/44 =  93.2%
65-69    3           3/44 =  6.8%         44                     44/44 = 100%

[Graph: cumulative relative frequency (%), from 0 to 100, plotted against age at inauguration, from 40 to 70]
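The cumulative columns of the table can be rebuilt from the frequencies alone. The following is a minimal sketch using only the class labels and counts from the inauguration-age table; the variable names are illustrative.

```python
from itertools import accumulate

# Class labels and frequencies from the inauguration-age table.
classes = ["40-44", "45-49", "50-54", "55-59", "60-64", "65-69"]
freqs = [2, 7, 13, 12, 7, 3]

total = sum(freqs)                    # 44 presidents
cum_freqs = list(accumulate(freqs))   # running totals of the frequencies
rel_pct = [round(100 * f / total, 1) for f in freqs]
cum_rel_pct = [round(100 * c / total, 1) for c in cum_freqs]

# One row per class: label, frequency, relative %, cumulative count, cumulative %.
for row in zip(classes, freqs, rel_pct, cum_freqs, cum_rel_pct):
    print("{:<6} {:>3} {:>6.1f}% {:>4} {:>6.1f}%".format(*row))
```

Running the arithmetic this way is a quick consistency check on each row; for instance, the 55-59 class works out to 12/44, or about 27.3%, and the last cumulative value must be 100%.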
Practice: State median household incomes
The table and cumulative relative frequency graph below show the distribution of median household incomes for the 50 states and the District of Columbia in a recent year.

Median income ($1000s)   Frequency   Relative frequency   Cumulative frequency   Cumulative relative frequency
35 to < 40                1           1/51 = 0.020          1                      1/51 = 0.020
40 to < 45               10          10/51 = 0.196         11                     11/51 = 0.216
45 to < 50               14          14/51 = 0.275         25                     25/51 = 0.490
50 to < 55               12          12/51 = 0.235         37                     37/51 = 0.725
55 to < 60                5           5/51 = 0.098         42                     42/51 = 0.824
60 to < 65                6           6/51 = 0.118         48                     48/51 = 0.941
65 to < 70                3           3/51 = 0.059         51                     51/51 = 1.000

Problem: Use the cumulative relative frequency graph for the state income data to answer each question.
(a) At what percentile is California, with a median household income of $57,445?
(b) Estimate and interpret the first quartile of this distribution.
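Reading a cumulative relative frequency graph amounts to linear interpolation between its plotted points. The sketch below makes two assumptions: the graph plots each cumulative relative frequency at the upper boundary of its class, and it starts at 0 at the lower boundary of the first class. The `interpolate` helper is hypothetical, and a reading taken by eye from the printed graph may differ by a point or two.

```python
# Class boundaries ($1000s) and cumulative counts from the table;
# the first point (35, 0) is the assumed starting point of the graph.
boundaries = [35, 40, 45, 50, 55, 60, 65, 70]
cum_counts = [0, 1, 11, 25, 37, 42, 48, 51]
cum_rel = [c / 51 for c in cum_counts]

def interpolate(x, xs, ys):
    """Piecewise-linear interpolation of y at x; xs must be increasing."""
    for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:])):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x is outside the range of the graph")

# (a) Read up from an income to a cumulative relative frequency.
california = 100 * interpolate(57.445, boundaries, cum_rel)

# (b) Read across from 25% to an income by swapping the roles of the axes.
first_quartile = interpolate(0.25, cum_rel, boundaries)

print(f"California: about the {california:.0f}th percentile")
print(f"First quartile: about ${1000 * first_quartile:,.0f}")
```

Swapping the two lists in part (b) works because the cumulative relative frequencies are strictly increasing, so the graph can be read in either direction.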
Measuring Position: z-scores
A z-score tells us how many standard deviations from the mean an observation falls, and in what direction. If x is an observation from a distribution that has known mean and standard deviation, the standardized score of x is:

$$z = \frac{x - \text{mean}}{\text{standard deviation}}$$

A standardized score is often called a z-score.

Example
Jenny earned a score of 86 on her test. The class mean is 80 and the standard deviation is 6.07. What is her standardized score?

$$z = \frac{x - \text{mean}}{\text{standard deviation}} = \frac{86 - 80}{6.07} = 0.99$$
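The standardized score is a one-line computation. The sketch below uses a hypothetical helper named `z_score` and the values from Jenny's example.

```python
def z_score(x, mean, sd):
    """Standardized score: how many standard deviations x lies from the mean."""
    return (x - mean) / sd

# Jenny's test score, with the class mean and standard deviation from the example.
jenny_z = z_score(86, mean=80, sd=6.07)
print(round(jenny_z, 2))
```

A positive result means the observation is above the mean and a negative result means it is below; Jenny's score is about one standard deviation above the class mean.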
Practice
The single-season home run record for Major League Baseball has been set just three times since Babe Ruth hit 60 home runs in 1927. Roger Maris hit 61 in 1961, Mark McGwire hit 70 in 1998, and Barry Bonds hit 73 in 2001. In an absolute sense, Barry Bonds had the best performance of these four players, because he hit the most home runs in a single season. However, in a relative sense, this may not be true. Baseball historians suggest that hitting a home run has been easier in some eras than others. This is due to many factors, including quality of batters, quality of pitchers, hardness of the baseball, dimensions of ballparks, and possible use of performance-enhancing drugs. To make a fair comparison, we should see how these performances rate relative to those of other hitters during the same year.

Problem: Compute the standardized scores for each performance using the information in the table. Which player had the most outstanding performance relative to his peers?

Year   Player         HR   Mean   SD
1927   Babe Ruth      60    7.2    9.7
1961   Roger Maris    61   18.8   13.4
1998   Mark McGwire   70   20.7   12.7
2001   Barry Bonds    73   21.4   13.2
Homework
Page 100, #1-18 odd

Extra Credit Project: Due October 3rd
- Recommended websites: www.censusatschool.com & www.gapminder.com
- Individual work
- Extra credit: will replace your lowest pop quiz grade
- Must make a poster and present the results to the class

Chapter 2 Quiz: October 15th