CHAPTER 2
Modeling Distributions of Data
2.1 Describing Location in a Distribution
The Practice of Statistics, 5th Edition
Starnes, Tabor, Yates, Moore
Bedford Freeman Worth Publishers
2.1 Reading Quiz
Choose the correct answer to complete the sentence.
1. Two ways of describing an individual's location within a distribution are percentiles and (p-scores/z-scores).
2. A cumulative relative frequency graph shows the accumulating (count/percent) of observations as you move through the classes in increasing order.
3. A z-score says how many (standard deviations/percentage points) x lies above or below the distribution mean.
4. When changing units of measure for a data set by adding, subtracting, multiplying, or dividing, the shape (will/will not) change.
5. The spread of data (changes/does not change) when all the values in a data set are multiplied by a constant value.
Describing Location in a Distribution
Learning Objectives
After this section, you should be able to:
- FIND and INTERPRET the percentile of an individual value within a distribution of data.
- ESTIMATE percentiles and individual values using a cumulative relative frequency graph.
- FIND and INTERPRET the standardized score (z-score) of an individual value within a distribution of data.
- DESCRIBE the effect of adding, subtracting, multiplying by, or dividing by a constant on the shape, center, and spread of a distribution of data.
Topic Outline
Describing patterns and departures from patterns (20% to 30%):
A. Constructing and interpreting graphical displays of distributions of univariate data (dotplot, stemplot, histogram, cumulative frequency plot)
   1. Center and spread
   2. Clusters and gaps
   3. Outliers and unusual features
   4. Shape
B. Summarizing distributions of univariate data
   1. Measuring center: median, mean
   2. Measuring spread: range, interquartile range, standard deviation
   3. Measuring position: quartiles, percentiles, standardized scores (z-scores)
   4. Using boxplots
   5. The effect of changing units on summary measures
Measuring Position: Percentiles
One way to describe the location of a value in a distribution is to tell what percent of observations are less than it.
The pth percentile of a distribution is the value with p percent of the observations less than it.
Example: Jenny earned a score of 86 on her test. How did she perform relative to the rest of the class?
6 | 7
7 | 2334
7 | 5777899
8 | 00123334
8 | 569
9 | 03
Her score was greater than 21 of the 25 observations. Since 21 of the 25 scores, or 84%, are below hers, Jenny is at the 84th percentile in the class's test score distribution.
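The percentile calculation above is just a count of observations below a value, divided by the total. A minimal Python sketch, assuming the 25 scores are read off the stemplot exactly as listed (the function name `percentile_rank` is illustrative, not from the text), applies the same "percent of observations less than it" definition:

```python
def percentile_rank(value, data):
    """Percent of observations strictly less than `value`.

    Follows the definition used here: the pth percentile is the value
    with p percent of the observations less than it.
    """
    below = sum(1 for x in data if x < value)
    return 100 * below / len(data)


# The 25 test scores read off the stemplot (stem = tens digit, leaf = ones digit).
scores = [67,
          72, 73, 73, 74,
          75, 77, 77, 77, 78, 79, 79,
          80, 80, 81, 82, 83, 83, 83, 84,
          85, 86, 89,
          90, 93]

print(len(scores))                  # 25
print(percentile_rank(86, scores))  # 84.0 -> Jenny is at the 84th percentile
```

Other textbooks and software packages define percentiles slightly differently (for example, counting observations at or below the value), so their results can differ a little from this count.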
Who Wins in Major League Baseball?
The stemplot below shows the number of wins for each of the 30 Major League Baseball teams in 2012.
5 | 5
6 | 14
6 | 6899
7 | 234
7 | 569
8 | 113
8 | 56889
9 | 033444
9 | 578
Key: 6|1 represents a team with 61 wins.
Problem: Find the percentiles for the following teams:
(a) The Minnesota Twins, who won 66 games.
(b) The Washington Nationals, who won 98 games.
(c) The Texas Rangers and Baltimore Orioles, who both won 93 games.
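One way to check answers to this problem is to count wins below each team's total. The sketch below assumes the 30 win totals are read off the split stemplot as listed; the helper name is illustrative. Under the "less than" definition it gives roughly the 10th percentile for the Twins (3 of 30 teams won fewer), the 97th for the Nationals (29 of 30), and the 73rd for the Rangers and Orioles (22 of 30).

```python
def percentile_rank(value, data):
    """Percent of observations strictly less than `value`."""
    return 100 * sum(1 for x in data if x < value) / len(data)


# 2012 MLB win totals read off the split stemplot (key: 6|1 = 61 wins).
wins = [55,
        61, 64,
        66, 68, 69, 69,
        72, 73, 74,
        75, 76, 79,
        81, 81, 83,
        85, 86, 88, 88, 89,
        90, 93, 93, 94, 94, 94,
        95, 97, 98]

teams = {"Minnesota Twins": 66,
         "Washington Nationals": 98,
         "Texas Rangers / Baltimore Orioles": 93}

for team, w in teams.items():
    p = percentile_rank(w, wins)
    print(f"{team}: {w} wins -> {p:.1f}% of teams won fewer (about percentile {round(p)})")
```

Tied teams (here, the two 93-win teams) share the same percentile, because neither counts as "less than" the other.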
Cumulative Relative Frequency Graphs
A cumulative relative frequency graph displays the cumulative relative frequency of each class of a distribution.
Age of First 44 Presidents When They Were Inaugurated

Age    Frequency  Relative frequency  Cumulative frequency  Cumulative relative frequency
40-44  2          2/44 = 4.5%         2                     2/44 = 4.5%
45-49  7          7/44 = 15.9%        9                     9/44 = 20.5%
50-54  13         13/44 = 29.5%       22                    22/44 = 50.0%
55-59  12         12/44 = 27.3%       34                    34/44 = 77.3%
60-64  7          7/44 = 15.9%        41                    41/44 = 93.2%
65-69  3          3/44 = 6.8%         44                    44/44 = 100%

[Figure: cumulative relative frequency graph; x-axis: Age at inauguration (40 to 70); y-axis: Cumulative relative frequency (%) (0 to 100)]
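Every derived column in the presidents' table comes from the frequency column by a running total and a division by n = 44. A short Python sketch of that arithmetic (variable names such as `cum_rel_pct` are illustrative):

```python
from itertools import accumulate

# Age classes and frequencies from the table (44 presidents).
classes = ["40-44", "45-49", "50-54", "55-59", "60-64", "65-69"]
freq = [2, 7, 13, 12, 7, 3]

n = sum(freq)                                  # total number of observations
cum_freq = list(accumulate(freq))              # running total of counts
rel_pct = [100 * f / n for f in freq]          # relative frequency (%)
cum_rel_pct = [100 * c / n for c in cum_freq]  # cumulative relative frequency (%)

for cls, f, r, c, cr in zip(classes, freq, rel_pct, cum_freq, cum_rel_pct):
    print(f"{cls}  {f:2d}  {r:5.1f}%  {c:2d}  {cr:5.1f}%")
```

To draw the graph, each cumulative value is plotted above the upper endpoint of its class (45, 50, ..., 70), starting from 0% at age 40, and the points are connected.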
State Median Household Income
The table and cumulative relative frequency graph below show the distribution of median household incomes for the 50 states and the District of Columbia in a recent year.

Median income ($1000s)  Frequency  Relative frequency  Cumulative frequency  Cumulative relative frequency
35 to <40               1          0.020               1                     0.020
40 to <45               10         0.196               11                    0.216
45 to <50               14         0.275               25                    0.490
50 to <55               12         0.235               37                    0.725
55 to <60               5          0.098               42                    0.824
60 to <65               6          0.118               48                    0.941
65 to <70               3          0.059               51                    1.000

The point at (50, 0.49) means that 49% of the states had median household incomes less than $50,000. The point at (55, 0.725) means that 72.5% of the states had median household incomes less than $55,000. Thus, 72.5% - 49% = 23.5% of the states had median household incomes between $50,000 and $55,000, because the cumulative relative frequency increased by 0.235 over that interval.
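Reading two points off the graph and subtracting, as done above, is the same as differencing cumulative proportions at class boundaries. A minimal Python sketch, assuming the frequencies in the table (the dictionary name `at` is illustrative):

```python
from itertools import accumulate

# Class lower endpoints (in $1000s) and frequencies for the 50 states plus DC.
lower = [35, 40, 45, 50, 55, 60, 65]
freq = [1, 10, 14, 12, 5, 6, 3]
n = sum(freq)  # 51

# Cumulative relative frequency at each class boundary: the proportion of
# states with median income *less than* that boundary (0 at $35,000).
boundaries = lower + [70]
cum_rel = [c / n for c in accumulate([0] + freq)]
at = dict(zip(boundaries, cum_rel))

print(f"{at[50]:.3f}")           # 0.490 -> 49% below $50,000
print(f"{at[55]:.3f}")           # 0.725 -> 72.5% below $55,000
print(f"{at[55] - at[50]:.3f}")  # 0.235 -> 23.5% between $50,000 and $55,000
```

The difference equals 12/51, the relative frequency of the 50 to <55 class, which is why the graph's rise over an interval gives the share of observations in it.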
State Median Household Income, part 2
Problem: Use the cumulative relative frequency graph for the state income data to answer each question.
(a) At what percentile is California, with a median household income of $57,445?
(b) Estimate and interpret the first quartile of this distribution.
(a) California is at about the 78th percentile: about 78% of states had lower median household incomes than California.
(b) The first quartile is about $45,000: about 25% of states have median household incomes less than $45,000.
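The answers above are read from the graph by eye. Assuming the graph connects adjacent points with straight line segments, the same readings can be made by linear interpolation, going from income to percentile for part (a) and from percentile to income for part (b). The function names below are hypothetical, chosen for this sketch:

```python
from itertools import accumulate

boundaries = [35, 40, 45, 50, 55, 60, 65, 70]      # class boundaries, $1000s
freq = [1, 10, 14, 12, 5, 6, 3]
n = sum(freq)
cum_rel = [c / n for c in accumulate([0] + freq)]  # one value per boundary


def percentile_of(income):
    """Approximate percentile of `income` ($1000s) by straight-line
    interpolation between adjacent points of the graph."""
    for x0, x1, y0, y1 in zip(boundaries, boundaries[1:], cum_rel, cum_rel[1:]):
        if x0 <= income <= x1:
            return 100 * (y0 + (y1 - y0) * (income - x0) / (x1 - x0))
    raise ValueError("income outside the range of the graph")


def income_at(percent):
    """Inverse reading: the income ($1000s) where the graph reaches `percent`."""
    target = percent / 100
    for x0, x1, y0, y1 in zip(boundaries, boundaries[1:], cum_rel, cum_rel[1:]):
        if y0 <= target <= y1 and y1 > y0:
            return x0 + (x1 - x0) * (target - y0) / (y1 - y0)
    raise ValueError("percent outside 0-100")


print(f"{percentile_of(57.445):.1f}")  # 77.3   (California)
print(f"{income_at(25):.3f}")          # 45.625 (first quartile, in $1000s)
```

Interpolation puts California near the 77th percentile and the first quartile near $45,600, close to the rougher by-eye estimates of the 78th percentile and $45,000. Either way, these are estimates from grouped data, not exact values.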
Measuring Position: z-Scores
A z-score tells us how many standard deviations from the mean an observation falls, and in what direction.
If x is an observation from a distribution that has known mean and standard deviation, the standardized score of x is
$z = \frac{x - \text{mean}}{\text{standard deviation}}$
A standardized score is often called a z-score.
Example: Jenny earned a score of 86 on her test. The class mean is 80 and the standard deviation is 6.07. What is her standardized score?
$z = \frac{x - \text{mean}}{\text{standard deviation}} = \frac{86 - 80}{6.07} = 0.99$
Jenny's score is about 0.99 standard deviations above the class mean.
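The z-score formula translates directly into code. A minimal sketch, where the function name `z_score` is illustrative:

```python
def z_score(x, mean, sd):
    """Standardized score: how many standard deviations x lies from the mean.
    Positive means above the mean, negative means below."""
    return (x - mean) / sd


z = z_score(86, 80, 6.07)  # Jenny's test score
print(round(z, 2))         # 0.99
```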
Wins in Major League Baseball
In 2012, the mean number of wins for teams in Major League Baseball was 81, with a standard deviation of 11.9 wins.
Problem: Find and interpret the z-scores for the following teams. (Here $\bar{x}$ denotes the mean and $s_x$ the standard deviation.)
(a) The New York Yankees, with 95 wins.
$z = \frac{x - \bar{x}}{s_x} = \frac{95 - 81}{11.9} = 1.18$
The Yankees were 1.18 standard deviations above the mean number of wins.
(b) The New York Mets, with 74 wins.
$z = \frac{x - \bar{x}}{s_x} = \frac{74 - 81}{11.9} = -0.59$
The Mets were 0.59 standard deviations below the mean number of wins.
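The interpretations above follow a fixed pattern: the size of z gives the distance in standard deviations, and its sign gives the direction. A small Python sketch that produces that sentence (the helper `interpret` is hypothetical, written only for this illustration):

```python
def z_score(x, mean, sd):
    return (x - mean) / sd


def interpret(name, x, mean, sd, unit):
    """Turn a z-score into the sentence pattern used in the example.
    A z-score of exactly 0 is reported as 0.00 standard deviations above."""
    z = z_score(x, mean, sd)
    direction = "above" if z >= 0 else "below"
    return f"{name} were {abs(z):.2f} standard deviations {direction} the mean number of {unit}."


print(interpret("The Yankees", 95, 81, 11.9, "wins"))
print(interpret("The Mets", 74, 81, 11.9, "wins"))
```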
Home Run Kings
The single-season home run record for Major League Baseball has been set just three times since Babe Ruth hit 60 home runs in 1927. Roger Maris hit 61 in 1961, Mark McGwire hit 70 in 1998, and Barry Bonds hit 73 in 2001. In an absolute sense, Barry Bonds had the best performance of these four players, because he hit the most home runs in a single season. However, in a relative sense, this may not be true.
Problem: Compute the standardized scores for each performance using the information in the table. Which player had the most outstanding performance relative to his peers?

Year  Player        HR  Mean  SD
1927  Babe Ruth     60  7.2   9.7
1961  Roger Maris   61  18.8  13.4
1998  Mark McGwire  70  20.7  12.7
2001  Barry Bonds   73  21.4  13.2

Ruth: $z = \frac{x - \bar{x}}{s_x} = \frac{60 - 7.2}{9.7} = 5.44$
Maris: $z = \frac{x - \bar{x}}{s_x} = \frac{61 - 18.8}{13.4} = 3.15$
McGwire: $z = \frac{x - \bar{x}}{s_x} = \frac{70 - 20.7}{12.7} = 3.88$
Bonds: $z = \frac{x - \bar{x}}{s_x} = \frac{73 - 21.4}{13.2} = 3.91$
Although all four performances were outstanding, Babe Ruth's has by far the largest z-score (5.44), so he can still lay claim to being the single-season home run champ, relatively speaking.
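The comparison above ranks seasons by z-score rather than by raw home runs, which puts each season on the same scale. A Python sketch using only the numbers in the table (the variable names are illustrative):

```python
# (year, player, home runs, mean, SD) as given in the table
seasons = [
    (1927, "Babe Ruth", 60, 7.2, 9.7),
    (1961, "Roger Maris", 61, 18.8, 13.4),
    (1998, "Mark McGwire", 70, 20.7, 12.7),
    (2001, "Barry Bonds", 73, 21.4, 13.2),
]

# Standardize each performance against its own season's mean and SD.
z_by_player = {player: (hr - mean) / sd for _, player, hr, mean, sd in seasons}

for player, z in z_by_player.items():
    print(f"{player}: z = {z:.2f}")

# The largest z-score is the most outstanding performance relative to peers.
best = max(z_by_player, key=z_by_player.get)
print("Most outstanding relative to peers:", best)  # Babe Ruth
```

By raw counts Bonds ranks first, but by z-score the order is Ruth, Bonds, McGwire, Maris.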
Blood Pressure
Larry came home very excited after a visit to his doctor. He announced proudly to his wife, "My doctor says my blood pressure is at the 90th percentile among men my age. That means I'm better off than about 90% of similar men."
How should his wife, who is a statistician, respond to Larry's statement?
Larry's wife should gently break the news that being at the 90th percentile is not good news in this situation, because lower blood pressure is better. About 90% of men similar to Larry have lower blood pressures. The doctor was suggesting that Larry take action to lower his blood pressure.
Describing Location in a Distribution
Section Summary
In this section, we learned how to
- FIND and INTERPRET the percentile of an individual value within a distribution of data.
- ESTIMATE percentiles and individual values using a cumulative relative frequency graph.
- FIND and INTERPRET the standardized score (z-score) of an individual value within a distribution of data.
Homework
Page 99: 2, 6, 10, 12