Describing Location in a Distribution (2.1) Measuring Position: Percentiles One way to describe the location of a value in a distribution is to tell what percent of observations are less than it. De#inition: The pth percentile of a distribution is the value with p percent of the observations less than it.
The stemplot below shows the number of wins for each of the 30 Major League Baseball teams in 2009. 5 9 6 2455 7 00455589 8 0345667778 9 123557 10 3 Problem: Find the percentiles for the following teams: (a) The Colorado Rockies, who won 92 games. (b) The New York Yankees, who won 103 games. The stemplot below shows the number of wins for each of the 30 Major League Baseball teams in 2009. 5 9 6 2455 7 00455589 8 0345667778 9 123557 10 3 Problem: Find the following - (c) The percentile for the Kansas City Royals, who won 65 games. (d) A MLB team represented the 75th percentile for number of wins. Assuming no team had the same number of wins that they did, approximately how many teams had more wins than they did?
A cumulative relative frequency graph (or ogive) displays the cumulative relative frequency of each class of a frequency distribution. Here is a table showing the distribution of median household incomes for the 50 states and the District of Columbia. Here is a table showing the distribution of median household incomes for the 50 states and the District of Columbia. Construct a cumulative relative frequency graph (ogive) of this data set.
Problem: Use the cumulative relative frequency graph for the state income data to answer each question. (a) At what percentile is California, with a median income of $57,445? (b) Estimate and interpret the ]irst quartile of this solution. (c) Are there any states whose median income can be considered an outlier? Measuring Position: z-scores A z-score tells us how many standard deviations from the mean an observation falls, and in what direction. De#inition: If x is an observation from a distribution that has known mean and standard deviation, the standardized value of x is: A standardized value is often called a z-score.
In 2009, the mean number of wins for teams in the MLB was 81 with a standard deviation of 11.4 wins. Problem: Find and interpret the z-scores for the following teams. (a) The New York Yankees, with 103 wins. (b) The Baltimore Orioles, with 64 wins. The single-season home run record for major league baseball has been set just three times since Babe Ruth hit 60 home runs in 1927. Roger Maris hit 61 in 1961, Mark McGwire hit 70 in 1998 and Barry Bonds hit 73 in 2001. In an absolute sense, Barry Bonds had the best performance of these four players, since he hit the most home runs in a single season. However, in a relative sense this may not be true. Baseball historians suggest that hitting a home run has been easier in some eras than others. This is due to many factors, including quality of batters, quality of pitchers, hardness of the baseball, dimensions of ballparks, and possible use of performance-enhancing drugs. To make a fair comparison, we should see how these performances rate relative to others hitters during the same year.
Problem: Compute the standardized scores for each performance. Which player had the most outstanding performance relative to his peers? Transforming Data (adding and subtracting constants) Transforming converts the original observations from the original units of measurements to another scale. Transformations can affect the shape, center, and spread of a distribution. EXAMPLE: Record your guess for the width of this room (length of blue wall) on a postit and bring up to the smart panel. Here are your classes guesses: Create a dotplot of the data and describe the distribution of guesses.
The actual width of the room is: Determine the error associated with each guess by subtracting the actual width from each guess and record here: Construct a dotplot of the errors and describe their distribution. In this example we added (a negative...) to each data point/observation. What did this do to the shape? spread? center?
Adding the same number a (either positive, zero, or negative) to each observation: adds a to measures of center and location/ position (mean, median, quartiles, percentiles), BUT does NOT change the shape of the distribution or measures of spread (range, IQR, standard deviation, variance) Transforming Data (multiplying and dividing by constants) EXAMPLE (con't): Suppose we wanted to convert your guesses for the width of the room from feet to inches. Multiply all guesses from the previous example by 12 in./ft. to convert the data to inches. Construct a dotplot of the new data and describe the distribution of guesses (in inches).
In this example we multiplied each data point/ observation by the same constant. What did this do to the shape? spread? center? Multiplying (or dividing) each observation by the same number b (either positive, zero, or negative): multiplies (divides) measures of center and location/position (mean, median, quartiles, percentiles) by b multiplies (divides) measures of spread and location/position (range, IQR, standard deviation) by b (and variance by b 2 ) does NOT change the shape of the distribution
EXAMPLE: In 2010, Taxi Cabs in New York City charged an initial fee of $2.50 plus $2 per mile. In equation form, fare = 2.50 + 2(miles). At the end of a month a businessman collects all of his taxi cab receipts and calculates some numerical summaries. The mean fare he paid was $15.45 with a standard deviation of $10.20. What are the mean and standard deviation of the lengths of his cab rides in miles? In Chapter 1, we developed a kit of graphical and numerical tools for describing distributions. Now, we ll add one more step to the strategy. Exploring Quantitative Data 1. Always plot your data: make a graph. 2. Look for the overall pattern (shape, center, and spread) and for striking departures such as outliers. 3. Calculate a numerical summary (summary statistics) to brie]ly describe center and spread. 4. Sometimes the overall pattern of a large number of observations is so regular that we can describe it by a smooth curve.
De#inition: A density curve is a curve that is always on or above the horizontal axis, and has area exactly 1 underneath it. A density curve describes the overall pattern of a distribution. The area under the curve and above any interval of values on the horizontal axis is the proportion of all observations that fall in that interval. The histogram below shows the distribution of batting average (proportion of hits) for the 432 Major League Baseball players with at least 100 plate appearances in the 2009 season. The smooth curve shows the overall shape of the distribution. In the ]irst graph below, the bars in red represent the proportion of players who had batting averages of at least 0.270. There are 177 such players out of a total of 432, for a proportion of 0.410. In the second graph below, the area under the curve to the right of 0.270 is shaded. This area is 0.391, only 0.019 away from the actual proportion of 0.410.