Chapter 3.4: Measures of Position and Outliers
Julian Chan
Department of Mathematics, Weber State University
September 11, 2011

Intro
1. We will discuss how to measure the position of an observation, that is, where it sits relative to the rest of the data set.
2. The tool that lets us do this is called the Z-score.

Intro
1. The difference $x - \mu$ measures how far the observed value $x$ is from the mean.
2. Dividing this difference by the standard deviation expresses that distance in units of standard deviations.

Z-score
1. This quantity is called the Z-score, and it comes in two forms.
2. Population Z-score: $z = \frac{x - \mu}{\sigma}$.
3. Sample Z-score: $z = \frac{x - \bar{x}}{s}$.
4. The Z-score is unitless. The Z-scores of a data set have mean 0 and standard deviation 1.
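The population formula can be sketched in a few lines of Python. The data set below is hypothetical and chosen only so the arithmetic is easy to follow, and the helper name `z_score` is an assumption for illustration. The sketch also checks the claim that the Z-scores of a data set have mean 0 and standard deviation 1.

```python
from statistics import mean, pstdev


def z_score(x, center, spread):
    """Return how many standard deviations x lies from the center."""
    return (x - center) / spread


# Hypothetical population, used only for illustration.
population = [2, 4, 4, 4, 5, 5, 7, 9]
mu = mean(population)        # population mean
sigma = pstdev(population)   # population standard deviation

# Standardize every observation in the population.
z_values = [z_score(x, mu, sigma) for x in population]

print(z_values)
print(mean(z_values), pstdev(z_values))
```

For a sample, the same function applies with the sample mean and the sample standard deviation (`statistics.stdev`) passed in place of `mu` and `sigma`.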

Examples
1. In 2007 the Yankees led the American League with 968 runs scored, while the Phillies led the National League with 892 runs.
2. A natural question: which team is better at scoring runs?

Examples
1. It is tempting to say that the Yankees are the better team at scoring runs, but the Phillies play in the National League, where there is no designated hitter. Instead, the pitcher (usually a weak hitter) bats in place of a stronger batter.
2. We will decide which team is better at scoring runs by comparing how far each team stands above its own league's mean, measured in that league's standard deviations, that is, by comparing Z-scores. This roughly amounts to comparing the proportion of teams in each league that score at least the specified number of runs.

Examples
1. We can compute the population mean and standard deviation of runs scored for each league.
2. We find the means $\mu_A = 793.9$ and $\mu_N = 763$, and the standard deviations $\sigma_A = 73.5$ and $\sigma_N = 58.9$.

Examples
1. The Z-score for the Yankees is $z = \frac{x - \mu_A}{\sigma_A} = \frac{968 - 793.9}{73.5} = 2.37$.
2. The Z-score for the Phillies is $z = \frac{x - \mu_N}{\sigma_N} = \frac{892 - 763}{58.9} = 2.19$.
3. The Yankees scored 2.37 standard deviations above their league's mean number of runs, while the Phillies scored only 2.19 standard deviations above theirs. We conclude that the Yankees are better at scoring runs.
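The two league comparisons can be reproduced with a short Python check. The means and standard deviations are the ones quoted above; the function and variable names are assumptions for illustration.

```python
def z_score(x, mu, sigma):
    """Population Z-score: distance from the mean in standard deviations."""
    return (x - mu) / sigma


# American League: mean 793.9 runs, standard deviation 73.5.
yankees_z = z_score(968, 793.9, 73.5)

# National League: mean 763 runs, standard deviation 58.9.
phillies_z = z_score(892, 763, 58.9)

print(round(yankees_z, 2), round(phillies_z, 2))  # 2.37 2.19
```

Because each total is scaled by its own league's spread, the two Z-scores can be compared directly even though the leagues play under different rules.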

Examples
1. Suppose the population mean length of a cell phone call is 15 minutes, and the standard deviation is 5 minutes.
2. You have two friends, John and Lisa. John talks to you on the phone for 20 minutes, and Lisa for 25 minutes.

Examples
1. The Z-score for John's phone call is $z = \frac{x - \mu}{\sigma} = \frac{20 - 15}{5} = 1$.
2. The Z-score for Lisa's phone call is $z = \frac{x - \mu}{\sigma} = \frac{25 - 15}{5} = 2$.
3. If call lengths are roughly bell-shaped, the 68 percent rule says that about 68 percent of phone calls last between 10 and 20 minutes, that is, within one standard deviation of the mean.

Examples
1. The 68 percent rule implies that the remaining 32 percent of calls split evenly between the two tails, so about 16 percent of phone calls last 20 minutes or more. Hence only about 16 percent of the population talks as long as John does or longer.
2. The 95 percent rule says that about 95 percent of calls last between 5 and 25 minutes, so only about 2.5 percent of phone calls last 25 minutes or more. Hence only about 2.5 percent of the population talks as long as Lisa does or longer.
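The 68 and 95 percent rules are approximations for bell-shaped data. As a rough check, the sketch below assumes call lengths follow an exactly normal distribution with mean 15 and standard deviation 5 (an assumption, not something stated in the example) and uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later).

```python
from statistics import NormalDist

# Assumed model: call lengths are normal with mean 15 and sd 5 minutes.
calls = NormalDist(mu=15, sigma=5)

# Share of calls within one standard deviation of the mean (10 to 20 minutes).
within_one_sd = calls.cdf(20) - calls.cdf(10)   # about 0.68

# Share of calls at least as long as John's (20 minutes, z = 1).
as_long_as_john = 1 - calls.cdf(20)             # about 0.16

# Share of calls at least as long as Lisa's (25 minutes, z = 2).
as_long_as_lisa = 1 - calls.cdf(25)             # about 0.023

print(within_one_sd, as_long_as_john, as_long_as_lisa)
```

Under this model the tail beyond two standard deviations is slightly smaller than the 2.5 percent given by the rule, which rounds the central share to 95 percent.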

1. The median divides the lower 50 percent of the data set from the upper 50 percent.
2. The kth percentile of a data set, denoted $P_k$, is a value such that k percent of the observations are less than or equal to that value.

1. The 15th percentile of the head circumference of males 3 to 5 months of age is 41 centimeters.
2. This means that 15 percent of males 3 to 5 months of age have a head circumference of 41 centimeters or less, and 85 percent have a head circumference larger than 41 centimeters.
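To make the percentile definition concrete, the sketch below counts the share of observations at or below a given value. The 20 measurements are hypothetical, invented only so that 41 centimeters comes out as the 15th percentile, and the function name is an assumption.

```python
def percent_at_or_below(data, value):
    """Return the percentage of observations less than or equal to value."""
    count = sum(1 for x in data if x <= value)
    return 100 * count / len(data)


# Hypothetical head circumferences in centimeters (not real measurements).
head_circumferences_cm = [
    39.8, 40.5, 41.0, 41.3, 41.6, 41.9, 42.0, 42.2, 42.4, 42.5,
    42.7, 42.9, 43.0, 43.2, 43.4, 43.6, 43.9, 44.1, 44.5, 45.0,
]

print(percent_at_or_below(head_circumferences_cm, 41.0))  # 15.0
```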

Quartiles
1. Here is how to find the quartiles of your data set.
2. Arrange your data in ascending order.
3. Determine the median.
4. Determine the first and third quartiles, $Q_1$ and $Q_3$, by dividing the data set into two halves. The bottom half consists of the observations below the median and the top half of the observations above the median. The median of the bottom half is $Q_1$ and the median of the top half is $Q_3$.

Example
1. The data 0.97, 1.14, 1.85, 2.34, 2.47, 2.78, 3.41, 3.48 represent rainfall in Chicago over 8 months.
2. The data are already arranged in order, so we find the median: $\frac{2.34 + 2.47}{2} = 2.405$.
3. The lower half of the data is 0.97, 1.14, 1.85, 2.34, which has median $Q_1 = \frac{1.14 + 1.85}{2} = 1.495$. The upper half is 2.47, 2.78, 3.41, 3.48, which has median $Q_3 = \frac{2.78 + 3.41}{2} = 3.095$.
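The quartile steps can be sketched in Python as follows. The function name `quartiles` is an assumption for illustration. It implements the median-of-halves method described in the Quartiles steps; library routines such as `statistics.quantiles` use interpolation rules that can give slightly different values.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    ordered = sorted(data)
    n = len(ordered)
    # Observations below the median; the middle value is excluded when n is odd.
    lower_half = ordered[: n // 2]
    # Observations above the median.
    upper_half = ordered[(n + 1) // 2 :]
    return median(lower_half), median(ordered), median(upper_half)


rain = [0.97, 1.14, 1.85, 2.34, 2.47, 2.78, 3.41, 3.48]
q1, q2, q3 = quartiles(rain)

print(round(q1, 3), round(q2, 3), round(q3, 3))  # 1.495 2.405 3.095
```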

IQR
1. The interquartile range, denoted IQR, is the range of the middle 50 percent of the observations in a data set. It is given by the formula $\text{IQR} = Q_3 - Q_1$.
2. In our last example, $\text{IQR} = 3.095 - 1.495 = 1.6$.

Outliers
1. An outlier is an extreme observation.
2. For example, suppose a simple random sample of hourly salaries gives the data 15, 20, 21, 23, 25, 100. One can see that 100 is an outlier.

Outliers
1. We can check for outliers using quartiles.
2. First determine the first and third quartiles of the data set.
3. Compute the IQR.
4. Determine the lower fence and upper fence:
   Lower fence = $Q_1 - 1.5(\text{IQR})$
   Upper fence = $Q_3 + 1.5(\text{IQR})$
5. Any observation below the lower fence or above the upper fence is an outlier.

Example
1. In our last example we had $Q_1 = 1.495$, $Q_3 = 3.095$, and $\text{IQR} = 1.6$. Determine the lower fence and upper fence.
   Lower fence = $Q_1 - 1.5(\text{IQR}) = 1.495 - 1.5(1.6) = -0.905$
   Upper fence = $Q_3 + 1.5(\text{IQR}) = 3.095 + 1.5(1.6) = 5.495$
2. Every observation lies between the two fences, so this data set has no outliers.
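As a sketch of the fence check, the Python snippet below applies the steps to the hourly salary sample 15, 20, 21, 23, 25, 100 from the Outliers slide. The function name `find_outliers` is an assumption for illustration, and the quartiles are taken as the medians of the lower and upper halves, as in the Quartiles steps.

```python
from statistics import median


def find_outliers(data):
    """Return (lower fence, upper fence, outliers) using the 1.5 * IQR rule."""
    ordered = sorted(data)
    n = len(ordered)
    q1 = median(ordered[: n // 2])         # median of the lower half
    q3 = median(ordered[(n + 1) // 2 :])   # median of the upper half
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Anything outside the fences is flagged as an outlier.
    flagged = [x for x in ordered if x < lower_fence or x > upper_fence]
    return lower_fence, upper_fence, flagged


salaries = [15, 20, 21, 23, 25, 100]
lower, upper, flagged = find_outliers(salaries)

print(lower, upper, flagged)  # 12.5 32.5 [100]
```

Here $Q_1 = 20$ and $Q_3 = 25$, so the IQR is 5 and the fences sit at 12.5 and 32.5. Only the salary of 100 falls outside them, which matches the visual impression that it is an outlier.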