Chapter 3.4 Measures of position and outliers. Julian Chan, Department of Mathematics, Weber State University. September 11, 2011

Intro
1. We will discuss how to measure the position of an observation, that is, where it sits relative to the rest of the data set.
2. The quantity that lets us do this is called the Z-score.
3. The Z-score measures how far an observed data point is from the mean.
4. The difference $x - \mu$ measures how far the observed value is from the mean.
5. Dividing this difference by the standard deviation expresses the distance from the mean in units of standard deviations.

Z-score
1. This quantity is called the Z-score, and it comes in two forms.
2. Population Z-score: $z = \frac{x - \mu}{\sigma}$.
3. Sample Z-score: $z = \frac{x - \bar{x}}{s}$.
4. The Z-score is unitless. The Z-scores of all the observations in a data set have a mean of 0 and a standard deviation of 1.
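The two formulas can be sketched in a few lines of Python. This is only an illustration: the function names and the small data set are hypothetical and do not come from the slides.

```python
from statistics import mean, stdev

def population_z(x, mu, sigma):
    """Z-score of x given a population mean mu and standard deviation sigma."""
    return (x - mu) / sigma

def sample_z_scores(data):
    """Z-score of every observation, using the sample mean and sample standard deviation."""
    x_bar = mean(data)
    s = stdev(data)  # sample standard deviation (divides by n - 1)
    return [(x - x_bar) / s for x in data]

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
z = sample_z_scores(data)

# Up to floating-point rounding, the Z-scores have mean 0 and standard deviation 1.
print(round(mean(z), 10), round(stdev(z), 10))
```

Standardizing with the sample mean and sample standard deviation is what makes the resulting scores have mean 0 and standard deviation 1.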

Examples
1. In 2007 the Yankees led the American League with 968 runs scored, while the Phillies led the National League with 892 runs.
2. A natural question is: which team was better at scoring runs?
3. It is tempting to say the Yankees, but the Phillies play in the National League, where there is no designated hitter. Instead the pitcher (usually a weak hitter) has to bat in place of a stronger batter.
4. We will decide by comparing each team with its own league, that is, by comparing Z-scores: how many standard deviations above its league's mean did each team score? This measures how unusual each run total is within its league.
5. Computing the population mean and standard deviation of runs scored for each league gives $\mu_A = 793.9$, $\mu_N = 763$, $\sigma_A = 73.5$, and $\sigma_N = 58.9$.
6. Z-score for the Yankees: $z = \frac{x - \mu_A}{\sigma_A} = \frac{968 - 793.9}{73.5} \approx 2.37$.
7. Z-score for the Phillies: $z = \frac{x - \mu_N}{\sigma_N} = \frac{892 - 763}{58.9} \approx 2.19$.
8. The Yankees scored 2.37 standard deviations above the American League mean, while the Phillies scored only 2.19 standard deviations above the National League mean. We conclude that the Yankees were better at scoring runs.
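The comparison can be checked numerically. The sketch below uses the league means and standard deviations quoted in the slides; the helper name `z_score` is an assumption made for illustration.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

# Runs scored in 2007, with league means and standard deviations from the slides.
yankees_z = z_score(968, mu=793.9, sigma=73.5)   # American League
phillies_z = z_score(892, mu=763, sigma=58.9)    # National League

print(f"Yankees:  {yankees_z:.2f}")   # 2.37
print(f"Phillies: {phillies_z:.2f}")  # 2.19
print("Higher Z-score:", "Yankees" if yankees_z > phillies_z else "Phillies")
```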

Examples
1. Suppose the population mean length of a cell phone call is 15 minutes, and the standard deviation is 5 minutes.
2. You have two friends, John and Lisa. John talks to you on the phone for 20 minutes, and Lisa for 25.
3. Z-score for John's call: $z = \frac{x - \mu}{\sigma} = \frac{20 - 15}{5} = 1$.
4. Z-score for Lisa's call: $z = \frac{x - \mu}{\sigma} = \frac{25 - 15}{5} = 2$.
5. If call lengths are roughly bell-shaped, the 68 percent rule says that about 68 percent of phone calls last between 10 and 20 minutes (within one standard deviation of the mean).
6. The 95 percent rule says that about 95 percent of phone calls last between 5 and 25 minutes (within two standard deviations of the mean).
7. The 68 percent rule leaves 32 percent of calls outside the 10 to 20 minute range, half of them on each side, so about 16 percent of calls last 20 minutes or more. Hence only about 16 percent of the population talks as long as John does or longer.
8. In the same way, the 95 percent rule leaves 5 percent outside the 5 to 25 minute range, so only about 2.5 percent of calls last 25 minutes or more. Hence only about 2.5 percent of the population talks as long as Lisa does or longer.
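As a rough check of these percentages, the sketch below assumes call lengths are exactly normally distributed with mean 15 and standard deviation 5. That assumption is not stated in the slides; it is the setting in which the 68 and 95 percent rules apply. It uses `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

# Assumption: call lengths follow a normal distribution (mean 15, sd 5).
calls = NormalDist(mu=15, sigma=5)

within_one_sd = calls.cdf(20) - calls.cdf(10)   # share of calls lasting 10 to 20 minutes
within_two_sd = calls.cdf(25) - calls.cdf(5)    # share of calls lasting 5 to 25 minutes
p_john = 1 - calls.cdf(20)                      # share of calls lasting 20 minutes or more
p_lisa = 1 - calls.cdf(25)                      # share of calls lasting 25 minutes or more

print(f"within 1 sd: {within_one_sd:.3f}, within 2 sd: {within_two_sd:.3f}")
print(f"at least as long as John: {p_john:.3f}")
print(f"at least as long as Lisa: {p_lisa:.3f}")
```

Under the normal model the exact tail shares come out slightly below the rounded 16 and 2.5 percent given by the rules of thumb.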

Percentiles
1. The median divides the lower 50 percent of the data set from the upper 50 percent.
2. The kth percentile of a data set, denoted $P_k$, is a value such that k percent of the observations are less than or equal to it.
3. Example: the 15th percentile of the head circumference of males 3 to 5 months of age is 41 centimeters.
4. This means that 15 percent of males 3 to 5 months of age have a head circumference of 41 centimeters or less, and 85 percent have a head circumference larger than 41 centimeters.
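The definition can be turned around to ask what percent of a data set lies at or below a given value. The sketch below does this; the function name and the 20 head-circumference values are hypothetical, made up only so that 41 centimeters lands at the 15th percentile.

```python
def percent_at_or_below(data, value):
    """Percent of observations that are less than or equal to value."""
    count = sum(1 for x in data if x <= value)
    return 100 * count / len(data)

# Hypothetical head circumferences in centimeters (illustrative only, not real data).
head_cm = [39.8, 40.5, 41.0, 41.3, 41.6, 41.9, 42.0, 42.2, 42.4, 42.5,
           42.7, 42.9, 43.0, 43.2, 43.4, 43.6, 43.9, 44.1, 44.5, 45.0]

# 3 of the 20 values are at or below 41.0, so 41.0 sits at the 15th percentile here.
print(percent_at_or_below(head_cm, 41.0))  # 15.0
```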

Quartiles
1. Here is how to find the quartiles of your data set.
2. Arrange the data in ascending order.
3. Determine the median.
4. Determine the first and third quartiles, $Q_1$ and $Q_3$, by dividing the data set into two halves. The bottom half consists of the observations below the median and the top half of the observations above the median. The median of the lower half is $Q_1$ and the median of the upper half is $Q_3$.

Example
1. The data 0.97, 1.14, 1.85, 2.34, 2.47, 2.78, 3.41, 3.48 represent rainfall in Chicago for 8 months.
2. The data are already arranged in order, so we find the median: $\frac{2.34 + 2.47}{2} = 2.405$.
3. The lower half of the data is 0.97, 1.14, 1.85, 2.34, which has median $Q_1 = \frac{1.14 + 1.85}{2} = 1.495$. The upper half is 2.47, 2.78, 3.41, 3.48, which has median $Q_3 = \frac{2.78 + 3.41}{2} = 3.095$.
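The quartile procedure can be sketched as follows. The function name `quartiles` is an assumption; the method is the median-of-halves rule described in the slides, with the median itself left out of both halves when the number of observations is odd. Statistical software often uses other quartile conventions and may give slightly different values.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    ordered = sorted(data)
    n = len(ordered)
    lower = ordered[: n // 2]          # observations below the median
    upper = ordered[(n + 1) // 2 :]    # observations above the median
    return median(lower), median(ordered), median(upper)

# Chicago rainfall data from the example.
rain = [0.97, 1.14, 1.85, 2.34, 2.47, 2.78, 3.41, 3.48]
q1, med, q3 = quartiles(rain)
print(f"Q1 = {q1:.3f}, median = {med:.3f}, Q3 = {q3:.3f}")
```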

IQR
1. The interquartile range, denoted IQR, is the range of the middle 50 percent of the observations in a data set. It is given by the formula $IQR = Q_3 - Q_1$.
2. In our last example, $Q_1 = 1.495$ and $Q_3 = 3.095$, so $IQR = 3.095 - 1.495 = 1.60$.

Outliers
1. An outlier is an extreme observation.
2. For example, suppose a simple random sample of hourly salaries gives the data 15, 20, 21, 23, 25, 100. One can see that 100 is an outlier.
3. We can check for outliers using quartiles.
4. First determine the first and third quartiles of the data set.
5. Compute the IQR.
6. Determine the lower fence and upper fence: Lower fence $= Q_1 - 1.5(IQR)$ and Upper fence $= Q_3 + 1.5(IQR)$.
7. If a data value is less than the lower fence or greater than the upper fence, it is an outlier.
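The fence check can be sketched in Python. The function names are assumptions, and the quartiles are computed with the median-of-halves method from the Quartiles slide. The sketch is applied to the hourly salary sample and to the Chicago rainfall data from the quartile example.

```python
from statistics import median

def fences(data):
    """Return (lower fence, upper fence) based on Q1, Q3 and the IQR."""
    ordered = sorted(data)
    n = len(ordered)
    q1 = median(ordered[: n // 2])         # median of the lower half
    q3 = median(ordered[(n + 1) // 2 :])   # median of the upper half
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def outliers(data):
    """Observations below the lower fence or above the upper fence."""
    low, high = fences(data)
    return [x for x in data if x < low or x > high]

wages = [15, 20, 21, 23, 25, 100]
print(fences(wages))    # (12.5, 32.5)
print(outliers(wages))  # [100]

rain = [0.97, 1.14, 1.85, 2.34, 2.47, 2.78, 3.41, 3.48]
print(outliers(rain))   # []
```

For the salary sample, $Q_1 = 20$, $Q_3 = 25$ and $IQR = 5$, so the fences are 12.5 and 32.5 and only 100 falls outside them.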

Example
1. In our last example we had $Q_1 = 1.495$ and $Q_3 = 3.095$, so $IQR = 3.095 - 1.495 = 1.60$. Determine the lower fence and upper fence.
Lower fence $= Q_1 - 1.5(IQR) = 1.495 - 1.5(1.60) = -0.905$
Upper fence $= Q_3 + 1.5(IQR) = 3.095 + 1.5(1.60) = 5.495$
2. Every observation, from 0.97 up to 3.48, lies between the two fences, so this data set has no outliers.