Quantitative Literacy: Thinking Between the Lines

Crauder, Noell, Evans, Johnson
Chapter 6: Statistics
© 2013 W. H. Freeman and Company

Chapter 6: Statistics
Lesson Plan
  Data summary and presentation: Boiling down the numbers
  The normal distribution: Why the bell curve?
  The statistics of polling: Can we believe the polls?
  Statistical inference and clinical trials: Effective drugs?

Learning Objectives:
  Know the statistical terms used to summarize data
  Calculate mean, median, and mode
  Understand the five-number summary and boxplots
  Calculate the standard deviation
  Understand histograms

The mean (average) of a list of numbers is the sum of the numbers divided by the number of entries in the list.
The median of a list of numbers is the middle number, the middle data point. If there is an even number of data points, take the average of the middle two.
The mode is the most frequently occurring data point. If there are two such points, the data set is called bimodal. If there are more than two, the data set is multimodal.

Example: The Chelsea Football Club (FC) is a British soccer team. The following table shows the goals scored in the games played by Chelsea FC between September 2007 and May 2008. The data are arranged according to the total number of goals scored in each game.

Goals scored by either team   0   1   2   3   4   5   6   7   8
Number of games               7  14  20  11   3   2   1   2   2

Find the mean, median, and mode for the number of goals scored per game.

Solution: To find the mean, we add the data values (the total number of goals scored) and divide by the number of data points. To find the total number of goals scored, for each entry we multiply the goals scored by the corresponding number of games. Then we add.
The total number of goals scored:
7×0 + 14×1 + 20×2 + 11×3 + 3×4 + 2×5 + 1×6 + 2×7 + 2×8 = 145
The number of data points, or the total number of games:
7 + 14 + 20 + 11 + 3 + 2 + 1 + 2 + 2 = 62

Solution (cont.):
The mean: the total number of goals scored divided by the number of games played: Mean = 145/62 ≈ 2.3
The median: the total number of games is 62, which is even, so we count from the bottom to find the 31st and 32nd lowest total goal scores. These are both 2: Median = 2
The mode: 2 occurs most frequently as the number of goals (20 times): Mode = 2
Thus on average, the teams combined to score about 2.3 goals per game. Half of the games had goals totaling 2 or more, and the most common number of goals scored in a Chelsea FC game was 2.
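The frequency-table calculation above can be sketched in Python. This is only an illustration: the names goal_counts and expand are made up for this sketch, and the standard-library statistics module is assumed to be an acceptable stand-in for the hand calculation.

```python
from statistics import mean, median, multimode

# Frequency table from the Chelsea FC example: goals in a game -> number of games
goal_counts = {0: 7, 1: 14, 2: 20, 3: 11, 4: 3, 5: 2, 6: 1, 7: 2, 8: 2}

def expand(freq_table):
    """Turn a {value: count} table into a sorted list with one entry per data point."""
    return [value for value, count in sorted(freq_table.items()) for _ in range(count)]

games = expand(goal_counts)          # hypothetical helper output: one entry per game
total_goals = sum(games)             # total number of goals scored
number_of_games = len(games)         # number of data points

print("Total goals:", total_goals)
print("Games:", number_of_games)
print("Mean:", round(mean(games), 1))
print("Median:", median(games))      # averages the two middle values when the count is even
print("Mode(s):", multimode(games))  # list of the most frequent values
```

Expanding the table into one entry per game keeps the code close to the definitions; for very large tables a weighted sum would avoid building the full list.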

Example: The following list gives home prices (in thousands of dollars) in a small town:
80, 120, 125, 140, 180, 190, 820
The list includes the price of one luxury home. Calculate the mean and median of this data set. Which of the two is more appropriate for describing the housing market?
Solution:
Mean = (80 + 120 + 125 + 140 + 180 + 190 + 820)/7 = 1655/7
or about 236.4 thousand dollars. This is the average price of a home. The list of seven prices is arranged in order, so the median is the fourth value, 140 thousand dollars. Note that the mean is higher than the cost of every home on the market except for one, the luxury home. The median of 140 thousand dollars is more representative of the market.
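A short sketch of how the single luxury home affects the two measures. Dropping the outlier is an extra comparison added here for illustration and is not part of the original example; the variable names are hypothetical.

```python
from statistics import mean, median

prices = [80, 120, 125, 140, 180, 190, 820]   # thousands of dollars, already in order
without_luxury = prices[:-1]                   # same list with the 820 luxury home left out

print("All homes:      mean", round(mean(prices), 1), "median", median(prices))
print("Without luxury: mean", round(mean(without_luxury), 1), "median", median(without_luxury))
```

The mean moves by nearly 100 thousand dollars when the one outlier is removed, while the median barely changes, which is why the median is the better description here.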

An outlier is a data point that is significantly different from most of the data.
The first quartile of a list of numbers is the median of the lower half of the numbers in the list.
The second quartile is the same as the median of the list.
The third quartile is the median of the upper half of the numbers in the list.
The five-number summary of a list of numbers consists of the minimum, the first quartile, the median, the third quartile, and the maximum.

Example: Each year Forbes magazine publishes a list it calls the Celebrity 100. The accompanying table shows the top nine names on the list for 2009, ordered according to the ranking of Forbes. The table also gives the incomes of the celebrities between June 2008 and June 2009. Calculate the five-number summary for this list of incomes.

Celebrity            Income (millions of dollars)
Angelina Jolie        27
Oprah Winfrey        275
Madonna              110
Beyonce Knowles       87
Tiger Woods          110
Bruce Springsteen     70
Steven Spielberg     150
Jennifer Aniston      25
Brad Pitt             28

Solution: First we arrange the incomes in order:
25 27 28 70 87 110 110 150 275
The median is the middle (fifth) value, $87 million. The lower half of the list consists of the four numbers less than the median, which are:
25 27 28 70
The median of this lower half is 27.5, so the first quartile of incomes is $27.5 million. The upper half of the list consists of the four numbers greater than the median, which are:
110 110 150 275
The median of this upper half is 130, so the third quartile of incomes is $130 million.

Solution (cont.): Thus, the five-number summary is:
Minimum = $25 million
First quartile = $27.5 million
Median = $87 million
Third quartile = $130 million
Maximum = $275 million
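The procedure used in this solution can be sketched as a small Python function. The function name five_number_summary is made up for illustration, and the quartile convention is the one described in the text (medians of the lower and upper halves); other software may use different quartile conventions and give slightly different values.

```python
from statistics import median

def five_number_summary(data):
    """Return (minimum, first quartile, median, third quartile, maximum).

    Quartiles follow the convention used in the text: they are the medians
    of the lower and upper halves of the ordered list. With an odd number
    of data points the median itself is left out of both halves.
    Assumes at least two data points.
    """
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]       # the smallest `half` values
    upper = ordered[-half:]      # the largest `half` values
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])

# Celebrity incomes in millions of dollars, in the order given in the table
incomes = [27, 275, 110, 87, 110, 70, 150, 25, 28]
print(five_number_summary(incomes))
```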

Boxplots: There is a commonly used pictorial display of the five-number summary known as a boxplot (also called a box and whisker diagram). Figure 6.1 shows the basic geometric figure used in a boxplot.

Example: A report on greenercars.org shows 2011 model cars with the best fuel economy.
1. Find the five-number summary for city mileage.
2. Present a boxplot of city mileage.
3. Comment on how the data are distributed about the median.

Model                            City Mileage (mpg)   Highway Mileage (mpg)
Toyota Prius                     51                   48
Honda Civic Hybrid               40                   43
Honda CR-Z                       35                   39
Toyota Yaris                     29                   35
Audi A3                          30                   42
Hyundai Sonata                   22                   35
Hyundai Tucson                   23                   31
Chevrolet Equinox                22                   32
Kia Rondo                        20                   27
Chevrolet Colorado/GMC Canyon    18                   25

Solution: 1. The list for city mileage, in order from lowest to highest:
18, 20, 22, 22, 23, 29, 30, 35, 40, 51
To find the median, we average the two numbers in the middle:
Median = (23 + 29)/2 = 26 mpg
The lower half of the list is 18, 20, 22, 22, 23, and the median of this half is 22. Thus, the first quartile is 22 mpg. The upper half of the list is 29, 30, 35, 40, 51, and the median of this half is 35. Thus, the third quartile is 35 mpg. The minimum is 18 mpg and the maximum is 51 mpg.

Solution (cont.): 2. The corresponding boxplot appears in Figure 6.2. The vertical axis is the mileage measured in miles per gallon.

Solution (cont.): 3. Referring to the boxplot, we note that the first quartile is not far above the minimum, and the median is barely above the first quartile. The third quartile is well above the median, and the maximum is well above the third quartile. This emphasizes the dramatic difference between the high-mileage cars (the hybrids) and ordinary cars.
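The spacing described in part 3 can be checked numerically. The sketch below simply takes the five summary values for city mileage from the solution (minimum and maximum read off the ordered list) and prints the gap between neighboring values; the dictionary names are hypothetical.

```python
# Five-number summary for city mileage (mpg), taken from the worked solution
summary = {
    "minimum": 18,
    "first quartile": 22,
    "median": 26,
    "third quartile": 35,
    "maximum": 51,
}

labels = list(summary)
values = list(summary.values())

# Gap between each pair of neighboring summary values
gaps = {
    f"{labels[i]} to {labels[i + 1]}": values[i + 1] - values[i]
    for i in range(len(values) - 1)
}

for name, gap in gaps.items():
    print(f"{name}: {gap} mpg")
```

The two gaps below the median are 4 mpg each, while the two gaps above it are 9 and 16 mpg, which is the stretching toward high mileage that the boxplot shows.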

The standard deviation is a measure of how much the data are spread out from the mean. The smaller the standard deviation, the more closely the data are clustered about the mean.
Standard Deviation Formula
Suppose the data points are $x_1, x_2, x_3, \ldots, x_n$. The formula for the standard deviation is
$$\sigma = \sqrt{\frac{(x_1 - \mu)^2 + (x_2 - \mu)^2 + \cdots + (x_n - \mu)^2}{n}}$$
where the Greek letter μ (mu, pronounced "mew") denotes the mean.

Calculating Standard Deviation
To find the standard deviation of n data points, we first calculate the mean μ. The next step is to complete the following calculation template:

Data    Deviation    Square of deviation
x_i     x_i − μ      Square of second column
                     Sum of third column

Divide the above sum by n and take the square root.

Example: Two leading pitchers in Major League Baseball for 2011 were Roy Halladay of the Philadelphia Phillies and Felix Hernandez of the Seattle Mariners. Their ERA (Earned Run Average; the lower the number, the better) histories are given in the table below.

Pitcher        ERA 2006   ERA 2007   ERA 2008   ERA 2009   ERA 2010
R. Halladay    3.19       3.71       2.78       2.79       2.44
F. Hernandez   4.52       3.92       3.45       2.49       2.27

Calculate the mean and the standard deviation for Halladay's ERA history. It turns out that the mean and standard deviation for Hernandez's ERA history are μ = 3.33 and σ = 0.85. What comparisons between Halladay and Hernandez can you make based on these numbers?

Solution: The mean for Halladay is:
μ = (3.19 + 3.71 + 2.78 + 2.79 + 2.44)/5 ≈ 2.98

ERA x_i   Deviation x_i − 2.98     Square of deviation (x_i − 2.98)²
3.19      3.19 − 2.98 = 0.21       (0.21)² = 0.044
3.71      3.71 − 2.98 = 0.73       (0.73)² = 0.533
2.78      2.78 − 2.98 = −0.20      (−0.20)² = 0.040
2.79      2.79 − 2.98 = −0.19      (−0.19)² = 0.036
2.44      2.44 − 2.98 = −0.54      (−0.54)² = 0.292
                                   Sum of third column: 0.945

Sum divided by n = 5, then square root: σ = √(0.945/5) ≈ 0.43

Solution (cont.): We conclude that the mean and the standard deviation for Halladay's ERA history are μ = 2.98 and σ = 0.43. Because Halladay's mean is smaller than Hernandez's mean of μ = 3.33, over this period Halladay had a better pitching record. Halladay's ERA had a smaller standard deviation than that of Hernandez (who had σ = 0.85), so Halladay was more consistent: his numbers are not spread as far from the mean.
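The calculation template can be written out step by step in Python. This is a minimal sketch of the divide-by-n formula given earlier in the chapter; the function name population_std is made up for illustration. It works from unrounded values, so intermediate figures can differ slightly from the rounded table entries.

```python
from math import sqrt

def population_std(data):
    """Return (mean, standard deviation) using the divide-by-n formula."""
    n = len(data)
    mu = sum(data) / n                          # step 1: the mean
    deviations = [x - mu for x in data]         # second column: x_i - mu
    squares = [d ** 2 for d in deviations]      # third column: squared deviations
    sigma = sqrt(sum(squares) / n)              # divide the sum by n, take the square root
    return mu, sigma

halladay = [3.19, 3.71, 2.78, 2.79, 2.44]       # ERA 2006-2010
hernandez = [4.52, 3.92, 3.45, 2.49, 2.27]

for name, eras in [("Halladay", halladay), ("Hernandez", hernandez)]:
    mu, sigma = population_std(eras)
    print(f"{name}: mean = {mu:.2f}, standard deviation = {sigma:.2f}")
```

Running the same function on Hernandez's history is a quick way to confirm the values of 3.33 and 0.85 quoted in the example.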

Example: Below is a table showing the Eastern Conference NBA team free-throw percentages at home and away for the 2007-2008 season. At the bottom of the table, we have displayed the mean and standard deviation for each data set. What do these values for the mean and standard deviation tell us about free throws shot at home compared with free throws shot away from home? Does comparison of the minimum and maximum of each of the data sets support your conclusions?

Team                 Free-throw percentage at home   Free-throw percentage away
Toronto              81.2                            77.6
Washington           78.2                            75.4
Atlanta              77.2                            75.2
Boston               77.1                            74.3
Indiana              76.8                            75.7
Detroit              76.7                            74.4
Chicago              75.6                            76.6
New Jersey           73.6                            76.8
Milwaukee            73.3                            76.6
Miami                72.7                            75.5
New York             72.7                            73.9
Orlando              72.1                            75.4
Cleveland            71.7                            74.8
Charlotte            71.4                            74.7
Philadelphia         70.6                            77.2

Mean                 74.73                           75.61
Standard deviation   2.95                            1.09

Solution: The means for free-throw percentages are 74.73 at home and 75.61 away, so on average the teams do somewhat better on the road than at home. The standard deviation for home is 2.95 percentage points, which is considerably larger than the standard deviation of 1.09 percentage points away from home. This means that the free-throw percentages at home vary from the mean much more than the free-throw percentages away. The difference between the maximum and minimum percentages shows the same thing: The free-throw percentages at home range from 70.6% to 81.2%, and the free-throw percentages away range from 73.9% to 77.6%.

Solution (cont.): The plots of the data in Figures 6.3 and 6.4 provide a visual verification that the data for home free throws are more broadly dispersed than the data for away free throws.
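The summary values in this example can be recomputed from the table. The sketch below assumes the table is read with all fifteen home percentages listed before the away percentages within each block of teams, and it uses the divide-by-n standard deviation (statistics.pstdev), matching the chapter's formula. The function name describe is hypothetical.

```python
from statistics import mean, pstdev

# Free-throw percentages, in the team order of the table (Toronto ... Philadelphia)
home = [81.2, 78.2, 77.2, 77.1, 76.8, 76.7, 75.6, 73.6,
        73.3, 72.7, 72.7, 72.1, 71.7, 71.4, 70.6]
away = [77.6, 75.4, 75.2, 74.3, 75.7, 74.4, 76.6, 76.8,
        76.6, 75.5, 73.9, 75.4, 74.8, 74.7, 77.2]

def describe(values):
    """Mean, divide-by-n standard deviation, minimum, maximum, and range."""
    return {
        "mean": round(mean(values), 2),
        "std": round(pstdev(values), 2),
        "min": min(values),
        "max": max(values),
        "range": round(max(values) - min(values), 1),
    }

print("Home:", describe(home))
print("Away:", describe(away))
```

Both the standard deviation and the range come out larger for the home data, in line with the conclusion in the solution.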

A histogram is a bar graph that shows the frequencies with which certain data occur.
Example: Suppose we toss 1000 coins and write down the number of heads we got. We do this experiment a total of 1000 times. The accompanying table shows one part of the results from doing these experiments using a computer simulation.

Number of heads                  451  457  458  459  461  462  463  464  465  467
Number of tosses (out of 1000)     2    2    1    3    3    2    1    3    1    1

The first entries show that twice we got 451 heads, twice we got 457 heads, once we got 458 heads, and so on. The raw data are hard to digest because there are so many data points. The five-number summary provides one way to analyze the data. An alternative way to make sense of the data is to arrange them in groups and then draw a histogram.

Example (cont.): Suppose it turns out that the number of tosses yielding fewer than 470 heads is 23. Because 470 out of 1000 is 47%, this means that 23 tosses yielded less than 47% heads. We find the accompanying table by dividing the data into groups this way.

Percent heads    Number of tosses
Less than 47%     23
47% to 48%        75
48% to 49%       140
49% to 50%       234
50% to 51%       250
51% to 52%       157
52% to 53%        94
At least 53%      27

Example (cont.): Figure 6.7 shows a histogram for this grouping of the data. We can clearly see that the vast majority of the tosses were between 47% and 53% heads.
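The experiment and the grouping step can be imitated with a short simulation. This is a sketch under stated assumptions, not the simulation used for the tables above: the function names, the random seed, and the convention that each group includes its lower endpoint but not its upper one (so exactly 47% falls in "47% to 48%") are choices made here. The counts it produces will differ from the table, since each simulation run is random.

```python
import random

LABELS = ["Less than 47%", "47% to 48%", "48% to 49%", "49% to 50%",
          "50% to 51%", "51% to 52%", "52% to 53%", "At least 53%"]

def simulate_head_counts(experiments=1000, coins=1000, seed=0):
    """Toss `coins` fair coins `experiments` times; return the heads count of each run."""
    rng = random.Random(seed)
    # Each of the `coins` random bits is one fair coin; a 1 bit counts as heads.
    return [bin(rng.getrandbits(coins)).count("1") for _ in range(experiments)]

def group_by_percent(head_counts, coins=1000):
    """Count how many runs fall into each percent-heads group."""
    counts = {label: 0 for label in LABELS}
    for heads in head_counts:
        whole_percent = 100 * heads // coins           # percent heads, rounded down
        index = min(max(whole_percent - 46, 0), 7)     # below 47 -> first group, 53 and up -> last
        counts[LABELS[index]] += 1
    return counts

groups = group_by_percent(simulate_head_counts())
for label, count in groups.items():
    print(f"{label:<14} {count:>4} {'#' * (count // 5)}")   # crude text histogram
```

Each row of '#' characters plays the role of one bar in the histogram, with one character for every five tosses in that group.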

Chapter Summary
  Data summary and presentation: Boiling down the numbers
    Four important measures in descriptive statistics: mean, median, mode, and standard deviation
  The normal distribution: Why the bell curve?
    A plot of normally distributed data: the bell-shaped curve
    The z-score for a data point
    The Central Limit Theorem

Chapter Summary (cont.)
  The statistics of polling: Can we believe the polls?
    Polling involves a margin of error, a confidence level, and a confidence interval
  Statistical inference and clinical trials: Effective drugs?
    Statistical significance and p-values
    Positively correlated, negatively correlated, uncorrelated, or linearly correlated