University of California, Los Angeles Department of Statistics. Measures of central tendency and variation Data display


Statistics 13
Instructor: Nicolas Christou

Measures of central tendency

1. Sample mean: Let $x_1, x_2, \ldots, x_n$ be the observations of a sample. The sample mean $\bar{x}$ is computed as follows:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}.$$

2. Median: It is the value that falls in the middle when the observations are sorted from smallest to largest. To compute the median, follow the next steps:
a. Sort the observations from smallest to largest.
b. Compute the position of the median: $\frac{n+1}{2}$.

Examples:

A. Sample size is odd: 7 annual incomes: 28, 60, 26, 32, 30, 26, 29.
First sort these observations from smallest to largest: 26, 26, 28, 29, 30, 32, 60.
Next compute $\frac{n+1}{2} = \frac{7+1}{2} = 4$th. The median is the 4th observation. Median = 29.

B. Sample size is even: 8 annual incomes: 26, 26, 28, 29, 30, 32, 60, 80.
Again compute $\frac{n+1}{2} = \frac{8+1}{2} = 4.5$th. The median is the average of the two middle observations. Median = $\frac{29+30}{2} = 29.5$.

Question: How do unusual observations affect the sample mean and the median?
Example: 8 annual incomes: 26, 26, 28, 29, 30, 32, 60, 8000.
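The position rule for the median can be tried out numerically. The following is a minimal Python sketch, not part of the handout; the function names `sample_mean` and `median_by_position` are hypothetical, and the three lists are the income examples from the text.

```python
def sample_mean(xs):
    """Sample mean: sum of the observations divided by n."""
    return sum(xs) / len(xs)


def median_by_position(xs):
    """Median via the (n + 1) / 2 position rule (positions are 1-based)."""
    s = sorted(xs)
    n = len(s)
    pos = (n + 1) / 2
    if pos.is_integer():
        # Odd n: the median is the observation at that position.
        return s[int(pos) - 1]
    # Even n: average the two observations on either side of the position.
    lower = s[int(pos) - 1]
    upper = s[int(pos)]
    return (lower + upper) / 2


odd = [28, 60, 26, 32, 30, 26, 29]
even = [26, 26, 28, 29, 30, 32, 60, 80]
with_outlier = [26, 26, 28, 29, 30, 32, 60, 8000]

for name, data in [("odd", odd), ("even", even), ("with_outlier", with_outlier)]:
    print(name, "mean =", sample_mean(data), "median =", median_by_position(data))
```

Comparing the `even` and `with_outlier` rows is one way to explore the question about unusual observations: only the largest income differs between the two samples.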

Measures of non-central tendency

1. First quartile ($Q_1$) or 25th percentile: Its position is $\frac{n+1}{4}$.
2. Third quartile ($Q_3$) or 75th percentile: Its position is $\frac{3(n+1)}{4}$.

Example: Find $Q_1$ and $Q_3$ of the following 8 annual incomes: 26, 26, 28, 29, 30, 32, 60, 80.
Position of $Q_1$: $\frac{n+1}{4} = \frac{8+1}{4} = 2.25$th, which becomes the 2nd observation (round to the nearest integer).
Position of $Q_3$: $\frac{3(n+1)}{4} = \frac{3(8+1)}{4} = 6.75$th, which becomes the 7th observation (round to the nearest integer).
Therefore, $Q_1 = 26$, $Q_3 = 60$.

Five-number summary of a data set: MIN, $Q_1$, MEDIAN, $Q_3$, MAX.

Box plot: A popular way to display data and identify outliers.

You are given 11 annual incomes in thousands of dollars: 26, 26, 28, 29, 30, 32, 60, 65, 70, 40, [illegible]. Construct the boxplot of income using these 11 observations.

Begin by sorting these incomes: 26, 26, 28, 29, 30, 32, 40, [illegible], 60, 65, 70.

Find the position of the first quartile, median, and third quartile:
Position of $Q_1$ = $\frac{n+1}{4} = \frac{11+1}{4} = 3$rd
Position of Median = $\frac{n+1}{2} = \frac{11+1}{2} = 6$th
Position of $Q_3$ = $\frac{3(n+1)}{4} = \frac{3(11+1)}{4} = 9$th

Find the first quartile, median, and third quartile: $Q_1 = 28$, Median = 32, $Q_3 = 60$, and the interquartile range is $IQR = Q_3 - Q_1 = 60 - 28 = 32$.

Outliers are observations above $Q_3 + 1.5\,IQR$ or below $Q_1 - 1.5\,IQR$. Also, serious outliers are observations above $Q_3 + 3\,IQR$ or below $Q_1 - 3\,IQR$. In our example we do not have any outliers since $Q_3 + 1.5\,IQR = 60 + 1.5(32) = 108$ and $Q_1 - 1.5\,IQR = 28 - 1.5(32) = -20$.

Now we can construct the box plot.
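The quartile position rule and the outlier fences can be sketched in Python as follows. This is an illustration under stated assumptions, not the handout's own code: the names `rounded_position`, `quartiles`, and `fences` are hypothetical, and rounding a position that ends in exactly .5 upward is an assumption, since the text only says "round to the nearest integer".

```python
import math


def rounded_position(pos, n):
    """Round a 1-based position to the nearest integer, kept within 1..n.

    Assumption: a position ending in exactly .5 is rounded up.
    """
    return min(max(math.floor(pos + 0.5), 1), n)


def quartiles(xs):
    """Q1 and Q3 from the (n + 1) / 4 and 3(n + 1) / 4 position rules."""
    s = sorted(xs)
    n = len(s)
    q1 = s[rounded_position((n + 1) / 4, n) - 1]
    q3 = s[rounded_position(3 * (n + 1) / 4, n) - 1]
    return q1, q3


def fences(q1, q3, k=1.5):
    """Lower and upper outlier limits: Q1 - k * IQR and Q3 + k * IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


incomes = [26, 26, 28, 29, 30, 32, 60, 80]
print(quartiles(incomes))    # Q1 and Q3 of the 8 incomes

# Quartiles of the 11-income example, taken directly from the text.
print(fences(28, 60))        # limits for outliers (k = 1.5)
print(fences(28, 60, k=3))   # limits for serious outliers (k = 3)
```

Other software often uses interpolation rules for quartiles, so its results can differ slightly from this position-rounding rule.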

Box plot pathologies: Here are some interesting box plots. Can you write down a set of observations that correspond to these box plots?

[Figure: box plots]

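Measures of variation

1. Range:
2. Interquartile range (IQR):
3. Sample variance and sample standard deviation.

Let $x_1, x_2, \ldots, x_n$ be the values of a sample. The sample variance $s^2$ is the average of the squared deviations of each observation from the sample mean and it is computed as follows:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$$

where $x_i - \bar{x}$ is the $i$th deviation from the sample mean $\bar{x}$. It is easier for calculations to use:

$$s^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right]$$

The standard deviation is simply the square root of the variance. Both $\bar{x}$ and $s$ have the same units.

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$$

or, easier for calculations,

$$s = \sqrt{\frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right]}$$

Note: $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$ and $\sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}$.

Example: Find the sample mean $\bar{x}$, sample variance $s^2$, and sample standard deviation $s$ of the following sample: 1, 1.1, 0.9, 1.3, 0.7 (weights of five oranges in ounces).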
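The two variance formulas can be checked against each other on the orange weights. The sketch below is illustrative only; the function names `sample_variance` and `sample_variance_shortcut` are hypothetical, and the data are the five weights from the example.

```python
import math


def sample_variance(xs):
    """Definition: sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def sample_variance_shortcut(xs):
    """Computational form: (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)


weights = [1, 1.1, 0.9, 1.3, 0.7]  # ounces
mean = sum(weights) / len(weights)
s2 = sample_variance(weights)
s = math.sqrt(s2)

print("mean =", round(mean, 4))
print("variance =", round(s2, 4), "shortcut =", round(sample_variance_shortcut(weights), 4))
print("standard deviation =", round(s, 4))
```

Both forms agree up to floating-point rounding. The shortcut is convenient by hand, but on a computer it can lose precision when the mean is large relative to the spread, so the definitional form is the safer default.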

Adding and multiplying observations by a constant

Let $x_1, x_2, \ldots, x_n$ be the observations of a sample of size $n$, and let $\bar{x}$ and $s^2$ be the sample mean and sample variance respectively.

a. Suppose that on each observation a constant $a$ is added. Find the new sample mean and sample variance.

b. Suppose that each observation is multiplied by a constant $a$. Find the new sample mean and sample variance.
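A numerical experiment can be used to check an answer to parts a and b before proving it. The sketch below is only a check on one sample, not a derivation; it uses `mean` and `variance` from Python's standard `statistics` module (where `variance` is the sample variance with divisor $n-1$), and the sample and the constant `a = 10` are arbitrary choices for illustration.

```python
import math
from statistics import mean, variance

xs = [26, 26, 28, 29, 30, 32, 60, 80]  # incomes from the earlier example
a = 10                                  # arbitrary constant for the experiment

shifted = [x + a for x in xs]  # part a: add a to every observation
scaled = [a * x for x in xs]   # part b: multiply every observation by a

print("original:", mean(xs), variance(xs))
print("shifted: ", mean(shifted), variance(shifted))
print("scaled:  ", mean(scaled), variance(scaled))
```

In this run the shifted sample has mean $\bar{x} + a$ and the same variance, while the scaled sample has mean $a\bar{x}$ and variance $a^2 s^2$. The general result follows by substituting $x_i + a$ or $a x_i$ into the formulas for $\bar{x}$ and $s^2$.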

Data display

Three popular methods:
1. Stem-and-leaf display
2. Frequency distribution
3. Histogram

Stem-and-leaf display: Split each observation into a stem and a leaf. Then place the stems in a column from smallest to largest. Next to each stem place the leaves from smallest to largest.

Frequency distribution: We can group data into classes (bins). The first step is to define the number of classes and the width of each class (define the number of bins). There are many ways to do this.

Histogram: The frequency distribution can be graphed. The graph is called a histogram. To construct a histogram: On the horizontal axis place the class limits. Then construct a rectangle whose base is the width of the class and whose height is the frequency of that class. There is also a relative frequency histogram (the height of each rectangle is the relative frequency of that class).

Construct by hand the stem-and-leaf plot of the following observations (ozone data, ppm):

[1] 0.0 0.081 0.035 0.080 0.053 0.077 0.051 0.059 0.01 0.07 0.090 0.069 0.057
[14] 0.09 0.05 0.083 0.068 0.078 0.096 0.019 0.065 0.061 0.09 0.035 0.097 0.057
[27] 0.036 0.060 0.03 0.036

See more examples on the next pages.
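The stem-and-leaf and frequency distribution steps can be sketched in Python. This is a minimal illustration, not the handout's method for the ozone data: the function names `stem_and_leaf` and `frequency_distribution` are hypothetical, the stem is assumed to be the tens digit and the leaf the units digit of a non-negative integer, and the data are the 8 annual incomes used earlier. The class start, width, and count passed to `frequency_distribution` are arbitrary choices.

```python
def stem_and_leaf(xs):
    """Map each stem (tens) to its sorted leaves (units), for non-negative integers.

    Stems with no leaves between the smallest and largest stem are kept,
    so gaps in the data stay visible.
    """
    rows = {}
    for x in sorted(xs):
        rows.setdefault(x // 10, []).append(x % 10)
    return {stem: rows.get(stem, []) for stem in range(min(rows), max(rows) + 1)}


def frequency_distribution(xs, start, width, n_classes):
    """Count observations in equal-width classes [start + k*width, start + (k+1)*width).

    Observations outside the covered range are ignored.
    """
    counts = [0] * n_classes
    for x in xs:
        k = int((x - start) // width)
        if 0 <= k < n_classes:
            counts[k] += 1
    return counts


incomes = [26, 26, 28, 29, 30, 32, 60, 80]

for stem, leaves in stem_and_leaf(incomes).items():
    print(f"{stem} | {''.join(str(leaf) for leaf in leaves)}")

counts = frequency_distribution(incomes, start=20, width=20, n_classes=4)
relative = [c / len(incomes) for c in counts]
print(counts)    # heights of a frequency histogram
print(relative)  # heights of a relative frequency histogram
```

The two printed lists are the rectangle heights for a frequency histogram and a relative frequency histogram over the same four classes.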

a. California ozone data. You can access the data at:
http://www.stat.ucla.edu/~nchristou/statistics13/ozone.txt

Here are the data:

[1] 0.0 0.081 0.035 0.080 0.053 0.077 0.051 0.059 0.01 0.07 0.090 0.069
[13] 0.057 0.09 0.05 0.083 0.068 0.078 0.096 0.019 0.065 0.061 0.09 0.035
[25] 0.097 0.057 0.036 0.060 0.03 0.036 0.051 0.09 0.030 0.105 0.07 0.078
[37] 0.08 0.095 0.079 0.067 0.09 0.081 0.077 0.08 0.05 0.059 0.101 0.038
[49] 0.08 0.06 0.089 0.033 0.036 0.03 0.078 0.06 0.056 0.085 0.01 0.09
[61] 0.059 0.115 0.03 0.08 0.09 0.099 0.059 0.089 0.093 0.038 0.099 0.06
[73] 0.050 0.068 0.079 0.01 0.056 0.09 0.08 0.051 0.071 0.077 0.063 0.063
[85] 0.061 0.068 0.039 0.061 0.0 0.05 0.08 0.061 0.065 0.036 0.05 0.06
[97] 0.067 0.073 0.050 0.105 0.09 0.10 0.055 0.053 0.090 0.063 0.055 0.08
[109] 0.01 0.097 0.079 0.097 0.056 0.036 0.078 0.061 0.066 0.09 0.070 0.039
[121] 0.096 0.065 0.03 0.067 0.09 0.086 0.079 0.073 0.081 0.080 0.073 0.03
[133] 0.083 0.080 0.068 0.077 0.077 0.08 0.06 0.066 0.10 0.111 0.079 0.07
[145] 0.037 0.067 0.071 0.07 0.100 0.071 0.038 0.07 0.075 0.035 0.100 0.036
[157] 0.058 0.035 0.09 0.079 0.08 0.11 0.08 0.08 0.111 0.037 0.051 0.0
[169] 0.07 0.053 0.080 0.0 0.059 0.055 0.05

And the stem-and-leaf plot:

The decimal point is 2 digit(s) to the left of the |

1 | 9
2 | 77889999
3 | 0355556666667788899
4 | 1111333666778899
5 | 00111133355566677899999
6 | 01111133355566777788889
7 | 01113335777778888999999
8 | 0000111335699
9 | 00356677799
10 | 00155
11 | 115

[Figure: box plot of ozone; axis from 0.02 to 0.10]

b. Soil lead and zinc data (area of interest in the Netherlands - see next handout in R). You can access these data at:
http://www.stat.ucla.edu/~nchristou/statistics13/soil.txt

[Figure: Histogram of soil lead; horizontal axis: Lead (ppm); vertical axis: Frequency]

[Figure: Histogram of soil log(lead); horizontal axis: Log_lead; vertical axis: Frequency]