Calculating Probabilities with the Normal distribution. David Gerard

Calculating Probabilities with the Normal distribution. David Gerard, 2017-09-18

Learning Objectives

Standardizing variables.
Normal probability calculations.
Section 3.1 of DBC.

Standardizing Variables

NBA Data

Player statistics for the 2016-2017 season of the NBA.

player: The name of the player.
pts: The total points for the season.
two_pp: Two-point field goal percentage.
three_pp: Three-point field goal percentage.
Many others...

Here, I only kept players that attempted at least 20 two-point and 20 three-point field goals.

NBA Data

    library(tidyverse)
    # Keep players with at least 20 two-point and 20 three-point attempts,
    # then keep only the columns used in these slides.
    nba <- read_csv("../../data/nba2016.csv") %>%
      filter(two_pa >= 20, three_pa >= 20) %>%
      select(player, pts, two_pp, three_pp)
    glimpse(nba)

    Observations: 337
    Variables: 4
    $ player   <chr> "Russell Westbrook", "James Harden", "...
    $ pts      <int> 2558, 2356, 2199, 2099, 2061, 2024, 20...
    $ two_pp   <dbl> 0.459, 0.530, 0.528, 0.524, 0.582, 0.4...
    $ three_pp <dbl> 0.343, 0.347, 0.379, 0.299, 0.367, 0.3...

LeBron James

LeBron James is the greatest player in the history of basketball (you will be tested on this). Is he better at three-point field goals or two-point field goals relative to other players? His three-point field goal percentage is 0.363 and his two-point field goal percentage is 0.611. Can we just say that he is a better two-point field goal shooter?

Not as easy as you think

Can't just compare the numbers: three-point field goals are much harder. I.e., the two statistics are in different units. We need a way to compare these observations without units. He might be better than most people at three-point FG and worse than most people at two-point FG, or vice versa.

Standardizing and z-scores

If $x$ is an observation from a distribution that has mean $\mu$ and standard deviation $\sigma$, the standardized value of $x$ is $z = \frac{x - \mu}{\sigma}$. A standardized value is often called a z-score. The z-score is in units of standard deviations above the mean.

Mean and SD of two and three FG %

    mu2 <- mean(nba$two_pp)
    sigma2 <- sd(nba$two_pp)
    mu3 <- mean(nba$three_pp)
    sigma3 <- sd(nba$three_pp)
    c(mu2, mu3)
    [1] 0.4802 0.3431
    c(sigma2, sigma3)
    [1] 0.05779 0.05827

Three-point field goals are harder!

LeBron's z-scores

$z_2 = \frac{0.611 - 0.4802}{0.0578} = 2.2637$

$z_3 = \frac{0.363 - 0.3431}{0.0583} = 0.3414$

The King (LeBron) is 2.26 SDs above the mean for two-point field goals but only 0.34 SDs above the mean for three-point field goals. Relative to everyone else, he is a lot better at two-point field goals.
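As a minimal sketch, the same calculation can be wrapped in a small helper. The function name `z_score` is hypothetical (it is not in the slides), and the means and SDs are the rounded values printed on the previous slide, so the results agree with the slide only to about three decimal places.

```r
# Hypothetical helper: how many SDs above the mean is x?
z_score <- function(x, mu, sigma) {
  (x - mu) / sigma
}

# Rounded league-wide summaries as printed on the slide (not the raw data)
mu2 <- 0.4802; sigma2 <- 0.05779   # two-point FG %
mu3 <- 0.3431; sigma3 <- 0.05827   # three-point FG %

# LeBron's percentages from the slides
z2 <- z_score(0.611, mu2, sigma2)
z3 <- z_score(0.363, mu3, sigma3)
round(c(two_point = z2, three_point = z3), 2)
```

Because both z-scores are unitless, they can be compared directly even though the raw percentages cannot.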

Graphically

[Figure: histograms of two-point FG % and three-point FG % (vertical axis: frequency), with LeBron marked.]

Another Example: Lance Thomas

Lance Thomas (New York Knicks) has a two-point FG % of 0.371 and a three-point FG % of 0.447. Is he better at two-point field goals or three-point field goals relative to his peers?

    (0.371 - mean(nba$two_pp)) / sd(nba$two_pp)
    [1] -1.889
    (0.447 - mean(nba$three_pp)) / sd(nba$three_pp)
    [1] 1.783

He is way better at three-point field goals.

Graphically

[Figure: histograms of two-point FG % and three-point FG % (vertical axis: frequency), with Lance marked.]

Other Examples

Comparing the heights of two children of different ages ("which one is taller relative to their age?").
Did you do better on the SAT or the ACT?
How about the midterm vs the final exam?

Normal z-scores

Looks normal

    hist(nba$two_pp)

[Figure: histogram of nba$two_pp (vertical axis: frequency).]

Still looks normal

    zscores <- (nba$two_pp - mean(nba$two_pp)) / sd(nba$two_pp)
    hist(zscores)

[Figure: histogram of zscores (vertical axis: frequency).]

Mean and SD

    mean(zscores)
    [1] 4.166e-16
    sd(zscores)
    [1] 1

The blah e-16 is just R's way of saying zero: the mean is zero up to floating-point rounding error.

Recall: Relationships

Let $y_i = a + b x_i$ for $i = 1, 2, \ldots, n$. Then:

$\bar{y} = a + b\bar{x}$
$\mathrm{median}(y_1, \ldots, y_n) = a + b\,\mathrm{median}(x_1, \ldots, x_n)$
$\mathrm{SD}(y) = |b|\,\mathrm{SD}(x)$
$\mathrm{MAD}(y) = |b|\,\mathrm{MAD}(x)$
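These relationships can be checked numerically. The sketch below uses made-up data and an arbitrary choice of `a` and `b` (neither is from the slides); `b` is negative on purpose to show that the measures of spread scale by the absolute value of `b`.

```r
# Made-up data and coefficients, for illustration only
x <- c(2, 4, 4, 5, 7, 9, 12)
a <- 3
b <- -2          # negative slope on purpose
y <- a + b * x

# Measures of center shift and scale with the transformation
all.equal(mean(y),   a + b * mean(x))
all.equal(median(y), a + b * median(x))

# Measures of spread ignore a and scale by abs(b)
all.equal(sd(y),  abs(b) * sd(x))
all.equal(mad(y), abs(b) * mad(x))
```

Note that R's `mad()` multiplies by a constant (1.4826 by default), which does not affect the scaling relationship.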

You prove

Claim: Let $z_i = \frac{x_i - \bar{x}}{s_x}$ for $i = 1, \ldots, n$. Then $\bar{z} = 0$ and $s_z = 1$.
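A numerical check is not a proof, but it is a quick way to convince yourself that the claim is plausible before proving it. The data below are made up for illustration.

```r
# Made-up data, for illustration only
x <- c(3.1, 4.7, 5.0, 6.2, 8.9, 10.4)

# Standardize using the sample mean and sample SD
z <- (x - mean(x)) / sd(x)

mean(z)  # zero, up to floating-point rounding
sd(z)    # one
```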

Property of Normal Distributions

Actually, if we apply a linear transformation to a variable that has a normal distribution, then the resulting variable also has a normal distribution. Thus, if $x$ is normal with mean $\mu$ and variance $\sigma^2$, then $z = \frac{x - \mu}{\sigma}$ is normal with mean 0 and variance 1.

Normal z-scores

    x <- rnorm(n = 100, mean = 5, sd = 2)
    hist(x)

[Figure: histogram of x (vertical axis: frequency).]

Normal z-scores

    z <- (x - mean(x)) / sd(x)
    hist(z)

[Figure: histogram of z (vertical axis: frequency).]

$N(5, 2^2)$ and $N(0, 1)$ on same plot

[Figure: density curves of $N(5, 4)$ and $N(0, 1)$ on the same axes.]

Check normality

    qqnorm(z)
    qqline(z)

[Figure: normal Q-Q plot, sample quantiles against theoretical quantiles.]

Normal Probability Calculations

Approximations

We know LeBron's two-point field goal percentage. What percent of NBA players have a worse percentage? We could either calculate this out directly

    # Proportion of players whose two-point FG % is below LeBron's
    lj2 <- nba$two_pp[nba$player == "LeBron James"]
    sum(nba$two_pp < lj2) / length(nba$two_pp)
    [1] 0.9881

Or we could use the normal distribution as an approximation.
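The two approaches can be compared side by side without the data file. The sketch below uses simulated values as a stand-in for the real percentages (the seed, the simulation parameters, and the cutoff are all assumptions chosen for illustration, not values from the NBA data).

```r
set.seed(1)

# Simulated stand-in for nba$two_pp (NOT the real data)
sim <- rnorm(n = 337, mean = 0.48, sd = 0.058)
cutoff <- 0.55   # arbitrary cutoff for illustration

# Direct calculation: the observed proportion below the cutoff.
# mean() of a logical vector is the same as sum(...) / length(...).
empirical <- mean(sim < cutoff)

# Normal approximation using the sample mean and SD
normal_approx <- pnorm(cutoff, mean = mean(sim), sd = sd(sim))

c(empirical = empirical, normal_approx = normal_approx)
```

When the data are roughly bell-shaped, the two numbers should be close; how close depends on the sample size and on how normal the data really are.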

Normal approximation

[Figure: density against two-point FG %.]

Area we want

[Figure: density against two-point FG %.]

Easy Way: use pnorm

    pnorm(q = lj2, mean = mean(nba$two_pp), sd = sd(nba$two_pp))
    [1] 0.9882

Pretty close to the observed frequency!

    sum(nba$two_pp < lj2) / length(nba$two_pp)
    [1] 0.9881

The Hard Way: Convert to z-scores and use a table

Proportion of players who have a two-point FG % less than that of LeBron = proportion of players whose z-score is less than that of LeBron.

Recall LeBron's z-score: $z_{lj} = 2.26$

[Figure: standard normal density against two-point FG z-scores.]
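The equivalence behind the hard way can be checked directly: evaluating `pnorm` on the original scale gives the same area as evaluating the standard normal at the z-score. This sketch assumes the rounded mean and SD printed on the earlier slide rather than the raw data.

```r
# Rounded two-point FG % summaries from the earlier slide (an assumption
# standing in for mean(nba$two_pp) and sd(nba$two_pp))
mu <- 0.4802
sigma <- 0.05779
x <- 0.611                      # LeBron's two-point FG %

z <- (x - mu) / sigma           # standardize first

p_direct   <- pnorm(x, mean = mu, sd = sigma)  # area on the original scale
p_standard <- pnorm(z)                         # area under N(0, 1)

all.equal(p_direct, p_standard)
```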

Table

Want area to the left of 2.26 from a normal distribution with mean 0 and standard deviation 1. Look this up in Table B in DBC, pp. 427-429.

           Second decimal place of Z
    Z     0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08
    ...
    2.0   0.9772  0.9778  0.9783  0.9788  0.9793  0.9798  0.9803  0.9808  0.9812
    2.1   0.9821  0.9826  0.9830  0.9834  0.9838  0.9842  0.9846  0.9850  0.9854
    2.2   0.9861  0.9864  0.9868  0.9871  0.9875  0.9878  0.9881  0.9884  0.9887
    2.3   0.9893  0.9896  0.9898  0.9901  0.9904  0.9906  0.9909  0.9911  0.9913
    2.4   0.9918  0.9920  0.9922  0.9925  0.9927  0.9929  0.9931  0.9932  0.9934
    ...
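The table entries are just values of the standard normal CDF rounded to four decimal places, so a row can be regenerated with `pnorm`. The sketch below rebuilds the 2.2 row and picks out the entry for z = 2.26 (row 2.2, column 0.06); the variable names are illustrative.

```r
# z values 2.20, 2.21, ..., 2.28: the 2.2 row of the table
z_row <- 2.2 + seq(0, 0.08, by = 0.01)

# Area to the left of each z, rounded like the printed table
table_row <- round(pnorm(z_row), 4)
names(table_row) <- sprintf("%.2f", z_row)

table_row
table_row["2.26"]   # area to the left of 2.26
```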

Another Problem

What about the proportion of players who are better two-point field goal shooters than LeBron?

[Figure: density against two-point FG %.]

The Easy Way

    1 - pnorm(q = lj2, mean = mean(nba$two_pp), sd = sd(nba$two_pp))
    [1] 0.0118
    pnorm(q = lj2, mean = mean(nba$two_pp), sd = sd(nba$two_pp),
          lower.tail = FALSE)
    [1] 0.0118
    sum(nba$two_pp >= lj2) / length(nba$two_pp)
    [1] 0.01187

The hard way

White Board
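The white-board work is not recorded in the slides. One plausible version, sketched here as an assumption, takes the table value for the area to the left of z = 2.26 and subtracts it from 1, since the total area under the curve is 1.

```r
z_lj <- 2.26          # LeBron's z-score, rounded as on the slides
lower_tail <- 0.9881  # table entry: row 2.2, column 0.06

# Area to the right = 1 - area to the left
upper_table <- 1 - lower_tail

# Same quantity from R, without the table's rounding
upper_pnorm <- pnorm(z_lj, lower.tail = FALSE)

c(table = upper_table, pnorm = upper_pnorm)
```

The small gap from the easy-way answer of 0.0118 comes from rounding the z-score to two decimal places before using the table.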

Another Problem

What proportion of NBA players shoot between 0.4 and 0.5 for two-point FG?

[Figure: density against two-point FG %.]

The Easy Way

    less5 <- pnorm(0.5, mean = mean(nba$two_pp), sd = sd(nba$two_pp))
    less4 <- pnorm(0.4, mean = mean(nba$two_pp), sd = sd(nba$two_pp))
    less5 - less4
    [1] 0.5515
    sum(nba$two_pp >= 0.4 & nba$two_pp <= 0.5) / length(nba$two_pp)
    [1] 0.5638

The hard way

White Board
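Again, the white-board work is not recorded. A sketch of how it might go, assuming the rounded mean and SD from the earlier slide: convert both endpoints to z-scores, find the area to the left of each, and subtract.

```r
# Rounded two-point FG % summaries from the earlier slide (an assumption
# standing in for the raw data)
mu <- 0.4802
sigma <- 0.05779

# Standardize both endpoints of the interval
z_lo <- (0.4 - mu) / sigma
z_hi <- (0.5 - mu) / sigma

# Area between = area left of upper endpoint - area left of lower endpoint
p_between <- pnorm(z_hi) - pnorm(z_lo)

round(c(z_lo = z_lo, z_hi = z_hi, p_between = p_between), 4)
```

With a printed table, `pnorm(z_hi)` and `pnorm(z_lo)` would be replaced by lookups at the z-scores rounded to two decimal places, so the answer would differ slightly in the later digits.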