Announcements. Lecture 19: Inference for SLR & Transformations. Online quiz 7 - commonly missed questions


Announcements

Lecture 19: Inference for SLR & Transformations
Statistics 101, Mine Çetinkaya-Rundel, April 3, 2012

HW 7 due Thursday.
Correlation guessing game: ends on April 12 at noon. The winner will be announced in class. Prize: +1 (out of 100) point on the final. istics.net/ stat/ correlations Group: sta101

Recap: Online quiz 7 - commonly missed questions

Review question: Which of the following is false?

Question 1: In SLR,
(a) residuals should be nearly normally distributed with mean at 0
(b) residuals should have non-constant variance
(c) residuals vs. x plot should show a random scatter around 0
(d) the relationship between x and y should be linear, and outliers should be handled with caution

Major league baseball

Yesterday in lab you worked with 2009 MLB data. What was the best predictor of runs?

[Figure: scatterplot of runs vs. on-base plus slugging (ob_slg)]

$R^2$ for the regression line for predicting runs from on-base plus slugging is 91.31%. Which of the below is the correct interpretation of this value? 91.31% of
(a) runs can be accurately predicted by on-base plus slugging.
(b) variability in predictions of runs is explained by on-base plus slugging.
(c) variability in predictions of on-base plus slugging is explained by runs.
(d) variability in runs is explained by on-base plus slugging.
(e) variability in on-base plus slugging is explained by runs.

Understanding regression output from software

Major league baseball (regression output)

m = lm(runs ~ ob_slg, data = mlb)
summary(m)

Call: lm(formula = runs ~ ob_slg, data = mlb)
Residuals: Min 1Q Median 3Q Max
Coefficients:
(Intercept) e-10 ***
ob_slg < 2e-16 ***
---
Residual standard error: on 28 degrees of freedom
Multiple R-squared: 0.9131, Adjusted R-squared: 0.91
F-statistic: on 1 and 28 DF, p-value: < 2.2e-16

Testing for the slope

Clicker question: Assuming that the 2009 season is representative of all MLB seasons, we would like to test if these data provide convincing evidence that the slope of the regression line for predicting runs from on-base plus slugging is different from 0. What are the appropriate hypotheses?
(a) $H_0: b_0 = 0$; $H_A: b_0 \neq 0$
(b) $H_0: \beta_1 = 0$; $H_A: \beta_1 \neq 0$
(c) $H_0: b_1 = 0$; $H_A: b_1 \neq 0$
(d) $H_0: \beta_0 = 0$; $H_A: \beta_0 \neq 0$
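To connect the regression output and the $R^2$ interpretation above to code, here is a minimal sketch. The lab's `mlb` data are not included in this transcript, so the data frame `mlb_sim` and all of its values are simulated stand-ins (an assumption for illustration only, not the 2009 season).

```r
# Simulated stand-in for the MLB data (hypothetical values)
set.seed(1)
n <- 30
ob_slg <- runif(n, 0.65, 0.85)
runs <- -700 + 1900 * ob_slg + rnorm(n, sd = 25)
mlb_sim <- data.frame(runs = runs, ob_slg = ob_slg)

m <- lm(runs ~ ob_slg, data = mlb_sim)
s <- summary(m)

# Coefficient table: one row per coefficient, with columns
# Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- coef(s)
print(coef_table)

# R-squared: proportion of variability in runs explained by ob_slg
r_squared <- s$r.squared

# In SLR, R-squared equals the squared correlation between x and y
r_squared_from_cor <- cor(mlb_sim$runs, mlb_sim$ob_slg)^2
```

With 30 observations the residual degrees of freedom are 30 - 2 = 28, matching the "on 28 degrees of freedom" line in the output.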

Testing for the slope (cont.)

[Regression output: coefficient rows for (Intercept) and ob_slg]

We always use a t-test in inference for regression.

Remember: test statistic, $T = \frac{\text{point estimate} - \text{null value}}{SE}$

Point estimate = $b_1$ is the observed slope, and is given in the regression output.
$SE_{b_1}$ is the standard error associated with the slope, and can be calculated as
$SE_{b_1} = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2 / (n - 2)}{\sum (x_i - \bar{x})^2}}$
It is also given in the regression output (and it's silly to try to calculate it by hand; just know that it's doable and why the formula works the way it does).
Degrees of freedom associated with the slope is $df = n - 2$, where $n$ is the sample size.

Testing for the slope (cont.)

[Regression output: coefficient rows for (Intercept) and ob_slg]

$T = \frac{b_1 - 0}{SE_{b_1}} = 17.15$
$df = 30 - 2 = 28$
$p\text{-value} = P(|T| > 17.15) < 0.01$

% College graduate vs. % Hispanic in LA

What can you say about the relationship between % college graduate and % Hispanic in a sample of 100 zip code areas in LA?

[Figure: maps of LA zip code areas shaded by education (% college graduate) and by race/ethnicity (% Hispanic), with freeways marked]

% College educated vs. % Hispanic in LA - another look

[Figure: scatterplot of % college graduate vs. % Hispanic]
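The slope test described earlier ($T$ = (point estimate - null value) / SE with $df = n - 2$) can be reproduced by hand and compared with what `summary()` reports. This is a sketch on simulated data; the variables `x` and `y` and their values are made up for illustration.

```r
# Simulated data (hypothetical values)
set.seed(2)
n <- 30
x <- runif(n, 0, 10)
y <- 3 + 2 * x + rnorm(n, sd = 4)
fit <- lm(y ~ x)

# Observed slope b1
b1 <- unname(coef(fit)["x"])

# Standard error of the slope from the formula:
# sqrt( [sum of squared residuals / (n - 2)] / sum((x - mean(x))^2) )
se_b1 <- sqrt((sum(resid(fit)^2) / (n - 2)) / sum((x - mean(x))^2))

# Test statistic with null value 0, and the two-sided p-value
t_stat <- (b1 - 0) / se_b1
p_value <- 2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)

# The same quantities as reported in the regression output
out <- coef(summary(fit))["x", ]
print(out)
```

The hand-computed `se_b1`, `t_stat`, and `p_value` should agree with the "Std. Error", "t value", and "Pr(>|t|)" entries in the slope row.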

% College educated vs. % Hispanic in LA - linear model

Clicker question: Which of the below is the best interpretation of the slope?

[Regression output: coefficient rows for (Intercept) and %Hispanic]

(a) A 1% increase in Hispanic residents in a zip code area in LA is associated with a 75% decrease in % of college grads.
(b) A 1% increase in Hispanic residents in a zip code area in LA is associated with a 0.75% decrease in % of college grads.
(c) An additional 1% of Hispanic residents decreases the % of college graduates in a zip code area in LA by 0.75%.
(d) In zip code areas with no Hispanic residents, % of college graduates is expected to be 75%.

% College educated vs. % Hispanic in LA - linear model

Do these data provide convincing evidence that there is a statistically significant relationship between % Hispanic and % college graduates in zip code areas in LA?

[Regression output: coefficient rows for (Intercept) and hispanic]

Violent crime rate vs. unemployment

Relationship between violent crime rate (annual number of violent crimes per 100,000 population) and unemployment rate (% of work eligible population not working) in 51 US states (including DC):

[Figure: scatterplot of violent_crime_rate vs. unemployed, with DC labeled]

Note: The data are from the 2003 Statistical Abstract of the US. A 2012 version is available online; if you are looking for data on states for your project, it's a good resource.

Violent crime rate vs. unemployment

Clicker question: Which of the below is the correct set of hypotheses and the p-value for testing if the slope of the relationship between violent crime rate and unemployment is positive?

[Regression output: coefficient rows for (Intercept) and unemployed]

(a) $H_0: b_1 = 0$; $H_A: b_1 \neq 0$; p-value = reported two-sided p-value
(b) $H_0: \beta_1 = 0$; $H_A: \beta_1 > 0$; p-value = reported two-sided p-value / 2
(c) $H_0: \beta_1 = 0$; $H_A: \beta_1 \neq 0$; p-value = reported two-sided p-value / 2
(d) $H_0: b_1 = 0$; $H_A: b_1 > 0$; p-value = reported two-sided p-value / 2
(e) $H_0: \beta_1 = 0$; $H_A: \beta_1 \neq 0$; p-value = reported two-sided p-value
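Regression software reports a two-sided p-value for the slope, so a one-sided test such as $H_A: \beta_1 > 0$ needs a small conversion. The sketch below uses simulated data; `crime` and `unemployed` here are made-up stand-ins, not the Statistical Abstract values.

```r
# Simulated stand-in for 51 states (hypothetical values)
set.seed(3)
n <- 51
unemployed <- runif(n, 3, 8)
crime <- 200 + 40 * unemployed + rnorm(n, sd = 100)
fit <- lm(crime ~ unemployed)

slope_row <- coef(summary(fit))["unemployed", ]
t_stat <- unname(slope_row["t value"])
p_two_sided <- unname(slope_row["Pr(>|t|)"])

# H_A: beta_1 > 0 uses only the upper tail of the t distribution
p_one_sided <- pt(t_stat, df = n - 2, lower.tail = FALSE)

# Equivalent shortcut: halve the reported p-value when the observed
# slope is in the direction of H_A; otherwise use 1 - (p / 2)
p_shortcut <- if (t_stat > 0) p_two_sided / 2 else 1 - p_two_sided / 2
```

Halving the reported p-value is only valid when the estimated slope has the sign stated in the alternative hypothesis, which is why the shortcut checks the sign of `t_stat`.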

CI for the slope

Confidence interval for the slope

Clicker question: Remember that a confidence interval is calculated as point estimate ± ME, and the degrees of freedom associated with the slope in a simple linear regression is $n - 2$. Which of the below is the correct 95% confidence interval for the slope parameter? Note that the model is based on observations from 51 states.

[Regression output: coefficient rows for (Intercept) and unemployed]

[Answer options (a)-(d), each of the form point estimate ± margin of error]

Recap

Inference for the slope for a SLR model (only one explanatory variable):
Hypothesis test: $T = \frac{b_1 - \text{null value}}{SE_{b_1}}$, with $df = n - 2$
Confidence interval: $b_1 \pm t^\star_{df = n - 2} \, SE_{b_1}$
The null value is often 0 since we are usually checking for any relationship between the explanatory and the response variable.
The regression output gives $b_1$, $SE_{b_1}$, and the two-tailed p-value for the t-test for the slope where the null value is 0.
We rarely do inference on the intercept, so we'll be focusing on the estimates and inference for the slope.

Caution

Always be aware of the type of data you're working with: random sample, non-random sample, or population.
Statistical inference, and the resulting p-values, are meaningless when you already have population data.
If you have a sample that is non-random (biased), the results will be unreliable.
The ultimate goal is to have independent observations, and you know how to check for those by now.

An alternative statistic: ANOVA

We considered the t-test as a way to evaluate the strength of evidence for a hypothesis test evaluating the relationship between x and y.
However, we could focus on $R^2$: the proportion of variability in the response variable (y) explained by the explanatory variable (x).
A large $R^2$ suggests a linear relationship between x and y exists.
A small $R^2$ suggests the evidence provided by the data may not be convincing.
Considering the amount of explained variability is called analysis of variance (ANOVA).
In SLR, where there is only one explanatory variable (and hence one slope parameter), the t-test and the ANOVA yield the same result.
In multiple linear regression, they provide different pieces of information.
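Both recap points, the confidence interval $b_1 \pm t^\star_{df = n - 2} \, SE_{b_1}$ and the agreement between the t-test and ANOVA in SLR, can be checked numerically. The sketch below uses simulated data with 51 observations; `x`, `y`, and their values are illustrative assumptions.

```r
# Simulated data with n = 51 (hypothetical values)
set.seed(4)
n <- 51
x <- runif(n, 3, 8)
y <- 100 + 50 * x + rnorm(n, sd = 120)
fit <- lm(y ~ x)
s <- summary(fit)

b1 <- s$coefficients["x", "Estimate"]
se_b1 <- s$coefficients["x", "Std. Error"]

# 95% CI: critical value from the t distribution with n - 2 df
t_star <- qt(0.975, df = n - 2)
ci_manual <- c(b1 - t_star * se_b1, b1 + t_star * se_b1)

# Built-in equivalent
ci_builtin <- unname(confint(fit, "x", level = 0.95)[1, ])

# In SLR the ANOVA F statistic is the square of the slope's t statistic
f_stat <- anova(fit)["x", "F value"]
t_stat <- s$coefficients["x", "t value"]
print(c(F = f_stat, t_squared = t_stat^2))
```

Using `qt(0.975, df = n - 2)` rather than 1.96 reflects that the slope follows a t distribution with $n - 2$ degrees of freedom.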

Truck prices

The scatterplot below shows the relationship between year and price of a random sample of 43 pickup trucks. Describe the relationship between these two variables.

[Figure: scatterplot of price vs. year]

From: faculty.chicagobooth.edu/ robert.gramacy/ teaching.html

Remove unusual observations

Let's remove trucks older than 20 years, and only focus on trucks made in 1992 or later. Now what can you say about the relationship?

[Figure: scatterplot of price vs. year, trucks made in 1992 or later]

Truck prices - linear model?

[Figure: price vs. year with fitted line, and residuals vs. year]

Model: $price = b_0 + b_1 \, year$

The linear model doesn't appear to be a good fit since the residuals have non-constant variance.

Truck prices - log transform of the response variable

[Figure: log(price) vs. year with fitted line, and residuals vs. year]

Model: $\log(price) = b_0 + b_1 \, year$

We applied a log transformation to the response variable. The relationship now seems linear, and the residuals have roughly constant variance.
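One rough way to see the effect of the log transformation on residual spread is to compare the residual standard deviation for newer and older trucks under each model. The sketch below does not use the pickup data; it simulates prices with multiplicative noise (an assumption chosen so that the spread grows with price), and `spread_ratio` is a hypothetical helper.

```r
# Simulated prices with multiplicative noise (hypothetical values)
set.seed(5)
n <- 200
year <- sample(1992:2008, n, replace = TRUE)
price <- exp(8 + 0.14 * (year - 1992) + rnorm(n, sd = 0.3))

m_raw <- lm(price ~ year)       # response on the dollar scale
m_log <- lm(log(price) ~ year)  # log-transformed response

# Ratio of residual SD for newer trucks to residual SD for older trucks;
# a value near 1 suggests roughly constant variance
newer <- year >= 2000
spread_ratio <- function(res) sd(res[newer]) / sd(res[!newer])

ratio_raw <- spread_ratio(resid(m_raw))
ratio_log <- spread_ratio(resid(m_log))
print(c(raw = ratio_raw, log = ratio_log))
```

A residual plot remains the primary diagnostic; the ratio is only a numeric summary of what the plot shows.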

Interpreting models with log transformation

[Regression output: coefficient rows for (Intercept) and pu$year]

Model: $\log(price) = b_0 + 0.14 \, year$

For each additional year the car is newer (for each year decrease in car's age) we would expect the log price of the car to increase on average by 0.14 log dollars, which is not very useful...

Working with logs

Subtraction and logs: $\log(a) - \log(b) = \log\left(\frac{a}{b}\right)$
Natural logarithm: $e^{\log(x)} = x$
We can use these identities to undo the log transformation.

Interpreting models with log transformation (cont.)

The slope coefficient for the log transformed model is 0.14, meaning the log price difference between cars that are one year apart is predicted to be 0.14 log dollars.

$\log(\text{price at year } x + 1) - \log(\text{price at year } x) = 0.14$
$\log\left(\frac{\text{price at year } x + 1}{\text{price at year } x}\right) = 0.14$
$e^{\log\left(\frac{\text{price at year } x + 1}{\text{price at year } x}\right)} = e^{0.14}$
$\frac{\text{price at year } x + 1}{\text{price at year } x} = e^{0.14} = 1.15$

For each additional year the car is newer (for each year decrease in car's age) we would expect the price of the car to increase on average by a factor of 1.15.

Recap: dealing with non-constant variance

Non-constant variance is one of the most common model violations; however, it is usually fixable by transforming the response (y) variable.
The most common variance stabilizing transform is the log transformation: $\log(y)$
When using a log transformation on the response variable the interpretation of the slope changes: for each unit increase in x, y is expected on average to decrease/increase by a factor of $e^{b_1}$.
Another useful transformation is the square root: $\sqrt{y}$
These transformations may also be useful when the relationship is non-linear, but in those cases a polynomial regression may also be needed (this is beyond the scope of this course, but you're welcome to try it for your project, and I'd be happy to provide further guidance).
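The "factor of $e^{b_1}$" interpretation can be verified directly: on the original scale, predictions one unit of x apart differ by the multiplicative factor `exp(b1)`. The sketch below uses simulated prices, not the pickup data; the variable names and values are illustrative assumptions.

```r
# Simulated prices (hypothetical values)
set.seed(6)
n <- 43
year <- sample(1992:2008, n, replace = TRUE)
price <- exp(8 + 0.14 * (year - 1992) + rnorm(n, sd = 0.3))
fit <- lm(log(price) ~ year)

# Slope on the log scale and the implied multiplicative factor
b1 <- unname(coef(fit)["year"])
factor_per_year <- exp(b1)

# Back-transformed predictions for two consecutive years
pred <- exp(predict(fit, newdata = data.frame(year = c(2000, 2001))))
ratio <- unname(pred[2] / pred[1])
print(c(factor = factor_per_year, ratio = ratio))

# The value used in the lecture: a slope of 0.14 gives a factor of about 1.15
exp(0.14)
```

The ratio does not depend on which pair of consecutive years is chosen, since the intercept and the year term cancel when the two log predictions are subtracted.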

R code

# load data; adjust the path to wherever pickups.csv is stored
pu_allyrs = read.csv("pickups.csv")

# drop trucks older than 20 years
pu = subset(pu_allyrs, pu_allyrs$year >= 1992)

# linear model: scatterplot, fitted line, residual plot
plot(pu$price ~ pu$year)
m1 = lm(pu$price ~ pu$year)
abline(m1)
plot(m1$residuals ~ pu$year)

# model with log transformation of the response
plot(log(pu$price) ~ pu$year)
m2 = lm(log(pu$price) ~ pu$year)
abline(m2)
plot(m2$residuals ~ pu$year)

# model summary and interpretation of the slope coefficient
summary(m2)
exp(0.14)


More information

Sports Predictive Analytics: NFL Prediction Model

Sports Predictive Analytics: NFL Prediction Model Sports Predictive Analytics: NFL Prediction Model By Dr. Ash Pahwa IEEE Computer Society San Diego Chapter January 17, 2017 Copyright 2017 Dr. Ash Pahwa 1 Outline Case Studies of Sports Analytics Sports

More information

Standardized catch rates of yellowtail snapper ( Ocyurus chrysurus

Standardized catch rates of yellowtail snapper ( Ocyurus chrysurus Standardized catch rates of yellowtail snapper (Ocyurus chrysurus) from the Marine Recreational Fisheries Statistics Survey in south Florida, 1981-2010 Introduction Yellowtail snapper are caught by recreational

More information

Introduction. Forestry, Wildlife and Fisheries Graduate Seminar Demand for Wildlife Hunting in the Southeastern United States

Introduction. Forestry, Wildlife and Fisheries Graduate Seminar Demand for Wildlife Hunting in the Southeastern United States Forestry, Wildlife and Fisheries Graduate Seminar Demand for Wildlife Hunting in the Southeastern United States Presented by: Neelam C. Poudyal Monday, 19 November, 2007 4:40 PM 160 PBB Introduction Hunting

More information

Empirical Example II of Chapter 7

Empirical Example II of Chapter 7 Empirical Example II of Chapter 7 1. We use NBA data. The description of variables is --- --- --- storage display value variable name type format label variable label marr byte %9.2f =1 if married wage

More information

Modeling Pedestrian Volumes on College Campuses

Modeling Pedestrian Volumes on College Campuses TRANSPORTATION RESEARCH RECORD 1405 43 Modeling Pedestrian Volumes on College Campuses LAURAL. COVE AND J. EDWIN CLARK A study was undertaken to develop a reliable method for obtaining reasonable estimates

More information

AP 11.1 Notes WEB.notebook March 25, 2014

AP 11.1 Notes WEB.notebook March 25, 2014 11.1 Chi Square Tests (Day 1) vocab *new seats* examples Objectives Comparing Observed & Expected Counts measurements of a categorical variable (ex/ color of M&Ms) Use Chi Square Goodness of Fit Test Must

More information

ANOVA - Implementation.

ANOVA - Implementation. ANOVA - Implementation http://www.pelagicos.net/classes_biometry_fa17.htm Doing an ANOVA With RCmdr Categorical Variable One-Way ANOVA Testing a single Factor dose with 3 treatments (low, mid, high) Doing

More information

4-3 Rate of Change and Slope. Warm Up. 1. Find the x- and y-intercepts of 2x 5y = 20. Describe the correlation shown by the scatter plot. 2.

4-3 Rate of Change and Slope. Warm Up. 1. Find the x- and y-intercepts of 2x 5y = 20. Describe the correlation shown by the scatter plot. 2. Warm Up 1. Find the x- and y-intercepts of 2x 5y = 20. Describe the correlation shown by the scatter plot. 2. Objectives Find rates of change and slopes. Relate a constant rate of change to the slope of

More information

Is lung capacity affected by smoking, sport, height or gender. Table of contents

Is lung capacity affected by smoking, sport, height or gender. Table of contents Sample project This Maths Studies project has been graded by a moderator. As you read through it, you will see comments from the moderator in boxes like this: At the end of the sample project is a summary

More information

Journal of Human Sport and Exercise E-ISSN: Universidad de Alicante España

Journal of Human Sport and Exercise E-ISSN: Universidad de Alicante España Journal of Human Sport and Exercise E-ISSN: 1988-5202 jhse@ua.es Universidad de Alicante España SOÓS, ISTVÁN; FLORES MARTÍNEZ, JOSÉ CARLOS; SZABO, ATTILA Before the Rio Games: A retrospective evaluation

More information

Measuring Batting Performance

Measuring Batting Performance Measuring Batting Performance Authors: Samantha Attar, Hannah Dineen, Andy Fullerton, Nora Hanson, Cam Kelso, Katie McLaughlin, and Caitlyn Nolan Introduction: The following analysis compares slugging

More information

4. A student estimated a regression model using annual data for 1990 through 2015, C = β 0. Y + β 2

4. A student estimated a regression model using annual data for 1990 through 2015, C = β 0. Y + β 2 Econ 57 Gary Smith Spring 2017 Final Examination (150 minutes) No calculators allowed. Just set up your answers, for example, P = 49/52. BE SURE TO EXPLAIN YOUR REASONING. If you want extra time, you can

More information

Warm-up. Make a bar graph to display these data. What additional information do you need to make a pie chart?

Warm-up. Make a bar graph to display these data. What additional information do you need to make a pie chart? Warm-up The number of deaths among persons aged 15 to 24 years in the United States in 1997 due to the seven leading causes of death for this age group were accidents, 12,958; homicide, 5,793; suicide,

More information

100-Meter Dash Olympic Winning Times: Will Women Be As Fast As Men?

100-Meter Dash Olympic Winning Times: Will Women Be As Fast As Men? 100-Meter Dash Olympic Winning Times: Will Women Be As Fast As Men? The 100 Meter Dash has been an Olympic event since its very establishment in 1896(1928 for women). The reigning 100-meter Olympic champion

More information

CHAPTER ANALYSIS AND INTERPRETATION Average total number of collisions for a try to be scored

CHAPTER ANALYSIS AND INTERPRETATION Average total number of collisions for a try to be scored CHAPTER 8 8.1 ANALYSIS AND INTERPRETATION As mentioned in the previous chapter, four key components have been identified as indicators of the level of significance of dominant collisions when evaluating

More information

Quantitative Methods for Economics Tutorial 6. Katherine Eyal

Quantitative Methods for Economics Tutorial 6. Katherine Eyal Quantitative Methods for Economics Tutorial 6 Katherine Eyal TUTORIAL 6 13 September 2010 ECO3021S Part A: Problems 1. (a) In 1857, the German statistician Ernst Engel formulated his famous law: Households

More information

Major League Baseball Offensive Production in the Designated Hitter Era (1973 Present)

Major League Baseball Offensive Production in the Designated Hitter Era (1973 Present) Major League Baseball Offensive Production in the Designated Hitter Era (1973 Present) Jonathan Tung University of California, Riverside tung.jonathanee@gmail.com Abstract In Major League Baseball, there

More information

Real-Time Electricity Pricing

Real-Time Electricity Pricing Real-Time Electricity Pricing Xi Chen, Jonathan Hosking and Soumyadip Ghosh IBM Watson Research Center / Northwestern University Yorktown Heights, NY, USA X. Chen, J. Hosking & S. Ghosh (IBM) Real-Time

More information

DISMAS Evaluation: Dr. Elizabeth C. McMullan. Grambling State University

DISMAS Evaluation: Dr. Elizabeth C. McMullan. Grambling State University DISMAS Evaluation 1 Running head: Project Dismas Evaluation DISMAS Evaluation: 2007 2008 Dr. Elizabeth C. McMullan Grambling State University DISMAS Evaluation 2 Abstract An offender notification project

More information

Factorial Analysis of Variance

Factorial Analysis of Variance Factorial Analysis of Variance Overview of the Factorial ANOVA Factorial ANOVA (Two-Way) In the context of ANOVA, an independent variable (or a quasiindependent variable) is called a factor, and research

More information

4-3 Rate of Change and Slope. Warm Up Lesson Presentation. Lesson Quiz

4-3 Rate of Change and Slope. Warm Up Lesson Presentation. Lesson Quiz 4-3 Rate of Change and Slope Warm Up Lesson Presentation Lesson Quiz Holt Algebra McDougal 1 Algebra 1 Warm Up 1. Find the x- and y-intercepts of 2x 5y = 20. x-int.: 10; y-int.: 4 Describe the correlation

More information

Taking Your Class for a Walk, Randomly

Taking Your Class for a Walk, Randomly Taking Your Class for a Walk, Randomly Daniel Kaplan Macalester College Oct. 27, 2009 Overview of the Activity You are going to turn your students into an ensemble of random walkers. They will start at

More information

Applying Hooke s Law to Multiple Bungee Cords. Introduction

Applying Hooke s Law to Multiple Bungee Cords. Introduction Applying Hooke s Law to Multiple Bungee Cords Introduction Hooke s Law declares that the force exerted on a spring is proportional to the amount of stretch or compression on the spring, is always directed

More information

GENETICS OF RACING PERFORMANCE IN THE AMERICAN QUARTER HORSE: II. ADJUSTMENT FACTORS AND CONTEMPORARY GROUPS 1'2

GENETICS OF RACING PERFORMANCE IN THE AMERICAN QUARTER HORSE: II. ADJUSTMENT FACTORS AND CONTEMPORARY GROUPS 1'2 GENETICS OF RACING PERFORMANCE IN THE AMERICAN QUARTER HORSE: II. ADJUSTMENT FACTORS AND CONTEMPORARY GROUPS 1'2 S. T. Buttram 3, R. L. Willham 4 and D. E. Wilson 4 Iowa State University, Ames 50011 ABSTRACT

More information

The Impact of Narrow Lane on Safety of the Arterial Roads. Hyeonsup Lim

The Impact of Narrow Lane on Safety of the Arterial Roads. Hyeonsup Lim The Impact of Narrow Lane on Safety of the Arterial Roads Hyeonsup Lim What do we know about Narrow Lane? AASHTO Green book, lane widths may vary from 10 to 12 feet for rural and urban arterials. NCHRP

More information

Math 4. Unit 1: Conic Sections Lesson 1.1: What Is a Conic Section?

Math 4. Unit 1: Conic Sections Lesson 1.1: What Is a Conic Section? Unit 1: Conic Sections Lesson 1.1: What Is a Conic Section? 1.1.1: Study - What is a Conic Section? Duration: 50 min 1.1.2: Quiz - What is a Conic Section? Duration: 25 min / 18 Lesson 1.2: Geometry of

More information

Robust specification testing in regression: the FRESET test and autocorrelated disturbances

Robust specification testing in regression: the FRESET test and autocorrelated disturbances Robust specification testing in regression: the FRESET test and autocorrelated disturbances Linda F. DeBenedictis and David E. A. Giles * Policy and Research Division, Ministry of Human Resources, 614

More information

Political Science 30: Political Inquiry Section 5

Political Science 30: Political Inquiry Section 5 Political Science 30: Political Inquiry Section 5 Taylor Carlson tncarlson@ucsd.edu Link to Stats Motivation of the Week They ve done studies, you know. 60% of the time, it works every time. Brian, Anchorman

More information

Competitive Performance of Elite Olympic-Distance Triathletes: Reliability and Smallest Worthwhile Enhancement

Competitive Performance of Elite Olympic-Distance Triathletes: Reliability and Smallest Worthwhile Enhancement SPORTSCIENCE sportsci.org Original Research / Performance Competitive Performance of Elite Olympic-Distance Triathletes: Reliability and Smallest Worthwhile Enhancement Carl D Paton, Will G Hopkins Sportscience

More information

Measuring Returns to Scale in Nineteenth-Century French Industry Technical Appendix

Measuring Returns to Scale in Nineteenth-Century French Industry Technical Appendix Measuring Returns to Scale in Nineteenth-Century French Industry Technical Appendix Ulrich Doraszelski Hoover Institution, Stanford University March 2004 Formal Derivations Gross-output vs value-added

More information

FIRST NAME: (PRINT ABOVE (UNDERNEATH LAST NAME) IN CAPITALS)

FIRST NAME: (PRINT ABOVE (UNDERNEATH LAST NAME) IN CAPITALS) ANSWER SHEET FINAL EXAM MATH 111 SPRING 2010 FRIDAY 30 APRIL 2010 8AM-NOON LAST NAME: (PRINT AT TOP IN LARGE CAPITALS) FIRST NAME: (PRINT ABOVE (UNDERNEATH LAST NAME) IN CAPITALS) CIRCLE LAB DAY: TUESDAY

More information

Effects of Traffic Condition (v/c) on Safety at Freeway Facility Sections

Effects of Traffic Condition (v/c) on Safety at Freeway Facility Sections Effects of Traffic Condition (v/c) on Safety at Freeway Facility Sections JAENAM CHANG Engineer, Korea Engineering Consultants Corp., Korea CHEOL OH Graduate Student Researcher, University of California,

More information

IDENTIFYING SUBJECTIVE VALUE IN WOMEN S COLLEGE GOLF RECRUITING REGARDLESS OF SOCIO-ECONOMIC CLASS. Victoria Allred

IDENTIFYING SUBJECTIVE VALUE IN WOMEN S COLLEGE GOLF RECRUITING REGARDLESS OF SOCIO-ECONOMIC CLASS. Victoria Allred IDENTIFYING SUBJECTIVE VALUE IN WOMEN S COLLEGE GOLF RECRUITING REGARDLESS OF SOCIO-ECONOMIC CLASS by Victoria Allred A Senior Honors Project Presented to the Honors College East Carolina University In

More information

United States Commercial Vertical Line Vessel Standardized Catch Rates of Red Grouper in the US South Atlantic,

United States Commercial Vertical Line Vessel Standardized Catch Rates of Red Grouper in the US South Atlantic, SEDAR19-DW-14 United States Commercial Vertical Line Vessel Standardized Catch Rates of Red Grouper in the US South Atlantic, 1993-2008 Kevin McCarthy and Neil Baertlein National Marine Fisheries Service,

More information

NBA TEAM SYNERGY RESEARCH REPORT 1

NBA TEAM SYNERGY RESEARCH REPORT 1 NBA TEAM SYNERGY RESEARCH REPORT 1 NBA Team Synergy and Style of Play Analysis Karrie Lopshire, Michael Avendano, Amy Lee Wang University of California Los Angeles June 3, 2016 NBA TEAM SYNERGY RESEARCH

More information

FORECASTING BATTER PERFORMANCE USING STATCAST DATA IN MAJOR LEAGUE BASEBALL

FORECASTING BATTER PERFORMANCE USING STATCAST DATA IN MAJOR LEAGUE BASEBALL FORECASTING BATTER PERFORMANCE USING STATCAST DATA IN MAJOR LEAGUE BASEBALL A Thesis Submitted to the Graduate Faculty of the North Dakota State University of Agriculture and Applied Science By Nicholas

More information

Guide to Computing Minitab commands used in labs (mtbcode.out)

Guide to Computing Minitab commands used in labs (mtbcode.out) Guide to Computing Minitab commands used in labs (mtbcode.out) A full listing of Minitab commands can be found by invoking the HELP command while running Minitab. A reference card, with listing of available

More information

(per 100,000 residents) Cancer Deaths

(per 100,000 residents) Cancer Deaths Unit 3 Lesson 2 Investigation 2 Radioactive Waste Exposure Cancer Deaths (per 100,000 residents) 250 200 150 100 Name: 50 0 0 5 10 15 Index of Exposure a. Describe the direction and strength of the relationship.

More information

Setting up group models Part 1 NITP, 2011

Setting up group models Part 1 NITP, 2011 Setting up group models Part 1 NITP, 2011 What is coming up Crash course in setting up models 1-sample and 2-sample t-tests Paired t-tests ANOVA! Mean centering covariates Identifying rank deficient matrices

More information

Initial Mortality of Black Bass in B.A.S.S. Fishing Tournaments

Initial Mortality of Black Bass in B.A.S.S. Fishing Tournaments North American Journal of Fisheries Management 22:950 954, 2002 Copyright by the American Fisheries Society 2002 Initial Mortality of Black Bass in B.A.S.S. Fishing Tournaments GENE R. WILDE,* CALUB E.

More information

Systematic Review and Meta-analysis of Bicycle Helmet Efficacy to Mitigate Head, Face and Neck Injuries

Systematic Review and Meta-analysis of Bicycle Helmet Efficacy to Mitigate Head, Face and Neck Injuries Systematic Review and Meta-analysis of Bicycle Helmet Efficacy to Mitigate Head, Face and Neck Injuries Prudence Creighton & Jake Olivier MATHEMATICS & THE UNIVERSITY OF NEW STATISTICS SOUTH WALES Creighton

More information