Stat 139 Homework 3 Solutions, Spring 2015


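Problem 1. Let $X_i \sim N(\mu_1, \sigma^2)$ for $i = 1, \ldots, n_1$, and $Y_j \sim N(\mu_2, \sigma^2)$ for $j = 1, \ldots, n_2$. Also, assume that all observations are independent of each other. In Unit 4, we learned that the common population variance, $\sigma^2$, can be estimated using a pooled sample variance:

$$S_p^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2},$$

where $S_1^2$ and $S_2^2$ are the corresponding sample-specific variances.

a) If $\bar{X}$ and $\bar{Y}$ are the sample means of each sample, show that $\mathrm{Var}(\bar{X} - \bar{Y}) = \sigma^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)$.

Because the X's and Y's are independent of one another, and independent within each type:

$$\begin{aligned}
\mathrm{Var}(\bar{X} - \bar{Y}) &= \mathrm{Var}(\bar{X}) + \mathrm{Var}(\bar{Y}) \\
&= \mathrm{Var}\left( \frac{X_1 + \cdots + X_{n_1}}{n_1} \right) + \mathrm{Var}\left( \frac{Y_1 + \cdots + Y_{n_2}}{n_2} \right) \\
&= \frac{1}{n_1^2} \mathrm{Var}(X_1 + \cdots + X_{n_1}) + \frac{1}{n_2^2} \mathrm{Var}(Y_1 + \cdots + Y_{n_2}) \\
&= \frac{1}{n_1^2} \left[ \mathrm{Var}(X_1) + \cdots + \mathrm{Var}(X_{n_1}) \right] + \frac{1}{n_2^2} \left[ \mathrm{Var}(Y_1) + \cdots + \mathrm{Var}(Y_{n_2}) \right] \\
&= \frac{1}{n_1^2} \left[ \sigma^2 + \cdots + \sigma^2 \right] + \frac{1}{n_2^2} \left[ \sigma^2 + \cdots + \sigma^2 \right] \\
&= \frac{1}{n_1^2} \left[ n_1 \sigma^2 \right] + \frac{1}{n_2^2} \left[ n_2 \sigma^2 \right] \\
&= \frac{\sigma^2}{n_1} + \frac{\sigma^2}{n_2} = \sigma^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)
\end{aligned}$$

b) Show that $S_p^2$ is an unbiased estimator of $\sigma^2$. Hint: Use the definition of $S_p^2$ and the linearity of expectation.

$$\begin{aligned}
E(S_p^2) &= E\left( \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2} \right) \\
&= \frac{n_1 - 1}{n_1 + n_2 - 2} E(S_1^2) + \frac{n_2 - 1}{n_1 + n_2 - 2} E(S_2^2) \\
&= \frac{n_1 - 1}{n_1 + n_2 - 2} \sigma^2 + \frac{n_2 - 1}{n_1 + n_2 - 2} \sigma^2 \\
&= \frac{(n_1 - 1) + (n_2 - 1)}{n_1 + n_2 - 2} \sigma^2 = \sigma^2
\end{aligned}$$

c) Using the definition of the $\chi^2_n$ distribution by representation (a sum of $n$ independent squared standard Normals) and the fact that the sample-specific sample variances have the following sampling distributions:

$$S_1^2 \sim \frac{\sigma^2}{n_1 - 1} \chi^2_{n_1 - 1}, \qquad S_2^2 \sim \frac{\sigma^2}{n_2 - 1} \chi^2_{n_2 - 1},$$

show that

$$S_p^2 \sim \frac{\sigma^2}{n_1 + n_2 - 2} \chi^2_{n_1 + n_2 - 2}.$$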
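The results of parts a) and b) can be sanity-checked by simulation. The following is a minimal sketch in Python (standard library only), not part of the original solutions: the function name `simulate`, the sample sizes, the means, $\sigma = 2$, the number of replicates, and the seed are all illustrative assumptions. It draws many pairs of independent Normal samples with a common variance and compares the empirical variance of $\bar{X} - \bar{Y}$ and the empirical mean of $S_p^2$ with $\sigma^2 (1/n_1 + 1/n_2)$ and $\sigma^2$.

```python
import random
import statistics


def simulate(n1=5, n2=8, mu1=1.0, mu2=3.0, sigma=2.0, reps=20000, seed=139):
    """Monte Carlo check of Var(Xbar - Ybar) and E(S_p^2).

    All default values are arbitrary illustrative choices.
    Returns (empirical variance of Xbar - Ybar, empirical mean of S_p^2).
    """
    rng = random.Random(seed)
    diffs, pooled = [], []
    for _ in range(reps):
        # Two independent Normal samples sharing the same sigma.
        x = [rng.gauss(mu1, sigma) for _ in range(n1)]
        y = [rng.gauss(mu2, sigma) for _ in range(n2)]
        diffs.append(statistics.fmean(x) - statistics.fmean(y))
        # statistics.variance uses the n - 1 denominator (sample variance).
        s1_sq, s2_sq = statistics.variance(x), statistics.variance(y)
        pooled.append(((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2))
    return statistics.variance(diffs), statistics.fmean(pooled)


var_diff, mean_sp2 = simulate()
theory_var = 2.0**2 * (1 / 5 + 1 / 8)  # sigma^2 * (1/n1 + 1/n2)
print(f"Var(Xbar - Ybar): simulated {var_diff:.3f}, theory {theory_var:.3f}")
print(f"E(S_p^2):         simulated {mean_sp2:.3f}, theory {2.0**2:.3f}")
```

With these settings the simulated values should land close to the theoretical ones, up to Monte Carlo error; changing `mu1` and `mu2` should leave both quantities essentially unchanged, since neither result depends on the means.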

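$$\begin{aligned}
S_p^2 &= \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2} = \frac{n_1 - 1}{n_1 + n_2 - 2} S_1^2 + \frac{n_2 - 1}{n_1 + n_2 - 2} S_2^2 \\
&\sim \frac{n_1 - 1}{n_1 + n_2 - 2} \cdot \frac{\sigma^2}{n_1 - 1} \chi^2_{n_1 - 1} + \frac{n_2 - 1}{n_1 + n_2 - 2} \cdot \frac{\sigma^2}{n_2 - 1} \chi^2_{n_2 - 1} \\
&= \frac{\sigma^2}{n_1 + n_2 - 2} \left( \chi^2_{n_1 - 1} + \chi^2_{n_2 - 1} \right) \\
&= \frac{\sigma^2}{n_1 + n_2 - 2} \chi^2_{n_1 + n_2 - 2}
\end{aligned}$$

We were able to combine the two separate $\chi^2$ distributions into one because they are independent, so the squared standard Normals they are composed of are independent. We therefore have a sum of $n_1 + n_2 - 2$ independent squared standard Normal r.v.s, which is the definition of a $\chi^2_{n_1 + n_2 - 2}$ r.v.

d) Derive the sampling distribution of the following statistic (using representation):

$$\frac{\bar{X} - \mu_1}{S_p / \sqrt{n_1}}.$$

Later we will learn that this statistic may be used in a one-sample test if two or more groups are observed, but only one of them is of interest.

$$\frac{\bar{X} - \mu_1}{S_p / \sqrt{n_1}} = \frac{(\bar{X} - \mu_1) / (\sigma / \sqrt{n_1})}{S_p / \sigma} = \frac{(\bar{X} - \mu_1) / (\sigma / \sqrt{n_1})}{\sqrt{\dfrac{S_p^2 (n_1 + n_2 - 2)}{\sigma^2} \Big/ (n_1 + n_2 - 2)}} \sim \frac{Z}{\sqrt{\chi^2_{n_1 + n_2 - 2} / (n_1 + n_2 - 2)}} \sim T_{n_1 + n_2 - 2}$$

The definition of a T r.v. is a standard Normal r.v. divided by the square root of an independent $\chi^2$ r.v. (divided by its degrees of freedom), and thus the result is a T r.v. with the same number of degrees of freedom as the $\chi^2$ r.v. The independence holds here because, for Normal samples, $\bar{X}$ is independent of $S_1^2$, and $\bar{X}$ is independent of $S_2^2$ since the two samples are independent of each other.

Problem 2. Using the BOSsnowfall.csv data from problem HW #:

a) Conduct an appropriate t-test using $\alpha = 0.05$ to determine whether totalsnow differs between winters where the temperature was at or below the 3rd quartile vs. above the 3rd quartile (without transforming totalsnow). Please use R to make your life easy.

Here are the results of the unpooled 2-sample t-test in R:

> t.test(totalsnow.high, totalsnow.low)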

        Welch Two Sample t-test

data:  totalsnow.high and totalsnow.low
t = -2.2397, df = 46.07, p-value = 0.02998
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -18.23868  -0.9732
sample estimates:
mean of x mean of y
 35.40435  45.01029

$H_0: \mu_1 = \mu_2$ vs. $H_A: \mu_1 \neq \mu_2$

$T = -2.24$, p-value $= 0.02998$

Since our estimated p-value of 0.02998, which is two-sided, is less than $\alpha = 0.05$, we have enough evidence to conclude that the true average amount of snowfall is different in warmer winters (upper quartile) than in other winters.

b) Compare your results to that of the permutation test from HW #. Did you come to the same conclusion? Why or why not?

The conclusions of the two tests are the same: we reject the null hypothesis and see evidence of a difference in the means of these two groups. But the p-values are slightly different (it was about 0.043 from the permutation test). This is likely because the t-test statistic may not be truly t-distributed, since the assumptions may be violated (most notably, these data are right-skewed). For this reason, one should trust the results of the permutation test over the t-test. [Also note: the p-value from the permutation test was computed from a randomly simulated reference distribution, so that could also explain a slightly different p-value.]

Problem 3. Does it matter what kind of tee a golfer places the ball on? The company that manufactures Stinger tees claims that the small head of their tee will lessen resistance and reduce spin, which should allow the ball to travel farther. They conducted a study that compared the distance traveled by golf balls hit off regular wooden tees to those hit off Stinger tees. Identical balls were struck by the same golf club using a robotic device set to swing the club head at 95 miles per hour. The study results are shown below.
                Sample Size   Average (yards)   SD (yards)
Stinger Tees    10            241               9
Regular Tees    10            227               15

a) Is there a difference in average ball flight between these two types of golf tees? Conduct an appropriate t-test using $\alpha = 0.05$.

$H_0: \mu_1 = \mu_2$ vs. $H_A: \mu_1 \neq \mu_2$

$$T = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}} = \frac{(241 - 227) - 0}{\sqrt{\dfrac{9^2}{10} + \dfrac{15^2}{10}}} = 2.53, \qquad \text{p-value} = 0.03$$

Since our estimated p-value of 0.03, which is two-sided, is less than $\alpha = 0.05$, we have enough evidence to conclude that the true average driving distance is different off the two tees. If the order in which the tees were used was randomly assigned, then a causal relationship seems plausible. This may generalize to robots (though technically it may not, since it was not a random sample of robots or a random sample of golf balls), but it certainly may not generalize to humans and all types of golf balls.

b) Provide justification for the selection of which t-test you decided to use in part a).

The unpooled test was chosen since the ratio of variances was greater than 2: $15^2 / 9^2 = 2.778$.

c) Calculate a 95% confidence interval for the difference in average ball flight distances between these two types of golf tees.

$$(\bar{X}_1 - \bar{X}_2) \pm t^*_{df = 9} \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}} = (241 - 227) \pm 2.262 \sqrt{\frac{9^2}{10} + \frac{15^2}{10}} = (1.49, 26.51)$$

d) How is the test of hypotheses result in part a) consistent with the interval in part c)?

The confidence interval and hypothesis test are consistent: the confidence interval does not contain zero as a plausible value for the difference in population means ($\mu_1 - \mu_2$), and the hypothesis test rejected a null difference of zero as well.

Problem 4. The same company decided to also run the experiment on human subjects. Each person was asked to strike (drive) a golf ball off of the Stinger tee and also off a regular tee, with the order decided by a coin flip. Results from the 25 individuals are shown below:

                Sample Size   Average (yards)   SD (yards)
Stinger Tees    25            209.2             34
Regular Tees    25            204.8             3?
Difference      25            4.4               20

a) Is there a difference in average ball flight between these two types of golf tees based on this experiment? Conduct an appropriate t-test using $\alpha = 0.05$.

$H_0: \mu_{diff} = 0$ vs. $H_A: \mu_{diff} \neq 0$

$$T = \frac{\bar{X}_{diff} - 0}{S_{diff} / \sqrt{n}} = \frac{4.4 - 0}{20 / \sqrt{25}} = 1.10, \qquad \text{p-value} = 0.28$$

Since our estimated p-value of 0.28, which is two-sided, is greater than $\alpha = 0.05$, we do not have enough evidence to conclude that the true average driving distance is different off the two tees; the two types of tees may truly lead to the same ball flight distance when humans hit off them.

b) Calculate a 95% confidence interval for the difference in average ball flight distances between these two types of golf tees for this experiment.

$$\bar{X}_{diff} \pm t^*_{df = 24} \frac{S_{diff}}{\sqrt{n}} = 4.4 \pm 2.064 \cdot \frac{20}{\sqrt{25}} = (-3.86, 12.66)$$

c) Based on the results in Problems 3 and 4, write a one-paragraph summary (3-5 sentences) describing the results of the statistical analyses performed in determining whether Stinger tees really do affect the distance a golf ball travels. Please write in terms that someone who may have taken a statistics class a few years ago can understand.

In Problem 3, we found a significant difference in the average ball flight distances of Stinger tees compared to regular tees, based on tests by an automated machine. In Problem 4, we found no significant difference in the average distances, based on tests by human subjects. The two analyses may differ for various reasons. The automated machine is more precise than the humans (smaller SD), since its swing should be nearly identical every time. Each person only got one swing on each tee, so there could be a lot of random noise in their measurements, which decreases the ability of the hypothesis test to discern any difference in ball flight distance. Humans also might inherently swing differently from the machine, meaning that the increased flight distance seen with the machine may not apply to human players at all.
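The test statistics and intervals for the two golf-tee experiments can be recomputed from the summary statistics alone. The sketch below is an illustration in Python (standard library only), not the course's R workflow, and it rests on several assumptions: the robot experiment is read as $n = 10$ per tee with means 241 and 227 yards and SDs 9 and 15, and the human experiment as a mean difference of 4.4 yards with SD 20 over 25 golfers; the critical values 2.262 (df = 9) and 2.064 (df = 24) are taken from the solutions rather than computed; the helper names `t_pdf`, `two_sided_p`, and `summarize` are hypothetical; and the p-values come from a simple Simpson's-rule integration of the t density, which is one convenient choice rather than how R computes them.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return norm * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 - 2 * P(0 < T < |t|), via Simpson's rule.

    steps must be even for Simpson's rule.
    """
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (total * h / 3)


def summarize(diff, se, df, t_star):
    """Return (t statistic, two-sided p-value, 95% CI) for H0: difference = 0."""
    t = diff / se
    return t, two_sided_p(t, df), (diff - t_star * se, diff + t_star * se)


# Robot experiment: unpooled two-sample test with the conservative
# df = min(n1, n2) - 1 = 9 used in the solutions.
se_robot = math.sqrt(9**2 / 10 + 15**2 / 10)
t3, p3, ci3 = summarize(241 - 227, se_robot, df=9, t_star=2.262)

# Human experiment: paired test on the 25 within-person differences, df = 24.
se_human = 20 / math.sqrt(25)
t4, p4, ci4 = summarize(4.4, se_human, df=24, t_star=2.064)

print(f"Robot:  t = {t3:.2f}, p = {p3:.3f}, CI = ({ci3[0]:.2f}, {ci3[1]:.2f})")
print(f"Humans: t = {t4:.2f}, p = {p4:.3f}, CI = ({ci4[0]:.2f}, {ci4[1]:.2f})")
```

The standard error is what separates the two analyses: roughly 5.5 yards for the robot against a 14-yard difference, versus 4 yards for the humans against a 4.4-yard difference. That is why the first interval excludes zero and the second does not.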