Unit 4: Inference for numerical variables Lecture 3: ANOVA

Similar documents
Unit4: Inferencefornumericaldata 4. ANOVA. Sta Spring Duke University, Department of Statistical Science

Analysis of Variance. Copyright 2014 Pearson Education, Inc.

PLANNED ORTHOGONAL CONTRASTS

A few things to remember about ANOVA

Week 7 One-way ANOVA

ANOVA - Implementation.

Stat 139 Homework 3 Solutions, Spring 2015

Chapter 12 Practice Test

Section I: Multiple Choice Select the best answer for each problem.

Announcements. Lecture 19: Inference for SLR & Transformations. Online quiz 7 - commonly missed questions

Introduction to Analysis of Variance (ANOVA) The Structural Model, The Summary Table, and the One- Way ANOVA

MGB 203B Homework # LSD = 1 1

One-factor ANOVA by example

Experimental Design and Data Analysis Part 2

Factorial Analysis of Variance

Select Boxplot -> Multiple Y's (simple) and select all variable names.

AP 11.1 Notes WEB.notebook March 25, 2014

Announcements. % College graduate vs. % Hispanic in LA. % College educated vs. % Hispanic in LA. Problem Set 10 Due Wednesday.

Data Set 7: Bioerosion by Parrotfish Background volume of bites The question:

Stats 2002: Probabilities for Wins and Losses of Online Gambling

Name May 3, 2007 Math Probability and Statistics

Legendre et al Appendices and Supplements, p. 1

One-way ANOVA: round, narrow, wide

Navigate to the golf data folder and make it your working directory. Load the data by typing

Statistical Analysis of PGA Tour Skill Rankings USGA Research and Test Center June 1, 2007

NCSS Statistical Software

5.1 Introduction. Learning Objectives

Chapter 13. Factorial ANOVA. Patrick Mair 2015 Psych Factorial ANOVA 0 / 19

Guide to Computing Minitab commands used in labs (mtbcode.out)

Running head: DATA ANALYSIS AND INTERPRETATION 1

Biostatistics & SAS programming

Example 1: One Way ANOVA in MINITAB

Factorial ANOVA Problems

Confidence Interval Notes Calculating Confidence Intervals

Distancei = BrandAi + 2 BrandBi + 3 BrandCi + i

The Simple Linear Regression Model ECONOMETRICS (ECON 360) BEN VAN KAMMEN, PHD

Driv e accu racy. Green s in regul ation

Setting up group models Part 1 NITP, 2011

Analysis of recent swim performances at the 2013 FINA World Championship: Counsilman Center, Dept. Kinesiology, Indiana University

Major League Baseball Offensive Production in the Designated Hitter Era (1973 Present)

Building an NFL performance metric

y ) s x x )(y i (x i r = 1 n 1 s y Statistics Lecture 7 Exploring Data , y 2 ,y n (x 1 ),,(x n ),(x 2 ,y 1 How two variables vary together

Math 121 Test Questions Spring 2010 Chapters 13 and 14

1. Answer this student s question: Is a random sample of 5% of the students at my school large enough, or should I use 10%?

Safety at Intersections in Oregon A Preliminary Update of Statewide Intersection Crash Rates

Confidence Intervals with proportions

Lab 11: Introduction to Linear Regression

One Way ANOVA (Analysis of Variance)

CHAPTER ANALYSIS AND INTERPRETATION Average total number of collisions for a try to be scored

Lesson 22: Average Rate of Change

Chapter 10 Gases. Characteristics of Gases. Pressure. The Gas Laws. The Ideal-Gas Equation. Applications of the Ideal-Gas Equation

Is lung capacity affected by smoking, sport, height or gender. Table of contents

Warm-up. Make a bar graph to display these data. What additional information do you need to make a pie chart?

Psychology - Mr. Callaway/Mundy s Mill HS Unit Research Methods - Statistics

Lesson 4: Describing the Center of a Distribution

BBS Fall Conference, 16 September Use of modeling & simulation to support the design and analysis of a new dose and regimen finding study

1. In a hypothesis test involving two-samples, the hypothesized difference in means must be 0. True. False

Empirical Example II of Chapter 7

PGA Tour Scores as a Gaussian Random Variable

The Reliability of Intrinsic Batted Ball Statistics Appendix

STANDARD SCORES AND THE NORMAL DISTRIBUTION

Statistical Theory, Conditional Probability Vaibhav Walvekar

Taking Your Class for a Walk, Randomly

Chapter 1: Why is my evil lecturer forcing me to learn statistics?

Chapter 1: Why is my evil lecturer forcing me to learn statistics?

Measuring Relative Achievements: Percentile rank and Percentile point

C R I TFC. Columbia River Inter-Tribal Fish Commission

Chapter 5: Methods and Philosophy of Statistical Process Control

Hitting with Runners in Scoring Position

DOCUMENT RESUME. A Comparison of Type I Error Rates of Alpha-Max with Established Multiple Comparison Procedures. PUB DATE NOTE

Political Science 30: Political Inquiry Section 5

POTENTIAL ENERGY BOUNCE BALL LAB

Case Processing Summary. Cases Valid Missing Total N Percent N Percent N Percent % 0 0.0% % % 0 0.0%

ISyE 6414 Regression Analysis

How Effective is Change of Pace Bowling in Cricket?

WATER OIL RELATIVE PERMEABILITY COMPARATIVE STUDY: STEADY VERSUS UNSTEADY STATE

A Statistical Analysis of the Factors that Potentially Affect the Price of A Horse

CHAPTER 2 Modeling Distributions of Data

A COMPARATIVE STUDY OF PHYSICAL DIFFERENCES BETWEEN ATHLETES OF SELECTED EVENTS IN TRACK AND FIELD

GENETICS OF RACING PERFORMANCE IN THE AMERICAN QUARTER HORSE: II. ADJUSTMENT FACTORS AND CONTEMPORARY GROUPS 1'2

Comparison of recruitment tile materials for monitoring coralline algae responses to a changing climate

COMPLETING THE RESULTS OF THE 2013 BOSTON MARATHON

Class 23: Chapter 14 & Nested ANOVA NOTES: NOTES: NOTES:

Midterm Exam 1, section 2. Thursday, September hour, 15 minutes

Lecture 16: Chapter 7, Section 2 Binomial Random Variables

Averages. October 19, Discussion item: When we talk about an average, what exactly do we mean? When are they useful?

Finding your feet: modelling the batting abilities of cricketers using Gaussian processes

Citation for published version (APA): Canudas Romo, V. (2003). Decomposition Methods in Demography Groningen: s.n.

Sample Final Exam MAT 128/SOC 251, Spring 2018

PradiptaArdiPrastowo Sport Science. SebelasMaret University. Indonesia

Transportation Research Forum

Youngs Creek Hydroelectric Project

Modelling and Simulation of Environmental Disturbances

CHAPTER 10 TOTAL RECREATIONAL FISHING DAMAGES AND CONCLUSIONS

Multilevel Models for Other Non-Normal Outcomes in Mplus v. 7.11

An Empirical Comparison of Regression Analysis Strategies with Discrete Ordinal Variables

Naïve Bayes. Robot Image Credit: Viktoriya Sukhanova 123RF.com

Lesson 14: Modeling Relationships with a Line

Novel empirical correlations for estimation of bubble point pressure, saturated viscosity and gas solubility of crude oils

Competitive Performance of Elite Olympic-Distance Triathletes: Reliability and Smallest Worthwhile Enhancement

Transcription:

Unit 4: Inference for numerical variables Lecture 3: ANOVA Statistics 101 Thomas Leininger June 10, 2013

Announcements Announcements Proposals due tomorrow. Will be returned to you by Wednesday. You MUST complete the proposal process. A few things to watch out for: Data is plural, data set is singular. Avoid using population data - if you have population data, you might consider taking a random sample. Exploratory analysis: should include some summary statistics and some graphics AND interpretations. If using existing data, find out how your data were collected, and discuss the sampling method as well as any possible biases. Scope of inference: generalizability & causality. Statistics 101 (Thomas Leininger) U4 - L3: ANOVA June 10, 2013 2 / 34

Aldrin in the Wolf River The Wolf River in Tennessee flows past an abandoned site once used by the pesticide industry for dumping wastes, including chlordane (pesticide), aldrin, and dieldrin (both insecticides). These highly toxic organic compounds can cause various cancers and birth defects. The standard methods to test whether these substances are present in a river is to take samples at six-tenths depth. But since these compounds are denser than water and their molecules tend to stick to particles of sediment, they are more likely to be found in higher concentrations near the bottom than near mid-depth. Statistics 101 (Thomas Leininger) U4 - L3: ANOVA June 10, 2013 3 / 34

Aldrin in the Wolf River Data Aldrin concentration (nanograms per liter) at three levels of depth. aldrin depth 1 3.80 bottom 2 4.80 bottom... 10 8.80 bottom 11 3.20 middepth 12 3.80 middepth... 20 6.60 middepth 21 3.10 surface 22 3.60 surface... 30 5.20 surface Statistics 101 (Thomas Leininger) U4 - L3: ANOVA June 10, 2013 4 / 34

Aldrin in the Wolf River Exploratory analysis Aldrin concentration (nanograms per liter) at three levels of depth. surface middepth bottom 3 4 5 6 7 8 9 n mean sd bottom 10 6.04 1.58 middepth 10 5.05 1.10 surface 10 4.20 0.66 overall 30 5.1 0 1.37 Statistics 101 (Thomas Leininger) U4 - L3: ANOVA June 10, 2013 5 / 34

Aldrin in the Wolf River Research question Is there a difference between the mean aldrin concentrations among the three levels? To compare means of 2 groups we use a Z or a T statistic. To compare means of 3+ groups we use a new test called ANOVA and a new statistic called F. Statistics 101 (Thomas Leininger) U4 - L3: ANOVA June 10, 2013 6 / 34

Aldrin in the Wolf River Recap: 2-sample CIs and HTs n mean sd bottom 10 6.04 1.58 middepth 10 5.05 1.10 surface 10 4.20 0.66 overall 30 5.1 0 1.37 HT: T df = ( x 1 x 2 ) null value SE where SE = df = min(n 1 1, n 2 1) CI: ( x 1 x 2 ) ± t df SE s 2 1 n 1 + s2 2 n 2 Application exercise: Perform a HT and construct a CI for each difference. and Statistics 101 (Thomas Leininger) U4 - L3: ANOVA June 10, 2013 7 / 34

Aldrin in the Wolf River ANOVA ANOVA is used to assess whether the mean of the outcome variable is different for different levels of a categorical variable. H 0 : The mean outcome is the same across all categories, µ 1 = µ 2 = = µ k, where µ i represents the mean of the outcome for observations in category i. H A : At least one mean is different than others. Statistics 101 (Thomas Leininger) U4 - L3: ANOVA June 10, 2013 8 / 34

Aldrin in the Wolf River Conditions 1 The observations should be independent within and between groups If the data are a simple random sample, this condition is satisfied. Carefully consider whether the between-group data is independent (e.g. no pairing). Always important, but sometimes difficult to check. 2 The observations within each group should be nearly normal. Especially important when the sample sizes are small. How do we check for normality? 3 The variability across the groups should be about equal. Especially important when the sample sizes differ between groups. How can we check this condition? Statistics 101 (Thomas Leininger) U4 - L3: ANOVA June 10, 2013 9 / 34

Aldrin in the Wolf River z/t test vs. ANOVA - Purpose z/t test ANOVA Compare means from two groups to see whether they are so far apart that the observed difference cannot reasonably be attributed to sampling variability. H 0 : µ 1 = µ 2 H A : µ 1 µ 2 H A : µ 1 < µ 2 Compare the means from two or more groups to see whether they are so far apart that the observed differences cannot all reasonably be attributed to sampling variability. H 0 : µ 1 = µ 2 = = µ k H A : At least one mean is different H A : µ 1 > µ 2 Statistics 101 (Thomas Leininger) U4 - L3: ANOVA June 10, 2013 10 / 34

Aldrin in the Wolf River z/t test vs. ANOVA - Method z/t test Compute a test statistic (a ratio). z/t = ( x 1 x 2 ) (µ 1 µ 2 ) SE( x 1 x 2 ) ANOVA Compute a test statistic (a ratio). F = variability bet. groups variability w/in groups Large test statistics lead to small p-values. If the p-value is small enough H 0 is rejected, and we conclude that the population means are not equal. Statistics 101 (Thomas Leininger) U4 - L3: ANOVA June 10, 2013 11 / 34

Aldrin in the Wolf River z/t test vs. ANOVA With only two groups t-test and ANOVA are equivalent, but only if we use a pooled standard variance in the denominator of the test statistic. With more than two groups, ANOVA compares the sample means to an overall grand mean. Statistics 101 (Thomas Leininger) U4 - L3: ANOVA June 10, 2013 12 / 34

Aldrin in the Wolf River Hypotheses Question What are the correct hypotheses for testing for a difference between the mean aldrin concentrations among the three levels? (a) H 0 : µ B = µ M = µ S H A : µ B µ M µ S (b) H 0 : µ B µ M µ S H A : µ B = µ M = µ S (c) H 0 : µ B = µ M = µ S H A : At least one mean is different. (d) H 0 : µ B = µ M = µ S = 0 H A : At least one mean is different. (e) H 0 : µ B = µ M = µ S H A : µ B > µ M > µ S Statistics 101 (Thomas Leininger) U4 - L3: ANOVA June 10, 2013 13 / 34

ANOVA and the F test Test statistic Does there appear to be a lot of variability within groups? How about between groups? F = variability bet. groups variability w/in groups Statistics 101 (Thomas Leininger) U4 - L3: ANOVA June 10, 2013 14 / 34

ANOVA and the F test F distribution and p-value F = variability bet. groups variability w/in groups In order to be able to reject H 0, we need a small p-value, which requires a large F statistic. In order to obtain a large F statistic, variability between sample means needs to be greater than variability within sample means. Statistics 101 (Thomas Leininger) U4 - L3: ANOVA June 10, 2013 15 / 34

ANOVA output, deconstructed Df Sum Sq Mean Sq F value Pr(>F) (Group) depth 2 16.96 8.48 6.13 0.0063 (Error) Residuals 27 37.33 1.38 Total 29 54.29 Degrees of freedom associated with ANOVA groups: df G = k 1, where k is the number of groups total: df T = n 1, where n is the total sample size error: df E = df T df G df G = k 1 = 3 1 = 2 df T = n 1 = 30 1 = 29 df E = 29 2 = 27 Statistics 101 (Thomas Leininger) U4 - L3: ANOVA June 10, 2013 16 / 34

ANOVA output, deconstructed Df Sum Sq Mean Sq F value Pr(>F) (Group) depth 2 16.96 8.48 6.13 0.0063 (Error) Residuals 27 37.33 1.38 Total 29 54.29 Sum of squares between groups, SSG Measures the variability between groups k SSG = n i ( x i x) 2 i=1 where n i is each group size, x i is the average for each group, x is the overall (grand) mean. n mean bottom 10 6.04 middepth 10 5.05 surface 10 4.2 overall 30 5.1 SSG = ( 10 (6.04 5.1) 2) + ( 10 (5.05 5.1) 2) + ( 10 (4.2 5.1) 2) = 16.96 Statistics 101 (Thomas Leininger) U4 - L3: ANOVA June 10, 2013 17 / 34

ANOVA output, deconstructed Df Sum Sq Mean Sq F value Pr(>F) (Group) depth 2 16.96 8.48 6.13 0.0063 (Error) Residuals 27 37.33 1.38 Total 29 54.29 Sum of squares total, SST Measures the total variability n SST = (x i x) i=1 where x i represents each observation in the dataset. SST = (3.8 5.1) 2 + (4.8 5.1) 2 + (4.9 5.1) 2 + + (5.2 5.1) 2 = ( 1.3) 2 + ( 0.3) 2 + ( 0.2) 2 + + (0.1) 2 = 1.69 + 0.09 + 0.04 + + 0.01 = 54.29 Statistics 101 (Thomas Leininger) U4 - L3: ANOVA June 10, 2013 18 / 34

ANOVA output, deconstructed Df Sum Sq Mean Sq F value Pr(>F) (Group) depth 2 16.96 8.48 6.13 0.0063 (Error) Residuals 27 37.33 1.38 Total 29 54.29 Sum of squares error, SSE Measures the variability within groups: SSE = SST SSG SSE = 54.29 16.96 = 37.33 Statistics 101 (Thomas Leininger) U4 - L3: ANOVA June 10, 2013 19 / 34

ANOVA output, deconstructed Df Sum Sq Mean Sq F value Pr(>F) (Group) depth 2 16.96 8.48 6.13 0.0063 (Error) Residuals 27 37.33 1.38 Total 29 54.29 Mean square error Mean square error is calculated as sum of squares divided by the degrees of freedom. MSG = 16.96/2 = 8.48 MSE = 37.33/27 = 1.38 Statistics 101 (Thomas Leininger) U4 - L3: ANOVA June 10, 2013 20 / 34

ANOVA output, deconstructed Df Sum Sq Mean Sq F value Pr(>F) (Group) depth 2 16.96 8.48 6.14 0.0063 (Error) Residuals 27 37.33 1.38 Total 29 54.29 Test statistic, F value The F statistic is the ratio of the between group and within group variability. F = MSG MSE F = 8.48 1.38 = 6.14 Statistics 101 (Thomas Leininger) U4 - L3: ANOVA June 10, 2013 21 / 34

ANOVA output, deconstructed p-value Df Sum Sq Mean Sq F value Pr(>F) (Group) depth 2 16.96 8.48 6.14 0.0063 (Error) Residuals 27 37.33 1.38 Total 29 54.29 p-value is the probability of at least as large a ratio between the between group and within group variability, if in fact the means of all groups are equal. It s calculated as the area under the F curve, with degrees of freedom df G and df E, above the observed F statistic. dfg = 2 ; dfe = 27 0 6.14 Statistics 101 (Thomas Leininger) U4 - L3: ANOVA June 10, 2013 22 / 34

ANOVA output, deconstructed Conclusion If p-value is small (less than α), reject H 0. The data provide convincing evidence that at least one mean is different from (but we can t tell which one). If p-value is large, fail to reject H 0. The data do not provide convincing evidence that at least one pair of means are different from each other, the observed differences in sample means are attributable to sampling variability (or chance). Statistics 101 (Thomas Leininger) U4 - L3: ANOVA June 10, 2013 23 / 34

ANOVA output, deconstructed Conclusion - in context Question What is the conclusion of the hypothesis test for α = 0.05? (p-value = 0.0063) The data provide convincing evidence that the average aldrin concentration (a) is different for all groups. (b) on the surface is lower than the other levels. (c) is different for at least one group. (d) is the same for all groups. Statistics 101 (Thomas Leininger) U4 - L3: ANOVA June 10, 2013 24 / 34

Checking conditions (1) independence Does this condition appear to be satisfied? Statistics 101 (Thomas Leininger) U4 - L3: ANOVA June 10, 2013 25 / 34

Checking conditions (2) approximately normal Does this condition appear to be satisfied? 9 8 bottom 6.5 6.0 middepth 5.0 surface 7 6 5.5 5.0 4.5 4.5 4.0 5 4 4.0 3.5 3.5 1.5 1.0 0.5 0.0 0.5 1.0 1.5 1.5 1.0 0.5 0.0 0.5 1.0 1.5 1.5 1.0 0.5 0.0 0.5 1.0 1.5 3 2 4 2 3 1 2 1 1 0 3 5 7 9 0 3 5 7 0 2.5 4.0 5.5 Statistics 101 (Thomas Leininger) U4 - L3: ANOVA June 10, 2013 26 / 34

Checking conditions (3) constant variance Does this condition appear to be satisfied? 9 8 7 6 5 4 3 bottom sd=1.58 middepth sd=1.10 surface sd=0.66 Statistics 101 (Thomas Leininger) U4 - L3: ANOVA June 10, 2013 27 / 34

Multiple comparisons & Type 1 error rate Which means differ? Earlier we concluded that at least one pair of means differ. The natural question that follows is which ones? We can do two sample t tests for differences in each possible pair of groups. Can you see any pitfalls with this approach? When we run too many tests, the Type 1 Error rate increases. This issue is resolved by using a modified significance level. Statistics 101 (Thomas Leininger) U4 - L3: ANOVA June 10, 2013 28 / 34

Multiple comparisons & Type 1 error rate Multiple comparisons The scenario of testing many pairs of groups is called multiple comparisons. The Bonferroni correction suggests that a more stringent significance level is more appropriate for these tests: α = α/k where K is the number of comparisons being considered. If there are k groups, then usually all possible pairs are compared and K = k(k 1) 2. Statistics 101 (Thomas Leininger) U4 - L3: ANOVA June 10, 2013 29 / 34

Multiple comparisons & Type 1 error rate Determining the modified α Question In the aldrin data set depth has 3 levels: bottom, mid-depth, and surface. If α = 0.05, what should be the modified significance level for two sample t tests for determining which pairs of groups have significantly different means? (a) α = 0.05 (b) α = 0.05/2 = 0.025 (c) α = 0.05/3 = 0.0167 (d) α = 0.05/6 = 0.0083 Statistics 101 (Thomas Leininger) U4 - L3: ANOVA June 10, 2013 30 / 34

Multiple comparisons & Type 1 error rate Which means differ? Question Based on the box plots below, which means would you expect to be significantly different? 9 (a) bottom & surface 8 (b) bottom & mid-depth 7 (c) mid-depth & surface 6 5 (d) bottom & mid-depth; mid-depth & surface 4 3 bottom sd=1.58 middepth sd=1.10 surface sd=0.66 (e) bottom & mid-depth; bottom & surface; mid-depth & surface Statistics 101 (Thomas Leininger) U4 - L3: ANOVA June 10, 2013 31 / 34

Multiple comparisons & Type 1 error rate Which means differ? (cont.) If the ANOVA assumption of equal variability across groups is satisfied, we can use the data from all groups to estimate variability: Estimate any within-group standard deviation with MSE, which is s pooled Use the error degrees of freedom, n k, for t-distributions Difference in two means: after ANOVA σ 2 1 SE = + σ2 2 MSE n 1 n 2 n 1 + MSE n 2 Statistics 101 (Thomas Leininger) U4 - L3: ANOVA June 10, 2013 32 / 34

Multiple comparisons & Type 1 error rate Is there a difference between the average aldrin concentration at the bottom and at mid depth? n mean sd bottom 10 6.04 1.58 middepth 10 5.05 1.10 surface 10 4.2 0.66 overall 30 5.1 1.37 Df Sum Sq Mean Sq F value Pr(>F) depth 2 16.96 8.48 6.13 0.0063 Residuals 27 37.33 1.38 Total 29 54.29 T dfe = ( x bottom x middepth ) MSE n bottom + MSE n middepth T 27 = (6.04 5.05) 1.38 10 + 1.38 10 = 0.99 0.53 = 1.87 0.05 < p value < 0.10 (two-sided) α = 0.05/3 = 0.0167 Fail to reject H 0, the data do not provide convincing evidence of a difference between the average aldrin concentrations at bottom and mid depth. Statistics 101 (Thomas Leininger) U4 - L3: ANOVA June 10, 2013 33 / 34

Multiple comparisons & Type 1 error rate Application exercise: Post-hoc comparison Is there evidence of a difference between the average aldrin concentration at the bottom and at surface? (a) yes (b) no Statistics 101 (Thomas Leininger) U4 - L3: ANOVA June 10, 2013 34 / 34