A few things to remember about ANOVA:
1) The F-test that is performed is always one-tailed. This is because the alternative hypothesis is always that the between-group variation is greater than the within-group variation (a between-group variation smaller than the within-group variation provides no evidence that the means differ).
2) Interactions: these are hard to conceptualize; the best example is a synergistic or antagonistic drug-drug interaction.
3) An ANOVA technically cannot tell you where the differences are (although often they are quite obvious). It is always best to follow up with a post-hoc test.
4) The biggest problem encountered with ANOVAs in Excel is setting up the rows and columns correctly.
Chemometrics Lecture 2.3: ANOVA Jonathan Benskin
Learning Objectives
Understand when and why to use ANOVA (including assumptions).
Understand the difference between one-way and two-way ANOVA (with and without replication).
Be able to calculate a one-way ANOVA by hand and using Excel's DataAnalysis add-on.
Be aware of the various post-hoc tests which follow ANOVA.
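Introduction
Analysis of variance (ANOVA) is used to test the hypothesis that there is no difference between two or more group means. Recall that in the case of two group means, we use the two-sample t-test (shown here for equal variances):
$$t_{calc} = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \qquad s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
with $n_1 + n_2 - 2$ degrees of freedom.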
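As an illustration, the pooled-variance t statistic, t = (mean1 − mean2) / (s_p √(1/n1 + 1/n2)), can be sketched in a few lines of Python. This is a minimal sketch, not the lecture's own method; the function name pooled_t is ours, and it is applied to Groups 1 and 2 of the serum uric acid data from Example 1.

```python
import math
from statistics import mean, variance

def pooled_t(a, b):
    """Two-sample t statistic assuming equal variances.

    t = (mean_a - mean_b) / (s_p * sqrt(1/n_a + 1/n_b)),
    where s_p^2 is the pooled variance with n_a + n_b - 2 degrees of freedom.
    """
    n_a, n_b = len(a), len(b)
    # Pooled variance: each sample variance weighted by its degrees of freedom
    sp2 = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    return (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / n_a + 1 / n_b))

# Groups 1 and 2 of the serum uric acid data in Example 1
group1 = [1.2, 0.8, 1.1, 0.7, 0.9, 1.1, 1.5, 0.8, 1.6, 0.9]
group2 = [1.7, 1.5, 2.0, 2.1, 1.1, 0.9, 2.2, 1.8, 1.3, 1.5]
t_calc = pooled_t(group1, group2)
print(f"t_calc = {t_calc:.2f} with {len(group1) + len(group2) - 2} degrees of freedom")
```

The value of t_calc would then be compared with the two-tailed critical t for 18 degrees of freedom.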
Why can't we compare multiple groups with the t-test?
Every time you conduct a t-test there is a chance of making a Type I error (5% assuming α = 0.05). Conducting multiple t-tests therefore inflates the overall Type I error rate. An ANOVA tests all the groups at once, so the overall Type I error rate remains at 5%.
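To see how quickly the error rate grows, the sketch below estimates the chance of at least one Type I error when every pair of groups is compared with its own t-test at α = 0.05. It assumes the tests are independent, which pairwise tests sharing groups are not exactly, so treat the numbers as an illustration of the inflation; the function name is hypothetical.

```python
def familywise_error(k_groups, alpha=0.05):
    """Approximate probability of at least one Type I error across all pairwise t-tests.

    Uses 1 - (1 - alpha)^m with m = k(k-1)/2 comparisons, treating the
    tests as independent (an approximation).
    """
    m = k_groups * (k_groups - 1) // 2  # number of pairwise comparisons
    return 1 - (1 - alpha) ** m

for k in (2, 3, 5):
    print(f"{k} groups: {familywise_error(k):.3f}")
```

With three groups the overall error rate is already about 14%, and with five groups about 40%.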
Overview of ANOVA
ANOVAs compare the variation between groups with the variation within groups to assess whether there are differences in the means. Important: an ANOVA cannot tell you which groups are significantly different, only that a significant difference exists.
One-way ANOVA: assumptions
The dependent variable should be continuous; the independent variable should consist of two or more independent groups. All observations should be independent, and the groups should have equal variances (homoscedasticity; rule of thumb: the ratio of the largest to the smallest sample standard deviation should be less than 2:1) and be approximately normally distributed.
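The 2:1 rule of thumb for equal variances is easy to check before running the ANOVA. This minimal sketch (function name ours) computes the ratio of the largest to the smallest sample standard deviation for the three groups of serum uric acid data in Example 1; it checks only this rule of thumb, not normality or independence.

```python
from statistics import stdev

def sd_ratio(groups):
    """Ratio of the largest to the smallest sample standard deviation."""
    sds = [stdev(g) for g in groups]
    return max(sds) / min(sds)

# Serum uric acid data from Example 1 (Groups 1-3)
groups = [
    [1.2, 0.8, 1.1, 0.7, 0.9, 1.1, 1.5, 0.8, 1.6, 0.9],
    [1.7, 1.5, 2.0, 2.1, 1.1, 0.9, 2.2, 1.8, 1.3, 1.5],
    [1.3, 1.5, 1.4, 1.0, 1.8, 1.4, 1.9, 0.9, 1.9, 1.8],
]
ratio = sd_ratio(groups)
print(f"largest/smallest SD = {ratio:.2f}:", "OK" if ratio < 2 else "check variances")
```

Here the ratio is about 1.4, comfortably below 2.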
Hypotheses: one-way ANOVA
H0: μ1 = μ2 = μ3 = … = μc (all population means are equal)
H1: not all of the population means are the same (at least one population mean is different)
H1 does not mean that all population means are different.
One-way ANOVA
1) Start by determining the variation within and between groups.
BETWEEN-groups variation: for each data value, look at the difference between its group mean and the overall mean, $(\bar{x}_i - \bar{x})^2$, where $\bar{x}_i$ is the mean for group i and $\bar{x}$ is the mean for the entire dataset.
WITHIN-groups variation: for each data value, look at the difference between that value and the mean of its group, $(x_{ij} - \bar{x}_i)^2$, where $x_{ij}$ is individual j in group i and $\bar{x}_i$ is the mean for group i.
One-way ANOVA
2) Determine the sums of squares.
Sum of squares TOTAL ($s^2$ = total variance, DFT = total degrees of freedom):
$$SST = \sum_{obs} (x_{ij} - \bar{x})^2 = s^2 (DFT)$$
Sum of squares WITHIN groups ($s_i^2$ = variance of group i, $df_i$ = degrees of freedom of group i):
$$SSW = \sum_{groups} \sum_{obs} (x_{ij} - \bar{x}_i)^2 = \sum_{groups} s_i^2 (df_i)$$
Sum of squares BETWEEN groups:
$$SSB = \sum_{groups} n_i (\bar{x}_i - \bar{x})^2$$
One-way ANOVA
2) Determine the sums of squares and their degrees of freedom:
$$SST = \sum_{obs} (x_{ij} - \bar{x})^2 = s^2 (DFT)$$ (DF = total # observations − 1)
$$SSW = \sum_{groups} \sum_{obs} (x_{ij} - \bar{x}_i)^2 = \sum_{groups} s_i^2 (df_i)$$ (DF = total # observations − # groups)
$$SSB = \sum_{groups} n_i (\bar{x}_i - \bar{x})^2$$ (DF = total # groups − 1)
Note that SST = SSW + SSB, and the degrees of freedom add up the same way.
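The three sums of squares translate directly into code. The following is a minimal Python sketch (function name ours) that follows the SST, SSW and SSB definitions term by term, weights each group by its own size n_i, and checks that SST = SSW + SSB on the Example 1 data.

```python
from statistics import mean

def sums_of_squares(groups):
    """Return (SST, SSW, SSB) for a one-way layout; group sizes may differ."""
    all_obs = [x for g in groups for x in g]
    grand = mean(all_obs)                     # mean for the entire dataset
    group_means = [mean(g) for g in groups]   # mean for each group i
    sst = sum((x - grand) ** 2 for x in all_obs)
    ssw = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    ssb = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, group_means))
    return sst, ssw, ssb

# Serum uric acid data from Example 1
groups = [
    [1.2, 0.8, 1.1, 0.7, 0.9, 1.1, 1.5, 0.8, 1.6, 0.9],
    [1.7, 1.5, 2.0, 2.1, 1.1, 0.9, 2.2, 1.8, 1.3, 1.5],
    [1.3, 1.5, 1.4, 1.0, 1.8, 1.4, 1.9, 0.9, 1.9, 1.8],
]
sst, ssw, ssb = sums_of_squares(groups)
n_obs, k = sum(len(g) for g in groups), len(groups)
print(f"SST = {sst:.3f} (DF {n_obs - 1})")
print(f"SSW = {ssw:.3f} (DF {n_obs - k})")
print(f"SSB = {ssb:.3f} (DF {k - 1})")
```

These agree with the hand calculations in Example 1 (SSW = 3.662, SSB ≈ 1.67, SST ≈ 5.33).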
One-way ANOVA
3) Calculate the mean squares:
Mean square between: $$MSB = \frac{SSB}{DFB}$$
Mean square within: $$MSW = \frac{SSW}{DFW}$$
where DFB and DFW are the degrees of freedom between and within groups.
The ANOVA F-statistic is the ratio of the between-group variation to the within-group variation:
$$F_{calc} = \frac{\text{Between}}{\text{Within}} = \frac{MSB}{MSW}$$
A large F calc is evidence against H0, since it indicates that there is more difference between groups than within groups. F crit (which we get from F-tables) is determined using DF between and DF within. If F calc > F crit, the null hypothesis is rejected.
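Step 3 amounts to two divisions and a ratio. This sketch (function name ours) computes MSB, MSW and F calc from the sums of squares of Example 1 and compares F calc with the tabulated F crit(2, 27) = 3.35 at α = 0.05; the critical value is passed in by hand rather than computed.

```python
def one_way_f(ssb, ssw, k, n_total):
    """Mean squares and F ratio from the sums of squares.

    k = number of groups, n_total = total number of observations.
    """
    dfb, dfw = k - 1, n_total - k   # degrees of freedom between and within
    msb, msw = ssb / dfb, ssw / dfw
    return msb, msw, msb / msw

# Sums of squares from Example 1 (k = 3 groups, N = 30 observations)
msb, msw, f_calc = one_way_f(ssb=1.672667, ssw=3.662, k=3, n_total=30)
F_CRIT = 3.35  # F crit(2, 27) at alpha = 0.05, read from an F table
print(f"MSB = {msb:.3f}, MSW = {msw:.4f}, F = {f_calc:.2f}, reject H0: {f_calc > F_CRIT}")
```

If SciPy is available, scipy.stats.f.ppf(0.95, 2, 27) can replace the table lookup.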
One-way ANOVA
Note that the equations simplify when all groups have the same sample size; with different sample sizes, each group's contribution must be weighted by its own size $n_i$.
MSW = $s_w^2$ = within-groups mean square = within-groups variance
MSB = $s_b^2$ = between-groups mean square = between-groups variance
Note: Excel and Unscrambler automatically adjust the equations depending on whether you are doing a one-way ANOVA with equal or different sample sizes.
Example 1-Excel Workbook One-way ANOVA
The following data show serum uric acid levels for 3 populations. Test whether the means are significantly different.
Group 1   Group 2   Group 3
1.2       1.7       1.3
0.8       1.5       1.5
1.1       2.0       1.4
0.7       2.1       1.0
0.9       1.1       1.8
1.1       0.9       1.4
1.5       2.2       1.9
0.8       1.8       0.9
1.6       1.3       1.9
0.9       1.5       1.8
Example 1-Excel Workbook One-way ANOVA
1. The hypotheses: H0: μ1 = μ2 = μ3 vs. H1: not all of μ1, μ2, μ3 are equal.
2. The assumptions: independent random samples, normal distributions, σ1² = σ2² = σ3².
3. The α-level: α = 0.05.
4. The test statistic: ANOVA (F-test).
Example 1-Excel Workbook One-way ANOVA
Calculating sum of squares within (SSW):
$$SSW = \sum_{groups} \sum_{obs} (x_{ij} - \bar{x}_i)^2 = \sum_{groups} s_i^2 (df_i)$$
Group 1: x, x − mean, (x − mean)² | Group 2: x, x − mean, (x − mean)² | Group 3: x, x − mean, (x − mean)²
1.2   0.14  0.0196 | 1.7   0.09  0.0081 | 1.3  -0.19  0.0361
0.8  -0.26  0.0676 | 1.5  -0.11  0.0121 | 1.5   0.01  0.0001
1.1   0.04  0.0016 | 2.0   0.39  0.1521 | 1.4  -0.09  0.0081
0.7  -0.36  0.1296 | 2.1   0.49  0.2401 | 1.0  -0.49  0.2401
0.9  -0.16  0.0256 | 1.1  -0.51  0.2601 | 1.8   0.31  0.0961
1.1   0.04  0.0016 | 0.9  -0.71  0.5041 | 1.4  -0.09  0.0081
1.5   0.44  0.1936 | 2.2   0.59  0.3481 | 1.9   0.41  0.1681
0.8  -0.26  0.0676 | 1.8   0.19  0.0361 | 0.9  -0.59  0.3481
1.6   0.54  0.2916 | 1.3  -0.31  0.0961 | 1.9   0.41  0.1681
0.9  -0.16  0.0256 | 1.5  -0.11  0.0121 | 1.8   0.31  0.0961
Sum:  10.6  0  0.824 | 16.1  0  1.669 | 14.9  0  1.169
Mean: 1.06 | 1.61 | 1.49
SSW = 0.824 + 1.669 + 1.169 = 3.662
DF = total # observations − # groups = 30 − 3 = 27
Example 1-Excel Workbook One-way ANOVA
Calculating sum of squares total (SST):
$$SST = \sum_{obs} (x_{ij} - \bar{x})^2 = s^2 (DFT)$$
DF = total # observations − 1 = 29
Deviations from the overall mean (1.39) for all 30 observations, listed by group:
Group 1: x, x − mean, (x − mean)² | Group 2: x, x − mean, (x − mean)² | Group 3: x, x − mean, (x − mean)²
1.2  -0.19  0.03 | 1.7   0.31  0.10 | 1.3  -0.09  0.01
0.8  -0.59  0.34 | 1.5   0.11  0.01 | 1.5   0.11  0.01
1.1  -0.29  0.08 | 2.0   0.61  0.38 | 1.4   0.01  0.00
0.7  -0.69  0.47 | 2.1   0.71  0.51 | 1.0  -0.39  0.15
0.9  -0.49  0.24 | 1.1  -0.29  0.08 | 1.8   0.41  0.17
1.1  -0.29  0.08 | 0.9  -0.49  0.24 | 1.4   0.01  0.00
1.5   0.11  0.01 | 2.2   0.81  0.66 | 1.9   0.51  0.26
0.8  -0.59  0.34 | 1.8   0.41  0.17 | 0.9  -0.49  0.24
1.6   0.21  0.05 | 1.3  -0.09  0.01 | 1.9   0.51  0.26
0.9  -0.49  0.24 | 1.5   0.11  0.01 | 1.8   0.41  0.17
All observations: sum = 41.6, sum of deviations = 0, sum of squared deviations = 5.3347; overall mean = 1.39
SST = 5.335
Example 1-Excel Workbook One-way ANOVA
Calculating sum of squares between (SSB), using SST = SSW + SSB as a check:
$$SSB = \sum_{groups} n_i (\bar{x}_i - \bar{x})^2$$
Group     Group mean   Group mean − overall mean   (Group mean − overall mean)²   n × (Group mean − overall mean)²
Group 1   1.06         -0.33                       0.11                           1.067111
Group 2   1.61          0.22                       0.05                           0.498778
Group 3   1.49          0.10                       0.01                           0.106778
Sum       4.16          0                          0.17                           1.67
Overall mean = 1.39
SSB = 1.67 (equivalently, SST − SSW = 5.335 − 3.662 = 1.673)
DF = total # groups − 1 = 2
Example 1-Excel Workbook One-way ANOVA
Calculating mean squares:
MSB = SSB/DFB = 1.67/2 = 0.835 (Between)
MSW = SSW/DFW = 3.66/27 = 0.1356 (Within)
F calc = MSB/MSW = 0.835/0.1356 = 6.16
F crit(2,27) = 3.35
F calc > F crit, therefore the null hypothesis is rejected.
Problem 1-Excel Workbook
1) Solve Example 2 in the ANOVA Excel workbook by hand.
2) Use Excel's DataAnalysis add-on and compare your result.
Example 2-Excel Workbook
Excel output (Anova: Single Factor), with the quantities from the by-hand calculation labelled in parentheses:
SUMMARY
Groups    Count   Sum   Average   Variance
Group 1   5       20    4         5.5
Group 2   5       40    8         4.5
Group 3   5       65    13        3.5
ANOVA
Source of Variation   SS               df   MS               F                   P-value    F crit
Between Groups        203.3333 (SSB)   2    101.6667 (MSB)   22.59259 (F calc)   8.54E-05   3.885294
Within Groups         54 (SSW)         12   4.5 (MSW)
Total                 257.3333 (SST)   14
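Because SSW = Σ s_i²(df_i) and SSB = Σ n_i(x̄_i − x̄)², the whole one-way ANOVA table can be rebuilt from the SUMMARY block alone (count, average and variance per group). This sketch (function name ours) is one way to check a by-hand answer for Example 2 against the Excel output.

```python
def anova_from_summary(counts, means, variances):
    """One-way ANOVA from per-group summary statistics (count, mean, variance).

    Returns (SSB, SSW, SST, F_calc); groups may have different sizes.
    """
    n_total, k = sum(counts), len(counts)
    # Grand mean weighted by group size (equals the mean of all observations)
    grand = sum(n * m for n, m in zip(counts, means)) / n_total
    ssw = sum((n - 1) * v for n, v in zip(counts, variances))  # sum of s_i^2 * df_i
    ssb = sum(n * (m - grand) ** 2 for n, m in zip(counts, means))
    f_calc = (ssb / (k - 1)) / (ssw / (n_total - k))
    return ssb, ssw, ssb + ssw, f_calc

# SUMMARY block of Example 2 (Count, Average, Variance for Groups 1-3)
ssb, ssw, sst, f_calc = anova_from_summary([5, 5, 5], [4, 8, 13], [5.5, 4.5, 3.5])
print(f"SSB = {ssb:.4f}, SSW = {ssw:.4f}, SST = {sst:.4f}, F = {f_calc:.5f}")
```

The results match the Between Groups, Within Groups and Total rows of the Excel table.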
Two-way ANOVA
When two factors may affect the results of an experiment, a two-way (two-factor) ANOVA must be used to study their effects. The most common design is with replication. Two-way ANOVA is commonly used for analyzing data generated from a repeated-measures study (i.e., where an observation has been made on the same individual more than once).
Two-way ANOVA-without replication
We use a two-way ANOVA without replication when there is a single observation for each combination of the nominal variables. The hypotheses are that: 1) the means of observations grouped by one factor are the same, and 2) the means of observations grouped by the other factor are the same.
Example 7.3.1 (M&M): here we are testing (i) whether the different chelating agents have significantly different efficiencies, and (ii) whether the day-to-day variation is significantly greater than the variation due to the random error of measurement.
Two-way ANOVA-without replication
Excel output (Anova: Two-Factor Without Replication):
SUMMARY   Count   Sum   Average    Variance
Day 1     4       326   81.5       5.666667
Day 2     4       315   78.75      1.583333
Day 3     4       319   79.75      5.583333
A         3       246   82         7
B         3       235   78.33333   2.333333
C         3       243   81         3
D         3       236   78.66667   0.333333
ANOVA
Source of Variation   SS         df   MS         F          P-value    F crit
Rows                  15.5       2    7.75       4.728814   0.058482   5.143253
Columns               28.66667   3    9.555556   5.830508   0.032756   4.757063
Error                 9.833333   6    1.638889
Total                 54         11
There are now two values of F calc: the Rows F tells you whether there are differences between days, and the Columns F tells you whether there are differences between chelating agents. At α = 0.05 only the chelating-agent effect is significant (5.83 > 4.76, whereas for days 4.73 < 5.14).
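The Rows, Columns and Error lines of this table can be reproduced from the SUMMARY block alone. The sketch below (function name ours) assumes one observation per day/agent cell, as in this design; it uses the fact that the total sum of squares equals the spread within each row, (c − 1)·Σ row variances, plus the spread between row means, and obtains the error term by subtraction.

```python
def two_way_no_replication(row_means, row_vars, col_means):
    """Two-way ANOVA without replication rebuilt from Excel's SUMMARY block.

    Assumes one observation per row/column cell, so every row has
    len(col_means) values and every column has len(row_means) values.
    """
    r, c = len(row_means), len(col_means)
    grand = sum(row_means) / r                     # rows are equal-sized
    ss_rows = c * sum((m - grand) ** 2 for m in row_means)
    ss_cols = r * sum((m - grand) ** 2 for m in col_means)
    # Total SS = spread of values within each row + spread between row means
    ss_total = (c - 1) * sum(row_vars) + ss_rows
    ss_error = ss_total - ss_rows - ss_cols
    ms_error = ss_error / ((r - 1) * (c - 1))
    f_rows = (ss_rows / (r - 1)) / ms_error
    f_cols = (ss_cols / (c - 1)) / ms_error
    return ss_rows, ss_cols, ss_error, ss_total, f_rows, f_cols

# Day (row) and chelating agent (column) summaries from the Excel output
row_means = [81.5, 78.75, 79.75]
row_vars = [17 / 3, 19 / 12, 67 / 12]      # 5.666667, 1.583333, 5.583333
col_means = [82, 235 / 3, 81, 236 / 3]     # 82, 78.33333, 81, 78.66667
ss_rows, ss_cols, ss_error, ss_total, f_rows, f_cols = two_way_no_replication(
    row_means, row_vars, col_means)
print(f"Rows F = {f_rows:.4f}, Columns F = {f_cols:.4f}, Error SS = {ss_error:.4f}")
```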
Two-way ANOVA-with replication
A two-way ANOVA with replication tests 3 null hypotheses:
1) that the means of observations grouped by one factor are the same;
2) that the means of observations grouped by the other factor are the same;
3) that there is no interaction between the two factors (i.e., no effects of one factor which depend on the other).
Modified chelating agent example (three replicates per day and chelating agent):
        Chelating Agents
Day     A    B    C    D
Day 1   84   80   83   79
Day 1   82   81   82   80
Day 1   83   80   81   79
Day 2   84   79   80   79
Day 2   81   70   81   77
Day 2   81   81   84   78
Day 3   83   78   80   78
Day 3   80   78   81   77
Day 3   82   80   81   79
Two-way ANOVA-with replication
Excel output (Anova: Two-Factor With Replication):
SUMMARY      A          B          C          D          Total
Day 1
  Count      3          3          3          3          12
  Sum        249        241        246        238        974
  Average    83         80.33333   82         79.33333   81.16667
  Variance   1          0.333333   1          0.333333   2.69697
Day 2
  Count      3          3          3          3          12
  Sum        246        230        245        234        955
  Average    82         76.66667   81.66667   78         79.58333
  Variance   3          34.33333   4.333333   1          13.53788
Day 3
  Count      3          3          3          3          12
  Sum        245        236        242        234        957
  Average    81.66667   78.66667   80.66667   78         79.75
  Variance   2.333333   1.333333   0.333333   1          3.295455
Total
  Count      9          9          9          9
  Sum        740        707        733        706
  Average    82.22222   78.55556   81.44444   78.44444
  Variance   1.944444   11.52778   1.777778   1.027778
ANOVA
Source of Variation   SS         df   MS         F          P-value    F crit
Sample                18.16667   2    9.083333   2.165563   0.136574   3.402826
Columns               102.7778   3    34.25926   8.16777    0.000639   3.008787
Interaction           11.38889   6    1.898148   0.452539   0.835998   2.508189
Within                100.6667   24   4.194444
Total                 233        35
Here "Sample" refers to the rows (days). Now we have a third F value (with its own F crit), which tests the interaction between days and chelating agent. At α = 0.05 only the chelating-agent (Columns) effect is significant; neither the day effect nor the interaction is.
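To see where each line of this table comes from, the sketch below (function names ours) computes the row (day), column (chelating agent), interaction and within sums of squares from the raw replicate data of the modified chelating agent example. It assumes the same number of replicates in every cell and obtains the interaction term as whatever remains of SST after the other three are removed.

```python
from statistics import mean

def ss(values):
    """Sum of squared deviations from the mean of `values`."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values)

def two_way_with_replication(cells):
    """cells[i][j] holds the replicates for row level i and column level j."""
    r, c, n = len(cells), len(cells[0]), len(cells[0][0])
    all_obs = [x for row in cells for cell in row for x in cell]
    grand = mean(all_obs)
    row_means = [mean([x for cell in row for x in cell]) for row in cells]
    col_means = [mean([x for row in cells for x in row[j]]) for j in range(c)]
    ss_rows = c * n * sum((m - grand) ** 2 for m in row_means)
    ss_cols = r * n * sum((m - grand) ** 2 for m in col_means)
    ss_within = sum(ss(cell) for row in cells for cell in row)  # replicate scatter
    ss_total = ss(all_obs)
    ss_inter = ss_total - ss_rows - ss_cols - ss_within         # what is left over
    ms_within = ss_within / (r * c * (n - 1))
    return {
        "rows": (ss_rows, (ss_rows / (r - 1)) / ms_within),
        "columns": (ss_cols, (ss_cols / (c - 1)) / ms_within),
        "interaction": (ss_inter, (ss_inter / ((r - 1) * (c - 1))) / ms_within),
        "within": (ss_within, None),
        "total": (ss_total, None),
    }

# Rows = Day 1-3; columns = chelating agents A-D; three replicates per cell
cells = [
    [[84, 82, 83], [80, 81, 80], [83, 82, 81], [79, 80, 79]],
    [[84, 81, 81], [79, 70, 81], [80, 81, 84], [79, 77, 78]],
    [[83, 80, 82], [78, 78, 80], [80, 81, 81], [78, 77, 79]],
]
result = two_way_with_replication(cells)
for source, (sum_sq, f) in result.items():
    print(source, round(sum_sq, 4), "" if f is None else round(f, 4))
```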
Problems 2 and 3-Excel Workbook
1) Solve Problem 2 using the DataAnalysis tool.
2) Open the by-hand version of Problem 2 and fill in the missing data in the yellow boxes. Compare the results to those generated by the DataAnalysis tool (i.e., the results from step 1).
3) Complete Problem 3.
Post-hoc tests
ANOVA tests whether there is an overall difference between your groups, but it does not tell you which specific groups were different. This is where post-hoc tests come in. The most common are Fisher's LSD, Tukey's HSD, and Scheffé's test. The procedures differ in the amount and kind of adjustment to alpha they provide:
Scheffé: most likely to lead to Type II errors / least likely to lead to Type I errors.
Tukey: moderate chance of Type I and Type II errors.
Fisher's LSD: least likely to lead to Type II errors / most likely to lead to Type I errors.
Tukey's HSD
Also known as Tukey's range test, Tukey's test, or the Tukey-Kramer method. This assumes you have already performed an ANOVA and found that there is a statistically significant difference among your groups.
Step 1. Select two means and note the relevant variables (means, mean square within, and number per condition/group).
Step 3. Calculate Tukey's test for each mean comparison using the following equation (n = number per group):
$$q = \frac{|\bar{x}_i - \bar{x}_j|}{\sqrt{MSW / n}}$$
Check whether Tukey's score is statistically significant using Tukey's probability/critical value table, taking into account the appropriate df within and number of treatments.
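Applied to Example 1, Tukey's q = |mean_i − mean_j| / √(MSW/n) can be computed for every pair of groups as in the sketch below (function name ours). It assumes equal group sizes, so the unequal-n Tukey-Kramer form is not shown, and it takes MSW = SSW/DFW = 3.662/27 from the Example 1 calculation. Each q would then be compared with the studentized range critical value for 3 groups and 27 within-groups degrees of freedom.

```python
import math
from itertools import combinations
from statistics import mean

def tukey_q(groups, msw):
    """Tukey's q = |mean_i - mean_j| / sqrt(MSW / n) for every pair of groups.

    Assumes equal group sizes n.
    """
    n = len(groups[0])                  # number per group
    se = math.sqrt(msw / n)
    means = [mean(g) for g in groups]
    return {(i + 1, j + 1): abs(means[i] - means[j]) / se
            for i, j in combinations(range(len(groups)), 2)}

# Serum uric acid data from Example 1
groups = [
    [1.2, 0.8, 1.1, 0.7, 0.9, 1.1, 1.5, 0.8, 1.6, 0.9],
    [1.7, 1.5, 2.0, 2.1, 1.1, 0.9, 2.2, 1.8, 1.3, 1.5],
    [1.3, 1.5, 1.4, 1.0, 1.8, 1.4, 1.9, 0.9, 1.9, 1.8],
]
msw = 3.662 / 27                        # SSW / DFW from Example 1
q_values = tukey_q(groups, msw)
for (i, j), q in q_values.items():
    print(f"Group {i} vs Group {j}: q = {q:.2f}")
```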
Review of what we covered
When and why to use ANOVA (including assumptions).
The difference between one-way and two-way ANOVA (with and without replication).
We calculated a one-way ANOVA by hand and using Excel's DataAnalysis add-on.
We reviewed various post-hoc tests which follow ANOVA.