Week 7 One-way ANOVA

Objectives
By the end of this lecture, you should be able to:
Understand the shortcomings of comparing multiple means as pairs of hypotheses.
Understand the steps of the ANOVA method and the method's advantages.
Compare the means of three or more populations using the ANOVA method.

The Logic and the Process of Analysis of Variance
Suppose a salesperson wants to compare the level of customer satisfaction for four different insurance companies. Our question is: "Is there a difference in satisfaction scores across the four insurance companies?"

The Logic and the Process of Analysis of Variance
The purpose of ANOVA is much the same as the t tests presented before: the goal is to determine whether the mean differences that are obtained for sample data are sufficiently large to justify a conclusion that there are mean differences between the populations from which the samples were obtained.

The Logic and the Process of Analysis of Variance (cont.)
The difference between ANOVA and the t tests is that ANOVA can be used in situations where there are two or more means being compared, whereas the t tests are limited to situations where only two means are involved. Analysis of variance is necessary to protect researchers from excessive risk of a Type I error in situations where a study is comparing more than two population means.

Shortcomings of Comparing Multiple Means Using Multiple t-tests
We could just run six different independent-samples t-tests (company 1 vs. company 2; company 1 vs. company 3; company 1 vs. company 4; company 2 vs. company 3; company 2 vs. company 4; and company 3 vs. company 4). This would be tedious, but we could use a computer to compute these quickly and easily.

Shortcomings of Comparing Multiple Means Using Multiple t-tests
It turns out this is a very bad idea, because it has a major flaw: when more than one t-test is run, each at its own level of significance, the probability of making one or more Type I errors grows quickly with the number of tests. Recall that a Type I error occurs when we reject the null hypothesis when we should not. The level of significance, α, is the probability of a Type I error in a single test. So, for a single t-test in our example, with an α of 0.05, we have a Type I error probability of 5%. When testing more than one pair of samples, and treating the tests as independent, the probability of making at least one Type I error is:
$P(\text{at least one Type I error}) = 1 - (1 - \alpha)^c$
where c is the number of t-tests. For the six tests in our example, this is $1 - (0.95)^6 \approx 0.26$.
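The error-rate formula can be checked with a few lines of R. This is a minimal sketch that assumes the pairwise tests are independent (in practice they share groups, so the figure is only an approximation); the variable names are illustrative and not from the lecture.

```r
# Number of pairwise comparisons among k = 4 insurance companies
k <- 4
c_tests <- choose(k, 2)

# Probability of at least one Type I error across all tests,
# assuming independent tests, each run at significance level alpha
alpha <- 0.05
fwer <- 1 - (1 - alpha)^c_tests

print(c_tests)         # 6
print(round(fwer, 4))  # 0.2649
```

With only four groups, the chance of at least one false rejection is already more than five times the nominal 5%.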

The Logic and the Process of Analysis of Variance (cont.)
ANOVA allows researchers to evaluate all of the mean differences in a single hypothesis test using a single α-level and thereby keeps the risk of a Type I error under control no matter how many different means are being compared. Although ANOVA can be used in a variety of research situations, we will cover only independent-measures designs involving one independent variable (one-way ANOVA).

To apply one-way ANOVA, the following assumptions should hold:
1. All observations are independent of one another and randomly selected from the population they represent.
2. The population at each value of the categorical variable (factor level) is approximately normal.
3. The variances for each factor level are approximately equal to one another.
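One possible way to screen assumptions 2 and 3 in R is sketched below. The data are simulated stand-ins for the satisfaction scores (the lecture does not supply them), and the choice of the Shapiro-Wilk and Bartlett tests is an assumption of this sketch, not something the lecture prescribes.

```r
set.seed(1)

# Simulated satisfaction scores for four hypothetical companies (not real data)
scores <- data.frame(
  company = factor(rep(c("A", "B", "C", "D"), each = 15)),
  satisfaction = c(rnorm(15, 70, 8), rnorm(15, 74, 8),
                   rnorm(15, 68, 8), rnorm(15, 77, 8))
)

# Assumption 2: approximate normality within each factor level
# (Shapiro-Wilk test applied separately to each group)
normality_p <- tapply(scores$satisfaction, scores$company,
                      function(x) shapiro.test(x)$p.value)

# Assumption 3: approximately equal variances across factor levels
# (Bartlett's test of homogeneity of variances)
equal_var <- bartlett.test(satisfaction ~ company, data = scores)

print(normality_p)
print(equal_var$p.value)
```

Small p-values would suggest a violated assumption. Independence (assumption 1) cannot be tested this way; it depends on how the data were collected.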

Steps of ANOVA
To apply the ANOVA method to the insurance companies, we analyze the total variation of the scores, which includes the variation of the scores within the groups and the variation between the group means. Since we are interested in two different types of variation, we first calculate each type independently and then calculate the ratio of the between-group variation to the within-group variation, called the F-value. Just like the z, t, and chi-square tests, ANOVA has its own distribution, called the F-distribution, which we use to set our critical values and test our hypothesis. Just like the t-distribution and the chi-square distribution, the F-distribution relies on degrees of freedom. Since the F-value is a ratio of two different sources of variance, we'll need two different degrees of freedom.
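As a small illustration of how the two degrees of freedom enter, the sketch below looks up a critical value and a p-value from the F-distribution using base R's `qf` and `pf`. The group count matches the four companies; the total sample size and the observed F-value are hypothetical numbers chosen only for the example.

```r
k <- 4      # number of groups (the four insurance companies)
N <- 60     # total number of observations (hypothetical)

df_between <- k - 1   # numerator degrees of freedom
df_within  <- N - k   # denominator degrees of freedom

# Critical value that cuts off the upper 5% of the F-distribution
alpha <- 0.05
f_crit <- qf(1 - alpha, df_between, df_within)

# p-value for a hypothetical observed F-value
f_obs <- 3.2
p_val <- pf(f_obs, df_between, df_within, lower.tail = FALSE)

print(f_crit)
print(p_val)
```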

Steps of ANOVA
When using the ANOVA method, we are testing the null hypothesis that the population means of our groups are equal. When we conduct the hypothesis test, we ask how likely it would be to obtain an F-statistic at least as extreme as the one observed if the null hypothesis were true. If we reject the null hypothesis that the means are equal, we are saying that the differences we see are unlikely to have happened just by chance. To test a hypothesis using the ANOVA method, there are several steps that we need to take.

Steps of ANOVA
We will create what is called the ANOVA table:
Source: lists where the variation in the test is coming from: between the groups, within the groups, or all the variance for all the observations (Total).
SS: sum of squares
Df: degrees of freedom
MS: mean square
F: value of the test statistic

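Steps of ANOVA
1. Calculate the total sum of squares (SST), where $y_{ij}$ is observation j in group i, $n_i$ is the size of group i, k is the number of groups, and $\bar{y}$ is the grand mean: $SST = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y})^2$
2. Calculate the sum of squares between (SSB), where $\bar{y}_i$ is the mean of group i: $SSB = \sum_{i=1}^{k} n_i (\bar{y}_i - \bar{y})^2$
3. Find the sum of squares within groups (SSW): $SSW = SST - SSB = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2$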

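Steps of ANOVA
4. Solve for the degrees of freedom for the test, where k is the number of groups and N is the total number of observations: $df_{between} = k - 1$ and $df_{within} = N - k$
5. Calculate the Mean Squares Between (MSB) and Mean Squares Within (MSW): $MSB = SSB / (k - 1)$ and $MSW = SSW / (N - k)$
6. Calculate the F statistic: $F = MSB / MSW$
7. Find F critical using tables with the two degrees of freedom (between, within).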

Steps of ANOVA
8. Make a decision: if the F test statistic is greater than F critical (or the p-value of the F statistic is less than alpha), reject the null hypothesis and conclude that at least two groups have different means.
9. If you found a significant difference, you need to apply another test to find which groups have different means. One such test is Tukey's Honest Significant Difference (HSD) test.
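Steps 1 through 8 can be traced by hand in R. The sketch below uses a tiny made-up data set (three groups of three values, not from the lecture) so that every intermediate quantity is easy to verify, and then cross-checks the F statistic against R's built-in `aov`.

```r
# Made-up scores for three groups (illustration only)
y <- c(4, 5, 6,  7, 8, 9,  10, 11, 12)
g <- factor(rep(c("g1", "g2", "g3"), each = 3))

grand_mean  <- mean(y)                 # 8
group_means <- tapply(y, g, mean)      # 5, 8, 11
n_i         <- tapply(y, g, length)    # 3, 3, 3

# Steps 1-3: sums of squares
sst <- sum((y - grand_mean)^2)                   # 60
ssb <- sum(n_i * (group_means - grand_mean)^2)   # 54
ssw <- sst - ssb                                 # 6

# Step 4: degrees of freedom
k <- nlevels(g)
N <- length(y)
df_b <- k - 1   # 2
df_w <- N - k   # 6

# Steps 5-6: mean squares and the F statistic
msb <- ssb / df_b     # 27
msw <- ssw / df_w     # 1
f_stat <- msb / msw   # 27

# Steps 7-8: critical value, p-value, and decision at alpha = 0.05
f_crit  <- qf(0.95, df_b, df_w)
p_value <- pf(f_stat, df_b, df_w, lower.tail = FALSE)
reject  <- f_stat > f_crit

# Cross-check against the built-in one-way ANOVA
f_aov <- summary(aov(y ~ g))[[1]][["F value"]][1]

print(c(SST = sst, SSB = ssb, SSW = ssw, F = f_stat, F_crit = f_crit))
print(reject)
```

Because the group means (5, 8, 11) are far apart relative to the spread within each group, the F statistic is large and the null hypothesis is rejected.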

R Example
Does the price of a car depend on its body style?
boxplot(Automobile$price ~ Automobile$BodyStyle, main = "Car Prices", ylab = "Price", xlab = "Body Style")
How do we interpret the values in a boxplot?

Boxplots
The orientation can be vertical or horizontal. In this figure, it is drawn horizontally.
Q1 is the first quartile (the median of the lower half of the data).
Q2 is the 2nd quartile (the median of all the data).
Q3 is the 3rd quartile (the median of the upper half of the data).
IQR = Q3 - Q1 is the interquartile range.
Outliers are values either > Q3 + 1.5*IQR or < Q1 - 1.5*IQR.
Here, we have only large outliers, as indicated by the dots at the right of the box. No values are smaller than Q1 - 1.5*IQR, hence no outliers are shown at the left.
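The quartiles and outlier fences described above can be computed directly. This is a minimal sketch on a made-up vector (not the car data); note that `quantile()` interpolates by default, so on other data its quartiles can differ slightly from the hinges that `boxplot()` draws.

```r
# Made-up values with one unusually large observation
x <- c(2, 4, 5, 7, 8, 9, 10, 12, 30)

q   <- quantile(x, c(0.25, 0.5, 0.75))   # Q1 = 5, Q2 = 8, Q3 = 10
iqr <- IQR(x)                            # Q3 - Q1 = 5

# Fences: values beyond these are flagged as outliers
lower_fence <- unname(q[1]) - 1.5 * iqr  # -2.5
upper_fence <- unname(q[3]) + 1.5 * iqr  # 17.5

outliers <- x[x < lower_fence | x > upper_fence]
print(outliers)  # 30

# The same points are what boxplot() would draw as dots
print(boxplot.stats(x)$out)  # 30
```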

R Example
aggregate(price ~ BodyStyle, Automobile, mean)

R Example
aggregate(price ~ BodyStyle, Automobile, sd)

R Example
Pricesmodel = aov(Automobile$price ~ Automobile$BodyStyle)
summary(Pricesmodel)

R Example
TukeyHSD(Pricesmodel)
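Since the Automobile data set is not included here, the following sketch runs the same workflow on simulated prices to show how the Tukey HSD output might be read. The data frame `cars_sim`, its group means, and its sample sizes are all hypothetical.

```r
set.seed(42)

# Simulated stand-in for the Automobile data (hypothetical values)
cars_sim <- data.frame(
  BodyStyle = factor(rep(c("hatchback", "sedan", "wagon"), each = 20)),
  price = c(rnorm(20, 10000, 2000),
            rnorm(20, 14000, 2000),
            rnorm(20, 12000, 2000))
)

# One-way ANOVA: is mean price the same across body styles?
model <- aov(price ~ BodyStyle, data = cars_sim)
print(summary(model))

# Post hoc test: which pairs of body styles differ?
tukey <- TukeyHSD(model)
print(tukey)

# Each row is one pair of groups: the difference in means (diff),
# a confidence interval (lwr, upr), and an adjusted p-value (p adj)
p_adj <- tukey$BodyStyle[, "p adj"]
print(p_adj)
```

A pair whose adjusted p-value is below alpha, or equivalently whose interval excludes zero, is judged to have different means.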