Biostatistics & SAS programming

Kevin Zhang
March 6, 2017

ANOVA

Comparison: two groups only

Independent groups t test
  One subject belongs to only one group and is observed only once.
  Thus the observations from different groups reflect different subjects.

Paired group t test
  One subject is observed twice.
  Values from different groups are paired and thus correlated.

PROC TTEST handles both cases.
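As a rough sketch of the two cases, the block below calls PROC TTEST once with a CLASS statement (independent groups) and once with a PAIRED statement (paired groups). The data set names two_groups and pairs and the variable names are hypothetical. The independent-groups values are the first five Med1 and Med2 glucose readings that appear later in these slides; the paired values are made up purely for illustration.

```sas
/* Independent groups: first five Med1 and Med2 glucose values from the slides */
data two_groups;
  input group $ value;
  datalines;
Med1 79
Med1 78
Med1 75
Med1 90
Med1 83
Med2 86
Med2 75
Med2 75
Med2 81
Med2 88
;
run;

proc ttest data=two_groups;
  class group;   /* two-level grouping variable */
  var value;     /* measurement to compare between the groups */
run;

/* Paired groups: illustrative values only, NOT from the slides */
data pairs;
  input before after;
  datalines;
82 78
75 74
90 85
68 70
77 73
;
run;

proc ttest data=pairs;
  paired before*after;   /* tests the mean of the within-subject differences */
run;
```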

More than 2 groups?

In a clinical trial, 3 newly approved medications (A, B, and C) are given to 3 groups of diabetic volunteers. We wish to know which one is most effective at reducing the glucose level. How do we compare them?

Med1.GLU  Med2.GLU  Med3.GLU
79        86        66
78        75        71
75        75        65
90        81        63
83        88        68
79        71        66
75        87        66
81        80        62
71        84        65
78        84        64
73        .         68
72        .         66
76        .         62
73        .         64
75        .         65
.         .         65
.         .         65
.         .         64

("." marks an empty cell: the groups have 15, 10, and 18 volunteers.)

ANOVA: ANalysis Of VAriance

Why do we deal with variance?
  A sequence of constant values has a variance of 0.
  Variance reflects the information contained in the sample!

Partition of variance by source
  In the beginning, we have no idea how to capture the information.
  We may propose a model, a classification, etc.

Question: Does the model or classification REALLY capture something from the sample? That is, does your model or classification make sense?

Total variance: ALL the information you collected
  Things we can control: info caught by your model or classification
  Things out of control: random
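In symbols, this split is the usual one-way ANOVA identity. The notation is an assumption added here, not taken from the slides: $y_{ij}$ is observation $j$ in group $i$, there are $k$ groups with $n_i$ observations each, $\bar{y}_i$ is the mean of group $i$, and $\bar{y}$ is the grand mean.

```latex
\underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(y_{ij}-\bar{y}\right)^2}_{\text{total}}
=
\underbrace{\sum_{i=1}^{k} n_i\left(\bar{y}_i-\bar{y}\right)^2}_{\text{caught by the classification}}
+
\underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(y_{ij}-\bar{y}_i\right)^2}_{\text{random}}
```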

ANOVA table

Source                       | DF                                 | SS                                         | MS                                               | F test statistic | P-value
Your model or classification | Number of parameters - 1           | Reflects the variance caught by your model | Averaged variability due to model/classification |                  |
Random errors                | Sample size - Number of parameters | Variance that is out of control            | Averaged variability due to randomness           |                  |
Total                        | Sample size - 1                    | Total variance                             |                                                  |                  |

Yes, the F value is just a comparison: it shows whether your model/classification captures the major share of the information or not.

Back to the glucose example (same Med1.GLU, Med2.GLU, and Med3.GLU values as in the table above)

We can see the difference for sure!

Total variability is 2712.418605 (SS)
  Classification caught 1993.507494
  Left for randomness 718.911111

The DF of classification: you have 3 medications (classifications), thus DF = 3 - 1 = 2
Sample size is 43 (all volunteers), thus Total DF = 43 - 1 = 42
The DF of randomness will be 42 - 2 = 40

Filling the blanks

Source   DF   Sum of Squares   Mean Square   F Value   Pr > F
Model     2      1993.507494    996.753747     55.46   <.0001
Error    40       718.911111     17.972778
Total    42      2712.418605

That tells us the classification is a success: we can distinguish the 3 medications, i.e. at least one of them differs from the others.
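The Mean Square and F Value entries follow directly from the sums of squares and degrees of freedom quoted on the slides. A short worked check (the symbols $MS$ and $F$ are standard shorthand, not the slides' own notation):

```latex
MS_{\text{Model}} = \frac{1993.507494}{2} = 996.753747,
\qquad
MS_{\text{Error}} = \frac{718.911111}{40} \approx 17.972778,
\qquad
F = \frac{MS_{\text{Model}}}{MS_{\text{Error}}} = \frac{996.753747}{17.972778} \approx 55.46
```

The sums of squares also add up as the partition requires: 1993.507494 + 718.911111 = 2712.418605.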

SAS programming: DATA step

We prefer the following structure for your data set:

Observed values  Classes
79               Med1
78               Med1
86               Med2

How to: reorganizing data sets

  Take column 1 (Med1) out as a separate data set, say Med1
  Take column 2 (Med2) as the Med2 data set
  Column 3 as Med3
  Stack Med1, Med2, Med3 together

Errr... more actions are needed:
  Labels of groups
  Change the observation name from Medx.GLU to a unique name

Med1.GLU: 79 78 75 90 83
Med2.GLU: 86 75 75 81
Med3.GLU: 66 71 65 63 68

DATA step

    data med1 (rename=(med1_glu=glucose) keep=med1_glu Medication);
      set Glu;
      Medication = "Medication 1";      /* Add the label to all values in this data set */
      if cmiss(med1_glu) then delete;   /* Drop rows where the value is missing */
    run;

rename=: We will do the same thing for all 3 data sets, to make sure the value column has a unique name: Glucose.
keep=: Yes, at this point the value column is still named Med1_GLU (KEEP= is applied before RENAME=), and we want to keep the values for sure.
cmiss: Dealing with the missing values. This trims the "." entries in the imported data.

Result:

Glucose  Medication
79       Medication 1
78       Medication 1
75       Medication 1
90       Medication 1
83       Medication 1
79       Medication 1
75       Medication 1
81       Medication 1
71       Medication 1
78       Medication 1
73       Medication 1
72       Medication 1
76       Medication 1
73       Medication 1
75       Medication 1

Similar code for Med2 and Med3

    data med2 (rename=(med2_glu=glucose) keep=med2_glu Medication);
      set Glu;
      Medication = "Medication 2";
      if cmiss(med2_glu) then delete;
    run;

    data med3 (rename=(med3_glu=glucose) keep=med3_glu Medication);
      set Glu;
      Medication = "Medication 3";
      if cmiss(med3_glu) then delete;
    run;

Stack all 3 data sets

    data med;
      set med1 med2 med3;
    run;

Now it is ready!
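The same stacked layout can also be produced in a single DATA step with an array, instead of three separate data sets. This is only an alternative sketch, not the approach used in the slides. The name med_long is hypothetical, and Glu is built here with DATALINES (using the glucose values from the slides) purely so the example runs on its own; in the slides Glu is an imported data set.

```sas
/* Wide data as on the slides; "." is a missing value */
data Glu;
  input med1_glu med2_glu med3_glu;
  datalines;
79 86 66
78 75 71
75 75 65
90 81 63
83 88 68
79 71 66
75 87 66
81 80 62
71 84 65
78 84 64
73 . 68
72 . 66
76 . 62
73 . 64
75 . 65
. . 65
. . 65
. . 64
;
run;

/* Reshape wide to long: one output row per non-missing value */
data med_long (keep=Glucose Medication);
  set Glu;
  length Medication $ 12;
  array g{3} med1_glu med2_glu med3_glu;
  do i = 1 to 3;
    if not missing(g{i}) then do;
      Glucose = g{i};
      Medication = catx(' ', 'Medication', i);  /* "Medication 1", "Medication 2", ... */
      output;
    end;
  end;
run;
```

With the values above, med_long should contain the same 43 rows as med, though in a different row order (row by row rather than group by group), which does not matter for the ANOVA.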

PROC ANOVA

    proc anova data=med;
      class Medication;
      model Glucose = Medication;   /* ANOVA: Target = Factor */
      means Medication / tukey;
    run;

class: Tells SAS which variable in the data set is used to classify the value column.
model: States your classification model, i.e. use the Medication column to classify the Glucose value column.
means ... / tukey: Post-hoc study. In case we find a difference, we compare the classes pairwise.
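A side-by-side boxplot is a natural visual companion to the ANOVA table. A minimal sketch, assuming the stacked data set med built on the earlier slides already exists in the session:

```sas
/* Assumes the stacked data set med (columns Glucose and Medication) from the slides */
proc sgplot data=med;
  vbox Glucose / category=Medication;   /* one box per medication group */
run;
```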

Homework

The CFO of a global company wishes to research the pay rate of employees in different areas. In payrate.csv he summarized the pay rates of sampled employees from the US, Canada, Europe, Australia, Asia, and Africa branches. Please analyze the values and answer:

  Do you think there is a significant difference between the branches? Why?
  Show the side-by-side boxplot.
  If you think the difference is significant, what is the relationship among the branches? Could you sort the branches from largest pay rate to smallest?