STAT 625: 2000 Olympic Diving Exploration


Corey S Brier, Department of Statistics, Yale University

Abstract

This document contains a preliminary investigation of data from the 2000 Olympic diving events. In particular, we offer an explanation for the bimodality in the degree of difficulty. The assignment for 9/17 begins in Section 4.

1 Data import and formatting

The data are provided in an easy-to-use CSV file, so we can import them directly.

> library(YaleToolkit)   # provides whatis()
> # Return n randomly chosen rows of a data frame
> some <- function(data, n = 7, replace = FALSE) {
+   sel <- sample(1:dim(data)[1], n, replace)
+   return(data[sel, ])
+ }
> setwd("c:/users/corey/documents/yale/s3/625/week3")
> data <- read.csv("diving2000.csv", as.is = TRUE)
> whatis(data)
   variable.name      type missing distinct.values precision               min           max
1          Event character       0               4        NA            M10mPF         W3mSB
2          Round character       0               3        NA             Final          Semi
3          Diver character       0             156        NA ABALLI Jesus-Iory ZHUPINA Olena
4        Country character       0              42        NA               ARG           ZIM
5           Rank   numeric       0              49       1.0                 1            49
6         DiveNo   numeric       0               6       1.0                 1             6
7     Difficulty   numeric       0              20       0.1               1.5           3.8
8         JScore   numeric       0              21       0.1                 0            10
9          Judge character       0              25        NA        ALT Walter  ZAITSEV Oleg
10      JCountry character       0              21        NA               AUS           ZIM

It is useful to convert some columns to factors and to add a column for gender:

> data$Event <- as.factor(data$Event)
> data$Round <- as.factor(data$Round)   # Round could also be left as character
> menloc <- data$Event %in% c("M3mSB", "M10mPF")   # rows from the men's events
> data$sex <- factor(ifelse(menloc, "M", "F"))

Each row of the data corresponds to one judge's score for a dive, not to a particular contestant, so we expect some amount of clustering. It could be useful to get all rows for a particular diver, so let's assign each distinct diver a different number (the position of the diver's name among the unique names):

> data$DiverNumber <- match(data$Diver, unique(data$Diver))

Also, for each dive, let us compute the average of the seven judges' scores and add it back into our data set. We use a vectorized method to avoid an unnecessary loop:

> # Assumes each dive occupies 7 consecutive rows, one per judge
> dmeans <- apply(matrix(data$JScore, ncol = 7, byrow = TRUE), 1, mean)
> data$avg <- rep(dmeans, each = 7)

2 Graphical Exploration

We start with a simple histogram of the judges' scores:

[Figure: Histogram of the judges' scores (data$JScore); horizontal axis: Judge's Score; vertical axis: Frequency]

We notice there is quite a bit of bimodality in the difficulty:

[Figure: Histogram of the dive difficulties (data$Difficulty); horizontal axis: Dive Difficulties; vertical axis: Frequency]
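A numeric companion to the difficulty histogram is to summarize Difficulty separately within each Round and see whether one group sits in the lower mode. The sketch below uses a small hypothetical data frame whose values are invented for illustration (they are not the Olympic data), and the cut-off of 2.4 is likewise an illustrative assumption; with the real data the same calls would be applied to data$Difficulty and data$Round.

```r
# Hypothetical toy data: invented difficulties for three rounds (NOT the real data)
toy <- data.frame(
  Round      = rep(c("Final", "Prelim", "Semi"), each = 4),
  Difficulty = c(3.0, 3.2, 3.4, 3.5,   # Final
                 2.8, 3.0, 3.3, 3.5,   # Prelim
                 1.6, 1.8, 2.0, 2.2),  # Semi
  stringsAsFactors = FALSE
)

# Median difficulty within each round (rounds are sorted alphabetically)
med_by_round <- tapply(toy$Difficulty, toy$Round, median)
print(med_by_round)

# Share of dives in each round that fall below an illustrative cut-off
# placed between the two modes (2.4 is an assumption, not from the report)
share_low <- tapply(toy$Difficulty < 2.4, toy$Round, mean)
print(share_low)
```

If one round accounts for nearly all of the low-difficulty dives, that round is a natural candidate explanation for the bimodality.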

Constructing side-by-side box plots of the dive difficulties reveals that the difficulties of dives in the semifinal round are much lower than those in the final or preliminary rounds:

[Figure: Box plots of Difficulty by Round (Final, Prelim, Semi)]

To confirm our suspicion that Round is a large source of the bimodality, we plot the difficulty against the judge's score, jittering each point to reduce over-plotting somewhat. All points from the semi-final round are colored red.

[Figure: Judge's Score (jittered, vertical axis) against Difficulty (jittered, horizontal axis); semi-final dives in red]

This clearly indicates that the dives performed in the semi-final round had lower difficulties than those in the other two rounds. Knowledge of the exact dive requirements and scoring system for the 2000 Olympics would shed more light on why this is the case. Now, let us remove the semi-final round and see whether any bimodality remains:

> datanosemi <- data[data$Round != "Semi", ]
> hist(datanosemi$Difficulty, xlab = "Difficulty without semifinal round")

[Figure: Histogram of datanosemi$Difficulty; horizontal axis: Difficulty without semifinal round; vertical axis: Frequency]

Looking at the figure above, the bimodality certainly seems to be less of an issue, although some remains and may merit further investigation.

Next, we consider four plots, each with difficulty on the vertical axis. First (top left), we construct box plots that contrast the four events. The men's events (left two box plots) possibly indicate slightly higher difficulties, so in the top-right plot we compare men and women without regard to the specific event. Perhaps there is a small difference, but nothing drastic is occurring. Of course, given the size of our data set, we should not be surprised if standard statistical tests indicated a significant effect. The bottom-left plot compares the difficulties across dive numbers for all contestants. There seems to be little difference, with perhaps higher-than-average difficulties on dive number six. Finally, the bottom-right plot shows difficulty against rank. Each value on the horizontal axis certainly has multiple dives, but more interesting is the cluster at the bottom, which seems to stop at the diver ranked 20th. The points from the semi-final round are colored red, and they do in fact match this cluster. One possible explanation is that divers with numerically higher ranks (i.e., worse placings) participated only in the preliminary round.

[Figure: four panels of Difficulty: box plots by Event (top left), box plots by sex, F and M (top right), jittered Difficulty by dive number 1 to 6 (bottom left), and jittered Difficulty against data$Rank (bottom right)]

We can confirm that divers ranked 20th or worse (Rank >= 20) participated only in the preliminary round as follows:

> table(data[data$Rank >= 20, ]$Round)

 Final Prelim   Semi 
     0   3710      0 
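The table above checks a single cut-off. A slightly more general sketch reports the worst (numerically largest) rank that appears in each round, which locates the cut-off directly, and cross-tabulates Round against the Rank >= 20 condition. It uses a small hypothetical data frame with invented values; with the real data one would pass data$Rank and data$Round instead.

```r
# Hypothetical toy rows (invented values, NOT the Olympic data)
toy <- data.frame(
  Round = c("Final", "Final", "Semi", "Semi", "Prelim", "Prelim", "Prelim"),
  Rank  = c(1, 12, 5, 18, 3, 19, 45),
  stringsAsFactors = FALSE
)

# Worst (largest) rank that appears in each round
worst_rank <- tapply(toy$Rank, toy$Round, max)
print(worst_rank)

# Number of rows per round with Rank >= 20 (columns FALSE / TRUE)
cutoff_tab <- table(toy$Round, toy$Rank >= 20)
print(cutoff_tab)
```

A round whose worst rank is below 20 contributes only zeros to the TRUE column, which is the pattern the single-cut-off table shows for the final and semi-final.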

3 Considering the judges and the scoring

The data include the countries of the divers as well as the countries of the judges. One possible analysis might search for bias, such as a judge giving preferential treatment to a competitor from his or her own country. Although this section is not a complete analysis, we present some preliminary steps. First, it makes sense to find out whether any judge actually evaluated a competitor from his or her own country:

> finalsdata <- data[data$Round == "Final", ]
> sum(finalsdata$Country == finalsdata$JCountry)
[1] 0
> prelimdata <- data[data$Round == "Prelim", ]
> sum(prelimdata$Country == prelimdata$JCountry)
[1] 201
> semidata <- data[data$Round == "Semi", ]
> sum(semidata$Country == semidata$JCountry)
[1] 113

Although a single diver appears on multiple rows of our data set, each row corresponds to one judge's score for one dive, so these counts are numbers of own-country scores rather than numbers of divers, and nothing is over-counted. The results are clear: no judge scored a diver from his or her own country in the final, but this did happen in the preliminary and semi-final rounds. An additional option is to extract the rows where the diver's country and the judge's country were the same, and the rows where they differed, to allow a comparison:

> samecountry <- data[data$Country == data$JCountry, ]
> diffcountry <- data[data$Country != data$JCountry, ]
> summary(samecountry$JScore)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  3.000   7.000   7.500   7.462   8.500  10.000 
> summary(diffcountry$JScore)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  0.000   6.000   7.000   6.814   8.000  10.000 
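The three separate sums above can be collapsed into one call that counts own-country scores per round, and the same logical vector can split the mean scores. This is a sketch on a hypothetical data frame with invented rows; with the real data one would substitute data$Round, data$Country, data$JCountry and data$JScore.

```r
# Hypothetical toy rows (invented values, NOT the Olympic data)
toy <- data.frame(
  Round    = c("Final", "Final", "Prelim", "Prelim", "Prelim", "Semi", "Semi"),
  Country  = c("USA", "CHN", "USA", "AUS", "CHN", "USA", "AUS"),
  JCountry = c("CHN", "USA", "USA", "AUS", "USA", "USA", "CHN"),
  JScore   = c(8, 7.5, 8.5, 7, 6.5, 8, 7),
  stringsAsFactors = FALSE
)

# TRUE where the judge and the diver share a country
same <- toy$Country == toy$JCountry

# Own-country scores per round in one call (replaces three separate sums)
own_by_round <- tapply(same, toy$Round, sum)
print(own_by_round)

# Mean score for other-country (FALSE) vs. own-country (TRUE) rows
mean_by_same <- tapply(toy$JScore, same, mean)
print(mean_by_same)
```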

Of course, the data are now very unbalanced, but the univariate summaries indicate that scores are higher, in both mean and median, when a judge evaluated a diver from his or her own country. However, it is not yet clear how significant this relationship is. Focusing only on data from the preliminary round, we can plot each diver on the horizontal axis (DiverNumber was generated above) and the scores for each of their dives on the vertical axis. We also color red and enlarge any point where a judge scored a diver from the same country:

> plot(jitter(prelimdata$DiverNumber), jitter(prelimdata$JScore),
+      pch = 20,
+      col = 1 + as.numeric(prelimdata$Country == prelimdata$JCountry),  # red if same country
+      cex = 1 + as.numeric(prelimdata$Country == prelimdata$JCountry),  # larger if same country
+      xlab = "Diver Number", ylab = "Judge's score")

[Figure: jittered Judge's score (vertical axis) against Diver Number (horizontal axis) for the preliminary round; same-country scores enlarged in red]

Right away we wonder whether something is wrong, because the graph appears to fall into four distinct regions, and within each region the divers' scores seem to decrease. This actually makes sense, however! The data were given to us sorted first by event (starting with the men's springboard), and within each event the divers were ordered by rank, so diver number increases as the final placing worsens. The overall shape of the graph may therefore be slightly distracting, but it should not be alarming. More importantly, we wish to look for patterns in the red points. It is very tempting to say that the scores corresponding to the red points seem artificially inflated, but the graph does not provide conclusive evidence (especially when compared with our investigation of the bimodality above).

4 Investigating Steve McFarland for Potential Bias

As a first, potentially misleading, step in our search for bias by Steve McFarland, we compute his average score for US competitors and for non-US competitors:

> steveusa <- data[data$Judge == "McFARLAND Steve" & data$Country == "USA", ]
> mean(steveusa$JScore)
[1] 7.797619
> stevenousa <- data[data$Judge == "McFARLAND Steve" & data$Country != "USA", ]
> mean(stevenousa$JScore)
[1] 6.698374

We see that, on average, Steve McFarland scored American divers 1.1 points higher than non-American divers. We have to be careful, however: it could be that the American divers are actually better, on average, than the other competitors. Thus, we calculate the average score given to American divers by all of the judges except Steve McFarland:

> nosteve <- data[data$Judge != "McFARLAND Steve", ]
> mean(nosteve[nosteve$Country == "USA", ]$JScore)
[1] 7.460177

This reveals that the other judges' scores for USA divers are indeed higher than Steve's scores for non-USA divers. However, McFarland's scores for the Americans are still about 0.34 points higher than the other judges' scores for the Americans. This might indicate some bias, so let's look more closely at the US divers whom Steve McFarland judged. For each of those 7 divers, we plot all of their scores. Black points indicate scores from judges other than McFarland, while red points are McFarland's scores. The green triangles mark the average of McFarland's scores for that diver, and the blue diamonds mark the average of all of the other judges' scores for that diver. Data from the final round are excluded, but some within-diver clustering is expected because, for each dive and within each event, we expect reasonably comparable scores:

[Figure: all scores for each of the 7 US divers judged by McFarland; vertical axis jitter(final$jscore), horizontal axis jitter(final$divernumber2)]

We see right away that, for each of these 7 divers, McFarland's average score is above the average score from the other judges. The greatest absolute discrepancy between Steve's average score and the other judges' average occurs for diver 5 on this chart, DAVISON, Michelle. To search for bias statistically, we can assume that all judges are unbiased and then permute the judges over the dives. This preserves the performance level of each country and each competitor, but tests against judges being extreme in their scoring:

> dataperm <- data[data$Round != "Final", ]
> dataperm$Judge <- sample(dataperm$Judge)   # one random permutation; results vary between runs
> print(m1 <- mean(dataperm[dataperm$Judge == "McFARLAND Steve" &
+                           dataperm$Country == "USA", ]$JScore))
[1] 7.444444
> print(m2 <- mean(dataperm[dataperm$Judge != "McFARLAND Steve" &
+                           dataperm$Country == "USA", ]$JScore))
[1] 7.481553
> abs(m1 - m2)
[1] 0.03710895

As before, both results are mean scores for US competitors: the first treats the (permuted) judge as McFarland, while the second treats the judge as anyone else. These two means are very similar, and their difference (about 0.04) is far smaller than the difference of about 0.34 computed above from the actual judge labels, which indicates that the difference we saw initially may be significant. Across many permutations the absolute difference remains very small, so we may reasonably suspect that McFarland has some amount of bias.

Earlier, we computed the mean score for each dive, and we have already extracted the dives in which McFarland scored an American diver (steveusa). So, for each of these dives, we can compare McFarland's score with the mean score of all seven judges for that dive (a mean that includes McFarland's own score):

> mean(steveusa$JScore - steveusa$avg)
[1] 0.2006803

We see that McFarland scored about 0.20 points above the dive average on dives performed by Americans. Let's see whether he is simply enthusiastic and scores non-USA divers about 0.2 points higher as well:

> discrep <- mean(stevenousa$JScore - stevenousa$avg)
> discrep
[1] 0.01045296

This value is very close to zero! It is positive, so on average McFarland does score a given dive slightly higher than the other judges, but by far less than the margin he seems to give the Americans. Subtracting out this average deviation, we obtain an estimate of his actual bias:

> mean(steveusa$JScore - steveusa$avg) - mean(stevenousa$JScore - stevenousa$avg)
[1] 0.1902273

Now, if McFarland is really unbiased, then after subtracting this general discrepancy from his scores, their mean should not differ from the mean of the dive averages for USA competitors.
Thus we have a one-sided hypothesis test:

> t.test(steveusa$JScore - discrep, steveusa$avg, alternative = "greater")

        Welch Two Sample t-test

data:  steveusa$JScore - discrep and steveusa$avg
t = 1.2284, df = 80.173, p-value = 0.1114
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
 -0.06746283         Inf
sample estimates:
mean of x mean of y 
 7.787166  7.596939 

This yields a p-value of about 0.111, which indicates that the difference may not be truly significant. There are, however, a number of issues with this test, so we need to be careful. We would like both steveusa$JScore - discrep and steveusa$avg to be roughly normal, so we can plot some basic histograms:

[Figure: Histogram of steveusa$JScore - discrep (left) and Histogram of steveusa$avg (right); vertical axes: Frequency]

The first histogram appears roughly acceptable, though the second perhaps gives some cause for concern. Also, the two samples are not independent, since the average score over all of the judges certainly includes McFarland's own score. We therefore suspect that excluding McFarland's score from the average would make the result slightly more significant (that is, lower the p-value). A non-parametric test we could try is the two-sample Mann-Whitney U test (Wilcoxon rank-sum test):

> wilcox.test(steveusa$JScore - discrep, steveusa$avg, alternative = "greater",
+             exact = FALSE)

        Wilcoxon rank sum test with continuity correction

data:  steveusa$JScore - discrep and steveusa$avg
W = 941, p-value = 0.2991
alternative hypothesis: true location shift is greater than 0

Again, the result does not appear significant. We could also try a permutation test, which likewise would not require the data to follow a normal distribution. Another option would be to create an indicator variable that designates whether McFarland is judging a US diver:

> data$isSteveUSA <- rep(0, length(data$avg))
> data[data$Judge == "McFARLAND Steve" & data$Country == "USA", ]$isSteveUSA <- 1

We could then fit a regression model that includes this indicator variable and see whether it is significant. Some care would be needed, because using JScore as the response would treat each dive as seven separate observations, which is not appropriate. Further explorations might consider the home-country bias of all judges. If most or all judges are biased, it would be useful to compare McFarland's bias with that of the others; perhaps he is less biased than some of the other judges. Alternatively, if most judges are biased, there may be no net effect on the rankings, since each competitor's score would be similarly inflated. These are only speculations, but they provide direction for additional analyses.
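As one possible starting point for these further explorations (a sketch, not the method used in this report), we can compute for every judge how far that judge's score exceeds the mean of the other judges on the same dive, separately for own-country and other-country divers, and take the difference. Excluding the judge's own score from the comparison mean avoids the non-independence noted for the t-test. The data below are simulated, with a built-in 0.5-point home bias for one hypothetical judge ("JUDGE A"), so the numbers only illustrate the mechanics; identifying a dive by Event, Round, Diver and DiveNo is an assumption about the real data.

```r
# Simulated long-format data (hypothetical, NOT the Olympic data):
# 4 dives x 4 judges; JUDGE A (USA) adds 0.5 points for US divers.
judges <- data.frame(
  Judge    = c("JUDGE A", "JUDGE B", "JUDGE C", "JUDGE D"),
  JCountry = c("USA", "CHN", "AUS", "RUS"),
  stringsAsFactors = FALSE
)
dives <- data.frame(
  Event   = "M3mSB",
  Round   = "Prelim",
  Diver   = c("DIVER US1", "DIVER US2", "DIVER CN1", "DIVER CN2"),
  Country = c("USA", "USA", "CHN", "CHN"),
  DiveNo  = 1,
  quality = c(7, 6.5, 8, 7.5),
  stringsAsFactors = FALSE
)
sim <- merge(dives, judges)   # no shared columns, so every dive is paired with every judge
sim$JScore <- sim$quality +
  ifelse(sim$Judge == "JUDGE A" & sim$Country == "USA", 0.5, 0)

# Assumed dive key: Event, Round, Diver and DiveNo
key <- interaction(sim$Event, sim$Round, sim$Diver, sim$DiveNo, drop = TRUE)
dive_sum <- ave(sim$JScore, key, FUN = sum)
dive_n   <- ave(sim$JScore, key, FUN = length)

# Mean of the OTHER judges on the same dive (this judge's score excluded)
sim$others_avg <- (dive_sum - sim$JScore) / (dive_n - 1)
sim$excess <- sim$JScore - sim$others_avg
sim$home <- sim$Country == sim$JCountry

# Per-judge home-minus-away excess; NA if the judge never scored a compatriot
home_bias <- sapply(split(sim, sim$Judge), function(d) {
  if (!any(d$home)) return(NA_real_)
  mean(d$excess[d$home]) - mean(d$excess[!d$home])
})
print(round(home_bias, 3))
```

In this simulation JUDGE A recovers the built-in 0.5, while JUDGE B shows a small positive value only because JUDGE A's inflated US scores raise the comparison mean on US dives, a reminder that one judge's bias can leak into the estimates for the others.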