Case Studies Homework 3
Breanne Chryst
September 11, 2013

1

In this assignment I did some exploratory analysis on a data set containing diving information from the 2000 Olympics. My code and output are listed below.

2 Code with Comments

library(YaleToolkit)
library(rlab)
setwd("~/documents")
dive <- read.csv("diving2000.csv", header = TRUE, stringsAsFactors = TRUE)
whatis(dive)

   variable.name type         missing distinct.values precision min               max
1  Event         mixed factor 0       4               NA        M10mPF            W3mSB
2  Round         pure factor  0       3               NA        Final             Semi
3  Diver         mixed factor 0       156             NA        ABALLI Jesus-Iory ZHUPINA Olena
4  Country       pure factor  0       42              NA        ARG               ZIM
5  Rank          numeric      0       49              1.0       1                 49
6  DiveNo        numeric      0       6               1.0       1                 6
7  Difficulty    numeric      0       20              0.1       1.5               3.8
8  JScore        numeric      0       21              0.1       0                 10
9  Judge         mixed factor 0       25              NA        ALT Walter        ZAITSEV Oleg
10 JCountry      pure factor  0       21              NA        AUS               ZIM

per.scored

       AUS.P    AUT.P      CAN.P      CHN.P      GBR.P      GER.P     HUN.P
1 0.06127746 0.035413 0.06415129 0.05914527 0.03810142 0.06090665 0.0200241
       MEX.P      NOR.P      NZL.P     PUR.P      RUS.P      SUI.P     SWE.P
1 0.07935478 0.02530824 0.08157968 0.0664689 0.05515899 0.03717438 0.0376379
       USA.P      ZIM.P
1 0.06090665 0.01473996

#' This variable gives the proportion of dives judged by
#' judges from each country. A disproportionately large share
#' of dives was judged by Mexico (7.9%) and New Zealand (8.2%).

summary(lm(JScore ~ Difficulty, data = dive))

Call:
lm(formula = JScore ~ Difficulty, data = dive)

Residuals:
    Min      1Q  Median      3Q     Max
-6.4865 -0.6449  0.2460  0.9045  3.8304

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  9.02145    0.07570   119.2   <2e-16 ***
Difficulty  -0.79217    0.02695   -29.4   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.419 on 10785 degrees of freedom
Multiple R-squared: 0.07419, Adjusted R-squared: 0.07411
F-statistic: 864.3 on 1 and 10785 DF, p-value: < 2.2e-16

#' Difficulty has an inverse relationship with score: each
#' additional point of difficulty lowers the expected score
#' by about 0.79, although difficulty explains only about
#' 7% of the variance in scores.

chisq.test(table(dive$Difficulty, dive$Country),
           simulate.p.value = TRUE, B = 50000)
Pearson's Chi-squared test with simulated p-value (based on 50000 replicates)

data: table(dive$Difficulty, dive$Country)
X-squared = 6862.435, df = NA, p-value = 2e-05

chisq.test(table(dive$JScore, dive$Country),
           simulate.p.value = TRUE, B = 50000)

Pearson's Chi-squared test with simulated p-value (based on 50000 replicates)

data: table(dive$JScore, dive$Country)
X-squared = 6595.219, df = NA, p-value = 2e-05

plot(jitter(dive$Difficulty), jitter(dive$JScore),
     col = as.factor(dive$Round),
     pch = as.integer(as.factor(dive$Round)),
     xlab = "Dive Difficulty", ylab = "Judges' Score",
     main = "Olympic Dives")
legend(1.5, 1.9, c("Final", "Preliminary", "Semi-Final"),
       col = c(1, 2, 3), pch = c(1, 2, 3), bg = "gray90")

#' From this plot we can see that divers chose dives of
#' lower difficulty in the semifinal round. This explains
#' the bimodality in dive difficulty.
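The observation about semifinal difficulty can also be checked numerically. This is a minimal sketch, assuming the columns `Round` and `Difficulty` reported by `whatis(dive)`. The `toy` data frame, its values, and the level name `Prelim` are made up so the snippet runs without `diving2000.csv`.

```r
# Hypothetical stand-in for `dive`; the values are illustrative only.
toy <- data.frame(
  Round      = factor(c("Final", "Final", "Prelim", "Prelim", "Semi", "Semi")),
  Difficulty = c(3.0, 3.4, 3.1, 2.9, 1.8, 2.0)
)

# Mean difficulty within each round.
mean_by_round <- tapply(toy$Difficulty, toy$Round, mean)
print(mean_by_round)
```

With the real data, replacing `toy` with `dive` should show whether the semifinal mean is noticeably lower than the other two.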
[Figure: "Olympic Dives". Scatter plot of Judges' Score (0 to 10) against Dive Difficulty (1.5 to 3.5); legend: Final, Preliminary, Semi Final.]

bplot(dive$JScore, dive$JCountry,
      main = "Scores by Country of the Judge",
      xlab = "Country of the Judge",
      ylab = "Judges' Scores")

#' From this plot we see that the Chinese judges do not
#' tend to give higher scores than the other judges.
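A comparison of judges by country can be backed up with summary numbers alongside the box plots. The sketch below assumes the columns `JScore` and `JCountry` from `whatis(dive)` and ranks judge countries by median score. The `toy` data frame is invented for illustration and does not reflect the real scores.

```r
# Hypothetical stand-in for `dive`; the values are illustrative only.
toy <- data.frame(
  JCountry = factor(c("CHN", "CHN", "CHN", "USA", "USA", "USA", "MEX", "MEX", "MEX")),
  JScore   = c(7.0, 7.5, 8.0, 7.5, 8.0, 8.5, 6.5, 7.0, 7.5)
)

# Median score awarded by judges from each country, highest first.
medians <- tapply(toy$JScore, toy$JCountry, median)
median_by_judge <- medians[order(medians, decreasing = TRUE)]
print(median_by_judge)
```

If Chinese judges were systematically more generous, `CHN` would sit at or near the top of this ranking on the real data.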
[Figure: "Scores by Country of the Judge". Box plots of Judges' Scores (0 to 10) by Country of the Judge.]

bplot(dive$Difficulty, dive$Country,
      main = "Difficulty by Diver's Country",
      xlab = "Diver's Country", ylab = "Difficulty")
[Figure: "Difficulty by Diver's Country". Box plots of Difficulty (1.5 to 3.5) by Diver's Country.]

bplot(dive$Difficulty, dive$JScore,
      main = "Scores by Level of Difficulty",
      xlab = "Judges' Scores", ylab = "Difficulty")

#' Chinese judges don't tend to score higher than other judges.
[Figure: "Scores by Level of Difficulty". Box plots of Difficulty (1.5 to 3.5) for each Judges' Score (0 to 10 in steps of 0.5).]

hist(dive$Difficulty, main = "Histogram of the Difficulty",
     xlab = "Difficulty", col = "light blue")

#' Slightly bimodal, mostly concentrated around 3.
[Figure: "Histogram of the Difficulty". Frequency (0 to 3000) against Difficulty (1.5 to 3.5).]

hist(dive$JScore, main = "Histogram of Judges' Scores",
     col = "light yellow")

#' Concentrated around 6-9.
[Figure: "Histogram of Judges' Scores". Frequency (0 to 1500) against dive$JScore (0 to 10).]

plot(dive$JCountry, col = "gold",
     main = "Number of Judges' Scores per Country")

#' Disproportionately high number of dives judged by Mexico
#' and New Zealand.
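The counts in this bar plot can be converted to proportions, which makes the imbalance between judging countries easier to compare. The code that produced `per.scored` is not shown in this report, so the following is only one plausible sketch, assuming the column `JCountry` from `whatis(dive)`. The `toy` data frame is invented for illustration.

```r
# Hypothetical stand-in for `dive`; the values are illustrative only.
toy <- data.frame(
  JCountry = factor(c("MEX", "MEX", "NZL", "NZL", "NZL", "USA", "CHN", "CHN"))
)

# Share of all recorded scores given by judges from each country.
share <- prop.table(table(toy$JCountry))
print(round(share, 3))
```

With 21 judging countries in the real data, an even split would give each country a share of about 0.048, which is a useful baseline for judging what counts as disproportionate.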
[Figure: "Number of Judges' Scores per Country". Bar plot of counts (0 to 800) by country of the judge.]

plot(dive$Country, col = "lavender",
     main = "Plot of the Number of Dives per Country")

#' There are clearly some countries with many more dives
#' than others. In particular, the U.S. and China each have
#' 800+ entries.
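A bar plot of the diver's country counts rows of the data. Since the data has both a `Judge` and a `JScore` column, each row appears to be one judge's score, so a single dive contributes several rows. Below is a sketch of counting distinct dives instead, assuming a dive is identified by `Event`, `Round`, `Diver`, and `DiveNo`. That key is my assumption, and the `toy` data frame is invented for illustration.

```r
# Hypothetical stand-in for `dive`: two judges' scores per dive.
toy <- data.frame(
  Event   = "M10mPF",
  Round   = "Final",
  Diver   = c("A", "A", "A", "A", "B", "B"),
  Country = c("USA", "USA", "USA", "USA", "CHN", "CHN"),
  DiveNo  = c(1, 1, 2, 2, 1, 1),
  JScore  = c(8.0, 8.5, 7.0, 7.5, 9.0, 9.0)
)

# Keep one row per distinct dive, then count dives per country.
key <- c("Event", "Round", "Diver", "Country", "DiveNo")
dives <- unique(toy[, key])
dives_per_country <- table(dives$Country)
print(dives_per_country)
```

Here the six score rows collapse to three dives, two for `USA` and one for `CHN`.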
[Figure: "Plot of the Number of Dives per Country". Bar plot of counts (0 to 800) by diver's country.]