Statistical Analyses on Roger Federer's Performances in 2013 and 2014
James Kong, kong.james@berkeley.edu

Introduction

Tennis has become a global sport, and I am a big fan of it. Since I played college tennis for Cal, I would like to run statistical analyses on my favorite player of all time, Roger Federer, and his performances in 2013 and 2014. Here is why I specifically chose these two years:

In 2013:
A. He had dealt with a back injury throughout his career, but 2013 was the lowest point of his career.
B. He did not reach a Grand Slam final for the first time in 10 years ( ), and he had the lowest win/loss record of his entire career.

In 2014:
A. He changed his coach from Paul Annacone to one of the tennis legends, Stefan Edberg.
B. He changed his racket for the first time in his career.

One year is not a long time, but it could turn things upside down for Federer. I am going to examine all of his match records from 2013 and 2014, excluding the Davis Cup, a team event in which individual players represent their countries against other countries. It takes place every year, but not every professional plays it each year, so I leave the Davis Cup out of the analysis. I use several methods to analyze his performances, which makes it fitting to compare his worst year with one of his good years only a short time later.

The data came from Datahub, which compiled men's and women's tennis results from 2000 to
Website:

Methods:

I wanted to achieve a couple of things for the 2013 and 2014 results. The objectives were:
1. How many sets he had won and lost
2. How many tournaments he had won and how many tournaments he had lost in the finals
3. His winning percentage Per Year
4. His performances in the 4 Grand Slams (Australian Open, French Open, Wimbledon, US Open) Per Year
5. Which surface he performed better on (Hard, Clay, Grass)
6. Show his Total Wins Per Surface Per Year

Imported the datasets for 2013 and 2014. I also kept only the columns that are useful for the analysis - tournament, date, court, surface, round, best of, winner, loser, Wsets, Lsets.

    results_13 <- read.csv("/users/kingkong1/desktop/gis Work/R/Roger Federer's Analysis/Final Project/full_results_2013.csv", header = TRUE)
    results_14 <- read.csv("/users/kingkong1/desktop/gis Work/R/Roger Federer's Analysis/Final Project/full_results_2014.csv", header = TRUE)
    # Keep only the columns needed for the analysis
    subset_13 <- results_13[, c(3, 4, 6, 7, 8, 9, 10, 11, 26, 27)]
    subset_14 <- results_14[, c(3, 4, 6, 7, 8, 9, 10, 11, 26, 27)]

Filtered the results by combining wins and losses. The column headings from the two datasets were as follows.

    # Keep every match in which Federer was either the winner or the loser
    rf_13 <- subset_13[which(subset_13$Winner == 'Federer R.' | subset_13$Loser == 'Federer R.'), ]
    rf_14 <- subset_14[which(subset_14$Winner == 'Federer R.' | subset_14$Loser == 'Federer R.'), ]
    head(rf_13)
        Tournament       Date     Court    Surface  Round          Best.of
    183 Australian Open  1/15/13  Outdoor  Hard     1st Round
        Australian Open  1/17/13  Outdoor  Hard     2nd Round
        Australian Open  1/19/13  Outdoor  Hard     3rd Round
        Australian Open  1/21/13  Outdoor  Hard     4th Round
        Australian Open  1/23/13  Outdoor  Hard     Quarterfinals
        Australian Open  1/25/13  Outdoor  Hard     Semifinals     5
        Winner      Loser        Wsets  Lsets
    183 Federer R.  Paire B
        Federer R.  Davydenko N
        Federer R.  Tomic B
        Federer R.  Raonic M
        Federer R.  Tsonga J.W
        Murray A.   Federer R.   3      2

    head(rf_14)

       Tournament              Date     Court    Surface  Round          Best.of
    4  Brisbane International  1/1/14   Outdoor  Hard     2nd Round      3
    12 Brisbane International  1/3/14   Outdoor  Hard     Quarterfinals  3
    14 Brisbane International  1/4/14   Outdoor  Hard     Semifinals     3
    15 Brisbane International  1/5/14   Outdoor  Hard     The Final
       Australian Open         1/14/14  Outdoor  Hard     1st Round
       Australian Open         1/16/14  Outdoor  Hard     2nd Round      5
       Winner      Loser        Wsets  Lsets
    4  Federer R.  Nieminen J
       Federer R.  Matosevic M
       Federer R.  Chardy J
       Hewitt L.   Federer R
       Federer R.  Duckworth J
       Federer R.  Kavcic B.    3      0

I noticed that the column headings of both datasets were identical, so my first thought was to merge the tables. However, I could not merge them because there was no unique ID to join the two tables on. In addition, if I had joined them, my data would have been messed up: a join would use a column from one table to match rows in the other, which would compromise the results and change the data structure. Therefore, I appended the tables instead, because the number of columns and the column headings were identical. I also show the structure of the appended table, and replace NA values with 0 in the sets he won and lost.

    rf_13_14 <- rbind(rf_13, rf_14)
    str(rf_13_14)

    'data.frame': 142 obs. of 10 variables:
     $ Tournament: Factor w/ 72 levels "Abierto Mexicano",..:
     $ Date      : Factor w/ 560 levels "1/1/13","1/10/13",..:
     $ Court     : Factor w/ 2 levels "Indoor","Outdoor":
     $ Surface   : Factor w/ 3 levels "Clay","Grass",..:
     $ Round     : Factor w/ 8 levels "1st Round","2nd Round",..:
     $ Best.of   : int
     $ Winner    : Factor w/ 244 levels "Almagro N.","Alund M.",..:
     $ Loser     : Factor w/ 388 levels "Aguilar J.","Ali Mutawa J.M.",..:
     $ Wsets     : int
     $ Lsets     : int

    rf_13_14$Wsets[is.na(rf_13_14$Wsets)] <- 0
    rf_13_14$Lsets[is.na(rf_13_14$Lsets)] <- 0

Changed the type of the Date column and added a new column, Year, to categorize all matches that Federer played in 2013 or 2014.

    rf_13_14$Date <- as.Date(rf_13_14$Date, format = "%m/%d/%y")
    rf_13_14$Year <- format(rf_13_14$Date, format = "%Y")

Comments: Later, when I used sum, subtract, or other statistical functions, I would not get errors caused by NA values in my columns or by being unable to sort by date/year.
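The cleaning steps above can be sketched on a small toy table. This is only a minimal illustration in base R: the three rows and their set counts are made up, `matches` and `sets_by_year` are hypothetical names, and `aggregate()` stands in for the plyr call used later in the report.

```r
# Toy rows standing in for the appended match table (values are made up).
matches <- data.frame(
  Date  = c("1/15/13", "6/16/13", "1/5/14"),
  Wsets = c(3L, 2L, NA),
  Lsets = c(0L, NA, 1L),
  stringsAsFactors = FALSE
)

# Replace missing set counts with 0 so that sum() does not return NA.
matches$Wsets[is.na(matches$Wsets)] <- 0L
matches$Lsets[is.na(matches$Lsets)] <- 0L

# Parse the m/d/yy strings into Date objects and derive a Year column.
matches$Date <- as.Date(matches$Date, format = "%m/%d/%y")
matches$Year <- format(matches$Date, "%Y")

# Sets won and lost per year, using base R instead of plyr.
sets_by_year <- aggregate(cbind(Wsets, Lsets) ~ Year, data = matches, FUN = sum)
print(sets_by_year)
```

Without the NA replacement, `sum()` would return NA for any year that contains a match with a missing set count, which is the kind of problem the comment above refers to.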
1. How many sets he won and lost

    library(plyr)
    # Total sets won and lost per year
    rf_win_loss <- ddply(rf_13_14, ~Year, summarise, Sets_won = sum(Wsets), Sets_Loss = sum(Lsets))
    print(rf_win_loss)

      Year Sets_won Sets_Loss

2. How many tournaments he had won and had lost in the final by year.

I had to subset wins and losses in the final, so I used Round = "The Final" and Winner = "Federer R." to specify my subset, because that was the only way to know whether he had won a tournament, and vice versa for losses. After that, I bound the two subsets together to see his overall performance in finals, wins and losses alike, and then added a column with a conditional statement: if Winner = "Federer R.", print Win, else Loss.

    rf_tournaments_won <- rf_13_14[which(rf_13_14$Round == "The Final" & rf_13_14$Winner == "Federer R."), ]
    rf_tournaments_lost <- rf_13_14[which(rf_13_14$Round == "The Final" & rf_13_14$Loser == "Federer R."), ]
    rf_tournament_record <- rbind(rf_tournaments_won, rf_tournaments_lost)
    rf_tournament_record$W_L <- ifelse(rf_tournament_record$Winner == "Federer R.", "Win", "Loss")
    print(head(rf_tournament_record))

         Tournament                                  Date  Court    Surface
    1355 Gerry Weber Open                                  Outdoor  Grass
    575  Dubai Tennis Championships                        Outdoor  Hard
    1347 Gerry Weber Open                                  Outdoor  Grass
    2011 Western & Southern Financial Group Masters        Outdoor  Hard
    2383 Shanghai Masters                                  Outdoor  Hard
    2495 Swiss Indoors                                     Indoor   Hard
         Round      Best.of  Winner      Loser      Wsets  Lsets  Year  W_L
    1355 The Final  3        Federer R.  Youzhny M                      Win
    575  The Final  3        Federer R.  Berdych T                      Win
    1347 The Final  3        Federer R.  Falla A                        Win
    2011 The Final  3        Federer R.  Ferrer D                       Win
    2383 The Final  3        Federer R.  Simon G                        Win
    2495 The Final  3        Federer R.  Goffin D                       Win

Separated the 2013 and 2014 records and found the W-L record by calculating the frequency.
    library(knitr)

    Warning: package 'knitr' was built under R version

    rf_tournament_record$Date <- as.Date(rf_tournament_record$Date, format = "%m/%d/%y")
    rf_tournament_record$Year <- format(rf_tournament_record$Date, format = "%Y")
    final_records_2013 <- rf_tournament_record[rf_tournament_record$Year == 2013, ]
    final_records_2014 <- rf_tournament_record[rf_tournament_record$Year == 2014, ]
    # plyr::count gives the frequency of each W_L / Year combination
    freq_2014 <- count(final_records_2014, c("W_L", "Year"))
    freq_2013 <- count(final_records_2013, c("W_L", "Year"))
    c <- cbind(freq_2013, freq_2014)
    W_L <- c[order(c$freq), ]
    colnames(W_L)[c(1, 4)] <- "Number of Finals"
    colnames(W_L)[c(3, 6)] <- "Total"
    kable(W_L)

       Number of Finals  Year  Total  Number of Finals  Year  Total
    2  Win                            Win
       Loss                           Loss

Comments: From the table above, the win-loss record in finals changed noticeably between 2013 and 2014. Even though he lost more finals in 2014, he also won more finals than in 2013.
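As a cross-check on the side-by-side frequency table, the same kind of count can be sketched with a two-way table in base R. The rows below are hypothetical and are not Federer's actual record in finals; `finals` and `finals_table` are illustrative names only.

```r
# Hypothetical finals, one row per final (not the real 2013/2014 record).
finals <- data.frame(
  Year = c("2013", "2013", "2014", "2014", "2014"),
  W_L  = c("Win", "Loss", "Win", "Win", "Loss"),
  stringsAsFactors = FALSE
)

# Cross-tabulate result by year: one row per outcome, one column per year.
finals_table <- table(finals$W_L, finals$Year, dnn = c("Result", "Year"))
print(finals_table)
```

A cross-tabulation keeps the Win and Loss rows aligned across years by construction, whereas binding two separate count tables side by side relies on both having the same row order.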
3. His winning percentage from 2013 to 2014

Added Year and W_L columns to both years so the matches can be organized by category.

    rf_13$Date <- as.Date(rf_13$Date, format = "%m/%d/%y")
    rf_13$Year <- format(rf_13$Date, format = "%Y")
    rf_13$W_L <- ifelse(rf_13$Winner == "Federer R.", "Win", "Loss")
    rf_14$Date <- as.Date(rf_14$Date, format = "%m/%d/%y")
    rf_14$Year <- format(rf_14$Date, format = "%Y")
    rf_14$W_L <- ifelse(rf_14$Winner == "Federer R.", "Win", "Loss")

2013 and 2014 winning percentage

    total_percentages_2013 <- count(rf_13, c("W_L", "Year"))
    print(total_percentages_2013)

       W_L  Year freq
    1  Loss
       Win

    # Share of each outcome; the second element is the Win row
    winning_percentages_2013 <- total_percentages_2013$freq / sum(total_percentages_2013$freq)
    print(winning_percentages_2013[2])

    [1]

    total_percentages_2014 <- count(rf_14, c("W_L", "Year"))
    print(total_percentages_2014)

       W_L  Year freq
    1  Loss
       Win

    winning_percentages_2014 <- total_percentages_2014$freq / sum(total_percentages_2014$freq)
    print(winning_percentages_2014[2])

    [1] 0.85

Now, I converted the proportions to actual percentages with a % sign. To do that, I wrote a small helper function and applied it to the winning percentage of both years. Then I put the 2013 and 2014 percentages together in a matrix and gave it new labels.

    # Format a proportion as a percentage string with a % sign
    percent <- function(x, digits = 2, format = "f", ...) {
      paste0(formatC(100 * x, format = format, digits = digits, ...), "%")
    }
    percentage_2013 <- percent(winning_percentages_2013[2])
    percentage_2014 <- percent(winning_percentages_2014[2])
    matrix_2013_2014 <- matrix(c(percentage_2013, percentage_2014), nrow = 2, ncol = 1, byrow = TRUE)
    dimnames(matrix_2013_2014) = list(c("2013", "2014"), c("Win - Loss Percent/ Year"))
    kable(matrix_2013_2014)

       Win - Loss Percent/ Year
       %
       %

4. His performances in the 4 Grand Slams (Australian Open, French Open, Wimbledon, US Open)

First, I converted each round from text to a sequence of numbers, because I could only show in order which round he reached at each Grand Slam if the column was numeric. E.g. 1st, 2nd, 3rd, 4th, quarterfinal, semifinal, the final -> 1, 2, 3, 4, 5, 6, 7

    grandslams_2013 <- rf_13[which(rf_13$Tournament == "Australian Open" | rf_13$Tournament == "French Open" | rf_13$Tournament == "Wimbledon" | rf_13$Tournament == "US Open"), ]
    grandslams_2013$Round_Number <- c(1,2,3,4,5,6,1,2,3,4,5,1,2,1,2,3,4)
    grandslams_2014 <- rf_14[which(rf_14$Tournament == "Australian Open" | rf_14$Tournament == "French Open" | rf_14$Tournament == "Wimbledon" | rf_14$Tournament == "US Open"), ]
    grandslams_2014$Round_Number <- c(1,2,3,4,5,6,1,2,3,4,1,2,3,4,5,6,7,1,2,3,4,5,6)
    # The match he lost marks the round he reached at each Grand Slam
    grandslams_2013_loss <- grandslams_2013[which(grandslams_2013$Loser == "Federer R."), ]
    grandslams_2014_loss <- grandslams_2014[which(grandslams_2014$Loser == "Federer R."), ]

Reordered the factor to follow the sequence in which the Grand Slams are played during the calendar year - Aussie Open -> French Open -> Wimbledon -> US Open - so that the graphs follow the levels I set up.

    library(ggplot2)

    Warning: package 'ggplot2' was built under R version

    grandslams_2013_loss$Tournament <- factor(grandslams_2013_loss$Tournament, levels = c("Australian Open", "French Open", "Wimbledon", "US Open"))
    graph_2013 <- ggplot(grandslams_2013_loss, aes(x = Tournament, y = Round_Number, group = 1)) +
      geom_line(colour = "red", linetype = "longdash") +
      ggtitle("2013 Grand Slam Results") +
      labs(x = "Grand Slam", y = "Round Number") +
      geom_point(shape = 13, colour = "grey60", size = 2) +
      geom_text(aes(label = Round), vjust = 2, hjust = 0, nudge_y = 0.3, colour = "blue", check_overlap = FALSE) +
      theme(panel.background = element_rect(fill = "grey"))
    graph_2013

Create 2014 Grand Slam results.
    grandslams_2014_loss$Tournament <- factor(grandslams_2014_loss$Tournament, levels = c("Australian Open", "French Open", "Wimbledon", "US Open"))
    graph_2014 <- ggplot(grandslams_2014_loss, aes(x = Tournament, y = Round_Number, group = 1)) +
      geom_line(colour = "gray", linetype = "dotdash") +
      ggtitle("2014 Grand Slam Results") +
      labs(x = "Grand Slam", y = "Round Number") +
      geom_point(shape = 8, fill = "white", size = 2) +
      geom_text(aes(label = Round), vjust = 2, hjust = 0, nudge_y = 0.3, colour = "red", check_overlap = FALSE) +
      theme(panel.grid.minor = element_line(colour = "red", linetype = "dotted"))
    graph_2014

Lined them up together in the same plot because it might show a different perspective.

    grandslams_combine_res <- rbind(grandslams_2013_loss, grandslams_2014_loss)
    graph_combine <- ggplot(data = grandslams_combine_res, aes(x = factor(Tournament), y = Round_Number, group = Year, colour = Year)) +
      ggtitle("2013 vs 2014 Grand Slam Results") +
      xlab(" Grand Slam ") +
      ylab("Round Number ") +
      geom_line(aes(linetype = Year)) +
      scale_linetype_manual(values = c("twodash", "dotted")) +
      scale_color_brewer(palette = "Paired") +
      theme_minimal() +
      geom_point(aes(shape = Year)) +
      theme(legend.text = element_text(size = 10, colour = "red", angle = 10),
            legend.background = element_rect(colour = "black"),
            panel.background = element_rect(fill = "white"))
    graph_combine
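The round numbers used for the Grand Slam plots were typed in by hand. As a hedged alternative, they could be derived from the Round labels themselves. The sketch below assumes the labels are spelled as in the head() output ("1st Round" through "The Final"); `round_levels`, `exit_rounds` and `round_number` are hypothetical names.

```r
# Round labels in playing order; spelling assumed from the head() output.
round_levels <- c("1st Round", "2nd Round", "3rd Round", "4th Round",
                  "Quarterfinals", "Semifinals", "The Final")

# Rounds in which the 2013 hand-typed sequences end
# (Australian Open, French Open, Wimbledon, US Open).
exit_rounds <- c("Semifinals", "Quarterfinals", "2nd Round", "4th Round")

# match() returns the position of each label in round_levels,
# which is exactly the 1..7 round number.
round_number <- match(exit_rounds, round_levels)
print(round_number)
```

Applied to the whole Round column, the same `match()` call would avoid having to keep a hand-typed vector in step with the row order of the data.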
Comments: Federer did extremely well in 3 out of 4 Grand Slams from 2013 to 2014.

5. Which surface he performed better on (Hard, Clay, Grass), combining both years for a regression analysis

First, I added columns that flag the surface of each match as 1, else 0. That is, I created 3 columns - Hard, Clay and Grass - one per surface, and these would be the independent variables.
Second, the dependent variable would be the number of sets he lost (column = Lsets).
Third, my null hypothesis: the number of sets he lost had no relationship with the 3 surfaces.

    rf_13_14$Hard <- ifelse(rf_13_14$Surface == "Hard", 1, 0)
    rf_13_14$Clay <- ifelse(rf_13_14$Surface == "Clay", 1, 0)
    rf_13_14$Grass <- ifelse(rf_13_14$Surface == "Grass", 1, 0)
    regression_l <- lm(Lsets ~ Hard + Clay + Grass, data = rf_13_14)
    summary(regression_l)

    Call:
    lm(formula = Lsets ~ Hard + Clay + Grass, data = rf_13_14)

    Residuals:
       Min     1Q Median     3Q    Max

    Coefficients: (1 not defined because of singularities)
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)                                       *
    Hard
    Clay
    Grass             NA         NA      NA       NA
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error:  on 139 degrees of freedom
    Multiple R-squared:  , Adjusted R-squared:
    F-statistic:  on 2 and 139 DF, p-value:

    plot(regression_l)
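The note "1 not defined because of singularities" in the summary follows from the way the three indicator columns are built: every match is on exactly one surface, so Hard + Clay + Grass equals 1 in every row and duplicates the intercept. A minimal sketch on made-up data (the `toy` rows are hypothetical, not Federer's matches) reproduces the NA coefficient and shows that passing the surface as a single factor gives the same fit:

```r
# Made-up matches: surface and sets lost (not real data).
toy <- data.frame(
  Surface = rep(c("Clay", "Grass", "Hard"), times = c(4, 3, 5)),
  Lsets   = c(1, 0, 2, 1, 0, 1, 0, 1, 0, 0, 2, 1)
)
toy$Hard  <- as.integer(toy$Surface == "Hard")
toy$Clay  <- as.integer(toy$Surface == "Clay")
toy$Grass <- as.integer(toy$Surface == "Grass")

# The three indicators always sum to 1, the same as the intercept column.
all(toy$Hard + toy$Clay + toy$Grass == 1)

# With an intercept plus all three indicators, lm() cannot estimate the last one.
fit_dummies <- lm(Lsets ~ Hard + Clay + Grass, data = toy)
print(coef(fit_dummies))

# Passing the surface as one factor lets R pick a baseline level on its own.
fit_factor <- lm(Lsets ~ factor(Surface), data = toy)
print(coef(fit_factor))
```

In both fits the predicted value for a match is simply the mean of Lsets on that surface; only the labelling of the coefficients differs.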
From the regression model, the predictors were all 1s and 0s, so the diagnostic plots showed no variation along the fitted values: the points were one-sided and clustered. Moreover, the summary showed that the p-value was too high; normally, in a good model, the p-value would be below 0.05 (5%). Also, the adjusted R-squared was negative, which means the model explained essentially none of the variation in sets lost. Therefore, the hypothesis that sets lost had no relationship with surface could not be rejected.

6. Show his Total Wins Per Surface Per Year

2013 and 2014 surface performance summary. To calculate total wins per surface, it is easier to add a numeric column that is 1 for every win, else 0, and then sum that column for each surface.
    rf_13$Number <- ifelse(rf_13$W_L == "Win", 1, 0)
    surface_2013 <- data.frame(tapply(rf_13$Number, rf_13$Surface, sum))
    surface_2013$Year <- 2013
    colnames(surface_2013) <- c("Total Wins", "Year")
    rf_14$Number <- ifelse(rf_14$W_L == "Win", 1, 0)
    surface_2014 <- data.frame(tapply(rf_14$Number, rf_14$Surface, sum))
    surface_2014$Year <- 2014
    colnames(surface_2014) <- c("Total Wins", "Year")
    surface_performance <- rbind(surface_2013, surface_2014)
    # tapply orders the surfaces alphabetically; the labels are recycled for both years
    surface_performance$Surface <- c("Clay", "Grass", "Hard")

Surface results in a bar plot by year.

    surface_performance$Year <- as.character(surface_performance$Year)
    surface_combine <- ggplot(data = surface_performance, aes(x = Surface, y = `Total Wins`, group = Year, fill = Year)) +
      ggtitle("Total Wins Per Surface") +
      theme(plot.title = element_text(lineheight = .8, face = "bold")) +
      geom_bar(stat = 'identity', color = 'black', position = "dodge") +
      geom_text(aes(label = `Total Wins`), vjust = 1.5, colour = "white", position = position_dodge(.9), size = 5) +
      scale_fill_manual(values = c("#d8b365", "#5ab4ac"))
    surface_combine

Faceted the bar plot to look at the result from a different angle.

    surface_combine_facet <- ggplot(data = surface_performance, aes(x = Surface, y = `Total Wins`, group = Year, fill = Year)) +
      geom_bar(stat = 'identity', color = 'red', position = "dodge") +
      geom_text(aes(label = `Total Wins`), vjust = 1.5, colour = "black", position = position_dodge(.9), size = 5) +
      scale_fill_brewer(palette = "Spectral")
    surface_combine_facet + facet_grid(Year ~ .) +
      theme(legend.position = "none",
            strip.text.x = element_text(size = 8, angle = 75),
            strip.text.y = element_text(size = 12, face = "bold"),
            strip.background = element_rect(colour = "red", fill = "#ccccff"))
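The wins-per-surface summary above is assembled year by year, with the surface labels attached by hand afterwards. A possible alternative, sketched here on made-up rows (the `matches` data below is hypothetical), is to sum the win indicator over surface and year in one step so the labels come from the data:

```r
# Made-up matches: year, surface and result (not real data).
matches <- data.frame(
  Year    = c("2013", "2013", "2013", "2014", "2014", "2014", "2014"),
  Surface = c("Hard", "Clay", "Grass", "Hard", "Hard", "Clay", "Grass"),
  W_L     = c("Win", "Loss", "Win", "Win", "Win", "Win", "Loss"),
  stringsAsFactors = FALSE
)

# 1 for a win, 0 otherwise, as in the report.
matches$Win <- as.integer(matches$W_L == "Win")

# Sum wins for every Surface x Year combination at once.
wins <- tapply(matches$Win, list(matches$Surface, matches$Year), sum)
print(wins)

# Long format (one row per surface and year), ready for plotting.
wins_long <- as.data.frame(as.table(wins), stringsAsFactors = FALSE)
names(wins_long) <- c("Surface", "Year", "Total Wins")
print(wins_long)
```

Because the Surface and Year columns are taken from the table's own dimension names, the labels cannot drift out of step with the counts if the row order changes.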
Conclusion

Roger Federer performed noticeably better in 2014 than in 2013 across the statistics examined here: sets won and lost, finals won and lost, win-loss percentage, Grand Slam results, and wins per year. Perhaps the change of racket and coach helped him bring back his old self.
More informationThe final set in a tennis match: four years at Wimbledon 1
Published as: Journal of Applied Statistics, Vol. 26, No. 4, 1999, 461-468. The final set in a tennis match: four years at Wimbledon 1 Jan R. Magnus, CentER, Tilburg University, the Netherlands and Franc
More informationAnnouncements. % College graduate vs. % Hispanic in LA. % College educated vs. % Hispanic in LA. Problem Set 10 Due Wednesday.
Announcements Announcements UNIT 7: MULTIPLE LINEAR REGRESSION LECTURE 1: INTRODUCTION TO MLR STATISTICS 101 Problem Set 10 Due Wednesday Nicole Dalzell June 15, 2015 Statistics 101 (Nicole Dalzell) U7
More informationBIOL 101L: Principles of Biology Laboratory
BIOL 101L: Principles of Biology Laboratory Sampling populations To understand how the world works, scientists collect, record, and analyze data. In this lab, you will learn concepts that pertain to these
More informationAge of Fans
Measures of Central Tendency SUGGESTED LEARNING STRATEGIES: Activating Prior Knowledge, Interactive Word Wall, Marking the Text, Summarize/Paraphrase/Retell, Think/Pair/Share Matthew is a student reporter
More informationMen s Best Shots Poll
Men s Best Shots Poll Stephanie Kovalchik is the Senior Data Scientist for the Game Insight Group of Tennis Australia and Vicoria University and the creator of the tennis stats blog On The T (onthe-t.com).
More informationAnnouncements. Unit 7: Multiple Linear Regression Lecture 3: Case Study. From last lab. Predicting income
Announcements Announcements Unit 7: Multiple Linear Regression Lecture 3: Case Study Statistics 101 Mine Çetinkaya-Rundel April 18, 2013 OH: Sunday: Virtual OH, 3-4pm - you ll receive an email invitation
More informationBoyle s Law. Pressure-Volume Relationship in Gases. Figure 1
Boyle s Law Pressure-Volume Relationship in Gases The primary objective of this experiment is to determine the relationship between the pressure and volume of a confined gas. The gas we use will be air,
More informationSTAT 155 Introductory Statistics. Lecture 2: Displaying Distributions with Graphs
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL STAT 155 Introductory Statistics Lecture 2: Displaying Distributions with Graphs 8/29/06 Lecture 2-1 1 Recall Statistics is the science of data. Collecting
More informationBASKETBALL PREDICTION ANALYSIS OF MARCH MADNESS GAMES CHRIS TSENG YIBO WANG
BASKETBALL PREDICTION ANALYSIS OF MARCH MADNESS GAMES CHRIS TSENG YIBO WANG GOAL OF PROJECT The goal is to predict the winners between college men s basketball teams competing in the 2018 (NCAA) s March
More informationThere are various possibilities of data outcomes
ORIGINAL ARTICLE TRENDS in Sport Sciences 2017; 4(24): 151-155 ISSN 2299-9590 DOI: 10.23829/TSS.2017.24.4-2 Comparison of game characteristics of male and female tennis players at grand-slam tournaments
More informationCENTER PIVOT EVALUATION AND DESIGN
CENTER PIVOT EVALUATION AND DESIGN Dale F. Heermann Agricultural Engineer USDA-ARS 2150 Centre Avenue, Building D, Suite 320 Fort Collins, CO 80526 Voice -970-492-7410 Fax - 970-492-7408 Email - dale.heermann@ars.usda.gov
More informationMTB 02 Intermediate Minitab
MTB 02 Intermediate Minitab This module will cover: Advanced graphing Changing data types Value Order Making similar graphs Zooming worksheet Brushing Multi-graphs: By variables Interactively upgrading
More informationy ) s x x )(y i (x i r = 1 n 1 s y Statistics Lecture 7 Exploring Data , y 2 ,y n (x 1 ),,(x n ),(x 2 ,y 1 How two variables vary together
Statistics 111 - Lecture 7 Exploring Data Numerical Summaries for Relationships between Variables Administrative Notes Homework 1 due in recitation: Friday, Feb. 5 Homework 2 now posted on course website:
More informationSTAT 155 Introductory Statistics. Lecture 2-2: Displaying Distributions with Graphs
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL STAT 155 Introductory Statistics Lecture 2-2: Displaying Distributions with Graphs 8/31/06 Lecture 2-2 1 Recall Data: Individuals Variables Categorical variables
More informationGender Differences in Performance in Competitive Environments: Field Evidence from Professional Tennis Players *
Gender Differences in Performance in Competitive Environments: Field Evidence from Professional Tennis Players * M. Daniele Paserman Hebrew University dpaserma@shum.huji.ac.il August 2006 Abstract This
More informationPsychology - Mr. Callaway/Mundy s Mill HS Unit Research Methods - Statistics
Psychology - Mr. Callaway/Mundy s Mill HS Unit 2.3 - Research Methods - Statistics How do psychologists ask & answer questions? Last time we asked that we were discussing Research Methods. This time we
More informationImproving the Australian Open Extreme Heat Policy. Tristan Barnett
Improving the Australian Open Extreme Heat Policy Tristan Barnett Introduction One of the characteristics of tennis is that you do not know when the match is going to finish, and a long match is a real
More informationStandardized CPUE of Indian Albacore caught by Taiwanese longliners from 1980 to 2014 with simultaneous nominal CPUE portion from observer data
Received: 4 July 2016 Standardized CPUE of Indian Albacore caught by Taiwanese longliners from 1980 to 2014 with simultaneous nominal CPUE portion from observer data Yin Chang1, Liang-Kang Lee2 and Shean-Ya
More information8 th grade. Name Date Block
Name Date Block The Plot & the Pendulum Lab A pendulum is any mass that swings back and forth on a rope, string, or chain. Pendulums can be found in old clocks and other machinery. A playground swing is
More informationISDS 4141 Sample Data Mining Work. Tool Used: SAS Enterprise Guide
ISDS 4141 Sample Data Mining Work Taylor C. Veillon Tool Used: SAS Enterprise Guide You may have seen the movie, Moneyball, about the Oakland A s baseball team and general manager, Billy Beane, who focused
More informationAEGON CHAMPIONSHIPS: DAY 6 MEDIA NOTES Saturday, June 20, 2015
AEGON CHAMPIONSHIPS: DAY 6 MEDIA NOTES Saturday, June 20, 2015 Queen s Club, London, Great Britain Jun 15 21, 2015 Draw: S-32, D-16 Prize Money: 1,574,640 Surface: Grass ATP Info: Tournament Info: ATP
More informationMarch Madness Basketball Tournament
March Madness Basketball Tournament Math Project COMMON Core Aligned Decimals, Fractions, Percents, Probability, Rates, Algebra, Word Problems, and more! To Use: -Print out all the worksheets. -Introduce
More informationPOTENTIAL ENERGY BOUNCE BALL LAB
Energy cannot be created or destroyed. Stored energy is called potential energy, and the energy of motion is called kinetic energy. Potential energy changes as the height of an object changes due to gravity;
More informationStat 139 Homework 3 Solutions, Spring 2015
Stat 39 Homework 3 Solutions, Spring 05 Problem. Let i Nµ, σ ) for i,..., n, and j Nµ, σ ) for j,..., n. Also, assume that all observations are independent from each other. In Unit 4, we learned that the
More informationHellgate 100k Race Analysis February 12, 2015
Hellgate 100k Race Analysis brockwebb45@gmail.com February 12, 2015 Synopsis The Hellgate 100k is a tough, but rewarding race directed by Dr. David Horton. Taking place around the second week of December
More informationA few things to remember about ANOVA
A few things to remember about ANOVA 1) The F-test that is performed is always 1-tailed. This is because your alternative hypothesis is always that the between group variation is greater than the within
More informationFirst Server Advantage in Tennis. Michelle Okereke
First Server Advantage in Tennis Michelle Okereke Overview! Background! Research Question! Methodology! Results! Conclusion Background! Scoring! Advantage Set: First player to win 6 games by a margin of
More informationFactorial Analysis of Variance
Factorial Analysis of Variance Overview of the Factorial ANOVA Factorial ANOVA (Two-Way) In the context of ANOVA, an independent variable (or a quasiindependent variable) is called a factor, and research
More informationReal-Time Electricity Pricing
Real-Time Electricity Pricing Xi Chen, Jonathan Hosking and Soumyadip Ghosh IBM Watson Research Center / Northwestern University Yorktown Heights, NY, USA X. Chen, J. Hosking & S. Ghosh (IBM) Real-Time
More informationThe Effect of Pressure on Mixed-Strategy Play in Tennis: The Effect of Court Surface on Service Decisions
International Journal of Business and Social Science Vol. 3 No. 20 [Special Issue October 2012] The Effect of Pressure on Mixed-Strategy Play in Tennis: The Effect of Court Surface on Service Decisions
More informationALL-SPANISH QUARTER-FINAL CLASH HEADLINES BASTAD ACTION
Page 1 of 5 SKISTAR SWEDISH OPEN: DAY 5 MEDIA NOTES Friday, July 21, 2017 Bastad Tennis Stadium, Bastad, Sweden July 17 23, 2017 Draw: S-28, D-16 Prize Money: 482,060 Surface: Clay ATP World Tour Info
More informationAnalysis of Variance. Copyright 2014 Pearson Education, Inc.
Analysis of Variance 12-1 Learning Outcomes Outcome 1. Understand the basic logic of analysis of variance. Outcome 2. Perform a hypothesis test for a single-factor design using analysis of variance manually
More informationPREPARED BY: Marshall K. Cheung, Ph.D., Laboratory Director. REVISED BY: Marshall K. Cheung, Ph.D., Laboratory Director
DOCUMENT TYPE: DOCUMENT CLASS: Standard Operating Procedure Physical Property Procedure TITLE: Conductivity, EPA 120.1 INSTRUMENTATON: HACH CO150 Conductivity Meter PREPARED BY: Marshall K. Cheung, Ph.D.,
More informationModelling Exposure at Default Without Conversion Factors for Revolving Facilities
Modelling Exposure at Default Without Conversion Factors for Revolving Facilities Mark Thackham Credit Scoring and Credit Control XV, Edinburgh, August 2017 1 / 27 Objective The objective of this presentation
More informationPredicting the use of the sacrifice bunt in Major League Baseball BUDT 714 May 10, 2007
Predicting the use of the sacrifice bunt in Major League Baseball BUDT 714 May 10, 2007 Group 6 Charles Gallagher Brian Gilbert Neelay Mehta Chao Rao Executive Summary Background When a runner is on-base
More informationPreparation for Salinity Control ME 121
Preparation for Salinity Control ME 121 This document describes a set of measurements and analyses that will help you to write an Arduino program to control the salinity of water in your fish tank. The
More informationStatistical Analysis Project - How do you decide Who s the Best?
Statistical Analysis Project - How do you decide Who s the Best? In order to choose the best shot put thrower to go to IASAS, the three candidates were asked to throw the shot put for a total of times
More informationEvaluating The Best. Exploring the Relationship between Tom Brady s True and Observed Talent
Evaluating The Best Exploring the Relationship between Tom Brady s True and Observed Talent Heather Glenny, Emily Clancy, and Alex Monahan MCS 100: Mathematics of Sports Spring 2016 Tom Brady s recently
More informationESP 178 Applied Research Methods. 2/26/16 Class Exercise: Quantitative Analysis
ESP 178 Applied Research Methods 2/26/16 Class Exercise: Quantitative Analysis Introduction: In summer 2006, my student Ted Buehler and I conducted a survey of residents in Davis and five other cities.
More information