Statistical Analyses on Roger Federer's Performances in 2013 and 2014


James Kong, kong.james@berkeley.edu

Introduction

Tennis has become a global sport, and I am a big fan of it. Having played college tennis for Cal, I wanted to run statistical analyses on my favorite player of all time, Roger Federer, and on his performances in 2013 and 2014. I chose these two years for the following reasons.

In 2013:
A. He had dealt with a back injury throughout his career, but 2013 was the lowest point of his career.
B. For the first time in 10 years (2003-2012), he did not reach a Grand Slam final, and he had the lowest win/loss record of his entire career.

In 2014:
A. He changed his coach from Paul Annacone to one of the tennis legends, Stefan Edberg.
B. He changed his racket for the first time in his career.

One year is not a long time, but it could turn things upside down for Federer. I examined all of his match records from 2013 and 2014, excluding the Davis Cup, a team event in which players represent their countries against other nations. It takes place every year, but not every professional plays it each year, so I left it out of the analysis. I used various methods to analyze his performances, and these two years were a fitting choice because they show his worst year and one of his good years within a short period of time.

The data came from Datahub, which compiled men's and women's tennis results from 2000 to 2016. Website: https://datahub.io/dataset/atp-wta-professional-tennis-tournament-data.

Methods:

I wanted to achieve several things with the 2013 and 2014 results. The objectives were:
1. How many sets he had won and lost
2. How many tournaments he had won and how many he had lost in the finals
3. His winning percentage per year
4. His performances in the 4 Grand Slams (Australian Open, French Open, Wimbledon, US Open) per year
5. Which surface he performed better on (Hard, Clay, Grass)
6. His total wins per surface per year

I imported the datasets for 2013 and 2014 and kept only the columns that were useful for the analysis: tournament, date, court, surface, round, best of, winner, loser, Wsets, Lsets.

    results_13 <- read.csv("/users/kingkong1/desktop/gis Work/R/Roger Federer's Analysis/Final Project/full_results_2013.csv", header = TRUE)
    results_14 <- read.csv("/users/kingkong1/desktop/gis Work/R/Roger Federer's Analysis/Final Project/full_results_2014.csv", header = TRUE)
    subset_13 <- results_13[, c(3, 4, 6, 7, 8, 9, 10, 11, 26, 27)]
    subset_14 <- results_14[, c(3, 4, 6, 7, 8, 9, 10, 11, 26, 27)]

I then filtered each dataset down to Federer's matches, wins and losses combined. The first rows of the two filtered datasets, with their column headings, were as follows.

    rf_13 <- subset_13[which(subset_13$Winner == 'Federer R.' | subset_13$Loser == 'Federer R.'), ]
    rf_14 <- subset_14[which(subset_14$Winner == 'Federer R.' | subset_14$Loser == 'Federer R.'), ]
    head(rf_13)

             Tournament    Date   Court Surface         Round Best.of
    183 Australian Open 1/15/13 Outdoor    Hard     1st Round       5
    229 Australian Open 1/17/13 Outdoor    Hard     2nd Round       5
    246 Australian Open 1/19/13 Outdoor    Hard     3rd Round       5
    255 Australian Open 1/21/13 Outdoor    Hard     4th Round       5
    259 Australian Open 1/23/13 Outdoor    Hard Quarterfinals       5
    261 Australian Open 1/25/13 Outdoor    Hard    Semifinals       5
            Winner        Loser Wsets Lsets
    183 Federer R.     Paire B.     3     0
    229 Federer R. Davydenko N.     3     0
    246 Federer R.     Tomic B.     3     0
    255 Federer R.    Raonic M.     3     0
    259 Federer R.  Tsonga J.W.     3     2
    261  Murray A.   Federer R.     3     2

    head(rf_14)
                    Tournament    Date   Court Surface         Round Best.of
    4   Brisbane International  1/1/14 Outdoor    Hard     2nd Round       3
    12  Brisbane International  1/3/14 Outdoor    Hard Quarterfinals       3
    14  Brisbane International  1/4/14 Outdoor    Hard    Semifinals       3
    15  Brisbane International  1/5/14 Outdoor    Hard     The Final       3
    171        Australian Open 1/14/14 Outdoor    Hard     1st Round       5
    213        Australian Open 1/16/14 Outdoor    Hard     2nd Round       5
            Winner        Loser Wsets Lsets
    4   Federer R.  Nieminen J.     2     0
    12  Federer R. Matosevic M.     2     0
    14  Federer R.    Chardy J.     2     1
    15   Hewitt L.   Federer R.     2     1
    171 Federer R. Duckworth J.     3     0
    213 Federer R.    Kavcic B.     3     0

Both datasets had identical column headings, so my first idea was to merge the tables. However, I couldn't merge them because I had no unique IDs on which to join the two tables. In addition, a join would have used a column from one table to match rows in the other, which would have compromised the results and changed the data structure. I therefore appended one table to the other, which works because the number of columns and the column headings were identical. I also show the structure of the appended table below and replace NA values with 0 in the sets won and lost.

    rf_13_14 <- rbind(rf_13, rf_14)
    str(rf_13_14)
    'data.frame': 142 obs. of 10 variables:
     $ Tournament: Factor w/ 72 levels "Abierto Mexicano",..: 7 7 7 7 7 7 2 2 2 22 ...
     $ Date      : Factor w/ 560 levels "1/1/13","1/10/13",..: 6 8 10 13 15 17 74 75 76 86 ...
     $ Court     : Factor w/ 2 levels "Indoor","Outdoor": 2 2 2 2 2 2 1 1 1 2 ...
     $ Surface   : Factor w/ 3 levels "Clay","Grass",..: 3 3 3 3 3 3 3 3 3 3 ...
     $ Round     : Factor w/ 8 levels "1st Round","2nd Round",..: 1 2 3 4 5 7 1 2 5 1 ...
     $ Best.of   : int 5 5 5 5 5 5 3 3 3 3 ...
     $ Winner    : Factor w/ 244 levels "Almagro N.","Alund M.",..: 54 54 54 54 54 131 54 54 15 54 ...
     $ Loser     : Factor w/ 388 levels "Aguilar J.","Ali Mutawa J.M.",..: 211 60 272 229 277 85 301 61 85 130 ...
     $ Wsets     : int 3 3 3 3 3 3 2 2 2 2 ...
     $ Lsets     : int 0 0 0 0 2 2 0 0 0 1 ...
    rf_13_14$Wsets[is.na(rf_13_14$Wsets)] <- 0
    rf_13_14$Lsets[is.na(rf_13_14$Lsets)] <- 0

I changed the type of the Date column and added a new column, Year, to categorize every match Federer played as 2013 or 2014.

    rf_13_14$Date <- as.Date(rf_13_14$Date, format = "%m/%d/%y")
    rf_13_14$Year <- format(rf_13_14$Date, format = "%Y")

Comments: With these changes, later sums, subtractions, and other statistical functions would not fail because of NA values in my columns, and the data could be sorted by date or year.
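The preparation steps above (filter, append, replace NA, convert dates) can be sketched on a small made-up data frame. This is only an illustration in base R: `keep_player()` is a hypothetical helper, and the rows naming "Player A", "Player B", and "Player C" are invented. The remaining rows copy values from the output shown above.

```r
player <- "Federer R."

# Toy stand-ins for subset_13 and subset_14 (only a few columns)
res_13 <- data.frame(
  Date   = c("1/15/13", "1/16/13", "2/1/13"),
  Winner = c("Federer R.", "Player A", "Federer R."),
  Loser  = c("Paire B.", "Player C", "Player B"),
  Wsets  = c(3, 2, NA),   # NA mimics a match with no recorded sets (invented)
  Lsets  = c(0, 1, NA),
  stringsAsFactors = FALSE
)
res_14 <- data.frame(
  Date   = c("1/1/14", "1/5/14"),
  Winner = c("Federer R.", "Hewitt L."),
  Loser  = c("Nieminen J.", "Federer R."),
  Wsets  = c(2, 2),
  Lsets  = c(0, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep matches the player either won OR lost
keep_player <- function(df, name) {
  df[df$Winner == name | df$Loser == name, ]
}

# Append the two years; rbind works because the columns are identical
both <- rbind(keep_player(res_13, player), keep_player(res_14, player))

# Replace missing set counts with 0
both$Wsets[is.na(both$Wsets)] <- 0
both$Lsets[is.na(both$Lsets)] <- 0

# Parse month/day/two-digit-year strings, then derive the year
both$Date <- as.Date(both$Date, format = "%m/%d/%y")
both$Year <- format(both$Date, "%Y")

print(both)
```

Filtering by column name rather than by position keeps the code working if the column order of the source file changes.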

1. How many sets he won and lost

    library(plyr)
    rf_win_loss <- ddply(rf_13_14, ~Year, summarise, Sets_won = sum(Wsets), Sets_Loss = sum(Lsets))
    print(rf_win_loss)
      Year Sets_won Sets_Loss
    1 2013      141        24
    2 2014      179        32

2. How many tournaments he had won and had lost in the final by year

I had to subset his wins and losses in finals, so I used Round = "The Final" together with Winner = "Federer R." (or Loser = "Federer R.") to define each subset, because the result of the final was the only way to know whether he had won a tournament. I then bound the two subsets together to see his overall performance in finals, wins and losses alike, and added a column with a conditional statement: if the winner is Federer R., print Win, else Loss.

    rf_tournaments_won <- rf_13_14[which(rf_13_14$Round == "The Final" & rf_13_14$Winner == "Federer R."), ]
    rf_tournaments_lost <- rf_13_14[which(rf_13_14$Round == "The Final" & rf_13_14$Loser == "Federer R."), ]
    rf_tournament_record <- rbind(rf_tournaments_won, rf_tournaments_lost)
    rf_tournament_record$W_L <- ifelse(rf_tournament_record$Winner == "Federer R.", "Win", "Loss")
    print(head(rf_tournament_record))
                                         Tournament       Date   Court Surface
    1355                           Gerry Weber Open 2013-06-16 Outdoor   Grass
    575                  Dubai Tennis Championships 2014-03-01 Outdoor    Hard
    1347                           Gerry Weber Open 2014-06-15 Outdoor   Grass
    2011 Western & Southern Financial Group Masters 2014-08-17 Outdoor    Hard
    2383                           Shanghai Masters 2014-10-12 Outdoor    Hard
    2495                              Swiss Indoors 2014-10-26  Indoor    Hard
             Round Best.of     Winner      Loser Wsets Lsets Year W_L
    1355 The Final       3 Federer R. Youzhny M.     2     0 2013 Win
    575  The Final       3 Federer R. Berdych T.     2     1 2014 Win
    1347 The Final       3 Federer R.   Falla A.     2     0 2014 Win
    2011 The Final       3 Federer R.  Ferrer D.     2     1 2014 Win
    2383 The Final       3 Federer R.   Simon G.     2     0 2014 Win
    2495 The Final       3 Federer R.  Goffin D.     2     0 2014 Win

I then separated the 2013 and 2014 records and obtained the win-loss record for each year by counting frequencies.
    library(knitr)
    Warning: package 'knitr' was built under R version 3.3.2
    rf_tournament_record$Date <- as.Date(rf_tournament_record$Date, format = "%m/%d/%y")
    rf_tournament_record$Year <- format(rf_tournament_record$Date, format = "%Y")
    final_records_2013 <- rf_tournament_record[rf_tournament_record$Year == 2013, ]
    final_records_2014 <- rf_tournament_record[rf_tournament_record$Year == 2014, ]
    freq_2014 <- count(final_records_2014, c("W_L", "Year"))
    freq_2013 <- count(final_records_2013, c("W_L", "Year"))
    # Named freq_both rather than c, so the base function c() is not masked
    freq_both <- cbind(freq_2013, freq_2014)
    W_L <- freq_both[order(freq_both$freq), ]
    colnames(W_L)[c(1, 4)] <- "Number of Finals"
    colnames(W_L)[c(3, 6)] <- "Total"
    kable(W_L)

       Number of Finals  Year  Total  Number of Finals  Year  Total
    2  Win               2013      1  Win               2014      5
    1  Loss              2013      2  Loss              2014      6

Comments: The table shows a clear change in his record in finals between 2013 and 2014. Although he lost more finals in 2014 (6 versus 2), he also won more of them (5 versus 1).
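One caveat on objective 1: in the printed rows, Wsets and Lsets appear to belong to the match winner and loser (row 261 shows Murray A. with Wsets = 3 against Federer), so summing them directly mixes Federer's sets with his opponents'. The sketch below, in base R only, attributes sets to one player and tallies finals with `table()`. The four rows are copied from the output above. The names `player`, `Sets_Won`, and `Sets_Lost` are mine and not part of the original analysis.

```r
player <- "Federer R."

# Four matches copied from the printed output above
toy <- data.frame(
  Year   = c("2013", "2013", "2014", "2014"),
  Round  = c("Semifinals", "The Final", "The Final", "The Final"),
  Winner = c("Murray A.", "Federer R.", "Hewitt L.", "Federer R."),
  Loser  = c("Federer R.", "Youzhny M.", "Federer R.", "Berdych T."),
  Wsets  = c(3, 2, 2, 2),
  Lsets  = c(2, 0, 1, 1),
  stringsAsFactors = FALSE
)

# TRUE where the player won the match
won <- toy$Winner == player
toy$W_L <- ifelse(won, "Win", "Loss")

# The player's own sets: Wsets when he won, Lsets when he lost (and vice versa)
toy$Sets_Won  <- ifelse(won, toy$Wsets, toy$Lsets)
toy$Sets_Lost <- ifelse(won, toy$Lsets, toy$Wsets)

# Sets won and lost by the player, per year
sets_by_year <- aggregate(cbind(Sets_Won, Sets_Lost) ~ Year, data = toy, FUN = sum)
print(sets_by_year)

# Finals record as a Year x W_L cross-tabulation
finals <- toy[toy$Round == "The Final", ]
finals_table <- table(finals$Year, finals$W_L)
print(finals_table)
```

A single cross-tabulation keeps both years in one table with labelled rows and columns, which avoids the duplicated column names produced by `cbind`.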

3. His winning percentage from 2013 to 2014

I added the Year and W_L columns to the data for both years so that the matches could be grouped by category.

    rf_13$Date <- as.Date(rf_13$Date, format = "%m/%d/%y")
    rf_13$Year <- format(rf_13$Date, format = "%Y")
    rf_13$W_L <- ifelse(rf_13$Winner == "Federer R.", "Win", "Loss")
    rf_14$Date <- as.Date(rf_14$Date, format = "%m/%d/%y")
    rf_14$Year <- format(rf_14$Date, format = "%Y")
    rf_14$W_L <- ifelse(rf_14$Winner == "Federer R.", "Win", "Loss")

2013 and 2014 winning percentage

    total_percentages_2013 <- count(rf_13, c("W_L", "Year"))
    print(total_percentages_2013)
       W_L Year freq
    1 Loss 2013   17
    2  Win 2013   45
    winning_percentages_2013 <- total_percentages_2013$freq / sum(total_percentages_2013$freq)
    print(winning_percentages_2013[2])
    [1] 0.7258065
    total_percentages_2014 <- count(rf_14, c("W_L", "Year"))
    print(total_percentages_2014)
       W_L Year freq
    1 Loss 2014   12
    2  Win 2014   68
    winning_percentages_2014 <- total_percentages_2014$freq / sum(total_percentages_2014$freq)
    print(winning_percentages_2014[2])
    [1] 0.85

Next, I converted the proportions to percentages with a % sign. To do that, I wrote a small helper function and applied it to the winning percentage of each year. I then put the 2013 and 2014 percentages together in a matrix with new labels.

    percent <- function(x, digits = 2, format = "f", ...) {
      paste0(formatC(100 * x, format = format, digits = digits, ...), "%")
    }
    percentage_2013 <- percent(winning_percentages_2013[2])
    percentage_2014 <- percent(winning_percentages_2014[2])
    matrix_2013_2014 <- matrix(c(percentage_2013, percentage_2014), nrow = 2, ncol = 1, byrow = TRUE)
    dimnames(matrix_2013_2014) = list(c("2013", "2014"), c("Win - Loss Percent/ Year"))
    kable(matrix_2013_2014)

          Win - Loss Percent/ Year
    2013  72.58%
    2014  85.00%

4. His performances in 4 Grand Slams (Australian Open, French Open, Wimbledon, US Open)

First, I converted each round from text to a number, because a numeric column makes it possible to plot how far he got in each Grand Slam. E.g. 1st, 2nd, 3rd, 4th, quarterfinal, semifinal, the final -> 1, 2, 3, 4, 5, 6, 7

    grandslams_2013 <- rf_13[which(rf_13$Tournament == "Australian Open" | rf_13$Tournament == "French Open" | rf_13$Tournament == "Wimbledon" | rf_13$Tournament == "US Open"), ]
    grandslams_2013$Round_Number <- c(1,2,3,4,5,6,1,2,3,4,5,1,2,1,2,3,4)
    grandslams_2014 <- rf_14[which(rf_14$Tournament == "Australian Open" | rf_14$Tournament == "French Open" | rf_14$Tournament == "Wimbledon" | rf_14$Tournament == "US Open"), ]
    grandslams_2014$Round_Number <- c(1,2,3,4,5,6,1,2,3,4,1,2,3,4,5,6,7,1,2,3,4,5,6)
    grandslams_2013_loss <- grandslams_2013[which(grandslams_2013$Loser == "Federer R."), ]
    grandslams_2014_loss <- grandslams_2014[which(grandslams_2014$Loser == "Federer R."), ]

I reordered the Tournament factor to follow the order in which the Grand Slams are played during the calendar year (Australian Open -> French Open -> Wimbledon -> US Open), so that the graphs would use the levels I set up.

    library(ggplot2)
    Warning: package 'ggplot2' was built under R version 3.3.2
    grandslams_2013_loss$Tournament <- factor(grandslams_2013_loss$Tournament, levels = c("Australian Open", "French Open", "Wimbledon", "US Open"))
    graph_2013 <- ggplot(grandslams_2013_loss, aes(x = Tournament, y = Round_Number, group = 1)) +
      geom_line(colour = "red", linetype = "longdash") +
      ggtitle("2013 Grand Slam Results") +
      labs(x = "Grand Slam", y = "Round Number") +
      geom_point(shape = 13, colour = "grey60", size = 2) +
      geom_text(aes(label = Round), vjust = 2, hjust = 0, nudge_y = 0.3, colour = "blue", check_overlap = FALSE) +
      theme(panel.background = element_rect(fill = "grey"))
    graph_2013

Create 2014 Grand Slam results.

    grandslams_2014_loss$Tournament <- factor(grandslams_2014_loss$Tournament, levels = c("Australian Open", "French Open", "Wimbledon", "US Open"))
    graph_2014 <- ggplot(grandslams_2014_loss, aes(x = Tournament, y = Round_Number, group = 1)) +
      geom_line(colour = "gray", linetype = "dotdash") +
      ggtitle("2014 Grand Slam Results") +
      labs(x = "Grand Slam", y = "Round Number") +
      geom_point(shape = 8, fill = "white", size = 2) +
      geom_text(aes(label = Round), vjust = 2, hjust = 0, nudge_y = 0.3, colour = "red", check_overlap = FALSE) +
      theme(panel.grid.minor = element_line(colour = "red", linetype = "dotted"))
    graph_2014

I then drew both years in the same plot, since that might offer a different perspective.

    grandslams_combine_res <- rbind(grandslams_2013_loss, grandslams_2014_loss)
    graph_combine <- ggplot(data = grandslams_combine_res, aes(x = factor(Tournament), y = Round_Number, group = Year, colour = Year)) +
      ggtitle("2013 vs 2014 Grand Slam Results") +
      xlab(" Grand Slam ") +
      ylab("Round Number ") +
      geom_line(aes(linetype = Year)) +
      scale_linetype_manual(values = c("twodash", "dotted")) +
      scale_color_brewer(palette = "Paired") +
      theme_minimal() +
      geom_point(aes(shape = Year)) +
      theme(legend.text = element_text(size = 10, colour = "red", angle = 10),
            legend.background = element_rect(colour = "black"),
            panel.background = element_rect(fill = "white"))
    graph_combine
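The hand-typed round numbers above depend on the rows being in exactly the expected order. As a hedged alternative, the number can be derived from the Round label itself with `match()`. In this sketch the round labels are the ones that appear in the printed data, and the four rows reproduce the 2013 exit rounds implied by the round-number vector above. The vector names `round_levels` and `slam_levels` are assumptions of mine.

```r
# Round labels in playing order, as they appear in the printed data
round_levels <- c("1st Round", "2nd Round", "3rd Round", "4th Round",
                  "Quarterfinals", "Semifinals", "The Final")

# Grand Slams in calendar order
slam_levels <- c("Australian Open", "French Open", "Wimbledon", "US Open")

# The round in which the player lost at each 2013 Grand Slam
exits_2013 <- data.frame(
  Tournament = c("Wimbledon", "Australian Open", "US Open", "French Open"),
  Round      = c("2nd Round", "Semifinals", "4th Round", "Quarterfinals"),
  stringsAsFactors = FALSE
)

# Position of each label in round_levels gives the round number (1 to 7)
exits_2013$Round_Number <- match(exits_2013$Round, round_levels)

# Explicit factor levels make plots and sorting follow the calendar order
exits_2013$Tournament <- factor(exits_2013$Tournament, levels = slam_levels)
exits_2013 <- exits_2013[order(exits_2013$Tournament), ]

print(exits_2013)
```

Because `match()` returns NA for any label it does not recognise, a misspelled level name shows up immediately instead of silently shifting the numbers.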

Comments: Federer did extremely well in 3 of the 4 Grand Slams in 2014, reaching at least the semifinals at the Australian Open, Wimbledon, and the US Open, whereas in 2013 he got that far only at the Australian Open.

5. Which surface he performed better (Hard, Clay, Grass) and combine both years to do regression analysis

First, I added columns that flag the surface of each match as 1, else 0. That is, I created 3 columns, one for each surface. These 3 columns (Hard, Clay, and Grass) were the independent variables. Second, the dependent variable was the number of sets lost (column = Lsets). Third, my hypothesis was that the number of sets lost had no relationship with the 3 surfaces.

    rf_13_14$Hard <- ifelse(rf_13_14$Surface == "Hard", 1, 0)
    rf_13_14$Clay <- ifelse(rf_13_14$Surface == "Clay", 1, 0)
    rf_13_14$Grass <- ifelse(rf_13_14$Surface == "Grass", 1, 0)
    regression_l <- lm(Lsets ~ Hard + Clay + Grass, data = rf_13_14)
    summary(regression_l)

    Call:
    lm(formula = Lsets ~ Hard + Clay + Grass, data = rf_13_14)

    Residuals:
        Min      1Q  Median      3Q     Max
    -0.4074 -0.3980 -0.3980  0.6020  1.6471

    Coefficients: (1 not defined because of singularities)
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  0.35294    0.13934   2.533   0.0124 *
    Hard         0.04502    0.15094   0.298   0.7660
    Clay         0.05447    0.17788   0.306   0.7599
    Grass             NA         NA      NA       NA
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 0.5745 on 139 degrees of freedom
    Multiple R-squared: 0.0007629, Adjusted R-squared: -0.01361
    F-statistic: 0.05306 on 2 and 139 DF, p-value: 0.9483

    plot(regression_l)
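The NA for Grass in the summary above arises because the three dummy columns always sum to 1, which makes them collinear with the intercept, so `lm` drops the last one and Grass becomes the baseline. The sketch below reproduces that behaviour on made-up numbers and shows that passing Surface as a factor, with Grass as the reference level, gives the same estimates without a hand-built dummy for every level. The data are invented for illustration only.

```r
# Invented toy data: surface and sets lost for eight matches
toy <- data.frame(
  Surface = factor(c("Hard", "Hard", "Clay", "Clay", "Grass", "Grass", "Hard", "Clay")),
  Lsets   = c(0, 1, 0, 2, 0, 1, 1, 0)
)

# One 0/1 dummy per surface, as in the analysis above
toy$Hard  <- as.integer(toy$Surface == "Hard")
toy$Clay  <- as.integer(toy$Surface == "Clay")
toy$Grass <- as.integer(toy$Surface == "Grass")

# Intercept plus all three dummies: the last dummy is aliased and reported as NA
fit_dummies <- lm(Lsets ~ Hard + Clay + Grass, data = toy)
print(coef(fit_dummies))

# Equivalent model: let lm build the contrasts, with Grass as the reference level
toy$Surface <- relevel(toy$Surface, ref = "Grass")
fit_factor <- lm(Lsets ~ Surface, data = toy)
print(coef(fit_factor))
```

In both fits the intercept is the mean for Grass, and the Hard and Clay coefficients are differences from that mean.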

In the regression model, the independent variables were all 1s and 0s. As a result, the diagnostic plots showed little variation: the points were one-sided and clustered in a few places. Moreover, the summary showed that the p-value (0.9483) was far too high; in a good model, the p-value would normally be below 0.05 (5%). The adjusted R-squared was also negative. The hypothesis of no relationship therefore could not be rejected: the model gave no evidence that the number of sets lost depended on the surface.

6. Show his Total Wins Per Surface Per Year

2013 and 2014 surface performance summary. To calculate total wins per surface, I added a numeric column equal to 1 for every win and 0 otherwise, and then summed it for each surface.

    rf_13$Number <- ifelse(rf_13$W_L == "Win", 1, 0)
    surface_2013 <- data.frame(tapply(rf_13$Number, rf_13$Surface, sum))
    surface_2013$Year <- 2013
    colnames(surface_2013) <- c("Total Wins", "Year")
    rf_14$Number <- ifelse(rf_14$W_L == "Win", 1, 0)
    surface_2014 <- data.frame(tapply(rf_14$Number, rf_14$Surface, sum))
    surface_2014$Year <- 2014
    colnames(surface_2014) <- c("Total Wins", "Year")
    surface_performance <- rbind(surface_2013, surface_2014)
    # The three labels are recycled across the six rows (three surfaces per year)
    surface_performance$Surface <- c("Clay", "Grass", "Hard")

Surface results in a bar plot by year.

    surface_performance$Year <- as.character(surface_performance$Year)
    surface_combine <- ggplot(data = surface_performance, aes(x = Surface, y = `Total Wins`, group = Year, fill = Year)) +
      ggtitle("Total Wins Per Surface") +
      theme(plot.title = element_text(lineheight = .8, face = "bold")) +
      geom_bar(stat = 'identity', color = 'black', position = "dodge") +
      geom_text(aes(label = `Total Wins`), vjust = 1.5, colour = "white", position = position_dodge(.9), size = 5) +
      scale_fill_manual(values = c("#d8b365", "#5ab4ac"))
    surface_combine

Faceting the bar plot by year shows the result from a different angle.

    surface_combine_facet <- ggplot(data = surface_performance, aes(x = Surface, y = `Total Wins`, group = Year, fill = Year)) +
      geom_bar(stat = 'identity', color = 'red', position = "dodge") +
      geom_text(aes(label = `Total Wins`), vjust = 1.5, colour = "black", position = position_dodge(.9), size = 5) +
      scale_fill_brewer(palette = "Spectral")
    surface_combine_facet + facet_grid(Year ~ .) +
      theme(legend.position = "none",
            strip.text.x = element_text(size = 8, angle = 75),
            strip.text.y = element_text(size = 12, face = "bold"),
            strip.background = element_rect(colour = "red", fill = "#ccccff"))
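The summary table above relies on `tapply` returning the surfaces in alphabetical order and on recycling three labels over six rows. A possible alternative, sketched here in base R on invented rows, is to let `aggregate()` carry the Surface and Year labels along with the sums. The seven toy matches are made up and do not come from the dataset.

```r
# Invented toy matches: year, surface, and result from the player's side
toy <- data.frame(
  Year    = c("2013", "2013", "2013", "2014", "2014", "2014", "2014"),
  Surface = c("Hard", "Clay", "Hard", "Grass", "Hard", "Clay", "Grass"),
  W_L     = c("Win", "Loss", "Win", "Win", "Win", "Win", "Loss"),
  stringsAsFactors = FALSE
)

# 1 for a win, 0 otherwise, as in the analysis above
toy$Win <- as.integer(toy$W_L == "Win")

# Total wins for each Surface/Year pair, with the labels kept in the result
wins <- aggregate(Win ~ Surface + Year, data = toy, FUN = sum)
print(wins)
```

The result is already in the long format that `ggplot` expects, with one row per surface and year. A surface with no matches in a given year simply has no row.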

Conclusion

Roger Federer performed markedly better in 2014 than in 2013 across the statistics examined: sets won and lost, finals won and lost, winning percentage, Grand Slam results, and wins per surface per year. Perhaps the change of racket and coach helped him return to his old self.
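Whether the improvement in winning percentage is significant in the statistical sense can be checked with a two-sample test of proportions. The sketch below is not part of the original analysis. It uses the match counts from the win-loss tables in section 3 (45 wins and 17 losses in 2013, 68 wins and 12 losses in 2014) and the base R function `prop.test`. Treating matches as independent trials is an assumption.

```r
# Match counts from the win-loss tables in section 3
wins    <- c("2013" = 45, "2014" = 68)
matches <- c("2013" = 45 + 17, "2014" = 68 + 12)

# Winning percentage per year, formatted with two decimals
win_pct <- sprintf("%.2f%%", 100 * wins / matches)
names(win_pct) <- names(wins)
print(win_pct)

# Test of equal proportions (null hypothesis: same win rate in both years)
test <- prop.test(x = wins, n = matches)
print(test$p.value)
```

A p-value below the conventional 0.05 threshold would support calling the difference significant; a larger value would mean these two seasons alone do not settle it.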