Building an NFL performance metric

Similar documents
Projecting Three-Point Percentages for the NBA Draft

PREDICTING the outcomes of sporting events

Estimating the Probability of Winning an NFL Game Using Random Forests

Machine Learning an American Pastime

CS 221 PROJECT FINAL

Modeling Fantasy Football Quarterbacks

Rating Player Performance - The Old Argument of Who is Bes

Predicting the Total Number of Points Scored in NFL Games

Lab 11: Introduction to Linear Regression

y ) s x x )(y i (x i r = 1 n 1 s y Statistics Lecture 7 Exploring Data , y 2 ,y n (x 1 ),,(x n ),(x 2 ,y 1 How two variables vary together

Predicting Season-Long Baseball Statistics. By: Brandon Liu and Bryan McLellan

Chapter 12 Practice Test

Running head: DATA ANALYSIS AND INTERPRETATION 1

Predicting the Draft and Career Success of Tight Ends in the National Football League

Pairwise Comparison Models: A Two-Tiered Approach to Predicting Wins and Losses for NBA Games

B. AA228/CS238 Component

2017 PFF RUSHING REPORT

Section I: Multiple Choice Select the best answer for each problem.

Announcements. Lecture 19: Inference for SLR & Transformations. Online quiz 7 - commonly missed questions

Navigate to the golf data folder and make it your working directory. Load the data by typing

Using New Iterative Methods and Fine Grain Data to Rank College Football Teams. Maggie Wigness Michael Rowell & Chadd Williams Pacific University

A Novel Approach to Predicting the Results of NBA Matches

The probability of winning a high school football game.

Simulating Major League Baseball Games

The Reliability of Intrinsic Batted Ball Statistics Appendix

ISDS 4141 Sample Data Mining Work. Tool Used: SAS Enterprise Guide

Lecture 5. Optimisation. Regularisation

Which On-Base Percentage Shows. the Highest True Ability of a. Baseball Player?

a) List and define all assumptions for multiple OLS regression. These are all listed in section 6.5

NCAA Football Statistics Summary

This page intentionally left blank

Apple Device Instruction Guide- High School Game Center (HSGC) Football Statware

Forecasting Baseball

CS 7641 A (Machine Learning) Sethuraman K, Parameswaran Raman, Vijay Ramakrishnan

Pitching Performance and Age

BASKETBALL PREDICTION ANALYSIS OF MARCH MADNESS GAMES CHRIS TSENG YIBO WANG

The factors affecting team performance in the NFL: does off-field conduct matter? Abstract

Evaluating The Best. Exploring the Relationship between Tom Brady s True and Observed Talent

arxiv: v1 [stat.ml] 15 Dec 2017

Deconstructing Data Science

AggPro: The Aggregate Projection System

Competitive Performance of Elite Olympic-Distance Triathletes: Reliability and Smallest Worthwhile Enhancement

Two Machine Learning Approaches to Understand the NBA Data

PENALIZED LOGISTIC REGRESSION TO ASSESS NFL QUARTERBACK PERFORMANCE

intended velocity ( u k arm movements

Additional On-base Worth 3x Additional Slugging?

An Analysis of Factors Contributing to Wins in the National Hockey League

Evaluating and Classifying NBA Free Agents

Predicting Tennis Match Outcomes Through Classification Shuyang Fang CS074 - Dartmouth College

Bayesian Optimized Random Forest for Movement Classification with Smartphones

Deconstructing Data Science

Predictive Model for the NCAA Men's Basketball Tournament. An Honors Thesis (HONR 499) Thesis Advisor. Ball State University.

Minimum Mean-Square Error (MMSE) and Linear MMSE (LMMSE) Estimation

A Network-Assisted Approach to Predicting Passing Distributions

Sports Predictive Analytics: NFL Prediction Model

Predicting NBA Shots

If a fair coin is tossed 10 times, what will we see? 24.61% 20.51% 20.51% 11.72% 11.72% 4.39% 4.39% 0.98% 0.98% 0.098% 0.098%

PREDICTING THE FUTURE OF FREE AGENT RECEIVERS AND TIGHT ENDS IN THE NFL

Pitching Performance and Age

Unit 4: Inference for numerical variables Lecture 3: ANOVA

The Rise in Infield Hits

2. Use the following information to find your course average in Mr. Tats class: Homework {83, 92, 95, 90}

ASTERISK OR EXCLAMATION POINT?: Power Hitting in Major League Baseball from 1950 Through the Steroid Era. Gary Evans Stat 201B Winter, 2010

Driv e accu racy. Green s in regul ation

save percentages? (Name) (University)

Predictors for Winning in Men s Professional Tennis

Novel empirical correlations for estimation of bubble point pressure, saturated viscosity and gas solubility of crude oils

Infield Hits. Infield Hits. Parker Phillips Harry Simon PARKER PHILLIPS HARRY SIMON

Analysis of Variance. Copyright 2014 Pearson Education, Inc.

Rules of Play REDZONE! I. DESCRIPTION OF THE GAME: II. EQUIPMENT INCLUDED: III. PLAY OF THE GAME: IV. HOW TO USE THE GAME EQUIPMENT: 1.

MONEYBALL. The Power of Sports Analytics The Analytics Edge

Honest Mirror: Quantitative Assessment of Player Performances in an ODI Cricket Match

Effect of homegrown players on professional sports teams

Defense dominant as Archbishop Wood rolls past Hollidaysburg in PIAA 5A quarterfinals

Major League Baseball Offensive Production in the Designated Hitter Era (1973 Present)

This file is part of the following reference:

A point-based Bayesian hierarchical model to predict the outcome of tennis matches

Correlation and regression using the Lahman database for baseball Michael Lopez, Skidmore College

Rules of Play REDZONE! I. DESCRIPTION OF THE GAME: III. PLAY OF THE GAME: IV. HOW TO USE THE GAME EQUIPMENT: 1. Game Dice:

NUMB3RS Activity: Is It for Real? Episode: Hardball

Predicting the use of the sacrifice bunt in Major League Baseball BUDT 714 May 10, 2007

PREDICTING THE NCAA BASKETBALL TOURNAMENT WITH MACHINE LEARNING. The Ringer/Getty Images

Algebra I: A Fresh Approach. By Christy Walters

This is Not a Game Economics and Sports

Failure Data Analysis for Aircraft Maintenance Planning

(Under the Direction of Cheolwoo Park) ABSTRACT. Major League Baseball is a sport complete with a multitude of statistics to evaluate a player s

Guide for Statisticians

Lesson 14: Modeling Relationships with a Line

Equation 1: F spring = kx. Where F is the force of the spring, k is the spring constant and x is the displacement of the spring. Equation 2: F = mg

Deriving an Optimal Fantasy Football Draft Strategy

GSIS Video Director s Report ASCII File Layout 2013 Season

Do Clutch Hitters Exist?

Journal of Quantitative Analysis in Sports. Rush versus Pass: Modeling the NFL

EFFICIENCY OF TRIPLE LEFT-TURN LANES AT SIGNALIZED INTERSECTIONS

Evaluating the Performance of Offensive Linemen in the NFL

Predicting Baseball Win/Loss Records from Player. Projections

A history-based estimation for LHCb job requirements

Should bonus points be included in the Six Nations Championship?

Is lung capacity affected by smoking, sport, height or gender. Table of contents

Player Lists Explanation

Transcription:

Building an NFL performance metric Seonghyun Paik (spaik1@stanford.edu) December 16, 2016 I. Introduction In current pro sports, many statistical methods are applied to evaluate player s performance and ability. For example, WAR (wins above replacement) is a very popular measure of the player s performance in major league baseball (MLB). Especially, FWAR (Fangraphs WAR) for pitchers is very simple but powerful since it only uses 3 categories (Homerun, Strikeout and Based on balls) to measure performance of individual pitchers. Unlike baseball, which is easier as to build an individual performance metric due to nature of batter vs hitter match up, American football has more aspect of a team sports, and it s difficult and even can be misleading to evaluate performance based on individual statistics. However, it is still worthwhile to analyze which statistic category (such as passing vs. rushing, total passing yards vs completion rate) is more important to get more scores/win in a game since it can give information which aspect of game a team should fortify to improve their performance. Also, it will be an interesting to analyze how a current performance metric such as passer rate is reflecting to the actual team score. In this project, I will focus on a supervised learning on building a performance metric which is directly related to the team s score, and finally a win. As inputs, all offensive and defensive statistic features except the direct way to get the score (touchdown, field goal, etc.) are included. Then I used different methods (linear regression, weighted regression, random forest and gradient boosting) to output a predicted score of the team and win/loss of a game. II. Previous Work In the last year, a few groups worked on prediction and optimization of a fantasy football team. For example, P. Steenkiste used linear regression and random forest model to optimize a fantasy team [1] and P. Dolan et.al. used linear regression to predict quarterback performance in terms of fantasy points [2]. Although the target variables will be different for this project (game score of a game instead of fantasy points), both projects should deal with regression problems using similar inputs. III. Data Acquisition and feature selection 1. Data acquisition and processing Individual NFL game statistics in 2013 is obtained from on the web [3]. Since each 32 teams will have 16 games in the regular season, the raw data contains data for 256 games. Two features are selected for my prediction variables - the scored points for each team and win/loss of the games. I ve included additional 5 features (yard/attempt, pass completion percentage, etc.) since they could provide more intuitive analysis and traditional metric is also based on them. From the 512 team score samples, I set aside 0 samples for training and validation, and the remainder (112 samples) for testing. For prediction of win/loss, I selected 200 game results for training and validation, and remaining 56 results are reserved for testing. 2. Feature selection There were 28 features in the data set, but using all of the features can result in overfitting. I therefore explored the use of feature selection to shrink the number of input features. To select the most important features to keep, I ran sequential forwardbased feature selection on the training examples.

1 First Down 2 Passyard/Attempt 3 Turnover from opponent 4 Turnover given up 5 Rushyard/Attempt 6 Yard loss by sack 7 Penalty yard from opponent 8 Yard loss by penalty Table 1. Top features for a team score Features were selected based on their mean squared error, using 5-fold-cross-validation. The features with top importance listed in the table 1, ordered by their importance. MSE from using different numbers of features are described in figure 1. It turns out using all of the features doesn t result in overfitting, but having similar performance comparing using top 8 features. 30 20 10 0 Top 1 Top 2 Top 3 Top 5 Top 8 All Figure 1. MSE vs. number of features included IV. Methods 1. Linear Regression I suppose my learning task as a regression problem: given a processed list of features, I would like to predict scores of a team. Linear regression is a natural choice of baseline model for regression problems, so I first ran a linear regression using the top 8 features. To reduce the variance of the model, I first bootstrapped my training samples (make 100 cross validation sets by random sampling from the training data) and ran the linear regression model each time, and then used a weighted average, where the weights are inversely proportional to the model s cross validation error. The obtained model is (score) = 12.09 + c i f i The coefficients and features are described in the table 2. MSE of this model is 36.7 points 2, equivalent to 6.05 point RMSE (root mean squared error). fi 1 First Down 0.736 2 Passyard/Attempt 2.185 3 Turnover from opponent 2.629 4 Turnover given up -1.472 5 Rushyard/Attempt 1.229 6 Yard loss by sack -0.105 7 Penalty yard from opponent 0.037 8 Yard loss by penalty -0.022 Table 2. Coefficients for the linear model 2. Weighted linear regression To capture local nonlinearities between the features and the score, I tried local weighting to the cost features for our linear regression model. Using the Euclidean norm, my weight function for a training example x(i) with respect to an input example x was: x(i) x 2 w(i) = exp ( τ 2 ) To make the Euclidean distance (the norm in the equation above) meaningful, I standardized features to zero mean and unit variance prior to computing the weights. The parameter τ in the weighting function is tried 0.5 < τ < 100, but I couldn t find a noticeable performance change in terms of MSE compared to the simple linear regression model. i ci

MSE MSE MSE 90 weighted linear regression Random Forest Performance 30 20 0 10 20 30 tau 0 10 20 30 Number of Trees Figure 2. MSE vs.τ in weighted linear model As described in figure 2, MSE is almost identical when τ 2, but trended to increase sharply as τ reduces when τ < 2. 3. Random Forest Random Forest method is one of the tree-based model. It builds a number of decision trees on bootstrapped training samples, but only consider a random sample of all possible predictors when split the data set so that each trees are decorrelated[4]. By taking average of the resulting trees, we can reduce the variance of the model and the model becomes more reliable. An ensemble method that used a weighted sum of multiple random forest model, where each of the random forest models was fit using different parameter settings is used. The weights were inversely proportional to the models crossvalidation error. The MSE curve vs. number of the bootstrapped tree is plotted in the figure 3. As described in the figure, the MSE of the model is saturated at around 48, about 33% higher than the linear regression model. 4. Gradient boosting Gradient boosting is another tree-based method, but unlike the random forest model, the trees are grown sequentially without using bootstrap sampling. In this model, we apply a slow learning rate and update the result weighted by the learning rate, and Figure 3. MSE vs. number of trees in random forest model iteratively fit the decision tree to the residual from the model. Each of these trees are rather small with a few terminal node. In this project, I ran a gradient boosting regression with tree depth of 1 and 2. Learning rate of 0.02 is applied. The MSE vs. number of iteration is plotted in the figure 4 below. The MSE is saturated after 0 runs of the iterations. The performance of the gradient boosting is comparable to the random forest method, about 33% higher than the linear regression model. 200 1 100 0 Performance of boosting 0 200 0 0 Run of the iterations Tree Depth = 1 Tree Depth = 2 Figure 4. MSE vs. number of iteration in boosting model

V. Results Table 3 shows the test set performance of the optimized (with respect to the training set) model. The primary error metric is RMSE, which penalizes larger deviations superlinearly. For the best model, linear regression, the R 2 measure between predicted score and actual values in the test set was 0.6447. MSE(point 2 ) RMSE(point) Linear Regression 36.652 6.054 Random Forest 48.379 6.956 Boosting 47.14 6.866 Table 3. Error for the scores across all models This model is also used to predict the outcome (win/loss) of the test samples based on calculated score prediction, and obtained 88.2% accuracy. VI. Discussion 1. Passer rate As a popular traditional metric to evaluate performance of a quarterback, passer rating has a formula of [5] (rate) = (PCT 0.3) 83.3 + (YPA 3) 4.167 + ( TD ) 333.3 (INT ) 416.7 A developed linear model from the training data, using the same features except the touch down rate in the passer rate is (score) = (PCT) 4.92 + (YPA) 2.53 + ( INT ) opp 55.3 (INT).4 + 3.45 Comparing two expressions, it turns out the passer rating overestimates the importance of pass completion percentage and intercept rate by factor of 10 and 2, compared to average pass yard. In other words, pass completion rate is not an important indicator to a team score, but has a large weight when evaluating the passer rate. 2. First down, pass vs rush Number of the first down a team gets in a game is the strongest indicator of the score, but it s not included in the passer rate or fantasy point for running back or wide receiver. Instead, they include the number of touchdown scored, which is directly related to the score. However, when it comes to evaluate the performance of a player, touchdown is not a good metric since it does not credit previous plays which contribute to. Also current metrics credit a touchdown too significantly (equivalent to yard in passer rate, 100 yard for quarter back and yard for running back and wide receiver in fantasy point system.) Comparison of average pass and rush yard per attempt showed that pass average per attempt is much strong indicator related the team score. My model with 3 times bigger coefficient for the average pass yard, when all the coefficients are normalized to compare. 3. Building a new passer rate Based on the discussion above, I suggest a new metric to evaluate quarterback performance using three most important features (first down, average pass yard and intercept rate) (rate) = ( firstdown ) 31.5 + ( PassYard ) 2.07 ( INT ) 110 VII. Conclusion and future work In this project, I focused on predicting NFL team score based on features not directly related to the score itself. Linear regression shows better estimate of the score than two tree based models, random forest and gradient boosting. The result of this project can be used to compare the importance of each NFL stat categories. It turns out that number of the first down, which is not included in either passer rate or fantasy point evaluation, is the strongest indicator on a team s score. On the other hand, pass completion rate, which is one of the four component

in evaluation of passer rate, affects very little on the team s score. From the developed model, the effect of 10% pass completion difference in scoring is equivalent to merely 0.2 yard/pass attempt. As a future work, this model can be extended to actually predict the outcome of the future games. In order to do this, we will need to build a model which predict each stat categories based on the data of previous games of a team. References [1] Paul Steenkiste (2015), Finding the Optimal Fantasy Football Team, Stanford CS229 final report. [2] Peter Dolan et.al.(2015), Machine Learning for Daily Fantasy Football Quarterback Selection CS229 final report [3] https://www.statcrunch.com/app/index.php? dataid=1903988. [4] Gareth James et.al., (2013) An Introduction to Statistical Learning with Applications in R Springer [5]http://www.nfl.com/help/quarterbackratingform ula