PENALIZED LOGISTIC REGRESSION TO ASSESS NFL QUARTERBACK PERFORMANCE


Peters, Benjamin G

4/26/2016

Introduction

The premier professional American football league is the National Football League (NFL). Over the past decade, there appears to have been a large emphasis on the quarterback position. For example, NFL quarterbacks are often assigned a win-loss record similar to that of baseball pitchers or hockey goaltenders. The quarterback position is the only position in the NFL to be assigned a win-loss record. As can be seen in Figure 1, the emphasis on the quarterback position is understandable.

[Figure 1: Team Yearly Average. Scatterplot of Cmp, Att, Yds, TD vs. Year, 1960-2020.]

Figure 1 shows that over the years there has been a general upward trend in the yearly passing attempts, completions, yards and touchdowns per team. (It should be noted that the 1970s have a reputation for being a time period when running backs were most prominent. As a result, teams passed much less in this decade than in any other decade since 1960.) With the increasing trend in passing, it makes sense that quarterback would be the highest paid position in the league. In the NFL, quarterbacks average over $3.8 million. With that much money being paid to quarterbacks, there is heavy responsibility on a team's quarterback to perform.

The goal of this study is to find quarterback performance metrics (pass completion percentage, yards, touchdowns, interceptions, etc.) that are indicators of a team's probability of winning a game and use these important metrics to determine a quarterback's contribution to their team's chances of winning.

There have been a few metrics introduced to measure quarterback performance. This paper discusses two of them. The first is the passer rating [1]. The passer rating was adopted by the NFL in 1973. It is a function of pass completions per attempt, passing yards per attempt, passing touchdowns per attempt, and interceptions per attempt. The metric ranges from 0 to 158.3.
The metric fails to account for game situations and for other variables for which the quarterback may be responsible. The other metric is Total Quarterback Rating (Total QBR) [2]. This metric was designed by ESPN's Stats & Information Group to measure how a quarterback's play contributes to scoring points and winning. ESPN has not released an actual formula for this metric. While the metric has the benefit of easy interpretability (scale 0-100), its computation is very detailed and is dependent on data not necessarily obtainable by the general public.

In this study, a penalized logistic regression is used to determine weights for several common game statistics. These weights can then be applied to a team's per-game averages to determine a value which reflects the team's chances of winning a game. This value is broken up between the quarterback and the rest of the team based on how responsible the quarterback is for certain statistics. As a result, the quarterback's contribution to their team's chances of winning can be determined. The method has the benefit of being easy to use while utilizing all the statistics a quarterback is directly responsible for.

The next section provides a brief overview of logistic regression and penalized logistic regression. The model is then developed. Then, using the fitted coefficients of the regression model, the aforementioned value is developed and a small demonstration with 4 teams is shown. The report ends with some ideas for improvement.

Logistic Regression

In ordinary least squares regression, given a set of continuous observations $y_i \in \mathbb{R}$, $i = 1, \dots, n$, and a feature matrix $X \in \mathbb{R}^{n \times (p+1)}$, the objective is to fit a model to the conditional expectation of the response given the set of features. Typically, the model is assumed to be of the linear form $\beta_0 + \sum_{j=1}^{p} \beta_j x_j$. In order to fit this model, the least squares approach looks to minimize the residual sum of squares, which results in the closed form solution $\hat{\beta} = (X^T X)^{-1} X^T y$, where $\beta = (\beta_0, \beta_1, \dots, \beta_p)$ and $y = (y_1, \dots, y_n)$. In logistic regression, the response is now binary (0, 1).
Therefore, the normal least squares approach is not appropriate as it assumes the response can take on any real value. Instead of modeling the response, we can model the probability that the response takes on one of the two options. However, probabilities are limited to being between 0 and 1. Therefore least squares is still not appropriate. Instead we can use the logit transform of the conditional probability that the response is 1 given a set of features. If we let $p(x) = P(Y = 1 \mid X = x)$, we can model the logit of $p(x)$ as:

$$\log\left(\frac{p(x)}{1 - p(x)}\right) = \beta_0 + \sum_{j=1}^{p} \beta_j x_j$$

Solving for $p(x)$ yields

$$p(x) = \frac{1}{1 + \exp\left[-\left(\beta_0 + \sum_{j=1}^{p} \beta_j x_j\right)\right]}$$

Typically, the parameters $\beta$ are fitted by maximizing the log-likelihood function:

$$l(\beta) = -\sum_{i=1}^{n} \left[(1 - y_i)\left(\beta_0 + \sum_{j=1}^{p} \beta_j x_{ij}\right) + \log\left(1 + \exp\left(-\left(\beta_0 + \sum_{j=1}^{p} \beta_j x_{ij}\right)\right)\right)\right]$$
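As a quick check on the logit model and log-likelihood above, the following Python sketch evaluates the log-likelihood in the form written in the text and compares it with the usual Bernoulli form, $\sum_i [y_i \log p_i + (1 - y_i)\log(1 - p_i)]$. The function names, the toy data, and the coefficient values are all hypothetical and chosen only for illustration; they are not taken from the study.

```python
import math

def linear_predictor(beta0, beta, x):
    """Return beta0 + sum_j beta_j * x_j for one observation."""
    return beta0 + sum(b * xj for b, xj in zip(beta, x))

def win_probability(beta0, beta, x):
    """Inverse logit: p(x) = 1 / (1 + exp(-(beta0 + sum_j beta_j x_j)))."""
    return 1.0 / (1.0 + math.exp(-linear_predictor(beta0, beta, x)))

def log_likelihood(beta0, beta, X, y):
    """Log-likelihood in the form used in the text:
    -sum_i [(1 - y_i) * eta_i + log(1 + exp(-eta_i))]."""
    total = 0.0
    for xi, yi in zip(X, y):
        eta = linear_predictor(beta0, beta, xi)
        total -= (1 - yi) * eta + math.log(1.0 + math.exp(-eta))
    return total

def log_likelihood_bernoulli(beta0, beta, X, y):
    """Equivalent Bernoulli form: sum_i [y_i log p_i + (1 - y_i) log(1 - p_i)]."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = win_probability(beta0, beta, xi)
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total

# Hypothetical toy data: two standardized features, four games (1 = win).
X = [[0.5, -1.2], [-0.3, 0.8], [1.1, 0.4], [-1.0, -0.6]]
y = [1, 0, 1, 0]
beta0, beta = 0.0, [0.9, -0.4]  # illustrative values, not fitted coefficients

print(log_likelihood(beta0, beta, X, y))
print(log_likelihood_bernoulli(beta0, beta, X, y))
```

The two printed values should agree up to floating-point rounding, since $\log p = -\log(1 + e^{-\eta})$ and $\log(1 - p) = -\eta - \log(1 + e^{-\eta})$ for linear predictor $\eta$.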

While no closed form solution exists, the parameters can be fit using Newton's method or gradient descent, among other iterative approaches. However, the interest is in performing variable selection. Therefore, a slight modification is made.

Penalized Logistic Regression: Elastic Net

Just as in ordinary least squares regression, we can apply a penalty to the log-likelihood function such as:

$$l_p(\beta) = -\sum_{i=1}^{n} \left[(1 - y_i)\left(\beta_0 + \sum_{j=1}^{p} \beta_j x_{ij}\right) + \log\left(1 + \exp\left(-\left(\beta_0 + \sum_{j=1}^{p} \beta_j x_{ij}\right)\right)\right)\right] - \lambda P_\alpha(\beta)$$

where $P_\alpha(\beta) = \frac{1 - \alpha}{2}\|\beta\|_2^2 + \alpha\|\beta\|_1$ [3]. It is not difficult to see that when $\alpha = 1$, the penalty is the same as the LASSO penalty. As $\alpha \to 0$, the penalty approaches the ridge penalty. The result of maximizing this penalized log-likelihood is $\beta_j$'s that shrink to zero as $\lambda$ grows. The $\alpha$ term acts to shrink parameters that are correlated. In the problem for this project, there are several features and some are correlated. Therefore, this penalty should reduce the number of significant features while shrinking the coefficients of the remaining features to account for correlations.

Data Collection

For this project, 170 games were sampled over the course of 5 seasons (2010-2014). The NFL season is broken down into 17 weeks. Every team plays 16 games and has one bye week (a week off). For each of the 17 weeks in each season, a winning team and a losing team were chosen randomly, giving 17 × 5 × 2 = 170 observations. Data was collected from Pro-Football-Reference.com [4] and The Football Database [5]. Initially, 46 features were considered. These features include pass yards, rushing yards, passing touchdowns, etc. The full list can be seen in Table 1. The response $y_i$ is the result of the game for the team of interest. Due to the method of sampling, there are an equal number of wins (1) and losses (0). It should be noted that a tie is possible. However, there were no ties in the sampled games. If there were, the ties would be lumped with the losses and be considered "Not Win".
After creating a 170-row matrix with 46 feature columns, the features are standardized into z-scores so that the variable selection removes features based on their importance and not because of relative scale. For example, the number of passing touchdowns will be much smaller than the number of passing yards, since it is typical for a team to pass for more than 200 yards while scoring 2 touchdowns. Without standardization, the coefficient for yards may be very small, as a unit change in yards would not lead to a large change in the probability of winning compared to a unit change in touchdowns.

Regression Model

In order to fit the penalized logistic regression, the penalty terms $\alpha$ and $\lambda$ must be chosen. $\alpha$ is chosen to be 0.75 in order to lean towards variable selection. The penalty weight $\lambda$ is chosen using 10-fold cross validation. The MATLAB function lassoglm performs this operation. MATLAB looks to find the $\lambda$ which minimizes the cross-validated deviance, which is the same as maximizing the log-likelihood. The result is shown in Figure 2.
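The standardization step and the elastic net penalty can be sketched in a few lines of Python. This is only an illustration of the quantities involved, not a reproduction of lassoglm: it evaluates the penalized objective for given coefficients rather than fitting or cross-validating anything. The function names and the four-row toy matrix are hypothetical, the use of the sample standard deviation (n - 1 denominator) is an assumption, and lassoglm may scale the likelihood term differently (for example by the number of observations), so $\lambda$ values here are not directly comparable to its output.

```python
import math

def standardize_columns(rows):
    """Convert each feature column to z-scores.
    Assumption: sample standard deviation (n - 1 denominator)."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / (n - 1))
            for c, m in zip(cols, means)]
    z = [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]
    return z, means, stds

def elastic_net_penalty(beta, alpha):
    """P_alpha(beta) = (1 - alpha)/2 * ||beta||_2^2 + alpha * ||beta||_1."""
    l1 = sum(abs(b) for b in beta)
    l2_squared = sum(b * b for b in beta)
    return (1.0 - alpha) / 2.0 * l2_squared + alpha * l1

def penalized_log_likelihood(beta0, beta, X, y, lam, alpha):
    """Log-likelihood from the text minus lam * P_alpha(beta)."""
    ll = 0.0
    for xi, yi in zip(X, y):
        eta = beta0 + sum(b * v for b, v in zip(beta, xi))
        ll -= (1 - yi) * eta + math.log(1.0 + math.exp(-eta))
    return ll - lam * elastic_net_penalty(beta, alpha)

# Hypothetical raw features per game: [passing yards, passing touchdowns].
rows = [[210.0, 2.0], [305.0, 3.0], [180.0, 1.0], [250.0, 2.0]]
y = [1, 1, 0, 0]
z, means, stds = standardize_columns(rows)

# alpha = 0.75 and lambda = 0.0114 are the values reported in the text;
# the coefficients below are illustrative, not the fitted ones.
print(penalized_log_likelihood(0.0, [0.3, 0.5], z, y, lam=0.0114, alpha=0.75))
```

After standardization both columns have mean 0 and unit spread, so a coefficient on yards and a coefficient on touchdowns are penalized on the same footing, which is the point made in the paragraph on z-scores.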

[Figure 2: Cross Validation. Cross-validated deviance of the elastic net fit (Alpha = 0.75), plotted as Deviance vs. Lambda.]

The result is $\lambda = 0.0114$. The model is refit using the two parameters. The results are shown in Table 1.

Table 1: Features and their Coefficients

If the exponential of a coefficient is taken, the result can be interpreted as the multiplicative change in the odds of winning for a unit change in the corresponding standardized variable. Since the data is standardized, $\beta_0 = 0$. With the coefficients fitted, the next step is to use these coefficients to measure the influence a quarterback has on the result of a game.

Quarterback Influence

From Table 1, the significant features related to the quarterback position are Pass Att, Pass Yards, Pass TD, Rush Att, Rush TD, Turnovers, O TO Return TD, O Safeties, and Penalties. Note that the sacks features could be attributed to the quarterback, but for this study they are attributed to the offensive line (5 players protecting the quarterback). It is not surprising that scoring touchdowns (TD) would lead to a higher chance of winning. However, it is interesting that Passing Attempts (Pass Att) is negatively weighted. This could be due to the fact that when teams fall behind they tend to pass more frequently in an attempt to catch up. As a result they have a large number of passing attempts, but they still lose.

While rushing attempts, yards, and touchdowns are typically associated with the running back position, quarterbacks are also involved in this part of the game. While few will contribute a large amount, the rushing production of the quarterback should be considered. The strong weight for rushing attempts likely reflects the tendency of teams to rely on running the ball while they are ahead to keep the clock ticking down.

In order to separate the influence of the quarterback on the outcome of the game from that of the rest of the team, a proportion of each feature is assigned to the quarterback based on their average production relative to the team's average production. Take this excerpt from the data as an example: The top row shows the names of the features. The second row shows the 2014 season totals for each feature and the third row shows quarterback Tom Brady's 2014 season totals.
The final row is the proportion of each feature assigned to Tom Brady. In the analysis, the totals are divided by 16 to yield the per-game average. These per-game averages are standardized using the means and standard deviations for each respective feature from the 2010-2014 game data. Then the appropriate proportion of each standardized feature is allocated to the quarterback. Using the weights ($\beta_j$'s), a linear combination, $X\beta$, of the standardized features can be calculated for each team and, by extension, each quarterback. For simplicity, this is defined as the Score. A Score greater than 0 is favorable as it corresponds to the chance of winning being greater than 0.5. A negative value indicates either sub-par passing statistics or a propensity to commit turnovers or, worse, to have those turnovers returned for touchdowns. By subtracting the quarterback's Score from the total team Score, the remaining team contribution is determined. Since the quarterback does not play defense or kick field goals, the remaining team contribution consists primarily of the defense and kicking features along with the rest of the rushing features. A graphical representation of this is shown in Figure 3.
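The Score calculation described above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function names are hypothetical, only four features are used, and every number below (weights, means, standard deviations, per-game averages, and quarterback proportions) is invented for the example. None of them are the Table 1 coefficients or actual 2014 statistics.

```python
import math

def score_split(team_avg, qb_share, means, stds, beta):
    """Return (team Score, quarterback Score, remaining team contribution).

    Each per-game average is standardized with the supplied mean and standard
    deviation, weighted by its coefficient, and the quarterback is credited
    with the stated proportion of each weighted, standardized feature.
    """
    team_score = 0.0
    qb_score = 0.0
    for name, weight in beta.items():
        z = (team_avg[name] - means[name]) / stds[name]
        contribution = weight * z
        team_score += contribution
        qb_score += qb_share.get(name, 0.0) * contribution
    return team_score, qb_score, team_score - qb_score

def win_probability(score):
    """Convert a Score (linear predictor) to a probability via the inverse logit."""
    return 1.0 / (1.0 + math.exp(-score))

# All values below are hypothetical and for illustration only.
beta = {"Pass Yards": 0.4, "Pass TD": 0.6, "Turnovers": -0.8, "Rush Att": 0.5}
means = {"Pass Yards": 240.0, "Pass TD": 1.5, "Turnovers": 1.6, "Rush Att": 27.0}
stds = {"Pass Yards": 80.0, "Pass TD": 1.1, "Turnovers": 1.3, "Rush Att": 8.0}
team_avg = {"Pass Yards": 260.0, "Pass TD": 2.0, "Turnovers": 1.0, "Rush Att": 28.0}
qb_share = {"Pass Yards": 0.98, "Pass TD": 1.0, "Turnovers": 0.7, "Rush Att": 0.08}

team, qb, rest = score_split(team_avg, qb_share, means, stds, beta)
print(team, qb, rest)
print(win_probability(team))
```

Because the quarterback's Score and the remaining team contribution sum to the team Score by construction, a Score of exactly 0 maps to a win probability of 0.5, and any positive team Score maps to a probability above 0.5.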

[Figure 3: AFC East Scores]

The graph shows the results for the Eastern division of the American Football Conference of the National Football League, known colloquially as the AFC East. It consists of 4 teams: the New England Patriots, Buffalo Bills, Miami Dolphins, and New York Jets. The results indicate how quarterback play can affect a team's win probability. The obvious example is Tom Brady's large Score contributing to his team's large overall Score. On the other hand, Kyle Orton's performance appears to hinder his team, which otherwise has a rather good Score. The quarterback's Score and the remaining team contribution can be added together and converted to a probability:

$$p(x) = \frac{1}{1 + \exp[-(X\beta)]}$$

For these four teams the results are shown in Table 2.

Table 2: Summary of AFC East

The Score can provide a way for teams to assess their quarterback. It can also be used by the players themselves to gauge their relative contribution. Players performing at a high level are expected to have a high Score. Thus the metric can be used as a bargaining tool when the time comes for contract negotiations.

Conclusion

As time has progressed, the focus on the passing game has increased immensely. Whereas running backs were seen as the premier position in the 1970s, that title now belongs to the quarterback. On average, quarterbacks are the highest paid players in the league and their passing statistics have been steadily increasing. The focus of this project was to 1) determine the metrics recorded during an NFL game that are most important for predicting whether a team wins and 2) use these important metrics to determine a quarterback's contribution to their team's chances of winning. The resulting Score can be used in several ways. It can be used to rank quarterbacks, providing a list for those interested in discussions of which player is better. More importantly, it can be used by teams and players alike to measure performance, which plays a part in the contracts the players sign.

There are some drawbacks to the overall design of the model. The model does not account for game situation. A player can accumulate yards or even touchdowns late in the game when the result is already decided, and these will count the same as if they had been gained while the outcome was still in doubt. In addition, the data collected for the graph in the previous section does not account for games missed. For example, Geno Smith of the New York Jets did not play in two games during the 2014 season. Therefore, his averages cover only fourteen games, not sixteen. In order to account for this, the overall team values should only include the games that he played in. However, it is more challenging to get the data filtered in such a way.

In order to expand on this project, a more detailed look at player statistics can be used. Ignored factors such as game situation can be used to weight performance. Using the example of the player accumulating stats that do not affect the game, those stats can be weighted lightly to give a fair representation of the player's accomplishments. In addition, the finances of the player can be accounted for. Based on performance, the amount of money a player is expected to be paid when it is time for a new contract can be studied using the Score defined in this study.

References

[1] "Passer Rating." Wikipedia. Wikimedia Foundation. Web. 26 Apr. 2016.
[2] "Total Quarterback Rating." Wikipedia. Wikimedia Foundation. Web. 26 Apr. 2016.
[3] "Documentation." Lasso or Elastic Net Regularization for Generalized Linear Model Regression. Web. 26 Apr. 2016.
[4] "Pro Football Reference." Pro-Football-Reference.com. Web. 26 Apr. 2016.
[5] "Football Statistics and History The Football Database." FootballDB.com. Web. 26 Apr. 2016.