Rating Player Performance - The Old Argument of Who is Bes

Similar documents
Building an NFL performance metric

Lesson 2 Pre-Visit Slugging Percentage

Chapter. 1 Who s the Best Hitter? Averages

Modeling Fantasy Football Quarterbacks

Clutch Hitters Revisited Pete Palmer and Dick Cramer National SABR Convention June 30, 2008

NFL Player Evaluation Using Expected Points Added with nflscrapr

1: MONEYBALL S ECTION ECTION 1: AP STATISTICS ASSIGNMENT: NAME: 1. In 1991, what was the total payroll for:

Do Clutch Hitters Exist?

How percentages are used in sports

Looking at Spacings to Assess Streakiness

2. Use the following information to find your course average in Mr. Tats class: Homework {83, 92, 95, 90}

George F. Will, Men at Work

A Critical Re-Evaluation of the Career of Sam Baugh. Glenn Gerstner St. John s University PCA/ACA Conference March 28, 2013

A Reproducible Method for Offensive Player Evaluation in Football

Draft - 4/17/2004. A Batting Average: Does It Represent Ability or Luck?

Major League Baseball Offensive Production in the Designated Hitter Era (1973 Present)

Since the National League started in 1876, there have been

One could argue that the United States is sports driven. Many cities are passionate and

2014 NATIONAL BASEBALL ARBITRATION COMPETITION ERIC HOSMER V. KANSAS CITY ROYALS (MLB) SUBMISSION ON BEHALF OF THE CLUB KANSAS CITY ROYALS

When Should Bonds be Walked Intentionally?

CS 221 PROJECT FINAL

Navigate to the golf data folder and make it your working directory. Load the data by typing

Lesson 1 Pre-Visit Batting Average

Evaluating The Best. Exploring the Relationship between Tom Brady s True and Observed Talent

Finite AMDM Name Unit 1 Review Date

Triple Lite Baseball

Predicting the Total Number of Points Scored in NFL Games

Running head: DATA ANALYSIS AND INTERPRETATION 1

Additional On-base Worth 3x Additional Slugging?

DO YOU KNOW WHO THE BEST BASEBALL HITTER OF ALL TIMES IS?...YOUR JOB IS TO FIND OUT.

Kurt Warner. Quarterback 6-2, 220 Northern Iowa St. Louis Rams, 2004 New York Giants, Arizona Cardinals (12 playing seasons)

95 KYLE WILLIAMS. p CAREER HIGHS WILLIAMS CAREER STATISTICS

Final statistics October 9, 1960 Oakland Raiders at Dallas Texans. Cotton Bowl Dallas, Texas. Temperature 82 Humidity 60. Precipitation None

2017 PFF RUSHING REPORT

The ULTIMATE Bowlbound! Solitaire System. (and by cracky, it had better be the last!) by John Houston

Lab 11: Introduction to Linear Regression

Cleveland Urban News.Com Sports. Written by Kathy Thursday, 08 August :21 -

The following are post-game notes from the Detroit Lions win against the Green Bay Packers at Lambeau Field on Monday, November 6, 2017.

NUMB3RS Activity: Is It for Real? Episode: Hardball

Tampa Bay. Buccaneers Recap

PLAYOFF RACES HEATING UP AS NFL SEASON ROLLS ON

SCOUTING REPORT ALEX KARRAS. Updated: March 13,

2014 Tulane Baseball Arbitration Competition Eric Hosmer v. Kansas City Royals (MLB)

Hitting with Runners in Scoring Position

An average pitcher's PG = 50. Higher numbers are worse, and lower are better. Great seasons will have negative PG ratings.

Can Ryan's upstart Falcons stop Brady's juggernaut Patriots?

2014 National Baseball Arbitration Competition

GUIDE TO BASIC SCORING

Lesson 3 Pre-Visit Teams & Players by the Numbers

The final set in a tennis match: four years at Wimbledon 1

First, we will take note of two histograms ("percent frequency histograms" more

THE HISTORY THIS IS FLYER BASEBALL THE PLAYERS THE COACHES 2007 IN REVIEW FLYER HISTORY THE UNIVERSITY TIME WARNER CABLE STADIUM 2008 DAYTON BASEBALL

Lorenzo Cain v. Kansas City Royals. Submission on Behalf of the Kansas City Royals. Team 14

Correlation and regression using the Lahman database for baseball Michael Lopez, Skidmore College

Which On-Base Percentage Shows. the Highest True Ability of a. Baseball Player?

Speak to a football fan long enough and the conversation will cease to cover simple questions

Joe Flacco (Today's Great Quarterbacks) By Ryan Nagelhout READ ONLINE

One of the most-celebrated feats

Pitching Performance and Age

VOL. XV; NO. 10 GREEN BAY, SEPT. 24, 2013 BYE WEEK

Seattle Mariners (15-15) 4, Oakland Athletics (19-13) 2 May 5, 2014

NCAA Football Statistics Summary

Predicting the Draft and Career Success of Tight Ends in the National Football League

An Analysis of the Effects of Long-Term Contracts on Performance in Major League Baseball

2015 NATIONAL BASEBALL ARBITRATION COMPETITION

It s conventional sabermetric wisdom that players

CALIFORNIA Golden Bears

Offensive & Defensive Tactics. Plan Development & Analysis


4th Down Expected Points

Tom Brady is the Most Overrated Quarterback In NFL History

AMERICA S MAIL LEAGUE CONSTITUTION. Updated October 13, (2011 points of emphasis or changes in BOLD)

JEROD MAYO Round 1, No. 10 overall Linebacker University of Tennessee Volunteers 6-1, 242 lbs Hampton, Virginia Kecoughtan High School

The probability of winning a high school football game.

Measuring Batting Performance

COWBOYS & PEPSI HEAD COACHES

Legendre et al Appendices and Supplements, p. 1

A New Measure of Baseball Batters Using DEA

2013 Tulane National Baseball Arbitration Competition

The difference between a statistic and a parameter is that statistics describe a sample. A parameter describes an entire population.

Quasigeometric Distributions and Extra Inning Baseball Games

This page intentionally left blank

EDGAR MARTINEZ: HALL OF FAME CANDIDATE

2013 National Baseball Arbitration Competition

SPORTING LEGENDS: JOE NAMATH

JULY 2012 GREYSCALE Mathletics_Workbook_6-8.indd 1

Machine Learning an American Pastime

How Well They Play the Game. Ken Cale

a) List and define all assumptions for multiple OLS regression. These are all listed in section 6.5

JIMMY GAROPPOLO EASTERN ILLINOIS

Right Questions, Wrong Answers -- A Review of The Wages of Wins

Welcome to Westford Flag Football!

The SUV Statistic for Baseball and Football Situational Underlying Value SUV Baseball Expected Runs/Chance of Scoring Table

Banks, Ernie. Belt, Brandon. Bochy, Bruce. Bowman, NaVorro. Bumgarner, Madison

An Analysis of Factors Contributing to Wins in the National Hockey League

Coach s Clipboard

JIMMY GAROPPOLO EASTERN ILLINOIS

September 29, New type of file on Retrosheet website. Overview by Dave Smith

JIMMY GAROPPOLO EASTERN ILLINOIS

The Use of Rank-sum Ratio on the Attack and Defensive Ability of the Top 8 Men s Basketball Competition in the 16th Asian Games

Transcription:

Rating Player Performance - The Old Argument of Who is Best Ron Gould Emory University October 28-29, 2011

Who is the best quarterback in the NFL?

Who is the best quarterback in the NFL? Tom Brady,

Who is the best quarterback in the NFL? Tom Brady, Aaron Rodgers,

Who is the best quarterback in the NFL? Tom Brady, Aaron Rodgers, Drew Breeze,

Who is the best quarterback in the NFL? Tom Brady, Aaron Rodgers, Drew Breeze, Philip Rivers,

Who is the best quarterback in the NFL? Tom Brady, Aaron Rodgers, Drew Breeze, Philip Rivers, Peyton Manning

Who is the best quarterback in the NFL? Tom Brady, Aaron Rodgers, Drew Breeze, Philip Rivers, Peyton Manning Michael Vick,

How do you attack such a question Most attempts at player rating (or evaluation) have been done via Linear Weight Systems. Here, you identify various aspects of the game, say x 1, x 2,..., x k and what you wish to see produced, say Y, and you seek constants c 0, c 1,..., c k so that Y = c 0 + c 1 x 1 + c 2 x 2 +... + c k x.

First, we are interested in how quarterbacks are presently rated. In particular, we are interested in questions about

First, we are interested in how quarterbacks are presently rated. In particular, we are interested in questions about How the rating is done.

First, we are interested in how quarterbacks are presently rated. In particular, we are interested in questions about How the rating is done. How it might be modified.

First, we are interested in how quarterbacks are presently rated. In particular, we are interested in questions about How the rating is done. How it might be modified. What other things might be important in building any such formula.

History: Mid 1930 s, NFL begins keeping player stats; Passing Leader is: the one with the most yards.

History: Mid 1930 s, NFL begins keeping player stats; Passing Leader is: the one with the most yards. 1938-1940, highest completion percentage.

History: Mid 1930 s, NFL begins keeping player stats; Passing Leader is: the one with the most yards. 1938-1940, highest completion percentage. 1941, Special point system: 1 to 10 points based on your place among the top 10 in each of 6 categories including: TD passes, yards, interception percentage, etc.

History: Mid 1930 s, NFL begins keeping player stats; Passing Leader is: the one with the most yards. 1938-1940, highest completion percentage. 1941, Special point system: 1 to 10 points based on your place among the top 10 in each of 6 categories including: TD passes, yards, interception percentage, etc. For the next 30 years they kept changing from one system to another.

The big change Early 1970s, a special study committee headed by:

The big change Early 1970s, a special study committee headed by: Don Smith of the Pro Football Hall of Fame;

The big change Early 1970s, a special study committee headed by: Don Smith of the Pro Football Hall of Fame; Seymour Siwoff of Elias Sports Bureau (the official NFL statisticians);

The big change Early 1970s, a special study committee headed by: Don Smith of the Pro Football Hall of Fame; Seymour Siwoff of Elias Sports Bureau (the official NFL statisticians); and Don Weiss of the NFL created the passer rating that is now in use in the NFL.

The Formula has 4 parts A = ( COMP ATT.3) 5 B = ( YARDS ATT 3).25 C = ( TD ATT ) 20 D = 2.375 ( INT ATT 25)

The Formula Let mm(x) = max (0, min(x, 2.375)) mm(a) + mm(b) + mm(c) + mm(d) PasserRating = ( ) 100 6 Linear weighting system: The four parts receive equal weight. Note: The PasserRating can run from 0 to 158.3. Note: 4 2.375 = 9.5

Rational for the formula Their Goal A system that did not depend on how other quarterbacks did. Devise a formula to give one point for average performance, two + points for a record performance, and 0 for really bad performance.

They used the four stats kept at that time (1971-2). They also used league averages (at that time) for: Completion percentage (around 50%) Yards-per-attempt (7), Percentage of passes resulting in TDs (5%), Percentage intercepted (5.5%),

Example: Note: Average yards per attempt = 7 (at that time) B = ( YARDS ATT 3).25 Contributes 1 at that average as they wanted!

Some Strengths and Weaknesses Strength: Does not depend on others perfomances.

Some Strengths and Weaknesses Strength: Does not depend on others perfomances. Weakness: Scale is ridiculus!

Some Strengths and Weaknesses Strength: Does not depend on others perfomances. Weakness: Scale is ridiculus! Strength: Applies to a game, season or career!

Some Strengths and Weaknesses Strength: Does not depend on others perfomances. Weakness: Scale is ridiculus! Strength: Applies to a game, season or career! Weakness: Game has changed greatly from 1970 s

Some Strengths and Weaknesses Strength: Does not depend on others perfomances. Weakness: Scale is ridiculus! Strength: Applies to a game, season or career! Weakness: Game has changed greatly from 1970 s Strength: Not dependent upon judgements.

Some Strengths and Weaknesses Strength: Does not depend on others perfomances. Weakness: Scale is ridiculus! Strength: Applies to a game, season or career! Weakness: Game has changed greatly from 1970 s Strength: Not dependent upon judgements. Weakness: This was never meant to be a quarterback rating formula, but rather a passer efficiency formula. Running stats, winning stats, etc. play no role.

Another Weakness - Or Is It? 2 QB s - Super Joe and Broadway Joe: Super Joe takes his team 30 yards for a TD on 3 consecutive 10 yard passes, the final one a TD pass. Rating = 147.9 Broadway Joe throws 2 incomplete passes, then a 30 yard TD pass. Same end result but his Rating = 111.1 Some would say a weakness of the formula - some would say a strength!

The REALLY BIG Problems: There is no set of variables that are agreed upon for measuring a QB s performance.

The REALLY BIG Problems: There is no set of variables that are agreed upon for measuring a QB s performance. There are no generally held beliefs as to how to weigh the various aspects of the game!

The REALLY BIG Problems: There is no set of variables that are agreed upon for measuring a QB s performance. There are no generally held beliefs as to how to weigh the various aspects of the game! There is no generally held view as to what is even meant by best!

The REALLY BIG Problems: There is no set of variables that are agreed upon for measuring a QB s performance. There are no generally held beliefs as to how to weigh the various aspects of the game! There is no generally held view as to what is even meant by best! How do you measure the unmeasurable? To build a formula for the best quarterback you probably need to consider things like: 3rd down performance, red-zone performance, ability to avoid sacks, play-calling (audibles), overall leadership, and maybe even more!

Conclusions: Separating one persons performance from what is clearly a team outcome is difficult. It is unlikely we can build a formula to rate quarterbacks that everyone would find satisfactory! The game has changed enough that the old averages used no longer apply. New long term averages could be created, and then reapplied to past performances. A simpler and more intuitive formula can certainly be devised.

Turning To Baseball - Rating Offensive Performance Who is the best hitter in baseball? We start to run into similar problems as with the QB formula! What do we mean by best?

Is best:

Is best: The top batting average?

Is best: The top batting average? The top homerun hitter?

Is best: The top batting average? The top homerun hitter? The top RBI producer?

Is best: The top batting average? The top homerun hitter? The top RBI producer? Some combination of these? If so, what combination?

Clearly each of the 3 traditional measures clearly produces a best player in that categaory. Each part is simple and direct to measure. But overall we again face a number of problems.

Power vs Consistency vs Something Else? Over the years since the end of WWII, there has been a growing movement to use other means to measure player performance. Sabermetrics is the mathematical and statistical study of baseball records.

Many argue that the true offensive value of a player is in the number of runs they create. So any measure of offensive performance should measure run production. Bill James, (1982) devised the runs created (RC) measure. RC = (Hits + BB) TOTALBASES. AB + BB Rational: Hits + BB counts times reached base and the fraction measures rate at which the runners are advanced.

Example Johnny Pesky (1942) had 620 ABs, 205 hits, 42 BBs, 258 total bases Dick Stuart (1960) had 532 ABs, 160 hits, 34 BBs, 309 TBs

Example Johnny Pesky (1942) had 620 ABs, 205 hits, 42 BBs, 258 total bases Dick Stuart (1960) had 532 ABs, 160 hits, 34 BBs, 309 TBs RC(Pesky) = 96.

Example Johnny Pesky (1942) had 620 ABs, 205 hits, 42 BBs, 258 total bases Dick Stuart (1960) had 532 ABs, 160 hits, 34 BBs, 309 TBs RC(Pesky) = 96. RC(Stuart) = 106.

linear weights First tried by George Linsey (1963) (using recorded data and probability theory) Runs = (.41) 1B + (.82) 2B + (1.06) 3B + (1.42) HR

Thorn and Palmer (1984) produced a linear weights formula using all offensive aspects recorded during a game. Runs = (.46)1B + (.8)2B + (1.02)3B + (1.4)HR + (.33)(BB + HBP) + (.3)SB - (.6)CS - (.25)(AB - Hits) - (.5)(OOB)

Thomas Boswell, 1981 - sports writer Total Average (TA) TA = TB + BB + HBP + SB AB H + CS + GIDP Note: Denominator is a measure of outs made, not at bats.

Earnshaw Cook: 1964 book, Percentage Baaseball Scoring Index (DX) (modifed in 1971) DX = H + BB + HBP + E AB + BB + HBP TB + SB CS AB + BB + HBP Note:

Jay Bennett and John Flueck An evaluation of major league baseball offensive performance models, The American Statistician, Vol. 37, No. 1, (1983), 76-82. Compared 10 models of offensive performance including one they created. Results: Several of the models were very close in performance, including DX (scoring index) and these were far superior to the classic baseball measures.

Jim Albert and Jay Bennett, 2001 Curve Ball, Springer Verlag, New York. Using all the data from 1954-1999, they used the variables from Total Average (TA), namely SB, BB, 1b, 2B, 3B, and HR and created a Least Squares Linear Regression Model Weight for SB BB 1B 2B 3B HR RMSE TA 1 1 1 2 3 4.159 LSLR.16.36.52.67 1.18 1.50.142

LSLR Formula LSLR = (.52)1B + (.67)2B + (1.18)3B + (1.5)HR + (.16)SB + (.36)BB Games Note: modified to include HBP as well.

Results of tests Using the 1954-1999 data: TA and LSLR produced very similar results. LSLR had a slightly better RMSE (but it had to!) When looking at individual years in this period the results were a bit more mixed. LSLR outperformed TA in 35 of the 46 seasons. So there were still seasons when TA was best.

WHY????? LSLR was built on a particular data set and was sure to have a better RMSE on that data set. But on other data sets it is clearly not guaranteed to be best. This is a problem of any model built from a specific data set, and hence there may be uses for more than one formula.

Better RC? Albert and Bennett also noted that since RC and DX (and other formulas) can be described as weighted sum of cross product terms, linear regression can be used to better fit the coefficients in these formulas. Doing this with the 1954-1999 data they improved the RMSE for RC to.140 (better than their LSLR model!). Independent verification of how well James did in fitting his model.

An attempt at the all - time best Michael J. Schell wrote Baseball s All-Time Best Hitters Princeton University Press, 1999. After 282 pages of arguments, assumptions, and data analysis, he concluded Tony Gwynn was the best all-time hitter. Clearly no simple argument here! (Ty Cobb 2nd, Rod Carew, Joe Jackson, Rogers Hornsby, Ted Williams,...)