Additional On-base Worth 3x Additional Slugging?


Mark Pankin
SABR 36, July 1, 2006, Seattle, Washington

Notes provide additional information and were reminders during the presentation. They are not supposed to be anything close to a complete text of the presentation or a thorough discussion of the subject. If some of the tables or graphs are hard to read, you can make them larger with Acrobat's viewing controls: the box with the size percentage, or the View menu.

Relative Marginal Values
Moneyball: DePodesta 3:1 (an extra point of OBP is worth 3 of SLG)
Moneyball: Conventional wisdom <1.5:1
OPS = OBP + SLG implies 1:1
Can we determine the correct value?
How does the relationship vary over time and by scoring levels?

Research inspired by comments on page 128 of the book. It says DePodesta tinkered with the Runs Created formula to come up with the most accurate runs estimator he knew, and it led to a 3:1 ratio. His analysis assumes (I think) that all batters are the same. In a real lineup, the relationship may well vary by lineup position and by the preceding and following batters. Alan Schwarz, author of The Numbers Game, told me at SABR 34 that the 3:1 was economic: a team got three times the bang per buck trying to increase OBP vs. increasing SLG. Note: I have not tried to get in touch with DePodesta or author Michael Lewis.

Analytical Approach
Changing number of BB (for player, team, league) affects OBP, leaves SLG the same
Changing distribution of 1B, 2B, 3B, HR affects SLG, leaves OBP unchanged
Change OBP by a specified amount, find change in runs using Markov model
How are runs changed by same SLG change?

While not likely realistic when comparing two players or teams, it is possible to vary OBP without affecting SLG by varying the number of walks, and to vary SLG without affecting OBP by changing the distribution of hits while leaving the number of hits the same. That is, change some singles to extra base hits (in proportion to the actual distribution of extra base hits) or vice versa. I will change OBP and SLG by +/- 10 and 20 points and compare the changes in expected runs to obtain ratios. Note: information used is mostly from www.retrosheet.org (disclosure: I am the webmaster). Many thanks to Dave Smith for providing convenient data files based on the Retrosheet site.
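The two adjustments described above can be illustrated with a minimal Python sketch. It is not the code used for the talk: the BattingLine class, the helper names, and the sample totals are all hypothetical, and for simplicity singles are converted only to doubles rather than spread over extra base hits in proportion to their actual distribution. OBP is computed without sacrifice flies, as in the talk.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BattingLine:
    """Hypothetical container for the counting stats needed for OBP and SLG."""
    ab: int
    singles: int
    doubles: int
    triples: int
    hr: int
    bb: int
    hbp: int = 0

    @property
    def hits(self) -> int:
        return self.singles + self.doubles + self.triples + self.hr

    def obp(self) -> float:
        # OBP without sacrifice flies (SF are not available for many seasons).
        return (self.hits + self.bb + self.hbp) / (self.ab + self.bb + self.hbp)

    def slg(self) -> float:
        total_bases = self.singles + 2 * self.doubles + 3 * self.triples + 4 * self.hr
        return total_bases / self.ab


def add_walks(line: BattingLine, extra_bb: int) -> BattingLine:
    """More walks: OBP moves, SLG does not (AB and hits are untouched)."""
    return replace(line, bb=line.bb + extra_bb)


def singles_to_doubles(line: BattingLine, n: int) -> BattingLine:
    """Turn n singles into doubles: SLG moves, OBP does not (same hits, same AB).

    Simplifying assumption: only doubles are used here.
    """
    return replace(line, singles=line.singles - n, doubles=line.doubles + n)


# Hypothetical team totals, chosen only to make the example concrete.
base = BattingLine(ab=5500, singles=1000, doubles=280, triples=30, hr=170, bb=520, hbp=50)
walked = add_walks(base, 50)
slugged = singles_to_doubles(base, 55)  # 55 extra total bases / 5500 AB = +0.010 SLG

print(f"base:    OBP {base.obp():.3f}  SLG {base.slg():.3f}")
print(f"walks:   OBP {walked.obp():.3f}  SLG {walked.slg():.3f}")
print(f"doubles: OBP {slugged.obp():.3f}  SLG {slugged.slg():.3f}")
```

Each adjusted line would then be fed to the run model, and the two changes in expected runs compared.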

Markov Process Model
Based on probabilities of going from one runners/outs situation to another
Calculates number of runs per game
All batters the same (team, league data) or lineup of different players
Also useful for analysis of strategies and batting order optimization

I have used the Markov model extensively for baseball strategy analysis and batting order optimization, and have given several talks on the subject at SABR meetings. It is well suited to study the OBP vs. SLG question because one of its strengths is determining the change in expected scoring when its inputs are varied. The model version used incorporates ML averages (84-92) for several events on the bases and some other events. None of that is going to have much of an effect on the OBP vs. SLG analysis, with the possible exception of early years when errors were much more frequent.
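To make the idea of a runners/outs Markov chain concrete, here is a much-simplified Python sketch. It is not the model used in the talk. It assumes identical batters, only six event types, outs that never move runners, and runners who advance exactly as many bases as the batter on a hit; the real model also handles steals, errors, and other baserunning events. The function names and the event probabilities are hypothetical.

```python
HIT_BASES = {"1B": 1, "2B": 2, "3B": 3, "HR": 4}


def transition(bases, event):
    """Return (new_bases, runs) for a non-out event.

    bases is a 3-bit mask: 1 = runner on first, 2 = second, 4 = third.
    """
    if event == "BB":
        on1, on2, on3 = bool(bases & 1), bool(bases & 2), bool(bases & 4)
        runs = 1 if (on1 and on2 and on3) else 0      # bases-loaded walk
        new2 = on2 or on1                             # runner forced to second
        new3 = on3 or (on1 and on2)                   # runner forced to third
        return 1 | (2 if new2 else 0) | (4 if new3 else 0), runs
    n = HIT_BASES[event]
    # Assumption: every runner advances n bases; the batter lands on base n.
    shifted = (bases << n) | (1 << (n - 1))
    runs = bin(shifted >> 3).count("1")               # anyone past third scores
    return shifted & 7, runs


def expected_runs_per_inning(probs, tol=1e-12, max_sweeps=10000):
    """Expected runs from (0 outs, bases empty) to the third out."""
    if abs(sum(probs.values()) - 1.0) > 1e-9 or probs.get("OUT", 0.0) <= 0.0:
        raise ValueError("probabilities must sum to 1 with a positive OUT share")
    p_out = probs["OUT"]
    # value[outs][bases]; the row for 3 outs stays at zero.
    value = [[0.0] * 8 for _ in range(4)]
    for outs in (2, 1, 0):
        for _ in range(max_sweeps):                   # fixed-point iteration
            biggest_change = 0.0
            for bases in range(8):
                v = p_out * value[outs + 1][bases]
                for event, p in probs.items():
                    if event == "OUT":
                        continue
                    new_bases, runs = transition(bases, event)
                    v += p * (runs + value[outs][new_bases])
                biggest_change = max(biggest_change, abs(v - value[outs][bases]))
                value[outs][bases] = v
            if biggest_change < tol:
                break
    return value[0][0]


# Hypothetical per-plate-appearance probabilities, for illustration only.
league = {"OUT": 0.675, "BB": 0.085, "1B": 0.155, "2B": 0.045, "3B": 0.005, "HR": 0.035}
more_walks = dict(league, OUT=0.665, BB=0.095)

print(f"runs/inning, base case:  {expected_runs_per_inning(league):.3f}")
print(f"runs/inning, more walks: {expected_runs_per_inning(more_walks):.3f}")
```

Multiplying the per-inning figure by nine gives only a rough runs-per-game number, since it ignores extra innings and unplayed bottom-of-the-ninth half innings.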

Markov Compared to Runs Created, Linear Weights
Markov builds innings and determines runs through play-by-play analysis
RC, LW relate scoring directly to total stats (e.g. AB, H, 2B, HR, BB)
RC, LW need to be re-fit for different scoring level eras
Markov adapts better to changes in scoring levels

Others (and myself to a lesser extent) have analyzed the OBP/SLG question using Linear Weights and Runs Created to evaluate the marginal effects on scoring. Those models generally lead to smaller variations than the Markov model. The main reason, I think, is that RC and LW in a sense hard-wire scoring to the input stats: they are developed with scoring as the dependent variable, so they in effect assume the relationships hold over all scoring environments. One way around this is to re-fit those models for different scoring eras, but I have not seen that done. Regression analysis would have the same difficulties. Markov, in contrast, builds up the scoring a play at a time, so the effects of the types of events occur in the context of all the model inputs. Matrix math evaluates all possible events in compact calculations.

Markov Analysis Example
2005 MLB (combined AL & NL data)
2005 MLB: OBP = 0.333, SLG = 0.419, Team R/G = 4.592
Markov Model "base case": Team R/G = 4.893 (+6.6%)

   SLG unchanged             OBP unchanged
OBP     R/G     +/-       SLG     R/G     +/-       OBP/SLG
0.313   4.513   -0.380    0.399   4.677   -0.216    1.76
0.323   4.697   -0.196    0.409   4.785   -0.108    1.81
0.343   5.101    0.208    0.429   5.000    0.107    1.94
0.353   5.322    0.429    0.439   5.106    0.213    2.01
Average:                                            1.88

Inputs: AB, H, 2B, 3B, HR, BB, HBP, K, SB, CS
OBP does not include SF (not available for many seasons), so may be larger than published value

Model vs. actual error (+6.6%) is unusually large, the worst since 1908. Most years are within 4%. Model has been running high recently. Since the model does not contain everything and assumes all batters are the same (in this analysis), differences from actual scoring are to be expected. Key point: even if the model is off, the differences between the cases should be consistent and accurate (roughly the same amount of error, which cancels), so the ratio analysis is valid. Make this point when discussing first line in box.
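The ratios in the table can be reproduced from the R/G columns alone. The short Python sketch below uses only the figures shown on the slide; the variable names are illustrative.

```python
ACTUAL_RG = 4.592   # 2005 MLB actual team runs per game
BASE_RG = 4.893     # Markov model base case

# (R/G with OBP changed and SLG held, R/G with SLG changed and OBP held)
# for changes of -20, -10, +10, +20 points.
cases = [
    (4.513, 4.677),
    (4.697, 4.785),
    (5.101, 5.000),
    (5.322, 5.106),
]

# Marginal ratio = run change from the OBP move / run change from the same SLG move.
ratios = [(obp_rg - BASE_RG) / (slg_rg - BASE_RG) for obp_rg, slg_rg in cases]
average = sum(ratios) / len(ratios)
model_error_pct = (BASE_RG / ACTUAL_RG - 1) * 100

print([round(r, 2) for r in ratios], round(average, 2), round(model_error_pct, 1))
```

Because each ratio divides one model difference by another, a roughly constant model bias drops out, which is the key point made in the notes.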

Calculations for 1901-2005
Data not complete, so estimates made
  Strikeouts before 1913
  CS: 1901-1919, 1926-1950
Markov Model
  Low for dead ball era
  More than 10% under actual 1901-05
  Likely due to higher number of errors then
  Differences in runs should be valid 1910+

K estimates are based on 1913, 1916, 1917 K/AB (avoiding the Federal League years, although they are likely not much different). We have CS for some teams, and for the dead ball era I used the average CS/SB for teams with data in 1914-16, which had a lot of teams with CS data. Used 1923-25 CS/SB as representative of 1926-50. Differences between model and actuals are normal from 1909 on, so ratios should be valid except possibly for the first decade of modern play.

MLB Runs/Game, Marginal OBP vs. SLG ratio (1901-2005)
[Graph by year, 1900-2000. Left scale: Team R/G and MLB marginal OBP vs. SLG ratio. Right scale: OBP and SLG.]

Graph shows the marginal ratios for each year in the bottom line (left scale). Due to scale, variation may not look that large, but a low of about 1.5 to a high over 2 is about a 33% change. Top three lines provide batting and scoring context. Red line is slugging average (right scale), green line is OBP (right scale), and black line is the average runs per game per team. R/G varies from about 3.5 to 5.5. High scoring eras are around the 1930 peak and recently. OBP does not fluctuate that much. SLG was quite low in the dead ball era (not a surprise) and is now consistently over 0.400. It was also high in the 1920s and 1930s, with a peak in 1930. Will look at the ratio sorted by R/G next to see the relationship.

Marginal OBP vs. SLG ratio (left scale) by Runs/Game
[Scatter plot. Horizontal axis: runs per game, 3 to 6. Vertical axis: marginal OBP/SLG ratio, 1.5 to 2.1. Labeled seasons: 1901, 1902, 1903, 1908, 1930, 1968, 1972, 1999, 2000.]

Some seasons have been highlighted and/or labeled. In the lower left, 1968 is the red diamond. It was the year of the pitcher, and the mound was lowered after it. It was similar to 1908. The next red point is 1972, the last year before the DH was introduced (boo!) in the AL. At the upper right, 1930 stands out, the highest scoring season by far. 1999 and 2000 are also highlighted as two recent high scoring seasons. It is clear that the OBP/SLG ratio goes up as scoring does. This makes intuitive sense. When runs are hard to get, being able to drive in runners (SLG) gains in relative importance. When scoring is high, it is easier to bring in runners on the bases, so there is a larger gain from getting on (OBP). 1901-1903 are outliers, likely due to the higher frequency of errors then. Even with those points, the correlation coefficient for the 105 years is 0.92, which is quite high and indicates a strong positive relationship between scoring levels and OBP/SLG.

1901-2005 MLB OBP/SLG
Ranges from low of 1.53 (1908, 1968) to high of 2.05 (1930, 2000)
Average over last 10 seasons: 1.95
Look at very low, high scoring teams to see if range is wider

Nothing close to 3:1 on the MLB level. What about the highest and lowest scoring teams? Will start with lowest scoring. Only title shown when next slide appears. Ask audience to name lowest scoring non-dead ball team. I expect that they will fairly quickly.

Very Low Scoring Teams
1968 CHA: 2.86 R/G, OBP=0.284, SLG=0.311
1963 HOU: 2.86 R/G, OBP=0.283, SLG=0.301
Marginal OBP/SLG ratios:
  1968 CHA: 1.39
  1963 HOU: 1.40

First line will display on first click or down arrow. White Sox scored 463 in 162 games. The '63 Colt .45s scored one more, which rounds the same to two places. Consistent with the MLB results, these teams have even lower marginal OBP/SLG ratios. Now go to the highest scoring teams. Same steps: only title shows, and ask audience to name some.

Very High Scoring Teams
1930 NYA: 6.90 R/G, OBP=0.384, SLG=0.488
1931 NYA: 6.88 R/G, OBP=0.383, SLG=0.457
(1936 NYA: 6.87 R/G, OBP=0.381, SLG=0.483)
1950 BOS: 6.67 R/G, OBP=0.385, SLG=0.464
Marginal OBP/SLG ratios:
  1930 NYA: 2.59
  1931 NYA: 2.49
  1950 BOS: 2.46

Before first key press, ask for highest scoring season. (Tom Ruane, according to Dave Smith, pointed out that the real answer is opponents of the 1930 Phillies: 1199 runs allowed in 156 games.) 1930 Yankees are highest R/G (1062/154). The next two Yankee teams scored more (1067 in 1931, 1065 in 1936), but played one more game. 1936 is in parentheses since I won't show a ratio for them. Good chance the 1950 Red Sox will be mentioned. They had 1027 for a 154 game season, which is the next best after the three Yankee teams. Ratios are higher as expected, but still well below 3:1.
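As a quick check of the runs-per-game figures quoted for the extreme teams, the division can be done directly from the run and game totals given in the notes. This Python sketch uses only those totals; the dictionary layout is just a convenience.

```python
# (runs scored, games played) as quoted in the notes
teams = {
    "1968 CHA": (463, 162),
    "1963 HOU": (464, 162),   # "one more" run than the 1968 White Sox
    "1930 NYA": (1062, 154),
    "1931 NYA": (1067, 155),  # "played one more game"
    "1936 NYA": (1065, 155),
    "1950 BOS": (1027, 154),
}

runs_per_game = {team: runs / games for team, (runs, games) in teams.items()}

for team, rg in runs_per_game.items():
    print(f"{team}: {rg:.2f} R/G")
```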

3:1 OBP/SLG possible?
Almost certainly not for league or team using Markov model
Moneyball said DePodesta had more accurate version of RC that yielded 3:1
Analysis using RC, LW not close to 3:1
What about for a team's lineup?
  Different ratios by batting order position?
  Markov model can be applied
  Look at 1927 Yankees

I am skeptical that a more accurate version of RC could yield a 3:1 ratio, since the versions I have seen are very good and nowhere close to 3:1 for any team or league I have seen analyzed. The 1927 Yankees scored 975 runs in 155 games, which was by far a new team record. More interestingly, they are the only team to have two hitters with SLG over 0.750. The only other team I have looked at is 2001 Oakland (because of Moneyball). I had planned to look at more teams and present them in a poster session, but the scheduling did not permit that. Maybe next year.

Markov Model: 1927 NYA added runs/154 games

Player     AVG     OBP     SLG     OBP +.020   SLG +.020   OBP/SLG
Combs      0.356   0.414   0.511   10.15       4.51        2.25
Koenig     0.285   0.320   0.382   13.17       3.45        3.82
Ruth       0.356   0.486   0.772    9.09       2.81        3.23
Gehrig     0.373   0.474   0.765    7.83       3.88        2.02
Meusel     0.337   0.393   0.510    9.05       4.75        1.90
Lazzeri    0.309   0.383   0.482    7.92       4.40        1.80
Dugan      0.269   0.321   0.362    9.16       4.23        2.16
Collins    0.275   0.407   0.418    7.12       3.07        2.32
Pitchers   0.215   0.288   0.260    8.35       3.33        2.51

Lineup shown is the most common for the team that year. The only change used in the analysis is an increase of 20 points. That is due to the complexity of handling individual players. Also, decreases might not be possible if a player does not have enough BB, and would be more complex to figure if he does not have enough (or any!) triples or HR. Based on team data and the 2005 MLB example shown earlier, these ratios may be a little higher than true values. First box shows added R/154 G for Combs when his OBP is increased by 20 points (with his SLG and the other players unchanged) and when his SLG is up by 20. The ratio of the two is 2.25. Second box highlights all the ratios. Green box highlights the two ratios well above 3. Ruth may be a bit surprising, but he is followed by a great hitter and two very good ones. Lowest ratio is for Lazzeri, since he is followed by the weakest hitters. Note how well their pitchers hit, although they were pretty poor in comparison with the rest of the team. I did not try to improve the batting order. It looks like switching Lazzeri and Koenig would help. I think Collins's OBP is inflated by IBB (no data) in front of pitchers.

1927 Yankees Lineup
OBP is most valuable for batters in front of power hitters (Koenig, Ruth)
OBP/SLG well over 3 in front of Ruth, Gehrig; below 2 before weakest hitters
0.020 additional OBP at top of strong lineup can add a win per season per better hitter

Given a high scoring team, it probably would take more than 10 runs in a season to add a win. Combs improving by 20 OBP points might not do it, but it looks like Koenig would.

Conclusions
Marginal OBP/SLG varies by scoring level
  Range of about 1.5 to 2.1 for MLB totals
  Wider for teams: 1.4 to 2.6
  Recent values for MLB close to 2
Marginal ratio can vary much more for individual batters
  3:1 or higher is possible for some batters in some lineups

We see that the marginal values (Markov estimates) vary quite a bit with the level of scoring for a particular season or a particular team. In a lineup, the values can vary quite a bit due to the strengths of the preceding and following hitters. Insight rather than huge surprises here. There is a strong relationship (0.92 correlation) between the ratio and team runs/game at the MLB level. The ratio reached 3:1 for one hitter on the 2001 A's (the only other team analyzed so far). I will post this talk on my web site (next slide) and hope to write an article based on it for a SABR publication (but no firm promises).

Web sites, e-mail
www.pankin.com/baseball.htm has details about Markov model and other baseball studies
By the Numbers newsletter: www.philbirnbaum.com
E-mail: mark@pankin.com