Trends in Baseball Scoring & Strikeouts, Geoffrey Holland ECON 5341 Advanced Data Analysis 16 November 2015

Trends in Baseball Scoring & Strikeouts, 1962-2014 Geoffrey Holland ECON 5341 Advanced Data Analysis 16 November 2015

Background
- Statistics are an intrinsic part of baseball
- Series under study: Runs Scored and Strikeouts
- Data from the Lahman Baseball Database
  - Contains individual player records back to 1871
  - Nearly 100,000 player-seasons available
- Time period: 1962-2014
  - Year of the Pitcher, 1968
  - Steroid Era, 1990s-2000s
  - Second Year of the Pitcher, 2010
- Data points normalized to team average for a season
  - Work stoppages (1972, 1981, 1994)
  - Changing number of teams (18 in 1961, 30 in 2014)
  - 162-game season (adopted by AL in 1961, NL in 1962)
- Examine both league-wide and individual teams
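The normalization described above can be sketched as follows. This is a minimal illustration, assuming each team's season total is scaled to a 162-game schedule and then averaged across teams; the function names and the sample numbers are hypothetical and are not taken from the Lahman data or from the presentation's own code.

```python
def per_162(total, games):
    """Scale a season total to a 162-game schedule."""
    return total * 162 / games


def league_team_average(team_rows):
    """Average the 162-game-scaled totals across all teams in one season.

    team_rows is a list of (season_total, games_played) pairs, one per team.
    """
    scaled = [per_162(total, games) for total, games in team_rows]
    return sum(scaled) / len(scaled)


# Hypothetical shortened season: (runs, games played) for three teams
season = [(450, 108), (400, 100), (525, 105)]
avg_runs = league_team_average(season)
```

Scaling by games played keeps shortened seasons comparable with full ones, and averaging over teams removes the effect of league expansion.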

[Figure: Runs & Strikeouts, League Average. Series: R, SO. Annotation: 1968, Year of the Pitcher.]

[Figure: Runs & Strikeouts, League Average. Series: R, SO. Annotation: Steroid Era.]

[Figure: Runs & Strikeouts, League Average. Series: R, SO. Annotation: 2010, Second Year of the Pitcher.]

Hypothesis
- Runs and Strikeouts will demonstrate autocorrelation
  - Good players tend to stay in the league, so positive performance begets positive performance
- Both series will be stationary
  - Baseball seasons are finite
  - Runs and Strikeouts are count variables, not continuous
- Individual teams will have higher-order autocorrelation due to persistence of exceptional players

[Figure: Runs & Strikeouts, League Average. Series: R, SO.]
Runs appears stationary, but SO may not be.

ACF & PACF
[Figure: four correlogram panels, lags 0-25. Autocorrelations of so and of r, with Bartlett's formula for MA(q) 95% confidence bands. Partial autocorrelations of so and of r, with 95% confidence bands (se = 1/sqrt(n)).]
Underlying process appears to be AR(1).
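For readers who want to reproduce the correlograms, the sample autocorrelation can be sketched as below. This is an illustrative version only: it uses the simple white-noise band of 1.96/sqrt(n) that the PACF panels are labeled with, not Bartlett's formula used on the ACF panels, and the function names are hypothetical.

```python
import math


def acf(series, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag of a series."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(1, max_lag + 1)
    ]


def white_noise_band(n, z=1.96):
    """Approximate 95% band, +/- z / sqrt(n), for a white-noise series."""
    return z / math.sqrt(n)


# 1962-2014 gives 53 annual observations
band = white_noise_band(53)
lags = acf([1, 2, 3, 4, 5], 2)
```

An AR(1) process typically shows autocorrelations that decay gradually with lag while only the first partial autocorrelation falls outside the band.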

OLS Results - Runs
- t-statistic 6.93 > 2.678 (99% critical value)
- β < 1, so Runs is stationary

OLS Results - Strikeouts
- t-statistic 28.43 > 2.678 (99% critical value)
- But β > 1, so Strikeouts is non-stationary
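The slides do not show the regression itself. A plausible reading, given the AR(1) diagnosis, is an OLS regression of each series on its own first lag, y_t = a + β y_{t-1} + e_t. The sketch below works under that assumption; `ar1_ols` and the simulated series are hypothetical and do not reproduce the presentation's results.

```python
import math
import random


def ar1_ols(series):
    """OLS fit of y_t = a + b * y_{t-1} + e_t.

    Returns (a, b, t_stat), where t_stat tests the hypothesis b = 0.
    """
    x = series[:-1]          # lagged values
    y = series[1:]           # current values
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(rss / (n - 2) / sxx)
    return a, b, b / se_b


# Simulated stationary series with a true lag coefficient of 0.5
rng = random.Random(0)
y = [200.0]
for _ in range(500):
    y.append(100 + 0.5 * y[-1] + rng.gauss(0, 10))
a, b, t_stat = ar1_ols(y)
```

Note that this t-statistic tests β = 0, which establishes autocorrelation. Whether β is distinguishable from 1 is a separate question, which unit-root tests such as ADF are designed to address.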

First Difference of Strikeouts
[Figure: time plot of D.so against yearid, 1960-2020, with correlograms for lags 0-25. Autocorrelations of D.so, with Bartlett's formula for MA(q) 95% confidence bands. Partial autocorrelations of D.so, with 95% confidence bands (se = 1/sqrt(n)).]

VAR Model

Conclusions
- Runs and Strikeouts will demonstrate autocorrelation
  - First lag only
  - High turnover
  - Difficult to sustain performance
  - Too many structural breaks
- X Both series will be stationary
  - Runs is stationary because β < 1
  - Strikeouts is non-stationary, I(1)
  - KPSS rejects its null of stationarity; ADF and PP fail to reject their null of a unit root at α = 0.01
  - Probably a coincidence due to late-year outliers, but interesting nonetheless

Forecasting Method
1. Normalize Runs and Strikeouts to a full 162-game season
2. Regress Runs and Strikeouts on a trend term
   1. Save detrended Runs and Strikeouts
   2. Save constants and coefficients from the OLS trend regression
3. Calculate straight-line 2015 Runs and Strikeouts from the trend regression coefficient and constant
4. Create a VAR model from detrended Runs and Strikeouts
   1. Lag selection using AIC, HQIC, and SBIC. Lag length varies by team.
5. Use the VAR model to predict Runs and Strikeouts above or below trend in 2015
6. Add the VAR-predicted delta to the straight-line regression forecast for the final adjusted forecast
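The steps above can be sketched in plain Python. This is a simplified illustration, not the presentation's code: it fixes the VAR at one lag instead of selecting the lag length by AIC, HQIC, and SBIC, it omits the constant in the VAR because detrended series have mean zero, and every function name and data value is hypothetical.

```python
def fit_trend(y):
    """OLS of y on t = 0..n-1; returns (const, slope, residuals)."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    slope = (sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
             / sum((t - t_mean) ** 2 for t in range(n)))
    const = y_mean - slope * t_mean
    resid = [v - (const + slope * t) for t, v in enumerate(y)]
    return const, slope, resid


def fit_var1(r, s):
    """Least-squares VAR(1) without constant on two detrended series.

    Returns two coefficient rows: one for the r equation, one for the s
    equation, each holding the weights on (r_{t-1}, s_{t-1}).
    """
    x1, x2 = r[:-1], s[:-1]
    s11 = sum(a * a for a in x1)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s22 = sum(b * b for b in x2)
    det = s11 * s22 - s12 * s12
    rows = []
    for target in (r[1:], s[1:]):
        c1 = sum(a * v for a, v in zip(x1, target))
        c2 = sum(b * v for b, v in zip(x2, target))
        # Solve the 2x2 normal equations by the closed-form inverse
        rows.append(((s22 * c1 - s12 * c2) / det,
                     (s11 * c2 - s12 * c1) / det))
    return rows


def forecast_next(runs, so):
    """One-step-ahead forecast: trend extrapolation plus VAR delta."""
    c_r, m_r, d_r = fit_trend(runs)
    c_s, m_s, d_s = fit_trend(so)
    (a11, a12), (a21, a22) = fit_var1(d_r, d_s)
    n = len(runs)                      # time index of the next period
    delta_r = a11 * d_r[-1] + a12 * d_s[-1]
    delta_s = a21 * d_r[-1] + a22 * d_s[-1]
    return c_r + m_r * n + delta_r, c_s + m_s * n + delta_s


# Hypothetical per-162 league averages, not the Lahman figures
runs = [700, 690, 720, 705, 730, 710, 740, 725]
so = [900, 930, 925, 960, 955, 990, 1000, 1030]
runs_next, so_next = forecast_next(runs, so)
```

Separating the trend from the VAR keeps the long-run drift in a simple straight line and leaves the VAR to model only the short-run interaction between the two series.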

[Figure: Runs & Strikeouts, League Average. Series: R, SO.]

League Avg Forecast, 1 Lag
[Figure: detrended R and SO against yearid, 2010-2020, with dynamic forecasts from 2015 (frstat, fsostat) and their 95% lower and upper bounds.]

| 2015 Forecast | Strikeouts | Runs |
| --- | --- | --- |
| OLS Straight Line Forecast | 1,123 | 768 |
| VAR Delta | 126 | (82) |
| Adjusted Forecast | 1,249 | 687 |
| Actual | 1,248 | 688 |
| Delta from Forecast | 1 | (1) |
| Delta % from Forecast | 0.1% | -0.2% |

Runs and SO by Team
[Figure: Runs Scored per 162 and Strikeouts per 162 by year, 1960-2020. Panels: Los Angeles Angels, Baltimore Orioles, Pittsburgh Pirates, Texas Rangers.]

Rangers Forecast, 1 Lag
[Figure: detrended R and SO against year, 2010-2015, with dynamic forecasts from 2015 (frstat, fsostat).]

| 2015 Forecast | Strikeouts | Runs |
| --- | --- | --- |
| OLS Straight Trend Forecast | 1,084 | 873 |
| VAR Delta | 49 | (79) |
| Adjusted Forecast | 1,133 | 794 |
| Actual | 1,233 | 751 |
| Delta from Forecast | 100 | (43) |
| Delta % from Forecast | 8.8% | -5.4% |
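The Rangers figures show how the table rows relate to one another. The sketch below assumes that parentheses denote negative values, that the adjusted forecast is the trend forecast plus the VAR delta, and that the percentage miss is measured relative to the adjusted forecast. The function names are hypothetical; the numbers are the ones reported for the Rangers.

```python
def adjusted_forecast(trend, var_delta):
    """Trend forecast plus the VAR-predicted deviation from trend."""
    return trend + var_delta


def pct_miss(actual, forecast):
    """Forecast miss as a percentage of the forecast."""
    return 100 * (actual - forecast) / forecast


so_forecast = adjusted_forecast(1084, 49)       # 1133
runs_forecast = adjusted_forecast(873, -79)     # 794

so_miss = pct_miss(1233, so_forecast)           # rounds to 8.8
runs_miss = pct_miss(751, runs_forecast)        # rounds to -5.4
```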

Angels Forecast, 2 Lags
[Figure: detrended R and SO against year, 2010-2015, with dynamic forecasts from 2015 (frstat, fsostat) and their 95% lower and upper bounds.]

| 2015 Forecast | Strikeouts | Runs |
| --- | --- | --- |
| OLS Straight Trend Forecast | 1,007 | 811 |
| VAR Delta | 58 | (69) |
| Adjusted Forecast | 1,065 | 742 |
| Actual | 1,150 | 661 |
| Delta from Forecast | 85 | (81) |
| Delta % from Forecast | 8.0% | -10.9% |

Pirates Forecast, 3 Lags
[Figure: detrended R and SO against year, 2010-2015, with dynamic forecasts from 2015 (frstat, fsostat).]

| 2015 Forecast | Strikeouts | Runs |
| --- | --- | --- |
| OLS Straight Trend Forecast | 1,170 | 671 |
| VAR Delta | 186 | (69) |
| Adjusted Forecast | 1,356 | 602 |
| Actual | 1,322 | 697 |
| Delta from Forecast | (34) | 95 |
| Delta % from Forecast | -2.5% | 15.8% |

Orioles Forecast, 4 Lags
[Figure: detrended R and SO against year, 2010-2015, with dynamic forecasts from 2015 (frstat, fsostat).]

| 2015 Forecast | Strikeouts | Runs |
| --- | --- | --- |
| OLS Straight Trend Forecast | 1,010 | 758 |
| VAR Delta | - | (9) |
| Adjusted Forecast | 1,010 | 749 |
| Actual | 1,331 | 713 |
| Delta from Forecast | 321 | (36) |
| Delta % from Forecast | 31.8% | -4.8% |

Acknowledgements
Lahman Baseball Database copyright 1996-2015 Sean Lahman. Used with permission under Creative Commons License 3.0. http://www.seanlahman.com/baseball-archive/statistics/
This presentation was created with a design template from SmileTemplates.com. http://www.smiletemplates.com/powerpoint-templates/baseball/00405/

Questions?