Predicting Baseball Home Run Records Using Exponential Frequency Distributions

arXiv:physics/0608228v1 [physics.pop-ph] 23 Aug 2006

Daniel J. Kelley, Jonas R. Mureika, Jeffrey A. Phillips
Department of Physics, Loyola Marymount University
1 LMU Drive, MS-8227, Los Angeles, CA 90045
Contact e-mail: jphillips@lmu.edu

Abstract

A new model, which uses the frequency of individuals' annual home run totals, is employed to predict future home run totals and records in Major League Baseball. Complete home run frequency data from 1903-2005 is analyzed, resulting in annual exponential distributions whose changes can be used as a measure of progression in the sport and serve as a basis against which record-setting performances can be compared. We show that there is an 80% chance that Barry Bonds' current 73 home run record will be broken in the next 10 years, despite the longevity of previous records held by baseball legends Babe Ruth and Roger Maris.

Previous attempts to predict athletic progressions have focused on trends in either world records [1, 2] or the best annual performance [3]. Our new model looks at all of the athletes and is loosely inspired by the Gutenberg-Richter relationship for earthquake distributions [4], which relates the likelihood of large, rare events to the frequency of smaller ones. By analyzing the entire population, we can make predictions as to what the top performances, interpreted as large events, will be. We have successfully applied this model to a variety of sports exemplifying individual achievement, including track and field, weight lifting, and baseball [5], observing that the distribution of such performances is exponential.

In this study we examined annual home run totals for players in Major League Baseball between 1903 and 2005 [6]. For each year, we created frequency distributions based on how many players hit a given number of home runs in that year (fig. 1a-c). These distributions show not only that more players hit few home runs, which serve as the small events, but also that there is a relationship between this portion of the distribution and the players with the greatest number of home runs. The exponential fits shown in figure 1a-c are determined by the 95% of players with the lowest home run totals.

The annual and all-time rarity of a performance is assessed by a player's deviation from a given year's exponential distribution. For example, the distribution fitted to the lowest home run totals implies that one-third of a player would hit 51 home runs in 2005 (fig. 1c). If we were to have three times as many players, or simply repeat the same season three times, we would expect to have one player hitting 51 home runs. Andruw Jones' 2005 season can thus be viewed as a once-in-three-years performance, while Barry Bonds' current record is a one-in-10-years event (fig. 1b). By comparison, Babe Ruth's 60 home runs in 1927 is an outstanding one-in-10,000-years performance (fig. 1a), far more impressive than that of Bonds, even though the latter hit 73 home runs. By considering relative performances, it becomes possible to compare players of different eras, even though baseball has progressed, largely due to improved training.

To predict home run records, we have analyzed the exponential distributions over the past 103 years and extrapolated the progression of these distributions to the future. Since 1903, the parameters of the exponential distributions have been changing in a continuous manner, with the rate of change of each being approximately constant since 1948. We have assumed that these rates will continue for the near future. The top player of a given year often performs at a level beyond what is predicted by the lower 95% of players. This exceedance, which may be due to the additional motivation that stems from the lucrative financial pay-offs and sponsorship deals that often go to the top player, is also included in our model. Taken together, these factors suggest that the probability of somebody hitting 74 home runs within the next five years, breaking Bonds' current record, is greater than 50%, and over 80% after ten years (fig. 1d).
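The rarity argument above (an expected one-third of a player at 51 home runs corresponds to a once-in-three-years performance) and the way annual chances accumulate into a multi-year probability can be illustrated with a minimal sketch. The function names are hypothetical. Treating the fitted curve as an expected player count, treating seasons as independent, and the constant annual probability used in the usage note are all assumptions made for illustration, not values or methods taken from the paper.

```python
import math


def expected_players(b: float, r: float, home_runs: int) -> float:
    """Expected number of players at a home run total under b * exp(-r * N).

    Assumption: the fitted curve is read as an expected player count.
    """
    return b * math.exp(-r * home_runs)


def return_period_years(expected: float) -> float:
    """Average number of seasons needed to see one such player.

    An expected count of 1/3 per season gives a period of 3 seasons.
    """
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return 1.0 / expected


def prob_within(annual_probs) -> float:
    """Probability of at least one record-breaking season.

    Assumption: seasons are independent, so the chance of no record
    is the product of the per-season chances of no record.
    """
    p_none = 1.0
    for p in annual_probs:
        p_none *= 1.0 - p
    return 1.0 - p_none
```

For example, `return_period_years(1 / 3)` reproduces the three-year figure quoted for a 51 home run season in 2005, and `prob_within([0.15] * 10)` shows how a hypothetical 15% annual chance would compound over ten seasons. In the paper's model the annual chance is not constant, because the distribution parameters drift from year to year.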

Acknowledgments

This project was made possible through financial support from a Rains Research Grant at Loyola Marymount University.

References

[1] Savaglio, S. and Carbone, V., Nature 404, 244 (2000)
[2] Katz, J. S. and Katz, L., J. Sport Sci. 17, 467 (1999)
[3] Gembris, D., Taylor, J. G. and Suter, D., Nature 417, 506 (2002)
[4] Gutenberg, B. and Richter, C. F., Ann. Geofis. 9, 1 (1956)
[5] Kelley, D. J., Mureika, J. R. and Phillips, J. A., manuscript in preparation
[6] Home run data and statistics were obtained from the website http://www.thebaseballcube.com/.

[Figure 1, panels a-d. Panels a-c: frequency of players (logarithmic axis) versus number of home runs. Panel d: probability of hitting 74 HR versus year, 2005 to 2035.]

Figure 1: Parts a, b and c show the frequency of players hitting a number of home runs in 1927, 2001 and 2005 respectively. To ensure that all players were playing under similar conditions, only players with at least 100 at bats have been included. For all years, the frequency of the 95% of players with the lowest home run totals is fit with an exponential curve $b e^{-rN}$, where r is the rate parameter, b is the scale and N is the number of home runs. Part d shows the probability that somebody will break Barry Bonds' current home run record before a given year.
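The fit described in the caption can be sketched as follows. This is a minimal illustration, not the authors' procedure: the paper does not say how the exponential was fitted, so the least-squares fit on the logarithm of the player counts, the use of raw counts rather than normalized frequencies, and the function name `fit_exponential` are all assumptions. The caller is expected to have already removed players with fewer than 100 at bats.

```python
import math
from collections import Counter


def fit_exponential(home_runs, lower_fraction=0.95):
    """Fit count(N) = b * exp(-r * N) to the lowest share of players.

    home_runs: one season's home run totals, one entry per player.
    Returns (b, r). Assumption: ordinary least squares on log(count).
    """
    totals = sorted(home_runs)
    # Keep the players with the lowest totals. If the cutoff falls inside
    # a group of tied totals, that last group is only partly counted.
    cutoff = max(1, int(lower_fraction * len(totals)))
    counts = Counter(totals[:cutoff])  # players per home run total
    xs = sorted(counts)
    if len(xs) < 2:
        raise ValueError("need at least two distinct home run totals")
    ys = [math.log(counts[x]) for x in xs]

    # Straight-line fit: log(count) = log(b) - r * N.
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope
```

With a synthetic season in which the player count halves for each additional home run, the sketch recovers a rate of ln 2. Real seasons are noisier, and a weighted or maximum-likelihood fit might be preferable there.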