Predicting Season-Long Baseball Statistics. By: Brandon Liu and Bryan McLellan

Stanford CS 221

Task Definition

Though handwritten baseball scorecards have become obsolete, baseball is at its core a statistical goldmine, full of well-kept statistics and measurables. Every aspect of the game is televised, tracked, and stored, from a player's home run total to the speed, movement, and type of pitch on a given at bat. Combining traditional statistics with these more recently tracked ones presents interesting opportunities to apply machine learning algorithms to predict baseball player performance over season-long data. Predicting player performance accurately can lend insight into team performance and individual player awards like the MVP (Most Valuable Player) and Cy Young Awards. These predictions also provide an interesting model for player valuation: for instance, how much value do players provide to their team compared to what they are paid in salary?

This project applies various machine learning algorithms to predict several mainstream baseball statistics for active MLB batters and pitchers. These statistics include, but are not limited to, [RBIs, Runs, HRs, AVG] for batters and [W, L, ERA, BB, H] for pitchers.

Evaluation

In evaluating our results, we ultimately truncated our data set to the top 200 active MLB batters and starting pitchers. Relief pitchers frequently change roles, so their data was not well suited to the task. For any given statistic, our prediction engine takes in statistics from one year and outputs predictions for the next year. We evaluated by inputting 2015 data and outputting 2016 predictions, which were then compared against actuals. Our primary evaluation metrics were the raw difference from the actual value and the percent difference [(actual - predicted) / actual]. As a less stringent measure, attempting to predict the top 20 players in each category gave us ~45% accuracy, which is deflated slightly by the presence of 3-4 rookies at the top of the statistical categories that season.

As an additional case study, we attempted to predict the Cy Young and MVP award winners for each league by aggregating leaders in our predicted statistical categories. The predictions were realistic: they selected viable candidates and usually ranked the actual winner second or third.

Infrastructure:

The project infrastructure largely involved data collection from various sources:

Sean Lahman's baseball database: batting and pitching statistics from 1871 to 2015
Fangraphs.com: records of advanced statistics on all players
Baseball-reference.com: traditional baseball encyclopedia
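The percent-difference metric from the Evaluation section can be sketched in a few lines. This is a minimal illustration rather than the project's actual code: the function names and the sample home run totals are hypothetical, and taking the absolute value before computing the median is an assumption about how the errors were summarized.

```python
from statistics import mean, median


def percent_difference(actual, predicted):
    """Signed percent difference, (actual - predicted) / actual."""
    if actual == 0:
        raise ValueError("actual must be nonzero to compute a percent difference")
    return (actual - predicted) / actual


def summarize_errors(actuals, predictions):
    """Return (median, mean) of the absolute percent differences.

    Using absolute values is an assumption made for this sketch.
    """
    errors = [abs(percent_difference(a, p)) for a, p in zip(actuals, predictions)]
    return median(errors), mean(errors)


# Hypothetical home run totals; the last player was injured and hit 1 HR.
actual_hr = [40, 25, 10, 1]
predicted_hr = [30, 30, 12, 20]

median_err, mean_err = summarize_errors(actual_hr, predicted_hr)
print(f"median error: {median_err:.3f}, mean error: {mean_err:.3f}")
```

A single badly missed player moves the mean far more than the median, which is one reason to report a median alongside the mean.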

The Lahman baseball data sets were our primary source for statistics and querying, since they contain unique player IDs that we could use to guarantee matching data for a single player. Some of the data, like the Fangraphs statistics, had to be scraped with Python code that automatically exported CSV files (Fangraphs includes an export button on its website). The Lahman and Baseball-reference.com data sets, though downloadable, required some cleaning and validation to remove blanks, resolve duplicates, and fix improperly formatted rows and columns. The Baseball-reference.com data set, for example, intermittently repeated the header row within the data. Once the data was properly cleaned, we linked the data from all three sources, mapping each player name to the unique player ID included in the Lahman data set. We also had to match the names of different statistics and in some cases write code to perform conversions. For example, IPouts in the Lahman data set is the number of outs a pitcher recorded in a given year, while IP is the number of innings pitched; to calculate the traditional Innings Pitched (IP) statistic, we divided IPouts by three to get full innings and took the remainder (IPouts modulo three) as the outs of a partial inning. For instance, 605 outs correspond to 201 full innings plus 2 outs, written as 201.2 IP. Furthermore, the first pass through the data set was quite time consuming and required a significant amount of manual data entry, which is not entirely reflected in the truncated ~200-player data set that we used for the final code.

Approach:

We modeled the task as a prediction problem and applied a number of regression algorithms, inputting 2015 data, outputting predictions for 2016, and then comparing results. Primarily, we applied linear regression, support vector regression, and a neural net. This combination of methods allowed us to explore both linear and nonlinear fits to the data. It also allowed us to attempt various degrees of specificity with our predictions. The neural net, for instance, tended to overfit the training data, and we found our best performance using support vector regression.

Baseline and Oracle

For our baseline implementation, we took the historical batting numbers over the last 5 years and computed the raw average of each statistic (Hits, At Bats, RBI, Runs, Stolen Bases, Games) over those years. We then calculated the percent difference between these averages and the actual statistics. The baseline performed relatively poorly, since it did not account for the number of games played or for injuries: it yielded ~60% median error [(actual - predicted) / actual] across statistics, with some, like stolen bases, deviating by more than 100%. (For instance, a player predicted to steal 20 bases in a season who is injured and steals only 1 base would have roughly 1,900% error, since |1 - 20| / 1 = 19.)

For our oracle implementation, we extrapolated from mid-year statistics, multiplying them by each player's career second-half averages from the halfway point of the season onward, in order to predict end-of-season totals. We thresholded on players with a minimum of 100 at bats. The model yielded roughly 70% accuracy.

Feature selection and implementation choices with peripheral statistics:

In 2015, Fangraphs introduced Baseball Info Solutions contact strength ratings data, more advanced batted ball statistics that provide additional analysis of player performance. For batters, Fangraphs provides GB/FB, LD%, FB, IFFB, HR/FB, IF%, Pull%, Cent%, Opp%, Soft%, Med%, Hard%. In particular, these batted ball statistics about how and where a ball is hit factor heavily into other statistics like batting average (AVG) and runs (R). In certain cases, some of these statistics deviate heavily from the norm and lend insight into key performance statistics. For instance, if a batter's HR/FB ratio is 50%, then half of his fly balls are home runs, an unsustainable rate that may be inflating his Home Runs, AVG, Runs, and RBI statistics.

For pitchers, the corresponding statistics are K/9, BB/9, BABIP, LOB%, GB%, HR/FB, GB/FB, Balls, Strikes, Total pitches, Pull%, Cent%, Opposite%, Soft%, Medium%, Hard%. Batted ball and peripheral statistics allow us to diagnose a pitcher's performance. Traditional pitcher statistics like ERA or Wins can often be skewed by flukey peripherals (e.g., a high concentration of softly batted fluke hits allowed despite the pitcher actually performing well).

Unfortunately, since these peripheral statistics were introduced relatively recently, they were neither especially effective as predictors nor well correlated with player performance. As Bradley Woodrum writes in a study for The Hardball Times, "The tools we have for evaluating and predicting hitter performance are still growing ... When we're tempted to cite batted ball data, we need to be more careful." With our smaller sample of 200 players, we found that historical player performance was actually a more effective predictor. Nonetheless, these peripheral statistics provided valuable insight into which features of our input statistics would have the most impact in predicting our output.

Optimization and Tweaking:

Naturally, the players with the least historical data proved the most difficult to predict. Rookies, injured players, and newer players (e.g., those with only one or two full seasons) lacked the same data quality as veteran players. We accounted for this by normalizing our predictions against the number of games played by each player. For starting pitchers, this was games started; for batters, it was total games. We projected each player's statistics over a full season and then compared them relative to the actual number of games played; in doing so, we essentially removed games played as a prediction factor. In terms of implementation, this involved scaling each batter's statistics in proportion to the number of games in a regular season of baseball. For pitchers, we made a similar implementation decision.

We also applied add-1 smoothing to our prediction numbers, both to avoid division by zero and to normalize players who were not predicted to post high stats in a given season but might skew error. The motivation is as follows: if a player had zero home runs in a season in which he was injured and played only ten games, we would want to scale his statistics up toward a full season. However, multiplying zero by any number still gives zero as the player's final statistic; to avoid this, we introduced uniform add-1 smoothing. These normalizations greatly improved the final prediction results of each of our regressions.

Literature Review

A number of baseball projection systems exist, with applications both similar to and different from our own.

FiveThirtyEight: Nate Silver's team at ESPN's FiveThirtyEight runs a forecast based on 50,000 simulations of the season to predict team records and playoff, division, and World Series results. The simulations account for traditional stats, starting pitchers, travel distance, and rest, and the probabilities are updated after each game. The team scrapes game-by-game data to build an Elo-based rating system and predictive model. The main contributor is a score and rating maintained for each starting pitcher, on which they base Monte Carlo simulations that play out the season thousands of times; each simulation updates a team's rating, adding bonuses if a team makes the World Series or the playoffs.

Steamer: Steamer was created from a high school project collaboration. The projection system uses a weighted average of past performance regressed toward the league average. The weights are set using a relatively simplistic regression analysis of past players. In 2015, Steamer projections performed better than those of its competitors. This system is quite similar to ours, although Steamer generates weight vectors across the entire baseball population, while we restricted our sample to ~200 players and performed regression on a per-player basis.

PECOTA: PECOTA was developed by Nate Silver. It fits a given player's past performance statistics to those of comparable MLB players using similarity scores. The primary characteristics for similarity are:

1. Basic production metrics like batting average, ISO, strikeout rate, and groundball rate
2. Usage metrics: career length and plate appearances
3. Physical attributes: height, weight, throwing or batting right/left
4. Fielding position, starting pitcher/relief pitcher

PECOTA then finds the nearest neighbors of the player to determine comparable players. From there, the projection system determines the player's future performance based on the historical performance of these comparable players at a similar age range. Our model does not account for decreased performance due to age.

Comparison

A comparison of the different projection systems against actuals for the 2015 season shows average error on different statistical categories (K, BB, HR, ERA). Steamer outperforms the competition in most categories. Thus, relatively simplistic regression can be quite effective (although this considers only the 2015 sample).
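To make the idea of "a weighted average of past performance regressed toward the league average" concrete, here is a minimal sketch. It is not Steamer's actual method: the function name, the season weights (5, 4, 3), and the `regression_pa` constant are hypothetical values chosen only for illustration, whereas Steamer fits its weights by regression on past players.

```python
def project_rate(past_rates, past_pa, league_rate,
                 weights=(5, 4, 3), regression_pa=1200):
    """Project a rate stat (e.g. AVG) from recent seasons.

    past_rates and past_pa list the most recent season first.
    weights and regression_pa are illustrative assumptions, not fitted values.
    """
    # Weight each season by recency and by its number of plate appearances.
    weighted_sum = sum(w * pa * r for w, pa, r in zip(weights, past_pa, past_rates))
    weighted_pa = sum(w * pa for w, pa in zip(weights, past_pa))
    # Blend in regression_pa "phantom" plate appearances at the league rate,
    # which pulls players with little history toward the league average.
    return (weighted_sum + regression_pa * league_rate) / (weighted_pa + regression_pa)


# Hypothetical veteran: three full seasons of batting averages.
veteran = project_rate([0.300, 0.280, 0.260], [600, 600, 600], league_rate=0.255)

# Hypothetical player with no history: projection falls back to the league rate.
rookie = project_rate([], [], league_rate=0.255)

print(f"veteran: {veteran:.4f}, rookie: {rookie:.4f}")
```

The fewer plate appearances a player has, the more the phantom league-average plate appearances dominate, so short track records are trusted less.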

By comparison, our average error for K, BB, HR, and ERA on 2016 data (without normalizing over number of games) is K: 0.4, BB: 0.4, HR: 0.387, ERA: 0.44, which does not match the performance of established systems.

Error Analysis

In terms of analysis of our algorithms, each has the same space complexity, since each builds Python dictionaries to store the features and training data on which we base our predictions. For time complexity from a practical perspective, running linear regression and support vector regression takes a matter of seconds (generally eight to eleven). However, running our neural network prediction takes far longer, closer to ten minutes to generate all predictions.

For most of our predicted statistics, our plots of residual value vs. predicted statistic generally appear similar to the above figure. We can see a healthy amount of clustering around a residual of 0, which means that many of our predictions were very close to the actual values. Across all of our methodologies, we generate the following spreadsheet:

This sheet allows us to compare our error metrics across all of our approaches to predicting 2016 statistics. Looking at our final results, we can see that support vector regression was generally the most successful algorithm. As with all of our algorithms, support vector regression improved further when we normalized our data before running the regression to account for rookies, injuries, and other anomalies in our data set. For batters, SVR provided a median percent difference of under 30% in most cases. A general trend in our results is that our predictions tend to be more accurate for batters than for pitchers. This is likely because batters play more games in any given season and therefore give us more data to use when predicting statistics. Pitcher predictions are highly volatile and can be affected by error-prone defenses, strength of schedule, or poor offensive support. There are also a number of outliers heavily skewing the data; despite normalization efforts, mean error is still skewed considerably. Notably, though, when we aggregated our statistics to create leaderboards of top performers for MVP and Cy Young selection, our results provided accurate predictions.

Another point to make about our results is that the 2016 season, on which we predicted, was an unusual year for baseball. It saw a 35% increase in home runs and a 10% increase in runs over the past two years, and the league record for strikeouts was broken. These changes can be explained by shifting trends in team strategy, like defensive shifts and more attempts by players to cut upward on the ball and hit home runs. Our model was not able to take this into account, since we had only the statistics of previous years to use.
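The per-game normalization with add-1 smoothing described under Optimization and Tweaking, which improved every regression above, can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the function name is hypothetical, applying the smoothing before scaling is an assumption, and the 162-game default is simply the length of a modern MLB regular season (for starting pitchers, a games-started total would be passed instead).

```python
MLB_REGULAR_SEASON_GAMES = 162  # length of a modern MLB regular season


def scale_to_full_season(stat_total, games_played,
                         full_season_games=MLB_REGULAR_SEASON_GAMES,
                         smoothing=1):
    """Project a counting stat over a full season with add-1 smoothing.

    The smoothing is added before scaling (an assumption of this sketch),
    so a zero total does not stay zero after multiplication.
    """
    if games_played <= 0:
        raise ValueError("games_played must be positive")
    return (stat_total + smoothing) * full_season_games / games_played


# Hypothetical injured player: 0 home runs in 10 games.
injured = scale_to_full_season(0, 10)

# Hypothetical healthy player: 30 home runs in 150 games.
healthy = scale_to_full_season(30, 150)

print(f"injured: {injured:.2f}, healthy: {healthy:.2f}")
```

Without the smoothing term the injured player would project to zero home runs no matter how the season is scaled, which is the case the add-1 adjustment is meant to avoid.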


More information

GUIDE TO BASIC SCORING

GUIDE TO BASIC SCORING GUIDE TO BASIC SCORING The Score Sheet Fill in this section with as much information as possible. Opposition Fielding changes are indicated in the space around the Innings Number. This is the innings box,

More information

Pine Tar Baseball. Game Rules Manual - version 2.1 A dice simulation game ~ copyright by LIS Games

Pine Tar Baseball. Game Rules Manual - version 2.1 A dice simulation game ~ copyright by LIS Games Introduction to Pine Tar Baseball Pine Tar Baseball Game Rules Manual - version 2.1 A dice simulation game ~ copyright 2015-2017 by LIS Games Pine Tar baseball is intended to be a game that can be played

More information

Predicting the Total Number of Points Scored in NFL Games

Predicting the Total Number of Points Scored in NFL Games Predicting the Total Number of Points Scored in NFL Games Max Flores (mflores7@stanford.edu), Ajay Sohmshetty (ajay14@stanford.edu) CS 229 Fall 2014 1 Introduction Predicting the outcome of National Football

More information

2015 National Baseball Arbitration Competition

2015 National Baseball Arbitration Competition 2015 National Baseball Arbitration Competition Lorenzo Cain v Kansas City Royals Submission on Behalf of Kansas City Royals Midpoint: 2.725 million Submission by Team 28 Table of Contents I. Introduction

More information

Predicting the use of the sacrifice bunt in Major League Baseball BUDT 714 May 10, 2007

Predicting the use of the sacrifice bunt in Major League Baseball BUDT 714 May 10, 2007 Predicting the use of the sacrifice bunt in Major League Baseball BUDT 714 May 10, 2007 Group 6 Charles Gallagher Brian Gilbert Neelay Mehta Chao Rao Executive Summary Background When a runner is on-base

More information

2014 National Baseball Arbitration Competition

2014 National Baseball Arbitration Competition 2014 National Baseball Arbitration Competition Eric Hosmer v. Kansas City Royals Submission on Behalf of Eric Hosmer Midpoint: $3,650,000 Submission by Team 2 Table of Contents I. Introduction and Request

More information

Matt Halper 12/10/14 Stats 50. The Batting Pitcher:

Matt Halper 12/10/14 Stats 50. The Batting Pitcher: Matt Halper 12/10/14 Stats 50 The Batting Pitcher: A Statistical Analysis based on NL vs. AL Pitchers Batting Statistics in the World Series and the Implications on their Team s Success in the Series Matt

More information

Psychology - Mr. Callaway/Mundy s Mill HS Unit Research Methods - Statistics

Psychology - Mr. Callaway/Mundy s Mill HS Unit Research Methods - Statistics Psychology - Mr. Callaway/Mundy s Mill HS Unit 2.3 - Research Methods - Statistics How do psychologists ask & answer questions? Last time we asked that we were discussing Research Methods. This time we

More information

How are the values related to each other? Are there values that are General Education Statistics

How are the values related to each other? Are there values that are General Education Statistics How are the values related to each other? Are there values that are General Education Statistics far away from the others? Class Notes Measures of Position and Outliers: Z-scores, Percentiles, Quartiles,

More information

BABE: THE SULTAN OF PITCHING STATS? by. August 2010 MIDDLEBURY COLLEGE ECONOMICS DISCUSSION PAPER NO

BABE: THE SULTAN OF PITCHING STATS? by. August 2010 MIDDLEBURY COLLEGE ECONOMICS DISCUSSION PAPER NO BABE: THE SULTAN OF PITCHING STATS? by Matthew H. LoRusso Paul M. Sommers August 2010 MIDDLEBURY COLLEGE ECONOMICS DISCUSSION PAPER NO. 10-30 DEPARTMENT OF ECONOMICS MIDDLEBURY COLLEGE MIDDLEBURY, VERMONT

More information

2015 National Baseball Arbitration Competition. Kansas City Royals v. Lorenzo Cain. Submitted by Team 33. Brief on Behalf of Lorenzo Cain

2015 National Baseball Arbitration Competition. Kansas City Royals v. Lorenzo Cain. Submitted by Team 33. Brief on Behalf of Lorenzo Cain 2015 National Baseball Arbitration Competition Kansas City Royals v. Lorenzo Cain Submitted by Team 33 Brief on Behalf of Lorenzo Cain 1 Table of Contents I. Introduction and Request for Arbitration Award

More information

Relative Value of On-Base Pct. and Slugging Avg.

Relative Value of On-Base Pct. and Slugging Avg. Relative Value of On-Base Pct. and Slugging Avg. Mark Pankin SABR 34 July 16, 2004 Cincinnati, OH Notes provide additional information and were reminders during the presentation. They are not supposed

More information

Computer Scorekeeping Procedures Updated: 6/10/2015

Computer Scorekeeping Procedures Updated: 6/10/2015 Computer Scorekeeping Procedures Updated: 6/10/2015 SET-UP COMPUTERS: Computers are stored for: Saddlebrook - in the cabinet under the counter by field 1 Buffalo Glen - to be determined Setup Computers:

More information

(Under the Direction of Cheolwoo Park) ABSTRACT. Major League Baseball is a sport complete with a multitude of statistics to evaluate a player s

(Under the Direction of Cheolwoo Park) ABSTRACT. Major League Baseball is a sport complete with a multitude of statistics to evaluate a player s PENALIZED REGRESSION MODELS FOR MAJOR LEAGUE BASEBALL METRICS by MUSHIMIE LONA PANDA (Under the Direction of Cheolwoo Park) ABSTRACT Major League Baseball is a sport complete with a multitude of statistics

More information

Offensive & Defensive Tactics. Plan Development & Analysis

Offensive & Defensive Tactics. Plan Development & Analysis Offensive & Defensive Tactics Plan Development & Analysis Content Head Coach Creating a Lineup Starting Players Characterizing their Positions Offensive Tactics Defensive Tactics Head Coach Creating a

More information

Triple Lite Baseball

Triple Lite Baseball Triple Lite Baseball As the name implies, it doesn't cover all the bases like a game like Playball, but it still gives a great feel for the game and is really quick to play. One roll per at bat, a quick-look

More information

The 2015 MLB Season in Review. Using Pitch Quantification and the QOP 1 Metric

The 2015 MLB Season in Review. Using Pitch Quantification and the QOP 1 Metric 1 The 2015 MLB Season in Review Using Pitch Quantification and the QOP 1 Metric Dr. Jason Wilson 2 and Wayne Greiner 3 1. Introduction The purpose of this paper is to document new research in the field

More information

Our Shining Moment: Hierarchical Clustering to Determine NCAA Tournament Seeding

Our Shining Moment: Hierarchical Clustering to Determine NCAA Tournament Seeding Trunzo Scholz 1 Dan Trunzo and Libby Scholz MCS 100 June 4, 2016 Our Shining Moment: Hierarchical Clustering to Determine NCAA Tournament Seeding This project tries to correctly predict the NCAA Tournament

More information

When you think of baseball, you think of a game that never changes, right? The

When you think of baseball, you think of a game that never changes, right? The The Strike Zone During the PITCHf/x Era by Jon Roegele When you think of baseball, you think of a game that never changes, right? The rules are the same as they were over 100 years ago, right? The bases

More information

An average pitcher's PG = 50. Higher numbers are worse, and lower are better. Great seasons will have negative PG ratings.

An average pitcher's PG = 50. Higher numbers are worse, and lower are better. Great seasons will have negative PG ratings. Fastball 1-2-3! This simple game gives quick results on the outcome of a baseball game in under 5 minutes. You roll 3 ten-sided dice (10d) of different colors. If the die has a 10 on it, count it as 0.

More information

Richmond City Baseball. Player Handbook

Richmond City Baseball. Player Handbook Richmond City Baseball Player Handbook Updated June 2015 Visit the 13U Pee Wee Chuckers website at: www.ballcharts.com/pwchuckers 1 P a g e Welcome to the Richmond 13U Pee Wee AAA Chuckers! Our collective

More information

Baseball Prospectus 2016

Baseball Prospectus 2016 Baseball Prospectus 2016 1 / 6 2 / 6 3 / 6 Baseball Prospectus 2016 Postseason projections are courtesy of FanGraphs and indicate each team's probability of winning the division or wild card, or any postseason

More information

Computer Scorekeeping Procedures

Computer Scorekeeping Procedures Computer Scorekeeping Procedures 3-23-16 COMPUTER SETUP: Unlock Computer Storage Box: Enter combination so that it appears on the side of the lock Computer Setup: Place a computer, keyboard & mouse at

More information

Projecting Three-Point Percentages for the NBA Draft

Projecting Three-Point Percentages for the NBA Draft Projecting Three-Point Percentages for the NBA Draft Hilary Sun hsun3@stanford.edu Jerold Yu jeroldyu@stanford.edu December 16, 2017 Roland Centeno rcenteno@stanford.edu 1 Introduction As NBA teams have

More information

This file contains the main manual, optional rules manual, game tables, score sheet, game mat, and two teams from the 1889 season.

This file contains the main manual, optional rules manual, game tables, score sheet, game mat, and two teams from the 1889 season. This pdf contains everything needed to play Pine Tar Baseball except the dice. You may print out anything in this manual for personal use only. Redistribution of either this pdf or the material contained

More information

FORECASTING BATTER PERFORMANCE USING STATCAST DATA IN MAJOR LEAGUE BASEBALL

FORECASTING BATTER PERFORMANCE USING STATCAST DATA IN MAJOR LEAGUE BASEBALL FORECASTING BATTER PERFORMANCE USING STATCAST DATA IN MAJOR LEAGUE BASEBALL A Thesis Submitted to the Graduate Faculty of the North Dakota State University of Agriculture and Applied Science By Nicholas

More information

2014 NATIONAL BASEBALL ARBITRATION COMPETITION ERIC HOSMER V. KANSAS CITY ROYALS (MLB) SUBMISSION ON BEHALF OF THE CLUB KANSAS CITY ROYALS

2014 NATIONAL BASEBALL ARBITRATION COMPETITION ERIC HOSMER V. KANSAS CITY ROYALS (MLB) SUBMISSION ON BEHALF OF THE CLUB KANSAS CITY ROYALS 2014 NATIONAL BASEBALL ARBITRATION COMPETITION ERIC HOSMER V. KANSAS CITY ROYALS (MLB) SUBMISSION ON BEHALF OF THE CLUB KANSAS CITY ROYALS Player Demand: $4.00 Million Club Offer: $3.30 Million Midpoint:

More information

Antelope Little League

Antelope Little League Antelope Little League Scorekeeper Training Thank you for volunteering to be a scorekeeper! It s an essential role, not only for keeping track of the score but also for the safety of the players. Being

More information

Chapter 1 The official score-sheet

Chapter 1 The official score-sheet Chapter 1 The official score-sheet - Symbols and abbreviations - The official score-sheet - Substitutions - Insufficient space on score-sheet 13 Symbols and abbreviations Symbols and abbreviations Numbers

More information

For any inquiries into the game contact James Formo at

For any inquiries into the game contact James Formo at This pdf contains everything needed to play Pine Tar Baseball except the dice. You may print out anything in this manual for personal use only. Redistribution of either this pdf or the material contained

More information

NUMB3RS Activity: Is It for Real? Episode: Hardball

NUMB3RS Activity: Is It for Real? Episode: Hardball Teacher Page 1 NUMB3RS Activity: Is It for Real? Topic: Data analysis Grade Level: 9-10 Objective: Use formulas to generate data points. Produce line graphs of which inferences are made. Time: 20 minutes

More information

Deriving an Optimal Fantasy Football Draft Strategy

Deriving an Optimal Fantasy Football Draft Strategy Stanford University CS 221 Artificial Intelligence: Principles and Techniques Deriving an Optimal Fantasy Football Draft Strategy Team Members: Alex Haigh Jack Payette Cameron Van de Graaf December 16,

More information

Analytics Improving Professional Sports Today

Analytics Improving Professional Sports Today University of Tennessee, Knoxville Trace: Tennessee Research and Creative Exchange University of Tennessee Honors Thesis Projects University of Tennessee Honors Program 12-2018 Analytics Improving Professional

More information

Gouwan Strike English Manual

Gouwan Strike English Manual Gouwan Strike English Manual Number of Players 2 Set Description Game board 1 Ball piece 1 Bat piece 1 Count pieces/runner pieces 11 Pitcher cards 26 Pitching cards 64 Batter cards 24 Batting cards 6 Situation

More information

How Effective is Change of Pace Bowling in Cricket?

How Effective is Change of Pace Bowling in Cricket? How Effective is Change of Pace Bowling in Cricket? SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries.

More information

A Markov Model of Baseball: Applications to Two Sluggers

A Markov Model of Baseball: Applications to Two Sluggers A Markov Model of Baseball: Applications to Two Sluggers Mark Pankin INFORMS November 5, 2006 Pittsburgh, PA Notes are not intended to be a complete discussion or the text of my presentation. The notes

More information

HCLL Scorekeeping Clinic

HCLL Scorekeeping Clinic HCLL Scorekeeping Clinic 2013 Season Prepared by: C. Kalaw Scorekeeping Duties at HCLL Begins with Rookie Games important for minors and majors divisions official scorekeeping is duty of home team this

More information

2014 National Baseball Arbitration Competition

2014 National Baseball Arbitration Competition 2014 National Baseball Arbitration Competition Jeff Samardzija v. Chicago Cubs Submission on Behalf of Jeff Samardzija Midpoint: $4.9 million Submission by: Team 18 i Table of Contents I. Introduction

More information

2015 Winter Combined League Web Draft Rule Packet (USING YEARS )

2015 Winter Combined League Web Draft Rule Packet (USING YEARS ) 2015 Winter Combined League Web Draft Rule Packet (USING YEARS 1969-1972) Welcome to Scoresheet Baseball: the winter game. This document details the process of drafting your Old Timers Baseball team on

More information

to the Kansas City Royals for the purposes of an arbitration hearing governed by the Major

to the Kansas City Royals for the purposes of an arbitration hearing governed by the Major I. Introduction and Request for Hearing Decision This brief identifies and analyzes the contributions made by center fielder Lorenzo Cain to the Kansas City Royals for the purposes of an arbitration hearing

More information

Old Age and Treachery vs. Youth and Skill: An Analysis of the Mean Age of World Series Teams

Old Age and Treachery vs. Youth and Skill: An Analysis of the Mean Age of World Series Teams ABSTRACT SESUG Paper BB-67-2017 Old Age and Treachery vs. Youth and Skill: An Analysis of the Mean Age of World Series Teams Joe DeMaio, Kennesaw State University Every October, baseball fans discuss and

More information

Dexter Fowler v. Colorado Rockies. Submission on Behalf of the Colorado Rockies. Team 18

Dexter Fowler v. Colorado Rockies. Submission on Behalf of the Colorado Rockies. Team 18 Dexter Fowler v. Colorado Rockies Submission on Behalf of the Colorado Rockies Team 18 I. Introduction The Colorado Rockies ( Rockies ), a Major League Baseball ( MLB ) team in the National League West

More information

DRILL #1 FROM THE TEE

DRILL #1 FROM THE TEE 1 Hitting Drills DRILL #1 FROM THE TEE DRILL #2 GROUNDER, PO PUP, LINE DRIVE DRILL #3 BATTER STANCE DRILL #4 EYE ON THE SPOT DRILL #5 COLORED BALL TOSS DRILL #6 CONTACT AND FREEZE DRILL #7 BALLOON DRILL

More information

Lab 11: Introduction to Linear Regression

Lab 11: Introduction to Linear Regression Lab 11: Introduction to Linear Regression Batter up The movie Moneyball focuses on the quest for the secret of success in baseball. It follows a low-budget team, the Oakland Athletics, who believed that

More information

Effects of Incentives: Evidence from Major League Baseball. Guy Stevens April 27, 2013

Effects of Incentives: Evidence from Major League Baseball. Guy Stevens April 27, 2013 Effects of Incentives: Evidence from Major League Baseball Guy Stevens April 27, 2013 1 Contents 1 Introduction 2 2 Data 3 3 Models and Results 4 3.1 Total Offense................................... 4

More information

Effect of homegrown players on professional sports teams

Effect of homegrown players on professional sports teams Effect of homegrown players on professional sports teams ISYE 2028 Rahul Patel 902949215 Problem Description: Football is commonly referred to as America s favorite pastime. However, for thousands of people

More information