
September 29, 2011
New type of file on Retrosheet website
Overview by Dave Smith

Below is a detailed description of a major new effort by Retrosheet, one that has never been undertaken by any other group or company or by Major League Baseball itself. We have done a detailed examination of the game by game totals for batters, pitchers and fielders to check for inconsistencies in the official records. We have recently published the first set of results from this examination for several seasons, following the detailed methodology laid out below.

What are the official records in baseball? This turns out to be less obvious than most people understand. Leaving aside the modern, instantaneous computerized recording of events, the historical pattern had two basic steps. First, the official scorer at the game completed a standardized form with the totals for each player in all the categories required in the rule book. These have varied over the years, but there are around 25 for batters (including fielding data) and about the same for pitchers. Second, this form was sent to the American or National League office where the numbers were transcribed onto large ledger pages, a separate one for each player and pitcher. At the end of the season, the columns were totaled for each player and they formed the basis of the summaries in the annual guides (Spalding, Reach, The Sporting News). Examples of these handwritten pages may be seen for a 1927 batter at http://www.retrosheet.org/ruth1927page5.pdf and a 1934 pitcher at http://www.retrosheet.org/dean1934page1.pdf.

We must always remember what "official" means. The literal meaning is "of the office," not "guaranteed to be accurate." That is, a designation of "official" is a statement of authority, not of reliability. The ledger pages have long since been transferred to the Hall of Fame Library in Cooperstown where they have been microfilmed.
Retrosheet is fortunate to have been able to purchase a copy of these microfilms (nearly 200 reels) and they were the basis of the work we have just released. Of course, all of this information is now transmitted from the press box to the MLB offices in "real time", so the issue is primarily for games played prior to 1980.

Most people have never seen these images of the official daily pages and the concept of "official" is therefore usually associated with season or career totals. There have been many disputes and controversies about these aggregate totals, but in reality there is no single official set of yearly or career totals, even though various encyclopedias have been designated as "official" over the years. MLB tends to defer to the Elias Sports Bureau when disagreements arise, but Elias does not publish its information either in print or electronic versions, with one consequence being that resolution of differences is difficult.

The distinction between "official" and "accurate" may seem odd in 2011, but 100 years ago, it was a very serious issue. The late Leonard Koppett, 1992 recipient of the J.G. Taylor Spink award from the Baseball Writers Association of America, urged caution in these matters, noting that in the early 20th century there were "dueling statistics" (his phrase to me) and that everyone was better served by having one set agreed upon by everyone. It was this conversation that led me to appreciate the powerful significance of "from the office." I wrote an article for Memories and Dreams, the Hall of Fame publication, on this topic in the spring of 2011. A longer version of that story is at http://www.retrosheet.org/baseballrecords.pdf.

How does all of this relate to Retrosheet's recent release? The approach was to examine the daily official totals from the microfilm for each game and to check for internal consistency. For example, as noted below, the number of strikeouts charged to one team's batters must equal the number of strikeouts credited to the opposing team's pitchers. Other types of checks were done as well (again, see below for details), and thousands of inconsistencies have been identified and carefully documented. At Retrosheet, we have traditionally referred to these as "discrepancies" and not as "errors" since the latter term has a great potential to be inflammatory, with the likely result that antagonistic relations will obscure the overall effort. However, many of the differences that were found truly are errors and it is time that we note them that way.
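
The strikeout check mentioned above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not Retrosheet's actual proofing code: the tuple layout, the function name strikeout_mismatches, and the game and team labels are all hypothetical, and the numbers are made up.

```python
from collections import Counter

def strikeout_mismatches(batter_rows, pitcher_rows):
    """Compare strikeouts charged to batters with those credited to opposing pitchers.

    batter_rows:  (game_id, batting_team, strikeouts) for each batter's daily line
    pitcher_rows: (game_id, team_faced, strikeouts) for each pitcher's daily line
    Returns {(game_id, team): (charged, credited)} for every pair that disagrees.
    """
    charged = Counter()   # strikeouts charged to a team's batters
    credited = Counter()  # strikeouts credited to the pitchers who faced that team
    for game_id, team, so in batter_rows:
        charged[(game_id, team)] += so
    for game_id, team_faced, so in pitcher_rows:
        credited[(game_id, team_faced)] += so
    return {
        key: (charged[key], credited[key])
        for key in sorted(set(charged) | set(credited))
        if charged[key] != credited[key]
    }

# Made-up daily lines for one hypothetical game, "G1" (layout is an assumption).
batters = [("G1", "AWAY", 1), ("G1", "AWAY", 2), ("G1", "HOME", 4), ("G1", "HOME", 1)]
pitchers = [("G1", "AWAY", 2), ("G1", "AWAY", 2), ("G1", "HOME", 5)]

mismatches = strikeout_mismatches(batters, pitchers)
print(mismatches)
```

In this made-up game the away batters are charged with 3 strikeouts while the pitchers who faced them are credited with 4, so that game and team pair is flagged. The same pattern would apply to any category that is recorded on both the batting and the pitching side.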
Details of the Proofing Process
by Tom Ruane

In light of the recent release of Retrosheet's discrepancy files, I thought it might be a good idea to discuss their history and explain how the first set will differ from subsequent releases.

Back in July of 2004, Retrosheet announced the formation of the Box Score Project, an effort designed to generate computerized box score event files (a format devised by Ted Turocy and myself years earlier while we were working on the 1911 NL) for the years prior to those covered by our event files. This effort had three main components:

1) digitizing lineup, line score and any other information not in the dailies that would be needed to generate box scores,
2) digitizing the Hall of Fame player, pitcher and team dailies and,
3) combining the results of the first two steps into proofed box score event files.

We started with a call for volunteers that summer and within six years had a complete set of box score event files from 1920 to 1949. Now, our volunteers are meticulous and dedicated people, but they are not infallible, so we needed to ensure that all of the work we received was thoroughly proofed in order to minimize the number of errors we released. So we did a series of sanity checks on all the digitized files we received.

Some of the checks we did were obvious. We ensured, for example, that:

1) the sum of each batter's or pitcher's dailies matched his official seasonal totals,
2) each team had a player at each position at both the beginning and end of each game,
3) the sum of a team's batters' and fielders' data in each game matched the applicable value in the team dailies and,
4) the sum of a team's batters' data matched the sum of the opposing team's pitchers' data.

Other checks were not as obvious. For example, we ensured that:

1) each team's plate appearances by lineup position made sense, unless there was a documented case of the team batting out of order during the game and,
2) no one player scored and drove in all of his team's runs without being credited with the same number of home runs.

There were dozens of different checks, but all of them were designed to eliminate data entry errors on the part of our volunteers and allow us to produce box scores on our site that made sense. What we found, however, was that once all of the data entries were corrected, the official Hall of Fame dailies still failed many of these checks. In short, each league/season contained hundreds of instances where the official dailies had to be wrong. The most common error involved having incorrect defensive positions marked so that, for example, a team officially had two second basemen in a game and no shortstop.
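
Two of the checks listed above, the position check and the comparison of summed dailies with seasonal totals, might look something like the following Python sketch. This is a minimal illustration and not the code used for the project: the position codes, the dictionary layout, and the function names position_problems and season_total_mismatches are assumptions, and the sample numbers are invented.

```python
from collections import Counter

# Assumed position codes for the nine defensive positions.
FIELD_POSITIONS = ("P", "C", "1B", "2B", "3B", "SS", "LF", "CF", "RF")

def position_problems(positions_on_field):
    """Return (missing, duplicated) positions for one team at one point in a game."""
    counts = Counter(positions_on_field)
    missing = [p for p in FIELD_POSITIONS if counts[p] == 0]
    duplicated = [p for p in FIELD_POSITIONS if counts[p] > 1]
    return missing, duplicated

def season_total_mismatches(dailies, official_totals):
    """Sum a player's per-game lines and compare them with his seasonal totals.

    dailies:         list of {stat: value} dicts, one per game
    official_totals: {stat: value} dict for the season
    Returns {stat: (summed, official)} for every stat that disagrees.
    """
    summed = Counter()
    for game in dailies:
        summed.update(game)  # adds each stat's value to the running total
    return {
        stat: (summed[stat], official)
        for stat, official in official_totals.items()
        if summed[stat] != official
    }

# Made-up example: two second basemen and no shortstop at the start of a game.
start_of_game = ["P", "C", "1B", "2B", "2B", "3B", "LF", "CF", "RF"]
print(position_problems(start_of_game))

# Made-up example: the dailies add up to 3 hits, the seasonal total says 4.
dailies = [{"AB": 4, "H": 2}, {"AB": 3, "H": 1}]
official = {"AB": 7, "H": 4}
print(season_total_mismatches(dailies, official))
```

A check like this only reports that something is inconsistent. Deciding which side is wrong still requires the game-by-game research described in this section.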
There was also a wide variety of other apparent official mistakes, and all of them had to be investigated and resolved. In some cases, we couldn't determine what actually happened and have left anomalies in the box scores. But most of the time, we found evidence to support a version of the game that corrected these errors. To be sure, mistakes still exist and I'm sure there are occasions where we have compounded the original error by a mistaken "correction," but these discrepancy files are the first step toward both documenting possible errors in the official record and helping to bring our own mistakes to light. This is certainly not the final word on the subject; rather, we hope it is the start of a dialogue.

The format of the discrepancy files is described in detail at http://www.retrosheet.org/discrepancy%20file%20format.pdf, and the files are currently displayed in three areas on our website:

1) Each team page, if there are discrepancies associated with that team, contains a link to a page describing them.
2) Each player page, if there are discrepancies associated with that player, contains a link to a page describing them.
3) Each game (box score) page, if there are discrepancies associated with that game, contains a link to a page describing them.

Earlier I had hinted that the first set of discrepancy files will differ from subsequent ones. What I meant was that this release contains discrepancies between our box score event files (created from Official Dailies) and the official statistics (or at least Pete Palmer's statistical DB, which is even better than the official statistics). As a result, almost all of the discrepancies in these files involve cases where the official version has to be incorrect. No team can play a game with two second basemen and no shortstop. No player can go 2-2 with two strikeouts, or have more home runs than RBIs, and so on. We may be wrong in how we resolved the problem, but we know that the official version can't be right.
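
The "cannot be right" cases for a single player's line can be expressed as simple arithmetic rules. The Python sketch below covers only the two examples given above (going 2-2 with two strikeouts, and more home runs than RBIs). The field names, the function name impossible_line_reasons, and the sample lines are hypothetical, and the real discrepancy files cover many more categories.

```python
def impossible_line_reasons(line):
    """Return the reasons a single-game batting line cannot be correct.

    line: dict with keys "AB", "H", "SO", "HR", "RBI" (an assumed layout).
    """
    reasons = []
    # A strikeout is an at-bat that is not a hit, so hits plus strikeouts
    # can never exceed at-bats (2-for-2 with two strikeouts fails this).
    if line["H"] + line["SO"] > line["AB"]:
        reasons.append("hits plus strikeouts exceed at-bats")
    # Every home run drives in at least the batter himself.
    if line["HR"] > line["RBI"]:
        reasons.append("more home runs than RBIs")
    return reasons

# Made-up lines illustrating the two examples from the text, plus a clean one.
two_for_two = {"AB": 2, "H": 2, "SO": 2, "HR": 0, "RBI": 0}
homer_no_rbi = {"AB": 4, "H": 1, "SO": 0, "HR": 1, "RBI": 0}
clean_line = {"AB": 4, "H": 2, "SO": 1, "HR": 1, "RBI": 2}

for sample in (two_for_two, homer_no_rbi, clean_line):
    print(impossible_line_reasons(sample))
```

Rules of this kind establish only that the recorded line is impossible. They say nothing about which of the recorded numbers is the one in error.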
Subsequent releases will also have discrepancies like this, but in addition they will include differences between the Hall of Fame dailies and the statistics derived from our play-by-play files. In these cases, both views of the game may be equally plausible on the surface; we simply believe that the play-by-play file is correct.

In the next release, we will be including the discrepancy file for the 1922 NL, so this might be a good example. There are a total of 1223 discrepancies in that file. Of these, 356 are differences between the box score files and the official dailies (in other words, instances where we can be confident there IS an error in the official view of things). But the other 867 are not as clear-cut. We have investigated the games involved and think our version is the correct one, but further research may not uphold this view.

One final word: almost all of the initial generation of these discrepancy files was done by a single Retrosheet volunteer. If you have been reading the credits on the website or in our release notes over the last decade or so, Dave Lamoureaux's name should be familiar to you by now. He was responsible for the lion's share of both of the first two digitization steps listed above (the lineup entry and the dailies), and he was a one-man show when it came to taking lists of statistical discrepancies and turning them into a game-by-game description of all of our differences. Much like the box scores from this era, these discrepancy files would not exist without thousands of hours of his work.