Paul M. Sommers David U. Cha And Daniel P. Glatt. March 2010 MIDDLEBURY COLLEGE ECONOMICS DISCUSSION PAPER NO

AN EMPIRICAL TEST OF BILL JAMES'S PYTHAGOREAN FORMULA
by
Paul M. Sommers, David U. Cha, and Daniel P. Glatt
March 2010
MIDDLEBURY COLLEGE ECONOMICS DISCUSSION PAPER NO. 10-06
DEPARTMENT OF ECONOMICS, MIDDLEBURY COLLEGE, MIDDLEBURY, VERMONT 05753
http://www.middlebury.edu/~econ

David U. Cha, Daniel P. Glatt, Paul M. Sommers
Department of Economics, Middlebury College, Middlebury, Vermont 05753
psommers@middlebury.edu

"The Giants do not usually need to score many runs. All that we must do is score more than the other fellow."
Bill Terry, manager of the 1932 New York Giants [1, p. 136]

Bill James, baseball writer and statistician, in 1980 developed a formula that related a team's won-lost percentage to the number of runs they scored and allowed, as follows:

(1) $$\text{Won-Lost Percentage} = \frac{(\text{Runs Scored})^2}{(\text{Runs Scored})^2 + (\text{Runs Allowed})^2}$$

Since the Won-Lost Percentage is the ratio of games won to the total number of games played (games won plus games lost), equation (1) can be rewritten as follows:

(2) $$\frac{\text{Wins}}{\text{Losses}} = \frac{(\text{Runs Scored})^2}{(\text{Runs Allowed})^2} = \left(\frac{\text{Runs Scored}}{\text{Runs Allowed}}\right)^2$$

If, for example, the Boston Red Sox score 867 runs and allow 657 runs (as they did in 2007), Bill James's Pythagorean method [so dubbed because of the presence of three squared terms in equation (1)] (footnote 1) projects that the team would have a won-lost percentage of $(867)^2/[(867)^2 + (657)^2]$ or .635 (and hence win about $.635 \times 162$ or 103 games). In fact, the Red Sox (world champions in 2007) won 96 regular season games or about 6.8 percent fewer games than the Pythagorean method would predict. In this instance, the exponent of 2 on the right-hand side of equation (2) is too high. Or, one could argue that 2 is accurate, but the Boston Red Sox should have won more regular season games in 2007 than they actually did.

The purpose of this brief note is to empirically test Bill James's Pythagorean method for all teams in both leagues, by decade, from 1950 to 2007. Does the method work as well since 1980 as it did before 1980? Does the method work better for one league (American or National) than the other? Has the exponent in equation (2) changed in recent decades?

The Models

Equation (2) can be written in log-linear form as follows:

(3) $$\ln\left(\frac{\text{Wins}}{\text{Losses}}\right) = 2\,\ln\left(\frac{RS}{RA}\right)$$

where ln is the natural logarithm, RS denotes runs scored, and RA denotes runs allowed. That is, if one first takes logs of both sides of equation (2) and then if we define $y = \ln(\text{Wins}/\text{Losses})$ and $x = \ln(RS/RA)$ for each team i in year t, we can estimate the coefficients $\beta_0$ and $\beta_1$ by applying least squares to y and x in the following regression:

(4) $$y = \beta_0 + \beta_1 x + \varepsilon$$

where $\varepsilon$ is a disturbance term. According to Bill James, $\beta_0$ should be indistinguishable from zero and $\beta_1$ should be close to 2. To test the null hypothesis $H_0: \beta_1 = 2$, we employ a t-test. The test statistic is

$$t_{calc} = \frac{b_1 - 2}{SE(b_1)},$$

where $b_1$ is the estimated slope coefficient and $SE(b_1)$ is the standard error of the estimated slope coefficient (footnote 2). Hereafter, equation (4) where $y = \ln(\text{Wins}/\text{Losses})$ and $x = \ln(RS/RA)$ will be called Model (1).

Model (1) assumes that one more run scored has the same impact on a team's win percentage as does one less run allowed. But what if scoring runs was more (or less) important to winning games than allowing runs? Model (1) might then be revised as follows:

(5) $$\ln\left(\frac{\text{Wins}}{\text{Losses}}\right) = \beta_0 + \beta_1 \ln(RS) + \beta_2 \ln(RA) + \varepsilon$$

If we relax the assumption that the exponent on the ratio RS/RA is the same (and, according to James, equal to 2), then the revised model would be described by equation (5), hereafter, Model (2).

The Data

Data on regular season wins, losses, runs scored, and runs allowed for all teams were gleaned from two primary sources: Total Baseball [3] for the years 1950 through 2003 and http://sports.espn.go.com/mlb/standings for the years 2004 through 2007.

The Results

Table 1 shows the regression results for each league (as well as for both leagues combined) for each decade since the 1950s. The estimated intercept ($b_0$) in all regressions is not discernible from zero, as Bill James would expect. Since the year 2000, the exponent in the ratio of runs scored to runs allowed in James's Pythagorean formula has been indistinguishable from 2. But, in decades before the turn of the millennium the exponent was not equal to 2. And, in all cases when we could reject $H_0: \beta_1 = 2$ (in favor of the alternative hypothesis $H_A: \beta_1 \neq 2$), our estimate $b_1$ was invariably less than 2. A comparison of the 30-year period 1950-1979 to the 28-year period 1980-2007 shows that $b_1$ was in most cases (with the exception of the American League (AL) from 1980 to 2007) significantly less than 2.

The impact of RS/RA on winning is marginally higher now (1980-2007) than it was in the earlier period (1950-1979). Compare the value of $b_1$ (1.9202) estimated for both leagues combined in the period 1980-2007 to the corresponding estimate for $b_1$ (1.8099) in the period 1950-1979. Moreover, it is worth noting that the average number of runs scored is also higher in the National (American) League in the period 1980-2007 than it was in the period 1950-1979 [$\overline{RS}_{1980\text{-}2007,\,NL} = 699$, $\overline{RS}_{1950\text{-}1979,\,NL} = 667.3$, p-value on the difference between means is less than .001; $\overline{RS}_{1980\text{-}2007,\,AL} = 747$, $\overline{RS}_{1950\text{-}1979,\,AL} = 669.6$, p-value on the difference is again less than .001].

Figures 1 and 2 show scatterplots of ln(W/L) against ln(RS/RA) for each subperiod (1950-1979 and 1980-2007, respectively) for each league. Each point represents an observation on one team in one year. The points more closely fall on a straight line for the National League, 1950-1979 than they do for the National League, 1980-2007 (compare $R^2 = .878$ for 1950-1979 with $R^2 = .847$ for 1980-2007 in Table 1). Still, the differences between the two periods by league are admittedly very small.

Table 2 shows the regression results for Model (2), which isolates the impact of runs scored from the impact of runs allowed on the win-loss ratio. The right-hand column reports the coefficient of determination ($R^2$) for each regression each decade, by league. A look down this column and the corresponding column in Table 1 clearly shows that the explanatory power (that is, how well the regressors as a group explain variation in the dependent variable, namely, ln(Wins/Losses)) of Model (2) is not an improvement over Model (1). In other words, runs scored and runs allowed seemingly have an equal (and opposite) effect on the win-loss ratio.
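As a rough illustration of how Model (1) and the test of H0: β1 = 2 might be computed, the sketch below applies the closed-form least-squares formulas to y = ln(Wins/Losses) and x = ln(RS/RA). It is a minimal sketch, not the authors' code: the function names `log_ratios` and `ols_slope_test` are hypothetical, and the sample records are made-up numbers used only to exercise the functions.

```python
import math


def log_ratios(records):
    """Convert (wins, losses, runs_scored, runs_allowed) tuples into the
    Model (1) variables x = ln(RS/RA) and y = ln(Wins/Losses)."""
    x = [math.log(rs / ra) for (_, _, rs, ra) in records]
    y = [math.log(w / l) for (w, l, _, _) in records]
    return x, y


def ols_slope_test(x, y, beta_null=2.0):
    """Fit y = b0 + b1*x by least squares and return
    (b0, b1, SE(b1), t_calc), where t_calc = (b1 - beta_null) / SE(b1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # Residual variance uses n - 2 degrees of freedom (two estimated coefficients).
    ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(ssr / (n - 2) / sxx)
    t_calc = (b1 - beta_null) / se_b1
    return b0, b1, se_b1, t_calc


# Hypothetical team-season records: (wins, losses, runs scored, runs allowed).
# These are invented for illustration and are not taken from the paper's data.
sample = [(95, 67, 800, 650), (81, 81, 700, 705), (70, 92, 640, 760), (88, 74, 760, 690)]
x, y = log_ratios(sample)
b0, b1, se_b1, t_calc = ols_slope_test(x, y)
print(f"b0={b0:.4f}  b1={b1:.4f}  SE(b1)={se_b1:.4f}  t={t_calc:.3f}")
```

With real data, one team-season per record, the resulting `t_calc` would be compared with a two-tailed critical value to decide whether to reject H0: β1 = 2.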

Concluding Remarks

Early in the 1980s, Bill James developed a formula in response to the question: Can you tell how many games a team will win, based on its runs scored and runs allowed? A regression analysis of data on regular season runs scored, runs allowed, and wins (and losses) for each team each season in Major League Baseball since 1950 reveals that Bill James's Pythagorean formula has stood the test of time very well indeed. Runs scored and runs allowed have equal (and opposite) effects on team winning, in both leagues and in years before and since 1980. If any modification should be made to the formula, the exponent on runs scored and runs allowed should be reduced to a power slightly below 2 [about 1.92 for both leagues since the year 1980].
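To make the suggested modification concrete, the following sketch generalizes equation (1) to an arbitrary exponent k, using Wins/Losses = (RS/RA)^k, and applies it to the 2007 Red Sox figures quoted earlier (867 runs scored, 657 allowed, 162 games). The function names `pythagorean_pct` and `projected_wins` are hypothetical; the block is an illustration under those assumptions rather than a definitive implementation.

```python
def pythagorean_pct(runs_scored, runs_allowed, exponent=2.0):
    """Projected won-lost percentage when Wins/Losses = (RS/RA)**exponent.
    With exponent = 2 this reduces to RS^2 / (RS^2 + RA^2), i.e. equation (1)."""
    ratio = (runs_scored / runs_allowed) ** exponent
    return ratio / (1.0 + ratio)


def projected_wins(runs_scored, runs_allowed, games=162, exponent=2.0):
    """Projected number of wins over a season of `games` games."""
    return games * pythagorean_pct(runs_scored, runs_allowed, exponent)


# Compare James's exponent of 2 with the 1.92 estimated for 1980-2007.
for k in (2.0, 1.92):
    pct = pythagorean_pct(867, 657, k)
    wins = projected_wins(867, 657, exponent=k)
    print(f"exponent={k}: pct={pct:.3f}, projected wins={wins:.1f}")
```

Because the Red Sox outscored their opponents, lowering the exponent lowers the projected percentage, which moves the projection toward the 96 games the team actually won.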

Table 1. Regression Results for Model (1)
ln(Wins/Losses) = b0 + b1 ln(RS/RA)

Period      League         Intercept b0 (a)     Slope b1 on ln(RS/RA)   R^2
1950-1959   AL             -.0059 [.0131]       1.7543 [.0598]          .917
            NL              .00003 [.0122]      1.8758 [.0737]          .893
            Both leagues   -.0030 [.0089]       1.7985 [.0461]          .906
1960-1969   AL             -.0017 [.0094]       1.8757 [.0593]          .911
            NL             -.0013 [.0111]       1.9323 [.0655]          .901
            Both leagues   -.0016 [.0072]       1.9055 [.0441]          .905
1970-1979   AL              .00001 [.0091]      1.8139 [.0560]          .894
            NL              .0012 [.0101]       1.6576 [.0642]          .850
            Both leagues    .0006 [.0068]       1.7398 [.0425]          .873
1980-1989   AL              .0005 [.0078]       1.8849 [.0577]          .885
            NL             -.0017 [.0100]       2.0195 [.0848]          .828
            Both leagues   -.0005 [.0063]       1.9381 [.0489]          .859
1990-1999   AL              .00003 [.0078]      1.9324 [.0599]          .883
            NL             -.0021 [.0090]       1.8370 [.0645]          .856
            Both leagues   -.0012 [.0060]       1.8814 [.0441]          .869
2000-2007   AL             -.0055 [.0099]       2.0026 [.0624]          .904
            NL             -.0004 [.0089]       1.8720 [.0682]          .857
            Both leagues   -.0023 [.0066]       1.9445 [.0458]          .883
1950-1979   AL             -.0021 [.0059]       1.8062 [.0333]          .907
            NL              .00005 [.0064]      1.8146 [.0393]          .878
            Both leagues   -.0011 [.0044]       1.8099 [.0255]          .893
1980-2007   AL             -.0012 [.0048]       1.9415 [.0344]          .891
            NL             -.0014 [.0054]       1.8951 [.0411]          .847
            Both leagues   -.0013 [.0036]       1.9202 [.0266]          .870

(a) Numbers in brackets are standard errors and numbers in boldface (italics) are significant at better than the .01 (.05) level. The null hypothesis for the intercept is H0: β0 = 0 and the null hypothesis for the slope coefficient on ln(RS/RA) is H0: β1 = 2. In both cases, the alternative hypothesis is two-tailed.
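The slope tests summarized in Table 1 can be approximately reproduced from the reported estimates alone. The sketch below computes t = (b1 - 2)/SE(b1) for a few rows copied from the table. The 1.96 cutoff is an assumption (the large-sample two-tailed 5 percent critical value), and the function name `t_stat` is hypothetical.

```python
def t_stat(b1, se_b1, beta_null=2.0):
    """t statistic for H0: beta_1 = beta_null, given an estimate and its standard error."""
    return (b1 - beta_null) / se_b1


# Rows copied from Table 1: label -> (b1, SE(b1)).
table1_rows = {
    "1950-1979 Both leagues": (1.8099, 0.0255),
    "1980-2007 AL": (1.9415, 0.0344),
    "1980-2007 NL": (1.8951, 0.0411),
    "1980-2007 Both leagues": (1.9202, 0.0266),
}

# Assumption: large-sample two-tailed 5% critical value from the normal distribution.
CRIT_05 = 1.96

results = {label: t_stat(b1, se) for label, (b1, se) in table1_rows.items()}
for label, t in results.items():
    verdict = "reject H0" if abs(t) > CRIT_05 else "do not reject H0"
    print(f"{label}: t = {t:.2f} -> {verdict}")
```

Under that assumed cutoff, the American League in 1980-2007 is the only one of these rows for which the exponent cannot be distinguished from 2, which agrees with the exception noted in the results.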

Table 2. Regression Results for Model (2)
ln(Wins/Losses) = b0 + b1 ln(RS) + b2 ln(RA)

Period      League         Intercept b0 (a)     Slope b1 on ln(RS)   Slope b2 on ln(RA)   R^2
1950-1959   AL               .3888 [.9790]      1.7224 [.0995]       -1.7829 [.0930]      .917
            NL             -1.0380 [1.1370]     1.9526 [.1119]       -1.7937 [.1164]      .894
            Both leagues    -.2284 [.7351]      1.8162 [.0739]       -1.7816 [.0718]      .906
1960-1969   AL               .3171 [.5948]      1.8494 [.0771]       -1.8986 [.0733]      .911
            NL               .8684 [.7091]      1.8708 [.0823]       -2.0052 [.0883]      .902
            Both leagues     .5648 [.4578]      1.8624 [.0562]       -1.9499 [.0567]      .906
1970-1979   AL              -.3023 [.5620]      1.8365 [.0702]       -1.7900 [.0715]      .895
            NL              -.6484 [.7772]      1.7046 [.0854]       -1.6046 [.0903]      .850
            Both leagues    -.4204 [.4613]      1.7060 [.0545]       -1.7060 [.0564]      .873
1980-1989   AL               .0077 [.3052]      1.8844 [.0619]       -1.8855 [.0630]      .885
            NL              -.3260 [.4232]      2.0462 [.0919]       -1.9959 [.0904]      .829
            Both leagues    -.1265 [.2429]      1.9477 [.0524]       -1.9283 [.0525]      .859
1990-1999   AL               .0794 [.4242]      1.9265 [.0680]       -1.9385 [.0683]      .883
            NL              -.1865 [.4482]      1.8535 [.0762]       -1.8253 [.0707]      .856
            Both leagues    -.0895 [.2951]      1.8887 [.0505]       -1.8753 [.0486]      .869
2000-2007   AL              -.8735 [.9428]      2.0721 [.0980]       -1.9421 [.0907]      .904
            NL              1.6195 [.7686]      1.7343 [.0938]       -1.9788 [.0842]      .862
            Both leagues     .5502 [.5714]      1.8998 [.0652]       -1.9829 [.0666]      .884

(a) Numbers in brackets are standard errors and numbers in boldface (italics) are significant at better than the .01 (.05) level. The null hypothesis for the intercept is H0: β0 = 0 and the null hypotheses for the slope coefficients on ln(RS) and ln(RA) are H0: β1 = 2 and H0: β2 = -2, respectively. In all three cases, the alternative hypothesis is two-tailed.
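One informal way to look at the "equal and opposite" reading of Model (2) is to add the two slope estimates in each row: if runs scored and runs allowed matter equally, b1 + b2 should be near zero. The sketch below does this for the "Both leagues" rows of Table 2. It is only a descriptive check. A formal test of β1 = -β2 would also need the covariance between the two estimates, which the table does not report. The variable names are hypothetical.

```python
# "Both leagues" rows copied from Table 2: decade -> (b1 on ln(RS), b2 on ln(RA)).
table2_both = {
    "1950-1959": (1.8162, -1.7816),
    "1960-1969": (1.8624, -1.9499),
    "1970-1979": (1.7060, -1.7060),
    "1980-1989": (1.9477, -1.9283),
    "1990-1999": (1.8887, -1.8753),
    "2000-2007": (1.8998, -1.9829),
}

# Under "equal and opposite" effects, each sum should be close to zero.
sums = {decade: b1 + b2 for decade, (b1, b2) in table2_both.items()}
for decade, s in sums.items():
    print(f"{decade}: b1 + b2 = {s:+.4f}")
```

In every decade the two slopes offset each other to within a tenth of a unit, against individual coefficients of roughly 1.7 to 2.0 in magnitude.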

Figure 1. Scatterplots of ln(W/L) against ln(RS/RA), 1950-1979. Two panels: American League and National League. Horizontal axis: ln(RS/RA); vertical axis: ln(W/L).

Figure 2. Scatterplots of ln(W/L) against ln(RS/RA), 1980-2007. Two panels: American League and National League. Horizontal axis: ln(RS/RA); vertical axis: ln(W/L).

References

1. P. Williams, When the Giants Were Giants, Chapel Hill, NC: Algonquin Books, 1994.
2. B. James, The Bill James Baseball Abstract 1983, New York: Ballantine Books, 1983.
3. Total Baseball: The Ultimate Baseball Encyclopedia (edited by H. Thorn, P. Birnbaum, and B. Deane), Wilmington, DE: Sports Media Publishing, 2004.

Footnotes

1. See, for example, the reference to "The Pythagorean Formula" in [2, p. 10].
2. The $b_1$ estimate can also be interpreted as an elasticity of (Wins/Losses) with respect to (RS/RA), where (in general) the elasticity of Y with respect to X is defined as $\frac{X}{Y}\frac{dY}{dX}$. In other words, a 1 percent increase in (RS/RA) will lead to a $b_1$ percent increase in (Wins/Losses). Moreover, since Wins + Losses is equal to a constant (162 games since the year 1962 and 154 games in years before 1962), it should also be noted that a given percentage change in Wins is equal to the percentage change in the winning percentage, [Wins/(Wins + Losses)].
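The elasticity reading in footnote 2 follows from differentiating the log-linear model. The short derivation below is a sketch under the paper's own definitions; the shorthand symbols Y, X, G, and p are introduced here only for the derivation.

```latex
% Shorthand (assumed for this sketch): Y = Wins/Losses, X = RS/RA,
% G = Wins + Losses (games played), p = Wins / G (winning percentage).
\begin{align*}
\ln Y &= \beta_0 + \beta_1 \ln X
  && \text{Model (1) without the disturbance term} \\
\frac{1}{Y}\,\frac{dY}{dX} &= \beta_1 \,\frac{1}{X}
  && \text{differentiate both sides with respect to } X \\
\frac{X}{Y}\,\frac{dY}{dX} &= \beta_1
  && \text{so the elasticity of } Y \text{ with respect to } X \text{ is } \beta_1 \\[6pt]
p &= \frac{\text{Wins}}{G}
  \;\Longrightarrow\;
  \frac{dp}{p} = \frac{d\,\text{Wins}}{\text{Wins}}
  && \text{because } G \text{ is constant within a season}
\end{align*}
```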