Standing Between a Bayesian and a Frequentist: An Empirical Bayes Exploration of Movies, Baseball, and Long Beach Basketball

Standing Between a Bayesian and a Frequentist: An Empirical Bayes Exploration of Movies, Baseball, and Long Beach Basketball
Arthur Berg
Pennsylvania State University

Arthur Berg Standing Between a Bayesian and a Frequentist 2 / 28

Bayesian and Frequentist Representatives
Rev. Thomas Bayes FRS (1702-1761): English mathematician, Presbyterian minister.
\[ P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)} \]
Sir Ronald Fisher FRS (1890-1962): English statistician, evolutionary biologist, geneticist.
Let the data speak for itself.

Bayes Estimator as a Convex Combination
1st Goal: List the top 250 movies of all time.
Movies are rated on a scale of 1 to 10.
Some movies are rated by many people, and some by only a few.
Movies with fewer than 3000 votes are not considered.
The average rating across all movies is C = 6.9.
\(\mu_i\) represents the mean rating by everyone who has seen movie i.
The real goal is to construct the best estimate of \(\mu_i\), then pick the top 250.
The frequentist approach uses only \(\bar X_i\), the average rating for movie i:
\[ \hat\mu_i^{(\text{Fisher})} = \bar X_i \]
The Bayesian approach shrinks \(\bar X_i\) towards C, with more shrinking applied when the number of votes for movie i is small:
\[ \hat\mu_i^{(\text{Bayes})} = \alpha_i \bar X_i + (1 - \alpha_i)\,C, \quad \text{where } \alpha_i \in (0, 1) \]

Internet Movie Database Top 250
Rank  WR   R    Title                                     Votes
1     9.2  9.2  The Shawshank Redemption (1994)           546,155
2     9.1  9.2  The Godfather (1972)                      427,961
3     9.0  9.0  The Godfather: Part II (1974)             257,643
4     8.9  9.0  The Good, the Bad and the Ugly (1966)     170,045
5     8.9  9.0  Pulp Fiction (1994)                       436,456
6     8.9  8.9  Inception (2010)                          265,531
7     8.9  8.9  Schindler's List (1993)                   289,170
8     8.9  8.9  12 Angry Men (1957)                       126,983
9     8.8  8.9  One Flew Over the Cuckoo's Nest (1975)    225,419
10    8.8  8.9  The Dark Knight (2008)                    487,800
85    8.5  8.7  Black Swan (2010)                         20,326
142   8.2  8.3  Avatar (2009)                             285,005
240   8.0  8.5  True Grit (2010)                          6,444

IMDb Weighted Ranking: a true Bayesian estimate
\[ WR_i = \frac{v_i R_i + m C}{v_i + m} = \underbrace{\frac{v_i}{v_i + m}}_{\alpha_i} R_i + \underbrace{\frac{m}{v_i + m}}_{1 - \alpha_i} C \]
\(R_i\) = average rating of movie i (\(\bar X_i\))
\(v_i\) = total number of votes from regular voters
\(m\) = minimum number of votes to make the list = 3000
\(C\) = grand mean across all movies in the database = 6.9
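The weighted-rating formula can be sketched in a few lines of Python. This is only an illustration: the function name `weighted_rating` and its parameter names are hypothetical, and the inputs are the rounded R and vote counts from the Top 250 table, so the results are approximate.

```python
def weighted_rating(avg_rating, votes, min_votes=3000, grand_mean=6.9):
    """Shrink a movie's average rating toward the grand mean.

    alpha = votes / (votes + min_votes) is the weight on the movie's
    own average; the remaining weight goes to the grand mean C.
    """
    alpha = votes / (votes + min_votes)
    return alpha * avg_rating + (1 - alpha) * grand_mean


# Two rows from the Top 250 table (R is already rounded there).
true_grit = weighted_rating(8.5, 6_444)
black_swan = weighted_rating(8.7, 20_326)
print(round(true_grit, 1), round(black_swan, 1))
```

With few votes (True Grit) the rating moves noticeably toward 6.9; with hundreds of thousands of votes the weight on the movie's own average is close to 1 and WR is nearly equal to R.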

A Bayesian Calculation
\(X_i = (X_{i,1}, \ldots, X_{i,v_i})\) represents the \(v_i\) ratings of movie i.
prior: \(\mu_i \sim N(\mu_0, \sigma_0^2)\)
conditional: \(X_{i,j} \mid \mu_i \overset{\text{iid}}{\sim} N(\mu_i, \sigma^2)\), \(j = 1, \ldots, v_i\)
\[ \hat\mu_i^{(\text{Bayes})} = E[\mu_i \mid X_i] = \left(\frac{v_i}{v_i + \sigma^2/\sigma_0^2}\right) \bar X_i + \left(\frac{\sigma^2/\sigma_0^2}{v_i + \sigma^2/\sigma_0^2}\right) \mu_0 = \frac{v_i}{v_i + m} R_i + \frac{m}{v_i + m} C \]
with \(\mu_0 = C\) and \(m = \sigma^2/\sigma_0^2\).
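The step from the normal prior and likelihood to the posterior mean is the standard conjugate-normal calculation. A sketch of it, using the same symbols as the slide, might read:

```latex
% Posterior density of \mu_i, up to a constant:
p(\mu_i \mid X_i) \;\propto\;
  \exp\!\left(-\frac{(\mu_i - \mu_0)^2}{2\sigma_0^2}\right)
  \prod_{j=1}^{v_i} \exp\!\left(-\frac{(X_{i,j} - \mu_i)^2}{2\sigma^2}\right)

% Completing the square in \mu_i gives a normal posterior with
\text{precision} = \frac{1}{\sigma_0^2} + \frac{v_i}{\sigma^2},
\qquad
E[\mu_i \mid X_i]
  = \frac{\mu_0/\sigma_0^2 + v_i \bar X_i/\sigma^2}{1/\sigma_0^2 + v_i/\sigma^2}

% Multiplying numerator and denominator by \sigma^2:
E[\mu_i \mid X_i]
  = \frac{v_i \bar X_i + (\sigma^2/\sigma_0^2)\,\mu_0}{v_i + \sigma^2/\sigma_0^2}
```

The ratio \(\sigma^2/\sigma_0^2\) therefore acts like a count of "prior votes" cast for \(\mu_0\).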

1. Does shrinking really help?
2. How much to shrink by?
\[ \text{Prediction Error} = \sum_{i=1}^{n} (\mu_i - \hat\mu_i)^2 \]

Standing Between a Bayesian and a Frequentist
In 1956, Charles Stein proved the existence of an estimator better than the sample mean under certain assumptions.
In 1961, Willard James and Charles Stein explicitly constructed such an estimator.

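The James-Stein Estimator (\(n \ge 4\))
\(\mu_i \sim N(\mu_0, \sigma_0^2)\)
\(X_i \mid \mu_i \sim N(\mu_i, \sigma^2)\), independently for \(i = 1, \ldots, n\)
\[ \hat\mu_i^{(\text{Bayes})} = E[\mu_i \mid X_i] = \underbrace{\left(\frac{\sigma^2}{\sigma_0^2 + \sigma^2}\right)}_{\alpha} \mu_0 + \underbrace{\left(\frac{\sigma_0^2}{\sigma_0^2 + \sigma^2}\right)}_{1 - \alpha} X_i \]
\[ \hat\mu_i^{(\text{JS})} = \underbrace{\left(\frac{(n - 3)\sigma^2}{\sum_{j=1}^{n} (X_j - \bar X)^2}\right)}_{\alpha} \bar X + \underbrace{\left(1 - \frac{(n - 3)\sigma^2}{\sum_{j=1}^{n} (X_j - \bar X)^2}\right)}_{1 - \alpha} X_i \]
In practice, if \(\sigma^2\) is unknown, an estimate is used.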
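A small simulation can illustrate why this estimator is attractive. The sketch below implements the formula as written on the slide (no positive-part truncation) and compares total squared error against the raw observations. The function names and the simulation settings (n = 20, unit variances, 500 replications, seed 0) are hypothetical choices made for illustration, not taken from the talk.

```python
import random


def james_stein(x, sigma2):
    """James-Stein estimates that shrink each x[i] toward the grand mean.

    Follows the slide's formula without positive-part truncation, so the
    weight on the grand mean can exceed 1 if the x's are tightly clustered.
    """
    n = len(x)
    if n < 4:
        raise ValueError("the estimator requires n >= 4")
    xbar = sum(x) / n
    ss = sum((xi - xbar) ** 2 for xi in x)
    alpha = (n - 3) * sigma2 / ss  # weight on the grand mean
    return [alpha * xbar + (1 - alpha) * xi for xi in x]


def squared_error(truth, estimate):
    """Prediction error: sum of squared differences."""
    return sum((t - e) ** 2 for t, e in zip(truth, estimate))


# Hypothetical simulation settings (assumptions, not from the talk).
rng = random.Random(0)
n, sigma2, reps = 20, 1.0, 500
ml_total = js_total = 0.0
for _ in range(reps):
    mu = [rng.gauss(0.0, 1.0) for _ in range(n)]      # true means
    x = [rng.gauss(m, sigma2 ** 0.5) for m in mu]     # one noisy observation each
    ml_total += squared_error(mu, x)
    js_total += squared_error(mu, james_stein(x, sigma2))

print(f"average error  ML: {ml_total / reps:.2f}  JS: {js_total / reps:.2f}")
```

Averaged over the replications, the shrunken estimates should show a clearly smaller prediction error than the unshrunken ones, which is the phenomenon Stein's result describes.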

Predicting Batting Averages
2nd Goal: Predict final batting averages from pre-season performances.
Pre-season batting averages for 18 major league players are provided.
Season final batting averages for the same players are also recorded.
Data is from the 1970 season and is published in JASA (1975) and Scientific American (1977) by Efron and Morris.
The frequentist approach uses only \(X_i\), the pre-season batting average for player i:
\[ \hat p_i^{(\text{Fisher})} = X_i \]
The Empirical Bayes approach shrinks \(X_i\) towards \(\bar X\) by some empirically determined amount:
\[ \hat p_i^{(\text{Stein})} = \hat\alpha X_i + (1 - \hat\alpha)\,\bar X, \quad \text{where } \hat\alpha \in (0, 1) \]

Pre-season is the estimate \(\hat\mu^{(\text{ML})}\); season final is \(\mu\).
#   Name        Hits/AB  Pre-season  Season final
1   Clemente    18/45    0.400       0.346
2   Robinson    17/45    0.378       0.298
3   Howard      16/45    0.356       0.276
4   Johnstone   15/45    0.333       0.222
5   Berry       14/45    0.311       0.273
6   Spencer     14/45    0.311       0.270
7   Kessinger   13/45    0.289       0.263
8   Alvarado    12/45    0.267       0.210
9   Santo       11/45    0.244       0.269
10  Swoboda     11/45    0.244       0.230
11  Unser       10/45    0.222       0.264
12  Williams    10/45    0.222       0.256
13  Scott       10/45    0.222       0.303
14  Petrocelli  10/45    0.222       0.264
15  Rodriguez   10/45    0.222       0.226
16  Campaneris  9/45     0.200       0.286
17  Munson      8/45     0.178       0.316
18  Alvis       7/45     0.156       0.200
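To see the shrinkage at work on these 18 players, the following sketch applies the James-Stein formula to the pre-season averages and compares both sets of predictions with the season-final averages. One assumption is mine rather than the talk's: the common variance \(\sigma^2\) is taken to be the binomial variance \(\bar X(1 - \bar X)/45\) evaluated at the grand mean. Efron and Morris's published analysis uses a variance-stabilizing transformation instead, so the numbers here are only indicative.

```python
# (name, hits in 45 pre-season at-bats, season-final average) from the table.
players = [
    ("Clemente", 18, 0.346), ("Robinson", 17, 0.298), ("Howard", 16, 0.276),
    ("Johnstone", 15, 0.222), ("Berry", 14, 0.273), ("Spencer", 14, 0.270),
    ("Kessinger", 13, 0.263), ("Alvarado", 12, 0.210), ("Santo", 11, 0.269),
    ("Swoboda", 11, 0.230), ("Unser", 10, 0.264), ("Williams", 10, 0.256),
    ("Scott", 10, 0.303), ("Petrocelli", 10, 0.264), ("Rodriguez", 10, 0.226),
    ("Campaneris", 9, 0.286), ("Munson", 8, 0.316), ("Alvis", 7, 0.200),
]
AT_BATS = 45
n = len(players)
pre = [hits / AT_BATS for _, hits, _ in players]
final = [avg for _, _, avg in players]

xbar = sum(pre) / n
# Assumption: one shared binomial variance, evaluated at the grand mean.
sigma2 = xbar * (1 - xbar) / AT_BATS
ss = sum((x - xbar) ** 2 for x in pre)
alpha = (n - 3) * sigma2 / ss  # weight on the grand mean
js = [alpha * xbar + (1 - alpha) * x for x in pre]

ml_err = sum((f - x) ** 2 for f, x in zip(final, pre))
js_err = sum((f - e) ** 2 for f, e in zip(final, js))
print(f"weight on grand mean: {alpha:.3f}")
print(f"prediction error  ML: {ml_err:.4f}  JS: {js_err:.4f}")
```

Under this variance assumption, most of the weight lands on the grand mean, and the shrunken predictions have a smaller total squared error than the raw pre-season averages.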

Batting Average Dataset
[Figure: 1977 Batting Averages Dataset (Efron). Batting average (0.0 to 0.4) for players 1 to 18; series: pre-season, season final.]

James-Stein Estimation of Batting Averages
[Figure: 1977 Batting Averages Dataset (Efron). Batting average (0.0 to 0.4) for players 1 to 18; series: pre-season, season final.]

Ranking Bias: Empirical Bayes + Order Statistics
[Figure: 1977 Batting Averages Dataset (Efron). Batting average (0.0 to 0.4) for players 1 to 18; series: pre-season, season final, ranking bias estimator.]
The ranking bias estimator is part frequentist, part Bayesian, with robust properties.
Genome-wide association studies
SNPs: AA/Aa/aa or 0/1/2 (\(10^7\))
Estimated effects of the top SNPs are biased up (winner's curse).
Applied to 2 GWAS studies with 2,000 cases and 3,000 controls: Crohn's Disease and Type 1 Diabetes.

49ers Statistics
http://www.longbeachstate.com/

Opponents Over 3 Seasons (08-09, 09-10, 10-11)
opponent               #
alaska anchorage       1
arizona state          1
boise state            1
byu cougars            1
byu hawaii             1
cal poly               7
cal state fullerton    6
cal state northridge   6
clemson                2
cs monterey bay        1
duke                   1
green bay              2
idaho                  1
idaho state            1
iowa                   1
kentucky               1
loyola marymount       2
montana                1
montana state          1
new mexico state       1
north carolina         1
notre dame             1
oregon                 1
pacific                8
pepperdine             2
saint mary's           1
saint peter's          1
san diego state        1
san francisco state    1
syracuse               1
temple                 1
texas                  1
uc davis               6
uc irvine              6
uc riverside           6
uc santa barbara       7
ucla                   1
univ. san francisco    1
utah state             2
washington             1
weber state            2
west virginia          1
wisconsin              1

Winning Percentages
All Games
All 3 Seasons (93)   56%
08-09 Season (30)    50%
09-10 Season (33)    52%
10-11 Season (30)    67%
Conference Games
All 3 Seasons        67%
08-09 Season         63%
09-10 Season         50%
10-11 Season         88%

[Figure: Spread = 49ers Score - Opponent Score (10-11 Season). Spread (0 to 15) by conference opponent: uc santa barbara, cal state northridge, uc davis, cal poly, uc riverside, cal state fullerton, pacific, uc irvine.]


Over/Under (Total Score)
[Figure: Over/Under = 49ers Score + Opponent Score (10-11 Season). Total score (120 to 160) by conference opponent: uc irvine, cal state fullerton, cal state northridge, uc riverside, uc davis, pacific, uc santa barbara, cal poly.]


Conversion Formulas
x = LB Score
y = Opponent Score
Over/Under = x + y
Spread = x - y
\[ x = \frac{\text{Over/Under} + \text{Spread}}{2}, \qquad y = \frac{\text{Over/Under} - \text{Spread}}{2} \]
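The conversion is just the inverse of a two-equation linear system, and can be sketched as follows. The function name `scores_from_lines` is hypothetical; the example inputs are the over/under (121) and spread (11) predicted for Cal Poly.

```python
def scores_from_lines(over_under, spread):
    """Invert Over/Under = x + y and Spread = x - y.

    Returns (x, y) = (LB score, opponent score).
    """
    lb_score = (over_under + spread) / 2
    opp_score = (over_under - spread) / 2
    return lb_score, opp_score


# Cal Poly prediction: over/under 121, spread 11.
print(scores_from_lines(121, 11))
```

When the over/under and spread have different parity, the implied scores are half-points, so published whole-number scores involve rounding.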

Predictions
Rank  Opponent              LB Score  O. Score  Spread  Over/Under
2     Cal Poly              66        55        11      121
3     Cal State Northridge  81        66        15      147
4     Pacific               69        68        1       136
5     UC Santa Barbara      72        55        17      126
6     Cal State Fullerton   79        71        7       150
7     UC Riverside          75        66        9       141
8     UC Irvine             82        80        2       161
      UC Davis              76        64        13      140

How good are the predictions?
Using the 09-10 season to predict the 10-11 season:
\[ \frac{\text{adjusted prediction error for spread}}{\text{unadjusted prediction error for spread}} = \frac{197}{341} = 58\% \]
\[ \frac{\text{adjusted prediction error for over/under}}{\text{unadjusted prediction error for over/under}} = \frac{513}{818} = 63\% \]
Using the 08-09 season to predict the 09-10 season:
\[ \frac{\text{adjusted prediction error for spread}}{\text{unadjusted prediction error for spread}} = \frac{150}{194} = 78\% \]
\[ \frac{\text{adjusted prediction error for over/under}}{\text{unadjusted prediction error for over/under}} = \frac{442}{641} = 69\% \]

LB vs UCI Vegas Odds (as of 3am on game day)
For all bets, you pay $110 to win $100. Long Beach is the favorite; UCI is the underdog.
Casino       Spread  Over/Under
LV Hilton    -10     148.5
Wynn         -9.5    149
MGM Mirage   -10     NA
Predicted    -2      161
These predictions recommend betting on UCI (while still expecting LB to win) and betting on the over for the over/under option.
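The logic behind that recommendation can be made explicit with a short sketch. The function `recommend` and its sign convention are assumptions made for illustration: margins are expressed as favorite-minus-underdog points, so a posted spread of -10 for the favorite is passed as 10, and ties (pushes) are ignored.

```python
def recommend(pred_margin, pred_total, line_margin, line_total):
    """Compare predictions with a casino line.

    Margins are favorite-minus-underdog points, so a posted spread of -10
    for the favorite is passed as line_margin=10. Ties are ignored.
    """
    side = "favorite" if pred_margin > line_margin else "underdog"
    total = "over" if pred_total > line_total else "under"
    return side, total


# LV Hilton line: spread -10, over/under 148.5.
# Predicted: spread -2, over/under 161.
print(recommend(pred_margin=2, pred_total=161, line_margin=10, line_total=148.5))
```

Because the predicted winning margin (2) is smaller than the posted one (10), the favorite is not expected to cover, which is why the underdog is the suggested side even though Long Beach is still expected to win.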

Disclaimers:
1. I do not necessarily encourage sports betting.
2. I am not liable for any bets made based on my presentation.

Thank You!!
Beach.ArthurBerg.com
berg@psu.edu