Sports Predictive Analytics: NFL Prediction Model
|
|
- Dina Waters
- 6 years ago
- Views:
Transcription
1 Sports Predictive Analytics: NFL Prediction Model By Dr. Ash Pahwa IEEE Computer Society San Diego Chapter January 17, 2017 Copyright 2017 Dr. Ash Pahwa 1
2 Outline Case Studies of Sports Analytics Sports Sports Analytics Applications of Sports Analytics Sports Analytics Literature Data Sources Sports Predictive Models Regression Model Multi Variable Regression with Lasso NFL Prediction Model Prediction for Super Bowl 2016 Prediction for NFL 2017 Playoffs Copyright 2017 Dr. Ash Pahwa 2
3 San Diego L.A. Chargers Copyright 2017 Dr. Ash Pahwa 3
4 Case Studies of Sports Analytics? Copyright 2017 Dr. Ash Pahwa 4
5 What is Sports Analytics? Which Pitcher is Better? 2007 Baseball Jake Peavy San Diego Padres : National League John Lackey L.A. Angles : American League ERA: 2.54 ERA: 3.01 ERA: Earned Run Average. Mean number of runs yielded per 9 innings American league allows Designated Hitter (DH) for pitcher National league does not allow Designated Hitter (DH) for pitcher. Pitcher must bat. Copyright 2017 Dr. Ash Pahwa 5
6 Which Variable is the Best Predictor of the Winning Percentage? Copyright 2017 Dr. Ash Pahwa 6
7 Pythagorean Theorem Used in Baseball. Proposed by Bill James Suppose F represents a team s run scored A represents team s runs allowed Pyhtagorean Winning Percentage = F2 F 2 +A 2 It is called Pythagorean theorem because it is similar to the elementary geometry theorem Example: Year 2012: Detroit Tigers Scored Runs = F = 726 Allowed Runs = A = 670 Pyhtagorean Winning Percentage = F2 F 2 +A 2 = = 527, ,976 = 0.54 Total games won = 162*0.54 = 88 games Copyright 2017 Dr. Ash Pahwa 7
8 How to Convey Information to Decision Makers Data Scientist Understand the Sport Understand the Players Understand Performance data Decision Makers Management Operational Personals Raw Data Selection of Statistical Models Predictive Models Meaningful Results Copyright 2017 Dr. Ash Pahwa 8
9 Goals of Sports Analytics 1. Apply Statistical Models to Sporting Data 2. Ratings and Rankings 3. Predictive Models 4. Player and Team Assessment Copyright 2017 Dr. Ash Pahwa 9
10 Statistical Models Predictive Models Indices of Central Tendencies and Variability Histogram (Frequency Distribution) Statistics Used to Examine Relationships Normal Distribution (z-values and p-values) Inferential Statistics Ratings + Rankings Normality Ratings + Rankings Predictive Models Simple Linear Regression Mean Covariance Outlier Rank Aggregation Multiple Linear Regression Median Correlation Pearson Mode Rank Correlation Spearman t-test ANOVA Range, Variance Partial Correlation Chi-Square Standard Deviation Polynomial Regression Logistic Regression Copyright 2017 Dr. Ash Pahwa 10
11 Ranking of Players and Teams? Copyright 2017 Dr. Ash Pahwa 11
12 Pair-Wise Comparison Copyright 2017 Dr. Ash Pahwa 12
13 Pair-Wise Comparison The Social Network Copyright 2017 Dr. Ash Pahwa 13
14 Pair-Wise Comparison Can be Used for Ranking Sorting the Rating list produces Ranking Copyright 2017 Dr. Ash Pahwa 14
15 Log 5 Method Developed by Bill James in 1970s Computes the probability that Team A will beat Team B Log 5 formula has nothing to do with the mathematical function Log Copyright 2017 Dr. Ash Pahwa 15
16 Log 5 Formula p a p a p b p a,b = p a + p b 2 p a p b Suppose Team A true winning percentage is 10 out of 16 games Percentage of true winning = p a = 10/16 = Suppose Team B true winning percentage is 7 out of 16 games Percentage of true winning = p b = 7/16 = The probability that Team A will beat Team B p a,b = p a p a p b p a +p b 2 p a p b = = = = The probability that Team B will beat Team A p b,a = p b p a p b p a +p b 2 p a p b = = = = p a,b + p b,a = 1 Copyright 2017 Dr. Ash Pahwa 16
17 Arpad Elo Physics Professor at Marquette University Milwaukee, Wisconsin Chess Player Devised a method to rank chess players His method was adopted by US Chess Federation World Chess Federation Copyright 2017 Dr. Ash Pahwa 17
18 Points Gained or Lost Points gained/lost by player A = Points gained/lost by player B Before the game Player A rating r A Player B rating r B Difference d AB = r A r B After the game Player A rating r A Player B rating r B r A + r B = r A + r B Points gained/lost by player A against player B μ AB = L d AB 400 = d AB 400 Copyright 2017 Dr. Ash Pahwa 18
19 Example Before the game Player A rating r A = 2400 Player B rating r B = 2000 Difference d AB = r A r B = 400 Difference d BA = r B r A = 400 If Player A wins Draw If Player B wins S(AB) S(BA) Suppose K = 32 for Chess μ AB = 0.91 μ BA = 0.09 If Player A wins After the game Player A rating r A = r A + K S AB μ AB = = 2403 Player B rating r B = r B + K S BA μ BA = = 1997 r A + r B = r A + r B = If Player B wins After the game Player A rating r A = r A + K S AB μ AB = = 2371 Player B rating r B = r B + K S BA μ BA = = 2029 r A + r B = r A + r B = Copyright 2017 Dr. Ash Pahwa 19
20 The Social Network Elo Formula was Used to pair-wise Comparison of Girls Copyright 2017 Dr. Ash Pahwa 20
21 Sports Copyright 2017 Dr. Ash Pahwa 21
22 Sports Inherent part of Human Culture Sports competition dates back to the dawn of our species Greek Olympics : 776 BC Copyright 2017 Dr. Ash Pahwa 22
23 Sportsman Spirit Virtues of sports fairness self-control courage persistence It has been associated with interpersonal concepts of treating others and being treated fairly Maintaining self-control if dealing with others, and respect for both authority and opponents GOOD SPORTSMEN HAVE ALWAYS BEEN HELD IN HIGH ESTEEM Copyright 2017 Dr. Ash Pahwa 23
24 Most Popular Sports in the World Copyright 2017 Dr. Ash Pahwa 24
25 Most Popular Sports in America Copyright 2017 Dr. Ash Pahwa 25
26 Sports Analytics Copyright 2017 Dr. Ash Pahwa 26
27 Predictive Models Estimation Regression Classification (win/loss) Logistic Regression Discriminant Analysis Linear Quadratic Support Vector Machine Copyright 2017 Dr. Ash Pahwa 27
28 Goals of Sports Analytics Player Discovering hidden talent in a new player Player Evaluation Assessing Player Performance Which metrics is most important to assess a players performance Assessing Player Value How much value a player adds to the teams value Copyright 2017 Dr. Ash Pahwa 28
29 Goals of Sports Analytics Team Ranking top teams Accessing Team Performance How to compute the value of a team Which Team Members are best suited to play against the opposing team Which strategy to use to play against a team? Anticipating Opponents Behavior Accessing the probability of a win in a sporting event Copyright 2017 Dr. Ash Pahwa 29
30 Need for Prediction Results Betting on an sporting event People betting on sports need to see the prediction results Probability of a win Point Spread Fantasy Sports DraftKings FanDuel Copyright 2017 Dr. Ash Pahwa 30
31 Applications of Sports Analytics Copyright 2017 Dr. Ash Pahwa 31
32 Application of Sports PA Movies The film is based on Michael Lewis' 2003 nonfiction book of the same name, an account of the Oakland Athletics baseball team's 2002 season and their general manager Billy Beane's attempts to assemble a competitive team. In the film, Beane (Brad Pitt) and assistant GM Peter Brand (Jonah Hill), faced with the franchise's limited budget for players, build a team of undervalued talent by taking a sophisticated sabermetric approach towards scouting and analyzing players. They acquire "submarine" pitcher Chad Bradford (Casey Bond) and former catcher Scott Hatteberg (Chris Pratt), and win 20 consecutive games, an American League record. Copyright 2017 Dr. Ash Pahwa 32
33 Team Selection: Moneyball Copyright 2017 Dr. Ash Pahwa 33
34 Moneyball Billy Beane & Paul DePodesta William Lamar "Billy" Beane III (born March 29, 1962) is an American former professional baseball player and current front office executive. He is the Executive Vice President of Baseball Operations and minority owner of the Oakland Athletics of Major League Baseball (MLB). The character of Brand is an invention by the filmmakers; in the excellent Michael Lewis non-fiction book upon which the movie is based, the real-life Brand is identified as Paul DePodesta. Unlike Brand, DePodesta is slender, fit and handsome. He s also Harvard-educated (not a Yalie screenwriter Aaron Sorkin s private joke). Copyright 2017 Dr. Ash Pahwa 34
35 Application of Sports PA Boston Red Sox Using Predictive Analytics Strategies Boston Red Sox won 3 world series in Baseball In 2006, Time named Bill James in the Time 100 as one of the most influential people in the world. He is a Senior Advisor on Baseball Operations for the Boston Red Sox. Copyright 2017 Dr. Ash Pahwa 35
36 Sports Analytics Literature Copyright 2017 Dr. Ash Pahwa 36
37 Predictive Models for Sports Literature Copyright 2017 Dr. Ash Pahwa 37
38 Statistical Learning Gareth James Daniela Witten Trevor Hastie Robert Tibshirani Stanford University Copyright 2017 Dr. Ash Pahwa 38
39 Data Sources Copyright 2017 Dr. Ash Pahwa 39
40 Data Sources Valid Accurate Complete Contains derived variables Copyright 2017 Dr. Ash Pahwa 40
41 Data Sources Copyright 2017 Dr. Ash Pahwa 41
42 Data Source Football Copyright 2017 Dr. Ash Pahwa 42
43 Sports Predictive Models Copyright 2017 Dr. Ash Pahwa 43
44 Goals of Predictive Analytics Application: Estimation or Classification Estimation Regression modeling technique is used Output is a number House price Product sales for next quarter GNP growth for the next quarter How many points a team will score Classification Logistic Regression Support Vector Machine Discriminant Analysis (Linear, Quadratic) Naïve Bayes, Decision Trees etc. modeling techniques are used Output is a categorical variable Sports team will win or lose is junk or not Which grade student will get Tweet is positive or negative Copyright 2017 Dr. Ash Pahwa 44
45 Common PA Techniques Regression Linear 2 variables Linear multi variables Logistic Polynomial Clustering Decision Trees Neural Networks Naïve Bayes ARIMA A few more Copyright 2017 Dr. Ash Pahwa 45
46 Regression Model Copyright 2017 Dr. Ash Pahwa 46
47 History of Linear Regression Sir Francis Galton and Karl Pearson Developed the concepts on Regression and Correlation in Copyright 2017 Dr. Ash Pahwa 47
48 Definition: Linear Regression 2 Variable Regression How a response variable y changes y x 1 c As the predictor (explanatory) variable x changes Multiple Regression How a response variable y changes As the predictor (explanatory) variables x1, x2, xn change y x x x x n n c Copyright 2017 Dr. Ash Pahwa 48
49 Single Variable Polynomial Regression First degree to Fifth Degree The 5 th degree polynomial regression looks good for training set data But with test data it may not perform well PROBLEM: Over fit for this data Copyright 2017 Dr. Ash Pahwa 49
50 Overfitting If there exists a model with estimated parameters w such that Training error (w2) < Training Error (w1) Generalization (Test) error (w2) > Generalization (Test) Error (w1) Copyright 2017 Dr. Ash Pahwa 50
51 Bias and Variance Goal is to find out a model complexity where Generalization (Validation) errors are least Bias + variance are least Mean Square Error = Bias 2 + Variance Just like Generalization Error We cannot compute Bias and Variance Copyright 2017 Dr. Ash Pahwa 51
52 1 st Degree Polynomial Fit y = a 1 x + c > #result = lm(y1~x) > result = lm(y1 ~ poly(x,1,raw=true)) > summary(result) Call: lm(formula = y1 ~ poly(x, 1, raw = TRUE)) Residuals: Min 1Q Median 3Q Max Coefficients: Estimate Std. Error t value Pr(> t ) (Intercept) poly(x, 1, raw = TRUE) Residual standard error: on 12 degrees of freedom Multiple R-squared: , Adjusted R-squared: F-statistic: on 1 and 12 DF, p-value: > p = predict(result,list(x=xpredict)) > lines(xpredict,p,col='black',lwd=3) > Copyright 2017 Dr. Ash Pahwa 52
53 16 th Degree Polynomial Fit y = a 1 x 16 + a 2 x a 15 x 2 + a 16 x + c > result = lm(y1 ~ poly(x,16,raw=true)) > summary(result) Call: lm(formula = y1 ~ poly(x, 16, raw = TRUE)) Residuals: ALL 14 residuals are 0: no residual degrees of freedom! Coefficients: (3 not defined because of singularities) Estimate Std. Error t value Pr(> t ) (Intercept) 1.820e-01 NA NA NA poly(x, 16, raw = TRUE) e+02 NA NA NA poly(x, 16, raw = TRUE) e+02 NA NA NA poly(x, 16, raw = TRUE) e+03 NA NA NA poly(x, 16, raw = TRUE) e+03 NA NA NA poly(x, 16, raw = TRUE) e+03 NA NA NA poly(x, 16, raw = TRUE) e+03 NA NA NA poly(x, 16, raw = TRUE) e+02 NA NA NA poly(x, 16, raw = TRUE) e+02 NA NA NA poly(x, 16, raw = TRUE) e+01 NA NA NA poly(x, 16, raw = TRUE) e+00 NA NA NA poly(x, 16, raw = TRUE) e-01 NA NA NA poly(x, 16, raw = TRUE)12 NA NA NA NA poly(x, 16, raw = TRUE) e-04 NA NA NA poly(x, 16, raw = TRUE)14 NA NA NA NA poly(x, 16, raw = TRUE) e-06 NA NA NA poly(x, 16, raw = TRUE)16 NA NA NA NA Residual standard error: NaN on 0 degrees of freedom Multiple R-squared: 1, Adjusted R-squared: NaN F-statistic: NaN on 13 and 0 DF, p-value: NA Copyright 2017 Dr. Ash Pahwa 53
54 Under fit or Over fit Model # Degree of Polynom ial Under fit or Over fit 1 1 Under fit 2 2 Under fit 3 4 Under fit 4 8 OK 5 10 Over fit 6 16 Over fit Copyright 2017 Dr. Ash Pahwa 54
55 Lasso Regression Least Absolute Shrinkage and Selection Operator Copyright 2017 Dr. Ash Pahwa 55
56 Cost Function of OLS + Ridge + Lasso OLS Ridge Regression Lasso Regression Copyright 2017 Dr. Ash Pahwa 56
57 NFL Model Copyright 2017 Dr. Ash Pahwa 57
58 Data Source Seasons Weeks 1 21 Week#1 #17: 16 games + 1 bye Week#18: 4 games: Wild card Week#19: 4 games: Divisional Playoff Week#20: 2 games: Conference Championship Week#21: 1 game: Super Bowl Copyright 2017 Dr. Ash Pahwa 58
59 Data Tables Arm Chair Data 26 Tables Arm Chair Game Data Play Data Team Data External Source Stadium Data (Wikipedia) City Coordinates (Wikipedia) City GDP (Wikipedia + US Govt.) Copyright 2017 Dr. Ash Pahwa 59
60 Sonny Moore s Computer Rating Copyright 2017 Dr. Ash Pahwa 60
61 USA Today: Jeff Sagarin Copyright 2017 Dr. Ash Pahwa 61
62 NFL Prediction Model Result Copyright 2017 Dr. Ash Pahwa 62
63 Comparison of Errors Sonny Moore & Jeff Sagarin Copyright 2017 Dr. Ash Pahwa 63
64 Super Bowl 2016 Prediction by Nate Silver Copyright 2017 Dr. Ash Pahwa 64
65 Super Bowl 2016 Prediction by A+ Copyright 2017 Dr. Ash Pahwa 65
66 Super Bowl 2016 All Predicted Models were Wrong Panthers Broncos Nate Silver 59% 41% A+ 56.5% 43.5% Copyright 2017 Dr. Ash Pahwa 66
67 2017 NFL Prediction Wild Card : Playoff Predictions for all the 4 games were 100% correct Copyright 2017 Dr. Ash Pahwa 67
68 2017 NFL Prediction Divisional Title: Playoff Predictions 2 correct 2 incorrect 50% correct Copyright 2017 Dr. Ash Pahwa 68
69 2017 NFL Prediction Conference Championship: Playoff Atlanta Falcons vs Green Bay Packers Atlanta Falcons 61% New England Patriots vs Pittsburgh Steelers New England Patriots 70% Copyright 2017 Dr. Ash Pahwa 69
70 2017 NFL Prediction Super Bowl New England Patriots vs Atlanta Falcons New England Patriots 60% Copyright 2017 Dr. Ash Pahwa 70
71 Summary Sports Sports Analytics Applications of Sports Analytics Sports Analytics Literature Data Sources Sports Predictive Models Regression Model Multi Variable Regression with Lasso NFL Prediction Model Prediction for Super Bowl 2016 Prediction for NFL 2017 Playoffs Copyright 2017 Dr. Ash Pahwa 71
72 UCSD Extension Courses Winter 2017 Copyright 20172Dr. Ash Pahwa
73 UCSD Extension courses Winter 2017 Winter 2017 Dates Online Sports Predicted Analytics (7 weeks) January 30 March 19, 2017 Copyright 20173Dr. Ash Pahwa
74 Course UCSD Sports Predictive Analytics Content L# Date Subject 1 01/30/17 Introduction to Sports Analytics 1.1 What is Sports Analytics? 1.2 Tools for Sports Analytics 1.3 Science of Learning from Data 1.4 Basic Stat : Data Types + Histograms + Std Dev 2 02/06/17 Statistical Methods Applied on Sports Data 2.1 Normal Distribution 2.2 Correlation 2.3 Rank and Partial Correlation 3 02/13/17 Central Limit Theorem + Hypothesis Testing 3.1 CLT + Parameter Estimation 3.2 Hypothesis Testing (*) 4 02/20/17 Inference Stat: Chi-Sq+ANOVA+Model Selec 4.1 Chi-Square (*) 4.2 ANOVA 4.3 Stat Model Selection (*) 5 02/27/17 Ratings and Rankings + Rank Aggregation 5.1 Ratings and Rankings (Elo System) 5.2 Rank Aggregation (Borda Voting) 6 03/06/17 Prediction Using Regression Model 6.1 Introduction to Regression 6.2 Regression 2 Variables 6.3 Regression Multi Variables 7 03/13/17 Prediction Using Logistic Regression Model 7.1 Logistic Regression Mathematics 7.2 Logistic Regression Mechanics 7.3 Multivariable Logistic Regression Copyright 2017 Dr. Ash Pahwa 74
MONEYBALL. The Power of Sports Analytics The Analytics Edge
MONEYBALL The Power of Sports Analytics 15.071 The Analytics Edge The Story Moneyball tells the story of the Oakland A s in 2002 One of the poorest teams in baseball New ownership and budget cuts in 1995
More informationBuilding an NFL performance metric
Building an NFL performance metric Seonghyun Paik (spaik1@stanford.edu) December 16, 2016 I. Introduction In current pro sports, many statistical methods are applied to evaluate player s performance and
More informationISDS 4141 Sample Data Mining Work. Tool Used: SAS Enterprise Guide
ISDS 4141 Sample Data Mining Work Taylor C. Veillon Tool Used: SAS Enterprise Guide You may have seen the movie, Moneyball, about the Oakland A s baseball team and general manager, Billy Beane, who focused
More informationASTERISK OR EXCLAMATION POINT?: Power Hitting in Major League Baseball from 1950 Through the Steroid Era. Gary Evans Stat 201B Winter, 2010
ASTERISK OR EXCLAMATION POINT?: Power Hitting in Major League Baseball from 1950 Through the Steroid Era by Gary Evans Stat 201B Winter, 2010 Introduction: After a playerʼs strike in 1994 which resulted
More informationPitching Performance and Age
Pitching Performance and Age By: Jaime Craig, Avery Heilbron, Kasey Kirschner, Luke Rector, Will Kunin Introduction April 13, 2016 Many of the oldest players and players with the most longevity of the
More informationRunning head: DATA ANALYSIS AND INTERPRETATION 1
Running head: DATA ANALYSIS AND INTERPRETATION 1 Data Analysis and Interpretation Final Project Vernon Tilly Jr. University of Central Oklahoma DATA ANALYSIS AND INTERPRETATION 2 Owners of the various
More informationPitching Performance and Age
Pitching Performance and Age Jaime Craig, Avery Heilbron, Kasey Kirschner, Luke Rector and Will Kunin Introduction April 13, 2016 Many of the oldest and most long- term players of the game are pitchers.
More informationBilly Beane s Three Fundamental Insights on Baseball and Investing
Billy Beane s Three Fundamental Insights on Baseball and Investing September 10, 2018 by Marianne Brunet How did Billy Beane come up with the moneyball approach to evaluating baseball players? Though Brad
More informationAnnouncements. Lecture 19: Inference for SLR & Transformations. Online quiz 7 - commonly missed questions
Announcements Announcements Lecture 19: Inference for SLR & Statistics 101 Mine Çetinkaya-Rundel April 3, 2012 HW 7 due Thursday. Correlation guessing game - ends on April 12 at noon. Winner will be announced
More informationAdditional On-base Worth 3x Additional Slugging?
Additional On-base Worth 3x Additional Slugging? Mark Pankin SABR 36 July 1, 2006 Seattle, Washington Notes provide additional information and were reminders during the presentation. They are not supposed
More informationa) List and define all assumptions for multiple OLS regression. These are all listed in section 6.5
Prof. C. M. Dalton ECN 209A Spring 2015 Practice Problems (After HW1, HW2, before HW3) CORRECTED VERSION Question 1. Draw and describe a relationship with heteroskedastic errors. Support your claim with
More informationCS 221 PROJECT FINAL
CS 221 PROJECT FINAL STUART SY AND YUSHI HOMMA 1. INTRODUCTION OF TASK ESPN fantasy baseball is a common pastime for many Americans, which, coincidentally, defines a problem whose solution could potentially
More informationPredicting the use of the Sacrifice Bunt in Major League Baseball. Charlie Gallagher Brian Gilbert Neelay Mehta Chao Rao
Predicting the use of the Sacrifice Bunt in Major League Baseball Charlie Gallagher Brian Gilbert Neelay Mehta Chao Rao Understanding the Data Data from the St. Louis Cardinals Sig Mejdal, Senior Quantitative
More informationQuantitative Literacy: Thinking Between the Lines
Quantitative Literacy: Thinking Between the Lines Crauder, Noell, Evans, Johnson Chapter 6: Statistics 2013 W. H. Freeman and Company 1 Chapter 6: Statistics Lesson Plan Data summary and presentation:
More informationGeorge F. Will, Men at Work
Part of baseball s charm is the illusion it offers that all aspects of it can be completely reduced to numerical expressions and printed in agate type in the sport section. George F. Will, Men at Work
More informationDescriptive Statistics Project Is there a home field advantage in major league baseball?
Descriptive Statistics Project Is there a home field advantage in major league baseball? DUE at the start of class on date posted on website (in the first 5 minutes of class) There may be other due dates
More informationy ) s x x )(y i (x i r = 1 n 1 s y Statistics Lecture 7 Exploring Data , y 2 ,y n (x 1 ),,(x n ),(x 2 ,y 1 How two variables vary together
Statistics 111 - Lecture 7 Exploring Data Numerical Summaries for Relationships between Variables Administrative Notes Homework 1 due in recitation: Friday, Feb. 5 Homework 2 now posted on course website:
More informationOld Age and Treachery vs. Youth and Skill: An Analysis of the Mean Age of World Series Teams
ABSTRACT SESUG Paper BB-67-2017 Old Age and Treachery vs. Youth and Skill: An Analysis of the Mean Age of World Series Teams Joe DeMaio, Kennesaw State University Every October, baseball fans discuss and
More informationStats in Algebra, Oh My!
Stats in Algebra, Oh My! The Curtis Center s Mathematics and Teaching Conference March 7, 2015 Kyle Atkin Kern High School District kyle_atkin@kernhigh.org Standards for Mathematical Practice 1. Make sense
More informationMachine Learning an American Pastime
Nikhil Bhargava, Andy Fang, Peter Tseng CS 229 Paper Machine Learning an American Pastime I. Introduction Baseball has been a popular American sport that has steadily gained worldwide appreciation in the
More informationA Novel Approach to Predicting the Results of NBA Matches
A Novel Approach to Predicting the Results of NBA Matches Omid Aryan Stanford University aryano@stanford.edu Ali Reza Sharafat Stanford University sharafat@stanford.edu Abstract The current paper presents
More informationChapter 12 Practice Test
Chapter 12 Practice Test 1. Which of the following is not one of the conditions that must be satisfied in order to perform inference about the slope of a least-squares regression line? (a) For each value
More informationNavigate to the golf data folder and make it your working directory. Load the data by typing
Golf Analysis 1.1 Introduction In a round, golfers have a number of choices to make. For a particular shot, is it better to use the longest club available to try to reach the green, or would it be better
More informationLesson 3 Pre-Visit Teams & Players by the Numbers
Lesson 3 Pre-Visit Teams & Players by the Numbers Objective: Students will be able to: Review how to find the mean, median and mode of a data set. Calculate the standard deviation of a data set. Evaluate
More informationEffect of homegrown players on professional sports teams
Effect of homegrown players on professional sports teams ISYE 2028 Rahul Patel 902949215 Problem Description: Football is commonly referred to as America s favorite pastime. However, for thousands of people
More information1: MONEYBALL S ECTION ECTION 1: AP STATISTICS ASSIGNMENT: NAME: 1. In 1991, what was the total payroll for:
S ECTION ECTION 1: NAME: AP STATISTICS ASSIGNMENT: 1: MONEYBALL 1. In 1991, what was the total payroll for: New York Yankees? Oakland Athletics? 2. The three players that the Oakland Athletics lost to
More informationLab 11: Introduction to Linear Regression
Lab 11: Introduction to Linear Regression Batter up The movie Moneyball focuses on the quest for the secret of success in baseball. It follows a low-budget team, the Oakland Athletics, who believed that
More informationSAP Predictive Analysis and the MLB Post Season
SAP Predictive Analysis and the MLB Post Season Since September is drawing to a close and October is rapidly approaching, I decided to hunt down some baseball data and see if we can draw any insights on
More informationPREDICTING the outcomes of sporting events
CS 229 FINAL PROJECT, AUTUMN 2014 1 Predicting National Basketball Association Winners Jasper Lin, Logan Short, and Vishnu Sundaresan Abstract We used National Basketball Associations box scores from 1991-1998
More informationFollow links for Class Use and other Permissions. For more information send to:
COPYRIGHT NOTICE: Wayne L. Winston: Mathletics is published by Princeton University Press and copyrighted, 2009, by Princeton University Press. All rights reserved. No part of this book may be reproduced
More information(Under the Direction of Cheolwoo Park) ABSTRACT. Major League Baseball is a sport complete with a multitude of statistics to evaluate a player s
PENALIZED REGRESSION MODELS FOR MAJOR LEAGUE BASEBALL METRICS by MUSHIMIE LONA PANDA (Under the Direction of Cheolwoo Park) ABSTRACT Major League Baseball is a sport complete with a multitude of statistics
More informationAggPro: The Aggregate Projection System
Gore, Snapp and Highley AggPro: The Aggregate Projection System 1 AggPro: The Aggregate Projection System Ross J. Gore, Cameron T. Snapp and Timothy Highley Abstract Currently there exist many different
More informationEstimating the Probability of Winning an NFL Game Using Random Forests
Estimating the Probability of Winning an NFL Game Using Random Forests Dale Zimmerman February 17, 2017 2 Brian Burke s NFL win probability metric May be found at www.advancednflstats.com, but the site
More informationDo Clutch Hitters Exist?
Do Clutch Hitters Exist? David Grabiner SABRBoston Presents Sabermetrics May 20, 2006 http://remarque.org/~grabiner/bosclutch.pdf (Includes some slides skipped in the original presentation) 1 Two possible
More informationAn Analysis of the Effects of Long-Term Contracts on Performance in Major League Baseball
An Analysis of the Effects of Long-Term Contracts on Performance in Major League Baseball Zachary Taylor 1 Haverford College Department of Economics Advisor: Dave Owens Spring 2016 Abstract: This study
More informationSection I: Multiple Choice Select the best answer for each problem.
Inference for Linear Regression Review Section I: Multiple Choice Select the best answer for each problem. 1. Which of the following is NOT one of the conditions that must be satisfied in order to perform
More informationOne could argue that the United States is sports driven. Many cities are passionate and
Hoque 1 LITERATURE REVIEW ADITYA HOQUE INTRODUCTION One could argue that the United States is sports driven. Many cities are passionate and centered around their sports teams. Sports are also financially
More information1. OVERVIEW OF METHOD
1. OVERVIEW OF METHOD The method used to compute tennis rankings for Iowa girls high school tennis http://ighs-tennis.com/ is based on the Elo rating system (section 1.1) as adopted by the World Chess
More informationPredicting Season-Long Baseball Statistics. By: Brandon Liu and Bryan McLellan
Stanford CS 221 Predicting Season-Long Baseball Statistics By: Brandon Liu and Bryan McLellan Task Definition Though handwritten baseball scorecards have become obsolete, baseball is at its core a statistical
More informationNBA TEAM SYNERGY RESEARCH REPORT 1
NBA TEAM SYNERGY RESEARCH REPORT 1 NBA Team Synergy and Style of Play Analysis Karrie Lopshire, Michael Avendano, Amy Lee Wang University of California Los Angeles June 3, 2016 NBA TEAM SYNERGY RESEARCH
More informationRelative Value of On-Base Pct. and Slugging Avg.
Relative Value of On-Base Pct. and Slugging Avg. Mark Pankin SABR 34 July 16, 2004 Cincinnati, OH Notes provide additional information and were reminders during the presentation. They are not supposed
More informationPredicting the Total Number of Points Scored in NFL Games
Predicting the Total Number of Points Scored in NFL Games Max Flores (mflores7@stanford.edu), Ajay Sohmshetty (ajay14@stanford.edu) CS 229 Fall 2014 1 Introduction Predicting the outcome of National Football
More informationPsychology - Mr. Callaway/Mundy s Mill HS Unit Research Methods - Statistics
Psychology - Mr. Callaway/Mundy s Mill HS Unit 2.3 - Research Methods - Statistics How do psychologists ask & answer questions? Last time we asked that we were discussing Research Methods. This time we
More informationCS 7641 A (Machine Learning) Sethuraman K, Parameswaran Raman, Vijay Ramakrishnan
CS 7641 A (Machine Learning) Sethuraman K, Parameswaran Raman, Vijay Ramakrishnan Scenario 1: Team 1 scored 200 runs from their 50 overs, and then Team 2 reaches 146 for the loss of two wickets from their
More information1988 Graphics Section Poster Session. Displaying Analysis of Baseball Salaries
1988 Graphics Section Poster Session Displaying Analysis of Baseball Salaries The Statistical Graphics Section of the American Statistical Association is sponsoring a special poster session titled "Why
More informationPredicting the use of the sacrifice bunt in Major League Baseball BUDT 714 May 10, 2007
Predicting the use of the sacrifice bunt in Major League Baseball BUDT 714 May 10, 2007 Group 6 Charles Gallagher Brian Gilbert Neelay Mehta Chao Rao Executive Summary Background When a runner is on-base
More informationChapter 2: Modeling Distributions of Data
Chapter 2: Modeling Distributions of Data Section 2.1 The Practice of Statistics, 4 th edition - For AP* STARNES, YATES, MOORE Chapter 2 Modeling Distributions of Data 2.1 2.2 Normal Distributions Section
More informationAverage Runs per inning,
Home Team Scoring Advantage in the First Inning Largely Due to Time By David W. Smith Presented June 26, 2015 SABR45, Chicago, Illinois Throughout baseball history, the home team has scored significantly
More informationDeconstructing Data Science
Deconstructing Data Science David Bamman, UC Berkele Info 29 Lecture 4: Regression overview Jan 26, 217 Regression A mapping from input data (drawn from instance space ) to a point in R (R = the set of
More informationStatistical Analysis of PGA Tour Skill Rankings USGA Research and Test Center June 1, 2007
Statistical Analysis of PGA Tour Skill Rankings 198-26 USGA Research and Test Center June 1, 27 1. Introduction The PGA Tour has recorded and published Tour Player performance statistics since 198. All
More information1. Answer this student s question: Is a random sample of 5% of the students at my school large enough, or should I use 10%?
Econ 57 Gary Smith Fall 2011 Final Examination (150 minutes) No calculators allowed. Just set up your answers, for example, P = 49/52. BE SURE TO EXPLAIN YOUR REASONING. If you want extra time, you can
More informationThe factors affecting team performance in the NFL: does off-field conduct matter? Abstract
The factors affecting team performance in the NFL: does off-field conduct matter? Anthony Stair Frostburg State University Daniel Mizak Frostburg State University April Day Frostburg State University John
More informationMinimal influence of wind and tidal height on underwater noise in Haro Strait
Minimal influence of wind and tidal height on underwater noise in Haro Strait Introduction Scott Veirs, Beam Reach Val Veirs, Colorado College December 2, 2007 Assessing the effect of wind and currents
More informationEvaluating and Classifying NBA Free Agents
Evaluating and Classifying NBA Free Agents Shanwei Yan In this project, I applied machine learning techniques to perform multiclass classification on free agents by using game statistics, which is useful
More informationThis page intentionally left blank
PART III BASKETBALL This page intentionally left blank 28 BASKETBALL STATISTICS 101 The Four- Factor Model For each player and team NBA box scores track the following information: two- point field goals
More informationDriv e accu racy. Green s in regul ation
LEARNING ACTIVITIES FOR PART II COMPILED Statistical and Measurement Concepts We are providing a database from selected characteristics of golfers on the PGA Tour. Data are for 3 of the players, based
More informationMay the best (statistically chosen) team win! Danielle Pope
May the best (statistically chosen) team win! Danielle Pope The Burning Question: What does the Pythagorean Expectation tell us, and how can the Pythagorean Expectation be improved? Pythagorean Expectation
More informationthe 54th Annual Conference of the Association of Collegiate School of Planning (ACSP) in Philadelphia, Pennsylvania November 2 nd, 2014
the 54th Annual Conference of the Association of Collegiate School of Planning (ACSP) in Philadelphia, Pennsylvania November 2 nd, 2014 Hiroyuki Iseki, Ph.D. Assistant Professor Urban Studies and Planning
More informationLegendre et al Appendices and Supplements, p. 1
Legendre et al. 2010 Appendices and Supplements, p. 1 Appendices and Supplement to: Legendre, P., M. De Cáceres, and D. Borcard. 2010. Community surveys through space and time: testing the space-time interaction
More informationTwo Machine Learning Approaches to Understand the NBA Data
Two Machine Learning Approaches to Understand the NBA Data Panagiotis Lolas December 14, 2017 1 Introduction In this project, I consider applications of machine learning in the analysis of nba data. To
More informationPredicting NBA Shots
Predicting NBA Shots Brett Meehan Stanford University https://github.com/brettmeehan/cs229 Final Project bmeehan2@stanford.edu Abstract This paper examines the application of various machine learning algorithms
More informationDistancei = BrandAi + 2 BrandBi + 3 BrandCi + i
. Suppose that the United States Golf Associate (USGA) wants to compare the mean distances traveled by four brands of golf balls when struck by a driver. A completely randomized design is employed with
More informationGain the Advantage. Build a Winning Team. Sports
Gain the Advantage. Build a Winning Team. Sports Talent scouts and team managers have traditionally based their drafting selections on seemingly obvious and common-sense criteria: observable skills and
More informationNFL SCHEDULE SAMPLE. Green Bay
Scott Kellen NFL SCHEDULE SAMPLE Thursday game Monday night Green Bay DATE HA OPPONENT 7 9/11/2016 A Jacksonville Jaguars 7.33 9/18/2016 7 A Minnesota Vikings 9.55 9/25/2016 7 H Detroit Lions 7.24 BYE
More informationBivariate Data. Frequency Table Line Plot Box and Whisker Plot
U04 D02 Univariate Data Frequency Table Line Plot Box and Whisker Plot Univariate Data Bivariate Data involving a single variable does not deal with causes or relationships the major purpose of univariate
More informationB. AA228/CS238 Component
Abstract Two supervised learning methods, one employing logistic classification and another employing an artificial neural network, are used to predict the outcome of baseball postseason series, given
More informationWarm-up. Make a bar graph to display these data. What additional information do you need to make a pie chart?
Warm-up The number of deaths among persons aged 15 to 24 years in the United States in 1997 due to the seven leading causes of death for this age group were accidents, 12,958; homicide, 5,793; suicide,
More informationData Set 7: Bioerosion by Parrotfish Background volume of bites The question:
Data Set 7: Bioerosion by Parrotfish Background Bioerosion of coral reefs results from animals taking bites out of the calcium-carbonate skeleton of the reef. Parrotfishes are major bioerosion agents,
More informationNovel empirical correlations for estimation of bubble point pressure, saturated viscosity and gas solubility of crude oils
86 Pet.Sci.(29)6:86-9 DOI 1.17/s12182-9-16-x Novel empirical correlations for estimation of bubble point pressure, saturated viscosity and gas solubility of crude oils Ehsan Khamehchi 1, Fariborz Rashidi
More informationMeasuring Batting Performance
Measuring Batting Performance Authors: Samantha Attar, Hannah Dineen, Andy Fullerton, Nora Hanson, Cam Kelso, Katie McLaughlin, and Caitlyn Nolan Introduction: The following analysis compares slugging
More informationMath SL Internal Assessment What is the relationship between free throw shooting percentage and 3 point shooting percentages?
Math SL Internal Assessment What is the relationship between free throw shooting percentage and 3 point shooting percentages? fts6 Introduction : Basketball is a sport where the players have to be adept
More informationBASKETBALL PREDICTION ANALYSIS OF MARCH MADNESS GAMES CHRIS TSENG YIBO WANG
BASKETBALL PREDICTION ANALYSIS OF MARCH MADNESS GAMES CHRIS TSENG YIBO WANG GOAL OF PROJECT The goal is to predict the winners between college men s basketball teams competing in the 2018 (NCAA) s March
More informationConfidence Interval Notes Calculating Confidence Intervals
Confidence Interval Notes Calculating Confidence Intervals Calculating One-Population Mean Confidence Intervals for Quantitative Data It is always best to use a computer program to make these calculations,
More informationThe probability of winning a high school football game.
Columbus State University CSU epress Faculty Bibliography 2008 The probability of winning a high school football game. Jennifer Brown Follow this and additional works at: http://csuepress.columbusstate.edu/bibliography_faculty
More informationThe pth percentile of a distribution is the value with p percent of the observations less than it.
Describing Location in a Distribution (2.1) Measuring Position: Percentiles One way to describe the location of a value in a distribution is to tell what percent of observations are less than it. De#inition:
More informationDeconstructing Data Science
Deconstructing Data Science David Bamman, UC Berkele Info 29 Lecture 4: Regression overview Feb 1, 216 Regression A mapping from input data (drawn from instance space ) to a point in R (R = the set of
More informationThe Intrinsic Value of a Batted Ball Technical Details
The Intrinsic Value of a Batted Ball Technical Details Glenn Healey, EECS Department University of California, Irvine, CA 9617 Given a set of observed batted balls and their outcomes, we develop a method
More informationNFL SCHEDULE SAMPLE. Green Bay
NFL SCHEDULE SAMPLE Thursday game Monday night Green Bay DATE HA OPPONENT 7 9/11/2016 A Jacksonville Jaguars 7.33 9/18/2016 7 A Minnesota Vikings 9.55 9/25/2016 7 H Detroit Lions 7.24 BYE 10/9/2016 14
More informationBABE: THE SULTAN OF PITCHING STATS? by. August 2010 MIDDLEBURY COLLEGE ECONOMICS DISCUSSION PAPER NO
BABE: THE SULTAN OF PITCHING STATS? by Matthew H. LoRusso Paul M. Sommers August 2010 MIDDLEBURY COLLEGE ECONOMICS DISCUSSION PAPER NO. 10-30 DEPARTMENT OF ECONOMICS MIDDLEBURY COLLEGE MIDDLEBURY, VERMONT
More informationRegression Analysis of Success in Major League Baseball
University of South Carolina Scholar Commons Senior Theses Honors College Spring 5-5-2016 Regression Analysis of Success in Major League Baseball Johnathon Tyler Clark University of South Carolina - Columbia
More informationPLAYOFF RACES HEATING UP AS NFL SEASON ROLLS ON
FOR IMMEDIATE RELEASE 11/13/12 http://twitter.com/nfl345 PLAYOFF RACES HEATING UP AS NFL SEASON ROLLS ON The NFL has entered the second half of the season and the excitement is building as playoff races
More information2014 Tulane Baseball Arbitration Competition Josh Reddick v. Oakland Athletics (MLB)
2014 Tulane Baseball Arbitration Competition Josh Reddick v. Oakland Athletics (MLB) Submission on Behalf of the Oakland Athletics Team 15 Table of Contents I. INTRODUCTION AND REQUEST FOR HEARING DECISION...
More informationModeling Fantasy Football Quarterbacks
Augustana College Augustana Digital Commons Celebration of Learning Modeling Fantasy Football Quarterbacks Kyle Zeberlein Augustana College, Rock Island Illinois Myles Wallin Augustana College, Rock Island
More informationMidterm Exam 1, section 2. Thursday, September hour, 15 minutes
San Francisco State University Michael Bar ECON 312 Fall 2018 Midterm Exam 1, section 2 Thursday, September 27 1 hour, 15 minutes Name: Instructions 1. This is closed book, closed notes exam. 2. You can
More informationNUMB3RS Activity: Is It for Real? Episode: Hardball
Teacher Page 1 NUMB3RS Activity: Is It for Real? Topic: Data analysis Grade Level: 9-10 Objective: Use formulas to generate data points. Produce line graphs of which inferences are made. Time: 20 minutes
More informationAnalysis of Variance. Copyright 2014 Pearson Education, Inc.
Analysis of Variance 12-1 Learning Outcomes Outcome 1. Understand the basic logic of analysis of variance. Outcome 2. Perform a hypothesis test for a single-factor design using analysis of variance manually
More informationThe Reliability of Intrinsic Batted Ball Statistics Appendix
The Reliability of ntrinsic Batted Ball Statistics Appendix Glenn Healey, EECS Department University of California, rvine, CA 92617 Given information about batted balls for a set of players, we review
More informationWhich On-Base Percentage Shows. the Highest True Ability of a. Baseball Player?
Which On-Base Percentage Shows the Highest True Ability of a Baseball Player? January 31, 2018 Abstract This paper looks at the true on-base ability of a baseball player given their on-base percentage.
More informationEffects of Incentives: Evidence from Major League Baseball. Guy Stevens April 27, 2013
Effects of Incentives: Evidence from Major League Baseball Guy Stevens April 27, 2013 1 Contents 1 Introduction 2 2 Data 3 3 Models and Results 4 3.1 Total Offense................................... 4
More informationSpecial Topics: Data Science
Special Topics: Data Science L Linear Methods for Prediction Dr. Vidhyasaharan Sethu School of Electrical Engineering & Telecommunications University of New South Wales Sydney, Australia V. Sethu 1 Topics
More informationWhen Should Bonds be Walked Intentionally?
When Should Bonds be Walked Intentionally? Mark Pankin SABR 33 July 10, 2003 Denver, CO Notes provide additional information and were reminders to me for making the presentation. They are not supposed
More informationStats 2002: Probabilities for Wins and Losses of Online Gambling
Abstract: Jennifer Mateja Andrea Scisinger Lindsay Lacher Stats 2002: Probabilities for Wins and Losses of Online Gambling The objective of this experiment is to determine whether online gambling is a
More informationAnalysis of Curling Team Strategy and Tactics using Curling Informatics
Hiromu Otani 1, Fumito Masui 1, Kohsuke Hirata 1, Hitoshi Yanagi 2,3 and Michal Ptaszynski 1 1 Department of Computer Science, Kitami Institute of Technology, 165, Kouen-cho, Kitami, Japan 2 Common Course,
More informationEvaluation of Regression Approaches for Predicting Yellow Perch (Perca flavescens) Recreational Harvest in Ohio Waters of Lake Erie
Evaluation of Regression Approaches for Predicting Yellow Perch (Perca flavescens) Recreational Harvest in Ohio Waters of Lake Erie QFC Technical Report T2010-01 Prepared for: Ohio Department of Natural
More informationIs lung capacity affected by smoking, sport, height or gender. Table of contents
Sample project This Maths Studies project has been graded by a moderator. As you read through it, you will see comments from the moderator in boxes like this: At the end of the sample project is a summary
More informationANALYSIS OF SALARY FOR MAJOR LEAGUE BASEBALL PLAYERS
ANALYSIS OF SALARY FOR MAJOR LEAGUE BASEBALL PLAYERS A Thesis Submitted to the Graduate Faculty of the North Dakota State University of Agriculture and Applied Science By Michael Glenn Hoffman In Partial
More informationAnnouncements. % College graduate vs. % Hispanic in LA. % College educated vs. % Hispanic in LA. Problem Set 10 Due Wednesday.
Announcements Announcements UNIT 7: MULTIPLE LINEAR REGRESSION LECTURE 1: INTRODUCTION TO MLR STATISTICS 101 Problem Set 10 Due Wednesday Nicole Dalzell June 15, 2015 Statistics 101 (Nicole Dalzell) U7
More informationJournal of Quantitative Analysis in Sports Manuscript 1039
An Article Submitted to Journal of Quantitative Analysis in Sports Manuscript 1039 A Simple and Flexible Rating Method for Predicting Success in the NCAA Basketball Tournament Brady T. West University
More informationHow Do Injuries in the NFL Affect the Outcome of the Game
How Do Injuries in the NFL Affect the Outcome of the Game Andy Sun STATS 50 2014 Motivation NFL injury surveillance shows upward trend in both the number and severity of injuries Packers won 2010 Super
More information1 Hypothesis Testing for Comparing Population Parameters
Hypothesis Testing for Comparing Population Parameters Hypothesis testing can address many di erent types of questions. We are not restricted to looking at the estimated value of a single population parameter.
More information