Sports Predictive Analytics: NFL Prediction Model

Similar documents
MONEYBALL. The Power of Sports Analytics The Analytics Edge

Building an NFL performance metric

ISDS 4141 Sample Data Mining Work. Tool Used: SAS Enterprise Guide

ASTERISK OR EXCLAMATION POINT?: Power Hitting in Major League Baseball from 1950 Through the Steroid Era. Gary Evans Stat 201B Winter, 2010

Pitching Performance and Age

Running head: DATA ANALYSIS AND INTERPRETATION 1

Pitching Performance and Age

Billy Beane s Three Fundamental Insights on Baseball and Investing

Announcements. Lecture 19: Inference for SLR & Transformations. Online quiz 7 - commonly missed questions

Additional On-base Worth 3x Additional Slugging?

a) List and define all assumptions for multiple OLS regression. These are all listed in section 6.5

CS 221 PROJECT FINAL

Predicting the use of the Sacrifice Bunt in Major League Baseball. Charlie Gallagher Brian Gilbert Neelay Mehta Chao Rao

Quantitative Literacy: Thinking Between the Lines

George F. Will, Men at Work

Descriptive Statistics Project Is there a home field advantage in major league baseball?

y ) s x x )(y i (x i r = 1 n 1 s y Statistics Lecture 7 Exploring Data , y 2 ,y n (x 1 ),,(x n ),(x 2 ,y 1 How two variables vary together

Old Age and Treachery vs. Youth and Skill: An Analysis of the Mean Age of World Series Teams

Stats in Algebra, Oh My!

Machine Learning an American Pastime

A Novel Approach to Predicting the Results of NBA Matches

Chapter 12 Practice Test

Navigate to the golf data folder and make it your working directory. Load the data by typing

Lesson 3 Pre-Visit Teams & Players by the Numbers

Effect of homegrown players on professional sports teams

1: MONEYBALL S ECTION ECTION 1: AP STATISTICS ASSIGNMENT: NAME: 1. In 1991, what was the total payroll for:

Lab 11: Introduction to Linear Regression

SAP Predictive Analysis and the MLB Post Season

PREDICTING the outcomes of sporting events

Follow links for Class Use and other Permissions. For more information send to:

(Under the Direction of Cheolwoo Park) ABSTRACT. Major League Baseball is a sport complete with a multitude of statistics to evaluate a player s

AggPro: The Aggregate Projection System

Estimating the Probability of Winning an NFL Game Using Random Forests

Do Clutch Hitters Exist?

An Analysis of the Effects of Long-Term Contracts on Performance in Major League Baseball

Section I: Multiple Choice Select the best answer for each problem.

One could argue that the United States is sports driven. Many cities are passionate and

1. OVERVIEW OF METHOD

Predicting Season-Long Baseball Statistics. By: Brandon Liu and Bryan McLellan

NBA TEAM SYNERGY RESEARCH REPORT 1

Relative Value of On-Base Pct. and Slugging Avg.

Predicting the Total Number of Points Scored in NFL Games

Psychology - Mr. Callaway/Mundy s Mill HS Unit Research Methods - Statistics

CS 7641 A (Machine Learning) Sethuraman K, Parameswaran Raman, Vijay Ramakrishnan

1988 Graphics Section Poster Session. Displaying Analysis of Baseball Salaries

Predicting the use of the sacrifice bunt in Major League Baseball BUDT 714 May 10, 2007

Chapter 2: Modeling Distributions of Data

Average Runs per inning,

Deconstructing Data Science

Statistical Analysis of PGA Tour Skill Rankings USGA Research and Test Center June 1, 2007

1. Answer this student s question: Is a random sample of 5% of the students at my school large enough, or should I use 10%?

The factors affecting team performance in the NFL: does off-field conduct matter? Abstract

Minimal influence of wind and tidal height on underwater noise in Haro Strait

Evaluating and Classifying NBA Free Agents

This page intentionally left blank

Driv e accu racy. Green s in regul ation

May the best (statistically chosen) team win! Danielle Pope

the 54th Annual Conference of the Association of Collegiate School of Planning (ACSP) in Philadelphia, Pennsylvania November 2 nd, 2014

Legendre et al Appendices and Supplements, p. 1

Two Machine Learning Approaches to Understand the NBA Data

Predicting NBA Shots

Distancei = BrandAi + 2 BrandBi + 3 BrandCi + i

Gain the Advantage. Build a Winning Team. Sports

NFL SCHEDULE SAMPLE. Green Bay

Bivariate Data. Frequency Table Line Plot Box and Whisker Plot

B. AA228/CS238 Component

Warm-up. Make a bar graph to display these data. What additional information do you need to make a pie chart?

Data Set 7: Bioerosion by Parrotfish Background volume of bites The question:

Novel empirical correlations for estimation of bubble point pressure, saturated viscosity and gas solubility of crude oils

Measuring Batting Performance

Math SL Internal Assessment What is the relationship between free throw shooting percentage and 3 point shooting percentages?

BASKETBALL PREDICTION ANALYSIS OF MARCH MADNESS GAMES CHRIS TSENG YIBO WANG

Confidence Interval Notes Calculating Confidence Intervals

The probability of winning a high school football game.

The pth percentile of a distribution is the value with p percent of the observations less than it.

Deconstructing Data Science

The Intrinsic Value of a Batted Ball Technical Details

NFL SCHEDULE SAMPLE. Green Bay

BABE: THE SULTAN OF PITCHING STATS? by. August 2010 MIDDLEBURY COLLEGE ECONOMICS DISCUSSION PAPER NO

Regression Analysis of Success in Major League Baseball

PLAYOFF RACES HEATING UP AS NFL SEASON ROLLS ON

2014 Tulane Baseball Arbitration Competition Josh Reddick v. Oakland Athletics (MLB)

Modeling Fantasy Football Quarterbacks

Midterm Exam 1, section 2. Thursday, September hour, 15 minutes

NUMB3RS Activity: Is It for Real? Episode: Hardball

Analysis of Variance. Copyright 2014 Pearson Education, Inc.

The Reliability of Intrinsic Batted Ball Statistics Appendix

Which On-Base Percentage Shows. the Highest True Ability of a. Baseball Player?

Effects of Incentives: Evidence from Major League Baseball. Guy Stevens April 27, 2013

Special Topics: Data Science

When Should Bonds be Walked Intentionally?

Stats 2002: Probabilities for Wins and Losses of Online Gambling

Analysis of Curling Team Strategy and Tactics using Curling Informatics

Evaluation of Regression Approaches for Predicting Yellow Perch (Perca flavescens) Recreational Harvest in Ohio Waters of Lake Erie

Is lung capacity affected by smoking, sport, height or gender. Table of contents

ANALYSIS OF SALARY FOR MAJOR LEAGUE BASEBALL PLAYERS

Announcements. % College graduate vs. % Hispanic in LA. % College educated vs. % Hispanic in LA. Problem Set 10 Due Wednesday.

Journal of Quantitative Analysis in Sports Manuscript 1039

How Do Injuries in the NFL Affect the Outcome of the Game

1 Hypothesis Testing for Comparing Population Parameters

Transcription:

Sports Predictive Analytics: NFL Prediction Model By Dr. Ash Pahwa IEEE Computer Society San Diego Chapter January 17, 2017 Copyright 2017 Dr. Ash Pahwa 1

Outline Case Studies of Sports Analytics Sports Sports Analytics Applications of Sports Analytics Sports Analytics Literature Data Sources Sports Predictive Models Regression Model Multi Variable Regression with Lasso NFL Prediction Model Prediction for Super Bowl 2016 Prediction for NFL 2017 Playoffs Copyright 2017 Dr. Ash Pahwa 2

San Diego L.A. Chargers Copyright 2017 Dr. Ash Pahwa 3

Case Studies of Sports Analytics? Copyright 2017 Dr. Ash Pahwa 4

What is Sports Analytics? Which Pitcher is Better? 2007 Baseball Jake Peavy San Diego Padres : National League John Lackey L.A. Angles : American League ERA: 2.54 ERA: 3.01 ERA: Earned Run Average. Mean number of runs yielded per 9 innings American league allows Designated Hitter (DH) for pitcher National league does not allow Designated Hitter (DH) for pitcher. Pitcher must bat. Copyright 2017 Dr. Ash Pahwa 5

Which Variable is the Best Predictor of the Winning Percentage? Copyright 2017 Dr. Ash Pahwa 6

Pythagorean Theorem Used in Baseball. Proposed by Bill James Suppose F represents a team s run scored A represents team s runs allowed Pyhtagorean Winning Percentage = F2 F 2 +A 2 It is called Pythagorean theorem because it is similar to the elementary geometry theorem Example: Year 2012: Detroit Tigers Scored Runs = F = 726 Allowed Runs = A = 670 Pyhtagorean Winning Percentage = F2 F 2 +A 2 = 7262 726 2 +670 2 = 527,076 975,976 = 0.54 Total games won = 162*0.54 = 88 games Copyright 2017 Dr. Ash Pahwa 7

How to Convey Information to Decision Makers Data Scientist Understand the Sport Understand the Players Understand Performance data Decision Makers Management Operational Personals Raw Data Selection of Statistical Models Predictive Models Meaningful Results Copyright 2017 Dr. Ash Pahwa 8

Goals of Sports Analytics 1. Apply Statistical Models to Sporting Data 2. Ratings and Rankings 3. Predictive Models 4. Player and Team Assessment Copyright 2017 Dr. Ash Pahwa 9

Statistical Models Predictive Models Indices of Central Tendencies and Variability Histogram (Frequency Distribution) Statistics Used to Examine Relationships Normal Distribution (z-values and p-values) Inferential Statistics Ratings + Rankings Normality Ratings + Rankings Predictive Models Simple Linear Regression Mean Covariance Outlier Rank Aggregation Multiple Linear Regression Median Correlation Pearson Mode Rank Correlation Spearman t-test ANOVA Range, Variance Partial Correlation Chi-Square Standard Deviation Polynomial Regression Logistic Regression Copyright 2017 Dr. Ash Pahwa 10

Ranking of Players and Teams? Copyright 2017 Dr. Ash Pahwa 11

Pair-Wise Comparison Copyright 2017 Dr. Ash Pahwa 12

Pair-Wise Comparison The Social Network Copyright 2017 Dr. Ash Pahwa 13

Pair-Wise Comparison Can be Used for Ranking Sorting the Rating list produces Ranking Copyright 2017 Dr. Ash Pahwa 14

Log 5 Method Developed by Bill James in 1970s Computes the probability that Team A will beat Team B Log 5 formula has nothing to do with the mathematical function Log Copyright 2017 Dr. Ash Pahwa 15

Log 5 Formula p a p a p b p a,b = p a + p b 2 p a p b Suppose Team A true winning percentage is 10 out of 16 games Percentage of true winning = p a = 10/16 = 0.625 Suppose Team B true winning percentage is 7 out of 16 games Percentage of true winning = p b = 7/16 = 0.438 -------------------------------------------------------------------------------------- The probability that Team A will beat Team B p a,b = p a p a p b p a +p b 2 p a p b = 0.625 0.625 0.438 = 0.625 0.274 = 0.351 = 0.681 0.625+0.438 2 0.625 0.438 1.063 2 0.274 0.515 ---------------------------------------------------------------------------------------- The probability that Team B will beat Team A p b,a = p b p a p b p a +p b 2 p a p b = 0.438 0.625 0.438 = 0.438 0.274 = 0.164 = 0.318 0.625+0.438 2 0.625 0.438 1.063 2 0.274 0.515 --------------------------------------------------------------------------------------------------- p a,b + p b,a = 1 Copyright 2017 Dr. Ash Pahwa 16

Arpad Elo Physics Professor at Marquette University Milwaukee, Wisconsin Chess Player Devised a method to rank chess players His method was adopted by US Chess Federation World Chess Federation Copyright 2017 Dr. Ash Pahwa 17

Points Gained or Lost Points gained/lost by player A = Points gained/lost by player B Before the game Player A rating r A Player B rating r B Difference d AB = r A r B After the game Player A rating r A Player B rating r B r A + r B = r A + r B Points gained/lost by player A against player B μ AB = L d AB 400 = 1 1+10 d AB 400 Copyright 2017 Dr. Ash Pahwa 18

Example Before the game Player A rating r A = 2400 Player B rating r B = 2000 Difference d AB = r A r B = 400 Difference d BA = r B r A = 400 If Player A wins Draw If Player B wins S(AB) 1 0.5 0 S(BA) 0 0.5 1 Suppose K = 32 for Chess μ AB = 0.91 μ BA = 0.09 If Player A wins After the game Player A rating r A = r A + K S AB μ AB = 2400 + 32 1 0.91 = 2403 Player B rating r B = r B + K S BA μ BA = 2000 + 32 0 0.09 = 1997 r A + r B = r A + r B 2400 + 2000 = 2403 + 1997 If Player B wins After the game Player A rating r A = r A + K S AB μ AB = 2400 + 32 0 0.91 = 2371 Player B rating r B = r B + K S BA μ BA = 2000 + 32 1 0.09 = 2029 r A + r B = r A + r B 2400 + 2000 = 2371 + 2029 Copyright 2017 Dr. Ash Pahwa 19

The Social Network Elo Formula was Used to pair-wise Comparison of Girls Copyright 2017 Dr. Ash Pahwa 20

Sports Copyright 2017 Dr. Ash Pahwa 21

Sports Inherent part of Human Culture Sports competition dates back to the dawn of our species Greek Olympics : 776 BC Copyright 2017 Dr. Ash Pahwa 22

Sportsman Spirit Virtues of sports fairness self-control courage persistence It has been associated with interpersonal concepts of treating others and being treated fairly Maintaining self-control if dealing with others, and respect for both authority and opponents GOOD SPORTSMEN HAVE ALWAYS BEEN HELD IN HIGH ESTEEM Copyright 2017 Dr. Ash Pahwa 23

Most Popular Sports in the World Copyright 2017 Dr. Ash Pahwa 24

Most Popular Sports in America Copyright 2017 Dr. Ash Pahwa 25

Sports Analytics Copyright 2017 Dr. Ash Pahwa 26

Predictive Models Estimation Regression Classification (win/loss) Logistic Regression Discriminant Analysis Linear Quadratic Support Vector Machine Copyright 2017 Dr. Ash Pahwa 27

Goals of Sports Analytics Player Discovering hidden talent in a new player Player Evaluation Assessing Player Performance Which metrics is most important to assess a players performance Assessing Player Value How much value a player adds to the teams value Copyright 2017 Dr. Ash Pahwa 28

Goals of Sports Analytics Team Ranking top teams Accessing Team Performance How to compute the value of a team Which Team Members are best suited to play against the opposing team Which strategy to use to play against a team? Anticipating Opponents Behavior Accessing the probability of a win in a sporting event Copyright 2017 Dr. Ash Pahwa 29

Need for Prediction Results Betting on an sporting event People betting on sports need to see the prediction results Probability of a win Point Spread Fantasy Sports DraftKings FanDuel Copyright 2017 Dr. Ash Pahwa 30

Applications of Sports Analytics Copyright 2017 Dr. Ash Pahwa 31

Application of Sports PA Movies The film is based on Michael Lewis' 2003 nonfiction book of the same name, an account of the Oakland Athletics baseball team's 2002 season and their general manager Billy Beane's attempts to assemble a competitive team. In the film, Beane (Brad Pitt) and assistant GM Peter Brand (Jonah Hill), faced with the franchise's limited budget for players, build a team of undervalued talent by taking a sophisticated sabermetric approach towards scouting and analyzing players. They acquire "submarine" pitcher Chad Bradford (Casey Bond) and former catcher Scott Hatteberg (Chris Pratt), and win 20 consecutive games, an American League record. Copyright 2017 Dr. Ash Pahwa 32

Team Selection: Moneyball Copyright 2017 Dr. Ash Pahwa 33

Moneyball Billy Beane & Paul DePodesta William Lamar "Billy" Beane III (born March 29, 1962) is an American former professional baseball player and current front office executive. He is the Executive Vice President of Baseball Operations and minority owner of the Oakland Athletics of Major League Baseball (MLB). The character of Brand is an invention by the filmmakers; in the excellent Michael Lewis non-fiction book upon which the movie is based, the real-life Brand is identified as Paul DePodesta. Unlike Brand, DePodesta is slender, fit and handsome. He s also Harvard-educated (not a Yalie screenwriter Aaron Sorkin s private joke). Copyright 2017 Dr. Ash Pahwa 34

Application of Sports PA Boston Red Sox Using Predictive Analytics Strategies Boston Red Sox won 3 world series in Baseball In 2006, Time named Bill James in the Time 100 as one of the most influential people in the world. He is a Senior Advisor on Baseball Operations for the Boston Red Sox. Copyright 2017 Dr. Ash Pahwa 35

Sports Analytics Literature Copyright 2017 Dr. Ash Pahwa 36

Predictive Models for Sports Literature Copyright 2017 Dr. Ash Pahwa 37

Statistical Learning Gareth James Daniela Witten Trevor Hastie Robert Tibshirani Stanford University Copyright 2017 Dr. Ash Pahwa 38

Data Sources Copyright 2017 Dr. Ash Pahwa 39

Data Sources Valid Accurate Complete Contains derived variables Copyright 2017 Dr. Ash Pahwa 40

Data Sources www.nfl.com www.nba.com www.footballoutsiders.com www.pro-football-reference.com www.soccerstats.com www.basketball-reference.com Copyright 2017 Dr. Ash Pahwa 41

Data Source Football www.armchairanalysis.com Copyright 2017 Dr. Ash Pahwa 42

Sports Predictive Models Copyright 2017 Dr. Ash Pahwa 43

Goals of Predictive Analytics Application: Estimation or Classification Estimation Regression modeling technique is used Output is a number House price Product sales for next quarter GNP growth for the next quarter How many points a team will score Classification Logistic Regression Support Vector Machine Discriminant Analysis (Linear, Quadratic) Naïve Bayes, Decision Trees etc. modeling techniques are used Output is a categorical variable Sports team will win or lose Email is junk or not Which grade student will get Tweet is positive or negative Copyright 2017 Dr. Ash Pahwa 44

Common PA Techniques Regression Linear 2 variables Linear multi variables Logistic Polynomial Clustering Decision Trees Neural Networks Naïve Bayes ARIMA A few more Copyright 2017 Dr. Ash Pahwa 45

Regression Model Copyright 2017 Dr. Ash Pahwa 46

History of Linear Regression Sir Francis Galton and Karl Pearson Developed the concepts on Regression and Correlation in 1900-1930 Copyright 2017 Dr. Ash Pahwa 47

Definition: Linear Regression 2 Variable Regression How a response variable y changes y x 1 c As the predictor (explanatory) variable x changes Multiple Regression How a response variable y changes As the predictor (explanatory) variables x1, x2, xn change y x x x... 1 1 2 2 3 3 x n n c Copyright 2017 Dr. Ash Pahwa 48

Single Variable Polynomial Regression First degree to Fifth Degree The 5 th degree polynomial regression looks good for training set data But with test data it may not perform well PROBLEM: Over fit for this data Copyright 2017 Dr. Ash Pahwa 49

Overfitting If there exists a model with estimated parameters w such that Training error (w2) < Training Error (w1) Generalization (Test) error (w2) > Generalization (Test) Error (w1) Copyright 2017 Dr. Ash Pahwa 50

Bias and Variance Goal is to find out a model complexity where Generalization (Validation) errors are least Bias + variance are least Mean Square Error = Bias 2 + Variance Just like Generalization Error We cannot compute Bias and Variance Copyright 2017 Dr. Ash Pahwa 51

1 st Degree Polynomial Fit y = a 1 x + c > #result = lm(y1~x) > result = lm(y1 ~ poly(x,1,raw=true)) > summary(result) Call: lm(formula = y1 ~ poly(x, 1, raw = TRUE)) Residuals: Min 1Q Median 3Q Max -0.9872-0.7277-0.1394 0.7842 1.2692 Coefficients: Estimate Std. Error t value Pr(> t ) (Intercept) 0.1919 0.4285 0.448 0.662 poly(x, 1, raw = TRUE) -0.0492 0.1120-0.439 0.668 Residual standard error: 0.8449 on 12 degrees of freedom Multiple R-squared: 0.01581, Adjusted R-squared: -0.0662 F-statistic: 0.1928 on 1 and 12 DF, p-value: 0.6684 > p = predict(result,list(x=xpredict)) > lines(xpredict,p,col='black',lwd=3) > Copyright 2017 Dr. Ash Pahwa 52

16 th Degree Polynomial Fit y = a 1 x 16 + a 2 x 15 + + a 15 x 2 + a 16 x + c > result = lm(y1 ~ poly(x,16,raw=true)) > summary(result) Call: lm(formula = y1 ~ poly(x, 16, raw = TRUE)) Residuals: ALL 14 residuals are 0: no residual degrees of freedom! Coefficients: (3 not defined because of singularities) Estimate Std. Error t value Pr(> t ) (Intercept) 1.820e-01 NA NA NA poly(x, 16, raw = TRUE)1 1.280e+02 NA NA NA poly(x, 16, raw = TRUE)2-7.656e+02 NA NA NA poly(x, 16, raw = TRUE)3 1.973e+03 NA NA NA poly(x, 16, raw = TRUE)4-2.895e+03 NA NA NA poly(x, 16, raw = TRUE)5 2.675e+03 NA NA NA poly(x, 16, raw = TRUE)6-1.631e+03 NA NA NA poly(x, 16, raw = TRUE)7 6.712e+02 NA NA NA poly(x, 16, raw = TRUE)8-1.869e+02 NA NA NA poly(x, 16, raw = TRUE)9 3.447e+01 NA NA NA poly(x, 16, raw = TRUE)10-3.944e+00 NA NA NA poly(x, 16, raw = TRUE)11 2.282e-01 NA NA NA poly(x, 16, raw = TRUE)12 NA NA NA NA poly(x, 16, raw = TRUE)13-5.901e-04 NA NA NA poly(x, 16, raw = TRUE)14 NA NA NA NA poly(x, 16, raw = TRUE)15 1.468e-06 NA NA NA poly(x, 16, raw = TRUE)16 NA NA NA NA Residual standard error: NaN on 0 degrees of freedom Multiple R-squared: 1, Adjusted R-squared: NaN F-statistic: NaN on 13 and 0 DF, p-value: NA Copyright 2017 Dr. Ash Pahwa 53

Under fit or Over fit Model # Degree of Polynom ial Under fit or Over fit 1 1 Under fit 2 2 Under fit 3 4 Under fit 4 8 OK 5 10 Over fit 6 16 Over fit Copyright 2017 Dr. Ash Pahwa 54

Lasso Regression Least Absolute Shrinkage and Selection Operator Copyright 2017 Dr. Ash Pahwa 55

Cost Function of OLS + Ridge + Lasso OLS Ridge Regression Lasso Regression Copyright 2017 Dr. Ash Pahwa 56

NFL Model Copyright 2017 Dr. Ash Pahwa 57

Data Source Seasons 2000 2016 Weeks 1 21 Week#1 #17: 16 games + 1 bye Week#18: 4 games: Wild card Week#19: 4 games: Divisional Playoff Week#20: 2 games: Conference Championship Week#21: 1 game: Super Bowl Copyright 2017 Dr. Ash Pahwa 58

Data Tables Arm Chair Data 26 Tables Arm Chair Game Data Play Data Team Data External Source Stadium Data (Wikipedia) City Coordinates (Wikipedia) City GDP (Wikipedia + US Govt.) Copyright 2017 Dr. Ash Pahwa 59

Sonny Moore s Computer Rating http://sonnymoorepowerratings.com/archive Copyright 2017 Dr. Ash Pahwa 60

USA Today: Jeff Sagarin http://www.usatoday.com/sports/nfl Copyright 2017 Dr. Ash Pahwa 61

NFL Prediction Model Result Copyright 2017 Dr. Ash Pahwa 62

Comparison of Errors Sonny Moore & Jeff Sagarin Copyright 2017 Dr. Ash Pahwa 63

Super Bowl 2016 Prediction by Nate Silver Copyright 2017 Dr. Ash Pahwa 64

Super Bowl 2016 Prediction by A+ Copyright 2017 Dr. Ash Pahwa 65

Super Bowl 2016 All Predicted Models were Wrong Panthers Broncos Nate Silver 59% 41% A+ 56.5% 43.5% Copyright 2017 Dr. Ash Pahwa 66

2017 NFL Prediction Wild Card : Playoff Predictions for all the 4 games were 100% correct Copyright 2017 Dr. Ash Pahwa 67

2017 NFL Prediction Divisional Title: Playoff Predictions 2 correct 2 incorrect 50% correct www.nflprediction.co Copyright 2017 Dr. Ash Pahwa 68

2017 NFL Prediction Conference Championship: Playoff Atlanta Falcons vs Green Bay Packers Atlanta Falcons 61% New England Patriots vs Pittsburgh Steelers New England Patriots 70% Copyright 2017 Dr. Ash Pahwa 69

2017 NFL Prediction Super Bowl New England Patriots vs Atlanta Falcons New England Patriots 60% Copyright 2017 Dr. Ash Pahwa 70

Summary Sports Sports Analytics Applications of Sports Analytics Sports Analytics Literature Data Sources Sports Predictive Models Regression Model Multi Variable Regression with Lasso NFL Prediction Model Prediction for Super Bowl 2016 Prediction for NFL 2017 Playoffs Copyright 2017 Dr. Ash Pahwa 71

UCSD Extension Courses Winter 2017 Copyright 20172Dr. Ash Pahwa

UCSD Extension courses Winter 2017 Winter 2017 Dates Online Sports Predicted Analytics (7 weeks) January 30 March 19, 2017 Copyright 20173Dr. Ash Pahwa

Course UCSD Sports Predictive Analytics Content L# Date Subject 1 01/30/17 Introduction to Sports Analytics 1.1 What is Sports Analytics? 1.2 Tools for Sports Analytics 1.3 Science of Learning from Data 1.4 Basic Stat : Data Types + Histograms + Std Dev 2 02/06/17 Statistical Methods Applied on Sports Data 2.1 Normal Distribution 2.2 Correlation 2.3 Rank and Partial Correlation 3 02/13/17 Central Limit Theorem + Hypothesis Testing 3.1 CLT + Parameter Estimation 3.2 Hypothesis Testing (*) 4 02/20/17 Inference Stat: Chi-Sq+ANOVA+Model Selec 4.1 Chi-Square (*) 4.2 ANOVA 4.3 Stat Model Selection (*) 5 02/27/17 Ratings and Rankings + Rank Aggregation 5.1 Ratings and Rankings (Elo System) 5.2 Rank Aggregation (Borda Voting) 6 03/06/17 Prediction Using Regression Model 6.1 Introduction to Regression 6.2 Regression 2 Variables 6.3 Regression Multi Variables 7 03/13/17 Prediction Using Logistic Regression Model 7.1 Logistic Regression Mathematics 7.2 Logistic Regression Mechanics 7.3 Multivariable Logistic Regression Copyright 2017 Dr. Ash Pahwa 74