BASKETBALL PREDICTION ANALYSIS OF MARCH MADNESS GAMES
CHRIS TSENG, YIBO WANG


GOAL OF PROJECT
The goal is to predict the winners of the games between men's college basketball teams competing in the 2018 NCAA March Madness tournament. Filling out brackets has largely been an informal competition among friends and family, but it increasingly carries a monetary component, namely a prize for whoever finishes with the most accurate March Madness bracket. As a result, finding the best way to predict a given year's bracket has attracted considerable interest, in large part because of how difficult it is to correctly predict the outcome of 64 teams playing a total of 63 games. The most significant prize is Warren Buffett's offer of $1 million a year for life to anyone who perfectly predicts the outcome of every March Madness game in a given year.
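To give a rough sense of that difficulty, the counting behind "64 teams, 63 games" can be sketched as below. This is an illustrative calculation only: it counts distinct brackets and says nothing about how likely each one is, and the function names are made up for this example.

```python
def games_in_single_elimination(num_teams: int) -> int:
    """Each game eliminates exactly one team, so a field of n needs n - 1 games."""
    return num_teams - 1


def possible_brackets(num_games: int) -> int:
    """Each game has two possible winners, giving 2 ** num_games distinct brackets."""
    return 2 ** num_games


games = games_in_single_elimination(64)
brackets = possible_brackets(games)
print(games)     # 63
print(brackets)  # 9223372036854775808
```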

PREDICTION INFORMATION
In basketball, a team's future performance depends on many factors:
- past performance of the team
- momentum in terms of win streaks
- team standings
- external environmental circumstances (travel schedule, home-court advantage, off-season days, etc.)
Overall, prediction models built on these factors attempt to separate strong teams from weak teams. They give a rough estimate of which teams are likely to finish near the top of the bracket and which near the bottom.

DATASET INFORMATION
We'll be using datasets from Kaggle's Google Cloud & NCAA ML Competition 2018-Men's machine learning competition. These datasets cover a range of statistics, including the outcomes of past NCAA March Madness tournaments for every year since 1984, team seedings, team points scored per game, team standings during the regular season, and team. The model is evaluated on accuracy: how many outcomes it predicts correctly for a given year's NCAA March Madness tournament. Better-performing methods should predict, for each pair of teams that could meet in that year's tournament, the probability that one team beats the other.
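The two ways of scoring a model mentioned above can be sketched as follows. Accuracy is what the slide describes. Log loss is included as one common way to score predicted probabilities; using it here is an assumption, not something the slides state. The team names and numbers are placeholders.

```python
import math


def accuracy(predicted_winners, actual_winners):
    """Fraction of games whose predicted winner matches the actual winner."""
    correct = sum(p == a for p, a in zip(predicted_winners, actual_winners))
    return correct / len(actual_winners)


def log_loss(win_probabilities, outcomes, eps=1e-15):
    """Mean negative log-likelihood of the outcomes (1 = first team won, 0 = lost)."""
    total = 0.0
    for p, y in zip(win_probabilities, outcomes):
        p = min(max(p, eps), 1 - eps)  # clip so log(0) never occurs
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(outcomes)


# Hypothetical three-game example; the team names are placeholders.
predicted = ["A", "C", "F"]
actual = ["A", "D", "F"]
print(accuracy(predicted, actual))  # two of the three picks are correct

probs = [0.9, 0.6, 0.8]  # predicted probability that the first-listed team wins
outcomes = [1, 0, 1]     # what actually happened
print(log_loss(probs, outcomes))
```

Unlike accuracy, log loss rewards a confident correct prediction more than a hesitant one and penalizes a confident miss heavily.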

STEPS IN PROJECT
1. Decide which variables have the highest predictive power in determining which team is most likely to win a matchup in a given year. Given the wide variety of data provided, deducing these variables can be accomplished using a number of machine learning techniques, including decision trees, linear regression, and clustering.
2. Construct a model with these variables such that, when given two teams in a specific year, one will be able to predict which team is most likely to win if they play against each other in the NCAA tournament.
3. Try to predict an NCAA March Madness tournament bracket for a given year using the model we created.
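Steps 2 and 3 can be sketched as below: given any function that returns the probability of one team beating another, a bracket is filled in by advancing the favored team round by round. This is a minimal sketch under stated assumptions, not the authors' implementation. The function names, the toy ratings, and the rule of always advancing the favorite are all illustrative.

```python
def predict_winner(team_a, team_b, win_probability):
    """Pick whichever team the model favors in a head-to-head matchup."""
    return team_a if win_probability(team_a, team_b) >= 0.5 else team_b


def predict_bracket(teams, win_probability):
    """Advance predicted winners round by round until one team remains.

    `teams` lists the field in bracket order, so neighbors meet first.
    Returns a list of rounds, each a list of that round's predicted winners.
    """
    if len(teams) < 2 or len(teams) & (len(teams) - 1):
        raise ValueError("field size must be a power of two")
    rounds = []
    current = list(teams)
    while len(current) > 1:
        current = [
            predict_winner(current[i], current[i + 1], win_probability)
            for i in range(0, len(current), 2)
        ]
        rounds.append(current)
    return rounds


# Placeholder model: these strength ratings are invented for illustration only.
ratings = {"A": 9.0, "B": 4.0, "C": 6.5, "D": 7.0}


def toy_probability(team_a, team_b):
    return ratings[team_a] / (ratings[team_a] + ratings[team_b])


print(predict_bracket(["A", "B", "C", "D"], toy_probability))
# [['A', 'D'], ['A']]
```

For a 64-team field the same loop produces six rounds and 63 predicted games.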

Data Collection
- Obtained datasets from Kaggle
- Data preprocessing accomplished using Python
- Variety of statistics calculated for each team that participated in March Madness since 2003 (first year that detailed statistics were collected for teams that season)
  - Total number of home, away, neutral wins
  - Average regular season stats, including three pointers made and attempted, blocks, steals, etc.
  - Point differential
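The per-team aggregation described above might look roughly like the sketch below. The slides do not show the preprocessing code, so this is an assumption-laden illustration: the column names (WTeamID, LTeamID, WScore, LScore, WLoc) are assumed to follow the Kaggle game-results layout, the sample rows are invented, and point differential is computed both as a season total and as a per-game average because the slides do not say which was used.

```python
from collections import defaultdict


def summarize_regular_season(games):
    """Aggregate per-team win counts by location and point differential.

    Each game is a dict with assumed keys WTeamID, LTeamID, WScore, LScore, and
    WLoc, where WLoc is the winner's location: 'H' home, 'A' away, 'N' neutral.
    """
    stats = defaultdict(lambda: {"home_wins": 0, "away_wins": 0,
                                 "neutral_wins": 0, "games": 0, "point_diff": 0})
    location_key = {"H": "home_wins", "A": "away_wins", "N": "neutral_wins"}
    for g in games:
        margin = g["WScore"] - g["LScore"]
        winner, loser = stats[g["WTeamID"]], stats[g["LTeamID"]]
        winner[location_key[g["WLoc"]]] += 1
        winner["games"] += 1
        winner["point_diff"] += margin
        loser["games"] += 1
        loser["point_diff"] -= margin
    for s in stats.values():
        s["avg_point_diff"] = s["point_diff"] / s["games"]
    return dict(stats)


# Invented sample rows; team IDs and scores are placeholders.
sample_games = [
    {"WTeamID": 1101, "LTeamID": 1102, "WScore": 70, "LScore": 60, "WLoc": "H"},
    {"WTeamID": 1102, "LTeamID": 1101, "WScore": 80, "LScore": 75, "WLoc": "N"},
    {"WTeamID": 1101, "LTeamID": 1103, "WScore": 65, "LScore": 64, "WLoc": "A"},
]
summary = summarize_regular_season(sample_games)
print(summary[1101])
```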

Linear Regression Results

Rounds vs.               Coefficient    Mean squared error    Variance score
Home wins                 0.76880701                148.59            -83.16
Neutral wins              0.30054386                  5.87             -2.32
Away wins                 0.09588975                 34.73            -18.67
Point differential       13.24735648                265.71           -149.51
Team wins                 1.16524063                487.99           -275.41
Field goals made          0.44359187                615.39           -347.57
Field goals attempted     0.48955215               3054.05          -1728.90
Free throws made          0.03517384                205.68           -115.50
Free throws attempted     0.02434854                428.21           -241.55
Offensive rebounds        0.23840846                113.75            -63.43
Defensive rebounds        0.19470072                557.56           -314.82
Assists                   0.28843754                182.25           -102.23
Turnovers                -0.19625423                145.03            -81.15
Steals                    0.10295244                 37.52            -20.25
Blocks                    0.21824861                  9.75             -4.52
Personal fouls           -0.23720717                288.20           -162.24
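The slides do not show how these figures were produced. The sketch below illustrates one way the three reported quantities can be computed for a single-variable fit, assuming that "Variance score" means the coefficient of determination, R^2 = 1 - SS_res / SS_tot. That reading is an assumption. The code uses only the standard library rather than scikit-learn, and the data is invented.

```python
def fit_simple_linear_regression(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_squared_error(y_true, y_pred):
    """Average squared gap between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """1 - SS_res / SS_tot; negative when the fit is worse than predicting the mean."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Invented toy data: regular-season wins (x) and tournament rounds reached (y).
train_x, train_y = [18, 22, 25, 27, 30], [0, 1, 2, 3, 5]
test_x, test_y = [20, 26, 29], [1, 2, 4]

slope, intercept = fit_simple_linear_regression(train_x, train_y)
predictions = [slope * xi + intercept for xi in test_x]
print("Coefficient:", slope)
print("Mean squared error:", mean_squared_error(test_y, predictions))
print("Variance score:", r2_score(test_y, predictions))
```

Under this definition a score of 1 is a perfect fit and 0 matches always predicting the mean of the test responses. Strongly negative scores like those in the table would therefore mean the predictions are far from the observed values, which is worth checking (for example, whether predictions and targets are on the same scale) before drawing conclusions about individual predictors.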

Binary Classification
- Labeled teams based on whether or not they reached the Final Four stage of the March Madness tournament
- Trained classifiers using regular season statistics for each team
- Excluded March Madness statistics for each team, as there is a known correlation between features like seeding and the likelihood of reaching the Final Four
- Utilized Scikit-Learn implementations of each classifier with default parameters
  - Neural network (MLPClassifier)
  - Decision tree, CART algorithm (DecisionTreeClassifier)
  - K Nearest Neighbor (KNeighborsClassifier)
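To illustrate the labeling-and-classification setup, here is a minimal k-nearest-neighbor sketch written with the standard library. It is a stand-in for illustration, not the scikit-learn KNeighborsClassifier the slides used. The feature choice (wins and average point differential), the data, and the function name are all invented. The default k=5 mirrors scikit-learn's default n_neighbors.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k=5):
    """Label `query` by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(features, query), label)
        for features, label in zip(train_features, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Invented regular-season features: (wins, average point differential).
# Label 1 = reached the Final Four, 0 = did not.
train_features = [(30, 15.0), (29, 13.5), (31, 16.2), (20, 2.0),
                  (18, -1.0), (22, 4.5), (19, 0.5), (21, 3.0)]
train_labels = [1, 1, 1, 0, 0, 0, 0, 0]

print(knn_predict(train_features, train_labels, (28, 12.0), k=3))  # 1
print(knn_predict(train_features, train_labels, (20, 1.0), k=3))   # 0
```

Because the vote depends on Euclidean distance, features measured on larger scales dominate unless they are standardized first; the sketch skips that step for brevity.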

Binary Classification Results
Classifier accuracy on testing data (2017 season stats):
- Neural Network: 0.94 (64/68)
- Decision Tree: 0.93 (63/68)
- K Nearest Neighbor: 0.96 (65/68)
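One way to put these accuracies in context is to compare them with a trivial baseline. Assuming the test set is the 68-team 2017 field, as the denominators suggest, only 4 teams carry the "reached the Final Four" label, so a classifier that always answers "did not reach" already scores 64/68. The sketch below computes that baseline along with recall, which shows how many Final Four teams were actually identified. The function names are illustrative.

```python
def majority_baseline_accuracy(labels):
    """Accuracy of always predicting the most common label."""
    most_common_count = max(labels.count(value) for value in set(labels))
    return most_common_count / len(labels)


def recall(y_true, y_pred, positive=1):
    """Share of actual positives that the classifier identified."""
    true_positives = sum(t == positive and p == positive
                         for t, p in zip(y_true, y_pred))
    actual_positives = sum(t == positive for t in y_true)
    return true_positives / actual_positives


# A 68-team field in which 4 teams reach the Final Four (label 1).
labels = [1] * 4 + [0] * 64
always_no = [0] * 68

print(round(majority_baseline_accuracy(labels), 2))  # 0.94
print(recall(labels, always_no))                     # 0.0
```

With classes this imbalanced, reporting recall or a confusion matrix alongside accuracy makes it easier to tell whether a model is finding Final Four teams or mostly predicting the majority class.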

Future Direction
- Investigate March Madness games where a lower-seeded team defeats a higher-seeded team, leading to an upset
- Use supervised machine learning techniques like support vector machines (SVM) to see if upset teams have specific advantages in certain stats/features over the teams they upset
- Apply k-means clustering to see if upset teams cluster together based on a certain subset of features compared to other teams
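A first step toward the upset analysis would be labeling which tournament games were upsets. The sketch below does this from seed strings, assuming a format of region letter, two-digit seed, and optional play-in suffix (for example "W01" or "X16a"). That format and the sample games are assumptions for illustration, not details given in the slides.

```python
def seed_number(seed):
    """Extract the numeric seed from a string such as 'W01' or 'X16a'.

    The format (region letter, two digits, optional play-in suffix) is an
    assumption and is not described in the slides.
    """
    return int(seed[1:3])


def is_upset(winner_seed, loser_seed):
    """An upset: the winner carried the numerically larger (worse) seed."""
    return seed_number(winner_seed) > seed_number(loser_seed)


# Hypothetical games as (winner seed, loser seed) pairs.
games = [("W01", "W16a"), ("X12", "X05"), ("Y11", "Z11")]
print([is_upset(w, l) for w, l in games])  # [False, True, False]
```

The resulting True/False labels could then serve as the target for the SVM or as a tag for inspecting k-means clusters.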