CS 7641 A (Machine Learning) Sethuraman K, Parameswaran Raman, Vijay Ramakrishnan


Scenario 1: Team 1 scores 200 runs from their 50 overs, and Team 2 then reaches 146 for the loss of two wickets from their first 40 overs before rain stops play.

Scenario 2: Bad weather reduces the match to 40 overs per side. Team 1 scores 223 from their 40 overs. During Team 2's innings, it rains after 30 overs, by which point they are 147/5. They lose 5 overs due to rain and face the final 5 overs.

Duckworth/Lewis method
- Each team has two resources with which to make as many runs as possible: overs and wickets.
- At any point, the score depends on the combination of these two resources.
- A published table gives the percentage of these combined resources remaining for any number of overs left and wickets lost.
- This table is used to calculate the target score.
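The target calculation described above can be sketched as follows. This is a minimal illustration in the style of the D/L Standard Edition rule, not the slides' own code. The resource percentages `r1` and `r2` are assumed to have been looked up in the published table already, and the values used in the call are hypothetical. The parameter `g50` (the average 50-over score) is also an assumption; its official value has been revised over time.

```python
import math


def revised_target(team1_score, r1, r2, g50=245):
    """Return Team 2's target given the resources available to each side.

    r1, r2: percentage of combined resources (overs and wickets) available
            to Team 1 and Team 2, taken from the published table.
    g50:    assumed average 50-over total, used only when Team 2 has MORE
            resources than Team 1.
    """
    if r2 <= r1:
        # Team 2 has fewer resources: scale Team 1's score down.
        par = team1_score * r2 / r1
    else:
        # Team 2 has more resources: add runs for the extra resources.
        par = team1_score + g50 * (r2 - r1) / 100.0
    # The target is one run more than the par score, rounded down.
    return math.floor(par) + 1


# Hypothetical resource percentages, for illustration only.
print(revised_target(200, r1=100.0, r2=80.0))
```

The function takes the percentages as inputs rather than embedding the table, so any edition of the published table can be plugged in.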

Essence of D/L method

- Wickets are (necessarily) weighted much more heavily as a resource than overs.
- Other factors that equally affect the outcome are not considered: venue, power play, player rating, momentum, player history.

Goal: create an efficient model that reflects as many of the diverse scenarios in real-time matches as possible.

[Figure: a Model box with the labels Runs, Wickets, Player Rating, Power play, Venue and Rain Affected Match, predicting the Outcome.]

- Data sets for cricket matches were not available in a usable form.
- Data had to be extracted from websites that maintain cricket statistics (eg: ).
- Ball-by-ball details of each match were extracted from the commentary.
- Data was formatted by the desired attributes (over, runs, wickets, batsman, bowler, power play) and written to a file.
- In all, data for 52 innings were collected.
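The formatting step above might look like the sketch below. The slides do not give the file layout, so the record type `BallRecord`, the CSV format and the placeholder rows are all hypothetical; only the six attribute names come from the slides.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class BallRecord:
    """One extracted commentary entry (hypothetical layout)."""
    over: int
    runs: int          # cumulative runs at this point
    wickets: int       # cumulative wickets lost
    batsman: str
    bowler: str
    power_play: bool   # True if field restrictions apply in this over


def write_records(records, handle):
    """Write the records as CSV with one column per attribute."""
    names = [f.name for f in fields(BallRecord)]
    writer = csv.DictWriter(handle, fieldnames=names)
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))


# Placeholder rows, not real match data.
rows = [
    BallRecord(1, 4, 0, "Batsman A", "Bowler X", True),
    BallRecord(2, 9, 1, "Batsman B", "Bowler Y", True),
]
buffer = io.StringIO()
write_records(rows, buffer)
print(buffer.getvalue())
```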

Commentary

- The D/L method uses only two features: runs and overs.
- New features were added: venue, the teams taking part in the game, data about the batsman/bowler, and the nature of the over with respect to field restrictions (i.e., power play or not).
- Several feature extraction algorithms were run; Correlation Based Subset Feature Selection (with the fast correlation-based filter search strategy) was finally selected.

Principle: A good feature subset is one that contains features highly correlated with (predictive of) the class, yet uncorrelated with (not predictive of) each other.

Results:
- The most important attributes were runs scored, wickets, venue and power play.
- This was also validated by other feature estimators such as PCA and Latent Semantic Analysis.
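The principle above is commonly scored with the correlation-based merit k * r_cf / sqrt(k + k * (k - 1) * r_ff), where k is the subset size, r_cf the mean feature-class correlation and r_ff the mean feature-feature correlation. The sketch below is an illustration under assumptions, not the tool used in the slides: it uses absolute Pearson correlation as the correlation measure and synthetic data, and the function names are hypothetical.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def cfs_merit(features, target):
    """Merit of a feature subset: rewards correlation with the target,
    penalises correlation between the features themselves."""
    k = len(features)
    r_cf = sum(abs(pearson(f, target)) for f in features) / k
    if k == 1:
        return r_cf
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    r_ff = sum(abs(pearson(features[i], features[j])) for i, j in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


# Synthetic data: two independent features that both drive the target.
rng = random.Random(0)
x1 = [rng.gauss(0, 1) for _ in range(500)]
x2 = [rng.gauss(0, 1) for _ in range(500)]
y = [a + b for a, b in zip(x1, x2)]

print(cfs_merit([x1], y))       # one informative feature
print(cfs_merit([x1, x2], y))   # two complementary features: higher merit
print(cfs_merit([x1, x1], y))   # a redundant copy adds nothing
```

Adding a duplicate feature leaves the merit unchanged, while adding an independent informative feature raises it, which is the behaviour the principle describes.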

FEATURE       MERIT OF THE FEATURE
Run           0.418
Wicket        0.418
Venue         0.095
Power play    0.076
Batsman       0.007

- Almost all target prediction algorithms make use of function approximation.
- Standard basic classifiers were run on the data sets as a benchmark.
- The classification rate is currently lower than that of the D/L method, mainly because fewer data sets were available for training; it can be increased with more data samples.
- kNN and Linear Regression provided high classification rates.

ALGORITHM            ERROR RATE
Neural Network       51 %
Linear Regression    19 %
kNN                  16 %
REPTree              23 %

Thought: can kNN and Linear Regression be used together?

Why Quadratic Regression?
- Many curve-fitting functions used to predict cricket scores are cubic.
- It can be an improvement over Linear Regression, which already performed well.
- Predictor variables are run, run^2, wicket and wicket^2.
- It can be transformed into an equivalent Linear Regression and solved.
- It can be coupled with Smoothing by Neighbour Polling so that the prediction curve is smoothed.
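The transformation into an equivalent linear regression can be sketched as below: each (run, wicket) pair is expanded into the terms run, run^2, wicket and wicket^2 plus an intercept, and ordinary least squares is solved on the expanded terms. This is a minimal illustration, not the slides' implementation. The rescaling of the inputs, the helper names and the synthetic training data are all assumptions.

```python
def expand(run, wicket):
    """Quadratic terms for one sample. Inputs are rescaled (an assumption
    made here for numerical stability, not something the slides specify)."""
    r, w = run / 100.0, wicket / 10.0
    return [1.0, r, r * r, w, w * w]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_quadratic(samples, targets):
    """Least-squares fit via the normal equations (X^T X) beta = X^T y."""
    rows = [expand(run, wicket) for run, wicket in samples]
    p = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * t for row, t in zip(rows, targets)) for i in range(p)]
    return solve(xtx, xty)


def predict(coeffs, run, wicket):
    return sum(c * v for c, v in zip(coeffs, expand(run, wicket)))


def synthetic_score(run, wicket):
    """Made-up quadratic relationship used only to exercise the fit."""
    return 10 + 2 * run + 0.01 * run ** 2 - 5 * wicket - 0.5 * wicket ** 2


samples = [(run, wicket) for run in range(0, 200, 20) for wicket in range(10)]
targets = [synthetic_score(run, wicket) for run, wicket in samples]
coeffs = fit_quadratic(samples, targets)
print(predict(coeffs, 100, 3), synthetic_score(100, 3))
```

Because the model is linear in its coefficients, any linear least-squares solver can be substituted for the small elimination routine used here.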

Reduces spikes in the prediction curve
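The slides do not define Smoothing by Neighbour Polling in detail. One plausible reading, assumed here, is that each predicted point is replaced by the average of itself and its neighbours on the curve. The function name and the `radius` parameter are hypothetical.

```python
def smooth_by_neighbours(curve, radius=1):
    """Replace each point with the mean of the points within `radius`
    positions of it (fewer points are used at the ends of the curve)."""
    smoothed = []
    for i in range(len(curve)):
        lo = max(0, i - radius)
        hi = min(len(curve), i + radius + 1)
        window = curve[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


# A prediction curve with a single spike in the middle.
raw = [10, 10, 40, 10, 10]
print(smooth_by_neighbours(raw))
```

A larger `radius` flattens spikes more aggressively at the cost of blurring genuine changes in scoring rate.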

Variance Analysis:
- Obtain the actual match curve.
- Calculate our prediction.
- Fit them over each other and obtain the variance.
- This is a per-match metric and hence captures accuracy locally over a continuous period.
- It converges with the Absolute Mean Error and Least Square Error methods (for equal samples in each match).
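The per-match metric can be sketched as the mean squared gap between the actual and predicted curves. This is an assumed reading of "variance" in the slides; the function names and the example curves are hypothetical. The sketch also illustrates the last point for the squared-error case: with the same number of samples in each match, averaging the per-match values gives the same number as pooling all samples.

```python
def match_variance(actual, predicted):
    """Mean squared difference between two curves of equal length."""
    if len(actual) != len(predicted):
        raise ValueError("curves must have the same number of samples")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mean_match_variance(matches):
    """Average of the per-match variances."""
    return sum(match_variance(a, p) for a, p in matches) / len(matches)


def pooled_squared_error(matches):
    """Mean squared error over all samples from all matches together."""
    total = sum((a - p) ** 2 for act, pred in matches for a, p in zip(act, pred))
    count = sum(len(act) for act, _ in matches)
    return total / count


# Made-up (actual, predicted) curves with three samples per match.
matches = [
    ([10, 20, 30], [12, 18, 33]),
    ([5, 15, 25], [5, 10, 25]),
]
print(mean_match_variance(matches), pooled_squared_error(matches))
```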

Match interrupted: 100/3, 25 overs. Will both teams score at the same rate?

After 15 overs: Team 1 90/0, Team 2 40/3. Given the above history, will the answer to the above question change?

Data sets

- The Duckworth/Lewis method of score prediction was analyzed and its pitfalls were addressed.
- The Correlation Based Subset evaluation method was used for feature evaluation.
- The modified data set was used to benchmark the predictions of naïve classifiers.
- A hybrid approach using Quadratic Regression and kNN was used, which performed better than the D/L method.
- The concept of prediction using momentum was introduced.