Modelling the distribution of first innings runs in T20 Cricket

The joy of smoothing

James Kirkby

Introduction: Cricket for the uninitiated

Figure: Muralitharan to Gilchrist

Introduction: Motivation

Why might we be interested in cricket data?

- Because we love cricket? Well, some of us do.
- Because it's not the Iris or the Old Faithful data.
- There is lots of cricket data. The discrete nature of the game means that large quantities of data are available, and statistics are already an important aspect of the game.
- Gambling. Standing on the shoulders of giants: working out the odds of dice and card games is what inspired the first interest in statistics and probability.

Data: Scope of the Data

A vast number of matches are played worldwide each year for which data is publicly available. We restrict attention to the following types of matches:

- T20 cricket, i.e. 20 overs per team
- Only top-tier competitions: T20 internationals, English County T20s, IPL, Big Bash, South African T20

We are going to be modelling the number of runs teams score in an innings, and so we also require:

- First innings only (data for the team that bats first)
- The full allocation of overs was available, i.e. the match was not weather affected

These restrictions lead to a sample of 1138 matches.

Data: Data Description

We observe the progression of runs that a team scores through the innings. At the beginning of each over we have the following information:

- The number of runs scored in the remainder of the innings
- The number of wickets down / number of batsmen remaining
- The number of overs / balls remaining

We will focus on the run rate (runs per over) to ensure that results are comparable across different numbers of overs remaining.

Definition: We define the random variable $Y_{W,R}$ as the subsequent run rate a team achieves given that they are currently $W$ wickets down with $R$ overs remaining in the innings.
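As a rough illustration of this definition, the sketch below computes the observed subsequent run rate at the start of each over of a single innings. The input layout (cumulative runs and wickets at the end of each over) and the function name are assumptions made for the example, not the data format used in the talk.

```python
def subsequent_run_rates(cum_runs, cum_wickets, total_overs=20):
    """Return (wickets_down, overs_remaining, run_rate) at the start of each over.

    cum_runs[k] and cum_wickets[k] are the cumulative runs and wickets at the
    END of over k + 1 (a hypothetical input layout chosen for this sketch).
    """
    assert len(cum_runs) == len(cum_wickets) <= total_overs
    final_runs = cum_runs[-1]
    rows = []
    for over in range(len(cum_runs)):
        # State at the START of this over is the state at the end of the previous one.
        runs_so_far = cum_runs[over - 1] if over > 0 else 0
        wickets_down = cum_wickets[over - 1] if over > 0 else 0
        overs_remaining = total_overs - over
        # Runs still to come, expressed per over remaining.
        run_rate = (final_runs - runs_so_far) / overs_remaining
        rows.append((wickets_down, overs_remaining, run_rate))
    return rows


# Toy 4-over innings with made-up numbers, just to keep the output short.
rows = subsequent_run_rates([6, 15, 21, 32], [0, 1, 1, 3], total_overs=4)
```

Each tuple is one realisation of $Y_{W,R}$ for the corresponding $(W, R)$ pair; pooling these tuples over all matches gives the samples that the distributions are estimated from.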

Data: Our Aim

We would like to estimate the distributions of the various $Y_{W,R}$, with the following requirements:

- Avoid a full rank method: we don't want to be storing the entire data set in order to evaluate probabilities
- We want to be able to easily evaluate probabilities from the distribution
- We would like a set of consistent distributions, i.e. the probability of achieving any given run rate should be lower if a team has fewer wickets remaining
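One way to read the consistency requirement is as an ordering of exceedance probabilities: for a fixed number of overs remaining, $P(Y_{W,R} \ge y)$ should not increase as $W$ increases. The sketch below checks that ordering on empirical samples. The function names, the dictionary layout and the sample values are all hypothetical.

```python
def exceedance(sample, y):
    """Empirical P(Y >= y) from a list of observed run rates."""
    return sum(1 for v in sample if v >= y) / len(sample)


def is_consistent(samples_by_wickets, grid, tol=0.0):
    """Check that P(Y >= y) never increases as more wickets are down.

    samples_by_wickets[w] holds run rates observed with w wickets down,
    all for the same number of overs remaining.
    """
    wickets = sorted(samples_by_wickets)
    for y in grid:
        probs = [exceedance(samples_by_wickets[w], y) for w in wickets]
        # Compare each distribution with the next one (one more wicket down).
        if any(later > earlier + tol for earlier, later in zip(probs, probs[1:])):
            return False
    return True


# Made-up run rates for 0, 1 and 2 wickets down.
ordered = {0: [7, 8, 9, 10], 1: [6, 7, 8, 9], 2: [5, 6, 7, 8]}
print(is_consistent(ordered, grid=[6, 7, 8, 9]))
```

The same check could be applied to fitted distributions by replacing `exceedance` with the fitted survival function.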

Data: Observed Data Frequency

[Figure: observed data frequency]

Data: Empirical Distribution

[Figures: empirical distribution (two slides)]

Model: Notation

We observe many realisations of each of the $Y_{W,R}$. We will refer to the $i$th realisation of $Y_{W,R}$, when $W = w$ and $R = r$, as $y_{w,r,i}$. When it is clear from the context which $W$ and $R$ we are talking about, or if it doesn't matter, we will drop the subscripts and use $Y$ and $y_i$.

Model: Distribution Assumption

We assume that $Y$ follows a spline distribution, with pdf given by:

\[ f(y) = \sum_{j=1}^{m} B_j(y)\,\alpha_j \tag{1} \]

Sufficient conditions for a valid pdf are:

\[ \alpha_j > 0 \quad \text{and} \quad \sum_{j=1}^{m} \alpha_j = 1 \tag{2} \]

We can remove the need for the first condition by re-parameterizing to:

\[ f(y) = \sum_{j=1}^{m} B_j(y) \exp(a_j) \tag{3} \]
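A minimal sketch of evaluating the density in (3) follows. It assumes the $B_j$ are B-splines rescaled so that each integrates to one, which is what makes condition (2) sufficient for a valid pdf. The talk does not state the degree or the knots, so the cubic degree, the equally spaced knots and the coefficient values below are illustrative assumptions.

```python
import math


def bspline_basis(j, k, t, y):
    """Value at y of the j-th B-spline of degree k on knots t (Cox-de Boor recursion)."""
    if k == 0:
        return 1.0 if t[j] <= y < t[j + 1] else 0.0
    left = right = 0.0
    if t[j + k] > t[j]:
        left = (y - t[j]) / (t[j + k] - t[j]) * bspline_basis(j, k - 1, t, y)
    if t[j + k + 1] > t[j + 1]:
        right = ((t[j + k + 1] - y) / (t[j + k + 1] - t[j + 1])
                 * bspline_basis(j + 1, k - 1, t, y))
    return left + right


def normalised_basis(j, k, t, y):
    """B-spline rescaled to integrate to one over its support."""
    return (k + 1) / (t[j + k + 1] - t[j]) * bspline_basis(j, k, t, y)


def spline_pdf(y, a, k, t):
    """Equation (3): f(y) = sum_j B_j(y) * exp(a_j)."""
    return sum(normalised_basis(j, k, t, y) * math.exp(a_j) for j, a_j in enumerate(a))


# Hypothetical setup: run rates on [0, 16], cubic splines, knots every 2 runs per over.
knots = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0]
degree = 3
weights = [0.1, 0.2, 0.4, 0.2, 0.1]        # alpha_j, chosen to sum to one
a = [math.log(w) for w in weights]          # a_j = log(alpha_j)
print(spline_pdf(8.0, a, degree, knots))
```

Because $\exp(a_j)$ is always positive, only the sum-to-one constraint in (2) still has to be enforced during estimation.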

Model: Likelihood

The log-likelihood for our data given the spline distribution is

\[ l(a; y) = 1_n^{T} \log\big(B \exp(a)\big) \tag{4} \]

where $\log$ and $\exp$ are applied elementwise,

\[
B = \begin{pmatrix}
b_1(y_1) & \cdots & b_m(y_1) \\
\vdots & \ddots & \vdots \\
b_1(y_n) & \cdots & b_m(y_n)
\end{pmatrix}
\quad \text{and} \quad
a = \begin{pmatrix} a_1 \\ \vdots \\ a_m \end{pmatrix} \tag{5}
\]
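Equation (4) is a sum over observations of the log of the fitted density. A minimal sketch, using a made-up 3 x 2 basis matrix rather than real spline evaluations:

```python
import math


def log_likelihood(B, a):
    """Equation (4): sum_i log( sum_j B[i][j] * exp(a[j]) )."""
    alpha = [math.exp(v) for v in a]
    return sum(math.log(sum(b * al for b, al in zip(row, alpha))) for row in B)


# Toy basis matrix: rows are observations, columns are basis functions (made-up values).
B = [[1.0, 0.0],
     [0.5, 0.5],
     [0.0, 2.0]]
a = [math.log(0.5), math.log(0.5)]
print(log_likelihood(B, a))
```

With these values the fitted densities at the three observations are 0.5, 0.5 and 1.0, so the log-likelihood equals $2 \log 0.5$.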

Model: Estimation

Estimation of the parameters can now proceed by finding the stationary points of the Lagrangian, i.e. the roots of its gradient:

\[ L(a, \gamma) = 1_n^{T} \log(B \exp a) + \gamma \left( 1_m^{T} \exp a - 1 \right) \tag{6} \]

The gradient vectors are:

\[
\frac{\partial L}{\partial a}
= \left( \frac{1}{B \exp a} \right)^{T} \big( B \,\operatorname{diag}(\exp a) \big) + \gamma \exp a
= \left( \frac{1}{B \alpha} \right)^{T} \big( B \,\operatorname{diag}(\alpha) \big) + \gamma \alpha \tag{7}
\]

and

\[
\frac{\partial L}{\partial \gamma} = \left( 1_m^{T} \exp a - 1 \right) = \left( 1_m^{T} \alpha - 1 \right) \tag{8}
\]

where $\alpha = \exp a$ and the reciprocal is taken elementwise.

Model: Estimation

The Hessian of our objective function is

\[
H_{a,\gamma} L =
\begin{pmatrix}
\operatorname{diag}\!\left( \dfrac{\partial L}{\partial a} \right) - V^{T} U^{-1} V & \exp a \\
(\exp a)^{T} & 0
\end{pmatrix}, \tag{9}
\]

where $U = \operatorname{diag}\big( (B \exp a)^{2} \big)$, $V = B \,\operatorname{diag}(\exp a)$, and $\partial L / \partial a$ is the gradient (7), including its $\gamma \exp a$ term. This can be combined with expressions (7) and (8) to find the maximum likelihood estimate of the coefficients, $a$, using Newton-Raphson.
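The Newton-Raphson iteration can be sketched in plain Python as below. This is an illustration under stated assumptions, not the implementation behind the talk: the basis is a toy histogram-style one (non-overlapping bins of width 2), chosen because the maximum likelihood answer is then known to be the bin proportions, and the multiplier is started at $\gamma = -n$, the value it must take at the optimum (summing the components of (7) gives $n + \gamma \, 1_m^{T} \alpha = 0$). The helper names are hypothetical, and overlapping spline bases may need step-size control that is omitted here.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def newton_raphson(B, a, gamma, iterations=50, tol=1e-10):
    """Iterate towards a root of the gradient of the Lagrangian (6)."""
    m = len(a)
    for _ in range(iterations):
        alpha = [math.exp(v) for v in a]
        f = [sum(b * al for b, al in zip(row, alpha)) for row in B]  # B exp(a)
        # Gradient, equations (7) and (8).
        grad_a = [alpha[j] * sum(row[j] / fi for row, fi in zip(B, f)) + gamma * alpha[j]
                  for j in range(m)]
        grad_gamma = sum(alpha) - 1.0
        # Hessian, equation (9): diag(dL/da) - V^T U^{-1} V, bordered by exp(a).
        H = [[0.0] * (m + 1) for _ in range(m + 1)]
        for j in range(m):
            for k in range(m):
                H[j][k] = -alpha[j] * alpha[k] * sum(
                    row[j] * row[k] / fi ** 2 for row, fi in zip(B, f))
            H[j][j] += grad_a[j]
            H[j][m] = H[m][j] = alpha[j]
        step = solve(H, [-g for g in grad_a] + [-grad_gamma])
        a = [v + s for v, s in zip(a, step[:m])]
        gamma += step[m]
        if max(abs(s) for s in step) < tol:
            break
    return a, gamma


# Toy data: 10 observations falling in three bins with counts 5, 3 and 2.
B = [[0.5, 0.0, 0.0]] * 5 + [[0.0, 0.5, 0.0]] * 3 + [[0.0, 0.0, 0.5]] * 2
a_hat, gamma_hat = newton_raphson(B, a=[math.log(1 / 3)] * 3, gamma=-float(len(B)))
alpha_hat = [math.exp(v) for v in a_hat]
print(alpha_hat, gamma_hat)
```

For this toy basis the fitted weights should match the bin proportions, which gives a simple check that the gradient and Hessian expressions are coded consistently.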

Model: Result

[Figure: model result]

Model: Further Smoothing

We would like to impose some smoothness on the distributions, so that when the number of wickets remaining and overs remaining are similar we have similar distributions. We can achieve this by imposing a difference penalty on the parameters of the neighbouring distributions.

In order to add the penalty we first need to be able to estimate the parameters jointly, which requires a couple of tweaks to our basis and likelihood.

Model: Multi-density Basis

In order to model the distributions jointly, you would naively define the basis as:

\[
B = \begin{pmatrix}
B_{W=0,R=20} & 0 & 0 & \cdots & 0 & 0 \\
0 & B_{W=1,R=20} & 0 & \cdots & 0 & 0 \\
0 & 0 & B_{W=2,R=20} & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & B_{W=8,R=1} & 0 \\
0 & 0 & 0 & \cdots & 0 & B_{W=9,R=1}
\end{pmatrix}
\]

Some parts of this basis do not support any data! For example, no team can be a wicket down with all 20 overs remaining, so the block $B_{W=1,R=20}$ has no observations.

Model: Multi-density Basis

So after removing columns from the basis which support no observations, we have something like:

\[
\tilde{B} = \begin{pmatrix}
B_{W=0,R=20} & 0 & 0 & \cdots & 0 & 0 \\
0 & B_{W=0,R=19} & 0 & \cdots & 0 & 0 \\
0 & 0 & B_{W=1,R=19} & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & B_{W=8,R=1} & 0 \\
0 & 0 & 0 & \cdots & 0 & B_{W=9,R=1}
\end{pmatrix}
\]

Model: Multi-density Basis

We also need to define a summing matrix to enforce the constraints in the Lagrangian:

\[
N = \begin{pmatrix}
1_m & 0 & 0 & \cdots & 0 & 0 \\
0 & 1_m & 0 & \cdots & 0 & 0 \\
0 & 0 & 1_m & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1_m & 0 \\
0 & 0 & 0 & \cdots & 0 & 1_m
\end{pmatrix}
\]

Clearly we will need to define an analogue of the reduced basis for $N$, which we will refer to as $\tilde{N}$.
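The block structure of the joint basis and the summing matrix can be sketched as follows. The helper names, the state keys and the tiny per-state basis matrices are made up for illustration. States with no observations are dropped before stacking, which is one simple way to obtain the reduced matrices.

```python
def block_diag(blocks):
    """Stack matrices (lists of rows) along the diagonal, with zeros elsewhere."""
    total_cols = sum(len(b[0]) for b in blocks)
    out, offset = [], 0
    for b in blocks:
        width = len(b[0])
        for row in b:
            out.append([0.0] * offset + list(row) + [0.0] * (total_cols - offset - width))
        offset += width
    return out


def summing_matrix(n_states, m):
    """One column per state, each column holding a block of m ones."""
    return block_diag([[[1.0]] * m for _ in range(n_states)])


# Hypothetical per-state basis matrices with m = 2 coefficients each.
# The state (W=1, R=20) has no observations, so it contributes no block.
per_state = {
    (0, 20): [[1.0, 2.0]],
    (1, 20): [],
    (0, 19): [[3.0, 4.0], [5.0, 6.0]],
}
kept = [key for key, rows in per_state.items() if rows]
B_tilde = block_diag([per_state[key] for key in kept])
N_tilde = summing_matrix(len(kept), m=2)

# N_tilde^T alpha returns one sum per retained state; each should equal one.
alpha = [0.25, 0.75, 0.4, 0.6]
sums = [sum(N_tilde[i][s] * alpha[i] for i in range(len(alpha))) for s in range(len(kept))]
print(sums)
```

Each column of the summing matrix picks out the coefficients of one distribution, so every distribution gets its own sum-to-one constraint.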

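Model: Bring on the smoothing

Our unpenalised target function becomes

\[
L(\tilde{a}, \gamma) = 1^{T} \log\big( \tilde{B} \exp \tilde{a} \big) + \gamma^{T} \big( \tilde{N}^{T} \exp \tilde{a} - 1 \big) \tag{10}
\]

where $\tilde{B}$ and $\tilde{N}$ are the reduced basis and summing matrix, $\tilde{a}$ stacks the coefficients of all the distributions, and $\gamma$ now holds one multiplier per distribution. We can then simply add a difference penalty to impose smoothness across our distributions:

\[
L_P(\tilde{a}, \gamma) = L(\tilde{a}, \gamma) - \lambda \exp(\tilde{a})^{T} \tilde{D}^{T} \tilde{D} \exp(\tilde{a}), \tag{11}
\]

where $\tilde{D}$ is a matrix that has been chopped down from some difference matrix $D$. For our example, we will use $D = D_W \otimes D_R \otimes I_m$.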
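To make the penalty term in (11) concrete, the sketch below evaluates $\lambda \, \alpha^{T} D^{T} D \, \alpha$ for a deliberately simplified case: three neighbouring distributions along a single axis (for instance $W = 0, 1, 2$ at a fixed $R$), a first-order difference matrix, and no chopped-down rows. The helper names and the coefficient values are hypothetical, and the two-axis penalty used in the talk is not reproduced here.

```python
def kron(A, B):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def first_difference(n):
    """(n - 1) x n first-order difference matrix."""
    return [[-1.0 if c == r else 1.0 if c == r + 1 else 0.0 for c in range(n)]
            for r in range(n - 1)]


def identity(m):
    return [[1.0 if r == c else 0.0 for c in range(m)] for r in range(m)]


def penalty(alpha, D, lam):
    """lam * alpha^T D^T D alpha, computed as lam * ||D alpha||^2."""
    d_alpha = [sum(d * a for d, a in zip(row, alpha)) for row in D]
    return lam * sum(v * v for v in d_alpha)


# Three neighbouring distributions with m = 2 coefficients each (made-up values).
alpha = [0.5, 0.5,   0.6, 0.4,   0.8, 0.2]
D = kron(first_difference(3), identity(2))   # differences between neighbours, per coefficient
print(penalty(alpha, D, lam=2.0))
```

The Kronecker product with the identity means each coefficient is only compared with the same coefficient in the neighbouring distribution, so a larger $\lambda$ pulls neighbouring distributions towards each other.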

Model: Result

[Figure: model result after smoothing]

Model: Further Work

- It would be good to take account of the repeated measurements in the data
- Find a way to introduce a parametric component into the model
- Performance improvements: Woodbury matrix identity / Schur complement
- Alternative penalty structure: add a penalty to ensure the CDFs do not cross