Predicting Results of a Match-Play Golf Tournament with Markov Chains

Kevin R. Gue, Jeffrey Smith and Özgür Özmen*
*Department of Industrial & Systems Engineering, Auburn University, Alabama, USA, [kevin.gue, jsmith, ozgur]@auburn.edu

Abstract. We introduce a Markov chain model for predicting outcomes in golf match-play. The model uses each player's score probability distribution on each hole to estimate the probability of winning the match. The model is specific both to the individual participants and to the course on which the match is played. We use six years of PGA ShotLink data to determine individual player statistics and to estimate the required probability distributions. We compare the predictions of the model with the results of the 2010 Ryder Cup singles match-play (Day 3).

1. Introduction

Golf tournaments take two major forms: stroke-play and match-play. In stroke-play, a player's final score is the sum of the scores on each hole of the tournament; the player with the lowest score wins. In match-play, two players compete hole by hole, and the player with the lower score on a hole wins one point. Equal scores on a hole yield one half-point for each player. The player with the most points after 18 holes wins. A match may not require a full 18 holes if one player has a point advantage greater than the number of remaining holes. For example, if a player is 3-up with only two holes remaining, he or she has won the match and play ceases. In this study, we consider only match-play competition.

Match outcome and decision support models for sports have been proposed in several studies (Scarf and Shi, 2005; Goddard, 2005; Barnett and Clarke, 2005). Reilly and Williams (2003) summarize the effects of applying scientific methodologies to soccer.
Regarding golf, Scheid (1979) simulates the effect of handicap allowances on players' chances of winning, and McHale (2010) conducts simulation studies to examine the fairness of handicapping using data from a real golf tournament. Similar to these studies, Franks and McGarry (2003) examine the relationship between observed and expected results using real data. Markov chains are widely used to model sporting events (Sokol, 2004; Kostuk et al., 2001). Berry () builds a Markov chain model to compare Tiger Woods with other golfers to find out whether he has the persona of a winner. Fearing et al. (2011) use PGA Tour ShotLink data to develop distance-based models of putting performance and to create a new putting performance metric.

Our study combines player statistics from six years of PGA ShotLink data with a Markov chain model to predict the outcome of golf match-play events. In the following sections, we first describe the aggregated data we formed and the mathematical model we built. We then discuss our validation efforts regarding the data model. Finally, we present results of our computational model for the 2010 Ryder Cup and conclude with pedagogical notes and future goals.

2. Methodology

2.1. PGA ShotLink Data

PGA ShotLink data is gathered at the major PGA Tour stroke-play events by volunteers using mobile computers and laser rangefinders. We used six years of raw data (2004-2009) consisting of the scores of every player on every hole in every tournament during those years. We aggregated this data to estimate player performance statistics by par of hole. The structure of the data is given in Table 1. For example, we can determine a player's probability of scoring i strokes on a Par-j hole, where i is in {1, 2, ..., 10} and j is in {3, 4, 5}, as in Table 2. (Professional players almost never score more than 10 on a hole.)

[1] http://www.pgatour.com/story/9596346/
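As a minimal sketch of the aggregation step described above, the following Python function turns one player's score counts on holes of a given par into relative frequencies. The function name `score_distribution` and the counts in the example are hypothetical illustrations; they are not taken from the ShotLink data or from the authors' implementation.

```python
from typing import Dict


def score_distribution(counts: Dict[int, int]) -> Dict[int, float]:
    """Convert observed score counts (strokes -> number of holes) into
    estimated probabilities P(score = i) for one player and one par."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("need at least one observation")
    # Relative frequency of each score; scores never observed are omitted.
    return {score: n / total for score, n in sorted(counts.items()) if n > 0}


# Hypothetical Par-5 counts for one player (illustrative only).
example_counts = {3: 10, 4: 120, 5: 150, 6: 18, 7: 2}
example_probs = score_distribution(example_counts)
```

A relative frequency is the simplest estimator one could use here; whether the authors smoothed or pooled sparse counts is not stated in the text.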

Table 1. Sample player scores on Par-5 holes
Player Name  Obs  1 2 3 4 5 6 7 8 9 10
Mickelson, Phil  47  4 678 575 97 3
Mahan, Hunter  79  34 733 863 38 8 3
Watson, Bubba  79  38 48 463 8 3

Table 2. Sample probabilities on Par-5 holes
Player Name  Obs  1 2 3 4 5 6 7 8 9 10
Mickelson, Phil  47  .3 .48 .4 .7 .
Mahan, Hunter  79  . .4 .48 .8 .
Watson, Bubba  79  .4 .45 .43 .8 .

2.2. Mathematical Model

We model a match-play match as a Markov process in which the state of the match is the advantage one player has over the other, and the transition probabilities are the probabilities that that player wins, ties, or loses the current hole. We assume that performance on a hole does not depend on holes already played, so the required memorylessness property holds. We also assume that the performance of a player is not influenced by the identity of his opponent.

Let A_j and B_j be random variables corresponding to the scores of Player A and Player B on a par-j hole. The probability that A wins the hole is

\[
P(A_j < B_j) = \sum_{b=1}^{10} P(A_j < b \mid B_j = b)\,P(B_j = b)
             = \sum_{b=1}^{10} P(A_j < b)\,P(B_j = b)
             = \sum_{b=1}^{10} \sum_{a=1}^{b-1} P(A_j = a)\,P(B_j = b),
\]

where the second equality uses the assumption that the two players' scores are independent. Similarly, the probability of a tie is

\[
P(A_j = B_j) = \sum_{a=1}^{10} P(A_j = a)\,P(B_j = a),
\]

and the probability of a loss is

\[
P(A_j > B_j) = 1 - P(A_j < B_j) - P(A_j = B_j).
\]

With the probabilities of a win, tie, and loss for each of the three pars (3, 4, 5), we can completely specify a state transition diagram (see Figure 1). The match, which may be defined from either player's perspective, begins in state zero and proceeds hole by hole, using the probabilities appropriate to the par of each hole. An 18-hole match therefore has only a limited number of distinct states. Gray nodes indicate termination states, in which one player has won. There is also a termination state for a tie after 18 holes.

The structure of the state diagram suggests a simple, recursive expression for the probability that the match is in state m after h holes. Let w_h, t_h, and l_h be the probabilities of a win, tie, and loss on hole h.
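The per-hole win, tie, and loss probabilities derived above can be sketched in Python as follows. This is an illustration under the stated independence assumption; the function name `hole_outcome_probs` and the two example distributions are hypothetical and do not come from the paper's data.

```python
from typing import Dict, Tuple


def hole_outcome_probs(dist_a: Dict[int, float],
                       dist_b: Dict[int, float]) -> Tuple[float, float, float]:
    """Return (win, tie, loss) probabilities for Player A on one hole,
    given independent score distributions (strokes -> probability)."""
    # P(A < B): sum over all score pairs where A takes fewer strokes.
    win = sum(pa * pb
              for a, pa in dist_a.items()
              for b, pb in dist_b.items()
              if a < b)
    # P(A = B): sum over scores the two players share.
    tie = sum(pa * dist_b.get(a, 0.0) for a, pa in dist_a.items())
    # P(A > B) follows by complement, as in the text.
    loss = 1.0 - win - tie
    return win, tie, loss


# Hypothetical Par-4 distributions for two players (illustrative only).
player_a = {3: 0.2, 4: 0.5, 5: 0.3}
player_b = {3: 0.1, 4: 0.6, 5: 0.3}
win, tie, loss = hole_outcome_probs(player_a, player_b)
```

For these made-up distributions the three probabilities come out to roughly 0.33, 0.41, and 0.26.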
These probabilities will depend, of course, on the par of the hole. In general, the probability p(m,h) of being in state m after h holes is given by the recursive expression

\[
p(m,h) = p(m-1,h-1)\,w_h + p(m,h-1)\,t_h + p(m+1,h-1)\,l_h .
\]

If states (m-1,h-1), (m,h-1), or (m+1,h-1) are infeasible (e.g., 3-up after two holes), we set their respective state probabilities to zero. Similarly, if a state is feasible but the transition is not, we modify the recursion appropriately. For example, p(2,18) = p(1,17) w_18, because the other feeder states (p(3,17) and p(2,17)) are winning states, and therefore the match is over if they are reached (see Figure 1). The probability that a player wins the match is the sum of the probabilities of reaching the winning states. The probability that the match ends in a tie is p(0,18).

Figure 1. State diagram of a match. Gray circles represent terminating states, indicating the end of the match.

3. Validation and Results

We assume that the probabilities we derived from the raw data are accurate and applicable in head-to-head matches between individual players. This is an important assumption and warrants validation. Since the current data was gathered from stroke-play tournaments, we also wanted to collect match-play data to provide validity evidence. To our knowledge, there are only two major events that have match-play rounds: Ryder Cup Day 3 and the Accenture Match Play tournament. We searched the World Wide Web for data on these tournaments and for records of head-to-head meetings between players. Since the Accenture event is a knockout-style tournament and since there are so few match-play tournaments, we could find only a small number of players who played against each other multiple times. Our goal was to have enough match-play observations between two players to calculate a binomial confidence interval and compare it with the conditional probabilities computed using our model. Because the number of observations at each par level was small, the resulting 95% confidence intervals were wide, and the conditional probabilities derived from ShotLink data always fell within them. This does not give us great comfort in our validation efforts, but we are currently seeking additional head-to-head data in order to improve the validation process.
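The recursion for p(m,h) given in the Mathematical Model section can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function name `match_probabilities` and the dictionary-based state representation are assumptions. Each hole is described by a (win, tie, loss) triple from Player A's perspective, and a state is absorbed as soon as the lead exceeds the number of holes remaining.

```python
from typing import Dict, List, Tuple


def match_probabilities(
    holes: List[Tuple[float, float, float]]
) -> Tuple[float, float, float]:
    """Return (P(A wins), P(tie), P(B wins)) for a match-play match.

    holes[h-1] holds (w_h, t_h, l_h) for hole h from A's perspective.
    """
    n = len(holes)
    live: Dict[int, float] = {0: 1.0}  # lead m -> probability, match ongoing
    a_wins = 0.0
    b_wins = 0.0
    for h, (w, t, l) in enumerate(holes, start=1):
        nxt: Dict[int, float] = {}
        # One step of the recursion: each live state feeds m+1, m, m-1.
        for m, p in live.items():
            for step, q in ((1, w), (0, t), (-1, l)):
                nxt[m + step] = nxt.get(m + step, 0.0) + p * q
        remaining = n - h
        live = {}
        for m, p in nxt.items():
            if m > remaining:        # A's lead cannot be overturned
                a_wins += p
            elif -m > remaining:     # B's lead cannot be overturned
                b_wins += p
            else:
                live[m] = p
    # After the last hole only the all-square state can still be live.
    return a_wins, live.get(0, 0.0), b_wins


# Hypothetical two-hole match with identical holes (illustrative only).
two_hole = match_probabilities([(0.5, 0.3, 0.2), (0.5, 0.3, 0.2)])
```

For an 18-hole course, `holes` would contain 18 triples, each chosen according to the par of the corresponding hole.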
For our second validation effort, we use only the PGA Championship data in PGA ShotLink and assume that if two players played a hole on the same day in the same round at the same event, we can use that data as if they had played against each other in match-play on that particular hole. We analyzed the data and picked the two players (Toms and Mickelson) who played the most holes in common on the same days. Using this data, we then calculated the probabilities of winning, losing, and halving for these players. Assuming the normal approximation, we calculated binomial 95% confidence intervals on the respective probabilities. Table 3 shows that all of the conditional probabilities calculated by our algorithm using all ShotLink data fall within the confidence intervals. In terms of validation, our results are still fairly weak. We hope to work with the PGA Tour to identify and obtain additional data to support our validation effort.

[2] We could find scorecards for the Accenture Match Play Tournament 2011, 2010, 2009, 2008, 2007, 2006, 2005, 2004 and the Ryder Cup 2010, 2008, 2006, 2004.

Table 3. Validation results for all ShotLink data
        All ShotLink Data         PGA Championship Data     95% Confidence Interval
Hole    Toms   Tie    Mickelson   Toms   Tie    Mickelson   Toms         Tie           Mickelson
Par-3   .5     .56    .4          .9     .563   .9          (.36,.3)     (.463,.66)    (.36,.3)
Par-4   .7     .469   .6          .73    .473   .54         (.9,.36)     (.43,.534)    (.,.36)
Par-5   .344   .399   .56         .39    .43    .78         (.,.47)      (.89,.56)     (.74,.38)

3.1. Ryder Cup Day-3 Results

The Ryder Cup is a golf competition between teams from Europe and the United States that is held every two years. Each team consists of twelve members who are picked by the respective team captains. The Ryder Cup matches involve various match-play competitions between players selected from the two teams of twelve. Currently, the matches consist of eight foursomes matches, eight fourball matches and twelve singles matches.[3] The winner of each match scores a point for his team, or 1/2 point if the match ends in a draw. In this paper we are interested only in the singles matches that are played on Day 3 of the Ryder Cup. The order of the players on each team is announced by the team captains the night before the Day-3 session. Players who have the same rank play against each other. We ran our algorithm for the 2010 Ryder Cup and the results are given in Table 4.
Note that the actual winners are shown in bold characters.

Table 4. Results for 2010 Ryder Cup Singles Match Play
Match  US Player          EU Player             P(US Wins)  P(EU Wins)  P(Tie)
1      Stricker, Steve    Westwood, Lee         .557        .3          .
2      Cink, Stewart      McIlroy, Rory         .48         .39         .8
3      Furyk, Jim         Donald, Luke          .48         .389        .3
4      Johnson, Dustin    Kaymer, Martin        .59         .9          .6
5      Kuchar, Matt       Poulter, Ian          .475        .396        .9
6      Overton, Jeff      Fisher, Ross          .569        .39         .
7      Watson, Bubba      Jimenez, Miguel A.    .579        .3          .9
8      Woods, Tiger       Molinari, Francesco   .679        .3          .9
9      Fowler, Rickie     Molinari, Edoardo     .746        .64         .9
10     Mickelson, Phil    Hanson, Peter         .74         .86         .
11     Johnson, Zach      Harrington, Padraig   .457        .45         .8
12     Mahan, Hunter      McDowell, Graeme      .535        .34         .3

In the appendix, we present the probabilities of winning and being tied from the perspective of the US player (the first player listed). In Table 5, we show the conditional probabilities for the 2010 Ryder Cup match-ups found by our algorithm using all PGA ShotLink data. In Table 6, similarly to our second validation effort, we present data for the Ryder Cup match-play opponents drawn from PGA Championship rounds. We list only the match-ups that have sufficient observations to make the normal (distribution) approximation.

[3] http://en.wikipedia.org/wiki/Ryder_Cup
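The binomial confidence intervals under the normal approximation that are used in the validation can be sketched in Python as follows. The function names, the clipping of the interval to [0, 1], and the example counts are assumptions made for illustration; the paper does not specify these details.

```python
import math
from typing import Tuple


def binomial_ci_normal(successes: int, trials: int,
                       z: float = 1.96) -> Tuple[float, float]:
    """Normal-approximation confidence interval for a binomial proportion
    (z = 1.96 gives an approximate 95% interval)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    p_hat = successes / trials
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / trials)
    # Clip to the valid probability range.
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


def within_interval(model_prob: float, ci: Tuple[float, float]) -> bool:
    """Check whether a model probability falls inside an observed interval."""
    return ci[0] <= model_prob <= ci[1]


# Hypothetical example: one player wins 30 of 100 shared Par-4 holes.
ci = binomial_ci_normal(30, 100)
```

With few observations the interval is wide, which is why agreement between the model and the interval is only weak evidence of validity, as the text notes.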

Table 7 gives 95% confidence intervals for the PGA Championship data to compare with the probabilities found using all ShotLink data.

4. Conclusion

Assuming that the player probabilities we derived from ShotLink data are accurate for match-play, we calculate the probabilities of winning, losing, and tying for each player against each opponent at each par level (3, 4, and 5). To validate this assumption further, we need more match-play data for comparison. With our recursive algorithm, we can also find the probabilities of winning an 18-hole match. For the Ryder Cup, using the same recursive logic (but without termination states and pruning), we can predict which team is more likely to win the Day-3 singles match-play event (which consists of twelve matches). In the 2010 Ryder Cup, team Europe was leading 9.5 to 6.5 before Day 3 started. Our algorithm found an 8% chance of winning for team US. Even the chance of the US winning with a deficit of 3+ was around 4%, which suggested that a very exciting Day-3 event was waiting for us; at least this was an accurate prediction.

As a future goal, we are working on a team selection tool based on our probability model. The tool will assist in the team selection process by finding good player assignments given the opposing team's line-up.

We also assigned this model as a class project in our undergraduate applied probability course to gauge students' reactions to the learning experience. Our purpose was to introduce an entertaining but also stimulating problem that would raise student interest and make the subject matter more memorable. The feedback we received from the students was encouraging and useful for designing different implementations of this project assignment.
Our future plan is to structure the project in milestones, with students accomplishing one task at a time: manipulating the data, calculating conditional probabilities, calculating match results, and calculating game results (in the Ryder Cup case). We want them to compare their results with the real-life Ryder Cup results to gain more confidence in the method.

5. Acknowledgments

We would like to thank Kin Lo of PGA Tour Headquarters and the PGA Tour for providing us with the ShotLink data used in this research.

Appendix

Table 5. 2010 Ryder Cup Match-up Probabilities when All ShotLink data is used
Match-up                  Par-3 Win  Par-3 Tie  Par-4 Win  Par-4 Tie  Par-5 Win  Par-5 Tie
Stricker vs. Westwood     .6         .59        .3         .467       .33        .45
Cink vs. McIlroy          .5         .535       .86        .47        .74        .394
Furyk vs. Donald          .4         .53        .63        .498       .9         .436
Johnson vs. Kaymer        .69        .496       .35        .446       .355       .379
Kuchar vs. Poulter        .6         .56        .63        .49        .87        .47
Overton vs. Fisher        .7         .55        .3         .477       .34        .388
Watson vs. Jimenez        .45        .54        .9         .469       .395       .377
Woods vs. Molinari, F.    .97        .563       .33        .54        .436       .357
Fowler vs. Molinari, E.   .95        .48        .36        .449       .43        .353
Mickelson vs. Hanson      .86        .53        .3         .46        .43        .368
Johnson vs. Harrington    .5         .55        .7         .479       .86        .45
Mahan vs. McDowell        .54        .53        .96        .459       .36        .4

Table 6. PGA Championship results
Match-up                  Par-3 Win  Par-3 Tie  Par-4 Win  Par-4 Tie  Par-5 Win  Par-5 Tie
Stricker vs. Westwood     .9         .54        .8         .47        .36        .5
Furyk vs. Donald          .56        .578       .7         .47        .          .538
Mickelson vs. Hanson      .34        .48        .99        .463       .375       .53
Johnson vs. Harrington    .34        .484       .85        .44        .37        .43
Mahan vs. McDowell        .9         .5         .59        .534       .458       .375

Table 7. Confidence intervals of PGA Championship results
Match-up                  Par-3 Win    Par-3 Tie    Par-4 Win    Par-4 Tie     Par-5 Win    Par-5 Tie
Stricker vs. Westwood     (.,.348)     (.4,.683)    (.4,.357)    (.385,.555)   (.55,.456)   (.337,.663)
Furyk vs. Donald          (.67,.45)    (.457,.699)  (.64,.89)    (.396,.546)   (.,.33)      (.43,.674)
Mickelson vs. Hanson      (.83,.44)    (.35,.63)    (.9,.369)    (.387,.54)    (.7,.543)    (.358,.74)
Johnson vs. Harrington    (.3,.338)    (.36,.67)    (.7,.35)     (.35,.498)    (.99,.454)   (.89,.557)
Mahan vs. McDowell        (.76,.36)    (.37,.673)   (.83,.36)    (.43,.638)    (.59,.658)   (.8,.569)

References

Barnett T. and Clarke S. (2005) Combining player statistics to predict outcomes of tennis matches. IMA Journal of Management Mathematics 16, 113-120.
Berry S. () Is Tiger Woods a winner? Mathematical Association of America Distinguished Lecture Series.
Fearing D., Acimovic J. and Graves S. (2011) How to catch a Tiger: Understanding putting performance on the PGA Tour. Journal of Quantitative Analysis in Sports 7.
Franks I. and McGarry T. (2003) The science of match analysis. Science and Soccer.
Goddard J. (2005) Regression models for forecasting goals and match results in association football. International Journal of Forecasting 21, 331-340.
Kostuk K., Willoughby K. and Saedt A. (2001) Modelling curling as a Markov process. European Journal of Operational Research 133, 557-565.
McHale I. (2010) Assessing the fairness of the golf handicapping system in the UK. Journal of Sports Sciences 28, 1033-1041.
Reilly T. and Williams A. (2003) Science and Soccer.
Scarf P. and Shi X. (2005) Modelling match outcomes and decision support for setting a final innings target in test cricket. IMA Journal of Management Mathematics 16, 161.
Scheid F. (1979) Golf competition between individuals. Winter Simulation Conference: Proceedings of the 11th Conference on Winter Simulation, Volume 2: San Diego, CA, United States, 505-510.
Sokol J. (2004) An intuitive Markov chain lesson from baseball. INFORMS Transactions on Education 5, 47-55.