Using Big Data to Model Football
Edward Griffiths, Hong Kong Jockey Club
29 April 2015
Outline of Presentation
1) Hong Kong Jockey Club Introduction
2) Why Big Data?
3) Case Study - The Football Betting Challenge
4) Case Study - Data Sources
5) Case Study - Using the Information
6) Case Study - Payoff
7) Conclusions
1) The Hong Kong Jockey Club
- One of the world's most successful horse racing and sports betting operators, with annual turnover of HK$175 bn
- Regulated horse racing betting has existed in Hong Kong for 130 years; regulated football betting was added 12 years ago
- The world's most profitable football bookmaker
- We advise the China Sports Lottery Administration Centre and previously advised the Taiwan Sport Lottery Administration Centre on fixed odds betting products
1) The Hong Kong Jockey Club
Offering 70+ leagues / cups / tournaments = 9,000+ matches per season
- Afternoon Session: J.League / A-League
- Evening Session: European Leagues
- Morning Session: South American Leagues & Major League Soccer
2) Why use Big Data in Betting?
Big data can unlock significant value for both companies and customers and significantly improve customer satisfaction
1) Big data allows even narrower segmentation and much more precisely tailored, more profitable products and services
2) Sophisticated analytics can improve decision-making, minimise risks and unearth valuable insights
3) Big data can be used to develop the next generation of products and services
2) Why use Big Data in Betting?
Why did the HKJC decide to take a Big Data approach to Football Betting?
1) From Best in Class to Future-proof
2) Unlock customer value and improve decision-making
3) Increase turnover and profitability from a very high base
4) Develop the next generation of products (in a regulated environment)
5) Manage risk
3) Case Study - The Football Betting Challenge
THE CHALLENGE FOR THE HKJC TRADING SOLUTIONS TEAM
1) Create THE MOST SOPHISTICATED AND BEST-PERFORMING FOOTBALL TRADING ALGORITHMS of any football bookmaker in the world, for both ANTE-POST BETTING and IN-PLAY BETTING (real time, fast & furious)
2) Develop a portfolio of new cutting-edge products for customers and the trading team that are best-in-class in terms of profitability
3) Provide the Trading Team with sophisticated decision-making and risk management tools
3) Case Study - The Football Betting Challenge
FROM DATA TO KNOWLEDGE
- Data: data explosion (terabytes)
- Information: organised, systematised
- Knowledge (ALGORITHMS, HUMANS)
- INFORMED DECISIONS
Information overload makes extracting meaningful information, and turning it into actionable knowledge, even more difficult.
3) Case Study - The Football Betting Challenge
THE RIGHT STRUCTURE: TOP MANAGEMENT, TRADING SOLUTIONS, FOOTBALL TRADING
THE RIGHT INFRASTRUCTURE: IT, DATA SYSTEMS, SOFTWARE, HARDWARE, INVESTMENT
4) Case Study - Data Sources
External Data Sources:
- Market data across major leagues since 2007 (75,000 matches)
- Data on goals, red cards and corners for 13,000+ matches (140,000 corners)
- Minute-by-minute goal data from 2007 (6.3m playing minutes analysed)
4) Case Study - Data Sources: Reason for a Database
1) High data integrity: the database schema rejects invalid data
2) Quick responses from consolidated data: thousands of files are compressed into a few tables
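To illustrate the data-integrity point, here is a minimal sketch using Python's built-in sqlite3 module. The `goals` table, its columns and the 1 to 90 minute rule are hypothetical, chosen only to show a schema refusing bad rows; they are not the HKJC schema.

```python
import sqlite3

# Hypothetical schema: every goal needs a match, a team and a minute in 1..90.
SCHEMA = """
CREATE TABLE goals (
    match_id INTEGER NOT NULL,
    team     TEXT    NOT NULL,
    minute   INTEGER NOT NULL CHECK (minute BETWEEN 1 AND 90)
)
"""


def try_insert(conn, row):
    """Insert one (match_id, team, minute) row; return True if accepted."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO goals VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        # Raised when a NOT NULL or CHECK constraint is violated.
        return False


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
print(try_insert(conn, (1, "Chelsea", 52)))   # valid row
print(try_insert(conn, (1, "Chelsea", 140)))  # minute out of range
```

With flat files, the out-of-range minute would be found only when some later analysis tripped over it; here the row never enters the table.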
4) Case Study - Data Sources: Response Times
Task: analysing the average time of the first goal scored in a league
Current Approach
1. Read 380 XML files
2. Parse match details
3. Check the time of the first goal
4. Group data by teams
Estimated Time: 2 minutes
Database Approach
1. Define a query
2. Pull and group the first-goal data from the database
Estimated Time: 2 seconds
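A minimal sketch of the database approach, again using Python's built-in sqlite3 module. The `goals` table, its columns and the sample rows are hypothetical stand-ins for whatever schema holds the parsed feed data, and "first goal" is taken here to mean each team's earliest goal in a match, which is an assumption.

```python
import sqlite3

# One query replaces the read / parse / check / group steps of the file approach.
FIRST_GOAL_QUERY = """
SELECT team, AVG(first_minute)
FROM (
    SELECT match_id, team, MIN(minute) AS first_minute
    FROM goals
    GROUP BY match_id, team
) AS first_goals
GROUP BY team
ORDER BY team
"""


def average_first_goal_minute(conn):
    """Return {team: average minute of that team's first goal per match}."""
    return dict(conn.execute(FIRST_GOAL_QUERY).fetchall())


def build_demo_db():
    """Create an in-memory database with a few made-up goal rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE goals (match_id INTEGER, team TEXT, minute INTEGER)")
    conn.executemany(
        "INSERT INTO goals VALUES (?, ?, ?)",
        [(1, "A", 10), (1, "A", 70), (1, "B", 30), (2, "A", 50), (2, "B", 20)],
    )
    return conn


print(average_first_goal_minute(build_demo_db()))
```

The inner query keeps one row per team per match (the earliest goal), and the outer query averages those minutes per team.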
4) Case Study - Data Sources: F1/F9/F24 XML Feeds
- F1: Fixtures and Results
- F9: Match/Player Stats
- F24: Event Feeds
F24 is geo-referenced, which makes it possible to extract more exotic statistics:
- Pitch x, y coordinates (qualifier = 140, 141)
- Goalmouth y, z coordinates (qualifier = 102, 103)
- Shot locations (zonal qualifier)
- Lengths (in yards) and angles (in radians)
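As a rough sketch of how a length in yards and an angle in radians could be derived from a pair of pitch coordinates: the code below assumes coordinates on a 0 to 100 scale along each axis and a pitch of 115 x 74 yards. Both the scale and the pitch size are assumptions for illustration, and the function name is hypothetical.

```python
import math

PITCH_LENGTH_YARDS = 115.0  # assumed pitch length
PITCH_WIDTH_YARDS = 74.0    # assumed pitch width


def length_and_angle(x0, y0, x1, y1):
    """Return (length in yards, angle in radians) from (x0, y0) to (x1, y1).

    Coordinates are assumed to be percentages of pitch length (x) and
    pitch width (y). The angle is measured from the x-axis.
    """
    dx = (x1 - x0) / 100.0 * PITCH_LENGTH_YARDS
    dy = (y1 - y0) / 100.0 * PITCH_WIDTH_YARDS
    return math.hypot(dx, dy), math.atan2(dy, dx)


length, angle = length_and_angle(50, 50, 50, 100)
print(length, angle)
```

Converting to yards before taking the angle matters because the two axes have different physical scales.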
4) Data Sources - Chelsea vs Man Utd (26 Oct 2014)
4) Data Sources - Match Statistics I (Sample)
Match Statistics Generated from F24
                      HOME    AWAY
GOALS                    1       1
SHOTS ON TARGET          7       4
SHOTS OFF TARGET        10       3
YELLOW CARDS             3       6
RED CARDS                0       1
FOULS                   13      14
OFFSIDES                 0       2
HAND BALLS               1       1
POSSESSION %         50.81   49.18
TOTAL PASSES           439     425
ACCURATE PASSES        375     351
CROSSES                 20       7
ACCURATE CROSSES         7       2
4) Data Sources - Match Statistics II (Sample)
Customised Statistics by Location and Category
                                 HOME    AWAY
CORNERS                             4       7
CORNERS INTO THE PENALTY AREA       4       6
CORNERS INTO THE SIX-YARD BOX       3       4
TACKLES WON                        14      16
TOTAL TACKLES                      16      21
BLOCKED PASS                        8       6
BLOCKED SCORING ATTEMPTS            2       2
INTERCEPTIONS                      12       6
CLEARANCES                         25      29
KEEPER SAVES                        3       6
GOAL KICKS                          7       9
4) Data Sources - Match Statistics III (Sample)
Aggregated statistics up to 78 minutes
4) Data Sources - Event Feeds
43:07 Chelsea - Foul by Branislav Ivanovic.
43:07 Manchester United - Adnan Januzaj wins a free kick at the Left.
44:19 Manchester United - Attempt missed! by Juan Mata.
45:00 Second half begins.
49:03 Chelsea - Foul by Cesc Fabregas.
49:03 Manchester United - Juan Mata wins a free kick at the Back.
49:20 Chelsea - Cesc Fabregas is shown a yellow card.
50:15 Chelsea - Corner Conceded by Willian Willian.
51:10 Manchester United - Attempt missed! by Marouane Fellaini.
52:02 Manchester United - Corner Conceded by David de Gea.
52:22 Chelsea - Goal by Didier Drogba.
55:52 Manchester United - Attempt blocked
61:15 Chelsea - Foul by Oscar Oscar.
61:15 Manchester United - Chris Smalling wins a free kick at the Back.
61:26 Chelsea - Oscar Oscar is shown a yellow card.
64:01 Chelsea - Foul by Branislav Ivanovic.
64:01 Manchester United - Angel Di Maria wins a free kick on the Left.
64:28 Chelsea - Branislav Ivanovic is shown a yellow card.
65:19 Chelsea - Corner Conceded by Cesc Fabregas.
A more advanced version of the event feed (with locations and more qualifiers) can be generated.
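Text events of this shape can be turned back into per-team counts, optionally cut off at a given minute. The sketch below is illustrative only: the regular expression, the keyword-to-category mapping and the function names are assumptions based on the sample lines above, not a description of the actual feed tooling.

```python
import re
from collections import Counter

# "MM:SS Team - text" with an optional team part (e.g. "45:00 Second half begins.").
EVENT_RE = re.compile(r"^(\d+):(\d{2})\s+(?:(.+?)\s+-\s*)?(.+)$")

# Assumed mapping from phrases seen in the sample feed to event categories.
KEYWORDS = [
    ("yellow card", "yellow_card"),
    ("Foul by", "foul"),
    ("Corner Conceded", "corner_conceded"),
    ("Goal by", "goal"),
    ("Attempt", "attempt"),
]


def parse_event(line):
    """Parse one feed line into a dict, or return None if it does not match."""
    match = EVENT_RE.match(line.strip())
    if match is None:
        return None
    minute, second, team, text = match.groups()
    return {"minute": int(minute), "second": int(second), "team": team, "text": text}


def classify(text):
    """Return the first matching category, or 'other'."""
    for keyword, label in KEYWORDS:
        if keyword in text:
            return label
    return "other"


def count_events(lines, up_to_minute=None):
    """Count (team, category) pairs, keeping only events before up_to_minute."""
    counts = Counter()
    for line in lines:
        event = parse_event(line)
        if event is None or event["team"] is None:
            continue  # skip unparseable lines and team-less events
        if up_to_minute is not None and event["minute"] >= up_to_minute:
            continue
        counts[(event["team"], classify(event["text"]))] += 1
    return counts


SAMPLE = [
    "45:00 Second half begins.",
    "49:03 Chelsea - Foul by Cesc Fabregas.",
    "49:03 Manchester United -Juan Mata wins a free kick at the Back.",
    "49:20 Chelsea -Cesc Fabregas is shown a yellow card.",
    "52:22 Chelsea - Goal by Didier Drogba.",
]
print(count_events(SAMPLE, up_to_minute=50))
```

Passing `up_to_minute` gives the kind of "aggregated up to minute N" view used for in-play statistics.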
4) Data Sources - Shot Events
This can be provided using spreadsheet and visualisation tools.
4) Data Sources - Position Statistics
Positional Summary
[Figure: Average Position, Manchester United. Players shown: de Gea, Rafael, Smalling, Rojo, Shaw, Blind, Fellaini, Mata, Januzaj, Di María, van Persie]
[Figure: Average Position, Chelsea. Players shown: Courtois, Ivanovic, Cahill, Terry, Filipe Luis, Matic, Fàbregas, Willian, Oscar, Hazard, Drogba]
4) Data Sources - Heat Maps I
5) Using the Information - When do Goals Occur?
[Figure: share of goals by match minute (x-axis 0 to 100), one panel per league, each annotated with its 1st half : 2nd half goal split]
- Premier League, 2372 matches: 0.44:0.56
- Serie A, 2403 matches: 0.44:0.56
- La Liga, 2444 matches: 0.44:0.56
5) Using the Information - When do Goals Occur?
[Figure: share of goals by match minute (x-axis 0 to 100), one panel per league, each annotated with its 1st half : 2nd half goal split]
- Bundesliga, 1871 matches: 0.45:0.55
- Japan, 1941 matches: 0.41:0.59
Observations:
- Spikes are observed at 45 and 90 min, into which all injury time is condensed; the magnitude of the spikes ranges from 4% (in Germany) to 7%
- Discernible spikes are observed at 89 and 90 min in Japan, presumably due to the attacking nature of teams seeking to avoid draws at the death
- Japan also has the largest disparity in 1st half / 2nd half goal percentage split, at 41:59; the split is generally around 45:55
- We dug deeper into the flat areas to discover the trends and what happens at different game states
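A minimal sketch of how such a per-minute distribution and half split could be computed, with injury-time goals condensed into minutes 45 and 90 as described above. The input format, a list of (half, minute) pairs, and the function names are assumptions for illustration, and the sample goals are made up.

```python
from collections import Counter


def condense(half, minute):
    """Fold injury-time goals into minute 45 (first half) or 90 (second half)."""
    cap = 45 if half == 1 else 90
    return min(minute, cap)


def goal_distribution(goals):
    """Return {minute: share of all goals}, with injury time condensed."""
    counts = Counter(condense(half, minute) for half, minute in goals)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no goals supplied")
    return {minute: n / total for minute, n in sorted(counts.items())}


def half_split(goals):
    """Return (first-half share, second-half share) of goals."""
    goals = list(goals)
    if not goals:
        raise ValueError("no goals supplied")
    first = sum(1 for half, _ in goals if half == 1)
    return first / len(goals), (len(goals) - first) / len(goals)


# Made-up goals: (half, minute); 47 and 93 are injury-time minutes.
goals = [(1, 10), (1, 47), (2, 60), (2, 93), (2, 90)]
print(goal_distribution(goals))
print(half_split(goals))
```

Condensing injury time is what produces the spikes at 45 and 90: several minutes of play are counted in a single bin.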
5) Using the Information - Score Alters Decay
Corner distributions dependent upon scoreline
6) Case Study - Payoff
Use of Big Data made one of the world's most profitable betting operators even more profitable in Year 1!
Football Betting Turnover %      +35%
Football Betting Turnover HK$    +$6.3bn
Football Betting Margin %        +26%
Football Betting Margin HK$      +$700m
PAYOFF for investment: over 1,000%
7) Conclusions
Big Data, Big Payoff, but execution is key:
- CLEAR GOALS AND A CLEAR PLAN
- RIGHT TEAM
- RIGHT INFRASTRUCTURE
- RIGHT LEVEL OF INVESTMENT
- TOP MANAGEMENT SPONSORSHIP
At HKJC we believe in Big Data and are extending this approach to other areas, including:
- Customer Intelligence & NPD
- Horse racing and Risk Management
7) Conclusions
Impact is absolutely transformational: what I have shown you is just a small example of what Big Data can deliver.
Whether you have already implemented this approach or are about to do so, let me wish you a very interesting and, most importantly, profitable journey!