
A time to win? Patterns of outcomes for football teams in the English Premier League

Nigel Smeeton, MSc, Honorary Lecturer
King's College London, Division of Imaging Sciences & Biomedical Engineering, London, United Kingdom
Email: nigel.smeeton@kcl.ac.uk

Summary

It is a widely held view that football teams can have periods of good, poor or indifferent performance characterised by a preponderance of wins, defeats or draws respectively. The multiple runs test can be used to detect clustering in a sequence of events in which several outcomes are possible. The development of the study of runs is briefly outlined. A method for simulating multiple runs distributions using the statistical package Stata is applied to the results of the English Premier League for the 2002-2003 season. For these data there is little evidence for clustering of outcomes.

NOTE

This paper reports on an analysis performed by the author in 2003. Following unsuccessful submission for publication to a refereed magazine (the analysis was considered to be too simplistic) it has remained filed away and was recently rediscovered by the author when sorting out old documents. I believe that these findings may now be of historic interest to those who follow British football.

1 Introduction

The English Premier League involves the top twenty football clubs; during a season each pair of clubs meets twice, once at each ground. Clubs are ranked based on points gained by wins and draws (see Smeeton (2003) for a more detailed description). On the basis of the clubs' end-of-season positions, the most successful teams can participate in European competitions during the following season whereas the bottom three clubs are relegated.

It is not unknown for a football club to perform poorly for most of a season but be saved from relegation by a run of wins close to the end. In a similar manner, a team that is consistently at the top of the table can be overtaken because of a series of poor results towards the finish. Inspection of sequences of results can give the impression that some teams experience good or bad patches; Smeeton (2003) showed evidence of clustering in ten consecutive match results for Leeds United. However, as far as I am aware, an investigation of the results obtained by Premier League teams over a complete season has not yet been conducted.

2 Statistical background

For a series of events where several outcomes are possible, a run is defined as a sequence of one or more consecutive observations of the same type. Under the null hypothesis that the different outcomes occur randomly, a relatively small value for the total number of runs indicates possible clustering whereas an unexpectedly large value suggests alternation between outcomes of different types. If it is clustering that is under investigation, the one-tailed probability of the number of runs being no more than a stated low number is of relevance. For data that have already been collected it is the conditional distribution of the number of runs, given the observed number of outcomes of each type, that is considered (Schuster and Gu, 1997).
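The definition of a run given above can be illustrated with a minimal Python sketch. The function name `count_runs` and the single-letter coding (W, D, L) are assumptions made for illustration only; they are not taken from the author's Stata code.

```python
from itertools import groupby


def count_runs(results):
    """Count runs: maximal blocks of identical consecutive outcomes."""
    # groupby yields one group per block of equal adjacent items,
    # so the number of groups is the number of runs.
    return sum(1 for _ in groupby(results))


if __name__ == "__main__":
    # Hypothetical sequence of nine results: WW | D | LLL | W | DD
    print(count_runs("WWDLLLWDD"))
```

A sequence in which like results cluster together produces few runs, whereas a sequence that alternates between outcomes produces many.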

Interest in patterns of runs, often based around outcomes in gambling games such as roulette, blossomed during the latter part of the nineteenth century in both the popular and scientific literature. Thomas Hardy's novel A Laodicean, originally published in 1881, relates the advice given to a young man who loses significant sums of money at the roulette tables of Monte Carlo: "These runs of luck will be your ruin, as I have told you before ... You will be for repeating and repeating your experiments and will end by blowing your brains out." Whitworth (1886) gives the number of ways in which m indifferent (i.e. indistinguishable) black balls and n indifferent white balls can be arranged in a row for a particular number of black/white contacts, although he evidently felt that the proof was straightforward, as it is left as an exercise for the reader! Karl Pearson (1897) thought along similar lines in his statement that "the theory of runs is a very simple one". Only in 1940 did Wald and Wolfowitz demonstrate how the two-sample runs distribution could be obtained by summing individual probabilities.

For three or more outcomes, probabilities relating to the number of runs were derived by Mood (1940). No illustrative example is given in what, from an algebraic point of view, is a very difficult paper. Barton and David (1957) extended Mood's work and applied their theory to falls in share prices on the London Stock Exchange. Their article contains tables for the distribution of the number of runs for samples of up to 12 observations, for both 3 and 4 different outcomes. Shaughnessy (1981) tested for randomness in time-ordered residuals from regression analyses by applying multiple runs distributions. The tables of critical values in that paper are mainly for groups of similar size and so are of limited application. Schuster and Gu (1997) developed algorithms for calculating exact distributions for the number of runs for multiple outcomes using the software system Mathematica. A recent comprehensive text by Balakrishnan and Koutras (2002) explores the theoretical aspects of a wide range of run problems.

3 Simulation of the multiple run distributions

Suppose that in a complete season a club plays n (= 38) fixtures with n_1 wins, n_2 draws and n_3 defeats. Using a local macro developed within the statistical package Stata (StataCorp, 2001), a dataset of size n is created which contains one observation for each event, the three outcomes win, draw and defeat being indicated by different letters in the ratio n_1:n_2:n_3 (Smeeton and Cox, 2003). The data are then randomly shuffled repeatedly, and the number of runs after each permutation is calculated to generate the conditional distribution. Probabilities in the lower tail of the distribution indicate the degree of clustering in the observed sample.

Consecutive Premier League match results for 2002-2003 were obtained from the Internet site football.guardian.co.uk (accessed 19th May 2003). A runs distribution was simulated for each of the 20 clubs using the appropriate numbers of wins, draws and defeats in each case. One million permutations were used for each distribution, giving a confidence interval of approximately ± 0.0005 for a tail probability of 0.05.

4 Results

Table 1 shows the league positions of the clubs at the end of the season, the numbers of wins, draws and defeats and the number of runs over the season. The one-tailed probability under a random pattern of at most the observed number of runs is also given. For almost all of the teams there is no evidence of clustering. However, Liverpool showed a strong pattern of clustered results and there was a weaker tendency towards clustering for Manchester United. Both teams attracted the attention of the media for their inconsistent performances.

Two other teams were frequently in the news and merit mention. West Ham United produced mediocre results for most of the season but turned themselves around during the last few weeks; despite their valiant efforts they just failed to avoid relegation. Sunderland, after a reasonable start to the season, had a disastrous run of defeats. West Ham and Sunderland show no real tendency for clustering, however, when the season is taken as a whole.

Table 1 here

5 Discussion

The lack of evidence for clustering might come as a surprise to those who pay close attention to the game. However, even with a completely random sequence of events, runs of outcomes of the same type can occur by chance, and in a league of 20 clubs followed over 38 matches some observed clustering is inevitable. Lowest p-values of 0.009 and 0.056 for particular teams seem perfectly reasonable.

Although it is tempting to brush aside the apparent clustering of the results for Liverpool and Manchester United, they do deserve closer inspection. Liverpool had a run of seven wins in the early part of the season, putting them at the top of the table, but towards the end of 2002 there was a run of four defeats followed by three draws. Not many football supporters would be convinced that chance is the true explanation; factors such as the injury or suspension of a key player should be considered. Some closely involved with football believe that once a few poor results have been obtained a team can enter a period of low morale during which winning is difficult even when playing supposedly weaker teams. For the 2002-2003 season, Leeds United was a possible example. Similarly, the development of a high level of confidence might explain the long runs of wins sometimes experienced by the most successful teams.

The evidence of clustering with Manchester United is more difficult to explain. A quick scan of the results for the whole season does not give a particularly strong impression of clustering, but closer inspection shows that of the 13 non-win outcomes, ten occurred as pairs of the same type (e.g. DD or LL). Put another way, Manchester United were generally consistent winners during 2002-2003, but if a less favourable result was obtained there was not an immediate return to winning form, possibly indicating that even the most successful teams can experience a temporary loss in confidence.

Overall, it is unclear whether the findings for these two clubs represent genuine short-term shifts in their levels of performance. Inspection of the results for following seasons might help to clarify the truth; the chance repetition of a sequence of clustered events by the same team would be highly unlikely. As it stands, the jury on clustering is still out!

Acknowledgements

I am grateful for the helpful comments on this manuscript received from Obi Ukoumunne.
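The Discussion's remark that lowest p-values of 0.009 and 0.056 are unremarkable among 20 clubs can be checked with a rough calculation. The sketch below is not part of the original analysis. It assumes, purely for illustration, that the 20 tests are independent and that each p-value is uniformly distributed under randomness; neither holds exactly here, since clubs play one another and the runs distribution is discrete. The function name `prob_at_least` is hypothetical.

```python
from math import comb


def prob_at_least(k, n_tests, p):
    """P(at least k of n_tests independent p-values are <= p) under the null.

    Assumes independent tests with uniformly distributed p-values,
    which is only an approximation for the 20 Premier League clubs.
    """
    # Complement of the binomial probability of fewer than k "hits".
    fewer = sum(
        comb(n_tests, j) * p**j * (1.0 - p) ** (n_tests - j) for j in range(k)
    )
    return 1.0 - fewer


if __name__ == "__main__":
    # Chance that at least one of 20 clubs has p <= 0.009
    print(round(prob_at_least(1, 20, 0.009), 3))
    # Chance that at least two of 20 clubs have p <= 0.056
    print(round(prob_at_least(2, 20, 0.056), 3))
```

Under these assumptions, the chance that at least one club in twenty shows a p-value as low as 0.009 is roughly 0.17, so a single result of this size is not in itself strong evidence of clustering.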

References

Balakrishnan, N. and Koutras, M.V. (2002) Runs and Scans with Applications. New York: John Wiley & Sons.
Barton, D.E. and David, F.N. (1957) Multiple runs. Biometrika, 44, 168-178.
Hardy, T. (1912) A Laodicean: a Story of Today. London: Macmillan.
Mood, A.M. (1940) The distribution theory of runs. Annals of Mathematical Statistics, 11, 367-392.
Pearson, K. (1897) The Chances of Death and Other Studies in Evolution. Vol. 1. London: Edward Arnold.
Schuster, E.F. and Gu, X. (1997) On the conditional and unconditional distributions of the number of runs in a sample from a multisymbol alphabet. Communications in Statistics: Simulation and Computation, 26, 423-442.
Shaughnessy, P.W. (1981) Multiple runs distributions: recurrences and critical values. Journal of the American Statistical Association, 76, 732-736.
Smeeton, N. (2003) Do football teams have clusters of wins, draws and defeats? Teaching Statistics, 25, 90-92.
Smeeton, N. and Cox, N. (2003) Do-it-yourself shuffling and the number of runs under randomness for a sample consisting of several categories. Stata Journal, 3, 270-277.
StataCorp (2001) Stata Statistical Software: Release 7.0. College Station, TX: Stata Corporation.
Wald, A. and Wolfowitz, J. (1940) On a test whether two samples are from the same population. Annals of Mathematical Statistics, 11, 147-162.
Whitworth, W.A. (1886) Choice and Chance, 4th edn. Cambridge: Deighton Bell and Co.

Table 1. Runs in football match outcomes for the English Premier League: 2002-2003 season

Team                    Results (W, D, L)   Number of runs   P-value (one-tailed)
Manchester United       25, 8, 5            16               0.056
Arsenal                 23, 9, 6            25               0.928
Newcastle United        21, 6, 11           25               0.802
Chelsea                 19, 10, 9           26               0.739
Liverpool               18, 10, 10          18               0.009
Blackburn Rovers        16, 12, 10          26               0.585
Everton                 17, 8, 13           22               0.160
Southampton             13, 13, 12          25               0.379
Manchester City         15, 6, 17           27               0.858
Tottenham Hotspur       14, 8, 16           27               0.769
Middlesbrough           13, 10, 15          29               0.896
Charlton Athletic       14, 7, 17           22               0.188
Birmingham City         13, 9, 16           27               0.738
Fulham                  13, 9, 16           28               0.844
Leeds United            14, 5, 19           24               0.612
Aston Villa             12, 9, 17           28               0.864
Bolton Wanderers        10, 14, 14          27               0.691
West Ham United         10, 12, 16          23               0.199
West Bromwich Albion    6, 8, 24            19               0.230
Sunderland              4, 7, 27            16               0.213
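The shuffling procedure of Section 3 can be sketched in Python as follows. This is an illustrative re-implementation under stated assumptions, not the author's Stata macro: the function names, the default of 20,000 permutations (the paper used one million), the fixed seed, and the use of a 95% normal-approximation interval for the Monte Carlo error are all choices made here. The example uses Liverpool's row from Table 1 (18 wins, 10 draws, 10 defeats, 18 observed runs).

```python
import random
from itertools import groupby
from math import sqrt


def count_runs(seq):
    """Number of maximal blocks of identical consecutive outcomes."""
    return sum(1 for _ in groupby(seq))


def simulate_lower_tail(wins, draws, defeats, observed_runs,
                        n_perm=20000, seed=0):
    """Estimate P(number of runs <= observed_runs) under random ordering.

    The counts of wins, draws and defeats are held fixed (the conditional
    distribution); only the order of results is shuffled.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    outcomes = list("W" * wins + "D" * draws + "L" * defeats)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(outcomes)
        if count_runs(outcomes) <= observed_runs:
            hits += 1
    return hits / n_perm


def mc_half_width(p, n_perm, z=1.96):
    """Normal-approximation half-width of the interval for a simulated
    tail probability p based on n_perm permutations (z=1.96 assumes 95%)."""
    return z * sqrt(p * (1.0 - p) / n_perm)


if __name__ == "__main__":
    # Liverpool, 2002-2003: 18 wins, 10 draws, 10 defeats, 18 runs (Table 1)
    p_hat = simulate_lower_tail(18, 10, 10, observed_runs=18)
    print("estimated one-tailed p-value:", p_hat)
    # Monte Carlo precision for a tail probability of 0.05 with 10^6 shuffles
    print("half-width:", mc_half_width(0.05, 1_000_000))
```

With far fewer permutations than the paper's one million, the estimate carries more Monte Carlo error, so it should only be expected to fall near the tabulated value rather than reproduce it exactly.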