Published as: Journal of Applied Statistics, Vol. 26, No. 4, 1999, 461-468.

The final set in a tennis match: four years at Wimbledon¹

Jan R. Magnus, CentER, Tilburg University, the Netherlands
and
Franc J.G.M. Klaassen, Department of Econometrics, Tilburg University, the Netherlands

SUMMARY

We consider the final (deciding) set in a tennis match. We examine whether it is true that the chances for both players to win the match are equal at the beginning of the final set, even though they were not equal at the beginning of the match. We also test whether it is easier for a non-seeded man to beat a seeded player than for a non-seeded woman and whether male players are more equal in quality than female players. Does the service dominance decrease in long matches, and does winning the pre-final set provide an advantage in the final set? We use almost 90,000 points at Wimbledon to test all five hypotheses.

1 Introduction

The final set - the 5th in the men's singles (at a Grand Slam event) and the 3rd in the ladies' singles - decides a tennis match. Such a deciding set occurs in about one fourth of all matches at Wimbledon. Tension is high and mistakes can be very costly. There are a number of interesting questions relating to the final set, some of which we shall investigate in this paper.

For example, suppose a seed plays against a non-seed. (At Wimbledon 16 players out of 128 are seeded in order to prevent the situation that top players meet too early in the tournament.) You wish to forecast the winner. At the beginning of the match, if no further information is available, the relative frequency at Wimbledon that the non-seed wins is small, about 10-15%. What is the probability at the beginning of the final set? Is it now close to 50%? Also, is it more difficult for non-seeded women to beat a seed than for men, and are men more equal in quality than women? We also examine whether players get tired.
That is, does the dominance of the service decrease in long matches (that is, in the final set)? And finally - something that commentators say - is it true that in the final set the player who has won the previous set has the advantage? Many other hypotheses are considered in Magnus and Klaassen (1998a)-(1998d).

¹ Correspondence: Jan R. Magnus, CentER, Tilburg University, P.O. Box 90153, 5000 LE Tilburg, the Netherlands. E-mail: magnus@kub.nl

In the statistical literature on tennis so far, people have analysed some other interesting questions. If we assume that two fixed probabilities govern the match (the probability of winning a point on service for both players), then we can calculate the probability of winning a game, set, tiebreak or match. Of the many papers in this category we mention Hsi and Burych (1971), Kemeny and Snell (1976), and Pollard (1983).

The assumption that the probability of winning a point on service is constant during the match is questionable. After all, an often-heard hypothesis states that at the beginning of the final set the chances to win are equal for both players, even though at the beginning of the match one player was the favourite. The reason why the assumption of fixed probabilities has not yet been tested is that the literature on tennis is hampered by an almost complete lack of data. (In Klaassen and Magnus (1998) we analyse the assumption of fixed probabilities of winning a point on service directly. We reject it and present a model for the evolution of the probability during the match.) Most papers are theoretical and contain no data at all. If the authors use data, they are usually either point-to-point data of one match, or based on several end-of-match results (6-4/6-3/6-3, say). Fortunately, we have a large data set. It consists of 88,883 points distributed over 481 Wimbledon matches.

A second category of theoretical papers concerns the tennis scoring system and its impact on the probability of winning a match. See Maisel (1966), Miles (1984), Riddle (1988, 1989) and the comments by Jackson (1989). In this paper we analyse the effect of the difference in the scoring systems used in the men's and the ladies' singles: men have to win three sets, while women have to win only two sets to win the match.
This difference in itself implies that it is more difficult for non-seeded men to beat a seed than for non-seeded women. But maybe men are more equal in quality than women, which in itself would lead to more upsets in the men's singles. We use our large Wimbledon data set to show which effect actually dominates.

In section 2 we briefly describe the Wimbledon data. In section 3 we examine whether there are special advantages for seeds. Do seeds often win the final set if they play against a non-seed? Is it more difficult for a woman non-seed to beat a seed than for a man non-seed, as is often argued? Are men more equal in quality than women? Section 4 is about service dominance in long matches: is it lower in the final set? The effect of the pre-final set on the final set will be investigated in section 5. Section 6 concludes.

2 The data

Our data set consists of 481 matches played in the men's singles (MS) and ladies' singles (LS) championships at Wimbledon from 1992 to 1995. For each of these matches we know the exact sequence of points. We also know at each point whether the first or the second service was in and whether the point was decided through an ace or a double fault. Table 1 provides a summary of the data.

TABLE 1

We have slightly more matches for men than for women, but of course many more sets, games and points for the men's singles than for the ladies' singles, since the men play for three sets won and the women for two. The men play fewer points per game than the women, because the dominance of their service is greater; see Magnus and Klaassen (1998a) for empirical evidence. But the women play fewer games per set on average (scores like 6-0 and 6-1 are more common in the ladies' singles than in the men's singles), because the difference between seeded and non-seeded players is much greater, as we will show below. However, the larger gap between seeded and non-seeded women does not result in fewer final sets in a match compared to the men. In 30% of the matches in our data set women have to play a final set; the men in 20% of the matches. This is because the ladies play for the best of three sets, whereas the men play for the best of five sets.

All matches in our data set are played on one of the five 'show courts': Centre Court and Courts 1, 2, 13 and 14. As a result we have in our data set almost one half of all singles matches played during the four years of our sample period. Usually matches involving top players are scheduled on the show courts and this causes an underrepresentation in the data set of matches with non-seeded players. In order to account for this selection problem, we weight the matches when computing statistics; see Magnus and Klaassen (1998b) for further details. To avoid too much averaging, we shall usually distinguish between the 16 seeds and the 112 non-seeds.

3 Advantage for the seeded player?

Suppose a seed plays against a non-seed.
At the beginning of the match, if no further information is available, the probability that the non-seed will win is small: 13.1% (2.7%) in the men's singles and 10.5% (2.5%) in the ladies' singles. (Standard errors are in parentheses.) What is the probability at the beginning of the final set? Some people claim:

Hypothesis 1: At the beginning of a final set both players have an equal chance to win.

This hypothesis is false. Naturally, the probabilities have increased, but only to 28.7% (8.7%) for the men and 17.1% (6.3%) for the ladies. Both are significantly different from 50%. (In this paper 'significant' means that the estimate is more than 2 standard errors away from its target.) At 2-2 in sets in the men's singles, it is therefore certainly not true that the chances are now even between the seed and the non-seed. The seed is still very much the favourite. This is even clearer in the ladies' singles. At 1-1 in sets, the seeded player still has a probability of 82.9% (6.3%) to win the match! (Of course, strictly speaking we can only test hypotheses for the Wimbledon tournament. However, we believe that most of our conclusions, at least qualitatively, also apply to other professional tournaments, in particular non-clay tournaments.)

At first sight, the estimated probabilities of winning the final set, 28.7% for the men and 17.1% for the women, also seem to indicate that it is more difficult for a female non-seed to beat a seed than for a male non-seed, as is often claimed. However, one should keep in mind that the male non-seed has to win two sets to arrive at the final set, whereas the woman has to win only one set. So, when arriving at the final set, the quality difference between the non-seed and the seed is generally smaller for the men than for the women. This in itself will already result in a higher probability of winning the final set for non-seeded men than for non-seeded women.

Hypothesis 2: Upsets are more common in the men's singles than in the ladies' singles.

An upset occurs when a non-seeded player beats a seeded player. In the ladies' singles the probability of an upset is 10.5% (2.5%), which is not significantly different from the 13.1% (2.7%) of the men. Therefore, we have no evidence for a difference between men and women regarding the difficulty of achieving an upset. However, as we will explain below, this does not mean that there is no difference between men and women regarding the strength of seeds compared to non-seeds. It is often argued that

Hypothesis 3: Anyone in the top one-hundred for the men can beat the number one, but this is not true for the women.

Of course, formally speaking this is nonsense, since any woman can beat the number one with non-zero probability.
What is meant is that the men are more equal in quality than the women. Our results show that this is indeed the case. The probability of winning a set for a non-seed against a seed is 24.9% (1.8%) for the men and 18.0% (2.1%) for the women, which is significantly lower.

The results concerning hypotheses 2 and 3 seem to be contradictory. On the one hand, upsets are equally likely in the men's singles as in the ladies' singles. On the other hand, non-seeded men have a much higher probability of winning a set against a seed than non-seeded women. These results can be reconciled by remembering that men have to win three sets, whereas women have to win only two sets. Therefore, even though non-seeded men win one set more easily than non-seeded women, they have about equal difficulty in beating a seed.
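The reconciliation of hypotheses 2 and 3 can be illustrated with a small calculation. The sketch below is not part of the paper's analysis: it assumes, purely for illustration, that sets are won independently with a fixed probability, an assumption the authors themselves question for points. The function name `match_win_probability` is hypothetical. Under that assumption the set-level estimates of 24.9% (best of five) and 18.0% (best of three) translate into match-level upset probabilities of roughly 10% and 9%, which is the same order as the observed 13.1% and 10.5%.

```python
from math import comb


def match_win_probability(p_set: float, sets_to_win: int) -> float:
    """Probability of winning a match when every set is won independently
    with probability p_set (an illustrative assumption, not the paper's model)."""
    q = 1.0 - p_set
    # The winner takes the last set; before that they have won
    # sets_to_win - 1 sets and lost k sets, for k = 0 .. sets_to_win - 1.
    return sum(
        comb(sets_to_win - 1 + k, k) * p_set**sets_to_win * q**k
        for k in range(sets_to_win)
    )


# Set-level estimates for a non-seed against a seed, taken from the text.
men = match_win_probability(0.249, sets_to_win=3)    # best of five sets
women = match_win_probability(0.180, sets_to_win=2)  # best of three sets
print(f"men: {men:.3f}  women: {women:.3f}")
```

The longer format compresses the men's larger per-set chance into a match-level chance close to the women's, which is the mechanism the text describes.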
4 Does the dominance of the service decrease?

Tennis matches can last for two or three hours or longer. The players get tired and therefore, possibly, their service becomes less powerful. But is this true?

Hypothesis 4: In long matches the dominance of the service decreases.

We will examine this hypothesis by comparing the service dominance in the final set with that in earlier sets. There are at least two effects that may result in less service dominance in the final set. First, the server can indeed get tired in the final set. Secondly, the receiver gets increasingly better acquainted with his/her opponent's service during the match. To distinguish between both effects, we will estimate a simple logit model; see e.g. McFadden (1984). The first effect is represented by a dummy variable which is one in the final set and zero otherwise. The second effect, the learning effect, can be captured by the number of the set the players are in. For example, in the first set the receiver has received fewer services to learn his/her opponent's service than in the second set. This leads to the following specification of the probability of winning a point on service:

$$\Pr(\text{point won on service}) = \Lambda(\beta_0 + \beta_1 \times \text{dummy final set} + \beta_2 \times \text{set number}), \qquad (1)$$

where $\Lambda$ is the logistic distribution function, $\Lambda(x) = \exp(x)/(1+\exp(x))$. Hypothesis 4 implies a negative value for $\beta_1$, whereas the learning effect makes $\beta_2$ negative as well.

Estimating this model using all matches, however, results in a downward bias of the learning effect. Set numbers higher than three for the men and two for the women not only indicate more learning, but also imply that players are more equal. After all, matches between very unequal players have already stopped after three (MS) and two (LS) sets. If players are more equal, this results in lower probabilities of winning a point on service. So, in total, high set numbers imply a negative value for $\beta_2$ in (1), even if there is no learning effect at all.
To solve this bias problem one could estimate the parameters in (1) separately for three-, four- and five-set matches (MS) and for two- and three-set matches (LS). The set number then no longer contains information about the relative strength of players. To increase estimation efficiency, however, we impose that the learning effect, $\beta_2$, is the same for all types of matches and pool all matches together. We thus have the following specification:

$$\Pr(\text{point won on service}) = \Lambda(\beta_0^3 d_3 + \beta_0^4 d_4 + \beta_0^5 d_5 + \beta_1 \times \text{dummy final set} + \beta_2 \times \text{set number}) \qquad (2)$$

for the men, where $\Lambda$ is again the logistic distribution function and $d_3$ is a dummy which is one in case of a three-set match and zero otherwise. The dummies $d_4$ and $d_5$ represent the four- and five-set matches. Formula (2) shows that each type of match has its own intercept $\beta_0^3$, $\beta_0^4$ or $\beta_0^5$, which is the correction for the bias problem discussed above. For the ladies' singles we have only dummies $d_2$ and $d_3$ and intercepts $\beta_0^2$ and $\beta_0^3$.

Table 2 presents the maximum likelihood estimation results for all parameters in (2). Standard errors are in brackets and a * denotes significance.

TABLE 2

We see that the probability of winning a point on service is not lower in the final set, once we have corrected for the learning effect. For the women the probability is even higher, although not significantly. The learning effect is negative for both men and women, but again insignificant. So, there is no decrease in service dominance during the match. Hypothesis 4 is false.

5 The final and pre-final set

We complete our discussion with an analysis of the relation between the final and pre-final set. More specifically, take the following idée reçue.

Hypothesis 5: In the final set the player who has won the previous set has the advantage.

We investigate this hypothesis by looking at probabilities of winning the final set after winning the pre-final set. These are presented in Table 3 for all matches taken together (total) and for different types of matches. Sd-NSd indicates a match of a seeded (Sd) against a non-seeded (NSd) player, where the first player (Sd) won the pre-final set. Sd-Sd, NSd-Sd and NSd-NSd are similarly defined. As the number of observations is quite small (51 final sets in the men's singles, 57 in the ladies' singles in our data set), the standard errors are quite large.

TABLE 3

In the men's singles the probability that the same player wins the fourth and fifth sets is estimated to be 50.2% (7.0%). In the ladies' singles the estimated probability that the same player wins the second and third sets is 61.2% (6.5%). These percentages are not significantly different from 50% and hence there is no ground for believing hypothesis 5.
If we look at the sub-categories, then we see that, when two seeds play against each other, the winner of the 4th (2nd) set will probably lose the match, especially in the men's singles. When a non-seeded woman plays against a seed, winning the pre-final set is also a disadvantage, as her probability of winning the match is only 14.8% (14.5%). Both results (Sd-Sd in the men's singles and NSd-Sd in the ladies' singles) are significant and indicate that, if there is correlation between the final and pre-final sets, this is more likely to be negative than positive.
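The significance statements in this section follow the rule stated in section 3: an estimate is significant when it lies more than 2 standard errors from its target, here 50%. A minimal sketch of that rule applied to the entries of Table 3 is given below; the helper name `is_significant` and the dictionary layout are illustrative choices, not part of the paper.

```python
def is_significant(estimate, std_error, target=50.0, threshold=2.0):
    """Two-standard-error rule: True if the estimate lies more than
    `threshold` standard errors away from `target`."""
    return abs(estimate - target) > threshold * std_error


# (estimate in %, standard error in %) copied from Table 3.
TABLE_3 = {
    "MS": {"Sd-Sd": (10.7, 10.3), "Sd-NSd": (40.8, 11.3),
           "NSd-Sd": (76.4, 15.0), "NSd-NSd": (57.6, 12.8),
           "Total": (50.2, 7.0)},
    "LS": {"Sd-Sd": (33.7, 15.8), "Sd-NSd": (62.3, 8.9),
           "NSd-Sd": (14.8, 14.5), "NSd-NSd": (68.5, 13.4),
           "Total": (61.2, 6.5)},
}

for event, cells in TABLE_3.items():
    for label, (est, se) in cells.items():
        verdict = "significant" if is_significant(est, se) else "not significant"
        print(f"{event} {label:8s} {est:5.1f} ({se:4.1f})  {verdict}")
```

With these inputs only the men's Sd-Sd entry and the ladies' NSd-Sd entry pass the rule, in line with the two results singled out in the text.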
6 Conclusion

In this paper we have looked at the final set in a tennis match. At Wimbledon, the final set is the fifth set in the men's singles and the third set in the ladies' singles. Is it true that both players have the same chance of winning the match at the beginning of the final set? This is not the case: the seeded player is still very much the favourite against a non-seed, even though the non-seed has performed unexpectedly well up to the final set.

The next two hypotheses concern the difference between men's tennis and ladies' tennis. In contrast to what many people believe, it is not more difficult for a non-seeded woman than for a non-seeded man to beat a seeded player. It is true, however, that male players are more equal in quality than female players. These results seem contradictory, but one should keep in mind that non-seeded men have to win three sets, whereas non-seeded women have to win only two sets. Therefore, even though non-seeded men win one set more easily, they still find it difficult to beat a seed.

Our fourth hypothesis states that the service dominance decreases in long matches. Servers can get tired and receivers learn more about their opponent's service and hence score more points. It appears that winning a point on service is not more difficult in the final set than in other sets, so hypothesis four is false.

Finally, we examine whether winning the pre-final set provides an advantage in the final set. Again there is no ground for believing this hypothesis.

Acknowledgements: We thank IBM UK and The All England Club at Wimbledon for their kindness in providing the data. We also thank Arthur van Soest and Martin Dufwenberg for useful comments.
References

Hsi, B.P. and D.M. Burych (1971) Games of two players, Applied Statistics, 20, pp. 86-92.

Jackson, D.A. (1989) Letter to the Editor on "Probability models for tennis scoring systems" by L.H. Riddle, Applied Statistics, 38, pp. 377-378.

Kemeny, J.G. and J.L. Snell (1976) Finite Markov Chains, New York: Springer Verlag, in particular pp. 161-167.

Klaassen, F.J.G.M. and J.R. Magnus (1998) On the independence and identical distribution of points in tennis, CentER, Tilburg University, submitted for publication.

Magnus, J.R. and F.J.G.M. Klaassen (1998a) On the advantage of serving first in a tennis set: four years at Wimbledon, The Statistician (Journal of the Royal Statistical Society, Series D), to appear.

Magnus, J.R. and F.J.G.M. Klaassen (1998b) The effect of new balls in tennis: four years at Wimbledon, The Statistician (Journal of the Royal Statistical Society, Series D), to appear.

Magnus, J.R. and F.J.G.M. Klaassen (1998c) On the existence of big points in tennis: four years at Wimbledon, mimeo, CentER, Tilburg University.

Magnus, J.R. and F.J.G.M. Klaassen (1998d) The importance of breaks in tennis: four years at Wimbledon, mimeo, CentER, Tilburg University.

Maisel, H. (1966) Best k of 2k-1 comparisons, Journal of the American Statistical Association, 61, pp. 329-344.

Miles, R.E. (1984) Symmetric sequential analysis: the efficiencies of sports scoring systems (with particular reference to those of tennis), Journal of the Royal Statistical Society, Series B, 46, pp. 93-108.

McFadden, D. (1984) Econometric analysis of qualitative choice models, in: Z. Griliches and M.D. Intriligator (eds), Handbook of Econometrics, Vol. II, Chapter 24, Amsterdam: North-Holland Publishing Company.

Pollard, G.H. (1983) An analysis of classical and tie-breaker tennis, Australian Journal of Statistics, 25, pp. 496-505.

Riddle, L.H. (1988) Probability models for tennis scoring systems, Applied Statistics, 37, pp. 63-75. (Corrigendum in Applied Statistics, 37, p. 490.)

Riddle, L.H. (1989) Reply to D.A. Jackson, Applied Statistics, 38, pp. 378-379.
Number of...                        MS        LS
Matches                            258       223
Non-final sets                     899       446
Final sets                          51        57
Games                            9,367     4,486
Tiebreaks                          177        37
Points                          59,466    29,417
Sets in match                      3.7       2.3
Final sets in match                0.2       0.3
Games in non-final set             9.8       8.9
Games in final set                11.1       9.2
Tiebreaks in non-final set         0.2       0.1
Points in match                  230.5     131.9
Points in game                     6.1       6.5
Points in tiebreak                12.1      11.8

Table 1 - Number of matches, sets, games, tiebreaks and points in the data set.

                             MS                 LS
dummy 3/2 sets match      0.589* (0.022)     0.326* (0.041)
dummy 4/3 sets match      0.662* (0.027)     0.245* (0.044)
dummy 5 sets match        0.618* (0.030)     -
dummy final set          -0.032  (0.048)     0.111  (0.057)
set number               -0.010  (0.010)    -0.039  (0.025)

Table 2 - Probability of winning a point on service: logit estimation results.

        Sd-Sd          Sd-NSd         NSd-Sd         NSd-NSd        Total
MS      10.7 (10.3)    40.8 (11.3)    76.4 (15.0)    57.6 (12.8)    50.2 (7.0)
LS      33.7 (15.8)    62.3 (8.9)     14.8 (14.5)    68.5 (13.4)    61.2 (6.5)

Table 3 - Estimated probabilities of winning 5th (3rd) set after winning 4th (2nd) set in men's singles (ladies' singles).
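To read the logit estimates in Table 2 as probabilities, one can plug them into specification (2) of section 4. The following is a minimal sketch under stated assumptions: the dictionary layout and the function name `p_point_on_service` are hypothetical, the coefficients are the point estimates from Table 2, and the final-set dummy is taken to be one only in the fifth set (MS) or third set (LS), as the text defines the final set.

```python
from math import exp


def logistic(x):
    """Logistic distribution function exp(x) / (1 + exp(x))."""
    return exp(x) / (1.0 + exp(x))


# Point estimates from Table 2; intercepts are keyed by match length in sets.
MS = {"intercepts": {3: 0.589, 4: 0.662, 5: 0.618},
      "final_set": -0.032, "set_number": -0.010, "max_sets": 5}
LS = {"intercepts": {2: 0.326, 3: 0.245},
      "final_set": 0.111, "set_number": -0.039, "max_sets": 3}


def p_point_on_service(est, match_length, set_number):
    """Fitted probability of winning a point on service in a given set
    of a match that lasts `match_length` sets."""
    if not 1 <= set_number <= match_length:
        raise ValueError("set_number must lie between 1 and match_length")
    # The deciding set is the last possible set of the format.
    is_final = 1 if set_number == est["max_sets"] else 0
    index = (est["intercepts"][match_length]
             + est["final_set"] * is_final
             + est["set_number"] * set_number)
    return logistic(index)


for name, est in (("MS", MS), ("LS", LS)):
    for length in sorted(est["intercepts"]):
        probs = [p_point_on_service(est, length, s) for s in range(1, length + 1)]
        print(name, f"{length}-set match:", " ".join(f"{p:.3f}" for p in probs))
```

Because the final-set and set-number coefficients are small relative to the intercepts, the fitted probabilities barely move across sets, which is consistent with the conclusion that service dominance does not decrease.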