Sport

Grand Slams are short-changing women's tennis

Sports reporters are often quick to dismiss women's tennis as unpredictable when compared to the men's game. But Stephanie Kovalchik finds that match format is to blame for the inconsistencies between genders

Sports fans the world over love a good upset, when the underdog upends the competition and triumphs against the favourite. In July, at the Wimbledon Championships in southwest London, tennis delivered one such scenario: Rafael Nadal, a player ranked 10th in the world, lost in the second round to Dustin Brown, who was ranked 102nd. These sorts of upsets in men's tennis, when they occur, lead commentators to praise the depth and "competitive balance"¹ of the game. But when top-ranked female players are unexpectedly defeated, the tone of the coverage shifts. Words such as "inconsistent" and "unpredictable"²,³ are used, often as euphemisms for inferior play. Tennis World exemplified this with its sceptical outlook for the women's game in the wake of several upsets at the 2014 US Open: "These upsets, when looked [at] from the perspective of the sport overall, raise a lot of questions about the direction that women's tennis is taking."⁴

Commentators have been echoing these sentiments for a number of years. Ironically, some of the severest fault-finding has come from women like writer Caroline Cameron, who summarily concluded in 2012 that "the women's game hasn't been able to keep up"⁵ with the incredible talent and competitiveness of the men's game. Past studies have shown that female tennis players receive less press coverage than men, and that the coverage they do receive is more likely to be off topic or negative;⁶ discussions about consistency in performance are a case in point.

But is it accurate to describe women tennis players as less consistent than men? The verdict from the tennis media has been a resounding yes.
Yet, in what follows, I will show that performance statistics in professional tennis strongly support the conclusion that inconsistency in tennis is a problem of match format, not gender.

© 2015 The Royal Statistical Society

Measures of consistency

Although consistency is a popular concept in tennis, the sport lacks formal metrics for consistent performance. In fact, the very nature of the sport complicates the measurement of consistency. For time or distance sports, where all performance is judged on a common continuous scale, the standard measure of consistency is variation in performance over time.⁷ For combat sports like tennis there is no single equivalent metric. The discrete nature of win-or-lose outcomes and the wide variety of conditions in tennis (each tournament is held in a different place, with its own surface and a unique set of match-ups) make the measurement of consistent performance a challenge.

A good metric of consistency should measure the extent to which a player plays to his or her expected level. In this article, I will consider a number of measures that aim to do this and will examine what each has to tell us about differences in the consistency of men's and women's performance in singles tennis.

The performance data presented here focus on singles matches (i.e. one player versus one opponent) on the Women's Tennis Association (WTA) tour and its men's equivalent, the Association of Tennis Professionals (ATP) tour, between 2010 and 2014, the five most recently completed seasons. The tournaments that make up the season are grouped into different tiers according to their difficulty. The Grand Slams (the Australian Open, the French Open, the Wimbledon Championships and the US Open) represent the highest tier for both tours. The next highest tier for the WTA combines the Premier Mandatory and Premier 5 tournaments, which I will refer to as the Premier 5+. For the ATP, the analogous tier is the nine tournaments that make up the Masters 1000. Fewer of the top-ranked players compete in lower-tiered tournaments, so those tournaments will not be discussed.

Figure 1. Frequency of upsets in tennis matches, 2010–2014. A match is considered an upset when a lower-ranked player beats a higher-ranked player.

Upsets

A match win is considered an upset when the lower-ranked player prevails, and some upsets are bigger than others.
For example, Roger Federer's loss (when ranked 7th) at the 2013 Wimbledon Championships to Sergiy Stakhovsky, a player then ranked 116th in the world, was more shocking than Federer's loss two years earlier (when ranked 3rd) to 19th-ranked Jo-Wilfried Tsonga. As this example suggests, expectations about winning in tennis are typically judged by player seeding or rank. We can therefore measure the degree of an upset by the rank differential between the winning player ($R_W$) and the losing player ($R_L$):

$$\text{rank differential} = R_W - R_L$$

An upset occurs whenever the difference in rank is positive (as the better player's rank is a lower number), and the larger and more positive the differential, the bigger the upset. Between 2010 and 2014, the overall frequency of higher-ranked players losing to lower-ranked players (i.e. upsets) was 31.5% for the WTA and 29.6% for the ATP, suggesting that high-ranked female players were slightly more likely to lose to a lower-ranked player than high-ranked male players. However, the frequency of upsets by tournament tier shows that this difference held only at the Grand Slams, and even there, huge upsets, with players differing by more than 70 ranking places, were similarly rare for both tours (Figure 1). At lower-tier tournaments, the distribution of upsets was nearly identical for men and women.

Streaks

Consistency can also be measured by consecutive match wins, or streaks. By this standard, Novak Djokovic's 41-match winning streak in 2011 was the most impressive display of consistency in recent history. But are males more streaky than females, in general? The maximum streaks of the top 100 male and female players between 2010 and 2014 suggest not: Figure 2 shows that the distributions of consecutive wins for the two tours line up almost exactly.
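As a sketch of how the upset and streak measures could be computed from match records, the following Python assumes each match is stored as a hypothetical `(winner_rank, loser_rank)` pair and each player's season as a chronological list of win/loss flags. The function names are illustrative only and are not taken from the author's own code.

```python
from __future__ import annotations

from typing import Iterable, Sequence


def rank_differential(winner_rank: int, loser_rank: int) -> int:
    """R_W - R_L. Rank 1 is best, so a positive value means the lower-ranked player won."""
    return winner_rank - loser_rank


def is_upset(winner_rank: int, loser_rank: int) -> bool:
    """A match is an upset when the rank differential is positive."""
    return rank_differential(winner_rank, loser_rank) > 0


def upset_frequency(matches: Iterable[tuple[int, int]]) -> float:
    """Share of (winner_rank, loser_rank) pairs that are upsets."""
    flags = [is_upset(w, l) for w, l in matches]
    if not flags:
        raise ValueError("need at least one match")
    return sum(flags) / len(flags)


def longest_win_streak(outcomes: Sequence[bool]) -> int:
    """Longest run of consecutive wins (True) in a chronologically ordered list."""
    best = current = 0
    for won in outcomes:
        current = current + 1 if won else 0  # a loss resets the current run
        best = max(best, current)
    return best


# Federer's 2013 loss to Stakhovsky versus his 2011 loss to Tsonga, using the ranks quoted above.
stakhovsky = rank_differential(116, 7)
tsonga = rank_differential(19, 3)
```

Here `stakhovsky` is 109 and `tsonga` is 16, so the differential ranks the 2013 loss as the bigger upset, in line with intuition. A tour-level upset frequency is simply `upset_frequency` applied to all of that tour's matches in a given tier.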
Letdowns

A letdown is the phenomenon of an early-round loss following a big win. It can be regarded as an extreme type of upset, such as when Nadal (this time ranked 5th), fresh from his eighth title win at the French Open, lost in the first round of the 2013 Wimbledon Championships to Steve Darcis, who was ranked 135th. Defining an early-round loss as an exit in the first or second round, the likelihood that a finalist or winner of a big tournament between 2010 and 2014 would exit early was greater for female players than for male players, but only at the Grand Slams (Figure 3). The frequency of letdowns was statistically similar at the other major tournaments of the season.

Figure 2. Distribution of consecutive match wins, or streaks, 2010–2014.

Figure 3. Frequency of letdowns, 2010–2014. A letdown is an early-round loss following a big win.

Variation in match win percentage

In time and distance sports, where the quality of a performance is measured by a single number on a continuous scale, assessing consistency is straightforward: we simply look at deviations in performance from one competition to the next. In tennis, performance is a binary outcome: win or loss. The trouble with binary outcomes for the study of consistency is that their variance (which tells us about consistency) is completely determined by their mean (i.e. expected wins); a player who wins a match with probability $p$ has an outcome variance of $p(1-p)$, whatever else is true of his or her play. What we would like to know is how the win expectation varies from match to match, that is, how far a player's chance of winning can move from one match to the next. Unfortunately, a win expectation, like the inner thoughts that run through an athlete's mind in the heat of competition, is not something we can observe.

However, the strong relationship between rank and win ability suggests a possible solution. If rank is the main factor determining a player's win expectation, we can treat the season win frequencies of players with the same rank as repeated observations of the win expectation for a given level of ability. For example, we would take the season win percentage of the number 1 ranked player across multiple seasons to determine the season-to-season variation in win expectation for the top-ranked player in the world.
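A minimal Python sketch of this rank-based approach follows. It assumes season results are available as a hypothetical mapping from season to `{rank: season win percentage}`; which ranking is used (for example, year-end rank) is an assumption of the sketch, not something specified above. It also shows why a single binary outcome is uninformative on its own: its variance $p(1-p)$ is fixed once the mean $p$ is known.

```python
from __future__ import annotations

from collections import defaultdict
from statistics import stdev


def bernoulli_variance(p: float) -> float:
    """Variance of one win/loss outcome with win probability p; it is fixed by the mean p."""
    return p * (1.0 - p)


def win_pct_variation_by_rank(seasons: dict[int, dict[int, float]]) -> dict[int, float]:
    """Season-to-season standard deviation of win percentage, per rank.

    `seasons` maps a season (e.g. 2010) to {rank: season win percentage of the
    player holding that rank}. Each rank's seasons are treated as repeated
    observations of the win expectation at that level of ability. Ranks seen in
    fewer than two seasons are skipped, because a sample standard deviation
    needs at least two values.
    """
    by_rank: dict[int, list[float]] = defaultdict(list)
    for ranks in seasons.values():
        for rank, win_pct in ranks.items():
            by_rank[rank].append(win_pct)
    return {
        rank: stdev(values)
        for rank, values in sorted(by_rank.items())
        if len(values) >= 2
    }
```

The analysis described next goes one step further and relates this variability to rank with a regression line for each tour; that fitting step is not reproduced in the sketch.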
Using this approach to measure stability in win expectations for the 2010–2014 Masters/Premier 5+ tournaments, men and women were found to have a comparable pattern of stability: the highest-ranked players have the most stable (least variable) win expectations and lower-ranked players the least stable (most variable). The overlapping regression lines in Figure 4a not only confirm that the best players distinguish themselves by their greater consistency, but also show that the consistency of win expectations for female and male players of equal rank is statistically indistinguishable at the Masters/Premier 5+ tournaments. At the Grand Slams, the gap between the regression lines in Figure 4b indicates that win expectations for a female player are more variable than for a male player of equal rank. In other words, women have been less consistent than men in the number of matches won in recent seasons, but only at the Grand Slams.

Up-and-down matches

How a player performs during a match can also provide insight into his or her consistency. Lopsided sets are an extreme example of inconsistent play during a match. Consider, for example, Serena Williams's round of 16 match at the 2014 China Open: after a lapse in the second set, Williams defeated Lucie Safarova 6–1, 1–6, 6–2.

To capture the up-and-down nature of matches, I introduce the metric of average game spread reversals. A game spread is the difference in games won in a set between the higher-ranked player and the lower-ranked player (e.g. a 6–1 set win would be a spread of 5). A reversal is the absolute difference in game spreads between consecutive sets, and a higher reversal reflects a more up-and-down match. So a player who wins 6–1, 6–1 would have a reversal of 0, indicating perfectly consistent play in each set, whereas a 6–1, 1–6 performance in two sets would have a reversal of 10.
The reversals are averaged to correct for differences in the number of sets played. The topsy-turvy Williams–Safarova match, for instance, had an average reversal in game spread of 9.5: its set spreads of 5, −5 and 4 give reversals of 10 and 9.
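The metric takes only a few lines of Python. This sketch assumes each set score is given as a `(games won by higher-ranked player, games won by lower-ranked player)` pair, and it reads "reversal" as the absolute change in spread between consecutive sets. Applied to the Williams–Safarova score (6–1, 1–6, 6–2), it reproduces the average reversal of 9.5.

```python
from __future__ import annotations


def game_spreads(sets: list[tuple[int, int]]) -> list[int]:
    """Spread per set: higher-ranked player's games minus lower-ranked player's games."""
    return [hi - lo for hi, lo in sets]


def average_reversal(sets: list[tuple[int, int]]) -> float:
    """Mean absolute change in game spread between consecutive sets."""
    spreads = game_spreads(sets)
    if len(spreads) < 2:
        raise ValueError("a reversal needs at least two sets")
    reversals = [abs(b - a) for a, b in zip(spreads, spreads[1:])]
    return sum(reversals) / len(reversals)


williams_safarova = average_reversal([(6, 1), (1, 6), (6, 2)])
```

Averaging rather than summing the reversals keeps a three- or five-set match on the same scale as a straight-sets win, which is the correction for match length described above.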
Over the past five seasons, the occurrence of lopsided matches was very similar for both tours. Overall, the median average game spread reversal for each tour was 2 games, and at both the Grand Slams and the Masters/Premier 5+ tournaments the difference in average game spread reversals between tours was 0.1 games or less. Current top female players appear as likely to have an up-and-down match as male players.

Figure 4. Variation in match win probabilities, 2010–2014: (a) Masters/Premier 5+; (b) Grand Slams.

Format advantage

Of the five measures of consistency considered here, there were no gender differences in performance for two (streaks and up-and-down matches). For the other three (upsets, letdowns and variation in win percentage), differences were found that suggest female players have had less consistent performance in recent years. However, these differences were observed only at the Grand Slams, where female players play a different match format from males: women compete in a best-of-three format, while men play a more taxing best-of-five.

Could match format explain the tour differences in consistency observed in recent years? Logic dictates that an underdog will have a harder time winning three sets in a match than two. Even without a complicated mathematical argument, we can conclude that the best-of-five format favours higher-ranked players more than best-of-three does. How much of an edge it offers is the trickier question. This is where a mathematical analysis is helpful and, thanks to the hierarchical structure of tennis, also tractable.

The most studied and discussed mathematical model in tennis is the IID model. IID stands for independent and identically distributed, which refers to the basic assumption that the probabilities of winning a point on serve or return are treated as constant throughout the match.
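To make the question "how much of an edge?" concrete, the following Python sketch applies the independence idea at the level of sets: it assumes, for illustration, that the higher-ranked player wins every set independently with the same probability `p`. Under that assumption, winning a best-of-n match means taking k = (n + 1)/2 sets before the opponent does, which is a negative binomial sum. The function name `match_win_prob` is hypothetical.

```python
from math import comb


def match_win_prob(p: float, best_of: int) -> float:
    """P(win a best-of-n match) if every set is won independently with probability p.

    The winner needs k = (n + 1) // 2 sets. Term j covers matches in which the
    opponent wins j sets before the winner's k-th (deciding) set; the earlier
    k - 1 wins and j losses can be arranged in comb(k - 1 + j, j) ways.
    """
    if best_of < 1 or best_of % 2 == 0:
        raise ValueError("best_of must be a positive odd number")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    k = (best_of + 1) // 2
    q = 1.0 - p
    return sum(comb(k - 1 + j, j) * p**k * q**j for j in range(k))


# Best-of-three versus best-of-five for a player who wins 70% of sets.
bo3 = match_win_prob(0.7, 3)
bo5 = match_win_prob(0.7, 5)
```

For `best_of=3` the sum reduces to $p^2(1 + 2(1-p))$ and for `best_of=5` to $p^3(1 + 3(1-p) + 6(1-p)^2)$. With `p = 0.7` the best-of-five probability is the larger of the two, which is the advantage to the stronger player that the argument above anticipates; at `p = 0.5` both formats give exactly 0.5.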
Under the IID model, point outcomes are treated like the outcomes of coin tosses, and the probability of a success (a point won) on any toss is a constant that is determined by the player's underlying ability against his or her particular opponent. The IID model seems implausible, but it has been shown to be remarkably accurate for describing outcomes in tennis.⁸

One of the reasons for the IID model's popularity is that it simplifies much of the mathematics of tennis. Assuming that the probability of winning a set follows the IID model, it is possible to write down exact formulae for winning a match under a variety of formats. Suppose that $p$ is the probability that the higher-ranked player in a particular match-up wins a set. As was previously shown in this magazine,⁹ the chance that this player will win a best-of-three match under the IID model is

$$M_3 = p^2\bigl(1 + 2(1-p)\bigr).$$

For a best-of-five match it is

$$M_5 = p^3\bigl(1 + 3(1-p) + 6(1-p)^2\bigr).$$

Given a match win probability of $M$, the gap between a player's chance of winning and chance of losing, which can be called the player's win advantage, is $M - (1 - M) = 2M - 1$. We are interested in how match format influences win advantage, all else being equal. The edge that match format provides to the higher-ranked player is the difference in win advantage between a best-of-five match and a best-of-three match, and is equal to the following polynomial:

$$\text{edge} = 2(M_5 - M_3) = 2p^2\Bigl\{p\bigl[1 + 3(1-p) + 6(1-p)^2\bigr] - \bigl[1 + 2(1-p)\bigr]\Bigr\} = 6p^2(1-p)^2(2p-1).$$

The factored form makes it clear that the edge vanishes for an evenly matched contest ($p = 0.5$) and for a certain winner ($p = 1$), and is positive in between.

Figure 5 shows the magnitude of the edge for a range of hypothetical probabilities of winning a set. The hump-shaped curve shows that the edge is largest when the higher-ranked player's chance of winning the set is neither a toss-up (50%) nor a certainty (100%), and that the edge falls off more steeply as the likelihood of winning a set becomes more certain than when it becomes less certain. When the chance of winning a set is between 0.6 and 0.8, this plot suggests that the boost in win advantage for the higher-ranked player playing a best-of-five match is 7 to 10 percentage points.

Figure 5. Edge in win advantage (y-axis: edge, %; x-axis: probability of winning a set).

Although Figure 5 suggests that match format can have a huge impact on win advantage, it is based on theoretical values for the chance of winning a set, which might not reflect the actual probabilities on the professional tours. To determine the edge that match format has actually given the tours in recent years, I computed it using the observed set win probabilities for each tour from 2010 to 2014. The results, shown in Figure 6, give the win advantage by rank, aggregating the match outcomes across seasons for players of the same rank. Compared with women, the men's game has a slightly larger advantage among the highest-ranked players under either match format. However, the tour differences are minuscule in contrast to the advantage provided by the best-of-five format, which gives an edge of 9 percentage points overall.

Figure 6. Win advantage of higher-ranked player given observed set win probabilities, 2010–2014 (match win advantage, %, by rank; best of 3 and best of 5; ATP and WTA).

Can the edge afforded by best-of-five matches explain gender differences at the Grand Slams? The observed win advantage of higher-ranked players at the 2010–2014 Grand Slams was 50% for men and 42% for women, an 8 percentage point difference that is entirely consistent with the expected edge of 9 percentage points from a best-of-five match. If women played best-of-five matches like men, we would expect them to perform as consistently as men.

These findings shine a spotlight on two biases in the way tennis, and comparisons between the men's and women's game, are reported. There is both an overemphasis on performance at Grand Slams and a discounting of differences between the tours that extend beyond gender.

Conclusion

Despite praiseworthy efforts to close the gender gap in tournament prize money, the Grand Slams are inadvertently short-changing the women's game by having men and women play different match formats and by using a format for women that makes the outcomes on their tour less predictable than on the men's. Having both tours play the longer best-of-five format (currently reserved for the men) would be the surest way to eliminate this source of disparity and to help the women's game enjoy the benefits of having more of its top players (not just Serena Williams) dominate at the biggest tournaments. Although the tennis world would probably resist such a change, the evidence presented in this article shows that equality in match format would be an important step towards true gender equality in tennis.

Acknowledgements

All of the data used in this paper are in the public domain and were collected using the author's R package deuce.

References

1. McGrogan, E. (2015) Men's depth on delightful display as Murray, Kyrgios dig deep to reach QFs. Tennis, 25 January. Retrieved from bit.ly/1osyuyb
2. W. S. (2012) Where there's a Williams. The Economist, 31 May. Retrieved from econ.st/1euz0bu
3. Sujith, K. (2012) The problem with women's tennis. Roar, 14 June. Retrieved from bit.ly/1gddk0b
4. Iyer, S. (2014) A vicious cycle of inconsistency: The ailments plaguing WTA tennis. Tennis World, September. Retrieved from bit.ly/1p89fwk
5. Cameron, C. (2012) Cameron on Tennis: Inconsistency in the WTA. Sportsnet, 11 October. Retrieved from bit.ly/1osyrlm
6. Tuggle, C. A. (1997) Differences in television sports reporting of men's and women's athletics: ESPN SportsCenter and CNN Sports Tonight. Journal of Broadcasting & Electronic Media, 41(1), 14–24.
7. Currell, K. and Jeukendrup, A. E. (2008) Validity, reliability and sensitivity of measures of sporting performance. Sports Medicine, 38(4), 297–316.
8. Klaassen, F. J. and Magnus, J. R. (2001) Are points in tennis independent and identically distributed? Evidence from a dynamic binary panel data model. Journal of the American Statistical Association, 96(454), 500–509.
9. Gray, C. (2015) Game, set and stats. Significance, 12(1), 28–31.

Stephanie Kovalchik is a statistician at the RAND Corporation, a tennis analyst, and the 2016 Program Chair-Elect for the Section on Statistics in Sports of the American Statistical Association. You can follow her work on tennis at on-the-t.com and on Twitter @StatsOnTheT