Analyzing Soccer Strategies Using Network Theory

Atli J. Einarsson
Carthage College
aeinarsson@carthage.edu
April 27, 2015

Abstract

Soccer has never been a game of statistics. A team's style of play has always been left to analysts and commentators to explain to the masses. In my research, network analysis and invariant calculations are used to analyze a team's style of play, specifically for the top four teams in the 2014 World Cup. This was done using passing data implemented in weighted adjacency matrices.

1 Introduction

In graph theory, a graph is a structure used to model relations between mathematical objects. Graph theory and networks can be applied to problems in many fields, including computer science, chemistry, physics, and sociology. Graphs can model many interesting things, including human-made infrastructure, power grids, social connections, and the Internet. Team sports that involve passes between players provide a fascinating example of a network.

Historically, the statistics of a soccer game have not been used as a measure of team and player performance. This is due to the constant movement of the ball, the fluidity of the players, and low scoring. Because of this, simple statistics like shots, goals, and assists are poor representations of both individual and team performance. This outlook, however, has been changing in recent years. With the help of technology, extensive statistical data has been made available to the public, starting with the UEFA Euro 2008 tournament. This includes the number of passes from player to player, the number of shots taken, and ball possession. This large-scale publication of data paves the way for more detailed analyses of soccer games.

Every great soccer team throughout history has had a specific style of play that it adheres to. This can be thought of as a blueprint for the way a certain team plays.
In soccer, a team's style of play has never been described by statistics, but rather analyzed by experts and commentators. However, it is possible to analyze passing data quantitatively and systematically. To get a sense of a team's style of play, the distribution of passes between its players is used to create a network. The nodes correspond to the players, and the edges are passes between players. If each player is placed at the position in the graph that corresponds to the spot on the field where that player plays, the network immediately gives insight into the team's style of play. We can use this network to spot neglected areas of the field, compare the contributions of individual players with those of the rest of the team, or notice performance problems between players.
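As a minimal sketch of this construction, the Python snippet below turns a list of completed passes into a matrix of pass counts, with one row and one column per player. The player labels and pass records are invented for illustration and are not taken from the FIFA data; the function names are likewise hypothetical.

```python
from collections import Counter

def build_weighted_adjacency(players, passes):
    """Return an n x n matrix W where W[i][j] counts passes from players[i] to players[j]."""
    index = {name: i for i, name in enumerate(players)}
    n = len(players)
    W = [[0] * n for _ in range(n)]
    # Counter groups repeated (passer, receiver) pairs into a single count.
    for (passer, receiver), count in Counter(passes).items():
        W[index[passer]][index[receiver]] = count
    return W

def to_unweighted(W):
    """Return the 0/1 matrix that only records whether any pass was made."""
    return [[1 if w > 0 else 0 for w in row] for row in W]

# Made-up example: four positions and six completed passes.
players = ["GK", "CB", "CM", "ST"]
passes = [
    ("GK", "CB"), ("CB", "CM"), ("CB", "CM"),
    ("CM", "ST"), ("CM", "CB"), ("ST", "CM"),
]

W = build_weighted_adjacency(players, passes)
A = to_unweighted(W)

for name, row in zip(players, W):
    print(name, row)
```

Rows correspond to the passer and columns to the receiver, so the matrix is generally not symmetric: a player can receive many passes while making few.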

We can make this visual analysis much more measurable by computing and analyzing network invariants. These invariants are used to analyze the performance of the team as a whole, as well as the performances of individual players.

2 Definitions and Development

Definition 1 We define a graph as an ordered pair $G = (V, E)$, where $V$ is a set of objects called vertices and $E$ is a set of links connecting these vertices, called edges. Each element of $E$ is a pair of elements of $V$. A graph is also called a network.

Definition 2 The passing network is a network where a team's players are vertices, and successful passes between players are edges between vertices.

Definition 3 The adjacency matrix is an $n \times n$ matrix, denoted $A$, that represents which vertices are adjacent to which other vertices: $A_{ij}$ is 1 if player $i$ passed to player $j$, and 0 otherwise.

Definition 4 A weighted adjacency matrix is an adjacency matrix where each entry $A_{ij}$ is the number of passes from player $i$ to player $j$.

Definition 5 Network invariants are computations that help us understand our data quantitatively and that don't depend on visualization to make direct comparisons between graphs. Global invariants characterize the network as a whole; local invariants give insight into individual players.

Definition 6 Edge connectivity is a global network invariant. It represents the minimum number of edges one needs to remove to disconnect the network.

Definition 7 Closeness is a local network invariant that represents the inverse of the average distance from a node to all other vertices. It is denoted $C_i = \frac{n-1}{\sum_{j \neq i} d_{ij}}$, where $d_{ij}$ is the minimum length of the paths connecting vertex $i$ to vertex $j$, and $n$ is the number of nodes in the network.

Definition 8 Betweenness is a local network invariant, defined as the fraction of shortest paths that go through a given player. It is denoted $C_B(i) = \sum_{j \neq k} \frac{n^i_{jk}}{g_{jk}}$, where $n^i_{jk}$ is the number of shortest paths from $j$ to $k$ going through $i$, and $g_{jk}$ is the total number of shortest paths from $j$ to $k$.

Definition 9 Pagerank is a local network invariant that measures the popularity of a vertex. Pagerank is defined by $x_i = \alpha \sum_j A_{ji} \frac{x_j}{k_j^{\mathrm{out}}}$, where $k_j^{\mathrm{out}}$ is the out-degree of vertex $j$ and $\alpha$ is always positive.

Definition 10 Clustering is a local network invariant. It represents the number of triangles through a node out of all the possible triangles that could be formed through that node. It is denoted $c_i = \frac{\sum_{j,k} A_{ij} A_{jk} A_{ik}}{\max(A) \sum_{j,k} A_{ij} A_{ik}}$ for a node $i$, where $A_{ij}$ is the weight of the edge connecting $i$ and $j$.

Definition 11 The Global Average Clustering Coefficient is a global network invariant. It is the average of all clustering coefficients in a network.

Definition 12 The maximal clique is a global network invariant. It represents the size of the largest set of nodes that are all pairwise connected to one another.

3 Results

In this section we present the computations of the invariants defined in the previous section for the four teams that reached the semifinals of the 2014 FIFA World Cup. The average position of each player and the passing data were found on the FIFA website and then analyzed in a spreadsheet; the graphs were created using Mathematica. The passing network of each team was created from per-game average passing data. Each network was then arranged into the formation most frequently used by that team during the tournament. Each network invariant was calculated using Mathematica.

The passing networks give us a qualitative look into a team's style of play, whereas the network invariants give us a quantitative approach to the analysis. In soccer, since players move fluidly from position to position during play, the passing networks oversimplify the overall shape of the teams. However, the different thicknesses and hues of the arrows give a good representation of the tactics of each team. The thicker and darker the arrow, the more passes were strung between those players. The networks can, for example, show which area of the pitch a team most frequently moves the ball to, whether a team prefers long or short passes, or whether a player is less involved than his teammates. It is important to note that players in the striker role are the focal point of attack and are more responsible for scoring goals than for contributing to the passing network.
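The invariants in this paper were computed in Mathematica. Purely as an illustration of what such a computation involves, the Python sketch below evaluates two local invariants, closeness and pagerank, on a toy four-player network. It rests on assumptions that the text does not confirm: distances are counted in passes (hops) on the directed network, closeness is normalized as (n - 1) divided by the summed distances, and pagerank uses the common damped form with a uniform constant term (1 - alpha)/n and alpha = 0.85. The pass counts and function names are made up.

```python
from collections import deque

# Hypothetical per-game pass counts: W[i][j] = passes from player i to player j.
PLAYERS = ["GK", "CB", "CM", "ST"]
W = [
    [0, 3, 1, 0],
    [2, 0, 4, 0],
    [0, 3, 0, 2],
    [0, 0, 1, 0],
]

def hop_distances(W, source):
    """Breadth-first search: fewest passes needed to move the ball from source to each player."""
    n = len(W)
    dist = [None] * n
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in range(n):
            if W[u][v] > 0 and dist[v] is None:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def closeness(W, i):
    """(n - 1) / sum of hop distances from i; 0.0 if any player is unreachable (assumed convention)."""
    dist = hop_distances(W, i)
    if any(d is None for d in dist):
        return 0.0
    total = sum(dist)
    return (len(W) - 1) / total if total else 0.0

def pagerank(W, alpha=0.85, iterations=200):
    """Damped pagerank by repeated substitution; assumes every player makes at least one pass."""
    n = len(W)
    out_degree = [sum(row) for row in W]
    x = [1.0 / n] * n
    for _ in range(iterations):
        x = [
            alpha * sum(W[j][i] * x[j] / out_degree[j] for j in range(n) if out_degree[j] > 0)
            + (1 - alpha) / n
            for i in range(n)
        ]
    return x

closeness_scores = [closeness(W, i) for i in range(len(W))]
pagerank_scores = pagerank(W)

for name, c, p in zip(PLAYERS, closeness_scores, pagerank_scores):
    print(f"{name}: closeness={c:.3f} pagerank={p:.3f}")
```

In this toy network the central midfielder receives the largest share of passes and ends up with the highest pagerank, while the striker, who can only return the ball to one teammate, has the lowest closeness.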
We can see in Figure 1 that Argentina has a somewhat symmetrical graph, but the majority of the passes revolve around Mascherano, Gago, and Di Maria. This may suggest that Argentina relies too much on those players. In his central position, with a multitude of arrows going to and from him, Mascherano seems to be the focal point of Argentina's passing network. The thickness of the arrows also shows that Argentina prefers to pass the ball on the left side of the pitch rather than the right. Another important thing to note is that Lionel Messi, who is often touted as the best player in the world, does not leave a strong imprint on the passing network. He has more arrows directed towards him than away from him, which suggests that he is more prone to dribble and shoot on goal than to pass.

Argentina's local invariant scores in Table 1 reinforce this. Lionel Messi has very average numbers compared to the rest of his team, which is contrary to his reputation. Tight marking by the opposition may have influenced these scores. The top two scores in each category are shaded. We can see that Mascherano is indeed an integral part of the Argentinian team. He is the highest scorer in betweenness, pagerank, average passes made, and average passes received, which may indicate that Argentina relies too heavily on him in its passing game. Higuain's low score reflects his role as a striker, meaning he is not very involved in the passing game.

In Figure 2, we see that Brazil's passing network is very disconnected. It is the least symmetric and least robust graph of the four. Like Argentina, Brazil prefers the left side of the pitch, as shown by the darker hues and thicker arrows on the left side of the graph. Unlike in the other three graphs, the thickest arrows in Figure 2 lie on the edge of the graph, which implies that Brazil tends to pass less through central midfield. There is also a big discontinuity in this graph: Hulk averaged less than one pass per game to Fred, so there is no arrow between them. This suggests that there may be a discord between those two players. The graph also indicates that Neymar, who is known as the superstar of the Brazilian team, received plenty of passes but rarely passed the ball, which means that he preferred to dribble and shoot rather than pass.

In Table 2, we see the local invariants for the Brazilian team. There is a large variance in betweenness scores; some are very high, which implies that the team relies on those players, most notably Luiz Gustavo, much more than on the rest of the team. With the highest pagerank scores, Neymar and Dani Alves seem to be the most popular players to pass to. Dani Alves also scored high in average passes made and received, which suggests that he is highly involved in Brazil's passing game; this is reinforced by his high pagerank and betweenness scores. Hulk's betweenness score of zero means that he could be taken out of the team without disrupting the passing network. He also made only around fifteen passes per game on average, which is fewer than the goalkeeper. This would imply that Hulk is an underperformer in the Brazilian passing network.

Germany's passing network is shown in Figure 3. We can see that this is the most complete graph of all of them. It is the most robust and symmetrical, which implies that Germany is a very good passing team. There are no apparent underachievers, and the arrows are thickest in the center of the graph. This means that the German network is well connected. We can see that the defense is well connected to the attack through Lahm, Schweinsteiger, and Kroos. In Table 3, we see the local invariants for the German team.
With very low betweenness scores, and low deviation among those scores, Germany is the most balanced team with the most balanced passing network. The team does not rely on particular players to be successful. Philipp Lahm seems to be the heart of Germany's midfield, with the highest betweenness score, the highest pagerank, and the most passes received and made. He also made more passes than any other player on the four teams. The top performers in each category vary significantly, which explains the low betweenness scores and indicates a good overall balance in the team.

Lastly, in Figure 4 we can see the passing network for the Netherlands. Although the tactical shape of the team is asymmetrical, the network itself is evenly distributed and symmetrical. The left side of the graph is slightly favored over the right. The two strikers, Robben and Van Persie, are isolated from the rest of the team, but the remaining players form a robust, connected network. The three central players have thick arrows connecting them to the wide players, which implies a well-rounded network. It is also important to note that the Netherlands was the only team to play with five defenders. The shape of the team may have an effect on the passing network.

Looking at the local network invariants of the Netherlands in Table 4, we see that the scores are very spread out. This implies a very well connected passing network, and one that doesn't rely on just a couple of players. The largest betweenness score, 2.618, achieved by De Vrij, reinforces this. As in the German network, the betweenness scores don't vary much. Despite that, De Vrij and Daley Blind seem to be the players most involved in the passing game, with the two highest betweenness scores and the most passes completed. It is important to note that the Netherlands' scores are the only ones that have defenders as the top performers in the network. The shape of the team may have influenced this.
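The discussion above leans heavily on betweenness, for example the observation that a player with a score of zero can be removed without disrupting the network. The sketch below shows one way such scores could be computed in Python, using Brandes' algorithm on the directed, unweighted version of a passing network. This is an illustrative substitute for the Mathematica computation, not the author's code: the four-player network is invented, every pass link is treated as having length one, and no normalization is applied, so the values are not on the same scale as those in the tables.

```python
from collections import deque

def betweenness(A):
    """Unnormalized directed betweenness (Brandes' algorithm).

    A[i][j] > 0 means player i passes to player j. Each ordered pair (j, k)
    contributes the fraction of shortest j-to-k paths that run through a player.
    """
    n = len(A)
    score = [0.0] * n
    for s in range(n):
        dist = [-1] * n          # hop distance from s
        sigma = [0] * n          # number of shortest paths from s
        preds = [[] for _ in range(n)]
        dist[s], sigma[s] = 0, 1
        order = []               # vertices in order of discovery
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in range(n):
                if A[u][v] > 0:
                    if dist[v] < 0:
                        dist[v] = dist[u] + 1
                        queue.append(v)
                    if dist[v] == dist[u] + 1:
                        sigma[v] += sigma[u]
                        preds[v].append(u)
        # Walk back from the farthest vertices, accumulating path dependencies.
        delta = [0.0] * n
        for w in reversed(order):
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score

# Invented network: GK <-> CB, GK -> CM, CB <-> CM, CM <-> ST.
PLAYERS = ["GK", "CB", "CM", "ST"]
A = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

scores = betweenness(A)
for name, b in zip(PLAYERS, scores):
    print(name, b)
```

Here the striker and the goalkeeper both score zero because no shortest passing route between two other players runs through them, which mirrors the interpretation given for a zero score in the text.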

Figure 1: Argentina Passing Network

Figure 2: Brazil Passing Network

Figure 3: Germany Passing Network

Figure 4: Netherlands Passing Network

Table 1: Argentina's Local Invariants

Table 2: Brazil's Local Network Invariants

Table 3: Germany's Local Invariants

Table 4: Netherlands' Local Invariants

Table 5: Global Invariants
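For readers who want to experiment with global invariants of the kind listed in Table 5, the Python sketch below computes a global average clustering coefficient and the size of the largest clique for a toy network. Several choices here are assumptions rather than the paper's confirmed method: the weighted clustering of a player is taken as the sum of triangle weight products divided by the largest weight times the sum of products of pairs of that player's edges, two players count as connected for the clique if either passed to the other, and the clique search is brute force, which is only practical for networks as small as a soccer team. The pass counts and function names are invented.

```python
from itertools import combinations

def weighted_clustering(W, i):
    """Assumed weighted clustering of player i (0.0 when i has fewer than two linked teammates)."""
    n = len(W)
    w_max = max(max(row) for row in W)
    others = [j for j in range(n) if j != i]
    pairs = [(j, k) for j in others for k in others if j != k]
    numerator = sum(W[i][j] * W[j][k] * W[i][k] for j, k in pairs)
    denominator = w_max * sum(W[i][j] * W[i][k] for j, k in pairs)
    return numerator / denominator if denominator else 0.0

def average_clustering(W):
    """Mean of the per-player clustering coefficients."""
    return sum(weighted_clustering(W, i) for i in range(len(W))) / len(W)

def max_clique_size(W):
    """Size of the largest group in which every pair is linked by a pass in either direction."""
    n = len(W)

    def linked(a, b):
        return W[a][b] > 0 or W[b][a] > 0

    for size in range(n, 0, -1):
        for group in combinations(range(n), size):
            if all(linked(a, b) for a, b in combinations(group, 2)):
                return size
    return 0

# Invented symmetric pass counts: players 0, 1, 2 form a triangle; player 3 links only to player 2.
W = [
    [0, 2, 2, 0],
    [2, 0, 2, 0],
    [2, 2, 0, 2],
    [0, 0, 2, 0],
]

clustering = [weighted_clustering(W, i) for i in range(len(W))]
print("clustering:", clustering)
print("average clustering:", average_clustering(W))
print("max clique size:", max_clique_size(W))
```

Players 0 and 1 sit only in the triangle and get a clustering of 1, player 2 splits attention between the triangle and an outlying teammate, and player 3 has a single link and therefore no triangles.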

4 Conclusion and Directions for Further Research

From this analysis, we can say that Germany has the most complete passing network of the four networks analyzed, as well as the most balanced team. This is reinforced by the global invariants in Table 5, where Germany has the highest scores in four of the six categories and the lowest average betweenness. These conclusions seem fitting, since Germany won the 2014 FIFA World Cup, with Argentina the runner-up, the Netherlands third, and Brazil fourth. It is interesting to note that Brazil has the most fragmented and incomplete passing network and finished last of the four teams. For these four teams, there appears to be a strong correlation between a balanced passing network, high invariant scores, and team success.

Another interesting thing to look at is the Castrol Performance Index, a FIFA-sponsored ranking system that uses mathematical formulas to analyze individual soccer players. The top ten performers in the 2014 FIFA World Cup according to the performance index include nine players from Germany, the Netherlands, Argentina, and Brazil combined. Toni Kroos and Stefan de Vrij finished first and third respectively, which agrees with the local network analysis in the previous section, where they were top performers for their respective teams. Mats Hummels, Oscar, Thiago Silva, Marcos Rojo, and Ron Vlaar are also on the top ten list, and all of them were top performers in the local network analysis in the previous section. Although no performance index can give us an absolute scale for judging player performance, the fact that the individual performance analysis in the previous section agrees with the Castrol Performance Index lends credibility to both as measures of player performance.

One thing that would be interesting to look at in this analysis is the passing network and network invariants of Spain in the 2014 FIFA World Cup.
The Spanish team won the 2010 FIFA World Cup and is known for its tiki-taka style of play, which is characterized by short passing, movement, and possession of the ball.

There are also many additional features that could be implemented to make the network more realistic and useful. A node representing the opposition goal could be added to the network, and edges to this node would represent shots on goal. This would counteract the low scores of the forwards and give us a better snapshot of player performance. Another network that could be created is a passing accuracy network, where the weights in the adjacency matrix are the accuracy of the passes instead of the number of passes. This would highlight the efficiency of players and teams. An alternative network that could be constructed is the defensive network, in which interceptions and tackles would be analyzed to judge player and team performance. Since the passing network does not take defensive ability into consideration, the defensive network would supplement the passing network effectively. These two networks would broaden the analysis and make judgments of team and player performance more well rounded, balanced, and effective.

All of these different types of networks could give sports analysts new ways to support their claims quantitatively. This would remove some of the subjectivity in soccer reporting. Introducing networks into television analysis and commentary would also create good pictorial representations of a team's style of play, which would make it easier for a casual fan to understand. Quantifiable analytics have become imperative in other sports, whether through the sabermetric revolution in baseball or the statistical obsession in basketball. It is inevitable that quantifiable data will make its way into the game of soccer and its analysis, and creating different types of networks gives us a great way to do just that.
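Two of the proposed extensions are simple transformations of the pass-count matrix, and the Python sketch below shows one possible way to express them. It is only an illustration under stated assumptions: the goal is modeled as one extra vertex that receives shots and makes no passes, accuracy is taken as completed passes divided by attempted passes for each ordered pair of players, and all numbers and function names are invented.

```python
def add_goal_node(W, shots):
    """Return a copy of W with one extra vertex for the opposition goal.

    shots[i] is the number of shots on goal by player i; it becomes the weight
    of the edge from player i to the goal. The goal itself makes no passes.
    """
    n = len(W)
    extended = [row[:] + [shots[i]] for i, row in enumerate(W)]
    extended.append([0] * (n + 1))
    return extended

def accuracy_matrix(completed, attempted):
    """Edge weight = completed / attempted passes from i to j (0.0 where none were attempted)."""
    n = len(completed)
    return [
        [completed[i][j] / attempted[i][j] if attempted[i][j] else 0.0 for j in range(n)]
        for i in range(n)
    ]

# Invented two-player example: a midfielder (index 0) and a striker (index 1).
W = [
    [0, 2],
    [1, 0],
]
shots = [0, 3]
with_goal = add_goal_node(W, shots)

completed = [[0, 8], [9, 0]]
attempted = [[0, 10], [12, 0]]
accuracy = accuracy_matrix(completed, attempted)

print(with_goal)
print(accuracy)
```

With the goal node added, the striker gains an outgoing edge of weight 3, so invariants computed on the extended matrix would credit shooting as well as passing.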

References

[1] Newman, Mark E.J. Networks: An Introduction. Oxford: Oxford UP, 2013. Print.
[2] Pena, Javier L., and Hugo Touchette. A Network Theory Analysis of Football Strategies. Thesis. 2012. N.p.: n.p., n.d. Print.
[3] "Fédération Internationale De Football Association." FIFA.com. FIFA, n.d. Web. 06 May 2015.
[4] "The Castrol Index: Analysing Peak Performance." FIFA.com. FIFA, 20 May 2014. Web. 06 May 2015.
[5] "Definition of Tiki-taka in English." Tiki-taka: Oxford Dictionary. Oxford University Press, n.d. Web. 06 May 2015.
[6] "Wolfram Mathematica: Definitive System for Modern Technical Computing." Wolfram, n.d. Web. 06 May 2015.
[7] Birnbaum, Phil. "SABR." A Guide to Sabermetric Research. SABR, 2015. Web. 14 May 2015.