Sport statistics: how to assemble your team
elivian.nl (high quality, poorly written)
Update history: current (v2.1): 7-oct-16, original (v1): 13-sep-14


If you've ever tried to select a team for any sport, you know it's a hassle. First of all, egos are involved for the people who are being selected. Second, if you're doing the selection with other people they will most likely (subconsciously?) favour: i. friends, ii. people with experience and iii. people who score a lot of goals. Even if you all agree to look past this and look at the actual player, it is still hard to choose between players because you do not know whether technique, speed or team play is more important. And finally, your team also has to play well together.

I did the team selection for a couple of years, and after those years I wanted some answers! So I started to collect data half a year in advance. I jotted down all the teams and final scores in training matches with the intent to do some serious statistics just before the team selection. I was aided by the fact that we played many training matches (153) with small (5v5) random teams of mixed gender, and a relatively small number of people (12 people joined in over 50 matches, 22 people joined in over 20 matches). This allowed me to estimate the contribution of individuals to their team score. The results surprised me, and that is why I'm writing this down.

Chapter 1 describes the dataset.
Chapter 2 develops a model to predict team strength based on individual strength.
Chapter 3 extends the analysis of chapter 2 to investigate the specialties of players:
3.1 How should you balance offensive/defensive players?
3.2 How important is it to take into account that certain players play well together?
3.3 How important is it to take into account that some players really start shining in good teams?
Chapter 4 answers the question of whether technique, speed or team play (or something else) is the most important.

If you don't like mathematics I suggest skipping the Method, Results and Observations sections of every question and all sections marked as (skippable). If you love mathematics I suggest getting the dataset and investigating yourself! You can find both datasets used in this document at: elivian.nl/support/sport1.xlsx and elivian.nl/support/sport2.xlsx.

1. The Dataset (skippable)

1.1 The good
- Unihockey has small teams (4 vs 4 playing), which makes it much easier to infer the contribution of the individual.
- More goals are scored per time period (on average about 4 per 10 minutes) compared to many other sports. This makes it possible to get a more accurate picture even in a shorter time span.
- People have no fixed position in the field (e.g. no goalkeeper).
- Large dataset for a relatively small number of people (for 10 people there is over 1000 minutes of playing time, another 8 are observed for over 500 minutes).
- Random teams. We would always put our sticks on a heap and randomly divide those. Only in rare cases would we redo the teams (teams too similar to the previous match, or sometimes in case of an extreme difference in skill level).
- People didn't know they were being observed.

1.2 The bad
- It was done at an amateur club; results might not hold on a professional level.
- We had single-time visitors joining our teams.
- Attendance varies. There were 9 members with fewer than 10 matches observed (<150 minutes).
- Matches had different durations.
- Different team sizes. Frequently there were teams of 5 players of which only 4 can play at any time. Unihockey allows for infinite substitutions at any time.
- The observations were done in 6 months. Since some players had just started playing unihockey, their skill level might have changed in these months.
- I had to remember all the teams and scores before I could write them down back home. I expect a bias (in score) in favour of myself.
- My analysis is pretty basic. I like eyeballing and not doing really rigid tests (I like to understand what I'm doing). Problem is, I make mistakes, so nothing can be taken for granted.

1.3 Preparing the dataset (very skippable)
There are a couple of issues with the dataset which I had to address before I could proceed.
- Single-time visitors I divided into 5 skill levels based on my own impression.
- Matches had different durations. Since the durations were similar I just scaled everything (linearly) to matches of 30 minutes (so a 4-1 match in 10 minutes would become 12-3).
- Members with few matches I also fixed a score for (combining my own experience with preliminary results from the analysis).
- After the analysis and before putting it online I anonymized the dataset.
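The linear rescaling of match scores to 30 minutes can be sketched in a few lines of Python. The function name and signature are my own illustration, not taken from the dataset or spreadsheet.

```python
def scale_to_30_minutes(goals_a, goals_b, minutes_played):
    """Linearly rescale a final score to a 30-minute match."""
    factor = 30 / minutes_played
    return goals_a * factor, goals_b * factor


# The example from the text: a 4-1 match played in 10 minutes.
scaled = scale_to_30_minutes(4, 1, 10)
print(scaled)
```

Scaling both scores by the same factor also scales the score difference, which is the quantity the later chapters work with.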

2. Predicting team effort based on individual strength

In order to obtain a single-parameter strength score for each player we need to make assumptions on how individual strength leads to the final score. Once we know this relation we can use the observed scores to calculate the most likely individual strengths.

2.1 The simplest model
The model will need to predict the score, so before we start it is good to know the observed probability distribution of scores. To get an idea of how scores are distributed we first make a graph of the sorted score difference of all matches. We are lucky, as it seems to closely resemble a normal distribution. Since summing variables often results in a normal distribution, my first guess would be that: i. the strength of a team is simply the sum of the strengths of the individual members, ii. the score difference in 30 minutes is obtained by subtracting the team strengths.

Basic assumption:
E[score difference in 30 minutes] = sum(strength players team 1) - sum(strength players team 2)

Example:

Team 1               Team 2
Name     Strength    Name      Strength
Peter    +0.0        Nemo      +2.7
Pan      +0.9        Dory      -1.3
Wendy    -2.0        Hank      +0.2
Tink     +4.1        Destiny   -0.6
Sum:     +3          Sum:      +1

In this example, according to this assumption, when playing 30 minutes team 1 would win with 2 goals difference. So the final score might be 8-6 (or 7-5 or 6-4 etc.). In case of 5 players in a team (of which only 4 can be playing at any time) I simply multiply the individual strength by 5/4. Using a maximum likelihood method (+ enforcing the average strength to be 0 + the assumptions listed in 1.3) yields a strength estimate for individual players. (Curious? Skip ahead to section 3.1.)
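A minimal Python sketch of the basic assumption, plus one way the strengths could be estimated. The first part reproduces the example above. The second part is my own illustration and not necessarily the procedure used for this document: it assumes normally distributed errors with equal variance, in which case maximum likelihood reduces to least squares, and it only handles equally sized teams. The toy matches and all function names are hypothetical.

```python
# Example strengths from section 2.1 (goals per 30 minutes).
team_1 = [0.0, 0.9, -2.0, 4.1]   # Peter, Pan, Wendy, Tink
team_2 = [2.7, -1.3, 0.2, -0.6]  # Nemo, Dory, Hank, Destiny


def expected_difference(strengths_a, strengths_b):
    """Basic assumption: expected 30-minute score difference of team a over team b."""
    return sum(strengths_a) - sum(strengths_b)


example_diff = expected_difference(team_1, team_2)


def fit_strengths(matches, n_players, steps=500, rate=0.1):
    """Least-squares fit of player strengths by gradient descent.

    matches: list of (team_a_indices, team_b_indices, observed_difference).
    Starting from all zeros keeps the average strength at 0 as long as both
    teams in every match have the same number of players.
    """
    strengths = [0.0] * n_players
    for _ in range(steps):
        gradient = [0.0] * n_players
        for team_a, team_b, observed in matches:
            predicted = (sum(strengths[i] for i in team_a)
                         - sum(strengths[i] for i in team_b))
            error = predicted - observed
            for i in team_a:
                gradient[i] += error
            for i in team_b:
                gradient[i] -= error
        strengths = [s - rate * g for s, g in zip(strengths, gradient)]
    return strengths


# Hypothetical toy data: 4 players, three noise-free 2v2 matches generated
# from true strengths [2, 1, -1, -2].
toy_matches = [
    ([0, 1], [2, 3], 6.0),
    ([0, 2], [1, 3], 2.0),
    ([0, 3], [1, 2], 0.0),
]
fitted = fit_strengths(toy_matches, n_players=4)
print(round(example_diff, 1), [round(s, 2) for s in fitted])
```

With real, noisy match data the fit will not reproduce any "true" strengths exactly; the toy data is noise-free only so that the result is easy to check by hand.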

2.2 A non-linear extension of the basic model (skippable)

Question: Isn't the basic assumption too simple? Perhaps the final score isn't a simple subtraction of the strengths of the teams. Perhaps strong teams totally dominate weaker teams, or perhaps strong teams vs weak teams reach a certain ceiling (they might get lazy). Something like this might be going on:

E[score difference in 30 minutes] = sum(strength players team 1)^n - sum(strength players team 2)^n

Method: Plotting the predictions of the model and comparing them with the actually observed data.

Results:
[Figure: scatter plot. x-axis: model prediction of score difference (score team 1 - score team 2) under the basic assumption of section 2.1. y-axis: actually measured score difference. Fitted third-degree polynomial: y = 0.0022x^3 - 0.0071x^2 + 0.8056x + 0.4006]

Observations: i. There exist more matches with a positive score (reason: I've always put myself in team 1 and I won more than I lost). This shouldn't affect the results. ii. The prediction according to the basic assumption and the actually measured score seem well correlated. iii. When adding a 3rd-degree polynomial we see that when the model predicts a high score difference (e.g. 13) it slightly underestimates (e.g. real score 15).

Interpretation: So it seems that stronger teams become stronger against weaker players, but barely. A possible explanation might be that as a losing team you don't feel like putting in too much effort; on the other hand, if you're winning you might be interested in getting another goal.

Conclusion: The effect doesn't seem too strong/significant and I therefore think it is negligible.
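To see how small the non-linear effect is, the fitted polynomial from the figure can be evaluated directly. This is a small Python sketch; the function name is my own, the coefficients are the ones reported with the plot.

```python
def fitted_curve(predicted_difference):
    """Third-degree polynomial fitted to measured vs predicted score difference."""
    x = predicted_difference
    return 0.0022 * x**3 - 0.0071 * x**2 + 0.8056 * x + 0.4006


for x in (0, 5, 13):
    print(x, round(fitted_curve(x), 1))
```

At a predicted difference of 13 the curve gives about 14.5, which is the "roughly 15" mentioned in observation iii.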

2.3 Extending the basic model to include standard deviation in a team (skippable)

Question: Isn't the basic model too simple? Is it really true that the strength of a team is the sum of its parts? Perhaps a team with players of strength [+4, +2, -2, -4] is stronger than a team with players of strength [+2, +1, -1, -2]. Or in other words, how much does the strength difference within a team matter?

Method: In order to measure this we make the following graph, including the variance (= standard deviation^2) in a team. A higher variance indicates more difference within a team. We compare this with the residuals (actually measured score difference - predicted score difference according to the basic model of section 2.1).

Results:
[Figure: scatter plot. x-axis: residuals after prediction with the basic model. y-axis: variance(team 1) - variance(team 2). Fitted line with R² = 0.0021]

Observations: i. It is a big (and random looking) mess. ii. Plotting a line seems to indicate that as the variance in team 1 increases, the model underestimates the score. So there might be a slight advantage to having a high-variance team.

Interpretation: I don't know. Perhaps good players can coach their team? Or fix the holes in a team.

Conclusion: It is a very small effect and most likely negligible (not significant).
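For the two example teams in the question, the quantities involved can be computed as follows. This Python sketch uses the population variance; whether the original analysis used the population or the sample variance is not stated, so treat that choice as my assumption.

```python
from statistics import pvariance

spread_out_team = [4.0, 2.0, -2.0, -4.0]
compact_team = [2.0, 1.0, -1.0, -2.0]

# Under the basic model both teams are equally strong: the sums are identical.
strength_difference = sum(spread_out_team) - sum(compact_team)

# The within-team variance is what section 2.3 compares against the residuals.
variance_difference = pvariance(spread_out_team) - pvariance(compact_team)

print(strength_difference, variance_difference)
```

The basic model predicts a draw between these teams, so any systematic residual in such matches would have to come from the variance difference.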

3. Different playing styles

When making teams we would always try to ensure a balanced team. We would put people with different skills together and try to make teams with people who play well together. In this chapter we'll compare this with the data.

3.1 How much should you care about balancing offensive/defensive players?

Question: The first thing we always took into account is whether the team has a balanced offense/defense. People seem to prefer one or the other, and a different skillset seems required. Can the data reveal more offensive players and more defensive players, and what can we learn from this?

Method: In order to analyze this I couldn't simply do a maximum likelihood estimate with an offensive and defensive strength for each person (too many variables). Instead, for every player I use the basic strength obtained in section 2.1 and combine this with the average total number of goals in matches the player was present in. Since offensive strength adds to the total number of goals and defensive strength subtracts from the total number of goals, the combination of these values can be used to compute an estimated offensive/defensive strength.

Results: The following table shows the overall strength of a player (section 2.1) and the offensive and defensive strength (this section). None of these values have an absolute meaning, but when comparing them to other values, scoring 1 higher means 1 extra goal made/avoided per 30 minutes. Higher is always better. When looking at offense/defense separately they can also be compared to each other (so a 7/8 player is better in defense).

The three "Goals" columns give the average number of goals (in team); the last two columns give the offensive and defensive strength.

Player  Strength   Goals  Goals    Goals  Strength  Strength
        (overall)  for    against  total  offense   defense
Aa       4.2       7.1    3.5      10.4   7.5       7.3
Ag*      4.1       6.3    3.5       9.2   6.8       7.9
Ah       3.4       5.9    3.5       9.9   6.8       7.2
Ae*      3.4       4.8    6.4      10.9   7.3       6.7
Am       3.3       5.7    3.2       9.7   6.6       7.2
Ai       3.1       6.4    3.4      10.0   6.7       7.0
Bb       2.2       6.2    4.6      11.0   6.6       6.1
Ay*      2.0       5.2    4.8      10.0   6.1       6.5
Ak*      1.6       6.2    3.4       9.2   5.5       6.7
Ap       1.5       5.8    4.3       9.7   5.6       6.5
Az       1.4       7.5    5.1      12.0   6.8       5.2
Aw       1.3       4.7    5.3       9.6   5.5       6.4
Aj*      1.2       7.4    5.1      12.4   6.8       4.9
Af       1.2       4.9    5.2      10.0   5.6       6.1
Bb       0.9       5.1    5.6      10.5   5.7       5.8
Ba*      0.9       3.0    6.7       9.6   5.2       6.2
As       0.7       6.4    4.0      10.1   5.4       5.8
Ar       0.6       5.0    5.4      11.0   5.8       5.4
Av       0.4       5.3    5.5      10.6   5.5       5.5
Ac       0.2       4.4    6.0      10.6   5.4       5.4
Au       0.0       4.4    5.1       9.0   4.5       6.1
Be*     -0.1       5.1    6.4      10.2   5.1       5.5
Ab      -0.1       4.7    6.0      11.1   5.5       5.0
An      -0.3       5.5    5.9      11.4   5.6       4.7
Bg      -0.3       4.5    5.9      10.6   5.1       5.2
At*     -0.4       4.5    9.5      13.2   6.4       3.8
Ad      -0.7       4.4    6.0      11.0   5.1       4.8
Ao*     -1.2       7.2    6.3      12.4   5.6       3.8
Bi      -1.8       3.7    6.4      10.1   4.1       4.7
Bh      -2.1       3.4    6.3       8.9   3.3       5.2
Ax*     -2.3       2.2    8.4       9.5   3.5       4.8
Bf*     -2.5       4.7    6.3      10.9   4.1       4.0
Aq*     -4.0       5.6    8.0      13.1   4.4       2.3
Al*     -4.0       2.3    8.4      10.2   2.9       3.7
Bc*     -6.5       5.0    5.0       9.5   1.2       2.9

*Low number of observations, likely inaccurate results.

Observation: Both offensive and defensive strength are highly correlated with overall strength.

Interpretation: This might indicate that general skills (e.g. ball control, agility, endurance and team play) are more important than specific offense/defense skills.

Conclusion: There seems to be no such thing as truly offensive players or defensive players. So it seems that more effort should be put into finding the best players than into balancing the offense and defense of a team.
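The "highly correlated" observation can be checked with a plain Pearson correlation. The Python sketch below uses only the rows without an asterisk (players with enough observations), copied from the table; the helper function is my own and not part of the original analysis.

```python
import math

# (overall, offense, defense) for the players without an asterisk.
rows = [
    (4.2, 7.5, 7.3), (3.4, 6.8, 7.2), (3.3, 6.6, 7.2), (3.1, 6.7, 7.0),
    (2.2, 6.6, 6.1), (1.5, 5.6, 6.5), (1.4, 6.8, 5.2), (1.3, 5.5, 6.4),
    (1.2, 5.6, 6.1), (0.9, 5.7, 5.8), (0.7, 5.4, 5.8), (0.6, 5.8, 5.4),
    (0.4, 5.5, 5.5), (0.2, 5.4, 5.4), (0.0, 4.5, 6.1), (-0.1, 5.5, 5.0),
    (-0.3, 5.6, 4.7), (-0.3, 5.1, 5.2), (-0.7, 5.1, 4.8), (-1.8, 4.1, 4.7),
    (-2.1, 3.3, 5.2),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


overall = [r[0] for r in rows]
r_offense = pearson(overall, [r[1] for r in rows])
r_defense = pearson(overall, [r[2] for r in rows])
print(round(r_offense, 2), round(r_defense, 2))
```

Keep in mind that offense and defense were themselves derived from the overall strength, so part of this correlation is built in by construction.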

3.2 How important is it to take into account that certain players play well together?

Question: As we've now seen, there is little difference between offensive and defensive players (only better and worse players). But still one might wonder if 2 people can actually outperform the sum of their parts. I think that everyone who does a team sport knows this feeling of really being in sync with a specific other player. How does this hold up against the data?

Method: A problem when answering this question is that looking at 2 people playing together significantly reduces the number of observations. So we just look at the 37 most frequently occurring combinations. We look at the average residuals when the two players of a combination are together in a team.

Results:
Columns (second player): Aa, Ac, Ad, Af, Ai, Au, Aw, Bb, Bd, Bg
Rows (first player), residuals in column order:
Ad: -4.0
Af: 1.2, -1.8, -0.6
Ai: -0.7
Ap: -0.8, 2.0, -0.4
Ar: 4.5
Aw: -1.0, -0.8, 0.5
Bb: -2.0, 3.7, 4.5, 0.0
Bd: 0.3, 0.0, -3.1, 0.4, -2.2, -0.4
Bg: -4.3, 5.3, 0.0, -7.6, 5.1, 0.0
Bh: 1.5, 0.0
Bi: -4.0, 1.0, -0.7, 0.2, 0.8, -1.0, 3.4

Observations: i. A value of 5.1 means that when these players are together in a team the model underestimates the actual score per 30 minutes by 5.1 (so together these players outperform the model by 5.1). ii. There seem to be as many negative numbers as positive numbers.

Interpretation: i. In order to find out if these numbers are due to luck, we combine this with the amount of time played together and the standard deviation of all matches to obtain a p-value. Even a combination with 37 matches and a residual value of -7.2 yields a p-value of only 0.064. Since we are looking at over 20 values, pure luck would often result in the lowest p-value being around 0.05.

Conclusion: There is NO effect visible whatsoever. The people you are in sync with are probably secretly just good players. So yes, you might actually be doing better, but this is because of them, not because of you or because of a special connection you share. So you might score more goals, but this is the result of getting better assists from a good player, not because you are suddenly better at scoring goals.
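The "pure luck" argument can be made concrete with a small calculation: if none of the combinations has any real effect, how likely is it that at least one of them still shows a p-value below 0.05? The Python sketch below assumes the tests are independent, which is a simplification (the same players appear in several combinations); the function name is my own.

```python
def chance_of_false_alarm(n_tests, alpha=0.05):
    """Probability that at least one of n independent tests has p < alpha
    when there is no real effect in any of them."""
    return 1.0 - (1.0 - alpha) ** n_tests


print(round(chance_of_false_alarm(20), 2))  # "over 20 values"
print(round(chance_of_false_alarm(37), 2))  # all 37 combinations
```

So a single p-value of around 0.06 among this many combinations is about what you would expect from chance alone.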

3.3 How important is it that some players really start to shine in good teams?

Question: When making teams, an often heard argument was that some people play a lot better in better teams.

Method: We look at the average residuals of every player when they are in a strong team (sum of strength of the other players > 3.3). We combine this with the number of games in strong teams to obtain a p-value.

Results:

Player  Residual  #games in strong team  p-value
Aa      -0.08     20                     0.94
Ab      -1.11     14                     0.41
Ac      -0.44     21                     0.69
Ad      -0.18     28                     0.85
Af      -0.11     33                     0.90
Ah      -0.74     19                     0.52
Ai      -0.50     20                     0.65
Aj       3.03      7                     0.11
Ak       3.38      7                     0.07
Am      -1.98     11                     0.19
An       0.05     22                     0.96
Ao       1.80     11                     0.23
Ap      -0.58     31                     0.52
Ar       1.25     13                     0.37
As      -0.78     11                     0.60
Au      -0.50     20                     0.65
Av       0.20      6                     0.92
Aw      -0.92     24                     0.37
Az       0.40     14                     0.76
Bb      -0.68     34                     0.43
Bd      -0.31     32                     0.73
Be       0.96      5                     0.67
Bf       1.27      8                     0.47
Bg      -0.05     29                     0.96
Bh       0.50     11                     0.74
Bi      -0.29     28                     0.76

Observation: i. Many people seem to do worse when playing in a good team (average -0.44). I cannot explain this.

Interpretation: i. Is there enough power in this test? Probably yes, because even before compensating for the number of games the average difference is pretty small (or negative).

Conclusion: Once again I cannot find any effect in the data to support that people become better in better teams.
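One way to turn an average residual and a number of games into a p-value is a two-sided z-test, sketched below in Python. The exact test used for the table is not stated, and neither is the standard deviation of a single match; the value of 5.0 goals is my assumption, picked because it gives p-values close to the ones in the table.

```python
import math

ASSUMED_MATCH_SD = 5.0  # hypothetical: standard deviation of one match residual


def two_sided_p_value(mean_residual, n_games, match_sd=ASSUMED_MATCH_SD):
    """Two-sided z-test of the hypothesis that the true mean residual is 0."""
    standard_error = match_sd / math.sqrt(n_games)
    z = mean_residual / standard_error
    return math.erfc(abs(z) / math.sqrt(2))


p_ak = two_sided_p_value(3.38, 7)    # table: 0.07
p_aa = two_sided_p_value(-0.08, 20)  # table: 0.94
print(round(p_ak, 2), round(p_aa, 2))
```

Note how the largest residuals in the table (Aj, Ak) belong to players with only 7 games in a strong team, which is why they still do not reach significance.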

4. Which skills contribute most to a player's overall strength?

4.1 Male vs female

Question: I always wondered if men were indeed better at sports, and if so, by how much? Since unihockey is a mixed sport (officially 2 men and 2 women must be playing at any time) and we didn't make teams based on gender anyway, this is a good opportunity to get slightly side-tracked and answer this question.

Method: Eyeballing.

Results: To save paper I'll refer you to the table in section 3.1. I made the anonymized names of the women slightly pink.

Observations: i. Most of the pink seems to be at the bottom.

Interpretation: i. About 70% of all men are better than about 90% of all women. ii. The median (average) man is better than the best woman. iii. A year after this analysis was done, a team consisting only of women who were included in this study finished 3rd (out of 24 Dutch teams). The results therefore seem to hold more widely. iv. There were small differences in the background of men/women: men train slightly more often and have been training slightly longer. But not enough to explain (even part of) these differences. Technical skill (ball handling) seems really similar for men and women.

Conclusion: Men seem to be doing A LOT better. This study does not reveal a definite cause.

4.2 Which skills contribute the most to the final score of a team?

Question: What makes the best player? Is it technique? Speed? Tactical assessment? Now that we have an approximation of the strength of players, it might be interesting to see if we can find out what causes the strength differences.

Method: I videotaped a single training session and afterwards counted a lot of variables for each person (a summary can be found in appendix 1). I ran a linear regression with these variables as independent variables and the final strength (as obtained in this document) as the dependent variable. To avoid having too many variables I removed the variables which seemed to have the least significant effect and reran the regression.

Results:

Info for the regression
Multiple correlation coefficient R   0.925009013
R-squared                            0.855641675
Standard error                       0.723471796
Observations                         15

                                     Coefficient     P-value
Intercept                            -6.761236073    0.0139379
Ball touched                          0.071446602    0.00655132
Ball gains (capturing)                0.086061731    0.1060351
%passes correctly received            4.052100781    0.27967184
Ball loss                            -0.092010322    0.32448499
%passes correctly given               0.966893215    0.75310889

Observations: i. Only the number of times a player touched the ball is significant. ii. Everything not listed in the results (e.g. #goals, #assists, #mistakes, #shots on goal) was removed as being very insignificant.

Interpretation: i. The number of goals you make or assists you give seems to be a mediocre proxy for overall contribution to a team at best. ii. The most important thing seems to be touching the ball a lot. This indicates the importance of team play, as receiving/giving passes is the dominant factor in the number of times a player touches the ball. In fact, every time you touch the ball you contribute (or avoid) 0.07 goals for your team. iii. Capturing the ball (e.g. directly from an opponent, intercepting a pass or when the ball ends up in an empty space) is the second most important. The predominant factors leading to this are speed and endurance. With unihockey (at this level) it often occurs that the ball ends up in an empty space and the fastest person will get the ball. The results seem to indicate that every time you capture a ball this yields your team 0.16 goals (presumably the 0.086 for the capture plus the 0.071 for the touch that comes with it). iv. Passes correctly received is also pretty high, possibly once again due to its importance in team play.

Conclusions: Team play, speed and endurance dominate the other skills. This might explain the difference in strength between men and women (as the women in the list score lower mainly on speed/endurance).
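To make the interpretation of the coefficients concrete, the regression equation from section 4.2 can be evaluated for a player. In this Python sketch the coefficients are the reported ones, but the dictionary keys, the function and the player statistics are hypothetical illustrations. It also assumes that the two percentage variables enter as fractions between 0 and 1 and that the counts are on the same scale as in appendix 1.

```python
# Coefficients as reported in the regression output of section 4.2.
INTERCEPT = -6.761236073
COEFFICIENTS = {
    "ball_touched": 0.071446602,
    "ball_gains": 0.086061731,
    "share_passes_received": 4.052100781,  # assumed to be a fraction (0-1)
    "ball_loss": -0.092010322,
    "share_passes_given": 0.966893215,     # assumed to be a fraction (0-1)
}


def predicted_strength(stats):
    """Evaluate the fitted linear regression for one player's statistics."""
    return INTERCEPT + sum(
        COEFFICIENTS[name] * stats[name] for name in COEFFICIENTS
    )


# Hypothetical player, not taken from the dataset.
player = {
    "ball_touched": 30,
    "ball_gains": 8,
    "share_passes_received": 0.85,
    "ball_loss": 8,
    "share_passes_given": 0.75,
}
baseline = predicted_strength(player)

# Effect of touching the ball 10 more times, everything else unchanged.
busier_player = dict(player, ball_touched=player["ball_touched"] + 10)
gain_from_touches = predicted_strength(busier_player) - baseline

print(round(baseline, 2), round(gain_from_touches, 2))
```

With only 15 observations and most p-values far above 0.05, such predictions are rough; the sketch is mainly useful for seeing how much each variable moves the estimate.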

5. The end

5.1 Conclusions
Where I set out to investigate the differences between players and their complex interactions, I found simplicity instead. I didn't find any evidence that a team is more than the sum of its parts. I didn't find any evidence of complex interactions, which forces me to carefully conclude that there is superstition in selecting teams. Speed, endurance and team play (basic passing) seem to be the most important characteristics in determining the quality of a player. Technique, number of goals and the ability to do solo actions seem of less importance. Based on eyeballing the data, experience seems to be of mediocre importance. So basing your selection on experience is better than choosing at random, but is inferior to actually looking at the skills of the players.

5.2 Implications for real life
My personal lesson out of all this is that we aren't good at estimating the contribution of a person to the common cause. Teams seemed to be made based on experience, goals made, who is your friend, and superstitious beliefs. To me it seems likely that the same effects hold, for example, at work. In fact, the workplace might even be less transparent than sports, because in sports you can see everything someone does; at work, not so much. So this is good news for people who are slacking (do you check facebook at work a lot?). If you are someone who is working hard, the lesson seems to be: do not expect that people are able to see your contributions, and don't blame them if they undervalue you. Sad lesson. A lesson which I've learned for when I'm a manager is that you probably will not be able to reward someone justly for what they are doing, since you do not know how well they are doing. A focus on intrinsic motivation and trust therefore seems important.

5.3 Further reading
I haven't read anything on this subject. I can recommend the movie Moneyball though. Don't expect any statistics, but it is about a real-life success story of applying statistics to baseball and it is fun. A TED talk about a very similar subject is by Rajiv Maheswaran: The math behind basketball's wildest moves. It's interesting.

5.4 Feedback
Let me know! Anything. feedback@elivian.nl.

5.5 Copyright
None (public domain): do whatever you like with this article (and the 2 corresponding datasets).

Appendix 1: video tape based statistics summary (full: elivian.nl/support/sports2.xlsx)

Columns:
Offense
  Shots    Shots on goal
  Goals    Goals
  %Goal    %Goal
  Assists  Assists
Teamplay
  Touched  Ball touched (any ball contact, except that passing from A to B to A to B to A to B would only count for 2)
  Passes   Passes given
  %Given   %Passes given correctly (speed, direction, and height are all decent)
Ball possession
  Gain     Ball gain (capturing the ball in some way)
  Loss     Ball loss
  Danger   Of which dangerous ball loss
  %Recv    %Passes correctly received (technically, quick ball control)
Defense
  ExcDef   Excellently defended (in between a dangerous pass, a good action as a goalie or returning to the defense quickly enough to avoid a counter)
  DefErr   Defense error (not being on one's knees when keeping, not returning to the defense fast enough, wrong position in front of goal)
Time played
  Time     Time in field (min) (already corrected; all values in this sheet are normalized for a 20 minute play)

Player  Shots  Goals  %Goal  Assists  Touched  Passes  %Given  Gain  Loss  Danger  %Recv  ExcDef  DefErr  Time
Aa      11.60  1.83   16%    4.3      49       32      92%     14.0   7.3  2.4     0.89   6.7     0.0     32.76
Ab       1.80  0.00    0%    1.8      20       14      100%     6.3   1.8  0.0     0.80   1.8     0.0     22.17
Ac       1.10  0.00    0%    1.1      24       26      71%      3.3   8.8  1.1     0.82   1.7     4.4     36.36
Ad       0.00  0.00    -     1.1      24       19      66%      5.9   7.5  2.7     0.67   2.2     0.5     37.13
Af       8.63  1.15   13%    2.9      37       30      83%      9.2   9.8  0.0     0.86   1.2     0.0     34.78
An       6.16  0.68   11%    1.4      35       25      67%      7.5  10.3  3.4     0.78   1.4     1.4     29.2
Ap      12.12  3.64   30%    3.0      35       23      79%     12.1   9.1  0.6     0.92   5.5     1.2     33
Ar       7.07  0.50    7%    1.0      26       20      72%      6.1   5.6  0.5     0.82   2.0     0.5     39.63
Au       4.80  0.53   11%    0.5      23       15      83%      6.9  10.1  1.6     0.97   8.0     2.1     37.5
Av      10.10  2.53   25%    0.6      24       14      77%     13.3  12.6  1.3     0.94   2.5     0.0     31.67
Aw       8.43  1.30   15%    0.6      29       19      63%     14.9  11.0  1.9     0.81   1.9     0.0     30.83
Bb       7.99  0.94   12%    3.3      42       27      86%     18.3  10.3  1.4     0.82   3.3     0.0     42.57
Bc       4.61  0.00    0%    3.5      34       37      77%      7.5   7.5  1.2     0.85   3.5     0.6     34.71
Bg       6.54  0.55    8%    1.1      34       17      56%      8.7   9.8  1.6     0.70   2.2     1.1     36.67
Bh       3.33  0.00    0%    0.6      21       19      60%      1.1   6.1  2.2     0.81   3.3     2.2     36.05