Matthew Gebbett, B.S. MTH 496: Senior Project Advisor: Dr. Shawn D. Ryan Spring 2018 Predictive Analysis of Success in the English Premier League


Abstract

The purpose of this project is to use mathematical methods from statistical and applied analysis to create models capable of predicting the rankings of the English Premier League teams for the season. The method revolves around a statistical multivariate regression approach. The models are built from carefully selected data that may contribute to on-field performance; each approach is applied to this data to create a score for each team. The teams are then ranked based on these final average scores. The rankings assigned by this method are compared to the actual rankings of each team to check the accuracy of each model and to allow comparison of the different approaches.

1. Introduction

Being born in England, I have always been interested in soccer, or football as we refer to it. Soccer was always my favorite sport while growing up. I played for multiple teams in my youth and went to matches supporting my local teams as often as possible. My dad and my sister were always arguing over whose team was better. Both were very passionate about the sport, which led to me becoming a big fan. After moving to the USA in 2005, I stopped following the sport almost immediately because the passion my sister had for the sport stayed in England with her and my dad became too busy with his job. Eventually, I stopped playing the sport altogether until I reached college. Once in Cleveland, I started to get back into the sport, watching the matches every weekend and playing casually with friends. As I became more interested in mathematics, I started to get more involved in the analysis of the matches. A fundamental curiosity for most fans is wanting to know which team really has the advantage in each match and which aspects of the game are really the most important. Could these things be predicted from the strengths and weaknesses of each team in a scientific way? The central question addressed here is whether the end-of-season results can be predicted from midseason or prior-season data.

Throughout my college experience, I have seen the many different pathways that math can lead to. Initially, I was drawn to a more applied pathway involving statistics or analysis. The background I obtained from studying statistical analysis made me want to know how it could be used rigorously when applied to soccer. This led to the central goal of developing a model to determine the end-of-season rankings. Ultimately, to reach this goal, I will have to find the optimal combination of statistical and analytical approaches.
To focus the study, we restrict ourselves to the data collected from the English Premier League (EPL), which only came into existence in 1992. Originally, the English soccer league was referred to as the English Football League (EFL), which began in 1888 and technically still exists today ([2] Shaw, 2013). The EFL consisted of the top four tiers of English soccer. A team in the third tier can move up to the second tier with a good season or down to the fourth tier with a poor season. These processes have come to be known as promotion and relegation, respectively. Until 1991, the top division in the EFL was referred to as the First Division, but in 1991 there was a large investment into television programming for soccer matches in the top tier, resulting in the need for a new name and branding ([3] History of the English Premier League, 2018). This led to the rebranding of the top tier as the English Premier League, while the EFL still exists as the second through fourth tiers of English soccer. When the Premier League first began, there were twenty-two teams in the league, with the bottom three being relegated and the champions of the league being granted access to the European Cup. Currently there are twenty teams in the Premier League, with the bottom three being relegated and the top four being granted access to the Champions League. There are no regions or conferences in English soccer, so every team plays every other team twice, once at home and once away. This leads to a completely balanced schedule and should make the ranking process more objective. I will be using the data from two seasons: one full season and a partial amount

of the current season ([4] Premier League Team Statistics, stics/england-premier-league, 2018). The teams play a total of thirty-eight matches each season in the EPL, which runs from the middle of August until the end of May. A win grants a team three points, a draw grants one point, and a loss grants zero points. The team with the most points at the end of the season wins the Premier League. Since the introduction of the EPL in 1992, only six teams have managed to win: Manchester United, Chelsea, Arsenal, Blackburn Rovers, Leicester City, and Manchester City. What makes these teams so successful and so consistently ranked near the top?

Many teams have been in the Premier League because of the relegation and promotion aspect of English football. A team that places in the bottom three of the Premier League is relegated to the second tier of English football. To keep the tiers consistent in size, the best three teams in the second tier are moved up to the Premier League. This system is consistent throughout all the tiers of English football, all the way to the tenth tier. The top four teams of the Premier League qualify for the Champions League. The Champions League is a large tournament between the top clubs from the leagues across Europe. The team that wins the Champions League is considered the best team in Europe and, by extension, the world that year. Teams can also qualify for the Europa League, which is the next best tournament in Europe. From the EPL, the fifth- and sixth-placed teams qualify. The team that wins the Europa League automatically qualifies for the Champions League regardless of its national league position. The important positions in the league are therefore first through fourth, as they qualify for the Champions League; fifth and sixth, for the Europa League; and the bottom three, because they are relegated to a lower division.
Finishing higher in the league does grant some bonus money to the club, but that is negligible compared to qualifying for the European competitions or being relegated to a lower division. The team that wins the Europa League also qualifies for the Champions League, but that cannot be determined by looking at the results in only the Premier League. While the primary goal of this project is to see if mathematical modeling can be used to predict all the positions in the league, a secondary focus will be on these more important league positions. The difference between moving into the Champions League and being relegated can be on the order of hundreds of millions of dollars in franchise value and sale revenue. Since every team plays every other team both at home and away, no team has a distinct scheduling advantage across the season, so the playing field is level. There is also no end-of-season knockout round to determine the champion of England, unlike in many other sports, which allows for an easier model.

My project data consists of data from one full season and two-thirds of another season. The full season is one in which all thirty-eight matches were played, while the other is the current season, which has not finished yet. Between the two seasons, twenty-three teams have played in the Premier League because three teams were relegated after the full season and three teams were promoted from the lower league into the Premier League. The teams I will primarily be looking at are the teams that qualified for the Champions League (positions 1-4), teams that qualified for the Europa League (positions 5-6), teams that were relegated (positions 18-20), teams very close to relegation but

managed to stay in the EPL (positions 15-17), and the teams that were recently promoted from the lower league (positions 1-3 in tier 2) ([5] European Qualification for UEFA Competitions Explained, 2018).

(Image from: Premier League.)

Based on my knowledge of soccer through many years of observation and, more importantly, on results from the research data, I will restrict the key factors determining ranking to several variables that appear to be key to determining the outcome of individual matches and, in turn, the league. For my project, there are seven variables used throughout. These variables are Goal Difference (GD), penalties conceded, discipline, shot accuracy, pass accuracy, clean sheets, and possession. GD is defined as the number of goals scored by a team minus the number of goals conceded (e.g., 100 goals scored and 40 goals conceded lead to a GD of +60). Penalties conceded is the number of times a team commits a foul in its own penalty box, which results in a penalty kick. Discipline may be harder to quantify, but in this project it is defined as the total number of red cards earned by a team during the season; if two teams had the same number of red cards, yellow cards were used as a tie breaker. Shot accuracy is the number of shots on target divided by the total number of shots by a team. Possession is measured as the percentage of time that a team is in control of the ball. Clean sheets are the number of matches during the season in which a team did not concede a goal. Pass accuracy is the percentage of a team's passes that reach a teammate. These variables do not all carry the same value, so something like GD is going to determine a match much better than passing accuracy. However, removing passing accuracy may reduce the accuracy of the model and hide a possible subtle dependence buried beneath the surface.

The second half of the project will focus on predicting the final rankings for the season based on data obtained before the transfer window. Before and during each season, there are transfers made for each team which bring new players to the club. The summer transfer window starts on June 1st and goes until August 31st, and a winter transfer window begins January 1st and ends January 31st. These windows exist so that players cannot move between clubs at any time during the year. They also add an element of change to the league because any team can hypothetically buy any player if it has the money and the player's contract allows a purchase. The summer window allows players to transfer while they are only training and do not have any matches. This window also creates excitement for the upcoming season, as each team normally brings in new players that get fans excited. The winter window is much more important to this project, as it occurs at around the halfway point in the season and allows teams to potentially bring in their biggest rival's best player, which can cause a big change in the league ranking prediction. To avoid the variability introduced by the winter transfer window, I have decided to collect the data directly after the winter transfer window for the season, but before any new matches can be played with a team's new players.

2. Statistics in Sports

Ever since sports became popular in the public's eye, people have debated which team is the best, which player scored the best goal, and so on. These questions have constantly changed as teams get better and athletes become stronger and faster, but merely asserting who the best player is has never been enough. People quickly turned to statistics in order to settle the debate with an objective source to back it up. This allowed people to settle their debates on who the best player is by simply looking at a player's statistics, comparing them to those of every other player, and seeing who came out on top.
These comparisons have led teams to collect data and analyze it to try to improve. These analyses are constantly performed today and are essential for many teams and companies to continue their progress toward a championship.

Prior to the formation of the Premier League in 1992, there was not nearly as much interest in soccer in England as there is today. Players were paid much less money, the conditions that they played in were worse, the stadiums were smaller and held fewer spectators, and overall the sport simply was not as professional. Once the investment into the Football League in 1991 was enacted, many clubs received large bonuses because the investment was primarily meant to improve the ease of viewing for the public. Soccer matches were rarely broadcast on television until the Premier League came into existence. From that point on, every single team's matches were broadcast across the country. Because of this large increase in income for the clubs and their players, they were able to improve their skills and training so that, very quickly, the players became much stronger and faster. This improvement drew several fans outside of England to the game and led many people to support English teams even though they were not from the area. The players also received a larger salary. This higher pay led many foreign players to want to migrate into the Premier League so that they continued to play quality soccer, but

they were also able to make a very good living from it. In the 1960s, players' salaries were around 20 a week. Today the average salary of a player in the Premier League is around 34,000 a week ([6] Harris, 2011). At the inception of the Premier League in 1992/93, just 11 players named in the starting line-ups for the first round of matches were 'foreign' (players hailing from outside of the United Kingdom or Republic of Ireland). By 2000/01, the share of foreign players participating in the Premier League was 36 percent. In the 2004/05 season the figure had increased to 45 percent. On December 26, 1999, Chelsea became the first Premier League side to field an entirely foreign starting line-up, and on February 14, 2005, Arsenal were the first to name a completely foreign 16-man squad for a match ([3] History of the Premier League, 2018).

(Image from [6] Harris.)

The record transfer fee, as well as the average cost of players, skyrocketed to new levels. Currently, the world record transfer fee is 222 million, paid by Paris Saint-Germain for Neymar, while the record transfer fee for a player in the Premier League is 90 million for Paul Pogba. Back in 1991, however, the record transfer fee was 5.5 million. This transfer record was broken every few years as the popularity of watching soccer grew. All this investment into players, with clubs spending exorbitant fees to bring the best in the world to their

club drew the interest of companies wanting to put their brand names wherever anyone could see them. It also drew the interest of betting companies wanting to make as much money as possible from the sport.

Betting has been a part of sports for a very long time. People will make a wager against a friend or against a company claiming that a certain team will win, and sometimes the wager will be more specific, such as a team winning by three goals to one. The betting company will give odds of the specific event occurring, such as 8-1, which means that for every 1 unit of currency put in, if the result holds, the investor will receive 8 units of currency. Before betting started becoming very profitable, the data collected focused on very tangible concepts such as the total number of goals scored per player or the total number of saves made by a goalkeeper. This data is easy to understand and can help determine which player is the best in each position. More recently, the data has become much more complex by focusing on player positions and how players move around the field relative to every other player. This data is expensive to gather, so it is rarely available for public use. Companies such as Opta Sport collect this data and sell it to betting companies as well as to teams in the Premier League that want to use the service to try to improve their odds or strategy, respectively ([7] Predictive Analytics, 2018). These large corporations are constantly changing their odds for matches based on all sorts of data, such as the weather and the lineups of each team. This keeps the predictions accurate and reputable, so that more people bet with their specific company over any other company. These companies employ full-time data analysts who have a lot of experience analyzing data and determining which team is more likely to win each match.
These betting companies also allow people to bet on which team they think will win the league at the end of the year. This is my goal in this project: to determine which team is most likely to win the Premier League in the 2016/2017 season, as well as to predict the exact order of finish based on recorded statistical categories. This would show in advance, based on current statistics, who will move on to the UEFA Champions League and a big payday versus those relegated to the lower league by finishing at the bottom of the standings.

3. How the Statistics Will Work

A statistical approach is very commonly used in sports for determining which team is more likely to win certain games, or which team is worth betting on for certain variables. In my project's case, I am trying to find the best method to predict the Premier League ranking at the end of the season. The method has to use the data I collected to create a model which solves for the optimal weights for each variable in the equation in order to predict the league with accuracy. The method that makes the most sense for this is an applied multivariate regression model. A brief description of such a model is necessary to understand how the model works and what its results mean. A regression model will be defined as:

$$Y = \beta_0 + \beta_1 X_1,$$

where $Y$ is the response variable, $\beta_i$ are the weights, and $X_i$ is the explanatory variable. A response variable is the variable being tested against in the equation. It is the actual value that we are trying to predict. An explanatory variable is an independent variable that is used to solve for the response variable. The weights are the unknowns that will be solved for in this equation. In my case, this method will not work exactly, since I have more variables than the equation can handle. Multiple variables require a multiple variable regression model, which is defined as:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n,$$

where this equation is identical to the equation above, except that it allows for multiple variables instead of just one. Each $\beta_i$ is a separate weight, and the weights change with each new response variable. Each $X_i$ is a different variable depending on the equation. For my model, I will be using multivariate regression. I will also be using several different sizes of multiple regression, as the values change with each different model studied. The variables are selected for each individual model. The results we are looking for minimize the root mean squared (RMS) error. This error measures the typical distance in rank between the predicted result and the actual result; namely:

$$E_{RMS} := \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(Y_{\mathrm{Pred},i} - Y_{\mathrm{Actual},i}\right)^2}$$

The individual squared errors are summed, the sum is divided by the total number of data points $N$, and finally the square root is taken. The RMS error places more emphasis on outliers than simply using an absolute value for measuring error. A completely random model is taken here to give, on average, an error of about 10 when 20 data points are used. Since there are 20 teams in the Premier League, the sum will be divided by 20.
A bad model would produce a result that is larger than 10: the error is very large, so each predicted data point is very far from the actual data point. A good model would produce an error lower than 10, preferably as close to 0 as possible. A prediction error of 0 would be a perfect result, but this would be very difficult to obtain with a model because there are many factors not accounted for (e.g., luck, weather, political climate). Here is an example of several different error results and their meaning. From the three models, there are error terms of 5.36, 3.57, and a third value. An error of 5.36 is a very good result, while 3.57 is an excellent result. However, the third error term is a result that could not be much worse, so this sort of model would be dismissed.
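The RMS error used to judge each model can be computed in a few lines. The following is a minimal sketch in Python rather than the R or MATLAB used in the project; the function name `rms_error` and the five-team example are hypothetical and are not data from the project.

```python
import math


def rms_error(predicted, actual):
    """Root mean squared error between predicted and actual league positions."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    # Square each positional miss, average over all teams, then take the root.
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical five-team league (illustration only, not project data).
actual = [1, 2, 3, 4, 5]
predicted = [2, 1, 3, 5, 4]
print(rms_error(predicted, actual))
```

Because each miss is squared before averaging, a single team predicted ten places off contributes as much to the sum as one hundred one-place misses, which is why this measure emphasizes outliers.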

A multivariate regression model is crucial for predicting the best possible result because, by design, a regression model produces the minimum error possible for the variables it is given. The error term for each model will be used to determine whether that model was a viable prediction. The error term is the result of a model being run. If the error is high, then the model can be dismissed; if the error term is low, then the model can be considered and possibly adjusted further. Also, the coefficients or weights $\beta_i$ in multivariate regression indicate the relative importance of each component of the model.
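The sense in which a regression gives the minimum error can be written out. This is a minimal statement, assuming the fit is an ordinary least-squares fit; the double index $X_{ij}$ (the value of variable $j$ for team $i$) is notation introduced here for illustration.

```latex
\hat{\beta} \;=\; \arg\min_{\beta_0,\ldots,\beta_n}\;
\sum_{i=1}^{N} \Bigl( Y_i - \beta_0 - \sum_{j=1}^{n} \beta_j X_{ij} \Bigr)^{2}
```

Since dividing by $N$ and taking a square root do not change where the minimum occurs, these weights also minimize the RMS error of the fitted $Y$ values. The error of the ranking obtained after sorting those fitted values is related to this quantity but is not identical to it.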

Here is an example of a model that was relatively accurate in its prediction. The x-component of each dot represents what the model predicted for that team, while the y-component is the actual position. The closer a point is to the line, the closer that team's predicted value is to its actual value. If a point is above the line, then the model has overestimated the team's placement (it predicted a better finish than the actual result). If a point is below the line, then the model has underestimated it. The more points lie away from the line, the greater the variance in the data. Here is an example of a model that was less accurate.

There are very few dots on the line; therefore, very few teams were predicted in the correct place. The dots are also not very close to the line in general, with some being close to ten places away. The error for this model would be very high; therefore, it will not be investigated further. Based on both models, I will look at the variables selected and try to determine why these variables produced these weights. Both were models developed to predict the league. In this project, we analyze these models and try to figure out why they produced the results they did by looking at the variables and the weights, and then adjust the next model based on the previous results.

4. The Models

As stated previously, this project uses a multivariate regression analysis to predict the Premier League standings. Each of the models run used between 2 and 8 variables, and all produced different, interesting results. The relevant data was transferred to Microsoft Excel, organized for each model, and from there transferred to both R and MATLAB. The statistical program R was used for the multivariate regression because it allows one to run the regression for each model quickly and adjust each model easily. It produced the weights for each model run and allows one to see how important the weights are in R's attempt to reduce the error term. MATLAB was used to run several multivariate regression analyses too. MATLAB gave me more freedom to manipulate the values depending on each model, but was a slower process than R. Both were used in the analysis successfully.

Using all the data I collected, I was able to create three different versions of each variable to test: the original variable, the weighted version of the variable, and the normalized version. The original version is just the data itself, so if one team scored 50 goals that season and conceded 30, then its raw GD would be +20. A weighted version eliminated the differences in scale across the variables. The spread from the best team to the worst team for GD was +60 to -43, while a variable such as red cards spread only from 0 to 5. This caused some variables to be dismissed because they were barely affecting the outcome of the league. So, the weighted approach ranked the teams on each variable and assigned each team a value on a scale from 0 to 1 in 0.05 increments. If a team had the highest GD, then it was given a value of 1; the second highest GD was given a value of 0.95, the third 0.9, and so on. If two or more teams tied, then they were given the same decimal value and the next team would drop by more than 0.05, so if two teams tied at 0.45, the next best team would be given 0.35. The weighted version eliminated a bias between variables, but some teams' values were originally significantly higher than others, and now those gaps were reduced to arbitrary fixed steps. If a team had a GD of 100 and the next best team had 20, then these two teams would only have a difference of 0.05, which may lead to surprising results in the model. So, the next approach is a normalized approach in which the values are still all between 0 and 1, but they are obtained by dividing by the value of the highest team in each category. This created a distribution with no bias toward a team or a variable. Each of these three methods was run to allow for diversity in the results and to determine which approach would be most accurate moving forward. The initial set of models was run using the original data in R.
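The weighted and normalized transformations described above can be sketched as follows. This is an illustrative Python sketch, not the Excel procedure actually used; the function names and the sample goal differences are hypothetical. The tie rule is assumed to be "a team's value drops 0.05 for every team strictly better than it", which reproduces the 0.45, 0.45, 0.35 example in the text. The text does not say how negative values were handled in the normalized version, so the sketch simply divides by the maximum.

```python
def rank_weighted(values, higher_is_better=True, step=0.05):
    """Rank-based score: the best team gets 1.0 and each place lower loses `step`.

    Tied teams share a score, and the next team skips the tied places.
    """
    scores = []
    for v in values:
        if higher_is_better:
            better = sum(1 for other in values if other > v)
        else:
            better = sum(1 for other in values if other < v)
        # Round to two decimals to avoid floating-point noise such as 0.8500000001.
        scores.append(round(1.0 - step * better, 2))
    return scores


def max_normalized(values):
    """Divide every value by the largest value in the category."""
    top = max(values)
    if top == 0:
        raise ValueError("cannot normalize when the maximum is zero")
    return [v / top for v in values]


# Hypothetical goal differences for five teams (illustration only).
goal_diff = [60, 20, 20, -5, -43]
print(rank_weighted(goal_diff))
print(max_normalized(goal_diff))
```

For a variable where a lower count is better, such as red cards, `higher_is_better=False` gives the team with the fewest cards the top score.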
The first model run would always contain every variable so that one could then reduce it from there based on simulation results. This is the type of output R would give each time any model was run:

Coefficients:
Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)
Goal_Diff  **
Possesion_Avg_Pct
Yellow_Cards
Red_Cards
Shot_Accuracy_Pct
Passing_Pct
Conceded_Penalties
Clean_Sheets

The primary output to consider is the Estimate value. This is the $\beta_i$ discussed in Section 3, which is then inserted into the multivariate regression model:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n \qquad (*)$$

The $\beta_i$ terms have now been found, and the $X_i$ terms are the data initially collected. This information was plugged into equation (*) to produce a ranking prediction. The $\beta_0$ term is the intercept, allowing for one extra degree of freedom. The prediction is the $Y$ value in the equation, and it represents the value which the equation predicts the end-of-season ranking to be. This was

done for each of the twenty teams in the Premier League in Excel, and the values were then sorted from least to greatest so that the lowest predicted value was in first and the highest predicted value was in twentieth. The furthest left column is the team, the second column is the actual end-of-season ranking, and the third column is the predicted ranking for the most basic model. This model would then be investigated further by looking at the RMS error term to determine exactly how accurate it is, but simply looking at the table is helpful in determining which model might be more useful to run in the future. Most of the top 10 in this model is accurate, except for West Bromwich being predicted in 16th when they actually finished 10th. This single miss is 6 places off, which increases the model's error. This process was done for four models using the same data. These are referred to as the basic models (Model Group 1): the first is the completely basic model accounting for everything (Model 1A), the second is a reduced basic model (Model 1B), the third is a basic model without GD (Model 1C), and the fourth is a reduced basic model without GD (Model 1D). A reduced model is one in which stepwise elimination is conducted in R. This process removes the least valuable variable from the model, re-runs it, and then checks whether another variable can be eliminated, repeating until a final model is found. The final model found by R for Model 1B looks like this:

Coefficients:
Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)
Goal_Diff  e-09 ***

Yellow_Cards

The model was reduced to just GD and Yellow Cards because it was determined that the other variables were insignificant. This model is much smaller than the others, but it still follows the multivariate regression method: the estimate terms are plugged in as the $\beta_i$ values and the predicted league ranking is then determined. The other two models were Model 1C and Model 1D. GD is the most important variable, since scoring more goals than the opponent is the point of soccer, so it was eliminated in these models to see whether the league could be predicted without it and to determine how important the other variables were.

Once Model Group 1 had been run, I started to run the weighted models (Model Group 2) in a similar fashion. The data was used in R and the output visually looks the same. The four models run were the weighted basic model (Model 2A), the reduced weighted basic model (Model 2B), the weighted model without GD (Model 2C), and the reduced weighted model without GD (Model 2D). These four versions are the same as the ones run for the original data. The weighted version produced more accurate results than the original version because the data values are more consistent throughout, so no team has a massive advantage on one variable. The $\beta$ terms from R can be seen here:

Coefficients:
Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)  e-09 ***
Clean_Sheets_Rank_Weighted
Goal_Diff_Rank_Weighted  e-06 ***
Possesion_Avg_Pct_Rank_Weighted
YDiscipline_Rank_Weighted
Shot_Accuracy_Pct_Rank_Weighted
Passing_Pct_Rank_Weighted
Conceded_Penalties_Rank_Weighted

The key difference between Model Group 2 and Model Group 1 is the actual values of the $\beta$ terms. In Model Group 1, the estimated values are all very similar except for shot accuracy. In Model Group 2, the estimated values are slightly more spread out, and GD stands apart. This is because GD is the most important variable in predicting the league, so its estimated value is much higher.
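The step of plugging the estimated weights into the regression equation and sorting the resulting scores into a predicted table, done in Excel for the basic models, can be sketched in Python. The intercept, the two weights (for GD and yellow cards), and the three teams below are hypothetical placeholders, not values from the R output.

```python
def predict_scores(intercept, weights, team_rows):
    """Evaluate Y = b0 + b1*X1 + ... + bn*Xn for every team."""
    return {
        team: intercept + sum(w * x for w, x in zip(weights, xs))
        for team, xs in team_rows.items()
    }


def scores_to_ranking(scores):
    """Sort teams by score ascending: the lowest score is predicted 1st."""
    ordered = sorted(scores, key=scores.get)
    return {team: place for place, team in enumerate(ordered, start=1)}


# Hypothetical coefficients and data (columns: GD, yellow cards).
intercept = 10.0
weights = [-0.2, 0.05]
teams = {"Team A": [40, 60], "Team B": [-10, 80], "Team C": [5, 70]}

scores = predict_scores(intercept, weights, teams)
print(scores_to_ranking(scores))
```

The predicted scores themselves are generally not whole numbers; only their order matters once they are converted to league positions.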
The next step was to run the normalized versions of the models in R (Model Group 3). The models conducted here are the same four variations as before, except using a normalized version of the data (Model 3A, Model 3B, Model 3C, Model 3D). These results were very similar to Model Group 2, with GD being valued much more than the other variables and the spread being even throughout the rest of them. All the results for each model were logged and placed into an Excel file for later comparison.

Once the models in R were conducted, I went on to compare the models in MATLAB, where we use optimization to minimize the error. This process was done using only the normalized data because it eliminates the most bias in the data. The process is essentially the same, except that in MATLAB I was able to reduce the model as much as I felt I needed to, guided by visualizing the results of each model. A linear system is constructed from the squared error:

$$E_{RMS}^2 = \frac{1}{20} \sum_{i=1}^{20} \left(\alpha_1 A_i + \cdots + \alpha_n N_i - R_i\right)^2$$

where $A, \ldots, N$ are the variables such as GD, the $\alpha$ terms are the values we are solving for, and $R_i$ is the actual end-of-season ranking of team $i$. Taking the partial derivative of this with respect to each weight and setting it equal to zero results in:

$$0 = \frac{1}{10} \sum_{i=1}^{20} n_i \left(\alpha_1 A_i + \cdots + \alpha_n N_i - R_i\right)$$

where $n_i$ is the value for team $i$ of the variable whose weight we took the partial derivative with respect to. The solution is $X = A^{-1} b$, where $A$ here denotes the matrix of coefficients arising from these partial-derivative equations (not the variable $A_i$) and $b$ is the vector with entries of the form $\sum_i R_i A_i$, where $R_i$ is the ranking.

There were numerous different models conducted in this approach using MATLAB. These consisted of: a normalized model using all eight variables; seven variables, with yellow cards having been eliminated; six variables, with yellow cards and clean sheets having been eliminated; and then these three variations without an initial $\beta_0$ term included. These variations were done by simulating each model and analyzing the results of each one, comparing it to the actual end-of-season ranking, and then determining why the prediction might have been further from or closer to the actual result. Once I had analyzed the results, I was able to determine which variable could be removed. I would then run the model again and repeat the same process. In the end, around 20 models had been considered, all in the attempt to predict the end-of-season ranking for the Premier League.

5. The Results

The results of the models presented in the previous section are all useful in furthering my attempt to predict the outcome of the Premier League.
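The linear system obtained by setting the partial derivatives to zero is the set of normal equations of least squares, and solving it can be sketched without any external library. The code below is an illustrative Python sketch, not the MATLAB script used in the project; the function names and the four-team data set are hypothetical, and Gaussian elimination is used here in place of an explicit matrix inverse.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def least_squares_weights(rows, rankings, intercept=True):
    """Build and solve the normal equations (X^T X) alpha = X^T R."""
    design = [([1.0] if intercept else []) + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xtr = [sum(d[i] * r for d, r in zip(design, rankings)) for i in range(k)]
    return solve_linear_system(xtx, xtr)


# Hypothetical normalized variable for four teams; ranking = 5 - 4x exactly.
rows = [[1.0], [0.75], [0.5], [0.25]]
rankings = [1, 2, 3, 4]
print(least_squares_weights(rows, rankings))
```

Passing `intercept=False` corresponds to the variations run without an initial constant term.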
The main measure of interest was the mean squared error term, which determines how accurate each model was compared to the actual end-of-season ranking. After each of the models was run, the results were collected in Microsoft Excel so that each model could be compared with the actual end-of-season ranking and then, in turn, with the other models. Many of the models produced a result that was very far from the actual end-of-season ranking. Almost all the models without GD produced a mean squared error term above 10 which, as mentioned before, would be worse than complete randomization. This is because GD is key in predicting the Premier League: goals are the reason that teams win matches. There is a direct correlation between goals and league ranking, so GD is necessary for a model to be accurate. This step eliminated several of the potential models. After observing the data, I concluded that the models that did not contain GD were not viable, since they consistently produced a mean squared error term above 9 across all types of model. These models without GD are not included in the observations that follow.

Model Group 1 was relatively accurate, but because it was biased towards the data with a larger spread, it was not as effective as Model Group 2. Model Group 1 produced a mean squared error between 4.5 and 4.9, which means that each Premier League team in the prediction is on average between 4.5 and 4.9 places from its actual ranking. These results were less accurate than the weighted and normalized approaches.

Model Group 2 produced the most accurate result of all. Initially, I expected Model Group 3 to be the most accurate, but the results show that Model Group 2 was, overall, more accurate. Model Group 2 produced errors of 3.1 and 4.02. Both are accurate because the predictions are only 3.1 and 4.02 places off on average.

Model Group 3 produced the most consistent results for this season. Many more models were run here because the approach in MATLAB used only normalized data. Model Group 3 produced an error term between 4.02 and 5.36 across all the models. The MATLAB approach had the largest variation of all, with its lowest and highest results being 4.02 and 5.36, while the results from R were between 4.5 and 4.9.
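The error values quoted above can be illustrated with a short sketch. It computes three related measures for a predicted ranking against the actual one; the function name `ranking_errors` and the five-team example are assumptions made for illustration, and the text does not state exactly which of these measures was computed in Excel, R, or MATLAB.

```python
import math


def ranking_errors(predicted, actual):
    """Return (mean squared error, root mean squared error, mean absolute error)."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("rankings must be non-empty and of equal length")
    diffs = [p - a for p, a in zip(predicted, actual)]
    mse = sum(d * d for d in diffs) / len(diffs)
    mae = sum(abs(d) for d in diffs) / len(diffs)
    return mse, math.sqrt(mse), mae


# Made-up example: two pairs of neighbouring teams are swapped.
predicted = [2, 1, 3, 5, 4]
actual = [1, 2, 3, 4, 5]
mse, rmse, mae = ranking_errors(predicted, actual)
```

The mean absolute error is the measure that reads most directly as "places off on average"; the root mean squared error is in the same units but penalizes large misses more heavily.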
The three best models from the non-randomization part are as follows:

Prem_Results_Weighted_End_of_Season_Rank error=4.02:
Rank = (GD)-.1228(Poss) (YDIsc)-.6692(SA) (Pass) (CP) (CS) (Beta)

Normalised_Model_Full_Without_Yellow_CS error=4.02:
Rank = (GD) (Poss) (SA) (Pass) (CP) (Red) (Beta)

Prem_Results_Reduced_Weighted_End_of_Season_Rank error=3.1:
Rank = (GD) (Pass) (Beta)

These three results were the most accurate in predicting the league, with mean squared error values of 4.02, 4.02, and 3.1. This is a plot of the predicted end-of-season result vs. the actual end-of-season ranking for each of the three models:


The top four is consistently accurate throughout each of these models, but the winner, Chelsea, was correctly predicted only once across every model that was run. This is because Tottenham placed higher than Chelsea in almost every category, which raises the question: why did Tottenham not win the Premier League? The reason is that, while Tottenham had a better GD, Chelsea was more efficient with their goals. Winning a match 5-0 earns the same points as winning 1-0, so while Tottenham scored more and conceded fewer goals than Chelsea, they were less consistent with their goal scoring across the season.

Based on the plots, the middle of the table appears to be a guess, which is due to the congestion of the league in that area. The top 6 are all much further ahead than any of the other teams because of their clubs' wealth and the skill of their players; there is a large difference in wealth. The middle and bottom of the table, however, are all relatively close in club value ([8] Premier League - Market Value v League Position, 2018). This makes the table very congested, because any team has good odds of beating every other team in that section of the table. The top teams are almost always predicted to beat the bottom teams, while the bottom and middle teams are very evenly matched. As a result, most of these teams end the season on around the same number of points.

The teams between 8th place and 17th place in the end-of-season ranking are separated by only 6 points, with many teams on the same points and separated only by GD. This means that almost all these teams are difficult to predict, because they most likely performed about the same throughout the season. This led to very few models being able to come close to accurately predicting this section of the table.

The final section is the bottom of the table, where the relegation spots are. Of the three best models, two predicted the correct teams to be relegated, while one predicted two of the three relegated teams. This section of the table was easier for the models to predict because there was a very large gap between the performance of these bottom teams and the teams above them: 6 points separated 8th and 17th, while 6 points also separated 17th and 18th. None of the best models predicted the exact order of the relegation zone, but it is more important to predict the correct teams than the exact positions.

6. Randomization

The key to this analysis was to accurately predict the end-of-season ranking for each of the Premier League teams in the season. If the models were able to accurately predict the season, then they should be able to consistently produce a result in future years as well as previous years. With this in mind, I created my own version of the Premier League season in which some of the data was randomized to produce a new result. The purpose of this randomization is to determine how accurate the best models from the regular analysis of the Premier League are in a different, hypothetical season. The data used was gathered from the season and then randomized in a specific manner. Which data was used depended on the model being tested.
The normalized version of the data was used for the normalized model. In this version, the values are spread between 0 and 1 by dividing each team's value on each variable by the maximum value of that variable. The weighted data was used for the weighted models. In this version, the data is evenly spread between 0 and 1 in 0.05 increments, which creates an even spread of all the data and eliminates the potential for biased data.

The randomization process was done by taking the data from Excel and randomly assigning each Premier League team a value between one and two. This value determined whether the team would have their statistics increased or decreased: one for a decrease, and two for an increase. This step was done using an online randomization calculator ([9] True Random Number Service, 2018). Once the teams had been assigned an increase or a decrease, they were each assigned a second random value. This value determined which of the variables would be increased or decreased based on the previous result. It was generated with the same online calculator, except this time the value was between one and eight, since there are eight variables being used. One of the weighted models had only two variables associated with it, so for that model the randomizer chose between only 0 and 1, so that each team had at least one variable affected.

Now that every team had been assigned a variable and a positive or negative change, the values could be increased or decreased to create this new version of the season. Each randomly selected variable was increased or decreased by the same amount for every team: 0.25. So, if one team was ranked at 0.75 for GD, then in this new version of the season they would be considered the best team, at a ranking of 1. The new season has now been completely randomized and each value has been changed accordingly. The optimal models from the actual season can now be tested on this random season to see whether the models are consistent.
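The scaling and randomization steps described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure as it was actually carried out: Python's pseudo-random generator stands in for the online random number service, the function names are hypothetical, ties in the weighting are broken by input order, and values are not clipped to the range 0 to 1 after the shift, since the text does not say how those cases were handled.

```python
import random


def normalize_by_max(values):
    """Divide each value by the column maximum (assumes a positive maximum)."""
    top = max(values)
    return [v / top for v in values]


def weight_by_rank(values, step=0.05):
    """Replace values by evenly spaced weights: smallest gets step, largest gets step * n."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    weights = [0.0] * len(values)
    for position, index in enumerate(order, start=1):
        # Rounding removes floating-point noise such as 0.15000000000000002.
        weights[index] = round(position * step, 10)
    return weights


def randomize_season(table, n_variables, shift=0.25, seed=None):
    """Shift one randomly chosen variable per team up or down by `shift`."""
    rng = random.Random(seed)
    new_table = {}
    for team, stats in table.items():
        direction = rng.randint(1, 2)            # 1 = decrease, 2 = increase
        variable = rng.randint(1, n_variables)   # which variable is changed
        changed = list(stats)                    # copy, so the input is untouched
        changed[variable - 1] += shift if direction == 2 else -shift
        new_table[team] = changed
    return new_table


# Made-up weighted data for three teams and two variables (GD, Pass).
season = {"Team A": [0.75, 0.50], "Team B": [0.20, 0.40], "Team C": [1.00, 0.95]}
random_season = randomize_season(season, n_variables=2, seed=2018)
```

Fixing `seed` makes a hypothetical season reproducible, which is not possible with a true random number service unless the drawn values are recorded.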
The three models being tested had previously produced a mean squared error term of 1.6 for the weighted end-of-season ranking, 1.6 for the normalized model without yellow cards and clean sheets, and 1.4 for the reduced weighted model. The models produced this result:

Here Model 1 is the 6-variable normalized result, Model 2 is the full weighted result, and Model 3 is the reduced weighted result. This result is interesting because these three models were the best models from the actual season and yet they produce a wide range of error values. The full weighted model is consistent with its result for the actual season. The normalized model produces an error term that is twice as large as its result for the actual season. This would still be a useful error term for predicting the actual season; however, the fact that it is twice as large shows that its consistency is in question. The wildest model is the reduced weighted model, because its error term for the randomized season is 10 times larger than its error term for the actual season. This occurred because the randomization was spread across only two variables, one of which is GD. This means that about half of the teams had their GD affected by the randomization process, with the increase or decrease being a quarter of their total goals scored. This led to the prediction being very inaccurate for any of the teams that had their GD affected. The teams that gained GD almost certainly moved up several places in the end-of-season table. Any team that did not have their GD affected stayed very close to their actual result, because the other teams were shifting both up and down around them. The teams that had their GD negatively affected would drop several places in the end-of-season ranking. The other two models remain accurate to the actual end-of-season result despite having several of their variables randomized. This would indicate that these models would be useful in another season of competition to determine the final outcome of the Premier League. These models are also less sensitive to a significant change in any given category.

7. Further Analysis & Conclusion

The next step is to test these three best models on an actual season of the Premier League as opposed to a hypothetical one. The test uses slightly more than half of the season to see if the results will be accurate. The data collected for this season was the same 8 variables as beforehand. Two different versions of the data were collected, the weighted and the normalized versions, so that all three best models could be run. The data collected covered 26 matches, which is more than half of 38. The reasoning for this decision is that the transfer window for the Premier League ends on week 26. This means every single team has finished conducting their business for buying and selling players to and from different clubs. This eliminates a player being transferred away from a club as a confounding variable that could potentially be the reason a team moved up or down in the ranking.

The season, at the time of writing, is still ongoing. This means that comparing the results of the predictive analysis to the actual end-of-season rankings would be inconclusive, because not every team has finished playing all their matches. The prediction is shown below, but it cannot be compared to the end-of-season ranking because the season has not finished yet. The three best models from the season were used, and the image shows which model is associated with which prediction. The only team that currently has its league position decided is Manchester City, in 1st. Each model predicted this at the mid-point of the season, so at least 1 league position is correctly predicted.
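Applying a fitted model to part-season data amounts to scoring each team and sorting the scores, which can be sketched as below. The coefficients, team names, and statistics are all made up for illustration (the fitted coefficients of the three best models are not reproduced here), and the function name `predict_ranking` is hypothetical.

```python
def predict_ranking(team_stats, coefficients, intercept=0.0):
    """Score each team with a linear model and rank by ascending score."""
    scores = {
        team: intercept + sum(coefficients[name] * value for name, value in stats.items())
        for team, stats in team_stats.items()
    }
    # The model output is a predicted rank, so the lowest score is placed first.
    ordered = sorted(scores, key=scores.get)
    return {team: place for place, team in enumerate(ordered, start=1)}


# Hypothetical coefficients and normalized statistics after 26 matches.
coefficients = {"GD": -12.0, "Pass": -3.0}
team_stats = {
    "Team A": {"GD": 1.0, "Pass": 0.9},
    "Team B": {"GD": 0.4, "Pass": 1.0},
    "Team C": {"GD": 0.7, "Pass": 0.8},
}
ranking = predict_ranking(team_stats, coefficients, intercept=20.0)
```

Once the season has finished, the same ranking could be compared with the final table to obtain an error term in the same way as for the earlier season.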

To conduct this experiment more effectively in the future, one change that should be made is collecting the data in a different way. I would not use GD as a variable; I believe that using goals scored and goals conceded as two different variables would provide a more effective model. I would also test the models on more historical seasons such as the and more past seasons. This would allow the most consistent model to be identified by observing the average mean squared error term across all seasons tested.

In conclusion, this project was conducted to try to predict the outcome of the Premier League season. This was done by collecting important data from the season and analyzing this data using a multivariate regression model. Many models were run, checked, and tested to determine which model was the most accurate in its prediction. The root mean squared error was analyzed to give the best three models. These models were tested against the actual end-of-season rankings and produced accurate predictions, with mean squared errors of 3.1, 4.02, and 4.02. These models were then used to predict a randomized league to test whether they were consistent. The results showed that a model in which the data was normalized and contained Goal Difference, Red Cards, Passing Accuracy, Possession, Shot Accuracy, and Penalties Conceded was the most consistent at predicting the Premier League.

8. References

1. Alexopoulos, Evangelos. Introduction to Multivariate Regression Analysis. HIPPOKRATIA, 2010, pp.
2. Shaw, Phil. ESTABLISHING THE TEMPLATE To Football League 125, 2013.
3. History of the English Premier League. SuperSport - Football, MultiChoice, 2018.
4. Premier League Statistics. Retrieved from: stics/england-premier-league
5. European Qualification for UEFA Competitions Explained. Premier League Football News, Fixtures, Scores & Results, 2018.
6. Harris, Nick. From £20 to £33,868 per Week: a Quick History of English Football's Top-Flight Wages. Sporting Intelligence, 20 Jan. 2011.
7. Predictive Analytics. Opta Sports, 2018.
8. Premier League - Market Value v League Position. Transfermarkt, 2018.
9. True Random Number Service. RANDOM.ORG - Integer Generator, 2018.


More information

Beyond the game: Women s football as a proxy for gender equality

Beyond the game: Women s football as a proxy for gender equality Beyond the game: Women s football as a proxy for gender equality Morris, Ruth and Morris, Ben. Women s football: Played, Watched, Talked about! FREE Conference University of Copenhagen, June 2013 This

More information

Review of A Detailed Investigation of Crash Risk Reduction Resulting from Red Light Cameras in Small Urban Areas by M. Burkey and K.

Review of A Detailed Investigation of Crash Risk Reduction Resulting from Red Light Cameras in Small Urban Areas by M. Burkey and K. Review of A Detailed Investigation of Crash Risk Reduction Resulting from Red Light Cameras in Small Urban Areas by M. Burkey and K. Obeng Sergey Y. Kyrychenko Richard A. Retting November 2004 Mark Burkey

More information

Project Title: Overtime Rules in Soccer and their Effect on Winning Percentages

Project Title: Overtime Rules in Soccer and their Effect on Winning Percentages Project Title: Overtime Rules in Soccer and their Effect on Winning Percentages Group Members: Elliot Chanen, Lenny Bronner, Daniel Ramos Introduction: We will examine the overtime rules of soccer to evaluate

More information

How to Make, Interpret and Use a Simple Plot

How to Make, Interpret and Use a Simple Plot How to Make, Interpret and Use a Simple Plot A few of the students in ASTR 101 have limited mathematics or science backgrounds, with the result that they are sometimes not sure about how to make plots

More information

Pierce 0. Measuring How NBA Players Were Paid in the Season Based on Previous Season Play

Pierce 0. Measuring How NBA Players Were Paid in the Season Based on Previous Season Play Pierce 0 Measuring How NBA Players Were Paid in the 2011-2012 Season Based on Previous Season Play Alex Pierce Economics 499: Senior Research Seminar Dr. John Deal May 14th, 2014 Pierce 1 Abstract This

More information

Lecture 22: Multiple Regression (Ordinary Least Squares -- OLS)

Lecture 22: Multiple Regression (Ordinary Least Squares -- OLS) Statistics 22_multiple_regression.pdf Michael Hallstone, Ph.D. hallston@hawaii.edu Lecture 22: Multiple Regression (Ordinary Least Squares -- OLS) Some Common Sense Assumptions for Multiple Regression

More information

STUDY BACKGROUND. Trends in NCAA Student-Athlete Gambling Behaviors and Attitudes. Executive Summary

STUDY BACKGROUND. Trends in NCAA Student-Athlete Gambling Behaviors and Attitudes. Executive Summary STUDY BACKGROUND Trends in NCAA Student-Athlete Gambling Behaviors and Attitudes Executive Summary November 2017 Overall rates of gambling among NCAA men have decreased. Fifty-five percent of men in the

More information

Effect of homegrown players on professional sports teams

Effect of homegrown players on professional sports teams Effect of homegrown players on professional sports teams ISYE 2028 Rahul Patel 902949215 Problem Description: Football is commonly referred to as America s favorite pastime. However, for thousands of people

More information

save percentages? (Name) (University)

save percentages? (Name) (University) 1 IB Maths Essay: What is the correlation between the height of football players and their save percentages? (Name) (University) Table of Contents Raw Data for Analysis...3 Table 1: Raw Data...3 Rationale

More information

b

b Empirically Derived Breaking Strengths for Basket Hitches and Wrap Three Pull Two Webbing Anchors Thomas Evans a and Aaron Stavens b a Montana State University, Department of Earth Sciences, PO Box 173480,

More information

The importance of t. Gordon Craig, Coerver Coaching Director

The importance of t. Gordon Craig, Coerver Coaching Director Gordon Craig, Coerver Coaching Director The importance of t Inspired by the ideas of the Dutch coach, Wiel Coerver in the 60 s, that all the great skills from the top players at the time could be taught

More information

Do Clutch Hitters Exist?

Do Clutch Hitters Exist? Do Clutch Hitters Exist? David Grabiner SABRBoston Presents Sabermetrics May 20, 2006 http://remarque.org/~grabiner/bosclutch.pdf (Includes some slides skipped in the original presentation) 1 Two possible

More information

Algebra I: A Fresh Approach. By Christy Walters

Algebra I: A Fresh Approach. By Christy Walters Algebra I: A Fresh Approach By Christy Walters 2016 A+ Education Services All rights reserved. No part of this publication may be reproduced, distributed, stored in a retrieval system, or transmitted,

More information

y ) s x x )(y i (x i r = 1 n 1 s y Statistics Lecture 7 Exploring Data , y 2 ,y n (x 1 ),,(x n ),(x 2 ,y 1 How two variables vary together

y ) s x x )(y i (x i r = 1 n 1 s y Statistics Lecture 7 Exploring Data , y 2 ,y n (x 1 ),,(x n ),(x 2 ,y 1 How two variables vary together Statistics 111 - Lecture 7 Exploring Data Numerical Summaries for Relationships between Variables Administrative Notes Homework 1 due in recitation: Friday, Feb. 5 Homework 2 now posted on course website:

More information

Reading Time: 15 minutes Writing Time: 1 hour 30 minutes. Structure of Book. Number of questions to be answered. Number of modules to be answered

Reading Time: 15 minutes Writing Time: 1 hour 30 minutes. Structure of Book. Number of questions to be answered. Number of modules to be answered Reading Time: 15 minutes Writing Time: 1 hour 30 minutes Letter Student Number: Structure of Book Section A - Core Section B - Modules Number of questions Number of questions to be answered Number of marks

More information

The next criteria will apply to partial tournaments. Consider the following example:

The next criteria will apply to partial tournaments. Consider the following example: Criteria for Assessing a Ranking Method Final Report: Undergraduate Research Assistantship Summer 2003 Gordon Davis: dagojr@email.arizona.edu Advisor: Dr. Russel Carlson One of the many questions that

More information

Two Machine Learning Approaches to Understand the NBA Data

Two Machine Learning Approaches to Understand the NBA Data Two Machine Learning Approaches to Understand the NBA Data Panagiotis Lolas December 14, 2017 1 Introduction In this project, I consider applications of machine learning in the analysis of nba data. To

More information

9-11 YEAR OLD PLAYERS

9-11 YEAR OLD PLAYERS STAGE 3 ACTIVITIES 9-11 YEAR OLD PLAYERS NSCAA Foundations of Coaching Diploma NSCAA Foundations of Coaching Activities Stage 3: 9-11 Year Old Players 4V1 TO 4V2 IN THIRDS This game provides attackers

More information

Jonathan White Paper Title: An Analysis of the Relationship between Pressure and Performance in Major League Baseball Players

Jonathan White Paper Title: An Analysis of the Relationship between Pressure and Performance in Major League Baseball Players Jonathan White Paper Title: An Analysis of the Relationship between Pressure and Performance in Major League Baseball Players If you were to scrutinize Alex Rodriguez s statistics during the 2006 season,

More information

Journal of Human Sport and Exercise E-ISSN: Universidad de Alicante España

Journal of Human Sport and Exercise E-ISSN: Universidad de Alicante España Journal of Human Sport and Exercise E-ISSN: 1988-5202 jhse@ua.es Universidad de Alicante España SOÓS, ISTVÁN; FLORES MARTÍNEZ, JOSÉ CARLOS; SZABO, ATTILA Before the Rio Games: A retrospective evaluation

More information

Chapter 5: Methods and Philosophy of Statistical Process Control

Chapter 5: Methods and Philosophy of Statistical Process Control Chapter 5: Methods and Philosophy of Statistical Process Control Learning Outcomes After careful study of this chapter You should be able to: Understand chance and assignable causes of variation, Explain

More information

Using Markov Chains to Analyze a Volleyball Rally

Using Markov Chains to Analyze a Volleyball Rally 1 Introduction Using Markov Chains to Analyze a Volleyball Rally Spencer Best Carthage College sbest@carthage.edu November 3, 212 Abstract We examine a volleyball rally between two volleyball teams. Using

More information

Predictors for Winning in Men s Professional Tennis

Predictors for Winning in Men s Professional Tennis Predictors for Winning in Men s Professional Tennis Abstract In this project, we use logistic regression, combined with AIC and BIC criteria, to find an optimal model in R for predicting the outcome of

More information

Copyright Winningmore.com 2008

Copyright Winningmore.com 2008 By Steve Davidson 2008 www.winningmore.com The ultimate guide to making a tax-free living from Backing Horses. Presented by; Steve Davidson www.winningmore.com 1 P age Liability and Disclaimer The author

More information

Case Study: How Misinterpreting Probabilities Can Cost You the Game. Kurt W. Rotthoff Seton Hall University Stillman School of Business

Case Study: How Misinterpreting Probabilities Can Cost You the Game. Kurt W. Rotthoff Seton Hall University Stillman School of Business Case Study: How Misinterpreting Probabilities Can Cost You the Game Kurt W. Rotthoff Seton Hall University Stillman School of Business Abstract: Using data to make future decisions can increase the odds

More information

Old Age and Treachery vs. Youth and Skill: An Analysis of the Mean Age of World Series Teams

Old Age and Treachery vs. Youth and Skill: An Analysis of the Mean Age of World Series Teams ABSTRACT SESUG Paper BB-67-2017 Old Age and Treachery vs. Youth and Skill: An Analysis of the Mean Age of World Series Teams Joe DeMaio, Kennesaw State University Every October, baseball fans discuss and

More information

Revisiting the Hot Hand Theory with Free Throw Data in a Multivariate Framework

Revisiting the Hot Hand Theory with Free Throw Data in a Multivariate Framework Calhoun: The NPS Institutional Archive DSpace Repository Faculty and Researchers Faculty and Researchers Collection 2010 Revisiting the Hot Hand Theory with Free Throw Data in a Multivariate Framework

More information

CHAPTER 5 RESULTS AND ANALYSIS

CHAPTER 5 RESULTS AND ANALYSIS CHAPTER 5 RESULTS AND ANALYSIS 5.1 European s Top Leagues In season 2015 to 2016, the revenue of the major European football leagues have increased by 1.4 billion pounds as a whole. This was a 12% of increase,

More information

A Developmental Approach. To The Soccer Learning Process

A Developmental Approach. To The Soccer Learning Process A Developmental Approach To The Soccer Learning Process Soccer by definition Soccer is a game played between 2 teams and each team is trying to score more goals than the other team. Soccer games are decided

More information

A New Chart for Pitchers and My Top 10 Pitching Thoughts Cindy Bristow - Softball Excellence

A New Chart for Pitchers and My Top 10 Pitching Thoughts Cindy Bristow - Softball Excellence This is Part 6 of my 6 part article series for the National Fastpitch Coaches Association (NFCA) Since this is my last article I want to share something cool with you I ve learned recently, along with

More information

A V C A - B A D G E R R E G I O N E D U C A T I O N A L T I P O F T H E W E E K

A V C A - B A D G E R R E G I O N E D U C A T I O N A L T I P O F T H E W E E K A V C A - B A D G E R R E G I O N E D U C A T I O N A L T I P O F T H E W E E K P E R F O R M A N C E B E N C H M A R K S ( S T A T S ) F O R T H E R E S T O F U S KYLE MASHIMA, ROTATE23 AND SOLOSTATS23

More information

Lab Report Outline the Bones of the Story

Lab Report Outline the Bones of the Story Lab Report Outline the Bones of the Story In this course, you are asked to write only the outline of a lab report. A good lab report provides a complete record of your experiment, and even in outline form

More information

Legendre et al Appendices and Supplements, p. 1

Legendre et al Appendices and Supplements, p. 1 Legendre et al. 2010 Appendices and Supplements, p. 1 Appendices and Supplement to: Legendre, P., M. De Cáceres, and D. Borcard. 2010. Community surveys through space and time: testing the space-time interaction

More information

Effects of Incentives: Evidence from Major League Baseball. Guy Stevens April 27, 2013

Effects of Incentives: Evidence from Major League Baseball. Guy Stevens April 27, 2013 Effects of Incentives: Evidence from Major League Baseball Guy Stevens April 27, 2013 1 Contents 1 Introduction 2 2 Data 3 3 Models and Results 4 3.1 Total Offense................................... 4

More information

ELO MECHanICS EXPLaInED & FAQ

ELO MECHanICS EXPLaInED & FAQ By Mario Kovacevic, RankR How accurate is the Elo Ranking System? This is a bit of a philosophical question. In order for us to have a sense of how accurate Elo is, we need to have some sort of PERFECT

More information

CIES Football Observatory Monthly Report n 37 - September Financial analysis of the transfer market in the big-5 leagues ( )

CIES Football Observatory Monthly Report n 37 - September Financial analysis of the transfer market in the big-5 leagues ( ) CIES Football Observatory Monthly Report n 37 - September 2018 Financial analysis of the transfer market in the big-5 leagues (2010-2018) Drs Raffaele Poli, Loïc Ravenel and Roger Besson 1. Introduction

More information

Wildlife Ad Awareness & Attitudes Survey 2015

Wildlife Ad Awareness & Attitudes Survey 2015 Wildlife Ad Awareness & Attitudes Survey 2015 Contents Executive Summary 3 Key Findings: 2015 Survey 8 Comparison between 2014 and 2015 Findings 27 Methodology Appendix 41 2 Executive Summary and Key Observations

More information

NCAA Recruits & Ranks Research

NCAA Recruits & Ranks Research Danny Town Di Chu J.C. Vivek G. NCAA Recruits & Ranks Research (Image Source: http://www.tannerfriedman.com/blog/?p=3575) Intro NCAA College Basketball is one of the most worldwide recognized sports today.

More information

NBA TEAM SYNERGY RESEARCH REPORT 1

NBA TEAM SYNERGY RESEARCH REPORT 1 NBA TEAM SYNERGY RESEARCH REPORT 1 NBA Team Synergy and Style of Play Analysis Karrie Lopshire, Michael Avendano, Amy Lee Wang University of California Los Angeles June 3, 2016 NBA TEAM SYNERGY RESEARCH

More information

How To Bet On Football

How To Bet On Football INTRODUCTION TO SPORTS BETTING Sports betting is a fun and exciting way to bolster your total sports experience and to make some money along the way. Using this guide will teach you how to go about your

More information

Contact with your suggestions for this chapter. Chapter1 Standard 4 v 4

Contact with your suggestions for this chapter. Chapter1 Standard 4 v 4 Chapter1 Standard 4 v 4 All the following games will use the following standard 4 v 4 pitch in the diagram below unless a new diagram is shown. Win by 1 Normal game of 4 v 4 but you can never lead the

More information

An examination of try scoring in rugby union: a review of international rugby statistics.

An examination of try scoring in rugby union: a review of international rugby statistics. An examination of try scoring in rugby union: a review of international rugby statistics. Peter Laird* and Ross Lorimer**. *4 Seton Place, Edinburgh, EH9 2JT. **66/5 Longstone Street, Edinburgh, EH14 2DA.

More information

Become Expert at Something

Become Expert at Something Frandsen Publishing Presents Favorite ALL-Ways TM Newsletter Articles Become Expert at Something To achieve reasonably consistent profitable play, it is critically important to become expert at handling

More information

July 2010, Number 58 ALL-WAYS TM NEWSLETTER

July 2010, Number 58 ALL-WAYS TM NEWSLETTER July 2010, Number 58 ALL-WAYS TM NEWSLETTER Inside This Newsletter Getting Started with ALL-Ways Wagering Reference Sheet Handicapping Reference Sheet Announcements Coming Soon Some things to look for

More information

Gail Howard's Three Methods to Win at Lotto Written by Gail Howard

Gail Howard's Three Methods to Win at Lotto Written by Gail Howard Gail Howard's Three Methods to Win at Lotto Written by Gail Howard Do you buy lottery tickets and dream sweet dreams about winning millions$ of dollars? Maybe that dream can come true for you as it has

More information

5th Grade Decimal Concepts

5th Grade Decimal Concepts Slide 1 / 192 Slide 2 / 192 5th Grade Decimal Concepts 2015-11-16 www.njctl.org Slide 3 / 192 Table of Contents What is a Decimal? Click on a topic to go to that section. Identify Place Values Read and

More information