Evolving strategies for prediction of sporting fixtures

Mark Rowan
School of Computer Science, The University of Birmingham, Birmingham, B15 2TT, UK
April 30, 2007
Supervisor: Dr. John Bullinaria

Abstract

Gambling is a risky business, particularly in the field of sports betting, where the inherently random nature of the game makes it very difficult for systems to predict the outcomes of future events. In this mini-project, a genetic algorithm representation is presented with the intention of recovering underlying structure in football data, including potential temporal changes. A number of enhancements to the algorithm are then proposed and implemented, with varying degrees of success, and suitable values for a range of parameters are identified. The preliminary results show that it is possible to recover enough underlying structure from the data to at least break even across the length of a football season, and that in certain circumstances a healthy profit may be made. The report ends with a comprehensive list of proposals for extending the system and implementing alternative technologies.

Keywords: Football, betting, prediction, gambling, sports, genetic algorithm, evolutionary computation.

Contents

1 Introduction
  Problem Definition
  Relevant literature
2 Design
  Genetic algorithm
    Representation
    Fitness evaluation
    Crossover
    Mutation
  Data sources
  Program structure
    Gambler
    Predictor
    Population
    Individual
    ScoresDatabase
    FixtureResult
3 Experiments performed
  Preparing the testing data set
  Naive betting
  Naive betting vs basic GA
    Number of epochs
    Population size
  Fitness scaling
  Self-adaptation
    Self-adapting home/away win ratio
    Self-adapting bitstring and home/away win ratio mutation rate
    Self-adaptation (complex schema)
    Bounded self-adapting mutation strengths
    Fast EP
    Improved Fast EP
  Points given for a correct prediction
  Experts
  Multiple oracles (ensemble machine)
  Multiple populations
  Human selection of bets
  Performance on other data sets
    English Premier League
    German Bundesliga
    German Bundesliga
  Evolved values in representation
4 Evaluation
  Conclusions
  Further work
    Improvements to the genetic algorithm
    General improvements to the project
A Dividing data into sets by time
B Human-selected bets for testing set
C Literature search strategy
  C.1 Forms of literature
  C.2 Range of literature
  C.3 Initial search tokens
  C.4 Search strategy
D Project proposal

List of Figures

2.1 Program structure
3.1 Naive betting
3.2 Effect of varying number of epochs
3.3 Effect of varying size of population
3.4 Different fitness scaling schemes
3.5 Self adaptation strengths for the home/away win ratio
3.6 Self-adapting win ratio and bitstring mutation
Self-adapting mutation strengths (complex schema)
Bounded self-adapting mutation strengths (complex schema)
Cauchy-based mutation and self-adaptation
Improved Fast-EP mutation
Utilisation of Gaussian vs Cauchy mutation operators (40 epochs)
Utilisation of Gaussian vs Cauchy mutation operators (500 epochs)
Points given by an individual if the home team is predicted to win
Using a number of oracles
Switching number of oracles after x consecutive losses
Adapt number of oracles after x losses across y bets: finding gradient
Adapt number of oracles after x losses across y bets: fine-tuning gradient
Adapt number of oracles after x losses across y bets: extending gradient
Number of populations
Weighting populations' predictions across time
Performance of the system compared to a human expert
Performance of the system on English Premier League testing data
Performance of the system on German Bundesliga testing data
Performance of the system on German Bundesliga testing data
Actual representation values evolved for each team

Chapter 1 Introduction

Gambling is a risky business, and much effort has gone into analysing statistics in an attempt to foretell the outcomes of uncertain events for profit. One of the most popular gambling arenas is sports betting. For the purposes of this project, English football, and in particular the Premier League, will be the sport studied.

Genetic algorithms (GAs), pioneered by Holland [10], have long been known to perform well on many general function optimisation problems. If the problem of comparing pairs of football teams can be represented as a numerical function, then a GA should have at least some measure of success at optimising that function, and so at extracting some understanding from the data describing the teams.

1.1 Problem Definition

Sports such as football are inherently random, which makes forecasting their outcomes particularly difficult. Bookmaking and betting are a massive business, and there is almost as large a market in tipsters, who are paid to advise gamblers, and in systems designed to track large quantities of data and statistics about the game to aid prediction.

The aim of this project is to determine how successfully genetic algorithms can be used to evolve strategies for predicting Premier League football matches. This entails designing an appropriate representation for a genetic algorithm, in order to reduce the problem of comparing pairs of teams to a simple multi-dimensional function. It also entails implementing various extensions to the GA, testing them to show which are beneficial, and selecting suitable values for the various parameters. Finally, this work should lead to proposals for future areas of research.

1.2 Relevant literature

Clair and Letscher [3] successfully implement a strategy to predict winners and underdogs in football pools. This is quite a different strategy from the one to be evolved in this project, however, as the tournament for which they are predicting is a simple knockout, in which the winner of each pair of teams goes on to play in the next round. They focus on a feature unique to pools: computationally balancing predictions of success for the favourite against predictions of an upset by the underdog. This is done to minimise the number of players with whom the prize money will be shared, rather than simply to maximise the number of correct predictions, which is the aim of this project.

Tsakonas et al. [20] present work similar in a number of ways to the aims of this project. They find success in using fuzzy rules, neural networks, and genetic programming, and they highly recommend such soft-computing techniques as an area for future research in computer-aided gambling. They make use of a relatively large number of data features ("difference of infirmity factors; difference of dynamics profile; difference of ranks; host factor; personal score of the teams").

With the exception of the neural network study, they focus primarily on using these techniques to generate hard rule-sets which then govern prediction, rather than using the techniques themselves to adapt dynamically to the data and predict outcomes more flexibly. It will be beneficial to draw on some of their experience, and in particular to begin by reducing the input data to just those features necessary to achieve a good rate of correct prediction. In contrast to their approach, this mini-project aims to evolve a flexible representation of the strengths and weaknesses of teams which can accommodate new data as it is encountered, rather than generating rigid sets of rules using fuzzy logic or genetic programming, which cannot accommodate changes in the input data to such a high degree.

Rotshtein et al. [16] attempt to use a simpler set of input data, reasoning that human sports fans frequently make predictions based on simple, common-sense assumptions, such as "IF a team T1 won all previous matches, AND a team T2 lost all previous matches, AND the team T1 won the previous matches between T1 and T2, THEN it should be expected that T1 would win". They show how this can be formalised as a fuzzy logic model, and go on to devise such a model. Their input data is far sparser than that of Tsakonas et al. [20]. The authors introduce a genetic algorithm and a neural network to optimise the fuzzy logic rule generator, with respectable success, although they concede in their conclusions that the prediction model could be further improved by accounting for additional factors in the fuzzy rules: home/away game, number of injured players, and various psychological effects, including factors such as the referee who oversees the game and the weather conditions prior to kick-off.

Kvam and Sokol [13] build upon the work of Clair and Letscher [3] to accurately rank (and/or rate) teams using only basic input data, based on Markov chain models. They summarise that "part of the reason for the comparative success of our model is that the other models... treat the outcome of games as binary events, wins and losses. In contrast, our model estimates the probability of the winning team being better than the losing team based on the location of the game and the margin of victory and is therefore able to more accurately assess the outcome of a close game." The authors focus a large amount of their research on accurately predicting the outcomes of close games, using the differences in the probabilities of each team winning to fine-tune their predictions. This research suggests that a binary win/lose comparison in this project's genetic algorithm may be insufficient for accurate predictions.

Chapter 2 Design

2.1 Genetic algorithm

2.1.1 Representation

The representation chosen for a genetic algorithm can have dramatic effects on the success of the system. The proposed representation for this mini-project contains a measure of attack and defence strength (or ability) for each team, so that the predicted outcome of a match can be determined by comparing the two teams' attack and defence strengths. For a set of teams $t$, this could be represented as a vector of real values in the format:

$$[\{a, d\}_1, \{a, d\}_2, \ldots, \{a, d\}_t]$$

Teams play very differently at their home ground compared to when they visit another team's ground. There are well-defined reasons for this: Buraimo and Simmons [2] state that "home field advantage is a bundle of attributes including home team psychology, greater familiarity with pitch, passionate home fans and susceptibility of referees to home crowd pressure". It therefore seems reasonable to extend the representation to include separate home and away strengths:

$$[\{a_h, a_a, d_h, d_a\}_1, \{a_h, a_a, d_h, d_a\}_2, \ldots, \{a_h, a_a, d_h, d_a\}_t]$$

Since a team's performance fluctuates over time due to seasonal effects, such as the arrival of new players in the summer and January transfer windows, the representation could be extended further to include each of the 38 gameweeks that a Premiership team plays in. If the gameweeks belong to a set $g$, the representation could be written as:

$$[[\{a_h, a_a, d_h, d_a\}_1, \ldots, \{a_h, a_a, d_h, d_a\}_g]_1, [\{a_h, a_a, d_h, d_a\}_1, \ldots, \{a_h, a_a, d_h, d_a\}_g]_2, \ldots, [\{a_h, a_a, d_h, d_a\}_1, \ldots, \{a_h, a_a, d_h, d_a\}_g]_t]$$

Represented in this way, each individual quickly becomes very large. Forty unique teams have currently appeared for at least one season in the Premiership, which equates to 4 parameters × 38 gameweeks × 40 teams = 6080 elements in the vector¹.

An alternative method of introducing temporal data into the representation, using the second suggestion above, is outlined in Appendix A. In brief, this entails breaking the input space into temporally-related portions, and training each expert on only one portion of the input space. Following this schema, the data in each individual could be reduced to a more manageable 4 parameters × 40 teams = 160 elements². Experiments showing the effects of this schema can be found in the Experts section of Chapter 3.

¹ If these real values are represented as double-precision variables of eight bytes each, this leads to a size of 48640 bytes per individual, or a somewhat excessive 46MB for a population size of 1000 individuals.
² At 8-byte precision, this equates to 1280 bytes per individual, or 1.2MB for a population size of 1000 individuals.
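The element counts and memory figures above can be checked with a short sketch. It is illustrative Python, not the project's code: the dictionary layout, the function names, and the choice to initialise every strength uniformly in [0, 1] are assumptions made here.

```python
import random

PARAMS_PER_TEAM = 4    # home attack, away attack, home defence, away defence
BYTES_PER_DOUBLE = 8   # double-precision storage, as in the footnotes


def random_individual(teams, rng):
    """Hypothetical layout: team name -> [a_h, a_a, d_h, d_a].

    Strengths are drawn uniformly from [0, 1] (an assumption)."""
    return {team: [rng.random() for _ in range(PARAMS_PER_TEAM)] for team in teams}


def vector_length(n_teams, n_gameweeks=1):
    """Number of real values in one individual's vector."""
    return PARAMS_PER_TEAM * n_gameweeks * n_teams


def bytes_per_individual(n_teams, n_gameweeks=1):
    """Storage for one individual at 8 bytes per value."""
    return vector_length(n_teams, n_gameweeks) * BYTES_PER_DOUBLE


def population_mib(n_teams, n_gameweeks, pop_size):
    """Storage for a whole population, in mebibytes."""
    return bytes_per_individual(n_teams, n_gameweeks) * pop_size / 2**20


if __name__ == "__main__":
    # Per-gameweek representation: 4 x 38 x 40 values.
    print(vector_length(40, 38), bytes_per_individual(40, 38))  # 6080 48640
    print(round(population_mib(40, 38, 1000), 1))               # 46.4
    # Temporally split representation: 4 x 40 values.
    print(vector_length(40), bytes_per_individual(40))          # 160 1280
    print(round(population_mib(40, 1, 1000), 1))                # 1.2
    individual = random_individual(["Arsenal", "Chelsea"], random.Random(0))
    print(len(individual["Arsenal"]))                           # 4
```

The footnotes' "46MB" and "1.2MB" correspond to these figures when a megabyte is taken as 2^20 bytes.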

A number of methods exist for representing a vector of real-valued numbers in a GA, including storing them directly as manipulable numbers in the vector, and real-to-discrete binary encoding. Binary encoding, with mutation consisting of randomly flipping a number of bits proportional to the mutation strength, is the more elegant solution. However, it introduces Hamming cliffs into the representation. Mathias and Whitley [15] state that "Gray coding is known to eliminate Hamming cliffs that exist in binary function spaces". A Hamming cliff occurs when two consecutive numbers have complementary binary representations. For example, the 4-bit binary representations of the numbers 7 and 8 are complements of each other (0111 and 1000). Gray coding is an alternative encoding of a number using binary characters in which every number is only a Hamming distance of 1 away from its immediate neighbours. Binary encoding also has the disadvantage that it only works up to a fixed precision, determined by the number of bits representing each value in the vector. Real-valued encoding allows much finer precision (limited only by the floating-point type used), with mutation achieved by adding a random value δ, proportional to the mutation strength, to each value of the vector.

2.1.2 Fitness evaluation

Good parents are chosen by the genetic algorithm according to their fitness, which is a measure of how well an individual in the population performs on the problem. The fitness is calculated by taking the home and away attack and defence parameters for each team contained in each individual, and using them to predict the outcome of each match in the training set.
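The Hamming cliff between 7 and 8, and the way Gray coding removes it, can be checked with the standard binary-reflected Gray code. This is a generic sketch: the report does not say which Gray code variant, if any, was used, and the function names are illustrative.

```python
def to_gray(n: int) -> int:
    """Binary-reflected Gray code of a non-negative integer."""
    return n ^ (n >> 1)


def from_gray(g: int) -> int:
    """Invert to_gray by XOR-ing together every right shift of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    print(format(7, "04b"), format(8, "04b"), hamming(7, 8))     # 0111 1000 4
    g7, g8 = to_gray(7), to_gray(8)
    print(format(g7, "04b"), format(g8, "04b"), hamming(g7, g8))  # 0100 1100 1
```

Under `to_gray`, every pair of neighbouring integers differs in exactly one bit, which is the property Mathias and Whitley refer to.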
The output of the predictor is $x \in \{0, 1, 3\}$, where:

x = 3 if the home team is predicted to win
x = 1 if the match is predicted to be a draw
x = 0 if the away team is predicted to win

These values were chosen because, in the Premier League, they are the points awarded to the home team in the equivalent cases, and it seemed sensible to follow this convention for the fitness evaluation. The predicted outcome $x$ is then compared with the actual outcome $y$, as recorded in the training set, and the absolute difference is summed across the whole training set. For an individual $i$ trained over a training set $T$, where each $t \in T$ is a pair of teams playing a match, the fitness is therefore:

$$f(i) = \sum_{t \in T} |x_t - y_t|$$

and the task becomes a non-linear fitness-minimisation problem.

Home and away win ratios

To calculate the result the predictor should output for two given teams, the defence score of the away team is subtracted from the attack score of the home team, and the defence score of the home team is subtracted from the attack score of the away team. This gives a value for the overall strength $s$ of each team relative to its opponent, ranging from −1 to +1. The values are then normalised so that $0 \le s' \le 1$:

$$s_h = a_h - d_a, \qquad s_a = a_a - d_h, \qquad s' = \frac{s + 1}{2}$$

The difference in the strengths of the teams is calculated:

$$d = s'_h - s'_a$$

and this is again normalised so that $0 \le d' \le 1$:

$$d' = \frac{d + 1}{2}$$

Finally, this normalised difference $d'$ is compared against two thresholds, $t_h$ for home wins and $t_a$ for away wins, to determine the predicted match outcome $x$:

$$x = \begin{cases} 3 & \text{if } d' > 1 - t_h \\ 0 & \text{if } d' < t_a \\ 1 & \text{otherwise} \end{cases}$$

The thresholds determine the points at which each result is chosen for output, and relate to the real-world proportions of games which are won by the home team, won by the away team, or drawn. Buraimo and Simmons [2] state that around 48 percent of games in English football are won by the home team, with the remainder split approximately equally between away wins and draws³. For ease of calculation, the thresholds can be set to 50% for home wins, 25% for draws, and 25% for away wins ($t_h = 0.5$, $t_a = 0.25$). The prediction $x$ therefore becomes:

$$x = \begin{cases} 3 & \text{if } d' > 0.5 \\ 0 & \text{if } d' < 0.25 \\ 1 & \text{otherwise} \end{cases}$$

Weighting per season

A team's overall performance tends to remain relatively stable over many seasons. For example, Manchester United are usually found to finish in one of the top three places. However, some teams in the League have had vastly different performances across previous seasons. A good example is Chelsea FC, who in the past would invariably finish the season mid-table, but since the arrival of a wealthy investor at the club have finished recent seasons at the top of the table. Other clubs, such as Blackburn Rovers, who have been champions in past seasons, have more recently been consistently found in the lower part of the top half of the table. In general, a team's recent performance is a good indicator of its future performance, whereas its historic performance is not. For example, Wimbledon used to perform well in the Premier League but, for various reasons, are now to be found three divisions below, in League 2, under the name MK Dons.
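The prediction rule and the unweighted fitness from section 2.1.2 can be sketched as follows. The sketch assumes that each team's strengths are stored as [a_h, a_a, d_h, d_a] with values in [0, 1], and it reads the normalisation steps as the linear map (v + 1)/2 from [−1, 1] to [0, 1]. Both are interpretations rather than details confirmed by the report, and the function names are hypothetical.

```python
HOME_WIN, DRAW, AWAY_WIN = 3, 1, 0  # league points for the home team


def predict(home, away, t_home=0.5, t_away=0.25):
    """Predict 3, 1 or 0 from two [a_h, a_a, d_h, d_a] strength lists."""
    a_h, _, d_h, _ = home        # home team uses its home strengths
    _, a_a, _, d_a = away        # away team uses its away strengths
    s_h = (a_h - d_a + 1) / 2    # home strength, mapped from [-1, 1] to [0, 1]
    s_a = (a_a - d_h + 1) / 2    # away strength, mapped from [-1, 1] to [0, 1]
    d = (s_h - s_a + 1) / 2      # normalised difference in [0, 1]
    if d > 1 - t_home:
        return HOME_WIN
    if d < t_away:
        return AWAY_WIN
    return DRAW


def fitness(individual, matches):
    """Sum of |x_t - y_t| over (home, away, actual_points) triples; lower is better."""
    return sum(abs(predict(individual[home], individual[away]) - actual)
               for home, away, actual in matches)


if __name__ == "__main__":
    ind = {"Strong": [1.0] * 4, "Weak": [0.0] * 4}
    print(predict(ind["Strong"], ind["Weak"]))                           # 3
    print(fitness(ind, [("Strong", "Weak", 3), ("Strong", "Weak", 1)]))  # 2
```

With these thresholds, two identical teams produce d' = 0.5 and are therefore predicted to draw.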
One way to deal with this effect is to take into account a team's entire history in the league across the training set, but to bias the fitness evaluation towards its recent history. This is achieved by weighting the fitness of each individual according to the time since the match being evaluated was played. If $w$ is the distance between two seasons, taken as the year of the most recent season in the training set minus the year of the season being evaluated, the fitness evaluation becomes:

$$f(i) = \sum_{t \in T} |x_t - y_t| \cdot 2^{-w_t}$$

This produces an exponentially decaying curve of weightings: matches from the most recent season ($w = 0$) have a weighting of 1, those from the preceding season a weighting of 0.5, and those from the season before that 0.25, then 0.125, and so on.

2.1.3 Crossover

Crossover in the GA is performed between two parent individuals in order to create new individuals which share qualities of both parents. This is achieved by taking the element at each position in the vector from either one parent or the other, and using it to construct a new vector (table 2.1).

³ This can be confirmed online, where the statistics are calculated to be 46.42% home wins (std dev 2.99), 26.86% draws (std dev 3.11), and 26.72% away wins (std dev 1.50) over the 14-season history of the Premier League.
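A minimal sketch of the season weighting follows. It assumes that each match is labelled with the starting year of its season (for example 2004 for 2004/05) and that predictions have already been made. The function names are hypothetical.

```python
def season_weight(latest_season, season):
    """Weight 2**-w, where w is the number of seasons before the latest one."""
    w = latest_season - season
    return 0.5 ** w


def weighted_fitness(predicted, actual, seasons):
    """Sum of |x_t - y_t| * 2**-w_t over all matches; lower is better."""
    latest = max(seasons)
    return sum(abs(x - y) * season_weight(latest, s)
               for x, y, s in zip(predicted, actual, seasons))


if __name__ == "__main__":
    print([season_weight(2004, s) for s in (2004, 2003, 2002, 2001)])
    # [1.0, 0.5, 0.25, 0.125]
    # An error of 3 in 2004/05 counts fully; an error of 2 in 2003/04 counts half.
    print(weighted_fitness([3, 3, 3], [0, 1, 3], [2004, 2003, 2002]))  # 4.0
```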

Table 2.1: Crossover between two parents' vectors to create a new child individual

The fittest individual in the population is also copied separately to the new population, unchanged (that is, without mutation or crossover), in order to preserve it for future generations. This technique is known as elitism. Jennison and Sheehan [11] note that "elitist strategies enhance selectivity by retaining the fittest individual at each generation" and that "an elitist algorithm would be guaranteed to solve [...] simple problems because mutation and crossover can be relied upon to improve the fittest individual, element by element if necessary, until the optimum is reached. However, by increasing selectivity, elitist strategies increase the risk of effective convergence at an inferior local optimum in problems with multiple local optima". There is therefore no immediate guarantee that implementing elitism will be a successful strategy.

2.1.4 Mutation

Each individual in the population (apart from the elite individual mentioned earlier) is then subject to random mutation. For each element in the vector, a random value δ is added. δ is generated independently for each element from the normal distribution with a mean of 0 and a standard deviation of 1, and is multiplied by a scaling factor µ known as the mutation strength. For each element $j$ of the vector $x$ in each individual:

$$x'_j = x_j + N(0, 1) \cdot \mu_x$$

2.2 Data sources

All data for the training and testing sets was obtained in CSV format. The attributes used for training were the names of the home and away teams for each match, the date on which the match was played, and the final score. For the testing set, the attributes recorded were the names of the home and away teams for each match, the date on which the match was played, the name of the winning team, and the odds given for that outcome before the match, according to the BetBrain⁴ service.
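Uniform crossover, elitism, and Gaussian mutation can be combined into one generation step, as in the sketch below. It is an illustration under several assumptions. Individuals are flat lists of floats, and lower fitness is better. Parents are picked uniformly at random as a placeholder for the roulette-wheel selection described under Population in section 2.3. All function names are hypothetical.

```python
import random


def uniform_crossover(parent_a, parent_b, rng):
    """Take each element from one parent or the other with equal probability."""
    return [a if rng.random() < 0.5 else b for a, b in zip(parent_a, parent_b)]


def gaussian_mutate(vector, mu, rng):
    """x_j' = x_j + N(0, 1) * mu, with a fresh N(0, 1) draw per element."""
    return [x + rng.gauss(0.0, 1.0) * mu for x in vector]


def next_generation(population, fitnesses, mu, rng):
    """Build a new population of the same size (needs at least 2 individuals)."""
    # Elitism: copy the fittest (lowest-error) individual unchanged.
    elite = min(range(len(population)), key=lambda i: fitnesses[i])
    new_population = [list(population[elite])]
    while len(new_population) < len(population):
        # Placeholder selection: two parents from distinct positions, chosen uniformly.
        parent_a, parent_b = rng.sample(population, 2)
        child = uniform_crossover(parent_a, parent_b, rng)
        new_population.append(gaussian_mutate(child, mu, rng))
    return new_population


if __name__ == "__main__":
    rng = random.Random(42)
    population = [[0.0] * 4, [1.0] * 4, [2.0] * 4]
    new_population = next_generation(population, [5.0, 1.0, 3.0], 0.1, rng)
    print(new_population[0])  # [1.0, 1.0, 1.0, 1.0]: the elite survives unchanged
```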
2.3 Program structure

2.3.1 Gambler

The Gambler module reads the testing set into memory and then requests betting recommendations from the Predictor for each match in the testing set. It calculates the winnings for testing purposes. A Gambler could take recommendations from any number of differently-trained Predictors, although this capability is not implemented in this project. The Gambler expects to find testing data in the tab-separated format shown in table 2.2.

2.3.2 Predictor

The Predictor initialises one or more Populations, and provides an interface for the Gambler to query the predicted outcomes of matches. The Predictor in turn queries each of the Populations it has initialised for their predicted results, before reporting these to the Gambler. It also initialises a ScoresDatabase.

⁴ An online service which collates several bookmakers' freely-available odds and provides the user with the highest odds for each outcome, in order to maximise potential profits.

Figure 2.1: Program structure

Table 2.2: Testing dataset format (tab-separated)

    %season identifier
    $home team   away team   winning team   odds   dd/mm/yyyy
    $home team   away team   winning team   odds   dd/mm/yyyy
    ...
    %next season identifier
    $home team   away team   winning team   odds   dd/mm/yyyy
    ...

Field types: season identifier (String); home team, away team, winning team (String); odds (double); date dd/mm/yyyy (String).
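A reader for the testing format in table 2.2 might look like the following. It makes three assumptions: the "%" and "$" characters are literal line prefixes marking season and fixture lines, fields are separated by single tabs, and odds are stored as decimals. The class and function names, and the example fixture, are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass
class OddsFixture:
    season: str
    home: str
    away: str
    winner: str
    odds: float
    played: date


def parse_testing_data(lines):
    """Parse '%season' lines and '$home<TAB>away<TAB>winner<TAB>odds<TAB>dd/mm/yyyy' lines."""
    season = None
    fixtures = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("%"):
            season = line[1:].strip()
        elif line.startswith("$"):
            home, away, winner, odds, played = line[1:].split("\t")
            fixtures.append(OddsFixture(
                season, home, away, winner, float(odds),
                datetime.strptime(played.strip(), "%d/%m/%Y").date()))
        else:
            raise ValueError(f"unrecognised line: {line!r}")
    return fixtures


if __name__ == "__main__":
    example = ["%2005/06", "$Home FC\tAway FC\tHome FC\t2.5\t11/02/2006"]
    print(parse_testing_data(example)[0])
```

The training format in table 2.3 has the same layout with different fields, so the same loop could be reused with a different record type.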

Table 2.3: Training dataset format (tab-separated)

    %season identifier
    $home team   away team   home score   home result   dd/mm/yyyy
    $home team   away team   home score   home result   dd/mm/yyyy
    ...
    %next season identifier
    $home team   away team   home score   home result   dd/mm/yyyy
    ...

Field types: season identifier (String); home team, away team (String); home score, home result (int); date dd/mm/yyyy (String).

2.3.3 Population

Each Population initialises a training set by reading items from the ScoresDatabase. It also initialises an array of Individuals and begins training the GA. Training is performed according to the following process:

1. Randomly initialise each Individual's representation vector.
2. Mutate each Individual.
3. Evaluate each Individual's fitness.
4. Select and cross over pairs of Individuals to create a new array of Individuals.

The two parent Individuals are chosen using roulette-wheel (fitness-proportional) selection, so that good, but not necessarily the fittest, Individuals are selected. This allows the GA to maintain population diversity. The Population selects its fittest Individual to be consulted for the prediction of games, and returns this Individual's predicted winner when the Predictor requests it on behalf of the Gambler.

2.3.4 Individual

Each Individual contains the vector representation of the four attributes for each team (home and away attack/defence strengths). It also holds a reference to an array of teams, passed to it by the ScoresDatabase, which it uses to predict the outcome of a match between any two teams passed to it. It also initialises its own mutation strength, as well as the ratios for home wins, away wins, and draws, used as the thresholds for determining the winner of a match between two teams. Each Individual provides its own method for random mutation of the representation vector according to the mutation strengths it contains.

2.3.5 ScoresDatabase

The ScoresDatabase reads the training data set into memory, creating an array of FixtureResult objects.
It also creates an array listing all unique teams, which is passed to the Populations so that they can initialise their Individuals correctly. The ScoresDatabase expects to find data in the tab-separated format shown in table 2.3.

2.3.6 FixtureResult

Each FixtureResult is a simple container, storing the names of the two teams which play a match, the date and season in which the match was played, and the final result of the match.
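The Population's training process, with roulette-wheel parent selection, might be sketched as below. Because fitness here is an error to be minimised, the roulette wheel needs some conversion to positive selection weights. The 1/(1 + f) transform used here is an assumption, as the report does not say how this was done. Elitism is omitted for brevity, and the names and toy objective are hypothetical.

```python
import random


def roulette_select(population, fitnesses, rng):
    """Fitness-proportional choice, where lower (error) fitness is better."""
    weights = [1.0 / (1.0 + f) for f in fitnesses]  # assumed transform
    return rng.choices(population, weights=weights, k=1)[0]


def train(init, evaluate, mutate, crossover, pop_size, epochs, rng):
    """Initialise, then per epoch: mutate, evaluate, select and cross over."""
    population = [init(rng) for _ in range(pop_size)]
    for _ in range(epochs):
        population = [mutate(ind, rng) for ind in population]
        fitnesses = [evaluate(ind) for ind in population]
        population = [
            crossover(roulette_select(population, fitnesses, rng),
                      roulette_select(population, fitnesses, rng), rng)
            for _ in range(pop_size)
        ]
    fitnesses = [evaluate(ind) for ind in population]
    best = min(range(pop_size), key=fitnesses.__getitem__)
    return population[best], fitnesses[best]


if __name__ == "__main__":
    # Toy objective: minimise the sum of squares of a 3-element vector.
    best, err = train(
        init=lambda r: [r.uniform(-1, 1) for _ in range(3)],
        evaluate=lambda v: sum(x * x for x in v),
        mutate=lambda v, r: [x + r.gauss(0.0, 0.05) for x in v],
        crossover=lambda a, b, r: [x if r.random() < 0.5 else y for x, y in zip(a, b)],
        pop_size=100, epochs=40, rng=random.Random(0))
    print(best, err)
```

The returned Individual plays the role of the fittest Individual that the Population consults for predictions.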

Chapter 3 Experiments performed

3.1 Preparing the testing data set

For the purposes of testing, the GA was trained repeatedly under varying experimental conditions on all the English Premier League data from the 1993/94 to 2004/05 seasons. The system was then tested on unseen data from the 2005/06 season, and from approximately the first half of the 2006/07 season, up to the Arsenal vs. Wigan game on 11th February 2007.

Odds for the 2005/06 part of the data were obtained from football-data.co.uk, and the BetBrain maximum odds were selected. It should be noted that BetBrain finds the best odds (i.e. the highest payout) across a number of bookmakers, so this will slightly boost any potential returns from the data set compared to using a single bookmaker's odds. Many gamblers use services such as BetBrain to attempt to maximise the payout for any one fixture. Odds for the 2006/07 season up to 11th February were obtained from betbase.info, again using the maximum odds across a number of bookmakers.

Since Wigan and Reading made their Premiership debuts in the 2005/06 and 2006/07 seasons respectively, they do not appear in the training data. All games involving these two teams were therefore removed from the relevant testing data. This avoids unfair advantages or disadvantages arising from the GA not being trained to predict fixtures involving them.

3.2 Naive betting

Naive betting on just the home team to win, the away team to win, or the outcome to be a draw will be used as a baseline for evaluating the GA's performance. All bets placed will be to the value of £10, from a starting balance of …. There are 559 games in the testing set. Percentage yield will be used as a measure of the algorithm's performance.
This is calculated as yield = profit / turnover, so a profit of £400 on a turnover (total amount staked) of £6000 gives a yield of 6.67%.

Bookmakers reduce the odds (and thus the payouts returned to customers) by a small amount in order to make their profit. This can be seen in the gentle downward slope of the average payout in fig 3.1. The average payout is roughly equivalent to the return which would be obtained by betting randomly on matches, choosing a home win approximately 50% of the time, an away win 25% of the time, and a draw 25% of the time. If the three odds (home win, away win, draw) for a given game are inverted and summed, fair odds would give implied probabilities summing to exactly 1. However, bookmakers' inverse odds sum to slightly more than 1, ensuring them a profit. For the data set used for testing, the summed inverse odds have a mean of … and a standard deviation of …, so the bookmakers are making a profit of approximately 5.1% on average.

As naively betting on the home team to win every time returns a profit on this testing set, this will be used as the benchmark for the algorithm's performance. The aim is to tune the algorithm so that it consistently returns a higher yield than naive betting. All experiments are performed over five runs, and the means and standard deviations of the results are recorded and plotted.
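The yield and inverse-odds calculations above, and the naive home-win benchmark, can be sketched as follows. The sketch assumes decimal odds, so a winning £10 bet at odds of 2.5 returns £25 including the stake. It also assumes that fixtures are given as (home team won?, odds) pairs taken from the testing format. The function names are hypothetical.

```python
def percentage_yield(profit, turnover):
    """Yield = profit / turnover, expressed as a percentage."""
    return 100.0 * profit / turnover


def inverse_odds_sum(home_odds, draw_odds, away_odds):
    """Sum of implied probabilities; fair odds would give exactly 1."""
    return 1.0 / home_odds + 1.0 / draw_odds + 1.0 / away_odds


def naive_home_betting(fixtures, stake=10.0):
    """Back the home team in every fixture; return (profit, turnover).

    fixtures: iterable of (home_won, winning_odds) pairs with decimal odds."""
    profit = turnover = 0.0
    for home_won, odds in fixtures:
        turnover += stake
        profit += stake * (odds - 1.0) if home_won else -stake
    return profit, turnover


if __name__ == "__main__":
    print(round(percentage_yield(400, 6000), 2))   # 6.67
    print(inverse_odds_sum(2.0, 4.0, 4.0))         # 1.0 (a fair book)
    profit, turnover = naive_home_betting([(True, 2.5), (False, 3.0), (False, 4.0)])
    print(profit, turnover)                        # -5.0 30.0
```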

Figure 3.1: Naive betting

3.3 Naive betting vs basic GA

3.3.1 Number of epochs

Hypothesis
In a genetic algorithm, training is performed over a number of generations, or epochs. Generally, the higher the number of epochs, the more accurately trained the model will be. However, training over too many epochs can lead to over-fitting, where the model learns to output correctly for just the data it has seen, and fails to generalise. Unnecessary extra epochs of training also greatly increase the time taken to train the model. If this time can be reduced without affecting the performance of the model, the user benefits whenever computation time is expensive.

Results
This is the basic GA as defined in the Design chapter. An arbitrary population size of 50 individuals was chosen as a best guess at a reasonable number. The number of epochs of training was then varied and the results plotted (fig 3.2) against the benchmark of naively betting on the home team to win.

Figure 3.2: Effect of varying number of epochs

Conclusion
None of the results are good when compared to naive betting. They do, however, perform slightly better in the early part of the testing set, where naive betting shows a steep drop in profits before recovering. As expected, a model trained over a low number of epochs does not perform well because it is poorly trained. Models trained over higher numbers of epochs also perform badly, due to over-fitting. 40 epochs of training seems to be a suitable figure.

3.3.2 Population size

Hypothesis
Haupt [9] states that DeJong [5] found that "a small population size improved initial performance while large population size improved long-term performance" and "a high mutation rate was good for off-line performance while low mutation rate was good for on-line performance". For the purposes of this problem, a high long-term level of performance is required, so a large population is likely to produce the best results, due to the extra diversity between individuals that it introduces. For later experimentation, it will be useful to note that a high mutation rate should be beneficial, as the algorithm is trained off-line at each epoch.

Results
The GA was trained over 40 epochs with varying population sizes (fig 3.3).

Figure 3.3: Effect of varying size of population

Conclusion
The most successful population size for the model is clearly 100 individuals. Too small a population (20 or 40 individuals) restricts the diversity within the population, and leads to fewer good individuals being discovered in each epoch of training. It is not immediately clear why a population size of 200 results in very poor performance, despite the large increase in fitness function evaluations it allows. One possibility is that good parent individuals must be selected from a much larger pool, and so become less likely to be picked in each epoch than if the pool were smaller.

3.4 Fitness scaling

Hypothesis
Kreinovich et al. [12] state that "empirical studies of genetic algorithms... showed that in the beginning, the... algorithm often leads to the appearance of a few superindividuals who dominate the selection process and therefore slow it down. At the end, when the population consists largely of the individuals x, for which J(x) is close to maximum, the competition is practically absent, which again slows down the process."

Goldberg [8] documents the procedure of fitness scaling: "Instead of taking fitness equal to the value of the objective function F(x) = J(x), we take F(x) = f(J(x)), where f(z) is some monotone function from real numbers into real numbers (called a scaling)" (summarised by Kreinovich et al. [12]). Forrest [6] introduces linear or simple fitness scaling, where $f(z) = z - b$ and $b$ is the fitness of the worst individual, computed for each generation. This removes a constant amount of fitness from each individual, so that only the remaining differences (which now appear greater) are used to rank the individuals. Another form of fitness scaling is power scaling, introduced by Gillies [7], where $f(z) = z^k$ and $k > 0$ is a constant; for this experiment, a standard value of $k = 3$ will be used. Finally, the fitness can be scaled exponentially, $f(z) = \exp(z)$, which has a similar but more drastic effect than power scaling. A further schema combines all three of these concepts, applying linear scaling to each individual first, followed by power scaling and then exponential scaling. It is hypothesised that fitness scaling will improve the resulting performance of the model over the given number of runs.

Results
These three fitness scaling schemas were implemented and tested, together with the combined schema, over 40 epochs of training with 100 individuals in the population (fig 3.4).

Figure 3.4: Different fitness scaling schemes

Conclusion
Fitness scaling appears to have no beneficial effect when compared to runs of the algorithm without scaling (compare fig 3.3).
This lack of benefit is unexpected, since fitness scaling is generally expected to improve the performance of a genetic algorithm. All three individual schemas perform similarly badly; however, the combined fitness scaling schema shows a marked improvement over each of them.

3.5 Self-adaptation

3.5.1 Self-adapting home/away win ratio

Hypothesis
The home/away win ratio is very important in determining the prediction that each individual makes. Although the current values were justified in section 2.1.2, there is no guarantee that they are precisely correct for every training set. It would therefore be beneficial for the genetic algorithm to adapt the ratio for each individual as the algorithm runs, with the success of these adaptations feeding into the fitness evaluation. Self-adaptation in this form was introduced by Schwefel [17] and developed further by Bäck and Schwefel [1]. In this simple schema, self-adaptation is achieved by adding a Gaussian-distributed random number to the mutation strength $\eta$ of each individual on each iteration:

$\eta_i' = \eta_i + N(0,1)$

where $N(0,1)$ is a Gaussian-distributed number with mean 0 and standard deviation 1.

Results
Different starting values for the mutation strength were tested for self-adaptation of the home/away win ratio. The experiment was run over 40 epochs of training with 100 individuals in the population. Fitness scaling was switched off except in the final set of runs, which used the combined schema discussed previously. The results are plotted in fig 3.5.

Conclusion
Self-adaptation of the home/away win ratio is clearly very successful, particularly with a higher initial self-adaptation strength of 1.0.
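A minimal sketch of the simple additive schema for a single real-valued win ratio, assuming each individual carries its ratio together with its own strength $\eta$. The section gives only the update $\eta_i' = \eta_i + N(0,1)$; mutating the ratio by a Gaussian step scaled by the updated $\eta$ is an assumption, as are the class and function names.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Individual:
    win_ratio: float  # hypothetical home/away win ratio gene
    eta: float        # this individual's self-adapted mutation strength


def self_adapt(parent: Individual, rng: random.Random) -> Individual:
    """Simple schema: eta' = eta + N(0,1), then step the ratio by eta' * N(0,1)."""
    new_eta = parent.eta + rng.gauss(0.0, 1.0)
    # N(0,1) is symmetric, so a negative eta' gives the same step distribution as |eta'|.
    new_ratio = parent.win_ratio + new_eta * rng.gauss(0.0, 1.0)
    return Individual(win_ratio=new_ratio, eta=new_eta)


rng = random.Random(42)
child = self_adapt(Individual(win_ratio=1.5, eta=1.0), rng)
print(child)
```

Because the strength is inherited along with the ratio, offspring whose $\eta$ happened to produce a fitter ratio survive selection and carry that $\eta$ with them, which is how the strength adapts without being scored directly.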
When combined fitness scaling was used, the mean results improved only very slightly, but the standard deviation was greatly reduced, indicating that the results with fitness scaling are more consistent than those without.

3.5.2 Self-adapting bitstring and home/away win ratio mutation rate

Hypothesis
In addition to self-adapting the mutation of the home/away win ratio, which is a real-valued optimisation problem, it is possible to self-adapt the mutation rate of the bitstring itself. In this case $\eta$ is again updated as $\eta_i' = \eta_i + N(0,1)$, where $N(0,1)$ is a Gaussian-distributed number with mean 0 and standard deviation 1. Runs of the experiment were recorded with various combinations of bitstring and home/away win ratio self-adaptation (fig 3.6). In the first experiment the same $N(0,1)$ sample is used to adapt both the win ratio and the bitstring mutation; in the final experiment $N(0,1)$ is generated separately for each.

Results
The self-adapting mutation strength of each individual was initialised at 1.0, and the algorithm was trained over 40 epochs with 100 individuals in the population, using the combined fitness scaling schema. The results are plotted in fig 3.6.

Conclusion
The results are all similar, but the best were obtained when separate self-adaptation rates were allowed for bitstring mutation and win ratio mutation, albeit with a higher standard deviation. Forcing the algorithm to use the same self-adaptation rate for both produced the least successful results.

3.5.3 Self-adaptation (complex schema)

Hypothesis
Bäck and Schwefel [1] propose an alternative, more complex self-adaptation schema:

$\eta_i'(j) = \eta_i(j)\,\exp\big(\tau' N(0,1) + \tau N_j(0,1)\big)$

Figure 3.5: Self-adaptation strengths for the home/away win ratio

Figure 3.6: Self-adapting win ratio and bitstring mutation

"The factors $\tau$ and $\tau'$... are rather robust exogenous parameters", which Schwefel [17] suggests setting as follows:

$\tau \propto \left(\sqrt{2\sqrt{n}}\right)^{-1}, \qquad \tau' \propto \left(\sqrt{2n}\right)^{-1}$

where n is the number of individuals in the population, $N_j(0,1)$ is normally distributed and generated for each individual j, and $N(0,1)$ is normally distributed and generated once for the whole population.

Results
With 40 epochs of training, 100 individuals in the population and combined fitness scaling, the results were recorded and plotted in fig 3.7.
Figure 3.7: Self-adapting mutation strengths (complex schema)

Conclusion
The results of the simple and complex schemas are almost identical. The complex schema shows a slight improvement in the mean over the simple schema (compare fig 3.6), and an initial self-adaptation strength of 1.0 has again proved the most successful, so both will be used for future experiments.
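A sketch of the complex (log-normal) schema under stated assumptions: the proportionality constants for $\tau$ and $\tau'$ are taken as 1, n is the population size as defined above, the common sample $N(0,1)$ is drawn once per call for the whole population, and each individual is assumed to hold a list of strengths (for example one for the bitstring and one for the win ratio) with a fresh $N_j(0,1)$ drawn for each entry. The function names are hypothetical.

```python
import math
import random


def tau_factors(n):
    """Schwefel's suggested factors, taking the proportionality constants as 1."""
    tau = 1.0 / math.sqrt(2.0 * math.sqrt(n))
    tau_prime = 1.0 / math.sqrt(2.0 * n)
    return tau, tau_prime


def adapt_population(strengths, rng):
    """Apply eta'(j) = eta(j) * exp(tau' * N(0,1) + tau * N_j(0,1)) to every strength.

    strengths: one list of positive mutation strengths per individual.
    """
    n = len(strengths)  # the report takes n to be the number of individuals
    tau, tau_prime = tau_factors(n)
    common = tau_prime * rng.gauss(0.0, 1.0)  # N(0,1): one draw for the whole population
    return [
        [eta * math.exp(common + tau * rng.gauss(0.0, 1.0))  # fresh N_j(0,1) per entry
         for eta in individual]
        for individual in strengths
    ]


population = [[1.0, 1.0] for _ in range(100)]  # initial strength 1.0, as in the experiments
adapted = adapt_population(population, random.Random(1))
print(min(s for ind in adapted for s in ind) > 0)  # True: strengths stay positive
```

Because the update multiplies by an exponential factor, a positive strength can never become negative, unlike under the additive rule of section 3.5.1.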

3.5.4 Bounded self-adapting mutation strengths

Hypothesis
Observation of the evolved mutation strengths showed that they often reached very low, very high or even negative values. This was not considered beneficial to the mutation of the population, so each self-adaptation strength $x$ was bounded as follows:

$\frac{1}{n}\,i < x < n\,i$

where n is a real value and i is the initial mutation strength of the individual. It is hypothesised that keeping the self-adaptation rate within reasonable bounds will improve the overall fitness of the population, and therefore the performance of the model after training.

Results
The model was trained over 40 epochs with 100 individuals in the population, using combined fitness scaling and the complex self-adaptation schema with an initial self-adaptation strength of 1.0. The results are plotted in fig 3.8.
Figure 3.8: Bounded self-adapting mutation strengths (complex schema)

Conclusion
Bounding the self-adaptation strengths makes very little difference compared with fig 3.7. However, it is interesting to note that with a bounding value of n = 4 the standard deviation of the results is zero. This indicates that bounding the self-adaptation rate stabilises the model's output and makes it more consistent. It is also worth noting that the results from the model almost exactly follow the home-wins betting line. It seems, therefore, that the model has learnt that the easiest way to obtain good performance is simply to bias itself towards always betting on home wins. There may be an evolutionary plateau that has to be overcome before results can improve on naive home-win betting.

3.6 Fast EP

Hypothesis
Yao et al. [21] introduce a technique known as fast evolutionary programming (FEP). They claim that this technique "... is very good at search in a large neighborhood while CEP [Classical Evolutionary Programming] is better at search in a small local neighborhood." The search space in this project is a high-dimensional real-valued space, so it is hypothesised that FEP will bring benefits over CEP through its ability to make larger jumps out of any local optima it encounters. Cauchy mutation was implemented as follows:

$\eta_i' = \eta_i + C(0,1)$

where $C(0,1)$ is a Cauchy-distributed number with location 0 and scale 1. Cauchy numbers were generated as $a + b\tan(\pi x)$, where a is the location (median), b is the scale parameter (the Cauchy distribution has no finite mean or standard deviation), and x is a uniformly distributed random number with $-0.5 < x < 0.5$.

Results
Using Cauchy mutation with differing initial self-adaptation strengths, the algorithm was run over 40 epochs of training with 100 individuals in the population, combined fitness scaling, the complex self-adaptation schema and a bounding value of n = 4. The results are plotted in fig 3.9.

Conclusion
The plots show almost no difference between the initial self-adaptation strengths, and no significant improvement over Gaussian mutation. These results are very disappointing, although the standard deviation has again been reduced. So far the model seems incapable of breaking free of betting on the home win for every fixture, although, as the low standard deviation shows, it has learnt to do this very well.

3.7 Improved Fast EP

Hypothesis
Yao et al. [21] state that "generally, Cauchy mutation performs better when the current search point is far away from the global minimum, while Gaussian mutation is better at finding a local optimum in a good region. It would be ideal if Cauchy mutation is used when search points are far away from the global optimum and Gaussian mutation is adopted when search points are in the neighborhood of the global optimum. Unfortunately, the global optimum is usually unknown in practice, making the ideal switch from Cauchy to Gaussian mutation very difficult... IFEP generates two offspring from each parent, one by Cauchy mutation and the other by Gaussian. The better one is then chosen as the offspring."

Results
Using IFEP mutation with an initial mutation strength of 1.0, the algorithm was run over 40 epochs of training with 100 individuals in the population, combined fitness scaling, the complex self-adaptation schema and a bounding value of n = 4. The results for IFEP compared with classical (Gaussian) EP are plotted in fig 3.10. In addition, the numbers of Gaussian- and Cauchy-mutated individuals at each epoch were plotted over 15 runs in fig 3.11. Yao et al. [21] state that IFEP tends to make use of Cauchy-generated individuals early in the search, when large mutation step sizes are beneficial, before switching to predominantly Gaussian-generated individuals closer to the global optimum. The numbers of Gaussian- and Cauchy-generated individuals were plotted again over 500 epochs to observe any differences in the IFEP selection scheme across longer runs (fig 3.12).
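The offspring step of IFEP can be sketched as follows, combining the pieces described in sections 3.5.3 to 3.7: log-normal strength updates (with n taken as the population size and the common sample drawn once per population, as the report describes), the bound $\frac{1}{n}\,i < x < n\,i$ with i = 1.0 and a bounding value of 4, Cauchy samples built from $a + b\tan(\pi x)$, and, for each parent, one Gaussian and one Cauchy child of which the fitter is kept. It also counts which kind of child was kept, the quantity plotted in figs 3.11 and 3.12. The real-valued genome, the toy fitness function, and clamping out-of-range strengths to the bounds are assumptions, and survivor selection between parents and offspring is omitted; this is not the report's code.

```python
import math
import random


def cauchy(rng, a=0.0, b=1.0):
    """Cauchy sample a + b*tan(pi*x), x uniform on [-0.5, 0.5); a = location, b = scale."""
    x = rng.random() - 0.5
    return a + b * math.tan(math.pi * x)


def clamp_strength(eta, initial=1.0, bound_factor=4.0):
    """Keep a strength within [initial / bound_factor, initial * bound_factor] (assumed clamping)."""
    return min(max(eta, initial / bound_factor), initial * bound_factor)


def ifep_offspring(population, fitness, rng, initial=1.0, bound_factor=4.0):
    """Create one IFEP offspring per (genes, strengths) parent.

    Returns the offspring and a count of how many were Gaussian or Cauchy children.
    """
    n = len(population)
    tau = 1.0 / math.sqrt(2.0 * math.sqrt(n))
    tau_prime = 1.0 / math.sqrt(2.0 * n)
    common = tau_prime * rng.gauss(0.0, 1.0)  # one draw for the whole population
    offspring, counts = [], {"gaussian": 0, "cauchy": 0}
    for genes, etas in population:
        new_etas = [clamp_strength(e * math.exp(common + tau * rng.gauss(0.0, 1.0)),
                                   initial, bound_factor) for e in etas]
        gaussian_child = [g + e * rng.gauss(0.0, 1.0) for g, e in zip(genes, new_etas)]
        cauchy_child = [g + e * cauchy(rng) for g, e in zip(genes, new_etas)]
        # Keep the fitter of the two children (ties go to the Gaussian child).
        if fitness(gaussian_child) >= fitness(cauchy_child):
            offspring.append((gaussian_child, new_etas))
            counts["gaussian"] += 1
        else:
            offspring.append((cauchy_child, new_etas))
            counts["cauchy"] += 1
    return offspring, counts


def toy_fitness(genes):
    """Hypothetical stand-in fitness: higher is better, maximised at the origin."""
    return -sum(g * g for g in genes)


rng = random.Random(3)
parents = [([rng.uniform(-5, 5) for _ in range(3)], [1.0] * 3) for _ in range(10)]
children, counts = ifep_offspring(parents, toy_fitness, rng)
print(counts)
```

Recording `counts` at every epoch gives the per-epoch Gaussian versus Cauchy usage that the report compares against the behaviour described by Yao et al.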

Figure 3.9: Cauchy-based mutation and self-adaptation

Figure 3.10: Improved Fast-EP mutation

Figure 3.11: Utilisation of Gaussian vs Cauchy mutation operators (40 epochs)
Figure 3.12: Utilisation of Gaussian vs Cauchy mutation operators (500 epochs)

Conclusion
Both classical (Gaussian) EP and IFEP produce exactly the same results, following the home-wins profit line exactly. The standard deviation is zero for both, indicating that both methods produce stable and consistent results over a small number of runs (i.e. the five recorded for this experiment). It is now clear that some evolutionary plateau prevents the model from improving on the home-wins betting strategy on which it has successfully converged. The number of Cauchy-generated individuals chosen should decrease as the GA approaches the optimum, as more Gaussian-generated individuals are chosen instead. However, neither fig 3.11 nor fig 3.12 shows such a pattern: the number of Gaussian-generated individuals chosen at each epoch consistently exceeds the number of Cauchy-generated individuals. According to Liang et al. [14], Gaussian mutation only takes precedence when the algorithm is near the global optimum, which suggests that in the current representation the optimum may be very flat and ill-defined.

3.8 Points given for a correct prediction

Hypothesis
The fitness evaluation strategy detailed earlier in this report gives three points for a predicted home win, by analogy with the football league, with one point for a predicted draw and no points for a predicted home loss. However, there is no strict requirement to follow this analogy, as fitness is calculated simply as a function of the total error between the predictions and the results. Using two points for a predicted win and one for a draw could have a similar effect.

Results
Using IFEP mutation with an initial mutation strength of 1.0, the algorithm was run over 40 epochs of training with 100 individuals in the population, combined fitness scaling, the complex self-adaptation schema and a bounding value of n = 4.
The results for granting different numbers of points for a win are plotted in fig 3.13.

Conclusion
Granting two points for a correct prediction badly harms the ability of the GA to find fit individuals at each generation, as shown by the much lower mean line and the very large standard deviation. Conversely, granting four or more points for a correct prediction, while keeping a draw at one point and an incorrect prediction at zero points, leads to exactly the same model output as using three points. This suggests that giving more than two points for a correct prediction biases the model towards predicting home wins. Although techniques such as fitness scaling and self-adaptation produce some (admittedly sometimes minor) positive effects, this plateau in the evolved model will need to be overcome in some more significant way before the model can consistently do better than predicting a home win every time.

3.9 Experts

Multiple oracles (ensemble machine)

Number of oracles

Hypothesis
If each individual in the population is regarded as a predictor, a number of the best evolved predictors can be nominated as oracles to be consulted by the Gambler. So far only one oracle (the fittest individual) has been consulted, but by consulting an ensemble of differently trained oracles and taking a consensus of their predictions, it may be possible to gain an advantage from the greater variety among the predictors consulted.

Results
Using IFEP mutation with an initial mutation strength of 1.0, the algorithm was run over 40 epochs of training with 100 individuals in the population, combined fitness scaling, the complex self-adaptation schema and a bounding value of n = 4. The results for different numbers of oracles are plotted in fig 3.14.
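Consulting the k fittest individuals as oracles might look like the sketch below. The section says only that a consensus of the oracles' predictions is taken, so the simple majority vote, the tie-break (the fittest oracle's choice among the tied leaders) and the "H"/"D"/"A" outcome labels are assumptions; `predict` is a hypothetical stand-in for the Predictor.

```python
from collections import Counter


def consensus_prediction(population, fitness, predict, fixture, k=5):
    """Majority vote of the k fittest individuals on one fixture.

    Ties are broken in favour of the fittest oracle whose vote is among the leaders.
    """
    oracles = sorted(population, key=fitness, reverse=True)[:k]
    votes = [predict(oracle, fixture) for oracle in oracles]  # ordered fittest first
    counts = Counter(votes)
    top = max(counts.values())
    leaders = {outcome for outcome, c in counts.items() if c == top}
    return next(v for v in votes if v in leaders)


# Hypothetical individuals: a fitness score and a fixed prediction.
population = [
    {"fitness": 5, "pick": "H"}, {"fitness": 4, "pick": "A"}, {"fitness": 3, "pick": "A"},
    {"fitness": 2, "pick": "H"}, {"fitness": 1, "pick": "D"},
]


def fit(ind):
    return ind["fitness"]


def pick(ind, fixture):
    return ind["pick"]


print(consensus_prediction(population, fit, pick, fixture=None, k=1))  # H
print(consensus_prediction(population, fit, pick, fixture=None, k=3))  # A
```

With k = 1 this reduces to consulting only the fittest individual, which is the single-oracle setup used in the earlier experiments.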

Figure 3.13: Points given by an individual if the home team is predicted to win

Figure 3.14: Using a number of oracles

Conclusion
Clearly, with more than one oracle the final results are much worse than the previously obtained results, and worse even than naive betting on home wins. However, during the critical dip in betting returns at the start of the testing data set, the additional oracles enabled the system to perform consistently better on average, cutting the losses dramatically. It appears that, although multiple oracles perform worse towards the end of the testing set, they perform better in the hard-to-predict early period because of the variety in the predictions they produce. The results for five oracles illustrate this: they give the best performance in the early part of the test set, but by far the worst performance by the end.

Varying number of oracles dynamically

Hypothesis
A useful strategy may be to use multiple oracles (e.g. the five shown previously to be successful early in the test set) for the first part of the season, and to implement a gating scheme that switches to a single oracle (the fittest) once the returns from multiple-oracle prediction start to fall, alternating between the two as the performance of the model demands.

Results
Using IFEP mutation with an initial mutation strength of 1.0, the algorithm was run over 40 epochs of training with 100 individuals in the population, combined fitness scaling, the complex self-adaptation schema and a bounding value of n = 4. The oracles were alternated after every x consecutive losses, and the results for different values of x are plotted in fig 3.15.
Figure 3.15: Switching number of oracles after x consecutive losses
In addition, the oracles were alternated after every x losses in y bets, rather than after every x consecutive losses, to give more flexibility in the conditions under which they are alternated. y was held at 20, and x was varied to find the best ratio $x/y$. The results were plotted in fig
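The gating rule can be sketched as a small class that alternates between the ensemble and the single fittest oracle once x of the last y bets have been lost; with y equal to x this reduces to the x-consecutive-losses rule. Starting with the five-oracle ensemble follows the hypothesis above. Clearing the loss history after each switch is an assumption: without it, the window would still hold at least x losses, so the very next bet could trigger another switch. The class and parameter names are hypothetical.

```python
from collections import deque


class OracleGate:
    """Alternate between an ensemble and a single oracle after x losses in the last y bets.

    With y == x this is the 'x consecutive losses' rule.
    """

    def __init__(self, x, y, ensemble_size=5):
        if not 0 < x <= y:
            raise ValueError("need 0 < x <= y")
        self.x = x
        self.recent = deque(maxlen=y)  # True means the bet was lost
        self.sizes = (ensemble_size, 1)
        self.index = 0  # begin the season with the ensemble

    @property
    def oracles(self):
        """Number of oracles the Gambler should currently consult."""
        return self.sizes[self.index]

    def record(self, lost):
        """Record one bet; switch oracle count once the loss threshold is reached."""
        self.recent.append(lost)
        if sum(self.recent) >= self.x:
            self.index = 1 - self.index
            self.recent.clear()  # assumed: start a fresh window after each switch


gate = OracleGate(x=4, y=20)  # y = 20 as in the experiment; x = 4 is illustrative
for lost in [True, False, True, True, False, True]:
    gate.record(lost)
print(gate.oracles)  # 1
```

Sweeping x for a fixed y = 20 with this class would correspond to the search for the best ratio $x/y$ described above.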


More information

Basketball field goal percentage prediction model research and application based on BP neural network

Basketball field goal percentage prediction model research and application based on BP neural network ISSN : 0974-7435 Volume 10 Issue 4 BTAIJ, 10(4), 2014 [819-823] Basketball field goal percentage prediction model research and application based on BP neural network Jijun Guo Department of Physical Education,

More information

Movement Options. Pairs

Movement Options. Pairs Movement Options Pairs The purposes of a movement are to maximise the number of different opponents played against and the number of score comparisons made with opponents who will appear in the same part

More information

Journal of Quantitative Analysis in Sports

Journal of Quantitative Analysis in Sports Journal of Quantitative Analysis in Sports Volume 1, Issue 1 2005 Article 5 Determinants of Success in the Olympic Decathlon: Some Statistical Evidence Ian Christopher Kenny Dan Sprevak Craig Sharp Colin

More information

Wildlife Ad Awareness & Attitudes Survey 2015

Wildlife Ad Awareness & Attitudes Survey 2015 Wildlife Ad Awareness & Attitudes Survey 2015 Contents Executive Summary 3 Key Findings: 2015 Survey 8 Comparison between 2014 and 2015 Findings 27 Methodology Appendix 41 2 Executive Summary and Key Observations

More information

Neural Networks II. Chen Gao. Virginia Tech Spring 2019 ECE-5424G / CS-5824

Neural Networks II. Chen Gao. Virginia Tech Spring 2019 ECE-5424G / CS-5824 Neural Networks II Chen Gao ECE-5424G / CS-5824 Virginia Tech Spring 2019 Neural Networks Origins: Algorithms that try to mimic the brain. What is this? A single neuron in the brain Input Output Slide

More information

Should bonus points be included in the Six Nations Championship?

Should bonus points be included in the Six Nations Championship? Should bonus points be included in the Six Nations Championship? Niven Winchester Joint Program on the Science and Policy of Global Change Massachusetts Institute of Technology 77 Massachusetts Avenue,

More information

HOW TO READ FORM LAB S GAME NOTES

HOW TO READ FORM LAB S GAME NOTES HOW TO READ FORM LAB S GAME NOTES Form Lab s Game Notes are a quick and easy way to find bets for your short list each week. To get the best out of the Game Notes and find these most profitable angles,

More information

Lesson 14: Games of Chance and Expected Value

Lesson 14: Games of Chance and Expected Value Student Outcomes Students use expected payoff to compare strategies for a simple game of chance. Lesson Notes This lesson uses examples from the previous lesson as well as some new examples that expand

More information

SAMPLE MAT Proceedings of the 10th International Conference on Stability of Ships

SAMPLE MAT Proceedings of the 10th International Conference on Stability of Ships and Ocean Vehicles 1 Application of Dynamic V-Lines to Naval Vessels Matthew Heywood, BMT Defence Services Ltd, mheywood@bm tdsl.co.uk David Smith, UK Ministry of Defence, DESSESea-ShipStab1@mod.uk ABSTRACT

More information

intended velocity ( u k arm movements

intended velocity ( u k arm movements Fig. A Complete Brain-Machine Interface B Human Subjects Closed-Loop Simulator ensemble action potentials (n k ) ensemble action potentials (n k ) primary motor cortex simulated primary motor cortex neuroprosthetic

More information

Introduction Definition of decision-making: the capacity of the player to execute an action following some conscious tactical or strategical choice.

Introduction Definition of decision-making: the capacity of the player to execute an action following some conscious tactical or strategical choice. Decision Making in Rugby Here is a paper by Pierre Villepreux. He presented it at the 1993 Asian Pacific Congress in Calgary. Eleven years have passed, but this remains one of the benchmark papers on this

More information

Analyses of the Scoring of Writing Essays For the Pennsylvania System of Student Assessment

Analyses of the Scoring of Writing Essays For the Pennsylvania System of Student Assessment Analyses of the Scoring of Writing Essays For the Pennsylvania System of Student Assessment Richard Hill The National Center for the Improvement of Educational Assessment, Inc. April 4, 2001 Revised--August

More information

THERMALLING TECHNIQUES. Preface

THERMALLING TECHNIQUES. Preface DRAFT THERMALLING TECHNIQUES Preface The following thermalling techniques document is provided to assist Instructors, Coaches and Students as a training aid in the development of good soaring skills. Instructors

More information

Application Block Library Fan Control Optimization

Application Block Library Fan Control Optimization Application Block Library Fan Control Optimization About This Document This document gives general description and guidelines for wide range fan operation optimisation. Optimisation of the fan operation

More information

Comparison of Wind Turbines Regarding their Energy Generation.

Comparison of Wind Turbines Regarding their Energy Generation. Comparison of Wind Turbines Regarding their Energy Generation. P. Mutschler, Member, EEE, R. Hoffmann Department of Power Electronics and Control of Drives Darmstadt University of Technology Landgraf-Georg-Str.

More information

B. AA228/CS238 Component

B. AA228/CS238 Component Abstract Two supervised learning methods, one employing logistic classification and another employing an artificial neural network, are used to predict the outcome of baseball postseason series, given

More information

Tournament Operation Procedures

Tournament Operation Procedures Tournament Operation Procedures Date of last revision: NOTE: In the case of a discrepancy between the content of the English-language version of this document and that of any other version of this document,

More information

The final set in a tennis match: four years at Wimbledon 1

The final set in a tennis match: four years at Wimbledon 1 Published as: Journal of Applied Statistics, Vol. 26, No. 4, 1999, 461-468. The final set in a tennis match: four years at Wimbledon 1 Jan R. Magnus, CentER, Tilburg University, the Netherlands and Franc

More information

2 When Some or All Labels are Missing: The EM Algorithm

2 When Some or All Labels are Missing: The EM Algorithm CS769 Spring Advanced Natural Language Processing The EM Algorithm Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu Given labeled examples (x, y ),..., (x l, y l ), one can build a classifier. If in addition

More information

TECHNICAL STUDY 2 with ProZone

TECHNICAL STUDY 2 with ProZone A comparative performance analysis of games played on artificial (Football Turf) and grass from the evaluation of UEFA Champions League and UEFA Cup. Introduction Following on from our initial technical

More information

A New Piston Gauge to Improve the Definition of High Gas Pressure and to Facilitate the Gas to Oil Transition in a Pressure Calibration Chain

A New Piston Gauge to Improve the Definition of High Gas Pressure and to Facilitate the Gas to Oil Transition in a Pressure Calibration Chain A New iston Gauge to Improve the Definition of High Gas ressure and to Facilitate the Gas to Oil Transition in a ressure Calibration Chain ierre Delajoud, Martin Girard DH Instruments, Inc. 4765 East Beautiful

More information

Midas Method Betting Software 3.0 Instruction Manual

Midas Method Betting Software 3.0 Instruction Manual Midas Method Betting Software 3.0 Instruction Manual Congratulations on getting access the Midas Method Betting software, this manual is designed to teach you how to operate the betting software. System

More information

EEC 686/785 Modeling & Performance Evaluation of Computer Systems. Lecture 6. Wenbing Zhao. Department of Electrical and Computer Engineering

EEC 686/785 Modeling & Performance Evaluation of Computer Systems. Lecture 6. Wenbing Zhao. Department of Electrical and Computer Engineering EEC 686/785 Modeling & Performance Evaluation of Computer Systems Lecture 6 Department of Electrical and Computer Engineering Cleveland State University wenbing@ieee.org Outline 2 Review of lecture 5 The

More information

March Madness Basketball Tournament

March Madness Basketball Tournament March Madness Basketball Tournament Math Project COMMON Core Aligned Decimals, Fractions, Percents, Probability, Rates, Algebra, Word Problems, and more! To Use: -Print out all the worksheets. -Introduce

More information

March Madness Basketball Tournament

March Madness Basketball Tournament March Madness Basketball Tournament Math Project COMMON Core Aligned Decimals, Fractions, Percents, Probability, Rates, Algebra, Word Problems, and more! To Use: -Print out all the worksheets. -Introduce

More information

Stats 2002: Probabilities for Wins and Losses of Online Gambling

Stats 2002: Probabilities for Wins and Losses of Online Gambling Abstract: Jennifer Mateja Andrea Scisinger Lindsay Lacher Stats 2002: Probabilities for Wins and Losses of Online Gambling The objective of this experiment is to determine whether online gambling is a

More information

Power-law distribution in Japanese racetrack betting

Power-law distribution in Japanese racetrack betting Power-law distribution in Japanese racetrack betting Takashi Ichinomiya Nonlinear Studies and Computation, Research Institute for Electronic Science, Hokkaido University, Sapporo 060-0812, Japan. Abstract

More information

TRIP GENERATION RATES FOR SOUTH AFRICAN GOLF CLUBS AND ESTATES

TRIP GENERATION RATES FOR SOUTH AFRICAN GOLF CLUBS AND ESTATES TRIP GENERATION RATES FOR SOUTH AFRICAN GOLF CLUBS AND ESTATES M M Withers and C J Bester Department of Civil Engineering University of Stellenbosch, Private Bag X1, Matieland, 7602 ABSTRACT There has

More information

Pokémon Organized Play Tournament Operation Procedures

Pokémon Organized Play Tournament Operation Procedures Pokémon Organized Play Tournament Operation Procedures Revised: September 20, 2012 Table of Contents 1. Introduction...3 2. Pre-Tournament Announcements...3 3. Approved Tournament Styles...3 3.1. Swiss...3

More information

Outline. Terminology. EEC 686/785 Modeling & Performance Evaluation of Computer Systems. Lecture 6. Steps in Capacity Planning and Management

Outline. Terminology. EEC 686/785 Modeling & Performance Evaluation of Computer Systems. Lecture 6. Steps in Capacity Planning and Management EEC 686/785 Modeling & Performance Evaluation of Computer Systems Lecture 6 Department of Electrical and Computer Engineering Cleveland State University wenbing@ieee.org Outline Review of lecture 5 The

More information

An examination of try scoring in rugby union: a review of international rugby statistics.

An examination of try scoring in rugby union: a review of international rugby statistics. An examination of try scoring in rugby union: a review of international rugby statistics. Peter Laird* and Ross Lorimer**. *4 Seton Place, Edinburgh, EH9 2JT. **66/5 Longstone Street, Edinburgh, EH14 2DA.

More information

Reduction of Speed Limit at Approaches to Railway Level Crossings in WA. Main Roads WA. Presenter - Brian Kidd

Reduction of Speed Limit at Approaches to Railway Level Crossings in WA. Main Roads WA. Presenter - Brian Kidd Australasian College of Road Safety Conference A Safe System: Making it Happen! Melbourne 1-2 September 2011 Reduction of Speed Limit at Approaches to Railway Level Crossings in WA Radalj T 1, Kidd B 1

More information

Evaluation of Regression Approaches for Predicting Yellow Perch (Perca flavescens) Recreational Harvest in Ohio Waters of Lake Erie

Evaluation of Regression Approaches for Predicting Yellow Perch (Perca flavescens) Recreational Harvest in Ohio Waters of Lake Erie Evaluation of Regression Approaches for Predicting Yellow Perch (Perca flavescens) Recreational Harvest in Ohio Waters of Lake Erie QFC Technical Report T2010-01 Prepared for: Ohio Department of Natural

More information

Neural Nets Using Backpropagation. Chris Marriott Ryan Shirley CJ Baker Thomas Tannahill

Neural Nets Using Backpropagation. Chris Marriott Ryan Shirley CJ Baker Thomas Tannahill Neural Nets Using Backpropagation Chris Marriott Ryan Shirley CJ Baker Thomas Tannahill Agenda Review of Neural Nets and Backpropagation Backpropagation: The Math Advantages and Disadvantages of Gradient

More information

Navigate to the golf data folder and make it your working directory. Load the data by typing

Navigate to the golf data folder and make it your working directory. Load the data by typing Golf Analysis 1.1 Introduction In a round, golfers have a number of choices to make. For a particular shot, is it better to use the longest club available to try to reach the green, or would it be better

More information

Introduction to Pattern Recognition

Introduction to Pattern Recognition Introduction to Pattern Recognition Jason Corso SUNY at Buffalo 19 January 2011 J. Corso (SUNY at Buffalo) Introduction to Pattern Recognition 19 January 2011 1 / 32 Examples of Pattern Recognition in

More information

steps to designing effective practice

steps to designing effective practice 22 23 steps to designing effective practice How do you decide what coaching practices to deliver? And what process do you follow when designing the practice? Here, FA Youth Coach Educator, Ben Bartlett,

More information

Revisiting the Hot Hand Theory with Free Throw Data in a Multivariate Framework

Revisiting the Hot Hand Theory with Free Throw Data in a Multivariate Framework Calhoun: The NPS Institutional Archive DSpace Repository Faculty and Researchers Faculty and Researchers Collection 2010 Revisiting the Hot Hand Theory with Free Throw Data in a Multivariate Framework

More information

2013 CIVL PLENARY ANNEX 24B SOFTWARE PROPOSALS

2013 CIVL PLENARY ANNEX 24B SOFTWARE PROPOSALS 2013 CIVL PLENARY ANNEX 24B SOFTWARE PROPOSALS In conjunction with the Hang Gliding, Paragliding and Aerobatic Subcommittees 1. Speed rank used to calculate time validity At the 2008 CIVL plenary, the

More information

ORF 201 Computer Methods in Problem Solving. Final Project: Dynamic Programming Optimal Sailing Strategies

ORF 201 Computer Methods in Problem Solving. Final Project: Dynamic Programming Optimal Sailing Strategies Princeton University Department of Operations Research and Financial Engineering ORF 201 Computer Methods in Problem Solving Final Project: Dynamic Programming Optimal Sailing Strategies Due 11:59 pm,

More information

A Novel Gear-shifting Strategy Used on Smart Bicycles

A Novel Gear-shifting Strategy Used on Smart Bicycles 2012 International Conference on Industrial and Intelligent Information (ICIII 2012) IPCSIT vol.31 (2012) (2012) IACSIT Press, Singapore A Novel Gear-shifting Strategy Used on Smart Bicycles Tsung-Yin

More information

Transformer fault diagnosis using Dissolved Gas Analysis technology and Bayesian networks

Transformer fault diagnosis using Dissolved Gas Analysis technology and Bayesian networks Proceedings of the 4th International Conference on Systems and Control, Sousse, Tunisia, April 28-30, 2015 TuCA.2 Transformer fault diagnosis using Dissolved Gas Analysis technology and Bayesian networks

More information

A REVIEW OF AGE ADJUSTMENT FOR MASTERS SWIMMERS

A REVIEW OF AGE ADJUSTMENT FOR MASTERS SWIMMERS A REVIEW OF ADJUSTMENT FOR MASTERS SWIMMERS Written by Alan Rowson Copyright 2013 Alan Rowson Last Saved on 28-Apr-13 page 1 of 10 INTRODUCTION In late 2011 and early 2012, in conjunction with Anthony

More information

Title: 4-Way-Stop Wait-Time Prediction Group members (1): David Held

Title: 4-Way-Stop Wait-Time Prediction Group members (1): David Held Title: 4-Way-Stop Wait-Time Prediction Group members (1): David Held As part of my research in Sebastian Thrun's autonomous driving team, my goal is to predict the wait-time for a car at a 4-way intersection.

More information

AGA Swiss McMahon Pairing Protocol Standards

AGA Swiss McMahon Pairing Protocol Standards AGA Swiss McMahon Pairing Protocol Standards Final Version 1: 2009-04-30 This document describes the Swiss McMahon pairing system used by the American Go Association (AGA). For questions related to user

More information

Optimal Weather Routing Using Ensemble Weather Forecasts

Optimal Weather Routing Using Ensemble Weather Forecasts Optimal Weather Routing Using Ensemble Weather Forecasts Asher Treby Department of Engineering Science University of Auckland New Zealand Abstract In the United States and the United Kingdom it is commonplace

More information

NFL Overtime-Is an Onside Kick Worth It?

NFL Overtime-Is an Onside Kick Worth It? Anthony Tsodikov NFL Overtime-Is an Onside Kick Worth It? It s the NFC championship and the 49ers are facing the Seahawks. The game has just gone into overtime and the Seahawks win the coin toss. The Seahawks

More information

The MACC Handicap System

The MACC Handicap System MACC Racing Technical Memo The MACC Handicap System Mike Sayers Overview of the MACC Handicap... 1 Racer Handicap Variability... 2 Racer Handicap Averages... 2 Expected Variations in Handicap... 2 MACC

More information