Genetic Algorithm Optimized Gravity Based RoboCup Soccer Team

Tory Harter
Advisor: Dr. Jon Denning
Taylor University
July 28, 2015

ABSTRACT

This report describes how we used genetic algorithms to optimize the coefficients used to calculate weights across the soccer field for Taylor University's gravity-strategy player. After running hundreds of generations of evolution, starting from completely randomized coefficients, we saw a steady improvement in the overall performance of the gravity player. Given these results, we believe that with fewer experimental design constraints we could improve Taylor University's current gravity-player coefficients.

1 Introduction

The RoboCup soccer simulator is a challenging AI problem. Competing autonomous players play against each other in a simulated soccer match. Most teams are made up of hand-designed players, which meticulously handle a wide variety of in-game scenarios. Other teams have designed RoboCup players using evolutionary algorithms such as genetic programming (de Klepper, 1999; Luke, Hohn, Farris, Jackson, & Hendler, 1998; Giuliani, 2012). Our team uses a hand-designed strategy, the gravity strategy, which assigns weight values to all points on the field; the player acts on the most heavily weighted point. These weights are calculated using two equations: a move equation and a kick equation. The equations take into account several factors that we believe are important when making a decision related to each equation's particular action. Each factor has a coefficient that scales its significance, and these coefficients were originally set by hand. For our project, which builds on the work of previous Taylor University RoboCup members, we wanted to use the optimization power of genetic algorithms, which function similarly to natural selection, favoring survival of the fittest, to fine-tune these coefficients and improve the previous members' gravity player.

We conducted our experiment from scratch, using randomized coefficient values for our initial generation of evolution. We ran 400 generations and ended up with a player that had improved over the completely randomized starting player, which shows that genetic algorithms are a viable approach to optimizing the gravity player. With this information, we believe we can move forward from this starting point and improve the base gravity player using genetic algorithms. Several improvements to the experiment have been considered and may provide more promising results in the future.

2 The RoboCup Soccer Simulation

Figure 1: Starting kick-off positions of the 2D RoboCup Simulation League.

RoboCup is an international competition, founded in 1997, whose goal is to promote robotics and AI research (Kitano, Asada, Kuniyoshi, Noda, & Osawa, 1995). There are several leagues in RoboCup: Small Size, Middle Size, Humanoid, Standard Platform, and Simulation. The virtual soccer match takes place in the Simulation League, where teams submit artificial agents to act as players in the soccer game. These autonomous agents then compete against other teams' AI players. Where the other leagues focus on robotics, the Simulation League focuses on AI, with an emphasis on teamwork and strategy. For our work on improving the previous RoboCup members' player, we focused on the 2D Simulation League, since our interest is in teamwork and strategy for autonomous agents rather than motor skills and player athleticism in robotics.

The simulation consists of two teams, each with eleven players. Matches last 6000 in-game cycles, with halftime at 3000 cycles; each cycle is 100 ms. As in normal soccer, a team scores a point when it kicks the ball into the opponent's goal, and the team with the most points at the end of the match wins. The two teams interface with a central soccer server, which controls the simulation and all processes within it. Players perform actions by sending messages to the server every cycle, and receive information about the match in messages sent from the server. Information sent by the server includes a general player-information message, covering things such as where the player is and how much stamina they have, and a vision message, describing what the player sees and where those seen entities are located on the field. Player actions sent to the server include moving to a location on the field, kicking the ball, tackling, dashing, communicating with nearby teammates, and so on. Our team uses the move, dash, and kick actions. Within the simulation there is an entity called the Coach, which can be used to manipulate players and objects in the match, as well as to control some aspects of the overall simulation. There is also a noise system that adds randomness to the simulation, making the games nondeterministic.

3 The Gravity Strategy

The gravity player was created by previous members of Taylor University's RoboCup team as a way for a player to dynamically adjust movement and kicking destinations without having to manually account for particular teammate/opponent field-position configurations. One motivation for this strategy was to solve the problem of players bunching up on the field. By taking into account the distance to teammates, players are more inclined to spread out across the field rather than bunch up.

Figure 2: The left image shows the clumping that the gravity strategy tries to remove. The right image shows a better configuration of player positions, spread out from each other.

The original creators of the gravity strategy also saw benefits for kicking. Since the strategy factors in teammates' and opponents' distances from a candidate kick target, gravity players can make fairly accurate passes, with few occurrences of an opponent intercepting.

For every cycle of the match, the 52.5x34-unit RoboCup field is divided into a 315x204 array whose elements are points on the field. Depending on the action the player is going to take, either moving or kicking the ball, we calculate a weight for each point in the array; each weight depends on that point's relation to the factors in the corresponding equation. After all the points have been assigned a weight, we find the maximum-weighted point and either move toward that point or kick the ball at it (see Figure 3).

Figure 3 depicts the weight values for a typical starting position for two teams. The dark blue areas are calculated at negative infinity: places where an opponent could get to the ball before the player or a teammate. The dark red area has the heaviest-weighted points and is where the player will move or kick. The black arrow originating from the player points to the location on the field with the largest gravity weight, which is where the player will either kick or move to. The figures are based on an individual from generation 369, the last of our 400 generations in which an individual scored a point. The coefficients used in the equations are those of the highest-scoring individual.

The gravity strategy uses two separate but similar equations to calculate the weights assigned to the points. Equation 1 is specific to player movement, and Equation 2 to kicking the ball. If the player is able to kick the ball, they evaluate the kick equation; otherwise, they evaluate the move equation. The two equations are evaluated independently, so the weights from one equation do not affect the weights from the other.
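To make the per-point weighting concrete, here is a minimal sketch of the grid evaluation and maximum-weight selection described above. The weight function shown is a stand-in with two illustrative factors (ball distance and teammate spacing), not the team's actual move or kick equation, and names such as `world`, `move_weight`, and `best_point` are hypothetical.

```python
import math

# Field is 52.5 x 34 units, discretized into a 315 x 204 grid of points.
GRID_W, GRID_H = 315, 204
FIELD_W, FIELD_H = 52.5, 34.0

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def move_weight(point, world, a1=0.5, a2=0.5):
    """Stand-in weight: prefer points near the ball but away from teammates."""
    d_ball = dist(point, world["ball"])
    d_mate = min(dist(point, m) for m in world["teammates"])
    return -a1 * d_ball + a2 * d_mate

def best_point(weight_fn, world):
    """Evaluate weight_fn at every grid point and return the heaviest one."""
    best, best_w = None, float("-inf")
    for i in range(GRID_W):
        for j in range(GRID_H):
            # Map grid indices to field coordinates.
            p = (i * FIELD_W / (GRID_W - 1), j * FIELD_H / (GRID_H - 1))
            w = weight_fn(p, world)
            if w > best_w:
                best, best_w = p, w
    return best

world = {"ball": (30.0, 17.0), "teammates": [(28.0, 15.0), (10.0, 20.0)]}
print(best_point(move_weight, world))  # point the player would move toward
```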

In Figure 3, the players are positioned in a sample kick-off location. For the kick equation, the player plans to kick the ball behind him, seemingly toward his own goal. This may be because the weighting for kicking toward teammates is much stronger than the weighting for kicking toward the opponent's goal.

Figure 3: Visualization of the kick and move equations given a sample kick-off location. The arrow indicates the highest-weighted point, where the player will move or kick to. Dark blue marks locations where either an opponent could reach the ball if we kicked there, or a teammate or opponent could reach the point before we did if we moved there.

If the player is going to perform a move action, then for each point in our field array we first check whether an opponent could reach the ball's location before we could reach that point. If so, we set the weight of that point to negative infinity, the lowest possible weight for that cycle. If not, we calculate the weight of that point with the move equation (see Table 1 for value descriptions):

Equation 1: Move

If the player is going to perform a kick action, then for each point in our field array not on the border of the field, we check whether the point is a place where a teammate could not get to the ball before an opponent could. If the point meets this criterion, we set that point's weight with the default equation; otherwise we set the weight with the main kick equation (see Table 1 for value descriptions):

Equation 2: Kick

Equations 1 and 2 were designed by our RoboCup team and take into account the factors we believe to be important for the action each pertains to. The alpha coefficient values are the focus of this experiment; they determine the importance, or weight, that a particular factor has on the overall action. Our gravity strategy can only be as good as Equations 1 and 2 allow, which raises the question of whether we are accounting for every factor we should, or possibly including factors that do not affect the overall performance of the gravity player.
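The guard logic around the two equations can be sketched as follows. Because Equations 1 and 2 appear as figures in the original report, `move_eq`, `kick_eq`, and `default_eq` are passed in as placeholders rather than implemented, and the distance-based reachability tests are crude proxies (the team's real checks presumably account for speeds and ball travel time); only the control flow mirrors the description above.

```python
import math

NEG_INF = float("-inf")

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

# Crude reachability proxy: whoever starts closest wins the race to the point.
def opponent_reaches_first(p, world):
    return min(dist(p, o) for o in world["opponents"]) < dist(p, world["player"])

def teammate_loses_race(p, world):
    return (min(dist(p, o) for o in world["opponents"])
            < min(dist(p, t) for t in world["teammates"]))

def move_weights(points, world, move_eq, coeffs):
    """Equation 1 guard: negative infinity wherever an opponent would beat
    us to the ball; otherwise the move equation's weight."""
    return {p: NEG_INF if opponent_reaches_first(p, world)
               else move_eq(p, world, coeffs)
            for p in points}

def kick_weights(points, world, kick_eq, default_eq, coeffs):
    """Equation 2 guard (caller supplies the non-border points): the default
    equation where a teammate cannot beat an opponent to the ball, otherwise
    the main kick equation."""
    return {p: default_eq(p, world, coeffs) if teammate_loses_race(p, world)
               else kick_eq(p, world, coeffs)
            for p in points}
```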

- The gravity weight value (Z) of a specific point.
- Coefficient constants (the alpha values) controlled by the genetic algorithm.
- Distance to the nearest edge of the soccer field.
- Distance to the ball.
- Distance to the player.
- Distance to the opponent's goal.
- Distance to the closest opponent.
- Distance to the closest teammate.
- The y-axis coordinate of the player, with range [-34, 34].

Table 1: Variables used in the kick and move equations for the gravity player. Z values are computed at every point on the field; distances are computed from the point.

4 Using the Genetic Algorithm

Each factor of the kick and move equations has a coefficient. These coefficients act as the interface for optimization between the two equations and the genetic algorithm: by changing a coefficient, we increase or decrease the relative priority of its factor in the equation. Previously, members of the RoboCup team had set these values by hand, estimating them based on what they believed was an appropriate modifier for each factor. We saw this as an avenue for optimization and chose a genetic algorithm to optimize these values, since genetic algorithms are designed to search for global maxima or minima, unlike a simple hill-climbing algorithm, which can get stuck at a local maximum or minimum (Goldberg, 1989). We chose the DEAP (Distributed Evolutionary Algorithms in Python) library (Fortin, De Rainville, Gardner, Parizeau, & Gagné, 2012) for easy integration with our main codebase.

Our genetic algorithm (GA) used a population of individuals, each made up of eleven floating-point values between 0 and 1. These values map directly to the coefficient values in the kick and move equations. During a match, all 11 team members are controlled by instances of the same individual. We calculated the fitness of an individual for a simulated match as the team's final score minus the opponent's final score. By factoring in both the individual's points and the opponent's, we believe we are encouraging both offensive and defensive play from the individuals. This method of calculating fitness was influenced by previous work by Sean Luke using genetic programming in the RoboCup domain (Luke, 1998), and it reduces the ambiguity that can arise from other measures, such as ball-possession time or how frequently an individual passes to a teammate. Individuals competed against other individuals from the population, so one simulation yields fitness scores for two individuals at a time. With 24 machines running in parallel, we chose a population size of 48 to maximize the number of individuals we could simulate without increasing the simulation time, allowing all simulations for a generation to run in parallel.

For the reproduction phase of the GA, we used two-point crossover with a 60% occurrence rate and Gaussian mutation with a 5% mutation rate. We chose this crossover rate because we wanted to strongly propagate any individual that showed higher performance, since most players would likely not score during a simulation. We chose this mutation rate based on preliminary simulations indicating that a higher mutation rate tended, in some instances, to work against the progression of the player, as frequent mutations would undo progress achieved during the run. However, with a low mutation rate and such a massive search space, we were only able to cover a fraction of the space, and our ability to explore further was reduced. Covering only a small percentage of the search space is expected, as the space consists of all combinations of 11 coefficients, each a floating-point number between 0 and 1. In the future we might consider a dynamic mutation rate based on the current generation number or on average individual fitness.

We ran 400 generations; in each generation, every individual, whose instances make up an 11-player team, played one game against another individual from the population. We plan to run simulations with more generations in future work to better explore the search space.

5 Implementation

Before performing our experiment, we had to set up or improve the framework and systems we would use. We had a good starting point, as the previous RoboCup members had already set up RoboCup and the gravity player, as well as other player scripts. We built on their work so that we could evolve the gravity player.

Before designing the testing system, we made some general fixes and improvements to the gravity player. First, we added a basic look-around function that makes the player spin in circles until they see the ball, then perform their gravity calculations. Before we added this, the gravity player would not move if the ball was not in vision, and games inevitably ended with the ball in all the players' blind spots and no one moving. Having them spin around was a quick fix, but one that keeps the gravity player fully active during a game.
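The look-around fix amounts to a small guard at the top of each decision cycle. This is a sketch assuming a client API with a `turn` command and a world model that marks the ball unseen when it is out of the view cone; `turn`, `ball`, and `gravity_action` are illustrative names, not the team's actual interface.

```python
def act(player, world):
    """One decision cycle: spin in place until the ball is seen, then run
    the normal gravity logic."""
    if world.ball is None:            # ball absent from the latest vision message
        player.turn(60)               # keep rotating until it enters the view cone
    else:
        gravity_action(player, world) # evaluate move/kick weights and act
```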

Another fix concerned kicking power. Originally, the player would kick the ball as hard as they could toward a specific location, which often resulted in the ball going out of bounds or being intercepted by the opponent. To fix this, we scaled the kicking power to the distance between the player and the target position on the field.

The last general improvement was redesigning the kick and move equations. The current equations are very similar to the originals; however, we added a few more factors to each and gave every factor a coefficient modifier to act as our interface for the genetic algorithm. Though we believe this redesign improved the equations' overall performance, they are likely still suboptimal and can be improved further.

To implement our experiment, we began by recording basic information about the gravity players during a simulation: the players' positions, where they want to move or kick to, and the location of the ball. We then use this recorded data as input for the data modeler, which generates figures based on the players' recorded data. These models were not used extensively during our experiment, as we focused on the players' scored points and their gravity weight values. The data modeler also handles our main archiving system, which lets us store our simulation data hierarchically for later use. We also created a gravity-weight modeler that takes fixed player and ball positions and generates a visualization of a player's weight values across the field. The generated figures are used in this paper to illustrate how the weight values change given different coefficient values.

Next, we developed a system for automated multi-machine testing. From a central host machine, we issue commands via ssh to 24 different machines, each of which begins a simulation using a custom coach we created. This coach automatically starts games, continues them at halftime, and closes the simulation when the game is finished. Once all 24 machines have finished their simulations, control returns to the central machine. The number of machines at our disposal determined the size of our population: we tested with two individuals from the population playing against each other, and with only one simulation able to run per machine, we could have at most 48 individuals in our population while still running all simulations in parallel. This meant that each generation took only as long as the longest-lasting simulation for that generation.

Finally, we created the main genetic algorithm program, executed on the central host machine. The program oversees all of the genetic algorithm processes using the DEAP Python package, and uses our automated multi-machine system to issue simulations to the different machines to run in parallel.
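A minimal sketch of that central GA program using DEAP, wired to the parameters reported above (48 individuals of 11 coefficients in [0, 1], 60% two-point crossover, 5% Gaussian mutation, fitness equal to the score differential). Here `run_match` is a random stub standing in for the ssh-dispatched simulations, and the tournament selection scheme, mutation sigma, and clamping of genes back into [0, 1] are our assumptions, since the report does not specify them.

```python
import random
from deap import base, creator, tools

# Maximize fitness = our score minus the opponent's score.
creator.create("FitnessMax", base.Fitness, weights=(1.0,))
creator.create("Individual", list, fitness=creator.FitnessMax)

toolbox = base.Toolbox()
toolbox.register("attr_float", random.random)            # coefficient in [0, 1]
toolbox.register("individual", tools.initRepeat,
                 creator.Individual, toolbox.attr_float, n=11)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)
toolbox.register("mate", tools.cxTwoPoint)               # two-point crossover
toolbox.register("mutate", tools.mutGaussian, mu=0.0, sigma=0.1, indpb=0.05)
toolbox.register("select", tools.selTournament, tournsize=3)  # assumed scheme

def run_match(ind_a, ind_b):
    """Stand-in for one ssh-dispatched simulation; returns (score_a, score_b)."""
    return random.randint(0, 2), random.randint(0, 2)

pop = toolbox.population(n=48)
for gen in range(400):
    random.shuffle(pop)
    # Pair off the population: 24 matches, one per machine, run in parallel.
    for a, b in zip(pop[::2], pop[1::2]):
        sa, sb = run_match(a, b)
        a.fitness.values = (sa - sb,)
        b.fitness.values = (sb - sa,)
    offspring = [toolbox.clone(ind) for ind in toolbox.select(pop, len(pop))]
    for c1, c2 in zip(offspring[::2], offspring[1::2]):
        if random.random() < 0.6:                        # 60% crossover rate
            toolbox.mate(c1, c2)
    for m in offspring:
        toolbox.mutate(m)
        m[:] = [min(1.0, max(0.0, v)) for v in m]        # keep genes in [0, 1]
    pop = offspring
```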

6 Experiment Results

Our experiment consisted of one central machine running the main genetic algorithm process and connecting to 24 other machines, each running a simulation between two individuals from the population. Once all the machines finished their simulations, the central machine assigned the individuals their fitness scores and began the reproduction process to produce the next generation's population. We ran 400 generations, with each individual playing one game per generation and each generation lasting around 11 to 15 minutes.

Because the search space is extremely large, with more possible combinations than atoms in the universe, it understandably takes time to explore the space and find useful coefficients. This results in low-scoring matches initially, as we continue to explore the search space for a set of coefficients that produces a scoring player. We therefore chose to aggregate the total number of points scored over 25-generation intervals to better visualize the data (a small binning sketch appears at the end of this section).

For the first 50 generations, no individuals scored in a match. This was expected, since in the early stages of the evolutionary process we are exploring the search space from a randomly chosen set of coefficients. The generations up to generation 175 displayed turbulence in points scored; however, with only three points scored over 125 generations, this appears to be a fairly unimportant event. The next 25 generations saw the highest total points scored. This is notable because afterwards our scoring rate rose to a consistent one point per 25 generations. The next 125 generations followed this constant rate, which seems to indicate a general but slight improvement to the population. The next 50 generations displayed an increase in total points, but the rate returned to one point per 25 generations over the final 25 generations.

The red linear fit line in Figure 4, showing the correlation between total points scored and generations, indicates a steady increase in overall fitness. We believe this increase would have continued with more generations. We hypothesize that additional generations would show a similar three-step pattern to the one in the figure: a consistent total-points rate per 25 generations, oscillation while the rate increases, and finally the reestablishment of a consistent but higher rate. The oscillation between rates appears to mark transitions between consistent scoring rates.

The next aspect of the experiment we explored was how the gravity weights across the field change as the generations evolve. From the initial generation to the final generation there appeared to be a slight change in the weight values. Compared to the original gravity player with hand-set coefficients, there is a very noticeable difference; see Figure 5, which has positions set manually for demonstration purposes.

Figure 5 illustrates the calculated kicking weights of the original and evolved players. Note that the range of the weights is set by the maximum weight in each figure, and the weight colors are relative to that scale. The original player is kicking toward the center of the field, while the evolved player is kicking toward the top center. Interestingly, the goal-ward gradient built into the equations, whereby weights increase as the distance between a field point and the goal decreases, does not appear to be present here. This might be because the coefficient for that factor produces very small weights that do not scale against the other factors. The goal of the gravity strategy is to find ideal locations to move or kick to; future work is needed to determine what those ideal locations are.

Figure 4: The total points scored by all individuals in the population, in intervals of 25 generations. The red linear fit line indicates a steady increase in the rate of total points scored.

Figure 6 illustrates the calculated move weights of the original and evolved players, and is an excellent example of the differences between the two. The original player moves only slightly forward, which may be caused by a combination of higher weights farther from the goal or higher weights farther from teammates and opponents. The evolved player shows nearly the opposite behavior: it moves toward the goal, which appears to have a stronger influence in the move equation than it did in the kick equation. The evolved player appears to have an advantage, as it moves toward the ball as well as the opponent's goal.
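The 25-generation aggregation behind Figure 4 is a simple binning step, sketched below; `points_per_generation` is a hypothetical list holding each generation's total points, not our actual logged data.

```python
points_per_generation = [0] * 400   # hypothetical per-generation point totals
bins = [sum(points_per_generation[i:i + 25])
        for i in range(0, len(points_per_generation), 25)]  # 16 bins of 25
```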

Figure 5: A comparison between the original and evolved players' kick-equation weights. The evolved player is kicking toward its teammates, while the original player is kicking toward the middle of the field.

Figure 6: A comparison between the original and evolved players' move-equation weights. The evolved player is moving toward the ball and the goal, while the original player is moving forward only slightly.

7 Future Work

We intended this experiment as a starting point for improving the gravity player with genetic algorithms, and for that reason we reduced the size of our experimental design, though the experiment's search space is still massive. For example, we used a relatively small number of generations and were unable to thoroughly explore the search space, or even a large portion of it; because of this, our experimental setup did not allow the genetic algorithm to converge. Moving forward, we will increase the total number of generations we run.

Another expansion of the experiment would be to increase the number of games an individual plays in each generation. Since the games are nondeterministic, playing multiple games, and against different opponent individuals, should give a better measure of individual performance (a sketch of this idea appears at the end of this section).

Another limitation of our experimental design was the population size of 48 individuals. We had 24 machines available for this experiment and ran a single simulation at a time per machine. Increasing the population size would allow better exploration of the search space through wider genetic diversity. During our simulations, individuals competed against other individuals from the population; testing against other, higher-performing teams might increase the rate of performance evolution.

A direction we would like to take our research is using genetic programming to evolve the kick and move equations themselves, which we believe are currently suboptimal. With a genetic programming approach, we would want to exploit genetic programming's capacity for discovery to develop equations that our current or future RoboCup team members might not have been able to design. Our current experimental setup evolves the kick and move equations together; evolving them separately could let the GA converge more quickly. After optimizing them separately, we would use those values as starting points to test them together and see whether we gain any additional player performance. Another option is to keep the same number of coefficients as in this experiment but seed the starting individuals with the hand-set coefficients of the original gravity player.

One possible way to check our player's performance would be to have our calculated field weights analyzed by a soccer coach or someone with extensive knowledge of soccer field positioning. This could give us a sense of how our player relates to real-world soccer teams; however, it would add subjectivity to the experimental design and increase the time required to get results.

Finally, our simulations had the default game noise disabled; enabling noise during evolution could lead our gravity player to adapt to a noisy environment and further improve overall performance. Because of the way game information is delivered to the player, during some cycles we do not receive updated information and must rely on a predictive world model to plan our next action. With noise enabled, that predictive model would become less accurate, but by adapting the model to noise we could reclaim the lost accuracy.
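A minimal sketch of the multiple-games idea mentioned above: average the score differential over several games against distinct opponents instead of scoring a single match. As before, `run_match` is a stand-in for the dispatched simulation, and the sampling scheme is an assumption.

```python
import random

def fitness_over_games(individual, opponents, run_match, k=3):
    """Average score differential across k games against distinct opponents,
    smoothing out single-game variance in a nondeterministic match."""
    diffs = []
    for opp in random.sample(opponents, k):
        ours, theirs = run_match(individual, opp)
        diffs.append(ours - theirs)
    return (sum(diffs) / k,)   # DEAP-style one-element fitness tuple
```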

8 Conclusion

We worked on this project to see whether we could improve Taylor University's RoboCup gravity-strategy player by using genetic algorithms to optimize previously hand-set coefficients. Because of the large search space and the limited number of generations and individuals, we did not reach a point where our evolved players surpassed the base gravity player. However, our evolved players did show a gradual increase in average performance, and with more generations we might reach the performance level of the base player. We consider this experiment a starting point for further experimentation. Genetic algorithms have been shown to be extremely successful at value optimization, and our problem fits this strength well. With some of the changes mentioned above, we may be able to further increase the overall performance of our gravity player.

References

de Klepper, N. (1999). Genetic programming with high-level functions in the RoboCup domain. http://www.researchgate.net/publication/2434814_Genetic_Programming_with_High-Level_Functions_in_the_RoboCup_Domain

Fortin, F.-A., De Rainville, F.-M., Gardner, M.-A., Parizeau, M., & Gagné, C. (2012). DEAP: Evolutionary algorithms made easy. Journal of Machine Learning Research, 13, 2171-2175.

Giuliani, L. (2012). Evolving intelligent agents for the RoboCup Soccer competition.

Goldberg, D. (1989). A gentle introduction to genetic algorithms. In Genetic algorithms in search, optimization, and machine learning. Addison-Wesley.

Luke, S. (1998). Genetic programming produced competitive soccer softbot teams for RoboCup97. Genetic Programming, 214-222.

Luke, S., Hohn, C., Farris, J., Jackson, G., & Hendler, J. (1998). Co-evolving soccer softbot team coordination with genetic programming. In RoboCup-97: Robot Soccer World Cup I (pp. 398-411). Springer Berlin Heidelberg.

Kitano, H., Asada, M., Kuniyoshi, Y., Noda, I., & Osawa, E. (1995). RoboCup: The Robot World Cup Initiative. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.50.5425