Inverse Kinematics Kicking in the Humanoid RoboCup Simulation League

Adrian Lopez-Mobilia
Advisors: Peter Stone, Patrick MacAlpine
May 3, 2012

Abstract

This paper describes the inverse kinematics kicking system used in UT Austin Villa's entry in the humanoid RoboCup 3D simulation league competition. It describes the infrastructure we designed and how machine learning is used to optimize a set of kicks, with the hope of developing a versatile, omnidirectional kick. To date, the ability to kick has not substantially affected the outcome of the 3D simulation competition, as walk speed and dribbling have been the main focus and the strongest factors in a team's ability to do well competitively. The underlying goal is to create a set of kicks that can be optimized using machine learning to be faster to align while achieving distance and accuracy similar to fixed keyframe-based skills. To evaluate this goal we test the kick over many iterations against varied opponents and situations. We hypothesize that it is possible to create an abstracted representation of skills such as kicking through the use of inverse kinematics based curves, while simultaneously achieving the performance goals we pose and allowing much simpler human interaction with the system.

1 Introduction

The annual RoboCup competition [1] serves as a perfect testbed for creating and evaluating different learning systems. In our case, we explore the idea of using parameterized skills and allowing machine learning to optimize the motion of the robot. For kicking, we have gone through several iterations, which we describe in the following sections. Though the virtual domain of the RoboCup 3D simulation league does not provide an exact physical representation of the real world, it does supply a fairly realistic environment in which to test certain theories and to expand on research that is not yet possible due to restrictions in the accuracy and cost of hardware.

There is an effort to make the simulated robots better match their real counterparts so that behaviors learned in simulation transfer over, but this is currently a separate area of focus. Our purpose here is to create a system that works in a semi-ideal setting in which we can abstract away constraining variables such as vision and effector noise; these can then be gradually reincorporated alongside other areas of research in future work.

Current implementations of kicking tend to interpolate fixed effector values over a set period of time. This, in many ways, represents an unrealistic basis and assumes perfect positioning of the ball relative to the kicking agent. Though thresholds can be placed on where it is acceptable to kick from, this makes it difficult to learn different sets of kicks, as effector values are not a clear abstraction from which to create differently conceptualized kicks. It also constrains the ability to learn on robots with different leg structures or numbers of effectors, and it restricts the ability to create new kicks and to learn to optimize them.

With inverse kinematics, our approach was to create a series of curves dependent on either leg's reachability, such that we could teach the agents complex motions to follow without needing to know the exact structure of the agent's body and effectors. With the level of abstraction of an inverse kinematics system designed specifically for each agent, we no longer need to worry about specific details and tweaks upon any change in shape or size. As long as we are working with similarly bipedal robots, we can carry the engine over and re-learn these kicks.

The most important aspect of our work lies in creating a new system that does not require hours of human time to adjust. By letting curves serve as an abstraction of motion, used as simplified seed values to start off the optimization process of learning kicks, we open up the possibility of focusing on new and more interesting problems, such as when and where to kick. By centering the kick on inverse kinematics, we also make the system robust to the ball shifting during a kick, to noise in the system, and even to other agents affecting the ball.

1.1 Related Work

Most current teams in the RoboCup competition use fixed, angle-based keyframe algorithms that use PID controllers to apply torque at certain times to achieve a certain pose. This approach has several uses and can produce valuable results.

Several teams, including previous iterations of UT Austin Villa [5], have had success in optimizing and executing strong kicks from fixed effector keyframe skills. The team FCPortugal implemented a forward kick and a side kick that were both optimized to produce good results [6], but these did not incorporate the idea of inverse kinematics. This falls short in instances where the optimization cannot use the same seed curve for robots with different structures. In our case, though we would also have to rewrite the fixed weight-shifting motion for each robot, that weight shift would apply to any number of candidate kick curves that we would then optimize. The weight shift itself could also be abstracted out and learned, either analytically with center-of-gravity calculations, though this would be somewhat hard and prone to instability, or by starting from a fixed set of robot-specific joint-shift movements and optimizing them.

In the Standard Platform League (SPL) of the RoboCup competition, the B-Human team [7] produced a series of Bezier curves that can be altered on the fly, but these require several curves per kick, as multiple joints are moved independently. Though this approach provides precision, we propose that one curve for the path of the foot and one for its roll-pitch-yaw (RPY) rotation should provide enough data for an optimization procedure to produce a kick with similar precision. The UPennalizers team from the University of Pennsylvania, also an SPL entry, uses an approach that makes more use of inverse kinematics [4]. Their kicks are designed as scripted motions, but they do allow kicking directly from the walk. The issue here, and more broadly in the Standard Platform domain, is that it is very hard to run effective optimizations over many iterations: because hardware is expensive and fragile, the power to run thousands of iterations simultaneously is lost.

There are many instances where inverse kinematics is used outside of the RoboCup domain, notably robotic arms in manufacturing processes, where the system can be reasonably closed yet remains open to noise from slight changes. In our case, the objective is strongly altered by external variables such as other robots knocking into the kicking agent and noise in localization. Because of this, a fixed keyframe skill is very susceptible to breaking, while an inverse kinematics system is harder to tweak by hand. The simulation league allows us to create a kick that can be optimized in parallel through slight modifications, using CMA-ES [8] to learn and modify control points. Given examples such as the great increase in the speed of the UT Austin Villa walk, and improvements even in fixed keyframe-based skills, we can likewise optimize the parameterized curves of a kick.

2 Inverse Kinematics Kicking

A solution to the issue of abstraction is the use of inverse kinematics as a layer between the kicking skill to be optimized and the skill itself. By allowing the pose of the actual robot to be computed analytically, we no longer need to worry about how to achieve particular poses, but only about the path and objective that we want the foot to follow. Though this still requires knowledge of the robot's structure, it separates knowledge of the shape and effector functions from the skill. On a higher level, it creates a means to clearly represent the abstract idea of kicking not only forward but at different angles, without having to sit down and reason about how the effectors would have to rotate to get the foot around the ball.

2.1 Fixed Waypoints

Figure 1: The degrees of freedom represented in the initial inverse kinematics joint movements of the agent, used to place the leg toward the ball and then extend the leg to reach it.

The first approach we implemented consisted of taking a simplified inverse kinematic model of each leg of the agent and using sets of waypoints to kick through the ball. For instance, the forward kick we developed used three waypoints: one behind the robot to pull back the leg and gain momentum, one at the position of the ball, and one through it. A minimal sketch of such ball-relative seed waypoints is shown below.
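The following sketch illustrates the idea of ball-relative waypoints for a forward kick; the coordinate values and the names `forward_kick_waypoints` and `waypoints_in_agent_frame` are illustrative assumptions, not the seed values actually used by the team.

```python
import numpy as np

# Hypothetical seed waypoints for a forward kick, expressed in a ball-centered
# frame (x toward the kick direction, z up), in meters. Values are illustrative.
forward_kick_waypoints = [
    np.array([-0.18, 0.0, 0.06]),  # 1) pull the leg back behind the agent
    np.array([-0.04, 0.0, 0.04]),  # 2) at the ball
    np.array([ 0.12, 0.0, 0.05]),  # 3) follow through past the ball
]

def waypoints_in_agent_frame(ball_in_agent_frame):
    """Because every waypoint is relative to the ball, shifting the ball simply
    shifts every foot target, which is what lets the kick adapt to noise."""
    return [ball_in_agent_frame + w for w in forward_kick_waypoints]
```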

This worked well for forward and backward kicks that assumed a fixed position of the ball, but, due to the complexity of the robot, it required precise localization and orientation relative to the ball. This is to be expected: to kick the ball accurately, it should be at a consistent position relative to the agent. In this case, however, it proved to constrain the system and still did not allow us to create a kick versatile enough in comparison to a keyframe-based skill. The main advantage gained through this approach was that the kick was able to respond to noise throughout its trajectory, as the waypoints were all defined relative to the ball. If the ball moved, provided it did not go out of the agent's reach, the kick was dynamically modified to follow through the ball. Even so, this did not represent a substantial gain, as the kick was still too slow and, in particular, it made no progress toward omnidirectionality. With only a simplified model and the ability to control only the extension of the leg, without rotation of the foot, this approach could not fully describe angled kicks that require precise control. It also did not tolerate any noise in the approach angle, as any rotational offset was strongly reflected in the outcome of the kick.

Figure 2: A path represented by the kick: 1) lift the leg from center, 2) pull the leg back, 3) bring the leg back to the position of the ball, 4) kick through.

2.2 Extended Inverse Kinematics

To extend the simplified inverse kinematics used in the initial fixed-waypoint process, we moved to a system that fully models the shape of the agent's leg. In a project with Nicu Stiurca, we modeled the inverse kinematics of the agent through OpenRAVE's [2] kinematics solver so that we could quickly request any foot placement as a 3D position and a rotation specified by roll, pitch, and yaw. The sketch below illustrates the kind of pose target such a request consists of.
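As an illustration of the abstraction this provides, the following sketch builds the homogeneous transform for a desired foot pose from a position and RPY angles; this is the kind of target an IK solver such as OpenRAVE's (or any replacement) can be asked to realize. The `foot_target` helper and the ZYX Euler convention are assumptions for illustration; the solver call itself is omitted.

```python
import numpy as np

def foot_target(position, rpy):
    """Build a 4x4 homogeneous transform for a desired foot pose from a 3-D
    position and roll/pitch/yaw angles (ZYX convention assumed). An IK solver
    is then asked for leg joint angles that realize this pose."""
    roll, pitch, yaw = rpy
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    R = np.array([[cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
                  [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
                  [-sp,     cp * sr,                cp * cr]])
    T = np.eye(4)
    T[:3, :3] = R        # orientation of the foot
    T[:3, 3] = position  # position of the foot relative to the agent
    return T
```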

This OpenRAVE integration is somewhat separate from the current code base, and we intend to reintegrate it because of the external dependency, but it can easily be replaced with any other kinematics solver, which shows the utility of abstracting out the motion of the legs. The value of this approach also lies in being able to take any newly defined agent, regardless of size or design, and apply the same learning.

2.3 Curve Based Learning

To address the complexity of associating points with effector locations, we moved to a system of waypoints interpolated by a Cubic Hermite Spline passing through all the points. The objective was to define paths around the ball more accurately and to retain precision in learned kicks that is lost when motions are oversimplified to keyframe points. Although we tried several types of interpolating curves, we settled on a simplified Cubic Hermite Spline abstraction, as it passes through all waypoints and does so in a shape more natural to the reach and inverse kinematic extension of the agent. This system provided enough accuracy that we were able to develop a skill to step over the ball and flick it behind the agent, though it required precise localization to execute and has no current useful application beyond testing the correctness of the system. These kicks were incorporated through an extension of the already developed skill system, which allows editing text-based skill files that are loaded each time the agent is run. With this abstraction, several variations can be run at once without fully recompiling the agent.

2.3.1 Interpolated Curves

For each kick skill we intend to create, we establish a series of inverse kinematics based control points that the agent's foot should pass through. These serve mainly as seed values, but they let us start the agent's learning process at a much higher level, without it needing to learn the overall shape that we expect the foot to follow. As these control points are relative to the ball, they again remain independent of the structure of the agent itself. Given each set of control points, we interpolate using a simplified Cubic Hermite Spline such that, for a given time t, the point q between two consecutive control points is:

q = p_i h_1(t) + p_{i+1} h_2(t) + (p_{i+1} - p_{i-1}) h_3(t) + (p_{i+2} - p_i) h_4(t)   (1)

where:

h_1(t) = 2t^3 - 3t^2 + 1   (2)
h_2(t) = -2t^3 + 3t^2   (3)
h_3(t) = t^3 - 2t^2 + t   (4)
h_4(t) = t^3 - t^2   (5)

To decide when to kick, a kick is initially selected based on the agent's relative location and distance from the ball, as described in [reference aamas2012 paper], where the cost of a kick is defined by its offset rotation and expected placement relative to the agent. When the agent is close enough to the ball, it shifts its weight onto the support foot and computes the kicking foot trajectory necessary to perform the desired kick. At each time step during the kick, the kick engine interpolates the control waypoints defined in the kick skill file to produce a target pose for the foot with respect to the agent. The IK system described previously then computes the necessary joint angles of the kicking leg, and these angles are fed to the joint PID controllers. Figure 3 illustrates the program flow of the kick engine; a minimal sketch of the interpolation step follows.

Figure 3: The flow of the agent deciding when to kick the ball and how to interpolate the curve created relative to the ball.
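The following is a minimal Python sketch of Eqs. (1)-(5), interpolating a list of ball-relative control points into per-time-step foot targets; the clamping of neighbor indices at the ends of the curve and the function names are assumptions for illustration.

```python
import numpy as np

def hermite_basis(t):
    """Basis functions h1..h4 of Eqs. (2)-(5), for t in [0, 1]."""
    return (2 * t**3 - 3 * t**2 + 1,
            -2 * t**3 + 3 * t**2,
            t**3 - 2 * t**2 + t,
            t**3 - t**2)

def spline_point(points, i, t):
    """Point q between control points p_i and p_{i+1} (Eq. (1)).
    Neighbor indices are clamped so the curve is defined at both ends."""
    p = lambda k: points[min(max(k, 0), len(points) - 1)]
    h1, h2, h3, h4 = hermite_basis(t)
    return (p(i) * h1 + p(i + 1) * h2
            + (p(i + 1) - p(i - 1)) * h3
            + (p(i + 2) - p(i)) * h4)

def foot_targets(control_points, steps_per_segment=10):
    """Sample the whole curve, yielding one foot target per simulation step."""
    pts = [np.asarray(p, dtype=float) for p in control_points]
    for i in range(len(pts) - 1):
        for s in range(steps_per_segment):
            yield spline_point(pts, i, s / steps_per_segment)
```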

2.3.2 Directional Kicks

We defined five kicks that assume the ball is in front of the agent, allowing it to kick directly forward and at 45- and 90-degree angles either outward or inward, depending on which leg is used. We also created directional kicks that assume the ball is to the side of, or behind, one of the legs. See Figure 4.

2.4 CMA-ES

In previous years, UT Austin Villa established a robust infrastructure for using CMA-ES (Covariance Matrix Adaptation Evolution Strategy) [3], an evolutionary algorithm that can search over up to roughly 300 open parameters. This fits our kicking needs nicely, as we look to open about forty parameters and evaluate a fitness function over them efficiently. Using the Condor job queuing system, we can run 200 generations of 150 candidates each in a matter of hours. This depends strongly on the jobs being run, but effectively overnight we can tune the parameters of a given skill to an acceptable local maximum. Though we cannot guarantee that this is the optimal set of parameters, it is generally within a reasonable range of what we require and has provided excellent results in the past. In particular, the UT Austin Villa 2011 team optimized its walk using this infrastructure and was able to win the competition with limited kicking. In fact, kicking was used only in the first rounds of the 2011 Istanbul RoboCup; in the final games the team moved to dribbling only, as it proved much more consistent. With the ability to dribble past everyone and with the kick being too slow, giving up control of the ball to line up a kick was not a viable option against other fast teams.

Figure 4: The agent can dynamically kick the ball in varied directions with respect to the placement of the ball at a, b, and c.

2.5 Kick Optimization

The kick is optimized by adjusting each control point's position and RPY relative to the placement of the ball. We also allow the position the agent must move to with respect to the ball to be altered, so that both placement and trajectory are optimized. For a given optimization run, an agent's fitness is computed over ten trials at different placements around the ball, where it needs to run up to the ball and kick it toward the opponent's goal. The agent's fitness is then determined by how far it kicks the ball from its starting point. The main optimization run using CMA-ES placed the agent at 10 different points around the ball within a fixed radius. Running 200 generations of 150 randomized candidates each, a single optimization essentially performs 300,000 kicks within a matter of hours. The advantage of this approach is that the tuned kick performs much better than what would be achievable through hand tuning: though there is a fair amount of overhead in wasted kicks, running only 1,000 different kicks by hand would take an individual several days, independent of the time spent tweaking and finding candidate variables to alter. A minimal sketch of this optimization loop is shown below.
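The following is a minimal sketch of such an optimization loop using Hansen's `cma` Python package; the parameter decoding, the `kick_distance` stand-in for a simulator episode, and the initial step size are assumptions, since the team's actual infrastructure runs full simulator episodes in parallel through Condor.

```python
import numpy as np
import cma  # Hansen's reference CMA-ES implementation [3] (pip install cma)

N_PARAMS = 40       # roughly forty open parameters per kick skill
N_PLACEMENTS = 10   # fitness is averaged over ten placements around the ball

def kick_distance(params, placement):
    """Stand-in for one simulated episode: decode `params` into control-point
    offsets/RPY and an approach position, run the agent from `placement`, and
    return how far the ball travels. A dummy surrogate keeps the sketch
    runnable; in the real setup each call is a full simulator run."""
    return -float(np.sum(params ** 2))

def fitness(params):
    # CMA-ES minimizes, so return the negative mean kick distance.
    return -np.mean([kick_distance(params, p) for p in range(N_PLACEMENTS)])

es = cma.CMAEvolutionStrategy(np.zeros(N_PARAMS), 0.1, {"popsize": 150})
for generation in range(200):
    candidates = es.ask()                                # 150 perturbed parameter sets
    es.tell(candidates, [fitness(x) for x in candidates])
best_params = es.result.xbest
```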

2.6 Interesting Side Effects of Learning

An interesting side effect of the learning that we encountered is that pulling the leg back for momentum does not seem to give the agent more strength, as it is able to accelerate its effectors quickly through the ball. This would be a good case to study when transitioning to a more realistic model of the agent, as this lack of reliance on momentum would greatly alter the ability of a real robot to kick through a ball. Another interesting effect was that of allowing the agent to fall while kicking. In many instances this gives the agent an advantage, but it too would be inadmissible on real robots due to cost, and in some ways it can hinder the speed of the simulated agent, which has to get back up and reposition. The expectation was that inverse kinematics would provide enough range to kick quickly and effectively, but we found that much of the problem was due to inaccuracy and thrashing in positioning. We therefore opened additional parameters to optimize the approach to the ball, which greatly increased performance.

2.7 Positioning

Positioning became a key aspect of getting the inverse kinematics kick to trigger effectively. In many ways it also helped the fixed effector motion kicks, but that was more a focus for the competition than for this investigation. We incorporated the notion of a point relative to the ball that the agent approaches when trying to trigger an inverse kinematic kick. If the agent is within a threshold along the path from the approach point directly to the ball and within a threshold on the perpendicular vector, we accept the kick (Figure 5); a small sketch of this check follows. Establishing the initial seed was straightforward: we placed small thresholds on the placement relative to the ball, and because the optimization's fitness function penalizes time spent positioning, these thresholds grew dynamically.

Figure 5: As the agent approaches a point relative to the ball to execute a kick, it must satisfy both a distance threshold in the direction toward the ball and one perpendicular to it.
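A minimal sketch of the acceptance test described above; the threshold values and the function name `kick_is_triggerable` are illustrative assumptions (in the actual system the thresholds are seeds that the optimization grows).

```python
import numpy as np

def kick_is_triggerable(agent_pos, approach_point, ball_pos,
                        along_thresh=0.05, perp_thresh=0.03):
    """Accept the kick only if the agent is close enough to the approach point
    both along the direction toward the ball and perpendicular to it."""
    direction = ball_pos - approach_point
    direction = direction / np.linalg.norm(direction)   # unit vector toward the ball
    offset = agent_pos - approach_point
    along = np.dot(offset, direction)                   # signed offset toward the ball
    perp = np.linalg.norm(offset - along * direction)   # lateral offset
    return abs(along) <= along_thresh and perp <= perp_thresh
```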

3 Performance

3.1 One Versus Two

The most basic scenario we established for evaluating the effectiveness of the created kicks was a modified penalty shootout. With our kicking agent close to the ball, one opponent closely behind it, and the goalkeeper in front of the goal, the purpose is to measure the speed and accuracy of the kick by the number of successes in a run. Attempting to dribble through the goalkeeper generally fails because the goalkeeper simply intercepts the path of the running agent, allowing the secondary defensive agent to catch up to the ball and leaving the offensive agent helpless against two opponents. With kicks built from fixed keyframe motions, the intuition is that, because of the longer setup time spent aligning, the kick is often not released before the secondary defensive opponent catches up to the offensive agent.

Figure 6: The ratio of success is defined by whether or not a goal is scored within 30 seconds.

3.2 Optimization and Performance

The forward kick was optimized to reach an average distance of around four meters. Though this is not near the optimized fixed keyframe forward kick, which we optimized to reach around twelve meters, the kick is substantially more reliable and can get a kick off when another agent is close. In the one-versus-two scenario described previously, the agent is able to get the kick off fast enough that the secondary opponent cannot catch up. Even when the secondary agent does catch up, the attacking agent is able to kick past the goalkeeper with good consistency. Kicking naively did not achieve optimal results, as the goalkeeper would often stop the ball and the defender would catch up and trip up the offensive agent before it could get another kick off. We therefore focused on kicking only when there is enough time to do so. This proved to be the best method for scoring in the one-versus-two scenario, achieving a success rate of .56 with an opponent starting 0.5 m behind the offensive agent, whereas dribbling succeeded at a rate of only .34 and the fixed effector kick at .28. With each condition run over 60 iterations, this is a statistically significant result showing that the kick can actually be used appropriately in such a situation (a simple check of this significance is sketched below). Though the existence of one favorable situation does not necessarily imply a great step forward, since that situation may not be especially common, we can also argue that the ability to kick under pressure is the main objective of kicking effectively. When there is enough time, using an optimized effector-based skill and waiting to position correctly would be the appropriate action, but this is much less often the case during a game, especially against an opponent that cannot be beaten by dribbling alone.
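The text reports the result as statistically significant without naming a test; the sketch below is one way to check the reported rates with a two-sided two-proportion z-test under a normal approximation, using only the numbers given above.

```python
from math import erfc, sqrt

def two_proportion_z(p1, p2, n1=60, n2=60):
    """Two-sided two-proportion z-test (normal approximation)."""
    x1, x2 = p1 * n1, p2 * n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, erfc(abs(z) / sqrt(2))  # z statistic and two-sided p-value

print(two_proportion_z(0.56, 0.34))  # IK kick vs. dribbling:      p < 0.05
print(two_proportion_z(0.56, 0.28))  # IK kick vs. fixed keyframe: p < 0.01
```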

Another evaluation we performed used a behavior we call drive-ball-to-goal, in which the agent starts next to the ball at the center of the field and tries to score as quickly as possible. The fitness is the time it takes to score a goal. The expected behavior is that if a kick is good enough, the agent should be able to kick the ball far ahead, sprint to it, and repeat the process until the goal is scored. This should be expected to work better than just dribbling the ball, but after running 500 iterations each for dribbling, kicking with the inverse kinematics quick kick, and kicking with an optimized fixed effector kick, we found that the kick did not achieve what we had hoped. In some ways this is expected: the optimized inverse kinematics kick travels around four meters at a time, and as the distance to the goal from the start is 10.5 m, essentially three kicks have to be executed, each forcing the agent to decelerate into the kick and accelerate back to a sprint. Thus, dribbling still gave better results. The fixed effector kick did not do well because of its inconsistency and its requirement to stand much more precisely relative to the ball, though in extreme cases it produced almost instant goals by scoring directly from the starting position. We evaluated all runs both with a penalty for sending the ball out of bounds, as in a normal game, and with out of bounds having no effect (the run being restarted if it occurred). This let us compare consistency, especially in the situation where the strong fixed effector kick would send the ball over the goal and out of bounds. Without the out-of-bounds penalty, dribbling scored in an average of 24.29 seconds, the inverse kinematic kick took 31.66 seconds, and the strong fixed effector kick averaged 31.34 seconds. Again, the difference between the strong kick and the quick one is strongly skewed by the ability to score directly from the start, and though this is an interesting point to examine, it does not capture all the aspects we are looking for, since distance is not the most important metric.

Figure 7: The amount of time taken to get the ball to the goal, in seconds.

We also ran one hundred games against four of the best teams in the simulation league (Apollo3D, Cit3D, Boldhearts, and RoboCanes), using both an agent that could kick and one that would only dribble.

This produced very poor results, in the extreme case dropping the agent's performance against Apollo3D from an average goal differential of 1.24 goals in our favor to 0.02 goals against. Though this is an appropriate metric, and the one that will most heavily weigh decisions about continuing iterations, the result is not surprising. We have not yet optimized the decision of when to kick, and, as shown by drive-ball-to-goal, kicking to drive the ball forward without any notion of passing or kick anticipation from teammates produces slower results. This evaluation does show the effect of the kick itself, but the current focus is the basic skill, to be extended later for use in broader situations.

Figure 8: Goal differential displayed as the number of goals scored over an entire game.

4 Conclusions

Overall we were able to produce a kick that achieved what we hoped for. There are, though, several assumptions that are not completely satisfactory and that reflect behaviors learned in simulation which break real-world assumptions. The forward kick's learned inverse kinematics curve does not use any backswing, even though the initial seed suggested one. This is because, in simulation, there is no need for momentum when unrealistic torque can be applied to the effectors. The kick essentially walks through the ball, falling forward and kicking through it as it falls. We attempted to penalize the agent for falling forward, but this heavily degraded the quality of the kick; it seems that keeping the foot on the ball for longer allows much better range and control. Ideally the kick would have the support foot step to the side of the ball, allowing this extended contact time, but we were not able to achieve such accuracy in the walk. Whether falling is ultimately more costly is something we still need to measure, but the main question of this work is whether inverse kinematics helps.

Compared to a fixed keyframe skill, inverse kinematics allowed a much larger range of motion in kicking the ball, and thus, in a binary sense, the ability to get a kick off at all rather than not being able to do so. It is possible that dribbling in a game is still better than kicking, but the hope is that steps toward improving the kick will eventually give us instances where we can refine ideas such as passing and localization and then intelligently let the agent get off kicks that are actually helpful. More specifically, we were able to create a fixed abstraction of a kick based on a simplified path. This will greatly help in the future when creating different skills. It is possible to create very focused skills, such as flicking the ball behind the player or to the sides, and the abstraction for the seed values of such skills is already in place. The use of inverse kinematics based curves greatly helped in teaching the agents the skill of kicking, and it should help with any subsequent skills that need to be learned as the need for them arises.

5 Future Work and Extensions

Currently the only kick that has been optimized and shown to have the potential to replace a fixed keyframe-based skill is the front kick. We expect to extend this in the near future by running the directional kicks we created through similar CMA-ES optimizations, to evaluate the performance and the potential benefit of choosing from a set of kicks rather than having to position fairly precisely for a single forward kick.

A simple extension that would greatly increase performance is anticipating the location of the ball at a given time. As we now have a kick that can execute from fairly loose placement parameters, we can begin a kick early when we expect the ball to reach a point. Though in this domain this is not currently a major issue, as the ball comes to a stop quickly, it could greatly affect the outcome of kicks once passing is properly implemented.

Another simple extension would be varied confidence in a kick: allowing the agent to kick from the reach it has been optimized for, but also letting it attempt a kick even if it may not reach optimal distance. In a game, if an opponent is running at the ball, it is better to get the kick off quickly with lower confidence than to align it perfectly and not be able to move the ball at all.

One extension that is seen little in the RoboCup league is the use of kicking for passing. As dribbling speed has dominated the league, passing has mainly been a hindrance. Under the assumption that varied kicks improve the speed of getting kicks off, even if they are not perfectly accurate, it would be greatly beneficial for the team to set up formations in which the agents can actually pass the ball to create plays. Potentially the most valuable extension of this work would be to take the learned behavior and test it on robots with completely different structures and shapes. This was mentioned earlier, but not in full detail: the ability to field heterogeneous robots could greatly alter the performance of a team. More broadly, a fixed abstraction of curves that are easy to visualize and alter is an easy way to teach robots a skill that they can then optimize through systems such as CMA-ES.

References

[1] RoboCup. http://www.robocup.org/.

[2] Rosen Diankov and James Kuffner. OpenRAVE: A planning architecture for autonomous robotics. Technical Report CMU-RI-TR-08-34, Robotics Institute, Pittsburgh, PA, July 2008.

[3] Nikolaus Hansen. The CMA Evolution Strategy: A Tutorial, January 2009. http://www.lri.fr/~hansen/cmatutorial.pdf.

[4] Jordan Brindza, Levi Cai, Ashleigh Thomas, Ross Boczar, Aylin Caliskan, Alexandra Lee, Anirudha Majumdar, Roman Shor, Barry Scharfman, and Dan Lee. UPennalizers RoboCup Standard Platform League team report 2010. Technical report, University of Pennsylvania, 2010.

[5] Shivaram Kalyanakrishnan and Peter Stone. Learning complementary multiagent behaviors: A case study. In RoboCup 2009: Robot Soccer World Cup XIII, pages 153-165. Springer, 2010.

[6] Luis Cruz, Luis Paulo Reis, and Luis Rei. Generic optimisation of humanoid robots behaviours. In EPIA 2011, pages 385-397, 2011.

[7] Judith Müller, Tim Laue, and Thomas Röfer. Kicking a ball - modeling complex dynamic motions for humanoid robots. In RoboCup 2010: Robot Soccer World Cup XIV, volume 6556, pages 109-120.

[8] Daniel Urieli, Patrick MacAlpine, Shivaram Kalyanakrishnan, Yinon Bentor, and Peter Stone. On optimizing interdependent skills: A case study in simulated 3D humanoid robot soccer. In Proc. of the Tenth Int. Conf. on Autonomous Agents and Multiagent Systems (AAMAS 2011), pages 769-776, May 2011.