Policy Gradient RL to learn fast walk


Policy Gradient RL to learn fast walk
- Goal: Enable an Aibo to walk as fast as possible
- Start with a parameterized walk
- Learn the fastest possible parameters
- No simulator available: learn entirely on the robots
- Minimal human intervention

Walking Aibos
- The walks that ship with the Aibo are slow
- RoboCup soccer (25+ Aibo teams internationally) motivates faster walks

  Hand-tuned gaits [2003]:
    German Team            230 mm/s
    UT Austin Villa        245 mm/s
    UNSW                   254 mm/s
  Learned gaits:
    Hornby et al. [1999]   170 mm/s
    Kim & Uther [2003]     270 (±5) mm/s

A Parameterized Walk
- Developed from scratch as part of UT Austin Villa 2003
- Trot gait with an elliptical locus on each leg

Locus Parameters
[Figure: elliptical foot locus shown in the body-centered x/y/z frame]
- Ellipse length
- Ellipse height
- Position on x axis
- Position on y axis
- Body height
- Timing values
12 continuous parameters in all
- Hand tuning by April '03: 140 mm/s
- Hand tuning by July '03: 245 mm/s
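Concretely, each foot traces an ellipse whose shape and placement come from these parameters. The Python sketch below is a plausible reconstruction, not the team's actual code: the parameter names mirror the list above, but the axis conventions, skew handling, and ground/air timing split are assumptions.

```python
import math

def foot_target(t, ellipse_length, ellipse_height, x_offset, y_offset,
                body_height, period):
    """One plausible elliptical foot locus (illustrative, not the team's code).

    t: time in seconds; the foot traces the locus once per `period`.
    Returns an (x, y, z) foot target in a body-centered frame where
    z = -body_height is the ground-contact level.
    """
    phase = 2.0 * math.pi * (t % period) / period
    # Fore-aft sweep along the ellipse's long axis.
    x = x_offset + 0.5 * ellipse_length * math.cos(phase)
    # Lift follows the top half of the ellipse; clipping sin() at zero
    # keeps the foot on the ground for the other half of the cycle.
    z = -body_height + 0.5 * ellipse_height * max(0.0, math.sin(phase))
    return x, y_offset, z
```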

Experimental Setup
- Policy π = {θ_1, ..., θ_12}; V(π) = walk speed when using π
- Training scenario:
  - Robots time themselves traversing a fixed distance
  - Multiple traversals (3) per policy to account for noise
  - Multiple robots evaluate policies simultaneously
  - An off-board computer collects results and assigns policies
- No human intervention except battery changes
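A minimal sketch of this evaluation protocol, assuming a hypothetical robot interface with a timed_traversal(policy) call returning the measured speed in mm/s; how trials are divided among the robots is a guess:

```python
import statistics

def evaluate_policy(policy, robots, traversals=3):
    """Sketch of the training loop's value estimate: V(pi) is the mean
    speed over several timed traversals of a fixed course. The robot
    interface (timed_traversal(policy) -> speed in mm/s) is assumed."""
    speeds = []
    for i in range(traversals):
        robot = robots[i % len(robots)]   # spread trials across the robots
        speeds.append(robot.timed_traversal(policy))
    return statistics.mean(speeds)
```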

Policy Gradient RL
- From π, we want to move in the direction of the gradient of V(π)
- Can't compute ∂V(π)/∂θ_i directly: estimate it empirically
- Evaluate neighboring policies to estimate the gradient
- Each trial randomly varies every parameter
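Putting the pieces together: with 3 traversals per policy and 45 traversals per iteration (next slide), each iteration evaluates 15 perturbed policies. The sketch below follows the finite-difference scheme described on these slides; the step_size value is illustrative, and evaluate stands in for the timed-traversal routine above.

```python
import random

def policy_gradient_step(pi, eps, evaluate, num_policies=15, step_size=2.0):
    """One iteration of the empirical policy gradient estimator (a sketch).

    pi: current parameter values; eps: per-parameter perturbation sizes
    (the epsilon column of the learned-parameters table); evaluate: maps a
    parameter list to a measured walk speed.
    """
    n = len(pi)
    # Each trial policy randomly perturbs EVERY parameter by -eps, 0, or +eps.
    perturbations = [[random.choice((-1, 0, 1)) for _ in range(n)]
                     for _ in range(num_policies)]
    scores = [evaluate([p + s * e for p, s, e in zip(pi, signs, eps)])
              for signs in perturbations]

    gradient = []
    for i in range(n):
        # Group trial scores by how parameter i was perturbed.
        groups = {-1: [], 0: [], 1: []}
        for sc, signs in zip(scores, perturbations):
            groups[signs[i]].append(sc)
        if not groups[-1] or not groups[1]:
            gradient.append(0.0)          # too few samples for this dimension
            continue
        avg_plus = sum(groups[1]) / len(groups[1])
        avg_minus = sum(groups[-1]) / len(groups[-1])
        if groups[0] and sum(groups[0]) / len(groups[0]) > max(avg_plus,
                                                               avg_minus):
            gradient.append(0.0)          # zero perturbation scored best
        else:
            gradient.append(avg_plus - avg_minus)

    # Normalize the estimated gradient and take a fixed-size step along it.
    norm = sum(g * g for g in gradient) ** 0.5 or 1.0
    return [p + step_size * g / norm for p, g in zip(pi, gradient)]
```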

Experiments
- Started from a stable but fairly slow gait
- Used 3 robots simultaneously
- Each iteration takes 45 traversals (about 7½ minutes)
- [Videos: before learning, after learning]
- 24 iterations = 1080 field traversals (about 3 hours)

Results
[Plot: "Velocity of Learned Gait during Training" — velocity (mm/s, 180-300) vs. number of iterations (0-25); the learned gait (UT Austin Villa) climbs past the learned gait (UNSW) and the hand-tuned gaits of UNSW, UT Austin Villa, and the German Team]
- Additional iterations didn't help
- Spikes: evaluation noise? step size too large?

Learned Parameters

  Parameter                     Initial value   ε       Best value
  Front ellipse (height)        4.2             0.35    4.081
  Front ellipse (x offset)      2.8             0.35    0.574
  Front ellipse (y offset)      4.9             0.35    5.152
  Rear ellipse (height)         5.6             0.35    6.02
  Rear ellipse (x offset)       0.0             0.35    0.217
  Rear ellipse (y offset)       -2.8            0.35    -2.982
  Ellipse length                4.893           0.35    5.285
  Ellipse skew multiplier       0.035           0.175   0.049
  Front height                  7.7             0.35    7.483
  Rear height                   11.2            0.35    10.843
  Time to move through locus    0.704           0.016   0.679
  Time on ground                0.5             0.05    0.430

Algorithmic Comparison, Robot Port
- [Videos: before learning, after learning]

Summary
- Used policy gradient RL to learn the fastest Aibo walk
- All learning done on real robots
- No human intervention (except battery changes)

Outline
- Learning sensor and action models [Stronger, S, 06]
- Machine learning for fast walking [Kohl, S, 04]
- Learning to acquire the ball [Fidelman, S, 06]
- Color constancy on mobile robots [Sridharan, S, 05]
- Autonomous Color Learning [Sridharan, S, 06]

Grasping the Ball
- Three stages: walk to the ball; slow down; lower the chin
- Head proprioception and the IR chest sensor give the ball distance
- Movement specified by 4 parameters
- Brittle!

Parameterization
- slowdown dist: when to slow down
- slowdown factor: how much to slow down
- capture angle: when to stop turning
- capture dist: when to put the head down
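A minimal sketch of how these four thresholds could drive the three-stage behavior from the previous slide; the robot interface (ball_distance, ball_angle, walk, lower_head) and the exact staging logic are assumptions:

```python
def approach_ball(robot, params):
    """Sketch of the three-stage capture behavior driven by the four
    learned thresholds. Interface names are hypothetical."""
    d = robot.ball_distance()   # from head proprioception / IR chest sensor
    a = robot.ball_angle()
    if d > params["slowdown dist"]:
        robot.walk(speed=1.0, turn=a)                 # stage 1: full speed
    elif d > params["capture dist"]:
        turn = a if abs(a) > params["capture angle"] else 0.0
        robot.walk(speed=params["slowdown factor"], turn=turn)  # stage 2
    else:
        robot.lower_head()                            # stage 3: chin pinch
```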

Learning the Chin Pinch
- Binary, noisy reinforcement signal: requires multiple trials
- Robot evaluates itself: no human intervention
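Since each attempt yields only a noisy success/failure bit, a policy's value is naturally estimated as an empirical success rate. A sketch, with the trial count and the attempt_capture() interface as assumptions:

```python
def evaluate_capture_policy(robot, params, trials=20):
    """Sketch: with a binary, noisy success signal, V(pi) is estimated as
    the empirical success rate over repeated self-judged capture attempts."""
    successes = sum(robot.attempt_capture(params) for _ in range(trials))
    return successes / trials
```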

Results
- Evaluation of policy gradient, hill climbing, and amoeba
[Plot: successful captures out of 100 trials (0-100) vs. iterations (0-12), one curve each for policy gradient, amoeba, and hill climbing]

What it learned

  Policy           slowdown dist   slowdown factor   capture angle   capture dist   Success rate
  Initial          200 mm          0.7               15.0°           110 mm         36%
  Policy gradient  125 mm          1                 17.4°           152 mm         64%
  Amoeba           208 mm          1                 33.4°           162 mm         69%
  Hill climbing    240 mm          1                 35.0°           170 mm         66%