Learning to Imitate a Fighter Pilot

Similar documents
Multi-Reactive Planning for Real-Time Strategy Games

Commercial Maneuvers for PA28RT-201

Kinematics & Dynamics

CSC242: Intro to AI. Lecture 21

CS472 Foundations of Artificial Intelligence. Final Exam December 19, :30pm

FIRING ON THE GROUND TARGETS USING AIR UNGUIDED MISSILES

Intro. Basics. Pilots. Fighters

Evaluating and Classifying NBA Free Agents

Policy Gradient RL to learn fast walk

TRIANGLES AND THIRD MAN RUNS GUIDE


Procedures for Off-Nominal Cases: Three Closely Spaced Parallel Runway Operations

Possession Playing Away From Pressure

FixedWingLib CGF. Realistic CGF Aircraft Entities ware-in-the-loop Simulations

Riding From The Snow Up 2/10/02. Riding From the Snow Up. Ron LeMaster 2/10/02. The Text Book. Ron LeMaster 2002 All rights reserved 1

Comparison of Control Methods: Learning Robotics Manipulation with Contact Dynamics

OPTIMIZATION OF MULTI-STAGE DECISION-MAKING PROCESS AT THE GAME MARINE ENVIRONMENT

Final Geography Project Come Fly With Me

Investigation on 3-D Wing of commercial Aeroplane with Aerofoil NACA 2415 Using CFD Fluent

Beechcraft Duchess 76 Maneuver Notes

CIVIL AIR PATROL United States Air Force Auxiliary Cadet Program Directorate. Cessna 172 Maneuvers and Procedures

Soccereus 2D Simulation Team Description Paper

Biomechanics and Models of Locomotion

Calspan Loss-of-Control Studies Using In-flight Simulation. Lou Knotts, President November 20, 2012

intended velocity ( u k arm movements

Introduction to the 2013 Edition

Dräger Swede Survival System Phase 1 Flashover Development Observation

A Case Study on Improving Defense Behavior in Soccer Simulation 2D: The NeuroHassle Approach

Transformational Safety Leadership. By Stanley Jules

Article II - Double Disc Court Revised March 23, 2016

Optimization and Search. Jim Tørresen Optimization and Search

Sports Simulation Simple Soccer

A Simple Horizontal Velocity Model of a Rising Thermal Instability in the Atmospheric Boundary Layer

Special Condition. 10 Minutes OEI Take-off Thrust at High Ambient Temperature rating

Instrumentation (and

Golf. By Matthew Cooke. Game Like Training

Numerical modeling of refraction and diffraction

Proposed Special Condition for the OEI High Ambient Take-Off Temperature rating

Logistic Regression. Hongning Wang

DEVELOPING COMBINED EVENT ATHLETES IN THE COLLEGIATE SYSTEM. Nate Davis Assistant Coach Combined Events, PV & HJ University of Wisconsin

Year 10 Key Performance Indicators Physical Education (Invasion Games)

Deakin Research Online

Spacecraft Simulation Tool. Debbie Clancy JHU/APL

Classroom Tips and Techniques: The Partial-Fraction Decomposition. Robert J. Lopez Emeritus Professor of Mathematics and Maple Fellow Maplesoft

steps to designing effective practice

Safety management The Firefighter Safety Maxim Risk assessment at an incident Tactical mode Emergency evacuation and tactical withdrawal

Modelling the distribution of first innings runs in T20 Cricket

THE ARCHANGEL STANDARD OPERATING PROCEDURE: MILSIM SQUAD VERSION 1

The American Kitefliers Association

Release: 1. UEPOPL002A Licence to operate a reciprocating steam engine

RYSA PLAYER DEVELOPMENT CURRICULUM

Stability & Control Aspects of UCAV Configurations

Building an NFL performance metric

Marking Guidelines 2009 examination June series. Physics Investigative Skills Assignment (ISA) Q. General Certificate of Education

A Methodology for Using Traffic Generators with Real-Time Constraints. Avinash Mehta Sr. Research and Development Engineer Synopsys India Pvt. Ltd.

The KING S Medium Term Plan Physical Education GIRLS Football Program Module

Assessment will involve students in individual and group scenarios.

The learning of complex whole body activity (Downhill skiing) by simulation

FUEL GAS FIRING CONTROL RJ (Dick) Perry Safety Systems Consultant 6 June 2016

Joseph Luxbacher, PhD Director of Coaching USC Travel Soccer Week #4 Curriculum. Small - Sided Games to Reinforce All Fundamental Skills

/435 Artificial Intelligence Fall 2015

Time-Delay Electropneumatic Applications

SoccerDrive.com Tryout Resources

THE CANDU 9 DISTRffiUTED CONTROL SYSTEM DESIGN PROCESS

CESSNA 172-SP PRIVATE & COMMERCIAL COURSE

1. A tendency to roll or heel when turning (a known and typically constant disturbance) 2. Motion induced by surface waves of certain frequencies.

Golf s Modern Ball Impact and Flight Model

Surf Soccer Curriculum

Introduction. AI and Searching. Simple Example. Simple Example. Now a Bit Harder. From Hammersmith to King s Cross

Review questions CPSC 203 midterm

AltMoC SPO.SPEC.HESLO.100 Standard Operating Procedures

UK rated gliding competition scoring explained

MANUAL KPS Pressure Control Valve

At each type of conflict location, the risk is affected by certain parameters:

TRANSITION GAMES TO TEACH THE FOUR GAME SITUATION ROLES

Leakage Current Testing Is it right for your application?

SOCCER TRAINING POTENTIONAL OR REAL INFORMATION

1. The deer in the pictures are numbered. Put the number next to the name that identifies each deer.

LegenDary 2012 Soccer 2D Simulation Team Description Paper

Analysis of hazard to operator during design process of safe ship power plant

Coaching Kicking Session 1

MOVEMENT QUALITY OF MARTIAL ART OUTSIDE KICKS. Manfred Vieten and Hartmut Riehle University of Konstanz, Konstanz, Germany

The canard. Why such a configuration? Credit : Jean-François Edange

Open Research Online The Open University s repository of research publications and other research outputs

EDEXCEL NATIONALS UNIT 6 MECHANICAL PRINCIPLES and APPLICATIONS. ASSIGNMENT No. 4

Introduction Roundabouts are an increasingly popular alternative to traffic signals for intersection control in the United States. Roundabouts have a

Using Mini Games as a Teaching Tool Tom Hilbert, Colorado State University

CFD ANALYSIS OF FLOW AROUND AEROFOIL FOR DIFFERENT ANGLE OF ATTACKS

The Modified Tank Ammunition

Make It Look Like Soccer. Prepared by Tom Goodman, M.Ed. Consultant to the Northeast Soccer League (NSL)

Planning and Acting in Partially Observable Stochastic Domains

The Science of Quantitative Risk Assessment for Explosives Safety

Rapid Fire Striking & Multiple Target Hitting

Land Based Marine Fire Fighter Task Book

14 The Divine Art of Hovering

Necromunda Frequently Asked Questions and Errata

DECISION-MAKING ON IMPLEMENTATION OF IPO UNDER TOPOLOGICAL UNCERTAINTY

Computer Integrated Manufacturing (PLTW) TEKS/LINKS Student Objectives One Credit

Flight Dynamics II (Stability) Prof. Nandan Kumar Sinha Department of Aerospace Engineering Indian Institute of Technology, Madras

Attacking and defending neural networks. HU Xiaolin ( 胡晓林 ) Department of Computer Science and Technology Tsinghua University, Beijing, China

Transcription:

Learning to Imitate a Fighter Pilot Shie Mannor Department of Electrical Engineering Technion ICML July 2011 S. Mannor (Technion) Learning to Imitate a Fighter Pilot ICML July 2011 1 / 18

S. Mannor (Technion) Learning to Imitate a Fighter Pilot ICML July 2011 2 / 18

We want to imitate S. Mannor (Technion) Learning to Imitate a Fighter Pilot ICML July 2011 3 / 18

The system we have S. Mannor (Technion) Learning to Imitate a Fighter Pilot ICML July 2011 4 / 18

Features of combat flight Versatility. Accuracy. Randomness. Multi-agency. Concurrency. S. Mannor (Technion) Learning to Imitate a Fighter Pilot ICML July 2011 5 / 18

Features of combat flight Versatility. Accuracy. Randomness. Multi-agency. Concurrency. S. Mannor (Technion) Learning to Imitate a Fighter Pilot ICML July 2011 5 / 18

Features of combat flight Versatility. Accuracy. Randomness. Multi-agency. Concurrency. S. Mannor (Technion) Learning to Imitate a Fighter Pilot ICML July 2011 5 / 18

Features of combat flight Versatility. Accuracy. Randomness. Multi-agency. Concurrency. S. Mannor (Technion) Learning to Imitate a Fighter Pilot ICML July 2011 5 / 18

Features of combat flight Versatility. Accuracy. Randomness. Multi-agency. Concurrency. S. Mannor (Technion) Learning to Imitate a Fighter Pilot ICML July 2011 5 / 18

RL view Single reward: ±1 State space: Coordinates X, Y, Z, Velocity V X, V Y, V Z Differentials X, Y, Z, Velocity V X, V Y, V Z Attack angle, speed, fuel, sun location... Missiles/canons Defensive props Episodic task: 30-60 seconds per fight S. Mannor (Technion) Learning to Imitate a Fighter Pilot ICML July 2011 6 / 18

RL view Single reward: ±1 State space: Coordinates X, Y, Z, Velocity V X, V Y, V Z Differentials X, Y, Z, Velocity V X, V Y, V Z Attack angle, speed, fuel, sun location... Missiles/canons Defensive props Episodic task: 30-60 seconds per fight S. Mannor (Technion) Learning to Imitate a Fighter Pilot ICML July 2011 6 / 18

RL view Single reward: ±1 State space: Coordinates X, Y, Z, Velocity V X, V Y, V Z Differentials X, Y, Z, Velocity V X, V Y, V Z Attack angle, speed, fuel, sun location... Missiles/canons Defensive props Episodic task: 30-60 seconds per fight S. Mannor (Technion) Learning to Imitate a Fighter Pilot ICML July 2011 6 / 18

Agents Pilots are good! Very fast responses Established flight methodology Principle: always better to be behind the enemy A bit like chess S. Mannor (Technion) Learning to Imitate a Fighter Pilot ICML July 2011 7 / 18

Agents Pilots are good! Very fast responses Established flight methodology Principle: always better to be behind the enemy A bit like chess S. Mannor (Technion) Learning to Imitate a Fighter Pilot ICML July 2011 7 / 18

Agents Pilots are good! Very fast responses Established flight methodology Principle: always better to be behind the enemy A bit like chess S. Mannor (Technion) Learning to Imitate a Fighter Pilot ICML July 2011 7 / 18

Agents Pilots are good! Very fast responses Established flight methodology Principle: always better to be behind the enemy A bit like chess S. Mannor (Technion) Learning to Imitate a Fighter Pilot ICML July 2011 7 / 18

Agents Pilots are good! Very fast responses Established flight methodology Principle: always better to be behind the enemy A bit like chess S. Mannor (Technion) Learning to Imitate a Fighter Pilot ICML July 2011 7 / 18

RL: A Naive Approach Train agent against itself Train agent against a human player Temporal credit assignment problem Never win against even a moderately good player S. Mannor (Technion) Learning to Imitate a Fighter Pilot ICML July 2011 8 / 18

RL: A Naive Approach Train agent against itself Train agent against a human player Temporal credit assignment problem Never win against even a moderately good player S. Mannor (Technion) Learning to Imitate a Fighter Pilot ICML July 2011 8 / 18

RL: A Naive Approach Train agent against itself Train agent against a human player Temporal credit assignment problem Never win against even a moderately good player S. Mannor (Technion) Learning to Imitate a Fighter Pilot ICML July 2011 8 / 18

An AI approach Just hack the damn thing Very hard to get good at the game Pilots training is 3-5 years Many maneuvers to consider S. Mannor (Technion) Learning to Imitate a Fighter Pilot ICML July 2011 9 / 18

An AI approach Just hack the damn thing Very hard to get good at the game Pilots training is 3-5 years Many maneuvers to consider S. Mannor (Technion) Learning to Imitate a Fighter Pilot ICML July 2011 9 / 18

An AI approach Just hack the damn thing Very hard to get good at the game Pilots training is 3-5 years Many maneuvers to consider S. Mannor (Technion) Learning to Imitate a Fighter Pilot ICML July 2011 9 / 18

An AI approach Just hack the damn thing Very hard to get good at the game Pilots training is 3-5 years Many maneuvers to consider S. Mannor (Technion) Learning to Imitate a Fighter Pilot ICML July 2011 9 / 18

A sample maneuver S. Mannor (Technion) Learning to Imitate a Fighter Pilot ICML July 2011 10 / 18

Our approach: Plural Reinforcement Learning We can learn from many sources: Simulation Viewing good" traces of human VS human Annotated or not Human VS agent Ask pilot what to do? Ask pilot what went wrong S. Mannor (Technion) Learning to Imitate a Fighter Pilot ICML July 2011 11 / 18

Our approach: Plural Reinforcement Learning We can learn from many sources: Simulation Viewing good" traces of human VS human Annotated or not Human VS agent Ask pilot what to do? Ask pilot what went wrong S. Mannor (Technion) Learning to Imitate a Fighter Pilot ICML July 2011 11 / 18

Our approach: Plural Reinforcement Learning We can learn from many sources: Simulation Viewing good" traces of human VS human Annotated or not Human VS agent Ask pilot what to do? Ask pilot what went wrong S. Mannor (Technion) Learning to Imitate a Fighter Pilot ICML July 2011 11 / 18

Imitation: Step 1 Learn sequence of actions = options = macro-actions from traces Call them maneuvers Tried unsupervised learning on traces failed. Flight is too versatile. Got verbal description from expert: Expert inaccurate Lots of free parameters Articulating maneuvers is hard Parameterized maneuvers. S. Mannor (Technion) Learning to Imitate a Fighter Pilot ICML July 2011 12 / 18

Imitation: Step 1 Learn sequence of actions = options = macro-actions from traces Call them maneuvers Tried unsupervised learning on traces failed. Flight is too versatile. Got verbal description from expert: Expert inaccurate Lots of free parameters Articulating maneuvers is hard Parameterized maneuvers. S. Mannor (Technion) Learning to Imitate a Fighter Pilot ICML July 2011 12 / 18

Imitation: Step 1 Learn sequence of actions = options = macro-actions from traces Call them maneuvers Tried unsupervised learning on traces failed. Flight is too versatile. Got verbal description from expert: Expert inaccurate Lots of free parameters Articulating maneuvers is hard Parameterized maneuvers. S. Mannor (Technion) Learning to Imitate a Fighter Pilot ICML July 2011 12 / 18

Imitation: Step 1 Learn sequence of actions = options = macro-actions from traces Call them maneuvers Tried unsupervised learning on traces failed. Flight is too versatile. Got verbal description from expert: Expert inaccurate Lots of free parameters Articulating maneuvers is hard Parameterized maneuvers. S. Mannor (Technion) Learning to Imitate a Fighter Pilot ICML July 2011 12 / 18

Value function approach Illicit value from expert. Full advantage Full disadvantage Equality... S. Mannor (Technion) Learning to Imitate a Fighter Pilot ICML July 2011 13 / 18

Value function approach Illicit value from expert. Full advantage Full disadvantage Equality... S. Mannor (Technion) Learning to Imitate a Fighter Pilot ICML July 2011 13 / 18

Value function elicitation Greedy over expert value function trembling smoothing Features for linear approximation unnatural Obtaining more constraints from traces Value function approach does not seem to cut it S. Mannor (Technion) Learning to Imitate a Fighter Pilot ICML July 2011 14 / 18

Value function elicitation Greedy over expert value function trembling smoothing Features for linear approximation unnatural Obtaining more constraints from traces Value function approach does not seem to cut it S. Mannor (Technion) Learning to Imitate a Fighter Pilot ICML July 2011 14 / 18

Value function elicitation Greedy over expert value function trembling smoothing Features for linear approximation unnatural Obtaining more constraints from traces Value function approach does not seem to cut it S. Mannor (Technion) Learning to Imitate a Fighter Pilot ICML July 2011 14 / 18

Policy gradients Policy is complex Parametrization is tricky Had some success with isolated maneuvers Pilots do not use a pre-specified policy S. Mannor (Technion) Learning to Imitate a Fighter Pilot ICML July 2011 15 / 18

Policy gradients Policy is complex Parametrization is tricky Had some success with isolated maneuvers Pilots do not use a pre-specified policy S. Mannor (Technion) Learning to Imitate a Fighter Pilot ICML July 2011 15 / 18

Reward shaping Reward extremely rare Use expert for reward signals Hard to find a consistent reward function Randomization Risk aversion S. Mannor (Technion) Learning to Imitate a Fighter Pilot ICML July 2011 16 / 18

Reward shaping Reward extremely rare Use expert for reward signals Hard to find a consistent reward function Randomization Risk aversion S. Mannor (Technion) Learning to Imitate a Fighter Pilot ICML July 2011 16 / 18

Reward shaping Reward extremely rare Use expert for reward signals Hard to find a consistent reward function Randomization Risk aversion S. Mannor (Technion) Learning to Imitate a Fighter Pilot ICML July 2011 16 / 18

Using traces for better features Main challenge: when to start which maneuver and with which params. How to combine all forms of input from pilots? S. Mannor (Technion) Learning to Imitate a Fighter Pilot ICML July 2011 17 / 18

Challenges How to use rare feedback from expert? Stop and ask Initiating and defining maneuvers Better use of traces Reward shaping/learning maneuvers S. Mannor (Technion) Learning to Imitate a Fighter Pilot ICML July 2011 18 / 18

Challenges How to use rare feedback from expert? Stop and ask Initiating and defining maneuvers Better use of traces Reward shaping/learning maneuvers S. Mannor (Technion) Learning to Imitate a Fighter Pilot ICML July 2011 18 / 18

Challenges How to use rare feedback from expert? Stop and ask Initiating and defining maneuvers Better use of traces Reward shaping/learning maneuvers S. Mannor (Technion) Learning to Imitate a Fighter Pilot ICML July 2011 18 / 18

Challenges How to use rare feedback from expert? Stop and ask Initiating and defining maneuvers Better use of traces Reward shaping/learning maneuvers S. Mannor (Technion) Learning to Imitate a Fighter Pilot ICML July 2011 18 / 18

Challenges How to use rare feedback from expert? Stop and ask Initiating and defining maneuvers Better use of traces Reward shaping/learning maneuvers S. Mannor (Technion) Learning to Imitate a Fighter Pilot ICML July 2011 18 / 18