Autonomous blimp control with reinforcement learning


University of Wollongong Research Online
University of Wollongong Thesis Collection 1954-2016
2009

Autonomous blimp control with reinforcement learning
Yiwei Liu, University of Wollongong

Recommended Citation: Liu, Yiwei, Autonomous blimp control with reinforcement learning, Master of Engineering thesis, School of Electrical, Computer and Telecommunications Engineering, University of Wollongong, 2009. http://ro.uow.edu.au/theses/3116

Research Online is the open access institutional repository for the University of Wollongong. For further information contact the UOW Library: research-pubs@uow.edu.au

Autonomous Blimp Control with Reinforcement Learning

A thesis submitted in fulfilment of the requirements for the award of the degree Master of Engineering by Research from the University of Wollongong

by Yiwei Liu

School of Electrical, Computer and Telecommunications Engineering
August 2009

Dedicated to my parents...

Acknowledgements

It is a great pleasure to be able to express my faithful thanks to the many people to whom I am indebted for their support during the progression of this thesis. First and foremost, I wish to express my utmost gratitude to my principal supervisor, Dr. Zengxi Pan of the University of Wollongong (UoW), for guiding me to pursue postgraduate studies at the University of Wollongong and for the support given throughout the study period in many ways. Your dedication, patience, knowledge and experience could not have been surpassed, and I admire the guidance you have given me, academically and personally, over the last few years. I would like to thank my co-supervisor, Senior Lecturer David Stirling of the UoW, for his insightful technical contributions and helpful attitude. I would also like to offer my appreciation to my co-supervisor, Professor and Head of School Fazel Naghdy of the UoW, for his academic guidance of my research work and his support in my university affairs. Very special thanks go to my friend Matthew, who also worked and studied in the Centre for Intelligent Mechatronic Research, for being generously supportive, especially during hard times along the way. My heartiest gratitude goes to my parents, Yulin Liu and Xiulan Wei, for all the encouragement, guidance and sacrifices made on my behalf to bring me this far. Finally, my thanks go to the rest of my colleagues and friends, Kai, Prabodha, and Nishad, for being supportive in many ways.

Certification

I, Yiwei Liu, declare that this thesis, submitted in fulfilment of the requirements for the award of Master of Engineering, in the School of Electrical, Computer and Telecommunications Engineering, University of Wollongong, is entirely my own work unless otherwise referenced or acknowledged. This manuscript has not been submitted for qualifications at any other academic institution.

Yiwei Liu
Date: 27 August 2009

Abstract

Blimps are a special type of airship with no rigid structure in the body. Most existing blimps are operated manually by a pilot, either directly or through radio control; one of the most famous examples is the Goodyear Blimp used for commercial advertising. With the rapid development of microcontroller and electronic technologies, autonomous blimps have recently attracted great research interest as a platform for accessing dangerous or difficult-to-access environments in applications such as disaster exploration and rescue, security surveillance at public events, and climate monitoring. This thesis investigates the problem of learning an optimal control policy for autonomous blimp navigation in a rescue task, and presents a new approach to the navigation control of an autonomous blimp using an intelligent reinforcement learning algorithm. Compared with traditional model-based control methods, this control strategy does not require a dynamic model of the blimp, which is a significant advantage in the many practical situations where the blimp system model is either hard to acquire or too complicated to apply. The blimp in this research is used as a prototype for the UAV Outback Challenge organised by the Australian Research Centre for Aerospace Automation (ARCAA). The Challenge requires the UAV to fly autonomously to a designated area and rescue a dummy named Jack. The objective of this research is to develop a control system that can autonomously adjust the blimp's heading towards the rescue target. Because the blimp acquires a range of piloting skills through the learning and reinforcement mechanism during actual navigation trials, it can automatically account for environmental changes during the navigation tasks.

The basic hardware structure and devices of the blimp control system were developed in a preliminary form. The developed controller does not require a dynamic model of the blimp, yet is adaptive to changes in the surrounding environment. Simulation data generated with the Webots Robotics Simulator (WRS) demonstrate satisfactory results for planar steering motion control; Matlab was used to analyse the simulation data produced by WRS. Within the simulation environment, the blimp, controlled by the Q-learning method, was successfully tested in single-target and continuous-target tasks subject to various environmental disturbances. Different learning parameters and initial conditions were also tested to obtain better solutions for autonomous blimp steering motion. Reinforcement learning applied to blimp control in this research is shown to be a promising and effective solution for autonomous navigation tasks.
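The tabular Q-learning idea summarised above can be sketched in a few lines of code. This is a minimal illustration only: the state discretisation (angular difference to the target), the three steering actions, the reward, and all parameter values here are assumptions for the sketch, not the settings used in this thesis.

```python
import random

# Illustrative state/action discretisation (assumed, not the thesis's actual setup):
# states = bins of the angular difference between heading and target direction,
# actions = turn left, hold, turn right.
N_STATES = 8
ACTIONS = [-1, 0, +1]
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # learning rate, discount, exploration rate

# Q table: one row per state, one column per action, initialised to zero.
Q = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]

def choose_action(state):
    """Epsilon-greedy selection: explore with probability EPSILON, else exploit."""
    if random.random() < EPSILON:
        return random.randrange(len(ACTIONS))
    row = Q[state]
    return row.index(max(row))

def update(state, action, reward, next_state):
    """One-step Q-learning (off-policy TD) update:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[next_state])
    Q[state][action] += ALPHA * (reward + GAMMA * best_next - Q[state][action])
```

Because the update bootstraps from `max(Q[next_state])` regardless of which action the exploration policy actually takes next, no model of the blimp dynamics is required; the table is filled in purely from observed state transitions and rewards.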

Table of Contents

1 Introduction 1
  1.1 Background 2
  1.2 Thesis objective 4
    1.2.1 Developing an intelligent navigation control system for an autonomous blimp 4
    1.2.2 Examination of an intelligent control algorithm with machine learning 5
    1.2.3 Discussion of parameters for reinforcement learning 5
    1.2.4 Summary of the contributions 6
  1.3 Thesis outline 6
2 Literature Review 8
  2.1 Blimp 8
    2.1.1 History of the blimp 10
    2.1.2 Research and applications of autonomous blimps 11
  2.2 Autonomous control of a blimp 17
  2.3 Reinforcement learning 19
  2.4 Remaining research difficulties in blimp control 22
  2.5 Research direction 23
  2.6 Conclusion 25
3 Hardware Design and Simulation Environment 26
  3.1 System structure 26
  3.2 Hardware design 29
    3.2.1 Ground unit 31
    3.2.2 On-board unit 32
  3.3 Functions of the microcontroller and host computer in the learning system 39
  3.4 Simulation software 43
  3.5 Chapter summary 45
4 Methodology 47
  4.1 Reinforcement learning 47
    4.1.1 Elements of reinforcement learning 49
    4.1.2 Dynamic programming methods 51
    4.1.3 Monte Carlo methods 53
    4.1.4 Temporal-difference methods 54
  4.2 Q-learning: off-policy TD control algorithm 56
  4.3 Autonomous blimp control using reinforcement learning 57
    4.3.1 Task analysis 57
    4.3.2 Navigation coordinate conversion 58
    4.3.3 Structure and definition of the reinforcement learning algorithm 59
    4.3.4 Basic elements and equations of Q-learning 61
    4.3.5 Programming flow chart of Q-learning in Webots 64
  4.4 Chapter summary 65
5 Simulation Results and Preliminary Discussion 72
  5.1 Webots simulation results 72
    5.1.1 Webots simulation environment 72
    5.1.2 Webots simulation setup 75
    5.1.3 Simulation of navigation tasks 76
  5.2 Analysis of simulation results in Matlab 79
    5.2.1 Basic results 79
    5.2.2 Effect of initial target position in Q-learning 90
    5.2.3 Exploration of the Q table 96
    5.2.4 Turning performance with disturbance 99
    5.2.5 Continuous learning tasks 105
  5.3 Chapter summary 108
6 Further Discussion 112
  6.1 Exploration vs. exploitation in Q-learning 112
  6.2 Effect of different parameters in Q-learning 117
  6.3 Control policies of PID control and Q-learning 118
  6.4 Effect of different target directions on Q-learning performance 123
  6.5 The learning contributions of previous experience and immediate feedback 128
  6.6 Chapter summary 130
7 Summary and Future Work 132
  7.1 Summary 132
  7.2 Achievements 135
  7.3 Future work 136

List of Figures

2.1 Goodyear Blimp [1] 9
2.2 The navigable balloon created by Giffard in 1852 [2] 11
2.3 First Zeppelin flight at Lake Constance [2] 12
2.4 Blimp control system developed by Silveira [3] 13
2.5 Unmanned blimp control system built by Kawai [4] 14
2.6 Structure of the blimp flying display system [5] 14
2.7 Role of the airship in a USAR system [6] 15
2.8 Role of the airship in a USAR system [7] 16
2.9 Overview of the learning system for blimp control 24
3.1 Machine learning system overview of the autonomous blimp 28
3.2 Blimp body for the UOW evaluation prototype 29
3.3 Hardware components of the blimp control system 30
3.4 System structure of the onboard unit 33
3.5 Blimp measurement sensors 36
3.6 Thruster positions of the prototype blimp and the servo and DC motors mounted on the blimp gondola 38
3.7 Onboard control electronic circuit 39
3.8 Standard system structure of machine learning 40
3.9 Reinforcement learning system structure in this thesis 41
4.1 Procedural form of Q-learning 57
4.2 The world framework and the blimp body coordinates 59
4.3 Flow chart of generating the blimp angular speed in simulation 67
4.4 Flow chart of getting the best actions 68
4.5 Flow chart of getting the environment 69
4.6 Flow chart of state judgment 70
4.7 Flow chart of the Webots running procedure 71
5.1 Webots simulator environment 73
5.2 Outputs of simulation data from Webots 74
5.3 Time counter of Webots 75
5.4 Blimp body coordinate in the virtual environment 76
5.5 Definition of the world coordinate in the virtual world 77
5.6 Blimp navigation task 78
5.7 Initial setting of four-quadrant target trials 79
5.8 Blimp navigation paths 80
5.9 Angular difference and orientation results obtained from Matlab 82
5.10 Plan view (from above) of the blimp body reference frame 83
5.11 Reference coordinates of the virtual world 84
5.12 Q-learning results processed by Matlab 86
5.13 Results of the action sequence 88
5.14 Simulation results of state changes 89
5.15 3D track of blimp movement in Matlab 90
5.16 Turning performance in the short-time learning process 91
5.17 Blimp action changes in the short-time learning process 93
5.18 Turning performance in the long-time learning process 95
5.19 Blimp state changes in the long-time learning process 96
5.20 Q-value surface plot for restricted Q-learning 97
5.21 Extended, more sufficient Q-learning 98
5.22 Results of Q-value tables with differing amounts of learning iterations 100
5.23 Control performance in orientation and angular difference (16, 16) 101
5.24 Control performance as manifested in the sequence of actions (16, 16) 102
5.25 Control performance in the sequence of states 103
5.26 Control performance in orientation and angular difference (-16, 16) 104
5.27 Control performance in the sequence of states 105
5.28 Other test results of blimp control under disturbance 106
5.29 Angular difference and orientation in continuous turning 107
5.30 Sequence of actions in continuous turning 109
5.31 Sequence of states in continuous turning 110
6.1 Backup diagram of one-step Q-learning 113
6.2 Different learning procedures 116
6.3 The first control strategy for blimp navigation tasks 121
6.4 Webots programme flowchart of the first strategy 122
6.5 The second control strategy for blimp navigation tasks 123
6.6 Quadrants of the body frame 124
6.7 Simulation results of angular difference and states in the first scenario 125
6.8 Simulation results of angular difference and orientation in the second scenario 126
6.9 Simulation results of blimp states in the second scenario 127

List of Tables

3.1 Explanation of the evaluating variables selected for processing in Matlab 45
4.1 State-judgment table 61
5.1 Variables to be evaluated in Matlab 81