CS 4649/7649 Robot Intelligence: Planning
Partially Observable MDP
Sungmoon Joo
School of Interactive Computing, College of Computing, Georgia Institute of Technology
S. Joo (sungmoon.joo@cc.gatech.edu)
Some slides adapted from Dr. Mike Stilman's lecture slides

Administrative
Three lectures left
- Nov. 25th: POMDP and summary of planning under uncertainty
- Dec. 2nd: Extensions of planning/control: language, hybrid systems
- Dec. 4th: Wrap-up
Due reminders:
- Project report: due Dec. 4th
- Project report review: due Dec. 11th
- Project presentation & presentation evaluation: Dec. 11th
Reality: Two Sources of Error
Sensing & state estimation uncertainty
- Sensors have noise: you don't know exactly what the state is (e.g. mapping, localization, ...)
Action execution uncertainty
- Your actuators do not do exactly what you tell them to
- The system responds differently than you expect: friction, gears, air resistance, etc.

Reality
[Figure: the true state is only available as an estimated state, with uncertainty in both sensing and actuation]
Reality: State Estimation
[Figure: sensing with uncertainty feeds a state estimator; the estimated state drives an MDP plan / (control) policy, whose actions execute with uncertainty]

Reality: POMDP
[Figure: the same loop, with the estimated state and MDP planner replaced by a POMDP that reasons over the uncertainty directly]
Mathematical Frameworks
Framework                          Action Uncertainty   Observation Uncertainty
Markov Chain (Markov property)     -                    -
Hidden Markov Model (HMM)          -                    yes
Markov Decision Process (MDP)      yes                  -
Partially Observable MDP (POMDP)   yes                  yes

POMDP = MDP + observation uncertainty
- Uncertainty about action outcomes (as in an MDP)
- Uncertainty about the state due to imperfect observation: you don't get to observe the state itself; instead you get sensory measurements
State Estimation
Belief state: a probability distribution over states (e.g. maintained by a Kalman filter)

Belief States: Example
Kalman filter: Gaussian (mean & covariance) -- a continuous belief state
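A minimal sketch of the Kalman-filter idea above, for a 1-D position: the entire belief state is a single Gaussian, so tracking it only requires updating a mean and a variance. All numbers here are illustrative, not from the slides.

```python
# 1-D Kalman filter: the belief over position is one Gaussian (mean, var).

def kf_predict(mean, var, u, motion_var):
    """Action update: shift the mean by the commanded motion u;
    uncertainty grows by the motion noise."""
    return mean + u, var + motion_var

def kf_update(mean, var, z, sensor_var):
    """Observation update: blend prediction with measurement z,
    weighted by relative uncertainty (Kalman gain k)."""
    k = var / (var + sensor_var)
    return mean + k * (z - mean), (1.0 - k) * var

# Start uncertain about position 0, command a move of +1, then observe 1.2.
mean, var = 0.0, 1.0
mean, var = kf_predict(mean, var, u=1.0, motion_var=0.5)  # belief spreads
mean, var = kf_update(mean, var, z=1.2, sensor_var=0.5)   # belief sharpens
print(mean, var)
```

Note how the two steps mirror the two sources of error: prediction grows the variance (action uncertainty), and measurement shrinks it (sensing reduces state uncertainty).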
POMDP
[Figure: the planning loop, with the estimated state replaced by a belief state; uncertainty in both sensing and actuation]

POMDP
[Figure: a POMDP viewed as an MDP whose state is hidden behind observations o]
POMDP
Probability distributions over states of the underlying MDP (i.e. belief states)
- The agent keeps an internal belief state, b, that summarizes its experience (the observation & control-input history).
- The agent uses a state estimator, SE, to update the belief state: b_t = SE(b_{t-1}, a_{t-1}, o_t), based on the last action a_{t-1}, the current observation o_t, and the previous belief state b_{t-1}.

Converting POMDP to Belief-State MDP
b_t = SE(b_{t-1}, a_{t-1}, o_t)
- b_{t-1}: previous belief distribution
- o_t: current observation
- a_{t-1}: previous action
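The state estimator SE can be sketched directly for a discrete POMDP. This is a hedged illustration: the transition model `T[s][a][s2]` and observation model `O[s2][o]`, and the two-state numbers in the usage, are assumed data structures for the example, not from the slides.

```python
# State estimator SE for a discrete POMDP:
#   b'(s') ∝ O(o | s') * sum_s T(s' | s, a) * b(s)

def se(b, a, o, T, O):
    states = list(b)
    # Action update (prediction): push the belief through the dynamics.
    predicted = {s2: sum(T[s][a][s2] * b[s] for s in states) for s2 in states}
    # Observation update: weight each state by the likelihood of o.
    unnorm = {s2: O[s2][o] * predicted[s2] for s2 in states}
    eta = sum(unnorm.values())  # normalizing factor, = P(o | a, b)
    return {s2: p / eta for s2, p in unnorm.items()}

# Hypothetical two-state models.
T = {'s1': {'a1': {'s1': 0.7, 's2': 0.3}},
     's2': {'a1': {'s1': 0.4, 's2': 0.6}}}
O = {'s1': {'o1': 0.9, 'o2': 0.1},
     's2': {'o1': 0.2, 'o2': 0.8}}
b = se({'s1': 0.5, 's2': 0.5}, 'a1', 'o1', T, O)
print(b)  # observing o1 shifts the belief toward s1
```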
Converting POMDP to Belief-State MDP
[Figure: two-state example (s1, s2) -- how is the belief updated? The update requires a normalizing factor]

Converting POMDP to Belief-State MDP
[Figure: the two-state belief update, expanded over successor states s1 and s2]
Total Probability
If {B_n : n = 1, 2, 3, ...} is a finite or countably infinite partition of a sample space, and each event B_n is measurable, then for any event A of the same probability space:
P(A) = Σ_n P(A ∩ B_n) = Σ_n P(A | B_n) P(B_n)
The law of total probability can also be stated for conditional probabilities. Taking the B_n as above, and assuming C is an event independent of every B_n:
P(A | C) = Σ_n P(A | C ∩ B_n) P(B_n)

Converting POMDP to Belief-State MDP
[Figure: the two-state belief update, with the sum over predecessor states expanded via total probability]
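A quick numeric check of the law of total probability, with made-up probabilities for a two-event partition:

```python
# Law of total probability for a partition {B1, B2} of the sample space:
#   P(A) = sum_n P(A | Bn) * P(Bn)

p_B = {'B1': 0.3, 'B2': 0.7}          # P(Bn): a partition, sums to 1
p_A_given_B = {'B1': 0.5, 'B2': 0.2}  # P(A | Bn)

p_A = sum(p_A_given_B[n] * p_B[n] for n in p_B)
print(p_A)  # 0.3*0.5 + 0.7*0.2 = 0.29
```

This is exactly the sum that appears in the belief update's action-update step, where the partition is over the predecessor states.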
Converting POMDP to Belief-State MDP
[Figure: the completed two-state belief update -- this computation is exactly state estimation]

POMDP to MDP
Action update: predict the next belief from the current belief and the action, using the transition model
POMDP to MDP
Observation update: condition the predicted belief on the new observation, using the observation model, and renormalize

POMDP Example
[Figure: a two-state example problem]
POMDP Example
[Figures spanning several slides: a worked two-state belief update, conditioned on action A1 and observation O2, carried out step by step using the law of total probability]
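The worked example survives only as figures, so here is a self-contained numeric stand-in: a uniform two-state belief updated after taking action A1 and observing O2. All model probabilities are hypothetical; only the structure of the calculation follows the slides.

```python
# Two-state belief update conditioned on action a1 and observation o2.
b = {'s1': 0.5, 's2': 0.5}
# Hypothetical transition model P(s' | s, a1)
T = {('s1', 's1'): 0.7, ('s1', 's2'): 0.3,
     ('s2', 's1'): 0.4, ('s2', 's2'): 0.6}
# Hypothetical observation model P(o2 | s')
O = {'s1': 0.1, 's2': 0.8}

# Action update via the law of total probability
pred = {s2: sum(T[(s, s2)] * b[s] for s in b) for s2 in ('s1', 's2')}
# Observation update, then normalize
unnorm = {s: O[s] * pred[s] for s in pred}
eta = sum(unnorm.values())            # P(o2 | a1, b)
b_new = {s: p / eta for s, p in unnorm.items()}
print(b_new)  # belief shifts toward s2, which explains o2 better
```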
How to Solve a Belief-State MDP?
[Figure]

Solving a POMDP
[Figures spanning two slides]

Solving a POMDP: Step 1
[Figures spanning three slides]

Solving a POMDP: Step 2
[Figures spanning five slides]

POMDP in Higher Dimensions: Hyperplanes
[Figure: in higher-dimensional belief spaces, the linear pieces of the value function are hyperplanes]
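The hyperplane picture can be made concrete: a POMDP value function over belief space is piecewise-linear and convex, V(b) = max_α α·b, where each α-vector gives the value of some conditional plan in each underlying state. Below is a tiny illustration with hypothetical α-vectors for a two-state problem (the belief is parameterized by the probability of s2); the specific numbers are assumptions for the sketch.

```python
# Piecewise-linear convex value function: V(b) = max over alpha-vectors
# of alpha · b, for a two-state belief b = (1 - b_s2, b_s2).

alphas = [(1.0, 0.0),   # e.g. value of a plan that commits to "state is s1"
          (0.0, 1.0),   # e.g. value of a plan that commits to "state is s2"
          (0.6, 0.6)]   # e.g. value of an information-gathering plan

def V(b_s2, alphas):
    """Evaluate the PWLC value function at belief (1 - b_s2, b_s2)."""
    return max(a1 * (1.0 - b_s2) + a2 * b_s2 for a1, a2 in alphas)

# Near the corners (belief nearly certain) the commit plans dominate;
# near b_s2 = 0.5 the flat hyperplane -- the sensing plan -- wins.
print(V(0.0, alphas), V(0.5, alphas), V(1.0, alphas))
```

This is the trade-off the summary mentions: which hyperplane is maximal at a given belief decides between ascertaining the state and pursuing the goal.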
POMDP Summary
Complex but powerful technique
- The state space explodes upon conversion to an MDP
- States become difficult to interpret upon conversion to an MDP
- A unique, cohesive method that trades off:
  : the value of ascertaining the state
  : the value of pursuing the goal
More efficient algorithms exist:
- Witness algorithm (Littman '94)
- Policy iteration (Sondik; Hansen '97)
Typically the complexity is still prohibitive for large problems

POMDP Summary
Canonical solution method 1 (covered today)
- Run value iteration, but now the state space is the space of probability distributions
  : yields a value and an optimal action for every possible probability distribution
  : automatically trades off information-gathering actions versus actions that affect the underlying state
Canonical solution method 2 (finite-horizon / MPC-style)
- Search over sequences of actions with limited look-ahead
- Branch over actions and observations
Canonical solution method 3 (LQG-style)
- Plan in the underlying MDP
- Run probabilistic inference (filtering) to track the probability distribution over states
- Choose the MDP-optimal action for what is currently the most likely state
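Canonical solution method 3 is simple enough to sketch in a few lines. The policy table and belief below are hypothetical stand-ins; the point is the structure: solve the MDP once offline, filter online, and act for the most likely state.

```python
# Most-likely-state heuristic: act optimally for the belief's mode,
# using a policy computed for the fully observable MDP.

mdp_policy = {'s1': 'left', 's2': 'right'}  # hypothetical MDP-optimal policy

def act_most_likely(b, mdp_policy):
    """Pick the MDP action for the most probable state under belief b."""
    s_ml = max(b, key=b.get)
    return mdp_policy[s_ml]

print(act_most_likely({'s1': 0.7, 's2': 0.3}, mdp_policy))  # prints 'left'
```

The design trade-off: this is cheap, but unlike belief-space value iteration it assigns no value to information-gathering actions, so it can act overconfidently when the belief is nearly uniform.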
Active Monocular SLAM Example
The robot's trajectory matters!
- Control objective vs. probing (moving to reduce uncertainty), among obstacles, toward a goal
- Trade-off: control objective vs. probing -- the dual-control trade-off

Active Monocular SLAM Example
[Figure: scenario]
Active Monocular SLAM Example
Sungmoon Joo, "SLAM-based nonlinear optimal control approach to robot navigation with limited resources"

Active Monocular SLAM Example
[Figure]