Self-Driving Vehicles That (Fore) See

Dariu M. Gavrila, Intelligent Vehicles, TU Delft. Berlin Tech.AD, March 5, 2018

Personal Introduction: Dariu M. Gavrila
Born in Cluj (Romania).
1990: Doctoraal degree in Computer Science at Vrije Universiteit (Amsterdam, NL).
1996: Ph.D. in Computer Science at Univ. of Maryland (College Park, USA).
1997-2016: Daimler R&D (Ulm, DE), Senior Principal Distinguished Scientist.
2005-now: part-time Professor, Intelligent Perception System, Univ. of Amsterdam (NL).
2016-now: Professor, Intelligent Vehicles, TU Delft.
2013-2014: Market introduction of PRE-SAFE brake with stereo vision-based pedestrian recognition in the Mercedes-Benz S-, E- and C-Class.

LiDAR 3D Data

Scenario Focus: highway (relatively easy); urban, Phoenix, USA (relatively hard); urban, Delft, NL (really hard).

Anticipation: Prediction of Road User Motion/Behavior (figure: Road User Model). A more accurate prediction of the future motion of other road users enables a more adaptive driving style that is safe and comfortable, yet time-efficient. This enhances social acceptance.

Accurate Motion Models Capture Uncertainty Well. Uncertainty is a fact of life, and it increases with a larger prediction horizon. At a given prediction horizon, a prediction can be correct but very unspecific (bad), very specific but likely incorrect (bad), or specific and likely correct (good). Performance metric: the likelihood of the actual future position under the predictive distribution.
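A minimal sketch of the likelihood metric, assuming the predictive distribution is an isotropic 2D Gaussian over position. The positions and standard deviations are hypothetical numbers chosen only to reproduce the three cases named on the slide; the function name `log_likelihood_2d` is illustrative.

```python
import math

def log_likelihood_2d(actual, mean, sigma):
    """Log-density of an isotropic 2D Gaussian N(mean, sigma^2 * I) evaluated at `actual`."""
    sq_dist = (actual[0] - mean[0]) ** 2 + (actual[1] - mean[1]) ** 2
    return -math.log(2.0 * math.pi * sigma ** 2) - sq_dist / (2.0 * sigma ** 2)

# Actual future position of a pedestrian (hypothetical numbers, in metres)
actual = (10.0, 2.0)

# Three hypothetical predictive distributions at the same prediction horizon: (mean, sigma)
predictions = {
    "correct_but_unspecific": ((10.0, 2.0), 5.0),  # centred on the truth, very wide
    "specific_but_incorrect": ((12.0, 2.0), 0.3),  # narrow, but 2 m off
    "specific_and_correct": ((10.2, 2.1), 0.3),    # narrow and close to the truth
}

scores = {name: log_likelihood_2d(actual, mean, sigma)
          for name, (mean, sigma) in predictions.items()}
for name, score in scores.items():
    print(f"{name}: log-likelihood = {score:.2f}")
```

Higher is better. A wide prediction is penalised through the normalisation term, while an overconfident miss is penalised through the squared-distance term, so only the specific-and-correct prediction scores well.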

Research Line: Accurate Prediction of Road User Motion (Behavior). Processing chain: Sensor Data → 3D Spatial Environment → Semantic Scene (incl. Road User Location and Object Class) → Road User Motion Cues → Road User Motion Model → Predicted Road User Motion.

Semantic Segmentation with Pyramid Scene Parsing Network. Classes: road, sidewalk, car, bicycle, motorcycle, pedestrian, rider, traffic light, traffic sign, pole, vegetation, building. 0.5 fps on 1216 x 1936 images; trained on Cityscapes. H. Zhao, J. Shi, X. Qi, X. Wang and J. Jia. Pyramid Scene Parsing Network. arXiv:1612.01105, 2016.
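The core of PSPNet is its pyramid pooling module, which pools the feature map at several grid sizes (1, 2, 3 and 6 bins in the PSPNet paper), upsamples each result back to the input size and concatenates them with the original features to add global context. The sketch below is a simplified, pure-Python illustration under stated assumptions: a single-channel map, nearest-neighbour instead of bilinear upsampling, and no 1x1 convolution or batch normalisation. The helper names are hypothetical.

```python
import math

def adaptive_avg_pool(fmap, bins):
    """Average-pool a 2D map (list of rows) into a bins x bins grid.

    Bin edges follow the floor/ceil convention of adaptive average pooling,
    so neighbouring bins may overlap when the size is not divisible by `bins`.
    """
    h, w = len(fmap), len(fmap[0])
    pooled = []
    for i in range(bins):
        r0, r1 = (i * h) // bins, math.ceil((i + 1) * h / bins)
        row = []
        for j in range(bins):
            c0, c1 = (j * w) // bins, math.ceil((j + 1) * w / bins)
            cells = [fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled

def upsample_nearest(grid, h, w):
    """Nearest-neighbour upsampling of a square grid to h x w (PSPNet itself uses bilinear)."""
    n = len(grid)
    return [[grid[r * n // h][c * n // w] for c in range(w)] for r in range(h)]

def pyramid_pooling(fmap, bin_sizes=(1, 2, 3, 6)):
    """Return the input map plus one pooled-and-upsampled context map per pyramid level."""
    h, w = len(fmap), len(fmap[0])
    levels = [fmap]
    for bins in bin_sizes:
        levels.append(upsample_nearest(adaptive_avg_pool(fmap, bins), h, w))
    return levels  # in PSPNet these are concatenated along the channel axis

feature_map = [[float(r * 6 + c) for c in range(6)] for r in range(6)]
context = pyramid_pooling(feature_map)
print(len(context), "channels; global context value:", context[1][0][0])
```

The 1-bin level carries a single image-wide average to every pixel, which is what lets the classifier use scene-level context (e.g. "this is a street scene") when labelling small or ambiguous regions.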

3D Object Detection

State Estimation & Path Prediction: Track Features. Current production vehicles only consider pedestrian point kinematics. Difficult? Indeed: with point kinematics alone, a pedestrian crossing might only be detected when it is already underway. N. Schneider and D.M. Gavrila. Pedestrian Path Prediction with Recursive Bayesian Filters: A Comparative Study. Proc. German Conf. on Pattern Recognition, 2013.
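To make "point kinematics" concrete, here is a minimal sketch of one of the simplest recursive Bayesian filters, a 1D Kalman filter with a constant-velocity motion model, applied to a pedestrian's lateral position. It is not the filter set of the cited study; the noise parameters, initial variances and the noise-free toy track are assumptions. The predicted variance grows with the horizon, which is the uncertainty behaviour discussed earlier.

```python
class ConstantVelocityKF:
    """1D Kalman filter with a constant-velocity model; state = (position, velocity).

    Process noise models a piecewise-constant random acceleration with variance `q`
    (hypothetical value); `r` is the (hypothetical) position measurement noise variance.
    """

    def __init__(self, pos=0.0, vel=0.0, pos_var=10.0, vel_var=10.0, q=0.5, r=0.02):
        self.p, self.v = pos, vel
        # Symmetric covariance [[a, b], [b, d]]: a = var(p), b = cov(p, v), d = var(v)
        self.a, self.b, self.d = pos_var, 0.0, vel_var
        self.q, self.r = q, r

    def _propagate(self, dt):
        """Mean and covariance after dt seconds of constant-velocity motion (F P F^T + Q)."""
        q = self.q
        a = self.a + 2 * dt * self.b + dt * dt * self.d + q * dt ** 4 / 4
        b = self.b + dt * self.d + q * dt ** 3 / 2
        d = self.d + q * dt ** 2
        return self.p + dt * self.v, self.v, a, b, d

    def predict(self, dt):
        self.p, self.v, self.a, self.b, self.d = self._propagate(dt)

    def update(self, z):
        """Fuse a position measurement z (measurement matrix H = [1, 0])."""
        s = self.a + self.r                # innovation variance
        k_p, k_v = self.a / s, self.b / s  # Kalman gain
        innovation = z - self.p
        self.p += k_p * innovation
        self.v += k_v * innovation
        a, b, d = self.a, self.b, self.d
        self.a, self.b, self.d = a - a * a / s, b - a * b / s, d - b * b / s

    def predict_ahead(self, horizon):
        """Predicted position mean and variance `horizon` seconds ahead, state unchanged."""
        p, _, a, _, _ = self._propagate(horizon)
        return p, a

# Toy, noise-free track: lateral position of a pedestrian walking at 1.5 m/s, sampled at 10 Hz
kf = ConstantVelocityKF()
dt = 0.1
for k in range(1, 31):
    kf.predict(dt)
    kf.update(1.5 * k * dt)

mean_1s, var_1s = kf.predict_ahead(1.0)
mean_2s, var_2s = kf.predict_ahead(2.0)
print(f"v = {kf.v:.2f} m/s; 1 s: {mean_1s:.2f} m (var {var_1s:.3f}); 2 s: {mean_2s:.2f} m (var {var_2s:.3f})")
```

Because the model only extrapolates the current velocity, a pedestrian who is about to start or stop crossing is recognised only after the change shows up in the measured positions.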

State Estimation & Path Prediction: Track + Image Features. Point kinematics can be augmented with the image motion of the detected object. These augmented visual features allow machine learning algorithms to predict a pedestrian crossing about 200 ms earlier. C. G. Keller and D. M. Gavrila. Will the Pedestrian Cross? A Study on Pedestrian Path Prediction. IEEE Trans. on Intelligent Transportation Systems, 2014.

Intent Recognition: Will the Pedestrian(s) Cross? A human driver relies heavily on context cues to anticipate how a traffic situation will evolve.

Intent Recognition: Will the Pedestrian(s) Cross? Context cues: Has the pedestrian seen the vehicle? Is the pedestrian on a collision course? Is the pedestrian at the curb? A Bayesian network models how these context cues influence pedestrian motion. J. F. P. Kooij, N. Schneider, F. Flohr and D. M. Gavrila. Context-based pedestrian path prediction. Proc. European Conf. on Computer Vision, 2014.
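As a toy illustration of how such context cues combine, the sketch below computes the probability that a pedestrian stops by enumerating three binary cues. The cited ECCV 2014 work, as we read it, embeds these cues in a dynamic Bayesian network on top of switching pedestrian dynamics; this block is only a static simplification. The conditional probability table, the cue probabilities and the assumption that the cues are independent are all hypothetical.

```python
from itertools import product

# Hypothetical conditional probability table:
# P(pedestrian stops | has_seen_vehicle, on_collision_course, at_curb)
P_STOP = {
    (True, True, True): 0.9,   # aware of the vehicle, critical situation, at the curb
    (False, True, True): 0.1,  # critical and at the curb, but unaware of the vehicle
}
P_STOP_OTHERWISE = 0.05        # all remaining combinations

def prob_stop(p_seen, p_collision, p_curb):
    """Marginal P(stop) by enumerating the three (assumed independent) binary context cues."""
    total = 0.0
    for seen, collision, curb in product((True, False), repeat=3):
        weight = ((p_seen if seen else 1.0 - p_seen)
                  * (p_collision if collision else 1.0 - p_collision)
                  * (p_curb if curb else 1.0 - p_curb))
        total += weight * P_STOP.get((seen, collision, curb), P_STOP_OTHERWISE)
    return total

# Same critical situation at the curb; only the evidence that the pedestrian saw the car differs
attentive = prob_stop(p_seen=0.9, p_collision=0.9, p_curb=0.9)
distracted = prob_stop(p_seen=0.1, p_collision=0.9, p_curb=0.9)
print(f"P(stop): attentive {attentive:.2f}, distracted {distracted:.2f}")
```

In practice the cue probabilities would themselves come from perception, for example head orientation for "has seen vehicle" and the distance to the curb for "at curb".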

Estimation of Pedestrian Torso & Head Orientation. Head and torso orientation are estimated jointly, taking physical constraints into account. For clarity, results are shown for only one pedestrian at a time. F. Flohr, M. Dumitru-Guzu, J. Kooij and D. Gavrila. Joint probabilistic pedestrian head and body orientation estimation. Proc. IEEE Intelligent Vehicles Symposium, 2014.
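A minimal discrete sketch of why joint estimation with a physical constraint helps: head classifiers often confuse front and back views, while a confident torso estimate rules out head orientations the neck cannot reach. The 8 orientation bins, the detector likelihoods and the 90 degree head-turn limit are hypothetical; the cited paper's probabilistic model is richer than this hard-constraint MAP search.

```python
ANGLES = [0, 45, 90, 135, 180, 225, 270, 315]  # discrete orientation bins in degrees

def angular_diff(a, b):
    """Smallest absolute difference between two angles in degrees."""
    d = abs(a - b) % 360
    return min(d, 360 - d)

def joint_head_body_map(head_lik, body_lik, max_head_turn=90):
    """MAP (head, body) orientation pair under a hard head-vs-torso rotation limit."""
    best, best_score = None, -1.0
    for head, l_head in zip(ANGLES, head_lik):
        for body, l_body in zip(ANGLES, body_lik):
            if angular_diff(head, body) > max_head_turn:
                continue  # physically implausible configuration
            score = l_head * l_body
            if score > best_score:
                best, best_score = (head, body), score
    return best

# Hypothetical detector outputs: the head classifier confuses front and back views
head_lik = [0.35, 0.05, 0.02, 0.05, 0.40, 0.05, 0.03, 0.05]
body_lik = [0.70, 0.10, 0.02, 0.02, 0.02, 0.02, 0.02, 0.10]

independent_head = ANGLES[head_lik.index(max(head_lik))]
joint_head, joint_body = joint_head_body_map(head_lik, body_lik)
print(f"head alone: {independent_head} deg; joint: head {joint_head} deg, body {joint_body} deg")
```

Estimated independently, the head would be assigned 180 degrees; the joint estimate flips it to agree with the confidently detected torso.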

Intent Recognition: Will the Pedestrian(s) Cross?

Scenario: the pedestrian sees the vehicle and stops (two snapshots of a run). J. F. P. Kooij, N. Schneider, F. Flohr and D. M. Gavrila. Context-based pedestrian path prediction. Proc. European Conf. on Computer Vision, 2014.

Pedestrian Intent Recognition: Live Demo (Crossing Case). High tone: old warning, based on pedestrian point kinematics only. Low tone: new warning, based on context-based pedestrian motion modeling. The new warning comes 1 second earlier!

Pedestrian Intent Recognition: Live Demo (Stopping Case). High tone: state-of-the-art warning, based on pedestrian point kinematics and the current estimate of the pedestrian position. Low tone: new warning, based on context-based pedestrian motion modeling and a prediction of the pedestrian position 1 s ahead. No false alarm!

Automatic Braking vs. Evasion. Only 300 ms elapse from first sight of the pedestrian to initiation of the vehicle maneuver (braking or evasion). The automated system outperforms an attentive human driver. C. Keller, T. Dang, A. Joos, C. Rabe, H. Fritz and D.M. Gavrila. Active Pedestrian Safety by Automatic Braking and Evasive Steering. IEEE Trans. on Intelligent Transportation Systems, 2011.
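A back-of-the-envelope sketch of why both maneuvers are worth considering, using point-mass kinematics: braking needs the latency distance plus v²/(2a) to stop, while evasion needs the latency distance plus the distance driven while building up a lateral offset y under constant lateral acceleration, t = sqrt(2y / a_lat). Only the 300 ms latency comes from the slide; the deceleration, lateral acceleration and required offset are hypothetical values, and the model ignores steering dynamics.

```python
import math

REACTION_TIME_S = 0.3  # from the slide: first sight of the pedestrian to maneuver initiation

def braking_distance(v, decel=9.0, latency=REACTION_TIME_S):
    """Distance (m) to a standstill at speed v (m/s) with constant deceleration (hypothetical 9 m/s^2)."""
    return v * latency + v * v / (2.0 * decel)

def evasion_distance(v, lateral_offset=1.0, lat_accel=5.0, latency=REACTION_TIME_S):
    """Distance (m) driven until the car has moved `lateral_offset` m sideways (speed held constant)."""
    t_lat = math.sqrt(2.0 * lateral_offset / lat_accel)  # time to build up the lateral offset
    return v * latency + v * t_lat

for kmh in (30, 50):
    v = kmh / 3.6
    print(f"{kmh} km/h: braking {braking_distance(v):.1f} m, evasion {evasion_distance(v):.1f} m")
```

Braking distance grows quadratically with speed while this evasion distance grows linearly, so under these assumptions braking is shorter at 30 km/h and evasion at 50 km/h.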

Intent Recognition: Will the Cyclist Turn? Context cues: At an intersection? On a collision course? Hand gesture? J.F.P. Kooij, F. Flohr, E.A.I. Pool and D.M. Gavrila. Context-based Path Prediction for Targets with Switching Dynamics. Submitted to Int. J. of Computer Vision, 2018.
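A simplified sketch of the "switching dynamics" idea: a discrete Bayes filter (one HMM forward step per observation) over two motion modes, where context such as being at an intersection or a raised arm is assumed to raise the per-step probability of switching from "straight" to "turn". The modes, the per-mode yaw rates, the noise level and the switching probabilities are all hypothetical; the cited work couples such mode switches with continuous dynamics and context variables in a richer model.

```python
import math

MODES = ("straight", "turn_left")
EXPECTED_YAW_RATE = {"straight": 0.0, "turn_left": 0.4}  # rad/s, hypothetical per-mode dynamics
YAW_NOISE = 0.1                                          # rad/s, hypothetical

def gaussian_pdf(x, mean, sigma):
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def transition_matrix(p_switch):
    """P(next mode | current mode); p_switch is the context-dependent chance of starting a turn."""
    return {
        ("straight", "straight"): 1.0 - p_switch, ("straight", "turn_left"): p_switch,
        ("turn_left", "turn_left"): 0.95, ("turn_left", "straight"): 0.05,
    }

def update_mode_probs(probs, yaw_rate, p_switch):
    """One predict/update step of a discrete Bayes filter over the motion modes."""
    trans = transition_matrix(p_switch)
    predicted = {m: sum(probs[n] * trans[(n, m)] for n in MODES) for m in MODES}
    unnorm = {m: predicted[m] * gaussian_pdf(yaw_rate, EXPECTED_YAW_RATE[m], YAW_NOISE)
              for m in MODES}
    total = sum(unnorm.values())
    return {m: unnorm[m] / total for m in MODES}

yaw_rates = [0.0, 0.05, 0.15, 0.25]  # toy observations: cyclist starting to turn
no_context = {"straight": 1.0, "turn_left": 0.0}
with_context = dict(no_context)
for yaw in yaw_rates:
    no_context = update_mode_probs(no_context, yaw, p_switch=0.01)     # no context cues
    with_context = update_mode_probs(with_context, yaw, p_switch=0.2)  # e.g. intersection + arm signal
print(f"P(turn): without context {no_context['turn_left']:.2f}, with context {with_context['turn_left']:.2f}")
```

With identical observations, the context-informed filter assigns a higher turn probability at every step, i.e. it commits to the turn earlier.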

Cyclist Motion Analysis from Track Data. Annotated data from the Tsinghua-Daimler Cyclist Dataset; track alignment based on road geometry; unsupervised motion model learning. Prior knowledge of road topology improves cyclist path prediction (up to 20% increase in positional accuracy at sharp turns). E.A.I. Pool, J.F.P. Kooij and D.M. Gavrila. Using Road Topology to Improve Cyclist Path Prediction. Proc. IEEE Intelligent Vehicles Symposium, 2017.
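One common way to align tracks with road geometry is to express positions in a path-aligned frame: arc length s along a reference path (e.g. a lane centreline through a turn) and signed lateral offset d. Tracks through the same turn then become directly comparable regardless of their absolute coordinates. This is a minimal sketch of that transform under our own assumptions (polyline path, no zero-length segments, left of the travel direction positive); the paper's exact alignment procedure may differ, and the function name and example path are hypothetical.

```python
import math

def to_path_frame(point, path):
    """Project `point` (x, y) onto polyline `path`.

    Returns (s, d): arc length along the path to the closest point, and signed
    lateral offset (positive = left of the travel direction).
    """
    px, py = point
    best = None
    s_start = 0.0
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        dx, dy = x1 - x0, y1 - y0
        seg_len = math.hypot(dx, dy)
        # Fraction along the segment of the perpendicular foot point, clamped to the segment
        t = max(0.0, min(1.0, ((px - x0) * dx + (py - y0) * dy) / seg_len ** 2))
        fx, fy = x0 + t * dx, y0 + t * dy
        dist = math.hypot(px - fx, py - fy)
        if best is None or dist < best[0]:
            # Sign of the 2D cross product tells on which side of the segment the point lies
            side = 1.0 if dx * (py - y0) - dy * (px - x0) >= 0 else -1.0
            best = (dist, s_start + t * seg_len, side * dist)
        s_start += seg_len
    return best[1], best[2]

path = [(0.0, 0.0), (10.0, 0.0), (10.0, -10.0)]  # hypothetical centreline with a sharp right turn
print(to_path_frame((5.0, 1.0), path))    # on the straight part, left of the centreline
print(to_path_frame((11.0, -4.0), path))  # after the turn, left of the centreline
```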

Motion Cues of Vulnerable Road Users (implicit vs. explicit): kinematics, head orientation/gaze, body orientation, pose, articulated body, age, gender, clothing, objects, attributes, interaction with other road users (esp. the ego-vehicle), and infrastructure context.

Motion Models for Vulnerable Road User Path Prediction. Short-term prediction (0.5-2 s ahead) vs. long-term prediction (5-30 s ahead). Model classes: dynamical systems (LDS, SLDS, Bayesian networks); non-linear regression (RNNs/LSTMs); trajectory learning; goal-directed planning (Markov decision processes). Representative work: Schneider2013gcpr, Keller2014tits, Kooij2014eccv, Völz2016itsc, Bhattacharyya2017arXiv, Ellis2009vs, Bera2016icra, Kitani2012eccv, Karasev2016icra.

Research Line: Accurate Prediction of Road User Motion (Behavior). Processing chain: Sensor Data → 3D Spatial Environment → Semantic Scene (incl. Road User Location and Object Class) → Road User Motion Cues → Road User Motion Model → Predicted Road User Motion → Planned Vehicle Path → Vehicle Control. Open questions: How to improve semantic scene analysis? Which motion cues and models to use: handcrafted or learned, simple or complex? What geographical support? Deep learning for motion prediction, path planning and vehicle control? How to scale up to a large number of scenarios and Big Data (cf. unsupervised learning)?

Experimental Validation Simulation TU Munich 4A Engineering Test Track with Dummy Field Lab Real Traffic

3Sat Nano (IAA Special, 15-09-2017)

Thank You. Questions?