STAMP/STPA Beginner Introduction. Dr. John Thomas System Engineering Research Laboratory Massachusetts Institute of Technology

Similar documents
Systems Theoretic Process Analysis (STPA)

Systems Theoretic Process Analysis (STPA)

Basic STPA Tutorial. John Thomas

STPA Systems Theoretic Process Analysis John Thomas and Nancy Leveson. All rights reserved.

Basic STPA Exercises. Dr. John Thomas

Performing Hazard Analysis on Complex, Software- and Human-Intensive Systems

Using STPA in the Design of a new Manned Spacecraft

Introducing STAMP in Road Tunnel Safety

Missing no Interaction Using STPA for Identifying Hazardous Interactions of Automated Driving Systems

Failure Management and Fault Tolerance for Avionics and Automotive Systems

Modeling and Hazard Analysis Using Stpa

A systematic hazard analysis and management process for the concept design phase of an autonomous vessel.

HAZARD ANALYSIS PROCESS FOR AUTONOMOUS VESSELS. AUTHORS: Osiris A. Valdez Banda Aalto University, Department of Applied Mechanics (Marine Technology)

Three Approaches to Safety Engineering. Civil Aviation Nuclear Power Defense

Flight Systems Verification & Validation Mars 2020 Entry, Descent, and Landing

Go around manoeuvre How to make it safer? Capt. Bertrand de Courville

AIR CONDITIONING AND PRESSURIZATION CONTROLS AND INDICATORS

Safety-critical systems: Basic definitions

An STPA Primer. Version 1, August 2013

USER MANUAL OPERATION AND THE USE OF A CAR WITH. Diego G3 / NEVO SEQUENTIAL GAS INJECTION SYSTEM

BTW Lesson 5. Traffic Flow

SIMON Simulation of Non-Automotive Vehicle Free Rolling Response

Gas Accumulation Potential & Leak Detection when Converting to Gas

Development of a Detailed Multi-Body Computational Model of an ATV to Address and Prevent Child Injuries. Chandra K. Thorbole, Ph.D.

Lesson: Airspeed Control

OIL & GAS. MTS DP Committee. Workshop in Singapore Session 4 Day 2. Unwanted Thrust

D-Case Modeling Guide for Target System

if all agents follow RSS s interpretation then there will be zero accidents.

Unmanned Aerial Vehicle Failure Modes Algorithm Modeling

Every things under control High-Integrity Pressure Protection System (HIPPS)

Calspan Loss-of-Control Studies Using In-flight Simulation. Lou Knotts, President November 20, 2012

Flutter Testing. Wind Tunnel Testing (excerpts from Reference 1)

Collision Avoidance based on Camera and Radar Fusion. Jitendra Shah interactive Summer School 4-6 July, 2012

Safety Critical Systems

Flight Dynamics II (Stability) Prof. Nandan Kumar Sinha Department of Aerospace Engineering Indian Institute of Technology, Madras

Lecture 1 Temporal constraints: source and characterization

Lecture 04 ( ) Hazard Analysis. Systeme hoher Qualität und Sicherheit Universität Bremen WS 2015/2016

Driver Training School Instructor Curriculum Requirements for Student Learning & Performance Goals

Lane changing and merging under congested conditions in traffic simulation models

The Military CYPRES Quick Guide For Operators

GAS FUEL VALVE FORM AGV5 OM 8-03

Safety-Critical Systems

Vehicle Components. (High)/15765 (CAN) Cab Controller - Primary. PACCAR CECU3 N/A (High)/15765 (CAN) Vehicle Key Data Points

Autothrottle Use with Autopilot Off

TR Electronic Pressure Regulator. User s Manual

Analysis of Solana Beach Red Light Camera Enforcement Program By Jay Beeber, Executive Director, Safer Streets L.A., Member ITE

ENVIRONMENTAL CONTROL SYSTEM (ECS)

AFDXXX(X)AC Series Operators Manual Please read this manual thoroughly before attempting to operate your water maker.

THE SEPARATION OF FUSELAGE NOSE SECTION AND FUSELAGE MIDSECTION OF COMMERCIAL PLANE

Early evap monitors. Photo courtesy Toyota Motor Sales, U.S.A., Inc.; diagram courtesy University of Toyota

Dynamic Positioning Control Augmentation for Jack-up Vessels

Module 3 Developing Timing Plans for Efficient Intersection Operations During Moderate Traffic Volume Conditions

HOW TO PROCEED WITH TROUBLESHOOTING

CHAPTER 7: THE FEEDBACK LOOP

Surrogate UAV Approach and Landing Testing Improving Flight Test Efficiency and Safety

Burner Management System DEMO Operating instructions

Human Factors for Limited-Ability Autonomous Driving Systems

Accident Investigation and Hazard Analysis

frequently asked questions

Vehicle Components. (High)/15765 (CAN) Cab Controller - Primary. PACCAR CECU3 N/A (High)/15765 (CAN) Vehicle Key Data Points

Hazard Identification

Real-Time & Embedded Systems

Integrating Safety and Automation

Covert Texting During Simulated Driving Maneuvers Effects of Head-Up versus Head-Down Posture

ILS APPROACH WITH A320

Exemplary Conditional Automation (Level 3) Use Case Description Submitted by the Experts of OICA as input to the IWG ITS/AD

Simulation of the Hybtor Robot

ACCURATE PRESSURE MEASUREMENT FOR STEAM TURBINE PERFORMANCE TESTING

SUMMARY OF SAFETY INVESTIGATION REPORT

IFE Level 3 Diploma in Fire Science and Fire Safety

REACTOR 40 MECHANICAL Configuration Guide

Safety-Critical Systems. Rikard Land

2. USER INSTRUCTION. Table of contents: Pg.1/14 N:\FAP-2000: LWP

Fuzzy Logic in Traffic Control

Queue analysis for the toll station of the Öresund fixed link. Pontus Matstoms *

ADJUSTING TO, FOLLOWING, AND MEETING URBAN TRAFFIC

ZIPWAKE DYNAMIC TRIM CONTROL SYSTEM OUTLINE OF OPERATING PRINCIPLES BEHIND THE AUTOMATIC MOTION CONTROL FEATURES

Questions & Answers About the Operate within Operate within IROLs Standard

Lane Management System Team 1 Adam Pruim - Project Manager Curtis Notarantonio - Security/Safety Engineer Jake Heisey - Domain Expert/Customer

Fail Operational Controls for an Independent Metering Valve

FAI Sporting Code. UAV Class U. Section 12 Unmanned Aerial Vehicles Edition. Effective 1st January 2018

Tools for safety management Effectiveness of risk mitigation measures. Bernhard KOHL

Flight Profiles are designed as a guideline. Power settings are recommended and subject to change based

Fokker 50 - Air Conditioning & Pressurization

Prof. Brian C. Williams February 23 rd, /6.834 Cognitive Robotics. based on [Kim,Williams & Abramson, IJCAI01] Outline

5 Function Indicator. Outside Air Temperature (C) Outside Air Temperature (F) Pressure Altitude Density Altitude Aircraft Voltage STD TEMP SL 15000

Pedestrian Scenario Design and Performance Assessment in Driving Simulations

Safety models & accident models

The Nitrogen Threat. The simple answer to a serious problem. 1. Why nitrogen is a risky threat to our reactors? 2. Current strategies to deal with it.

Depressurization in Cruise, Aer Lingus Boeing , EI-CDD, north of Paris, December 9, 2000

Simulation Analysis of Intersection Treatments for Cycle Tracks

The Relationship Between Automation Complexity and Operator Error

Traditional Hazard Analysis

A study evaluating if targeted training for startle effect can improve pilot reactions in handling unexpected situations. DR. MICHAEL GILLEN, PH.D.

The World Leader in High Performance Signal Processing Solutions MEMS Webcast

WELCOME TO THE REVOLUTION

Montana Teen Driver Education and Training. Module 3.3. Mixing with Traffic. Montana Teen Driver Curriculum

Instruction Manual VCG-6

Purpose. Scope. Process flow OPERATING PROCEDURE 07: HAZARD LOG MANAGEMENT

Jabiru J230-SP Section 10

Transcription:

STAMP/STPA Beginner Introduction Dr. John Thomas System Engineering Research Laboratory Massachusetts Institute of Technology

Agenda Beginner Introduction What problems are we solving? How does STAMP/STPA solve those problems? Simple STAMP/STPA examples Intermediate tutorial Guided exercise: Apply STPA to a real system Research presentation Recent STPA research results 2

Mars Polar Lander During the descent to Mars, the legs were deployed at an altitude of 40 meters. Touchdown sensors (on the legs) sent a momentary signal The software responded as it was required to: by shutting down the descent engines. The vehicle free-fell and was destroyed upon hitting the surface at 50 mph. No component failed! All components performed exactly as designed! 3

Boeing 787 Lithium Battery Fires 2013 2014 Reliability analysis predicted 10 million flight hours between battery failures Two fires caused by battery failures in 52,000 flight hours Does not include 3 other lessreported incidents of smoke in battery compartment Just a simple component failure? 4

Boeing 787 Lithium Battery Fires A module monitors for smoke in the battery bay, controls fans and ducts to exhaust smoke overboard. Power unit experienced low battery voltage, shut down various electronics including ventilation. Smoke could not be redirected outside cabin All software requirements were satisfied! The requirements were inadequate 8

Schiaparelli Lander (2016) 11km: Parachute deployed 3.7km: IMU saturated Negative altitude calculated Parachute jettisoned Thrusters off Impact at 300 km/h (186 mph] Designed to withstand 10 km/h All components operated as designed! No component failure! http://spacenews.com/esa-mars-lander-crash-caused-by-1-second-inertial-measurement-error/

HITOMI Satellite (2016) Unable to detect bright stars for reference Parameters for Safe Hold Mode were incorrect Investigative subcommittee Need for approach to examine the overall design of the spacecraft JAXA We were unable to let go of our usual methods http://www.asahi.com/ajw/articles/aj201606200006.html All components operated as designed! Not a simple component failure!

Basic Control Theory Controller Process Model (beliefs) Control Actions Feedback Controlled Process Provides another way to think about accidents Forms foundation for STAMP/STPA

Problems with Software

Quote The hardest single part of building a software system is deciding precisely what to build. -- Fred Brooks, The Mythical Man-Month

Automotive recalls rapidly increasing! Image: https://hbr.org/2010/06/why-dinosaurs-will-keep-ruling-the-auto-industry/ar/1

Engine fires 2013 Ford Fusion / Escape 13 reports of engine fire Short time frame (~Sept - Dec) Owners asked to park their vehicles until further notice 99,153 brand new vehicles affected *Images from: https://autos.yahoo.com/blogs/motoramic/ford-tells-89-000-escape-fusion-owners-park-230316605.html http://gearheads.org/stop-driving-your-ford-escape/ 16

The Problem Ford press release: The original cooling system design was not able to address a loss of coolant system pressure under certain operating conditions, which could lead to a vehicle fire while the engine was running. Ford VP: "We had a sequence of events that caused the cooling system software to restrict coolant flow," he says. Most of the time, that would not be a problem and is the intended behavior. But in rare cases the coolant pressure coupled with other conditions may cause the coolant to boil. When the coolant boils, the engine may go into extreme overheating causing more boiling and rapid pressure increase. This caused coolant leaks near the hot exhaust that led to an engine fire. Ford has seen 12 fires in Escapes and one in a Fusion. Quotes from: http://corporate.ford.com/news-center/press-releases-detail/pr-ford-produces-fix-in-voluntary-37491 http://www.usatoday.com/story/money/cars/2012/12/10/ford-recall-escape-fusion-ecoboost/1759063/ 17

Quote Almost all software-related accidents can be traced back to flaws in the requirements specification -- Prof. Nancy Leveson, MIT These problems can pass every component and subsystem test, every simulation, and every verification effort!

Toyota Unintended Acceleration 2004-2009: 102 incidents 22

Toyota Unintended Acceleration 2004: Push-button ignition 2004-2009 102 incidents of uncontrolled acceleration Speeds exceed 100 mph despite stomping on the brake 30 crashes 20 injuries Today Software fixes for pushbutton ignition, pedals Pushbutton was reliable, Software was reliable. All component requirements were met. Overall system unsafe, unexpected! http://www.reuters.com/article/2010/07/14/us-toyota-idustre66d0fr20100714 http://www.statesman.com/business/u-s-toyota-cite-driver-error-in-many-803504.html 23

Toyota Unintended Acceleration 2004: Push-button ignition 2004-2009 102 incidents of uncontrolled acceleration Speeds exceed 100 mph despite stomping on the brake 30 crashes 20 injuries Today Software fixes for pushbutton ignition, pedals How can we be sure the requirements are right? How can we integrate human and technical considerations? http://www.reuters.com/article/2010/07/14/us-toyota-idustre66d0fr20100714 http://www.statesman.com/business/u-s-toyota-cite-driver-error-in-many-803504.html 24

344,000 minivans recalled Honda Odyssey Stability control software problem In certain circumstances, an error in the software can prevent the system from calibrating correctly, leading to pressure building up in the braking system, the National Highway Traffic Safety Administration said. If pressure builds to a certain point, "the vehicle may suddenly and unexpectedly brake hard, and without illuminating the brake lights, increasing the risk of a crash from behind," the NHTSA said. 2007-2008 models affected Problem discovered in 2013 These problems made it through all existing processes: design reviews, testing, etc. 25

Cost of Fix Addressing potential issues Reaction High Bolt-on Systems Thinking System Requirements Systems Engineering Low Concept Requirements Design Build Operate Need to address issues early Early decisions have biggest impact Illustration courtesy Bill Young

Human Interactions

China Airlines 006 Autopilot compensates for single engine malfunction Autopilot reaches max limits, aircraft turns slightly Pilots not notified Autopilot at its limits Pilots notice slight turn, disengage autopilot for manual control Aircraft immediately nosedives Pilot error or Clumsy automation? 30

Operator Error: Old View Human error is cause of most incidents and accidents So do something about human involved Fire them Retrain them Admonish them Rigidify their work with more rules and procedures Or do something about humans in general Marginalize them by putting in more automation (Leveson) 34

Operator Error: Systems View (Dekker, Rasmussen, Leveson, Woods, etc.) Human error is a symptom, not a cause All behavior affected by context (system) in which it occurs To understand human error, look at the system System designs can make human error inevitable When bad systems cause operator error, can we really blame the operators rather than designers? To do something about operator error, look at: Unintuitive equipment and system designs Usefulness of procedures Existence of goal conflicts and production pressures Human error is a symptom of the system and its design 35

Most stove tops Is this a design problem or just human error? *Image from D. Norman, 1988 36

Natural Mapping *Image from D. Norman, 1988 The right design will reduce human error 37

Toyota Unintended Acceleration 2004: Push-button ignition 2004-2009 102 incidents of uncontrolled acceleration Speeds exceed 100 mph despite stomping on the brake 30 crashes 20 injuries Today Software fixes for pushbutton ignition, pedals Software design problem or driver error? http://www.reuters.com/article/2010/07/14/us-toyota-idustre66d0fr20100714 40 http://www.statesman.com/business/u-s-toyota-cite-driver-error-in-many-803504.html

Tesla Summon Tesla: "the incident occurred as a result of the driver not being properly attentive... Drivers must agree to legal terms on their touch screen before the feature is allowed This feature will park Model S while the driver is outside the vehicle. Please note that the vehicle may not detect certain obstacles, including those that are very narrow (e.g., bikes), lower than the fascia, or hanging from the ceiling. As such, Summon requires that you continually monitor your vehicle's movement and surroundings while it is in progress and that you remain prepared to stop the vehicle at any time using your key fob or mobile app or by pressing any door handle. You must maintain control and responsibility for your vehicle when using this feature and should only use it on private property. OK CANCEL

42

STAMP: System Theoretic Accident Model and Processes Foundation of STPA 47

Systems approach to safety engineering (STAMP) STAMP Model (Leveson, 2012) Treat accidents as a control problem, not a failure problem Accidents are more than a chain of events, they involve complex dynamic processes. Prevent accidents by enforcing constraints on component behavior and interactions Captures many causes of accidents: Component failure accidents Unsafe interactions among components Complex human, software behavior Design errors Flawed requirements esp. software-related accidents 48

Basic STAMP Control Actions Controller Process Model Feedback Controlled Process Controllers use a process model to determine control actions Unanticipated behavior often occurs when the process model is incorrect Four types of inadequate control actions: 1) Control commands are not given 2) Inadequate commands are given 3) Potentially correct commands but too early, too late 4) Control action stops too soon or applied too long Tends to be a good model of both software and human behavior Explains software errors, human errors, interaction accidents, 49 (Leveson, 2012)

Basic STAMP Controller Process Model Control Actions Feedback Controlled Process 50 (Leveson, 2012)

Basic STAMP Controller Process Model Control Actions Feedback Controlled Process 51 (Leveson, 2012)

Basic STAMP Controller Process Model Control Actions Feedback Controlled Process 52 (Leveson, 2012)

Example Control Structure (Leveson, 2012)

STAMP and STPA STAMP Model Accidents are caused by inadequate control 54

STAMP and STPA CAST Accident Analysis STAMP Model How do we find inadequate control that caused an accident? Accidents are caused by inadequate control 55

STAMP and STPA CAST Accident Analysis STPA Hazard Analysis How do we find inadequate control in a design? STAMP Model Accidents are caused by inadequate control 56

STPA: System Theoretic Process Analysis 57

STPA (System-Theoretic Process Analysis) STPA Hazard Analysis STAMP Model (Leveson, 2012) System engineering foundation Define accidents, system hazards, Control structure Step 1: Identify unsafe control actions Step 2: Identify accident causal scenarios Control Actions Controller Controlled process Feedback 58

Definitions Accident (Loss) An undesired or unplanned event that results in a loss, including loss of human life or human injury, property damage, environmental pollution, mission loss, etc. Hazard A system state or set of conditions that, together with a particular set of worst-case environment conditions, will lead to an accident (loss). Definitions from Engineering a Safer World

Example System: Aviation System-level Accident (Loss):?

Example System: Aviation System-level Accident (Loss): Two aircraft collide

System-level Accident (Loss): Two aircraft collide System-level Hazard:?

System-level Accident (Loss): Aircraft crashes System-level Hazard: Two aircraft violate minimum separation

Aviation Examples System-level Accident (loss) A-1: Two aircraft collide A-2: Aircraft crashes into terrain / ocean System-level Hazards H-1: Two aircraft violate minimum separation H-2: Aircraft enters unsafe atmospheric region H-3: Aircraft enters uncontrolled state H-4: Aircraft enters unsafe attitude H-5: Aircraft enters prohibited area

Example system: Automotive vehicles Accidents and Hazards? Image: http://whitneylawgroup.com/wp-content/uploads/2013/07/52-car-pile-up.jpg

STPA (System-Theoretic Process Analysis) STPA Hazard Analysis STAMP Model (Leveson, 2012) System engineering foundation Define accidents, system hazards Control structure Step 1: Identify unsafe control actions Step 2: Identify accident causal scenarios Control Actions Controller Controlled process Feedback 74

Control Structure Examples

Example Control Structure (Leveson, 2012)

Proton Therapy Machine High-level Control Structure Gantry Cyclotron Beam path and control elements

Proton Therapy Machine High-level Control Structure Antoine PhD Thesis, 2012

Managing complexity Lesson from systems theory, cognitive science Human minds manage complexity through abstraction and hierarchy Use top-down process Start at a high abstract level Iterate to drill down into more detail Build hierarchical models of the system

Antoine PhD Thesis, 2012 Proton Therapy Machine Control Structure

Ballistic Missile Defense System Image from: http://www.mda.mil/global/images/system/aegis/ftm- 21_Missile%201_Bulkhead%20Center14_BN4H0939.jpg Safeware Corporation

Adaptive Cruise Control Image from: http://www.audi.com/etc/medialib/ngw/efficiency/video_assets/fallback_videos.par.0002.image.jpg

Qi Hommes

Automotive shift-by-wire Image: http://www.toyota.com.au/prius/features/hybrid-performance 89

Automotive Shift by Wire Your turn: Control structure? Application of STPA to a Shift by Wire System, STPA workshop 2014 90

Control structure for vehicle Driver Steering, brake, accelerator (engine), ignition, other controls Range control Shift Control Module Current range indication Status information Visual cues Sensory feedback Range commands Physical Vehicle *Similar for both mechanical/electrical implementations

Automotive Shift by Wire Application of STPA to a Shift by Wire System, STPA workshop 2014 92

(Leveson, 2012) STPA (System-Theoretic Process Analysis) System engineering foundation Define accidents, system hazards Control structure Step 1: Identify unsafe control actions Step 2: Identify accident causal scenarios Control Actions Controller Controlled process Feedback 93

STPA Step 1: Unsafe Control Actions (UCA) Control Actions Controller Controlled process Feedback 4 ways unsafe control may occur: A control action required for safety is not provided or is not followed An unsafe control action is provided that leads to a hazard A potentially safe control action provided too late, too early, or out of sequence A safe control action is stopped too soon or applied too long (for a continuous or non-discrete control action) Not providing causes hazard Providing causes hazard Incorrect Timing/ Order Stopped Too Soon / Applied too long Shifter Command????

Structure of an Unsafe Control Action Example: Computer provides open catalyst valve cmd while water valve is closed Source Controller Type Control Action Context Four parts of an unsafe control action Source Controller: the controller that can provide the control action Type: whether the control action was provided or not provided Control Action: the controller s command that was provided / missing Context: conditions for the hazard to occur (system or environmental state in which command is provided) 95

UCAs Safety Constraints Unsafe Control Action Safety Constraint

(leveson, 2012) STPA (System-Theoretic Process Analysis) System engineering foundation Define accidents, system hazards Control structure Step 1: Identify unsafe control actions Step 2: Identify accident causal scenarios Control Actions Controller Controlled process Feedback 97

STPA Step 2: Identify Causal Factors Select an Unsafe Control Action A. Identify what might cause it to happen Develop accident scenarios Identify controls and mitigations B. Identify how control actions may not be followed or executed properly Develop causal accident scenarios Identify controls and mitigations

Step 2A: Potential causes of UCAs UCA: Shift Control Module provides range command without driver new range selection Controller Inadequate Procedures (Flaws in creation, process changes, incorrect modification or adaptation) Control input or external information wrong or missing Process Model (inconsistent, incomplete, or incorrect) Missing or wrong communication with another Controller controller Inadequate or missing feedback Feedback Delays Controller Delayed operation Actuator Inadequate operation Conflicting control actions Controlled Process Component failures Sensor Inadequate operation Incorrect or no information provided Measurement inaccuracies Feedback delays Process input missing or wrong Changes over time Unidentified or out-of-range disturbance Process output contributes to system hazard

STPA Step 2: Identify Causal Factors Select an Unsafe Control Action A. Identify what might cause it to happen Develop accident scenarios Identify controls and mitigations B. Identify how control actions may not be followed or executed properly Develop causal accident scenarios Identify controls and mitigations

Step 2B: Potential control actions not followed Shift Control Module provides range command Controller Inadequate Procedures (Flaws in creation, process changes, incorrect modification or adaptation) Control input or external information wrong or missing Process Model (inconsistent, incomplete, or incorrect) Missing or wrong communication with another Controller controller Inadequate or missing feedback Feedback Delays Actuator Inadequate operation Sensor Inadequate operation Controller Delayed operation Conflicting control actions Process input missing or wrong Controlled Process Component failures Changes over time Range is not engaged Unidentified or out-of-range disturbance Incorrect or no information provided Measurement inaccuracies Feedback delays Process output contributes to system hazard