Resilience Engineering: The changing nature of safety

Similar documents
Capturing an Uncertain Future: The Functional Resonance Accident Model

How to be safe by looking at what goes right instead of what goes wrong

ASDA 2014 Scientific Seminar ATM Exploratory Research Tournament

Safety models & accident models

Understanding safety life cycles

Safety-Critical Systems

Hazard Identification

FMEA- FA I L U R E M O D E & E F F E C T A N A LY S I S. PRESENTED BY: AJITH FRANCIS

DATA ITEM DESCRIPTION Title: Failure Modes, Effects, and Criticality Analysis Report

A study on the relation between safety analysis process and system engineering process of train control system

Unit 5: Prioritize and Manage Hazards and Risks STUDENT GUIDE

LECTURE 3 MAINTENANCE DECISION MAKING STRATEGIES (RELIABILITY CENTERED MAINTENANCE)

Engr2110. Week 6 Product Safety/Liability

The Best Use of Lockout/Tagout and Control Reliable Circuits

FLIGHT TEST RISK ASSESSMENT THREE FLAGS METHOD

Preventive Maintenance

Marine Risk Assessment

Reliability engineering is the study of the causes, distribution and prediction of failure.

Employ The Risk Management Process During Mission Planning

Best Practice RBI Technology Process by SVT-PP SIMTECH

Risk Management Qualitatively on Railway Signal System

Federal Aviation Administration Safety & Human Factors Analysis of a Wake Vortex Mitigation Display System

Hazard & Risk Management

Safety assessments for Aerodromes (Chapter 3 of the PANS-Aerodromes, 1 st ed)

Release: 1. UEPOPL002A Licence to operate a reciprocating steam engine

A GUIDE TO WRITING A RISK ASSESSMENT FOR A BMAA EVENT

Basic STPA Tutorial. John Thomas

RISK ASSESSMENT HAZARD IDENTIFICATION AND RISK ASSESSMENT METHODOLOGY

SECTION 2 HAZARD ASSESMENT

Lecture 04 ( ) Hazard Analysis. Systeme hoher Qualität und Sicherheit Universität Bremen WS 2015/2016

(C) Anton Setzer 2003 (except for pictures) A2. Hazard Analysis

Mathematics of Planet Earth Managing Traffic Flow On Urban Road Networks

XVII Congreso de Confiabilidad

SOUTH AFRICAN QUALIFICATIONS AUTHORITY REGISTERED UNIT STANDARD:

IIUM EVENT SAFETY RISK ASSESSMENT

// RoWSaF Making roads safer for road workers rowsaf.org.uk. RoWSaF Strategy 2015

Aeronautical studies and Safety Assessment

Managing for Liability Avoidance. (c) Lewis Bass

Manned Underwater Operations (MUO) in a holistic safety perspective. Olav Hauso. Special Adviser. Petroleum Safety Authority, Norway (PSA)

11.2 Detailed Checklists CHECKLIST 1: FEASIBILITY STAGE AUDIT. 1.1 General topics Scope of project; function; traffic mix

1.0 PURPOSE 2.0 REFERENCES

Reliability of Safety-Critical Systems Chapter 4. Testing and Maintenance

CIRCUIT BREAKER TESTING - JOB SAFETY ANALYSIS

Review and Assessment of Engineering Factors

FIREFIGHTER RESCUE IS ONE OF THE MOST CHALLENGING SITUATIONS YOU'LL EVER FACE

Marine Education Society of Australasia HAZARD MANAGEMENT POLICY

Safety Guidelines for Live Entertainment and Events I Part 2. Hazard Identification and Risk Management 1

Risk Management Series Article 8: Risk Control

SMART!Hazard Manager PROCESS EXPLANATION

Workshop Information IAEA Workshop

Occupational Health & Safety

PRACTICAL EXAMPLES ON CSM-RA

Application of fuzzy logic to explosion risk assessment

MDEP Common Position No AP

To comply with the OHS Act, the responsible manager must carry out and document the following:

ADVISORY MATERIAL JOINT AMJ

Department of Internal Affairs Mandatory Non-Financial Performance Measures 2013 Roads and Footpaths

A GUIDE TO RISK ASSESSMENT IN SHIP OPERATIONS

DETERMINATION OF SAFETY REQUIREMENTS FOR SAFETY- RELATED PROTECTION AND CONTROL SYSTEMS - IEC 61508

MODEL AERONAUTICAL ASSOCIATION OF AUSTRALIA

Risk Management. Definitions. Principles of Risk Management. Types of Risk

Engineering Safety into the Design

FOCUS ON SUCCESS: A SAFETY-II APPROACH ON OPERATIONAL MANEUVERS IN THE ITAIPU BINACIONAL HYDROPOWER PLANT

Quality Management Determined from Risk Assessment

The Safety Case. The safety case

Safety-Critical Systems. Rikard Land

Codex Seven HACCP Principles. (Hazard Identification, Risk Assessment & Management)

JOB SAFETY ENVIRONMENT ANALYSIS (JSEA) Working With In Situ Testing Rigs

The Safety Case. Structure of Safety Cases Safety Argument Notation

Software Reliability 1

Bus and Transit Lane Review Update

Determining Occurrence in FMEA Using Hazard Function

PLANT RISK ASSESSMENT REPORT

RIIWMG203D Drain and Dewater Civil Construction Site Learner Guide

LONE WORKING HEALTH AND SAFETY GUIDANCE

Screw Gas Compressors & Dry Gas Seals Applications. Singapore, Mandarin Orchard Singapore. Training Course : Training Course For One Week In

a. identify hazardous conditions and potential accidents; b. provide information with which effective control measures can be established;

$ 12" $#&%$ 86.) *1! *1 /3 )00, , (1* Neighborhood Traffic Calming Part 3 Solutions Bradley William Yarger, P.E.

Reliability predictions in product development. Proof Engineering Co

THE BAKER REPORT HOW FINDINGS HAVE BEEN USED BY JOHNSON MATTHEY TO REVIEW THEIR MANUFACTURING OPERATIONS

Proposed Abstract for the 2011 Texas A&M Instrumentation Symposium for the Process Industries

The Criticality of Cooling

PROCEDURE. April 20, TOP dated 11/1/88

Three Approaches to Safety Engineering. Civil Aviation Nuclear Power Defense

Sharing practice: OEM prescribed maintenance. Peter Kohler / Andy Webb

Policy for Evaluation of Certification Maintenance Requirements

Safe Work Method Statement

Safety Critical Systems

SAFETY APPROACHES. The practical elimination approach of accident situations for water-cooled nuclear power reactors

A REVIEW OF AGE ADJUSTMENT FOR MASTERS SWIMMERS

Isolation Standard - Tags

Hazard Identification and Control

DEVELOPING YOUR PREVENTIVE MAINTENANCE PROGRAM

AUSTRALIA ARGENTINA CANADA EGYPT NORTH SEA U.S. CENTRAL U.S. GULF. SEMS HAZARD ANALYSIS TRAINING September 29, 2011

Hazard and Risk Assessment Guide

CONSTRUCTSAFE TIER 1 HEALTH AND SAFETY COMPETENCY TEST FRAMEWORK

Rock Climbing Risk Management Plan Lutanda Yarramundi

HS329 Risk Management Procedure

HAZARD ANALYSIS PROCESS FOR AUTONOMOUS VESSELS. AUTHORS: Osiris A. Valdez Banda Aalto University, Department of Applied Mechanics (Marine Technology)

Safe Work Method Statement

Transcription:

Resilience Engineering: The changing nature of safety Professor & ndustrial Safety Chair MNES ParisTech Sophia Antipolis, France Erik Hollnagel E-mail: erik.hollnagel@gmail.com Professor NTNU Trondheim, Norge

Normal accident theory (1984) n the whole, we have complex systems because we don t know how to produce the output through linear systems.

MT view: sharp-end, blunt-end Blunt-end failures happened earlier and somewhere else Morals, social norms Government Authority Company Management Workplace Activity Sharp-end failures happen here and now

Focus on operation (sharp end) Design Scope of work system (1984) rganisation (management) Downstream Maintenance Work has clear objectives and takes place in well-defined situations. Systems and technologies are loosely coupled and tractable. Upstream Technology, automation

Vertical and horizontal extensions Scope of work system (2008) rganisation (management) Downstream Design Maintenance Upstream Technology, automation A vertical extension to cover the entire system, from technology to organisation

Vertical and horizontal extensions Scope of work system (2008) rganisation (management) Downstream Design Maintenance ne horizontal extension, to cover the lifecycle, from design to maintenance Upstream Technology, automation A vertical extension to cover the entire system, from technology to organisation

Vertical and horizontal extensions Scope of work system (2008) rganisation (management) A second horizontal extension, to cover Downstream upstream and downstream processes Design Maintenance ne horizontal extension, to cover the lifecycle, from design to maintenance Upstream Technology, automation A vertical extension to cover the entire system, from technology to organisation

Tractable and intractable systems Tractable system (independent, clockwork) Description are simple with few details Principles of functioning known (white box) System does not change while being described Complicacy Comprehensibility Stability ntractable system (interdependent, teamwork) Elaborate descriptions with many details Principles of functioning unknown (black box) System changes before description is completed Fully specified Partly specified Underspecified

Performance variability is necessary Systems are so complex that work situations always are underspecified hence partly unpredictable Few if any tasks can successfully be carried out unless procedures and tools are adapted to the situation. Performance variability is both normal and necessary.? Many socio-technical systems are intractable. The conditions of work therefore never completely match what has been specified or prescribed. ndividuals, groups, and organisations normally adjust their performance to meet existing conditions, specifically actual resources and requirements. Because resources (time, manpower, information, etc.) always are finite, such adjustments will always be approximate rather than exact. Performance variability Success Failure

Efficiency-Thoroughness Trade-ff Thoroughness: Time to think Recognising situation. Choosing and planning. f thoroughness dominates, there may be too little time to carry out the actions. Neglect pending actions Miss new events Efficiency: Time to do mplementing plans. Executing actions. f efficiency dominates, actions may be badly prepared or wrong Miss pre-conditions Look for expected results Time & resources needed Time & resources available

The ETT principle The ETT principle describes the fact that people (and organisations) as part of their activities practically always must make a trade-off between the resources (time and effort) they spend on preparing an activity and the resources (time, effort and materials) they spend on doing it. ETTing favours thoroughness over efficiency if safety and quality are the dominant concerns, and efficiency over thoroughness if throughput and output are the dominant concerns. The ETT principle means that it is impossible to maximise efficiency and thoroughness at the same time. Neither can an activity expect to succeed, if there is not a minimum of either.

Failures or successes? When something goes wrong, e.g., 1 event out of 10.000 (10E-4), humans are assumed to be responsible in 80-90% of the cases. When something goes right, e.g., 9.999 events out of 10.000, are humans also responsible in 80-90% of the cases? Who or what are responsible for the remaining 10-20%? nvestigation of failures is accepted as important. Who or what are responsible for the remaining 10-20%? nvestigation of successes is rarely undertaken.

Why only look at what goes wrong? Safety = Reduced number of adverse events. Focus is on what goes wrong. Look for failures and malfunctions. Try to eliminate causes and improve barriers. Safety and core business compete for resources. Learning only uses a fraction of the data available 10-4 := 1 failure in 10.000 events 1-10 -4 := 9.999 nonfailures in 10.000 events Safety = Ability to succeed under varying conditions. Focus is on what goes right. Use that to understand normal performance, to do better and to be safer. Safety and core business help each other. Learning uses most of the data available

Risk profile (= lack of safety) Consequence Negligible Marginal Critical Very critical Low Low Low Moderate Moderate Low Low Moderate High High Moderate Moderate High High Extreme High High Extreme Extreme Extreme Rare Unlikely Possible Likely Certain Probability

Benefit profile (= safety) nnovative High High Very high Very high Very high Consequence Effective Acceptable Moderate Moderate Low Low High Moderate High High Very high High Negligible Low Low Low Moderate Moderate Rare Unlikely Possible Likely Certain Probability

Range of event outcomes utcome Positive Negative Neutral Serendipity Disasters Very low Good luck Accidents Normal outcomes (things that go right) ncidents Near misses Mishaps Very high Predictability

Frequency of event outcomes utcome Positive Serendipity Good luck Normal outcomes (things that go right) Negative Neutral Disasters Very low Accidents ncidents Near misses Mishaps Very high 10 6 4 10 10 2 Predictability

Being safe versus being unsafe utcome Positive Serendipity Safe Functioning (invisible) Normal outcomes (things that go right) Good luck Negative Neutral Disasters Very low Unsafe Functioning (visible) Functioning Accidents ncidents Near misses Mishaps Very high 10 6 4 10 10 2 Predictability

H i g h w o r k l o a d M e c h a n i c s H i g h w o r k l o a d E q u i p m e n t E x p e r t i s e E x p e r t i s e P G r o c e d u r e s P r o c e d u r e s r e a s e E x c e s s i v e e n d - p l a y L u b r i c a t i o n n t e r v a l a p p r o v a l s n t e r v a l a p p r o v a l s A l l o w a b l e e n d - p l a y n F A A i r c r a f t d e s i g n k n o w l e d g e A C o n t r o l l e d s t a b i l i z e r m o v e m e n t A i r c r a f t R e d u n d a n t d e s i g n L i m i t e d s t a b i l i z e r m o v e m e n t Non-linear accident model Assumption: Accidents result from unexpected combinations (resonance) of variability of normal performance. T C M a i n t e n a n c e o v e r s i g h t T C P R C e r t i f i c a t i o n T C P R T C E n d - p l a y c h e c k i n g P R T C T C J a c k s c r e w u p - d o w m o v e m e n t P R A ir c r a f t d e s i g n P R T C H o r i z o n t a l s t a b i l i z e r m o v e m e n t T C L i m i t i n g s t a b i l i z e r m o v e m e n t P R T C L u b r i c a t i o n T C P R A i r c r a f t p i t c h c o n t r o l P R J a c k s c r e w r e p l a c e m e n t P R P R Consequence: Accidents are prevented by monitoring and damping variability. Safety requires constant ability to anticipate future events. Hazardsrisks: Functional Resonance Accident Model Emerge from combinations of normal variability (socio-technical system), hence looking for ETT* and sacrificing decision * ETT = Efficiency-Thoroughness Trade-ff

High workload Mechanics High workload Equipment Expertise Expertise Procedures Procedures Lubrication Grease Excessive end-play nterval approvals nterval approvals Allowable end-play FAA Aircraft design knowledge Controlled stabilizer movement Aircraft Redundant design Limited stabilizer movement High workload Mechanics High workload Equipment Expertise Expertise Procedures Procedures Grease Excessive end-play Lubrication nterval approvals nterval approvals Allowable end-play FAA Aircraft design knowledge Controlled stabilizer movement Aircraft Redundant design Limited stabilizer movement Risks as non-linear combinations T C T C f accidents happen like this... End-play checking Maintenance oversight Aircraft design Certification Jackscrew up-down movement Horizontal stabilizer movement Aircraft pitch control Lubrication Limiting stabilizer movement Jackscrew replacement... then risks can be found like this... End-play checking Maintenance oversight Aircraft design Certification Jackscrew up-down movement Horizontal stabilizer movement Aircraft pitch control Lubrication Limiting stabilizer movement Jackscrew replacement P R P R Unexpected combinations (resonance) of variability of normal performance. Systems at risk are intractable rather than tractable. Unexpected combinations (resonance) of variability of normal performance. The established assumptions therefore have to be revised

Methods and reality Simple linear Focus on technology Low Hazard High Loose Coupling Tight Dams Marine transport Fault tree Post FMEA offices Chemical plants Railways Mining Power grids HAZP Emergency department Assembly lines Financial markets NPPs Nanotech Military LSCTS adventures Air traffic control Healthcare Manufacturing FMECA Universities ntegrated operations Space missions Public services R&D companies Low (tractable) Uncertainty High (intractable)

Methods and reality Simple linear Focus on technology Complex linear Focus on Human Factors Low Hazard High Loose Coupling Tight Dams Marine transport Railways Mining HPES Power grids Emergency department Chemical plants ATHEANA HAZP Fault Healthcare AEB tree TRACEr Manufacturing FMECA Universities Post FMEA HERA HEAT offices HCR THERP Assembly lines RCA CSN Financial markets NPPs Nanotech Military LSCTS adventures Air traffic control ntegrated operations Space missions Public services R&D companies Low (tractable) Uncertainty High (intractable)

Hazard and uncertainty Simple linear Focus on technology Complex linear Focus on Human Factors Complex linear Focus on organisations Non-linear Focus on normal functions Low Hazard High Loose Coupling Tight Dams Marine transport Power NPPs grids CREAM Emergency MT MERMS department AcciMap Chemical Swiss TRPD plants ATHEANA cheese Railways MRT Mining HPES STEP HAZP Fault Healthcare AEB tree TRACEr Manufacturing FMECA Universities Post FMEA HERA HEAT offices HCR THERP Assembly lines RCA CSN Low (tractable) Uncertainty Financial markets Nanotech RE FRAM STAMP Military LSCTS adventures Air traffic control ntegrated operations Space missions Public services R&D companies High (intractable)

From the negative to the positive Negative outcomes are caused by failures and malfunctions. All outcomes (positive and negative) are due to performance variability.. Safety = Reduced number of adverse events. Safety = Ability to respond when something fails. Safety = Ability to succeed under varying conditions. Eliminate failures and malfunctions as far as possible. mprove ability to respond to adverse events. mprove resilience.

What is safety? The ability to succeed under varying conditions (respond, monitor, anticipate, learn) The reduction of unnecessary harm to an acceptable minimum Systemic (process view) Safety should be As High As Reasonably Practicable (AHARP) Particularistic (product view) Risk should be As Low As Reasonably Practicable (ALARP)

Conclusions so far... Complex socio-technical systems can only function if performance is adjusted to conditions (ETT) Performance variability is the reason why things go right, but also the reason why things sometimes go wrong. We need to understand how things go right before we can understand how they go wrong. Resilience engineering is about how we can ensure that systems remain productive and safe in expected and unexpected conditions alike.