So it's Reliable but is it Safe? - a More Balanced Approach to ATM Safety Assessment

ATM R&D Seminar, Barcelona, 2nd to 5th July 2007
Derek Fowler, Gilles Le Galo, Eric Perrin (EUROCONTROL)
Stephen Thomas (Entity Systems Ltd)
European Organisation for the Safety of Air Navigation
31 May 2006

The Brief
Reviewers felt that the presentation would benefit from:
- Why is the approach important?
- What's different from the traditional safety assessment?
- A worked example

Thesis
A system can fail even though none of its individual elements has failed [after Professor Nancy Leveson, MIT]
A few possibilities:
- Inconsistent data
- Dysfunctional interactions
- Inadequate performance
- Abnormal environment / inputs
- Misuse
So why do most Software standards focus on reliability?
Software unreliability has never been the cause of a major accident

Safety Assessments in European ATM
Derived from SAE ARP 4754 / 4761 (civil airborne systems):
- Equipment focused
- Failure based: Safety Requirements mainly about reliability
Not a problem historically:
- Systems have not been highly integrated
- Changes have been largely equipment replacement
Alluded to in some of yesterday's talks
But it is a problem for the future: new concepts, automation etc

Traditional Approach
[Figure: diagram labelled Operational Environment, ATM System, ATM Service, Hazards]

How it Works
- Hazards: represent some kind of failure inside the box
- Consequence Analysis: how serious the Hazards are
- Safety Objectives: how often we can allow the Hazards to occur
- Causal Analysis: what could cause the Hazards
- Safety Requirements: how often we can allow the Causes to occur, ie how reliable the box needs to be
[Figure: diagram labelled ATM System, Operational Environment, ATM Service, Hazards]
10^-n fixation!

What do we Actually Need?
- Need to know also what the box is supposed to do and how well it needs to do it
- Need a broader approach to safety assessment
- Need to take a total system view
- Need to address 2 key issues:
  - How safe will new ATM systems be when working to spec? (Success Approach)
  - How safe will they be when they fail? (Failure Approach)
Captured in a Generic Safety Argument

Generic Safety Argument - Project Safety Case: Arg0 (to Level 1)
Arg 0: [Subject X] will be acceptably safe.
- Cr001: Acceptably safe means that risk of an accident is [safety criteria tbd]
- C0001: Applies to [operational environment etc tbd]
- A0001: [Assumptions tbd]
- J0001: [Justification tbd]
- Arg 1: [Subject X] has been specified to be acceptably safe (next slide)
- Arg 2: [Subject X] has been implemented in accordance with the specification [tbd]
- Arg 3: The transition to operational service of [Subject X] will be acceptably safe [tbd]
- Arg 4: The safety of [Subject X] will continue to be demonstrated in operational service [tbd]
What is different is in Arg1

Generic Safety Argument: Arg1 (to Level 2)
Arg 1: [Subject X] has been specified to be acceptably safe
- Arg 1.1: The underlying concept is intrinsically safe [tbd]
- Arg 1.2: The corresponding system design is complete [tbd]
- Arg 1.3: The system design functions correctly & coherently under all expected (normal) environmental conditions [tbd]
- Arg 1.4: The system design is robust against external abnormalities [tbd]
- Arg 1.5: All risks from internal system failures have been mitigated sufficiently [tbd]
- Arg 1.6: That which has been specified is realistic [tbd]
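The argument structure above is a simple tree, so one way to keep track of which branches are still marked [tbd] is to hold it as nested records. The sketch below is illustrative only: the `Argument` class, the `developed` flag and the `undeveloped` helper are hypothetical names, not part of the Generic Safety Argument itself, and marking Arg 1.1 to 1.5 as developed is an assumption that simply mirrors the scope of the case study.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Argument:
    ref: str                  # e.g. "Arg 1.3"
    claim: str                # wording of the claim on the slide
    developed: bool = False   # False stands for a "[tbd]" marker (assumed convention)
    children: List["Argument"] = field(default_factory=list)


def undeveloped(node: Argument) -> List[str]:
    """Return the refs of leaf arguments that are still marked [tbd]."""
    if not node.children:
        return [] if node.developed else [node.ref]
    refs: List[str] = []
    for child in node.children:
        refs.extend(undeveloped(child))
    return refs


# Assumption: Arg 1.1 to 1.5 are treated as developed, Arg 1.6 and Arg 2 to 4 are not.
arg1 = Argument("Arg 1", "[Subject X] has been specified to be acceptably safe", children=[
    Argument("Arg 1.1", "The underlying concept is intrinsically safe", developed=True),
    Argument("Arg 1.2", "The corresponding system design is complete", developed=True),
    Argument("Arg 1.3", "The design functions correctly & coherently", developed=True),
    Argument("Arg 1.4", "The design is robust against external abnormalities", developed=True),
    Argument("Arg 1.5", "Risks from internal failures are mitigated", developed=True),
    Argument("Arg 1.6", "That which has been specified is realistic"),
])
arg0 = Argument("Arg 0", "[Subject X] will be acceptably safe", children=[
    arg1,
    Argument("Arg 2", "Implemented in accordance with the specification"),
    Argument("Arg 3", "Transition to operational service will be acceptably safe"),
    Argument("Arg 4", "Safety will continue to be demonstrated in service"),
])

print(undeveloped(arg0))  # ['Arg 1.6', 'Arg 2', 'Arg 3', 'Arg 4']
```

Walking only the leaves keeps the top-level claims out of the list: a parent such as Arg 1 is complete when all of its children are.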

A Simple Case Study
Anticipated Landing Clearances in Low Visibility for MLS / GBAS Operations
Covers Arg1.1 to 1.5 only

Arg1.1 Concept is intrinsically safe
Objectives are to:
- show that the Concept has the potential (in the absence of failure) to be safe
- identify the key parameters that make it so

ILS Cat II/III Landing Clearance
[Figure: labels LSA, OFZ, AC1, AC2, runway 08/26, 2 nm]
Landing Clearance given such that LSA / OFZ protected

MLS / GBAS Cat II/III Landing Clearance
[Figure: labels LSA, Trigger Line, OFZ, AC1, AC2, runway 08/26, 1 nm]
Landing Clearance given such that LSA protected. OFZ also protected.

Therefore
ALC in LV has potential to be safe (cf ILS Cat II/III) because:
- (reduced) LSA is still protected
- OFZ is still protected
Key functionality / parameters:
- the time for AC1 to taxi from the Trigger Line until clear of OFZ must always be less than the time for AC2 to fly the last 1 nm before THR
- the Trigger Line must be outside the MLS/GBAS LSA
- AC1 must continue taxiing until clear of OFZ
- AC2 must be given CLR by 1 nm from THR, or go around, to achieve: stabilised landing; or safe Missed Approach
These are the foundations, but are not the whole building!

Arg1.2 System Design is Complete
The objective is to show that sufficient Safety Requirements have been specified for each element of the system (except for issues relating to failure)

Examples of Initial Safety Requirements (1)
- Controller shall not issue a landing clearance to an aircraft until the preceding aircraft has crossed the Trigger Line on the ATC A-SMGCS display
- Controller shall issue a landing clearance to an aircraft by the time it has reached 1 nm from the runway THR (at the latest), or issue a go-around
- Trigger Line shall be displayed on the Controller's A-SMGCS HMI
- The minimum distance between the Trigger Line and the runway edge shall be determined as follows:
  - Trigger Line shall always be further from the runway edge than the MLS/GBAS LSA
  - Trigger Line shall be positioned such that the time for AC1 to taxi (or be towed) from the Trigger Line until it is clear of the OFZ is always less than the time needed for AC2 to cover the last 1 nm of its Final Approach
  - Trigger Line position shall take full account of the slowest average speed of an aircraft in taxiing (or being towed) between the Trigger Line and the edge of the OFZ, and the fastest average groundspeed of an aircraft on Final Approach
  - Trigger Line position shall be determined for the longest aircraft using the airport
  - Trigger Line position shall take full account of the accuracy / resolution of the A-SMGCS display of aircraft position and the Trigger Line
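The Trigger Line positioning requirements above amount to a worst-case timing inequality, which can be sketched as a simple check. This is an illustration under stated assumptions, not the method used in the assessment: the function and parameter names are hypothetical, the aircraft length and the display error are simply added to the taxi distance as one possible way of accounting for them, and all the figures are made up.

```python
KT_TO_M_PER_S = 1852.0 / 3600.0  # 1 knot = 1 nautical mile (1852 m) per hour


def taxi_time_s(line_to_ofz_edge_m, longest_aircraft_m, display_error_m, slowest_taxi_kt):
    """Worst-case time for AC1 to get wholly clear of the OFZ after crossing the Trigger Line."""
    # Assumption: the whole aircraft must pass the OFZ edge, so its length is added,
    # and the A-SMGCS position error is treated as extra distance still to cover.
    distance_m = line_to_ofz_edge_m + longest_aircraft_m + display_error_m
    return distance_m / (slowest_taxi_kt * KT_TO_M_PER_S)


def final_approach_time_s(fastest_groundspeed_kt, distance_nm=1.0):
    """Shortest time for AC2 to cover the last part of its Final Approach."""
    return distance_nm * 3600.0 / fastest_groundspeed_kt


def trigger_line_ok(line_to_ofz_edge_m, longest_aircraft_m, display_error_m,
                    slowest_taxi_kt, fastest_groundspeed_kt):
    """True if AC1 is clear of the OFZ before AC2 can reach THR, in the worst case."""
    taxi = taxi_time_s(line_to_ofz_edge_m, longest_aircraft_m,
                       display_error_m, slowest_taxi_kt)
    return taxi < final_approach_time_s(fastest_groundspeed_kt)


# Hypothetical figures, for illustration only.
print(trigger_line_ok(30.0, 73.0, 7.5, slowest_taxi_kt=10.0, fastest_groundspeed_kt=160.0))
print(trigger_line_ok(30.0, 73.0, 7.5, slowest_taxi_kt=5.0, fastest_groundspeed_kt=160.0))
```

With these made-up figures the first call passes and the second fails, which shows how sensitive the condition is to the assumed slowest taxi speed.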

Examples of Initial Safety Requirements (cont.)
- Aerodrome Procedures shall require Pilots to go around at 200ft above THR if no landing clearance has been received from ATC
- Aerodrome Procedures shall require Pilots to continue taxiing until they have passed either:
  - the ILS CAT II/III holding point, if it exists, or
  - a sign indicating when the (whole) aircraft has cleared the edge of the OFZ
- Aerodrome Procedures shall require Pilots to inform the Controller if forced to stop before passing either:
  - the ILS CAT II/III holding point, if it exists, or
  - a special sign indicating when their aircraft has cleared the OFZ
- Aerodrome Procedures shall require Pilots to transmit RT communication on TWR frequency when crossing an active runway

Arg1.3 System functions correctly & coherently under all expected environmental conditions
Objective is to show that the system design functions correctly and coherently under all normal environmental conditions

Techniques
- Static analysis of the system design
- Scenario / what-if analyses
- Real-time simulations
Showed that:
- There were no dysfunctional interactions
- Data was consistent (if SRs met)
- Controllers found the system usable

Arg1.4 System is robust against external abnormalities
Considered the reaction of the system to abnormal events in its operational environment from two perspectives:
- How well can the system continue to operate?
- Could such conditions cause the system to behave in a way that introduces additional risk?

Reaction to external abnormalities
Failures included:
- Landing aid (MLS/GBAS) or satellite interference or failure (GBAS)
- Communication failure
- Lighting outage
- A-SMGCS failure: loss of facility
Mitigation in each case was Missed Approach (if no visual acquisition of runway)
Other abnormalities considered:
- Aircraft on-board emergencies
- High crosswinds
Risk was judged to be no higher than for current operations

Arg1.5 All risks from internal system failure mitigated sufficiently
Internal failure of the system assessed, by FHA/PSSA, from two perspectives:
- how loss of functionality would reduce the effectiveness of the system
- how anomalous behaviour of the system could induce risks that might otherwise not occur

FHA/PSSA Main Conclusions
ALC in LV introduces a new main Hazard: AC1 stops after Trigger Line, but before exiting OFZ, landing clearance having been given to AC2
If AC2 lands (or goes around before 200ft agl) risk is negligible:
- Trigger Line guarantees wing-tip clearance for landing case (SR!)
- MA before 200ft agl would put AC2 above tail of AC1
Worst case is if AC2 goes around later than 200ft agl:
- Qualitatively, we feel that risk is probably small cf capacity benefits
- Quantification of FHA/PSSA is in progress, to try to confirm this

Lessons Learnt
Original, failure-based (FHA/PSSA) analysis was too limited and unnecessarily complex
New, broader approach:
- is more comprehensive: addresses functional and performance issues relating to the Concept, not just reliability issues
- has led to a more rigorous and detailed understanding and description of the ALC Concept and how it would have to be operated in practice
- has produced a much more readable Preliminary Safety Case, which starts with the basic idea and then gradually builds up the case
Around 30 Safety Requirements so far: none specify reliability!

So where are we now?
Using (and still developing) the Generic Safety Argument on many EUROCONTROL programmes:
- eg FARADS, FASTI, TMA 2010+, ACAS II, TBS, MTV/SESAR
- very positive response from operational colleagues
Put together a Safety Assessment WG to:
- produce a broader framework for Safety Assessment, based on the Generic Safety Argument and Life-cycle model
- provide a mapping between the framework and safety-related techniques, eg SAM, Safety Cases, CTA/HRA, HF Case, FT/RT simulations, CRM, IRP etc
Deliverable: a simple guide on how to do safety [properly!]

Questions?

ILS Localizer Beamwidth Reduction
Task: safety assessment of reducing ILS Localiser beamwidth from 35 deg to 16 deg
ANSP Approach:
- applied minor-change procedure, approved by regulator
- did not develop a Safety Argument
- carried out traditional FHA/PSSA of potential failures
- used quantified RCS (ie absolute approach)
  - validity / applicability not established

ILS Safety Assessment Results
6 Hazards identified: generic ILS Localizer hazards only
Quantified Safety Objective for each Hazard:
- Two of them have max frequency of 1 event per 100,000 years!
8 Safety Requirements specified:
- No quantification
- Completely unrelated / untraceable to the Safety Objectives
Assumption: Acceptable approach paths exist that are flyable and are tolerably safe
Virtually nothing in the safety assessment actually addressed the reduction in the width of the ILS Localiser Beam!
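To put the "1 event per 100,000 years" objective on the same scale as the usual per-hour targets, it can be converted to an average hourly rate. The short sketch below assumes continuous operation, so that every calendar hour counts as exposure; the source does not state the exposure basis, and the function name is hypothetical.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 calendar hours, leap years ignored


def events_per_hour(events, years):
    """Convert 'events per N years' into an average rate per calendar hour."""
    # Assumption: the facility is in service for every hour of every year.
    return events / (years * HOURS_PER_YEAR)


rate = events_per_hour(1, 100_000)
print(f"{rate:.2e} per hour")  # 1.14e-09 per hour
```

On that assumption the objective is of the order of 10^-9 per hour, a figure that could not realistically be confirmed by observing the system in service.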

Questions?

Generic Safety Argument: Arg1.1 and Arg1.2 (to Level 3)
Arg 1.1: The underlying concept is intrinsically safe
- the operational context and scope of the Concept have been clearly described
- differences from existing operations have been described, understood and reconciled with Safety Criteria
- the impact of the concept on the operational environment (including interfaces with adjacent systems) has been assessed and shown to be consistent with the Safety Criteria
- the key functionality and performance parameters have been defined and shown to be consistent with the safety criteria
Arg 1.2: The corresponding system design is complete
- the boundaries of the system are clearly defined
- the Concept of Operations fully describes how the system is intended to operate
- everything necessary to achieve a safe implementation of the Concept - related to equipment, people, procedures and airspace design - has been specified (as safety requirements), for each element of the system
- all safety requirements on, and assumptions about, external elements of the end-to-end system have been captured

Generic Safety Argument: Arg1.3 and Arg1.4 (to Level 3)
Arg 1.3: The system design functions correctly & coherently under all expected (normal) environmental conditions
- the design is internally coherent, eg is consistent in functionality (in equipment, procedures and human tasks), and in use of data, throughout the system
- all reasonably foreseeable normal operational conditions / range of inputs from adjacent systems have been identified
- the design is capable of delivering (or maintaining) the required risk reduction for the identified operational conditions / inputs
- the design functions correctly in a dynamic sense, for the identified operational conditions / inputs
Arg 1.4: The system design is robust against external abnormalities
- the boundaries of the system are clearly defined
- the Concept of Operations fully describes how the system is intended to operate
- everything necessary to achieve a safe implementation of the Concept - related to equipment, people, procedures and airspace design - has been specified (as safety requirements), for each element of the system
- all safety requirements on, and assumptions about, external elements of the end-to-end system have been captured

Generic Safety Argument: Arg1.5 (to Level 3)
Arg 1.5: All risks from internal system failures have been mitigated sufficiently
- All reasonably foreseeable hazards, at the boundary of the system, have been identified
- Severity of the effects from each hazard has been correctly assessed, taking account of any external mitigation means
- Safety Objectives have been set for each hazard such that the corresponding aggregate risk is within the safety criteria
- All reasonably foreseeable causes of each hazard have been identified
- Safety Requirements have been specified (or Assumptions stated) for the causes of each hazard, taking account of any internal mitigation means
- A risk assessment has been carried out, and shows that the corresponding aggregate risk is within the specified safety criteria
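The requirement that Safety Objectives keep the aggregate risk within the safety criteria can be sketched as a simple sum over hazards. This is a deliberately simplified model and an assumption on our part, not the scheme used in the assessments: each hazard is given a maximum allowed frequency and a probability of leading to an accident (standing in for severity and external mitigation), the function names are hypothetical, and the numbers are invented for illustration.

```python
def aggregate_risk(hazards):
    """Sum, over all hazards, of allowed frequency times P(accident | hazard)."""
    return sum(freq_per_hour * p_accident for freq_per_hour, p_accident in hazards)


def within_safety_criteria(hazards, max_accident_rate_per_hour):
    """True if the Safety Objectives together respect the overall safety criterion."""
    return aggregate_risk(hazards) <= max_accident_rate_per_hour


# Hypothetical Safety Objectives:
# (max hazard frequency per hour, probability of an accident given the hazard)
hazards = [(1e-5, 1e-3), (1e-6, 1e-2)]

print(within_safety_criteria(hazards, 1e-7))
print(within_safety_criteria(hazards, 1e-8))
```

Here each hazard contributes about 1e-8 accidents per hour, so the set meets a criterion of 1e-7 but not one of 1e-8; tightening any single objective changes the aggregate, which is why the objectives have to be set together rather than one hazard at a time.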

Generic Safety Argument: Arg1.6 (to Level 3)
Arg 1.6: That which has been specified is realistic
- All aspects of the system design have been captured as Safety Requirements or (where applicable) as Assumptions
- All Safety Requirements are verifiable, ie satisfaction can be demonstrated by direct means (eg testing) or (where applicable) indirectly through appropriate assurance processes (eg HAL, SWAL and PAL)
- All Safety Requirements are capable of being satisfied in a typical implementation in hardware, software, people and procedures
- All Assumptions have been shown to be necessary and valid