Reliability of Safety-Critical Systems Chapter 10. Common-Cause Failures - part 1

Reliability of Safety-Critical Systems
Chapter 10. Common-Cause Failures - part 1
Mary Ann Lundteigen and Marvin Rausand
mary.a.lundteigen@ntnu.no & marvin.rausand@ntnu.no
RAMS Group, Department of Production and Quality Engineering, NTNU
(Version 1.0 per July 2015)
M.A.Lundteigen (RAMS Group) Reliability of Safety-Critical Systems (Version 1.1) 1 / 31

Reliability of Safety-Critical Systems
Slides related to the book: Marvin Rausand, Reliability of Safety-Critical Systems: Theory and Applications, Wiley, 2014.
Homepage of the book: http://www.ntnu.edu/ross/books/sis

About this slide series
These slides build on material presented in Sections 10.1-10.3 of the textbook.

Common cause failures (CCFs) in brief
Some keywords:
- One cause, multiple effects.
- Modeling was introduced in the nuclear power industry in the 1970s.
- Increased attention also in other industries and in key standards like IEC 61508.
Figure: NUREG/CR-4780: a foundation stone

CCFs as threats to safety
CCFs are a particular concern in relation to safety barriers.
[Figure: hazards or threats, safety barriers, and victim/vulnerable target]
CCFs may lead to failure of one safety barrier, or to simultaneous failure of several safety barriers.

Potential effects of CCFs
[Figure: diagram with the questions "Production stopped?", "Detected & isolated (primary)?", and "Detected and isolated (secondary)?" with Yes/No branches; a cause acts on the input elements, logic solver, and final elements of two systems (e.g., PSD and ESD)]

Dependent failures: Several types of dependent failures
A CCF belongs to the category of dependent failures. Other categories of dependent failures are:
- Common mode failures (CMFs), which are a subcategory of CCFs
- Cascading failures
Systematic failures may in some cases be, or be the cause of, CCFs.

Dependent failures: Cascading failures
Cascading failures: A sequence of item failures where the first failure shifts its load to one or more nearby items such that these fail and again shift their load to other items, and so on.
Cascading failures are sometimes referred to as a domino effect.

Definition of CCFs: Single point of failure and CCFs
Single point of failure: A single failure that results in the failure to function of other items and functions.
Discussed in an article by Gentile and Summers (2006).
Examples of single points of failure are loss of power supply, routers, communication links, logic solvers, etc.

Definition of CCFs: Single point of failure and CCFs
Figure: From Gentile and Summers (2006), DOI: 10.1002/prs.10145

Dependent failures: Statistical properties
Consider two items, 1 and 2, and let $E_i$ denote the event that item $i$ is in a failed state. The probability that both items are in a failed state is
$$\Pr(E_1 \cap E_2) = \Pr(E_1 \mid E_2)\,\Pr(E_2) = \Pr(E_2 \mid E_1)\,\Pr(E_1)$$
The two events $E_1$ and $E_2$ are said to be statistically independent if
$$\Pr(E_1 \mid E_2) = \Pr(E_1) \quad \text{and} \quad \Pr(E_2 \mid E_1) = \Pr(E_2)$$
such that
$$\Pr(E_1 \cap E_2) = \Pr(E_1)\,\Pr(E_2)$$
Note that when $E_1 \cap E_2 = \emptyset$, then $\Pr(E_1 \cap E_2) = 0$ and $\Pr(E_1 \mid E_2) = 0$. A set of events (each with nonzero probability) cannot be both mutually exclusive and independent.
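The product rule and the independence condition on this slide can be checked numerically. The following is a minimal sketch; the function names and all probability values are hypothetical and chosen only for illustration.

```python
import math


def joint_probability(p_e1_given_e2: float, p_e2: float) -> float:
    """Product rule: Pr(E1 and E2) = Pr(E1 | E2) * Pr(E2)."""
    return p_e1_given_e2 * p_e2


def is_independent(p_e1: float, p_e2: float, p_joint: float) -> bool:
    """True if Pr(E1 and E2) equals Pr(E1) * Pr(E2) within floating-point tolerance."""
    return math.isclose(p_joint, p_e1 * p_e2, rel_tol=1e-9, abs_tol=1e-15)


# Hypothetical numbers: each item is in a failed state with probability 0.01.
p_e1 = 0.01
p_e2 = 0.01

# Case 1: knowing that item 2 has failed tells us nothing about item 1.
joint_indep = joint_probability(p_e1_given_e2=0.01, p_e2=p_e2)
print(is_independent(p_e1, p_e2, joint_indep))

# Case 2: item 1 is ten times more likely to be failed when item 2 has failed.
joint_dep = joint_probability(p_e1_given_e2=0.10, p_e2=p_e2)
print(is_independent(p_e1, p_e2, joint_dep))

# Case 3: mutually exclusive events have Pr(E1 and E2) = 0, which cannot equal
# Pr(E1) * Pr(E2) when both probabilities are nonzero.
print(is_independent(p_e1, p_e2, 0.0))
```

Only the first case satisfies the independence condition; the second and third do not.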

Dependent failures: Statistical properties
Two items, 1 and 2, are dependent when
$$\Pr(E_1 \mid E_2) \neq \Pr(E_1) \quad \text{and} \quad \Pr(E_2 \mid E_1) \neq \Pr(E_2)$$
Items 1 and 2 are said to have a positive dependence when $\Pr(E_1 \mid E_2) > \Pr(E_1)$ and $\Pr(E_2 \mid E_1) > \Pr(E_2)$, such that
$$\Pr(E_1 \cap E_2) > \Pr(E_1)\,\Pr(E_2)$$
Items 1 and 2 are said to have a negative dependence when $\Pr(E_1 \mid E_2) < \Pr(E_1)$ and $\Pr(E_2 \mid E_1) < \Pr(E_2)$, such that
$$\Pr(E_1 \cap E_2) < \Pr(E_1)\,\Pr(E_2)$$
where $E_i$ is the event that item $i$ is in a failed state.
A CCF represents a positive dependence.
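As a rough illustration of these inequalities, the sketch below classifies the dependence between two items by comparing the joint failure probability with the product of the individual probabilities. The function name, the tolerance, and the numbers are assumptions made for this example only.

```python
def classify_dependence(p_e1: float, p_e2: float, p_joint: float,
                        tol: float = 1e-12) -> str:
    """Compare Pr(E1 and E2) with Pr(E1) * Pr(E2).

    Returns "independent", "positive", or "negative".
    """
    product = p_e1 * p_e2
    if abs(p_joint - product) <= tol:
        return "independent"
    return "positive" if p_joint > product else "negative"


# Hypothetical redundant pair: each item is failed with probability 0.01.
p = 0.01

# Joint probability if the two items were independent.
independent_joint = p * p

# Assumed joint probability when a shared cause couples the two items.
coupled_joint = 0.001

print(classify_dependence(p, p, independent_joint))
print(classify_dependence(p, p, coupled_joint))

# How many times more likely the double failure is than independence suggests.
print(coupled_joint / independent_joint)
```

With these assumed numbers the double failure is roughly ten times more likely than an independence assumption would suggest, which is the sense in which a CCF represents a positive dependence.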

Dependent failures: Positive or negative
There are two types of dependence:
- Positive dependence is usually most relevant in reliability and risk analyses.
- Negative dependence may also be relevant in some cases.
Example of a negative dependence: Consider two items that influence each other by producing vibration or heat. When one item fails and is down for repair, the other item will have an improved operating environment, and its probability of failure is reduced.

Dependent failures: Intrinsic or extrinsic
NUREG/CR-6268 defines two additional types of dependencies, intrinsic and extrinsic.
Intrinsic dependency: A situation where the functional status of a component is affected by the functional status of other components. Sub-classes:
- Functional requirement dependency
- Functional input dependency
- Cascading failure
Remark: NUREG/CR-6268 is a successor of NUREG/CR-4780.

Dependent failures: Intrinsic or extrinsic
Extrinsic dependency: A situation where the dependency or coupling is not inherent or intended in the functional characteristics of the system. Extrinsic dependencies may be related to:
- Physical or environmental stresses
- Human intervention

Definition of CCFs: One unique interpretation?
Did you think that CCF has one, unique interpretation?

Definition of CCFs: Smith and Watson
The definition of Smith and Watson (1980) is perhaps the most comprehensive one:
1. The items affected are unable to perform as required.
2. Multiple failures exist within (but not limited to) redundant configurations.
3. The failures are first-in-line type of failures and not the result of cascading failures.
4. The failures occur within a defined critical time period (e.g., the time a plane is in the air during a flight).
5. The failures are due to a single underlying defect or physical phenomenon (the common cause).
6. The effect of failures must lead to some major disabling of the system's ability to perform as required.

Definition of CCFs: Reflections
Demanding environment: Consider a system of m gas detectors that are installed at an offshore facility. The performance of the detectors may be affected by high, salty humidity. The exposure will normally not result in failure of the detectors at the same time. The time between detector failures may be rather long, and the question may then be: Is it or is it not a CCF?
Cheap and unreliable: You have been given the honorable task of buying new light bulbs for a large room. You buy some cheap ones to save money, and after some time they start to fail one after the other. Are the multiple bulb failures a CCF, or just the consequence of having bought light bulbs with a shorter life?

Definition of CCFs: Various industry sectors
- Nuclear industry (NEA, 2004): A dependent failure in which two or more component fault states exist simultaneously or within a short time interval, and are a direct result of a shared cause.
- Space industry (NASA PRA guide, 2002): The failure (or unavailable state) of more than one component due to a shared cause during the system mission.
- Process industry (IEC 61511, 2003): Failure, which is the result of one or more events, causing failures of two or more separate channels in a multiple channel system, leading to system failure.

Definition of CCFs: Other examples
Lundteigen and Rausand (2007), related to safety-instrumented systems:
1. The CCF event comprises complete failures of two or more redundant components or two or more safety instrumented functions (SIFs) due to a shared cause.
2. The multiple failures occur within the same inspection or function test interval.
3. The CCF event may lead to failure of a single SIF or loss of several SIFs.
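Criteria 1 and 2 above can be turned into a simple screening of failure records. The sketch below is only an illustration under stated assumptions: the record layout, the field names, the cause labels, and the fixed-length test interval are all hypothetical, and every recorded component is assumed to belong to the same redundant group.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureRecord:
    component: str   # hypothetical tag of the failed component
    time_h: float    # time of failure in hours since start of observation
    cause: str       # shared-cause label assigned by the analyst
    complete: bool   # True if the failure is a complete failure


def candidate_ccf_events(records, test_interval_h):
    """Group complete failures by (test interval index, cause).

    A group is a candidate CCF event when two or more different components
    failed completely from the same cause within the same test interval.
    """
    groups = defaultdict(set)
    for record in records:
        if not record.complete:
            continue  # criterion 1 asks for complete failures
        interval_index = int(record.time_h // test_interval_h)
        groups[(interval_index, record.cause)].add(record.component)
    return {key: sorted(components)
            for key, components in groups.items()
            if len(components) >= 2}


# Hypothetical records for four gas detectors, tested once a year (8760 h).
records = [
    FailureRecord("GD-1", 100.0, "humidity", True),
    FailureRecord("GD-2", 3000.0, "humidity", True),
    FailureRecord("GD-3", 9000.0, "humidity", True),   # next test interval
    FailureRecord("GD-4", 200.0, "vibration", False),  # not a complete failure
]
print(candidate_ccf_events(records, test_interval_h=8760.0))
```

Here GD-1 and GD-2 form one candidate event, while GD-3 falls in the next test interval and is not grouped with them. Whether a candidate really is a CCF still requires engineering judgment about the shared cause.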

Attributes: Two types
There are two types of CCFs:
- CCFs that occur at the same time due to a shock, or
- CCFs that occur over a certain time interval due to an increase in stress.
For the latter type, it is essential to define what is meant by the time interval.

Attributes: Root causes and coupling factors
The shared cause of a CCF may be split into two elements: the root cause and the coupling factor.
- Root cause: Why did the component fail? (i.e., looking at each affected component)
- Coupling factor: Why were several components affected? (i.e., looking at the relationships between the affected components)
[Figure: root causes and coupling factors leading to the failure events E1, E2, and E3]

Attributes: Root causes and coupling factors
Root cause: Most basic cause of item failure that, if corrected, would prevent recurrence of this and similar failures.
Coupling factor: Property that makes multiple items susceptible to the same root cause. A coupling factor is also called a coupling mechanism.

Attributes: Types of root causes
We may distinguish between pre-operational and operational causes:
Pre-operational root causes: Design, manufacturing, construction, installation, and commissioning errors.
Operational root causes:
- Operation and maintenance-related: Inadequate maintenance and operational procedures, execution, competence, and scheduling.
- Environmental stresses: Internal and external exposure outside the design envelope, or energetic events such as earthquake, fire, and flooding.

Attributes: Types of coupling factors
To look for coupling factors is the same as to look for similarities:
- Same design (principles)
- Same hardware
- Same function
- Same software
- Same installation staff
- Same maintenance and operational staff
- Same procedures
- Same system/item interface
- Same environment
- Same (physical) location
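Since looking for coupling factors amounts to looking for similarities, a first screening can be sketched as a comparison of component attributes. The attribute names, component tags, and values below are hypothetical; a real analysis would draw on design and maintenance records.

```python
from itertools import combinations


def shared_coupling_factors(components):
    """For each pair of components, list the attributes with identical values.

    `components` maps a component tag to a dict of attribute name -> value.
    Pairs without any shared attribute value are left out.
    """
    result = {}
    for (name_a, attrs_a), (name_b, attrs_b) in combinations(components.items(), 2):
        shared = sorted(
            key for key in attrs_a.keys() & attrs_b.keys()
            if attrs_a[key] == attrs_b[key]
        )
        if shared:
            result[(name_a, name_b)] = shared
    return result


# Hypothetical attribute records for three transmitters.
components = {
    "PT-1": {"design": "type X", "location": "deck A", "maintenance_staff": "team 1"},
    "PT-2": {"design": "type X", "location": "deck B", "maintenance_staff": "team 1"},
    "LT-1": {"design": "type Y", "location": "deck C", "maintenance_staff": "team 2"},
}
print(shared_coupling_factors(components))
```

In this example only PT-1 and PT-2 share attribute values (design and maintenance staff), so they are the pair to examine for possible coupling factors.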

Visualization
[Figure: two safety valves, with the labels: same environment, same hardware/software, same physical location, same staff, same installation/tie-in/interface, same procedures, design & engineering errors, O&M errors, environmental stresses, single point of failure]

Reflection
The distinction between independent failures and CCFs is not always so clear:
- Assume that you buy some low-price light bulbs. After a short period, most of them have failed. Is this a CCF (cheap bulbs of the same brand), or is the main explanation just that each bulb is not very reliable and has a higher failure rate?
- Some sensors are located on an offshore facility (topside) and are exposed to ocean weather (high humidity, salty water droplets). After a couple of years, some of the sensors fail. Is this due to a CCF, or is it the harsh environment that has resulted in an increased failure rate?

Defense measures
Nevertheless, it is important to reduce the chance of having CCFs, as the effect of introducing redundancy may be significantly reduced in the absence of such measures. Possible defenses are:
- Increase separation and/or segregation: Two sensors may send their signals to two separate input cards of the logic solver. Cables for very critical systems, like those from an uninterruptible power supply, may be routed through separate locations.
- Diversity: A vessel may be equipped with different level measurement types: ultrasonic or radar and magnetic float.

Defense measures
Possible defenses are (continued):
- Increase robustness: Mount sensors inside weather protection cabinets. Mount a fan in a cabinet to reduce temperature.
- Increase inherent reliability of each channel/element: All measures taken to increase the inherent reliability are effective in reducing independent failures as well as CCFs.
- Keep design simple: Complex systems that are difficult to comprehend are more subject to systematic faults. If the faults are due to misunderstanding, it is more likely that the fault will be replicated for other equipment.

Defense measures
Possible defenses are (continued):
- Analysis: Analysis is a way to reveal systematic faults, often those that are not evident without detailed investigation. Analysis may reveal whether the systematic faults have been introduced for several pieces of equipment, or whether they can result in a single point of failure.
- Procedures and human interface: This measure is also related to the possible avoidance of systematic faults.
- Competence and training: This measure is also related to the possible avoidance of systematic faults.

Defense measures
Possible defenses are (continued):
- Environmental control: Same meaning as increasing robustness, but the main focus is on robustness against environmental exposure.
- Diagnostics and coverage: All faults revealed by diagnostics can be acted upon immediately or otherwise managed by compensating measures. This means that it is likely that the effect of a subsequent failure will be reduced.