Reliability of Safety-Critical Systems Chapter 3. Failures and Failure Analysis


Mary Ann Lundteigen and Marvin Rausand
mary.a.lundteigen@ntnu.no
RAMS Group, Department of Production and Quality Engineering, NTNU
(Version 1.1 per July 2015)

Slides related to the book
Reliability of Safety-Critical Systems: Theory and Applications
Marvin Rausand, Wiley, 2014
Homepage of the book: http://www.ntnu.edu/ross/books/sis

Learning objectives
- To become familiar with key terms and concepts related to failures and failure classification
- To become familiar with different failure classification strategies, using the following sources as a basis:
  - IEC 61508 (and IEC 61511)
  - The PDS method
  - OREDA
- To become aware of some typical SIS-related failures

Definition of failure
Failure: The termination of the ability of an item to perform a required function. [IEV 191-04-01]
- A failure is always related to a required function. The function is often specified together with a performance requirement (also called a functional requirement).
- A failure occurs when the function cannot be performed or has a performance that falls outside the performance requirement.
Example (shutdown valve): According to the performance requirement, the closing time of a shutdown valve shall be no longer than 15 seconds. A failure of the closing function occurs when the closing time exceeds 15 seconds.

Failure attributes
A failure is an event that occurs at a specific point in time. A failure may:
- Develop gradually
- Occur as a sudden event
The failure may be revealed:
- On demand, i.e., when the function is needed (hidden)
- During a functional test (also hidden)
- By monitoring or diagnostics (evident)

Fault
Fault: The state of an item characterized by inability to perform a required function. [IEV 191-05-01]
- While a failure is an event that occurs at a specific point in time, a fault is a state that lasts for a shorter or longer period.
- In most cases, an item has a fault after a hardware failure has occurred, and we say that the item is in a failed state.
- Design and installation errors may also prevent the item from performing its required function. In that case the item has a fault that is not preceded by any hardware failure, and we call this fault a systematic fault.

Error
Error: Discrepancy between a computed, observed, or measured value or condition and the true, specified, or theoretically correct value or condition. [IEV 191-05-24]
- An error is present when the performance of a function deviates from the target performance (i.e., the theoretically correct performance), but still satisfies the performance requirement.
- An error will often, but not always, develop into a failure.

Relationship between failure, fault, and error
A failure may originate from an error. When the failure occurs, the item enters a fault state.
[Figure: actual performance plotted against time, with the target value and the band of acceptable deviation. Labels: error, failure (event), fault (state).]
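The distinction between error, failure, and fault can be illustrated with a small sketch. It is only an illustration under stated assumptions: the acceptance band is taken to be symmetric around the target value, no repair is modelled (so the fault state persists), and the names `Trace` and `scan_performance` as well as the sample values are hypothetical and not taken from the slides.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Trace:
    """Result of scanning a performance trace (illustrative names)."""
    states: List[str]             # "ok", "error" or "fault" for each sample
    failure_index: Optional[int]  # sample index at which the failure event occurs


def scan_performance(samples, target, acceptable_deviation):
    """Label each sample and locate the failure event.

    Assumptions: symmetric acceptance band around the target, and no repair,
    so the item stays in the fault state once the failure has occurred.
    """
    states = []
    failure_index = None
    for i, value in enumerate(samples):
        deviation = abs(value - target)
        if failure_index is not None or deviation > acceptable_deviation:
            if failure_index is None:
                failure_index = i   # the failure is the event ...
            states.append("fault")  # ... the fault is the state that follows
        elif deviation > 0:
            states.append("error")  # deviates from target, still acceptable
        else:
            states.append("ok")
    return Trace(states, failure_index)


# Hypothetical trace: performance drifts away from a target of 100.
trace = scan_performance([100, 100, 98, 96, 93, 95], target=100, acceptable_deviation=5)
print(trace.states)
print(trace.failure_index)
```

In this sketch the last sample is back inside the band but is still labelled as a fault, because the fault state is assumed to last until the item is repaired.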

Failure Mode definition
Failure mode: The way a failure is observed on a failed item. [IEV 191-05-22]
- A failure mode is the way in which an item could fail to perform its required function.
- An item can fail in many different ways; a failure mode is a description of a possible state of the item after it has failed.
Example (pump): Performance requirement: The pump must provide an output between 100 and 110 liters per minute. Associated failure modes may be:
- No output
- Too low output
- Too high output
- Too much fluctuation in output
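As a rough sketch, the pump failure modes can be derived from measured flow samples. The 100 to 110 liters per minute band comes from the performance requirement above; the fluctuation limit `max_std`, the use of the mean and standard deviation, and the function name `pump_failure_modes` are assumptions made only for illustration.

```python
from statistics import pstdev


def pump_failure_modes(flow_samples, low=100.0, high=110.0, max_std=2.0):
    """Return the failure modes observed in a list of flow samples (l/min).

    low/high: required output band from the performance requirement.
    max_std:  hypothetical limit on fluctuation (standard deviation, l/min).
    """
    if not flow_samples:
        raise ValueError("at least one flow sample is required")
    if all(f == 0 for f in flow_samples):
        return ["No output"]

    modes = []
    mean_flow = sum(flow_samples) / len(flow_samples)
    if mean_flow < low:
        modes.append("Too low output")
    elif mean_flow > high:
        modes.append("Too high output")
    if pstdev(flow_samples) > max_std:
        modes.append("Too much fluctuation in output")
    return modes


print(pump_failure_modes([105, 106, 104]))       # within requirement
print(pump_failure_modes([90, 92, 91]))          # mean below 100 l/min
print(pump_failure_modes([100, 110, 100, 110]))  # mean is fine, output fluctuates
```

An empty list means that no failure mode is observed, i.e., the required function is performed within the performance requirement.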

Failure Mode attributes
- A failure mode is always related to a required function and the associated performance requirement.
- A failure mode is a description of a fault (i.e., a state) and not of a failure (i.e., an event). A more correct term would therefore be fault mode. This term is used in IEC 60300, but failure mode is so common that a change of the term might confuse the users.
- Some data sources list, for example, corrosion as a failure mode. This is wrong. Corrosion is a failure mechanism and may be a cause of a failure mode. It is, however, not a description of a lost function.

Classification of Failures
Failures may be classified according to their:
- Causes: To avoid future occurrences and make judgments about repair
- Effects: To rank between critical and not so critical failures
- Detectability: To distinguish failures that may be revealed automatically (and shortly after their occurrence) from those that may be hidden until special effort is taken, such as proof tests
- And several other criteria
Special category: Common-cause failures (CCFs)

Failure Classification in IEC 61508
IEC 61508 classifies failures according to their:
- Causes:
  - Random (hardware) faults
  - Systematic faults (including software faults)
- Effects:
  - Safe failures
  - Dangerous failures
- Detectability:
  - Detected: revealed by online diagnostics
  - Undetected: revealed by functional tests or upon a real demand for activation

Random and systematic faults
Random (hardware) faults: Failures resulting from the natural degradation mechanisms of the item (i.e., degradation within the design envelope).
Some additional random failure causes may be added, e.g., some types of human errors (ref. ISO/TR 12489).

Random and systematic faults (continued)
Systematic faults (non-physical causes): Faults that can be related to a particular cause other than natural degradation and foreseen stressors.
- Systematic faults may be due to errors made during the specification, design, operation, and maintenance phases of the lifecycle.
- Each systematic failure may be regarded as one of a kind: it can normally be eliminated by a modification of the design or manufacturing process, the testing and operating procedures, the training of the operators, or the documentation.

Safe and dangerous faults
Safe (S) failure:
- The item may operate without any demand (PDS method handbook), or
- A failure which does not have the potential to put the safety-related system in a hazardous or fail-to-function state (IEC 61508, part 4).
Dangerous (D) failure:
- The item does not operate upon a demand (PDS method handbook), or
- A failure which has the potential to put the safety-related system in a hazardous or fail-to-function state (IEC 61508, part 4).
Non-critical failure (or no-part/no-effect faults): Failures where the main functions of the item are not affected (adapted from IEC 61508-4).

Detected and undetected faults
Safe and dangerous failures may be classified further as either detected (by diagnostics) or undetected (not detected by diagnostics):
- Safe undetected (SU): A spurious closure of a valve usually falls into this category.
- Safe detected (SD): A non-critical alarm from a sensor may fall into this category.
- Dangerous detected (DD): A critical alarm from a detector, such as out of range or loss of signal, may fall into this category. As long as the fault is not corrected, the item will remain unable to carry out the function.
- Dangerous undetected (DU): A failure that is hidden and which will prevent the execution of a safety-critical function. A shutdown valve that is stuck in open position is an example.
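The four categories follow from two yes/no questions: is the failure dangerous, and is it detected by diagnostics? A minimal sketch of this mapping is given below; the names `FailureClass` and `classify` are hypothetical and chosen only for illustration.

```python
from enum import Enum


class FailureClass(Enum):
    SU = "safe undetected"
    SD = "safe detected"
    DD = "dangerous detected"
    DU = "dangerous undetected"


def classify(dangerous: bool, detected_by_diagnostics: bool) -> FailureClass:
    """Map the two attributes (effect, detectability) to SU/SD/DD/DU."""
    if dangerous:
        return FailureClass.DD if detected_by_diagnostics else FailureClass.DU
    return FailureClass.SD if detected_by_diagnostics else FailureClass.SU


# Shutdown valve stuck in open position: dangerous and hidden until test or demand.
print(classify(dangerous=True, detected_by_diagnostics=False).name)
# Spurious closure of a valve: safe, and not revealed by diagnostics.
print(classify(dangerous=False, detected_by_diagnostics=False).name)
```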

Other failure classifications: PDS method
The PDS method (www.sintef.no/pds) distinguishes between:
- Critical failures:
  - Dangerous failures: detected and undetected
  - Safe detected and safe undetected (spurious) failures
- Non-critical failures
[Figure: the total failure rate λ_Total split into the critical rate λ_Crit and the non-critical rate λ_NONC; λ_Crit is split further into undetected (λ_DU, λ_SU) and detected (λ_DD, λ_SD) rates. A marking indicates the rates taken into account for SFF calculations.]
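Read as a rate decomposition, the PDS classification can be written as follows. This is a sketch based on the categories named on the slide, assuming the categories are mutually exclusive so that their rates add up.

```latex
\lambda_{\mathrm{Total}} = \lambda_{\mathrm{Crit}} + \lambda_{\mathrm{NONC}},
\qquad
\lambda_{\mathrm{Crit}} = \lambda_{\mathrm{DU}} + \lambda_{\mathrm{DD}} + \lambda_{\mathrm{SU}} + \lambda_{\mathrm{SD}}
```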

Other failure classifications: PDS method (continued)
Failure
- Random hardware failure
  - Aging failure: random failures due to natural (and foreseen) stressors
- Systematic failure
  - Software failure
    - Inadequate specification
    - Programming error
    - Error during software update
  - Design related failure
    - Inadequate specification
    - Inadequate implementation
  - Installation failure
    - Gas detector capsulating left on after commissioning
    - Valve installed in wrong direction
    - Incorrect sensor location
  - Excessive stress failure
    - Excessive vibration
    - Unforeseen sand production
    - Too high temperature
  - Operational failure
    - Valve left in wrong position
    - Sensor calibration failure
    - Detector in bypass mode
Source: PDS method handbook (2009)

Examples of Systematic Faults
Systematic faults may be:
- Software faults: Programming errors, compilation errors, inadequate testing, unforeseen application conditions, change of system parameters, etc.
- Design related faults: Faults (other than software faults) introduced during the design phase of the equipment. It may be a fault in the system specification itself, in the manufacturing process, and/or in the quality assurance of the item.
- Installation faults: Faults introduced during the last phases prior to operation, i.e., during installation or commissioning. If detected, such faults are typically removed during the first months of operation, and they are therefore often excluded from databases.

Examples of Systematic Faults (continued)
- Excessive stress: Failures that occur when stresses beyond the design specification are placed upon the component. The excessive stresses may be caused either by external causes or by internal influences from the medium.
- Operational errors: Initiated by human errors during operation or maintenance/testing.

Failure Classification Illustration
[Figure: causes mapped to effects.
- Design errors and human interaction errors lead to systematic faults, which are either safe or dangerous, and in each case detected or undetected. These are usually not quantified (see ISO TR 12489).
- Environmental stresses and normal degradation lead to random hardware failures, which are either safe (λ_SD, λ_SU) or dangerous (λ_DD, λ_DU), i.e., detected or undetected.]
Note: Some systematic failures may be included in historical databases. Some systematic failures may also be catered for in the CCF rate.

PDS method vs. IEC 61508
Failure categories in IEC 61508:
- Dangerous (D): dangerous detected (DD), dangerous undetected (DU)
- Safe (S): safe (spurious)
- No part, no effect
Failure categories in PDS:
- Dangerous (D): dangerous detected (DD), dangerous undetected (DU)
- Safe (S): safe undetected (spurious), safe detected
- Non-critical (NONC)

Failure classification in OREDA
- Critical, degraded, and incipient failures (further explained on a separate slide)
- Failure mechanisms: Pre-defined categories of immediate failure causes, defined for each maintainable item
- Boundary conditions and maintainable items:
  - Maintainable items: Repairable items and subsystems for which failures are recorded and data are presented
  - Boundary conditions: Assumptions made about the maintainable item, including boundaries for what is considered part of the maintainable item and what is not
Ref: ISO 14224 and OREDA handbooks

Failure classification in OREDA (continued)
- Critical failure: A failure of an item that causes immediate cessation of its ability to perform a required function. In this case, its ability comprises two elements:
  - Loss of ability to function on demand (safety-related)
  - Loss of ability to maintain production (production-availability related)
  This means that critical failures usually include what IEC 61508 defines as DU, DD, and SU failures.
- Degraded failure: A partial failure where the item has a degraded performance, but is still able to perform its essential functions
- Incipient failure: Also a partial failure, but its degradation is barely noticeable and can be regarded as a very early symptom of a degradation under development
Ref: ISO 14224 and OREDA handbooks

Definition of CCF
Common cause failure: A dependent failure in which two or more component fault states exist simultaneously or within a short time interval, and are a direct result of a shared cause.
Example: Two pressure transmitters in a SIF have failed due to a calibration error.

What does reliability data include?
- Reliability data supplied by manufacturers are often based on random hardware faults alone, while data sources based on industry experience may include both random hardware faults and systematic faults.
- It is sometimes argued that systematic faults should be excluded from the input data used for reliability quantification, as they do not (necessarily) share the properties of random hardware faults. A random hardware fault may re-occur, while a systematic fault (ideally) is removed forever once it has been corrected.
- Many standards on reliability analysis therefore limit reliability quantification to random hardware faults, and rely on procedures, testing, reviews, proper training, and so on to avoid systematic faults.
Reflection: In what way may the statements above impact the confidence in the results from the reliability calculations?

Failure Classification based on IEC 61508
Consider a shutdown valve:

Failure mode                           Classification
Fail to close (FTC)                    DU
Leakage in closed position (LCP)       DU
Premature (spurious) closure (PC)      SU
Fail to open (FTO)                     SU
Leakage to environment (LTE)           SD

(Valve illustration: www.franklinvalve.com)
Remark: Valves usually have limited diagnostic features, as opposed to sensors/transmitters and logic solvers.

FMEDA - a variant of FMECA
A Failure Modes, Effects, and Diagnostic Analysis (FMEDA) is an extension of an FMECA that is tailor-made for a SIS.
- FMEDA as a method was developed by the company Exida.
- It is, in principle, very similar to an FMECA, and an FMECA-like table is used.
- Focus is placed on (and columns in the table are allocated to) the classification of each failure mode into DU, DD, SU, or SD.
- Failure rates can be estimated for each failure category on the basis of the classification and the overall failure rate of the item.
- Proof test coverage may also be considered.
- The approach can supplement manufacturers' calculations of failure rates and specific measures like the safe failure fraction (SFF) and the diagnostic coverage factor (DC).
More information is available from the book Safety Instrumented Systems Verification, by William M. Goble and Harry Cheddie.
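The estimation of category failure rates from a classified failure-mode table can be sketched as follows. This is a simplified illustration, not the FMEDA procedure itself. The overall failure rate and the share of each failure mode are hypothetical numbers, and the function names are invented for the example. The classifications are the ones listed for the shutdown valve. SFF and DC are computed with the commonly used IEC 61508 definitions: SFF is the sum of the safe and dangerous detected rates divided by the total rate, and DC is the dangerous detected rate divided by the total dangerous rate.

```python
def category_rates(total_rate, mode_table):
    """Split an overall failure rate into DU, DD, SU and SD rates.

    mode_table: list of (failure_mode, share_of_total_rate, category) tuples.
    The shares are assumed to sum to 1.
    """
    if abs(sum(share for _, share, _ in mode_table) - 1.0) > 1e-9:
        raise ValueError("failure mode shares must sum to 1")
    rates = {"DU": 0.0, "DD": 0.0, "SU": 0.0, "SD": 0.0}
    for _, share, category in mode_table:
        rates[category] += total_rate * share
    return rates


def safe_failure_fraction(rates):
    """SFF = (safe + dangerous detected) / (safe + dangerous)."""
    total = sum(rates.values())
    return (rates["SU"] + rates["SD"] + rates["DD"]) / total


def diagnostic_coverage(rates):
    """DC = dangerous detected / all dangerous (0 if there are no dangerous failures)."""
    dangerous = rates["DD"] + rates["DU"]
    return rates["DD"] / dangerous if dangerous > 0 else 0.0


# Hypothetical shutdown valve data: 2.0e-6 failures per hour in total.
valve_modes = [
    ("Fail to close (FTC)", 0.3, "DU"),
    ("Leakage in closed position (LCP)", 0.1, "DU"),
    ("Premature (spurious) closure (PC)", 0.3, "SU"),
    ("Fail to open (FTO)", 0.2, "SU"),
    ("Leakage to environment (LTE)", 0.1, "SD"),
]
rates = category_rates(2.0e-6, valve_modes)
print(rates)
print(safe_failure_fraction(rates))
print(diagnostic_coverage(rates))
```

With these assumed shares, none of the dangerous failure modes is detected by diagnostics, so the diagnostic coverage is zero. This matches the remark that valves usually have limited diagnostic features.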

Possible extensions
Example reference: Goble, W.M. and Brombacher, A. Using a failure modes, effects and diagnosis analysis (FMEDA) to measure diagnostic coverage in programmable electronic systems. Reliability Engineering and System Safety. DOI:10.1016/S0951-8320(99)00031-9