Safety Critical Systems

Similar documents
Real-Time & Embedded Systems

18-642: Safety Plan 11/1/ Philip Koopman

Safety-critical systems: Basic definitions

Understanding safety life cycles

Reliability of Safety-Critical Systems Chapter 3. Failures and Failure Analysis

Safety Management in Multidisciplinary Systems. SSRM symposium TA University, 26 October 2011 By Boris Zaets AGENDA

Section 1: Multiple Choice

DATA ITEM DESCRIPTION Title: Failure Modes, Effects, and Criticality Analysis Report

PL estimation acc. to EN ISO

C. Mokkapati 1 A PRACTICAL RISK AND SAFETY ASSESSMENT METHODOLOGY FOR SAFETY- CRITICAL SYSTEMS

Using what we have. Sherman Eagles SoftwareCPR.

Safety-critical systems: Basic definitions

Section 1: Multiple Choice Explained EXAMPLE

(C) Anton Setzer 2003 (except for pictures) A2. Hazard Analysis

Safety-Critical Systems

Solenoid Valves For Gas Service FP02G & FP05G

Safety Manual. Process pressure transmitter IPT-1* 4 20 ma/hart. Process pressure transmitter IPT-1*

Hydraulic (Subsea) Shuttle Valves

A systematic hazard analysis and management process for the concept design phase of an autonomous vessel.

Safety Engineering - Hazard Identification Techniques - M. Jahoda

SIL Safety Manual. ULTRAMAT 6 Gas Analyzer for the Determination of IR-Absorbing Gases. Supplement to instruction manual ULTRAMAT 6 and OXYMAT 6

Valve Communication Solutions. Safety instrumented systems

Every things under control High-Integrity Pressure Protection System (HIPPS)

Pneumatic QEV. SIL Safety Manual SIL SM Compiled By : G. Elliott, Date: 8/19/2015. Innovative and Reliable Valve & Pump Solutions

Partial Stroke Testing. A.F.M. Prins

Functional safety. Functional safety of Programmable systems, devices & components: Requirements from global & national standards

Solenoid Valves used in Safety Instrumented Systems

Bespoke Hydraulic Manifold Assembly

PROCEDURE. April 20, TOP dated 11/1/88

FP15 Interface Valve. SIL Safety Manual. SIL SM.018 Rev 1. Compiled By : G. Elliott, Date: 30/10/2017. Innovative and Reliable Valve & Pump Solutions

THE IMPROVEMENT OF SIL CALCULATION METHODOLOGY. Jinhyung Park 1 II. THE SIL CALCULATION METHODOLOGY ON IEC61508 AND SOME ARGUMENT

Reliability engineering is the study of the causes, distribution and prediction of failure.

Safety Manual OPTISWITCH series relay (DPDT)

New Thinking in Control Reliability

DeZURIK. KGC Cast Knife Gate Valve. Safety Manual

SIL explained. Understanding the use of valve actuators in SIL rated safety instrumented systems ACTUATION

Safety Manual VEGAVIB series 60

EUROPEAN GUIDANCE MATERIAL ON INTEGRITY DEMONSTRATION IN SUPPORT OF CERTIFICATION OF ILS AND MLS SYSTEMS

Safety Manual VEGAVIB series 60

Eutectic Plug Valve. SIL Safety Manual. SIL SM.015 Rev 0. Compiled By : G. Elliott, Date: 19/10/2016. Innovative and Reliable Valve & Pump Solutions

Reliability of Safety-Critical Systems Chapter 4. Testing and Maintenance

SPR - Pneumatic Spool Valve

Implementing IEC Standards for Safety Instrumented Systems

Lecture 04 ( ) Hazard Analysis. Systeme hoher Qualität und Sicherheit Universität Bremen WS 2015/2016

DeZURIK Double Block & Bleed (DBB) Knife Gate Valve Safety Manual

Hazard analysis. István Majzik Budapest University of Technology and Economics Dept. of Measurement and Information Systems

Purpose. Scope. Process flow OPERATING PROCEDURE 07: HAZARD LOG MANAGEMENT

Neles trunnion mounted ball valve Series D Rev. 2. Safety Manual

DeZURIK. KSV Knife Gate Valve. Safety Manual

Safety Manual VEGASWING 61, 63. NAMUR With SIL qualification. Document ID: 52084

Failure modes and models

Implementing Emergency Stop Systems - Safety Considerations & Regulations A PRACTICAL GUIDE V1.0.0

Vibrating Switches SITRANS LVL 200S, LVL 200E. Safety Manual. NAMUR With SIL qualification

D-Case Modeling Guide for Target System

Failure Modes, Effects and Diagnostic Analysis

Software Reliability 1

Session: 14 SIL or PL? What is the difference?

DETERMINATION OF SAFETY REQUIREMENTS FOR SAFETY- RELATED PROTECTION AND CONTROL SYSTEMS - IEC 61508

CT433 - Machine Safety

Marine Risk Assessment

TRI LOK SAFETY MANUAL TRI LOK TRIPLE OFFSET BUTTERFLY VALVE. The High Performance Company

RESILIENT SEATED BUTTERFLY VALVES FUNCTIONAL SAFETY MANUAL

Proof Testing A key performance indicator for designers and end users of Safety Instrumented Systems

PI MODERN RELIABILITY TECHNIQUES OBJECTIVES. 5.1 Describe each of the following reliability assessment techniques by:

PROCESS AUTOMATION SIL. Manual Safety Integrity Level. Edition 2005 IEC 61508/61511

Special Documentation Proline Promass 80, 83

Ultima. X Series Gas Monitor

Failure Modes, Effects and Diagnostic Analysis

Adaptability and Fault Tolerance

YT-3300 / 3301 / 3302 / 3303 / 3350 / 3400 /

L&T Valves Limited SAFETY INTEGRITY LEVEL (SIL) VERIFICATION FOR HIGH INTEGRITY PRESSURE PROTECTION SYSTEM (HIPPS) Report No.

Hazard Identification

Software Safety Hazard Analysis

Accelerometer mod. TA18-S. SIL Safety Report

Achieving Compliance in Hardware Fault Tolerance

Jamesbury Pneumatic Rack and Pinion Actuator

This manual provides necessary requirements for meeting the IEC or IEC functional safety standards.

4. Hazard Analysis. Limitations of Formal Methods. Need for Hazard Analysis. Limitations of Formal Methods

Engr2110. Week 6 Product Safety/Liability

Transmitter mod. TR-A/V. SIL Safety Report

Failure Modes, Effects and Diagnostic Analysis

Hazard Operability Analysis

Introduction to Machine Safety Standards

Recent changes in workplace safety regulations have heightened the awareness of hazards associated with electrical arcs.

EMERGENCY SHUT-DOWN RELIABILITY ADVANTAGE

Safety manual for Fisher GX Control Valve and Actuator

High performance disc valves Series Type BA, BK, BW, BM, BN, BO, BE, BH Rev Safety Manual

The Safety Case. The safety case

FMEA- FA I L U R E M O D E & E F F E C T A N A LY S I S. PRESENTED BY: AJITH FRANCIS

Determining Occurrence in FMEA Using Hazard Function

CHAPTER 4 FMECA METHODOLOGY

YT-300 / 305 / 310 / 315 / 320 / 325 Series

Guidelines on Surveys for Dynamic Positioning System

Understanding the How, Why, and What of a Safety Integrity Level (SIL)

Analysis of Instrumentation Failure Data

Missing no Interaction Using STPA for Identifying Hazardous Interactions of Automated Driving Systems

Managing for Liability Avoidance. (c) Lewis Bass

'Dipartimento di Ingegneria Elettrica, Universita di Genova Via all 'Opera Pia, lla Genova, Italy

Failure Modes, Effects and Diagnostic Analysis. Rosemount Inc. Chanhassen, Minnesota USA

Functional Safety SIL Safety Instrumented Systems in the Process Industry

Transcription:

Safety Critical Systems Mostly from: Douglass, Doing Hard Time, developing Real-Time Systems with UML, Objects, Frameworks And Patterns, Addison-Wesley. ISBN 0-201-49837-5 1

Definitions channel a set of devices (including processors and software running on processor) that handle a related, cohesive set of data control flows from incoming event to ultimate system response. error - (ISO) A discrepancy between a computed, observed, or measured value or condition and the true, specified, or theoretically correct value or condition. error detection - Techniques used to identify errors in data transfers. See: check summation, cyclic redundancy check [CRC], parity check, longitudinal redundancy. error seeding - (IEEE) The process of intentionally adding known faults to those already in a computer program for the purpose of monitoring the rate of detection and removal, and estimating the number of faults remaining in the program. Contrast with mutation analysis. 2

Definitions (cont) fail-safe - (IEEE) A system or component that automatically places itself in a safe operational mode in the event of a failure. failure - (IEEE) The inability of a system or component to perform its required functions within specified performance requirements. failure analysis - Determining the exact nature and location of a program error in order to fix the error, to identify and fix other similar errors, and to initiate corrective action to prevent future occurrences of this type of error. Failure Modes and Effects Analysis [FMEA] (IEC) A method of reliability analysis intended to identify failures, at the basic component level, which have significant consequences affecting the system performance in the application considered. 3

Definitions (cont) Failure Modes and Effects Criticality Analysis [FMECA] (IEC) A logical extension of FMEA which analyzes the severity of the consequences of failure. fault an incorrect step, process, or data definition in a computer program which causes the program to perform in an unintended or unanticipated manner. fault seeding - See: error seeding. Fault Tree Analysis [FTA] (IEC) The identification and analysis of conditions and factors which cause or contribute to the occurrence of a defined undesirable event, usually one which significantly affects system performance, economy, safety or other required characteristics. 4

Definitions (cont) hazard - (DOD) A condition that is prerequisite to a mishap. hazard analysis. A technique used to identify conceivable failures affecting system performance, human safety or other required characteristics. See: FMEA, FMECA, FTA, software hazard analysis, software safety requirements analysis, software safety design analysis, software safety code analysis, software safety test analysis, software safety change analysis. hazard probability - (DOD) The aggregate probability of occurrence of the individual events that create a specific hazard. hazard severity - (DOD) An assessment of the consequence of the worst credible mishap that could be caused by a specific hazard. 5

Definitions (cont) mean time between failures [MTBF]A measure of the reliability of a computer system, equal to average operating time of equipment between failures, as calculated on a statistical basis from the known failure rates of various components of the system. mean time to failure [MTTF] A measure of reliability, giving the average time before the first failure. mean time to repair [MTTR] A measure of reliability of a piece of repairable equipment, giving the average time between repairs. reliability - (IEEE) The ability of a system or component to perform its required functions under stated conditions for a specified period of time. See: software reliability. reliability assessment - (ANSI/IEEE) The process of determining the achieved level of reliability for an existing system or system component. 6

Definitions (cont) risk - (IEEE) A measure of the probability and severity of undesired effects. Often taken as the simple product of probability and consequence. risk assessment - (DOD) A comprehensive evaluation of the risk and its associated impact. Safety - (DOD) Freedom from those conditions that can cause death, injury, occupational illness, or damage to or loss of equipment or property, or damage to the environment. safety critical - (DOD) A term applied to a condition, event, operation, process or item of whose proper recognition, control, performance or tolerance is essential to safe system operation or use; e.g., safety critical function, safety critical path, safety critical component. 7

Definitions (cont) safety critical computer software components - (DOD) Those computer software components and units whose errors can result in a potential hazard, or loss of predictability or control of a system. (computer system) security - (IEEE) The protection of computer hardware and software from accidental or malicious access, use, modification, destruction, or disclosure. Security also pertains to personnel, data, communications, and the physical protection of computer installations. 8

Probability of Failure Failure curve for HW Components Time 9

Failures vrs. Faults Failures are caused by Faults (it is difficult to distinguish between Fault and Error). There are two types of faults: Random Faults faults that occur in a random way, described by a probability function (e.g. a HW component that was working suddenly stops functioning) Systematic Faults faults deriving from a systematic cause, e.g. a design error. All SW errors are systematic faults. 10

Single-Point Failures Devices ought to be save when there are no faults and the device is used properly. Most experts however consider a device safe when any single point failure cannot lead to an incident. 11

Single-Point Failures (cont) <<processor>> Linear Accelerator Set Intensity Start Beam End Beam Beam Intensity Beam Duration UNSAFE <<actuator>> Beam Emitter <<sensor>> Beam Detector 12

Single-Point Failures (cont) Set Intensity Start Beam End Beam <<processor>> Linear Accelerator Set Intensity Start Beam End Beam Beam Intensity Beam Duration Ancillary data (self test, watchdog...) Activate <<processor>> Safety Processor Rise Lower <<actuator>> Beam Emitter <<sensor>> Beam Detector <<switch>> Cut-Off Switch <<actuator>> Mechanical Curtain Off 13

Safety Architectures Single Channel Protected Design (SCPD) In a SCPD architecture, a single channel exists for the control of some process. However, this single channel can be made by applying sufficient hazard control within the channel. E.g. Brake Brake Lever Pedal Sensor Computer Bus Engine 14

Safety Architectures (cont) Homogeneous Redundancy Pattern The HRP architecture use identical channels to increase reliability. All the redundant channels are run in parallel and the output of the channels is compared. If an oo number of channels is used, than a majority wins policy can be implemented that can detect and correct failures in the minority channels. 15

Safety Architectures (cont) Diverse Redundancy Pattern The DRP architecture provides redundant channels that are implemented by different means. This can be achieved in several ways: Different but equal Lightway redundancy Separation of monitoring and actuation Watchdog pattern 16

Safety Architectures (cont) Controller Subsystem Monitor Channel Actuation Channel System Boundary MONITOR ACTUATION PATTERN Des. Result. OK Env. condition Actuation Des. Result. FAULT Env. condition Actuation 17

Safety Architectures (cont) Watchdog Channel Service WATCHDOG PATTERN Service Service Service Timeout Reset 18

Eight Steps to Safety Identify hazards Determine the risks Define the safety measures Create safe requirements Create safe designs Implement safety Assure the safety process Test, test, test 19

Identify the hazards List the possible hazards For each hazard identify the faults leading to it (Fault Tree Analysis) Hazard Fault 1 Fault 2 Event Fault 3 Fault 4 20

Determine the risks Organizations like TUV have defined risk requirement categories that take into account: Severity of the risk Duration of the period of exposure to the risk Prevention of the danger Probability of occurrence of the danger Etc... 21

Define the safety measures A safety measure is a behaviour added to a system to handle hazard. There are several ways to handle a hazard: Obviation making the hazard physically impossible Education educate the users so that they won t cause hazards Alarming announce the hazard to the user so that he can take proper recovery actions Interlocks the hazard is removed by making some other devices, logic, intervene when the hazard presents itself Internal checking the hazard is prevented form happening because the system is able to detect a malfunction prior to an incident Safety equipment gloves, goggles, etc... Restriction of access let only knowledgeable users access the system Labelling -... Etc.. 22

Create safe requirements Safety has to be introduced at requirements level. 23

Create safe designs Work form safe requirements Adopt a fundamentally safe architecture Periodically, during design, revisit the hazard analysis to add hazards due to faults specific to design Select programming in the small measures that provide the appropriate level of detection and correction Ensure that independent channels truly lack common modes failures Adopt a consistent and appropriate set of strategies for handling faults once they are identified Build in power on-on and periodic run-time tests to identify latent faults. 24

Implement safety Language selection Subset selection Use the tools (i.e. Methodologies, programming languages, etc..) in a selected, predefined and constrained way. 25

Process / Test Make sure safety is a concern in all phases of your development process. Test / Test / Test 26