Reliability Risk Management Concept Models
August 2012
Earl Shockley, Senior Director of Reliability Risk Management

[Figure labels: Latent organizational weaknesses and conditions; Deficiencies in barriers and defenses; Programmatic deficiencies; Human Error]
RELIABILITY ACCOUNTABILITY

What is an Event?
An unwanted, undesirable change in the state of plants, systems, or components that leads to undesirable consequences to the safe and reliable operation of the plant or system.
Often driven by (Risk Clusters):
- Programmatic deficiencies
- Deficiencies in barriers and defenses
- Latent organizational weaknesses and conditions
- Errors in human performance and contextual factors
- Equipment design and/or maintenance issues

Event Categorization
Prioritizes event analysis (EA) based on risk and significance. The response is systematic, and the depth of analysis increases as the category rises.
Cat 4/5: Loss of large amounts of load or generation (5,000 to 10,000 MW); large unintended system separations and islanding.
Cat 2/3: Loss of a generation station; loss of small to medium amounts of load (> 100 MW); unintended system separations and islanding of 1,000 MW to 10,000 MW.
Cat 1: Unintended loss of bulk power elements (generation, transmission components), as designed, or controlled separations.

Events by Category
- 1395 occurrences recorded between October 25, 2010 and July 25, 2012
- 221 occurrences since the EA Process was put in place (21 February 2012)
- 289 events qualified (Cat 1-5) for review since 25 October 2010 (EA Field Trial start)
- 42 events since the EA Process was put in place (21 February 2012)

          Cat 0   Cat 1   Cat 2   Cat 3   Cat 4   Cat 5
Totals    1106    198     78      10      3       0

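The totals on this slide can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the name `counts_by_category` and the two helper functions are assumptions introduced here, while the counts themselves are the ones listed on the slide.

```python
# Occurrence counts per category, as listed on the slide
# (recorded between October 25, 2010 and July 25, 2012).
counts_by_category = {0: 1106, 1: 198, 2: 78, 3: 10, 4: 3, 5: 0}


def total_occurrences(counts):
    """Sum occurrences across every category, including Cat 0."""
    return sum(counts.values())


def qualified_events(counts):
    """Sum events in Cat 1 through Cat 5, the ones that qualify for review."""
    return sum(n for cat, n in counts.items() if cat >= 1)


if __name__ == "__main__":
    print(total_occurrences(counts_by_category))  # 1395
    print(qualified_events(counts_by_category))   # 289
```

Both results match the figures quoted on the slide: 1395 occurrences in total, of which 289 fall in Cat 1 through Cat 5.
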
Reliability Risk Management Concepts
[Figure labels: Severity; Avoid; Learn and Reduce; Inverse Cost Benefit; Reporting Threshold]
Trend lower-tiered events to identify emerging reliability risk trends.

Drifting to Failure Concept*
[Figure: Reliability (Hi to Lo) plotted against Time. Labels: Management's Stated Expectations; Normal Practice; Drift; Real Margin for Error; Error]
Expectations: Desired approach to work (as imagined)
Normal Practices: Work as actually performed
Risk Clusters: programmatic deficiencies; deficiencies in barriers and defenses; latent organizational weaknesses and conditions; errors in human performance and contextual factors; equipment design and/or maintenance issues
Latent Errors: Unnoticed at the time made; often deeply embedded within the system.
* Adapted from Muschara Error Management Consulting, LLC

Defenses in Depth
But it is possible that under the wrong set of circumstances, an event could occur.
[Figure: a Hazard passing through Defense 1, Defense 2, Defense 3, and Defense 4 to reach an Event]

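One common way to reason about layered defenses is to treat an event as requiring every layer to fail at once. The sketch below is a simplification, not something taken from the presentation: it assumes the defenses fail independently of one another, and the failure probabilities are made-up values chosen only for illustration.

```python
import math


def event_probability(defense_failure_probs):
    """Probability that a hazard passes every defense, assuming each
    defense fails independently of the others (a simplifying assumption)."""
    for p in defense_failure_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    return math.prod(defense_failure_probs)


if __name__ == "__main__":
    # Hypothetical failure probabilities for Defense 1 through Defense 4.
    hypothetical = [0.1, 0.2, 0.05, 0.1]
    print(f"{event_probability(hypothetical):.6f}")
```

Latent organizational weaknesses can degrade several defenses at the same time, so the independent product likely understates the real chance of an event.
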
Cause Code Definitions
Short Title: Definition
Design/Engineering Problem: An event or condition that can be traced to a defect in design or other factors related to configuration, engineering, layout, tolerances, calculations, etc.
Equipment/Material Problem: An event or condition resulting from the failure, malfunction, or deterioration of equipment or parts, including instruments or material.
Individual Human Performance LTA: An event or condition resulting from the failure, malfunction, or deterioration of the individual human performance associated with the process.
Management Problem: An event or condition that could be directly traced to managerial actions or methodology (or lack thereof).
Communications LTA: Inadequate presentation or exchange of information.
Other Problem: The problem was caused by factors beyond the control of the organization.
LTA = Less Than Adequate

Root Cause Determinations
A-Level Cause Code (of 127 total "qualified" events with a cause code entered)
37% of the reports did not contain sufficient information to determine causal factors.
[Pie chart. Legend: Design/Engineering Problem; Equipment/Material Problem; Individual Human Performance LTA; Management Problem; Communication LTA; Other Problem; No Causes Found; Information to determine cause LTA. Slice labels: 9%, 37%, 20%, 3%, 2%, 6%, 2%, 22%]

Identified Root Causes (80 events)
[Pie chart. Legend: Design/Engineering Problem; Equipment/Material Problem (see deeper dive chart); Individual Human Performance LTA; Management Problem (see deeper dive chart); Communication LTA; Other Problem. Slice labels: 4%, 9%, 14%, 35%, 30%, 5%]

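The 80 events on this slide appear consistent with the previous one: removing the 37% of the 127 coded events that lacked sufficient information leaves about 80. The slides do not state this relationship explicitly, so the sketch below is an assumption-based check, and the function name is hypothetical.

```python
def events_with_identified_cause(total_events, pct_insufficient_info):
    """Events remaining after removing reports that lacked enough
    information to determine causal factors. The percentage is converted
    to a whole number of events by rounding to the nearest event."""
    insufficient = round(total_events * pct_insufficient_info / 100)
    return total_events - insufficient


if __name__ == "__main__":
    # 127 coded events, 37% with insufficient information (from the slides).
    print(events_with_identified_cause(127, 37))  # 80
```
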
Deeper Dive into Management
"Management Problem" Cause Factors
A4B3C08 = Job scoping did not identify special circumstances or conditions
A4B5C04 = Risks/consequences associated with change not adequately reviewed
A4B1C04 = Management follow-up did not identify problems
A4B1C05 = Management assessment did not determine cause of previous event or known problem
A4B1C06 = Previous industry or in-house experience was not effectively used to prevent recurrence
A4B5C05 = System interactions not considered
[Bar chart. Vertical axis: 0 to 7. Horizontal axis: A4B3C08, A4B5C04, A4B1C04, A4B1C05, A4B1C06, A4B5C05, A4B1C03, A4B1C08, A4B1C09, A4B3C09, A4B5C02, A4B5C03, A4]

Deeper Dive into Equipment
"Equipment/Material Problem" Cause Factors
A2B6C01: Defective or failed part
A2B6C07: Software failure
A2B3C03: Post-maintenance/post-modification testing LTA
A2B6C04: End-of-life failure
A2B6C06: Contaminant
A2B5C02: Fabricated item did not meet requirements
A2B3C02: Inspection/testing LTA
A2B5C04: Product acceptance requirements LTA
[Bar chart. Vertical axis: 0 to 10. Horizontal axis: A2B6C01, A2B6C07, A2B3C03, A2B6C04, A2B6C06, A2B5C02, A2B3C02, A2B5C04]

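The cause codes on the two deeper-dive slides share a visible A/B/C structure, for example A4B3C08 or A2B6C01. The sketch below splits such a code into its three levels. The pattern (one digit for A and B, two digits for C) is inferred from the codes shown on the slides and is not an official definition; the names `CauseCode`, `parse_cause_code`, and `A_LEVEL_TITLES` are hypothetical.

```python
import re
from typing import NamedTuple

# A-level titles confirmed by the deeper-dive slides; other A levels are
# deliberately left out because the slides do not give their numbers.
A_LEVEL_TITLES = {
    2: "Equipment/Material Problem",
    4: "Management Problem",
}

# Assumed layout: A<digit>B<digit>C<two digits>, inferred from the slides.
CODE_PATTERN = re.compile(r"A(\d)B(\d)C(\d{2})")


class CauseCode(NamedTuple):
    a_level: int
    b_level: int
    c_level: int


def parse_cause_code(code: str) -> CauseCode:
    """Split a code such as 'A4B3C08' into its A, B, and C levels."""
    match = CODE_PATTERN.fullmatch(code.strip())
    if match is None:
        raise ValueError(f"not a full A/B/C cause code: {code!r}")
    a, b, c = (int(group) for group in match.groups())
    return CauseCode(a, b, c)


if __name__ == "__main__":
    parsed = parse_cause_code("A4B3C08")
    print(parsed, A_LEVEL_TITLES.get(parsed.a_level, "unknown"))
```

Grouping parsed codes by `a_level` would reproduce the split between the "Management Problem" and "Equipment/Material Problem" charts. An incomplete code, such as a bare A4, is rejected instead of being guessed at.
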
Current Risk - Cause Analysis Deficiencies
75% of event analysis reports stop at the mode.
Failure Mode: The manner whereby the failure is observed
Failure Mechanism: Physical, chemical, or other processes that led to the failure
Error Mode: The manner whereby the error is observed
Error Mechanism: Human actions along the skills, rules, knowledge (SRK) continuum

Co-Regulation Concept: Sharing Responsibility with Industry
[Figure labels: Accountability; Electric Reliability Organization; Industry; Collaborative Problem Solving; Risk Identification]

Questions?