Paper Presentation: COMPLETE AND SIMULTANEOUS DCS FAILURE IN TWO 500MW UNITS

Similar documents
COMPRESSOR PACK SUCTION CONTROLLER TYPE: LP41x

Product Overview. Product Description CHAPTER

FUEL GAS FIRING CONTROL RJ (Dick) Perry Safety Systems Consultant 6 June 2016

Industrial Compressor Controls Standard Custom

Every things under control High-Integrity Pressure Protection System (HIPPS)

EMERGENCY SHUT-DOWN RELIABILITY ADVANTAGE

Pressure switch Type BCP

FLUID POWER FLUID POWER EQUIPMENT TUTORIAL OTHER FLUID POWER VALVES. This work covers part of outcome 2 of the Edexcel standard module:

1 STARTUP Overview Buttons Basics of Control Resetting Faults and Alarms OPERATION...

THE CANDU 9 DISTRffiUTED CONTROL SYSTEM DESIGN PROCESS

Implementing IEC Standards for Safety Instrumented Systems

Standard Chiller 1/4 compressors Manual version: /10/96

Inert Air (N2) Systems Manual

Solenoid Valves used in Safety Instrumented Systems

Instrumented Safety Systems

Safe management of industrial steam and hot water boilers A guide for owners, managers and supervisors of boilers, boiler houses and boiler plant

Valve Communication Solutions. Safety instrumented systems

AUTOMATIC GAS MANIFOLDS

Installation, Operation and Maintenance Instructions for Electronically Controlled Pressurisation Units

Training Fees 3,400 US$ per participant for Public Training includes Materials/Handouts, tea/coffee breaks, refreshments & Buffet Lunch.

SIL explained. Understanding the use of valve actuators in SIL rated safety instrumented systems ACTUATION

Service & Support. Questions and Answers about the Proof Test Interval. Proof Test According to IEC FAQ August Answers for industry.

RECOMMENDED GOOD PRACTICE

Connect with Confidence NO POWER NO PROBLEM

DeZURIK. KSV Knife Gate Valve. Safety Manual

EDUCATION DEPARTMENT ACCU-TEST

Neles ValvGuard VG9000H Rev 2.0. Safety Manual

BUBBLER CONTROL SYSTEM

The HumiSys. RH Generator. Operation. Applications. Designed, built, and supported by InstruQuest Inc.

The Control of an Oil Injected Screw Compressor Package Deployed in a Naphtha Cracking Process

Installation Instructions / Operating & Maintenance Manual Midi-Fill Digital Pressurisation Unit Models MFD15 & MFD22

CONTINUITY OF SERVICE PLAN FOR THE LRIT SYSTEM

LECTURE 3 MAINTENANCE DECISION MAKING STRATEGIES (RELIABILITY CENTERED MAINTENANCE)

A4s Operation Manual

LT GasAnalyzer. LT GasAnalyzer Page 1 of 6

Safety Logic in Modular Batch Automation

OIL SUPPLY SYSTEMS ABOVE 45kW OUTPUT 4.1 Oil Supply

MANUAL DIRECT PURGE OPTION. UNION Instruments GmbH CWD2005 PLUS. General information, safety standards and regulations for direct purge option

Control of Hazardous Energy Lockout / Tagout Program

HumiSys HF High Flow RH Generator

Functional Analysis and Control System for the Thermosiphon Chiller. Lukasz Zwalinski PH/DT/PO - Cooling

AcornVac Vacuum Plumbing Systems - Trouble Shooting Guide

AUTROL 217 Expansion and pressurization Systems

Start up XXX from normal shutdown Introduction

USER MANUAL. Intelligent Diagnostic Controller IDC24-A IDC24-AF IDC24-AFL IDC24-F IDP24-A * IDP24-AF * IDP24-AFL * IDP24-F * 1/73

FULL STAINLESS STEEL EXPLOSION-PROOF SOLUTIONS OIL & GAS I OFFSHORE AND ONSHORE

Field Certification Certificate of Compliance

FUNDAMENTAL SAFETY OVERVIEW VOLUME 2: DESIGN AND SAFETY CHAPTER I: AUXILIARY SYSTEMS. A high-capacity EBA system [CSVS] [main purge]

TANK MANAGER FOR TWO TANKS OPERATING MANUAL. 10/31/11 C-More T6C L color touch panel

Safety in Petroleum Industry

Autocalibration Systems For In Situ Oxygen Analyzers

OPERATING PROCEDURES

API Piping Plan 62: A Reliable Quench System

Flamco Flexcon M-P Pump automat

Operating instructions Electrical switching facility pco

Section 1: Multiple Choice

IST-203 Online DCS Migration Tool. Product presentation

XC2 Client/Server Installation & Configuration

Standard 1/4 compressor chiller/heat pump with heating recovery Manual version: /02/97

Touch Screen Guide. OG-1500 and OG Part # T011

Roller AC Servo System

GAS FUEL VALVE FORM AGV5 OM 8-03

DeZURIK Double Block & Bleed (DBB) Knife Gate Valve Safety Manual

SAFETY DIRECTIVE 2.0 DEPARTMENTS AFFECTED. This Administrative Directive shall apply to all Town of Marana departments and employees.

South-Tek Systems - Nitrogen Generation Corrosion Inhibiting System. Designed for: Dry or Preaction Fire Protection Systems (FPS)

Operating instructions Safety Rope Emergency Stop Switches ZB0052 / ZB0053 ZB0072 / ZB0073

Hydraulic (Subsea) Shuttle Valves

innova-ve entrepreneurial global 1

Proposed Abstract for the 2011 Texas A&M Instrumentation Symposium for the Process Industries

PART 1.2 HARDWARE SYSTEM. Dr. AA, Process Control & Safety

THE UNIVERSITY OF NORTH CAROLINA

LENNOX SLP98UHV DIAGNOSTIC CODES

A TECHNICAL REFERENCE

CMC IMPLEMENTATION IN OBSOLETE ISKAMATIC TG CONTROL SYSTEM

Delayed Coker Automation & Interlocks

NGP-250/500 Nitrogen Generator Quick Start Guide

RESILIENT SEATED BUTTERFLY VALVES FUNCTIONAL SAFETY MANUAL

APPENDIX 10, Automation Standard

LT GasAnalyzer beyond standards

Portable Oil Lube Air Compressors

CT433 - Machine Safety

TRI LOK SAFETY MANUAL TRI LOK TRIPLE OFFSET BUTTERFLY VALVE. The High Performance Company

" High Voltage Switch Yards & How Safe are these really" <Peter Rhodes> <Principal and High Voltage Solution. Com Ltd>

Understanding IPL Boundaries

The Future of Hydraulic Control in Water-Systems

Special Condition. 10 Minutes OEI Take-off Thrust at High Ambient Temperature rating

SINGLE VALVE WITH LOW-FLOW BYPASS

Review of the Hall B Gas System Hardware. George Jacobs

LP Separator Level Control by Variable Speed and Multi Stage Brine Reinjection Pumps at Kawerau and Nga Awa Purua Geothermal Projects, New Zealand

Distributed Control Systems

HyperSecureLink V6.0x User Guide

AHE58/59 AC Servo System

PC-based systems ELOP II-NT

CPX EMERGENCY STANDBY MANIFOLD INSTALLATION, OPERATIONS & MAINTENANCE MANUAL

PROCEDURE. April 20, TOP dated 11/1/88

AIR COMPRESSOR OPERATING INSTRUCTION AND PARTS LIST

Reliability of Safety-Critical Systems Chapter 10. Common-Cause Failures - part 1

"PROVING HOUSE LOAD OPERATION OF GAS TURBINE IN PREMIX MODE IN CASE OF GRID FLUCTUATION"

A4 Operation Manual. Fig.1-1 Controller Socket Diagram

Description of Device Parameters Proline Prowirl 200 HART. Vortex flowmeter. Products Solutions Services. Main menu Language.

Transcription:

INDIAN POWER STATIONS POWER PLANT O&M CONFERENCE - 2012 Paper Presentation: COMPLETE AND SIMULTANEOUS DCS FAILURE IN TWO 500MW UNITS K C Tripathy, DGM (C&I) Anoop K, Dy Manager (C&I) B K Rathore, Dy Manager (C&I)

ROLE OF DCS IN POWER PLANTS Protections Alarms Controls Analysis DCS Operations Reporting Monitoring

DCS ARCHITECTURE FOR CASE-STUDY TOTAL: 214 processors, 46 workstations, 50 Ethernet switches Domain-1 for 1 st Unit 103 processors, 21 workstations 8 processors Network-A (Ethernet) Domain-3 for Common plant Network-B (Ethernet) 22 Layer-2 switches, 3 Layer-3 switches 4 workstations 22 Layer-2 switches, 3 Layer-3 switches Each functional group has: 2 redundant processors & 2 redundant networks Domain-2 for 2 nd Unit 103 processors, 21 workstations

INCIDENT NO. 1 REPORT (Both units initially @ full load) Maintenance Work An Ethernet switch in network-b common domain was found to be non-communicating. This switch (common to both units DCS) was restarted as per site procedure. System Events Both units all DCS processors simultaneously rebooted & lost all logic configurations. Both units boiler and TG tripped on hardwired backup protections outside DCS control. No indications/alarms available for operation. No SOE/event logs for troubleshooting. Operational Actions Emergency auxiliary drives were started directly from LT switchgear. DG incomer breaker was manually closed in one unit from switchgear. ECW pumps (that tripped in one unit) were directly started from HT switchgear. System Restoration Workstations were inoperative after the incident until network-b was temporarily switched off. Full download was done in DCS panels, taking 3-4 hours for complete restoration. Suspected switch was replaced with proper IP address and port settings.

PLANT SAFETY HAZARDS BOILER (tripped via hardwired MFT to all firing equipment) Furnace pressure of one unit was found near high trip limit after DCS restoration. Boiler drums were emptied, BCW pumps ran dry without any protection. ID-FD fans ran without any control or protection for over an hour. Safety relief valves floated for 15 minutes in both units, as bypass was unavailable. POTENTIAL RISK : Furnace explosion / Tube failures (if backups also fail) TURBINE (tripped via hardwired inter-trip from MFT) Turbine of one unit crash-halted, 2 days before hand-barring is possible. TG lub oil temperature control unavailable : low temperatures persisted. All TDBFP ran without any protection, one had exhaust diaphragm rupture. Hotwell of one unit emptied & CEPs ran dry without any protection. POTENTIAL RISK : Water ingress / Bearing failures (if backups also fail) GENERATOR (tripped via low fwd power delayed backup) Seal oil system of one unit out of service for a few minutes (due to UT-ST changeover failure & delay in DC SOP starting) Hydrogen leakage in one unit observed in TG floor. POTENTIAL RISK : Fire hazard

INCIDENTS NO. 2 & 3 REPORT Work before 2 nd incident DCS vendor representative was at site in order to troubleshoot the 1 st incident. While restoring uplink connection to another common Ethernet switch in network-b, similar complete DCS failure occurred (one unit tripped at full load, other boiler tripped) Activities after 2 nd incident Work before 3 rd failure (test) Activities after 3 rd failure The uplink connection that caused the 2 nd failure was removed. Operational actions and system restoration done similar to 1 st incident. Both units kept under safe shutdown for further testing. All Ethernet switch port settings and connections in the network checked thoroughly. Double physical connections between same set of switches were removed, wherever found. On reconnecting the uplink that caused 2 nd failure, complete DCS of both units failed again. The net-b uplink connection that caused the 2 nd and 3 rd failures was taped and kept removed. System restoration done similar to 1 st and 2 nd incidents & both units taken to full load. Threat still exists in network-a, BUT solution is pending from supplier for 2 ½ months.

INITIATING CAUSE (DCS NETWORK) (Representative diagram for network-b, similar for redundant network-a) (Broadcast storm initiated by restart / uplink initialization of any one switch in the loop)

ROOT CAUSE (DCS PROCESSOR) Single network overload caused working memory overflow and application crash of communication handler utility inside DCS processor firmware (incidentally common to both networks). All processors (active and redundant) abruptly rebooted at once. Thereby both redundant network & redundant processor concepts of DCS design were defeated. Since the particular version of DCS processor does not keep a backup copy of logics in non-volatile flash, all logic programs were lost. This resulted in a huge downtime of 3-4 hours. Indicative DCS processor volatile memory (RAM) allocation table Real-time OS REBOOT! Input /Output Scan (real-time data) Logic Execution (time classes) External Communications BIOS System AI AO DI DO Special Fast Medium Slow Net-A Net-B Backup link

DESIGN CONSIDERATIONS - PROCESSOR Network failure or overload Loss of ALL network communications or complete communication overload (in any or both of the redundant networks) should NOT lead to complete loss of control in ANY functional group in line with Functional Grouping Concept of DCS Backup copy of logics Processor memory allocation DCS safety compliance DCS processors always must keep a backup copy of logics in non-volatile flash memory, in order to avoid complete loss of logics and huge downtime of 3-4 hours for logic download and restoration. DCS processor functions of RTOS, I/O Scans, Logic execution and External communications have to be protected by dedicated memory allocation. The communication handler utilities inside processor firmware for either redundant network and hot backup link also should be completely independent. Internationally accepted safety certifications and standards (TŰV, SIL3 IEC 61508) must be mandatory for all DCS contract technical specifications. DCS vendor to certify total network safety compliance for its processors. Vendor solution tieup All DCS vendors must to be contractually mandated to provide prompt support and permanent solutions (within one month) in the unlikely event of complete DCS control failure with potential catastrophic consequences, throughout plant lifespan.

DESIGN CONSIDERATIONS - NETWORK Broadcast storms All sources of broadcast storms causing network overload must be identified and addressed through appropriate checks and balances (in architecture and configurations). Parallel double connections Parallel double connections (port redundancy) between any two adjacent pairs of switches must be prevented by design. Inadvertent ring or loop Multiple ( > 2 ) switches must not form a closed physical path (ring or loop). Special attention needed at the highest level of network hierarchy where multiple domains or units of same DCS connect. Firewall safety Hardware Firewalls must be installed in the DCS network at appropriate interfaces and configured to prevent Denial-of-service (DoS) attacks from outside the plant network. Typical firewall interfaces : for offsite PLCs, PADO, intranet, ERP etc.

DESIGN CONSIDERATIONS - BACKUP Design document Boiler backup systems TG backup systems Electrical backup systems Emergency trip systems A detailed backup design document (for all systems independent of DCS) that ensure basic minimum plant safety needs to be issued for every project. The list may include hardwired systems for automatic action & start/stop provisions and indications for manual intervention during DCS failure 2/3 MFT Processors fail must initiate hardwired MFT, and trip all firing equipment DC/emergency scanner air system auto-start, closing spray main block valves Direct indication for furnace pressure and excess oxygen independent of DCS Hardwired MFT or control failure of both turbine protection functional groups (monitored through watchdog relays) must directly energize turbine trip solenoids. Direct indication for turbine speed, vacuum, lub oil pressure & seal oil-h 2 DP. Direct pressure-switch auto-start, manual start buttons and status for emergency drives Enabling signal from DCS for UT-ST fast changeover electrical circuit must be ensured available through set-reset latch relay, to prevent loss of unit supply. Manual closing provision for DG incomer breaker & critical auxiliary drives on the electrical module at LT switchgear. The backup protections for boiler and turbine must use redundant power sources, both independent of normal power sources to protection panel. Failure of both backup supplies must initiate machine trip through DCS. Desk EPBs must trip machine through backup path. (Detailed list in technical paper)

PROJECT CONSIDERATIONS CRASH TEST Need to test To ensure safety in control systems : First introspect, then inspect, thus protect Complex network architecture makes DCS based protection systems vulnerable. Avalanche crash tests To be done from inside DCS network, directly upon its processors. During Factory Acceptance Test (FAT) at vendors shop-floor by Engineering Group. During Site Acceptance Test (SAT) at project site by Commissioning Group. Sample tests for FAT/SAT Creation of port redundancy, loop of multiple Ethernet switches followed by restart / uplink re-initialization of any one switch, in a fully networked system. Creation of a Denial-of-service (DoS) attack from inside the DCS network. The tests may be carried out on either network first, then both together.

PLANT O&M CONSIDERATIONS Online DCS maintenance Comprehensive DCS online maintenance procedure to be mandatorily submitted by DCS vendor & approved by Engineering, to be strictly followed at site. Typical procedures to be included: Logic backups/changes, network settings, singlepoint failure & restoration, complete MMI restoration, complete DCS restoration (Detailed list in technical paper) Backup healthiness Healthiness of all backup systems (independent of DCS) also to be ensured by C&I maintenance (check) and operation (witness) departments periodically during protection checking, typically after unit overhauls. Joint protocol to be recorded. Emergency operation Site-specific emergency handling operation procedures to be prepared to handle complete DCS failure situation, with assigned responsibilities.

CONCLUSION POINTS During the complete DCS failures, all major equipment in power plant faced high risk of catastrophic damage due to loss of all protections and controls. DCS Processors must retain primary control tasks of logic execution and I/O scan even under complete network failure or overload. DCS design contract specifications, vendor support tie-up, network crashtests, and site O&M procedures need to be strengthened further.

DCS Processors must be pre-designed & crash-tested against all network failures & overloads to ensure human & plant safety THANK YOU