LECTURE 3 MAINTENANCE DECISION MAKING STRATEGIES (RELIABILITY CENTERED MAINTENANCE)

Politecnico di Milano, Italy (piero.baraldi@polimi.it)

Types of Maintenance Approaches
Unplanned intervention:
- Corrective: replacement or repair of failed units
Planned intervention:
- Scheduled: replacement or repair following a predefined schedule
- Condition-based: monitor the health of the system and then decide on repair actions based on the assessed degradation level
- Predictive: predict the Remaining Useful Life (RUL) of the system and then decide on repair actions based on the predicted RUL

Maintenance decision-making strategies: Risk-Based; Reliability-Centered

RELIABILITY-CENTRED MAINTENANCE

Reliability-Centred Maintenance (RCM)
What is it? A systematic approach for establishing maintenance programs that combine the intervention approaches:
- corrective maintenance
- planned maintenance (scheduled, condition-based)
Primary objective: determine the combination of maintenance tasks that will significantly reduce the major contributors to unreliability and maintenance cost, in light of the consequences of failures.

The RCM Method
- Focus on system functionality: find the most important functions of the system.
- Avoid and remove maintenance actions that are not strictly necessary.
- When a maintenance plan already exists, the result of RCM is usually the elimination of inefficient preventive maintenance tasks.

RCM Experience
A wide range of companies have reported success using RCM, that is, cost reductions while maintaining or improving operational regularity:
- Aircraft industry: RCM is standard procedure for the development of new commercial aircraft
- Military forces (especially in the US)
- Nuclear power stations (especially in the US and in France)
- Oil companies: most of the oil companies in the North Sea use RCM
- Commercial shipping

Main Steps of an RCM Analysis
1. Study preparation
2. System selection and definition
3. Functional failure analysis (FFA)
4. Critical item selection
5. FMECA
6. Selection of maintenance actions
7. Determination of maintenance intervals
8. Preventive maintenance comparison analysis
9. In-service data collection and updating

1. Study Preparation
- Form the RCM project group (multi-disciplinary)
- Define and clarify the objectives and scope of work
- Identify requirements, policies, and acceptance criteria with respect to safety and environmental protection
- Provide drawings and process diagrams (P&ID, …)
- Check discrepancies between the as-built documentation and the real plant
- Define the limitations of the analysis

2. System Selection and Definition
Example: a standby valve is a maintainable item; the valve actuator is not a maintainable item.

RCM Step 3: Functional Failure Analysis
[Flowchart: Functional Failure Analysis (identify system functions → identify functional failures → judge functional failure criticality) → perform FMECA on MSI → list of the dominant failure modes]

3. Functional Failure Analysis
Objectives:
- Identify and describe the system's required functions and performance criteria
- Describe the input interfaces required for the system to operate
- Identify the ways in which the system might fail to function
Example: for a pumping system, the functions are "to pump a fluid" and "fluid containment".

3. Functional Failure Analysis
The criticality of functional failures must be judged at plant level and should be ranked with respect to:
- S = safety of personnel
- E = environmental impact
- A = production availability
- C = material loss
The consequences may be ranked as H = high, M = medium, L = low, N = negligible.
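A minimal Python sketch of how such a ranking could be recorded and applied. It assumes (our assumption, not stated on the slide) that a functional failure is ranked by its worst consequence across the four dimensions S, E, A, C; the class `FunctionalFailure` and the example failures are hypothetical.

```python
from dataclasses import dataclass

# Consequence levels from the slide, ordered from least to most severe.
LEVELS = {"N": 0, "L": 1, "M": 2, "H": 3}

@dataclass
class FunctionalFailure:
    description: str
    safety: str        # S: safety of personnel
    environment: str   # E: environmental impact
    availability: str  # A: production availability
    material: str      # C: material loss

    def worst(self) -> str:
        """Most severe level over S, E, A, C (assumed ranking rule)."""
        return max((self.safety, self.environment, self.availability, self.material),
                   key=LEVELS.__getitem__)

def rank(failures: list[FunctionalFailure]) -> list[FunctionalFailure]:
    """Order functional failures from most to least critical."""
    return sorted(failures, key=lambda f: LEVELS[f.worst()], reverse=True)

# Hypothetical functional failures of a pumping system.
failures = [
    FunctionalFailure("External leakage (loss of fluid containment)", "M", "M", "L", "L"),
    FunctionalFailure("No flow (fails to pump the fluid)", "L", "N", "H", "M"),
]
for f in rank(failures):
    print(f.worst(), f.description)
```

Taking the maximum is only one possible rule; a weighted score over S, E, A, C would be an equally valid design choice if the plant's acceptance criteria call for it.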

RCM Step 4: Critical Item Selection
[Flowchart: Functional Failure Analysis (identify system functions → identify functional failures → judge functional failure criticality) → Critical Item Selection: Functional Significant Items (FSI) + Cost Significant Items (CSI) = Maintenance Significant Items (MSI) → list of the dominant failure modes]
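The "+" in the diagram can be read as a set union: an item is maintenance significant if it is functionally significant, cost significant, or both. A one-function Python sketch, with hypothetical item tags:

```python
def maintenance_significant_items(fsi: set[str], csi: set[str]) -> set[str]:
    """Return MSI = FSI ∪ CSI (items that are functionally or cost significant)."""
    return fsi | csi

# Hypothetical item tags, for illustration only.
fsi = {"P-101 pump", "V-201 shutdown valve"}
csi = {"P-101 pump", "C-301 compressor"}
print(sorted(maintenance_significant_items(fsi, csi)))
# ['C-301 compressor', 'P-101 pump', 'V-201 shutdown valve']
```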

4. Critical Item Selection

RCM Step 5: FMECA
[Flowchart: Functional Failure Analysis (identify system functions → identify functional failures → judge functional failure criticality) → Critical Item Selection: Functional Significant Items (FSI) + Cost Significant Items (CSI) = Maintenance Significant Items (MSI) → perform FMECA on MSI → list of the dominant failure modes]

5. Failure Modes, Effects and Criticality Analysis
Objective: identify the dominant failure modes of the MSIs identified in step 4. This step is performed by filling in an FMECA sheet.

FAILURE MODES, EFFECTS AND CRITICALITY ANALYSIS (FMECA)

FMECA: a qualitative, inductive method
Aim: identify those component failure modes that could cause the item to fail.

FMECA: Procedure Steps
1. For each item, identify its operation modes (start-up, regime, shut-down, maintenance, etc.) and configurations (valves open or closed, pumps on or off, etc.).
2. For each item, in each of its operation modes, compile an FMECA table.

FMECA Table
Header fields: FUNCTION; OPERATION MODE.
Columns:
- Component: description
- Failure mode: failure modes relevant for the operational mode indicated
- Effect on functionality: effects on the functionality of the item
- Effects on other items: effects of the failure mode on adjacent items and the surrounding environment
- Effects on plant: effects on the functionality and availability of the entire plant
- Probability*: probability of failure occurrence (sometimes qualitative)
- Severity+: worst potential consequences (qualitative)
- Criticality: criticality rank of the failure mode on the basis of its effects and probability (qualitative estimation of risk)
- Detection methods: methods of detection of the occurrence of the failure event
- Protections and mitigation: protections and measures to avoid the failure occurrence


FMECA Table: example (header fields: SUBSYSTEM; OPERATION MODE)
- Component: process shutdown valve
- Function: shut down the process (designed with a closing time of 10 s)
- Failure modes: closes too slowly (> 14 s); closes too fast (< 6 s)

FMECA Table: probability classes*
Columns: component; failure mode; effects on other items (adjacent components and surrounding environment); effects on subsystem (functionality of the subsystem); effects on plant (functionality and availability of the entire plant); probability* (probability of failure occurrence, sometimes qualitative).
Probability classes:
- Very unlikely: once per 1000 years or less often
- Remote: once per 100 years
- Occasional: once per 10 years
- Probable: once per year
- Frequent: once per month or more often

FMECA Table: severity classes+
Columns (in addition to those above): severity+ (worst potential consequences, qualitative); criticality (criticality rank of the failure mode on the basis of its effects and probability, i.e. a qualitative estimation of risk).
Severity classes:
- Safe: no relevant effects
- Marginal: partially degraded system, but no damage to humans
- Critical: system damage and damage also to humans; if no protective actions are undertaken, the accident could lead to loss of the system and serious consequences for humans
- Catastrophic: loss of the system and serious consequences for humans
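The slides leave the criticality rank qualitative. One common way to operationalize it is a risk matrix; the Python sketch below assumes that each probability and severity class is mapped to an ordinal index, that the score is their product, and that the thresholds 12 and 6 separate High, Medium and Low. These indices and thresholds are illustrative assumptions, not values from the lecture.

```python
# Ordinal indices for the classes on the slides (assumed mapping).
PROBABILITY = {"Very unlikely": 1, "Remote": 2, "Occasional": 3, "Probable": 4, "Frequent": 5}
SEVERITY = {"Safe": 1, "Marginal": 2, "Critical": 3, "Catastrophic": 4}

def criticality(probability: str, severity: str) -> str:
    """Qualitative criticality rank from a probability class and a severity class.
    The product score and the thresholds are hypothetical choices."""
    score = PROBABILITY[probability] * SEVERITY[severity]
    if score >= 12:
        return "High"
    if score >= 6:
        return "Medium"
    return "Low"

print(criticality("Occasional", "Critical"))  # 3 * 3 = 9 -> Medium
```

In practice each organization calibrates its own matrix; the point of the sketch is only that criticality combines probability and severity, as the table column describes.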

FMECA Table (complete)
Header fields: SUBSYSTEM; OPERATION MODE.
Columns: component (description); failure mode (failure modes relevant for the operational mode indicated); effects on other components (effects of the failure mode on adjacent components and the surrounding environment); effects on subsystem (effects on the functionality of the subsystem); effects on plant (effects on the functionality and availability of the entire plant); probability* (probability of failure occurrence, sometimes qualitative); criticality+ (criticality rank of the failure mode on the basis of its effects and probability; qualitative estimation of risk); detection methods (methods of detection of the occurrence of the failure event); protections and mitigation (protections and measures to avoid the failure occurrence); remarks (remarks and suggestions on the need to consider the failure mode as an accident initiator).
Failure types:
- Evident failure (detected instantaneously), e.g. spurious stop of a running pump
- Hidden failure (can be detected only during testing of the item), e.g. fail-to-start of a standby pump

Exercise: Domestic Hot Water

Example Boiler System: FMECA (1)
Component | Failure mode | Detection methods | Effect on whole system | Compensating provision and remarks | Criticality class | Failure frequency
Pressure relief valve (V04) | Jammed open | Observe at pressure relief valve | Operation of TS controller; gas flow due to hot water loss | Shut off water supply; reseal or replace relief valve | Safe | Likely
Pressure relief valve (V04) | Jammed closed | Manual testing | No consequences; if combined with other component failures: rupture of container or pipes | Periodic inspection; replacement | Critical | Rare
Gas valve (V03) | Jammed open | Water at faucet too hot; pressure relief valve open (observation) | Burner continues to operate; pressure relief valve opens | Open hot water faucet to relieve pressure; shut off gas supply; pressure relief valve compensates; IE1 | Critical | Likely
Gas valve (V03) | Jammed closed | Observe at output (water temperature too low) | Burner ceases to operate | Replacement | Safe | Negligible

Example Boiler System: FMECA (2)
Component | Failure mode | Detection methods | Effect on whole system | Compensating provision and remarks | Criticality class | Failure frequency
Temperature measuring and comparing device (Tsc01) | Fails to react to temperature rise above preset level | Observe at output (water at faucet too hot); pressure relief valve opens | Controller, gas valve and burner remain on; pressure relief valve opens | Pressure relief valve compensates; open hot water faucet to relieve pressure; shut off gas supply; IE2 | Critical | Negligible
Temperature measuring and comparing device (Tsc01) | Fails to react to temperature drop below preset level | Observe at output (water at faucet too cold) | Controller, gas valve and burner remain off | Replacement | Safe | Negligible

RCM Steps 3-5
[Flowchart: Functional Failure Analysis (identify system functions → identify functional failures → judge functional failure criticality) → Critical Item Selection: Functional Significant Items (FSI) + Cost Significant Items (CSI) = Maintenance Significant Items (MSI) → perform FMECA on MSI → list of the dominant failure modes]

6. RCM Decision Logic
Input to the RCM decision logic: the dominant failure modes identified in the previous step (FMECA).
[Decision-logic diagram: each dominant failure mode is assigned a condition-based, scheduled, or corrective task]

6. Scheduled On-Condition Task
Three criteria must be met for an on-condition task to be applicable:
1. It must be possible to detect reduced failure resistance for a specific failure mode (e.g., through a degradation index d).
2. It must be possible to define a potential failure condition that can be detected by an explicit task (e.g., a detection threshold d_detection).
3. There must be a reasonably consistent age interval between the time t_detect at which the potential failure is detected and the time t_failure of the functional failure.
[Figure: degradation index d versus time t; d crosses d_detection at t_detect and d_failure at t_failure.]
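Criterion 3 can be illustrated numerically. The Python sketch below assumes a linear degradation index d(t) = d0 + rate·t (an assumption; real degradation is rarely this regular) and computes the interval between t_detect and t_failure, often called the P-F interval. It also checks that periodic, perfect inspections must be spaced more closely than that interval for the potential failure to be caught before the functional failure. The function names and numbers are hypothetical.

```python
def pf_interval(d0: float, rate: float, d_detection: float, d_failure: float) -> float:
    """Time between detection (d reaches d_detection) and functional failure
    (d reaches d_failure) for an assumed linear index d(t) = d0 + rate * t."""
    if rate <= 0:
        raise ValueError("degradation rate must be positive")
    if not d0 < d_detection < d_failure:
        raise ValueError("require d0 < d_detection < d_failure")
    t_detect = (d_detection - d0) / rate
    t_failure = (d_failure - d0) / rate
    return t_failure - t_detect

def inspection_catches_degradation(interval: float, pf: float) -> bool:
    """With perfect periodic inspections, the potential failure is always found
    before functional failure only if the inspection interval is shorter than pf."""
    return interval < pf

# Hypothetical example: index starts at 0 and grows by 0.5 per day.
pf = pf_interval(d0=0.0, rate=0.5, d_detection=10.0, d_failure=20.0)
print(pf)                                        # 20.0 days
print(inspection_catches_degradation(10.0, pf))  # True
print(inspection_catches_degradation(25.0, pf))  # False
```

Because the criterion only asks for a reasonably consistent interval, a cautious analyst would base the inspection interval on a short (conservative) estimate of the P-F interval rather than on its average.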

6. RCM Decision Logic: Scheduled Overhaul

6. Scheduled Overhaul
An overhaul task is considered applicable to an item only if the following criteria are met:
1. There must be an identifiable age at which there is a rapid increase in the item's failure rate function.
2. A large proportion of the items must survive to that age.
3. It must be possible to restore the original failure resistance of the item by reworking it.
[Figure: failure rate λ(t) versus time t.]
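Criteria 1 and 2 can be examined with a lifetime model. As an assumption not taken from the slides, the Python sketch below uses a Weibull distribution with shape β and scale η: for β > 1 the failure rate λ(t) = (β/η)(t/η)^(β-1) increases with age, and R(t) = exp(-(t/η)^β) gives the proportion of items surviving to age t. The parameter values are hypothetical.

```python
import math

def weibull_failure_rate(t: float, beta: float, eta: float) -> float:
    """Failure rate λ(t) = (β/η)(t/η)^(β-1); increasing in t when β > 1."""
    return (beta / eta) * (t / eta) ** (beta - 1)

def weibull_reliability(t: float, beta: float, eta: float) -> float:
    """Survival probability R(t) = exp(-(t/η)^β): share of items surviving to age t."""
    return math.exp(-((t / eta) ** beta))

# Hypothetical item: beta = 4 (strong wear-out), eta = 10 years.
beta, eta = 4.0, 10.0
for age in (2.0, 5.0, 8.0):
    print(f"age {age:4.1f}: failure rate {weibull_failure_rate(age, beta, eta):.4f}/yr, "
          f"surviving {weibull_reliability(age, beta, eta):.3f}")
```

For these numbers the failure rate grows from 0.0032 to about 0.20 per year between ages 2 and 8, while about 94% of the items survive to age 5: a pattern consistent with criteria 1 and 2. With β ≤ 1 there is no wear-out age and an overhaul would not be applicable.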

6. RCM Decision Logic: Scheduled Replacement

6. Scheduled Replacement
A scheduled replacement task is applicable only under the following circumstances:
1. The item must be subject to a critical failure.
2. The item must be subject to a failure that has major potential consequences.
3. There must be an identifiable age at which the item shows a rapid increase in the failure rate function.
4. A large proportion of the items must survive to that age.

6. RCM Decision Logic: Scheduled Functional Test

6. Scheduled Function Test
A scheduled function test task is applicable to an item under the following conditions:
1. The item must be subject to a functional failure that is not evident to the operating crew during the performance of normal duties.
2. The item must be one for which no other type of task is applicable and effective.
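For a hidden failure, a function test bounds how long the failure can remain undetected. Under the assumptions (ours, not the slide's) of a constant failure rate λ and perfect, instantaneous tests every τ time units, the average probability that the item is in a failed state is 1 - (1 - e^(-λτ))/(λτ), which is approximately λτ/2 when λτ is small. A Python sketch with hypothetical numbers:

```python
import math

def mean_unavailability(lam: float, tau: float) -> float:
    """Average probability that a hidden failure is present over one test interval,
    assuming constant failure rate lam and perfect, instantaneous tests every tau."""
    if lam <= 0 or tau <= 0:
        raise ValueError("lam and tau must be positive")
    x = lam * tau
    return 1.0 - (1.0 - math.exp(-x)) / x

def approx_mean_unavailability(lam: float, tau: float) -> float:
    """First-order approximation lam * tau / 2, valid when lam * tau is small."""
    return lam * tau / 2.0

# Hypothetical standby pump: lam = 1e-5 per hour, tested once a year (8760 h).
lam, tau = 1e-5, 8760.0
print(mean_unavailability(lam, tau), approx_mean_unavailability(lam, tau))
```

Halving τ roughly halves the unavailability, so the test interval trades test cost against the time a hidden failure may go unnoticed.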

6. RCM Decision Logic: Run to Failure

6. Run to Failure
Run to failure is a deliberate decision to let an item run until it fails, because the other tasks are not possible or are economically less favorable. Run-to-failure maintenance is generally considered the most expensive option, and should only be used for low-cost, easy-to-replace components that are not critical to operations.
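The five task types can be read as a sequence of yes/no questions asked for each dominant failure mode. The Python sketch below encodes the criteria listed on the previous slides in one possible order (on-condition, overhaul, replacement, function test, run to failure); the order, the flag names and the class `FailureModeAnswers` are illustrative assumptions, not an official RCM decision diagram, and the cost-effectiveness check of step 8 is left out.

```python
from dataclasses import dataclass

@dataclass
class FailureModeAnswers:
    """Hypothetical yes/no answers for one dominant failure mode."""
    on_condition_criteria_met: bool   # all three on-condition criteria hold
    wear_out_age: bool                # identifiable age with rapid failure-rate increase
    most_survive_to_age: bool         # large proportion of items survive to that age
    restorable_by_rework: bool        # original failure resistance can be restored
    critical_failure: bool            # item subject to a critical failure
    major_consequences: bool          # failure has major potential consequences
    hidden: bool                      # not evident to the crew during normal duties

def select_task(fm: FailureModeAnswers) -> str:
    """Return the first applicable task type in the assumed order."""
    if fm.on_condition_criteria_met:
        return "scheduled on-condition task"
    age_related = fm.wear_out_age and fm.most_survive_to_age
    if age_related and fm.restorable_by_rework:
        return "scheduled overhaul"
    if age_related and fm.critical_failure and fm.major_consequences:
        return "scheduled replacement"
    if fm.hidden:
        # Reached only when no other task type applied, as the function-test criteria require.
        return "scheduled function test"
    return "run to failure"

# Example: hidden failure of a standby item with no detectable degradation or wear-out age.
print(select_task(FailureModeAnswers(False, False, False, False, False, False, True)))
# scheduled function test
```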

7. Determination of Intervals
An opinion from The RCM Handbook (Naval Sea Systems Command, S9081-AB-GIB 010/MAINT, US Dept. of Defense, Washington DC 20301, 1983): "The best thing you can do if you lack good information about the effect of age on reliability is to pick a periodicity that seems right. Later, you can personally explore the characteristic of the hardware at hand by periodically increasing the periodicity and finding out what happens."

Model Granularity
Prater's principle of "optimal sloppiness". [Figure: predictive power versus level of detail of the model.] The granularity of the model is determined by the problem and by the availability and accuracy of the data.

7. Determination of Intervals
Scheduled tasks are to be performed at regular intervals. Determining the optimal interval is a very difficult task that has to be based on information about:
- the failure rate function
- the likely consequences and costs of the failure the PM task is supposed to prevent
- the cost and risk of the PM task
In practice, the various maintenance tasks have to be grouped into maintenance packages that are carried out at the same time or in a specific sequence. The maintenance intervals therefore cannot be optimized for each single item: the whole maintenance package has, at least to some degree, to be treated as an entity.
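One classical way to put numbers on this trade-off for a single item is the age-replacement model, a textbook model swapped in here rather than the lecture's method: the item is replaced preventively at age T (cost c_p) or at failure (cost c_f > c_p), and the long-run cost per unit time is C(T) = [c_p R(T) + c_f (1 - R(T))] / ∫_0^T R(t) dt. The Python sketch assumes a Weibull reliability function and hypothetical costs, and scans a grid of candidate intervals.

```python
import math

def weibull_reliability(t: float, beta: float, eta: float) -> float:
    """R(t) = exp(-(t/η)^β), an assumed lifetime model."""
    return math.exp(-((t / eta) ** beta))

def cost_rate(T: float, c_p: float, c_f: float, beta: float, eta: float,
              steps: int = 1000) -> float:
    """Long-run expected cost per unit time when replacing at age T or at failure."""
    h = T / steps
    # Trapezoidal rule for the expected cycle length, the integral of R(t) over [0, T].
    inner = sum(weibull_reliability(i * h, beta, eta) for i in range(1, steps))
    integral = h * (0.5 * weibull_reliability(0.0, beta, eta) + inner
                    + 0.5 * weibull_reliability(T, beta, eta))
    r = weibull_reliability(T, beta, eta)
    return (c_p * r + c_f * (1.0 - r)) / integral

def best_age(c_p: float, c_f: float, beta: float, eta: float, candidates) -> float:
    """Candidate replacement age with the lowest cost rate."""
    return min(candidates, key=lambda T: cost_rate(T, c_p, c_f, beta, eta))

# Hypothetical item: wear-out (beta = 3), eta = 1000 h, failure 10x as costly as PM.
print(best_age(c_p=100.0, c_f=1000.0, beta=3.0, eta=1000.0,
               candidates=range(100, 2001, 100)))
```

With β = 1 (constant failure rate) the cost rate keeps falling as T grows, so preventive replacement never pays off; this matches the overhaul and replacement criteria, which require a rapid increase in the failure rate. As the slide notes, a real plan would then adjust the value to fit the maintenance package.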

8. Preventive Maintenance (PM) Comparison Analysis
Each maintenance task selected must meet two requirements:
1. It must be applicable: it must prevent a failure, reduce the probability of occurrence of a failure to an acceptable level, or reduce the impact of a failure.
2. It must be cost-effective, i.e., the task must not cost more than the failures it is going to prevent (cost of PM compared with cost of failure).

8. PM Comparison Analysis: Cost of a PM Task
- The risk/cost related to maintenance-induced failures
- The risk the maintenance personnel are exposed to during the task
- The risk of increasing the likelihood of failure of another item while this one is out of service
- The use and cost of physical resources
- The unavailability of physical resources elsewhere while in use on this task
- Production unavailability during maintenance
- Unavailability of protective functions during maintenance

8. PM Comparison Analysis: Cost of a Failure
- The consequences of the failure in terms of: loss of production; possible violation of laws or regulations; reduction in plant or personnel safety; damage to other equipment
- The consequences of not performing the PM task even if a failure does not occur (e.g., loss of warranty)
- Increased premiums for emergency repairs (such as overtime, expediting costs, or high replacement power cost)
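The cost-effectiveness requirement of the comparison analysis can be sketched as a yearly balance. The Python sketch below assumes that all the cost elements listed above can be expressed in money (safety and regulatory consequences often cannot) and that the PM task avoids a known number of failures per year; the function names and all figures are hypothetical.

```python
def total(cost_elements: dict[str, float]) -> float:
    """Sum the monetized cost elements of one event (one PM task or one failure)."""
    return sum(cost_elements.values())

def is_cost_effective(pm_elements: dict[str, float], tasks_per_year: float,
                      failure_elements: dict[str, float],
                      failures_avoided_per_year: float) -> bool:
    """True if the yearly PM cost does not exceed the yearly failure cost it avoids."""
    pm_cost = total(pm_elements) * tasks_per_year
    avoided_cost = total(failure_elements) * failures_avoided_per_year
    return pm_cost <= avoided_cost

# Hypothetical figures (currency units per event).
pm_task = {"labour and materials": 1500.0, "production unavailability during PM": 2000.0}
failure = {"loss of production": 40000.0, "emergency repair premium": 5000.0,
           "damage to other equipment": 3000.0}
print(is_cost_effective(pm_task, 2, failure, 0.5))   # 7000 <= 24000 -> True
print(is_cost_effective(pm_task, 2, failure, 0.1))   # 7000 <= 4800 -> False
```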

Updating Process
- Short-term: interval adjustments
- Medium-term: task evaluation
- Long-term: revision of the initial strategy
[Diagram: goals, reference plan, activities, system results]

RCM Comments
General issues:
- Maintenance people often rely on the manufacturer's recommendations and end up maintaining too frequently.
- Determining maintenance tasks and intervals is difficult, because it must be based dynamically on the information available at the time, e.g. knowledge of the failure rate, the probable consequences and costs of the failure that PM is supposed to prevent, and the costs and risks of PM.
- Most models require information that is not available. This calls for expert opinion elicitation, properly supported by sensitivity and uncertainty analysis.