Introducing reliability and availability requirements into TOC models


Evin Stump
Senior Systems Engineer
Galorath Incorporated
estump@galorath.com

Wendy Lee
Systems Engineer / Cost Analyst
Galorath Incorporated
wlee@galorath.com
Ext 655

Abstract

Two important system requirements are reliability and availability. Reliability is the probability of no disabling failures over a certain span of time, while availability is the ratio of the time the system will actually fulfill its operational expectations to the total time it is expected to fulfill them. Both of these are often included in system specifications. An explicit reliability requirement may be stated somewhat as follows: "The probability of no disabling system failures in 5,000 hours of operations shall be at least 99.9%." An availability requirement may be stated something like this: "The system shall be available for operations at least 95% of the times when it is expected to be available." These requirements are often set with highly imperfect knowledge of their effects on total ownership cost (TOC). The result, if the requirements are rigorously maintained, can be a huge effect on total ownership cost. The effects can be felt in several areas of cost, particularly design labor hours, prototype material cost, production labor hours, production material cost, and support and possibly operational costs after the system is fielded. This paper explores ways that parametric TOC models are often deficient in addressing reliability and maintainability costs, and suggests how they can be made to better deal with them.

Arrangement of this Paper

This paper has eight major sections and several subsections. These are the major sections and their concerns:

- Why a Reliability Requirement? This section defines the necessary content of a reliability requirement and why such a requirement is often imposed.
- Reliability Flowdown. This section discusses the fact that reliability requirements are typically imposed at system level and must be flowed down to all elements of hardware and software, even to the component level. During any such flowdown, the required reliability gradually increases above the system level requirement.
- How a Reliability Goal Is Achieved. This section discusses the more important tools available to reliability engineers to meet reliability goals.
- Lifetime Phases of an Element. Each hardware element typically undergoes various phases of its life during which the environmental stresses, and hence the failure rates, vary. These phases and their consequences are discussed.
- Determining MTBF of WBS Elements. Mean time between failures (MTBF) is typically the number that most drives maintenance costs due to random failures. Its determination is discussed.
- Non-Random Failures. Most of the concerns of reliability engineers, and also most cost models, are with random failures. Their intensity is typically measured by MTBF. But substantial failure costs are in some cases attributable to non-random failures. These costs are sometimes greater than the costs of random failures but are often ignored in cost analysis.
- Overhauls. Many hardware items are required to be overhauled from time to time. The various conditions of overhauls are discussed.
- What Is an Availability Requirement and Why Is It Imposed? The nature of an availability requirement and the reasons for its imposition are discussed.

Why a Reliability Requirement?

Not infrequently, a reliability requirement is stated in vague terms: "The equipment shall operate reliably throughout its lifetime."
Such a requirement is all but meaningless. It is equivalent to a requirement that software be "user friendly." The problem is that reliability, like user friendliness, can have many meanings depending on both context and personal point of view. A meaningful and universally understandable reliability requirement always requires two numbers, plus a clear definition of failure, and also an environmental context.

The definition of failure is important because reliability is about the absence of failure. Generally, failure is defined in terms of loss of a desired functionality. Loss of the nameplate attached to an item of equipment would not normally be classified as a failure, but the failure of a single key on a 100-key keyboard probably would be. Minor degradations of capability, such as a single sticky key, might or might not be classified as a failure depending on circumstances. Whatever the definition, it must be clearly expressed to be useful.

Two numbers are needed to define reliability concisely: 1) a timespan and 2) a probability. Timespan can be expressed in any convenient units of time, such as hours, days, or months. Probability, at least for high reliability equipment, is generally expressed in 9s, e.g., 0.9995, or equivalently, 99.95%. The probability number always holds true only for a specific operating environment, or sometimes a nonoperating environment. For example, the reliability of a printed circuit board (PCB) might be defined in this way: "For at least 50,000 operating hours, the PCB must have a probability of at least [...] of no failures. Environmental limits are listed below. A failure is defined as any loss of signal output of longer than one second duration."

Why would a request for proposal contain such a requirement? From the customer point of view, it provides high confidence that the product will be functional when the customer needs it to be. However, both customers and providers should understand that product acquisition costs virtually always rise sharply when:

- The definition of reliability becomes more comprehensive and exacting
- The count of 9s in the reliability probability measure increases
- The required reliability timespan increases
- The operating (or for some systems the non-operating) environment becomes more severe and more damaging to the product.

Higher reliability is often traded off against higher acquisition costs. On the other hand, higher reliability generally decreases post-production support cost, and sometimes operating cost as well. Higher development and production costs may well be repaid many times over by lower operations and support (O&S) costs. Of course, from a time value of money standpoint the front-end acquisition costs have more weight, and often are more intensely scrutinized.

Users of parametric cost models should be aware of the critical dependency of acquisition cost on reliability.
They need to understand how their development and production models, as well as their O&S models, deal with this issue, and in particular how reliability and operating environment information is captured by their model. In this paper, these issues are considered in more detail.

Reliability Flowdown

Most project efforts to develop either hardware or software describe the work to be done in a hierarchical work breakdown structure (WBS). When a reliability requirement is imposed on hardware, it most likely will be imposed from the top down, that is, from the highest rollup WBS element. However, in a system of systems situation, reliability requirements may be imposed at more than one rollup level. In any event, what is imposed at a high level must be flowed down to lower levels.

Mathematically, the reliability at any rollup level is the product of the reliabilities of all lower level elements that directly feed that rollup. This is because reliability is a probability and it follows the rules of probabilities. The relevant rule here is that the joint probability of two or more independent events is the product of their separate probabilities.

For example, in Figure 1 the top level rollup has been assigned a reliability requirement of 0.9995 (99.95%) over some period of time and within a specified environment. The top level element is fed by three lower level rollups. The product of their reliabilities must be at least 0.9995. If the second level rollups all are assumed to have equal reliability, then their reliabilities must all be at least $(0.9995)^{1/3}$, i.e., the cube root of 0.9995. This happens to be approximately 0.999833, as shown in Figure 1. [1]

In Figure 1 we have assumed that below the second level rollups, there are only leaf elements, that is, elements that have no children. Note that the first leaf element has the same reliability requirement as its parent, because it stands alone. However, the next two leaf elements report to the same parent, so their reliability allocations must be $(0.999833)^{1/2}$, i.e., the square root of 0.999833, or approximately 0.999917. Finally, the last three leaf elements are assigned reliabilities $(0.999833)^{1/3} \approx 0.999944$.

Noteworthy is that as the number of sub-elements increases, the flowdown reliability allocations increase somewhat. In Figure 1, the biggest increase was from 0.9995 at the top level to approximately 0.999944 at the most heavily subdivided leaf elements, an increase of only about 0.04 percent. But to a reliability engineer it is more difficult than it would at first appear. It adds almost another 9, and more 9s can be hard, and expensive, to come by.

This is not the end of the story. Each of the leaf elements typically contains a variety of components, often referred to as parts. In certain hardware items, these parts may be quite numerous. For example, in a printed circuit board (PCB) there could be, say, 100 parts, such as ASICs, resistors, capacitors, diodes, connectors, etc. Each of these parts has what is known to reliability engineers as a failure rate, typically measured in failures per hour, or alternately in failures per thousand or per billion hours.
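The equal-allocation arithmetic of the flowdown example can be checked with a few lines of Python. This is only a minimal sketch: the function name `equal_allocation` and the variable names are illustrative, and it assumes, as the example does, that the children are independent and are assigned equal reliabilities.

```python
def equal_allocation(parent_reliability: float, n_children: int) -> float:
    """Reliability each of n independent, equally allocated children needs
    so that the product of their reliabilities equals the parent's requirement."""
    if not 0.0 < parent_reliability <= 1.0:
        raise ValueError("reliability must be in (0, 1]")
    if n_children < 1:
        raise ValueError("need at least one child")
    return parent_reliability ** (1.0 / n_children)


system_req = 0.9995                               # top level requirement
second_level = equal_allocation(system_req, 3)    # three second level rollups
leaf_single = equal_allocation(second_level, 1)   # a leaf that stands alone
leaf_pair = equal_allocation(second_level, 2)     # two leaves under one parent
leaf_triple = equal_allocation(second_level, 3)   # three leaves under one parent

for label, value in [
    ("second level", second_level),
    ("single leaf", leaf_single),
    ("pair of leaves", leaf_pair),
    ("three leaves", leaf_triple),
]:
    print(f"{label:15s} {value:.6f}")
```

Multiplying the allocations back up the tree recovers the parent requirement, which is a convenient sanity check when other allocation methods are tried.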
The failure rate of a part depends on certain factors such as its quality (degree of care taken in its design and manufacture to assure suitability to purpose), its ruggedness, its complexity, and its operating environment. Its operating environment of interest often includes the ambient temperature range, shocks and vibrations to which an item might be subject, chemical or other attacks to which it might be subject, and possibly other factors. Electronic parts generally are more prone to higher failure rates when applied voltages are higher and also when ambient temperatures are higher.

[1] Equal assignment of reliabilities to lower level children in the WBS hierarchy is probably the most common allocation method used by reliability engineers, but it is not the only one. Sometimes other methods are used, but the results seldom stray far from the results due to equal assignments.

So how are reliability and failure rate connected? In a version of reliability theory commonly used by engineers, namely the exponential theory, the reliability of a single part is given by $R(t) = \exp(-\lambda t)$, where $t$ is the elapsed time in hours and $\lambda$ is the failure rate in failures per hour. Some typical values of this function are tabulated in Table 1 for three periods of time: 1,000 hours, 7,000 hours, and 10,000 hours. [2] The failure rate ($\lambda$) ranges from 1E-05 to 1E-08 failures per hour, fairly typical of high reliability hardware.

Table 1 Failure Rate vs. Reliability Example

λ (failures/hour)   R (1k hours)   R (7k hours)   R (10k hours)
1.00E-05            99.005%        93.239%        90.484%
1.00E-06            99.900%        99.302%        99.005%
1.00E-07            99.990%        99.930%        99.900%
1.00E-08            99.999%        99.993%        99.990%

In a WBS leaf element having multiple parts connected in series, the failure rate of the whole is the sum of the failure rates of the individual parts. [3] If the parts all have a similar failure rate, which is fairly common, then as an approximation we can write:

$\lambda_c \approx N \lambda_{avg}$

In this equation $\lambda_c$ is the composite failure rate of the leaf element, $N$ is the parts count, and $\lambda_{avg}$ is the average failure rate of the individual parts. The composite reliability of the element $R_c$ is then given by:

$R_c(t) = \exp(-\lambda_c t)$

By reference to Figure 1, it can be seen that $R_c(t)$ must be equal to or greater than the reliability flowdown. With the aid of a bit of algebraic manipulation (solving for $\lambda_c$), this implies:

$\lambda_c \le -\ln(R_c(t))/t$

If in fact $\lambda_c$ does exceed this value there are things that can be done to still achieve a desired reliability, as will be discussed in the next section.
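The exponential model and the series-parts relations above lend themselves to a short numerical check. The sketch below is illustrative only: the function names are not from the paper, and the 100-part element, its per-part failure rate, and the 0.9999 flowdown requirement are hypothetical values chosen for the example.

```python
import math


def reliability(failure_rate: float, hours: float) -> float:
    """R(t) = exp(-lambda * t) under the constant-failure-rate (exponential) model."""
    return math.exp(-failure_rate * hours)


def series_failure_rate(part_rates) -> float:
    """Composite failure rate of parts in series: the sum of the part rates."""
    return sum(part_rates)


def max_failure_rate(required_reliability: float, hours: float) -> float:
    """Largest composite rate that still meets the requirement: -ln(R) / t."""
    return -math.log(required_reliability) / hours


# Values of R(t) for the failure rates and time spans discussed in the text.
for lam in (1e-5, 1e-6, 1e-7, 1e-8):
    cells = "  ".join(f"{reliability(lam, t):.3%}" for t in (1_000, 7_000, 10_000))
    print(f"{lam:.2E}  {cells}")

# Hypothetical leaf element: 100 series parts at 1e-8 failures/hour each.
lam_c = series_failure_rate([1e-8] * 100)
r_c = reliability(lam_c, 10_000)

# Hypothetical flowdown requirement of 0.9999 over 10,000 hours.
limit = max_failure_rate(0.9999, 10_000)
meets_requirement = lam_c <= limit
print(f"lambda_c = {lam_c:.3e}, limit = {limit:.3e}, meets requirement: {meets_requirement}")
```

In this hypothetical case the composite failure rate exceeds the limit by roughly two orders of magnitude, which is the situation the techniques in the next section are meant to address.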
[2] In military hardware, and indeed in most reliability critical commercial hardware, failure rate of a part is often assumed to be constant over the useful life of the hardware. This is a very good approximation in most cases. Note that one calendar year is approximately 8,760 hours.

[3] "Connected in series" means connected in such a way that a failure of a single part is a failure of the entire element. Later we will discuss parallel connections, in which the element does not fail unless all of its parallel connections fail.

How a Reliability Goal Is Achieved

Engineers use several approaches to achieve a desired reliability goal. The most important of these are:

- Reduction of component count
- Increased component quality
- Redundancy of circuits or of elements [4]
- Environmental protection
- Test and fix
- Cross strapping

Reduction of component count

Recall these two equations, previously cited:

$R_c(t) = \exp(-\lambda_c t)$

$\lambda_c \approx N \lambda_{avg}$

From the first equation, it can be seen that the composite reliability depends on the composite failure rate, $\lambda_c$. From the second, it can be seen that the composite failure rate depends on $N$, the parts count. Therefore, if the same functionality can be achieved with fewer parts, there is a gain in reliability.

For electronic elements, the use of integrated circuits of various types effectively reduces parts count, and this is one reason they are widely used. For mechanical elements, a reduction in parts count can often be achieved by combining two or more parts into one. This has the added benefit of reducing assembly costs. Also for mechanical elements, minimizing the number of moving parts is especially helpful, because moving parts, such as bearings, gears, valves, pistons, threads, springs, and levers, almost always have higher failure rates than non-moving parts such as structures.

Increased component quality

High component quality gets a lot of attention from reliability engineers. Achieving it is both difficult and costly, but it can result in big gains in reliability. Most major NASA projects, as well as many Air Force or other projects, involve spacecraft that are not repairable if they fail, and that are expected to operate without disabling failure for many years, sometimes as many as fifteen. It is in projects of this type that component quality matters most. Little wonder, then, that NASA pays great attention to parts quality. A number of other government agencies and commercial companies have the same concerns, so the pursuit of high quality components is widespread.
Equally widespread is the concern about the cost of such components, and a variety of strategies have been and are being used to get first rate quality without incurring first rate costs. This has led to a rather confusing mix of parts classifications and parts testing requirements, with various organizations or authorities tending to have their own approach to the problem.

The classification systems most likely to be encountered by an aerospace/defense cost analyst are NASA levels 1, 2, and 3, with level 1 being the highest quality, and the S, B, and C system used by several organizations, including the USAF. S level parts are generally required for spacecraft and some aircraft applications, B level is often regarded as sufficient for other military applications, and C level may be adequate for commercial applications other than heavy duty industrial applications. Generally, the higher the level the more costly is the part, mainly because of requirements for expensive testing and documentation.

[4] An "element" is a WBS element. A "circuit" is a part of an element.

Always a consideration in selecting parts quality is the issue of reparability. If a system can be accessed for reasonably rapid repair, and if occasional failures that can be quickly repaired are acceptable, then lower quality parts may be selected in the interests of lower development and production costs. But of course, there will be an increase in maintenance costs.

Redundancy of circuits or elements

If, when using the highest quality parts and the minimum number of parts, a certain leaf element has a calculated reliability of 98.7%, but the flowdown reliability required is 99.99%, is the situation hopeless? Not necessarily. Redundancy may come to the rescue. Redundancy is one technique for achieving fault tolerant designs, that is, designs in which certain failures do not prevent continued operation.

Redundancy is mostly needed for electronic elements, because mechanical elements generally have both lower parts counts and higher reliabilities for individual parts. Many mechanical elements are essentially designed for infinite life, due to the safety factors employed. However, for electronics, element redundancy seldom involves using (say) two parallel resistors instead of one, because if one of them fails open circuit, then the circuit is still likely to fail because the circuit resistance will suddenly increase sharply. Most electronic circuits are tolerant of small to moderate variations in the electrical properties of a component, but they usually will not be tolerant of a large change. Of course, a short circuit failure is generally even more disastrous.
Redundancy is most likely to take place at either the circuit level or at the element level. At the circuit level an entire circuit, say perhaps an oscillator, or an analog to digital converter, may be replicated on a single PCB. This can be done two ways, both of which increase reliability, but by different amounts. One way is to have both redundant circuits in operation continuously. Another circuit called a voter tests the output of both circuits and selects the output deemed best. If one redundant circuit fails, then the voter will select the output of the one that has not failed. The other way is to have one of the circuits non-operational until the other one fails, at which time it is switched on. This also requires a voter, but a somewhat more sophisticated (and costly) one.

It should be noted that introducing redundancy on a PCB increases its size, weight, and cost, not to mention, in most configurations, its power draw and its heat output. One's first impulse about including redundant elements on a PCB is that this increases parts count and thus reduces reliability, and it does, but the offsetting effects of redundancy generally more than make up for this loss.

When using redundancy at the element level, two (sometimes three) PCBs replace one. Again, a voter is needed, and again, there is the option of both PCBs being in continuous operation, or only one in operation until it fails. At failure a different one is turned on.

Can redundancy be more than two deep? In principle, it can be any number deep, but the reality is that it usually becomes unwieldy beyond three deep. [5] Beyond three deep, engineering considerations of volume, weight, power draw, heat generation, and cost generally call a halt.

Is redundancy limited to simple parallel configurations? No, redundancy can have a number of different and often complex configurations. Note in Figure 2 that there are three paths to successful operation: A-B, D-E, and A-C-E. A total failure would require one of the following:

- A failure of both A & D
- A failure of both B & E
- A failure of A, C, & E

While such a circuit can result in very high reliabilities over long periods of time, it is also costly, high in power consumption, difficult to design, and difficult to test. A reliability engineer would call for such a circuit only if the reliability requirements are extreme. However, such arrangements are not unusual in mechanical or electromechanical subsystems, and the various elements involved are likely to not be identical. For example, it is fairly common to see a power source such as a generator backed up by a battery, plus circuitry to change the battery's DC output at a low voltage into an AC output at a much higher voltage.

It should be kept in mind that a redundancy arrangement used to boost reliability may not have the same reliability over its useful life. Consider, for example, a simple redundancy arrangement comprising two identical circuits in parallel, both operating continuously. Suppose that the useful life expectancy is ten years, 87,600 hours, and further suppose that the circuit is not repairable. Also, assume that each of the circuits individually has a reliability of 0.999 over 87,600 hours. The math, shown below, predicts that with two equal parallel components both operating properly, the reliability is increased from three 9s to six 9s.
$R_{2\ \mathrm{in\ parallel}} = 1 - (1 - R_{1\ \mathrm{only}})^2 = 1 - (1 - 0.999)^2 = 0.999999$

But if during the lifetime one of the parallel circuits should fail, then for the remainder of the lifetime the reliability decreases to roughly the single-circuit value of 0.999. This can have serious consequences in some situations.

[5] There is also the issue of diminishing returns. Double redundancy increases MTBF by a factor of 1.5. Triple redundancy increases it only by a factor of about 1.83. Quadruple redundancy increases it only by a factor of about 2.08. Every added layer of redundancy has less and less effect. This is one reason why redundancy usually does not go beyond three layers. There is also a diminishing returns effect with respect to reliability.
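The parallel-redundancy arithmetic and the diminishing returns noted in footnote 5 can be reproduced with a brief sketch. It assumes identical, independent units that all operate continuously, have constant failure rates, and are not repaired; under those assumptions the MTBF gain for n units is the standard result 1 + 1/2 + ... + 1/n. The function names are illustrative.

```python
def parallel_reliability(unit_reliability: float, n_units: int) -> float:
    """Reliability of n identical, independent units in active parallel:
    the arrangement fails only if every unit fails."""
    return 1.0 - (1.0 - unit_reliability) ** n_units


def mtbf_gain(n_units: int) -> float:
    """MTBF of n active parallel units relative to one unit,
    for constant failure rates and no repair: 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / k for k in range(1, n_units + 1))


for n in (1, 2, 3, 4):
    print(
        f"{n} deep: R = {parallel_reliability(0.999, n):.12f}, "
        f"MTBF gain = {mtbf_gain(n):.3f}"
    )
```

Each added unit multiplies the unreliability by the same factor but adds a smaller and smaller increment to MTBF, which is the diminishing-returns effect the footnote describes.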

While the major concerns of a reliability engineer are usually assessing reliability, and whether a reliability requirement has been met, the major concern of a cost analyst is usually mean time between failures (MTBF), because that primarily drives repair costs in repairable systems subject to random failures. In systems with no redundancy, $\mathrm{MTBF}_c = 1/\lambda_c$, a very simple relationship. With the aid of a bit of algebra it can further be written that:

$\mathrm{MTBF}_c = -t/\ln(R(t))$

This equation directly relates $\mathrm{MTBF}_c$ to reliability when there is no redundancy. Thus the concerns of the reliability engineer can be directly related to the concerns of the cost analyst. Generally, long term costs of maintenance due to random failures are more or less inversely proportional to $\mathrm{MTBF}_c$.

The picture changes significantly, however, when multiple redundancy is employed. For redundancy of two circuits in parallel, $\mathrm{MTBF}_c$ is no longer equal to $1/\lambda_c$. The derivation is not shown, but the new relationship is $\mathrm{MTBF}_c = 3/(2\lambda_c)$. [6] This reduces corrective maintenance costs by roughly the same factor, but it may increase preventive maintenance costs because under some preventive maintenance policies it may be required to frequently check parallel redundant arrangements to see if one of the circuits or elements has failed. [7] As mentioned previously, a failure of one parallel redundant element may reduce the reliability considerably. The need for such preventive maintenance is prompted by requirements such as this, which are appearing more often:

"The unit shall have a 95% chance of remaining fully operational after a second failure of a similar device." (This is a direct quote from a recent NASA requirements document. Italics provided by the authors of this paper. It is likely that the NASA author actually intended to say "at least a 95% chance.")

Environmental Protection

Reliability of an element is specific to the extremes of environment under which the element is expected to operate.
The harsher the environment, the lower will be the reliability. Electronics are especially sensitive to applied voltage and working temperature. They are also sensitive to nuclear radiation. Reliability can be increased by reducing the environmental impacts, often done by providing some form of protection, such as barriers, cooling, isolation, or insulation. While environmental protections incur some costs, they are often far less than the costs of not providing them. Protection may also be useful in reducing non-random failures, such as those induced by corrosion. As an example, naval aircraft are sometimes washed down with precious fresh water to minimize salt corrosion.

Test and Fix

The basic idea in test and fix is to subject an element, almost always an electronic element (PCB), to intensive accelerated life testing until it fails. The testing is usually a combination of hot / cold cycling and random vibration (vibration across a wide spectrum of frequencies) at levels sufficiently high to severely stress but not destroy the tested element.

[6] Some O&S models assume that MTBF is always the reciprocal of failure rate. This can be grossly in error.

[7] Such a check can be part of preventive maintenance, but currently the trend is to design in a capability to overcome such a failure while still remaining operational.
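The warning in footnote 6 can be made concrete with the two MTBF relationships stated earlier: the reciprocal of failure rate (equivalently -t / ln R(t)) when there is no redundancy, and 3 / (2λ) for two identical circuits in active parallel. The following is a minimal sketch; the function names are illustrative, and the 0.999 reliability over 87,600 hours is the example value used in the redundancy discussion.

```python
import math


def mtbf_from_reliability(reliability: float, hours: float) -> float:
    """MTBF of a non-redundant element from its reliability over a timespan:
    MTBF = -t / ln(R(t)), which equals 1 / lambda for a constant failure rate."""
    return -hours / math.log(reliability)


def mtbf_two_parallel(circuit_failure_rate: float) -> float:
    """MTBF of two identical circuits in active parallel: 3 / (2 * lambda)."""
    return 3.0 / (2.0 * circuit_failure_rate)


# Single circuit with reliability 0.999 over ten years (87,600 hours).
mtbf_single = mtbf_from_reliability(0.999, 87_600)
single_rate = 1.0 / mtbf_single

# The same circuit duplicated in active parallel.
mtbf_pair = mtbf_two_parallel(single_rate)

print(f"single circuit: lambda = {single_rate:.3e} per hour, MTBF = {mtbf_single:,.0f} hours")
print(f"two in parallel: MTBF = {mtbf_pair:,.0f} hours")
```

An O&S model that took the reciprocal of the circuit failure rate for the redundant pair would understate its MTBF by one third of the correct value, and would overstate the expected corrective maintenance accordingly.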

The cause of the failure, commonly called the failure mode, is then thoroughly investigated, and a specific fix is designed and implemented. This process may be repeated until no further failure modes are found.

This process of reliability enhancement, often called Highly Accelerated Life Testing (HALT), tends to produce very high but somewhat unknowable reliability values. Unfortunately for accurate support cost analysis, it also creates fairly large uncertainties as to mean time between failures (MTBF), although it is generally the case that the MTBF will exceed the useful life of the product, given the rapid changes in electronic technology currently being experienced. At this writing, we believe that a reasonable approach when practices such as HALT are being used is to assume that they are equivalent to three deep parallel redundancies. This provides high values of reliability and long values of MTBF. The result may be less than one predicted failure in a lifetime, which from a cost standpoint may be negligible.

While processes such as HALT are expensive, especially when applied to every PCB delivered, as they sometimes are, there is a large payback in terms of reliability, reduced corrective maintenance cost, and availability.

Cross strapping

Cross strapping is a method of providing multiple interconnections so that the normal effects of a failure are avoided by reliance on an alternate power or signal source in the event of a failure of the current source. Some cross strapping schemes are quite complex. Care must be taken in designing cross strapped systems to avoid inadvertent application of an incorrect voltage or signal. Cross strapped systems are commonly difficult and expensive to design and test. Analysis of reliability and MTBF of cross strapped systems can be very complex and may require use of an iterative computer algorithm. It is not uncommon for the MTBF of such systems to far exceed the useful life of the system.
The block diagram below uses multiple cross strapping and illustrates the high level of complexity that can result.

Figure 3 Cross Strapping Example

Design diversity

Design diversity means effecting redundancy by using redundant elements that are unlike. This approach can possibly make a redundant element more robust with respect to certain damaging environments, or can increase the likelihood of continued operation in the event of unexpected stresses. It will of course likely be more expensive than using identical designs for the redundant elements.

Lifetime Phases of an Element

Most elements that are subject to random failures are also subject to various lifetime phases that can increase (or decrease) their propensity to fail. The projected useful life of an element is the first consideration in addressing the element's phases. The second consideration is how the useful life divides up into phases that may have significantly different failure rates. Various allocation schemes have been used to portray this. One that is fairly universal is the following:

- Operational phase: the phase of useful life in which the element is fully operational and in active use. This generally, but not always, is the phase having the highest stresses and consequently the highest failure rates. For an aircraft, the operational phase most often coincides with flight. For a ship, the operational phase most often coincides with being under way. However, it should be noted that certain elements are not necessarily operating when their platform is.
- Alert phase: the part of useful life in which the element is not operational, but can rapidly transition to an operational state. In the alert state, the element is generally quiescent, and its failure rate is typically much lower than in the operational state (perhaps only about ten to twenty percent) but usually not negligible, as is often incorrectly assumed. Some systems, such as containerized missiles, spend most of their life in this state.
- Out of service phase: the phase of useful life in which the element is unavailable for operational service. For example, most elements of an aircraft are out of service when it is undergoing tests for fatigue cracks in the fuselage. Most elements of a ship are out of service when it is in drydock. Elements returned to a depot for overhaul are out of service until they are returned to inventory. An element down for maintenance is also out of service. The out of service failure rate is usually the same as or similar to the alert failure rate, but there can be exceptions.

Note that an element in either the operational or the alert state is normally considered to be available for purposes of calculating its availability, while an element that is out of service is considered to be unavailable. Availability is discussed later in this paper.

Patterns of service can differ considerably from element type to element type, and can also differ considerably from operational environment to operational environment. For example, an element of a containerized missile will generally spend most of its useful life in the alert state. Its operational state coincides with the missile being expended, and there can be no maintenance cost in that phase. There may be an out of service phase if the missile is periodically recycled to a depot for inspection and / or overhaul.

An element of a spacecraft typically spends its life in the operational state, but some elements may be turned off for significant periods of time, so there could be a significant alert state for them.
These considerations are of great interest to a spacecraft reliability engineer, but not to a cost analyst, because spacecraft elements are not accessible for maintenance (except possibly for certain commands that can be issued from a ground station to reconfigure a partially failed spacecraft).

For elements of earthbound platforms such as aircraft, ships, tanks, trucks, ground stations, etc., the life phases usually coincide closely with the life phases of the platform. There are exceptions, such as electronic items that often remain unused. For example, some ships still carry a LORAN navigation system, which usually is not turned on unless GPS and inertial navigation systems both fail. While an aircraft carrier's air search radar generally operates continuously when the ship is at sea, that is not true for submarines, especially not for nuclear submarines, which spend most of their time at sea under the surface, where air search radar cannot be used.

Active sonar is another example of a system whose use pattern often does not coincide with the operational pattern of the ship that carries it. It is generally turned on only when there is a tactical need for it. On the other hand, passive sonar is likely to be constantly in use in some classes of ships when they are underway.

The key takeaway with regard to lifetime phases is that understanding them is vitally important to estimation of both operation and support costs.
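One way a cost model might capture lifetime phases is to weight each phase's failure rate by the time spent in that phase. The sketch below is only an illustration of that idea, not a method prescribed by this paper: the 20-year life, the split of hours among phases, the operational failure rate, and the choice of 15% for the alert rate (within the ten to twenty percent range mentioned above) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Phase:
    name: str
    hours: float         # time spent in this phase over the useful life
    failure_rate: float  # failures per hour while in this phase


def expected_failures(phases) -> float:
    """Expected random failures over the life, assuming a constant
    failure rate within each phase."""
    return sum(p.hours * p.failure_rate for p in phases)


# Hypothetical element: 20-year useful life, illustrative split and rates.
LIFE_HOURS = 20 * 8760
OPERATIONAL_RATE = 1e-5
ALERT_RATE = 0.15 * OPERATIONAL_RATE  # assumed; out of service taken as equal

phases = [
    Phase("operational", 0.30 * LIFE_HOURS, OPERATIONAL_RATE),
    Phase("alert", 0.50 * LIFE_HOURS, ALERT_RATE),
    Phase("out of service", 0.20 * LIFE_HOURS, ALERT_RATE),
]

total = expected_failures(phases)
operational_only = expected_failures(phases[:1])
print(f"all phases:       {total:.4f} expected failures")
print(f"operational only: {operational_only:.4f} expected failures")
```

With these made-up numbers, ignoring the non-operating phases would leave out roughly a quarter of the expected failures, which illustrates why the alert failure rate should not be assumed negligible.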

Determining MTBF of WBS Elements

From all that has been said up to this point it should be clear that to get accurate estimates of the cost of corrective maintenance of random failures, one needs reasonably good MTBF values. Where do those come from?

We should first recognize that reliability is the probability of an event, the event being the non-failure of an item over a stated period of time, and under specified environmental conditions. To arrive at such probabilities, we must compile statistical data on the failure-free performance of items of interest in the time domain. We do this by observing items under specified conditions of operation and environment, and measuring their time to failure. For simple components, a fairly large sample is usually selected for testing. The times to failure will vary randomly, but it is possible to compute a mean value, the MTBF. From the same data it is also possible to infer failure rates.

A complication arises in that for most hardware there are three types of failure: early (often called "infant mortality"), random, with essentially constant failure rate, and wearout. This gives rise to the well-known "bathtub curve" of reliability engineering. An example is shown in Figure 4.

[Figure 4: bathtub curve. Labeled region: Wearout Life (failure rate increases with time)]

Normally for critical equipment, early electronic failures are weeded out by a process called burn-in, and equipment is retired or overhauled before wearout begins. [8] Therefore most reliability analysis concerns itself only with random failures based on a constant failure rate.

The vast majority of projects do not test all of their components for MTBF from scratch. That would be prohibitively expensive. Fortunately, reliability engineers have available libraries listing typical failure rates for most commonly used components. Testing is reserved for unique situations where reliability is critical and for which no reliable failure data exist.
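The step from observed times to failure to an MTBF and a failure rate can be sketched as follows. This is a simplified illustration: it assumes every unit in the sample was run to failure (no units still operating when the test ended) and that the constant-failure-rate model applies, so that the failure rate is the reciprocal of MTBF. The sample values are invented for the example.

```python
from statistics import mean


def estimate_mtbf(times_to_failure) -> float:
    """Mean of the observed times to failure, in hours.
    Assumes every tested unit was run until it failed."""
    if not times_to_failure:
        raise ValueError("need at least one observed time to failure")
    return mean(times_to_failure)


def estimate_failure_rate(times_to_failure) -> float:
    """Failure rate in failures per hour, taken as 1 / MTBF
    under the constant-failure-rate assumption."""
    return 1.0 / estimate_mtbf(times_to_failure)


# Hypothetical test sample: hours to failure for five units.
sample_hours = [8200.0, 15400.0, 3100.0, 22900.0, 10400.0]

mtbf = estimate_mtbf(sample_hours)
rate = estimate_failure_rate(sample_hours)
print(f"MTBF = {mtbf:,.0f} hours, failure rate = {rate:.3e} per hour")
```

Real test programs usually end before every unit has failed, in which case the total accumulated unit-hours divided by the number of failures is the more common estimate; that refinement is outside this sketch.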
Sometimes it is necessary to use components in environments that differ from those in which they are most commonly used. To that end, derating factors are compiled. A derating is a correction factor applied to failure rates when components are used in extreme environments. Deratings are probably most commonly applied to temperature-sensitive electronic parts that must be used in hot or high-voltage environments, but there are many other applications as well, for example, derating of ball bearings for higher than normal loads.

Footnote 8: If equipment remains in use after wearout begins, the failure rate increases, and even accelerates. Corrective maintenance becomes much more costly. An adequate reliability theory exists for this case but we do not address it in this paper.

From failure rates of simple components, failure rates and MTBF values for various assemblies of components can be derived. Recall that it matters how the components are assembled. There are major differences in MTBF and reliability between assemblies that contain redundancies in the form of parallel arrangements or cross strapping and assemblies that do not. Some O&S models do not recognize this.

One frequent issue regarding MTBF is that values predicted from laboratory tests do not match well with results reported from field operations. This is an extremely difficult issue because of the many gaps, lack of timeliness, errors, and lack of specificity in field failure reports, versus the close controls usually used in laboratory testing. The reliability engineers who screen and tabulate data from field reports must sometimes perform heroic feats of data interpretation. Still, imperfect data is usually better than no data.

Non-random Failures

Many failures of hardware are classified as random because their actual time of occurrence is not predictable. But there is a class of failures for which time to failure may not be perfectly predictable but can be estimated reasonably well. O&S models often do not recognize this important class of failures. An example is aircraft tires. While occasionally they do fail randomly, more often they are replaced after a certain number of landings because after that number of landings they are deemed to be unsafe. Cargo extraction parachutes are another example. They often get deployment friction burns and lengthy exposure to sunlight, both of which weaken them.
After a certain number of deployments they are typically removed from service.

A non-random failure sometimes is an end-of-life event, but this is not quite the same thing as the useful life estimate, because useful life in calendar terms can be much prolonged simply by using the item less often, thus exposing it less often to a harmful environment. Often the amount of exposure to a harmful environment deemed to result in failure is a policy matter, and is determined by testing or from experience. Since that exposure is likely to vary considerably across the lifetime phases listed above, it should be estimated separately for each phase and the results combined.

One particular non-random failure mode, corrosion, is often omitted from O&S analyses. But across all DoD equipment it accounts for an average of about a fourth of all corrective maintenance costs. For a few systems, such as the C-130 aircraft, it accounts for almost half of all maintenance costs.
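The idea of estimating exposure separately for each lifetime phase and then combining the results can be sketched as follows. This is only an illustration under stated assumptions: the phase names, the landing counts, the replacement limit, and the cost figure are all hypothetical, and replacements are treated as a long-run average (so fractional values are allowed).

```python
def expected_replacements(exposures_by_phase, exposure_limit):
    """Long-run average number of replacements for a given amount of exposure.

    exposures_by_phase: exposure events (e.g. landings) per year in each phase.
    exposure_limit: policy limit on exposure events before mandatory replacement.
    """
    if exposure_limit <= 0:
        raise ValueError("exposure_limit must be positive")
    total_exposure = sum(exposures_by_phase.values())
    return total_exposure / exposure_limit


def replacement_cost(exposures_by_phase, exposure_limit, cost_per_replacement):
    """Average yearly cost of exposure-driven (non-random) replacements."""
    return expected_replacements(exposures_by_phase, exposure_limit) * cost_per_replacement


# Hypothetical aircraft tire example: landings per year in each phase.
landings_per_year = {"operational": 180, "training": 300, "out_of_service": 0}
landing_limit = 120          # assumed policy: replace tires after 120 landings
cost_per_tire_set = 4000.0   # assumed cost of one replacement, in dollars

replacements = expected_replacements(landings_per_year, landing_limit)              # 4.0 per year
annual_cost = replacement_cost(landings_per_year, landing_limit, cost_per_tire_set) # 16000.0 dollars
```

Keeping the exposure per phase as a separate input makes it easy to see how a change in usage pattern, rather than in the item itself, moves the cost.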

Overhauls

High-value assets subject to random and/or non-random failures are sometimes sent to depots for extensive repairs and renewals intended to make them like new. The motivation is that the intrinsic value of the item makes the overhaul expense worthwhile. O&S models sometimes do not give adequate treatment to overhauls. Overhauls may be fairly complex to deal with from a modeling standpoint. For example, the frequency of overhauls, usually a policy matter, can be driven by:

- Cumulative operating hours
- Cumulative alert hours
- Cumulative out of service hours
- Combinations of the above
- Cumulative calendar hours regardless of operational state
- Occurrence of a random failure
- Random selection from a population

Sometimes overhauls are not done on an entire population. Instead, they are based on a sampling plan. This is most likely to be done for systems in which detection of some failures is possible only in a well-equipped depot.

Costs of overhauls usually are generated by:

- Inspections to determine equipment status
- Repairs, which may include some remanufacturing
- Testing to verify condition before return to service

Overhauls of electronic items may not be advisable, because any repair or remanufacture of them often reduces their reliability.

What Is an Availability Requirement and Why Is it Imposed?

Certain systems are operationally critical in the sense that if they fail to operate there can be substantial losses. In industry, these losses are typically of time and money. But for military equipment, the loss could be a battle or even a war. Recall the famous military proverb:

For want of a nail the shoe was lost.
For want of a shoe the horse was lost.
For want of a horse the rider was lost.
For want of a rider the battle was lost.
For want of a battle the kingdom was lost.
And all for want of a horseshoe nail.

This proverb, the sources of which have been traced back as far as the year 1390, was until recent years applied primarily to failures of logistic supply. But in recent years it has also been applied to equipment designs and maintainability planning. For example, would it not have been nice to have horseshoes that have no need of nails? Or perhaps even horses that have no need of shoes? In that sense its message can be summarized in a single word:

AVAILABILITY

The general sense of availability can be grasped easily enough: Will I ever need it? Will it be there if I need it? But in today's risk-conscious world, we need both a clear definition and a way to express it quantitatively. Here is one definition that is easily understood: Availability is the proportion of need time that a system (or subsystem or equipment) is in an adequately functioning condition. Obviously, as reliability increases so does availability, all else being equal. In terms of the representative partitioning of useful life previously described in this paper, equipment generally is desired to be available in both its operational and alert states, but not in the out of service state.

What can make equipment unavailable when it is expected to be available? Ignoring the possibility of a secondary failure, such as a general power outage which forces the equipment to the off condition, the answer is a failure of the equipment itself. If by some miracle a failure could always be repaired instantaneously, availability would always be 100%. But repairs take time, the average of which is commonly denoted mean time to repair or sometimes mean time to recover. Either way, the acronym is MTTR. Probably the most commonly cited mathematical representation of availability is:

$$A = \frac{E[\mathrm{Uptime}]}{E[\mathrm{Uptime}] + E[\mathrm{Downtime}]}$$

Here, A = availability, which like reliability is commonly expressed in nines; E[Uptime] = expected uptime during system life; and E[Downtime] = expected downtime during system life.
Example

An equipment has an MTBF of 100,000 hours and an MTTR of 1 hour. Its availability is 100,000/100,001 ≈ 0.99999. The unavailability is 1 − 0.99999 = 0.00001. Defining a calendar year as 8,760 hours and assuming that the equipment is in continuous operation, this unavailability translates to 0.0876 hours (5.256 minutes) of downtime per calendar year.

Obviously important to availability are reliability, MTBF, accessibility of the equipment to maintenance staff, skill level of maintenance staff, adequate diagnostic equipment or built-in-test capability, and readiness of spares. All of these, and perhaps other factors, should be considered when assigning an MTTR value to equipment in an O&S model.
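The arithmetic of the example above (MTBF of 100,000 hours, MTTR of 1 hour, 8,760-hour year, continuous operation) can be checked with a short sketch. It assumes that expected uptime and downtime per failure cycle are represented by MTBF and MTTR respectively, as the example's ratio 100,000/100,001 implies. The function names are illustrative only.

```python
HOURS_PER_YEAR = 8760  # calendar year as defined in the example


def availability(mtbf_hours, mttr_hours):
    """Availability as uptime / (uptime + downtime), using MTBF and MTTR."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def annual_downtime_hours(avail, hours_per_year=HOURS_PER_YEAR):
    """Expected downtime per year for equipment in continuous operation."""
    return (1.0 - avail) * hours_per_year


a = availability(100_000, 1)            # about 0.99999 ("five nines")
downtime_h = annual_downtime_hours(a)   # about 0.0876 hours
downtime_min = downtime_h * 60          # about 5.256 minutes
```

The same two functions can be reused to see how sensitive annual downtime is to MTTR: with MTBF held fixed and MTTR much smaller than MTBF, doubling the MTTR roughly doubles the unavailability.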

Availability flowdown

As previously described in this paper, reliability flows down to leaf elements of a WBS from a higher-level reliability requirement in a mathematically simple way. This flowdown emulates what a reliability engineer has supposedly done, namely meeting the reliability requirement in the most economical way. Availability, like reliability, is a probability, and the flowdown math is the same. Can an availability flowdown be used to check the availability compliance of a leaf element? Yes, it can. Assuming that an MTBF number and an MTTR number are both available at a leaf element, as they should be, the availability calculation shown above can be used to calculate the element availability. That calculated value can then be compared to the flowdown value. It should be at least as large.

Summary

The goal of this paper is to review some of the ways that TOC models are often deficient in addressing reliability and availability requirements as cost drivers, and to suggest how to approach fixing these deficiencies.
Specifically, the following have been noted:

- Reliability and availability are strong drivers of project cost in aerospace and defense, and models sometimes ignore this
- Reliability requirements directly influence MTBF values of hardware items, and MTBF assignments often do not recognize this direct relationship
- Reliability is a strong function of the operating (sometimes also the non-operating) environment, and this must be accounted for but often is not
- Models do not always adequately recognize the beneficial effects of various types of redundancy and other reliability enhancements
- It is sometimes not recognized that a hardware element may experience more than one life cycle phase, with failure rates varying significantly between phases
- For many systems, the costs of non-random failures are major costs, and are often ignored
- Overhaul costs often are not carefully analyzed

Further Reading

The equations and most other information presented in this paper are readily available from many sources. Two very good ones are:

- Bazovsky, Reliability Theory and Practice, Dover, 2004
- Pecht (editor), Product Reliability, Maintainability, and Supportability Handbook, CRC Press, 1995


More information

5.1 Introduction. Learning Objectives

5.1 Introduction. Learning Objectives Learning Objectives 5.1 Introduction Statistical Process Control (SPC): SPC is a powerful collection of problem-solving tools useful in achieving process stability and improving capability through the

More information

Pressure Sensor Bridge Configurations

Pressure Sensor Bridge Configurations Pressure Sensor Bridge Configurations 1. Purpose Describe different pressure sensor bridge configurations, when each can and cannot be used, and the advantages and disadvantages of each. 2. Introduction

More information

Three Approaches to Safety Engineering. Civil Aviation Nuclear Power Defense

Three Approaches to Safety Engineering. Civil Aviation Nuclear Power Defense Three Approaches to Safety Engineering Civil Aviation Nuclear Power Defense Civil Aviation Fly-fix-fly: analysis of accidents and feedback of experience to design and operation Fault Hazard Analysis: Trace

More information

Transmitter mod. TR-A/V. SIL Safety Report

Transmitter mod. TR-A/V. SIL Safety Report Transmitter mod. TR-A/V SIL Safety Report SIL003/09 rev.1 del 09.03.2009 Pagina 1 di 7 1. Employ field The transmitters are dedicated to the vibration monitoring in plants where particular safety requirements

More information

PRO-ROD TM COILED ROD. Reduce Maintenance Increase Production Enhance Profit

PRO-ROD TM COILED ROD. Reduce Maintenance Increase Production Enhance Profit PRO-ROD TM COILED ROD Reduce Maintenance Increase Production Enhance Profit DOVER ARTIFICIAL LIFT Elevating the Potential of Artificial Lift Production. Dover Artificial Lift, part of Dover Energy, offers

More information

Determining Occurrence in FMEA Using Hazard Function

Determining Occurrence in FMEA Using Hazard Function Determining Occurrence in FMEA Using Hazard Function Hazem J. Smadi Abstract FMEA has been used for several years and proved its efficiency for system s risk analysis due to failures. Risk priority number

More information

A study on the relation between safety analysis process and system engineering process of train control system

A study on the relation between safety analysis process and system engineering process of train control system A study on the relation between safety analysis process and system engineering process of train control system Abstract - In this paper, the relationship between system engineering lifecycle and safety

More information

Failure Modes, Effects and Diagnostic Analysis

Failure Modes, Effects and Diagnostic Analysis Failure Modes, Effects and Diagnostic Analysis Project: Solenoid Drivers KFD2-SL2-(Ex)1.LK.vvcc KFD2-SL2-(Ex)*(.B).vvcc Customer: Pepperl+Fuchs GmbH Mannheim Germany Contract No.: P+F 06/09-23 Report No.:

More information

AUTOMATIC FLOW CARTRIDGES. Competition Analysis Not All Automatics Are Created Equal

AUTOMATIC FLOW CARTRIDGES. Competition Analysis Not All Automatics Are Created Equal - AUTOMATIC FLOW CARTRIDGES Competition Analysis Not All Automatics Are Created Equal IMI FLOW DESIGN / AUTOMATIC FLOW CARTRIDGES / F345.0 COMPETITION ANALYSIS Improve the of your Premium System Components...

More information

Employ The Risk Management Process During Mission Planning

Employ The Risk Management Process During Mission Planning Employ The Risk Management Process During Mission Planning TSG 154-6465 Task(s) TASK NUMBER TASK TITLE Taught or 154-385-6465 Employ The Risk Management Process During Mission Planning Supported Task(s)

More information

SIL explained. Understanding the use of valve actuators in SIL rated safety instrumented systems ACTUATION

SIL explained. Understanding the use of valve actuators in SIL rated safety instrumented systems ACTUATION SIL explained Understanding the use of valve actuators in SIL rated safety instrumented systems The requirement for Safety Integrity Level (SIL) equipment can be complicated and confusing. In this document,

More information

Verification Of Calibration for Direct-Reading Portable Gas Monitors

Verification Of Calibration for Direct-Reading Portable Gas Monitors U. S. Department of Labor Occupational Safety and Health Administration Directorate of Science, Technology and Medicine Office of Science and Technology Assessment Verification Of Calibration for Direct-Reading

More information

Engineering Note. Algorithms. Overview. Detailed Algorithm Description. NeoFox Calibration and Measurement. Products Affected: NeoFox

Engineering Note. Algorithms. Overview. Detailed Algorithm Description. NeoFox Calibration and Measurement. Products Affected: NeoFox Engineering Note Topic: NeoFox Calibration and Measurement Products Affected: NeoFox Date Issued: 04/18/2011 Algorithms Overview NeoFox is a dynamic measurement system that has been designed to work with

More information

Reliability Analysis Including External Failures for Low Demand Marine Systems

Reliability Analysis Including External Failures for Low Demand Marine Systems Reliability Analysis Including External Failures for Low Demand Marine Systems KIM HyungJu a*, HAUGEN Stein a, and UTNE Ingrid Bouwer b a Department of Production and Quality Engineering NTNU, Trondheim,

More information

Effects of Traffic Signal Retiming on Safety. Peter J. Yauch, P.E., PTOE Program Manager, TSM&O Albeck Gerken, Inc.

Effects of Traffic Signal Retiming on Safety. Peter J. Yauch, P.E., PTOE Program Manager, TSM&O Albeck Gerken, Inc. Effects of Traffic Signal Retiming on Safety Peter J. Yauch, P.E., PTOE Program Manager, TSM&O Albeck Gerken, Inc. Introduction It has long been recognized that traffic signal timing can have an impact

More information

GUIDE TO RUNNING A BIKE SHARE. h o w t o p l a n a n d o p e r a t e a s u c c e s s f u l b i k e s h a r e p r o g r a m

GUIDE TO RUNNING A BIKE SHARE. h o w t o p l a n a n d o p e r a t e a s u c c e s s f u l b i k e s h a r e p r o g r a m GUIDE TO RUNNING A BIKE SHARE h o w t o p l a n a n d o p e r a t e a s u c c e s s f u l b i k e s h a r e p r o g r a m 20150209 The bicycle is the most loved form of transportation. No other machine

More information

Citation for published version (APA): Canudas Romo, V. (2003). Decomposition Methods in Demography Groningen: s.n.

Citation for published version (APA): Canudas Romo, V. (2003). Decomposition Methods in Demography Groningen: s.n. University of Groningen Decomposition Methods in Demography Canudas Romo, Vladimir IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it. Please

More information

White Paper. Chemical Sensor vs NDIR - Overview: NDIR Technology:

White Paper. Chemical Sensor vs NDIR - Overview: NDIR Technology: Title: Comparison of Chemical Sensor and NDIR Technologies TSN Number: 25 File:\\MII- SRV1\Metron\Bridge_Analyzers\Customer_Service_Documentation\White_Papers\25_G en EGA NDIR vs Chemical Sensor.docx Created

More information

Becker* Products Below Ground Ball Valve Regulators

Becker* Products Below Ground Ball Valve Regulators GE Oil & Gas Becker* Products Below Ground Ball Valve Regulators Reduce Noise Levels at Large Volume Regulator Stations GE s Becker* Below Ground ball valve regulator has the long-term proven reliability,

More information

The Application Of Computer Modeling To Improve The Integrity Of Ballast Tanks

The Application Of Computer Modeling To Improve The Integrity Of Ballast Tanks Paper No. 4255 The Application Of Computer Modeling To Improve The Integrity Of Ballast Tanks Robert Adey, Guy Bishop, John Baynham, CM BEASY Ltd Ashurst Lodge Southampton, SO40 7AA, UK ABSTRACT Generally

More information

A SEMI-PRESSURE-DRIVEN APPROACH TO RELIABILITY ASSESSMENT OF WATER DISTRIBUTION NETWORKS

A SEMI-PRESSURE-DRIVEN APPROACH TO RELIABILITY ASSESSMENT OF WATER DISTRIBUTION NETWORKS A SEMI-PRESSURE-DRIVEN APPROACH TO RELIABILITY ASSESSMENT OF WATER DISTRIBUTION NETWORKS S. S. OZGER PhD Student, Dept. of Civil and Envir. Engrg., Arizona State Univ., 85287, Tempe, AZ, US Phone: +1-480-965-3589

More information

Probability Risk Assessment Methodology Usage on Space Robotics for Free Flyer Capture

Probability Risk Assessment Methodology Usage on Space Robotics for Free Flyer Capture 6 th IAASS International Space Safety Conference Probability Risk Assessment Methodology Usage on Space Robotics for Free Flyer Capture Oneil D silva Roger Kerrison Page 1 6 th IAASS International Space

More information

A i r c r a f t E l e c t r i c a l S y s t e m s ( 1 2 B )

A i r c r a f t E l e c t r i c a l S y s t e m s ( 1 2 B ) 8 5 4 9 A i r c r a f t E l e c t r i c a l S y s t e m s ( 1 2 B ) 40S/40E/40M An Aviation and Aerospace Technologies Course 8 5 4 9 : A i r c r a f t E l e c t r i c a l S y s t e m s ( 1 2 B ) 4 0

More information

Review and Assessment of Engineering Factors

Review and Assessment of Engineering Factors Review and Assessment of Engineering Factors 2013 Learning Objectives After going through this presentation the participants are expected to be familiar with: Engineering factors as follows; Defense in

More information

Safety Manual VEGAVIB series 60

Safety Manual VEGAVIB series 60 Safety Manual VEGAVIB series 60 Contactless electronic switch Document ID: 32002 Contents Contents 1 Functional safety... 3 1.1 General information... 3 1.2 Planning... 4 1.3 Adjustment instructions...

More information

TG GUIDELINES CONCERNING CALIBRATION INTERVALS AND RECALIBRATION

TG GUIDELINES CONCERNING CALIBRATION INTERVALS AND RECALIBRATION GUIDELINES CONCERNING CALIBRATION INTERVALS AND RECALIBRATION Approved By: Senior Manager: Mpho Phaloane Revised By: Field Manager: Neville Tayler Date of Approval: 2015-08-26 Date of Implementation: 2015-08-26

More information

FLIGHT TEST RISK ASSESSMENT THREE FLAGS METHOD

FLIGHT TEST RISK ASSESSMENT THREE FLAGS METHOD FLIGHT TEST RISK ASSESSMENT THREE FLAGS METHOD Author: Maximilian Kleinubing BS. Field: Aeronautical Engineering, Flight Test Operations Keywords: Flight Test, Safety Assessment, Flight Test Safety Assessment

More information

Policy Management: How data and information impacts the ability to make policy decisions:

Policy Management: How data and information impacts the ability to make policy decisions: Policy Management: How data and information impacts the ability to make policy decisions: Louis Cripps Regional Transportation District, Asset Management Denver, Colorado Quick exercise... What do these

More information

Service Calibration are you doing it properly?

Service Calibration are you doing it properly? ABB MEASUREMENT & ANALYTICS ARTICLE Service Calibration are you doing it properly? A variety of factors, including wear and tear and non-ideal installation conditions, can cause the performance of an instrument

More information

(DD/MMM/YYYY): 10/01/2013 IP

(DD/MMM/YYYY): 10/01/2013 IP Title: Submitter: CPCP for Safe Life Items EASA, MRB Section Applies To: Vol 1: Vol 2: Both: X Issue: Problem: A Corrosion Prevention and Control Programme (CPCP) is required for all primary aircraft structure

More information

An Application of Signal Detection Theory for Understanding Driver Behavior at Highway-Rail Grade Crossings

An Application of Signal Detection Theory for Understanding Driver Behavior at Highway-Rail Grade Crossings An Application of Signal Detection Theory for Understanding Driver Behavior at Highway-Rail Grade Crossings Michelle Yeh and Jordan Multer United States Department of Transportation Volpe National Transportation

More information