Reliability, Maintainability and Risk: Practical methods for engineers
By David J Smith
Butterworth-Heinemann
Copyright © 2011 David J. Smith
All rights reserved.
Chapter One
The History of Reliability and Safety Technology
Safety/Reliability engineering did not develop as a unified discipline, but grew out of the integration of a number of activities, previously the province of various branches of engineering.
Since no human activity can enjoy zero risk, and no equipment has a zero rate of failure, there has emerged a safety technology for optimizing risk. This attempts to balance the risk of a given activity against its benefits and seeks to assess the need for further risk reduction depending upon the cost.
Similarly, reliability engineering, beginning in the design phase, attempts to select the design compromise that balances the cost of reducing failure rates against the value of the enhanced performance.
The abbreviation RAMS is frequently used for ease of reference to reliability, availability, maintainability and safety-integrity.
1.1 Failure Data
Throughout the history of engineering, reliability improvement (also called reliability growth), arising as a natural consequence of the analysis of failure, has been a central feature of development. This 'test and correct' principle was practiced long before the development of formal procedures for data collection and analysis because failure is usually self-evident and thus leads, inevitably, to design modifications.
The design of safety-related systems (for example, railway signaling) has evolved partly in response to the emergence of new technologies but largely as a result of lessons learnt from failures. The application of technology to hazardous areas requires the formal application of this feedback principle in order to maximize the rate of reliability improvement. Nevertheless, as mentioned above, all engineered products will exhibit some degree of reliability growth even without formal improvement programs.
Nineteenth- and early twentieth-century designs were less severely constrained by the cost and schedule pressures of today. Thus, in many cases, high levels of reliability were achieved as a result of over-design. The need for quantified reliability assessment techniques during the design and development phase was not therefore identified.
Therefore, failure rates of engineered components were not required, as they are now, for use in prediction techniques and consequently there was little incentive for the formal collection of failure data.
Another factor is that, until well into the twentieth century, component parts were individually fabricated in a 'craft' environment. Mass production, and the attendant need for component standardization, did not apply and the concept of a valid repeatable component failure rate could not exist. The reliability of each product was highly dependent on the craftsman/manufacturer and less determined by the 'combination' of component reliabilities.
Nevertheless, mass production of standard mechanical parts has been the case for over a hundred years. Under these circumstances defective items can be readily identified, by inspection and test, during the manufacturing process, and it is possible to control reliability by quality-control procedures.
The advent of the electronic age, accelerated by the Second World War, led to the need for more complex mass-produced component parts with a higher degree of variability in the parameters and dimensions involved. The experience of poor field reliability of military equipment throughout the 1940s and 1950s focused attention on the need for more formal methods of reliability engineering. This gave rise to the collection of failure information both from the field and from the interpretation of test data. Failure rate databanks were created in the mid-1960s as a result of work at such organizations as UKAEA (UK Atomic Energy Authority), RRE (Royal Radar Establishment, UK) and RADC (Rome Air Development Center, US).
The manipulation of the data was manual and involved the calculation of rates from the incident data, inventories of component types and the records of elapsed hours. This was stimulated by the advent of reliability prediction modeling techniques that require component failure rates as inputs to the prediction equations.
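As a rough illustration of the calculation described above, a point estimate of failure rate is the number of recorded failures divided by the cumulative operating hours (inventory multiplied by elapsed hours). The Python sketch below is not taken from the text: the function name and the example figures are hypothetical, and it assumes a constant failure rate and that every item in the inventory was in service for the whole period.

```python
def failure_rate_per_hour(failures: int, inventory: int, elapsed_hours: float) -> float:
    """Point estimate of failure rate: failures / (items x hours per item)."""
    unit_hours = inventory * elapsed_hours  # cumulative operating hours
    if unit_hours <= 0:
        raise ValueError("cumulative operating hours must be positive")
    return failures / unit_hours


# Hypothetical example: 12 recorded failures among 400 items over 8760 hours.
rate = failure_rate_per_hour(failures=12, inventory=400, elapsed_hours=8760.0)
print(f"{rate * 1e6:.2f} failures per million hours")  # prints 3.42
```

In practice the elapsed hours would differ between items, in which case the denominator becomes the sum of the individual operating hours.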
The availability and low cost of desktop personal computing (PC) facilities, together with versatile and powerful software packages, have permitted the listing and manipulation of incident data with an order of magnitude less effort. Fast automatic sorting of data encourages the analysis of failures into failure modes. This is no small factor in contributing to more effective reliability assessment, since raw failure rates permit only parts count reliability predictions. In order to address specific system failures it is necessary to input specific component failure modes into the fault tree or failure mode analyses.
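The distinction between the two kinds of prediction can be sketched as follows. A parts count prediction sums the raw failure rates of all components, whereas a failure mode analysis takes only the share of each rate attributable to the mode that causes the system failure of interest. The Python example is illustrative only: the component names, rates and mode fractions are invented, and constant failure rates are assumed.

```python
# Hypothetical data: raw failure rate (per million hours) and the
# fraction of that rate attributed to each failure mode.
components = {
    "relay": (0.5, {"fails_to_operate": 0.7, "spurious_operation": 0.3}),
    "valve": (2.0, {"fails_to_operate": 0.4, "spurious_operation": 0.6}),
    "sensor": (1.0, {"fails_to_operate": 0.5, "spurious_operation": 0.5}),
}


def parts_count_rate(parts):
    """Parts count prediction: sum of raw component failure rates."""
    return sum(rate for rate, _modes in parts.values())


def mode_rate(parts, mode):
    """Rate of one failure mode: each raw rate weighted by its mode fraction."""
    return sum(rate * modes.get(mode, 0.0) for rate, modes in parts.values())


total = parts_count_rate(components)
spurious = mode_rate(components, "spurious_operation")
print(f"parts count: {total:.2f} per million hours")
print(f"spurious operation only: {spurious:.2f} per million hours")
```

With these invented figures the mode-specific rate is roughly half the parts count figure, which is why the raw rate alone says little about any one system failure.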
The requirement for field recording makes data collection labor-intensive and this remains a major obstacle to complete and accurate information. Motivating staff to provide field reports with sufficient relevant detail is an ongoing challenge for management. The spread of PC facilities in this area will assist, in that interactive software can be used to prompt the required information input at the same time as other maintenance-logging activities.
With the rapid growth of built-in test and diagnostic features in equipment, a future trend ought to be the emergence of automated fault reporting.
Failure data have been published since the 1960s and each major document is described in Chapter 4.
1.2 Hazardous Failures
In the early 1970s the process industries became aware that, with larger plants involving higher inventories of hazardous material, the practice of learning by mistakes was no longer acceptable. Methods were developed for identifying hazards and for quantifying the consequences of failures. They were evolved largely to assist in the decision-making process when developing or modifying plants. External pressures to identify and quantify risk were to come later.
By the mid-1970s there was already concern over the lack of formal controls for regulating those activities which could lead to incidents having a major impact on the health and safety of the general public. The Flixborough incident in June 1974 resulted in 28 deaths and focused public and media attention on this area of technology. Successive events such as the tragedy at Seveso in Italy in 1976 right through to the Piper Alpha offshore and more recent Paddington rail and Texaco Oil Refinery incidents have kept that interest alive and resulted in guidance and legislation, which are addressed in Chapters 19 and 20.
The techniques for quantifying the predicted frequency of failures were originally applied to assessing plant availability, where the cost of equipment failure was the prime concern. Over the last twenty years these techniques have also been used for hazard assessment. Maximum tolerable risks of fatality have been established according to the nature of the risk and the potential number of fatalities. These are then assessed using reliability techniques. Chapter 10 deals with risk in more detail.
1.3 Reliability and Risk Prediction
System modeling, using failure mode analysis and fault tree analysis methods, has been developed over the last thirty years and now involves numerous software tools which enable predictions to be updated and refined throughout the design cycle. The criticality of the failure rates of specific component parts can be assessed and, by successive computer runs, adjustments to the design configuration (e.g. redundancy) and to the maintenance philosophy (e.g. proof test frequencies) can be made early in the design cycle in order to optimize reliability and availability. The need for failure rate data to support these predictions has therefore increased and Chapter 4 examines the range of data sources and addresses the problem of variability within and between them.
The value and accuracy of reliability prediction, based on the concept of validly repeatable component failure rates, have long been controversial.
First, the extremely wide variability of failure rates of allegedly identical components, under supposedly identical environmental and operating conditions, is now acknowledged. The apparent precision offered by reliability prediction models is thus not compatible with the accuracy of the failure rate parameter. As a result, it can be argued that simple assessments of failure rates and the use of simple models suffice. In any case, more accurate predictions can be both misleading and a waste of money.
The main benefit of reliability prediction of complex systems lies not in the absolute figure predicted but in the ability to repeat the assessment for different repair times, different redundancy arrangements in the design configuration and different values of component failure rate. This has been made feasible by the emergence of PC tools (e.g. fault tree analysis packages) that permit rapid reruns of the prediction. Thus, judgements can be made on the basis of relative predictions with more confidence than can be placed on the absolute values.
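A minimal sketch of such a rerun, in Python, is given below. It is an assumption-laden illustration and not a method from the text: it uses the steady-state unavailability of a single repairable unit, (failure rate x repair time) / (1 + failure rate x repair time), and treats redundant units as fully independent, so that the system is down only when every unit is down. Any common cause of failure is ignored, and the function names and figures are hypothetical.

```python
def unit_unavailability(failure_rate: float, repair_time: float) -> float:
    """Steady-state unavailability of one repairable unit."""
    x = failure_rate * repair_time
    return x / (1.0 + x)


def system_unavailability(failure_rate: float, repair_time: float, units: int) -> float:
    """System is down only when all redundant units are down (independence assumed)."""
    return unit_unavailability(failure_rate, repair_time) ** units


failure_rate = 1e-4  # failures per hour, hypothetical
for repair_time in (8.0, 24.0, 168.0):  # hours, hypothetical
    for units in (1, 2):
        q = system_unavailability(failure_rate, repair_time, units)
        print(f"repair {repair_time:>5.0f} h, units {units}: unavailability {q:.2e}")
```

The absolute numbers inherit all the uncertainty of the assumed failure rate, but the comparison between rows shows how strongly repair time and redundancy influence the result.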
Second, the complexity of modern engineering products and systems ensures that system failure is not always attributable to single component part failure. More subtle factors, such as the following, can often dominate the system failure rate:
failure resulting from software elements
failure due to human factors or operating documentation
failure due to environmental factors
failure whereby redundancy is defeated by factors common to the replicated units
failure due to ambiguity in the specification
failure due to timing constraints within the design
failure due to combinations of component parameter tolerance.
Excerpted from Reliability, Maintainability and Risk by David J Smith Copyright © 2011 by David J. Smith. Excerpted by permission of Butterworth-Heinemann. All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.