Reliability is the ability of an apparatus, machine, or system to consistently perform its intended or required function or mission, on demand and without degradation or failure. In testing, the term also refers to the consistency and validity of test results, as determined through statistical methods after repeated trials.
Reliability engineering is engineering that emphasizes dependability in the lifecycle management of a product. Dependability, or reliability, describes the ability of a system or component to function under stated conditions for a specified period of time. Reliability may also describe the ability to function at a specified moment or interval of time (availability). Reliability engineering is a sub-discipline within systems engineering. Reliability is theoretically defined as the probability of success (Reliability = 1 − Probability of Failure), as the frequency of failures, or, in terms of availability, as a probability derived from reliability, testability and maintainability. Testability, maintainability and maintenance are often defined as a part of “reliability engineering” in reliability programs. Reliability plays a key role in the cost-effectiveness of systems.
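The relationship Reliability = 1 − Probability of Failure can be illustrated with a minimal sketch. The function names are hypothetical. The availability formula shown, MTBF / (MTBF + MTTR), is the commonly used steady-state "inherent availability" expression; it is an assumption added here for illustration and is not given in the text above.

```python
def reliability(probability_of_failure: float) -> float:
    """Reliability as the probability of success: R = 1 - P(failure)."""
    if not 0.0 <= probability_of_failure <= 1.0:
        raise ValueError("probability_of_failure must be between 0 and 1")
    return 1.0 - probability_of_failure


def inherent_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Assumed steady-state availability: MTBF / (MTBF + MTTR).

    mtbf_hours: mean time between failures (a reliability measure).
    mttr_hours: mean time to repair (a maintainability measure).
    """
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("mtbf_hours must be > 0 and mttr_hours must be >= 0")
    return mtbf_hours / (mtbf_hours + mttr_hours)


if __name__ == "__main__":
    # A component with a 2% chance of failing over the stated period.
    print(reliability(0.02))
    # A machine that runs 990 h between failures and takes 10 h to repair.
    print(inherent_availability(990.0, 10.0))
```

Under this assumed formula, availability improves either by failing less often (higher MTBF) or by repairing faster (lower MTTR), which is one way to see why maintainability is usually treated as part of a reliability program.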
Reliability Centered Maintenance
Reliability-centered maintenance is much more than just another way to do maintenance. In a nutshell, it’s a way of looking at system performance in terms of the impact of a failure and then mitigating those results by design, detection, or effective maintenance.
You don’t need to understand every detail about your systems to practice reliability-centered maintenance; rather, you must understand which components and equipment can be allowed to fail, what must be monitored, and what the consequences of a failure are.
While this approach requires a good understanding of design constraints and system-performance data, you don’t need a time-consuming and expensive process to positively impact most facilities systems. The concept of RCM has been adopted across several government and industry operations as a strategy for performing maintenance. RCM applies maintenance strategies based on consequence and cost of failure. In addition, RCM seeks to minimize maintenance and improve reliability throughout the life-cycle by using proactive techniques such as improved design specifications, integration of condition monitoring in the commissioning process, and the Age Exploration (AE) process.
Many professionals view reliability-centered maintenance and predictive technologies as add-on technologies to a preventive maintenance program; however, reliability-centered maintenance is a tool that redesigns your overall maintenance strategy to improve system performance. An outcome of a reliability-centered maintenance analysis might show that you need to improve or replace some system components or equipment, improve your monitoring capabilities, or change or eliminate preventive maintenance activities.
Reliability-Centered Maintenance (RCM) is the optimum mix of reactive, time- or interval-based, condition-based, and proactive maintenance practices. The basic application of each strategy is shown in the figure below. Rather than being applied independently, these principal maintenance strategies are integrated to take advantage of their respective strengths, maximizing facility and equipment reliability while minimizing life-cycle costs.
The first component of an RCM program is Reactive. The reactive component’s attributes are: small items, non-critical, inconsequential, unlikely to fail, and redundant. The second component is Interval (Preventive Maintenance). Its attributes are: subject to wear-out, consumable replacement, and failure pattern known. Condition-Based Maintenance is the third component. Its attributes are: random failure patterns, not subject to wear, and PM-induced failures. Proactive is the fourth component. Its attributes are: Root Cause Failure Analysis, age exploration, Failure Modes and Effects Analysis, and acceptance testing.
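As a rough illustration, the four components and their attributes listed above can be captured in a simple lookup table. The names `RCM_COMPONENTS` and `components_with` are hypothetical, and the attribute strings are just the ones from the paragraph above written in lowercase.

```python
# Attributes of each RCM component, transcribed from the paragraph above.
RCM_COMPONENTS = {
    "reactive": [
        "small items",
        "non-critical",
        "inconsequential",
        "unlikely to fail",
        "redundant",
    ],
    "interval": [
        "subject to wear-out",
        "consumable replacement",
        "failure pattern known",
    ],
    "condition-based": [
        "random failure patterns",
        "not subject to wear",
        "pm-induced failures",
    ],
    "proactive": [
        "root cause failure analysis",
        "age exploration",
        "failure modes and effects analysis",
        "acceptance testing",
    ],
}


def components_with(attribute: str) -> list[str]:
    """Return the RCM components whose attribute list contains `attribute`."""
    wanted = attribute.strip().lower()
    return [name for name, attrs in RCM_COMPONENTS.items() if wanted in attrs]


if __name__ == "__main__":
    print(components_with("Subject to wear-out"))
    print(components_with("random failure patterns"))
```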
RCM includes reactive, time-based, condition-based, and proactive tasks. In addition, a user should understand system boundaries and facility envelopes, system/equipment functions, functional failures, and failure modes, all of which are critical components of the RCM program.
Preventive Maintenance (PM) assumes that failure probabilities can be determined statistically for individual machines and components, and parts can be replaced or adjustments can be performed in time to preclude failure. For example, a common practice has been to replace or renew bearings after so many operating hours assuming that bearing failure rate increases with time in service.
It should not be inferred from the above that all interval-based maintenance should be replaced by condition-based maintenance. In fact, interval-based maintenance is appropriate where abrasive, erosive, or corrosive wear takes place; where material properties change due to fatigue, embrittlement, etc.; and/or where a clear correlation exists between age and functional reliability.
The primary RCM principles are:
- RCM is Function Oriented—RCM seeks to preserve system or equipment function, not just operability for operability’s sake. Redundancy of function, through multiple pieces of equipment, improves functional reliability but increases life-cycle cost in terms of procurement and operating costs.
- RCM is System Focused—RCM is more concerned with maintaining system function than with individual component function.
- RCM is Reliability Centered—RCM treats failure statistics in an actuarial manner. The relationship between operating age and the failures experienced is important. RCM is not overly concerned with simple failure rate; it seeks to know the conditional probability of failure at specific ages (the probability that failure will occur in each given operating age bracket).
- RCM Acknowledges Design Limitations—The objective of RCM is to maintain the inherent reliability of the equipment design, recognizing that changes in inherent reliability are the province of design rather than of maintenance. Maintenance can, at best, only achieve and maintain the level of reliability that the design provides. However, RCM recognizes that maintenance feedback can improve on the original design. In addition, RCM recognizes that a difference often exists between the perceived design life and the intrinsic or actual design life, and addresses this through the Age Exploration (AE) process.
- RCM is driven by Safety, Security, and Economics—Safety and security must be ensured at any cost; thereafter, cost-effectiveness becomes the criterion.
- RCM Defines Failure as “Any Unsatisfactory Condition”—Therefore, failure may be either a loss of function (operation ceases) or a loss of acceptable quality (operation continues but impacts quality).
- RCM Uses a Logic Tree to Screen Maintenance Tasks—This provides a consistent approach to the maintenance of all kinds of equipment.
- RCM Tasks Must Be Applicable—The tasks must address the failure mode and consider the failure mode characteristics.
- RCM Tasks Must Be Effective—The tasks must reduce the probability of failure and be cost-effective.
- RCM Acknowledges Three Types of Maintenance Tasks—These tasks are time-directed (PM), condition-directed (CM), and failure finding (one of several aspects of Proactive Maintenance). Time-directed tasks are scheduled when appropriate. Condition-directed tasks are performed when conditions indicate they are needed. Failure-finding tasks detect hidden functions that have failed without giving evidence of pending failure. Additionally, performing no maintenance, Run-to-Failure, is a conscious decision and is acceptable for some equipment.
- RCM is a Living System—RCM gathers data from the results achieved and feeds this data back to improve design and future maintenance. This feedback is an important part of the Proactive Maintenance element of the RCM program.
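The "Reliability Centered" principle above distinguishes a simple failure rate from the conditional probability of failure in each operating age bracket. A minimal sketch of that calculation follows. The function name and the sample figures are hypothetical, and it assumes that units which have not failed remain in service for the whole observation period (no units are withdrawn for other reasons).

```python
def conditional_failure_by_age(failure_ages, total_units, bracket_width, num_brackets):
    """Conditional probability of failure for each operating age bracket.

    For bracket k, covering ages [k * bracket_width, (k + 1) * bracket_width),
    the result is (failures inside the bracket) / (units that survived to the
    start of the bracket).
    """
    if bracket_width <= 0 or num_brackets <= 0:
        raise ValueError("bracket_width and num_brackets must be positive")
    if len(failure_ages) > total_units:
        raise ValueError("more failures recorded than units in the population")

    # Count failures that fall inside each age bracket.
    failures = [0] * num_brackets
    for age in failure_ages:
        if age < 0:
            raise ValueError("failure ages must be non-negative")
        index = int(age // bracket_width)
        if index < num_brackets:
            failures[index] += 1

    # Divide by the number of units still at risk when each bracket begins.
    probabilities = []
    at_risk = total_units
    for count in failures:
        probabilities.append(count / at_risk if at_risk else 0.0)
        at_risk -= count
    return probabilities


if __name__ == "__main__":
    # Hypothetical data: 100 bearings, six recorded failure ages in hours.
    ages = [500, 1500, 1700, 2200, 2500, 2900]
    for k, p in enumerate(conditional_failure_by_age(ages, 100, 1000, 3)):
        print(f"{k * 1000}-{(k + 1) * 1000} h: {p:.4f}")
```

In this made-up data set the conditional probability rises from one bracket to the next, which is the kind of age-related pattern that would support an interval-based task. A roughly flat profile would suggest that age is a poor predictor of failure.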
RCM Types – There are several ways to conduct and implement an RCM program. The program can be based on rigorous Failure Modes and Effects Analysis (FMEA), complete with mathematically-calculated probabilities of failure based on design or historical data, intuition or common-sense, and/or experimental data and modeling. These approaches may be called Classical, Rigorous, Intuitive, Streamlined, or Abbreviated. Other terms sometimes used for these same approaches include Concise, Preventive Maintenance (PM) Optimization, Reliability Based, and Reliability Enhanced. All are applicable. The decision of what technique to use should be left to the end user and be based on:
- Consequences of failure
- Probability of failure
- Historical data available
- Risk tolerance
- Resource availability
RCM Analysis – The RCM analysis should carefully consider and answer the following questions:
- What does the system or equipment do; what are the functions?
- What functional failures are likely to occur?
- What are the likely consequences of these functional failures?
- What can be done to reduce the probability of the failure(s), identify the onset of failure(s), or reduce the consequences of the failure(s)?
Answers to these four questions can be used with the decision logic tree depicted in the figure below.
The analysis process has only four possible outcomes:
- Perform Condition-Based actions (CM).
- Perform Interval (Time- or Cycle-) Based actions (PM).
- Determine that redesign will solve the problem and accept the failure risk, or determine that no maintenance action will reduce the probability of failure and install redundancy.
- Perform no action and choose to repair following failure (Run-to-Failure).
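The four outcomes above can be sketched as a small decision function. This is only an illustration: the logic tree figure is not reproduced here, so the order of the questions, the field names, and the `select_outcome` function are all assumptions rather than the published decision logic.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    CONDITION_BASED = "Perform Condition-Based actions (CM)"
    INTERVAL_BASED = "Perform Interval-Based actions (PM)"
    REDESIGN_OR_REDUNDANCY = "Redesign, accept the failure risk, or install redundancy"
    RUN_TO_FAILURE = "Perform no action; repair following failure (Run-to-Failure)"


@dataclass
class FailureModeAssessment:
    """Hypothetical answers gathered during the RCM analysis of one failure mode."""
    failure_is_acceptable: bool      # consequences of failure are tolerable
    effective_cm_task_exists: bool   # the onset of failure can be detected
    effective_pm_task_exists: bool   # an age-related task reduces failure probability


def select_outcome(assessment: FailureModeAssessment) -> Outcome:
    """Map an assessment to one of the four outcomes (assumed question order)."""
    if assessment.failure_is_acceptable:
        return Outcome.RUN_TO_FAILURE
    if assessment.effective_cm_task_exists:
        return Outcome.CONDITION_BASED
    if assessment.effective_pm_task_exists:
        return Outcome.INTERVAL_BASED
    # No maintenance task is applicable and effective for this failure mode.
    return Outcome.REDESIGN_OR_REDUNDANCY


if __name__ == "__main__":
    pump_bearing = FailureModeAssessment(
        failure_is_acceptable=False,
        effective_cm_task_exists=True,
        effective_pm_task_exists=True,
    )
    print(select_outcome(pump_bearing).value)
```

Checking for a condition-based task before an interval-based one is a design choice made for this sketch. A real program would follow whatever order its adopted logic tree prescribes.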