Reliability predictions are an important tool for making design trade-off decisions and estimating future system reliability. They are often used to make initial product support decisions, such as how many spares are required to support fielded systems. Inaccurate predictions can lead to overly conservative designs or excessive spare parts procurement, either of which adds life cycle cost (LCC).
Parts stress modeling is a method used in engineering, especially electronics, to estimate the expected failure rate of a system's mechanical and electronic components. It rests on the idea that the more components a system contains, and the greater the stress they undergo in operation, the more often they will fail.
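The idea can be sketched as a sum, over all parts, of quantity times a base failure rate times a stress multiplier. The part names, base rates and stress factors below are hypothetical values chosen for illustration only; they are not taken from any published standard, and real standards define their own factors and formulas.

```python
from dataclasses import dataclass


@dataclass
class Part:
    name: str
    quantity: int
    base_rate: float      # failures per million hours (hypothetical value)
    stress_factor: float  # multiplier for operating stress (hypothetical value)


def parts_stress_rate(parts):
    """Sum quantity * base rate * stress factor over all parts."""
    return sum(p.quantity * p.base_rate * p.stress_factor for p in parts)


# Illustrative bill of materials; every number here is an assumption.
parts = [
    Part("resistor", 40, 0.002, 1.5),
    Part("capacitor", 25, 0.010, 2.0),
    Part("microcontroller", 1, 0.100, 3.0),
]

total = parts_stress_rate(parts)
print(f"{total:.3f} failures per million hours")
```

With these assumed inputs the total is about 0.92 failures per million hours. Adding parts or raising a stress factor can only increase the total, which is the behaviour the method is built on.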
Parts count modeling is a simpler variant of the method that does not take component stress into account. Various organisations have published standards specifying how parts stress modeling should be carried out. Examples from electronics include:
- MIL-HDBK-217 (US Department of Defense)
- SR-332 Reliability Prediction Procedure for Electronic Equipment
- HRD-4 (British Telecom)
- SR-1171 Methods and Procedures for System Reliability Analysis
- and many others
These “standards” produce different results, often by a factor of more than two, for the same modeled system. The differences illustrate the fact that this modeling is not an exact science. System designers often have to do the modeling using a standard specified by a customer, so that the customer can compare the results with other systems modeled in the same way.
All of these standards compute an expected overall failure rate for all the components in the system, which is not necessarily the rate at which the system as a whole fails. Systems often incorporate redundancy or fault tolerance so that they do not fail when an individual component fails.
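The gap between the summed component failure rate and the system failure rate can be illustrated with a minimal sketch. It assumes two identical, independent units with a constant failure rate and no repair, where the system survives as long as at least one unit works. The rate and mission time are hypothetical.

```python
import math


def reliability(rate, hours):
    """Survival probability for a constant failure rate (exponential model)."""
    return math.exp(-rate * hours)


def parallel_pair(rate, hours):
    """Probability that at least one of two identical, independent units survives."""
    q = 1.0 - reliability(rate, hours)  # probability that one unit has failed
    return 1.0 - q * q


rate = 20e-6    # failures per hour for one unit (hypothetical value)
hours = 10_000  # mission time (hypothetical value)

# Summed rate: any single component failure is counted as a failure.
summed = reliability(2 * rate, hours)
# Redundant system: a single unit failure does not bring the system down.
redundant = parallel_pair(rate, hours)

print(f"no failure of either unit: {summed:.3f}")
print(f"redundant system survives: {redundant:.3f}")
```

Under these assumptions the probability that neither unit fails is roughly 0.67, while the probability that the redundant system survives is roughly 0.97. The summed rate remains useful for estimating repair and spares demand, since every component failure still needs attention.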
Several companies provide programs for performing parts stress modeling calculations. It is also possible to do the modeling with a spreadsheet.
All these models implicitly assume “random failure”: individual components fail at random times but at a predictable rate, analogous to nuclear decay. One justification for this idea is that components fail by wear-out, a predictable decay after manufacture, but that the wear-out life of individual components is scattered widely about some very long mean. The observed “random” failures are then just the extreme outliers at the early edge of this distribution. However, this may not be the whole picture.
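The random-failure assumption corresponds to a constant failure rate, which gives exponentially distributed lifetimes, the same form used for radioactive decay. The sketch below uses a hypothetical rate to show two consequences of that assumption: a part that has already survived a long time is no more likely to fail in the next interval than a new one, and there is a "half-life" after which half of a population is expected to have failed.

```python
import math


def survival(rate, hours):
    """Probability that a part with a constant failure rate survives `hours`."""
    return math.exp(-rate * hours)


rate = 5e-6  # failures per hour (hypothetical value)

# Memoryless property: surviving the next 1,000 hours is equally likely
# for a part that has already run 50,000 hours and for a new part.
cond = survival(rate, 51_000) / survival(rate, 50_000)
fresh = survival(rate, 1_000)

# Time by which half of a large population is expected to have failed.
half_life = math.log(2) / rate

print(f"aged part: {cond:.6f}, new part: {fresh:.6f}")
print(f"half-life: {half_life:.0f} hours")
```

A wear-out process does not have this property, since the chance of failure rises with age, so the sketch applies only within the constant-rate model.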