Weibull Distribution
Dear Statman- I have heard the term "Weibull distribution," but I don’t know what it means. Could you please explain it?
Signed, Weibull’s wobble but they don’t fall down.
Dear Wobbly- The Weibull distribution is often used to describe the life times of parts. These can be light bulbs, capacitors, disk drives, ball bearings, etc. When a number of parts are put on test, they don’t all fail at the same time (if they do, you might wonder if something went wrong). Usually, there is some spread in the failure times.
In the past, I have discussed the normal distribution, or bell curve. It is a very handy tool in describing all sorts of different data. However, it may not work well here. That’s because the normal distribution allows some observations to be negative. When you life test something, you know it didn’t fail before time t = 0. So, the normal distribution won’t do.
If parts fail according to a Weibull distribution, the probability that any single part will have failed by time t is F(t) = 1 – exp[-(t/a)^b], where "a" is called the scale parameter, "b" is called the shape parameter, and F is called the cumulative distribution function. (The "^" means "raised to the power.") If we knew a and b, then we could plug them into the above formula and calculate F (the probability of failure) at any time, t.
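To make "plug them into the formula" concrete, here is a minimal Python sketch of F(t). The function name weibull_cdf and the values a = 300 hours and b = 13 are illustrative assumptions, not something estimated at this point in the discussion.

```python
import math

def weibull_cdf(t, a, b):
    """Probability that a part has failed by time t (scale a, shape b)."""
    if t <= 0:
        return 0.0  # nothing can fail before the test starts
    return 1.0 - math.exp(-((t / a) ** b))

# Illustrative parameters only: a = 300 hours, b = 13.
for t in (250, 300, 350):
    print(t, round(weibull_cdf(t, 300.0, 13.0), 3))
```

One handy check: at t = a the formula gives 1 – exp(-1), so about 63% of parts have failed by the time the clock reaches the scale parameter, whatever the shape is.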
The parameters a and b can be estimated from the data. An example will make this clear. Suppose some light bulbs are life tested and fail at the following times: 270, 289, 290, 292, 293, 296, 310, 313, 339, and 345 hours. The equation for F(t) can be turned into a regression equation as follows:
F(t) = 1 – exp[-(t/a)^b]
ln[1 – F(t)] = -(t/a)^b
ln(-ln[1 – F(t)]) = b ln(t) – b ln(a)
or Y = mX + c, where: Y = ln(-ln[1 – F(t)]), m = b, X = ln(t), and c = -b ln(a).
"ln" means the natural log. Now we have the more familiar equation for a straight line with slope m and y-intercept c. The next step is to estimate F(t). Note that the failure times above are ranked from lowest to highest. There are 10 failure times, and each one can be assigned a rank of 1, 2, 3, … etc. The probability of failure at a particular time t, F(t), can be roughly estimated by the rank of the failure time divided by the sample size, in this case, 10. Let’s put all this in a table:
Weibull life data
rank | failure time (hours) | F   | Y = ln(-ln[1 – F(t)]) | X = ln(t)
1    | 270                  | 0.1 | -2.250                | 5.598
2    | 289                  | 0.2 | -1.500                | 5.666
3    | 290                  | 0.3 | -1.031                | 5.670
4    | 292                  | 0.4 | -0.672                | 5.677
5    | 293                  | 0.5 | -0.367                | 5.680
6    | 296                  | 0.6 | -0.087                | 5.690
7    | 310                  | 0.7 | 0.186                 | 5.737
8    | 313                  | 0.8 | 0.476                 | 5.746
9    | 339                  | 0.9 | 0.834                 | 5.826
10   | 345                  | 1   | -                     | -
The last row has no data because F = 1 there, which would require ln(0), and that isn’t defined.
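If you would rather not do the table by hand, the columns can be rebuilt with a few lines of Python. This is only a sketch: the function name weibull_plot_points is made up for illustration, and it uses the same rough estimate F = rank / n described above.

```python
import math

times = [270, 289, 290, 292, 293, 296, 310, 313, 339, 345]  # hours, from the example

def weibull_plot_points(failure_times):
    """Return (rank, t, F, Y, X) rows; Y is None where F = 1."""
    ts = sorted(failure_times)
    n = len(ts)
    rows = []
    for rank, t in enumerate(ts, start=1):
        F = rank / n  # rough estimate: rank divided by sample size
        Y = math.log(-math.log(1.0 - F)) if F < 1.0 else None  # ln(0) is undefined
        rows.append((rank, t, F, Y, math.log(t)))
    return rows

for rank, t, F, Y, X in weibull_plot_points(times):
    y_text = "-" if Y is None else f"{Y:.3f}"
    print(f"{rank:>2} | {t} | {F:.1f} | {y_text} | {X:.3f}")
```

The sketch still prints ln(t) for the last failure time, but that point can’t be used in the fit because it has no Y value.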
Plotting this data yields:
Using regression software, the slope of the best-fit line
is estimated to be 14.2, and the intercept to be -81.2. From above, the slope is an
estimate of b, and since c = -b ln(a), a is found from a = exp(-c/b) to be 309.5 (the
rounded slope and intercept shown here give a slightly different number). Using another
technique called "maximum likelihood", the estimates of b and a are 13.6 and 314.5,
respectively. It turns out the "real" values are b = 13 and a = 300. So, both sets of
estimates are pretty good.
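For readers without regression software, both fits can be sketched in plain Python. Treat this as one possible approach under stated assumptions rather than the method behind the numbers above: the function names are invented for illustration, the regression is ordinary least squares on the nine usable points, and the maximum likelihood part solves the standard Weibull likelihood equation for b by bisection, assuming every part was run to failure and the failure times are not all identical.

```python
import math

times = [270, 289, 290, 292, 293, 296, 310, 313, 339, 345]  # hours, from the example

def fit_by_regression(ts):
    """Least-squares line through (ln t, ln(-ln[1 - F])) with F = rank / n."""
    ts = sorted(ts)
    n = len(ts)
    # Drop the last point: F = 1 there, so Y is undefined.
    xs = [math.log(t) for t in ts[:-1]]
    ys = [math.log(-math.log(1.0 - rank / n)) for rank in range(1, n)]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    a = math.exp(-intercept / slope)  # from c = -b ln(a)
    return a, slope, intercept

def fit_by_max_likelihood(ts, lo=0.01, hi=200.0, steps=200):
    """Solve the likelihood equation for b by bisection, then compute a."""
    t_max = max(ts)
    s = [t / t_max for t in ts]  # rescale so s**b cannot overflow
    mean_log = sum(math.log(v) for v in s) / len(s)

    def score(b):
        # Zero of this function is the maximum likelihood estimate of b.
        w = [v ** b for v in s]
        return sum(wi * math.log(v) for wi, v in zip(w, s)) / sum(w) - 1.0 / b - mean_log

    for _ in range(steps):
        mid = (lo + hi) / 2.0
        if score(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    b = (lo + hi) / 2.0
    a = t_max * (sum(v ** b for v in s) / len(s)) ** (1.0 / b)
    return a, b

a_reg, b_reg, c_reg = fit_by_regression(times)
a_ml, b_ml = fit_by_max_likelihood(times)
print(f"regression:         a = {a_reg:.1f}, b = {b_reg:.1f}, intercept = {c_reg:.1f}")
print(f"maximum likelihood: a = {a_ml:.1f}, b = {b_ml:.1f}")
```

Both sets of results should land near the figures quoted above. Small differences in the last digit are to be expected, since rounding and the choice of software can nudge the answers.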
If you tested thousands of parts and made up a histogram
of the life times, it would look like a slightly "lop-sided" bell curve. For a shape
parameter like the one here, the longer tail is on the early-failure side.
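You can see the lop-sided shape without testing thousands of real parts by simulating them. The sketch below draws 5000 made-up life times using the "real" values a = 300 and b = 13 from the example and prints a rough text histogram. The function names, the fixed seed, and the 10-hour bin width are arbitrary choices for illustration.

```python
import random

def simulate_lifetimes(n, a, b, seed=0):
    """Draw n Weibull life times with scale a and shape b."""
    rng = random.Random(seed)  # fixed seed so the picture is repeatable
    # random.weibullvariate takes the scale first, then the shape.
    return [rng.weibullvariate(a, b) for _ in range(n)]

def text_histogram(values, bin_width=10.0, chars=50):
    """Return one line per bin, with a bar scaled to the tallest bin."""
    counts = {}
    for v in values:
        start = bin_width * int(v // bin_width)
        counts[start] = counts.get(start, 0) + 1
    peak = max(counts.values())
    lines = []
    for start in sorted(counts):
        bar = "#" * round(chars * counts[start] / peak)
        lines.append(f"{start:5.0f} | {bar}")
    return lines

lifetimes = simulate_lifetimes(5000, a=300.0, b=13.0)
print("\n".join(text_histogram(lifetimes)))
```

With a shape this large, the bars should trail off slowly toward the short life times and drop off more sharply past the peak.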

Here is another interesting point: the shape parameter b can also
tell us something about the failure rate. It turns out that when b < 1, the failure
rate is decreasing, when b = 1 it is constant, and when b > 1 it increases. Suppose your
life data come from a Weibull distribution and you find that b < 1. That means that as
time goes on, the failure rate becomes smaller. This might be because you have defects in
some of your parts causing high failure rates at the beginning (so-called infant
mortality). As the defective parts die, the failure rate goes down. Alternatively, if
b > 1, then the failure rate is increasing. As parts start to approach their maximum
possible life they will begin wearing out, causing an increased failure rate.
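The effect of b is easy to check numerically. For a Weibull distribution the instantaneous failure rate (statisticians call it the hazard rate) is h(t) = (b/a)(t/a)^(b – 1). That formula is a standard result that isn’t derived in this column, and the function name and the value a = 300 in this sketch are illustrative assumptions.

```python
def weibull_failure_rate(t, a, b):
    """Instantaneous failure rate h(t) = (b/a) * (t/a)**(b - 1), for t > 0."""
    return (b / a) * (t / a) ** (b - 1)

def trend(rates):
    """Describe whether the first rate is above, below, or equal to the last."""
    if rates[0] > rates[-1]:
        return "decreasing"
    if rates[0] < rates[-1]:
        return "increasing"
    return "constant"

a = 300.0  # illustrative scale, in hours
for b in (0.5, 1.0, 13.0):
    rates = [weibull_failure_rate(t, a, b) for t in (100.0, 200.0, 300.0)]
    print(f"b = {b}: failure rate is {trend(rates)}")
```

The three shapes tried here correspond to infant mortality (b < 1), a constant failure rate (b = 1), and wear-out (b > 1).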
There are all sorts of special cases of life data that
statisticians have found ways around. For example, what happens if two parts fail at the
same time? What happens if you have to end the test, but some parts haven’t failed?
How do you take those non-failures into account? These cases require more advanced
statistical techniques. If you are interested, drop me a line and I would be happy to
explain them.