Calculating a and b for the Weibull Distribution from the Mean and Variance

Dear Statman,

I am a student looking for an easy way to calculate the "a" and the "b" in the Weibull distribution by using just the mean and the variance. Is this possible and if so, is it a good technique to use?

Sincerely,
Hopeful

Dear Hopeful-

There is a way to do what you want, but I don't recommend it. First, the formulas are a little complicated. Also, unless your sample size is large, the variance you calculate will have a lot of spread associated with it. That means you won't be very confident that you are near the "true" variance. Thus, your estimates for a and b might not be high quality. It is better to use the method of maximum likelihood as described previously (see E-Math News Vol. 2 No. 5). Most statistics software packages will do that procedure for you.

However, if you still want to use the mean and the variance, you must first be very sure the data do indeed come from a Weibull distribution. For those of you who aren’t familiar with this, the cumulative distribution function (cdf) for the Weibull distribution is given by

F(t) = 1 - exp[-(t/a)^b]

where "a" is the scale factor and "b" is the shape factor. The cdf gives the probability of failure up to a certain time, t. The parameters a and b are estimated from the failure times. If the failure data do come from a Weibull distribution, and you know a and b, then all you have to do is "plug and chug" using the formula above to find the fraction of parts you expect to fail by a desired time.
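As a minimal sketch of that "plug and chug" step, the cdf can be coded directly from the formula above. The function name weibull_cdf and the parameter values a = 200 and b = 2 are assumptions chosen only for illustration; they are not estimates from any data set.

```python
import math

def weibull_cdf(t, a, b):
    """Fraction of parts expected to fail by time t: F(t) = 1 - exp[-(t/a)^b]."""
    if t <= 0:
        return 0.0  # no failures before time zero
    return 1.0 - math.exp(-((t / a) ** b))

# Hypothetical scale (a) and shape (b), for illustration only.
a, b = 200.0, 2.0
for t in (50, 100, 200):
    print(f"F({t}) = {weibull_cdf(t, a, b):.3f}")
```

A handy check: at t = a the formula gives 1 - exp(-1), about 0.632, whatever the value of b, so roughly 63% of parts are expected to have failed by the time t reaches the scale factor.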

Suppose we have a sample of parts and all of them fail. Suppose also that from previous experience we expect the data to come from a Weibull distribution. The mean and the variance of the sample can be calculated using the formulas below.

sample mean = (1/n) * Σ t_i

sample variance = [1/(n - 1)] * Σ (t_i - sample mean)^2

where t_i is the ith failure time and Σ indicates summation over all n failure times. The sample mean is just the sum of all the failure times divided by the sample size, n. The sample variance is the sum of the squared differences between the mean and each failure time, divided by n - 1.

Usually, random variables (like failure times) are described by "parametric" distributions. A parametric distribution is one where knowledge of one or more values (parameters) completely specifies the distribution. When we know the parameters, we can calculate probabilities as in the equation for the Weibull cdf above. As another example, some of you may be familiar with the normal distribution (bell curve). It is described by two parameters, usually denoted by μ (mean) and σ^2 (variance). Suppose we observe a sample from the normal distribution (like heights or weights of people). If we know μ and σ^2, we can find the probability of observing someone, say, with a height less than 5 feet or a weight over 180 lbs. The sample mean and variance formulas above are used to estimate μ and σ^2, respectively. It turns out that for the normal distribution, the parameters are the mean and variance. In general, this simple relationship between distribution parameters and the distribution mean and variance does not hold.

In the case of the Weibull, the parameters are called a and b. The link between these parameters and the Weibull mean and variance is more complicated than for the normal distribution. The mean and the variance of a Weibull random variable are given by:

mean = a*gamma[1 + (1/b)]

variance = a^2*gamma[1 + (2/b)] - [a*gamma(1 + 1/b)]^2

where a is the scale factor, b is the shape factor and "gamma" is the gamma function given by:

gamma(z) = integral from 0 to infinity of x^(z - 1) * exp(-x) dx

Most spreadsheet applications can calculate the gamma function. So, if we had a very large sample (>10,000) of failure times from a Weibull distribution (a and b known) and calculated the sample mean and variance, these values would be very close to the Weibull mean and variance determined using the above formulas.
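If a spreadsheet is not handy, the two formulas above can be sketched with the gamma function from Python's standard library. The function name weibull_mean_var and the values a = 200 and b = 2 are assumptions used only to illustrate the calculation.

```python
import math

def weibull_mean_var(a, b):
    """Return (mean, variance) of a Weibull with scale a and shape b."""
    g1 = math.gamma(1.0 + 1.0 / b)   # gamma[1 + (1/b)]
    g2 = math.gamma(1.0 + 2.0 / b)   # gamma[1 + (2/b)]
    mean = a * g1
    variance = a * a * g2 - mean * mean
    return mean, variance

# Illustrative parameter values, not estimates from data.
mean, variance = weibull_mean_var(200.0, 2.0)
print(f"mean = {mean:.2f}, variance = {variance:.2f}")
```

With a = 200 and b = 2 this gives a mean of about 177.25 and a variance of about 8584.07.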

Let’s do an example. Suppose we test 10 diodes and get the following failure times (in hours):

56.4, 106.3, 126.0, 152.6, 168.7, 203.7, 206.3, 276.2, 304.9, 309.3
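The sample statistics for these ten failure times can be checked with a short sketch using Python's standard statistics module. The variable names are my own; statistics.variance divides by n - 1, matching the sample variance formula given earlier.

```python
import statistics

# Diode failure times in hours, copied from the list above.
failure_times = [56.4, 106.3, 126.0, 152.6, 168.7,
                 203.7, 206.3, 276.2, 304.9, 309.3]

sample_mean = statistics.mean(failure_times)
sample_var = statistics.variance(failure_times)  # uses the n - 1 divisor

print(f"sample mean = {sample_mean:.2f}")
print(f"sample variance = {sample_var:.2f}")
```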

Using the formulas for the sample mean and variance, we find that the sample mean is 191 and the sample variance is 7346. Now we have to solve for a and b using the equations for the Weibull mean and variance. Unfortunately, this is a little difficult to do analytically. But, there are a few alternatives. One thing we can do is plug in different numbers for a and b until we get close to the right values. The following table illustrates this.

 

  a     b     G[1 + (1/b)]   G[1 + (2/b)]   mean = a*G[1 + (1/b)]   Var = a^2*G[1 + (2/b)] - mean^2
200     1     1.00           2.00           200.00                  40000.00
200     1.5   0.90           1.19           180.55                  15027.61
200     2     0.89           1.00           177.25                   8584.07
200     2.5   0.89           0.93           177.45                   5765.87
200     3     0.89           0.90           178.60                   4213.32
250     1     1.00           2.00           250.00                  62500.00
250     1.5   0.90           1.19           225.69                  23480.64
250     2     0.89           1.00           221.56                  13412.61
250     2.5   0.89           0.93           221.82                   9009.17
250     3     0.89           0.90           223.24                   6583.31
300     1     1.00           2.00           300.00                  90000.00
300     1.5   0.90           1.19           270.82                  33812.13
300     2     0.89           1.00           265.87                  19314.17
300     2.5   0.89           0.93           266.18                  12973.20
300     3     0.89           0.90           267.89                   9479.96
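A trial-and-error grid like this one does not have to be filled in by hand. The sketch below evaluates the Weibull mean and variance formulas over the same values of a and b; the function name weibull_mean_var and the printed layout are my own choices.

```python
import math

def weibull_mean_var(a, b):
    """Return (mean, variance) of a Weibull with scale a and shape b."""
    g1 = math.gamma(1.0 + 1.0 / b)
    g2 = math.gamma(1.0 + 2.0 / b)
    mean = a * g1
    return mean, a * a * g2 - mean * mean

# Same trial values of a and b as in the table.
rows = []
for a in (200, 250, 300):
    for b in (1, 1.5, 2, 2.5, 3):
        mean, var = weibull_mean_var(a, b)
        rows.append((a, b, mean, var))

print(f"{'a':>5} {'b':>5} {'mean':>10} {'variance':>12}")
for a, b, mean, var in rows:
    print(f"{a:>5} {b:>5} {mean:>10.2f} {var:>12.2f}")
```

Adding more trial values to the two loops gives a finer grid, which is essentially what a contour plot of the mean and variance is built from.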

 

In the above table, "G" means the gamma function. We want to find values of a and b that will produce a mean of 191 and a variance of 7346. None of the values listed above fit the bill exactly. A crude guess would be between 200 and 250 for a and between 2 and 2.5 for b.

A better method is graphical. We have formulas for the Weibull mean and variance for different values of a and b. If we produce a "contour plot" of the mean and variance, we can estimate the shape and scale parameters more accurately (various software packages are available that can do this for you). The contour plot is a graph that gives the value of the response we are interested in for different values of our inputs, in this case a and b. Figures 1 and 2 are contour plots for the mean and variance. The contour lines are labeled with different numbers. The numbers indicate that the response has that particular value any place on that contour line. From Figure 1, we see that the line for 191 should lie roughly parallel to the b axis, between the lines for 180 and 200. From Figure 2, the line for 7.346 (7346 divided by 1000 for the plot) should lie between the lines for 5 and 10. The graphs are marked with an "X" where I would guess both the mean = 191 and the variance = 7.346. This point corresponds to a value of a = 210 and b = 2.38.

Of course, if your software can handle it, you can get it to solve the equations numerically for a and b. I did that using Maple and got values of a = 215.5 and b = 2.37. The method of maximum likelihood gives estimates of a = 215.5 and b = 2.57. These are all in fair agreement with the true values of a = 267 and b = 2.25 I used to simulate the data. If we had a much larger sample, all the estimates would have been better.

Lastly, note that we had complete data in this case. That means all parts had failed. Suppose some parts hadn’t failed, but the hour they were removed from the test was known. Those times are called "censoring" times. Unfortunately, using the mean and the variance to find a and b wouldn’t work because we don’t know how to estimate the sample mean and variance easily with censored data. In that case, I would recommend using the method of maximum likelihood.

As you can see, there are several different ways of calculating values of a and b. Using the sample mean and variance to calculate a and b can be a little tricky. I guess if you are desperate enough you can use it, but better techniques are available to you. I hope this has helped.
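For readers without Maple, the numerical solution mentioned above can be sketched in plain Python. This is not the Maple procedure itself but a simple bisection, and it rests on one observation: the ratio variance/mean^2 depends only on b and shrinks as b grows, so b can be found first and a recovered from the mean formula. The function names and the search bounds of 0.2 and 50 for b are my own assumptions.

```python
import math

def cv_squared(b):
    """variance / mean^2 for a Weibull with shape b (the scale a cancels out)."""
    g1 = math.gamma(1.0 + 1.0 / b)
    g2 = math.gamma(1.0 + 2.0 / b)
    return g2 / (g1 * g1) - 1.0

def weibull_from_moments(mean, var, lo=0.2, hi=50.0, tol=1e-10):
    """Solve the Weibull mean and variance equations for (a, b) by bisection on b."""
    target = var / (mean * mean)
    if not (cv_squared(hi) < target < cv_squared(lo)):
        raise ValueError("b lies outside the assumed search bounds")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cv_squared(mid) > target:
            lo = mid   # too much relative spread: b must be larger
        else:
            hi = mid   # too little relative spread: b must be smaller
    b = 0.5 * (lo + hi)
    a = mean / math.gamma(1.0 + 1.0 / b)   # from mean = a*gamma[1 + (1/b)]
    return a, b

# Sample mean and variance from the diode example.
a_hat, b_hat = weibull_from_moments(191.0, 7346.0)
print(f"a = {a_hat:.1f}, b = {b_hat:.2f}")
```

Run on the diode example, this should land close to the a = 215.5 and b = 2.37 quoted above. It still needs complete data and inherits all the sample-size caveats already discussed.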

Thanks,

Statman

 

 

 

 


Objective Design of Experiments

TEL: 866-683-6173
TEL: 360-510-6611
FAX: 206-905-0849

Math Options Inc.
336 36th St. # 179
Bellingham, WA 98225-6580