UBICC IKE07 Performance Analysis for Skewed Data 191 191




Ubiquitous Computing and Communication Journal. An earlier version of this paper was published in the conference proceedings of IKE'07, June 2007.


general; is the median and $X_{0.00135} = \mu - 3\sigma$ corresponds to the lower tail.

For normal data, quantile points are easy to estimate; for non-normal data they are not. One way to deal with non-normality is to transform the non-normal data to approximately normal data using mathematical functions. Johnson [3] proposed a system of distributions based on the moment method, called the Johnson transformation system. Box and Cox [4] also used a transformation method for non-normal data by presenting a family of power transformations. Somerville and Montgomery [5] proposed using a square-root transformation to transform a skewed distribution into a normal one. The main objective of all these transformations is that one can apply conventional PCIs once the data are transformed to normal data.

Clements [6] proposed a percentile method to calculate the $C_p$ and $C_{pk}$ indices for non-normal data using the Pearson family of curves. Liu and Chen [7] proposed a modified Clements percentile method using the Burr XII distribution. Ahmad et al. [8] compared Liu and Chen's method with the commonly used Box-Cox method and concluded that the Burr method provides slightly better estimates of the PCI for non-normal data.

In this paper, we review and compare the CDF, Clements and Burr methods, which are commonly used to evaluate PCIs for non-normal data. The paper is organized as follows. The PCI methods for the comparison study are discussed in section 2. For illustration, a simulation study using Weibull, Gamma and Beta distributions is presented in sections 3 and 4, an application example with real-world data is presented in section 5, and the conclusion is given in section 6.

2. PCI FOR NON-NORMAL DATA

This section briefly reviews the three methods used in this paper.

2.1 Clements Percentile PCI Method

The Clements method is popular among quality practitioners in industry. Clements [6] proposed that $6\sigma$ in equation (1) be replaced by the length of the interval between the upper and lower 0.135 percentage points of the distribution of $X$. Therefore, the denominator in equation (1) can be replaced by $(U_p - L_p)$, i.e.

$$C_p = \frac{usl - lsl}{U_p - L_p} \tag{5}$$

where $U_p$ is the upper percentile, i.e. the 99.865th percentile of the observations, and $L_p$ is the lower percentile, i.e. the 0.135th percentile of the observations. Since the median $M$ is the preferred central value for a skewed distribution, he defined $C_{pu}$ and $C_{pl}$ as follows:

$$C_{pu} = \frac{usl - M}{U_p - M} \tag{6}$$

$$C_{pl} = \frac{M - lsl}{M - L_p} \tag{7}$$

and

$$C_{pk} = \min\{C_{pu}, C_{pl}\} \tag{8}$$
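The percentile-based indices in equations (6)-(8) can be sketched in a few lines of Python. This is only an illustration: it plugs in exact distribution percentiles instead of the Pearson-curve estimates that the Clements method uses, and it assumes a Weibull process with shape 1.2 and scale 1.0 and USL = 5.0. The function names `weibull_quantile` and `percentile_indices` are hypothetical.

```python
import math

def weibull_quantile(p, shape, scale):
    """Inverse CDF of a two-parameter Weibull distribution."""
    return scale * (-math.log(1.0 - p)) ** (1.0 / shape)

def percentile_indices(lower, median, upper, lsl=None, usl=None):
    """Equations (6)-(8): percentile-based C_pu, C_pl and C_pk.

    A one-sided specification leaves the missing index as None.
    """
    cpu = (usl - median) / (upper - median) if usl is not None else None
    cpl = (median - lsl) / (median - lower) if lsl is not None else None
    available = [v for v in (cpu, cpl) if v is not None]
    cpk = min(available) if available else None
    return cpu, cpl, cpk

# Assumed illustration: Weibull with shape 1.2 and scale 1.0, USL = 5.0.
SHAPE, SCALE, USL = 1.2, 1.0, 5.0
L_p = weibull_quantile(0.00135, SHAPE, SCALE)   # 0.135th percentile
M = weibull_quantile(0.5, SHAPE, SCALE)         # median
U_p = weibull_quantile(0.99865, SHAPE, SCALE)   # 99.865th percentile
cpu, cpl, cpk = percentile_indices(L_p, M, U_p, usl=USL)
print(f"L_p={L_p:.4f}  M={M:.4f}  U_p={U_p:.4f}  C_pu={cpu:.4f}")
```

With only an upper specification limit, $C_{pk}$ reduces to $C_{pu}$, which is the index the rest of the paper compares.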

The Clements approach uses the standard estimators of skewness and kurtosis, which are based on the 3rd and 4th moments respectively and may not be reliable for very small sample sizes [7]. Wu et al. [9] conducted a study indicating that the Clements method cannot accurately measure the capability indices, especially when the underlying data distribution is skewed.

2.2 Burr Percentile PCI Method

Burr [11] proposed a distribution called the Burr XII distribution, whose probability density function is defined by:

$$f(x) = \begin{cases} \dfrac{c\,k\,x^{c-1}}{(1+x^{c})^{k+1}} & \text{if } x \ge 0;\ c \ge 1,\ k \ge 1 \\ 0 & \text{if } x < 0 \end{cases} \tag{9}$$

The cumulative distribution function is defined by:

$$F(x) = 1 - \frac{1}{(1+x^{c})^{k}} \quad \text{if } x \ge 0;\ c \ge 1,\ k \ge 1 \tag{10}$$

where $c$ and $k$ represent the skewness and kurtosis coefficients of the Burr distribution respectively.
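Equations (9) and (10) translate directly into code, and equation (10) can be inverted in closed form to give quantiles. The following is a minimal sketch; the function names and the parameter values $c = 3$, $k = 5$ are illustrative assumptions and are not taken from the paper.

```python
import math

def burr_pdf(x, c, k):
    """Equation (9): Burr XII density, zero for negative x."""
    if x < 0:
        return 0.0
    return c * k * x ** (c - 1) / (1.0 + x ** c) ** (k + 1)

def burr_cdf(x, c, k):
    """Equation (10): Burr XII cumulative distribution function."""
    if x < 0:
        return 0.0
    return 1.0 - (1.0 + x ** c) ** (-k)

def burr_quantile(p, c, k):
    """Inverse of equation (10): solve F(x) = p for x, with 0 <= p < 1."""
    return ((1.0 - p) ** (-1.0 / k) - 1.0) ** (1.0 / c)

# Illustrative parameters only; they are not taken from the paper.
c, k = 3.0, 5.0
for p in (0.00135, 0.5, 0.99865):
    x = burr_quantile(p, c, k)
    print(f"p={p:<8} x={x:.5f}  F(x)={burr_cdf(x, c, k):.5f}")
```

The closed-form quantile is what makes the Burr XII family convenient for percentile-based capability indices: no numerical root finding is needed.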

Liu and Chen [7] introduced a modification of the Clements method in which the Pearson curve percentiles are replaced with percentiles from an appropriate Burr distribution. The modified method is as follows:

• Estimate the sample mean, sample standard deviation, skewness and kurtosis of the original sample data.

• Calculate the standardized moments of skewness ($\alpha_3$) and kurtosis ($\alpha_4$) for the given sample size $n$ (see Appendix I for details).

• Use the values of $\alpha_3$ and $\alpha_4$ to select the appropriate Burr parameters $c$ and $k$ (Burr IW [11]). Then use the standardized tails of the Burr XII distribution to determine the standardized 0.135, 50 and 99.865 percentiles ($X_{0.00135}$, $X_{0.50}$, $X_{0.99865}$).

• Calculate the estimated lower, median and upper percentiles using the Burr table as follows:

$$L_p = \bar{x} + s\,X_{0.00135} \tag{11}$$

$$U_p = \bar{x} + s\,X_{0.99865} \tag{12}$$

$$M = \bar{x} + s\,X_{0.50} \tag{13}$$

• Calculate the process capability indices using equations (5)-(8).
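The last steps of this procedure can be sketched as follows. The sketch assumes the Burr parameters $c$ and $k$ have already been selected, and it computes the standardized percentiles from the closed-form Burr XII quantile and moments rather than reading them from the Burr tables, so it is a stand-in for the table lookup and not the tabulated procedure itself. All function names and the numeric inputs are hypothetical.

```python
import math

def burr_quantile(p, c, k):
    """Quantile of Burr XII, inverting F(x) = 1 - (1 + x**c)**(-k)."""
    return ((1.0 - p) ** (-1.0 / k) - 1.0) ** (1.0 / c)

def burr_raw_moment(r, c, k):
    """E[X**r] for Burr XII; finite only when c * k > r."""
    return math.gamma(1.0 + r / c) * math.gamma(k - r / c) / math.gamma(k)

def standardized_percentile(p, c, k):
    """(Burr quantile - Burr mean) / Burr standard deviation."""
    mean = burr_raw_moment(1, c, k)
    var = burr_raw_moment(2, c, k) - mean ** 2
    return (burr_quantile(p, c, k) - mean) / math.sqrt(var)

def burr_percentile_cpu(xbar, s, usl, c, k):
    """Equations (12), (13) and (6) for an upper specification limit."""
    upper = xbar + s * standardized_percentile(0.99865, c, k)
    median = xbar + s * standardized_percentile(0.5, c, k)
    return (usl - median) / (upper - median)

# Hypothetical inputs: sample mean, sample standard deviation, USL,
# and Burr parameters chosen by hand for illustration.
cpu = burr_percentile_cpu(xbar=2.0, s=1.0, usl=6.0, c=3.0, k=5.0)
print(f"C_pu = {cpu:.4f}")
```

Because the percentiles are rescaled by the sample mean and standard deviation, the resulting $C_{pu}$ is unchanged if the data and the specification limit are shifted by the same constant.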

2.3 CDF PCI Method

Wierda [12] introduced a new approach to evaluating process capability for non-normal data using the cumulative distribution function (CDF). Castagliola [13] used the CDF approach to compute the proportion of non-conforming items and then estimated the capability index from this proportion. Castagliola showed the relationship between process capability and the proportion of non-conforming items, and used the CDF method to evaluate the PCI for non-normal data by fitting a Burr distribution to the process data. He used a polynomial approximation to replace the empirical function in the Burr distribution, and then used the method given by equation (14) to calculate $C_p$. We give a short proof of this well-known result in Appendix II.

Using the CDF method, $C_p$ and $C_{pk}$ are defined by:

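$$C_p = \frac{1}{3}\,\Phi^{-1}\!\left(0.5 + 0.5\int_{lsl}^{usl} f(x)\,dx\right) \tag{14}$$

$$C_{pk} = \min(C_{pu}, C_{pl}) \tag{15}$$

where

$$C_{pl} = \frac{1}{3}\,\Phi^{-1}\!\left(0.5 + \int_{lsl}^{T} f(x)\,dx\right) \tag{16}$$

$$C_{pu} = \frac{1}{3}\,\Phi^{-1}\!\left(0.5 + \int_{T}^{usl} f(x)\,dx\right) \tag{17}$$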

where $f(x)$ represents the probability density function of the process and $T$ represents the process mean for normal data and the process median for non-normal data. In this paper, $f(x)$ in Equation (14) is replaced by Equation (9), i.e. the Burr density function (see details in Appendix III).
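The one-sided CDF indices can be sketched with the Python standard library alone. For illustration the sketch uses a closed-form Weibull CDF in place of a fitted Burr density, so that the integral of $f(x)$ is a simple difference of CDF values. The Weibull parameters (shape 1.2, scale 1.0), the limit USL = 5.0 and all function names are assumptions made for this example.

```python
import math
from statistics import NormalDist

def weibull_cdf(x, shape, scale):
    """CDF of a two-parameter Weibull distribution (closed form)."""
    return 1.0 - math.exp(-((x / scale) ** shape)) if x > 0 else 0.0

def weibull_median(shape, scale):
    """Median of the Weibull distribution, used as the target T."""
    return scale * math.log(2.0) ** (1.0 / shape)

def cdf_cpu(cdf, target, usl):
    """C_pu = (1/3) * Phi^{-1}(0.5 + integral of f from T to usl)."""
    mass = cdf(usl) - cdf(target)
    return NormalDist().inv_cdf(0.5 + mass) / 3.0

def cdf_cpl(cdf, target, lsl):
    """C_pl = (1/3) * Phi^{-1}(0.5 + integral of f from lsl to T)."""
    mass = cdf(target) - cdf(lsl)
    return NormalDist().inv_cdf(0.5 + mass) / 3.0

# Assumed illustration: Weibull with shape 1.2, scale 1.0 and USL = 5.0.
SHAPE, SCALE, USL = 1.2, 1.0, 5.0
T = weibull_median(SHAPE, SCALE)
cpu = cdf_cpu(lambda x: weibull_cdf(x, SHAPE, SCALE), T, USL)
print(f"T={T:.4f}  C_pu={cpu:.4f}")
```

Any distribution with a computable CDF can be passed in the same way; a fitted Burr XII CDF would replace the lambda.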

3. SIMULATION STUDY

Three non-normal distributions, Gamma, Weibull and Beta, have been used to generate random data in this simulation. These distributions are used to investigate the effects of non-normal data on the process capability index. Their parameter values can represent mild to severe departures from normality. The parameters are selected so that our simulation results can be compared with existing results in the literature that use the same parameters.

The probability density function of the Gamma distribution, with parameters $\alpha$ and $\beta$, is given by

$$f(x; \alpha, \beta) = \frac{x^{\alpha-1} e^{-x/\beta}}{\beta^{\alpha}\,\Gamma(\alpha)}, \quad x \ge 0;\ \alpha, \beta > 0 \tag{18}$$

The parameters used in this simulation are shape = 4.0 and scale = 0.5.

Figure 1: pdf of Gamma distribution with parameters (shape = 4.0, scale = 0.5)





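The probability density function of the Weibull distribution with shape ($\alpha$) and scale ($\beta$) is given by

$$f(x; \alpha, \beta) = \frac{\alpha}{\beta^{\alpha}}\,x^{\alpha-1} e^{-(x/\beta)^{\alpha}}, \quad x \ge 0;\ \alpha, \beta > 0 \tag{19}$$

The parameters used in this simulation are $\alpha = 1.0$ and $\beta = 1.2$.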

Figure 2: pdf of Weibull distribution with parameters (α = 1.0, β = 1.2)

The probability density function of the Beta distribution with shape 1 ($\alpha$) and shape 2 ($\beta$) is given by

$$f(x) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\,x^{\alpha-1}(1-x)^{\beta-1}, \quad 0 < x < 1 \tag{20}$$

The parameters used in this simulation are $\alpha = 4.4$ and $\beta = 13.3$.

Figure 3: pdf of Beta distribution with parameters (α = 4.4, β = 13.3)
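Random data from the three distributions can be generated with Python's standard `random` module, as in the hedged sketch below. One assumption needs flagging: the sketch uses Weibull shape 1.2 and scale 1.0, because that assignment is consistent with USL = 5.0 giving an exact PNC of about 0.0010, whereas shape 1.0 with scale 1.2 would give about 0.0155. The function name `generate`, the seed and the sample size are illustrative choices.

```python
import random

# Parameter sets quoted in the simulation study. The Weibull assignment
# (shape 1.2, scale 1.0) is an assumption; see the note above.
GAMMA_SHAPE, GAMMA_SCALE = 4.0, 0.5
WEIBULL_SHAPE, WEIBULL_SCALE = 1.2, 1.0
BETA_A, BETA_B = 4.4, 13.3

def generate(distribution, n, rng):
    """Draw n observations from one of the three skewed distributions."""
    if distribution == "gamma":
        # random.gammavariate(alpha, beta): alpha is shape, beta is scale.
        return [rng.gammavariate(GAMMA_SHAPE, GAMMA_SCALE) for _ in range(n)]
    if distribution == "weibull":
        # random.weibullvariate(alpha, beta): alpha is scale, beta is shape.
        return [rng.weibullvariate(WEIBULL_SCALE, WEIBULL_SHAPE) for _ in range(n)]
    if distribution == "beta":
        return [rng.betavariate(BETA_A, BETA_B) for _ in range(n)]
    raise ValueError(f"unknown distribution: {distribution}")

rng = random.Random(2007)  # fixed seed so the draw is reproducible
for name in ("gamma", "weibull", "beta"):
    sample = generate(name, 1000, rng)
    print(f"{name:<8} mean={sum(sample) / len(sample):.4f}  max={max(sample):.4f}")
```

Note that `random.weibullvariate` takes the scale first and the shape second, which is the reverse of the order used for the Gamma call.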

3.1 Comparison Criteria

The criterion for comparison in this simulation study is the proportion of non-conformances (PNC). The proportion of non-conforming units for a normal distribution can be determined by [2]

$$PNC = \Phi(-3\,C_{pu}) \tag{21}$$

The $C_{pu}$ values in table (1) are computed using equation (17), where $f(x)$ is replaced by the corresponding distribution (i.e. Gamma, Weibull or Beta). The probability of non-conforming items (PNC) is calculated using equation (21), as suggested by Castagliola [13], for all three methods (e.g. for the Gamma distribution with a $C_{pu}$ value of 0.8698, the corresponding PNC value from equation (21) is 0.0045351).
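Equation (21) is a single evaluation of the standard normal CDF. A minimal sketch using `statistics.NormalDist` from the standard library follows; the function name `pnc_from_cpu` is an illustrative choice, and the $C_{pu}$ inputs are the Gamma values reported in Table (1).

```python
from statistics import NormalDist

def pnc_from_cpu(cpu):
    """Equation (21): expected proportion of non-conformance, Phi(-3 * C_pu)."""
    return NormalDist().cdf(-3.0 * cpu)

# C_pu values as reported in Table (1) for the Gamma case.
for method, cpu in [("Clements", 0.8698), ("Burr", 0.9069), ("CDF", 1.0000)]:
    print(f"{method:<9} C_pu={cpu:.4f}  PNC={pnc_from_cpu(cpu):.5f}")
```

Because $\Phi$ is monotone, a method that underestimates $C_{pu}$ necessarily overestimates the PNC, which is the pattern seen for the Clements and Burr methods.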

Figure 4 presents a flowchart for estimating PNC and PCIs using the different methods and different non-normal distributions. The exact PNC value ($p$) in this flowchart is obtained using the following equation:

$$PNC = 1 - \int_{0}^{usl} f(x)\,dx \tag{22}$$

where $f(x)$ represents the corresponding probability density function of the Gamma, Weibull or Beta distribution.
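Since all three distributions have support starting at zero, the integral in equation (22) equals the CDF at the upper specification limit, so the exact PNC is $1 - F(usl)$. The sketch below evaluates this for the Gamma case using the closed-form (Erlang) CDF, which is valid only because the shape parameter 4 is an integer. The function names are hypothetical.

```python
import math

def erlang_cdf(x, shape, scale):
    """Gamma CDF for a positive integer shape (Erlang closed form)."""
    if x <= 0:
        return 0.0
    z = x / scale
    partial = sum(z ** i / math.factorial(i) for i in range(shape))
    return 1.0 - math.exp(-z) * partial

def exact_pnc(cdf, usl):
    """Equation (22): 1 - integral of f from 0 to usl, i.e. 1 - F(usl)."""
    return 1.0 - cdf(usl)

# Gamma(shape=4, scale=0.5) with USL = 6.3405, as listed in Table (1).
gamma_pnc = exact_pnc(lambda x: erlang_cdf(x, 4, 0.5), 6.3405)
print(f"Gamma(4, 0.5), USL=6.3405: exact PNC = {gamma_pnc:.5f}")
```

For a non-integer shape, or for the Beta case, the CDF has no elementary closed form and would need numerical integration or a statistics library.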

Generate sample data using a non-normal distribution (e.g. Gamma, Weibull, Beta etc.)

• Compute $C_{pu}$ using the CDF method (Equation (14)) and compute the PNC for the corresponding $C_{pu}$ (Equation (21)); call it $p_1$

• Compute $C_{pu}$ using the Burr method and compute the PNC for the corresponding $C_{pu}$; call it $p_2$

• Compute $C_{pu}$ using the Clements method and compute the PNC for the corresponding $C_{pu}$; call it $p_3$

Assess the efficacy of the different methods by comparing $p_1$, $p_2$, $p_3$ and the exact $p$ (Equation (22))

Figure 4: Simulation methodology flowchart

3.2 Simulation Results

The $C_{pu}^{*}$ values in table (1) are used to assess the efficacy of the three methods in estimating the process capability index for non-normal data. Table (1) shows the results of this comparison.





*Computed from Equation (21) – percentile and exact distribution

The simulation results given in Table (1) show that the $C_{pu}$ values obtained using the Clements method deviate furthest from the target $C_{pu}^{*}$ values, more so than those obtained using the Burr and CDF methods. The $C_{pu}$ values obtained using the CDF method are the closest to the $C_{pu}$ values obtained using direct distribution percentiles in the conventional approach, thus leading to better estimates of the PCIs compared with the Burr method.

Our comparison criterion is that the method which yields an expected proportion of non-conformities closest to that obtained using the exact distribution is the superior method.

Table (2) – Comparison of expected proportion of nonconformance (PNC) with exact PNC

Distribution   Clements p3   Burr p2   CDF p1    Exact p
Gamma          0.00454       0.00326   0.00135   0.0013
Weibull        0.00182       0.00170   0.00101   0.0010
Beta           0.01287       0.00844   0.00131   0.0013

The results in table 2 show that the PNC values obtained using the Clements method deviate most from the exact values. The PNC values obtained using the CDF method are close to those obtained using the exact distribution. Thus the CDF method gives better estimates of non-conformance than the commonly used Clements and Burr methods.

4. DISCUSSION

The simulation study shows that both the Burr and CDF methods estimate $C_{pu}$ values more accurately than the commonly used Clements method. Looking at the results in tables 1 and 2, we conclude that:

• The CDF method is superior to both percentile methods (Burr and Clements).

• The Burr method still performs better than the commonly used Clements method.

• The CDF method is the one for which the estimated $C_{pu}$ value deviates least from the target $C_{pu}$ value.

• For the given sample size, the PNC value obtained using the CDF method is comparable with the target PNC value obtained from the exact distribution.

During simulation, we observed that data with a moderate departure from normality provide better estimates of capability indices than data with severe departures from normality.

5. REAL DATA EXAMPLE

A case study was conducted using data from a manufacturing industry. All three methods were used to estimate the non-normal process capability for the experimental data. The data were collected from an in-control manufacturing process and are measurements of the bonding area between two surfaces, with an upper specification limit (USL = 24). The summary statistics of the process data are:

$\mu$ (mean) = 23.4809, $\sigma$ = 0.5650, $\tilde{\mu}$ (median) = 23.3963, $\mu_3$ (skewness) = 1.1098, $\mu_4$ (kurtosis) = 4.9740.

Figure 5: Histogram of the real data (horizontal axis: Data; vertical axis: Frequency)

We selected 30 samples of size 50 from these data points. For each sample, we computed the process capability index $C_{pu}$ and the proportion of non-conforming items (PNC) using the Clements, Burr and CDF methods. The mean and standard deviation of the estimated $C_{pu}$ values are given in table (3).

Table (1) – Results of the comparison

Distribution     USL      Cpu*    Cpu Clements   Cpu Burr   Cpu CDF
Gamma(4,0.5)     6.3405   1.000   0.8698         0.9069     1.0000
Weibull(1,1.2)   5.0      1.043   0.9694         0.9738     1.0292
Beta(4.4,13.3)   0.5954   1.002   0.7434         0.7965     1.0028





Figure 6: Comparison of three methods

For the CDF method, we replaced the corresponding $f(x)$ by the Burr distribution. The Burr parameters for each sample were estimated using maximum likelihood estimation. The exact PNC for the experimental data is 0.168. This value is obtained by using the upper specification limit (USL = 24) and calculating the proportion of data that falls outside the specification limit.

The results presented in table 3 indicate that the expected PNC based on 30 samples of size 50 using the CDF method is the closest estimate to the exact PNC. Table 3 also indicates that the CDF method has the least variability of the three methods.
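The ranking stated here can be checked by applying equation (21) to the mean $C_{pu}$ values reported in Table (3) and measuring each method's distance from the exact PNC of 0.168. The short sketch below does that; the variable and function names are illustrative, and the numbers are the reported ones.

```python
from statistics import NormalDist

EXACT_PNC = 0.168  # proportion of the real data above USL = 24, as reported

# Mean C_pu over 30 samples of size 50, as reported in Table (3).
mean_cpu = {"CDF": 0.313277, "Burr": 0.347917, "Clements": 0.360691}

def expected_pnc(cpu):
    """Equation (21): Phi(-3 * C_pu)."""
    return NormalDist().cdf(-3.0 * cpu)

# Absolute deviation of each method's expected PNC from the exact PNC.
deviation = {m: abs(expected_pnc(c) - EXACT_PNC) for m, c in mean_cpu.items()}
for method in sorted(deviation, key=deviation.get):
    print(f"{method:<9} PNC={expected_pnc(mean_cpu[method]):.5f}  "
          f"|PNC - exact|={deviation[method]:.5f}")
```

Sorting by absolute deviation puts the CDF method first, followed by Burr and then Clements, in line with the conclusion drawn from the table.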

6. CONCLUSIONS

This paper presents a comparison between three methods of estimating process capability and the proportion of non-conformance in the manufacturing industry. The CDF method is not sensitive to the distribution of the process data and can therefore be applied to any real data set, as long as a suitable distribution can be fitted to it. However, to apply the CDF method, one must identify the corresponding distribution. One of the significant characteristics of the Burr XII distribution is that, once the mean, variance, skewness and kurtosis of the process data are obtained, a suitable Burr distribution can be fitted using the Burr tables (Liu and Chen [7]). We therefore conclude that replacing the probability density function $f(x)$ in the CDF method with the appropriate Burr density function leads to a better estimate of the PCI and PNC for non-normal data.

Simulation studies for different non-normal distributions show that the CDF method using the Burr distribution produces better estimates of the PCI.

This paper strongly recommends further research to extend the CDF method to non-normal multivariate PCI studies.

APPENDIX I:

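The standardized moments of skewness ($\alpha_3$) and kurtosis ($\alpha_4$) for the given sample size $n$ can be computed as follows:

$$\alpha_3 = \frac{n-2}{\sqrt{n(n-1)}}\,\mathit{Skewness} \tag{1}$$

where

$$\mathit{Skewness} = \frac{n}{(n-1)(n-2)} \sum_{j}\left(\frac{x_j - \bar{x}}{s}\right)^{3} \tag{2}$$

where $\bar{x}$ is the mean of the observations and $s$ is the standard deviation.

$$\alpha_4 = \frac{(n-2)(n-3)}{(n+1)(n-1)}\,\mathit{Kurtosis} + \frac{3(n-1)}{n+1} \tag{3}$$

where

$$\mathit{Kurtosis} = \frac{n(n+1)}{(n-1)(n-2)(n-3)} \sum_{j}\left(\frac{x_j - \bar{x}}{s}\right)^{4} - \frac{3(n-1)^{2}}{(n-2)(n-3)} \tag{4}$$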
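The four formulas of this appendix can be sketched in Python as below, assuming that $s$ is the sample standard deviation with the $n-1$ divisor and that the formulas take the bias-adjusted form commonly used for sample skewness and kurtosis. Under that assumption, $\alpha_3$ and $\alpha_4$ reduce algebraically to the plain moment ratios $m_3 / m_2^{3/2}$ and $m_4 / m_2^{2}$, where $m_r$ is the $r$-th central moment with divisor $n$; this gives a convenient cross-check. The function names and the data are hypothetical.

```python
import math
from statistics import mean, stdev

def sample_skewness(xs):
    """Bias-adjusted sample skewness; s uses the n - 1 divisor."""
    n, xbar, s = len(xs), mean(xs), stdev(xs)
    return n / ((n - 1) * (n - 2)) * sum(((x - xbar) / s) ** 3 for x in xs)

def sample_kurtosis(xs):
    """Bias-adjusted sample excess kurtosis."""
    n, xbar, s = len(xs), mean(xs), stdev(xs)
    lead = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    fourth = sum(((x - xbar) / s) ** 4 for x in xs)
    return lead * fourth - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))

def alpha3(xs):
    """Standardized third moment for sample size n."""
    n = len(xs)
    return (n - 2) / math.sqrt(n * (n - 1)) * sample_skewness(xs)

def alpha4(xs):
    """Standardized fourth moment for sample size n."""
    n = len(xs)
    return ((n - 2) * (n - 3) / ((n + 1) * (n - 1)) * sample_kurtosis(xs)
            + 3 * (n - 1) / (n + 1))

# Hypothetical right-skewed measurements, for illustration only.
data = [1.0, 2.0, 2.5, 3.0, 7.0, 4.0, 1.5, 9.0]
print(f"alpha3={alpha3(data):.4f}  alpha4={alpha4(data):.4f}")
```

At least four observations are required, since the kurtosis formula divides by $(n-2)(n-3)$.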

APPENDIX II:

Conventionally, the capability index $C_p$ is defined as:

$$C_p = \frac{usl - lsl}{6\sigma} \tag{1}$$

If the process X is normally distributed with mean µ

and standard deviation σ, i.e.

, then

(2)

And . On face value, it is

not obvious that (1) and (2) are equal. Here is the

proof:

Table (3) – Results of the real example based on 30 samples of size n = 50

Method     Mean C_pu   Standard deviation of C_pu   Expected PNC using Eq (21)
CDF        0.313277    0.023811                     0.17365
Burr       0.347917    0.065859                     0.14830
Clements   0.360691    0.080264                     0.13961





• We first note that .

(Draw a normal graph and you will see

this!)

• Since , we must also have that

(3)

which is equivalent to:

(4)

1. Because the of is symmetric about

the origin,

(5)

2. By equation (3). Finally,

(**)

where, we have used (3) and (5), which concludes

the proof.
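As a sketch of why definition (1) agrees with the CDF form of $C_p$ in Equation (14) of the main text, assume $X \sim N(\mu, \sigma^2)$ with specification limits placed symmetrically about the mean, $lsl = \mu - d$ and $usl = \mu + d$. The symbol $d$ and the symmetry requirement are assumptions introduced here for illustration. Then:

```latex
\begin{align*}
\int_{lsl}^{usl} f(x)\,dx
  &= \Phi\!\left(\frac{d}{\sigma}\right) - \Phi\!\left(-\frac{d}{\sigma}\right)
   = 2\,\Phi\!\left(\frac{d}{\sigma}\right) - 1, \\
0.5 + 0.5\int_{lsl}^{usl} f(x)\,dx
  &= \Phi\!\left(\frac{d}{\sigma}\right), \\
\frac{1}{3}\,\Phi^{-1}\!\left(0.5 + 0.5\int_{lsl}^{usl} f(x)\,dx\right)
  &= \frac{d}{3\sigma} = \frac{usl - lsl}{6\sigma}.
\end{align*}
```

The first line uses the symmetry $\Phi(-z) = 1 - \Phi(z)$ of the standard normal distribution, and the last line uses $usl - lsl = 2d$.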

APPENDIX III:

In this paper we fit the Burr density function $f(x)$ to the process data and then evaluate the PCI using the CDF method. To fit the Burr distribution to the data, we need to estimate the parameters $c$ and $k$. The likelihood function of the univariate Burr distribution is:

$$L(c, k; x_1, \ldots, x_n) = \frac{c^{n} k^{n} \prod_{i=1}^{n} x_i^{\,c-1}}{\prod_{i=1}^{n} (1 + x_i^{c})^{k+1}} \tag{1}$$

The univariate Burr distribution has two parameters, $c$ and $k$; to estimate them, the log-likelihood function for a sample of size $n$ is:

$$\log L = n\log(c) + n\log(k) - (k+1)\sum_{i=1}^{n}\log(1 + x_i^{c}) + (c-1)\sum_{i=1}^{n}\log x_i \tag{2}$$

The partial derivatives of the log-likelihood with respect to the parameters $c$ and $k$ are:

$$\frac{\partial l}{\partial c} = \frac{n}{c} + \sum_{i=1}^{n}\log x_i - (k+1)\sum_{i=1}^{n}\frac{x_i^{c}\log x_i}{1 + x_i^{c}} \tag{3}$$

$$\frac{\partial l}{\partial k} = \frac{n}{k} - \sum_{i=1}^{n}\log(1 + x_i^{c}) \tag{4}$$

In this paper, the unknown Burr parameters $c$ and $k$ have been determined by maximizing equation (2) using a systematic random search algorithm named "Simulated Annealing".
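For readers who want to experiment, the sketch below maximizes equation (2) with a simpler technique than simulated annealing: it sets equation (4) to zero to express $k$ in terms of $c$, then searches over $c$ alone with a coarse grid followed by a golden-section refinement. This is a swapped-in method for illustration, not the search used in the paper. The function names, the search range for $c$ and the synthetic data are all assumptions.

```python
import math
import random

def log_likelihood(c, k, xs):
    """Equation (2): Burr XII log-likelihood for positive observations xs."""
    n = len(xs)
    return (n * math.log(c) + n * math.log(k)
            + (c - 1) * sum(math.log(x) for x in xs)
            - (k + 1) * sum(math.log1p(x ** c) for x in xs))

def k_given_c(c, xs):
    """Setting equation (4) to zero gives k = n / sum(log(1 + x_i**c))."""
    return len(xs) / sum(math.log1p(x ** c) for x in xs)

def profile(c, xs):
    """Log-likelihood with k replaced by its best value for this c."""
    return log_likelihood(c, k_given_c(c, xs), xs)

def fit_burr(xs, c_min=0.1, c_max=20.0, step=0.1, iters=60):
    """Coarse grid over c, then golden-section refinement near the best point."""
    count = int(round((c_max - c_min) / step)) + 1
    grid = [c_min + i * step for i in range(count)]
    best = max(grid, key=lambda c: profile(c, xs))
    lo, hi = max(best - step, c_min / 2), best + step
    ratio = (math.sqrt(5) - 1) / 2
    for _ in range(iters):
        a = hi - ratio * (hi - lo)
        b = lo + ratio * (hi - lo)
        if profile(a, xs) < profile(b, xs):
            lo = a
        else:
            hi = b
    refined = (lo + hi) / 2
    # Keep whichever candidate has the higher profile log-likelihood.
    c_hat = max((best, refined), key=lambda c: profile(c, xs))
    return c_hat, k_given_c(c_hat, xs)

def burr_sample(n, c, k, rng):
    """Inverse-transform sampling from F(x) = 1 - (1 + x**c)**(-k)."""
    return [((1 - rng.random()) ** (-1 / k) - 1) ** (1 / c) for _ in range(n)]

# Synthetic data with known parameters, used only to exercise the fit.
data = burr_sample(500, 3.0, 2.0, random.Random(0))
c_hat, k_hat = fit_burr(data)
print(f"c_hat={c_hat:.3f}  k_hat={k_hat:.3f}")
```

Reducing the problem to one dimension makes the search deterministic and cheap, at the cost of relying on the grid range to contain the maximizing value of $c$.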

REFERENCES

[1] Deleryd M, Vannman K (1999) Process capability plots - a quality improvement tool. Qual Reliab Engng Int 15:213–227

[2] Tang LC, Than SE (1999) Computing process capability indices for non-normal data: a review and comparative study. Qual Reliab Engng Int 15:339–353

[3] Johnson NL (1949) System of frequency curves generated by methods of translation. Biometrika 36:149–176

[4] Box GEP, Cox DR (1964) An analysis of transformation. J Roy Stat Soc B 26:211–243

[5] Somerville S, Montgomery D (1996) Process capability indices and non-normal distributions. Quality Engineering 19(2):305–316

[6] Clements JA (1989) Process capability calculations for non-normal distributions. Quality Progress 22:95–100

[7] Liu P-H, Chen F-L (2006) Process capability analysis of non-normal process data using the Burr XII distribution. Int J Adv Manuf Technol 27:975–984

[8] Ahmad S, Abdollahian M, Zeephongsekul P (2007) Process capability analysis for non-quality characteristics using Gamma distribution. 4th international conference on information technology – new generations, USA, April 02-04:425–430

[9] Wu HH, Wang JS, Liu TL (1998) Discussions of the Clements-based process capability indices. In: Proceedings of the 1998 CIIE National Conference, pp 561–566

[10] Burr IW (1942) Cumulative frequency distribution. Ann Math Stat 13:215–232

[11] Burr IW (1973) Parameters for a general system of distributions to match a grid of α3 and α4. Commun Stat 2:1–21

[12] Wierda SJ (1993) A multivariate process capability index. ASQC Quality Congress Transactions, Boston, MA. American Society for Quality Control, Milwaukee, WI, pp 342–348

[13] Castagliola P (1996) Evaluation of non-normal process capability indices using Burr's distributions. Qual Eng 8(4):587–593

[14] Rodriguez RN (1977) A guide to the Burr type XII distributions. Biometrika 64:129–134

[15] Chou CY, Cheng PH (1997) Ranges control chart for non-normal data. J Chinese Inst Ind Eng 14(4):401–409

[16] Hatke MA (1949) A certain cumulative probability function. Ann Math Stat 20(3):461–463

[17] Yeh CH, Li FC, Wang PK (2006) Economic design of control charts with Burr distribution for non-normally data under Weibull shock models. 12th international conference on Reliability and Quality in Design, pp 323–327

[18] Kane VE (1986) Process capability indices. J Qual Technol 18:41–52

[19] Montgomery D. Introduction to Statistical Quality Control, 5th edition. Wiley, New York

[20] Zimmer WJ, Burr IW (1963) Variables sampling plans based on non normal populations. Ind Qual Control July:18–36
