8/7/2019 UBICC IKE07 Performance Analysis for Skewed Data 191 191
Ubiquitous Computing and Communication Journal. Earlier version of this paper has been published in the conference proceedings of IKE'07: June 2007.
general; is the median and $X_{0.00135} = \mu - 3\sigma$ corresponds to the lower tail.
In the case of normal data, it is easy to estimate quantile points. However, for non-normal data, it is not easy to estimate them. To deal with non-normality, one approach is to transform the non-normal data to approximately normal data using mathematical functions. Johnson [3] proposed a system of distributions based on the moment method, called the Johnson transformation system. Box and Cox [4] also used a transformation method for non-normal data by presenting a family of power transformations. Somerville and Montgomery [5] proposed using a square-root transformation to transform a skewed distribution into a normal one. The main objective of all these transformations is that one can apply conventional PCIs once the data is transformed to normal data.
Clements [6] proposed a percentile method to calculate the $C_p$ and $C_{pk}$ indices for non-normal data using the Pearson family of curves. Liu and Chen [7] proposed a modified Clements PCI percentile method using the Burr XII distribution. Ahmad et al. [8] compared Liu and Chen's method with the commonly used Box-Cox method and concluded that the Burr method provides slightly better estimates of PCI for non-normal data.
In this paper, we review and compare the CDF, Clements and Burr methods, which are commonly used to evaluate PCIs for non-normal data. The paper is organized as follows. The PCI methods for the comparison study are discussed in section 2. For illustration, a simulation study using Weibull, Gamma and Beta distributions is presented in sections 3 and 4, an application example with real-world data is presented in section 5, and the conclusion is given in section 6.
2. PCI FOR NON-NORMAL DATA
In this section, a brief review of the three different methods used in this paper is presented.
2.1 Clements Percentile PCI Method
The Clements method is popular among quality practitioners in industry. Clements [6] proposed that 6σ in equation (1) be replaced by the length of the interval between the upper and lower 0.135 percentage points of the distribution of X. Therefore, the denominator in equation (1) can be replaced by $(U_p - L_p)$, i.e.

$$C_p = \frac{usl - lsl}{U_p - L_p}$$ (5)

where $U_p$ is the upper percentile, i.e. the 99.865 percentile of the observations, and $L_p$ is the lower percentile, i.e. the 0.135 percentile of the observations. Since the median $M$ is the preferred central value for a skewed distribution, he defined $C_{pu}$ and $C_{pl}$ as follows:

$$C_{pu} = \frac{usl - M}{U_p - M}$$ (6)

$$C_{pl} = \frac{M - lsl}{M - L_p}$$ (7)

and

$$C_{pk} = \min\{C_{pu}, C_{pl}\}$$ (8)
The Clements approach uses the standard estimators of skewness and kurtosis, which are based on the 3rd and 4th moments respectively and may not be reliable for very small sample sizes [7]. Wu et al. [9] conducted a study indicating that the Clements method cannot accurately measure the capability indices, especially when the underlying data distribution is skewed.
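As a minimal sketch of equations (5)-(8), the function below computes the four indices once the three percentile estimates are available. The function name and the numeric inputs are hypothetical and chosen only for illustration; the Pearson-curve lookup that Clements uses to obtain the percentiles is not reproduced here.

```python
def clements_indices(usl, lsl, lower_p, median, upper_p):
    """Percentile-based indices, equations (5)-(8).

    lower_p, median, upper_p are the estimated 0.135, 50 and 99.865
    percentiles of the process (however they were obtained).
    """
    cp = (usl - lsl) / (upper_p - lower_p)        # equation (5)
    cpu = (usl - median) / (upper_p - median)     # equation (6)
    cpl = (median - lsl) / (median - lower_p)     # equation (7)
    cpk = min(cpu, cpl)                           # equation (8)
    return {"Cp": cp, "Cpu": cpu, "Cpl": cpl, "Cpk": cpk}


# Hypothetical inputs, for illustration only.
result = clements_indices(usl=10.0, lsl=2.0, lower_p=1.0, median=5.0, upper_p=11.0)
print(result)
```

Because the median replaces the mean, the upper and lower indices use different denominators, which is what lets the method reflect skewness.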
2.2 Burr Percentile PCI Method
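Burr [11] proposed a distribution called the Burr XII distribution, whose probability density function is defined by:

$$f(x) = \begin{cases} \dfrac{c\,k\,x^{c-1}}{(1 + x^{c})^{k+1}} & \text{if } x \ge 0;\ c \ge 1,\ k \ge 1 \\ 0 & \text{if } x < 0 \end{cases}$$ (9)

The cumulative distribution function is defined by:

$$F(x) = 1 - \frac{1}{(1 + x^{c})^{k}} \quad \text{if } x \ge 0;\ c \ge 1,\ k \ge 1$$ (10)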
where c and k represent the skewness and
kurtosis coefficients of the Burr distribution
respectively.
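Equations (9) and (10) translate directly into code, and inverting (10) gives a closed-form quantile function. The sketch below is illustrative only: the function names are hypothetical, and the parameter values c = 3, k = 5 are arbitrary rather than taken from the paper.

```python
def burr_pdf(x, c, k):
    """Burr XII density, equation (9)."""
    if x < 0:
        return 0.0
    return c * k * x ** (c - 1) / (1 + x ** c) ** (k + 1)


def burr_cdf(x, c, k):
    """Burr XII cumulative distribution function, equation (10)."""
    if x < 0:
        return 0.0
    return 1 - (1 + x ** c) ** (-k)


def burr_quantile(p, c, k):
    """Inverse of equation (10): the x with F(x) = p, for 0 < p < 1."""
    return ((1 - p) ** (-1 / k) - 1) ** (1 / c)


# Arbitrary illustrative parameters (not values from the paper).
c, k = 3.0, 5.0
for p in (0.00135, 0.5, 0.99865):
    print(p, burr_quantile(p, c, k))
```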
Liu and Chen [7] introduced a modification based on the Clements method whereby, instead of using Pearson curve percentiles, they replaced them with percentiles from an appropriate Burr distribution. The proposed modified method is as follows:
• Estimate the sample mean, sample standard deviation, skewness and kurtosis of the original sample data.
• Calculate the standardized moments of skewness ($\alpha_3$) and kurtosis ($\alpha_4$) for the given sample size n (see Appendix I for details).
• Use the values of $\alpha_3$ and $\alpha_4$ to select the appropriate Burr parameters c and k (Burr [11]). Then use the standardized tails of the Burr XII distribution to determine the standardized 0.135, 50 and 99.865 percentiles (denoted $X_{0.00135}$, $X_{0.50}$ and $X_{0.99865}$).
• Calculate the estimated lower, median and upper percentiles using the Burr table as follows:

$$L_p = \bar{x} + s\,X_{0.00135}$$ (11)

$$U_p = \bar{x} + s\,X_{0.99865}$$ (12)

$$M = \bar{x} + s\,X_{0.50}$$ (13)

• Calculate the process capability indices using equations 5-8.
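The steps above can be sketched in code under one substitution that should be stated plainly: instead of looking up standardized tails in the Burr tables, the sketch computes them directly from the Burr XII quantile function and the Burr mean and standard deviation. The Burr parameters c = 3, k = 5 are hypothetical placeholders (the paper selects them from $\alpha_3$ and $\alpha_4$); the sample mean, standard deviation and USL are the summary values quoted for the real data in section 5.

```python
import math


def burr_quantile(p, c, k):
    """Inverse of the Burr XII CDF, equation (10)."""
    return ((1 - p) ** (-1 / k) - 1) ** (1 / c)


def burr_mean_sd(c, k):
    """Mean and standard deviation of Burr XII (requires c * k > 2).

    Uses E[X^r] = Gamma(k - r/c) * Gamma(1 + r/c) / Gamma(k).
    """
    def moment(r):
        return math.gamma(k - r / c) * math.gamma(1 + r / c) / math.gamma(k)

    mean = moment(1)
    return mean, math.sqrt(moment(2) - mean ** 2)


def burr_percentile_cpu(xbar, s, usl, c, k):
    """Equations (11)-(13) followed by equation (6)."""
    mu, sigma = burr_mean_sd(c, k)
    # Standardized Burr percentiles (stand-in for the table lookup).
    z = {p: (burr_quantile(p, c, k) - mu) / sigma for p in (0.00135, 0.5, 0.99865)}
    lower = xbar + s * z[0.00135]    # equation (11)
    upper = xbar + s * z[0.99865]    # equation (12)
    median = xbar + s * z[0.5]       # equation (13)
    cpu = (usl - median) / (upper - median)
    return lower, median, upper, cpu


# xbar, s and usl from section 5; c and k are hypothetical placeholders.
lower, median, upper, cpu = burr_percentile_cpu(23.4809, 0.5650, 24.0, c=3.0, k=5.0)
print(lower, median, upper, cpu)
```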
2.3 CDF PCI Method
Wierda [12] introduced a new approach to evaluate process capability for non-normal data using the Cumulative Distribution Function (CDF). Castagliola [13] used the CDF approach to compute the proportion of non-conforming items and then estimate the capability index from this proportion. Castagliola showed the relationship between process capability and the proportion of non-conforming items, and used the CDF method to evaluate the PCI for non-normal data by fitting a Burr distribution to the process data. He used a polynomial approximation to replace the empirical function in the Burr distribution, and then used the proposed method given by equation (14). A short proof of this well-known result for $C_p$ is given in Appendix II.
Using the CDF method, $C_p$ and $C_{pk}$ are defined by:

$$C_p = \frac{1}{3}\,\Phi^{-1}\!\left(0.5 + 0.5\int_{lsl}^{usl} f(x)\,dx\right)$$ (14)

$$C_{pk} = \min(C_{pu}, C_{pl})$$ (15)

where

$$C_{pl} = \frac{1}{3}\,\Phi^{-1}\!\left(0.5 + \int_{lsl}^{T} f(x)\,dx\right)$$ (16)

$$C_{pu} = \frac{1}{3}\,\Phi^{-1}\!\left(0.5 + \int_{T}^{usl} f(x)\,dx\right)$$ (17)

where $f(x)$ represents the probability density function of the process and $T$ represents the process mean for normal data and the process median for non-normal data. In this paper $f(x)$ in Equation (14) is replaced by Equation (9), i.e. the Burr density function (see details in Appendix III).
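Equation (17) can be checked numerically for a known density. The sketch below is an illustration, not the authors' implementation: it uses the Gamma(shape 4, scale 0.5) density and the USL of 6.3405 from Table (1), finds the median T by bisection, integrates with a composite Simpson rule, and applies the inverse normal CDF from the Python standard library. The helper names are hypothetical.

```python
import math
from statistics import NormalDist


def gamma_pdf(x, shape, scale):
    """Gamma density with the given shape and scale."""
    return x ** (shape - 1) * math.exp(-x / scale) / (math.gamma(shape) * scale ** shape)


def simpson(f, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def median_of(f, lo, hi, tol=1e-10):
    """Bisection for T such that the integral of f from lo to T is 0.5."""
    a, b = lo, hi
    while b - a > tol:
        mid = (a + b) / 2
        if simpson(f, lo, mid) < 0.5:
            a = mid
        else:
            b = mid
    return (a + b) / 2


def cpu_cdf_method(f, target, usl):
    """Equation (17): Cpu = (1/3) * Phi^{-1}(0.5 + integral of f from T to usl)."""
    return NormalDist().inv_cdf(0.5 + simpson(f, target, usl)) / 3


f = lambda x: gamma_pdf(x, shape=4.0, scale=0.5)
T = median_of(f, 0.0, 20.0)
cpu = cpu_cdf_method(f, T, usl=6.3405)
print(round(T, 4), round(cpu, 4))
```

Under these assumptions the result should land close to the CDF-method value of 1.0000 reported for the Gamma case in Table (1).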
3. SIMULATION STUDY
Three non-normal distributions, Gamma, Weibull and Beta, have been used to generate random data in this simulation. These distributions are used to investigate the effects of non-normal data on the process capability index. Their parameter values can be chosen to represent mild to severe departures from normality. The parameters are selected so that we can compare our simulation results with existing results in the literature that use the same parameters.
The probability density function of the Gamma distribution, with parameters α and β, is given by

$$f(x) = \frac{1}{\Gamma(\alpha)\,\beta^{\alpha}}\,x^{\alpha-1} e^{-x/\beta}, \quad x \ge 0;\ \alpha, \beta > 0$$ (18)

The parameters used in this simulation are shape = 4.0 and scale = 0.5.
Figure 1: pdf of Gamma distribution with parameters (shape = 4.0, scale = 0.5)
The probability density function of the Weibull distribution with shape (α) and scale (β) is given by

$$f(x) = \frac{\alpha}{\beta^{\alpha}}\,x^{\alpha-1} e^{-(x/\beta)^{\alpha}}, \quad x \ge 0;\ \alpha, \beta > 0$$ (19)

The parameters used in this simulation are: α = 1.0 and β = 1.2
Figure 2: pdf of Weibull distribution with parameters (α = 1.0, β = 1.2)
The probability density function of the Beta distribution with shape 1 (α) and shape 2 (β) is given by

$$f(x) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\,x^{\alpha-1}(1 - x)^{\beta-1}, \quad 0 < x < 1$$ (20)

The parameters used in this simulation are: α = 4.4 and β = 13.3
Figure 3: pdf of Beta distribution with parameters (α = 4.4, β = 13.3)
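For readers who want to reproduce the data-generation step, the following sketch draws samples from the three distributions with Python's standard `random` module. The sample size and seed are arbitrary. Note that `random.weibullvariate` takes the scale first and the shape second; the pair (1, 1.2) is read here as scale 1.0 and shape 1.2, which is an assumption chosen because it is the reading under which the Weibull row of Table (1) can be reproduced numerically.

```python
import random
import statistics

random.seed(2007)   # arbitrary seed, for repeatability only
n = 20000           # arbitrary sample size

# random.gammavariate(alpha, beta): alpha is the shape, beta is the scale.
gamma_sample = [random.gammavariate(4.0, 0.5) for _ in range(n)]

# random.weibullvariate(alpha, beta): alpha is the scale, beta is the shape.
# Assumed reading of the parameter pair (1, 1.2): scale 1.0, shape 1.2.
weibull_sample = [random.weibullvariate(1.0, 1.2) for _ in range(n)]

# random.betavariate(alpha, beta): the two shape parameters.
beta_sample = [random.betavariate(4.4, 13.3) for _ in range(n)]

for name, sample in (("Gamma", gamma_sample),
                     ("Weibull", weibull_sample),
                     ("Beta", beta_sample)):
    print(name, statistics.fmean(sample), statistics.stdev(sample))
```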
3.1 Comparison Criteria
The criterion for comparison in this simulation study is based on the proportion of non-conformances (PNC). The proportion of non-conforming units for a normal distribution can be determined by [2]

$$PNC = \Phi(-3\,C_{pu})$$ (21)
The $C_{pu}$ values in Table (1) are computed using equation (17), where $f(x)$ is replaced by the corresponding distribution (i.e. Gamma, Weibull or Beta). The proportion of non-conforming items (PNC) is calculated using equation (21), as suggested by Castagliola [13], for all three methods (e.g. for the Gamma distribution with a $C_{pu}$ value of 0.8698, the corresponding PNC value using equation (21) is 0.0045351).
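The worked example in the text can be reproduced with the standard normal CDF from the Python standard library. The sketch below applies equation (21) to the Clements $C_{pu}$ values listed in Table (1); the function name is hypothetical.

```python
from statistics import NormalDist


def pnc_from_cpu(cpu):
    """Equation (21): PNC = Phi(-3 * Cpu)."""
    return NormalDist().cdf(-3 * cpu)


# Clements Cpu values as listed in Table (1).
clements_cpu = {"Gamma": 0.8698, "Weibull": 0.9694, "Beta": 0.7434}
pnc = {name: pnc_from_cpu(value) for name, value in clements_cpu.items()}
for name, value in pnc.items():
    print(name, round(value, 5))
```

The Gamma entry corresponds to the 0.0045351 quoted above, and the three values correspond to the Clements column of Table (2).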
Figure 4 presents a flowchart for estimating PNC and PCIs using the different methods and different non-normal distributions. The exact PNC value (p) in this flowchart is obtained using the following equation:

$$PNC = 1 - \int_{0}^{usl} f(x)\,dx$$ (22)

where $f(x)$ represents the corresponding density function of the Gamma, Weibull or Beta distribution.
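For the Gamma and Weibull cases, equation (22) has a closed form, so the exact PNC can be checked without numerical integration. The sketch below is illustrative: it uses the USL values from Table (1), the Erlang form of the Gamma survival function (valid because the shape is the integer 4), and, as an assumption, reads the Weibull parameters as shape 1.2 and scale 1.0, since that reading reproduces the tabulated exact PNC of about 0.0010. The Beta case needs the incomplete beta function and is omitted.

```python
import math


def gamma_survival_integer_shape(x, shape, scale):
    """1 - F(x) for a Gamma distribution with integer shape (Erlang form)."""
    t = x / scale
    return math.exp(-t) * sum(t ** j / math.factorial(j) for j in range(shape))


def weibull_survival(x, shape, scale):
    """1 - F(x) for a Weibull distribution."""
    return math.exp(-((x / scale) ** shape))


# USL values from Table (1).
pnc_gamma = gamma_survival_integer_shape(6.3405, shape=4, scale=0.5)
# Assumed reading of Weibull(1, 1.2): scale 1.0, shape 1.2.
pnc_weibull = weibull_survival(5.0, shape=1.2, scale=1.0)
print(pnc_gamma, pnc_weibull)
```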
• Generate sample data using a non-normal distribution (e.g. Gamma, Weibull, Beta etc.)
• Compute $C_{pu}$ using the CDF method (Equation (14)) and compute PNC for the corresponding $C_{pu}$ (Equation (21)); call it p1
• Compute $C_{pu}$ using the Burr method and compute PNC for the corresponding $C_{pu}$; call it p2
• Compute $C_{pu}$ using the Clements method and compute PNC for the corresponding $C_{pu}$; call it p3
• Assess the efficacy of the different methods by comparing p1, p2, p3 and the exact p (Equation (22))
Figure 4: Simulation methodology flowchart
3.2 Simulation Results
The $C_{pu}^{*}$ values in Table (1) are used to assess the efficacy of the three methods in estimating the process capability index for non-normal data. Table (1) shows the results of this comparison.
* $C_{pu}^{*}$ in Table (1) is computed from Equation (21) – percentile and exact distribution
The simulation results given in Table (1) show that the $C_{pu}$ values obtained using the Clements method are worse than those obtained using the Burr and CDF methods. The $C_{pu}$ values obtained using the CDF method are the closest to the $C_{pu}$ values obtained using direct distribution percentiles in the conventional approach, thus leading to better estimates of the PCIs compared with the Burr method. Our comparison criterion is that the method which yields the expected proportion of nonconformities closest to that obtained using the exact distribution is the superior method.
Table (2) – Comparison of expected proportion of nonconformance (PNC) with exact PNC

Distribution   Clements p3   Burr p2   CDF p1    Exact p
Gamma          0.00454       0.00326   0.00135   0.0013
Weibull        0.00182       0.00170   0.00101   0.0010
Beta           0.01287       0.00844   0.00131   0.0013
The results in Table (2) show that the PNC values obtained using the Clements method are worse than those of the other two methods. In this table, the PNC values obtained using the CDF method are close to the PNC values obtained using the exact distribution. Thus the CDF method gives better estimates of non-conformance than the commonly used Clements and Burr methods.
4. DISCUSSION
The simulation study shows that both the Burr and CDF methods estimate $C_{pu}$ values more accurately than the commonly used Clements method. Looking at the results in Tables (1) and (2), we conclude that:
• The CDF method is superior to both percentile methods (Burr and Clements).
• The Burr method still performs better than the commonly used Clements method.
• The CDF method is the one for which the estimated $C_{pu}$ value deviates least from the target $C_{pu}$ value.
• For the given sample size, the PNC value obtained using the CDF method is comparable with the targeted PNC value obtained from the exact distribution.
During simulation, we observed that data with a moderate departure from normality provide better estimates of capability indices than data with severe departures from normality.
5. REAL DATA EXAMPLE
A case study using data from a manufacturing industry was conducted. All three methods have been deployed to estimate the non-normal process capability for the experimental data. The data were collected from an in-control manufacturing process and are measurements of the bonding area between two surfaces, with an upper specification limit (USL) of 24. The summary statistics of the process data are:
$\mu$ (mean) = 23.4809, $\sigma$ = 0.5650, $\tilde{\mu}$ (median) = 23.3963, $\mu_3$ (skewness) = 1.1098, $\mu_4$ (kurtosis) = 4.9740.
Figure 5: Histogram of the real data (x-axis: Data; y-axis: Frequency)
We selected 30 samples of size 50 from these data points. For each sample, we computed the process capability index $C_{pu}$ and the proportion of non-conforming items (PNC) using the Clements, Burr and CDF methods. The mean and standard deviation of the estimated $C_{pu}$ values are given in Table (3).
Table (1) – comparison of $C_{pu}$ values from the simulation study

Distribution     USL      Cpu*    Cpu Clements   Cpu Burr   Cpu CDF
Gamma(4,0.5)     6.3405   1.000   0.8698         0.9069     1.0000
Weibull(1,1.2)   5.0      1.043   0.9694         0.9738     1.0292
Beta(4.4,13.3)   0.5954   1.002   0.7434         0.7965     1.0028
Figure 6: Comparison of three methods
For the CDF method, we replaced the corresponding $f(x)$ by the Burr distribution. The Burr parameters for each sample were estimated using maximum likelihood estimation. The exact PNC for the experimental data is 0.168. This PNC value is obtained by using the upper specification limit (USL = 24) and calculating the proportion of data that falls outside the specification limit.
The results presented in Table (3) indicate that the expected PNC based on 30 samples of size 50 using the CDF method is the closest estimate to the exact PNC. Table (3) also indicates that the CDF method has the least variability compared with the other two methods.
6. CONCLUSIONS
In this paper, a comparison between three methods of estimating the process capability and the proportion of non-conformance in the manufacturing industry is presented. The CDF method is not sensitive to the distribution of the process data and can therefore be applied to any real set of data as long as a suitable distribution can be fitted to it. However, to apply the CDF method, one must identify the corresponding distribution. One of the significant characteristics of the Burr XII distribution is that, once the mean, variance, skewness and kurtosis of the process data are obtained, a suitable Burr distribution can be fitted using the Burr tables (Liu and Chen [7]). We therefore conclude that replacing the probability density function $f(x)$ in the CDF method with the appropriate Burr density function leads to a better estimate of PCI and PNC for non-normal data.
Simulation studies for different non-normal distributions show that the CDF method using the Burr distribution produces better estimates of PCI.
We strongly recommend further research to extend the CDF method to non-normal multivariate PCI studies.
APPENDIX I:
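The standardized moments of skewness ($\alpha_3$) and kurtosis ($\alpha_4$) for the given sample size n can be computed as follows:

$$\alpha_3 = \frac{(n-2)}{\sqrt{n(n-1)}} \times \text{Skewness}$$ (1)

where

$$\text{Skewness} = \frac{n}{(n-1)(n-2)} \sum \left(\frac{x_j - \bar{x}}{s}\right)^{3}$$ (2)

where $\bar{x}$ is the mean of the observations and $s$ is the standard deviation.

$$\alpha_4 = \frac{(n-2)(n-3)}{(n+1)(n-1)} \times \text{Kurtosis} + 3 \times \frac{(n-1)}{(n+1)}$$ (3)

where

$$\text{Kurtosis} = \frac{n(n+1)}{(n-1)(n-2)(n-3)} \sum \left(\frac{x_j - \bar{x}}{s}\right)^{4} - \frac{3(n-1)^{2}}{(n-2)(n-3)}$$ (4)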
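Equations (1)-(4) can be transcribed almost line by line. The sketch below assumes that $s$ is the sample standard deviation with divisor n - 1, and it needs n of at least 4; the function name and the sample values are made up for illustration.

```python
import math
import statistics


def standardized_moments(data):
    """Return (alpha3, alpha4) following Appendix I, equations (1)-(4)."""
    n = len(data)
    xbar = statistics.fmean(data)
    s = statistics.stdev(data)  # assumed: sample standard deviation (divisor n - 1)
    z3 = sum(((x - xbar) / s) ** 3 for x in data)
    z4 = sum(((x - xbar) / s) ** 4 for x in data)

    skewness = n / ((n - 1) * (n - 2)) * z3                               # equation (2)
    kurtosis = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * z4
                - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))                 # equation (4)

    alpha3 = (n - 2) / math.sqrt(n * (n - 1)) * skewness                  # equation (1)
    alpha4 = ((n - 2) * (n - 3) / ((n + 1) * (n - 1)) * kurtosis
              + 3 * (n - 1) / (n + 1))                                    # equation (3)
    return alpha3, alpha4


# Made-up sample, for illustration only.
sample = [2.1, 2.4, 1.9, 3.8, 2.2, 2.7, 5.1, 2.0, 2.5, 3.0]
alpha3, alpha4 = standardized_moments(sample)
print(alpha3, alpha4)
```

With this convention for $s$, the two corrections undo the small-sample adjustments, so $\alpha_3$ and $\alpha_4$ coincide with the plain moment ratios $m_3 / m_2^{3/2}$ and $m_4 / m_2^{2}$.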
APPENDIX II:
Conventionally, the capability index $C_p$ is defined as:

$$C_p = \frac{usl - lsl}{6\sigma}$$ (1)
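If the process X is normally distributed with mean µ and standard deviation σ, i.e. $X \sim N(\mu, \sigma^{2})$, then

$$C_p = \frac{1}{3}\,\Phi^{-1}\!\left(0.5 + 0.5\int_{lsl}^{usl} f(x)\,dx\right)$$ (2)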
And . On face value, it is
not obvious that (1) and (2) are equal. Here is the
proof:
Table (3) – results of the real example based on 30 samples of size n = 50

Method     Mean of Cpu   Standard deviation of Cpu   Expected PNC using Eq (21)
CDF        0.313277      0.023811                    0.17365
Burr       0.347917      0.065859                    0.14830
Clements   0.360691      0.080264                    0.13961
• We first note that .
(Draw a normal graph and you will see
this!)
• Since , we must also have that
(3)
which is equivalent to:
(4)
1. Because the of is symmetric about
the origin,
(5)
2. By equation (3). Finally,
(**)
where, we have used (3) and (5), which concludes
the proof.
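Since several expressions of the proof did not survive in this copy, one way to see why the two definitions of $C_p$ agree is sketched below. It is a reconstruction under a stated assumption, namely that the process is centred between the specification limits, $\mu = (usl + lsl)/2$, and that $f$ is the $N(\mu, \sigma^2)$ density; it is not necessarily the authors' exact sequence of steps.

```latex
% Assumption: \mu = (usl + lsl)/2 and f is the N(\mu, \sigma^2) density.
\begin{align*}
\int_{lsl}^{usl} f(x)\,dx
  &= \Phi\!\left(\frac{usl-\mu}{\sigma}\right) - \Phi\!\left(\frac{lsl-\mu}{\sigma}\right) \\
  &= \Phi(3C_p) - \Phi(-3C_p)
     && \text{since } \frac{usl-\mu}{\sigma} = \frac{usl-lsl}{2\sigma} = 3C_p \\
  &= 2\,\Phi(3C_p) - 1
     && \text{by symmetry, } \Phi(-z) = 1 - \Phi(z) \\[4pt]
\Rightarrow\quad 0.5 + 0.5\int_{lsl}^{usl} f(x)\,dx &= \Phi(3C_p)
  \quad\Rightarrow\quad
  C_p = \frac{1}{3}\,\Phi^{-1}\!\left(0.5 + 0.5\int_{lsl}^{usl} f(x)\,dx\right).
\end{align*}
```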
APPENDIX III:
In this paper we fit the Burr density function $f(x)$ to the process data and then evaluate the PCI using the CDF method. To fit the data with a Burr distribution, we need to estimate the parameters c and k. The likelihood function of the univariate Burr distribution is:

$$L(c, k; x_1, \ldots, x_n) = c^{n} k^{n}\, \frac{\prod_{i=1}^{n} x_i^{c-1}}{\prod_{i=1}^{n} (1 + x_i^{c})^{k+1}}$$ (1)
The univariate Burr distribution has two parameters, c and k; to estimate them, the log-likelihood function for sample size n is:

$$\log L = n\log(c) + n\log(k) - (k+1)\sum_{i=1}^{n} \log(1 + x_i^{c}) + (c-1)\sum_{i=1}^{n} \log x_i$$ (2)
The partial derivatives with respect to the parameters c and k are:

$$\frac{\partial \log L}{\partial c} = \frac{n}{c} + \sum_{i=1}^{n} \log x_i - (k+1)\sum_{i=1}^{n} \frac{x_i^{c}\log x_i}{1 + x_i^{c}}$$ (3)

$$\frac{\partial \log L}{\partial k} = \frac{n}{k} - \sum_{i=1}^{n} \log(1 + x_i^{c})$$ (4)

In this paper, the unknown Burr parameters c and k have been determined by maximizing equation (2) using a systematic random search algorithm named "Simulated Annealing".
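As an illustration of the estimation problem, the sketch below maximizes equation (2) with a different, deterministic technique in place of simulated annealing: setting equation (4) to zero gives k in closed form for any fixed c, and the resulting profile log-likelihood is then maximized over c by golden-section search. The search assumes the profile has a single peak inside the chosen bracket, which is not proved here. The function names, the bracket, the seed and the "true" parameters used to generate test data are all hypothetical.

```python
import math
import random


def profile_k(data, c):
    """k that sets equation (4) to zero for a given c."""
    return len(data) / sum(math.log1p(x ** c) for x in data)


def log_likelihood(data, c, k):
    """Burr XII log-likelihood, equation (2)."""
    n = len(data)
    return (n * math.log(c) + n * math.log(k)
            + (c - 1) * sum(math.log(x) for x in data)
            - (k + 1) * sum(math.log1p(x ** c) for x in data))


def fit_burr(data, lo=0.1, hi=20.0, tol=1e-7):
    """Golden-section search over c on the profile log-likelihood."""
    g = (math.sqrt(5) - 1) / 2
    f = lambda c: log_likelihood(data, c, profile_k(data, c))
    a, b = lo, hi
    c1, c2 = b - g * (b - a), a + g * (b - a)
    f1, f2 = f(c1), f(c2)
    while b - a > tol:
        if f1 < f2:                      # maximum lies in [c1, b]
            a, c1, f1 = c1, c2, f2
            c2 = a + g * (b - a)
            f2 = f(c2)
        else:                            # maximum lies in [a, c2]
            b, c2, f2 = c2, c1, f1
            c1 = b - g * (b - a)
            f1 = f(c1)
    c_hat = (a + b) / 2
    return c_hat, profile_k(data, c_hat)


# Synthetic Burr XII data by inverse-CDF sampling (hypothetical parameters).
random.seed(7)
true_c, true_k = 3.0, 5.0
data = [((1 - random.random()) ** (-1 / true_k) - 1) ** (1 / true_c)
        for _ in range(2000)]
c_hat, k_hat = fit_burr(data)
print(c_hat, k_hat)
```

Reducing the problem to one dimension keeps the example short and repeatable; a stochastic search such as the simulated annealing used in the paper is better suited when the likelihood surface may have several local maxima.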
REFERENCES
[1] M. Deleryd, K. Vannman ‘process capability
plots—a quality improvement tool’ Qual.
Reliab. Engng. Int. 15: 213–227 (1999).
[2] L C Tang, S E Than (1999) Computing process capability indices for non-normal data: a review and comparative study. Qual. Reliab. Engng. Int. 15: 339-353.
[3] Johnson NL (1949) System of frequency curves generated by methods of translation. Biometrika 36:149–176
[4] Box GEP, Cox DR (1964) An analysis of transformation. J Roy Stat Soc B 26:211–243
[5] Somerville S, Montgomery D (1996) Process capability indices and non-normal distributions. Quality Engineering 19(2):305–316.
[6] Clements JA (1989) Process capability calculations for non-normal distributions. Quality Progress 22:95–100
[7] Pei-Hsi Liu, Fei-Long Chen (2006), "Process capability analysis of non-normal process data using the Burr XII distribution", Int J Adv Manuf Technol 27: 975–984
[8] S. Ahmad, M. Abdollahian, P. Zeephongsekul (2007) Process capability analysis for non-quality characteristics using Gamma distribution. 4th international conference on information technology – new generations, USA, April, 02-04: 425-430
[9] Wu HH, Wang JS, Liu TL (1998) Discussions of the Clements-based process capability indices. In: Proceedings of the 1998 CIIE National Conference, pp 561–566
[10] Burr IW (1942) Cumulative frequency distribution. Ann Math Stat 13:215–232
[11] Burr IW (1973) Parameters for a general system of distributions to match a grid of α3 and α4. Commun Stat 2:1–21
[12] Wierda SJ. A multivariate process capability index. ASQC Quality Congress Transactions, Boston, MA, 1993, American Society for Quality Control: Milwaukee, WI, 1993; 342–348.
[13] Castagliola P (1996) Evaluation of non-normal process capability indices using Burr's distributions. Qual Eng 8(4):587–593
[14] Rodriguez RN (1977) A guide to the Burr type XII distributions. Biometrika, 64:129–134
[15] Chou CY, Cheng PH (1997) Ranges control chart for non-normal data. J Chinese Inst Ind Eng 14(4):401–409
[16] Hatke M.A. (1949) A certain cumulative probability function. Ann Math Stat, Vol. 20, No. 3:461-463.
[17] C.H. Yeh, F.C. Li, P.K. Wang, Economic design of control charts with Burr distribution for non-normally data under Weibull shock models: 12th international conference on Reliability and Quality in Design, (2006) 323-327.
[18] V.E. Kane, Process capability indices, J. Qual. Technol. 18 (1986) 41–52.
[19] Montgomery, D., 'Introduction to Statistical Quality Control', 5th edition, Wiley, New York, New York
[20] Zimmer WJ, Burr IW (1963) Variables sampling plans based on non normal populations. Ind Qual. Control July:18–36