On Normalizing Transformations of the Coefficient of Variation for a Normal Population with an Application to Evaluation of Uniformity of Plant Varieties

Yogendra P. Chaubey*
Department of Mathematics and Statistics
Concordia University, Montreal, Canada H3G 1M8
E-mail: [email protected]

* Joint work with M. Singh, ICARDA, Aleppo, Syria and Debaraj Sen, Department of Mathematics and Statistics, Concordia University, Montreal, Canada

Talk to be presented at the International Workshop on Applied Mathematics and Omics Technologies for Discovering Biodiversity and Genetic Resources for Climate Change Mitigation and Adaptation to Sustain Agriculture in Drylands, ICARDA, Rabat, Morocco, June 24-27, 2014

THEME – 2 On Normalizing Transformations of the Coefficient of Variation for a Normal Population with an Application to Evaluation of Uniformity of Plant Varieties

Abstract

The variance stabilizing transformation (VST), formally introduced by Bartlett (1947, Biometrics), is quite popular in statistical applications due to its approximate normalizing property. This property is mainly due to the fact that variance stabilizing transformations may be more symmetric than the untransformed statistics. Chaubey and Mudholkar (1983, Technical Report, Concordia University) developed a differential equation, analogous to Bartlett's, for obtaining an approximately symmetrizing transformation and illustrated its use in some common examples. In general, the transformation may be computationally intensive, as illustrated in Chaubey, Singh and Sen (2013, Comm. Stat. - Theor. Meth.) for the coefficient of variation from normal samples. In this talk we review these transformations in this light and examine some new transformations, along with an application to evaluating the uniformity of plant varieties.

Outline

1 Introduction

2 Symmetrizing and Variance Stabilizing Transformations

3 A Condition under which VST is ST
    Fisher's transformation of correlation coeff.
    Arcsin Transformation for the Binomial Proportion
    Square root transformation for Poisson RV
    Chi-square Random Variable

4 Symmetrizing transformations in Standard Cases

5 VST and ST for Coefficient of Variation
    Appendix: R-Codes for Computing the Symmetrizing Transformation
    Small Sample Adjustment
    Inverse Gaussian Distribution

6 An Application

Introduction

The transformations, along with the approximations, are important for both genetic resources data and climate data and appear as a prerequisite for raw data analysis.

The earliest consideration of a transformation that stabilizes the variance is due to Fisher (1915, 1922), who proposed $Z = \tanh^{-1} r$ and $\sqrt{2\chi^2_\nu - 1}$ as approximately normalizing transformations of the correlation coefficient $r$ and the $\chi^2_\nu$ variable, respectively.

Bartlett (1947) introduced variance stabilizing transformations formally for the purpose of utilizing the usual analysis of variance in the absence of homoscedasticity.

He showed how to derive these using a differential equation and, as illustrations, confirmed the variance stabilizing character of $Z$ and $\sqrt{\chi^2_\nu}$. He also gave many additional examples, including the square root of a Poisson random variable and the function $\arcsin\sqrt{p}$ of the binomial sample proportion $p$.

Since then, these transformations have been variously studied and refined, essentially with a view to improving normality. Thus, Anscombe (1948) improved $\sqrt{X}$ of the Poisson variable $X$ to $\sqrt{X + 3/8}$ and $\arcsin\sqrt{p}$ to $\arcsin\sqrt{(p + 3/(8n))/(1 + 3/(4n))}$, and Hotelling (1953), in his definitive study of the distribution of the correlation coefficient, proposed numerous improvements of $Z$.

Now, we note that even though many variance stabilizing transformations of random variables have near normal distributions, and they simplify inference problems such as confidence interval estimation of the parameter, the stability of variance is not necessary for normality. However, approximate symmetry is clearly a prerequisite of any approximately normalizing transformation.

Hence, an approximately symmetrizing transformation of a random variable may be a more effective method of normalizing it than stabilizing its variance.

Historically, this was first illustrated by Wilson and Hilferty (1931), who showed that the cube root of a chi-square variable, obtained by them as an approximately symmetrizing power transformation, provides a normal approximation superior to that based on Fisher's variance stabilizing transformation.

Their approach of constructing a skewness reducing power transformation has now been extended to many other distributions, e.g. to non-central chi-square by Sankaran (1959), to quadratic forms by Jensen and Solomon (1972), and to the sample variance from non-normal populations and multivariate likelihood ratio statistics by Mudholkar and Trivedi (1980, 1981a, 1981b).

In this talk, we present the results explored in Chaubey and Mudholkar (1983) with respect to developing a differential equation, analogous to Bartlett's, which gives an approximately symmetrizing transformation.

This paper also examines some of the standard transformations in this light.

Next we consider the computing aspects of these transformations, illustrated for the coefficient of variation for normal populations as discussed in Chaubey, Singh and Sen (2014), and indicate its adaptation to the inverse Gaussian case.

An application in the context of assessing the uniformity of two plant varieties is illustrated.

Preliminaries

Let $T_n$ be a statistic based on a random sample of size $n$, constructed to estimate a parameter $\theta$. Further, assume that $\sqrt{n}(T_n - \theta)$ tends to follow $N(0, \sigma^2(\theta))$ as $n \to \infty$. Denote the $j$th central moment of $T_n$ by

$$\mu_j(\theta) = E(T_n - \mu(\theta))^j, \quad j = 1, 2, \ldots$$

where $\mu(\theta) = E(T_n)$.

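A smooth function $g(T_n)$, intended for use as a transformation, can be approximated by Taylor's expansion as

$$g(T_n) - g(\theta) \approx (T_n - \theta)\,g'(\theta) + \frac{1}{2}(T_n - \theta)^2 g''(\theta), \tag{2.1}$$

where

$$g'(\theta) = \frac{dg(\theta)}{d\theta} \quad\text{and}\quad g''(\theta) = \frac{d^2 g(\theta)}{d\theta^2}.$$

Hence, as a first approximation, we have

$$g(T_n) - E[g(T_n)] \approx (T_n - \mu(\theta))\left(g'(\theta) + \xi_1(\theta)\,g''(\theta)\right) + \frac{1}{2}\left[(T_n - \mu(\theta))^2 - \mu_2(\theta)\right] g''(\theta), \tag{2.2}$$

where $\xi_1(\theta) = \mu(\theta) - \theta$.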

Define

$$R = \frac{g''(\theta)}{g'(\theta)} \quad\text{and}\quad R_1 = \frac{R}{1 + \xi_1(\theta)R}.$$

Then we have, from (2.2), an approximate expression for the variance $\mu_{2g}$ of $g(T_n)$:

$$\mu_{2g} = (g'(\theta))^2 (1 + \xi_1(\theta)R)^2 \left[\mu_2(\theta) + R_1\,\mu_3(\theta) + \frac{1}{4}R_1^2\left(\mu_4(\theta) - \mu_2^2(\theta)\right)\right] \tag{2.3}$$

Similarly, the third central moment $\mu_{3g}$ of $g(T_n)$ (up to order $O(1/n^2)$) is approximately given by

$$\mu_{3g} = (g'(\theta))^3 (1 + \xi_1(\theta)R)^3 \left[\mu_3(\theta) + \frac{3}{2}R_1\left(\mu_4(\theta) - \mu_2^2(\theta)\right)\right], \tag{2.4}$$

where we have omitted terms containing central moments of order higher than 4 (this assumes that the third and fourth central moments are of order $O(1/n^2)$ and the higher order moments are of lower order).

Variance Stabilizing Transformation

A variance stabilizing transformation (VST; see Rao (1973)) may now be obtained using (2.3). Ignoring the last two terms, $g(\cdot)$ is an approximate VST if $(g'(\theta))^2 \mu_2(\theta)$ is constant, or

$$g'(\theta) = \frac{C}{\sigma(\theta)},$$

where $C$ is a constant. Hence

$$g(\theta) = C \int \frac{1}{\sigma(\theta)}\,d\theta. \tag{2.5}$$
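Formula (2.5) can be evaluated numerically when no closed form is at hand. The following is a minimal R sketch, not taken from the talk: the helper name `vst_numeric`, the lower limit 0 and the constant `C = 1` are assumptions made for illustration. It is checked against the Poisson case, where $\sigma(\theta) = \sqrt{\theta}$ gives $g(\theta) = 2\sqrt{\theta}$.

```r
# Approximate VST from (2.5): g(theta) = C * integral of 1/sigma(t) dt.
# `sigma` is the asymptotic standard deviation function; the lower limit
# and C are arbitrary choices made for this sketch.
vst_numeric <- function(theta, sigma, lower = 0, C = 1) {
  sapply(theta, function(th) {
    C * integrate(function(t) 1 / sigma(t), lower, th)$value
  })
}

# Poisson mean: sigma(theta) = sqrt(theta), so (2.5) gives 2 * sqrt(theta).
theta <- c(0.5, 1, 4, 9)
numeric_vst <- vst_numeric(theta, sigma = sqrt)
closed_form <- 2 * sqrt(theta)
print(cbind(theta, numeric_vst, closed_form))
```

The two columns should agree up to the quadrature tolerance of `integrate`.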

Symmetrizing Transformation:

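To derive the symmetrizing transformation (ST), the third moment of $g(T_n)$ given in (2.4) may be equated to zero. Thus, for an ST $g$,

$$\mu_3(\theta) + \frac{3}{2}R_1\left(\mu_4(\theta) - \mu_2^2(\theta)\right) = 0, \tag{2.6}$$

which gives

$$\frac{g''(\theta)}{g'(\theta)} = -\frac{2}{3}\,\frac{\mu_3(\theta)}{\mu_4(\theta) - \mu_2^2(\theta)}, \tag{2.7}$$

where again the terms involving $\xi_1(\theta)\mu_3(\theta)$ have been ignored.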

The solution of this equation can be written as (see Chaubey and Mudholkar, 1983):

$$g(\theta) = \int e^{-a(\theta)}\,d\theta \tag{2.8}$$

where

$$a(\theta) = \frac{2}{3}\int \frac{f_1(\theta)}{f_2(\theta)}\,d\theta \tag{3.1}$$

with $f_1(\cdot)$ and $f_2(\cdot)$ being defined as

$$f_1(\theta) = \mu_3(\theta), \tag{3.2}$$

$$f_2(\theta) = \mu_4(\theta) - \mu_2^2(\theta). \tag{3.3}$$

A Condition under which VST is ST

It is natural to ask if and when a VST can be an ST. Such a condition may be derived by setting $\mu_{3g} = 0$ with the $g$ obtained from the VST, using Eq. (2.7). It can easily be seen that such a condition appears in the equation

$$\frac{1}{\sigma(\theta)}\left\{f_1(\theta) - \frac{3}{2}f_2(\theta)\,\frac{d\ln\sigma(\theta)}{d\theta}\right\} = 0.$$

That is,

$$\frac{d\ln\sigma(\theta)}{d\theta} = \frac{2}{3}\,\frac{f_1(\theta)}{f_2(\theta)}. \tag{3.4}$$
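Both (2.8) and (3.4) follow from reading $g''/g'$ as a logarithmic derivative. A short worked derivation is sketched here, with constants of integration dropped (an assumption made for brevity, since $g$ matters only up to an affine map):

```latex
\begin{align*}
\text{ST, from (2.7):}\quad
\frac{d}{d\theta}\ln g'(\theta) &= \frac{g''(\theta)}{g'(\theta)}
  = -\frac{2}{3}\,\frac{f_1(\theta)}{f_2(\theta)} = -a'(\theta)
  \;\Longrightarrow\; g'(\theta) = e^{-a(\theta)},\quad
  g(\theta) = \int e^{-a(\theta)}\,d\theta, \\
\text{VST, from (2.5):}\quad
g'(\theta) &= \frac{C}{\sigma(\theta)}
  \;\Longrightarrow\; \frac{g''(\theta)}{g'(\theta)}
  = -\frac{d\ln\sigma(\theta)}{d\theta}.
\end{align*}
```

Equating the two expressions for $g''/g'$ gives the condition (3.4).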

Standard Transformations

We may examine the extent to which some standard VSTs are STs in the light of the above condition.

Fisher's transformation of correlation coeff:

Using the results from Hotelling (1953), we have $f_1(\rho) = -6\rho(1-\rho^2)^3/n^2$, $f_2(\rho) = 2(1-\rho^2)^4/n^2$ and $\sigma(\rho) = 1-\rho^2$. It is easily seen that the condition in Eq. (3.4) is satisfied, as both sides of the equation equal $-2\rho/(1-\rho^2)$.

arcsin Transformation for the Binomial Proportion:

For the binomial proportion $\theta$, we have $f_1(\theta) = \theta(1-\theta)(1-2\theta)/n^2$, $f_2(\theta) = 2\theta^2(1-\theta)^2/n^2$, and $\sigma(\theta) = \sqrt{\theta(1-\theta)}$. In this case

$$\frac{2}{3}\,\frac{f_1(\theta)}{f_2(\theta)} = \frac{1}{3}\,\frac{1-2\theta}{\theta(1-\theta)}.$$

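However,

$$\frac{d\ln\sigma(\theta)}{d\theta} = \frac{1}{2}\,\frac{1-2\theta}{\theta(1-\theta)}.$$

Hence the condition in (3.4) is not satisfied. This implies that a better normalizing transformation may be available in contrast to the VST, $\arcsin\sqrt{p}$.

Square root transformation for Poisson RV

In this case $f_1(\theta) = \theta$, $f_2(\theta) = \theta + 2\theta^2$, $\sigma(\theta) = \sqrt{\theta}$, and

$$\frac{2}{3}\,\frac{f_1(\theta)}{f_2(\theta)} = \frac{2}{3(1+2\theta)},$$

whereas

$$\frac{d\ln\sigma(\theta)}{d\theta} = \frac{1}{2\theta}.$$

Again, in this case the condition does not hold.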

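Chi-square Random Variable

Let $X$ be distributed as $\chi^2_{n\theta}$. Letting $T_n = X/n$, we have $f_1(\theta) = 8\theta/n^2$, $f_2(\theta) = 8\theta^2/n^2 + O(1/n^3)$, and $\sigma(\theta) = \sqrt{2\theta}$. The VST is given by $\sqrt{2T_n}$. Here

$$\frac{2}{3}\,\frac{f_1(\theta)}{f_2(\theta)} = \frac{2}{3\theta},$$

whereas

$$\frac{d\ln\sigma(\theta)}{d\theta} = \frac{1}{2\theta},$$

and the condition is not satisfied again.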
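The checks in this section can be reproduced numerically. The R sketch below evaluates both sides of (3.4) for the four standard cases at one parameter value. The names `cases` and `gap34` are hypothetical, the Poisson ratio is the one quoted above, and the chi-square row assumes $X \sim \chi^2$ with $n\theta$ degrees of freedom and $T_n = X/n$, keeping leading-order moments only (an assumption of this sketch).

```r
# Each case supplies d ln(sigma)/d(theta) and the ratio f1/f2.
cases <- list(
  correlation = list(dlnsigma = function(r) -2 * r / (1 - r^2),
                     ratio    = function(r) -3 * r / (1 - r^2)),
  binomial    = list(dlnsigma = function(p) (1 - 2 * p) / (2 * p * (1 - p)),
                     ratio    = function(p) (1 - 2 * p) / (2 * p * (1 - p))),
  poisson     = list(dlnsigma = function(t) 1 / (2 * t),
                     ratio    = function(t) 1 / (1 + 2 * t)),
  chisquare   = list(dlnsigma = function(t) 1 / (2 * t),
                     ratio    = function(t) 1 / t)
)

# Gap between the two sides of (3.4); zero means the VST is also an ST.
gap34 <- function(case, theta) {
  case$dlnsigma(theta) - (2 / 3) * case$ratio(theta)
}

theta0 <- 0.3
gaps <- sapply(cases, gap34, theta = theta0)
print(round(gaps, 4))
```

Only the correlation coefficient should show a gap of (numerically) zero, in line with the conclusion that Fisher's $Z$ is both variance stabilizing and symmetrizing.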

Symmetrizing transformations in Standard Cases

The above examples demonstrate that it may be possible to get a better normalizing transformation than that given by the variance stabilizing transformation. Now we use the differential equation (2.8) to obtain such transformations in the examples discussed above.

Correlation Coefficient:

In this case

$$g(\rho) = \int \exp\left[\int \frac{2\rho}{1-\rho^2}\,d\rho\right] d\rho = \int \frac{1}{1-\rho^2}\,d\rho = \frac{1}{2}\ln\frac{1+\rho}{1-\rho}, \tag{4.1}$$

which is the well known Fisher's $Z$ transformation and confirms our conclusion reached earlier (see Chaubey and Mudholkar (1984)).

Binomial Proportion:

In this case the ST is given by

$$g(\theta) = \int \theta^{-1/3}(1-\theta)^{-1/3}\,d\theta. \tag{4.2}$$

This integral does not have an explicit solution; however, it can be evaluated numerically. Later on we include a program for finding the ST for the coefficient of variation that can easily be adapted here.

The ST may be contrasted with the VST given by

$$g_v(\theta) = \int \theta^{-1/2}(1-\theta)^{-1/2}\,d\theta = 2\sin^{-1}\sqrt{\theta}. \tag{4.3}$$
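A numerical evaluation of (4.2) might look as follows. This is a minimal R sketch under the assumption that the integral is taken from 0 (fixing the additive constant); the function names are hypothetical. The integral is an incomplete beta function with both shape parameters equal to 2/3, which gives an independent check through `pbeta`.

```r
# ST for the binomial proportion, (4.2), by direct quadrature from 0.
st_binomial <- function(theta) {
  sapply(theta, function(t) {
    integrate(function(u) u^(-1 / 3) * (1 - u)^(-1 / 3), 0, t)$value
  })
}

# Same quantity via the incomplete beta function B(theta; 2/3, 2/3).
st_binomial_beta <- function(theta) {
  pbeta(theta, 2 / 3, 2 / 3) * beta(2 / 3, 2 / 3)
}

theta <- c(0.1, 0.25, 0.5, 0.75, 0.9)
print(cbind(theta,
            quadrature = st_binomial(theta),
            beta_form  = st_binomial_beta(theta)))
```

The `pbeta` form is vectorized and avoids repeated quadrature, which may matter when the transformation is applied to many proportions.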

Poisson Variable:

In this case the ST is given by

$$g(\theta) = \frac{3}{2}\,\theta^{2/3}. \tag{4.4}$$

Thus the Poisson variable is better normalized by a power transformation with power $2/3$ than by the VST with power $1/2$.
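The power $2/3$ can be traced through (3.1) and (2.8). A brief worked derivation is sketched here, assuming $T_n$ is the mean of $n$ independent Poisson($\theta$) observations and keeping only the leading order in $1/n$ (this scaling is an assumption of the sketch, not stated on the slide):

```latex
\begin{align*}
f_1(\theta) &= \mu_3(\theta) = \frac{\theta}{n^2}, \qquad
f_2(\theta) = \mu_4(\theta) - \mu_2^2(\theta)
            = \frac{2\theta^2}{n^2} + \frac{\theta}{n^3}, \\
\frac{f_1(\theta)}{f_2(\theta)} &\approx \frac{1}{2\theta}
\;\Longrightarrow\;
a(\theta) = \frac{2}{3}\int \frac{d\theta}{2\theta} = \frac{1}{3}\ln\theta, \\
g(\theta) &= \int e^{-a(\theta)}\,d\theta = \int \theta^{-1/3}\,d\theta
           = \frac{3}{2}\,\theta^{2/3}.
\end{align*}
```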

Chi-square Random Variable:

In the set-up considered earlier, the symmetrizing transformation is given by

$$g(\theta) = \int e^{-(2/3)\ln\theta}\,d\theta = 3\,\theta^{1/3}. \tag{4.5}$$

Thus the symmetrizing transformation for the chi-square random variable is the well known Wilson-Hilferty cube-root transformation.

VST and ST for Coefficient of Variation

These transformations have been investigated well in the literature.

Next we report on our recent investigations concerning the VST and ST with respect to the coefficient of variation, $\phi = \sigma/\mu$, where $\sigma$ is the population standard deviation and $\mu$ is the population mean, which is assumed to be non-negative.

It is used in many applied areas as an alternative to the standard deviation.

Engineering applications - Signal to Noise Ratio: Kordonsky and Gertsbakh (1997).
Agricultural research - Measure of homogeneity of experimental field: Taye and Njuho (2008).
Agricultural research - Uniformity of a plant variety for seed acceptability: Singh, Niane and Chaubey (2010).
Biometry - Measure of reproducibility of observations: Butcher and O'Brien (1991) and Quan and Shih (1996).
Economics - A measure of income-diversity: Bedeian and Mossholder (2000).

Normal Samples:

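Inference on $\phi$ can be handled through that for $\theta = 1/\phi$, based on the estimate $\hat\theta = \bar{X}/S$, where $\bar{X}$ denotes the sample mean and $S^2$ the sample variance based on a random sample $X_1, \ldots, X_n$ from $N(\mu, \sigma^2)$. Since $\sqrt{n}\,T_n \sim t'_\nu(\delta)$ with $T_n = \hat\theta$, i.e. a non-central $t$ (see Johnson and Kotz 1970) with $\nu = n-1$ and non-centrality parameter $\delta = \sqrt{n}\,\theta$, the central moments of $\hat\theta$ [using the moments of the non-central $t$ from Hogben et al. (1961)] are listed below:

$$E(\hat\theta) = c_{11}\theta, \tag{5.1}$$

$$\mu_2(\theta) = E(\hat\theta - E(\hat\theta))^2 = c_{22}\theta^2 + \frac{c_{20}}{n}, \tag{5.2}$$

$$\mu_3(\theta) = E(\hat\theta - E(\hat\theta))^3 = \left(c_{33}\theta^2 + \frac{c_{31}}{n}\right)\theta, \tag{5.3}$$

$$\mu_4(\theta) = E(\hat\theta - E(\hat\theta))^4 = c_{44}\theta^4 + \frac{c_{42}}{n}\theta^2 + \frac{c_{40}}{n^2}, \tag{5.4}$$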

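where

$$c_{11} = \sqrt{\frac{\nu}{2}}\;\frac{\Gamma\!\left(\frac{\nu-1}{2}\right)}{\Gamma\!\left(\frac{\nu}{2}\right)}, \quad \nu = n-1,$$

$$c_{22} = \frac{\nu}{\nu-2} - c_{11}^2, \qquad c_{20} = \frac{\nu}{\nu-2},$$

$$c_{33} = \left(\frac{\nu(7-2\nu)}{(\nu-2)(\nu-3)} + 2c_{11}^2\right)c_{11},$$

$$c_{31} = \frac{3\nu c_{11}}{(\nu-2)(\nu-3)},$$

$$c_{44} = \frac{\nu^2}{(\nu-2)(\nu-4)} - \frac{2\nu(5-\nu)c_{11}^2}{(\nu-2)(\nu-3)} - 3c_{11}^4,$$

$$c_{42} = \frac{6\nu}{\nu-2}\left(\frac{\nu}{\nu-4} - \frac{(\nu-1)c_{11}^2}{\nu-3}\right),$$

$$\text{and}\quad c_{40} = \frac{3\nu^2}{(\nu-2)(\nu-4)}.$$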
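The constant $c_{11}$ involves a ratio of gamma functions, which overflows in floating point when $\nu/2$ exceeds roughly 171. A minimal R sketch of a log-scale alternative is given below; the function names are hypothetical and this is offered only as a numerical convenience, not as part of the original computations.

```r
# c11 as written above, using gamma() directly.
c11_gamma <- function(nu) {
  sqrt(nu / 2) * gamma((nu - 1) / 2) / gamma(nu / 2)
}

# Same constant on the log scale; stays finite for large nu.
c11_lgamma <- function(nu) {
  sqrt(nu / 2) * exp(lgamma((nu - 1) / 2) - lgamma(nu / 2))
}

nu_small <- 29    # n = 30
nu_large <- 499   # n = 500, beyond the range where gamma() is finite
print(c(direct = c11_gamma(nu_small), log_scale = c11_lgamma(nu_small)))
print(c11_lgamma(nu_large))
```

For moderate sample sizes the two forms agree, so the log-scale version can be swapped in without changing results.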

The above moments can be substituted in the formulae for the functions $f_1(\theta)$ and $f_2(\theta)$ in equations (3.2) and (3.3) in order to obtain the symmetrizing transformation. The integral in equation (2.8) is too complex to obtain explicitly and therefore we evaluate it numerically for various values of $\theta$ and a given sample size $n$. We have used the formula $S(x)$ for integration of the function $s(x)$ as

$$\int s(x)\,dx = S(x) = \int_0^x s(u)\,du + S(0).$$

For ease of accessibility, and to impress upon the reader how easy it is to obtain this transformation, the source code written in R that was used to compute these values is given in the appendix.

R-Codes for Computing the Symmetrizing Transformation

## Symmetrizing transformation
## Name of the function: fsym
## Arguments: x  is the argument at which the function is computed
##            ss is the sample size (ss > 5 is needed so that nu - 4 > 0)
## Output: the value of the symmetrizing function
fsym <- function(x, ss) {
  ## f1f2 returns g'(x) = exp(-(2/3) * integral from 0 to x of f1/f2)
  f1f2 <- function(x, ss) {
    ## hfun returns the integrand f1(phi)/f2(phi) = mu3/(mu4 - mu2^2)
    hfun <- function(phi, ss) {
      nu <- ss - 1
      d <- sqrt(ss) * phi   # non-centrality parameter
      c11 <- sqrt(nu / 2) * gamma((nu - 1) / 2) / gamma(nu / 2)
      c22 <- (nu / (nu - 2)) - c11^2
      c20 <- nu / (nu - 2)
      c31 <- 3 * c11 * c20 / (nu - 3)
      c33 <- c11 * (2 * c11^2 + (nu * (7 - 2 * nu) / ((nu - 2) * (nu - 3))))
      c40 <- 3 * nu * nu / ((nu - 2) * (nu - 4))
      c42 <- 6 * c20 * ((nu / (nu - 4)) - ((nu - 1) * c11^2 / (nu - 3)))
      c44 <- (c20 * nu / (nu - 4)) - (2 * c20 * c11^2 * (5 - nu) / (nu - 3)) - 3 * c11^4
      ## central moments of the estimate, from those of the non-central t
      mu2 <- (c22 * d^2 + c20) / ss
      mu3 <- (c31 * d + c33 * d^3) / ss^1.5
      mu4 <- (c40 + c42 * d^2 + c44 * d^4) / ss^2
      mu3 / (mu4 - mu2^2)
    }
    fval <- integrate(hfun, 0, x, ss = ss)$value
    exp(-2 * fval / 3)
  }
  ## integrate() needs an integrand that accepts a vector argument
  f1f2int <- function(x, ss) sapply(x, f1f2, ss = ss)
  ## g(x) = integral from 0 to x of g'(u) du
  integrate(f1f2int, 0, x, ss = ss)$value
}

Symmetrizing transformation

[Figure 1: four panels, for n = 30, 50, 100 and 200, each plotting g(1/θ) against θ over the range 0.00 to 0.30.]

Figure 1. Symmetrizing transformation values of the coefficient of variation (θ) for varying values of sample size

Comparison of ST and VST

Chaubey, Singh and Sen (2013) carried out a large scale simulation comparing the VST, ST and UT (untransformed statistic) in terms of their normalizing quality. The VST, studied in Singh (1993), is available in an explicit form:

$$g(\theta) = \sinh^{-1}(B\theta) = \ln\left[B\theta + \sqrt{1 + B^2\theta^2}\right] \tag{5.5}$$

where $B = \left(1 + \frac{3}{4\nu}\right)\sqrt{\frac{n}{2\nu}}$.

Based on 100,000 simulations, it was concluded that the VST reduces the skewness as compared to the untransformed statistic, but the skewness is still significant even for sample sizes as large as 200. On the other hand, the ST reduces skewness to a considerable degree for sample sizes as small as 30.
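The explicit VST in (5.5) is straightforward to code. The following minimal R sketch uses a hypothetical function name `vst_cv` and takes $\nu = n - 1$ as elsewhere in the talk; it also checks that the two forms in (5.5) agree.

```r
# VST of (5.5): g(theta) = asinh(B * theta), with nu = n - 1.
vst_cv <- function(theta, n) {
  nu <- n - 1
  B <- (1 + 3 / (4 * nu)) * sqrt(n / (2 * nu))
  asinh(B * theta)
}

# Logarithmic form of the same transformation, for comparison.
vst_cv_log <- function(theta, n) {
  nu <- n - 1
  B <- (1 + 3 / (4 * nu)) * sqrt(n / (2 * nu))
  log(B * theta + sqrt(1 + B^2 * theta^2))
}

theta <- c(3, 5, 10)   # illustrative values of the estimate
print(cbind(theta, asinh_form = vst_cv(theta, 30),
            log_form = vst_cv_log(theta, 30)))
```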

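For simulating the probability distribution of $g(\hat\theta)$ we consider the standardized statistic

$$Z_g = \frac{g(\hat\theta) - E(g(\hat\theta))}{\sqrt{\mathrm{var}(g(\hat\theta))}},$$

where $g(\cdot)$ is any of the functions associated with the symmetrizing transformation, the variance stabilizing transformation and no transformation.

The expected value $E(g(\hat\theta))$, using the expansion of $g(T_n)$ with $T_n = \hat\theta$ in (2.1), is obtained as

$$E(g(T_n)) = g(\theta) + g'(\theta)\,\xi_1(\theta) + \frac{1}{2}g''(\theta)\left(\mu_2(\theta) + \xi_1^2(\theta)\right) = g(\theta) + g'(\theta)\left[\xi_1(\theta) + \frac{R}{2}\left(\mu_2(\theta) + \xi_1^2(\theta)\right)\right]. \tag{5.6}$$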

Note that, for computation of the above expectation for the ST, $R = g''(\theta)/g'(\theta)$ is substituted from (2.7) and $g'$ is obtained numerically from

$$g'(\theta) = \exp\left\{-\frac{2}{3}\int_0^\theta \frac{f_1(u)}{f_2(u)}\,du\right\}. \tag{5.7}$$

The simulated probabilities are given in Table 1. It was noted that, for sample sizes less than 50, the ST does not provide a significant improvement over the VST. Hence, an adjustment for small sample sizes was provided, as described next.

Table 1. Probability distribution (P(Z ≤ zα))* of standardized transforms of CV

                                              α
CV    n     Transformation†   0.005   0.025   0.05    0.5     0.95    0.975   0.995
0.1   30    ST                0.003   0.021   0.046   0.514   0.939   0.965   0.990
            VST               0.002   0.016   0.039   0.517   0.944   0.969   0.991
            UT                0.000   0.006   0.025   0.547   0.937   0.961   0.985
      50    ST                0.004   0.023   0.049   0.504   0.943   0.970   0.993
            VST               0.002   0.018   0.042   0.511   0.945   0.970   0.992
            UT                0.001   0.010   0.031   0.533   0.939   0.964   0.987
      100   ST                0.005   0.025   0.051   0.502   0.946   0.972   0.994
            VST               0.003   0.020   0.046   0.509   0.946   0.971   0.993
            UT                0.001   0.015   0.038   0.523   0.941   0.966   0.990
0.2   30    ST                0.003   0.021   0.045   0.511   0.939   0.966   0.990
            VST               0.002   0.017   0.039   0.514   0.943   0.969   0.991
            UT                0.000   0.007   0.025   0.543   0.937   0.961   0.985
      50    ST                0.004   0.023   0.048   0.510   0.943   0.970   0.993
            VST               0.002   0.018   0.042   0.516   0.945   0.970   0.992
            UT                0.001   0.010   0.031   0.536   0.939   0.963   0.987
      100   ST                0.005   0.024   0.049   0.501   0.947   0.973   0.994
            VST               0.003   0.020   0.044   0.508   0.947   0.971   0.993
            UT                0.002   0.015   0.037   0.522   0.942   0.966   0.989
0.3   30    ST                0.003   0.022   0.047   0.511   0.941   0.967   0.991
            VST               0.002   0.017   0.040   0.516   0.945   0.969   0.991
            UT                0.000   0.007   0.026   0.543   0.938   0.962   0.985
      50    ST                0.004   0.025   0.050   0.505   0.943   0.969   0.993
            VST               0.002   0.020   0.043   0.512   0.944   0.969   0.992
            UT                0.001   0.012   0.033   0.532   0.938   0.962   0.987
      100   ST                0.005   0.025   0.050   0.503   0.947   0.973   0.994
            VST               0.003   0.021   0.045   0.510   0.946   0.971   0.993
            UT                0.001   0.015   0.038   0.524   0.942   0.966   0.990

† ST: symmetrizing transformation. VST: variance stabilizing transformation. UT: untransformed.
* zα is such that for Z ∼ N(0, 1), P(Z ≤ zα) = α.

Small Sample Adjustment

For adjusting the normal approximation provided by the ST, the technique suggested in Mudholkar and Chaubey (1975), which uses a mixture approximation, was utilized.

Let $Z_{ST} = (g(T_n) - E(g(T_n)))/\sqrt{\mu_{2g}}$ denote the standardized version of the ST. This technique models the distribution of $Z_{ST}$ as

$$\lambda\,N(0,1) \oplus (1-\lambda)\,\frac{\chi^2_\nu - \nu}{\sqrt{2\nu}},$$

where $\oplus$ denotes the mixture of the corresponding distributions.

The values of $\nu$ and $\lambda$ are obtained by matching the simulated skewness and kurtosis, denoted by $\beta_{1(ST)}$ and $\beta_{2(ST)}$, respectively, i.e.

$$\nu = \frac{8}{\beta_{1(ST)}} \quad\text{and}\quad \lambda = 1 - \frac{2}{3}\,\frac{\beta_{2(ST)} - 3}{\beta_{1(ST)}}. \tag{5.8}$$

The lower tail probabilities for $Z_{ST}$ can now be approximated as:

$$P(Z_{ST} \le x) = \lambda\,\Phi(x) + (1-\lambda)\,P\!\left(\chi^2_\nu \le \nu + x\sqrt{2\nu}\right). \tag{5.9}$$

The confidence intervals are obtained using the following approximate representation of the quantiles of a mixture distribution in terms of those of its components.

Let $z_\alpha$ and $z^*_\alpha$ be the $\alpha$ quantiles of the standardized distributions $N(0,1)$ and $(\chi^2_\nu - \nu)/\sqrt{2\nu}$, respectively. Then the $\alpha$ quantile $x_\alpha$ of the mixture distribution is approximated as:

$$x_\alpha = \lambda z_\alpha + (1-\lambda) z^*_\alpha, \tag{5.10}$$

where $z^*_\alpha$ is given in terms of the $\alpha$ quantile $\chi^2_{\nu,\alpha}$ as

$$z^*_\alpha = \frac{\chi^2_{\nu,\alpha} - \nu}{\sqrt{2\nu}}. \tag{5.11}$$
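Equations (5.8) to (5.11) translate directly into a few lines of R. The sketch below is illustrative only: the function names are hypothetical, and the skewness and kurtosis values passed in are made-up inputs chosen to give round parameters, not results from the talk.

```r
# (5.8): mixture parameters from skewness beta1 and kurtosis beta2.
mixture_params <- function(beta1, beta2) {
  list(nu = 8 / beta1, lambda = 1 - (2 / 3) * (beta2 - 3) / beta1)
}

# (5.9): lower tail probability of the standardized ST.
mixture_cdf <- function(x, nu, lambda) {
  lambda * pnorm(x) + (1 - lambda) * pchisq(nu + x * sqrt(2 * nu), df = nu)
}

# (5.10) and (5.11): approximate alpha quantile of the mixture.
mixture_quantile <- function(alpha, nu, lambda) {
  z_star <- (qchisq(alpha, df = nu) - nu) / sqrt(2 * nu)
  lambda * qnorm(alpha) + (1 - lambda) * z_star
}

p <- mixture_params(beta1 = 0.4, beta2 = 3.3)   # illustrative inputs only
alpha <- c(0.025, 0.5, 0.975)
x_alpha <- mixture_quantile(alpha, p$nu, p$lambda)
print(cbind(alpha, x_alpha, cdf_at_x = mixture_cdf(x_alpha, p$nu, p$lambda)))
```

Because (5.10) is itself an approximation, the last column need not reproduce `alpha` exactly; comparing the two gives a quick feel for the quality of the quantile formula.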

We have used simulated values of $\beta_1$ and $\beta_2$ for the ST to develop polynomial approximations in powers of $\phi$ and $1/n$. Here we used multiple linear regression, including up to quadratic terms as well as their interactions, on a grid of 105 combinations of $\phi$ and $n$ values, which resulted in the following expressions:

$$\beta_{1(ST)} \approx -0.06694 + 8.51908/n + 15.42537/n^2 + (0.2456 - 14.69333/n + 155.42357/n^2)\,\phi - (0.25299 - 9.73724/n + 162.48528/n^2)\,\phi^2 \tag{5.12}$$

$$\beta_{2(ST)} \approx 3.02586 - 4.67269/n + 209.31385/n^2 + (0.16502 - 5.7324/n + 4.18595/n^2)\,\phi - (0.12802 - 5.69879/n + 93.2359/n^2)\,\phi^2 \tag{5.13}$$

These models were judged to be adequate on the basis of their squared multiple correlation coefficients, which were 99.6% and 98%, respectively.
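The fitted expressions (5.12) and (5.13) can be wrapped as functions and fed into (5.8). A minimal R sketch follows; the function names and the evaluation grid are assumptions made for illustration, while the coefficients are copied from the two equations above.

```r
# (5.12): modeled skewness of the standardized ST.
beta1_st <- function(n, phi) {
  (-0.06694 + 8.51908 / n + 15.42537 / n^2) +
    (0.2456 - 14.69333 / n + 155.42357 / n^2) * phi -
    (0.25299 - 9.73724 / n + 162.48528 / n^2) * phi^2
}

# (5.13): modeled kurtosis of the standardized ST.
beta2_st <- function(n, phi) {
  (3.02586 - 4.67269 / n + 209.31385 / n^2) +
    (0.16502 - 5.7324 / n + 4.18595 / n^2) * phi -
    (0.12802 - 5.69879 / n + 93.2359 / n^2) * phi^2
}

# Mixture parameters from (5.8) over a small grid of n and phi.
grid <- expand.grid(n = c(20, 30, 40, 50), phi = c(0.1, 0.2, 0.3))
grid$beta1 <- beta1_st(grid$n, grid$phi)
grid$beta2 <- beta2_st(grid$n, grid$phi)
grid$nu <- 8 / grid$beta1
grid$lambda <- 1 - (2 / 3) * (grid$beta2 - 3) / grid$beta1
print(grid)
```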

A comparison of the probabilities obtained by the mixture approximation, using the simulated as well as the modeled values of skewness and kurtosis, with the corresponding probabilities obtained by simulation (based on 100,000 runs) is presented in Table 2 for $\phi = 0.1, 0.2, 0.3$ and $n = 20, 30, 40, 50$.

It may be seen from this table that the mixture approximation based on modeled skewness (see Eq. (5.12)) and kurtosis (see Eq. (5.13)) gives values reasonably close to those based on their simulated values, and these in turn are close to the probabilities obtained by simulation.

Table 2. A comparison of the mixture approximation for P(ZST ≤ zα): (1) by simulation; (2) mixture approximation with skewness and kurtosis obtained by simulation; (3) mixture approximation with skewness and kurtosis obtained by the empirical formulae (Eqs. 5.12 and 5.13). (zα is such that for Z ∼ N(0, 1), P(Z ≤ zα) = α.)

            Approximation            Lower Tail Probability (α)
CV    n     Method          0.005   0.025   0.05    0.5     0.95    0.975   0.995
0.1   20    (1)             0.002   0.017   0.041   0.520   0.935   0.961   0.987
            (2)             0.002   0.015   0.037   0.522   0.943   0.968   0.990
            (3)             0.002   0.015   0.037   0.522   0.943   0.967   0.990
      30    (1)             0.003   0.021   0.046   0.514   0.939   0.965   0.990
            (2)             0.004   0.021   0.045   0.509   0.947   0.972   0.993
            (3)             0.004   0.021   0.045   0.510   0.947   0.972   0.993
      40    (1)             0.004   0.023   0.048   0.508   0.942   0.968   0.992
            (2)             0.004   0.023   0.048   0.505   0.948   0.973   0.994
            (3)             0.005   0.024   0.048   0.503   0.949   0.974   0.994
      50    (1)             0.004   0.023   0.049   0.504   0.943   0.970   0.993
            (2)             0.005   0.024   0.049   0.503   0.949   0.974   0.994
            (3)             0.004   0.023   0.048   0.504   0.948   0.973   0.994
0.2   20    (1)             0.002   0.018   0.043   0.521   0.935   0.961   0.987
            (2)             0.002   0.015   0.038   0.521   0.943   0.968   0.990
            (3)             0.002   0.015   0.037   0.523   0.943   0.967   0.990
      30    (1)             0.003   0.021   0.045   0.511   0.939   0.966   0.990
            (2)             0.004   0.021   0.045   0.509   0.947   0.972   0.993
            (3)             0.004   0.021   0.046   0.508   0.947   0.972   0.993
      40    (1)             0.004   0.023   0.049   0.507   0.943   0.969   0.992
            (2)             0.004   0.023   0.047   0.505   0.948   0.973   0.994
            (3)             0.004   0.023   0.047   0.505   0.948   0.973   0.994
      50    (1)             0.004   0.023   0.048   0.510   0.943   0.970   0.993
            (2)             0.004   0.024   0.048   0.503   0.949   0.974   0.994
            (3)             0.005   0.024   0.049   0.502   0.949   0.974   0.995

Yogendra P. Chaubey () Department of Mathematics & Statistics Concordia University 35 / 49

Page 36: THEME – 2 On Normalizing Transformations of the Coefficient of Variation for a Normal Population with an Application to Evaluation of Uniformity of Plant Varieties

Small Sample Adjustment

Table 2. Continued...Approximation Lower Tail Probability (α)

CV n Method 0.005 0.025 0.05 0.5 0.95 0.975 0.9950.3 20 (1) 0.002 0.017 0.041 0.520 0.937 0.963 0.988

(2) 0.002 0.016 0.038 0.521 0.943 0.968 0.990(3) 0.003 0.017 0.040 0.517 0.944 0.969 0.991

30 (1) 0.003 0.022 0.047 0.511 0.941 0.967 0.991(2) 0.004 0.021 0.045 0.509 0.947 0.972 0.993(3) 0.004 0.021 0.046 0.508 0.947 0.972 0.993

40 (1) 0.004 0.022 0.048 0.507 0.941 0.968 0.991(2) 0.004 0.023 0.047 0.505 0.948 0.973 0.994(3) 0.004 0.022 0.046 0.507 0.947 0.972 0.993

50 (1) 0.004 0.025 0.050 0.505 0.943 0.969 0.993(2) 0.004 0.023 0.048 0.504 0.949 0.974 0.994(3) 0.005 0.024 0.049 0.502 0.949 0.974 0.995
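One way to put a number on "reasonably close" is to take, for each sample size, the largest absolute gap between the simulated probabilities (method (1)) and the mixture approximation with empirical skewness and kurtosis (method (3)). The sketch below does this for the CV = 0.1 rows of Table 2 only; the variable and function names are illustrative choices, not part of the talk.

```python
# Rows for CV = 0.1 copied from Table 2: n -> (method (1), method (3)).
# Each list follows the column order alpha = 0.005, 0.025, 0.05, 0.5, 0.95, 0.975, 0.995.
TABLE2_CV01 = {
    20: ([0.002, 0.017, 0.041, 0.520, 0.935, 0.961, 0.987],
         [0.002, 0.015, 0.037, 0.522, 0.943, 0.967, 0.990]),
    30: ([0.003, 0.021, 0.046, 0.514, 0.939, 0.965, 0.990],
         [0.004, 0.021, 0.045, 0.510, 0.947, 0.972, 0.993]),
    40: ([0.004, 0.023, 0.048, 0.508, 0.942, 0.968, 0.992],
         [0.005, 0.024, 0.048, 0.503, 0.949, 0.974, 0.994]),
    50: ([0.004, 0.023, 0.049, 0.504, 0.943, 0.970, 0.993],
         [0.004, 0.023, 0.048, 0.504, 0.948, 0.973, 0.994]),
}


def max_abs_gap(simulated, approximated):
    """Largest absolute difference between two rows of tail probabilities."""
    return max(abs(s - a) for s, a in zip(simulated, approximated))


# Round to three decimals, the precision at which the table is reported.
gaps = {n: round(max_abs_gap(sim, approx), 3)
        for n, (sim, approx) in TABLE2_CV01.items()}
print(gaps)  # {20: 0.008, 30: 0.008, 40: 0.007, 50: 0.005}
```

For these rows the gap never exceeds 0.008 and shrinks as n grows from 20 to 50.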


Inverse Gaussian Distribution

The inverse Gaussian (IG) distribution is regarded as a natural choice for modeling non-negative data in many situations; see Chhikara and Folks (1974).

The pdf of an IG distribution is given by

$$f(x;\mu,\lambda) = \sqrt{\frac{\lambda}{2\pi x^{3}}}\,\exp\left\{-\frac{\lambda(x-\mu)^{2}}{2\mu^{2}x}\right\},$$

where x, λ, µ > 0.

For this distribution

$$E(X) = \mu, \qquad \mathrm{Var}(X) = \mu^{3}/\lambda, \qquad CV(X) = \sqrt{\mu/\lambda},$$

and therefore the ratio ϕ = µ/λ, being the squared CV, presents an alternative way to parametrize the distribution.
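As a quick check, the CV expression follows in one line from the mean and variance just stated; the numerical values at the end (µ = 2, λ = 8) are made up purely for illustration.

```latex
\[
CV(X) = \frac{\sqrt{\mathrm{Var}(X)}}{E(X)}
      = \frac{\sqrt{\mu^{3}/\lambda}}{\mu}
      = \sqrt{\frac{\mu}{\lambda}},
\qquad\text{so}\qquad
\phi = CV(X)^{2} = \frac{\mu}{\lambda}.
\]
% Illustration: \mu = 2, \lambda = 8 gives Var(X) = 8/8 = 1, CV(X) = 1/2, \phi = 1/4.
```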


Based on a random sample $X_1, X_2, \ldots, X_n$ from IG(µ, λ), ϕ may be of interest for inference on the CV θ = √ϕ. Its unbiased estimator is given by

$$\hat\phi = \bar{X}\,U,$$

where

$$U = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{1}{X_i} - \frac{1}{\bar{X}}\right).$$

It is known that $\bar{X}$ and $U$ are independent, with

$$\bar{X} \sim IG(\mu, n\lambda) \quad\text{and}\quad (n-1)\lambda U \sim \chi^{2}_{n-1}.$$

These properties may be used to set up the VST and ST in this situation.

The details will be communicated in a forthcoming publication.
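The estimator of ϕ described above (the sample mean times U) is easy to compute directly. The following is a minimal sketch; the function name `ig_phi_hat` and the three-point sample are hypothetical and serve only to show the arithmetic.

```python
def ig_phi_hat(sample):
    """Estimate phi = mu/lambda (the squared CV) as X_bar * U.

    U = (1/(n-1)) * sum(1/X_i - 1/X_bar), as defined in the text.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    if any(x <= 0 for x in sample):
        raise ValueError("inverse Gaussian data must be positive")
    x_bar = sum(sample) / n
    u = sum(1.0 / x - 1.0 / x_bar for x in sample) / (n - 1)
    return x_bar * u


# Made-up sample, chosen so the result can be checked by hand:
# X_bar = 7/3, U = 13/56, so phi_hat = 13/24 (about 0.5417).
toy_sample = [1.0, 2.0, 4.0]
print(ig_phi_hat(toy_sample))
```

Taking the square root of the result gives a plug-in estimate of the CV itself, although that square root is no longer unbiased.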


An Application

We compare the 95% confidence intervals for the CVs using data on heights (cm) of n = 30 wheat plants of two varieties (Singh et al. 2010).

The sample values were:
Variety 1 (Entry 4): x̄ = 91.7 cm, sd = 6.25 cm, CV = 0.06814.
Variety 2 (Entry 5): x̄ = 115.03 cm, sd = 2.63 cm, CV = 0.0229.

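For a general transformation g, we have the standardised random variate

$$Z_g = \frac{g(\hat\varphi) - E(g(\hat\varphi))}{\sqrt{\mathrm{Var}(g(\hat\varphi))}}.$$

The 100(1 − α)% confidence limits are the solutions $(\varphi_L, \varphi_U)$ of the following equations:

$$\frac{g(\hat\varphi) - E(g(\hat\varphi))}{\sqrt{\mathrm{Var}(g(\hat\varphi))}} = x_{\alpha/2},\; x_{1-\alpha/2}.$$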


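The quantiles $x_{\alpha/2}$ and $x_{1-\alpha/2}$ are obtained using the distribution of $Z_g$ as described earlier:

$$P\left(x_{\alpha/2} \le \frac{g(\hat\varphi) - E(g(\hat\varphi))}{\sqrt{\mathrm{Var}(g(\hat\varphi))}} \le x_{1-\alpha/2}\right) = 1 - \alpha.$$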

Note that the above equations involve the parameter φ, and hence θ, through non-linear functions in the expected values and variances of all three transformations (the one exception being the variance of the variance stabilizing transformation), so the solutions need to be obtained numerically. In our application the uniroot function available in R was used. For the variance stabilizing transformation and the no-transformation case, $x_\alpha$ is the α-quantile of the standard normal distribution. For the symmetrizing transformation, the skewness (β1) and kurtosis (β2) were modeled using the equations given in the preceding section. The constants required for the approximations are given in Table 3.
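The numerical step can be sketched as follows. The talk used R's uniroot; this Python stand-in uses plain bisection instead. It takes g as the identity and, as an assumption made only for illustration, uses the textbook large-sample moments E(θ̂) ≈ θ and Var(θ̂) ≈ θ²(1/2 + θ²)/n for the sample CV. These moment formulas, the bracketing intervals and all function names are hypothetical and are not the expressions used in the talk, so the output is not expected to reproduce Table 4.

```python
from math import sqrt
from statistics import NormalDist


def bisect_root(f, lo, hi, tol=1e-12, max_iter=200):
    """Find a root of f in [lo, hi] by bisection; f must change sign on the bracket."""
    f_lo, f_hi = f(lo), f(hi)
    if f_lo == 0.0:
        return lo
    if f_hi == 0.0:
        return hi
    if f_lo * f_hi > 0:
        raise ValueError("root is not bracketed by [lo, hi]")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_mid == 0.0 or hi - lo < tol:
            return mid
        if f_lo * f_mid < 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


def standardized(theta_hat, theta, n):
    """Z_g for g = identity under ASSUMED large-sample moments (illustration only)."""
    mean = theta
    var = theta ** 2 * (0.5 + theta ** 2) / n
    return (theta_hat - mean) / sqrt(var)


def cv_confidence_limits(theta_hat, n, alpha=0.05):
    """Solve Z_g(theta) = x for both quantiles; the brackets assume theta_hat < 1."""
    x_lo = NormalDist().inv_cdf(alpha / 2)
    x_hi = NormalDist().inv_cdf(1 - alpha / 2)
    # Z_g decreases in theta near theta_hat, so the upper quantile yields the lower limit.
    lower = bisect_root(lambda t: standardized(theta_hat, t, n) - x_hi, 1e-6, theta_hat)
    upper = bisect_root(lambda t: standardized(theta_hat, t, n) - x_lo, theta_hat, 1.0)
    return lower, upper


# Entry 4 summary statistics from the slide: sd = 6.25 cm, mean = 91.7 cm, n = 30.
theta_hat_entry4 = 6.25 / 91.7
print(cv_confidence_limits(theta_hat_entry4, 30))
```

Replacing `standardized` with the exact moments of a chosen transformation, and the normal quantiles with those from the mixture approximation, would follow the same pattern.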


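Table 3. Constants for the approximation.

| Variety | n  | θ        | β1     | β2     | λ      | ν     |
|---------|----|----------|--------|--------|--------|-------|
| Entry 4 | 30 | 0.068157 | 0.2288 | 3.1010 | 0.7056 | 34.97 |
| Entry 5 | 30 | 0.022864 | 0.2325 | 3.1022 | 0.7070 | 34.41 |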

The values of $x_\alpha$ from equation (5.10) are $x_{0.025} = -1.8907$ and $x_{0.975} = 2.0235$. The resulting 95% confidence intervals for θ for various transformations are given in Table 4.


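Table 4. The 95% confidence intervals of θ.

| Transformation       | Entry 4 Lower | Entry 4 Upper | Entry 4 Width | Entry 5 Lower | Entry 5 Upper | Entry 5 Width |
|----------------------|---------------|---------------|---------------|---------------|---------------|---------------|
| Symmetrizing         | 0.05425       | 0.09051       | 0.03636       | 0.01821       | 0.03031       | 0.01210       |
| Variance stabilizing | 0.05317       | 0.09037       | 0.03720       | 0.01785       | 0.03028       | 0.01242       |
| Untransformed        | 0.04936       | 0.08704       | 0.03767       | 0.01657       | 0.02916       | 0.01259       |
| Vangel’s Approx.     | 0.05409       | 0.09106       | 0.03697       | 0.01820       | 0.03072       | 0.01252       |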

In this example, we note that the symmetrizing transformation provides narrower confidence intervals than the other methods for both varieties.
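The comparison of widths can be checked by recomputing them from the limits in Table 4. The short sketch below does so; the dictionary layout and function names are illustrative only.

```python
# Lower and upper 95% limits copied from Table 4.
TABLE4 = {
    "Entry 4": {
        "Symmetrizing": (0.05425, 0.09051),
        "Variance stabilizing": (0.05317, 0.09037),
        "Untransformed": (0.04936, 0.08704),
        "Vangel's approx.": (0.05409, 0.09106),
    },
    "Entry 5": {
        "Symmetrizing": (0.01821, 0.03031),
        "Variance stabilizing": (0.01785, 0.03028),
        "Untransformed": (0.01657, 0.02916),
        "Vangel's approx.": (0.01820, 0.03072),
    },
}


def widths(intervals):
    """Interval widths (upper minus lower), rounded to the table's five decimals."""
    return {name: round(upper - lower, 5) for name, (lower, upper) in intervals.items()}


def narrowest(intervals):
    """Name of the method with the smallest recomputed width."""
    w = widths(intervals)
    return min(w, key=w.get)


for entry, intervals in TABLE4.items():
    print(entry, widths(intervals), "narrowest:", narrowest(intervals))
```

The recomputed widths agree with the Width columns to within one unit in the fifth decimal, except for the symmetrizing interval of Entry 4, where the limits give 0.03626 against the tabulated 0.03636; this may reflect rounding or a transcription slip on the slide. The symmetrizing interval is the narrowest for both entries either way.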


References

Anscombe, F.J. (1948). The transformation of Poisson, binomial and negative-binomial data. Biometrika 35, 246-254.

Bartlett, M.S. (1947). The use of transformations. Biometrics 1, 39-52.

Bedeian, A.G. and Mossholder, K.W. (2000). On the use of the coefficient of variation as a measure of diversity. Organizational Research Methods 3, 285-297.

Butcher, J.M. and O’Brien, C. (1991). The reproducibility of biometry and keratometry measurements. Eye 5, 708-711.

Chaubey, Y.P. and Mudholkar, G.S. (1983). On the symmetrizing transformations of random variables. Preprint, Concordia University, Montreal. Available at http://spectrum.library.concordia.ca/973582/


Chaubey, Y.P. and Mudholkar, G.S. (1984). On the almost symmetry of Fisher’s Z. Metron 42(I/II), 165-169.

Chaubey, Y.P., Singh, M. and Sen, D. (2013). On symmetrizing transformation of the sample coefficient of variation from a normal population. Communications in Statistics - Simulation and Computation 42, 2118-2134.

Chhikara, R.S. and Folks, J.L. (1989). The inverse Gaussian distribution. Marcel Dekker, New York.

Fisher, R.A. (1915). Frequency distribution of the values of correlation coefficient from an indefinitely large population. Biometrika 10, 507-521.

Fisher, R.A. (1922). On the interpretation of χ² from contingency tables and calculation of ρ. J. Roy. Statist. Soc. Ser. A 85, 87-94.


Hogben, D., Pinkham, R.S. and Wilk, M.B. (1961). The moments of the non-central t-distribution. Biometrika 9, 119-127.

Hotelling, H. (1953). New light on the correlation coefficient and its transforms. J. Roy. Statist. Soc. Ser. B 15, 193-224.

Jensen, D.R. and Solomon, H. (1972). A Gaussian approximation to the distribution of a quadratic form in normal variables. J. Amer. Statist. Assoc. 67, 898-902.

Johnson, N.L. and Kotz, S. (1970). Distributions in statistics: continuous univariate distributions - 2 (Chapter 27). New York: John Wiley & Sons.

Kordonsky, K.B. and Gertsbakh, I. (1997). Multiple time scales and the lifetime coefficient of variation: engineering applications. Lifetime Data Analysis 2, 139-156.


Mudholkar, G.S. and Chaubey, Y.P. (1975). Use of logistic distribution for approximating probabilities and percentiles of Student’s distribution. Journal of Statistical Research 9, 1-9.

Mudholkar, G.S. and Trivedi, M.C. (1980). A normal approximation for the distribution of the likelihood ratio statistic in multivariate analysis of variance. Biometrika 67, 485-488.

Mudholkar, G.S. and Trivedi, M.C. (1981a). A Gaussian approximation to the distribution of the sample variance for nonnormal populations. Journal of the American Statistical Association 76, 479-485.

Mudholkar, G.S. and Trivedi, M.C. (1981b). A normal approximation for the multivariate likelihood ratio statistics. In Statistical Distributions in Scientific Work (C. Taillie, C.P. Patil and A.A. Baldessari, Eds.). Dordrecht: Reidel, Vol. 5, 219-230.


Quan, H. and Shih, J. (1996). Assessing reproducibility by the within-subject coefficient of variation with random effects models. Biometrics 52, 1195-1203.

Rao, C.R. (1973). Linear Statistical Inference and Its Applications. New York: John Wiley.

Singh, M. (1993). Behavior of sample coefficient of variation drawn from several distributions. Sankhya 55, 65-76.

Singh, M., Niane, A.A. and Chaubey, Y.P. (2010). Evaluating uniformity of plant varieties: sample size for inference on coefficient of variation. Journal of Statistics and Applications 5, 1-13.

Sankaran, M.S. (1959). On the noncentral χ² distribution. Biometrika 46, 235-237.


Taye, G. and Njuho, P. (2008). Monitoring field variability using confidence interval for coefficient of variation. Communications in Statistics - Theory and Methods 37, 831-846.

Wilson, E.B. and Hilferty, M.M. (1931). The distribution of chi-square. Proc. Nat. Acad. Sc. 17, 684-688.
