MasterArbeit Statcon


An extension for smoothed empirical likelihood confidence intervals for extreme quantiles and small sample sizes

Master Thesis

Author: Oliver Thunich (1, 2)

Counseling: Sebastian Schoneberg (2); Bertram Schafer (2)

Supervisors: Claus Weihs (1); David Meintrup (3)

(1) TU Dortmund; (2) Statcon GmbH; (3) TH Ingolstadt

1. Motivation

Motivation

- Application: Calculating process capability without a distribution assumption.
  - Compute confidence intervals for "extreme" quantiles (e.g. q = 0.01).
  - Using a non-parametric method.

- Problem: small sample sizes are desired but lead to
  - Infinite confidence intervals.
  - Bad coverage rates.

- Method: Smoothed empirical likelihood.
  - Capable of returning non-symmetrical confidence intervals.
  - Smoothing reduces coverage error.

TU Dortmund; STATCON 2 / 16

2. Smoothed Empirical Likelihood

Existing Methods

- Empirical likelihood was first introduced by Owen (1988).

- When quantiles are considered, the log likelihood is dependent on the empirical distribution function Fn (Adimari, 1998):

$$ l(\Theta) = 2n\left[ F_n(\Theta)\log\frac{F_n(\Theta)}{q} + \bigl(1 - F_n(\Theta)\bigr)\log\frac{1 - F_n(\Theta)}{1 - q} \right] \qquad (1) $$

- Owen (1988) shows that the Wilks theorem is applicable to compute asymptotic confidence intervals:

$$ \lim_{n\to\infty} P\bigl(l(\Theta) \le c\bigr) = P\bigl(\chi^2_1 \le c\bigr) $$

- As this results in a step function, several methods of smoothing have been proposed:
  - Smoothing using a kernel function (Chen, Hall, 1993).
  - Linear smoothing of Fn (Adimari, 1998).
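Equation (1) and the chi-square threshold can be sketched in a few lines of Python. This is only an illustration: the function names (`ecdf`, `el_statistic`, `chi2_1_quantile`) are made up for this example, the convention 0·log(0) = 0 is assumed, and the chi-square quantile with one degree of freedom is obtained as the square of a standard normal quantile.

```python
import math
from bisect import bisect_right
from statistics import NormalDist


def ecdf(sample, theta):
    """Empirical distribution function Fn evaluated at theta."""
    xs = sorted(sample)
    return bisect_right(xs, theta) / len(xs)


def el_statistic(F, q, n):
    """Equation (1) for a given value F = Fn(theta), assuming 0*log(0) = 0."""
    def term(p, p0):
        return 0.0 if p == 0.0 else p * math.log(p / p0)
    return 2.0 * n * (term(F, q) + term(1.0 - F, 1.0 - q))


def chi2_1_quantile(level):
    """Quantile of the chi-square distribution with 1 degree of freedom.

    Uses the fact that a chi-square(1) variable is a squared standard normal.
    """
    z = NormalDist().inv_cdf((1.0 + level) / 2.0)
    return z * z


# A value theta lies in the (1 - alpha) confidence region if
# el_statistic(ecdf(sample, theta), q, n) <= chi2_1_quantile(1 - alpha).
```

Because Fn only takes the values 0, 1/n, ..., 1, the statistic is constant between observations, which is the step-function behaviour the smoothing methods address.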


Linear Smoothing

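- Let x(1) ≤ ... ≤ x(n) be the ordered sample.

- The smoothing proposed by Adimari (1998) is achieved by using a linear smoothing F* of Fn in equation 1.

$$ F^*_n(\Theta) = \begin{cases} 0 & \text{if } \Theta < x_{(1)} \\ H(\Theta) & \text{if } \Theta \in [x_{(1)}, x_{(n)}) \\ 1 & \text{if } \Theta \ge x_{(n)} \end{cases} $$

where

$$ H(\Theta) = \begin{cases} \dfrac{2i-1}{2n} & \text{if } \Theta = x_{(i)};\ i \in \{1, \dots, n-1\} \\[1ex] (1-\lambda)\dfrac{2i-1}{2n} + \lambda\dfrac{2i+1}{2n} & \text{if } \Theta \in (x_{(i)}, x_{(i+1)});\ \lambda = \dfrac{\Theta - x_{(i)}}{x_{(i+1)} - x_{(i)}};\ i \in \{1, \dots, n-1\} \end{cases} $$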

Problems:

- Constant likelihood values outside of the observed data can lead to infinite CIs.

- The likelihood function still has two jumps (at x(1) and x(n)).
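The piecewise definition of F* can be sketched as follows. The function name `adimari_cdf` is hypothetical, and the handling of tied observations (interpolating from the last of the tied values) is an assumption that the slides do not address.

```python
from bisect import bisect_right


def adimari_cdf(sample, theta):
    """Linearly smoothed distribution function F* as defined on this slide."""
    xs = sorted(sample)
    n = len(xs)
    if theta < xs[0]:
        return 0.0
    if theta >= xs[-1]:
        return 1.0
    # i is the 1-based index with x(i) <= theta < x(i+1)
    i = bisect_right(xs, theta)
    lo, hi = xs[i - 1], xs[i]
    lam = (theta - lo) / (hi - lo)
    # lam = 0 reproduces the value (2i - 1) / (2n) at theta = x(i)
    return (1.0 - lam) * (2 * i - 1) / (2 * n) + lam * (2 * i + 1) / (2 * n)
```

For the sample 1, 2, 3, 4 the function jumps from 0 to 1/8 at x(1) = 1 and from just below 7/8 to 1 at x(n) = 4, which are the two jumps listed under "Problems".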


Distribution Function

Figure: Smoothing of Fn for 11 observations from a standard normal distribution. Axes: Θ (horizontal), F(Θ) (vertical); curves: Fn, F*.


Likelihood Function

Figure: Smoothed empirical likelihood functions for the median (n = 11; q = 0.5; α = 0.05). Axes: Θ (horizontal), l(Θ) (vertical); curves: not smoothed, Adimari, Chen and Hall; reference lines: χ²_{1,1−α}, true quantile.


Figure: Smoothed empirical likelihood functions for the 1%-quantile (n = 11; q = 0.01; α = 0.05). Axes: Θ (horizontal), l(Θ) (vertical); curves: not smoothed, Adimari, Chen and Hall; reference lines: χ²_{1,1−α}, true quantile.


Coverage Rates

Figure: Estimated coverage rates based on 1000 samples. Axes: n (horizontal), coverage rate (vertical); methods: not smoothed, Adimari, Chen and Hall; settings: normal distribution q = 0.01, exponential distribution q = 0.01, exponential distribution q = 0.99; reference lines: desired coverage rate (0.95), minimal sample size.


3. The Extension

The Extension

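Idea:

- Find an extension of the likelihood function for values outside of the observed data so that
  - finite confidence intervals are guaranteed.
  - the desired coverage rate is achieved.

1. extending the smoothed ecdf F* as follows:

$$ F_{ext}(\Theta) = \begin{cases} 0 & \text{if } \Theta \le x_{(1)} - d_1 c \\[1ex] \dfrac{1}{2n} - \dfrac{1}{2n \cdot d_1 \cdot c}\,(x_{(1)} - \Theta) & \text{if } x_{(1)} - d_1 c < \Theta < x_{(1)} \\[1ex] H(\Theta) & \text{if } x_{(1)} \le \Theta \le x_{(n)} \\[1ex] \dfrac{2n-1}{2n} + \dfrac{1}{2n \cdot d_2 \cdot c}\,(\Theta - x_{(n)}) & \text{if } x_{(n)} < \Theta < x_{(n)} + d_2 c \\[1ex] 1 & \text{if } \Theta \ge x_{(n)} + d_2 c \end{cases} $$

where c ≥ 1,

$$ d_1 = \frac{1}{10}\sum_{i=1}^{5}\bigl(x_{(i+1)} - x_{(i)}\bigr) \quad\text{and}\quad d_2 = \frac{1}{10}\sum_{i=1}^{5}\bigl(x_{(n-i+1)} - x_{(n-i)}\bigr) $$

2. linear extension of the likelihood function.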
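Step 1 can be sketched as below. The names `tail_scales` and `f_ext` are hypothetical, the sketch assumes at least six observations (needed for the five spacings in d1 and d2), and it takes H(x(n)) = (2n − 1)/(2n), which is what the case boundaries of Fext imply. Step 2 (the linear extension of the likelihood function itself) is not covered here because the slides do not spell it out.

```python
from bisect import bisect_right


def tail_scales(xs):
    """d1 and d2: one tenth of the summed five lowest / highest spacings.

    xs must be sorted and contain at least six values.
    """
    n = len(xs)
    d1 = sum(xs[i + 1] - xs[i] for i in range(5)) / 10.0
    d2 = sum(xs[n - 1 - i] - xs[n - 2 - i] for i in range(5)) / 10.0
    return d1, d2


def f_ext(sample, theta, c):
    """Extended smoothed distribution function Fext for a parameter c >= 1."""
    xs = sorted(sample)
    n = len(xs)
    if n < 6:
        raise ValueError("at least six observations are needed for d1 and d2")
    if c < 1:
        raise ValueError("c must be at least 1")
    d1, d2 = tail_scales(xs)
    lo, hi = xs[0], xs[-1]
    if theta <= lo - d1 * c:
        return 0.0
    if theta < lo:
        # linear ramp from 0 up to 1/(2n) at x(1)
        return 1.0 / (2 * n) - (lo - theta) / (2 * n * d1 * c)
    if theta >= hi + d2 * c:
        return 1.0
    if theta >= hi:
        # linear ramp from (2n-1)/(2n) at x(n) up to 1
        return (2 * n - 1) / (2 * n) + (theta - hi) / (2 * n * d2 * c)
    # inside the data: Adimari's linear interpolation H
    i = bisect_right(xs, theta)
    lam = (theta - xs[i - 1]) / (xs[i] - xs[i - 1])
    return (1.0 - lam) * (2 * i - 1) / (2 * n) + lam * (2 * i + 1) / (2 * n)
```

With this construction the two jumps of F* at x(1) and x(n) disappear: larger values of c stretch the ramps further away from the data.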


Visualising Fext

Figure: Smoothing of Fn using Fext. Axes: Θ (horizontal), F(Θ) (vertical); curves: Fn, F*, Fext (c = 3).


Figure: Visualising the influence of the parameter c on Fext. Axes: Θ (horizontal), F(Θ) (vertical); curves: Fn, F*, Fext (c = 3), Fext (c = 7).


Figure: Visualising the influence of the parameter c on Fext (lower tail, Θ from −2.5 to −0.5). Axes: Θ (horizontal), F(Θ) (vertical); curves: Fn, F*, Fext (c = 3), Fext (c = 7).


Visualising l(Θ)

Figure: First step of the extension (c = 3), 1%-quantile. Axes: Θ (horizontal), l(Θ) (vertical); curves: not smoothed, Adimari, extension 1. step; reference lines: χ²_{1,1−α}, true quantile.


Figure: Second step: further linear extension (c = 3), 1%-quantile. Axes: Θ (horizontal), l(Θ) (vertical); curves: not smoothed, Adimari, extension (c = 3); reference lines: χ²_{1,1−α}, true quantile.


Figure: Fully extended likelihood function for different values of c, 1%-quantile. Axes: Θ (horizontal), l(Θ) (vertical); curves: not smoothed, Adimari, extension (c = 3), extension (c = 25).


Extension parameter

- The smallest value of c which results in a coverage rate of at least (1−α) is desired.

- The extension is necessary if the confidence region extends beyond the observed data.

- The necessity of the extension depends on the significance level α and the product R:

$$ R := \begin{cases} q \cdot n & \text{if } q \le 0.5 \\ (1-q) \cdot n & \text{if } q > 0.5 \end{cases} $$

- Assumption:
  - The required value for c depends on q, n, R and α.
  - The required value for c does not depend on the distribution of the data.
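A short worked calculation may help to see why α and R decide whether the extension is needed. It is a sketch under stated assumptions: it considers the lower tail (q ≤ 0.5), uses equation (1) with the convention 0·log 0 = 0, and uses the fact that both Fn and F* are 0 below x(1).

```latex
% Value of the statistic below the smallest observation (F_n(\Theta) = 0):
l(\Theta) = 2n \log\frac{1}{1-q} = -2n\log(1-q) \approx 2nq = 2R
\qquad \text{for } \Theta < x_{(1)} \text{ and small } q .

% The confidence region is unbounded below whenever this constant value
% does not exceed the chi-square threshold:
-2n\log(1-q) \le \chi^2_{1,1-\alpha}
\quad\Longleftrightarrow\quad
\text{roughly } R \le \tfrac{1}{2}\,\chi^2_{1,1-\alpha} .

% Example: n = 11, q = 0.01, \alpha = 0.05:
-22\log(0.99) \approx 0.221 < 3.841 \approx \chi^2_{1,0.95} .
```

In this example R = 0.11, far below the approximate threshold of about 1.92 for α = 0.05, so the interval without the extension has no finite lower limit.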


Modelling c

Simulation study:

- For different values of n, q and α, choose the smallest value c ∈ N that produces a coverage of at least (1−α).

- Evaluate coverage using m = 5000 samples of size n from a normal distribution.

- Try to find a model for c given q, n, R and α.
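The coverage evaluation in the second bullet can be sketched as a small Monte Carlo loop. For brevity the sketch uses the unsmoothed statistic from equation (1) rather than the extended one, so it shows only the evaluation step, not the search over c. The name `estimate_coverage`, the seed handling and the convention 0·log(0) = 0 are assumptions of this example. It relies on the fact that the region {Θ : l(Θ) ≤ χ²_{1,1−α}} covers the true quantile exactly when l evaluated at the true quantile is below the threshold.

```python
import math
import random
from bisect import bisect_right
from statistics import NormalDist


def el_statistic(F, q, n):
    """Equation (1) for a given value F = Fn(theta), assuming 0*log(0) = 0."""
    def term(p, p0):
        return 0.0 if p == 0.0 else p * math.log(p / p0)
    return 2.0 * n * (term(F, q) + term(1.0 - F, 1.0 - q))


def estimate_coverage(n, q, alpha, m=5000, seed=0):
    """Share of m standard normal samples whose region covers the true quantile."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    theta0 = std_normal.inv_cdf(q)             # true q-quantile
    z = std_normal.inv_cdf(1.0 - alpha / 2.0)
    crit = z * z                               # chi-square(1) quantile at 1 - alpha
    hits = 0
    for _ in range(m):
        xs = sorted(rng.gauss(0.0, 1.0) for _ in range(n))
        F = bisect_right(xs, theta0) / n       # Fn at the true quantile
        if el_statistic(F, q, n) <= crit:
            hits += 1
    return hits / m
```

To mimic the study, one would replace the unsmoothed statistic by the extended one and repeat the estimate for c = 1, 2, ... until the estimated coverage reaches 1 − α.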


Figure: Chosen value of c for different values of R. Axes: R (horizontal), c (vertical); series: α = 0.01, α = 0.025, α = 0.05, α = 0.1.


- Model chosen: c = 12.344 − 7.082·√R − 2.454·log(α) − 75.125·q − 0.004·n.

- (1−q) is used for q > 0.5

- adjusted R² = 0.933
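The fitted model is easy to evaluate directly. In the sketch below the function name `extension_parameter` is hypothetical and log is taken to be the natural logarithm; this reading is an assumption, but it reproduces the value c = 16.55 that the slides report for n = 11, q = 0.01, α = 0.05.

```python
import math


def extension_parameter(n, q, alpha):
    """Modeled extension parameter c from the regression on this slide."""
    # the slide states that (1 - q) is used for q > 0.5
    q_eff = q if q <= 0.5 else 1.0 - q
    R = q_eff * n
    return (12.344
            - 7.082 * math.sqrt(R)
            - 2.454 * math.log(alpha)   # natural logarithm (assumed)
            - 75.125 * q_eff
            - 0.004 * n)


c = extension_parameter(11, 0.01, 0.05)   # approximately 16.55
```

The formula is returned as is; how to treat modeled values below the stated lower bound c ≥ 1 (which can occur for larger R) is not specified on the slides and is left open here.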


Figure: Visualising the model. Axes: simulated c (horizontal), modeled c (vertical); series: α = 0.01, α = 0.025, α = 0.05, α = 0.1.


Figure: Example for the modeled value of c (n = 11, q = 0.01, α = 0.05), 1%-quantile. Axes: Θ (horizontal), l(Θ) (vertical); curves: not smoothed, Adimari, (c = 3), (c = 25), modeled (c = 16.55).


4. Performance

Performance: coverage

Figure: Coverage rates based on 1000 samples for normally distributed data. Axes: observations (n) (horizontal), coverage (vertical); series: q = 0.005, q = 0.01, q = 0.05; reference line: desired coverage. Averages: coverage 0.952, width 1.66.


Figure: Coverage rates based on 1000 samples for exponentially distributed data. Axes: observations (n) (horizontal), coverage (vertical); series: q = 0.005, q = 0.01, q = 0.05; reference line: desired coverage. Averages: coverage 0.97, width 0.183.


Figure: Coverage rates based on 1000 samples for exponentially distributed data. Axes: observations (n) (horizontal), coverage (vertical); series: q = 0.995, q = 0.99, q = 0.95; reference line: desired coverage. Averages: coverage 0.941, width 3.89.


Performance: discussion

Table: Average width if coverage is acceptable. q = 0.01, 0.99; α = 0.05; based on 1000 samples

Columns: n = 10, n = 20, n = 50, n = 100, n = 200, n = 300, n = 400, n = 500

g-NV: Inf Inf Inf Inf 0.858 0.918 0.664
A-NV: Inf Inf Inf Inf 0.877 0.844 0.729
H-NV: 0.856 0.963 0.662
Af-NV: 4.554 3.383 2.522 1.953 1.517 1.028 0.849 0.728

g-Exp1: Inf Inf Inf Inf 0.021 0.020 0.016
A-Exp1: Inf Inf Inf Inf 0.022 0.020 0.018
H-Exp1: 0.020 0.018 0.016
Af-Exp1: 2.043 0.785 0.255 0.109 0.044 0.024 0.020 0.018

g-Exp99: Inf Inf Inf Inf 2.479 2.688 1.886
A-Exp99: Inf Inf Inf Inf 2.573 2.390 2.041
H-Exp99: 2.423 2.695 1.806
Af-Exp99: 5.463 4.943 4.065 2.882 2.411 2.050

Legend: g: not smoothed; A: Adimari; H: Chen and Hall; Af: Adimari extended; NV: (standard) normal distribution; Exp1: exponential distribution, q = 0.01; Exp99: exponential distribution, q = 0.99


5. Outlook

Outlook

- Better account for the shape of the data by assigning higher weights to extreme observations:
  - Using a weighted mean for computing d1 and d2.

- Test of a semi-parametric variation:
  - Assume a class of distributions (e.g. exponential) and model c using samples from that distribution.


6. References

References

Adimari, G. (1998). An empirical likelihood statistic for quantiles. Journal of Statistical Computation and Simulation, 60(1), pages 85-95.

Chen, S. X., Hall, P. (1993). Smoothed empirical likelihood confidence intervals for quantiles. The Annals of Statistics, pages 1166-1181.

Owen, A. B. (1988). Empirical Likelihood Ratio Confidence Intervals for a Single Functional. Biometrika, Vol. 75, No. 2 (Jun. 1988), pages 237-249.

Zhu, H. (2007). Smoothed Empirical Likelihood for Quantiles and Some Variations/Extension of Empirical Likelihood for Buckley-James Estimator, Ph.D. dissertation, University of Kentucky.
