30
An extension for smoothed empirical likelihood confidence intervals for extreme quantiles and small sample sizes Master Thesis Author: Oliver Thunich 1;2 Counseling: Sebastian Schoneberg 2 ; Bertram Sch¨ afer 2 Supervisors: Claus Weihs 1 ; David Meintrup 3 1 TU Dortmund 2 Statcon GmbH 3 TH Ingolstadt

MasterArbeit Statcon

  • Upload
    others

  • View
    17

  • Download
    0

Embed Size (px)

Citation preview

Page 1: MasterArbeit Statcon

An extension for smoothed empirical likelihoodconfidence intervals for extreme quantiles and

small sample sizesMaster Thesis

Author: Oliver Thunich1;2

Counseling: Sebastian Schoneberg2; Bertram Schafer2

Supervisors: Claus Weihs1; David Meintrup3

1TU Dortmund2Statcon GmbH3TH Ingolstadt

Page 2: MasterArbeit Statcon

1. Motivation 1.0

Motivation

I Application: Calculating process capability without a distributionassumption.I Compute confidence intervals for ”extreme” quantiles (e.g. q = 0.01).I Using a non-parametric method.

I Problem: small sample sizes are desired but lead toI Infinite confidence intervals.I Bad coverage rates.

I Method: Smoothed empirical likelihood.I Capable of returning non symmetrical confidence intervals.I Smoothing reduces coverage error.

TU Dortmund; STATCON 2 / 16

Page 3: MasterArbeit Statcon

2. Smoothed Empirical Likelihood 2.0

Existing Methods

I Empirical likelihood was first introduced by Owen (1988)

I When quantiles are considered, the log likelihood is dependent on theempirical distribution function Fn (Adimari, 1998):

l(Θ) = 2n[Fn(Θ)log(

Fn(Θ)

q) + (1−Fn(Θ))log(

1−Fn(Θ)

1−q)]

(1)

I Owen (1988) shows that the Wilks theorem is applicable to computeasymptotic confidence intervals: lim

n→∞P(l(Θ)≤ c) = P(χ2

1 ≤ c)

I As this results in a step function, several methods of smoothing havebeen proposed:I Smoothing using a kernel function. (Chen, Hall, 1993)I Linear smoothing of Fn (Adimari, 1998)

TU Dortmund; STATCON 3 / 16

Page 4: MasterArbeit Statcon

2. Smoothed Empirical Likelihood 2.0

Linear Smoothing

I Let x(1) ≤ ...≤ x(n) be the ordered sample.

I The smoothing proposed by Adimari (1998) is achieved by using alinear smoothing F ∗ of Fn in equation 1.

F ∗n (Θ) =

0 if Θ < x(1)

H(Θ) if Θ ∈ [x(1),x(n))

1 if Θ≥ x(n)

where

H(Θ) =

{2i−1

2n if Θ = x(i); i ∈ {1, ..,n−1}(1−λ ) 2i−1

2n + λ2i+1

2n if Θ ∈ (x(i),x(i+1));λ =Θ−x(i)

x(i+1)−x(i); i ∈ {1, ..,n−1}

Problems:

I Constant likelihood values outside of the observed data can lead toinfinite CI’s.

I likelihood function still has two jumps (at x(1) and x(n)).

TU Dortmund; STATCON 4 / 16

Page 5: MasterArbeit Statcon

2. Smoothed Empirical Likelihood 2.0

Distribution Function

−2 −1 0 1 2

0.0

0.2

0.4

0.6

0.8

1.0

Θ

F(Θ

)

FnF*

Figure: Smoothing of Fn for 11 observations from a standard normal distribution

TU Dortmund; STATCON 5 / 16

Page 6: MasterArbeit Statcon

2. Smoothed Empirical Likelihood 2.0

Likelihood Function

−1 0 1 2

05

1015

Median

Θ

l(Θ)

not smoothed

Adimari

Chen and Hall

χ1,1−α2

true quantile

Figure: Smoothed empirical likelihood functions (n = 11;q = 0.5;α = 0.05)

TU Dortmund; STATCON 6 / 16

Page 7: MasterArbeit Statcon

2. Smoothed Empirical Likelihood 2.0

Likelihood Function

−2.5 −2.0 −1.5 −1.0 −0.5

02

46

810

1%−Quantile

Θ

l(Θ)

not smoothed

Adimari

Chen and Hall

χ1,1−α2

true quantile

Figure: Smoothed empirical likelihood functions (n = 11;q = 0.01;α = 0.05)

TU Dortmund; STATCON 6 / 16

Page 8: MasterArbeit Statcon

2. Smoothed Empirical Likelihood 2.0

Coverage Rates

0 100 200 300 400 500

0.6

0.7

0.8

0.9

1.0

n

cove

rage

rat

e

not smoothed

Adimari

Chen and Hall

normal distribution q=0.01

exponential distribution q=0.01

exponential distribution q=0.99

desired coverage rate (0.95)

minimal sample size

Figure: Estimated coverage rates based on 1000 samples

TU Dortmund; STATCON 7 / 16

Page 9: MasterArbeit Statcon

3. The Extension 3.0

The Extension

Idea:I Find an extension of the likelihood function for values outside of the

observed data so thatI finite confidence intervals are guaranteed.I the desired coverage rate is achieved.

1. extending the smoothed ecdf F ∗ as follows:

Fext(Θ) =

0 if Θ≤ x(1)−d1c1

2n −1

2n∗d1∗c (x(1)−Θ) if x(1)−d1c < Θ < x(1)

H(Θ) if x(1) ≤Θ≤ x(n)2n−1

2n + 12n∗d2∗c (Θ−x(n)) if x(n) < Θ < x(n) +d2c

1 if Θ≥ x(n) +d2c

Where c ≥ 1; d1 = 110 ∑

5i=1(x(i+1)−x(i)) and

d2 = 110 ∑

5i=1(x(n−i+1)−x(n−i))

2. linear extension of the likelihood function.TU Dortmund; STATCON 8 / 16

Page 10: MasterArbeit Statcon

3. The Extension 3.0

Visualising Fext

−2 −1 0 1 2

0.0

0.2

0.4

0.6

0.8

1.0

Θ

F(Θ

)

FnF*Fext (c=3)

Figure: Smoothing of Fn using Fext

TU Dortmund; STATCON 9 / 16

Page 11: MasterArbeit Statcon

3. The Extension 3.0

Visualising Fext

−2 −1 0 1 2

0.0

0.2

0.4

0.6

0.8

1.0

Θ

F(Θ

)

FnF*Fext (c=3)Fext (c=7)

Figure: Visualising the influence of the parameter c on Fext

TU Dortmund; STATCON 9 / 16

Page 12: MasterArbeit Statcon

3. The Extension 3.0

Visualising Fext

−2.5 −2.0 −1.5 −1.0 −0.5

0.0

0.1

0.2

0.3

0.4

0.5

Θ

F(Θ

)

FnF*Fext (c=3)Fext (c=7)

Figure: Visualising the influence of the parameter c on Fext

TU Dortmund; STATCON 9 / 16

Page 13: MasterArbeit Statcon

3. The Extension 3.0

Visualising l(Θ)

−2.5 −2.0 −1.5 −1.0 −0.5

02

46

810

1%−Quantile

Θ

l(Θ)

not smoothed

Adimari

extension 1. step

χ1,1−α2

true quantile

Figure: First step of the extension (c = 3)

TU Dortmund; STATCON 10 / 16

Page 14: MasterArbeit Statcon

3. The Extension 3.0

Visualising l(Θ)

−6 −5 −4 −3 −2 −1

02

46

810

1%−Quantile

Θ

l(Θ)

not smoothed

Adimari

extension(c=3)

χ1,1−α2

true quantile

Figure: Second step: further linear extension (c = 3)

TU Dortmund; STATCON 10 / 16

Page 15: MasterArbeit Statcon

3. The Extension 3.0

Visualising l(Θ)

−6 −5 −4 −3 −2 −1

02

46

810

1%−Quantile

Θ

l(Θ)

not smoothedAdimariextension(c=3)extension(c=25)

Figure: Fully extended likelihood function for different values of c

TU Dortmund; STATCON 10 / 16

Page 16: MasterArbeit Statcon

3. The Extension 3.0

Extension parameter

I The smallest value of c which results in a coverage rate of at least(1−α) is desired.

I The extension is necessary, if the confidence region extends beyondthe observed data.

I The necessity of the extension depends on the significance level α andthe product R:

R :=

{q ∗n if q ≤ 0.5

(1−q)∗n if q > 0.5

I Assumption:I The required value for c depends on q, n, R and α

I The required value for c does not depend on the distribution of thedata.

TU Dortmund; STATCON 11 / 16

Page 17: MasterArbeit Statcon

3. The Extension 3.0

Modelling c

Simulation study:

I For different values of n;q and α choose the smallest value c ∈ Nthat produces a coverage of at least (1−α).

I Evaluate coverage using m = 5000 samples of size n from a normaldistribution.

I Try to find a model for c given q, n, R and α

TU Dortmund; STATCON 12 / 16

Page 18: MasterArbeit Statcon

3. The Extension 3.0

Modelling c

0 1 2 3 4

05

1015

20

R

c

α=0.01α=0.025α=0.05α=0.1

Figure: Chosen value of c for different values of R

TU Dortmund; STATCON 12 / 16

Page 19: MasterArbeit Statcon

3. The Extension 3.0

Modelling c

Simulation study:

I For different values of n,q and α choose the smallest value c ∈ Nthat produces a coverage of at least (1−α).

I Evaluate coverage using m = 5000 samples of size n from a normaldistribution.

I Model chosen:c = 12.344−7.082∗

√R−2.454∗ log(α)−75.125∗q−0.004∗n.

I (1−q) is used for q > 0.5

I adjusted R2 = 0.933

TU Dortmund; STATCON 12 / 16

Page 20: MasterArbeit Statcon

3. The Extension 3.0

Modelling c

5 10 15 20

510

1520

simulated c

mod

eled

c

α=0.01α=0.025α=0.05α=0.1

Figure: Visualising the Model

TU Dortmund; STATCON 12 / 16

Page 21: MasterArbeit Statcon

3. The Extension 3.0

Modelling c

−6 −5 −4 −3 −2 −1

02

46

810

1%−Quantile

Θ

l(Θ)

not smoothedAdimari(c=3)(c=25)modeled(c=16.55)

Figure: Example for the modeled value of c (n = 11, q = 0.01,α = 0.05)

TU Dortmund; STATCON 12 / 16

Page 22: MasterArbeit Statcon

4. Performance 4.0

Performance: coverage

0 100 200 300 400 500 600

0.80

0.85

0.90

0.95

1.00

Observations (n)

Cov

erag

e

q=0.005

q=0.01

q=0.05

desired coverage

Averages:Coverage: 0.952Width: 1.66

Figure: coverage rates based on 1000 samples for normally distributed data

TU Dortmund; STATCON 13 / 16

Page 23: MasterArbeit Statcon

4. Performance 4.0

Performance: coverage

0 100 200 300 400 500 600 700

0.80

0.85

0.90

0.95

1.00

Observations (n)

Cov

erag

e

q=0.005

q=0.01

q=0.05

desired coverage

Averages:Coverage: 0.97Width: 0.183

Figure: coverage rates based on 1000 samples for exponentially distributed data

TU Dortmund; STATCON 13 / 16

Page 24: MasterArbeit Statcon

4. Performance 4.0

Performance: coverage

0 100 200 300 400 500 600 700

0.80

0.85

0.90

0.95

1.00

Observations (n)

Cov

erag

e

q=0.995

q=0.99

q=0.95

desired coverage

Averages:Coverage: 0.941Width: 3.89

Figure: coverage rates based on 1000 samples for exponentially distributed data

TU Dortmund; STATCON 13 / 16

Page 25: MasterArbeit Statcon

4. Performance 4.0

Performance: discussion

Table: Average width if coverage is acceptable.q = 0.01,0.99, α = 0.05, based on 1000 samples

n= 10 n= 20 n= 50 n= 100 n= 200 n= 300 n= 400 n= 500g-NV Inf Inf Inf Inf 0.858 0.918 0.664A-NV Inf Inf Inf Inf 0.877 0.844 0.729H-NV 0.856 0.963 0.662

Af-NV 4.554 3.383 2.522 1.953 1.517 1.028 0.849 0.728g-Exp1 Inf Inf Inf Inf 0.021 0.020 0.016A-Exp1 Inf Inf Inf Inf 0.022 0.020 0.018H-Exp1 0.020 0.018 0.016

Af-Exp1 2.043 0.785 0.255 0.109 0.044 0.024 0.020 0.018g-Exp99 Inf Inf Inf Inf 2.479 2.688 1.886A-Exp99 Inf Inf Inf Inf 2.573 2.390 2.041H-Exp99 2.423 2.695 1.806

Af-Exp99 5.463 4.943 4.065 2.882 2.411 2.050

Legend: g-not smoothed; A- Adimari, H-Chen and Hall, Af- Adimari extended,NV-(standard) normal distribution, Exp1-exponential distribution q = 0.01,Exp99-exponential distribution q = 0.99

TU Dortmund; STATCON 14 / 16

Page 26: MasterArbeit Statcon

4. Performance 4.0

Performance: discussion

Table: Average width if coverage is acceptable.q = 0.01;0.99; α = 0.05, based on 1000 samples

n= 10 n= 20 n= 50 n= 100 n= 200 n= 300 n= 400 n= 500g-NV Inf Inf Inf Inf 0.858 0.918 0.664A-NV Inf Inf Inf Inf 0.877 0.844 0.729H-NV 0.856 0.963 0.662

Af-NV 4.554 3.383 2.522 1.953 1.517 1.028 0.849 0.728g-Exp1 Inf Inf Inf Inf 0.021 0.020 0.016A-Exp1 Inf Inf Inf Inf 0.022 0.020 0.018H-Exp1 0.020 0.018 0.016

Af-Exp1 2.043 0.785 0.255 0.109 0.044 0.024 0.020 0.018g-Exp99 Inf Inf Inf Inf 2.479 2.688 1.886A-Exp99 Inf Inf Inf Inf 2.573 2.390 2.041H-Exp99 2.423 2.695 1.806

Af-Exp99 5.463 4.943 4.065 2.882 2.411 2.050

Legend: g-not smoothed, A- Adimari, H-Chen and Hall, Af- Adimari extended,NV-(standard) normal distribution, Exp1-exponential distribution q = 0.01,Exp99-exponential distribution q = 0.99

TU Dortmund; STATCON 14 / 16

Page 27: MasterArbeit Statcon

4. Performance 4.0

Performance: discussion

Table: Average width if coverage is acceptable.q = 0.01,0.99, α = 0.05, based on 1000 samples

n= 10 n= 20 n= 50 n= 100 n= 200 n= 300 n= 400 n= 500g-NV Inf Inf Inf Inf 0.858 0.918 0.664A-NV Inf Inf Inf Inf 0.877 0.844 0.729H-NV 0.856 0.963 0.662

Af-NV 4.554 3.383 2.522 1.953 1.517 1.028 0.849 0.728g-Exp1 Inf Inf Inf Inf 0.021 0.020 0.016A-Exp1 Inf Inf Inf Inf 0.022 0.020 0.018H-Exp1 0.020 0.018 0.016

Af-Exp1 2.043 0.785 0.255 0.109 0.044 0.024 0.020 0.018g-Exp99 Inf Inf Inf Inf 2.479 2.688 1.886A-Exp99 Inf Inf Inf Inf 2.573 2.390 2.041H-Exp99 2.423 2.695 1.806

Af-Exp99 5.463 4.943 4.065 2.882 2.411 2.050

Legend: g-not smoothed, A- Adimari, H-Chen and Hall, Af- Adimari extended,NV-(standard) normal distribution, Exp1-exponential distribution q = 0.01,Exp99-exponential distribution q = 0.99

TU Dortmund; STATCON 14 / 16

Page 28: MasterArbeit Statcon

4. Performance 4.0

Performance: discussion

Table: Average width if coverage is acceptable.q = 0.01,0.99, α = 0.05, based on 1000 samples

n= 10 n= 20 n= 50 n= 100 n= 200 n= 300 n= 400 n= 500g-NV Inf Inf Inf Inf 0.858 0.918 0.664A-NV Inf Inf Inf Inf 0.877 0.844 0.729H-NV 0.856 0.963 0.662

Af-NV 4.554 3.383 2.522 1.953 1.517 1.028 0.849 0.728g-Exp1 Inf Inf Inf Inf 0.021 0.020 0.016A-Exp1 Inf Inf Inf Inf 0.022 0.020 0.018H-Exp1 0.020 0.018 0.016

Af-Exp1 2.043 0.785 0.255 0.109 0.044 0.024 0.020 0.018g-Exp99 Inf Inf Inf Inf 2.479 2.688 1.886A-Exp99 Inf Inf Inf Inf 2.573 2.390 2.041H-Exp99 2.423 2.695 1.806

Af-Exp99 5.463 4.943 4.065 2.882 2.411 2.050

Legend: g-not smoothed, A- Adimari, H-Chen and Hall, Af- Adimari extended,NV-(standard) normal distribution, Exp1-exponential distribution q = 0.01,Exp99-exponential distribution q = 0.99

TU Dortmund; STATCON 14 / 16

Page 29: MasterArbeit Statcon

5. Outlook 5.0

Outlook

I Better account for the shape of the data by assigning higher weightsto extreme observations:

I Using a weighted mean for computing d1 and d2.

I Test of a semi parametric variation:I Assume a class of distributions (e.g. exponential) and Modell c using

samples from that distribution.

TU Dortmund; STATCON 15 / 16

Page 30: MasterArbeit Statcon

6. Outlook 6.0

References

Adimari, G. (1998). An empirical likelihood statistic for quantiles. Journalof Statistical Computation and Simulation, 60(1) pages 85-95.

Chen, S. X., Hall, P. (1993). Smoothed empirical likelihood confidenceintervals for quantiles. The Annals of Statistics, pages 1166-1181.

Owen, A. B. (1988). Empirical Likelihood Ratio Confidence IIntervals for aSingle Functional. Biometrika, Vol. 75, No. 2(Jun. 1988), pages237-249.

Zhu, H. (2007) Smoothed Empirical Likelihood for Quantiles and SomeVariations/Extension of Empirical Likelihood for Buckley-JamesEstimator, Ph.D. dissertation, University of Kentucky.

TU Dortmund; STATCON 16 / 16