MasterArbeit Statcon

An extension for smoothed empirical likelihoodconfidence intervals for extreme quantiles and

small sample sizesMaster Thesis

Author: Oliver Thunich1;2

Counseling: Sebastian Schoneberg2; Bertram Schafer2

Supervisors: Claus Weihs1; David Meintrup3

1TU Dortmund2Statcon GmbH3TH Ingolstadt

1. Motivation 1.0

Motivation

I Application: Calculating process capability without a distributionassumption.I Compute confidence intervals for ”extreme” quantiles (e.g. q = 0.01).I Using a non-parametric method.

I Problem: small sample sizes are desired but lead toI Infinite confidence intervals.I Bad coverage rates.

I Method: Smoothed empirical likelihood.I Capable of returning non symmetrical confidence intervals.I Smoothing reduces coverage error.

TU Dortmund; STATCON 2 / 16

2. Smoothed Empirical Likelihood 2.0

Existing Methods

I Empirical likelihood was first introduced by Owen (1988)

I When quantiles are considered, the log likelihood is dependent on theempirical distribution function Fn (Adimari, 1998):

l(Θ) = 2n[Fn(Θ)log(

Fn(Θ)

q) + (1−Fn(Θ))log(

1−Fn(Θ)

1−q)]

I Owen (1988) shows that the Wilks theorem is applicable to computeasymptotic confidence intervals: lim

n→∞P(l(Θ)≤ c) = P(χ2

1 ≤ c)

I As this results in a step function, several methods of smoothing havebeen proposed:I Smoothing using a kernel function. (Chen, Hall, 1993)I Linear smoothing of Fn (Adimari, 1998)

Linear Smoothing

I Let x(1) ≤ ...≤ x(n) be the ordered sample.

I The smoothing proposed by Adimari (1998) is achieved by using alinear smoothing F ∗ of Fn in equation 1.

F ∗n (Θ) =

0 if Θ < x(1)

H(Θ) if Θ ∈ [x(1),x(n))

1 if Θ≥ x(n)

H(Θ) =

{2i−1

2n if Θ = x(i); i ∈ {1, ..,n−1}(1−λ ) 2i−1

2n + λ2i+1

2n if Θ ∈ (x(i),x(i+1));λ =Θ−x(i)

x(i+1)−x(i); i ∈ {1, ..,n−1}

Problems:

I Constant likelihood values outside of the observed data can lead toinfinite CI’s.

I likelihood function still has two jumps (at x(1) and x(n)).

Distribution Function

−2 −1 0 1 2

Figure: Smoothing of Fn for 11 observations from a standard normal distribution

Likelihood Function

−1 0 1 2

Median

not smoothed

Adimari

Chen and Hall

χ1,1−α2

true quantile

Figure: Smoothed empirical likelihood functions (n = 11;q = 0.5;α = 0.05)

Likelihood Function

−2.5 −2.0 −1.5 −1.0 −0.5

1%−Quantile

not smoothed

Adimari

Chen and Hall

χ1,1−α2

true quantile

Figure: Smoothed empirical likelihood functions (n = 11;q = 0.01;α = 0.05)

Coverage Rates

0 100 200 300 400 500

not smoothed

Adimari

Chen and Hall

normal distribution q=0.01

exponential distribution q=0.01

exponential distribution q=0.99

desired coverage rate (0.95)

minimal sample size

Figure: Estimated coverage rates based on 1000 samples

3. The Extension 3.0

The Extension

Idea:I Find an extension of the likelihood function for values outside of the

observed data so thatI finite confidence intervals are guaranteed.I the desired coverage rate is achieved.

1. extending the smoothed ecdf F ∗ as follows:

Fext(Θ) =

0 if Θ≤ x(1)−d1c1

2n −1

2n∗d1∗c (x(1)−Θ) if x(1)−d1c < Θ < x(1)

H(Θ) if x(1) ≤Θ≤ x(n)2n−1

2n + 12n∗d2∗c (Θ−x(n)) if x(n) < Θ < x(n) +d2c

1 if Θ≥ x(n) +d2c

Where c ≥ 1; d1 = 110 ∑

5i=1(x(i+1)−x(i)) and

d2 = 110 ∑

5i=1(x(n−i+1)−x(n−i))

2. linear extension of the likelihood function.TU Dortmund; STATCON 8 / 16

Visualising Fext

−2 −1 0 1 2

FnF*Fext (c=3)

Figure: Smoothing of Fn using Fext

Visualising Fext

−2 −1 0 1 2

FnF*Fext (c=3)Fext (c=7)

Figure: Visualising the influence of the parameter c on Fext

Visualising Fext

−2.5 −2.0 −1.5 −1.0 −0.5

FnF*Fext (c=3)Fext (c=7)

Figure: Visualising the influence of the parameter c on Fext

Visualising l(Θ)

−2.5 −2.0 −1.5 −1.0 −0.5

1%−Quantile

not smoothed

Adimari

extension 1. step

χ1,1−α2

true quantile

Figure: First step of the extension (c = 3)

Visualising l(Θ)

−6 −5 −4 −3 −2 −1

1%−Quantile

not smoothed

Adimari

extension(c=3)

χ1,1−α2

true quantile

Figure: Second step: further linear extension (c = 3)

Visualising l(Θ)

−6 −5 −4 −3 −2 −1

1%−Quantile

not smoothedAdimariextension(c=3)extension(c=25)

Figure: Fully extended likelihood function for different values of c

Extension parameter

I The smallest value of c which results in a coverage rate of at least(1−α) is desired.

I The extension is necessary, if the confidence region extends beyondthe observed data.

I The necessity of the extension depends on the significance level α andthe product R:

{q ∗n if q ≤ 0.5

(1−q)∗n if q > 0.5

I Assumption:I The required value for c depends on q, n, R and α

I The required value for c does not depend on the distribution of thedata.

Modelling c

Simulation study:

I For different values of n;q and α choose the smallest value c ∈ Nthat produces a coverage of at least (1−α).

I Evaluate coverage using m = 5000 samples of size n from a normaldistribution.

I Try to find a model for c given q, n, R and α

Modelling c

0 1 2 3 4

α=0.01α=0.025α=0.05α=0.1

Figure: Chosen value of c for different values of R

Modelling c

Simulation study:

I For different values of n,q and α choose the smallest value c ∈ Nthat produces a coverage of at least (1−α).

I Evaluate coverage using m = 5000 samples of size n from a normaldistribution.

I Model chosen:c = 12.344−7.082∗

√R−2.454∗ log(α)−75.125∗q−0.004∗n.

I (1−q) is used for q > 0.5

I adjusted R2 = 0.933

Modelling c

5 10 15 20

simulated c

α=0.01α=0.025α=0.05α=0.1

Figure: Visualising the Model

Modelling c

−6 −5 −4 −3 −2 −1

1%−Quantile

not smoothedAdimari(c=3)(c=25)modeled(c=16.55)

Figure: Example for the modeled value of c (n = 11, q = 0.01,α = 0.05)

4. Performance 4.0

Performance: coverage

0 100 200 300 400 500 600

Observations (n)

q=0.005

q=0.01

q=0.05

desired coverage

Averages:Coverage: 0.952Width: 1.66

Figure: coverage rates based on 1000 samples for normally distributed data

4. Performance 4.0

0 100 200 300 400 500 600 700

Observations (n)

q=0.005

q=0.01

q=0.05

desired coverage

Figure: coverage rates based on 1000 samples for exponentially distributed data

4. Performance 4.0

0 100 200 300 400 500 600 700

Observations (n)

q=0.995

q=0.99

q=0.95

desired coverage

Figure: coverage rates based on 1000 samples for exponentially distributed data

4. Performance 4.0

Performance: discussion

Table: Average width if coverage is acceptable.q = 0.01,0.99, α = 0.05, based on 1000 samples

n= 10 n= 20 n= 50 n= 100 n= 200 n= 300 n= 400 n= 500g-NV Inf Inf Inf Inf 0.858 0.918 0.664A-NV Inf Inf Inf Inf 0.877 0.844 0.729H-NV 0.856 0.963 0.662

Af-NV 4.554 3.383 2.522 1.953 1.517 1.028 0.849 0.728g-Exp1 Inf Inf Inf Inf 0.021 0.020 0.016A-Exp1 Inf Inf Inf Inf 0.022 0.020 0.018H-Exp1 0.020 0.018 0.016

Af-Exp1 2.043 0.785 0.255 0.109 0.044 0.024 0.020 0.018g-Exp99 Inf Inf Inf Inf 2.479 2.688 1.886A-Exp99 Inf Inf Inf Inf 2.573 2.390 2.041H-Exp99 2.423 2.695 1.806

Af-Exp99 5.463 4.943 4.065 2.882 2.411 2.050

Legend: g-not smoothed; A- Adimari, H-Chen and Hall, Af- Adimari extended,NV-(standard) normal distribution, Exp1-exponential distribution q = 0.01,Exp99-exponential distribution q = 0.99

4. Performance 4.0

Table: Average width if coverage is acceptable.q = 0.01;0.99; α = 0.05, based on 1000 samples

Af-Exp99 5.463 4.943 4.065 2.882 2.411 2.050

Legend: g-not smoothed, A- Adimari, H-Chen and Hall, Af- Adimari extended,NV-(standard) normal distribution, Exp1-exponential distribution q = 0.01,Exp99-exponential distribution q = 0.99

4. Performance 4.0

Af-Exp99 5.463 4.943 4.065 2.882 2.411 2.050

4. Performance 4.0

Af-Exp99 5.463 4.943 4.065 2.882 2.411 2.050

5. Outlook 5.0

Outlook

I Better account for the shape of the data by assigning higher weightsto extreme observations:

I Using a weighted mean for computing d1 and d2.

I Test of a semi parametric variation:I Assume a class of distributions (e.g. exponential) and Modell c using

samples from that distribution.

6. Outlook 6.0

References

Adimari, G. (1998). An empirical likelihood statistic for quantiles. Journalof Statistical Computation and Simulation, 60(1) pages 85-95.

Chen, S. X., Hall, P. (1993). Smoothed empirical likelihood confidenceintervals for quantiles. The Annals of Statistics, pages 1166-1181.

Owen, A. B. (1988). Empirical Likelihood Ratio Confidence IIntervals for aSingle Functional. Biometrika, Vol. 75, No. 2(Jun. 1988), pages237-249.

Zhu, H. (2007) Smoothed Empirical Likelihood for Quantiles and SomeVariations/Extension of Empirical Likelihood for Buckley-JamesEstimator, Ph.D. dissertation, University of Kentucky.

MasterArbeit Statcon

Documents

statcon ORIG

statcon cases.doc

StatCon Syllabus

Chapter VIII Statcon

Statcon Full Text

Statcon Chapter 5

Statcon smu

Statcon Reviewer.docx

STATCON Agpalo

Statcon Full

Remaining Statcon

StatCon Notes

CHAPTER 5 statcon

Statcon Final Syllabus

statcon saver.docx

Finals Statcon

Statcon Tom

© 2016 Statcon Energiaa | CONFIDENTIAL Statcon Energiaa ... · © 2016 Statcon Energiaa | CONFIDENTIAL 7 Statcon Energiaa -WIP v1.0 STATIC FREQUENCY CONVERTER (60KVA/ 400Hz FREQUENCY

Statcon Cases Compilation

Statcon Copy