THE ROYAL STATISTICAL SOCIETY

2003 EXAMINATIONS - SOLUTIONS

GRADUATE DIPLOMA

PAPER II - STATISTICAL THEORY & METHODS

The Society provides these solutions to assist candidates preparing for the examinations in future years and for the information of any other persons using the examinations. The solutions should NOT be seen as "model answers". Rather, they have been written out in considerable detail and are intended as learning aids. Users of the solutions should always be aware that in many cases there are valid alternative methods. Also, in the many cases where discussion is called for, there may be other valid points that could be made. While every care has been taken with the preparation of these solutions, the Society will not be responsible for any errors or omissions. The Society will not enter into any correspondence in respect of these solutions.

© RSS 2003
Graduate Diploma, Statistical Theory & Methods, Paper II, 2003. Question 1

(i) The likelihood L of the sample is

$$L = \prod_{i=1}^{n} f(x_i) = \theta^n \prod_{i=1}^{n} (1+x_i)^{-(\theta+1)},$$

i.e. we have

$$\log L = n\log\theta - (\theta+1)\sum_{i=1}^{n}\log(1+x_i).$$

$$\therefore\ \frac{d\log L}{d\theta} = \frac{n}{\theta} - \sum_{i=1}^{n}\log(1+x_i) \qquad \text{(A)}$$

and setting this equal to zero gives

$$\hat{\theta} = \frac{n}{\sum\log(1+x_i)}.$$

Further,

$$\frac{d^2\log L}{d\theta^2} = -\frac{n}{\theta^2},$$

confirming that this is a maximum. Hence, by the invariance property of maximum likelihood estimators,

$$\hat{\gamma} = \frac{1}{\hat{\theta}} = \frac{1}{n}\sum_{i=1}^{n}\log(1+x_i).$$
(ii)

$$P\{\log(1+X) > w\} = P(X > e^w - 1) = \int_{e^w-1}^{\infty}\theta(1+x)^{-(\theta+1)}\,dx = \Big[-(1+x)^{-\theta}\Big]_{e^w-1}^{\infty} = (e^w)^{-\theta} = e^{-\theta w}.$$

Hence the cdf of log(1 + X) is $1 - e^{-\theta w}$ and the pdf is $\theta e^{-\theta w}$, so the distribution is exponential with mean 1/θ = γ.

$$\therefore\ E[\hat{\gamma}] = \frac{1}{n}\cdot n\,E\big[\log(1+X_i)\big] = \gamma.$$

Thus $\hat{\gamma}$ is an unbiased estimator of γ.
(iii)

$$\frac{d\log L}{d\gamma} = \frac{d\log L}{d\theta}\cdot\frac{d\theta}{d\gamma} = \Big\{n\gamma - \sum\log(1+x_i)\Big\}\times\Big(-\frac{1}{\gamma^2}\Big) \qquad \text{[using result (A) above, with }\theta = 1/\gamma\text{]}$$

$$= -\frac{n}{\gamma} + \frac{1}{\gamma^2}\sum_{i=1}^{n}\log(1+x_i).$$

$$\therefore\ \frac{d^2\log L}{d\gamma^2} = \frac{n}{\gamma^2} - \frac{2}{\gamma^3}\sum_{i=1}^{n}\log(1+x_i), \qquad \therefore\ -E\left[\frac{d^2\log L}{d\gamma^2}\right] = -\frac{n}{\gamma^2} + \frac{2n\gamma}{\gamma^3} = \frac{n}{\gamma^2},$$

and the C-R lower bound is γ²/n. From (ii), $\operatorname{Var}(\hat{\gamma}) = \gamma^2/n$ (the mean of n independent exponential variables, each with variance γ²), so the bound is attained.

(iv) No. Because the bound is attainable for γ, it cannot be attainable for a non-linear function of γ, such as θ = 1/γ.
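As an illustrative check (not part of the original solution), a short Monte Carlo sketch confirms the conclusions of (ii) and (iii) numerically; the value θ = 2 is an arbitrary assumption for the simulation.

```python
import math
import random

# Illustrative Monte Carlo check (not part of the original solution).
# theta = 2 is an arbitrary assumed value, so gamma = 1/theta = 0.5.
random.seed(1)
theta = 2.0
gamma = 1.0 / theta
n = 50          # sample size per replicate
reps = 10000    # number of replicates

estimates = []
for _ in range(reps):
    # Simulate from f(x) = theta*(1+x)^-(theta+1) by inversion:
    # if U ~ Uniform(0,1) then X = U^(-1/theta) - 1 has this distribution.
    xs = [random.random() ** (-1.0 / theta) - 1.0 for _ in range(n)]
    estimates.append(sum(math.log(1.0 + x) for x in xs) / n)  # gamma-hat

mean_est = sum(estimates) / reps
var_est = sum((e - mean_est) ** 2 for e in estimates) / reps

print(mean_est)                # close to gamma = 0.5 (unbiasedness, part (ii))
print(var_est, gamma**2 / n)   # close to the C-R bound gamma^2/n (part (iii))
```

The sample mean of the estimates settles near γ and their sampling variance near the Cramér-Rao bound γ²/n, as the algebra predicts.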
Graduate Diploma, Statistical Theory & Methods, Paper II, 2003. Question 2

(i) Given a random sample of data X from a distribution having parameter θ, a statistic T(X) is sufficient for θ if the conditional distribution of X given T(X) does not involve θ.

(ii) Let Y = min(X_i). Defining the indicator function I_θ(x_i) to be 0 when x_i < θ and to be 1 when x_i ≥ θ, the likelihood function is

$$L(\theta) = \prod_{i=1}^{n} e^{-(x_i-\theta)}\,I_\theta(x_i).$$

Also, we have $\prod_{i=1}^{n} I_\theta(x_i) = I_\theta(y)$, and so $L(\theta) = e^{n\theta}I_\theta(y)\,e^{-\Sigma x_i}$. Therefore, by the factorisation theorem, Y is sufficient for θ.
(iii) The event {Y > y} occurs if and only if every X_i > y, so

$$P(Y > y) = \prod_{i=1}^{n} P(X_i > y).$$

Now,

$$P(X > y) = \int_{y}^{\infty} e^{-(x-\theta)}\,dx = \Big[-e^{-(x-\theta)}\Big]_{y}^{\infty} = e^{-(y-\theta)},$$

so $P(Y > y) = e^{-n(y-\theta)}$, for y > θ. Hence the cdf is $F(y) = 1 - e^{-n(y-\theta)}$ and the pdf is $f(y) = dF(y)/dy = n e^{-n(y-\theta)}$, for y > θ.
(iv) We have that Y has a shifted exponential distribution. Hence

$$E(Y) = \theta + \frac{1}{n} \quad\text{and}\quad \operatorname{Var}(Y) = \frac{1}{n^2},$$

so that $E(Y-c) = \theta + \frac{1}{n} - c$ and $\operatorname{Var}(Y-c) = \frac{1}{n^2}$. From these,

$$\operatorname{Bias}(Y-c) = \frac{1}{n} - c \quad\text{and}\quad MSE = \operatorname{Bias}^2 + \operatorname{Var} = \left(\frac{1}{n}-c\right)^2 + \frac{1}{n^2},$$

which is clearly minimised when c = 1/n. Thus Y − (1/n) has the smallest mean squared error of all estimators of the form Y − c (the variance, 1/n², is the same for every c).
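The minimisation can be confirmed with a tiny numerical sweep; this is an illustrative check, not part of the solution, and n = 10 is an arbitrary choice.

```python
# Illustrative numerical sweep (not part of the original solution); n = 10
# is an arbitrary choice.  MSE(Y - c) = (1/n - c)^2 + 1/n^2 from part (iv).
n = 10

def mse(c):
    bias = 1.0 / n - c       # Bias(Y - c) = 1/n - c
    var = 1.0 / n ** 2       # Var(Y - c) = 1/n^2, the same for every c
    return bias ** 2 + var

cs = [i / 1000.0 for i in range(301)]   # candidate c values in [0, 0.3]
best = min(cs, key=mse)
print(best)   # 0.1, i.e. c = 1/n
```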
Graduate Diploma, Statistical Theory & Methods, Paper II, 2003. Question 3

(i) The likelihood for a sample (x1, x2, …, xn) is

$$L(\theta) = \text{Const.}\times\theta^{\Sigma x_i}(1-\theta)^{n-\Sigma x_i},$$

and so the likelihood ratio for testing H0: θ = 3/4 against H1: θ = 2/3 is

$$\lambda = \frac{L(2/3)}{L(3/4)} = \frac{(2/3)^{\Sigma x_i}(1/3)^{n-\Sigma x_i}}{(3/4)^{\Sigma x_i}(1/4)^{n-\Sigma x_i}} = \left(\frac{8}{9}\right)^{\Sigma x_i}\left(\frac{4}{3}\right)^{n-\Sigma x_i}.$$

Using the Neyman-Pearson lemma, we reject H0 when λ > c, where c is chosen to give the required level of test, α. Now, λ is a decreasing function of Σx_i, hence of θ̂, and an equivalent rule is therefore to reject H0 when θ̂ < k, where k is chosen to give test level α.

(ii) nθ̂ is binomial with parameters (n, θ). Hence the large-sample distribution of θ̂ is N(θ, θ(1−θ)/n). When θ = 3/4 this is N(3/4, 3/(16n)), and when θ = 2/3 it is N(2/3, 2/(9n)).
(iii) For α = 0.05, choose k such that $P(\hat{\theta} < k \mid \theta = 3/4) = 0.05$. That is, we want

$$\Phi\left(\frac{k-\frac34}{\sqrt{3/16n}}\right) = 0.05, \quad\text{or}\quad \frac{k-\frac34}{\sqrt{3/16n}} = -1.6449, \quad\text{giving}\quad k = \frac34 - \frac{1.6449\sqrt{3}}{4\sqrt{n}}.$$
(iv) For power 0.95, $P(\hat{\theta} < k \mid \theta = 2/3) = 0.95$, i.e.

$$\Phi\left(\frac{k-\frac23}{\sqrt{2/9n}}\right) = 0.95 \quad\text{or}\quad \frac{k-\frac23}{\sqrt{2/9n}} = 1.6449, \quad\text{giving}\quad k = \frac23 + \frac{1.6449\sqrt{2}}{3\sqrt{n}}.$$

Using this expression for k together with the expression in (iii) means that we require

$$\frac34 - \frac{1.6449\sqrt{3}}{4\sqrt{n}} = \frac23 + \frac{1.6449\sqrt{2}}{3\sqrt{n}}, \quad\text{i.e.}\quad \frac{1}{12} = \frac{1.6449}{\sqrt{n}}\left(\frac{\sqrt{3}}{4}+\frac{\sqrt{2}}{3}\right).$$

Thus we get √n = 12 × 1.6449 × 0.9044 = 17.8521 and n = 318.7, so we take n = 319.
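The sample-size arithmetic can be reproduced numerically; this is an illustrative check, not part of the solution.

```python
import math

# Illustrative check of the sample-size arithmetic (not part of the solution).
z = 1.6449                                   # upper 5% point of N(0, 1)
coef = math.sqrt(3) / 4 + math.sqrt(2) / 3   # = 0.9044...
sqrt_n = 12 * z * coef                       # from equating the two expressions for k
n = math.ceil(sqrt_n ** 2)                   # round up to a whole sample size
k = 3 / 4 - z * math.sqrt(3 / (16 * n))      # critical value from part (iii)
print(sqrt_n, n, k)
```

The script recovers √n ≈ 17.85 and n = 319, and also gives the corresponding critical value k ≈ 0.710.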
Graduate Diploma, Statistical Theory & Methods, Paper II, 2003. Question 4

(i) P(0) = θ, P(1) = θ(1 − θ), P(≥ 2) = 1 − θ − θ(1 − θ) = (1 − θ)². Thus the likelihood of n0 zeros, n1 ones and n2 = n − n0 − n1 items with two or more flaws is

$$L(\theta) = \theta^{n_0}\{\theta(1-\theta)\}^{n_1}\{(1-\theta)^2\}^{n-n_0-n_1} = \theta^{n_0+n_1}(1-\theta)^{2n-2n_0-n_1}.$$

(ii)

$$\log L(\theta) = (n_0+n_1)\log\theta + (2n-2n_0-n_1)\log(1-\theta).$$

$$\therefore\ \frac{d\log L}{d\theta} = \frac{n_0+n_1}{\theta} - \frac{2n-2n_0-n_1}{1-\theta}.$$

Setting this equal to zero gives that θ̂ satisfies $(n_0+n_1)(1-\hat\theta) = \hat\theta(2n-2n_0-n_1)$, so that

$$\hat\theta = \frac{n_0+n_1}{2n-n_0}.$$

Further,

$$\frac{d^2\log L}{d\theta^2} = -\frac{n_0+n_1}{\theta^2} - \frac{2n-2n_0-n_1}{(1-\theta)^2},$$

which confirms that θ̂ is a maximum, and the sample information when θ = θ̂ (given by $-E\big[d^2\log L/d\theta^2\big]$ evaluated at θ = θ̂) is

$$\frac{(2n-n_0)^2}{n_0+n_1} + \frac{(2n-n_0)^2}{2n-2n_0-n_1} \qquad \left(\text{using } 1-\hat\theta = \frac{2n-2n_0-n_1}{2n-n_0}\right).$$
(iii) An approximate 90% confidence interval for θ is

$$\hat\theta \pm \frac{1.6449}{\sqrt{\text{sample information}}}.$$

In the case when n = 100, n0 = 90 and n1 = 7, we have 2n − n0 = 110, n0 + n1 = 97 and 2n − 2n0 − n1 = 13. Thus θ̂ = 97/110 = 0.882 and the sample information is

$$\frac{110^2}{97} + \frac{110^2}{13} = 1055.51.$$

Thus the confidence interval is

$$0.882 \pm \frac{1.6449}{\sqrt{1055.51}} = 0.882 \pm \frac{1.6449}{32.489},$$

i.e. 0.882 ± 0.051 or (0.831, 0.933).
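The interval arithmetic can be recomputed directly; this is an illustrative check, not part of the solution.

```python
import math

# Illustrative recomputation of the interval in part (iii) (not part of the solution).
n, n0, n1 = 100, 90, 7
theta_hat = (n0 + n1) / (2 * n - n0)              # 97/110
info = (2 * n - n0) ** 2 / (n0 + n1) + (2 * n - n0) ** 2 / (2 * n - 2 * n0 - n1)
half = 1.6449 / math.sqrt(info)                   # 90% half-width
lo, hi = theta_hat - half, theta_hat + half
print(theta_hat, info, lo, hi)
```

This recovers θ̂ ≈ 0.882, information ≈ 1055.51 and the interval quoted above (the upper limit is 0.9324 before the half-width is rounded to 0.051).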
Graduate Diploma, Statistical Theory & Methods, Paper II, 2003. Question 5

(i) α = 0.025, β = 0.075.

For observations x1, x2, …, xn the likelihood is

$$L_n(\theta) = \prod_{i=1}^{n}\frac{2x_i}{\theta^2}\exp\left(-\frac{x_i^2}{\theta^2}\right) = \frac{2^n\prod x_i}{\theta^{2n}}\exp\left(-\frac{\sum x_i^2}{\theta^2}\right),$$

and for the given H0: θ = 2 and H1: θ = 1 the likelihood ratio is

$$\lambda_n = \frac{L_n(\theta=2)}{L_n(\theta=1)} = \frac{1}{2^{2n}}\exp\left(\frac34\sum x_i^2\right).$$

The sequential probability ratio test rule is to continue sampling while A < λn < B, accept H0 if λn ≥ B and reject H0 (i.e. accept H1) if λn ≤ A. A and B are given by

$$A = \frac{\alpha}{1-\beta} = \frac{0.025}{0.925} = 0.027 = \frac{1}{37}, \qquad B = \frac{1-\alpha}{\beta} = \frac{0.975}{0.075} = 13.$$
(ii)

$$E(X^2) = \int_0^\infty \frac{2x^3}{\theta^2}e^{-x^2/\theta^2}\,dx \qquad \left[\text{put } y = x^2/\theta^2\text{, so that } dy/dx = 2x/\theta^2\right]$$

$$= \theta^2\int_0^\infty y\,e^{-y}\,dy = \theta^2\,\Gamma(2) = \theta^2.$$

The ith item in the sequence making up {log λn} is $Z_i = -2\log 2 + \frac34 X_i^2$.

$$E(Z_i \mid \theta = 2) = -2\log 2 + \tfrac34\times 4 = 1.6137.$$

$$E(Z_i \mid \theta = 1) = -2\log 2 + \tfrac34\times 1 = -0.6363.$$

$$E(N \mid \theta = 2) \approx \frac{\alpha\log A + (1-\alpha)\log B}{E(Z_i \mid \theta = 2)} = 1.494.$$

$$E(N \mid \theta = 1) \approx \frac{(1-\beta)\log A + \beta\log B}{E(Z_i \mid \theta = 1)} = 4.948.$$
(iii) x1 = 2.2: $\lambda_1 = \frac14\exp\left(\frac34\times 4.84\right) = 9.428$, so continue sampling. x2 = 2.5: $\lambda_2 = \frac{1}{16}\exp\left\{\frac34(2.2^2+2.5^2)\right\} = 255.93 \ge B$, so accept H0. There is no need to consider x3.
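The boundaries and the trace of the test can be checked with a short script; this is an illustrative sketch, not part of the solution.

```python
import math

# Illustrative trace of the SPRT in part (iii) (not part of the solution).
alpha, beta = 0.025, 0.075
A = alpha / (1 - beta)        # lower boundary, 0.027 = 1/37
B = (1 - alpha) / beta        # upper boundary, 13
lam = 1.0
decision = "continue"
for x in [2.2, 2.5]:
    # per-observation likelihood ratio f(x | theta=2) / f(x | theta=1)
    lam *= 0.25 * math.exp(0.75 * x ** 2)
    if lam >= B:
        decision = "accept H0"
        break
    if lam <= A:
        decision = "accept H1"
        break
print(A, B, lam, decision)
```

After x1 = 2.2 the ratio is 9.428, inside (A, B), so sampling continues; after x2 = 2.5 it is 255.93 ≥ B = 13, so H0 is accepted, matching the solution.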
Graduate Diploma, Statistical Theory & Methods, Paper II, 2003. Question 6

(i) A prior distribution is conjugate for a particular model (e.g. Normal, beta) if the prior and posterior distributions are from the same family.

(ii) Likelihood

$$L(\mathbf{x}\mid\theta) = \text{constant}\times\theta^{n/2}\exp\left\{-\frac{\theta}{2}\sum_{i=1}^{n}\left(x_i - 2 + x_i^{-1}\right)\right\}.$$

The posterior distribution is proportional to $g(\theta)L(\mathbf{x}\mid\theta)$, i.e. it is

$$\text{constant}\times\theta^{\alpha+n/2-1}\exp\left[-\theta\left\{\beta + \frac12\sum_{i=1}^{n}\left(x_i-2+x_i^{-1}\right)\right\}\right],$$

which is gamma with parameters $\alpha + n/2$ and $\beta + \frac12\sum\left(x_i - 2 + x_i^{-1}\right)$. Hence the gamma prior is conjugate.

(iii) The mean, 20, is α/β. The variance, also 20, is α/β². So β must be 1, and α must be 20, and these must be the values used in the prior distribution.

(iv) θ | x is gamma$\left(20+\frac{80}{2},\ 1+\frac{5.0}{2}\right)$, i.e. gamma(60, 3.5).
The mean of this is 60/3.5 and the variance is 60/(3.5)². These are used in a Normal approximation, which is satisfactory for n = 80. Hence an approximate 90% highest posterior density interval for θ is given by

$$\frac{60}{3.5} \pm 1.6449\,\frac{\sqrt{60}}{3.5},$$

i.e. 17.143 ± 3.640 or (13.50, 20.78).
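The posterior parameters and the interval can be recomputed directly; this is an illustrative check, not part of the solution, with s = 5.0 denoting the observed value of the sum appearing in the posterior.

```python
import math

# Illustrative recomputation of the interval in part (iv) (not part of the solution).
a_prior, b_prior = 20.0, 1.0        # alpha and beta from part (iii)
n, s = 80, 5.0                      # s = observed value of sum(x_i - 2 + 1/x_i)
a_post = a_prior + n / 2            # 60.0
b_post = b_prior + s / 2            # 3.5
mean = a_post / b_post              # posterior mean, 60/3.5
sd = math.sqrt(a_post) / b_post     # posterior sd, sqrt(60)/3.5
lo, hi = mean - 1.6449 * sd, mean + 1.6449 * sd
print((a_post, b_post), mean, lo, hi)
```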
Graduate Diploma, Statistical Theory & Methods, Paper II, 2003. Question 7

(i) The likelihood $L(\mathbf{x}\mid\theta)$ is $k\,\theta^{\Sigma x_i}(1-\theta)^{n-\Sigma x_i}$, and the posterior density is

$$g(\theta\mid\mathbf{x}) \propto g(\theta)\,L(\mathbf{x}\mid\theta),$$

i.e. we have

$$g(\theta\mid\mathbf{x}) \propto \theta^{\alpha-1}(1-\theta)^{\beta-1}\,\theta^{\Sigma x_i}(1-\theta)^{n-\Sigma x_i} = \theta^{\alpha+\Sigma x_i-1}(1-\theta)^{\beta+n-\Sigma x_i-1}.$$

So θ | x is beta(α + Σxi, β + n − Σxi), and with a squared error loss the Bayes estimator of θ is the mean of this distribution, i.e.

$$\frac{\alpha+\Sigma x_i}{\alpha+\beta+n}.$$
(ii) When $\alpha = \beta = \frac12\sqrt{n}$, we have

$$\hat\theta_B = \frac{\frac12\sqrt{n} + \Sigma x_i}{n+\sqrt{n}}.$$

The expectation of this is $\dfrac{\frac12\sqrt{n} + n\theta}{n+\sqrt{n}}$, so its bias is given by

$$\frac{\frac12\sqrt{n} + n\theta}{n+\sqrt{n}} - \theta = \frac{\sqrt{n}\left(\frac12-\theta\right)}{n+\sqrt{n}} = \frac{1-2\theta}{2\left(\sqrt{n}+1\right)}.$$

Also,

$$\operatorname{Var}\left(\hat\theta_B\right) = \frac{\operatorname{Var}(\Sigma x_i)}{\left(n+\sqrt{n}\right)^2} = \frac{n\theta(1-\theta)}{\left(n+\sqrt{n}\right)^2} = \frac{\theta(1-\theta)}{\left(\sqrt{n}+1\right)^2}.$$

The risk is

$$MSE\left(\hat\theta_B\right) = \text{Bias}^2 + \text{Variance} = \frac{\left(\frac12-\theta\right)^2 + \theta(1-\theta)}{\left(\sqrt{n}+1\right)^2} = \frac{1}{4\left(\sqrt{n}+1\right)^2}.$$
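That the risk does not depend on θ can be checked numerically; this is an illustrative sketch, not part of the solution, and n = 25 is an arbitrary choice.

```python
import math

# Illustrative check that the risk is constant in theta (not part of the solution);
# n = 25 is an arbitrary choice.
n = 25
rn = math.sqrt(n)

def risk(theta):
    bias = rn * (0.5 - theta) / (n + rn)        # bias from part (ii)
    var = theta * (1 - theta) / (rn + 1) ** 2   # variance from part (ii)
    return bias ** 2 + var

risks = [risk(t / 10) for t in range(11)]       # theta = 0.0, 0.1, ..., 1.0
expected = 1 / (4 * (rn + 1) ** 2)              # 1/(4(sqrt(n)+1)^2)
print(max(abs(r - expected) for r in risks))    # essentially zero
```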
(iii) A Bayes estimator with constant risk for all θ is minimax.
Graduate Diploma, Statistical Theory & Methods, Paper II, 2003. Question 8

Topics to be included in a comprehensive answer include the following, and suitable examples should be given.

Parametric tests are based on assumptions about the values of the parameters in mass or density functions for a family of distributions, for example N(µ, σ²) or B(n, p), and confidence interval methods use the same theory. Parametric methods often use a likelihood function based on an assumed model, for example in a likelihood ratio test to compare hypotheses about a parameter in (say) a gamma family.

Moments of a distribution, especially the mean and variance, are often used in parametric methods, whereas order statistics (the median etc.) are more useful for non-parametric inference. It is less easy to construct confidence-limit arguments in non-parametric inference.

Non-parametric methods need fewer assumptions, for example not requiring a specific distribution as a model. Prior information for parametric methods includes a model and some values for its parameters, whereas merely the value of an order statistic is often sufficient in a non-parametric test.

Exact probability theory based on samples from Normal distributions can be used for parametric methods, whereas approximate methods are more common for non-parametric methods. Computing critical-value tables for non-parametric tests is often very complex compared with that required for parametric tests, although some good Normal approximations exist for moderate-sized samples in some standard non-parametric tests.

If both types of test are possible for a set of data (for example a two-sample test), the parametric one is more powerful (provided the underlying modelling assumptions are satisfied) but the non-parametric one may be more robust (in case the assumptions are not). Ranked (non-numerical) data need the non-parametric approach.