
Journal of Econometrics 2 (1974) 143-150. © North-Holland Publishing Company

ON THE SAMPLING DISTRIBUTION OF IMPROVED ESTIMATORS FOR COEFFICIENTS IN LINEAR REGRESSION

Aman ULLAH*

Southern Methodist University, Dallas, Texas 75222, U.S.A.

Received June 1973, revised version received December 1973

1. Introduction

It is well known that the least squares method provides best linear unbiased estimates of the parameters of a regression model under the assumptions of the Gauss-Markov theorem. However, if we go outside the class of linear functions and relax the condition of unbiasedness, it is possible to obtain an estimator (which is a nonlinear function of the observations on the dependent variable and is in fact biased) which has a smaller mean squared error. Such an estimator has been given by Stein (1956), and more explicitly by James and Stein (1961), in the context of estimating the mean vector of a multivariate normal distribution. James and Stein showed that their estimator is better than the usual one in the sense that the sum of its component-wise mean squared errors is smaller for all parameter values, provided at least three parameters are to be estimated. However, it should be noted that the sampling distribution of the improved James-Stein estimator is not known.

In this paper, we analyze the form of the sampling distribution of the improved estimator by considering its exact moments and their approximations. It has been found that, up to a certain order of approximation, the sampling distribution of the improved estimator β̂_i for β_i, i = 1, …, K, tends to be asymmetric, with kurtosis coefficient equal to three. The extent of the departure from symmetry depends on the number of parameters K, the true magnitude of β_i, and σ (the standard deviation of the error in the equation). If β_i is positive, the distribution will tend to be negatively skewed, and positively skewed if β_i is negative. Further, for β_i close to zero, the distribution, up to order 1/T, where T is the number of observations, will be normal.

*The author is thankful to Professor A.L. Nagar for his help in preparing this paper. He is also grateful to the referees for their valuable comments and suggestions.


2. Preliminaries

Let us consider the general linear regression model as

y = Xβ + u,   (1)

where y is a T×1 vector of observations on the variable to be explained; X is a T×K nonstochastic matrix such that¹

X′X = I.   (2)

Further, β is a K×1 coefficient vector, and we assume the elements of the T×1 vector u to be independently and normally distributed with mean zero and constant variance σ².

The least squares (maximum likelihood) estimator for β in eq. (1) is

b = X′y.   (3)

And the improved estimator for β given by Stein (1956) and James and Stein (1961) is [the following form of the estimator is given in Sclove (1968)]:

β̂ = (1 − cv/b′b) b,   K ≥ 3,   (4)

where

c = a(K−2)/(T−K+2),   0 < a ≤ 2,   (5)

and v is the residual sum of squares given by

v = û′û = u′(I − XX′)u,   û = y − Xb.   (6)
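As a purely numerical illustration (ours, not part of the original paper), the following Python sketch simulates the model (1)-(6) with X′X = I and compares the summed component-wise mean squared errors of b and β̂; the sample sizes, the coefficient vector, and the choice a = 1 are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
T, K, sigma, a = 50, 5, 1.0, 1.0                 # illustrative values; a = 1 is assumed
X, _ = np.linalg.qr(rng.standard_normal((T, K))) # orthonormal columns, so X'X = I as in eq. (2)
beta = np.array([0.5, -0.3, 0.2, 0.0, 0.1])
c = a * (K - 2) / (T - K + 2)                    # eq. (5)

R = 20000
mse_ls, mse_js = 0.0, 0.0
for _ in range(R):
    y = X @ beta + sigma * rng.standard_normal(T)   # eq. (1)
    b = X.T @ y                                     # least squares, eq. (3)
    v = y @ y - b @ b                               # residual sum of squares, eq. (6), since X'X = I
    beta_hat = (1.0 - c * v / (b @ b)) * b          # improved estimator, eq. (4)
    mse_ls += np.sum((b - beta) ** 2)
    mse_js += np.sum((beta_hat - beta) ** 2)

print(mse_ls / R, mse_js / R)   # summed component-wise MSE; the second is smaller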

We note that the K components b₁, …, b_K of the vector b in eq. (3) are independently and normally distributed with the following four moments:

E(b_i − β_i) = 0,   E(b_i − β_i)² = σ²,
E(b_i − β_i)³ = 0,   E(b_i − β_i)⁴ = 3σ⁴,   i = 1, …, K.   (7)

Since

lim_{T→∞} T E(b_i − β_i)² = lim_{T→∞} Tσ²   (8)

is assumed to exist, the order of E(b_i − β_i)² = σ² will be 1/T, and likewise E(b_i − β_i)⁴ will be of order 1/T².² Next, the distribution of

W = b′b/σ²   (9)

is noncentral chi-square with K degrees of freedom and parameter of noncentrality β′β/σ², which is of order T in magnitude.

¹An example of such a model having interest is regression on principal components, cf. Kendall (1957) and Massy (1965).
²To get the order of E(b_i − β_i)² more explicitly, one might consider writing X′X = TI. Then E(b_i − β_i)² = σ²/T, which is of order 1/T. Throughout our analysis, we shall consider X′X = I without loss of generality.

Finally, we observe from eq. (6) that v/σ² is central chi-square with T − K degrees of freedom. Thus,

Ev = σ²(T−K),
Ev² = σ⁴(T−K)(T−K+2),
Ev³ = σ⁶(T−K)(T−K+2)(T−K+4),
Ev⁴ = σ⁸(T−K)(T−K+2)(T−K+4)(T−K+6).   (10)

The statistic v is distributed independently of the statistics b and b′b.
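The moments in eq. (10) and the independence of v and b are easy to confirm by simulation; a rough sketch (ours; the parameter values are arbitrary):

import numpy as np

rng = np.random.default_rng(1)
T, K, sigma = 30, 4, 1.5
X, _ = np.linalg.qr(rng.standard_normal((T, K)))
beta = rng.standard_normal(K)

R = 200_000
y = X @ beta + sigma * rng.standard_normal((R, T))   # R independent samples of the model
b = y @ X                                            # b = X'y for each sample, shape (R, K)
v = (y * y).sum(axis=1) - (b * b).sum(axis=1)        # residual sum of squares, eq. (6)

print(v.mean(), sigma ** 2 * (T - K))                        # E v, eq. (10)
print((v ** 2).mean(), sigma ** 4 * (T - K) * (T - K + 2))   # E v^2, eq. (10)
print(np.corrcoef(v, (b * b).sum(axis=1))[0, 1])             # near 0: v independent of b'b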

3. Exact moments of β̂_i

The sampling error of the estimator β̂_i (the ith component of β̂) in eq. (4) is

(β̂_i − β_i) = (b_i − β_i) − c(v/σ²)(b_i/W),   i = 1, …, K,   (11)

where W = b′b/σ² has been defined in eq. (9). Then, using eqs. (7) and (10) and the fact that v and b are independently distributed, the first four moments of the estimator β̂_i are given by

E(β̂_i − β_i) = −c(T−K) E b_i W⁻¹,   (12)

E(β̂_i − β_i)² = σ² − 2c(T−K) E(b_i² − β_i b_i) W⁻¹ + c²(T−K)(T−K+2) E b_i² W⁻²,   (13)

E(β̂_i − β_i)³ = −3c(T−K) E(b_i³ − 2β_i b_i² + β_i² b_i) W⁻¹ + 3c²(T−K)(T−K+2) E(b_i³ − β_i b_i²) W⁻² − c³(T−K)(T−K+2)(T−K+4) E b_i³ W⁻³,   (14)

E(β̂_i − β_i)⁴ = 3σ⁴ − 4c(T−K) E(b_i⁴ − 3β_i b_i³ + 3β_i² b_i² − β_i³ b_i) W⁻¹ + 6c²(T−K)(T−K+2) E(b_i⁴ − 2β_i b_i³ + β_i² b_i²) W⁻² − 4c³(T−K)(T−K+2)(T−K+4) E(b_i⁴ − β_i b_i³) W⁻³ + c⁴(T−K)(T−K+2)(T−K+4)(T−K+6) E b_i⁴ W⁻⁴.   (15)

We shall illustrate the evaluation of the expectations in eqs. (12) to (15) by considering

E b_iˢ W⁻ʳ,   s = 1, …, 4;  r = 1, 2, … .   (16)

As noted in sect. 2, the elements b₁, …, b_K are independently and normally distributed with means β₁, …, β_K and constant variance σ². Therefore, we can write³

E b_i W⁻ʳ = E(b_i − β_i) W⁻ʳ + β_i EW⁻ʳ   (17)

= σ² (∂/∂β_i) EW⁻ʳ + β_i EW⁻ʳ.

Further,⁴

E b_i² W⁻ʳ = E(b_i − β_i)² W⁻ʳ + 2β_i E(b_i − β_i) W⁻ʳ + β_i² EW⁻ʳ   (18)

= (σ⁴ ∂²/∂β_i² + σ²) EW⁻ʳ + 2β_i σ² (∂/∂β_i) EW⁻ʳ + β_i² EW⁻ʳ,

and

E b_i³ W⁻ʳ = (σ⁶ ∂³/∂β_i³ + 3σ⁴ ∂/∂β_i) EW⁻ʳ + 3β_i (σ⁴ ∂²/∂β_i² + σ²) EW⁻ʳ + 3β_i² σ² (∂/∂β_i) EW⁻ʳ + β_i³ EW⁻ʳ,   (19)

E b_i⁴ W⁻ʳ = (σ⁸ ∂⁴/∂β_i⁴ + 6σ⁶ ∂²/∂β_i² + 3σ⁴) EW⁻ʳ + 4β_i (σ⁶ ∂³/∂β_i³ + 3σ⁴ ∂/∂β_i) EW⁻ʳ + 6β_i² (σ⁴ ∂²/∂β_i² + σ²) EW⁻ʳ + 4β_i³ σ² (∂/∂β_i) EW⁻ʳ + β_i⁴ EW⁻ʳ.   (20)

Thus, it follows that to evaluate the expectations involved in (12) to (15), we require EW⁻ʳ (r = 1, 2, …) and its partial derivatives with respect to β_i. We shall work out EW⁻ʳ below:

EW⁻ʳ = ∫₀^∞ W⁻ʳ f(W) dW,   (21)

where

f(W) = Σ_{n=0}^∞ (e^{−θ} θⁿ/n!) · W^{(K+2n)/2−1} e^{−W/2} / [2^{(K+2n)/2} Γ((K+2n)/2)]   (22)

is the density function of the noncentral chi-square statistic W defined in eq. (9), and

θ = β′β/2σ² = (1/2σ²) Σ_{i=1}^K β_i²,   (23)

which is of order T in magnitude.⁵

³Cf. Baranchik (n.d.) and also Baranchik (1973, p. 314).
⁴For details, see Ullah (1970).

Substituting eq. (22) in eq. (21) and noting that

∫₀^∞ W^{(K+2n)/2−r−1} e^{−W/2} dW = 2^{(K+2n)/2−r} Γ[(K+2n)/2 − r],   (24)

provided K/2 > r, we obtain

EW⁻ʳ = 2⁻ʳ [Γ(K/2−r)/Γ(K/2)] e^{−θ} Σ_{n=0}^∞ [(K/2−r)_n θⁿ]/[(K/2)_n n!]   (25)

= 2⁻ʳ [Γ(K/2−r)/Γ(K/2)] e^{−θ} ₁F₁(K/2−r; K/2; θ),

where ₁F₁( ) is the confluent hypergeometric function.⁶,⁷

Now using a result given in Slater (1960, p. 15, eq. 2.1.8), we have

∂ˢEW⁻ʳ/∂θˢ = 2⁻ʳ(−1)ˢ [Γ(r+s)/Γ(r)][Γ(K/2−r)/Γ(K/2+s)] e^{−θ} ₁F₁(K/2−r; K/2+s; θ),   (26)

for r = 1, 2, 3, … and s = 1, 2, 3, … .
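To make eq. (25) concrete, one can evaluate it with scipy's hyp1f1 (our sketch; the values of K, r and θ are illustrative) and compare against draws of the noncentral chi-square W:

import numpy as np
from scipy.special import gamma, hyp1f1

rng = np.random.default_rng(3)
K, r, theta = 6, 1, 2.5        # theta = beta'beta/(2 sigma^2), eq. (23)

# eq. (25)
exact = 2.0 ** (-r) * gamma(K / 2 - r) / gamma(K / 2) * np.exp(-theta) \
        * hyp1f1(K / 2 - r, K / 2, theta)

# W is noncentral chi-square with K d.f. and noncentrality beta'beta/sigma^2 = 2*theta
W = rng.noncentral_chisquare(df=K, nonc=2 * theta, size=2_000_000)
print(exact, (W ** (-r)).mean())   # the two values agree up to simulation noise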

The derivatives of EW⁻ʳ with respect to β_i can then be obtained in a straightforward manner, since ∂θ/∂β_i = β_i/σ²; for example,

∂EW⁻ʳ/∂β_i = (β_i/σ²) ∂EW⁻ʳ/∂θ,   (27)

∂²EW⁻ʳ/∂β_i² = (1/σ²) ∂EW⁻ʳ/∂θ + (β_i²/σ⁴) ∂²EW⁻ʳ/∂θ²,   (28)

and so on.

Defining

f_{p,ν} = [Γ(K/2+p)/Γ(K/2+ν)] e^{−θ} ₁F₁(K/2+p; K/2+ν; θ),   ν − p > 0,   (29)

⁵For the density of the noncentral chi-square, see, for example, Rao (1965, p. 146).
⁶The confluent hypergeometric function is given by
₁F₁(a; c; x) = Σ_{n=0}^∞ [(a)_n/((c)_n n!)] xⁿ   [cf. Slater (1960, p. 2)],
where (a)_n = a(a+1)⋯(a+n−1), (a)₀ = 1. Also, (a)_n = Γ(a+n)/Γ(a) for positive a.
⁷James and Stein (1961), using a conditional distribution approach, proved that EW⁻¹ = E[1/(K−2+2x)], where x is a Poisson variable with mean θ [as in eq. (23)]. But if we write
E[1/(K−2+2x)] = Σ_{x=0}^∞ [e^{−θ}θˣ/x!] · 1/(K−2+2x),
then it can be seen to be identical with our result in eq. (25) for r = 1. We have, however, expressed the result in eq. (25) in terms of the confluent hypergeometric function in order to obtain the required expectations in a straightforward way and also to obtain the approximate results in sect. 4.
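The equivalence asserted in footnote 7 can also be checked numerically (our sketch; the truncation at 200 Poisson terms is an arbitrary choice):

import math
import numpy as np
from scipy.special import gamma, hyp1f1

K, theta = 5, 3.0   # illustrative values
# eq. (25) with r = 1
lhs = 0.5 * gamma(K / 2 - 1) / gamma(K / 2) * np.exp(-theta) * hyp1f1(K / 2 - 1, K / 2, theta)
# the James-Stein form: E[1/(K - 2 + 2x)], x Poisson with mean theta
rhs = sum(math.exp(-theta) * theta ** x / math.factorial(x) / (K - 2 + 2 * x) for x in range(200))
print(lhs, rhs)   # identical up to the truncation of the Poisson series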


we may now state the following results,⁸

E(β̂_i − β_i) = −½cβ_i(T−K) f_{0,1},   (30)

E(β̂_i − β_i)² = σ² + c(T−K)[f_{0,2}β_i² − σ²f_{0,1}] + ¼c²(T−K)(T−K+2)[f_{0,2}β_i² + σ²f_{−1,1}],   (31)

E(β̂_i − β_i)³ = −(3/2)c(T−K)[β_i³(f_{2,3} + f_{0,1} − 2f_{1,2}) + β_iσ²(3f_{1,2} − 2f_{0,1})] + ¾c²(T−K)(T−K+2)[β_i³(f_{1,3} − f_{0,2}) + β_iσ²(3f_{0,2} − f_{−1,1})] − ⅛c³(T−K)(T−K+2)(T−K+4)(β_i³f_{0,3} + 3β_iσ²f_{−1,2}),   (32)

E(β̂_i − β_i)⁴ = 3σ⁴ − 2c(T−K)[β_i⁴(f_{3,4} − 3f_{2,3} + 3f_{1,2} − f_{0,1}) + 3β_i²σ²(2f_{2,3} − 3f_{1,2} + f_{0,1}) + 3σ⁴f_{1,2}] + (3/2)c²(T−K)(T−K+2)[β_i⁴(f_{2,4} + f_{0,2} − 2f_{1,3}) + β_i²σ²(6f_{1,3} + f_{−1,1} − 6f_{0,2}) + 3σ⁴f_{0,2}] − ½c³(T−K)(T−K+2)(T−K+4)[β_i⁴(f_{1,4} − f_{0,3}) + 3β_i²σ²(f_{0,3} − f_{−1,2}) + 3σ⁴f_{−1,2}] + (1/16)c⁴(T−K)(T−K+2)(T−K+4)(T−K+6)[β_i⁴f_{0,4} + 6β_i²σ²f_{−1,3} + 3σ⁴f_{−2,2}].   (33)

It can be noted from eq. (30) that the sign of the bias is opposite to that of β_i. For given values of T, θ and K, the bias can be computed exactly by using the contiguous-function relations for the confluent hypergeometric functions; for these, see Slater (1960).
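For instance, a sketch (ours) of such an exact computation, evaluating f_{p,ν} of eq. (29) with scipy and inserting it into the bias formula (30); a = 1 and the parameter values are assumptions:

import numpy as np
from scipy.special import gamma, hyp1f1

def f(p, v, K, theta):
    # f_{p,v} of eq. (29)
    return gamma(K / 2 + p) / gamma(K / 2 + v) * np.exp(-theta) * hyp1f1(K / 2 + p, K / 2 + v, theta)

T, K, sigma, a = 50, 5, 1.0, 1.0                 # a = 1 is an assumption
beta = np.array([0.5, -0.3, 0.2, 0.0, 0.1])
c = a * (K - 2) / (T - K + 2)                    # eq. (5)
theta = beta @ beta / (2 * sigma ** 2)           # eq. (23)

bias = -0.5 * c * (T - K) * beta * f(0, 1, K, theta)   # eq. (30), componentwise
print(bias)   # opposite in sign to the corresponding beta_i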

4. Approximations to the exact moments of β̂_i

It should be noted from eq. (5) that

c = [a(K−2)/T][1 + (K−2)/T + (K−2)²/T² + ⋯],   0 < a ≤ 2.   (34)

⁸It should be noted that for the result in eq. (30) we require E b_i W⁻¹ [see eq. (12)], which is equal to (β_i/2)f_{0,1}. Baranchik (1973), however, obtained it as E b_i W⁻¹ = (β_i/2θ) E[2x/(K−2+2x)], where x has a Poisson distribution with mean θ = β′β/2 (σ² = 1 in his case). But if we express his result (as in footnote 7) as a series and use the definition of the confluent hypergeometric function (given in footnote 6), it can be seen to be identical with (β_i/2)f_{0,1}.


And for large values of θ we have from eq. (29)⁹

f_{p,ν} ≈ θ^{−(ν−p)} Σ_{n=0}^∞ (ν−p)_n (1−K/2−p)_n / (n! θⁿ),   (35)

where (a)_n = a(a+1)⋯(a+n−1), (a)₀ = 1. It should be noted from footnote 9 that the series in eq. (35) terminates in a finite number of terms for even K, and that the series converges for T → ∞ and hence θ → ∞.¹⁰
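A quick check of eq. (35) (our sketch): for even K the series terminates, so truncating it reproduces the exact f_{p,ν} at large θ; here K = 6 and p = 0, for which (1 − K/2 − p)_n vanishes for n ≥ 3.

import math
import numpy as np
from scipy.special import gamma, hyp1f1, poch

def f(p, v, K, theta):
    # exact f_{p,v}, eq. (29)
    return gamma(K / 2 + p) / gamma(K / 2 + v) * np.exp(-theta) * hyp1f1(K / 2 + p, K / 2 + v, theta)

def f_asym(p, v, K, theta, n_terms):
    # eq. (35); for even K the series terminates
    s = sum(poch(v - p, n) * poch(1 - K / 2 - p, n) / (math.factorial(n) * theta ** n)
            for n in range(n_terms))
    return theta ** (p - v) * s

K, theta = 6, 50.0
print(f(0, 1, K, theta), f_asym(0, 1, K, theta, n_terms=3))   # equal: the series has 3 terms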

Using eqs. (34) and (35), we then obtain, to order 1/T²,

E(β̂_i − β_i) = −[a(K−2)β_iσ²/β′β][1 − (K−2)σ²/β′β − 2/T],   (36)

E(β̂_i − β_i)² = σ² + [a(K−2)σ⁴/β′β][{4 + a(K−2)}β_i²/β′β − 2],   (37)

where θ and σ² are of order T and 1/T, respectively, as mentioned in sect. 2. Therefore, the variance of β̂_i, to order 1/T², is

V(β̂_i) = E(β̂_i − β_i)² − [E(β̂_i − β_i)]²   (38)
= σ² + [2a(K−2)σ⁴/β′β][2β_i²/β′β − 1].
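To see how the order-1/T² approximation behaves, one can compare the exact bias (30) against the approximate form (36) as reconstructed above, letting σ² shrink like 1/T (our sketch; parameter values are illustrative):

import numpy as np
from scipy.special import gamma, hyp1f1

def f(p, v, K, theta):
    # f_{p,v} of eq. (29)
    return gamma(K / 2 + p) / gamma(K / 2 + v) * np.exp(-theta) * hyp1f1(K / 2 + p, K / 2 + v, theta)

K, a = 5, 1.0
beta = np.array([0.5, -0.3, 0.2, 0.0, 0.1])
bb = beta @ beta
for T in (25, 100, 400, 1600):
    sigma2 = 1.0 / T                              # sigma^2 of order 1/T, as in sect. 2
    theta = bb / (2 * sigma2)                     # eq. (23)
    c = a * (K - 2) / (T - K + 2)                 # eq. (5)
    exact = -0.5 * c * (T - K) * beta[0] * f(0, 1, K, theta)                             # eq. (30)
    approx = -a * (K - 2) * beta[0] * sigma2 / bb * (1 - (K - 2) * sigma2 / bb - 2 / T)  # eq. (36)
    print(T, exact, approx)   # the two columns converge as T grows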

Further, we have the following results on the third and fourth moments of β̂_i:

E(β̂_i − β_i)³ = −3a(K−2)β_iσ⁴/β′β,   (39)

to order 1/T², and the central moments

μ₃(β̂_i) = −3a(K−2)β_i³σ⁴/(β′β)²,   (40)

μ₄(β̂_i) = 3σ⁴ + [12a(K−2)σ⁶/β′β][2β_i²/β′β − 1],   (41)

to order 1/T² and 1/T³, respectively.

⁹For large values of |x| we have [Copson (1948, p. 265)]
₁F₁(a; c; x) ~ [Γ(c)/Γ(a)] eˣ x^{a−c} ₂F₀(c−a, 1−a; 1/x),
where
₂F₀(c−a, 1−a; 1/x) = Σ_{n=0}^∞ (c−a)_n (1−a)_n / (n! xⁿ),
which is only defined when the series terminates, that is, when at least one of (c−a), (1−a) is a non-positive integer [see, for example, Slater (1966, p. 43)]. Further, the series converges only at 1/x = 0, i.e., x = ∞.

¹⁰The author is thankful to one of the referees for bringing this point to his notice.


The Pearsonian coefficients of the skewness and kurtosis of the distribution of β̂_i are as follows:

√skewness(β̂_i) = −[3a(K−2)/4θ²](β_i/σ)³,   (42)

and

kurtosis(β̂_i) = 3,   (43)

up to order 1/T. It should be noted that the contribution of the terms of order 1/T, both in the skewness and in the kurtosis of β̂_i, is in fact zero.

As should be expected, √skewness(β̂_i) → 0 and kurtosis(β̂_i) → 3 as T → ∞.

The departure from symmetry depends on K and the actual magnitude of β_i. If β_i > 0, the distribution of β̂_i will tend to be negatively skewed in small samples, and positively skewed if β_i < 0. If the actual magnitude of β_i is close to zero, the distribution of β̂_i will be close to symmetric even in small samples.

The kurtosis coefficient of the distribution of β̂_i is the same (up to the required order of approximation) as that of the normal distribution. Thus, for β_i close to zero, the distribution of β̂_i is normal.
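As a closing numerical illustration (ours, not in the paper), simulating β̂_i directly shows the negative skew for β_i > 0 and a kurtosis near 3; σ is taken small so that θ is large, and a = 1 is assumed:

import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(4)
T, K, sigma, a = 50, 5, 0.1, 1.0                 # sigma small so that theta is large
beta = np.array([0.8, 0.05, 0.05, 0.05, 0.05])   # beta_1 > 0
c = a * (K - 2) / (T - K + 2)                    # eq. (5)

R = 500_000
b = beta + sigma * rng.standard_normal((R, K))   # b ~ N(beta, sigma^2 I), eq. (7)
v = sigma ** 2 * rng.chisquare(T - K, size=R)    # v/sigma^2 central chi-square, independent of b
beta_hat_1 = (1.0 - c * v / (b ** 2).sum(axis=1)) * b[:, 0]   # first component of eq. (4)

print(skew(beta_hat_1))                    # negative, since beta_1 > 0
print(kurtosis(beta_hat_1, fisher=False))  # close to 3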

References

Baranchik, A.J., n.d., Multiple regression and estimation of the mean of a multivariate normal distribution, Technical Report no. 51 (Department of Statistics, Stanford University, Stanford, Calif.).

Baranchik, A.J., 1973, Inadmissibility of maximum likelihood estimators in some multiple regression problems with three or more independent variables, The Annals of Statistics 1, 312-321.

Copson, E.T., 1948, An introduction to the theory of functions of a complex variable (Oxford University Press, London).

James, W. and C. Stein, 1961, Estimation with quadratic loss, Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1 (University of California Press, Berkeley) 361-379.

Kendall, M.G., 1957, A course in multivariate analysis (Griffin, London).

Massy, W.F., 1965, Principal components regression in exploratory statistical research, Journal of the American Statistical Association 60, 234-256.

Rao, C.R., 1965, Linear statistical inference and its applications (John Wiley, New York).

Sclove, S.L., 1968, Improved estimators for coefficients in linear regression, Journal of the American Statistical Association 63, 599-606.

Slater, L.J., 1960, Confluent hypergeometric functions (Cambridge University Press, Cambridge).

Slater, L.J., 1966, Generalized hypergeometric functions (Cambridge University Press, Cambridge).

Stein, C., 1956, Inadmissibility of the usual estimator for the mean of a multivariate normal distribution, Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, vol. 1 (University of California Press, Berkeley) 197-206.

Ullah, A., 1970, Statistical estimation of economic relations in the presence of errors in equations and in variables, Ph.D. dissertation (Department of Economics, Delhi University, Delhi).