3
JOURNAL OF RESEARCH of the National Bureau of Standards - B. Mathematical Sciences Vol. 72B, No.1, January- March 1968 The Distribution of the Sample Correlation Coefficient With One Variable Fixed David Hogben Institute for Basic Standards, National Bureau of Standards, Washington, D.C. 20234 (November 30, 1967) For the usual straight·line model, in which the ind epen dent variable takes on a fixed, kno wn set of va lues, it is sho wn that the samp le correlation coe ffi cie nt is di st ribut ed as Q with (n-2) degr ees of fr ee dom and noncentrality O =({3 / CT) - X)2. Th e Q variate has b ee n de fin ed and studied else· wh ere by Hogben et a l. It is noted that the squar e of the cor rela ti on coe ffi cient is dis tribut ed as a nonce ntral beta variable. Key Words: Analysis of variance, calibr a li on, co rr ela ti on coe ffi cie nt , degr ees of freedom, di s- tribution, fix ed va ri a bl e , non ce ntral beta va ri a bl e, nonce ntr alit y, Q variate. 1. Introduction Consider the straight-line model Y; = Cl' + f3 Xi + Ei , i=l, 2, . . . , n where (1) (i) th e Ei are assumed to behave as norm all y and independently di st ribut ed random variabl es with mea n zero and common variance (J 2, ( ii ) Cl' a nd f3 are unknown par ameter s, and (iii) low ercase itali c letters de not e fix ed, known cons tant s and upp er case italic le tt ers denote random variables, i. e., x is fix ed and Y is random. Thi s and oth er s trai ght-line models are dis- cu sse d in detail by Acton [1959]. The s ample co rrelation coe ffi cie nt " xY is de fin ed by n L (Xi-X) (Yi-Y) i= 1 (2) 1/ _ II where x= L xd n and Y= L Yi/n. i- I i- I It is of some inter es t in the calibration problem wh ere a fitted straight·line is used in rever se for es timatin g a n unkn own Xo co rr es pondin g to an observed Yo. The distribution of "X y where X and Y follow the bivariat e normal is well known; see for example Kendall and Stuart [1961 , pp . 383- 390]. In the prese nt paper the di s tribution of "x Y as defined by (2) is derived for x fix ed. Thi s di st ribution 33

The distribution of the sample correlation coefficient ... · The Distribution of the Sample Correlation Coefficient ... noncentral beta ... central moments and an approximation to

Embed Size (px)

Citation preview

Page 1: The distribution of the sample correlation coefficient ... · The Distribution of the Sample Correlation Coefficient ... noncentral beta ... central moments and an approximation to

JOURNAL OF RESEARCH of the National Bureau of Standards - B. Mathematical Sciences

Vol. 72B, No.1, January- March 1968

The Distribution of the Sample Correlation Coefficient With One Variable Fixed

David Hogben

Institute for Basic Standards, National Bureau of Standards, Washington, D.C. 20234

(November 30, 1967)

For the usual st raight·line model, in whic h the independent variable ta kes on a fixed, kno wn se t of va lues, it is shown that the sample corre lation coeffi cie nt is distributed as Q with (n-2) degrees

of freedom and noncentralit y O=({3 /CT) V~(Xi - X)2. The Q variate has been defin ed and studied e lse· where by Hogben et al. It is noted that the square of the correlation coeffi cient is di stributed as a noncentral beta variab le.

Key Words: Analysis of variance, calibralion, correlation coeffi cie nt, degrees of freedom, di s­tribution , fixed variable , noncentral beta variable, noncentrality, Q variate.

1. Introduction

Consider the s traight-line model

Y; = Cl' + f3Xi + Ei , i = l , 2, . . . , n where

(1)

(i) the Ei are assumed to be have as norm all y and indepe nde ntly di stributed random variables with mean zero and common variance (J2,

(ii) Cl' a nd f3 are unknown para meters, and (iii) lowercase italic le tters denote fixed , known constants and uppercase italic le tters de note

random variables, i. e., x is fixed and Y is random. This and other straight-line models are dis­cussed in de tail by Acton [1959].

The sample correlation coeffi cient " xY is defined by

n

L (Xi-X) (Yi-Y) i= 1

(2)

1/ _ II

where x= L xdn and Y= L Yi/n. i - I i - I

It is of so me interest in the calibration problem where a fitted straight·line is used in reverse for es timating a n unknown Xo corres ponding to an observed Yo. The distribution of "X y where X and Y follow the bivariate norm al is well known; see for example Kendall and Stuart [1961 , pp. 383- 390]. In the present paper the di stribution of "x Y as defined by (2) is derived for x fixed. This distribution

33

Page 2: The distribution of the sample correlation coefficient ... · The Distribution of the Sample Correlation Coefficient ... noncentral beta ... central moments and an approximation to

is well known for the special case with all Y; identically distributed (i.e., (3 = 0), in which case it is the same as the distribution of TXY for X, Y independent and normal. See, e.g., Hotelling [1953, p. 196] . J. N. K. Rao and an unidentified person have pointed out that the distribution of G-Y can be obtained as a special case of the conditional distribution of the multiple correlation coefficient for the multi·variate normal; see, e.g., C. R. Rao [1965, p. 509].

2. Derivation

In an analysis of variance for the model (1) the (corrected) total sum of squares with (n -1) degrees of freedom may be partitioned into two independent components; the first being the sum of squares due to the slope with 1 degree of freedom and the second being the residual sum of squares with (n - 2) degrees of freedom. This partition can be expressed by

(3)

where

and " n _ / /I {3=L (x;-x)(Y;-Y) L (X; _ x)2

i - I i - I

is the usual least squares estimator for {3. Let the random variables Wand X2 be defined by

L (x;-x)(Y;-Y) W=--;====-

O'~L (X;_X)2 ' (4)

and X2 = L (Y; - f;)2/O'2. (5)

Using (4) and (5) and dividing both sides of eq (3) by 0'2 we have

n

L (Y; - YF/O'2= W2+X2. (6) i= t

If both the numerator and denominator of Tx l' are divided by 0'2 and the first factor of the denominator is combined with the numerator, the correlation coefficient may be written as

W r xY = -VI-W=2:=+====X=2' (7)

Since the Yi are normally distributed, it is easily shown that W is normally distributed with mean

()= (f3/O')V"L(x;-x)2 and variance 1. Further, it is well known from the theory of the general linear hypothesis that under model (1) Wand X2 are independently distributed and X2 is distributed as chi·squared with (n - 2) degrees of freedom. Therefore, TxY is equal to the random variable Q defined and studied in Hogben et aI. , [1964a] and [1964b]. Hence, the following theorem is proved.

THEOREM: The correlation coefficient r xY, defined by (2) under model (1), is distributed as Q with

(n-2) degrees of freedom and noncentrality ()= ({3/O')V"L(xi-xF.

34

Page 3: The distribution of the sample correlation coefficient ... · The Distribution of the Sample Correlation Coefficient ... noncentral beta ... central moments and an approximation to

Various properties of Q are given in the previous two refere nces, including analytic expres­sions and recurre nce relations for the mome nts about zero, numeri cal values for the first four central mome nts and an approximation to the distribution of Q by that of a linearly transformed be ta variable_ It follows from (7) that rh is distributed as noncentral beta; see for example Seber

[1963], where in his notation nl=l , n2= n-2 and A=82/2_ Furthermore, t=Y(n-2)r2/ (l-r) is di stributed as noncentral t with noncentrality 8 and (n - 2) degrees of freedom _ The distribution of r x· y also follows from the interesting and easily derived relation

" - - c;;;==:::::f3== rxY- "

Yf32+ (n-2)s~ f3

Thanks go to Joan Rosenblatt for pointing out the looseness of an earlier proof of the theorem and to her and Edwin L. Crow for cons tructive suggestions.

3. References

Acton, F. (1959), Analysis of Straight-Line Data (John Wiley & Sons, Inc., New York, N.Y.). Hogben; D. , Pinkham, R. S. and Wilk , M. B. (1964a), The moments of a variate related to the non-central t. Ann. Math.

Statist., 35, 298- 314. Hogben, D., Pinkham, R. S. and Wilk, M. B. (1964b), An approximation to the di stribution of Q (a variate related to the non-

central t). Ann. Math. Statist. 35, 315- 318. Hotelling, H. (1953), New light on the correlation coefficient and its transforms. J. Roy. Statist. Soc. Ser. B, 15, 193- 225. Kendall , M. G. and Stuart, A. (1961), The Advanced Theory of Stati stics, L (Hafne r Publishing Co., New York). Rao, C. R. (1965), Linear Stati stical Inference and Its Application (John Wiley & Sons, Inc., New York , N.Y.) . Seber, G. A. F . (1963), The non-centra l chi·squared and beta distributions. Biometrika 50, 542- 544.

(Paper 72Bl- 257)

35