Upload
others
View
26
Download
0
Embed Size (px)
Citation preview
NATIONAL CENTER Series 2
For HEALTH STATISTICS Number 20
PROPERWOF THIS PUBLICATIONSf31!AN(JI EDITORIAL LMWRY
VITAL and HEALTH STATISTICS
DATA EVALUATION AND METHODS RESEARCH
Variance and Covariance of
life Table Functions Estimated
From a Sample of Deaths
Formulas for the variance and covariance of functions of abridged and complete life tables based on a sample
of deaths.
Washington, D.C. March 1967
U.S. DEPARTMENT OF
HEALTH, EDUCATION, AND WELFARE Public Health Service
John W. Gardner William H. Stewart
Secretary Surgeon General
Public Health Service Publication No. 1000-Series 2-No. 20
For sale by tbe Superintendent of Documents, U.S. Government Printing Office Washington, D. C., 20402- Price 15 cents
--NATIONAL CENTER FOR HEALTH ST - ‘---”A1lSIICS
FORREST E. LINDER, PH.D., Director
THEODORE D. WOOLSEY, Deputy Director
OSWALD K. SAGEN, PH.~, Assi.skzfit Director
WALT R.SIMMONS,M.A.,Statistical Advisor
ALICE M.WATERHOUSE, M. D.,Medical Consultant
JAMES E. KELLY, D,D.S.,Dental Advisor
LOUIS R.STOLCIS,M.A.,Executive O//icer
DONALD GREEN, in/ormation O//icer
OFFICE OF HEALTH STATISTICSANALYSIS
IWAO M. MORIYAMA, PH.D., Cbie/
DIVISION OF VITAL STATISTICS
ROBERT D. GROVE, PH.D., Chic/
DIVISION OF HEALTH INTERVIEW STATISTICS
PHILIP S.LAWRENCE, Sc.D.,Chic/
DIVISION OF HEALTH RECORDS STATISTICS
MONROE G. SIRKEN, PH.D., Chic/
DIVISION OF HEALTH EXAMINATION STATISTICS
ARTHUR J.MCDOWELL, Cbie/
DIVISION OF DATA PROCESSING
LEONARD D. MCGANN, Cbie/
Public Health Service Publication No. 1000-Series 2-No. 20
Library of Congress Catalog Card Number 66-62207
FOREWORDAnnually the National Center for Health Statistics prepares two
sets of abridged life tables for the United States. In the regular abridged life tables the age-specific mortality rates are based on tabulations of all deaths occurring during the calendar year distributed by age, color, and sex. In the sample life tables the age-specific mortality rates are based on a 10-percent sample of deaths; these rates are available several months prior to the rates based on a complete count of deaths.
Formulas for the variance of the functions of life tables based on a complete count of deaths were derived several years ago. These formulas have appeared in several publications prepared by Dr. C. L. Chiang. However, formulas for the variance of the life table functions were not available for assessing the reliability of life tables based on a sample of deaths. Accordingly the National Center for Health Statistics invited Dr. Chiang to generalize his earlier work and make the results applicable to life tables based on a sample of deaths. This report presents his results and includes the formulas for variance and covariance of the life tables functions based on a sample of deaths.
As the Center’s project director for this study, I identified the problem and worked with Dr. Chiang in the early stages of developing the stochastic model and in reviewing this manuscript for publication.
Monroe G. Sirken, Ph. D., Chief Division of Health Records Statistics
-------------------------------------------------------------
------------------------------------------------------
------------------
---------------------------
-----------------------------------------------------------
CONTENTS
Page
Foreword i
I. Introduction 1
II. A Life Table Based on the Complete Count of Deaths 1
III. A Life Table Based on aSample of Deaths 3
References 8
IN THIS REPOR T fovmulas for the uayiance and covariance of functions of abridged and complete life tables based on a sample of deaths are derived. It is well known that life table functions, as any statistical quantities, are random va?’iables. When a life table is constricted on the basis of a sample of deaths vathey than on the total count, theye is a component of sampling vayiation associated with the obseyved values. This component of vaviation must be assessed in making statistical infevence YegaYding .swvival expedience of a population as determined in such a table. In the foymulas foy the vayiance and covayiance ofestimates of the probability of dy~kg (q, 1, the survival ?’ate (POi ), and the expectation of life ( e Jpresented in this paper, both the random vaYiation and the sampling vayiation aye taken into account.
VARIANCE AND COVARIANCEOF LIFE TABLE FUNCTIONS ESTIMATED
FROM A SAMPLE OF DEATHS
C. I.I. Chiang, Ph. D., School of Public Health, University of California, BeYkeley
1. INTRODUCTION
The Federal Government has published annual abridged life tables for the United States since 1945.1 These tables are based on age-specific mortality rates computed from the total number of deaths occurring during the calendar year and fromO the estimate of the midyear population. In 1958” the Federal vital statistics agency established another series of annual abridged life tables based on a 10-percent sample of deaths with the objective of publishing annual life tables on a more current schedule.
Using a 10-percent sample rather than the total number of deaths makes very little difference in the numerical values of a life table, but it does increase the amount of variation associated with these values. This is because the life table functions as determined by a sample are subject to sampling variation and to the random variation present when the total count is used. The main functions of general interest are the probability of dying (qi), the survival rate (pO) , and the expectation of life (e]). The purpose of this paper is to derive formulas for the variances of the estimates of these functions, taking into account both random variation and sampling variation. h section II we shall reproduce the corresponding formulas for these functions for the case where only random variation is considered.
Il. A LIFE TABLE BASED ON THE
COMPLETE COUNT OF DEATHS
In earlier studies of life table and mortality rates 3-5 probability distributions of life table functions and formulas variances were derived. are reproduced below original publications can
In an abridged life vided into w+l intervals,
for the corresponding Some of these formulas
for easy reference; the be consulted for details. table the life span is dieach of which is defined
by two exact ages (Xi, x i+l) , for i= O,l,...,w, except for the last interval which is usually a half-open interval such as ‘’95 and over. ” The age XOmay be taken as O, the age at birth, and XW as the age at the beginning of the last interval. For the interval (Xi, Xi+l) let ni =Xi+l –Xi be the length of the interval; when ~i=) for all i, we have the complete life table. Let Di be the number dying iu the age interval (xi, x I+~) during the calendar year and Pi be the corresponding midyear population, so that the sum
DO+ D1+. ..+ DW=D (1)
is the total number of deaths during the year and
<+ PI+ . ..+ PW=P (2)
1
is the total midyear population. The ratio random variable in Ni “trials” with the binomial probability ~~. Accordingly, Di has the expectation
(3)
E(DJ = Ni q, (8)
is the age-specific mortality rate. Let ivi be the number of individuals alive at and the variance
exact age xi among whom the Di deaths occur so ‘D 2 ~ =Niqi(l–qj ). (9)that
Therefore, the sample variance of ~i is (4)
(10)
is an estimate of theprobability thatanindividualalive at exact age xi will die in the interval Substituting (7) in (10) yields the required for(xi, xi+ ~) . The number P/i is not directly observed mulabut is estimated from Di and Pi . A meaningful niMi(l–aini Mi)
concept of the relation between Ni and Pi from S( = ~[l+(l–ai)ni MiJ3 “ (11)
the standpoint of the life table is that ~ is anestimate of the total number of years lived in
The covariance between ~i and ~j for two agethe interval (xi, xi +1) by Ni individuals. Let ai be
intervals is equal to zero as shown in referencethe average fraction of the interval (Xi, x 1+ J livedby each of the Di individuals. 6 It can be seen that
3. Formula (11) is very important in the statistical analysis of a life table since the formulas for the variances of other life table functions can all be expressed in terms of the variance of
~=ni(Ni– Di) + aini Di . (5) ~i. Two important examples follow.
1. In a current life table, the proportion of
number of years lived by the iVi-Di survivors, and the second term is equal tr the sum of the
A la
fractions of the interval lived by the Di individ- ‘Oa = ~ (12)
uals who die during the interval. Equation (5) can be rewritten as is actually computed from
Boa=&& . . .&_l Ni =~[~+(l-ai)niDi]. (6)
The first term on the right side of (5) is the survivors from age O to age X&
= (l-Q(I-$J... ($J$J , (13)
By substituting (6) in (4) we establish a basicrelationship between the age-specific death rate where~i = I-$i is an estimate of the probability
(Mi ) and the corresponding estimated probability of surviving the interval (xi, x i + ~). The formula
of dying (81) for the sample variance of & is
niMi (7) s; =x ::’jji-z~z a. (14)61 =
l+(l-ai)rziMi “ off a 1=0 qi
This formula has been used for the construction 2. The observed expectation of life at age Xa
of life tables7 and has appeared in references may be expressed as
4 and 5. ,$=an+c A To derive the formula for the variance of CY IXCY cr+l~a, cz’+1
A A~~, we only need to observe that D, is a binomial +C CY+7. pa, a+2 +-.”+ cW*~, w* (15)
2
where
c,= (1-a l_~)ni-~ + alni. (16)
The sample variance of & is given by
The derivation of formulas (14) and (17) is given in references 3 and 4 and will not be repeated here.
Ill. A LIFE TABLE BASED ON A SAMPLE OF DEATHS
In the construction of the new series of life tables, the actual number (Di) of deaths is not observed directly but is estimated by a sampling procedure. The estimated value of Di is then used as the basis for the computation of the mortality rate and the life table functions. We have then a sample of size d taken without replacement from the total of D death certificates such that
d —. (18)Df
is a preassigned sampling fraction. In the new series described here, f=. 10. Depending upon the distribution by age at death, a number di of deaths in the sample will fall into the age interval (xi, x,+ ~) with
dO+dl+. ..+dw=d. (19)
The number D, is estimated from
(20)
and the age-specific mortality rate from A
diMl=~=—
i fPi “ (21)
It is clear that di (or fii ) and hence Mi are subject to both random variation and sampling variation. As a result, the formula for the variance of $i as presented in section II must be revised and the covariance between $i and $j will need to be evaluated. Our first step is to derive the formulas for the variance of di and the covariance between di and dj.
The vaviance Of di .—The probability distribution of di depends upon the total number of deaths Di in the age interval (xi ,xi+l) and the total number of deaths D at all ages. Relative to Di and D, the variance of di may be written as
2 2 + EG2
‘di = = E(dil Di, D) di/Di, D “ (22)
The first term on the right side of (22) is the variance of the conditional expectation of di given Di and D, and the second term is the expectation of the variance of the conditional distribution of di given Di and D. We shall discuss them separately.
Since the sample is taken without replacement, the conditional distribution of di given Di and D is hypergeometric with the expectation
E(dijDi, D)=d ~= fDi . (23)
Using formulas (23) and (9), we compute the variance of E(di lDi ,D),
2 = f2U~ = f2Ni~i (l–~i)- (24)‘E(di}Di, D) 1
For the last term in equation (22), we make use of the well-known theorem in the hypergeometric distribution to write
U2 = d~(l -;) (~) dil Di, D
. d#(l– #) (~)
since the number D is usually large. write (25)
2 u
D=;(1 -;)(Di - ~)
diiDi,
D? = f(l–f)(Di - &)
and its expectation
+,9 o = f(l-f)[E(Di) - E(A)]. i i’
(25)
We now re
(26)
(27)
The first expectation inside the brackets was given in (8), and the second expectation may be rewritten as
E(+rl; )=E[$E(D; ]D)]. (28)
Our problem is to find the conditional expectation E(D;/D).
3
To avoid confusion in notation, let us consider the particular case in which i= O and write the conditional expectation
(29)
where 60 is the value that the random variable DO takes on and i%{~o = bob} is the corresponding conditional probability. Using Bayes’ law,
Pr{B/A} = Pr{B}-Pr{A[B}
(30)Pr&l}
we may rewrite the conditional probability as
Pr{Do=$]. Pr{D1+. . .+ Dw=D-f30\ Pr{Do= $[D} =
Pr{Do+D1+. $ ‘+ DW=D}
(31)
where the sum in the numerator is taken over all possible values of 3 so that
61+. ..+6W=’D-60 (32)
and in the denominator
The probability distribution of each Di is binomial with the probability distribution
Ni ! 6. Ni-6i pr{Di= ~i}= 6,1(Ni_6i)! ‘i ‘(l-qi) - (34)
t
When (34) is substituted, formula (31) becomes very unwieldy. Ordinarily, however, the probability qi is small and Ni is large, so that the Poisson distribution should be a good approximation; this wiIl give us
–hi ~:i Pr{Di=6i}= e ~,, (35)
1“
where, for simplicity,
Xi =Ni qi = E(Di) (36)
is the expected number of deaths in the age interval (Xi,xi+ ~) as given in (8). Now formula (35) is substituted in the last expression of (31) to give the sum in the numerator
–(xl+. . .+AW) 1 . e
(D- 130)! ‘Al+” “ “ ‘Aw)D-60 (37)
the denominator
(38)
–(AO+A1+. ..+AW) . e J (AO+A1+.D~ ..+ AW)D,
and, finally, after simplification, the conditional probability
Formula (39) shows that the conditional distribution of Do given D is binomial with the proportion of the expected number of deaths in the age interval (XO,Xl)
(40)
as the binomial probability. It follows that
2 0 (41)
2“
; Nk qk ‘D2 (::90) )
In general we have
(l.E(D,%))=@wNi~f—‘i 9i )
~~oNkqk ~~0 Nkqk
II (42)2“
~ Nkqk‘D’‘“)2)(k:Using (42) and the relation
E(D) = E & Dk = & Nkqk (43) ()
we compute the expectation in (28)
+ Em, :;;’;k,, (44)
k
Ni qi Ni qi ~+ (Niqi)2 =— (1-—
~Nk Qk ~Nk ‘?k ~’
where the summation in each denominator is taken from k= Oto k= W. The expectation of the variance in (27) thus becomes
Ni qi N, qi
‘“~llDi,D = f(l-f) Njqi – — (1 - —)
/ ~Nk qk ~Nk qk
_ (Nic7i)* (45)
‘Nk~k 1 I ivi qi
= f(l-f).iygi (1 - —) (1 -—). ‘Nk qk ~Nk qk
Substituting (24 ) and (45) in (22) gives the required formula for the variance of di,
‘= fzNiqi(l-qi) (46)‘di
1 Ni qi + f(l-f)Niqi (l- —) (1 - —).
‘Nk qk ‘“k qk
It may be noted that since
DO+ D1+. ..+ DW=D (47)
and
Noqo Nl% + . . . + ivwqw , , (48)—=
ZNk qk + ZNk qk ~Nk qk
the conditional joint distribution of DO,. .-. DW given D is mtdtinomial with the expectation of the product Di Dj
E(Di DjID)=D(D–l) (Z) (2%-)“’) The covayiance between d, am! dj .—-.Again
we write the covariance between di and dj relative . to D i, Dj, and D as follows:
‘didj - ‘E(di\Di, D), E(dltDj, N + %di,djlDi, Dj, D. (50)
We have the conditional expectation as given in
(23)9
E(di iDi , D) = fDi , (23)
and hence
‘E(dii Di, D), E(djl Dj, D)= ‘f Di,fDj = ‘2uOi, Dj . (51)
Since the numbers of deaths at different ages are independently distributed, o Di Dj = 0, the covariance of the conditional expectations vanishes,
. 0 (52) ‘E(dii Di, Dj,D), E(djiDi, Dj,LI)
and the covariance between di and cfj is reduced to
= ‘di dj
Eu d,, dj!Dl, Dj,D -
(53)
The joint distribution of dO,dl,..., dw givenDO,l&@W and D is the generalized hypergeometric, and the covariance between di and dj is
‘di, dj/Di, Dj, D= – d 33
~ (l– +)
D (54)
= –f(l–f) *.
Therefore (53) becomes
‘di, dj =- f(l-f)E *
(55)()
= -f(l-f)E[~ E(Di DjlD)].
5
Introducing (49) in (55) and using (43) we have the formula for the covariance between di and dj ,
‘d,, dj= “(1-’)M=krix:q”” ‘“ The variance and covariance of ~i .—III this
case the estimated probability oi is computed from
fii di $,”~=— (57)
i ‘ivi
where Ni as before is defined as the number of individuals alive at x, among whom Di deaths occur in the age interval f xl, xl +, ). Since both f and Ni are constant, the variance of ~i can be obtained from
Substituting (46) in” (58) we have the variance
a~i=+qi(l-qi) +(+ –l)+qj x i
(59) (1- —)(1-—
‘i 9i )1 . ~N@k ~Nk qk
Formula (59) shows that the variance of ~i consists of two components. The first term on the right side of (59) is the component of random variation, and the second term is the component of sampling variation. The second component de-creases as the sampling fraction f increases. When the total count of deaths is used so that f= 1, the second term vanishes and (59) is reduced to a form of which formula (10) is an estimate.
The covariance of ~i and ~j can also be de-rived by using the covariance of di and d] . From (57) we can write
1 .— (60)‘ti , tj f 2NiNj ‘did] “
In view of (56),
%*,$ = -(Ll)(l--+) —.9i 9j
(61)f k =f qk
Thus the covariance between ~i and $i is entirely due to sampling variation and vanishes when f=l.
The variance of ~aand &)a .—The formulas for the variance of the proportion of survivors and the observed expectation of life are derived by using an approximate method (see p. 232 of reference 8). For the proportion of survivors
(13)
we take the derivatives
afj P 652)
af)i ““195P”= ?
and write
~since U!= at i and U; i,j=u$, tj” Substituting (62)
in (63) gives
(64)
where the variance and covariance of #l and $1 are given in (59) and (61), respectively.
To derive the formula for the variance of I$a, the observed expectation of life at age x ~,
we recall expression (15)
$a -aana + Ca+l 2a, a+l
+ ‘a+ 2 $ CY,U+2 + . ..+cw@a. w (15)
and write where the variance and covariance of $ and $j are given by (59) and (61), respectively.
%-Z{* ‘“)2 “i, Estimates of the variances of the life table func-tions derived in the preceding sections can be
w-1 w-l +~~
l-a pa { & ~a }{
+%Pj
u~q”} ‘i
made by substituting observed values for the corresponding unknown quantities in the respec-
i+j tive formulas, Thus we use
(65) Sample variances of the life table~nctions. —
It is easy to see that for each k,
(20)
A(?4 #ak - ~CKiPi+l, k form<i <k as an estimate of the expected number of deaths
(66) N, q, and compute the age-specific death rate from
* &~ =-0, for i>k B M,=+ I (21)
I and hence from (15) we have
N, from (S7) a $ w
ckPai Pi+lkafjl c7- k:l+l
w and the estimated probability of dying from =1P*,
[ Ci+l + k~=i+2Ckpi+l, k1 . (67) $,= n, Ml
I+(l–a, ) niMi (7)
Since ci+l-(l-ai )ni+ai+lni+l, When these substitutions are made in (59) and (61), we have the sample variance of ~i
as defined in (16), (67) may be rewritten as A
s;, = + $2, (l–o, )+(+–l)$a; (1 -:)(1- +)-pai[(l -~i)ni+ai+lq+l D, I
(70) +~
k=i+2ckpi+1, k1 and the covariance between $, and ~,
=Pal[(l- a,)ni+ci+~l. (68)
substitute (68) in (65) to obtain the required formula for the variance of ~a , It is interesting to note that although it was
necessary to introduce the quantity N, in the derivation of the formulas for the variance and co=;
m‘~~~P~l[(l-al)ni+ei+1]2u~
qi variance, the final expressions in (70) and (71) hold whatever may be the relation between d~ and M,. The variances of #O&and $a can now be esti
+ ~~~}–~pai ‘aj[(l-ai) ‘i+ei+ll mated from =
i+j a-l a—l a-l
A’ z[(l-aj) ‘j ‘ej+ ~l”$ $ s; =Poa ,:0 + s;,+ x A .5$$
1’ J m [ 1 1=0 i=o qy I’j 1
0’ =0,1 ,. ... (69) I+j
(72)
7
and ( 73)
A2s:= ‘~’ pai [(l–ai)ni + $i+l ]2s~ ‘a i=~ i
w-1 w—l
+ x ,Z &i&j[(l-ai) rri+2i+1 ]i=ici+j=a
Since every quantity in equations (70) through (73) can be determined from observed data, the sample variances of the life table functions can all be computed.
REFERENCES
1National Center for Health Statistics: Guide to United
States Life Tables, 1900-1959, by hi. G. Sirkin, L. L. Bach
rach, and G. A. Carlson. PHS Pub. No. 1086, Public Health Bibliographic Series No. 42. Public Health Service. Washing-
ton. U.S. Government Printing Office, 1963.
2 National Center for Health Stati sties: Comparison of two
methods of constructing abridged life tables by referenco to
a “standard” table. Vita? and Health Statistics. PHS Pub. No. 1000-Series 2-No. 4. Public Health Service. Washington.
U.S. Government Printing Office, Mar. 1966. 3
Chiang, C. L.: A stochastic study of the life table and
its applications, I, probability distributions of the b]ometric
functions. Biometrics 16(4):618-635, Dec. 1960.
4 Chiang, C. L.: A stochastic study of the life table and
its applications, II, sample variance of the observed expec
tation of life and other biometric functions. Human Biology
32(3):221-238, Sept. 1960.
5National Vital Statistics Division: Standard error of the
age-adjusted death rate, by C. L. Chiang. Vital Statistics-
Special Reports, Vol. 47, No. 9. Public Health Service. Washington. U.S. Government Printing Office, Aug. 1961.
6 Chiang, C. L., Norris, F. D., Olson, F. E., Shipley, P. W.,
Supplee, H. E.: Determination of the Fraction of Last Yenr
of Life and Its Variation by Age, Race, and Sex (unpublished manuscript).
7Norris”, F. D., and Riese, D. C.: Abridged Life Tables,
California, 1959-61. California State Department of Public
Health, Bureau o“fVital Statistics and Data Processing, Sept. 1963.
8Kendall, hi. G., and Stuart, A.: The Advanced Tkeory of
Statistics, Vol. I. London. Charles Griffin and Co., Ltd.,
1963.
000
I wish to express my appreciation to Miss Helen E. Supplee for a critical reading of the paper and valuable suggestions.
U. S. GOVERNMENT PRINTING OFFICE : 1967 0 - z4fi-Of{h
OUTLINE OF REPORT SERIES FOR VITAL AND HEALTH STATISTICS
Public Health Service Publication No. 1000
! Series 1. Programs and collection procedures.— Reports which describe the general programs of the National Center for Health Statistics and its offices and divisions, data collection methods used, definitions, and other material necessary for understanding the data.
I Sevies 2. Data evaluation and methods resea?’ch. —Studies of new statistical methodology including: experimental tests of new survey methods, studies of vital statistics collection methods, new analytical techniques, objective evaluations of reliability of collected data, contributions to statistical theory.
Series 3. Analytical studies. — Reports presenting analytical or interpretive studies based on vital and health statistics, carrying the analysis further than the expository types of reparts in the other series.
I
Series 4, Documents and committee reports. — Final reports of major committees concerned with vital and health statistics, and documents such as recommended model vital registration laws and revised birth and death certificates.
Series 10. Data jkom the Health Interview SWvey. —Statistics on illness, accidental injuries, disability, use of hospital, medical, dental, and other services, and other health-related topics, based on data collected in a continuing national household interview survey.
I Series 11. Data from the Health Examination Survey. —Data from direct examination, testing, and measurement of national samples of the population provide the basis for two types of reports: (1) estimates of the medically defined prevalence of specific diseases in the United States and the distributions of the population with respect to physical, physiological, and psychological characteristics; and (2) analysis of relationships among the various measurements without reference to an explicit finite universe of persons.
‘1 Series 12. Data from the Institutional Population Surveys. —Statistics relating to the health characteristics of persons in institutions, and on medical, nursing, and personal care received, based on national samples of establishments providing these services and samples of the residents or patients.
Series 13. Data from the Hospital Discharge Survey. —Statistics relating to discharged patients in short-stay hospitals, based on a sample of patient records in a national sample of hospitals.
Series 20, Data on mo?’tality,-Various statistics on mortality other than as included in annual or monthly reports —spec ial analyses by cause of death, age, and other demographic variables, also geographic and time series analyses.
Series 21. Data on Mtality, marriage, and divorce. — Various statistics on natality, marriage, and divorce other than as included in annual or monthly reports— special analyses by demographic variables, also geographic and time series analyses, studies of fertility.
Series 22. Data from the National Natality and Mortality .%.wveys. —Statistics on characteristics of births and deaths not available from the vital records, based on sample surveys stemming from these records, including such topics as mortality by socioeconomic class, medical experience in the last year of life, characteristics of pregnancy, etc.
For a list of titles of reports published in these series, write to: Office of Information and Publications National Center for Health Statistics U.S. Public Health Service Washington, D.C. 20201