NCHS July 11, 2006
A Semiparametric Approach to Forecasting US Mortality Age Patterns
Presenter: Rong Wei (1)
Coauthors: Guanhua Lu (2), Benjamin Kedem (2) and Paul D. Williams (1)
(1) National Center for Health Statistics (NCHS)
(2) Math Dept., University of Maryland, College Park
Outline
Background
Project tasks
Model Introduction
New Approach: Semiparametric model
Mortality forecasting: US, small states
Comparison with Lee-Carter Model
Conclusion
Background
NCHS publishes race-gender-specific life tables for each of the 50 states plus DC decennially;
Of the 300+ tables, about 1/5 could not be published because of small numbers of deaths in a short time period;
Mortality data have been well documented at NCHS for every year, state, and race-gender population since 1968.
Mortality age patterns: data from US and large states
[Figure: two panels plotting ln(q) against age. Panels: mortality of white males in California, data from 1998; mortality of US males, 1998.]
Mortality in small states: one year data vs. 30 years historical data
[Figure: two panels plotting ln(q) against age in years. Panels: mortality of black females in Iowa, data from 1968 to 1998; mortality of black females in Iowa, data from 1998.]
The tasks
To address the insufficient-data problem, data from 30+ years are used to model the age-specific death pattern for small areas;
Select a time series model that gives better control of time effects and random error across multiple time series for short-horizon prediction;
Project mortality curves (one year ahead vs. many years ahead) in small areas using historical data and robust statistical methodology.
Introduction to mortality forecasting models:
US mortality forecasting model by Lee and Carter (1992):
$\ln(m_{x,t}) = a_x + b_x k_t + e_{x,t}$
$k_t = k_{t-1} + c + e_t$
The LC model is based on principal components. It searches for the 1st PC in n-dimensional time series data and solves for the age and time parameters by singular value decomposition.
The LC model explains 60 to 93% of the total variance (Girosi and King). For some populations, the 1st PC may be insufficient to explain the variance in high-dimensional data.
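As an illustration of the Lee-Carter fit described above, the following is a minimal sketch, not the presenters' code. It assumes a complete matrix of log death rates (ages in rows, years in columns), uses the common normalisation in which the age loadings sum to 1, and estimates the drift c as the mean first difference of the time index. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def fit_lee_carter(log_m):
    """Fit ln(m[x,t]) = a[x] + b[x]*k[t] + e[x,t] with a rank-1 SVD.

    log_m: array of shape (n_ages, n_years) of log death rates.
    Returns a, b, k and the share of variance explained by the 1st PC.
    """
    a = log_m.mean(axis=1)                     # a_x: average log rate per age
    centered = log_m - a[:, None]              # remove the age profile
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    b = u[:, 0]                                # age loadings (1st left singular vector)
    k = s[0] * vt[0]                           # time index (1st right singular vector)
    scale = b.sum()                            # assumed normalisation: sum(b) = 1
    b, k = b / scale, k * scale
    explained = s[0] ** 2 / np.sum(s ** 2)     # variance share of the 1st PC
    return a, b, k, explained

def forecast_k(k, steps=1):
    """Random walk with drift: k[t] = k[t-1] + c + e[t], point forecast only."""
    c = (k[-1] - k[0]) / (len(k) - 1)          # mean of the first differences
    return k[-1] + c * np.arange(1, steps + 1)

# Synthetic data (hypothetical): 10 ages, 30 years, steadily declining mortality.
rng = np.random.default_rng(0)
n_ages, n_years = 10, 30
true_a = np.linspace(-8.0, -2.0, n_ages)
true_b = np.full(n_ages, 1.0 / n_ages)
true_k = -0.5 * np.arange(n_years)
true_k = true_k - true_k.mean()
noise = 0.01 * rng.normal(size=(n_ages, n_years))
log_m = true_a[:, None] + np.outer(true_b, true_k) + noise

a, b, k, explained = fit_lee_carter(log_m)
k_next = forecast_k(k, steps=1)
log_m_next = a + b * k_next[0]                 # one-year-ahead log rates by age
```

The `explained` value is the quantity the 60 to 93% figure refers to: when it is low for a population, a single principal component leaves much of the age-by-year variation unmodelled.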
New Approach: Semiparametric model
Semiparametric approach
Short mortality time series, from 1968 to 1998, used for consistency of data collection
Combining more information from the age neighborhood
Centered death rates
Emphasis on predictions for upcoming years
Application to US mortality forecasting
Data:
Mortality data from death certificates filed in state vital statistics offices and reported to NCHS from 1968 to 2002;
Population data from the decennial census, interpolated between adjacent censuses;
Age-specific mortality rates were calculated for each race-gender population.
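The rate calculation described above can be sketched as follows. The slides do not say how the population was interpolated between censuses, so linear interpolation is an assumption here, and all numbers and function names are hypothetical.

```python
import numpy as np

def interpolate_population(census_years, census_counts, years):
    """Linearly interpolate population between adjacent census years (assumed method)."""
    return np.interp(years, census_years, census_counts)

def death_rates(deaths, population):
    """Age-specific death rate = deaths / population at risk."""
    return np.asarray(deaths, dtype=float) / np.asarray(population, dtype=float)

# Hypothetical counts for a single age in one race-gender population.
census_years = [1970, 1980]
census_counts = [1000.0, 1200.0]
years = np.arange(1970, 1981)
pop = interpolate_population(census_years, census_counts, years)
deaths = np.full(years.shape, 6.0)             # 6 deaths per year, made up
rates = death_rates(deaths, pop)
```

With so few deaths per cell, a single year's rate is very noisy, which is the small-population problem the talk addresses.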
Cont’d
85 age-specific time series for ages 1, ..., 85, where the age category 85+ includes age 85 and above;
For each age, the time series runs from 1970 to 2001; 2002 data are available for comparison with the prediction result;
All 85 time series are grouped into 5-year age groups 1-5, 6-10, ..., 81-85+, a total of 17 groups;
Death rates at each age are rescaled by centering on their averages over years;
Residuals from the time series “in the middle” of each group are taken as the reference.
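The data preparation steps above can be sketched as follows. This is only an illustration under stated assumptions: centering is taken to mean subtracting each age's average over years (the slides do not say whether rates were subtracted or divided), the reference is taken to be the middle age of each group, and the rates are random placeholders. The time series model that produces the residuals is not shown.

```python
import numpy as np

def center_over_years(rates):
    """Subtract each age's average over years (rows = ages, columns = years)."""
    return rates - rates.mean(axis=1, keepdims=True)

def five_year_groups(n_ages=85, width=5):
    """Ages 1..n_ages split into [1..5], [6..10], ..., [81..85]."""
    ages = np.arange(1, n_ages + 1)
    return [ages[i:i + width].tolist() for i in range(0, n_ages, width)]

def reference_age(group):
    """Middle age of a group, e.g. 3 for the group 1-5 (assumed reading of 'in the middle')."""
    return group[len(group) // 2]

# Placeholder rates: 85 ages x 32 years (1970 to 2001), values are made up.
rng = np.random.default_rng(1)
rates = rng.uniform(0.001, 0.05, size=(85, 32))

centered = center_over_years(rates)
groups = five_year_groups()
refs = [reference_age(g) for g in groups]
```

Under this reading, the group 31-35 has age 33 as its reference series.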
Mortality age-patterns across four decades: 1970 – 2000: US National Vital Statistics
Comparison of age groups 32-34 & 31-35
Combining more information improves the fit of the density curves
Mean Square Error of prediction from Semiparametric model (SP) & Lee-Carter (LC)
MSE for total population
Age Group   1-85   1-30   31-50   51-70   71-85
SP model    .104   .050   .015    .030    .009
LC model    .297   .078   .180    .029    .013

MSE for Female
Age Group   1-85   1-30   31-50   51-70   71-85
SP model    .187   .121   .026    .032    .008
LC model    .619   .226   .341    .027    .025
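A comparison of this kind can be computed along the following lines. The slides do not state the scale on which errors were measured or how the group values were normalised, so this sketch simply takes the plain mean of squared errors within each age range. The observed and predicted values are hypothetical and are not the data behind the tables.

```python
import numpy as np

def mse_by_age_group(observed, predicted, groups):
    """Mean squared prediction error within each age range (ages start at 1)."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ages = np.arange(1, len(observed) + 1)
    sq_err = (observed - predicted) ** 2
    return {
        name: float(sq_err[(ages >= lo) & (ages <= hi)].mean())
        for name, (lo, hi) in groups.items()
    }

groups = {
    "1-85": (1, 85),
    "1-30": (1, 30),
    "31-50": (31, 50),
    "51-70": (51, 70),
    "71-85": (71, 85),
}

observed = np.linspace(-8.0, -2.0, 85)         # hypothetical 2002 log death rates
predicted = observed + 0.1                     # hypothetical forecast, off by 0.1 at every age
result = mse_by_age_group(observed, predicted, groups)
```

Running the same function on the forecasts of two models gives one row per model, in the layout of the tables above.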
Semiparametric Time Series Estimate: Mortalities in Small Populations
[Figure: two panels plotting log(death rate) against age. Panels: black females in IA, 1999; black males in IA, 1999. Legend: TRUE, U75, L25, Estimate.]
Semiparametric Time Series Estimate: Mortalities in Small Populations
[Figure: two panels plotting log(death rate) against age. Panels: white females in DC, 1999; white males in DC, 1999. Legend: TRUE, U75, L25, Estimate.]
Conclusion
Historical data fitted by the time series semiparametric model can help when estimating mortality rates in small areas with insufficient observations;
Compared with the LC model, the semiparametric method reduces the overall MSE appreciably, owing to better modeling of the predictive probabilities with conditional distributions;
This is a non-Bayesian method. The Bayesian method will result in relatively large prediction intervals, so prediction further than one year ahead could apply.
Alternative ways to solve the problem of estimating mortalities for small areas
In addition to borrowing strength from historical data, alternatives include:
Borrow strength from national mortality data;
Borrow strength from geographic neighborhood data;
Borrow strength from other area data with similarities in cause of death.