Geographic variation of mortality with different socioeconomic indicators
using a multivariate multiple regression model
Jurairat Ardkaew
BOD - International Health Policy Program - IHPP
Objective
To examine mortality patterns by age, sex and socioeconomic indicators across administrative super-districts in Thailand during the latest census period (1999-2001).
Data source
• Mortality data (numbers of deaths) were obtained from vital registration, Ministry of Public Health.
• Population counts by region were obtained from the Population and Household Census 2000.
• The socioeconomic indicators were obtained from the 100% and 20% samples of the Population and Household Census 2000.
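Multivariate Regression
• When there are several (i > 1) criterion variables, we could simply fit i separate models:
$\mathbf{y}_1 = X\boldsymbol{\beta}_1 + \boldsymbol{\varepsilon}_1,\quad \mathbf{y}_2 = X\boldsymbol{\beta}_2 + \boldsymbol{\varepsilon}_2,\quad \ldots,\quad \mathbf{y}_i = X\boldsymbol{\beta}_i + \boldsymbol{\varepsilon}_i$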
• But this:
• Does not give simultaneous tests for all regressions.
• Does not take the correlation among the y's into account.
Why do multivariate tests?
• Multivariate tests are often more powerful when the responses are correlated.
• Multivariate tests provide a way to understand the structure of relations across separate response measures.
• They avoid multiplying error rates, as ANOVA does.
• They give an overall test for multiple responses, similar to the overall test for many groups.
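To make the "overall test for multiple responses" concrete, here is a minimal sketch of Wilks' Λ, one common multivariate test; the slides do not say which multivariate test statistic was used, so treat this as an illustration only. It tests whether one predictor (x2) can be dropped for all responses at once, using Λ = det(E) / det(E + H), where E is the residual SSCP matrix of the full model and E + H is that of the reduced model. The data and the helper names `sscp_residual` and `wilks_lambda` are hypothetical.

```python
import numpy as np

def sscp_residual(X, Y):
    """Residual sums-of-squares-and-cross-products matrix E = R^T R for the LS fit Y ~ X."""
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    R = Y - X @ B
    return R.T @ R

def wilks_lambda(X_full, X_reduced, Y):
    """Wilks' Lambda = det(E) / det(E + H) for H0: the extra columns of X_full have no effect on any response."""
    E = sscp_residual(X_full, Y)
    E_plus_H = sscp_residual(X_reduced, Y)  # for nested LS models this equals E + H
    return np.linalg.det(E) / np.linalg.det(E_plus_H)

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
X_full = np.column_stack([np.ones(n), x1, x2])
X_reduced = np.column_stack([np.ones(n), x1])  # same model without x2

# Three correlated responses (simulated); x2 affects all of them.
cov = np.array([[1.0, 0.6, 0.3], [0.6, 1.0, 0.6], [0.3, 0.6, 1.0]])
noise = rng.multivariate_normal(np.zeros(3), cov, size=n)
Y = np.column_stack([1 + 0.5 * x1 + 0.4 * x2,
                     2 - 0.3 * x1 + 0.4 * x2,
                     0.2 * x1 + 0.4 * x2]) + noise

lam = wilks_lambda(X_full, X_reduced, Y)
print(f"Wilks' Lambda = {lam:.3f}")  # smaller values indicate a stronger joint effect of x2
```

A single Λ replaces three separate t tests on x2, which is exactly the "avoid multiplying error rates" argument above.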
Multivariate Multiple Regression Model
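• The multivariate multiple regression model is
$$[\mathbf{y}_1 \cdots \mathbf{y}_i] = [\mathbf{x}_1\ \mathbf{x}_2 \cdots \mathbf{x}_j]\,[\boldsymbol{\beta}_1 \cdots \boldsymbol{\beta}_i] + \boldsymbol{\varepsilon}_{n\times i},$$
which may be expressed simply in matrix form as
$$Y_{n\times i} = X_{n\times j} B_{j\times i} + \boldsymbol{\varepsilon}_{n\times i}.$$
• The LS solution, $\hat{B} = (X^\top X)^{-1} X^\top Y$, gives the same coefficients as fitting the i models separately.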
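The claim that the LS solution matches the separate fits can be checked numerically. This minimal sketch uses simulated data (all sizes and names are hypothetical) and solves the normal equations instead of forming the inverse explicitly.

```python
import numpy as np

rng = np.random.default_rng(1)
n, j, i = 50, 3, 4  # n observations, j predictors (including the intercept), i responses
X = np.column_stack([np.ones(n), rng.normal(size=(n, j - 1))])
B_true = rng.normal(size=(j, i))
Y = X @ B_true + rng.normal(scale=0.1, size=(n, i))

# Multivariate LS solution B = (X^T X)^{-1} X^T Y, computed via a linear solve
B_joint = np.linalg.solve(X.T @ X, X.T @ Y)

# The same fit done one response column at a time
B_separate = np.column_stack(
    [np.linalg.solve(X.T @ X, X.T @ Y[:, k]) for k in range(i)]
)

print(np.allclose(B_joint, B_separate))  # True
```

The coefficients coincide because each column of B depends only on the matching column of Y; what the multivariate formulation adds is the joint error covariance used by the multivariate tests.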
Application to this study
It would be surprising if mortality rates in successive age groups were uncorrelated. To incorporate these correlations in a general way, we use a matrix formulation of the model.
Outcome variable ($Y_{rx}$): mortality rate
Explanatory variables ($X_{rj}$): observed socioeconomic indicators
Suppose that $Y$ is the matrix of outcome values $f(m_{rx}) = \log(m_{rx})$, where the columns correspond to the $n_A$ age groups (0, 1-4, …, 80-84) and the rows correspond to the $n_R$ regions (235 super-districts). $X$ is the matrix whose rows also correspond to regions and whose $p+2$ columns are $(1, g_{r1}, \ldots, g_{rp}, h_r)$: the first column contains 1s, the next $p$ columns contain the observed socioeconomic predictors, and the last column contains the unobserved explanatory variable (obtained from the least-squares fit). Here $r$ denotes the region, such as a 'super-district' (a district or group of contiguous districts within the same province with a population of approximately 200,000 persons).
Then the model
$$f(m_{rx}) = a_x + \sum_{j=1}^{p+1} b_{jx}\, g_{rj},$$
where $g_{r,p+1} = h_r$ (an explanatory variable encapsulating the unobserved information on how mortality varies with region), may be expressed simply in matrix form as
$$Y = XB,$$
where $B$ is the $(p+2) \times n_A$ matrix of parameters $(a_x, b_{jx})$.
This model is easily fitted using multivariate multiple regression analysis.
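A minimal sketch of the fitting step, assuming the unobserved regional variable (written $h_r$ here) is already available as a column. The slides do not show how $h_r$ is estimated, so in this sketch it is simply simulated; all data are hypothetical, and $n_A = 18$ assumes the age groups 0, 1-4, 5-9, …, 80-84.

```python
import numpy as np

rng = np.random.default_rng(2)
n_R, n_A, p = 235, 18, 6  # regions, age groups, observed SE indicators

G = rng.uniform(size=(n_R, p))  # hypothetical observed indicators g_r1..g_rp
h = rng.normal(size=n_R)        # hypothetical unobserved regional variable h_r (simulated)
X = np.column_stack([np.ones(n_R), G, h])  # n_R x (p + 2): columns (1, g_r1, ..., g_rp, h_r)

# Simulated rates m_rx; the outcome matrix is Y = f(m) = log(m)
B_true = rng.normal(scale=0.2, size=(p + 2, n_A))
B_true[0, :] = np.linspace(-7, -2, n_A)  # intercepts a_x on the log scale
m = np.exp(X @ B_true + rng.normal(scale=0.05, size=(n_R, n_A)))
Y = np.log(m)

# All age groups fitted at once: B is (p + 2) x n_A
B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
a_hat = B_hat[0]   # a_x, one per age group
b_hat = B_hat[1:]  # b_jx for j = 1..p+1 (the last row multiplies h_r)
print(B_hat.shape)  # (8, 18)
```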
Multivariate Multiple Regression Analysis: Example
This model allows correlations between errors corresponding to different outcomes but assumes independent errors within each outcome variable.
This model is fitted separately to all-cause male and female mortality rates in the 235 super-districts ($n_R$ = 235) of Thailand for the period 1999-2001.
The 6 selected socioeconomic indicators (p = 6):
• pop.density (in 1000s of persons per square km)
• prop. agricultural population
• prop. population living outside municipal areas
• prop. aged 15+ who graduated at Secondary 1 level or higher
• prop. households with no toilet
• prop. households with piped water supply inside the house
Distribution of SE indicators in each region
Max = 32.83 Min = 0.02 Mean = 1.49
Max = 0.96 Min = 0.00 Mean = 0.69
Distribution of SE indicators in each region
Max = 0.73 Min = 0.16 Mean = 0.34
Max = 0.91 Min = 0.0007 Mean = 0.52
Distribution plot of SE indicators in each region
Max = 0.96 Min = 0.05 Mean = 0.41
Max = 0.16 Min = 0.0002 Mean = 0.02
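The Max / Min / Mean figures on these slides are simple summaries of each indicator across regions. A minimal sketch of that computation, using a hypothetical `summarize` helper and made-up values rather than the census data:

```python
import numpy as np

def summarize(values):
    """Return (max, min, mean) of one indicator across regions."""
    v = np.asarray(values, dtype=float)
    return float(v.max()), float(v.min()), float(v.mean())

# Hypothetical values of one indicator for five regions (not the study data)
prop_no_toilet = [0.01, 0.02, 0.0002, 0.16, 0.03]
mx, mn, mean = summarize(prop_no_toilet)
print(f"Max = {mx:.2f} Min = {mn:.4f} Mean = {mean:.2f}")
```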
Results of the MMR model
These values are high where mortality is high (age group 5-9 and age groups 15-19, 20-24, 25-29, …, 65-69).
The model gives an R-squared for each age group.
Male: coef (std. error)
Significance codes: a = 0.001, b = 0.01, c = 0.05, d = 0.1
These values are high where mortality is high (age group 5-9 and age groups 15-19, 20-24, 25-29, …, 65-69).
The model gives an R-squared for each age group.
Female: coef (std. error)
Significance codes: a = 0.001, b = 0.01, c = 0.05, d = 0.1
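The tables report, for each age group, coefficients with standard errors, an R-squared, and a letter code. A minimal numpy sketch of those per-column quantities follows. The letter mapping assumes the slide's thresholds work like R's significance codes (a: p < 0.001, b: p < 0.01, c: p < 0.05, d: p < 0.1); p-values themselves are not computed here, and `mmr_summary` and `sig_code` are hypothetical names.

```python
import numpy as np

def mmr_summary(X, Y):
    """Per-age-group LS summary for Y = XB: coefficients, standard errors, R-squared."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    B = XtX_inv @ X.T @ Y
    resid = Y - X @ B
    rss = (resid ** 2).sum(axis=0)                 # residual sum of squares per age group
    tss = ((Y - Y.mean(axis=0)) ** 2).sum(axis=0)  # total sum of squares per age group
    r2 = 1 - rss / tss                             # assumes X contains an intercept column
    sigma2 = rss / (n - k)                         # residual variance per age group
    se = np.sqrt(np.outer(np.diag(XtX_inv), sigma2))  # k x n_A standard errors
    return B, se, r2

def sig_code(p_value):
    """Letter code, assuming a: p < 0.001, b: p < 0.01, c: p < 0.05, d: p < 0.1."""
    for code, cutoff in (("a", 0.001), ("b", 0.01), ("c", 0.05), ("d", 0.1)):
        if p_value < cutoff:
            return code
    return ""

# Usage on simulated data: 100 regions, intercept + 2 predictors, 4 age groups
rng = np.random.default_rng(3)
X = np.column_stack([np.ones(100), rng.normal(size=(100, 2))])
Y = X @ rng.normal(size=(3, 4)) + rng.normal(scale=0.5, size=(100, 4))
B, se, r2 = mmr_summary(X, Y)
print(np.round(r2, 2))
```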
Unobserved mortality in each region
For males, unobserved mortality is generally low in the super-districts of the southern region and high in most super-districts of the northern region, particularly Chiang Rai, Chiang Mai, Phayao and Phrae, and in some super-districts of Buriram.
For females, low and high unobserved mortality occur in similar areas.
Top 30 super-districts ranked by unobserved mortality: male
Top 30 super-districts ranked by unobserved mortality: female
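Producing such a ranking from the estimated unobserved regional variable (written $h_r$ here) is a sort. A minimal sketch, with hypothetical super-district codes and simulated values standing in for the estimates:

```python
import numpy as np

def top_k_regions(h, names, k=30):
    """Return (name, h_r) for the k regions with the highest unobserved mortality, highest first."""
    h = np.asarray(h, dtype=float)
    order = np.argsort(-h, kind="stable")[:k]  # descending; ties keep their original order
    return [(names[i], float(h[i])) for i in order]

rng = np.random.default_rng(6)
names = [f"SD{r:03d}" for r in range(1, 236)]  # hypothetical super-district codes
h = rng.normal(size=235)                        # stand-in for the estimated h_r
top30 = top_k_regions(h, names)
print(top30[:3])
```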
Correlations between Residuals in Age Groups
male
female
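The residual correlation matrices are computed after fitting all age groups jointly. A minimal sketch of that step follows; the simulated errors are a hypothetical stand-in whose correlation decays with the distance between age groups (0.7 raised to the number of groups apart), mimicking the expected correlation between successive age groups.

```python
import numpy as np

def residual_correlations(X, Y):
    """n_A x n_A correlation matrix of LS residuals between age groups (columns of Y)."""
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ B
    return np.corrcoef(resid, rowvar=False)

rng = np.random.default_rng(4)
n_R, n_A = 235, 5
X = np.column_stack([np.ones(n_R), rng.uniform(size=(n_R, 2))])

# Hypothetical error correlation 0.7 ** |x - x'| between age groups x and x'
idx = np.arange(n_A)
cov = 0.7 ** np.abs(idx[:, None] - idx[None, :])
Y = X @ rng.normal(size=(3, n_A)) + rng.multivariate_normal(np.zeros(n_A), cov, size=n_R)

R = residual_correlations(X, Y)
print(np.round(R, 2))
```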
Model fitted with one SE indicator: pop.density
Model fitted with one SE indicator: prop.outMunicipal
Model fitted with one SE indicator: prop.AgricalturePop
Model fitted with one SE indicator: prop.Aged15+&Grad>=Secondary1 School
Model fitted with one SE indicator: prop.No Toilet
Model fitted with one SE indicator: prop.PipeWaterSupplieinsideHouse
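A minimal sketch of fitting the model with one SE indicator at a time, assuming each model contains only an intercept and that single indicator (whether these fits also included the unobserved regional variable is not stated). It reports the R-squared per age group for each indicator; the data are simulated so that only the first indicator matters, and the helper names are hypothetical.

```python
import numpy as np

def r2_by_age(X, Y):
    """R-squared of the LS fit Y ~ X, one value per column (age group)."""
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    rss = ((Y - X @ B) ** 2).sum(axis=0)
    tss = ((Y - Y.mean(axis=0)) ** 2).sum(axis=0)
    return 1 - rss / tss

def fit_one_indicator_each(G, Y, names):
    """For each indicator j, fit Y = [1, g_j] B and return its per-age-group R-squared."""
    n = Y.shape[0]
    return {name: r2_by_age(np.column_stack([np.ones(n), G[:, j]]), Y)
            for j, name in enumerate(names)}

names = ["pop.density", "prop.outMunicipal", "prop.AgricalturePop",
         "prop.Aged15+&Grad>=Secondary1 School", "prop.No Toilet",
         "prop.PipeWaterSupplieinsideHouse"]
rng = np.random.default_rng(5)
G = rng.uniform(size=(235, len(names)))  # simulated indicators, not the census values
Y = 2.0 * G[:, [0]] + rng.normal(scale=0.1, size=(235, 4))  # 4 age groups driven by the first indicator
results = fit_one_indicator_each(G, Y, names)
for name, r2 in results.items():
    print(f"{name:40s}", np.round(r2, 2))
```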
R Mapping
Drawing maps using R
• Thematic map
– Thematic maps are data maps of a specific subject or for a specific purpose.
– Display data relative to a reference base (e.g., comparing the mean with the tails of the 95% CIs of the subject).
• Range map
– Display data according to ranges set by the user.
– The ranges are shaded using colors.
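The core of a range map is assigning each region's value to a user-defined range and a fill color. The slides draw the maps in R; this Python sketch shows only the classification step, not the drawing. The break points and hex colors are hypothetical, and values equal to a break point fall into the upper range.

```python
import numpy as np

def classify_ranges(values, breaks, colors):
    """Assign each value to a user-defined range and return that range's fill color.

    breaks: increasing cut points; [2, 4, 6] gives (-inf, 2), [2, 4), [4, 6), [6, inf).
    colors: one color per range, so len(colors) == len(breaks) + 1.
    """
    if len(colors) != len(breaks) + 1:
        raise ValueError("need exactly one color per range")
    idx = np.digitize(values, breaks)  # range index 0..len(breaks) for each value
    return [colors[i] for i in idx]

rates = [1.2, 3.5, 4.0, 8.9]  # hypothetical incidence values
print(classify_ranges(rates, [2, 4, 6], ["#ffffb2", "#fecc5c", "#fd8d3c", "#e31a1c"]))
# ['#ffffb2', '#fecc5c', '#fd8d3c', '#e31a1c']
```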
Example: childhood diarrhea incidence in 5 border provinces of Northeast Thailand, 1999-2004
Data structure
Example: thematic map
Example: all-cause deaths at ages 0-84 in Thailand (1999-2001)
Data structure
• MortM: male mortality per 1,000
• QM: quintile of male mortality per 1,000
• MortF: female mortality per 1,000
• QF: quintile of female mortality per 1,000
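The QM and QF columns assign each super-district to a mortality quintile. The slides do not say how the cut points were chosen; this minimal sketch assumes they are the sample 20th/40th/60th/80th percentiles across super-districts (numpy's default linear interpolation), with values on a cut point placed in the higher class. The input values are hypothetical.

```python
import numpy as np

def quintile(values):
    """Quintile class 1..5 of each value, using sample 20/40/60/80th percentiles as cut points."""
    v = np.asarray(values, dtype=float)
    cuts = np.quantile(v, [0.2, 0.4, 0.6, 0.8])
    return np.searchsorted(cuts, v, side="right") + 1  # number of cut points <= value, plus 1

mort_m = [3.1, 4.8, 5.2, 6.0, 7.4, 2.9, 8.8, 5.5, 6.7, 4.1]  # hypothetical MortM values
print(quintile(mort_m))
```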
Example: range map