Inference from ecological models: air pollution and stroke using data from Sheffield, England. Ravi Maheswaran, Guangquan Li, Jane Law, Robert Haining, Marta Blangiardo, Sylvia Richardson, Nicky Best



Outline:

1. Background to the Sheffield study and results presented at Geomed 2005

2. From the Poisson to the Binomial model

3. Results

4. Conclusions

1. Nitrogen oxides (NOx) and stroke mortality in Sheffield, England (Geomed 2005).

• Strokes account for 8%-12% of UK deaths

• Some evidence of a link between air pollution and stroke:

  - studies of severe air pollution episodes (e.g. the 1952 London smog);

  - analysis of daily time series (e.g. Kan et al. (2003): Shanghai);

  - cohort studies (e.g. Nafstad et al. (2004): Norwegian males).

Because the absolute number of deaths is small, tests have limited power even in large cohort studies, particularly for a factor whose effect may be modest.

Small area ecological studies may help:

- by providing another way of looking at the relationship;

- by allowing the analysis of very large populations, at a much lower cost than a cohort study;

- because small areas are likely to be more homogeneous than large areas in terms of population characteristics, which reduces the risk of ecological bias.

Data

Stroke mortality data:

• ICD9 codes 430-438;

• 1994-8: c. 3k stroke deaths in a population of c. 200k over 45;

• Aggregated by Enumeration District (c. 150 households), age (5-year cohorts from 45 to 85+) and sex;

• 2.89 deaths per ED (min expected: 0.1; max: 10.9).

Population data:

(i) 1991 Census data on demography and deprivation (Townsend index);

• Recorded at the Enumeration District level (n=1030)

(ii) Sheffield Health and Illness Prevalence survey (2000):

• Random sample stratified by ward;

• >10k respondents, of whom >9.5k gave complete age, sex and smoking information;

• Average of 2.43 smokers per ED (min expected: 0.19; max expected: 19.24).

Environmental data: Quantifying NOx exposure. The Indic-Airviro model:

Average annual mean pollution levels 1994-9 (exc. 1998): NOx (µg/m3)

[Figure: scatter plot of modelled against monitored NOx, both axes running from 50 to 350. Fitted line: Modelled = -80.43 + 3.66 * Monitored; R-square = 0.73]

Areal Interpolation (from grid to ED): point in polygon, weighted PostPoint

PostPoint ID   Domestic properties   Pollution for grid   Dom*Poll
1              13                    16.95                220.38
2               3                    18.29                 54.88
3              33                    16.72                551.63
4              31                    16.97                526.19
5              19                    16.97                322.51
6               3                    17.40                 52.20
7              33                    17.02                561.80
8              20                    17.72                354.44
9               7                    18.72                131.04

Sum            162                                        2775.06
Average                              17.42

Weighted average = 2775.06 / 162 = 17.13

NOx data transferred to the enumeration district framework after application of the weighted PostPoint method of areal interpolation
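The weighted PostPoint calculation in the table above can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and the inputs are the nine rounded table rows, so the products differ slightly from the Dom*Poll column, which appears to have been computed from unrounded pollution values.

```python
# Illustrative sketch of the weighted PostPoint average for one ED.
# Each tuple is (domestic_properties, grid_pollution) for one PostPoint,
# copied from the worked table. Function names are hypothetical.
postpoints = [
    (13, 16.95), (3, 18.29), (33, 16.72), (31, 16.97), (19, 16.97),
    (3, 17.40), (33, 17.02), (20, 17.72), (7, 18.72),
]


def simple_average(points):
    """Unweighted mean of the grid pollution values."""
    return sum(poll for _, poll in points) / len(points)


def weighted_average(points):
    """Mean pollution weighted by the number of domestic properties."""
    total_properties = sum(dom for dom, _ in points)
    if total_properties == 0:
        raise ValueError("no domestic properties in this ED")
    return sum(dom * poll for dom, poll in points) / total_properties


print(round(simple_average(postpoints), 2))    # unweighted average
print(round(weighted_average(postpoints), 2))  # property-weighted average
```

The weighting pulls the ED value towards the PostPoints where most properties sit, which is why the weighted figure is lower than the simple average here.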

Poisson Model

$y_i$ = number of stroke deaths in area $i$.

$$y_i \sim \mathrm{Poisson}(\mu_i), \qquad \mu_i = r_i E_i$$

$r_i$ = underlying true area-specific relative risk in area $i$.

$E_i$ = expected number of deaths in area $i$, standardized for age, sex and socio-economic deprivation:

$$E_i = \sum_{m=1}^{k} \theta_m \, n_{i,m}$$

$\theta_m$ = age-sex-deprivation specific mortality rate for population subgroup $m$.

$n_{i,m}$ = size of population subgroup $m$ in area $i$.

Generalized linear model:

$$\log[r_i] = \alpha + \beta x_i + \gamma z_i^{ave}$$

$x_i$ = NOx level in area $i$.

$z_i^{ave}$ = smoking prevalence ratio in area $i$ (spatial moving average using the observed and expected counts).
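As a rough illustration of the standardization step, the expected count for an area is the sum of subgroup rates times subgroup populations, and dividing the observed count by it gives a crude relative risk. The sketch below uses made-up rates and populations; none of the numbers or function names come from the Sheffield data.

```python
# Minimal sketch of indirect standardization for one area.
# All rates and populations below are hypothetical illustrations.


def expected_deaths(rates, populations):
    """E_i: sum over subgroups m of (mortality rate_m * population_{i,m})."""
    if len(rates) != len(populations):
        raise ValueError("one rate is needed per population subgroup")
    return sum(theta * n for theta, n in zip(rates, populations))


def crude_relative_risk(observed, expected):
    """Observed / expected count, a raw estimate of the area relative risk."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return observed / expected


# Three hypothetical age-sex-deprivation subgroups in one ED.
rates = [0.002, 0.010, 0.050]      # deaths per person over the study period
populations = [100, 60, 20]        # people in each subgroup

e_i = expected_deaths(rates, populations)
print(e_i)                          # expected deaths in the area
print(crude_relative_risk(3, e_i))  # crude relative risk for 3 observed deaths
```

With expected counts as small as those reported for the EDs, this raw ratio is very unstable, which is one motivation for modelling the relative risk rather than reading it off directly.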

Poisson regression controlling for age, sex, deprivation and smoking prevalence.

Parameter         Rel. Risk (95% CI), WinBUGS   Rel. Risk (95% CI), SAS
NOx category
  5               1.48 (1.31-1.67)              1.48 (1.23-1.77)
  4               1.26 (1.12-1.42)              1.26 (1.06-1.51)
  3               1.10 (0.98-1.24)              1.10 (0.92-1.32)
  2               1.13 (1.00-1.26)              1.12 (0.94-1.34)
  1               1                             1
Smoking: z^ave    0.93 (0.84-1.02)              0.93 (0.80-1.08)

Model fit         DIC: 4871.57                  Deviance/df = 2.3

Bayesian hierarchical spatial model:

Fitted to allow for overdispersion due to:

- small area population heterogeneity;

- missing covariates (that may be spatially autocorrelated).

To allow for the uncertainty associated with the smoking data (small counts; missing values), an errors-in-variables model was used for $z_i$.

$$\log[r_i] = \alpha + \beta x_i + \gamma z_i^{est} + e_i$$

$e_i$ = unexplained area-specific log relative risk in area $i$ after adjusting for $x$ and $z^{est}$:

$$e_i = v_i + s_i$$

$v_i$ = unstructured random effects (zero-mean normal prior).

$s_i$ = spatially structured random effects (zero-mean intrinsic conditional autoregressive prior).

$$z_i^{est} = \log[\mathrm{smoke}.r_i] = \mathrm{smoke}.\alpha + \mathrm{smoke}.v_i + \mathrm{smoke}.s_i$$

Priors:

- flat priors used for $\alpha$, $\beta$ and $\gamma$;

- gamma(0.5, 0.0005) used for the precision parameters of the random effect terms.
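The slides name the intrinsic conditional autoregressive prior without spelling it out. Under the common unit-weight form, each spatial effect is normal given the others, centred on the mean of its neighbours with a variance that shrinks as the number of neighbours grows. The sketch below assumes that form; the adjacency structure, values and function name are hypothetical and not taken from the Sheffield analysis.

```python
# Sketch of the full conditional of an intrinsic CAR prior with unit weights:
#   s_i | s_{-i} ~ Normal(mean of neighbouring s_j, 1 / (tau * n_i))
# where n_i is the number of neighbours of area i and tau is a precision.
# The adjacency map and values are hypothetical.


def icar_conditional(i, s, neighbours, tau):
    """Return (mean, variance) of s_i given the spatial effects of its neighbours."""
    nbrs = neighbours[i]
    if not nbrs:
        raise ValueError("area %r has no neighbours" % (i,))
    mean = sum(s[j] for j in nbrs) / len(nbrs)
    variance = 1.0 / (tau * len(nbrs))
    return mean, variance


# Three areas: area 0 borders areas 1 and 2, which do not border each other.
s = [0.1, 0.3, -0.2]
neighbours = {0: [1, 2], 1: [0], 2: [0]}

print(icar_conditional(0, s, neighbours, tau=4.0))
```

Because the conditional mean is a local average, areas with few deaths borrow strength from adjacent areas, which is the smoothing behaviour the spatial term is there to provide.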

Spatial fraction (SF):

$$SF = \frac{\mathrm{Var}(s_i)}{\mathrm{Var}(s_i) + \mathrm{Var}(v_i)}$$

That is, the ratio of the estimated marginal variance of the spatial random effect to the sum of the estimated marginal variances of the spatial and the unstructured random effects.

SF → 1 implies spatial heterogeneity dominates;

SF → 0 implies unstructured heterogeneity dominates.
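One way to compute the spatial fraction is from the empirical variances of the two sets of random effects across areas, for a single posterior draw. The sketch below takes that route as an assumption; the slides do not say exactly how the marginal variances were estimated, and the values and function name are hypothetical.

```python
# Sketch: spatial fraction from the empirical (across-area) variances of the
# spatial effects s and unstructured effects v in one posterior draw.
# The effect values below are hypothetical.
from statistics import pvariance


def spatial_fraction(s, v):
    """Var(s) / (Var(s) + Var(v)), using across-area population variances."""
    var_s = pvariance(s)
    var_v = pvariance(v)
    if var_s + var_v == 0:
        raise ValueError("both sets of random effects are constant")
    return var_s / (var_s + var_v)


s_draw = [-1.0, 0.0, 1.0]     # spatially structured effects
v_draw = [-0.1, 0.0, 0.1]     # unstructured effects

print(spatial_fraction(s_draw, v_draw))   # close to 1: spatial part dominates
```

In a full analysis this would be evaluated for every draw and summarized, for example by its posterior mean.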

Poisson regression with spatial random effects, controlling for age, sex, deprivation and smoking prevalence

Parameter                              Rel. Risk (95% CI), WinBUGS
NOx category
  5                                    1.27 (1.03-1.54)
  4                                    1.16 (0.95-1.40)
  3                                    1.04 (0.85-1.25)
  2                                    1.07 (0.89-1.29)
  1                                    1
Smoking: z^est                         1.05 (0.79-1.40)
Spatial fraction (model; for smoking)  (0.006; 0.99)

DIC = 3927.77

Conclusions:

Evidence of an association between NOx and stroke mortality:

1. threshold level for an effect;

2. effect size diminishes after including random effects to allow for overdispersion and missing variables;

3. spatially smoothing NOx to allow for local journeys did not make a difference to the size of the effect;

4. unable to allow for long and short term population movements;

5. no association with smoking prevalence (an effect of how smoking was defined? small sample sizes in some EDs?).

2. Fitting a Binomial Model

- Stroke is not contagious, so outcomes for individuals are independent Bernoulli random variables, and at the area level they therefore aggregate to Binomial random variables.

- Because stroke is relatively rare, the Poisson assumption should give similar results, but it is only an approximation.

- We also have data on the proportion exposed to different levels of NOx at the ED level, which was not previously used.
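To get a feel for how close the Poisson approximation is for a rare outcome, the sketch below compares the two probability mass functions for a single hypothetical area. The population size and risk are illustrative values chosen to be of the same order as the figures quoted on the Data slide; they are not taken from any actual ED.

```python
# Sketch: Binomial(n, p) versus its Poisson(n * p) approximation for a rare event.
# n and p are hypothetical, chosen only to be of a plausible order of magnitude.
import math


def binomial_pmf(k, n, p):
    """P(Y = k) for Y ~ Binomial(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def poisson_pmf(k, mu):
    """P(Y = k) for Y ~ Poisson(mu), computed on the log scale for stability."""
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))


n, p = 200, 0.015          # hypothetical area population and death risk
mu = n * p                 # matching Poisson mean

max_gap = max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, mu)) for k in range(21))
print(max_gap)             # largest pointwise difference over k = 0..20
```

The gap is small when the risk is low, which is consistent with expecting similar results from the two models, but it is not zero.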

Ecological analysis

              Not-exposed     Exposed         Margins
Death         unknown         unknown         observed (a)
Not death     unknown         unknown         observed (a)
Totals        observed (b)    observed (b)    observed (a)

Unknown: the cell counts, which are the quantities of interest.

(a) Observed, and used in the previous analysis.

(b) Observed, but not previously used: the within-ED population distribution by PostPoint.

Dichotomised individual level model

$x_{i,j}$ is 0 (if individual $j$ in area $i$ is not exposed) or 1 (if individual $j$ in area $i$ is exposed); $y_{i,j}$ is 1 if that individual dies of stroke and 0 otherwise.

$$y_{i,j} \sim \mathrm{Bernoulli}(q_{i,x_{i,j}}), \qquad \mathrm{logit}(q_{i,x_{i,j}}) = \alpha + \beta x_{i,j} + \gamma z_i + v_i$$

$q_{i,0} = \text{probability}\{y_{i,j} = 1 \mid x_{i,j} = 0\}$: stroke risk in the not-exposed group in $i$

$q_{i,1} = \text{probability}\{y_{i,j} = 1 \mid x_{i,j} = 1\}$: stroke risk in the exposed group in $i$

$z_i$ denotes other area level covariates (e.g. deprivation).

$v_i \sim N(0, \sigma^2)$: an unstructured random effect to account for unmeasured covariates.

Depending on the exposure status of the individual:

If $x_{i,j} = 0$ (the person is in the not-exposed group) then $\mathrm{logit}(q_{i,0}) = \alpha + \gamma z_i + v_i$

If $x_{i,j} = 1$ (the person is in the exposed group) then $\mathrm{logit}(q_{i,1}) = \alpha + \beta + \gamma z_i + v_i$

This can be extended to a categorical exposure variable with more than 2 levels. Various extensions of the model, such as incorporating continuous exposure, can be found in Jackson et al. (2006).

Jackson, C. H., Best, N. G. and Richardson, S. (2006). Improving ecological inference using individual-level data. Statistics in Medicine 25(12): 2136-2159.

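An area-level model incorporating the distribution of within-area exposure

$$y_i \sim \mathrm{Bin}(n_i, p_i), \qquad p_i = (1 - \phi_i)\, q_{i,0} + \phi_i \, q_{i,1}$$

where

$\phi_i$ = proportion of the population in area $i$ in the exposed category.

$p_i$ = probability of stroke death in area $i$, regardless of exposure.

$$q_{i,0} = \mathrm{expit}(\alpha + \gamma z_i + v_i)$$

$$q_{i,1} = \mathrm{expit}(\alpha + \beta + \gamma z_i + v_i)$$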

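Remark

Note that applying a Binomial model with the proportion of exposed individuals, $\phi_i$, as a covariate:

$$y_i \sim \mathrm{Bin}(n_i, p_i), \qquad \mathrm{logit}(p_i) = \alpha + \beta \phi_i + \gamma z_i + v_i$$

But in general this differs from the area-level probability derived from an individual level model,

$$p_i = (1 - \phi_i)\, q_{i,0} + \phi_i \, q_{i,1}, \qquad q_{i,0} = \mathrm{expit}(\alpha + \gamma z_i + v_i), \qquad q_{i,1} = \mathrm{expit}(\alpha + \beta + \gamma z_i + v_i)$$

and the difference is a source of ecological bias.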
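The gap between the two area-level formulations can be checked numerically. The sketch below compares the exposure-weighted mixture of the two group risks with a logistic curve in the proportion exposed. The parameter names and values are hypothetical, and the covariate term is dropped for brevity.

```python
# Sketch: area-level risk derived from the individual-level model (a mixture of
# the not-exposed and exposed risks) versus a logistic model that uses the
# proportion exposed as a covariate. Parameter values are hypothetical and
# the area-level covariate term is omitted for brevity.
import math


def expit(u):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-u))


def mixture_risk(phi, alpha, beta, v=0.0):
    """(1 - phi) * q0 + phi * q1, with q0 and q1 from the individual-level model."""
    q0 = expit(alpha + v)
    q1 = expit(alpha + beta + v)
    return (1.0 - phi) * q0 + phi * q1


def naive_risk(phi, alpha, beta, v=0.0):
    """expit(alpha + beta * phi + v): proportion exposed used as a covariate."""
    return expit(alpha + beta * phi + v)


alpha, beta = -4.0, 0.5    # hypothetical baseline log-odds and exposure effect
for phi in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(phi, mixture_risk(phi, alpha, beta), naive_risk(phi, alpha, beta))
```

The two agree when everyone or no one in the area is exposed and differ in between, so the coefficient estimated from the covariate form need not equal the individual-level effect.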

3. Results

Binomial regression controlling for age, sex (18 strata), deprivation and incorporating the within area distribution of exposure.

Parameter       Rel. Risk (95% CI)     Rel. Risk (95% CI)
NOx category    Without unstr. R.E.    With unstr. R.E.
  5             1.34 (1.14-1.52)       1.07 (0.88-1.29)
  4             1.16 (1.03-1.30)       1.05 (0.86-1.25)
  3             1.10 (0.99-1.22)       0.92 (0.75-1.10)
  2             1.00 (0.87-1.13)       0.87 (0.73-1.04)
  1             1                      1

DIC             4953.02                3936.66
pD              8                      480

A dichotomised-exposure Binomial regression model controlling for age, sex (4 strata; 18 strata) and deprivation, and incorporating data on the within area distribution of exposure.

Parameter       Rel. Risk (95% CI): 4 strata    Rel. Risk (95% CI): 18 strata
NOx category
  Exposed       1.20 (1.05-1.34)                1.14 (1.00-1.30)
  Non-exposed   1                               1

• The exposed category comprises NOx categories 4 and 5 in the previous slide;

• The non-exposed category comprises categories 1, 2 and 3.
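The dichotomisation and the within-area exposure proportion can be sketched as below. It assumes that domestic property counts at each PostPoint stand in for population, in the spirit of the weighted PostPoint method; the data and function name are hypothetical.

```python
# Sketch: proportion of an ED's population in the exposed group, treating NOx
# categories 4 and 5 as exposed. Domestic property counts per PostPoint are
# used as a stand-in for population (an assumption). Data are hypothetical.
EXPOSED_CATEGORIES = {4, 5}


def proportion_exposed(postpoints):
    """postpoints: list of (domestic_properties, nox_category) pairs for one ED."""
    total = sum(dom for dom, _ in postpoints)
    if total == 0:
        raise ValueError("no domestic properties in this ED")
    exposed = sum(dom for dom, cat in postpoints if cat in EXPOSED_CATEGORIES)
    return exposed / total


ed_postpoints = [(13, 2), (33, 4), (20, 5), (34, 1)]   # hypothetical ED
print(proportion_exposed(ed_postpoints))
```

An ED whose PostPoints all fall in categories 1 to 3 gets a proportion of 0, and one entirely in categories 4 and 5 gets 1.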

4. Conclusions

1. Incorporation of information on within area exposure resulted in a reduction of the estimated relative risk compared to the earlier set of results.

2. Lower risks in categories 2 and 3 in the binomial model with 5 exposure categories may indicate some confounding effects have not been accounted for in the current model; in the absence of additional information, these effects could be “averaged out” by combining some exposure categories.

3. Fitting a reduced model with two exposure categories does indicate a significant effect in the exposed group after adjusting for age, sex and deprivation;

4. Increasing the number of age-sex cohorts from 4 to 18 in the dichotomous-exposure model reduced the estimated relative risk to 1.14 (95% CI: 1.00, 1.30), but there is still evidence of a significant effect.

Differences between the current approach and the earlier modelling.

- The Poisson model is prone to ecological bias since, for exposure, only aggregated information was used.

- Here we attempt to reduce the bias by utilizing data on the within-area distribution of exposure, i.e. the proportion of people in the exposed and non-exposed groups.

- Deprivation was absorbed into the expected number of cases in the earlier work; here it has been included as a covariate. We could adjust for deprivation in the baseline risks.

- There was no adjustment for smoking prevalence since it was not significant in the earlier modelling. The possibility exists of using lung cancer mortality as a proxy for smoking instead.