Spatio-temporal modelling for air pollution - SAMSI

  • Spatio-temporal modelling for air pollution

    Sujit Sahu

    Southampton Statistical Sciences Research Institute, University of Southampton

    SAMSI, March 2013

    Collaborators: Lee, Mitchell, Rushworth (Glasgow), Mukhopadhyay, Bakar, Bass, Yip (PhD students) and the UK Met Office.

    1 A rigorous statistical framework for estimating the long-term health effects of air pollution

    2 Forecasting next day ozone levels in the eastern US

  • Motivation

    Air pollution has many detrimental effects on human health, primarily respiratory: reduced lung function, coughing, throat irritation, congestion, bronchitis and asthma.

    According to the website of the Department for Environment, Food and Rural Affairs (DEFRA):

    In 2008, air pollution in the form of anthropogenic particulate matter (PM) alone was estimated to reduce average life expectancy in the UK by around six months,

    thereby imposing an estimated equivalent health cost of 19 billion GBP in 2008.

    "Traffic pollution kills 5,000 a year in UK, says study": BBC News, April 17, 2012.

  • Three main aims of our project

    A Development of a model that provides an accurate representation of the spatio-temporal structure in small-area health data.

    B Development of a model that produces estimates and measures of uncertainty in the levels of overall air pollution at relevant spatial and temporal resolutions (as required to align with the health data).

    C Development of a single integrated framework for combining the health and pollution models described in A and B, thus allowing the chronic effects of air pollution to be estimated.

  • Specific Research Objectives (SRO)

    (i) To develop a Bayesian spatio-temporal Markov random field (MRF) model that can represent localised spatial structure and identify boundaries in health data.

    (ii) To apply the model to real and simulated data sets, to quantify the impact that mis-specifying the spatial structure of the unmeasured confounders has on the estimated pollution-health relationship.

  • Specific Research Objectives...

    (iii) To develop a Bayesian multiple-pollutant space-time geostatistical model that can predict levels of overall pollution at unmonitored locations with their associated uncertainties.

    (iv) To validate the model in SRO (iii) using air pollution data in study regions.

    (v) To combine the MRF model in (i) and the geostatistical model in (iii) to estimate the effects of air pollution on human health in three case studies: London, Southampton and Glasgow.

  • Specific Research Objectives...

    (vi) To study the effect of future climate on health and air pollution, by using UK-specific regional climate model projections to 2050 in the integrated health-pollution model.

    (vii) To develop a user-friendly software package enabling others to implement the methods that we develop.

  • Health outcome model

    Let $Y_t(A_i)$ and $E_t(A_i)$ denote the observed and expected numbers of health events, such as respiratory admissions to hospital, that occur in areal unit $A_i$ ($i = 1, \ldots, n$) and time period $t$ ($t = 1, \ldots, T$).

    The overall risk $R_t(A_i)$ is modelled by covariates $x_t(A_i)$ and a random effect $\phi_t(A_i)$:

    $$Y_t(A_i) \sim \text{Poisson}(E_t(A_i) R_t(A_i)), \qquad \log(R_t(A_i)) = x_t(A_i)^\top \beta + \phi_t(A_i).$$
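
A minimal sketch of the Poisson health model above for a single areal unit and time period. The names `beta` and `phi` stand for the regression coefficients and the random effect, and every numeric value is hypothetical, chosen only for illustration.

```python
import math

def log_risk(x, beta, phi):
    """Linear predictor log R = x' beta + phi for one areal unit and time period."""
    return sum(xj * bj for xj, bj in zip(x, beta)) + phi

def poisson_loglik(y, expected, x, beta, phi):
    """Log-likelihood of an observed count y under Y ~ Poisson(E * R)."""
    mean = expected * math.exp(log_risk(x, beta, phi))
    return y * math.log(mean) - mean - math.lgamma(y + 1)

# Hypothetical inputs: an intercept plus one covariate, and a small random effect.
x = [1.0, 0.5]
beta = [0.1, 0.2]
phi = -0.05

risk = math.exp(log_risk(x, beta, phi))   # relative risk R
ll = poisson_loglik(y=12, expected=10.0, x=x, beta=beta, phi=phi)
```

A risk above one means more events than the expected count E would suggest; in a full analysis the log-likelihood would be summed over all areal units and time periods.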

  • Modelling the random effects

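    Denote the random effects by $\phi = (\phi_1, \ldots, \phi_T)$, where $\phi_t = (\phi_t(A_1), \ldots, \phi_t(A_n))$. We propose a class of MRF priors which decompose $f(\phi_1, \ldots, \phi_T)$ as

    $$f(\phi_1, \ldots, \phi_T) = \prod_{t=1}^{p} N(\phi_t \mid 0, \tau_t^2 Q_t^{-1}) \prod_{t=p+1}^{T} N(\phi_t \mid F_{1t}\phi_{t-1} + \cdots + F_{pt}\phi_{t-p}, \tau_t^2 Q_t^{-1}).$$

    $F_{jt}$ are the temporal transition matrices, and $p$ denotes the lag of the temporal correlation, typically chosen to be one or two.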
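
The prior above can be simulated directly once the precision and transition matrices are fixed. The sketch below assumes lag p = 1, a Leroux-type precision matrix built from a hypothetical neighbourhood matrix `W`, and a transition matrix equal to `rho` times the identity; none of these specific choices come from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical neighbourhood matrix for n = 4 areal units arranged on a line.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
n, T, p = W.shape[0], 6, 1

# Assumed Leroux-type precision: Q = lam * (D - W) + (1 - lam) * I, proper for lam < 1.
lam, tau2, rho = 0.9, 0.5, 0.7
Q = lam * (np.diag(W.sum(axis=1)) - W) + (1 - lam) * np.eye(n)
cov = tau2 * np.linalg.inv(Q)   # tau^2 Q^{-1}, held constant over t here
F = rho * np.eye(n)             # assumed transition matrix

phi = np.zeros((T, n))
phi[0] = rng.multivariate_normal(np.zeros(n), cov)          # periods 1, ..., p
for t in range(p, T):
    phi[t] = rng.multivariate_normal(F @ phi[t - 1], cov)   # periods p + 1, ..., T
```

Time dependence enters only through the mean `F @ phi[t - 1]`, and spatial dependence only through `cov`.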

  • Modelling the random effects...

    Temporal correlation is induced via the mean structure.

    Spatial correlation is induced via the variance structure.

    The latter is parameterised by the precision matrix $Q_t$, whose $jk$th element controls the spatial correlation structure between $\phi_t(A_j)$ and $\phi_t(A_k)$.

    Keeping $Q_t$ constant over time is a possibility.

  • Air pollution model

    Let $Z_l^{(k)}(s_j)$ denote the concentration of pollutant $k$ ($k = 1, \ldots, K$) observed at monitoring station $s_j$ ($j = 1, \ldots, J$) and time point $l$ ($l = 1, \ldots, L$).

    The $J$ stations will be unevenly distributed relative to the $n$ areal units for which disease data have been collected.

    The $L$ time points will also be at a higher temporal resolution than the disease data, e.g. daily compared with annually.

    $$Z_l^{(k)}(s_j) = \mu_l^{(k)}(s_j) + \epsilon_l^{(k)}(s_j), \qquad (1)$$

    $$\mu_l^{(k)}(s_j) = x_l^{(k)}(s_j)^\top \beta^{(k)}(s_j) + \eta_l^{(k)}(s_j).$$

  • Air pollution model...

    The error term $\epsilon_l^{(k)}(s_j)$ is assumed to be a pollutant-specific white noise process.

    The true concentration of pollution, $\mu_l^{(k)}(s_j)$, will be modelled by a combination of covariates $x_l^{(k)}(s_j)$ and a spatio-temporal process $\eta_l(s_j) = (\eta_l^{(1)}(s_j), \ldots, \eta_l^{(K)}(s_j))$.

    We propose representing the space-time process $\eta_l(s_j)$ with the linear model of co-regionalisation, $\eta_l(s_j) = D_l \omega_l(s_j)$, which uses the correlations between the pollutants to improve the fit of the model.
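
A minimal sketch of the linear model of co-regionalisation at a single time point: K independent spatial processes are mixed by a K x K matrix, which induces correlation between pollutants. The station layout, the exponential correlation function and the entries of `D` are assumptions made only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: J = 5 stations on a line, K = 2 pollutants, one time point.
coords = np.linspace(0.0, 4.0, 5)
dist = np.abs(coords[:, None] - coords[None, :])
R = np.exp(-dist / 2.0)                  # assumed exponential spatial correlation

# Assumed lower-triangular co-regionalisation matrix (K x K).
D = np.array([[1.0, 0.0],
              [0.8, 0.6]])

# K independent unit-variance spatial processes, one row per component.
chol = np.linalg.cholesky(R)
omega = (chol @ rng.standard_normal((5, 2))).T   # shape (K, J)

eta = D @ omega          # mixed process, shape (K, J): one row per pollutant
cross_cov = D @ D.T      # implied between-pollutant covariance at a single site
```

With these values `cross_cov` has unit variances and a between-pollutant correlation of 0.8, which is the information the model borrows across pollutants.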

  • Estimating overall air quality index (AQI)

    $$\gamma_l(s_j) = \frac{1}{K} \sum_{k=1}^{K} \frac{\mu_l^{(k)}(s_j) - \bar{\mu}^{(k)}}{\text{sd}(\mu^{(k)})},$$

    where $\bar{\mu}^{(k)}$ and $\text{sd}(\mu^{(k)})$ are the sample mean and standard deviation of $\mu_l^{(k)}(s_j)$.

    A Bayesian approach will enable us to produce posterior distributions for $\gamma_l(s_j)$, which in turn allow us to quantify the uncertainty in the AQI.
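
A sketch of how posterior uncertainty in the AQI could be obtained: the index is computed once per posterior draw of the true concentrations and then summarised. The posterior draws here are simulated stand-ins (hypothetical gamma variates), not output from any fitted model.

```python
import numpy as np

def aqi(mu):
    """Average over pollutants of standardised concentrations.

    mu has shape (K, L, J): pollutant, time point, station.
    """
    mean_k = mu.mean(axis=(1, 2), keepdims=True)
    sd_k = mu.std(axis=(1, 2), ddof=1, keepdims=True)
    return ((mu - mean_k) / sd_k).mean(axis=0)   # shape (L, J)

rng = np.random.default_rng(2)

# Stand-in for S posterior draws of the K x L x J array of true concentrations.
S, K, L, J = 200, 3, 10, 4
draws = rng.gamma(shape=5.0, scale=2.0, size=(S, K, L, J))

aqi_draws = np.stack([aqi(d) for d in draws])    # shape (S, L, J)
post_mean = aqi_draws.mean(axis=0)
lower, upper = np.percentile(aqi_draws, [2.5, 97.5], axis=0)
```

Each station and time point then has a posterior mean and a 95% interval for its AQI.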

  • The link model...

    $$\gamma_t(A) = |A|^{-1} \int_A \gamma_t(s)\, ds, \qquad (2)$$

    where $|A|$ denotes the area of block $A$.

    $$\log(R_t(A_i)) = \alpha_0 \gamma_{t-1}(A_i) + x_t(A_i)^\top \beta + \phi_t(A_i). \qquad (3)$$

    Here $\alpha_0$ is the effect of air pollution on health. The AQI is lagged by one year compared with the health data to ensure that the exposure occurs before the response.

    A measurement error model links $\gamma_t(A_i)$ and $\gamma_l(s_j)$ by

    $$\gamma_t(A_i) \sim N(\bar{\gamma}_t(A_i), \sigma^2),$$

    where $\bar{\gamma}_t(A_i)$ is the average of all $\gamma_l(s_j)$ for which $s_j$ is within areal unit $A_i$ and $l$ is within the aggregate time period $t$.
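
The averaging step in the measurement error model can be sketched as follows: point-level AQI values are averaged over the stations that fall in each areal unit and over the time points that fall in each aggregate period. The function name, the index arrays and the toy values are all hypothetical.

```python
import numpy as np

def areal_average(aqi, station_area, time_period, n_areas, n_periods):
    """Average point-level AQI within each (period, areal unit) cell.

    aqi: array of shape (L, J); station_area: areal-unit index of each station;
    time_period: period index of each time point. Cells without stations are NaN.
    """
    total = np.zeros((n_periods, n_areas))
    count = np.zeros((n_periods, n_areas))
    for l, t in enumerate(time_period):
        for j, i in enumerate(station_area):
            total[t, i] += aqi[l, j]
            count[t, i] += 1
    out = np.full((n_periods, n_areas), np.nan)
    mask = count > 0
    out[mask] = total[mask] / count[mask]
    return out

# Toy data: 4 time points in 2 periods, 3 stations in 2 of 3 areal units.
aqi = np.arange(12, dtype=float).reshape(4, 3)
avg = areal_average(aqi, station_area=[0, 0, 1], time_period=[0, 0, 1, 1],
                    n_areas=3, n_periods=2)
```

The third areal unit contains no station, so its average is left as NaN; in the model this is where the normal measurement error layer and the spatial prediction of the AQI carry the information.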

  • Discussion

    The models are currently under construction by two post-docs: Mukhopadhyay (Southampton) and Rushworth (Glasgow).

    Software packages implementing the models will be developed.

    Three-year project funded by EPSRC (the UK analogue of NSF), worth 635K. Launch meeting in Southampton on April 15, 2013.

    We would like to learn more from data from India.

    We need pollution and health data.

    A post-doc/PhD student, preferably from India, who can harass the government into releasing the data!

    A collaborator, like many Indian colleagues, as well.

  • 2: Forecasting next day ozone levels in the eastern United States

    1 Three Gaussian process models.

    2 Forecast calibration with a small example.

    3 Illustration with a large data set.

    4 Discussion.

  • Preliminaries in modelling

    Apply a transformation to stabilize variance and to encourage symmetry.

    We use the square root, but it is possible to use the log.

    Observed data: $Z_l(s, t)$, with $s$ = (long, lat), at $n$ sites $s_1, \ldots, s_n$. Denote time by two indices: $t$ for hours (days) within $l$ for days (years).

    As a covariate in a downscaler model (Sahu et al 2009, Berrocal et al 2010) we use the grid CMAQ (computer model) output, $x_l(s, t)$.

    We can use other covariates, e.g. temperature, wind speed and relative humidity, but those do not remain significant after including CMAQ output.

  • Model 1: Gaussian Process (GP)

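    Measurement error model:

    $$Z_l(s, t) = O_l(s, t) + \epsilon_l(s, t), \qquad \epsilon_l(s, t) \sim N(0, \sigma_\epsilon^2).$$

    $O_l(s, t)$ is the true value of the underlying space-time process. The errors $\epsilon_l(s, t)$ are independent. $\sigma_\epsilon^2$ is called the nugget effect.

    Model for true ozone:

    $$O_l(s, t) = x_l(s, t)^\top \beta + \eta_l(s, t)$$

    $x_l(s, t)^\top \beta$: adjustment for local meteorology and/or other covariates, e.g. CMAQ output.

    $\eta_l(s, t)$: space-time intercept, assumed to be independent in time.

    Assume $\eta_{lt} = (\eta_l(s_1, t), \ldots, \eta_l(s_n, t)) \sim N(0, \Sigma_\eta)$, with $\Sigma_\eta = \sigma_\eta^2 \times$ Matérn correlation.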
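
A minimal simulation sketch of Model 1 on the transformed scale, for one value of l. It assumes the exponential correlation function (the Matérn family with smoothness 1/2), a single covariate standing in for CMAQ output, and hypothetical parameter values and site locations.

```python
import numpy as np

rng = np.random.default_rng(3)

n, T = 6, 5
sites = rng.uniform(0.0, 10.0, size=(n, 2))      # hypothetical site coordinates
dist = np.linalg.norm(sites[:, None, :] - sites[None, :, :], axis=-1)

sigma2_eta, sigma2_eps, decay = 1.0, 0.1, 0.3    # assumed parameter values
Sigma_eta = sigma2_eta * np.exp(-decay * dist)   # exponential (Matern, smoothness 1/2)

beta = 0.8
x = rng.uniform(5.0, 9.0, size=(T, n))           # stand-in for CMAQ output

chol = np.linalg.cholesky(Sigma_eta)
eta = rng.standard_normal((T, n)) @ chol.T       # independent over t, each row N(0, Sigma_eta)
O = beta * x + eta                               # true ozone process
Z = O + np.sqrt(sigma2_eps) * rng.standard_normal((T, n))   # add the nugget
```

Each row of `eta` is a separate spatial draw, reflecting the assumption that the space-time intercept is independent in time.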

  • Model 2: AR models

    Measurement error model:

    $$Z_l(s, t) = O_l(s, t) + \epsilon_l(s, t), \qquad \epsilon_l(s, t) \sim N(0, \sigma_\epsilon^2).$$

    Details as before.

    Model for true ozone:

    $$O_l(s, t) = \rho\, O_l(s, t-1) + x_l(s, t)^\top \beta + \eta_l(s, t)$$

    $\rho\, O_l(s, t-1)$: auto-regressive term.

    $x_l(s, t)^\top \beta$: adjustment for local meteorology and/or other covariates, e.g. CMAQ output.

    $\eta_l(s, t)$: space-time intercept, independent in time. Assume $\eta_{lt} \sim N(0, \Sigma_\eta)$, with $\Sigma_\eta = \sigma_\eta^2 \times$ Matérn correlation.

    Need an initial condition when $t = 1$. Details omitted.
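
A sketch of the auto-regressive recursion for the true ozone process. The slides omit the initial condition, so the starting value used here (the level the recursion would settle at for the first covariate value) is purely an assumption, as are the parameter values and the exponential correlation.

```python
import numpy as np

def ar_step(prev, x_t, eta_t, rho, beta):
    """One step of the recursion: O_t = rho * O_{t-1} + beta * x_t + eta_t."""
    return rho * prev + beta * x_t + eta_t

rng = np.random.default_rng(4)

n, T = 6, 8
sites = rng.uniform(0.0, 10.0, size=(n, 2))      # hypothetical site coordinates
dist = np.linalg.norm(sites[:, None, :] - sites[None, :, :], axis=-1)
chol = np.linalg.cholesky(np.exp(-0.3 * dist))   # assumed exponential correlation

rho, beta = 0.5, 0.8
x = rng.uniform(5.0, 9.0, size=(T, n))           # stand-in for CMAQ output

O = np.empty((T, n))
prev = beta * x[0] / (1.0 - rho)                 # assumed initial condition
for t in range(T):
    eta_t = chol @ rng.standard_normal(n)        # spatial draw, independent in time
    O[t] = ar_step(prev, x[t], eta_t, rho, beta)
    prev = O[t]
```

Unlike Model 1, the true process now carries over from one time point to the next through `rho`, even though the spatial intercepts remain independent in time.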

  • Model 3: GPP approximations (Banerjee et al 2008)

    Ideally, in addition to the nugget effect, we would like to fit:

    $$O_l(s_i, t) = x_l(s_i, t)^\top \beta + \eta_l(s_i, t). \qquad (4)$$

    The problem is that we will then have $nrT$ space-time random variables $\eta_l(s_i, t)$, the same number as data. GPP approximations reduce this number by considering a smaller number, $m$

  • Details of the GPP approximations

    Let $C$ be the $n \times m$ covariance matrix with $ij$th element $\text{Cov}(\eta_l(s_i, t), \eta_l(s^*_j, t))$, for $i = 1, \ldots, n$, $j = 1, \ldots, m$.

    Let be the m m covariance matrix of the spatialprocess at the knots, lt = (l(s

    1, t), . . . , l(s

    m