
- Why spatio–temporal epidemiology? · 2020-03-23 · Example 1.1. Daily measurements of particulate matter An example of temporal correlation in exposures can be seen in Figure


Chapter 1

Why spatio–temporal epidemiology?

1.1 Overview

Spatial epidemiology is the description and analysis of geographical data, specifically health data in the form of counts of mortality or morbidity and factors that may explain variations in those counts over space. These may include demographic and environmental factors together with genetic and infectious risk factors (Elliott & Wartenberg, 2004). It has a long history dating back to the mid-1800s, when John Snow’s map of cholera cases in London in 1854 provided an early example of geographical health analyses that aimed to identify possible causes of outbreaks of infectious diseases (Hempel, 2014). Since then, advances in statistical methodology together with the increasing availability of data recorded at very high spatial and temporal resolution have led to great advances in spatial and, more recently, spatio–temporal epidemiology.

These advances have been driven in part by increased awareness of the potential effects of environmental hazards and potential increases in the hazards themselves. Over the past two decades, population predictions based on conventional demographic methods have forecast that the world’s population will rise to about 9 billion in 2050, and then level off or decline. However, recent analyses using Bayesian methods have provided compelling evidence that such projections may vastly underestimate the world’s future population and that, instead of the expected decline, population will continue to rise (Gerland et al., 2014). Such an increase will greatly add to the anthropogenic contributions of environmental contamination and will require political, societal and economic solutions in order to adapt to increased risks to human health and welfare. In order to assess and manage these risks there is a requirement for monitoring and modelling the associated environmental processes that will lead to an increase in a wide variety of adverse health outcomes. Addressing these issues will involve a multi-disciplinary approach and it is imperative that the uncertainties that will be associated with each of the components can be characterised and incorporated into statistical models used for assessing health risks (Zannetti, 1990).

In this chapter we describe the underlying concepts behind investigations into the effects of environmental hazards and particularly the uncertainties that are present at each stage of the process. This leads to a discussion of the reasons why considering diseases and exposures over both space and time is becoming increasingly important in epidemiological analyses. In this book we advocate a Bayesian approach to modelling and later in this chapter we consider a general framework for modelling spatio–temporal data and introduce the notation that will be used throughout the book. Different types of spatial data are introduced together with a brief summary of the effect that the underlying generating mechanism will have on subsequent modelling, a subject that is further developed in later chapters. Throughout this chapter, concepts and theory are presented together with examples.

1.2 Health-exposure models

An analysis of the health risks associated with an environmental hazard will require a model which links exposures to the chosen health outcome. There are several potential sources of uncertainty in linking environmental exposures to health, especially when the data might be at different levels of aggregation. For example, in studies of the effects of air pollution, data often consist of health counts for entire cities, with comparisons being made over space (with other cities experiencing different levels of pollution) or time (within the same city), whereas exposure information is often obtained from a fixed number of monitoring sites within the region of study.

Actual exposures to an environmental hazard will depend on the temporal trajectories of the population’s members that will take individual members of that population through a sequence of micro-environments, such as a car, house or street (Berhane, Gauderman, Stram, & Thomas, 2004). Information about the current state of the environment may be obtained from routine monitoring or through measurements taken for a specialised purpose. An individual’s actual exposure is a complex interaction of behaviour and the environment. Exposure to the environmental hazard affects the individual’s risk of certain health outcomes, which may also be affected by other factors such as age and smoking behaviour.

1.2.1 Estimating risks

If a study is carefully designed, then it should be possible to obtain an assessment of the magnitude of a risk associated with changes in the level of the environmental hazard. Often this is represented by a relative risk or odds ratio, which is the natural result of fitting log–linear and logistic regression models respectively. They are often accompanied by measures of uncertainty, such as 95% confidence (or in the case of Bayesian analyses, credible) intervals. However, there are still several sources of uncertainty which cannot be easily expressed in summary terms. These include the uncertainty associated with assumptions that were implicitly made in any statistical regression models, such as the shape of the dose–response relationship (often assumed to be linear). The inclusion, or otherwise, of potential confounders and unknown latencies over which health effects manifest themselves will also introduce uncertainty. In the case of short-term effects of air pollution, for example, a lag (the difference in time between exposure and health outcome) of one or two days is often chosen (Dominici & Zeger, 2000) but the choice of a single lag does not acknowledge the uncertainty associated with making this choice. Multiple lags may be included in the statistical model but this may be unsatisfactory due to the high correlation amongst lagged exposures, although this problem can be reduced by using distributed lag models (Zannetti, 1990).
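As a minimal sketch of how a relative risk and its interval are obtained from a log–linear model, the snippet below exponentiates a regression coefficient and the ends of its approximate 95% interval. The function name, the 10-unit exposure increment and the numerical values of the coefficient and standard error are hypothetical and chosen only for illustration; they are not results from any study cited here.

```python
import math

def relative_risk(beta, se, delta=10.0, z=1.96):
    """Relative risk for a `delta`-unit increase in exposure.

    beta and se are the coefficient and standard error of the exposure
    term in a log-linear (e.g. Poisson) regression. The interval is the
    usual normal approximation on the log scale, then exponentiated.
    """
    rr = math.exp(beta * delta)
    lower = math.exp((beta - z * se) * delta)
    upper = math.exp((beta + z * se) * delta)
    return rr, lower, upper

# Hypothetical coefficient and standard error, for illustration only.
rr, lower, upper = relative_risk(beta=0.0008, se=0.0003)
print(f"RR per 10 units: {rr:.4f} ({lower:.4f}, {upper:.4f})")
```

Note that such an interval reflects only sampling variability in the coefficient; it does not capture the other sources of uncertainty discussed above, such as the choice of lag or the assumed shape of the dose–response relationship.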

1.2.2 A new world of uncertainty

The importance of uncertainty has increased dramatically as the twentieth century ushered in the era of post-normal science as articulated by Funtowicz and Ravetz (1993). Gone were the days of the solitary scientist running carefully controlled bench-level experiments with assured reproducibility, the hallmark of good classical science. In came a science characterized by great risks and high levels of uncertainty, an example being climate science with its associated environmental health risks. Funtowicz–Ravetz post-normality has two major dimensions (Aven, 2013): (i) decision stakes or the value dimension (cost–benefit) and (ii) system uncertainties. Dimension (i) tends to increase with (ii); just where certainty is needed the most, uncertainty is reaching its peak.

Post-normal science called for a search for new approaches to dealing with uncertainty, ones that recognised the diversity of stakeholders and evaluators needed to deal with these challenges. That search led to the recognition that characterising uncertainty required a dialogue amongst this extended set of peer reviewers through workshops and panels of experts. Such panels are convened by the US Environmental Protection Agency (EPA), which may be required to debate the issues in a public forum with the participation of outside experts (consultants) employed by interest groups such as, in the case of air pollution, the American Lung Association and the American Petroleum Producers Association.

1.3 Dependencies over space and time

Environmental epidemiologists commonly seek associations between an environmental hazard Z and a health outcome Y. A spatial association is suggested if measured values of Z are found to be large (or small) at locations where counts of Y are also large (or small). Similarly, temporal associations arise when large (or small) values of Y are seen at times when values of Z are large (or small). A classical regression analysis might then be used to assess the magnitude of any associations and to assess whether they are significant. However, such an analysis would be flawed if the measurements of the exposures, Z, and the health outcomes, Y, are spatially correlated, which will result in outcomes at locations close together being more similar than those further apart. In this case, or in the case of temporal correlation, the standard assumptions of stochastic independence between experimental units would not be valid.
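One common way to check whether values at nearby locations are more alike than independence would suggest is Moran's I, a standard index of spatial autocorrelation; it is not introduced in this chapter and is used here only as an illustrative sketch. The four locations, their values and the binary neighbour weights below are hypothetical.

```python
def morans_i(values, weights):
    """Moran's I for values at n locations.

    weights[i][j] > 0 if locations i and j are neighbours, 0 otherwise
    (the diagonal is zero). Values near +1 suggest similar values
    cluster together; values near 0 suggest little spatial structure.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_sum) * (num / den)

# Four hypothetical locations along a line; adjacent ones are neighbours.
weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = morans_i([1.0, 1.2, 3.0, 3.1], weights)    # low values together, high together
alternating = morans_i([1.0, 3.0, 1.0, 3.0], weights)  # neighbours always differ
print(clustered, alternating)
```

A clearly positive value, as for the clustered example, is the kind of signal that would make a regression assuming independent units questionable.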


[Figure 1.1 image: map with Easting on the horizontal axis and Northing on the vertical axis; colour scale from 0.5 to 2.0.]

Figure 1.1: Map of the spatial distribution of risks of hospital admission for a respiratory condition, chronic obstructive pulmonary disease (COPD), in the UK for 2001. The shades of blue correspond to standardised admission rates, which are a measure of risk. Darker shades indicate higher rates of hospitalisation allowing for the underlying age–sex profile of the population within the area.

An example of spatial correlation can be seen in Figure 1.1, which shows the spatial distribution of the risk of hospital admission for chronic obstructive pulmonary disease (COPD) in the UK. There seem to be patterns in the data, with areas of high and low risks being grouped together, suggesting that there may be spatial dependence that would need to be incorporated in any model used to examine associations with potential risk factors.
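A standardised rate that allows for the age–sex profile of an area is often computed by indirect standardisation: the observed count is divided by the count expected if reference rates applied to the area's own population. The sketch below assumes that approach; whether Figure 1.1 was produced in exactly this way is not stated here, and the strata, reference rates and population figures are hypothetical.

```python
def expected_count(population_by_stratum, reference_rates):
    """Expected admissions if the reference rates applied to this area."""
    return sum(population_by_stratum[k] * reference_rates[k]
               for k in population_by_stratum)

def standardised_ratio(observed, population_by_stratum, reference_rates):
    """Observed / expected; values above 1 indicate more admissions than expected."""
    return observed / expected_count(population_by_stratum, reference_rates)

# Hypothetical age-sex strata, reference rates (per person per year)
# and area populations, for illustration only.
reference_rates = {
    "male_65plus": 0.020, "female_65plus": 0.015,
    "male_under65": 0.002, "female_under65": 0.001,
}
area_population = {
    "male_65plus": 1000, "female_65plus": 1200,
    "male_under65": 5000, "female_under65": 5200,
}
expected = expected_count(area_population, reference_rates)
ratio = standardised_ratio(80, area_population, reference_rates)
print(f"expected = {expected:.1f}, ratio = {ratio:.2f}")
```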

1.3.1 Contrasts

Any regression-based analysis of risk requires contrasts between low and high levels of exposures in order to assess the differences in health outcomes between those levels. A major breakthrough in environmental epidemiology came from recognising that time series studies could, in some cases, supply the required contrasts between levels of exposures while eliminating the effects of confounders to a large extent.

This is now the standard approach in short-term air pollution studies (Katsouyanni et al., 1995; Peng & Dominici, 2008) where the levels of a pollutant, Z, vary from day to day within a city while the values of confounding variables, X, for example the age–sex structure of the population or smoking habits, do not change over such a short time period. Thus if the health outcome, Y, is found to vary in association with short-term changes in the levels of pollution then relationships can be established. However, the health counts are likely to be correlated over time due to underlying risk factors that vary over time. It is noted that this is of a different form than that for communicable diseases, where it may be the disease itself that drives any correlation in health outcomes over time. This leads to the need for temporal process models to be incorporated within analyses of risk. In addition, there will often be temporal patterns in the exposures. Levels of air pollution, for example, are correlated over short periods of time due to changes in the source of the pollution and weather patterns such as wind and temperature.

Periods of missing exposure data can greatly affect the outcomes of a health analysis, both by reducing sample size and by inducing bias in the estimates of risk. There is a real need for models that can impute missing data accurately and in a form that can be used in health studies. It is important that, in addition to predicting levels of exposure when they are not available, such models should also produce measures of associated uncertainty that can be fed into subsequent analyses of the effect of those exposures on health.

Example 1.1. Daily measurements of particulate matter

An example of temporal correlation in exposures can be seen in Figure 1.2, which shows daily measurements of particulate matter over 250 days in London in 1997. Clear auto-correlation can be seen in this series of data with periods of high and low pollution. There are also periods of missing data (shown by triangles along the x-axis) where measurements were not available.

Figure 1.2: Time series of daily measurements of particulate matter (PM10) for 250 days in 1997 in London. Measurements are made at the Bloomsbury monitoring site in central London. Missing values are shown by triangles. The solid black line is a smoothed estimate produced using a Bayesian temporal model and the dotted lines show the 95% credible intervals associated with the estimates.
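The auto-correlation described in this example can be quantified with a sample autocorrelation at a given lag. The sketch below is a simple estimator that skips any pair of days in which either value is missing; it is not the Bayesian temporal model used for Figure 1.2, and the short series is invented for illustration rather than taken from the Bloomsbury data.

```python
def lag_autocorrelation(series, lag=1):
    """Sample autocorrelation at `lag`, ignoring pairs with a missing value.

    Missing measurements are represented by None. The mean and variance
    are computed from all observed values; the covariance uses only
    those pairs (x_t, x_{t+lag}) where both values are observed.
    """
    observed = [x for x in series if x is not None]
    mean = sum(observed) / len(observed)
    var = sum((x - mean) ** 2 for x in observed) / len(observed)
    pairs = [(a, b) for a, b in zip(series, series[lag:])
             if a is not None and b is not None]
    cov = sum((a - mean) * (b - mean) for a, b in pairs) / len(pairs)
    return cov / var

# Hypothetical daily PM10 values with gaps (None), for illustration only.
pm10 = [20, 22, 25, None, None, 40, 42, 39, 30, 28, None, 21, 20]
r1 = lag_autocorrelation(pm10, lag=1)
print(f"lag-1 autocorrelation: {r1:.2f}")
```

Dropping incomplete pairs is the simplest option and reduces the effective sample size; a model-based approach that imputes the gaps and carries the associated uncertainty forward would generally be preferred.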


It is noted that classical time series composition and analysis is primarily interested in modelling the behaviour of a response variable over time rather than its relationship to a set of explanatory variables, which is at the heart of environmental epidemiology. However, the classical theory can play a key role in learning the nature of any serial dependence in outcomes, both in health and exposures, and for constructing suitable models that incorporate such dependence.

Until recently, epidemiological studies have considered measurements over space or time but rarely both. Increased power can be gained by combining space and time when processes evolve over both of these domains. We then have contrasting levels in both the spatial and temporal domains. When the spatial fields are temporally independent, replicates of the spatial field become available. Spatial dependence is then easier to model. However, the likely presence of temporal dependence leads to a need to build complex dependence structures. At the cost of increased complexity, such models may utilise the full benefit of the information contained in the spatio–temporal field. This means that dependencies across both space and time can be exploited in terms of ‘borrowing strength’. For example, values could be predicted at unmonitored spatial locations or at future times to help protect against predicted overexposures.

Example 1.2. Spatial prediction of NO2 concentrations in Europe

In this example we see the result of using a spatial model to predict levels of nitrogen dioxide (NO2) across Europe (Shaddick, Yan, et al., 2013). Measurements were available from approximately 400 monitoring sites situated throughout Europe and these data were used to predict concentrations for every 1km × 1km geographical grid cell within the region. In this case, a Bayesian model was fitted within WinBUGS and posterior predictions were imported (via R) to ESRI ArcGIS for mapping. The results can be seen in Figure 1.3.

In addition to the issues associated with correlation over space and time, environmental epidemiological studies will also face a major hurdle in the form of confounders. If there is a confounder, X, that is the real cause of adverse health effects there will be problems if it is associated with both Z and Y. In this case, apparent relationships observed between Z and Y may turn out to be spurious. It may therefore be important to model spatio–temporal variation in confounding variables in addition to the variables of primary interest.

1.4 Examples of spatio–temporal epidemiological analyses

Environmental exposures will vary over both space and time and there will potentially be many sources of variation and uncertainty. Statistical methods must be able to acknowledge this variability and uncertainty and be able to estimate exposures at varying geographical and temporal scales in order to maximise the information available that can be linked to health outcomes in order to estimate the associated risks. In addition to estimates of risks, such methods must be able to produce measures of uncertainty associated with those risks. These measures of uncertainty should reflect the inherent uncertainties that will be present at each of the stages in the modelling process.

Figure 1.3: Predictions of nitrogen dioxide (NO2) concentrations throughout Europe. The predictions are from a Bayesian spatial model and are the medians of the posterior distributions of predictions based on measurements from approximately 400 monitoring sites.

This has led to the application of spatial and temporal modelling in environmental epidemiology, in order to incorporate dependencies over space and time in analyses of association. The value of spatio–temporal modelling can be seen in two major studies that were underway at the time this book was being written: (i) the Children’s Health Study in Los Angeles and (ii) the MESA Air (Multi-Ethnic Study of Atherosclerosis and Air Pollution) study.

Example 1.3. Children’s health study – Los Angeles

Children may suffer increased adverse effects from air pollution compared to adults as their lungs are still developing. They are also likely to experience higher exposures as they breathe faster and spend more time outdoors engaged in strenuous activity. The effects of air pollution on children’s health are therefore a very important health issue.


The Children’s Health Study began in 1993 and is a large, long-term study of the effects of chronic air pollution exposures on the health of children living in Southern California. Approximately 4000 children in twelve communities were enrolled in the study, although substantially more have been added since the initiation of the study. Data on the children’s health, their exposures to air pollution and many other factors were recorded annually until they graduated from high school.

This study is remarkable as the complexity of such longitudinal studies has generally made them prohibitively expensive. While the study was observational in nature, i.e. subjects could not be randomised to high or low exposure groups, children were selected to provide good contrast between areas of low and high exposure. Spatio–temporal modelling issues had to be addressed in the analysis since data were collected over time and from a number of communities which were distributed over space (Berhane et al., 2004).

A major finding from this study was that:

Current levels of air pollution have chronic, adverse effects on lung growth leading to clinically significant deficit in 18-year-old children. Air pollution affects both new onset asthma and exacerbation. Living in close proximity to busy roads is associated with risk for prevalent asthma. Residential traffic exposure is linked to deficit in lung function growth and increased school absences. Differences in genetic makeup affect these outcomes. (http://hydra.usc.edu/scehsc/about-studies-childrens.html)

Example 1.4. Air pollution and cardiac disease

The MESA Air (Multi-Ethnic Study of Atherosclerosis and Air Pollution) study involves more than 6000 men and women from six communities in the United States. The study started in 1999 and continues to follow participants’ health as this book is being written.

The central hypothesis for this study is that long-term exposure to fine particles is associated with a more rapid progression of coronary atherosclerosis. Atherosclerosis is sometimes called hardening of the arteries and when it affects the arteries of the heart, it is called coronary artery disease. The problem caused by the smallest particles is their capacity to move through the gas exchange membrane into the blood system. Particles may also generate inflammatory mediators in the blood that attack the heart.

Data are recorded both over time and space and so the analysis has been designed to acknowledge this. The study was designed to ensure the necessary contrasts needed for good statistical inference by taking random spatial samples of subjects from six very different regions. The study has yielded a great deal of new knowledge about the effects of air pollution on human health. In particular, exposures to chemicals and other environmental hazards appear to have a very serious impact on cardiovascular health.

Results from MESA Air show that people living in areas with higher levels of air pollution have thicker carotid artery walls than people living in areas with cleaner air. The arteries of people in more polluted areas also thickened faster over time, as compared to people living in places with cleaner air. These findings might help to explain how air pollution leads to problems like stroke and heart attacks. (http://depts.washington.edu/mesaair/)

1.5 Bayesian hierarchical models

Bayesian hierarchical models are an extremely useful and flexible framework in which to model complex relationships and dependencies in data and they are used extensively throughout the book. In the hierarchy we consider, there are three levels:

(i) The observation, or measurement, level; $Y \mid Z, X_1, \theta_1$. Data, $Y$, are assumed to arise from an underlying process, $Z$, which is unobservable but from which measurements can be taken, possibly with error, at locations in space and time. Measurements may also be available for covariates, $X_1$. Here $\theta_1$ is the set of parameters for this model and may include, for example, regression coefficients and error variances.

(ii) The underlying process level; $Z \mid X_2, \theta_2$. The process $Z$ drives the measurements seen at the observation level and represents the true underlying level of the outcome. It may be, for example, a spatio–temporal process representing an environmental hazard. Measurements may also be available for covariates at this level, $X_2$. Here $\theta_2$ is the set of parameters for this level of the model.

In this book we advocate a Bayesian approach and so there will be an additional level at which distributions are assigned to all unknown quantities.

(iii) The parameter level; $\theta = (\theta_1, \theta_2)$. This contains models for all of the parameters in the observation and process levels and may control things such as the variability and strength of any spatio–temporal relationships.

Here the notation $Y \mid X$ means that the distribution of $Y$ is conditional on $X$.
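Read together, the three levels imply a factorisation of the joint posterior distribution of the unknowns. The expression below is a sketch of that factorisation under the assumption that the covariates $X_1$ and $X_2$ are treated as fixed and known; $p(\cdot)$ is used here as generic notation for a density and is not notation introduced in the text above.

```latex
\[
p(Z, \theta \mid Y, X_1, X_2) \;\propto\;
\underbrace{p(Y \mid Z, X_1, \theta_1)}_{\text{(i) observation level}}
\;
\underbrace{p(Z \mid X_2, \theta_2)}_{\text{(ii) process level}}
\;
\underbrace{p(\theta)}_{\text{(iii) parameter level}},
\qquad \theta = (\theta_1, \theta_2).
\]
```

Each factor corresponds to one level of the hierarchy, which is what allows the levels to be specified separately and then combined in a single analysis.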

This book involves models for both health counts and exposures and each of these can be framed in the context of a hierarchical model. To avoid ambiguity between the two, we use $Y^{(1)}, X^{(1)}, Z^{(1)}, \theta^{(1)}$ for health models and $Y^{(2)}, X^{(2)}, Z^{(2)}, \theta^{(2)}$ for exposure models.


When health or exposure models are considered separately the superscript in the $Y^{(\cdot)}$ notation is dropped for clarity of exposition. Also, it is noted that we do not generally consider cases where health counts from routinely available data sources may not be an accurate reflection of the underlying health of the population at risk, i.e. it is assumed that $Y^{(1)} = Z^{(1)}$. In practice this might not be an entirely accurate assumption due to misclassification, migration or data anomalies.

1.5.1 A hierarchical approach to modelling spatio–temporal data

We now describe an implementation of this approach for modelling spatial–temporal data. A spatial–temporal random field, $Z_{st}$, $s \in \mathcal{S}$, $t \in \mathcal{T}$, is a stochastic process over a region and time period. This underlying process is not directly measurable, but realisations of it can be obtained by taking measurements, possibly with error. Monitoring will only report results at $N_T$ discrete points in time, $T \subset \mathcal{T}$, where these points are labelled $T = \{t_1, \ldots, t_{N_T}\}$. The same will be true over space, since where air quality monitors can actually be placed may be restricted to a relatively small number of locations, for example on public land, leading to a discrete set of $N_S$ locations $S \subset \mathcal{S}$ with corresponding labelling, $S = \{s_1, \ldots, s_{N_S}\}$.

As described above, there are three levels to the hierarchy that we consider. The observed data, $Y_{st}$, $s = 1, \ldots, N_S$, $t = 1, \ldots, N_T$, at the first level of the model are considered conditionally independent given a realisation of the underlying process, $Z_{st}$. The second level describes the true underlying process as a combination of two terms: (i) an overall trend, $\mu_{st}$, and (ii) a random process, $\omega_{st}$. The trend, or mean term, $\mu_{st}$ represents broad scale changes over space and time which may be due to changes in covariates that will vary over space and time. The random process, $\omega_{st}$, has spatial–temporal structure in its covariance. In a Bayesian analysis, the third level of the model assigns prior distributions to the hyperparameters from the previous levels.

$$
\begin{aligned}
Y_{st} &= Z_{st} + v_{st} \\
Z_{st} &= \mu_{st} + \omega_{st}
\end{aligned}
\tag{1.1}
$$

where $v_{st}$ is an independent random, or measurement, error term, $\mu_{st}$ is a space–time mean field (trend) and $\omega_{st}$ is a spatial–temporal process.
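To make the roles of the terms in (1.1) concrete, the sketch below simulates data from the two equations. It is a deliberately simplified illustration: the trend is a hypothetical linear function of site index and time, and the random process is given first-order autoregressive structure in time only, independently at each site, so it omits the spatial dependence that a full spatial–temporal process would have. All function names, parameter names and values are assumptions made for this example.

```python
import random

def simulate_field(n_sites=3, n_times=100, rho=0.8,
                   sigma_omega=1.0, sigma_v=0.5, seed=1):
    """Simulate Y_st = Z_st + v_st with Z_st = mu_st + omega_st.

    omega_st follows a stationary AR(1) process over time at each site
    (temporal dependence only; spatial dependence is omitted here).
    Returns (Y, Z) as lists indexed [site][time].
    """
    rng = random.Random(seed)
    innovation_sd = sigma_omega * (1 - rho ** 2) ** 0.5
    Y, Z = [], []
    for s in range(n_sites):
        omega = rng.gauss(0.0, sigma_omega)      # stationary starting value
        y_row, z_row = [], []
        for t in range(n_times):
            if t > 0:
                omega = rho * omega + rng.gauss(0.0, innovation_sd)
            mu = 10.0 + 2.0 * s + 0.01 * t       # hypothetical trend mu_st
            z = mu + omega                       # underlying process Z_st
            y = z + rng.gauss(0.0, sigma_v)      # measurement with error v_st
            z_row.append(z)
            y_row.append(y)
        Y.append(y_row)
        Z.append(z_row)
    return Y, Z

Y, Z = simulate_field(n_sites=3, n_times=100)
print(Y[0][:3], Z[0][:3])
```

In a real analysis only the measurements would be observed; the underlying process, the trend and the variance parameters would all be unknowns to be inferred.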

Throughout the book, where time and space are considered separately the notation is simplified to reflect the single domain, e.g. $Y_s$ and $Y_t$ are used, reserving $Y_{st}$ for occasions where both space and time are under consideration.

1.5.2 Dealing with high-dimensional data

Due to both the size of the spatio–temporal components of the models that may now be considered and the number of predictions that may be required, it may be computationally impractical to perform Bayesian analysis using packages such as WinBUGS (Lunn, Thomas, Best, & Spiegelhalter, 2000) or bespoke MCMC in any straightforward fashion. This can be due to both the requirement to manipulate large matrices within each simulation of the MCMC and issues of convergence of parameters in complex models (Finley, Banerjee, & Carlin, 2007).

Figure 1.4: Triangulation for the locations of black smoke monitoring sites within the UK for use with the SPDE approach to modelling point-referenced spatial data with INLA. The mesh comprises 3799 edges and was constructed using triangles that have minimum angles of 26° and a maximum edge length of 100 km. The monitoring locations are highlighted in red.

Throughout the book, we show examples of recently developed techniques that perform ‘approximate’ Bayesian inference based on integrated nested Laplace approximations (INLA) and thus do not require full MCMC sampling to be performed (Rue, Martino, & Chopin, 2009). INLA has been developed as a computationally attractive alternative to MCMC. In a spatial setting such methods are naturally aligned for use with areal level data rather than the point level. However, recent developments allow a Gaussian field (GF) with a Matérn covariance function to be represented by a Gaussian Markov Random Field (GMRF) (Lindgren, Rue, & Lindström, 2011). This is available within the R-INLA package and an example of its use can be seen in Figure 1.4, which shows a triangulation of the locations of black smoke (a measure of particulate air pollution) monitoring sites in the UK. The triangulation is part of the computational process which allows Bayesian inference to be performed on large sets of point-referenced spatial data.


1.6 Spatial data

Three main types of spatial data are commonly encountered in environmental epidemiology. They are (i) lattice, (ii) point-referenced and (iii) point-process data.

(i) Lattices refer to situations in which the spatial domain consists of a discrete set of ‘lattice points’. These points may index the corners of cells in a regular or irregular grid. Alternatively, they may index geographical regions such as administrative units or health districts (see for example Figure 1.1). This is an important topic in spatio–temporal epidemiology and detailed discussions can be found in Gotway and Young (2002), Cressie and Wikle (2011) and Banerjee et al. (2015). We denote the set of all lattice points by $\mathcal{L}$ with data available at a set of $N_L$ points, $l \in L$ where $L = \{l_1, \ldots, l_{N_L}\}$. In many applications, such as disease mapping, $L$ is commonly equal to $\mathcal{L}$. A key feature of this class is its neighbourhood structure; a process that generates the data at a location has a distribution that can be characterised in terms of its neighbours.

(ii) Point-referenced data are measured at a fixed, and often sparse, set of ‘spatial points’ in a spatial domain or region. That domain may be continuous, $\mathcal{S}$, but in the applications considered in this book the domain will be treated as discrete both to reduce technical complexity and to reflect the practicalities of siting monitors of environmental processes. For example, when monitoring air pollution, the number of monitors may be limited by financial considerations and they may have to be sited on public land. Measurements are available at a selection of $N_S$ sites, $s \in S$, where $S = \{s_1, \dots, s_{N_S}\}$. Sites would usually be defined in terms of their geographical coordinates such as longitude and latitude, i.e. $s_l = (a_l, b_l)$.

(iii) Point-process data consist of a set of points, $S$, that are randomly chosen by a spatial point process (Diggle, 2013). These points could mark, for example, the incidence of a disease such as childhood leukaemia (Gatrell, Bailey, Diggle, & Rowlingson, 1996). Despite the importance of spatial point process modelling we do not cover this topic and its range of applications in this book. The reader is directed to Diggle (1993) and Diggle (2013) for further reading on this subject.
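The neighbourhood structure that characterises lattice data in (i) is often encoded as an adjacency matrix. The following is a minimal base-R sketch for a regular grid in which cells sharing an edge are treated as neighbours; the function name grid_adjacency and the choice of this ‘rook’ definition are assumptions made for illustration, and irregular regions such as health districts would instead need boundary information.

```r
# Adjacency matrix for an nr x nc regular grid of cells.
# Cells are numbered in column-major order; two cells are neighbours
# if they share an edge (the 'rook' definition).
grid_adjacency <- function(nr, nc) {
  n <- nr * nc
  row_id <- rep(seq_len(nr), times = nc)   # row index of each cell
  col_id <- rep(seq_len(nc), each = nr)    # column index of each cell
  # Neighbours differ by exactly one step in exactly one direction
  W <- outer(seq_len(n), seq_len(n), function(i, j) {
    abs(row_id[i] - row_id[j]) + abs(col_id[i] - col_id[j]) == 1
  })
  W * 1L                                   # logical -> 0/1 integer matrix
}

W <- grid_adjacency(3, 4)       # 12 lattice points
n_neighbours <- rowSums(W)      # number of neighbours of each point
```

Here W[i, j] equals 1 when lattice points i and j are neighbours and 0 otherwise, so corner cells have two neighbours and interior cells have four.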

Example 1.5. Visualising spatial data

In this example, we consider ways in which spatial data can be visualised. This is an important topic which encompasses aspects of model building, including the assessment of the validity of modelling assumptions, and the presentation of results. In this section we provide a brief introduction to the subject and show some examples of how spatial data can be visualised using a variety of R packages and particularly how R can be used to interact with Google maps in order to display spatial data.

We illustrate this by mapping measurements of lead concentrations in the Meuse River flood plain. The Meuse River is one of the largest in Europe and the subject of much study (Ashagrie, De Laat, De Wit, Tu, & Uhlenbrook, 2006). A comprehensive dataset was collected in its flood plain in 1990 and provides valuable information on the concentrations of a variety of elements in the river. The information is measured at 155 sampling sites within the flood plain.

The following questions will be of interest to health researchers and others:

• How much spatial variation is there in lead concentrations?
• Are there distinct spatial trends in the concentration levels?
• Are there enough sampling sites to appropriately characterise this random field?
• Can levels be mapped between the sampling sites, for example at a centre of human activity located between sites?

The following R code shows how the data on lead concentrations in the Meuse River flood plain can be displayed on a Google map of the area.

> library(sp)
> library(rgdal)
> data(meuse)

### Promote the data frame to a spatial object using its x and y columns
> meuse_sp <- meuse
> coordinates(meuse_sp) <- ~ x + y

### We first assign a reference system used in the Netherlands
> proj4string(meuse_sp) <- CRS("+init=epsg:28992")

### Then we convert it to another for Google mapping that
### requires the latitude-longitude scale
> meuse_ll <- spTransform(meuse_sp, CRS("+proj=longlat +datum=WGS84"))

### Finally we write the result that can be read by Google maps
> writeOGR(meuse_ll, "meuse.kml", "meuse", driver = "KML")

Figure 1.5 shows the result of opening the meuse.kml file in Google maps. It shows the sampling sites marked with map tacks. Google’s Street View then lets an observer see the map tacks. Clicking on one of the visible map tacks reveals the sample data record for that site within Street View. The ability to interact with Google maps is not restricted to just visualising data but can be used to overlay the results of statistical analyses, such as spatial predictions, onto maps. This subject is covered in detail in Bivand, Pebesma, and Gómez-Rubio (2008).

A high lead concentration of 91 milligrams per kilogram (mg/kg), or equivalently parts per million (ppm) as the conversion is one-to-one, can be observed at that site. Turning to look in the opposite direction, another tack can be seen that is associated with a measurement of about half of the first’s lead concentration level. There is great interest in the observed levels as lead in soil can get into the food chain and cause adverse health reactions, including retarded neurological development in children (Reagan & Silbergeld, 1990).
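The first two questions posed in this example, on the amount of spatial variation and the presence of spatial trends, can be given a rough numerical first look before any mapping. The following sketch is one possible way of doing this and is not part of the analysis described above: the function name explore_concentration is hypothetical, and the linear trend surface on the log scale is a simplifying assumption. It requires a data frame with coordinate columns x and y, as in the meuse data from the sp package.

```r
# Summarise a concentration variable and fit a first-order (linear)
# trend surface to its logarithm as a crude check for spatial trend.
explore_concentration <- function(d, value = "lead") {
  stopifnot(all(c("x", "y", value) %in% names(d)))
  d$log_value <- log(d[[value]])
  fit <- lm(log_value ~ x + y, data = d)
  list(
    quantiles = quantile(d[[value]], c(0, 0.25, 0.5, 0.75, 1)),
    trend     = coef(fit),               # intercept and slopes in x and y
    r_squared = summary(fit)$r.squared   # share of variation explained by trend
  )
}

# Apply to the Meuse data if the sp package is installed
if (requireNamespace("sp", quietly = TRUE)) {
  data(meuse, package = "sp")
  print(explore_concentration(meuse))
}
```

A wide spread in the quantiles points to substantial variation, while slopes that are clearly non-zero suggest a broad spatial trend. Neither summary addresses the adequacy of the sampling design or prediction between sites, which require a spatial model.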


(a) Sampling sites near Meuse River (b) Map tack opens to show sample

Figure 1.5: Google Earth and Google Street Map provide useful ways of visualising spatial data. Here we see (a) the location at which samples were taken in the Meuse River flood plain and (b) the information that was collected.

1.7 Good spatio–temporal modelling approaches

Often spatio–temporal models are purpose-built for a particular application and then presented as a theoretical model. It is then reasonable to ask what can be done with that model in settings other than those in which it was developed. More generally, can it be extended for use in other applications? There are a number of key elements which are common to good approaches to spatio–temporal modelling. The approaches should do the following:

• Incorporate all sources of uncertainty. This has led to the widespread use of Bayesian hierarchical modelling in theory and practice.

• Have an associated practical theory of data-based inference.

• Allow extensions to handling multivariate data. This is vital as it may be a mix of hazards that cause negative health impacts. Even in the case where a single hazard is of interest, the multivariate approach allows strength to be borrowed from the other hazards which are correlated with the one of concern (Sun, 1998).

• Be computationally feasible to implement. This is of increasing concern as we see increasingly large domains of interest. One might now reasonably expect to see a spatial domain with thousands of sites and thousands of time points.

• Come equipped with a design theory that enables measurements to be made optimally for estimating the process parameters or for predicting unmeasured process values. Good data are fundamental to good spatio–temporal modelling, yet this aspect is commonly ignored and can lead to biased estimates of exposures and thus risk (Shaddick & Zidek, 2014; Zidek, Shaddick, & Taylor, 2014).

• Produce well calibrated error bands. For example, a 95% band should contain predicted values 95% of the time, i.e. they have correct coverage probabilities. This is important not only in substantive terms, but also in model checking. There may be questions about the formulation of a model, for example of the precise nature of the spatio–temporal process that is assumed, but that may be of secondary importance if good empirical performance of the model can be demonstrated. This criterion can really challenge the perceived efficacy of a model; “all models are wrong, but some are useful”, as George Box famously remarked (Box & Draper, 1987).
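The calibration requirement in the final point above can be checked empirically by simulation. The following is a minimal base-R sketch, assuming a simple linear model with Gaussian errors rather than a spatio–temporal model; the function name coverage_check and the simulation settings are assumptions made for illustration.

```r
# Empirical coverage of nominal prediction intervals from a linear model.
# In each replicate: simulate data, fit the model, predict one new value
# and record whether the prediction interval contains it.
coverage_check <- function(n_rep = 2000, n = 50, level = 0.95) {
  hits <- logical(n_rep)
  for (r in seq_len(n_rep)) {
    x <- runif(n)
    y <- 1 + 2 * x + rnorm(n)
    fit <- lm(y ~ x)
    x_new <- runif(1)
    y_new <- 1 + 2 * x_new + rnorm(1)     # held-out value to be predicted
    band <- predict(fit, newdata = data.frame(x = x_new),
                    interval = "prediction", level = level)
    hits[r] <- y_new >= band[1, "lwr"] && y_new <= band[1, "upr"]
  }
  mean(hits)                              # proportion of intervals that cover
}

set.seed(42)
cov95 <- coverage_check()
```

If the model is well calibrated, cov95 should be close to 0.95 up to simulation error. The same idea carries over to more complex models by holding out observed sites or time points and comparing them with their predictive intervals.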

1.8 Summary

This book provides a comprehensive treatment of methods for spatio–temporal modelling and their use in epidemiological studies. From this book the reader will have gained an understanding of the following topics:

• The basic concepts of epidemiology and the estimation of risks associated with environmental hazards.

• Hierarchical modelling with a Bayesian framework.

• The theory of spatial, temporal and spatio–temporal processes needed for environmental health risk analysis.

• Fundamental questions related to the nature and role of uncertainty in environmental epidemiology and methods that may help answer those questions.

• Important areas of application within environmental epidemiology together with strategies for building the models that are needed and coping with challenges that arise.

• Methods and software for the analysis and visualisation of environmental and health data. Examples of R and WinBUGS code are given throughout the book and, together with data for the examples, the code is included in the online resources.

• A variety of exercises, both theoretical and practical, to assist in the development of the skills needed to perform spatio–temporal analyses.

• New frontiers and areas of current and future research.
