This article was downloaded by: [Queensland University of Technology] On: 31 October 2014, At: 17:58 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK Journal of Applied Statistics Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/cjas20 Developments in General and Syndromic Surveillance for Small Area Health Data Andrew Lawson a , Allan Clark a & Carmen Vidal Rodeiro a a Norman J. Arnold School of Public Health , University of South Carolina , Columbia, USA Published online: 02 Aug 2010. To cite this article: Andrew Lawson , Allan Clark & Carmen Vidal Rodeiro (2004) Developments in General and Syndromic Surveillance for Small Area Health Data, Journal of Applied Statistics, 31:8, 951-966, DOI: 10.1080/0266476042000270568 To link to this article: http://dx.doi.org/10.1080/0266476042000270568 PLEASE SCROLL DOWN FOR ARTICLE Taylor & Francis makes every effort to ensure the accuracy of all the information (the “Content”) contained in the publications on our platform. However, Taylor & Francis, our agents, and our licensors make no representations or warranties whatsoever as to the accuracy, completeness, or suitability for any purpose of the Content. Any opinions and views expressed in this publication are the opinions and views of the authors, and are not the views of or endorsed by Taylor & Francis. The accuracy of the Content should not be relied upon and should be independently verified with primary sources of information. Taylor and Francis shall not be liable for any losses, actions, claims, proceedings, demands, costs, expenses, damages, and other liabilities whatsoever or howsoever caused arising directly or indirectly in connection with, in relation to or arising out of the use of the Content. This article may be used for research, teaching, and private study purposes. 
Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden. Terms & Conditions of access and use can be found at http:// www.tandfonline.com/page/terms-and-conditions

Developments in General and Syndromic Surveillance for Small Area Health Data


Journal of Applied Statistics, Vol. 31, No. 8, 951–966, October 2004

ANDREW LAWSON, ALLAN CLARK AND CARMEN L. VIDAL RODEIRO
Norman J. Arnold School of Public Health, University of South Carolina, Columbia, USA

Abstract  In this paper we examine a range of issues related to the analysis of health surveillance data when it is spatially-referenced. The importance of considering alarm functions derived from likelihood or Bayesian models is stressed. In addition, we focus on some new developments in predictive distribution residuals in the analysis.

Key Words: Syndromic, surveillance, statistics, small area, health

Introduction

The development of methodology to detect unusual aggregations of disease has received considerable impetus following the terrorist attacks of September 11th, 2001. The associated anthrax scare alerted decision makers in the US to the potential for health-related bioterrorism. The threatened spread of highly infectious and potentially deadly agents within a population has become a possibility. Following this, much research and government funding has become focused on statistical issues related to detection of disease outbreaks and spread of disease. Related to this development, there is an increased interest in syndromic surveillance, where a range of indicators (symptoms) are recorded and the earliest possible detection of adverse health effects is the overall aim. An example of this approach is the use of symptomatic indicators, such as pharmaceutical sales and job and/or school absenteeism, to make early decisions about an outbreak of, say, respiratory disease (an inhalation alarm). At its simplest, this involves the coupled surveillance of at least 3–4 time series of data. However, the surveillance task is much broader than this simple example suggests: the target disease may be unknown and so a range of diseases might need to be monitored simultaneously. In addition, a decomposition of the population into age × sex groups may also need to be monitored for each disease. Monitoring of nine age and two gender groups would require 18 time series to be examined for each disease. In addition, spatial disease distribution could be an important indicator of outbreaks or subsequent spread. Hence, for a single disease, we may have 18 time series, but if we have also monitored say T time periods, then there could be T × 18 separate maps of disease to monitor simultaneously with the time series. Hence, we are essentially within a very large database looking for ‘aberrations’ in health data. A further complication is that the aberrations we seek to find are often not simple features of the disease incidence but could depend on what has already been seen and on the purpose of the surveillance task.

Implicit within the above discussion is the notion that surveillance is an evolving task, and is carried out in what is regarded as real or near-real time. The sequential nature of this task means that the methods designed for fully historical data sets (e.g. retrospective studies), where all the time periods are observed, cannot be used directly or may be inappropriate (see for example Hand, 1999). In essence, syndromic surveillance is a special case of data mining and many of the issues arising in that literature are relevant to this case. A useful introduction to work on data mining for health surveillance is Wong et al. (2002). Previous short reviews of spatial statistical issues in this area are given in Lawson (2001a; 2001b, ch. 9).

In this paper we focus mainly on work on Bayesian methods for health surveillance, on the spatial dimension of such work and on the issues that must be resolved. The paper is split into three sections. First, the use and development of likelihood and Bayesian alarm functions is discussed. This is followed by a discussion of the use of residuals in the detection of anomalies in health maps. Third, we emphasize that, for any complex Bayesian model, computational speed-ups must be sought to make implementation realistic in a surveillance context. The use of particle filtration, moving windows, and likelihood or posterior approximations are all relevant in this context.

Correspondence Address: Andrew Lawson, Norman J. Arnold School of Public Health, University of South Carolina, Columbia, South Carolina, USA. Email: [email protected]

0266-4763 Print/1360-0532 Online/04/080951-16 © 2004 Taylor & Francis Ltd. DOI: 10.1080/0266476042000310568

Alarm Function Approach

An alarm function is a function, say $p(Y_s)$, of the history of a process $Y_s$ such that it triggers an alarm at time $t_A$, where:

$$t_A = \min\{s : p(Y_s) > g(s)\}$$

where $s$ is a time point and $g(s)$ is a control limit. A large number of different alarm functions are possible; for example, the Shewhart method can be defined by having alarm function $p(Y_s) = L(s, s)$, where $L(s, t)$ is the conditional likelihood ratio at time $s$ for a change at time $t$. Sonesson & Bock (2003) show that some of the traditional change-point detection methods can be represented as alarm functions. Different alarm functions are ‘optimal’ depending on the definition of optimal. One useful definition is provided by De-Mare (1980), whose approach is optimal in the sense that no other function will give a higher detection level conditional on the probability of incorrectly setting off the alarm.

The approach of De-Mare (1980), extended by Frisen & De-Mare (1991), consists of defining the alarm function at time $s$, for a change at time $t$, as the likelihood ratio:


$$p_1(Y_s) = \prod_{i=t}^{s} \frac{f_1(y_i)}{f_0(y_i)}$$

and for a change at any time before time $s$ as the summation:

$$p_2(Y_s) = \sum_{j=1}^{s} \frac{\pi_j}{\sum_{k=1}^{s} \pi_k} \prod_{i=j}^{s} \frac{f_1(y_i)}{f_0(y_i)}$$

where $f_0(y)$ is the distribution of the data under the null hypothesis, $f_1(y)$ is the distribution of the data under the alternative hypothesis, and $\pi_j$ is the (prior) probability that the change-point is at time $j$. Notice that $p_2(Y_s)$ is simply a weighted sum of $p_1(Y_1), \ldots, p_1(Y_s)$.

The control limits for these processes are $g(s) = K$ for $p_1(Y_s)$ and:

$$g(s) = \frac{\sum_{j=s+1}^{\infty} \pi_j}{\sum_{j=1}^{s} \pi_j} \cdot \frac{K}{1-K}$$

for $p_2(Y_s)$, for a suitable choice of $K$. The value of $K$ can be chosen in a number of different ways and is discussed further below.
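The two alarm functions and the control limit for $p_2$ can be sketched in a few lines of code. The following is a minimal illustration only, not the authors' implementation: it assumes Poisson counts with a known mean under each hypothesis, a geometric change-time prior truncated at 50 periods, and $K = 0.5$. All function names and numerical values are hypothetical.

```python
import math


def poisson_pmf(y, mean):
    """Poisson probability of count y, evaluated on the log scale."""
    return math.exp(y * math.log(mean) - mean - math.lgamma(y + 1))


def p1(ys, t, f0, f1):
    """Likelihood-ratio alarm function for a change at time t (1-based)."""
    ratio = 1.0
    for y in ys[t - 1:]:
        ratio *= f1(y) / f0(y)
    return ratio


def p2(ys, prior, f0, f1):
    """Prior-weighted sum of p1 over the candidate change times 1..s."""
    s = len(ys)
    head = sum(prior[:s])
    return sum(prior[j - 1] / head * p1(ys, j, f0, f1) for j in range(1, s + 1))


def control_limit(s, prior, K):
    """g(s) for p2; assumes the change-time prior sums to one over all times."""
    head = sum(prior[:s])
    return (1.0 - head) / head * K / (1.0 - K)


def first_alarm(ys, prior, f0, f1, K):
    """Smallest s with p2(Y_s) > g(s), or None if no alarm is raised."""
    for s in range(1, len(ys) + 1):
        if p2(ys[:s], prior, f0, f1) > control_limit(s, prior, K):
            return s
    return None


# Hypothetical setting: mean count 5 in control and 10 after the change.
f0 = lambda y: poisson_pmf(y, 5.0)
f1 = lambda y: poisson_pmf(y, 10.0)
prior = [0.1 * 0.9 ** (j - 1) for j in range(1, 51)]  # geometric, p = 0.1
counts = [4, 6, 5, 5, 11, 12, 13]
alarm = first_alarm(counts, prior, f0, f1, K=0.5)
print(alarm)
```

In this toy series the counts rise sharply from the fifth period, which is where the weighted likelihood ratio first exceeds the control limit.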

Bayesian Alarm Functions

In a Bayesian framework we need to adjust the alarm functions to take into account the prior distributions that are an integral part of the model. For simplicity we assume the same distribution of the data under both the null and alternative hypotheses, differing only in the value of a parameter, say $\theta_0$ and $\theta_1$. The prior distributions under the two alternatives are denoted by $\pi_0(\theta)$ and $\pi_1(\theta)$.

The approach for Bayesian methods is based on the Bayes factor (Kass & Raftery, 1995) which, if we have two simple hypotheses, is given by:

$$BF = \frac{f(y_j \mid \theta_1)}{f(y_j \mid \theta_0)} \cdot \frac{\pi_1(\theta_1)}{\pi_0(\theta_0)}$$

where we are taking $\pi_1(\theta_1)$ to be the prior probability that the alternative hypothesis is correct and $\pi_0(\theta_0)$ to be the prior probability of the null hypothesis being correct. Hence the alarm function takes the form:

$$p_1(Y_s) = \prod_{j=t}^{s} \frac{f(y_j \mid \theta_1)\,\pi_1(\theta_1)}{f(y_j \mid \theta_0)\,\pi_0(\theta_0)}$$

which is just the product of posterior ratios. If we have composite hypotheses we replace the posterior distributions by the predictive distributions:

$$\pi_0(y_j) = \int_{\Theta_0} f(y_j \mid \theta)\,\pi_0(\theta)\,d\theta$$


$$\pi_1(y_j) = \int_{\Theta_1} f(y_j \mid \theta)\,\pi_1(\theta)\,d\theta$$

It is possible to replace the prior distributions in the predictive distribution by the posterior distributions of the parameters given the data up to time $j-1$, i.e.:

$$\pi_0(y_j) = \int_{\Theta_0} f(y_j \mid \theta)\,\pi_0(\theta \mid y_1, \ldots, y_{j-1})\,d\theta$$

$$\pi_1(y_j) = \int_{\Theta_1} f(y_j \mid \theta)\,\pi_1(\theta \mid y_1, \ldots, y_{j-1})\,d\theta$$

An objection to the use of the posterior predictive distribution is that we would be using the data from the previous points twice, once in the summation of the alarm and once in the predictive distribution. However, despite this objection, the use of the posterior predictive distribution does have its advantages; in particular, it reflects how the model is performing over the whole time frame and not just the current time frame.

Computation of Bayesian Alarm Functions

In this section we assume we wish to compare composite hypotheses, or at least one of the hypotheses is composite. For most models it is impossible to compute predictive distributions directly. We are then left to use either asymptotic approximations (Tierney & Kadane, 1986), simulation, or a combination of both (Diciccio et al., 1997).

If we are using the prior predictive distributions then we can use sampling to estimate the integral:

$$\pi_1(y) = \int_{\Theta_1} f(y \mid \theta)\,\pi_1(\theta)\,d\theta$$

in the following manner:

(1) Draw $\theta^{(1)}, \ldots, \theta^{(m)}$ from $\pi_1(\theta)$

(2) Approximate $\pi_1(y)$ by:

$$\pi_1(y) \approx \frac{1}{m}\sum_{i=1}^{m} f(y \mid \theta^{(i)})$$

Notice that we only need to do the sampling once at the start of the surveillance since the predictive distribution does not depend on time.

If we are using the posterior predictive distribution the computation is increased greatly since, at each time $j$, we need to do the following:


(1) Draw $\theta^{(1)}, \ldots, \theta^{(m)}$ from $\pi_1(\theta \mid y_1, \ldots, y_{j-1})$

(2) Approximate $\pi_1(y_j)$ by:

$$\pi_1(y_j) \approx \frac{1}{m}\sum_{i=1}^{m} f(y_j \mid \theta^{(i)})$$

Of course, this implies that we can sample from the posterior distribution, which may require Markov chain Monte Carlo (MCMC) at each iteration. This can be improved by approximating the posterior distribution by a (multivariate) normal distribution with the mean and variance computed via a Laplace approximation, or via sequential importance sampling (see Liu & Chen, 1998).
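The two-step sampling scheme for the prior predictive distribution can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: it assumes a Poisson likelihood with mean $E\theta$ and a gamma prior on $\theta$ parameterised by shape and scale, chosen because the Poisson-gamma mixture has a known negative binomial closed form against which the Monte Carlo estimate can be checked. The function names, sample size and numerical values are hypothetical.

```python
import math
import random


def poisson_pmf(y, mean):
    """Poisson probability of count y, evaluated on the log scale."""
    return math.exp(y * math.log(mean) - mean - math.lgamma(y + 1))


def mc_prior_predictive(y, expected, shape, scale, m=20000, seed=1):
    """Monte Carlo estimate of the prior predictive probability of y."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(m):
        theta = rng.gammavariate(shape, scale)      # step (1): draw from the prior
        total += poisson_pmf(y, expected * theta)   # step (2): average the likelihood
    return total / m


def exact_prior_predictive(y, expected, shape, scale):
    """Closed-form negative binomial probability for the same mixture."""
    q = expected / (expected + 1.0 / scale)
    log_p = (math.lgamma(y + shape) - math.lgamma(y + 1) - math.lgamma(shape)
             + y * math.log(q) + shape * math.log(1.0 - q))
    return math.exp(log_p)


# Hypothetical values: expected count 5, gamma prior with shape 2 and scale 0.5.
estimate = mc_prior_predictive(7, 5.0, 2.0, 0.5)
exact = exact_prior_predictive(7, 5.0, 2.0, 0.5)
print(estimate, exact)
```

For the posterior predictive version the only change is in step (1), where the draws would come from the posterior given the data up to time $j-1$ instead of from the prior.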

Choice of Control Limits

The choice of the control limits is perhaps the hardest problem in surveillance and no choice of control limits will give optimal properties for all possible criteria. Different criteria are reviewed by Sonesson & Bock (2003), although almost all of them have to be evaluated via simulation.

An alternative approach to the setting of the control limits is to view the alarm as the posterior probability that the change-point occurred before time $s$, i.e.:

$$t_A = \min\{s : \Pr\{\tau \le s \mid Y_s = y_s\} > K\}$$

for the alternative of a change any time before $t$. This leads to the same alarm function and hence, by choosing this probability, we define the control limits. A large value of $K$ will result in a high probability of setting the alarm off correctly; however, it would lead to a large delay in setting off an alarm. Thus we need a trade-off between the delay and the chance of incorrectly setting off an alarm. Recall that no other method, with the same probability of setting the alarm off correctly, has a shorter expected delay time.
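The link between the posterior probability rule and the control limit for $p_2$ can be made explicit with a short derivation. It is a sketch that uses only the change-time prior $\pi_j$, the likelihood ratio $L(j,s) = \prod_{i=j}^{s} f_1(y_i)/f_0(y_i)$ and the definition of $p_2$ given earlier, and it writes $\tau$ for the change-point. Dividing the joint probability of the data and each change time by the likelihood under no change gives the posterior odds:

$$\frac{\Pr(\tau \le s \mid y_s)}{\Pr(\tau > s \mid y_s)} = \frac{\sum_{j=1}^{s} \pi_j L(j,s)}{\sum_{j=s+1}^{\infty} \pi_j} = \frac{\sum_{j=1}^{s} \pi_j}{\sum_{j=s+1}^{\infty} \pi_j}\; p_2(Y_s)$$

Since $\Pr(\tau \le s \mid y_s) > K$ is equivalent to the posterior odds exceeding $K/(1-K)$, it follows that:

$$\Pr(\tau \le s \mid y_s) > K \iff p_2(Y_s) > \frac{\sum_{j=s+1}^{\infty} \pi_j}{\sum_{j=1}^{s} \pi_j} \cdot \frac{K}{1-K}$$

so a threshold $K$ on the posterior probability corresponds to a time-varying control limit on the weighted likelihood ratio.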

Example: Disease Time Series

We take a simple example of observing counts of disease sequentially over time, say $y_1, y_2, \ldots, y_t, \ldots$, when the disease count is assumed to be Poisson distributed with expected rate $E_j$ and relative risk $\theta_j$:

$$Y_j \mid \theta_j \sim \text{Poisson}(E_j \theta_j)$$

The null hypothesis states that:

$$H_0 : \theta_1 = \ldots = \theta_j = 1$$

i.e. a point prior on one. The alternative is that the risks are a realization from a gamma distribution, i.e.:

$$H_1 : \theta_k \sim G(a, b)$$


We also need to specify the prior distribution for the change-time: we assume a geometric distribution with probability parameter $p$. With the simple null hypothesis it is possible to represent the (prior) predictive distribution by:

$$\pi_0(y_j) = \frac{\exp(-E_j)\,E_j^{y_j}}{y_j!}$$

The alternative hypothesis has the following (prior) predictive distribution:

$$\pi_1(y_j) = \frac{\Gamma(y_j + a)}{\Gamma(y_j + 1)\,\Gamma(a)} \left(\frac{E_j}{E_j + 1/b}\right)^{y_j} \left(\frac{1/b}{E_j + 1/b}\right)^{a}$$

which is of negative binomial form. Hence the (prior) predictive rule for a change any time before now is given by the stopping rule:

$$t_A = \min\left\{ s : \frac{\sum_{j=1}^{s} p(1-p)^{j} L(j, s)}{1 - (1-p)^{s}} > 0.5 \right\}$$

where:

$$L(j, s) = \prod_{k=j}^{s} \frac{\pi_1(y_k)}{\pi_0(y_k)} = \prod_{k=j}^{s} \frac{\Gamma(y_k + a)}{\exp(-E_k)\,\Gamma(a)} \left(\frac{1}{E_k + 1/b}\right)^{y_k} \left(\frac{1/b}{E_k + 1/b}\right)^{a}$$

This sets off an alarm as soon as we have enough evidence to say that (a posteriori) the chance of a change before now exceeds 50%. If we wish to use the posterior predictive rule then the posterior distribution of $\theta$ is:

$$\theta \mid y_1, \ldots, y_j \sim G\left(a + \sum_{k=1}^{j} y_k,\ \frac{1}{\sum_{k=1}^{j} E_k + 1/b}\right)$$

thus the posterior predictive rule is the same as the prior predictive rule but for a change of $a$ and $b$.
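The stopping rule for this Poisson-gamma example can be sketched directly from the displayed formulas. The code below is an illustration only: it evaluates the statistic exactly as written above (geometric weights $p(1-p)^j$ and threshold 0.5), works with log ratios for numerical stability, and treats $b$ as a scale parameter, which is the reading consistent with the negative binomial form. The function names and all numerical values are hypothetical.

```python
import math


def log_ratio(y, expected, a, b):
    """log of pi_1(y) / pi_0(y): negative binomial over Poisson, for one count."""
    log_pi1 = (math.lgamma(y + a) - math.lgamma(y + 1) - math.lgamma(a)
               + y * math.log(expected / (expected + 1.0 / b))
               + a * math.log((1.0 / b) / (expected + 1.0 / b)))
    log_pi0 = y * math.log(expected) - expected - math.lgamma(y + 1)
    return log_pi1 - log_pi0


def alarm_statistic(ys, expected, a, b, p):
    """Weighted sum of L(j, s) over j = 1..s, divided by 1 - (1 - p)^s."""
    s = len(ys)
    total = 0.0
    log_L = 0.0
    # Build L(j, s) backwards, since L(j, s) = ratio_j * L(j + 1, s).
    for j in range(s, 0, -1):
        log_L += log_ratio(ys[j - 1], expected[j - 1], a, b)
        total += p * (1.0 - p) ** j * math.exp(log_L)
    return total / (1.0 - (1.0 - p) ** s)


def alarm_time(ys, expected, a, b, p, threshold=0.5):
    """First s at which the statistic exceeds the threshold, or None."""
    for s in range(1, len(ys) + 1):
        if alarm_statistic(ys[:s], expected[:s], a, b, p) > threshold:
            return s
    return None


def posterior_params(a, b, ys, expected):
    """Gamma parameters of theta given the counts observed so far."""
    return a + sum(ys), 1.0 / (sum(expected) + 1.0 / b)


# Hypothetical data: expected count 5 per period, prior G(4, 0.5), p = 0.1.
counts = [4, 6, 5, 5, 11, 12]
expected = [5.0] * len(counts)
alarm = alarm_time(counts, expected, a=4.0, b=0.5, p=0.1)
print(alarm)
```

To obtain the posterior predictive rule in this sketch, `posterior_params` would supply the updated pair $(a, b)$ passed to `log_ratio` at each time point.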

Syndromic Surveillance

The above results can be extended to the case of syndromic (multivariable) surveillance. We assume that syndromic variables are also available: $x_{it}$ is one such variable and $x_t$ is the vector of syndromic variables.

Define the complete data and ancillary (syndromic) vector as:

$$D_t = \begin{pmatrix} y_t \\ x_{1t} \\ x_{2t} \\ x_{3t} \\ \vdots \end{pmatrix} = \begin{pmatrix} y_t \\ x_t \end{pmatrix}$$


We also define $x_T = (x_1, \ldots, x_t)$, $y_T = (y_1, \ldots, y_t)$ and $D_T = (D_1, \ldots, D_t)$. Interest is still in detecting a change in the distribution of the disease count $y_t$; however, it is hoped that by including other (correlated) variables, e.g. another disease, drug sales, hospital admissions, etc., we can decrease the length of time until an alarm is set off and increase the power to detect a given change.

Posterior Definition: Conditioning on $x_T$

If the behaviour of $y_t$ is considered conditional on the syndromic variables then a sequential posterior distribution can be identified for this case as:

$$\pi(\theta \mid y_T, x_T) \propto f(y_t \mid \theta, x_T)\,\pi(\theta \mid y_{T-1}, x_{T-1})$$

where $\pi(\theta \mid y_{T-1}, x_{T-1})$ is a posterior distribution up to and including time $T-1$. The equivalent (posterior) predictive distribution is given by:

$$\pi(y_t \mid y_{T-1}) = \int f(y_t \mid \theta, x_T)\,\pi(\theta \mid y_{T-1}, x_{T-1})\,d\theta$$

This form of posterior may be appropriate when we are simply interested in ancillary variables that are precursors of the disease onset. However, when syndromes are defined by ensembles of health symptoms then an unconditional posterior distribution would be appropriate.

Within an MCMC sampler the predictive distribution can be approximated via:

$$\pi(y_t \mid y_{T-1}) \approx \frac{1}{G}\sum_{g=1}^{G} f(y_t \mid \theta^{(g)}_{T-1}, x_T)$$

where $\theta^{(g)}_{T-1}$ is the sampled parameter vector for the $g$th iteration from the posterior at time $T-1$.
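The MCMC average above amounts to evaluating the likelihood of the new count at each stored posterior draw and taking the mean. A minimal sketch follows, assuming a hypothetical log-linear Poisson model in which the mean is $E \exp(\beta_0 + \beta_1 x)$ for a single current syndromic covariate $x$. The model, the function names and the draws shown are illustrative assumptions, not part of the original analysis.

```python
import math


def poisson_pmf(y, mean):
    """Poisson probability of count y, evaluated on the log scale."""
    return math.exp(y * math.log(mean) - mean - math.lgamma(y + 1))


def conditional_predictive(y_t, x_t, expected, draws):
    """Average f(y_t | theta, x) over posterior draws theta = (b0, b1) from time T-1."""
    total = 0.0
    for b0, b1 in draws:
        mean = expected * math.exp(b0 + b1 * x_t)  # assumed log-linear mean
        total += poisson_pmf(y_t, mean)
    return total / len(draws)


# Hypothetical posterior draws, as if retained from an MCMC run at time T-1.
draws = [(0.05, 0.30), (-0.02, 0.25), (0.00, 0.35), (0.03, 0.28)]
print(conditional_predictive(y_t=9, x_t=1.2, expected=5.0, draws=draws))
```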

Posterior Definition: Unconditional Version

When syndromic variables are themselves diseases or symptoms of diseases then it may be appropriate to consider a full unconditional model for $D_t$. This would allow for the modelling of between-variable dependencies. In that case, $D_t$ is the vector of count data and syndromic variables at time $t$. The posterior distribution given the evolution up to and including time $t$ is:

$$\pi(\theta \mid D_T) \propto f(D_t \mid \theta)\,\pi(\theta \mid D_{T-1})$$

where $f(D_t \mid \theta)$ is the new data likelihood, which could include correlations between elements (which could be maps or time series). The associated predictive distribution is given by:

$$\pi(D_t \mid D_{T-1}) = \int f(D_t \mid \theta)\,\pi(\theta \mid D_{T-1})\,d\theta$$

where $\pi(\theta \mid D_{T-1}) \propto f(D_{t-1} \mid \theta)\,\pi(\theta \mid D_{T-2})$.


Syndromic Vector Monitoring

Adopting the notation of the previous section, to monitor for changes in the density of $D_T$ we can use the alarm function given by:

$$p_2(D_s) = \sum_{k=1}^{s} \frac{\pi_k}{\sum_{j=1}^{s} \pi_j} \prod_{u=k}^{s} \frac{f(D_u \mid \theta_1)\,\pi(\theta_1 \mid D_u)}{f(D_u \mid \theta_0)\,\pi(\theta_0 \mid D_u)}$$

This alarm function could be extended to include dependence on previously observed data. Note that the definition of $f(D_u \mid \theta)$ may include large sparse models for events and there are special issues relating to prior distributions for regression parameters in these models that should be considered (see Genkin et al., 2004).

Residuals in the Surveillance of Disease Maps

Besides the use of alarm functions based on posterior distributions, it is possible to use sequential Bayesian model fitting with associated variance monitoring to assess changes in the state of the system (Lawson, 2004). We do not pursue this approach here. Instead, we consider the use of model-based residuals as determinants of the state of a system. Recursive residuals from an estimated regression model have simple properties. They can easily be used as a general surveillance technique to detect possible changes in the data or to determine whether the observed data for a new time period are representative of the data we might expect under a model when it is fitted for previous periods.

Our objective is to use the residuals of the fitted model in order to monitor changes in the risk patterns of a disease in sequential time periods (days, months, years, decades, etc.). In this section, we describe a new set of residuals, called surveillance residuals, that will be used in addition to the usual Bayesian residuals (Carlin & Louis, 2000). We also present a small simulation study that will show the behaviour of these measures in a surveillance context.

In the following we assume that we will monitor a map of $m$ small areas over $T$ time periods. The count of disease in area $i$, $i = 1, \ldots, m$, and temporal period $j$, $j = 1, \ldots, T$, is denoted as $y_{ij}$; $E_{ij}$ and $\theta_{ij}$ denote, respectively, the expected number of cases and the relative risk in the $i$th region in time period $j$.

Surveillance residuals are computed as the difference between the observed data for a new time period and the data we expect under a model when it is fitted using previous time periods. Simulated values of the posterior predictive distribution are used to estimate the expected value under the assumed model. Surveillance residuals can be obtained as part of an MCMC output as:

$$r^{S}_{ij} = y_{ij} - \frac{1}{G}\sum_{g=1}^{G} E\left(y_{ij} \mid \theta^{(g)}_{i,j-1}\right)$$

where $E(y_{ij} \mid \theta^{(g)}_{i,j-1})$ is the expected value of the data given the relative risks, and $\{\theta^{(g)}_{i,j-1}\}$ is a set of relative risks sampled from the posterior distribution (only post-convergence samples that are separated by a distance large enough to assume independence are considered). In the case of count data, surveillance residuals can be computed using:

$$r^{S}_{ij} = y_{ij} - \frac{1}{G}\sum_{g=1}^{G} E_{ij}\,\theta^{(g)}_{i,j-1}, \qquad i = 1, \ldots, m;\ j = 2, \ldots, T$$

To assess the distribution of these residuals, the equivalent of a parametric bootstrap (Efron & Tibshirani, 1993) can be applied in the Bayesian setting. We generate a set of simulated counts $\{y^{*}_{ij}\}$ from a Poisson distribution:

$$y^{*}_{ij} \sim \text{Poisson}(E_{ij}\,\hat{\theta}_{ij})$$

where $\hat{\theta}_{ij}$ is the posterior mean of the relative risk. In this way, a ranking and hence other posterior quantities such as Bayesian p-values can be computed by assessing the rank of the surveillance residual within the pooled set (Lawson, 2001c). Extremely small p-values indicate that the new data are not representative of the data we might expect under the model, so a change in the relative risk pattern may have taken place. Moderate p-values may raise the question of whether a change is taking place.
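The residual and its bootstrap assessment can be sketched for a single area and a single new time period. The code below is a minimal illustration under stated assumptions: the posterior samples of the relative risk are taken as given, the Poisson sampler is a simple textbook method suitable only for small means, and the p-value is read as the upper-tail rank of the observed residual within the pooled set of observed and simulated residuals. That one-sided reading, the function names and the numerical values are assumptions, not details confirmed by the text.

```python
import math
import random


def poisson_draw(mean, rng):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def surveillance_residual(y_new, expected, theta_samples):
    """Observed count minus the average of E * theta over posterior samples."""
    fitted = sum(expected * th for th in theta_samples) / len(theta_samples)
    return y_new - fitted


def bootstrap_p_value(y_new, expected, theta_samples, n_boot=2000, seed=1):
    """Upper-tail rank of the observed residual among simulated residuals."""
    rng = random.Random(seed)
    theta_hat = sum(theta_samples) / len(theta_samples)  # posterior mean
    observed = surveillance_residual(y_new, expected, theta_samples)
    exceed = 0
    for _ in range(n_boot):
        y_star = poisson_draw(expected * theta_hat, rng)
        if surveillance_residual(y_star, expected, theta_samples) >= observed:
            exceed += 1
    return (exceed + 1) / (n_boot + 1)


# Hypothetical posterior samples from the fit to the previous periods.
theta_samples = [0.9, 1.0, 1.1]
print(surveillance_residual(15, 5.0, theta_samples))
print(bootstrap_p_value(15, 5.0, theta_samples))
```

A count well above what the previous fit predicts yields a small p-value, while a count near the fitted mean yields a moderate one.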

Simulation Study

Data. The US state of South Carolina, which consists of $m = 46$ counties, was selected to simulate the relative risks at county level. A set of fixed expected counts for the mapped area is employed: we use the expected number of deaths from malignant neoplasms for a period of $T = 20$ years obtained from the South Carolina Department of Health and Environmental Control.

The model considered for the true relative risks is:

$$\theta_{ij} = \exp\{v_i\}, \qquad i = 1, \ldots, m;\ j = 1, \ldots, T$$

where $(v_1, \ldots, v_m)$ is assumed to be a realization of a multivariate normal distribution with mean zero and precision matrix $\kappa I$. We assign to the parameter $\kappa$ a proper prior, say $\kappa \sim \text{gamma}(1, 0.01)$, to avoid problems of impropriety (Knorr-Held, 2000).

Once the values for the relative risks $\{\theta_{ij}\}$ are specified, we simulate sets of observed counts in the mapped area using the Poisson model:

$$y_{ij} \sim \text{Poisson}(E_{ij}\theta_{ij}), \qquad i = 1, \ldots, m;\ j = 1, \ldots, T$$

Two different scenarios, which represent two possible changes in the relative risk patterns over space and time, are considered.

Scenario 1. A global jump in relative risk occurs in a year ($j = 10$) and then disappears. A jump of 50% can be defined as follows:

$$\theta'_{ij} = 1.5\,\theta_{ij}, \qquad i = 1, \ldots, 46;\ j = 10$$


Figure 1. Scenario 2. Regions where changes in risk are considered

Scenario 2. Local jumps in risk of different intensities are generated for different regions in years 10 and 12. We consider the local jumps defined by:

$$\theta'_{ij} = 1.2\,\theta_{ij}, \qquad i \in R_1;\ j = 10$$

$$\theta'_{ij} = 1.6\,\theta_{ij}, \qquad i \in R_3;\ j = 10$$

$$\theta'_{ij} = 1.5\,\theta_{ij}, \qquad i \in R_2;\ j = 12$$

where $R_1$, $R_2$ and $R_3$ each represent a contiguous group of regions (see Figure 1).

We simulated 100 data sets for each of the two relative risk patterns. The results are averaged over these 100 realizations.
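The data-generating step for the two scenarios can be sketched as follows. This is an illustration under explicit assumptions, not a reproduction of the study: the real expected counts are replaced by a hypothetical constant of 20 per county and year, the precision is fixed at a hypothetical value of 10, and the county index sets standing in for $R_1$, $R_2$ and $R_3$ are invented, since the actual regions are only shown in Figure 1.

```python
import math
import random


def poisson_draw(mean, rng):
    """Knuth's multiplication method; adequate for the moderate means used here."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_map(expected, jumps, precision, seed=1):
    """Simulate risks exp(v_i) with jumps applied, then Poisson counts.

    expected[i][j - 1] is the expected count for area i in year j, and
    jumps maps (area index, year) to a multiplicative factor.
    """
    rng = random.Random(seed)
    m, T = len(expected), len(expected[0])
    sd = 1.0 / math.sqrt(precision)
    base = [math.exp(rng.gauss(0.0, sd)) for _ in range(m)]
    risks = [[base[i] * jumps.get((i, j), 1.0) for j in range(1, T + 1)]
             for i in range(m)]
    counts = [[poisson_draw(expected[i][j] * risks[i][j], rng) for j in range(T)]
              for i in range(m)]
    return base, risks, counts


m, T = 46, 20
expected = [[20.0] * T for _ in range(m)]  # hypothetical expected counts

# Scenario 1: global 50% jump in year 10.
scenario1 = {(i, 10): 1.5 for i in range(m)}

# Scenario 2: local jumps; these index sets are placeholders for R1, R2, R3.
R1, R2, R3 = [0, 1, 2], [10, 11, 12], [30, 31]
scenario2 = {}
scenario2.update({(i, 10): 1.2 for i in R1})
scenario2.update({(i, 10): 1.6 for i in R3})
scenario2.update({(i, 12): 1.5 for i in R2})

base1, risks1, counts1 = simulate_map(expected, scenario1, precision=10.0, seed=3)
base2, risks2, counts2 = simulate_map(expected, scenario2, precision=10.0, seed=3)
print(counts1[0][8:11], counts2[10][10:13])
```

Repeating the call with different seeds gives the replicate data sets over which results would be averaged.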

Fitted Model. In space–time disease surveillance, it is important that the models are able to describe the disease in space and time and also that they are sensitive to changes in the spatio-temporal structure. To this end, the models should capture spatial, temporal and spatio-temporal interaction effects and their parameterization should allow for natural changes in time. Here we use a common statistical model for disease data in which the observed count in a small area, $y_{ij}$, is assumed to be a Poisson random variable with mean $E_{ij}\theta_{ij}$. The logarithms of the relative risk parameters are assumed to follow normal distributions with a mean that may incorporate potential risk factors, and a covariance matrix that incorporates the possibility of spatial dependence. The model is extended to accommodate observations over time:

$$\log \theta_{ij} = v_i + u_i + t_j + \gamma_{ij}$$


where $v_i$ is an uncorrelated spatial random effect, $u_i$ is a spatially correlated random effect, $t_j$ is a temporal random effect and $\gamma_{ij}$ represents the interaction between spatial and temporal effects.

The uncorrelated heterogeneity effect follows a normal distribution with mean 0 and variance $\sigma_v^2$. For the spatially correlated random effect, the conditional autoregressive (CAR) model proposed by Besag et al. (1991) is used. Finally, for the temporal and spatio-temporal random effects, random walks that will allow for a smooth variation in time (Knorr-Held, 2000) are considered.

The space–time model described here is sequentially fitted as data for a new year arrive in the two scenarios proposed in the previous sub-section. The posterior sampling was carried out using Metropolis-Hastings steps. These steps are straightforward to implement for a retrospective model. However, here we must perform a sequential fit and so considerable computation can be incurred: the sequential re-fitting of the model presents a problem as, at each new year, a new set of data is added and a new set of parameters included. An alternative approach to this problem adopted in this work is the use of a sliding window of time units (3, 5 and 8 years) within which the models are fitted (Doucet et al., 2001). This allows for a static data size, but some loss of temporal effects.
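The difference between the fully sequential fit and the sliding-window fit lies only in which past years are passed to the model before each new year is assessed. A small sketch of that bookkeeping follows. It assumes that a window of length 3 means the three years immediately preceding the new year, which is one plausible reading and not a detail stated in the text. The function name is hypothetical and the model fit itself is not shown.

```python
def fitting_schedule(T, window=None):
    """Yield (fit_years, new_year) pairs with 1-based years.

    window=None gives the fully sequential fit on all previous years;
    an integer window keeps only that many most recent previous years.
    """
    start = 2 if window is None else window + 1
    for new_year in range(start, T + 1):
        first = 1 if window is None else new_year - window
        yield list(range(first, new_year)), new_year


sequential = list(fitting_schedule(20))
windowed = list(fitting_schedule(20, window=3))
print(sequential[0], windowed[0], windowed[-1])
```

Under this reading the data size stays fixed at the window length, whereas the sequential fit grows by one year at every step.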

Results. Tables 1 and 2 display summaries of the absolute value of thesurveillance residuals (summed over space). They show a peak correspondingwith years 10 and 11 in the study time period, suggesting that something unusualtook place. In addition, for scenario 2, another peak appears in year 12, wherethe second jump in risk was generated. Note that, after a change has occurred

Table 1. Absolute value of surveillance residuals summed overspace. Model fitted sequentially as new data arrive every year

Year Scenario 1 Scenario 2

2 92.148 84.9743 99.071 98.1124 99.591 96.0125 105.317 97.3556 87.012 98.2428 95.053 85.5469 71.106 76.00910 3366.634 476.84511 3377.919 306.46012 384.034 413.08513 87.983 278.00014 82.639 82.40715 75.541 93.09116 82.374 75.12817 72.163 93.00418 74.674 98.30819 66.824 81.76320 79.092 74.316

Dow

nloa

ded

by [

Que

ensl

and

Uni

vers

ity o

f T

echn

olog

y] a

t 17:

58 3

1 O

ctob

er 2

014

Page 13: Developments in General and Syndromic Surveillance for Small Area Health Data

962 A. Lawson et al.

Table 2. Absolute value of surveillance residuals summed overspace. Model fitted sequentially using data in sliding windows

of 3 years

Year Scenario 1 Scenario 2

4 1133.636 1126.7085 319.945 326.5276 89.943 101.3947 114.429 95.3028 100.751 91.3529 75.530 82.72010 3366.269 484.76311 3377.356 308.29912 380.050 444.34113 103.577 269.68914 99.449 105.06215 92.536 126.69816 97.993 73.53217 91.670 102.83718 93.134 114.32419 81.589 95.83520 103.990 88.469

(say at time 10), the models automatically fit poorly in the next time period (this is expected since the temporal effects in the risk have an AR(1) structure), but the fit returns to normal after two time points.

Fitting the model with data in sliding windows gives a worse fit than fitting it sequentially as data arrive every year. Also, the residuals are bigger on the left side of the time period due to censoring or end effects.

Looking at the Bayesian pointwise residuals, we cannot tell that there are changes in risk because they behave erratically over the years. However, surveillance residuals can be used for this task.

In scenario 1, the pointwise surveillance residuals show that, in year 10, something unusual happened and the data for that year are not representative of what is expected under the model (see Figure 2, which displays the surveillance residuals for two randomly selected counties over the years of the study period). Note that in this figure, the AR(1) effect mentioned before is also present. If we study their distribution we observe small p-values for year 10 in every county.

In order to study the pointwise residuals in scenario 2, we randomly selected two counties that are located in the regions where the jumps in risk were generated for years 10 and 12. Pointwise surveillance residuals (Figure 3) show unusual events in years 10 and 12 in the counties selected.

For counties where changes in risk were not generated, the surveillance residuals behave erratically and do not show any unusual events.

With regard to the distribution of the surveillance residuals in this scenario, small p-values appear for some counties in year 10. These counties correspond with those where the changes in risk were generated. So, for them, the data are not representative of what we can expect under the model. Also, in year 12, small p-values appear for the counties where the jump was generated. Figure 4 shows the p-value surface for years 10 and 12 when the model is fitted sequentially as new data arrive every year.

Figure 2. Surveillance residuals for two counties selected at random. Scenario 1, model fitted sequentially as new data arrive every year

Figure 3. Surveillance residuals for two counties selected at random. (a) County where changes in risk occurred in year 10; (b) county where changes in risk occurred in year 12. Scenario 2, model fitted sequentially as new data arrive every year

The method described in this section detects changes that have occurred between the last time period and the current time. To consider also the possibility that changes have occurred either at the present time or some time in the past, we could adjust the surveillance residuals by simply constructing a weighted sum of previous residuals. This would help to detect a continuing change in the data over time periods.
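A weighted sum of previous residuals could be sketched as follows. This is only an illustration: the text does not specify a weighting scheme, so exponentially decaying weights are assumed here, and the function name `weighted_residual_sum` and the decay parameter `lam` are hypothetical. The input values are toy numbers, not results from the simulation study.

```python
import numpy as np

def weighted_residual_sum(residuals, lam=0.5):
    """Exponentially weighted sum of current and past residuals.

    residuals: array of shape (T,) or (T, n_regions), ordered in time.
    lam: assumed weight on the newest residual, in (0, 1];
         lam = 1 reproduces the raw residuals.
    Entry t of the result equals
        sum_{k=0..t} lam * (1 - lam)**k * residuals[t - k].
    """
    r = np.asarray(residuals, dtype=float)
    out = np.zeros_like(r)
    acc = np.zeros(r.shape[1:])  # running weighted sum, one value per region
    for t in range(r.shape[0]):
        acc = lam * r[t] + (1.0 - lam) * acc
        out[t] = acc
    return out

# Toy residual series (assumed values): a sustained shift versus a single spike.
sustained = weighted_residual_sum([0, 0, 0, 1, 1, 1], lam=0.5)
spike = weighted_residual_sum([0, 0, 0, 1, 0, 0], lam=0.5)
```

Under this weighting a sustained shift accumulates towards its full size over successive periods, whereas a single unusual period decays away, which is the behaviour wanted when looking for a continuing change.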

Computational Speedups

Surveillance of health effects, if it is to be responsive, must be performed as quickly as possible. On the other hand, Bayesian models of reasonable complexity often require the use of posterior sampling methods to provide estimates of posterior expectations or other moments, and these estimates may be computationally expensive to obtain. Clearly there is a balance to be struck between sufficiently realistic models of the disease distribution and the need for fast results. To aid this, special computational approaches to model estimation can be employed.

Figure 4. p-value surface. (a) Year 10, (b) Year 12. Scenario 2, model fitted sequentially as new data arrive every year

The basic efficiency problem lies in the fact that, over time, an increasingly large parameter space may need to be estimated and an increasingly large data set is to be modelled. In discrete time, for any new time period there will be a new set of disease indicators and also possibly new parameters. If, for example, a map of 40 regions were to be monitored then there may be 40 new data items and 40 new parameters. The model will also have to be repeatedly fitted to an ever-increasing data and parameter domain. To counter this, a number of simplifications can be adopted.

First, a sliding window of fixed time units could be employed. If s time units are used then correlation in time beyond s lags will be lost. However, this ensures that, except for end-effects, there is a constant data size and (close-to-constant) parameter size. An alternative that has more recently been developed is to resample the output from initial iterations to provide reweighted estimates as time proceeds. This is known as filtration and uses importance resampling to provide estimates (see for example Doucet et al., 2001). Over time this can lead to bias in estimates (Ridgeway & Madigan, 2002). Adjustments for bias can be made. However, these methods crucially depend on having sampled enough of the process variation at the beginning of the surveillance exercise to allow subsequent resampling. An alternative to modifying the model fitting process is to approximate the full likelihood or posterior distribution by a simpler form, which can be sampled easily. Multivariate normal approximations are commonly employed for this purpose, as are asymptotic approximations. Finally, for certain spatial problems that involve making measures of inter-event distances (for example) there are possibilities for computational gains in efficiency using special algorithms (see for example Moore, 1999).
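The reweighting idea behind filtration can be illustrated with a small sketch. It is not the scheme of Doucet et al. (2001) or Ridgeway & Madigan (2002). It assumes a toy model with a single relative risk theta shared by all regions, Poisson counts with mean e_i * theta, and an existing set of posterior draws of theta from earlier data. All function names, the Gamma prior and the counts are hypothetical.

```python
import numpy as np

def reweight_draws(theta, y_new, e_new):
    """Importance weights for existing posterior draws given new counts.

    theta: (J,) posterior draws of a relative risk from data up to t-1.
    y_new, e_new: (n,) observed and expected counts for period t.
    Assumes y_new[i] ~ Poisson(e_new[i] * theta), independent of the past
    given theta. Returns normalised weights and the effective sample size.
    """
    theta = np.asarray(theta, dtype=float)
    y = np.asarray(y_new, dtype=float)
    e = np.asarray(e_new, dtype=float)
    # Poisson log-likelihood of the new data for each draw, dropping
    # terms that do not depend on theta.
    loglik = y.sum() * np.log(theta) - e.sum() * theta
    w = np.exp(loglik - loglik.max())  # subtract the max for stability
    w /= w.sum()
    ess = 1.0 / np.sum(w ** 2)
    return w, ess

def resample(theta, w, rng):
    """Multinomial resampling of the draws according to the weights."""
    idx = rng.choice(len(theta), size=len(theta), replace=True, p=w)
    return np.asarray(theta)[idx]

rng = np.random.default_rng(0)
# Assumed toy setting: with a Gamma(a, b) prior and Poisson counts the
# posterior is Gamma(a + sum(y), b + sum(e)); these draws stand in for
# MCMC output from the earlier fit.
a, b = 1.0, 1.0
y_old, e_old = np.array([12, 9, 15]), np.array([10.0, 10.0, 12.0])
theta = rng.gamma(a + y_old.sum(), 1.0 / (b + e_old.sum()), size=20000)

# New period arrives: reweight the old draws instead of refitting.
y_new, e_new = np.array([14, 11, 13]), np.array([10.0, 10.0, 12.0])
w, ess = reweight_draws(theta, y_new, e_new)
post_mean_is = np.sum(w * theta)
# Exact updated posterior mean for this conjugate toy model, for comparison.
post_mean_exact = (a + y_old.sum() + y_new.sum()) / (b + e_old.sum() + e_new.sum())
theta_new = resample(theta, w, rng)
```

The effective sample size `ess` is a useful diagnostic here. If the new data favour values of theta that the initial draws rarely visited, a few draws carry most of the weight and `ess` collapses, which reflects the dependence on having sampled enough of the process variation at the start of the surveillance exercise.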

Conclusions

Sequential Bayesian modelling and the use of Bayesian models within alarm functions are a very fruitful area for further research in health surveillance. This area will become more relevant as computational efficiencies are achieved and realistic models are routinely available for fitting. The interest in this area is also reflected by the soon-to-be-published volume focusing primarily on syndromes and syndromic surveillance for public health (Lawson & Kleinman, 2005).

Acknowledgements

The authors would like to acknowledge the support of NIH grant number 5R01CA092693-2 for part of this work.

References

Besag, J., York, J. & Mollie, A. (1991) A Bayesian image restoration, with two applications in spatial statistics (with discussion), Annals of the Institute of Statistical Mathematics, 43, pp. 1–59.

Carlin, B. P. & Louis, T. A. (2000) Bayes and Empirical Bayes Methods for Data Analysis (New York: Chapman & Hall).

De-Mare, J. (1980) Optimal prediction of catastrophes with applications to Gaussian processes, The Annals of Probability, 4, pp. 841–850.

Diciccio, T., Kass, R., Raftery, A. & Wasserman, L. (1997) Computing Bayes factors by combining simulation and asymptotic approximations, Journal of the American Statistical Association, 92, pp. 903–915.

Doucet, A., de Freitas, N. & Gordon, N. (2001) Sequential Monte Carlo Methods in Practice (New York: Springer).


Efron, B. & Tibshirani, R. J. (1993) An Introduction to Bootstrap (New York: Chapman & Hall).

Frisen, M. & De-Mare, J. (1991) Optimal surveillance, Biometrika, 78, pp. 271–280.

Genkin, A., Lewis, D. D. & Madigan, D. (2004) Large-scale Bayesian logistic regression for text categorisation. Manuscript.

Hand, D. J. (1999) Statistics and data mining: Intersecting disciplines, ACM SIGKDD, 1, pp. 16–19.

Kass, R. & Raftery, A. (1995) Bayes factors, Journal of the American Statistical Association, 90, pp. 773–795.

Knorr-Held, L. (2000) Bayesian modelling of inseparable space-time variation in disease risk, Statistics in Medicine, 19, pp. 2555–2567.

Lawson, A. B. (2001a) Population health surveillance, in: Encyclopedia of Environmetrics (Chichester: Wiley).

Lawson, A. B. (2001b) Statistical Methods in Spatial Epidemiology (New York: Wiley).

Lawson, A. B. (2001c) Tutorial in biostatistics: disease map reconstruction, Statistics in Medicine, 20, pp. 2183–2204.

Lawson, A. B. (2004) Some issues in the spatio-temporal analysis of public health surveillance data, in: R. Brookmeyer & D. Stroup (Eds) Monitoring the Health of Populations: Statistical Principles and Methods for Public Health Surveillance, Chapter 11 (Oxford: Oxford University Press).

Lawson, A. B. & Kleinman, K. (2005) Spatial Surveillance for Public Health (Chichester: Wiley).

Liu, J. & Chen, R. (1998) Sequential Monte Carlo methods for dynamic systems, Journal of the American Statistical Association, 93, pp. 1032–1044.

Moore, A. W. (1999) Very fast mixture-model-based clustering using multiresolution kd-trees, in: M. Kearns & D. Cohn (Eds) Advances in Neural Information Processing Systems, Volume 10, pp. 543–549 (San Francisco, CA: Morgan Kaufmann).

Ridgeway, G. & Madigan, D. (2002) A sequential Monte Carlo method for Bayesian analysis of massive datasets, Journal of Data Mining and Knowledge Discovery, 1, p. 24.

Sonesson, C. & Bock, D. (2003) A review and discussion of prospective statistical surveillance in public health, Journal of the Royal Statistical Society, Series A, 166, pp. 5–21.

Tierney, L. & Kadane, J. (1986) Accurate approximations for posterior moments and marginal densities, Journal of the American Statistical Association, 81, pp. 82–86.

Wong, W., Moore, A., Cooper, G. & Wagner, M. (2002) Rule-based anomaly pattern detection for detecting disease outbreaks, in: 18th National Conference on Artificial Intelligence (Cambridge, MA: MIT Press).
