The Space-Time Scan Statistic for Multiple Data Streams

Martin Kulldorff, Katherine Yih, Ken Kleinman, Richard Platt, Harvard Medical School and Harvard Pilgrim Health Care

Farzad Mostashari, New York City Department of Health and Mental Hygiene

Luiz Duczmal, Univ Fed Minas Gerais, Brazil

Different Data Sources

For example:

• OTC Drug Sales, from pharmacy chains

• Nurses Hotline Calls, from Optum

• Regular Physician Visits, from HMOs/VA

• Emergency Department Visits, from hospitals

• Ambulance Dispatches, from 911 call centers

• Lab Test Results, from laboratories

Different Types of Data from the Same Data Source

For example, HMO data concerning:

• Telephone Calls to Physicians

• Regular Physician Visits

• Emergency Department Visits

• Lab Test Requests

• Lab Test Results

• Drug Prescriptions

Different Groupings in the Same Type of Data

• Children, Young Adults, Adults age 65+

• Male, Female

• Diarrhea, Vomiting

Early Work

Burkom HS. Biosurveillance applying scan statistics with multiple, disparate data sources. Journal of Urban Health, 80i:57-65, 2003.

Wong WK, Moore A, Cooper G, Wagner M. WSARE: What’s strange about recent events? Journal of Urban Health, 80i:66-75, 2003.

Why Multivariate Detection Methods?

• We do not know whether an outbreak will create a signal in one or more data streams.

• The informational content is different in different data streams.

Outline

• Method: Space-Time Permutation Scan Statistic

• Example: Gastrointestinal telephone calls, urgent care visits and regular physician visits in Boston

The Spatial Scan Statistic

Create a regular or irregular grid of centroids covering the whole study region.

Create an infinite number of circles around each centroid, with the radius anywhere from zero up to a maximum so that at most 50 percent of the population is included.
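
Although the number of circles is infinite, only finitely many distinct sets of locations result, because a circle's contents change only when its radius reaches another location. A minimal sketch of that enumeration, assuming each location is represented by a single coordinate pair and a population count (the function name `candidate_zones` and the data layout are hypothetical, not taken from SaTScan):

```python
import math

def candidate_zones(coords, population, max_fraction=0.5):
    """Return the distinct sets of locations captured by circles around each centroid.

    coords: dict mapping location id -> (x, y)
    population: dict mapping location id -> population count
    Locations at exactly equal distance are added one at a time here,
    which is a simplification of this sketch.
    """
    total = sum(population.values())
    zones = set()
    for center in coords:
        cx, cy = coords[center]
        # Growing the radius adds locations in order of distance from the centroid.
        by_distance = sorted(
            coords,
            key=lambda k: math.hypot(coords[k][0] - cx, coords[k][1] - cy),
        )
        members, pop = [], 0
        for loc in by_distance:
            pop += population[loc]
            if pop > max_fraction * total:
                break  # circle would hold more than the allowed population share
            members.append(loc)
            zones.add(frozenset(members))
    return zones

coords = {"A": (0, 0), "B": (1, 0), "C": (3, 0), "D": (10, 0)}
population = {"A": 10, "B": 10, "C": 10, "D": 10}
zones = candidate_zones(coords, population)
```

Here every location is used as a centroid; a separate grid of centroids would work the same way.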

A small sample of the circles used

Space-Time Scan Statistic

Use a cylindrical window, with the circular base representing space and the height representing time.

We will only consider cylinders that reach the present time.
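
Restricting to cylinders that reach the present time means every time window ends at the most recent interval and differs only in how far back it starts. A small sketch, assuming time is indexed 0..n-1 and that a maximum window length is imposed (the name `prospective_windows` and the `max_length` parameter are assumptions for illustration):

```python
def prospective_windows(n_intervals, max_length):
    """List (start, end) index pairs for windows that end at the latest interval."""
    last = n_intervals - 1
    longest = min(max_length, n_intervals)
    return [(last - length + 1, last) for length in range(1, longest + 1)]

windows = prospective_windows(10, 3)
```

Each spatial circle is then paired with each of these windows to form the cylinders that are evaluated.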

Space-Time Permutation Scan Statistic

1. For each cylinder, calculate the expected number of cases, conditioning on the marginals:

$\mu_{st} = C_s C_t / C$

where $C_s$ = # cases in location s, $C_t$ = # cases in time interval t, and $C$ = total number of cases.
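
As an illustration of step 1, the expected counts can be computed from the row and column totals of a location-by-time table of cases. This is a sketch under the assumption that cases are stored as a dictionary keyed by (location, time); the function name `expected_counts` is hypothetical. Treating a cylinder that spans several locations or intervals as the sum of its cells is also an assumption of this sketch.

```python
from collections import Counter

def expected_counts(cases):
    """Compute mu_st = C_s * C_t / C for every (location, time) cell.

    cases: dict mapping (location, time) -> observed number of cases
    """
    by_location, by_time = Counter(), Counter()
    for (s, t), n in cases.items():
        by_location[s] += n   # C_s
        by_time[t] += n       # C_t
    total = sum(cases.values())  # C
    return {
        (s, t): by_location[s] * by_time[t] / total
        for s in by_location
        for t in by_time
    }

cases = {("A", 0): 2, ("A", 1): 4, ("B", 0): 3, ("B", 1): 1}
mu = expected_counts(cases)
```

Because the expectation uses only the case counts themselves, no population-at-risk data is needed, and the expected counts add up to the total number of cases.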

Space-Time Permutation Scan Statistic

Let $c_{st}$ = # cases in the cylinder covering location s and time interval t.

Space-Time Permutation Scan Statistic

2. For each cylinder, calculate the Poisson likelihood

$T_{st} = \left( \frac{c_{st}}{\mu_{st}} \right)^{c_{st}} \left( \frac{C - c_{st}}{C - \mu_{st}} \right)^{C - c_{st}}$ if $c_{st} / \mu_{st} > 1$, and $T_{st} = 1$ otherwise.

3. Test statistic: $T = \max_{s,t} \log T_{st}$
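
Steps 2 and 3 can be sketched directly on the log scale, which avoids overflow when the counts are large. The function names `log_likelihood_ratio` and `scan_statistic` are hypothetical, and the list of (observed, expected) pairs stands in for whatever cylinder enumeration is used:

```python
import math

def log_likelihood_ratio(c, mu, total):
    """Return log T_st for a cylinder with c observed and mu expected cases."""
    if mu <= 0 or c <= mu:
        return 0.0  # T_st = 1 when there is no excess, so log T_st = 0
    inside = c * math.log(c / mu)
    # When c == total the second factor has exponent 0 and contributes nothing.
    outside = (total - c) * math.log((total - c) / (total - mu)) if c < total else 0.0
    return inside + outside

def scan_statistic(cylinders, total):
    """Return the maximum log likelihood ratio over (observed, expected) pairs."""
    return max(log_likelihood_ratio(c, mu, total) for c, mu in cylinders)

T = scan_statistic([(3, 3.0), (4, 3.0), (1, 2.0)], total=10)
```

Only cylinders with more cases than expected contribute, so the statistic looks for excesses and ignores deficits.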

Statistical Inference

4. Generate random replicas of the data set conditioned on the marginals, by permuting the pairs of spatial locations and times.

5. Compare test statistic in real and random data sets using Monte Carlo hypothesis testing (Dwass, 1957):

$p = \mathrm{rank}(T_{\mathrm{real}}) / (1 + \#\mathrm{replicas})$
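
Steps 4 and 5 can be sketched as follows. Shuffling the times among the cases while keeping each case's location preserves both marginals. To keep the example short, the statistic here is maximized over single (location, time) cells only, not over all cylinders; that simplification, the function names, and the default of 999 replicas are assumptions of this sketch.

```python
import math
import random
from collections import Counter

def max_cell_llr(locations, times):
    """Simplified statistic: best single (location, time) cell, not all cylinders."""
    total = len(locations)
    by_loc, by_time = Counter(locations), Counter(times)
    best = 0.0
    for (s, t), c in Counter(zip(locations, times)).items():
        mu = by_loc[s] * by_time[t] / total
        if c > mu:
            llr = c * math.log(c / mu)
            if c < total:
                llr += (total - c) * math.log((total - c) / (total - mu))
            best = max(best, llr)
    return best

def monte_carlo_p_value(locations, times, replicas=999, seed=0):
    """Rank the real statistic among statistics from permuted data sets."""
    rng = random.Random(seed)
    t_real = max_cell_llr(locations, times)
    shuffled = list(times)
    at_least_as_large = 0
    for _ in range(replicas):
        rng.shuffle(shuffled)  # re-pair locations and times, marginals unchanged
        if max_cell_llr(locations, shuffled) >= t_real:
            at_least_as_large += 1
    rank = 1 + at_least_as_large  # the real data set counts as one of the ranked values
    return rank / (1 + replicas)

locations = ["A"] * 5 + ["B", "C", "D", "E", "F"]
times = [9] * 5 + [0, 1, 2, 3, 4]
p = monte_carlo_p_value(locations, times, replicas=99, seed=1)
```

With 999 replicas the smallest attainable p-value is 0.001.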

Multiple Data Streams

For each cylinder, add the Poisson log likelihoods:

$T_{st} = \log T^{[1]}_{st} + \log T^{[2]}_{st} + \log T^{[3]}_{st}$

Test statistic: $T = \max_{s,t} T_{st}$
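
The combination rule can be illustrated with per-stream log likelihood ratios that have already been computed for the same set of cylinders. The function name `combined_statistic`, the cylinder labels, and the numbers below are made up for illustration:

```python
def combined_statistic(stream_llrs):
    """Sum per-stream log likelihood ratios per cylinder and return the best one.

    stream_llrs: non-empty list of dicts, one per data stream,
                 each mapping cylinder id -> log T_st for that stream
    """
    cylinders = set().union(*stream_llrs)
    combined = {
        cyl: sum(stream.get(cyl, 0.0) for stream in stream_llrs)
        for cyl in cylinders
    }
    best = max(combined, key=combined.get)
    return best, combined[best]

tele = {"c1": 2.0, "c2": 0.5}
urgent = {"c1": 0.0, "c2": 1.0}
regular = {"c2": 1.0, "c3": 2.2}
best_cylinder, T = combined_statistic([tele, urgent, regular])
```

In this made-up example the winning cylinder is not the strongest in any single stream; it wins because a moderate excess appears in all three.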

Syndromic Surveillance in Boston: Upper and Lower GI

• Harvard Pilgrim Health Care HMO members cared for by Harvard Vanguard Medical Associates

• Historical Data from Jan 1 to Dec 31, 2002

• Mimicking Surveillance from Sept 1 to Dec 31, 2002

Three Data Streams

• Telephone Calls ( ~ 20 / day)

• Urgent Care Visits ( ~ 9 / day)

• Regular Physician Visits ( ~ 22 / day)

Multiple contacts by the same person removed.

Strongest Signal: October 18

Data stream    p        Recurrence interval
Tele           0.001    < 1 / 1000 days
Urgent         0.91     ~ every day
Regular        0.84     ~ every day
Multiple DS    0.001    < 1 / 1000 days

October 18 Signal

• Friday

• Number of Cases: 5 (all tele)

• Expected Cases: 0.04

• Location: Zip Code 01740

• Time Length: One Day

• Diagnosis: Pinworm Infestation (all 5)

• Same Family: Mother, Father, 3 Kids

2nd Strongest Signal: December 20

Data stream    p        Recurrence interval
Tele           0.03     1 / 32 days
Urgent         0.71     ~ every day
Regular        0.003    1 / 333 days
Multiple DS    0.002    1 / 500 days
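
The recurrence intervals shown appear to be the reciprocal of the p-value for an analysis run once per day. That reading is inferred from the numbers in the table, not stated on the slides, and the reported p-values are rounded, so small mismatches are expected. A sketch under that assumption (the function name and the `analyses_per_day` parameter are hypothetical):

```python
def recurrence_interval_days(p_value, analyses_per_day=1):
    """Expected number of days between signals this strong under the null hypothesis."""
    return 1.0 / (p_value * analyses_per_day)

regular = recurrence_interval_days(0.003)
multiple = recurrence_interval_days(0.002)
```

Truncated to whole days, these give 333 and 500, matching the Regular and Multiple DS rows.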

December 20 Signal

• Number of Cases: 16 (7 tele, 7 regular, 2 urgent)

• Expected Cases: 3.5

• Location: Zips 01810,26,45,50,52,76

• Time Length: Two Days (Thu, Fri)

• Strong signals on the two following days.

December 20 Signal

Mostly diverse vague GI diagnoses: Esophageal Reflux (3), Nausea (2), Abdominal Pain (2), Noninfectious GI (2), Acute pharyngitis, Mastodynia, Diarrhea, Anemia, Hypertension, Blood in stool

Holiday parties?

3rd Strongest Signal: October 26

Data stream    p        Recurrence interval
Tele           0.07     1 / 14 days
Urgent         0.85     ~ every day
Regular        0.18     1 / 6 days
Combined       0.007    1 / 142 days

October 26 Signal

• Saturday

• Number of Cases: 8 (5 tele, 3 regular)

• Expected Cases: 0.9

• Location: Zip Codes 01902,07,15,45,70

• Time Length: Two Days (Fri, Sat)

• Various specific diagnoses.

Research Funded By

Methods: Alfred P. Sloan Foundation

Data, National Bioterrorism Syndromic Surveillance Demonstration Program: National Center for Infectious Diseases, Centers for Disease Control and Prevention

Free Software

SaTScan v 5.1

www.satscan.org