Disease signatures – a simple combinatorial-type exploitation of them for our own evil purposes

Prof. Nina H. Fefferman

Visiting DIMACS from :

Tufts Univ. School of Medicine, Dept. Public Health and Family Medicine

Plan for today:

1) Looking very quickly at traditional SIR models

2) Communication problems

3) Tweaking parameter definitions

4) Using these definitions to clear up communication

5) Building disease signatures

6) Decomposing reported disease into component signature curves

7) Checking this method against reality

8) Where this method can take us from here…

A quick look at SIR models

I(t) = number of infected

S(t) = number of susceptibles

R(t) = number of recovered

in the population at time t
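The equations on this slide were images and are not reproduced here. As a rough sketch only, assuming the standard mass-action SIR form, a forward-Euler integration might look like the following. The names `transmission` and `recovery` and every number are illustrative assumptions, and are deliberately not mapped onto the slide's a and b.

```python
def simulate_sir(susceptible0, infected0, recovered0,
                 transmission, recovery, dt, steps):
    """Forward-Euler integration of a standard mass-action SIR model.

    Assumed form (not taken from the slides):
        dS/dt = -transmission * S * I
        dI/dt =  transmission * S * I - recovery * I
        dR/dt =  recovery * I
    """
    s, i, r = float(susceptible0), float(infected0), float(recovered0)
    history = [(s, i, r)]
    for _ in range(steps):
        new_infections = transmission * s * i * dt
        new_recoveries = recovery * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


# Illustrative values only: 990 susceptible, 10 infected, nobody recovered yet.
trajectory = simulate_sir(990, 10, 0, transmission=0.0005, recovery=0.1,
                          dt=0.1, steps=1000)
```

Each entry of `trajectory` is an (S, I, R) triple; the total population stays constant because every person leaving one compartment enters another.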

And if we want spatial spread :

Keep R, but I(t, x, y) and S(t, x, y) become functions of position (x, y), and a is replaced by an expression involving two other constants related to the rate at which the infection diffuses through space

Pictures of equations stolen from : http://maven.smith.edu/~callahan/ili/pde.html

Go ask HHS or NIH or CDC for a and b for the next flu season so our models can predict it.

Good luck.

Leads us to : Communication Problems

Parameters/Variables used by epidemiologists are warm and fuzzy and not rigorously defined

So modelers made up their own (you just saw them), but these aren’t things doctors/public health people can really measure, so we can’t get accurate parameter values

Example: MANY people are worried about outbreaks

There is no good definition of what constitutes an outbreak

BIG problem (mostly just ignored)

Modelers use the concept of R0 – the reproductive number of disease (in the differential equation model, it’s the ratio of S to a/b)

To modelers, an outbreak is when the average number of new infections caused by contact with a current infection is greater than 1

Really, if we think about it, public health people want ‘outbreak’ to refer to “times when we need to pay attention to disease spread for some reason”

How can we say this mathematically?

Communication Problems cont.

R0 gives us a rigorous definition of something good, but not of what we really need ‘outbreak’ to mean

Infectivity : Probability of becoming infectious after becoming exposed

Attack rate : Probability of developing disease after becoming exposed

Pathogenicity : Probability of developing disease after becoming infected

Virulence : Probability of dying after becoming ill

Immunogenicity : Attack rate for re-exposure

What can public health people/ doctors measure (at least sometimes)?

Communication Problems cont.

So :

• E(X,T) = Probability of exposure in population X at time T

• I = Probability of infection from exposure

• S_T = Probability that infection at time 0 leads to manifestation of symptoms at time T (a distribution function which does not need to sum to one if not all of the infected develop symptoms)

• C_T = Probability that infection takes T days to become contagious

• M_T = Probability that the time from the onset of symptoms to death from the disease is T days

• N_T = Size of the population possibly exposed to infection on day T (this will be our disease signature curve)

• I_T = Probability of infection from current exposure, given previous infection T days ago

Tweaking Parameter Definitions

Really, these are all functions of time, but my journal referees got upset with functions, so most are now subscripts

Clearing up communication

With those we can build :

Pathogenicity : The probability of developing disease after becoming infected

= $\sum_{T=0}^{n} S_T$ , for n the maximum recovery time

Virulence : The probability of dying after becoming ill

= $\sum_{T=0}^{n} M_T$ , for n the maximum recovery time

Infectivity : The probability of becoming infectious after becoming exposed

= $I \cdot \sum_{T=0}^{n} C_T$ , for n the end of the window for the disease

And :

Attack rate : The probability of developing disease after becoming exposed

= $I \cdot \sum_{T=0}^{n} S_T$ , for n the end of the window for disease expression
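To make these four sums concrete, here is a small worked example. It is only a sketch: the distributions and the value of I are invented for illustration, and the function and variable names are hypothetical.

```python
def pathogenicity(symptom_onset):
    """Sum of S_T over T = 0..n: P(disease | infected)."""
    return sum(symptom_onset)


def virulence(death_delay):
    """Sum of M_T over T = 0..n: P(death | ill)."""
    return sum(death_delay)


def infectivity(p_infection, contagious_onset):
    """I times the sum of C_T: P(infectious | exposed)."""
    return p_infection * sum(contagious_onset)


def attack_rate(p_infection, symptom_onset):
    """I times the sum of S_T: P(disease | exposed)."""
    return p_infection * sum(symptom_onset)


# Made-up example values, indexed by day T = 0, 1, 2, ...
I = 0.5                            # probability of infection from exposure
S = [0.0, 0.1, 0.3, 0.2, 0.1]      # S_T; sums to 0.7 (not everyone shows symptoms)
C = [0.0, 0.4, 0.4, 0.2]           # C_T; sums to 1.0
M = [0.0, 0.0, 0.01, 0.02, 0.01]   # M_T; sums to 0.04

summary = {
    "pathogenicity": pathogenicity(S),
    "virulence": virulence(M),
    "infectivity": infectivity(I, C),
    "attack rate": attack_rate(I, S),
}
```

With these invented inputs the attack rate is I times the pathogenicity, which is exactly the relationship between the two definitions above.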

But now we notice that, from our original list, Immunogenicity is not a truly meaningful idea, so we define instead:

Pseudo-immunogenicity : Probability of infection from current exposure, given previous infection T days ago = I_T

Clearing up communication cont.

We won’t be using all of these today, but they’re still useful to have if you ever need to talk to health people

Now both the math and health people have the same picture!


But this is only one town

The SIR models could handle spatial spread with PDEs…

Uses a slightly different notation



With multiple locations and central reporting :

Notice : different occurrences don’t have to be separated only spatially or temporally

Can be different demographic populations, or anything that allows narrower, more accurate estimations of exposure or susceptibility

Let’s call these narrower things subpopulations


For a given subpopulation, we can compute a ‘disease signature curve’ representing the number of cases predicted over time from a single instance of exposure

Notice : these signature curves depend on subpopulation-specific etiology, including the shape of the distribution for some parameters, not just averages
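The exact signature formula was on a slide image that did not survive, so the following is only a guess at the simplest version: expected new symptomatic cases on each day after one exposure event, taken as (exposed population) × I × S_T. The function names and all numbers are hypothetical. The example also shows why shape matters: two onset distributions with the same mean give very different curves.

```python
def signature_curve(exposed_size, p_infection, symptom_onset):
    """Expected new symptomatic cases on each day after one exposure event.

    Assumes cases on day T = exposed_size * p_infection * S_T.
    """
    return [exposed_size * p_infection * s_t for s_t in symptom_onset]


def mean_onset_day(symptom_onset):
    """Average day of symptom onset among those who develop symptoms."""
    total = sum(symptom_onset)
    return sum(day * p for day, p in enumerate(symptom_onset)) / total


# Two invented onset distributions with the same mean (day 2), different shapes.
narrow_onset = [0.0, 0.0, 1.0, 0.0, 0.0]
wide_onset = [0.2, 0.2, 0.2, 0.2, 0.2]

narrow_signature = signature_curve(100, 0.5, narrow_onset)
wide_signature = signature_curve(100, 0.5, wide_onset)
```

Both signatures predict 50 cases in total, but one puts them all on a single day and the other spreads them evenly over five days.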

Building Disease Signatures

So, using our definitions and our flow chart:

Decomposing curves into signatures

So, if we have a total reported disease curve, we can iteratively define

(Notice populations exposed on different days are disjoint sets due to the definitions)

Now we can think of a single reported curve C_T as the composition of these curves
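The iterative definition itself was an image, so here is only a minimal sketch of the idea of composition, assuming the reported curve is a plain sum of copies of one signature curve, each shifted to the day its exposure happened. The function name, the signature values and the exposure days are all made up.

```python
def compose_reported_curve(signature, exposures_by_day, n_days):
    """Sum copies of one signature curve, each shifted to its exposure day.

    exposures_by_day maps an exposure day to the number of exposure
    events on that day (an assumed input format).
    """
    total = [0.0] * n_days
    for start_day, count in exposures_by_day.items():
        for offset, cases in enumerate(signature):
            day = start_day + offset
            if day < n_days:
                total[day] += count * cases
    return total


# Invented signature: cases on days 0..3 after a single exposure.
signature = [0, 2, 5, 3]

# One exposure event on day 0 and another on day 2.
reported = compose_reported_curve(signature, {0: 1, 2: 1}, n_days=7)
```

Because the populations exposed on different days are disjoint, their case counts simply add.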

Decomposing curves into signatures cont.

Since we are interested in exploiting the heterogeneity of etiological response within a diverse population, we can specify these curves by subpopulation Y :

Yielding the total disease incidence curve:


And we can even exploit immune memory by further dividing subpopulations into classes of those with similar immune protection from previous infection

With I_T = Probability of infection given previous infection T days ago

And T* = the last day of most recent prior infection

Giving us

Now we can use high school math to find combinations of signature curves that make up the total reported cases curve!
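As an illustration of the "find the combinations" step, here is a brute-force sketch that tries every small multiple of every shifted signature curve and keeps the combinations that add up exactly to the reported curve. This is an assumption about how one might do it, not the procedure from the talk; the subpopulation names, signature values and the `max_copies` bound are invented.

```python
from itertools import product


def shifted(signature, start_day, n_days):
    """Place a signature curve on a timeline of n_days, starting at start_day."""
    curve = [0] * n_days
    for offset, cases in enumerate(signature):
        if start_day + offset < n_days:
            curve[start_day + offset] = cases
    return curve


def find_decompositions(reported, signatures, max_copies=2):
    """List every way to write `reported` as a sum of shifted signature curves."""
    n_days = len(reported)
    # One candidate component per (subpopulation, exposure day) that fits.
    components = [(name, day, shifted(sig, day, n_days))
                  for name, sig in signatures.items()
                  for day in range(n_days - len(sig) + 1)]
    solutions = []
    for counts in product(range(max_copies + 1), repeat=len(components)):
        total = [sum(c * comp[2][d] for c, comp in zip(counts, components))
                 for d in range(n_days)]
        if total == list(reported):
            solutions.append({(name, day): c
                              for c, (name, day, _) in zip(counts, components)
                              if c})
    return solutions


# Invented signatures for two hypothetical subpopulations.
signatures = {"children": [1, 3, 1], "adults": [0, 2, 2]}
reported = [1, 3, 3, 2, 0]

solutions = find_decompositions(reported, signatures)
```

For this toy input there is exactly one decomposition: children exposed on day 0 and adults exposed on day 1. Exhaustive search grows exponentially with the number of components, so it only makes sense for tiny examples like this one.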

How many different combinations of coins can make $1.50…

Similarly, we can ask how many combinations of ‘signature curves’ can go into a ‘Total Reported Cases’ curve:

[Figure: coins (5¢, 10¢, 25¢) shown alongside sub-populations]
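The coin question can be answered with a few lines of counting. This sketch assumes only the three denominations shown on the slide (5¢, 10¢, 25¢) are allowed and that the order of coins does not matter; the function name is hypothetical.

```python
def count_coin_combinations(target_cents, coin_values):
    """Count unordered combinations of coins that add up to target_cents."""
    # ways[a] = number of combinations making amount a with the coins seen so far
    ways = [1] + [0] * target_cents
    for coin in coin_values:
        for amount in range(coin, target_cents + 1):
            ways[amount] += ways[amount - coin]
    return ways[target_cents]


ways_to_make_150 = count_coin_combinations(150, [5, 10, 25])
```

With nickels, dimes and quarters there are 58 ways to make $1.50. Signature curves play the role of the coins, and the total reported curve plays the role of the $1.50.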

Important because public health people may trust it



Now let’s come back to the idea of an outbreak:

Remember, we wanted ‘outbreak’ to mean “times when we need to pay attention to disease spread for some reason”

Suppose that the only combination of disease signature curves was to have EVERY subpopulation just beginning to show symptoms from a disease. That means that soon many, many more people will be sick, so we should probably pay attention to that

OR

Maybe the only combination of signature curves indicates that only one location has been exposed. We might want to use that to find out what the source of exposure was, or quarantine the area

No matter how we choose to define it (will be arbitrary), this method can tell us WHY we should care now


Let’s take a look at an example of how this can work

To begin with, let’s look at something very simple :

Giardiasis: a waterborne infection causing diarrheal disease in humans, with extremely low levels of secondary transmission (makes life simpler)

There was an actual “outbreak” in MA in 1995


Reported incidence for MA (all of it)

HIPAA requires aggregation of data released to public and to most researchers without special access



To use this method, we need some measured parameter values

I’m cheating a little because I’m assuming values for I, but we could in theory measure this


We know that most of the reporting came from 3 urban centers:


Then we can decompose by demographic subgroup for each town:


That was a really simple disease without any secondary transmission

So what happens if there is secondary spread?

It gets MUCH more complicated…

First of all, the probability of exposure in each subpopulation can start to depend on the levels of infection in each other subpopulation

Now we start getting into the social network stuff

An aside

Social Networks : Oy vey

Since this is a talk and not a course, I can’t leave this as an exercise to the reader, but I can use the ‘we only have a little over an hour’ excuse to hand-wave some of the modeling details on this. I’m going to talk about the concepts

If you are interested in the details, well, that’s why I’m going to be around for the year

Again, rather than using mass averages, let’s still keep the idea of a disease signature

So exposure isn’t a simple underlying rate - it’s based on contacting an infected individual

We can think of individuals in each subgroup as having certain probabilities of interacting with others, possibly in other subgroups

(People in the room who think of social interactions as edges in a graph, this is almost the same: it’s like weighted edges in a complete graph)

Also, membership in particular subgroups can change over time (e.g. children becoming adults)

(In this case, both vertex states and edge weights can be thought of as vertex-state dependent progressions)
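Since the modeling details are hand-waved here, the following is only one possible reading of the weighted-complete-graph picture: each subgroup's exposure probability is computed from its contact weights with every subgroup and the fraction of each subgroup currently infected. The independence assumption, the function name, the subgroup labels and all numbers are mine, not the talk's.

```python
def exposure_probabilities(contact_weights, infected_fraction):
    """P(exposure) per subgroup, given weighted contacts with every subgroup.

    Assumes contacts are independent, so a member of subgroup i escapes
    exposure only if contact with every subgroup j fails to expose them.
    contact_weights[i][j] is the (assumed) probability that a member of
    subgroup i interacts with a member of subgroup j.
    """
    probabilities = []
    for weights in contact_weights:
        p_escape = 1.0
        for weight, fraction in zip(weights, infected_fraction):
            p_escape *= 1.0 - weight * fraction
        probabilities.append(1.0 - p_escape)
    return probabilities


# Two hypothetical subgroups: index 0 = children, index 1 = adults.
contact_weights = [
    [0.8, 0.2],  # children mostly interact with other children
    [0.3, 0.5],  # adults interact with both groups
]
infected_fraction = [0.10, 0.0]  # 10% of children infected, no adults yet

exposure = exposure_probabilities(contact_weights, infected_fraction)
```

Even with no infected adults, adults pick up a nonzero exposure probability through their contacts with children, which is the cross-subpopulation dependence described above.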

This all gets complicated enough that it’s nerve-wracking not to check model outcomes against some form of reality

Need to :

1. measure all model parameters

2. create disease outbreaks

3. check predicted spread against what actually happens

(I tried to get Thus Spake Zarathustra to play now, but I couldn’t make it work)

My beautiful termites

Checking Reality

On Thursday, at the DIMACS Mixer, I’ll be talking to you about ‘Why Termites’

For now, just go with it

Checking Reality cont.

Spores land on termite

Allogroomed off

Temporary Immunity

Burrow through cuticle

Death

The particular details:

Not a termite

Zootermopsis angusticollis

Metarhizium anisopliae

So we built some CA simulation models

Including age-based differences in :

1. direction of wandering through nest

2. interaction rates

3. exposure rates

4. susceptibility to infection from exposure

5. mortality from infection

6. efficacy/duration of induced immunity (via social vaccination)

As the model ran, individuals aged and behaved accordingly



And…

Thank god, all the work so far has shown that the models predict spread accurately

Whew!

We’re even getting some interesting new directions

Regardless of why specific outputs happen

Now that we know the model can work, we can work backwards

Fit model outcome to observed data and look at which sets of parameter values and behavioral mixing rates produce them
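As a toy version of working backwards, here is a sketch that scans candidate parameter values and keeps the one whose predicted curve is closest to the observed data. It uses a deliberately simple stand-in model (exposed population × infection probability × onset distribution) rather than the simulation models from the talk, and every name and number is invented.

```python
def predicted_curve(exposed_size, p_infection, symptom_onset):
    """Stand-in model: expected cases per day after a single exposure."""
    return [exposed_size * p_infection * s for s in symptom_onset]


def squared_error(predicted, observed):
    """Sum of squared day-by-day differences between two curves."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


def fit_infection_probability(observed, exposed_size, symptom_onset, candidates):
    """Return the candidate infection probability that best matches the data."""
    return min(
        candidates,
        key=lambda p: squared_error(
            predicted_curve(exposed_size, p, symptom_onset), observed),
    )


# Invented observations and onset distribution.
observed_cases = [0, 6, 15, 9]
onset = [0.0, 0.2, 0.5, 0.3]
candidates = [k / 10 for k in range(1, 10)]  # 0.1, 0.2, ..., 0.9

best_p = fit_infection_probability(observed_cases, 100, onset, candidates)
```

Here the scan recovers an infection probability of 0.3. With a real model there may be many parameter sets that fit equally well, and looking at that whole set is the interesting part.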

This might provide an odd way of understanding human social networks, especially since they can so dramatically affect model output

Maybe this last part is a pipe-dream.

Who knows, but it’s so crazy it just might work…

Thanks for asking me to speak to you

I hope you’ve had fun

Some of what I’ve talked about has been accomplished in collaboration with Elena Naumova, James Traniello and Rebeca Rosengaus

My thanks to the NIH for funding support for this research
