
Using model-based statistical inference to learn about evolution


Page 1: Using model-based statistical inference to learn about evolution

Using model-based statistical inference to learn about evolution

Frederick “Erick” Matsen
http://matsen.fredhutch.org/
@ematsen

Page 2: Using model-based statistical inference to learn about evolution

My group develops mathematical and computational tools for model-based statistical inference on continuous and discrete mathematical objects motivated by evolutionary sequence analysis of microbes and the immune system.

Page 3: Using model-based statistical inference to learn about evolution

What is model-based statistical inference?

Page 4: Using model-based statistical inference to learn about evolution

Modern technology gives us the ability to observe in great detail.

Page 5: Using model-based statistical inference to learn about evolution

But very detailed observation is not the same as understanding

To understand we need to simplify and abstract.

Page 6: Using model-based statistical inference to learn about evolution

What abstractions do we have at our disposal?

Page 7: Using model-based statistical inference to learn about evolution

 

3

Page 8: Using model-based statistical inference to learn about evolution

 

x

Page 9: Using model-based statistical inference to learn about evolution

$x$ is useful and we love it dearly!

$x$ allows us to describe knowledge in an implicit way:

$f(x) = y$

then we can work towards solving for $x$.

Alternatively, one might be interested in taking the average of $f(x)$ between two values $a$ and $b$.

Page 10: Using model-based statistical inference to learn about evolution

Define $\int_a^b f(x)\,dx$ as area

[Figure: plot with $a$ and $b$ marked on the horizontal axis]

Page 11: Using model-based statistical inference to learn about evolution

$1/(b - a) \cdot \int_a^b f(x)\,dx$ is average

[Figure: plot with $a$ and $b$ marked on the horizontal axis; label: average on $(a, b)$]
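As a rough numerical illustration of the average defined above, the following sketch approximates $1/(b - a) \cdot \int_a^b f(x)\,dx$ with the midpoint rule. The function name `average_on_interval`, the step count, and the example function are assumptions chosen for illustration, not part of the talk.

```python
def average_on_interval(f, a, b, n=10_000):
    """Approximate 1/(b - a) * integral of f from a to b with the midpoint rule."""
    width = (b - a) / n
    # Sum the areas of n thin rectangles evaluated at their midpoints.
    area = sum(f(a + (i + 0.5) * width) for i in range(n)) * width
    return area / (b - a)


# Example: the exact average of x**2 on (0, 3) is 3.
avg = average_on_interval(lambda x: x * x, 0.0, 3.0)
```

Dividing the area by the interval length is what turns the integral into an average.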

Page 12: Using model-based statistical inference to learn about evolution

Variables allow us to solve

[Figure: diagram labeled $x$, $y$, and ?]

Problem 1: given $y$, solve for $x$.
Problem 2: predict if a 10% bigger charge will hit the castle.
Say the answer to this is $\mathrm{hit}_{10}(x)$, such that $\mathrm{hit}_{10}(x)$ is 1 if that will make the cannonball hit the castle, and 0 otherwise.

Page 13: Using model-based statistical inference to learn about evolution

Variables allow us to solve

[Figure: diagram labeled $x$, $y$, and ?]

… in a deterministic framework.
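A minimal sketch of the two deterministic problems, under assumptions the slides do not state: $x$ is a launch angle, $y$ is the idealized range $v^2 \sin(2x)/g$ with no air resistance, a bigger charge scales $v^2$ proportionally, and the castle occupies a made-up interval of distances. The constants `V0` and `castle` are hypothetical.

```python
import math

G = 9.81    # gravitational acceleration, m/s^2
V0 = 30.0   # assumed muzzle speed, m/s (hypothetical)


def f(x, charge=1.0):
    """Distance y for angle x (radians); assumes v^2 scales with the charge."""
    return charge * V0 ** 2 * math.sin(2 * x) / G


def solve_for_x(y):
    """Problem 1: invert f on the low-angle branch (0 <= x <= pi/4)."""
    return 0.5 * math.asin(G * y / V0 ** 2)


def hit10(x, castle=(85.0, 95.0)):
    """Problem 2: 1 if a 10% bigger charge lands inside the castle, else 0."""
    return 1 if castle[0] <= f(x, charge=1.1) <= castle[1] else 0


x = solve_for_x(80.0)   # angle that sends the ball 80 m
answer = hit10(x)       # does the bigger charge reach the castle?
```

Everything here is a fixed number: one $y$ gives one $x$, and `hit10` returns a definite 0 or 1.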

Page 14: Using model-based statistical inference to learn about evolution
Page 15: Using model-based statistical inference to learn about evolution
Page 16: Using model-based statistical inference to learn about evolution

Life is a probabilistic process.

How do we abstract probabilistic quantities?

Page 17: Using model-based statistical inference to learn about evolution

X

Page 18: Using model-based statistical inference to learn about evolution

Random variables $X$ abstract variables

It doesn’t have a fixed value: we have to “ask” it for a value.

Random variables are capricious, but they are well defined behind their stochastic exterior.

Page 19: Using model-based statistical inference to learn about evolution

Random variable sampling determined by distributions

Sometimes discrete:

$P(\text{heads}) = 0.51$
$P(\text{tails}) = 0.49$
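The discrete case can be sketched as "asking" a random variable for values many times. This is only an illustration: the fixed seed, the number of draws, and the name `ask_coin` are assumptions.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable


def ask_coin():
    """Ask the random variable for a value: heads with probability 0.51."""
    return "heads" if rng.random() < 0.51 else "tails"


draws = [ask_coin() for _ in range(100_000)]
frac_heads = draws.count("heads") / len(draws)
```

Each individual draw is unpredictable, but the fraction of heads settles near 0.51, which is the sense in which the variable is well defined behind its stochastic exterior.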

Sometimes continuous:

Page 20: Using model-based statistical inference to learn about evolution

Working with random variables $X$:

We can solve for $X$ in “equations” like $f(X) \sim Y$, obtaining expressions such as $P(X \mid Y)$; this is called inference.

We can also average with respect to $X$:

$\int f(X)\,dP(X \mid Y)$

where now we are averaging out with respect to a probability.
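One common way to evaluate such an average in practice, offered here as an aside rather than as something the slides state, is Monte Carlo: draw samples $X_1, \dots, X_N$ from $P(X \mid Y)$ and average $f$ over them. The sample size $N$ and the samples $X_i$ are notation introduced for this illustration.

```latex
\int f(X)\,dP(X \mid Y) \;\approx\; \frac{1}{N}\sum_{i=1}^{N} f(X_i),
\qquad X_1,\dots,X_N \sim P(X \mid Y).
```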

Page 21: Using model-based statistical inference to learn about evolution

Probabilistic approach to prediction

[Figure: diagram labeled $X$, $Y$, and ?]

$Y$: horizontal distance traveled by a cannonball (random variable)
$X$: cannon angle (inferred random variable)

Problem 1: given observed distribution $Y$, infer distribution of $X$.
Problem 2: find probability that a 10% bigger charge will hit castle.

1. Solve $f(X) = Y$ to get $P(X \mid Y)$.
2. Integrate $\int \mathrm{hit}_{10}(X)\,dP(X \mid Y)$.
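The two steps above can be sketched with a grid approximation. This is a toy under loudly stated assumptions, none of which come from the slides: the range formula $v^2 \sin(2x)/g$, Gaussian measurement noise, a flat prior on angles below 45 degrees (which avoids the two-angle ambiguity), a charge that scales $v^2$, and made-up observations and castle position.

```python
import math

G = 9.81               # gravitational acceleration, m/s^2
V0 = 30.0              # assumed muzzle speed, m/s (hypothetical)
SIGMA = 2.0            # assumed noise on measured distance, m (hypothetical)
CASTLE = (85.0, 95.0)  # assumed castle extent, m (hypothetical)


def distance(angle_deg, v2_scale=1.0):
    """Idealized range f(x) = v^2 sin(2x) / g, with v^2 scaled by the charge."""
    return v2_scale * V0 ** 2 * math.sin(2 * math.radians(angle_deg)) / G


def posterior(observed, angles):
    """Step 1: grid approximation of P(X | Y), flat prior, Gaussian noise."""
    log_w = [
        sum(-0.5 * ((y - distance(a)) / SIGMA) ** 2 for y in observed)
        for a in angles
    ]
    top = max(log_w)  # subtract the maximum for numerical stability
    w = [math.exp(lw - top) for lw in log_w]
    total = sum(w)
    return [wi / total for wi in w]


def hit10(angle_deg):
    """1 if a 10% bigger charge (assumed to scale v^2 by 1.1) hits the castle."""
    return 1 if CASTLE[0] <= distance(angle_deg, v2_scale=1.1) <= CASTLE[1] else 0


angles = [0.5 * i for i in range(1, 90)]   # 0.5 to 44.5 degrees
observed = [80.2, 78.9, 81.5, 79.7]        # made-up distances, m
post = posterior(observed, angles)

# Step 2: integrate hit10 against the posterior (a sum on the grid).
p_hit = sum(p * hit10(a) for p, a in zip(post, angles))
```

The answer is now a probability rather than a 0 or 1, because uncertainty about the angle is carried through to the prediction.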

Page 22: Using model-based statistical inference to learn about evolution

Biological experiments are measurements with uncertainty

[Figure: diagram labeled ?, $X$, $Y$ with example sequences (CATTCTTGTACG, GTTCGGCGAAGA, GCGTAAAATAGG, AGGGGTTGCATG, CTTCACTGGCAT) and the labels “expression level of certain genes” and “risk”]

Page 23: Using model-based statistical inference to learn about evolution

Model-based statistical inference ✓

We can solve for $X$ in “equations” like $f(X) \sim Y$, inferring an unknown distribution for $X$ (what can we learn about the angle of the cannon).

We can push uncertainty through an analysis using integrals like

$\int_a^b f(X)\,dP(X \mid Y).$

(we don’t care what the angle of the cannon is really, we just want to know with what probability the shot is going to hit the castle!)

Page 24: Using model-based statistical inference to learn about evolution

Now, what is model-based statistical inference on discrete mathematical objects?

Page 25: Using model-based statistical inference to learn about evolution

Motivation: we would like to decide whether an individual has been superinfected, i.e. infected with a second viral variant in a separate event

[Figure panels: single infection, superinfection]

Page 26: Using model-based statistical inference to learn about evolution

Integrate out phylogenetic uncertainty

[Figure: diagram labeled ?, $X$, $Y$ with example sequences (CATTCTTGTACG, GTTCGGCGAAGA, GCGTAAAATAGG, AGGGGTTGCATG, CTTCACTGGCAT)]

To decide superinfection, we would like to calculate

$\int_S f(X)\,dP(X \mid Y)$

where $X$ is now a phylogenetic-tree-valued random variable.

Page 27: Using model-based statistical inference to learn about evolution

Time to count your blessings.

Real numbers are equipped with a total order. ($3 < 4$)
Real numbers are equipped with a simply-computed distance that is compatible with the total order. ($|7 - 3| = 4$)
Real numbers form a continuum. ($2.9 < 2.95 < 3$)

Page 28: Using model-based statistical inference to learn about evolution

We can thus define the integral

[Figure: interval from $a$ to $b$]

for real-valued $\int_a^b f(x)\,dx$ and $\int_a^b f(X)\,dP(X \mid Y)$.

Page 29: Using model-based statistical inference to learn about evolution

Integrating over phylogenetic trees?

Phylogenetic trees have discrete topologies; there is no canonical distance between them, nor a natural total order.

But we still want to do inference and integration in this setting!

[Figure: sequence alignment (ACATGGCTC..., ATACGTTCC..., TTACGGTTC..., ATCCGGTAC..., ATACAGTCT...)]

Joint work with postdoc Chris Whidden.

Page 30: Using model-based statistical inference to learn about evolution

Notion of proximity of trees?

Page 31: Using model-based statistical inference to learn about evolution

Subtree-prune-regraft (rSPR) definition

[Figure: trees on leaves 1 to 6 illustrating a single rSPR move]

These trees are then distance 1 apart.

Page 32: Using model-based statistical inference to learn about evolution

Tree graph connected by rSPR moves

Page 33: Using model-based statistical inference to learn about evolution

Tree inference bounces around graph

Page 34: Using model-based statistical inference to learn about evolution

Probability is # of visits to nodes
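The idea that visit counts estimate probabilities can be sketched with a Metropolis-Hastings walk on a toy graph. This is an illustration only: the four nodes, their edges, and their weights are made up, and real tree inference works on a vastly larger graph with likelihoods computed from sequence data.

```python
import random
from collections import Counter

# Hypothetical toy "tree graph": nodes stand in for topologies,
# edges for single rSPR moves.
NEIGHBORS = {
    "T1": ["T2", "T3"],
    "T2": ["T1", "T3", "T4"],
    "T3": ["T1", "T2"],
    "T4": ["T2"],
}
# Made-up unnormalized posterior weights (normalized: 0.4, 0.3, 0.2, 0.1).
WEIGHT = {"T1": 4.0, "T2": 3.0, "T3": 2.0, "T4": 1.0}


def metropolis_hastings(start, n_steps, seed=0):
    """Random walk on the graph; returns how often each node was visited."""
    rng = random.Random(seed)
    current = start
    visits = Counter()
    for _ in range(n_steps):
        proposal = rng.choice(NEIGHBORS[current])
        # Hastings ratio corrects for nodes having different numbers of neighbors.
        ratio = (WEIGHT[proposal] * len(NEIGHBORS[current])) / (
            WEIGHT[current] * len(NEIGHBORS[proposal])
        )
        if rng.random() < ratio:
            current = proposal
        visits[current] += 1
    return visits


n_steps = 200_000
visits = metropolis_hastings("T1", n_steps)
estimate = {node: visits[node] / n_steps for node in NEIGHBORS}
```

With enough steps, `estimate` approaches the normalized weights, so the fraction of visits to a node serves as its estimated posterior probability.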

Page 35: Using model-based statistical inference to learn about evolution

Subset to high probability nodes

node size proportional to posterior probability; color shows distance to highest probability tree.

Page 36: Using model-based statistical inference to learn about evolution

The top 4096 trees for a data set

Page 37: Using model-based statistical inference to learn about evolution

Graph effects matter

For more details:

Chris Whidden and FM. Quantifying MCMC exploration of phylogenetic tree space. Systematic Biology 2015.

… so what do we know about this graph?

Page 38: Using model-based statistical inference to learn about evolution

Is the tree graph positively curved?

Page 39: Using model-based statistical inference to learn about evolution

Is it flat?

Page 40: Using model-based statistical inference to learn about evolution

Is it negatively curved?

Page 41: Using model-based statistical inference to learn about evolution

[Figure: curvature versus SPR distance; labels: imbalanced, balanced]

Page 42: Using model-based statistical inference to learn about evolution

Model-based statistical inference on discrete and continuous mathematical objects ✓

When we perform inference on $f(X) \sim Y$, we can have $X$ be something continuous, discrete, or continuous and discrete.

Discrete-ness brings special challenges; graphs are helpful.

Page 43: Using model-based statistical inference to learn about evolution

Next: use model-based statistical inference to learn about adaptive immunity

Joint with Trevor Bedford (VIDD), Connor McCoy (now at Google), Vladimir Minin (UW Statistics), and Duncan Ralph (postdoc).

Data from Harlan Robins (PHS/Adaptive).

Page 44: Using model-based statistical inference to learn about evolution

Jenner’s 1796 vaccine

A revolutionary advance.

Page 45: Using model-based statistical inference to learn about evolution

Where are we 200 years later?

Vaccine trials still take a long time and are very costly.

Page 46: Using model-based statistical inference to learn about evolution

Where are we 200 years later?

[Cartoon caption: Just invented vaccines. I rock. LOL]

Vaccine trials still take a long time and are very costly.

Page 47: Using model-based statistical inference to learn about evolution

Vaccines manipulate the adaptive immune system

Current practice for trials:

1. Stimulate immune system
2. Battle-test immune system via pathogen exposure

What can we learn from antibody-making B cells without battle-testing?

Page 48: Using model-based statistical inference to learn about evolution

Antibodies bind antigens

Page 49: Using model-based statistical inference to learn about evolution

B cell diversification process

[Figure labels: V genes, D genes, J genes; VDJ rearrangement including erosion and non-templated insertion; naive B cell; antigen; somatic hypermutation; affinity maturation; experienced B cell]

Page 50: Using model-based statistical inference to learn about evolution

Overall goal: reconstruct process

[Figure: sequences (ACATGGCTC..., ATACGTTCC..., TTACGGTTC..., ATCCGGTAC..., ATACAGTCT...); labels: reality, inference]

Page 51: Using model-based statistical inference to learn about evolution

Why reconstruct B cell lineages?

1. Vaccine design

This one is really good. How can we elicit it?

Page 52: Using model-based statistical inference to learn about evolution

Why reconstruct B cell lineages?

1. Vaccine design

Page 53: Using model-based statistical inference to learn about evolution

Why reconstruct B cell lineages?

1. Vaccine design

2. Vaccine assay

Page 54: Using model-based statistical inference to learn about evolution

Why reconstruct B cell lineages?

1. Vaccine design

2. Vaccine assay

3. Evolutionary analysis to learn about underlying mechanisms

Page 55: Using model-based statistical inference to learn about evolution

Goal 1: how are antibodies “drafted”?

[Figure: sequences (ACATGGCTC..., ATACGTTCC..., TTACGGTTC..., ATCCGGTAC..., ATACAGTCT...); labels: reality, rearrangement groups]

Page 56: Using model-based statistical inference to learn about evolution

“Solve” $f(X) \sim Y$, where

[Figure labels: V genes, D genes, J genes; VDJ rearrangement including erosion and non-templated insertion; naive B cell; antigen; somatic hypermutation; affinity maturation; experienced B cell]

$f$ is a statistical model of recombination and maturation
$X$ are parameters of that model (including clusters)
$Y$ are antibody repertoire sequences

Page 57: Using model-based statistical inference to learn about evolution

VDJ annotation problem: from where did each nucleotide come?

[Figure labels: biological process (3’ V deletion, VD insertion, 5’ D deletion, 3’ D deletion, 5’ J deletion, DJ insertion, somatic hypermutation); sequencing (sequencing primer, sequencing error); inference]

This is a key first step in BCR sequence analysis.

Page 58: Using model-based statistical inference to learn about evolution

Rich probabilistic models work

[Figure: frequency (0.0 to 0.3) versus Hamming distance (0 to 15) for partis (k=5), partis (k=1), ighutil, iHMMunealign, igblast, and imgt; panel label: HTTN]
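For readers unfamiliar with the horizontal axis, Hamming distance counts the positions at which two equal-length sequences differ. The sketch below is a generic definition; which pair of sequences the plot compares is not stated on the slide, so the "truth" and "inferred" strings here are hypothetical.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


truth = "CATTCTTGTACG"      # hypothetical true sequence
inferred = "CATACTTGTACC"   # hypothetical inferred sequence
d = hamming_distance(truth, inferred)  # differs at two positions
```

A method whose distances pile up near zero is recovering sequences that are close to the truth.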

Page 59: Using model-based statistical inference to learn about evolution

Integrate out annotation uncertainty for better clustering

Page 60: Using model-based statistical inference to learn about evolution

Goal 2: how are antibodies “revised”?

Estimate per-residue level of natural selection on receptor sequences from healthy individuals.

$\omega = dN/dS$

■ Large $\omega$: diversifying sites
■ $\omega$ near 1: neutral sites
■ Small $\omega$: purifying sites

Page 61: Using model-based statistical inference to learn about evolution

[Figure: In antibodies. Codons AAC, AAG, GTG, GTC; labels: more likely, less likely]

Page 62: Using model-based statistical inference to learn about evolution

[Figure: For selection. CCA and CCT (both Pro): synonymous; ACC (Thr) and ATC (Ile): nonsynonymous]

[Figure: In antibodies. Codons AAC, AAG, GTG, GTC; labels: more likely, less likely]

Page 63: Using model-based statistical inference to learn about evolution

[Figure: For selection. CCA and CCT (both Pro): synonymous; ACC (Thr) and ATC (Ile): nonsynonymous]

[Figure: In antibodies. Codons AAC, AAG, GTG, GTC; labels: more likely, less likely]

Solution: use “out-of-frame” sequences to determine neutral mutation rate.
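The synonymous versus nonsynonymous distinction used above can be sketched with the standard genetic code. This only classifies a pair of codons; it is not an estimator of $\omega$, and the names `CODON_TABLE` and `classify` are introduced here for illustration.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon, codons enumerated in TCAG order.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify(codon_from, codon_to):
    """Label a codon change as synonymous or nonsynonymous."""
    same = CODON_TABLE[codon_from] == CODON_TABLE[codon_to]
    return "synonymous" if same else "nonsynonymous"


pro_pair = classify("CCA", "CCT")   # both encode proline (P)
thr_ile = classify("ACC", "ATC")    # threonine (T) versus isoleucine (I)
```

The two example pairs are the ones shown on the slide: CCA and CCT both encode proline, while ACC and ATC encode different amino acids.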

Page 64: Using model-based statistical inference to learn about evolution

[Figure labels: antigen, light chain; legend: purifying, neutral, diversifying]

Page 65: Using model-based statistical inference to learn about evolution

Conclusion

We like to “solve equations” like $f(X) \sim Y$, where $X$ and $Y$ are random variables.
We especially like the case when $Y$ is sequence data and $X$ is something weird.
We can use these tools to learn about B cell receptor sequence evolution.

Page 66: Using model-based statistical inference to learn about evolution

Next steps: phylogenetics

Understand the impact of data on curvature
Extend work to other models of tree space
Use understanding to design biased proposals that don’t get stuck
Implement phylogenetic algorithms that can update trees given more sequences
Continue building community with phyloseminar.org and phylobabble.org

Page 67: Using model-based statistical inference to learn about evolution

Next steps: B cells

[Figure: sequences (ACATGGCTC..., ATACGTTCC..., TTACGGTTC..., ATCCGGTAC..., ATACAGTCT...); labels: reality, inference]

Learn more about the mutation process in B cell maturation to better reconstruct ancestral sequences; evolutionary dynamics

Etiology of Burkitt’s lymphoma

Page 68: Using model-based statistical inference to learn about evolution

Next steps: B cells

Origin of protective antibodies; optimization of vaccination strategies

Watching immune repertoires evolve through time

Page 69: Using model-based statistical inference to learn about evolution

Wish I had time to talk about

Evolution of innate immunity & viral antagonists; Origin of SIVcpz

Founder HIV sequence identification for sieve analysis

Page 70: Using model-based statistical inference to learn about evolution

Wish I had time to talk about

Human microbiome

Simian foamy virus variation; innate immune defense

Page 71: Using model-based statistical inference to learn about evolution

Wish I had time to talk about

HIV superinfection
Drug resistance mutations

Page 72: Using model-based statistical inference to learn about evolution

Thank you to my group members

Page 73: Using model-based statistical inference to learn about evolution

Thank you to the Fred Hutch community

Brilliant students, postdocs, and staff scientist collaborators
Computational biology program, esp. “scouts” and Marty
Fantastic admin support: Sara, Melissa, and Anissa
Fantastic computing support: esp. Dirk, Carl, Erik, and Michael
fredhutch.io supporters: Katie P, Dan G, and Garnet
Patience with my meddling: Larry, Myra, Jon C