Upload
others
View
3
Download
0
Embed Size (px)
Citation preview
Transmission tree reconstruction by augmentation ofinternal phylogeny nodes
Matthew Hall
Li Ka Shing Institute for Health Information and Discovery, University of Oxford
February 2017
Matthew Hall (Oxford) Transmission tree reconstruction February 2017 1 / 29
The relationship of the phylogeny to the transmission tree
Let T be a time-tree (rooted, with branch lengths in units of time).
Let V be its node set of size n.
Suppose the isolates at the tips of T come from a set of H of hosts.
Initial assumptions:
Complete sampling of the epidemic since the TMRCANo superinfection or reinfectionTransmission is a complete bottleneck
Matthew Hall (Oxford) Transmission tree reconstruction February 2017 2 / 29
The relationship of the phylogeny to the transmission tree
The transmission tree N (aDAG whose nodes are themembers of H, depicting whichhost infected which other) canbe represented by a mapd : V → H taking each node toa host (tips to the host theywere sampled from).
Visualised by collapsing thenodes in the preimage of eachh ∈ H under d to a single node.
Matthew Hall (Oxford) Transmission tree reconstruction February 2017 3 / 29
The relationship of the phylogeny to the transmission tree
The transmission tree N (aDAG whose nodes are themembers of H, depicting whichhost infected which other) canbe represented by a mapd : V → H taking each node toa host (tips to the host theywere sampled from).
Visualised by collapsing thenodes in the preimage of eachh ∈ H under d to a single node. J
A
B
G
F
H C
I
E
D
Matthew Hall (Oxford) Transmission tree reconstruction February 2017 3 / 29
The simplest version
Assume that the phylogeny andtransmission tree coincide;internal nodes are transmissionevents.
This implies no within-hostdiversity and necessitates nomore than one tip per host.
If n is internal with children nC1
and nC2, then eitherd(n) = d(nC1) ord(n) = d(nC2).
Trivially 2n−1 transmission treesfor a fixed T .
Matthew Hall (Oxford) Transmission tree reconstruction February 2017 4 / 29
Within-host diversity
If within-host diversity isassumed then internal nodes arecoalescences of two lineageswithin a host.
The subgraph induced by thepreimage of d for any host mustbe connected.
An extra set of parameters qrepresent the infection times.
Question: How manytransmission trees for a fixed T ?(Depends on the topology.)
With one tip per host?With ≥ 1 tip per host?(Sometimes 0.)
Matthew Hall (Oxford) Transmission tree reconstruction February 2017 5 / 29
Simultaneous MCMC reconstruction of phylogeny andtransmission tree
In either case we get an (injective but not surjective) map z from theset of possible ds to the space of transmission trees.
Thus an MCMC method that samples from the posterior distributionof phylogenies with internal node augmentation obeying either set ofrules simultaneously samples from the posterior distribution oftransmission trees.
Not only a method for reconstructing N , but a population model(tree prior) for reconstruction of T that is more realistic for anoutbreak than the standard unstructured coalescent models.
Matthew Hall (Oxford) Transmission tree reconstruction February 2017 6 / 29
Decomposition
Let S be the sequence data and φ the various model parameters.
Without within-host diversity:
p(T , d , φ|S) =p(S |T )p(T , d |φ)p(φ)
p(S)
p(S |T ) is the standard phylogenetic likelihood and p(T , d |φ) theprobability of observing the augmented tree under a transmissionmodel.
With within-host diversity:
p(T , d , q, φ|S) =p(S |T )p(T |N , q, φ)p(N , q|φ)p(φ)
p(S)
p(N , q|φ) is the probability of the transmission tree and its timings asabove; p(T |N , q, φ) is the probability of the within-hostmini-phylogenies under a coalescent process.
Matthew Hall (Oxford) Transmission tree reconstruction February 2017 7 / 29
MCMC implementation
Hall et al., 2015 implementedsimultaneous reconstruction ofboth trees in BEAST, withMCMC proposals that respectthe rules of node augmentation.
Several other approaches (e.g.Didelot et al., 2014, Morelli etal., 2012; Ypma et al., 2013;Klinkenberg et al., 2017) withrecent work on the incompletesampling problem (Didelot etal., 2016; Lau et al., 2016).
i)
ii)
iii)
ii)i)
iii)
iv)
50%
50%
Exchange
Subtree slide
Wilson-Balding
50%
50%
Exchange
Subtree slide
Wilson-Balding
Matthew Hall (Oxford) Transmission tree reconstruction February 2017 8 / 29
43617 tips
Matthew Hall (Oxford) Transmission tree reconstruction February 2017 9 / 29
The BEEHIVE study
NGS short-read sequence data acquired from samples taken fromEuropean (and one African) HIV cohort studies.
Some cohorts go back to the early epidemic in the 1980s
Current data from 3138 individuals
Epidemiology: age, gender, date of first positive test, countries oforigin and infection, risk group, ART dates, etc.
Sequences from one time point only (with a few exceptions)
Rather than making a consensus sequence from each host’s reads, wewant to use everything.
Matthew Hall (Oxford) Transmission tree reconstruction February 2017 10 / 29
Phyloscanner: phylogenetic analysis of NGS pathogen data
sequencingmapping to references
Idea: align all short reads from all hosts to a reference genome andslide a window across the genome, building a phylogeny for the readsoverlapping each window.
Matthew Hall (Oxford) Transmission tree reconstruction February 2017 11 / 29
Phyloscanner: phylogenetic analysis of NGS pathogen data
Identical reads from a single host are merged but the duplicate countskept as tip traits
We use RAxML for reconstruction
Tips are not associated with each other across different windows, buthosts are.
Matthew Hall (Oxford) Transmission tree reconstruction February 2017 12 / 29
The topological signal of transmission
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●●
●●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●●
Once we have many tips fromeach host, transmission has atopological signal.
Direct transmission is suggestedwhen the clade from theinfectee is not monophyletic(Romero-Severson et al., 2016)but in general we only see thedirection of transmission fromthe topology.
Starts to look like a parsimonyproblem.
Matthew Hall (Oxford) Transmission tree reconstruction February 2017 13 / 29
The topological signal of transmission
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●●
●●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●●
Once we have many tips fromeach host, transmission has atopological signal.
Direct transmission is suggestedwhen the clade from theinfectee is not monophyletic(Romero-Severson et al., 2016)but in general we only see thedirection of transmission fromthe topology.
Starts to look like a parsimonyproblem.
Matthew Hall (Oxford) Transmission tree reconstruction February 2017 13 / 29
The topological signal of transmission
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●●
●●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●●
Once we have many tips fromeach host, transmission has atopological signal.
Direct transmission is suggestedwhen the clade from theinfectee is not monophyletic(Romero-Severson et al., 2016)but in general we only see thedirection of transmission fromthe topology.
Starts to look like a parsimonyproblem.
Matthew Hall (Oxford) Transmission tree reconstruction February 2017 13 / 29
Challenges reconstructing transmission from this data
43617 tips
Datasets:
Enormous sizeContaminationpresentCoverage is uneven
Epidemiology:
Sampling isincompleteMultiple infectionspresentBottleneck attransmission may bewide (IDUs)
Matthew Hall (Oxford) Transmission tree reconstruction February 2017 14 / 29
Transmission tree reconstruction using parsimony
For a fixed tree, we aim to:
Reconstruct hosts from those represented in the tips to internal nodesin the tree
But also allow reconstruction to “a host outside the dataset’ asrequired by incomplete sampling
Minimise the number of infection events amongst hosts in thedataset. . .
. . . except, penalise reconstructions which suggest an unreasonableamount of genetic diversity stemming from a single infection event.
Identify multiple infections and contaminations
Matthew Hall (Oxford) Transmission tree reconstruction February 2017 15 / 29
Transmission tree reconstruction using parsimony
Suppose the T has nodes V and we are reconstructing charactersfrom the set of states S .
The cost function c(p, q; i , j) determines the cost of transitioningfrom state i to state j along the branch from p to q (in the directionaway from the root).
We take a node- and edge- dependent c on a known tree; costs fortransitions vary depending on the states involved and the branch onwhich they occur.
The lowest cost reconstruction is found with the Sankhoff algorithm.
Matthew Hall (Oxford) Transmission tree reconstruction February 2017 16 / 29
Transmission tree reconstruction using parsimony
If reads are taken from n hosts h1, . . . , hn making up the studypopulation, we use the hi as states along with an “unsampled” stateu.
Assume that the root of the tree was in the unsampled state (usingan outgroup if required). Tips from outside the study population (theoutgroup, other reference sequences, contaminants) are assigned u asa state.
We are interested in minimising the cost of infections of members ofthe study population, but not hosts outside that population, soc(p, q; h, u) = 0 for all p, q, h.
Reconstruction starts by putting a u on the root. (Otherwise it willalways be cheap to reconstruct some hi to the root, plausible or no.)
Matthew Hall (Oxford) Transmission tree reconstruction February 2017 17 / 29
Transmission tree reconstruction using parsimony
Let l(p, i) be the sum of thebranch lengths in the subtreerooted at p pruned so that onlytips from i remain, or ∞ if thereare no such tips. Then we takec(p, q; i , j) = 1 + kl(p, j) withk ∈ R+ a tunable parameter.
This penalises reconstructionswith unreasonable amounts ofwithin-host diversity (right).
●
●
●
●
●
●
●
●
●
●
r r
o o
p p
s s
hu
l1
l2
Left: C = 1 + k(l1 + l2)Right: C = 2
Matthew Hall (Oxford) Transmission tree reconstruction February 2017 18 / 29
Transmission tree reconstruction using parsimony
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
c(p, q; h, u) = c(p, q, h, u) for all p, q, h. Above trees have equal C .Choose to break ties always towards h (for dense sampling), alwaystowards u (conservative), or based on branch lengths.
Matthew Hall (Oxford) Transmission tree reconstruction February 2017 19 / 29
Parsimony for detection of contaminants
Small numbers of contaminant reads are frequent, where a virus fromthe wrong host has been sequenced
The parsimony reconstruction does double duty to detect these
Find “multiple introductions” where the tips in one split have verylow read counts, and ignore those tips
Matthew Hall (Oxford) Transmission tree reconstruction February 2017 20 / 29
Without the assumptions of asingle infection event per hostand complete sampling, the“transmission tree” has multiplenodes per host and unsamplednodes.
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
AB1
B2●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
B
Augmented phylogeny "Transmission tree"
UA
Matthew Hall (Oxford) Transmission tree reconstruction February 2017 21 / 29
Example
Known HIV transmission chain(Lemey et al., 2005, Vranckenet al., 2014).
Reconstruction on RAxML treesfor env and pol genes
Host True env polA B? U UB A? U AC B B BD C D DE C C CF A U AG F U FH B B BI B B BK E E EL C C C
I
E
F
B A
GL D
C
K
H
?
Matthew Hall (Oxford) Transmission tree reconstruction February 2017 22 / 29
Full phyloscanner results
Full output from phyloscanner is a separate phylogeny for the reads ineach genome window
We would like to use these in a manner similar to bootstrapping, toindicate support for topological relationships
The phylogenies are not trivially comparable across windows as thetips are not the same
The hosts are the same, so the trasmission trees are morecomparable, but:
Some hosts are absent from some windows due to sequencing problemsOne or more nodes for unsampled regionsPotentially multiple nodes per host
Question: Can we nonetheless give a summary or mediantransmission tree?
Matthew Hall (Oxford) Transmission tree reconstruction February 2017 23 / 29
Classification of relationships
For the time being, concentrate on classifying pairwise relationshipsbetween hosts on each window
Contiguity: is the subgraph induced by the nodes from the pair, withperhaps some unsampled nodes, connected?
Descent: Are all the nodes from one patient ancestral to those fromthe other?
Mean patristic distance in the phylogeny
Then count relationships across windows
Matthew Hall (Oxford) Transmission tree reconstruction February 2017 24 / 29
Improved HIV cluster detection
By setting a threshold on patristic distance we can refine the proceduresfor identifying HIV clusters with likely directionality.
5
7
15
13
2
3
311
12
1
BEE0221-1
BEE0239-1
BEE0220-1
BEE0260-1
Matthew Hall (Oxford) Transmission tree reconstruction February 2017 25 / 29
Improved HIV cluster detection
7
6
5
12
2
1
7
7
9
7
1
3
23
6
2
8
9
9
5
3
8
1
42
15
5
3
11
6 6
1
20
24
6
13
1
11
3
11
6
7
1
1
7
2
6
2
1
2
16
62
4
5
6
3
1
7
3
13
99
3
6
6
1
7
2
1
1
12
8
7
3
6
42
22
1
73
8
14
3
69
10
1
1
6
6
9
14
1
23
2
11
3
7
5
3
3
6
4
18
2
9
3
2
2
4
512
5
6
2
3
3
63
4
7
1
5
10
2
5
2
7
83
11
7
2
51
9
10
6
15
6
3
4
6
15
1
2
10
5
1
3
11
4
3
13
7
2
13
9
6
1
9
2
2
27
20
1
116
2
4
6
7
1
2
1
6
2
7
1
4
6
4
10
14
6
1
7 10
1
8
1
14
6
13
7
12
3
2
1
7
16
4
3
8
10
216
1
1
65
1
1 115
14
6
6
13
7
1
8
9
11
6
2
2
6
3
2
5
6
57
2
5
21
4
3
5
4 5
36
3
11
7
3
5
7
1
14
12
7
7
18
3
2
8
3
12
6
44
2
3
3
4
2
10
3
5
5
1
2
10
1
1
7
2
3
12
4
3
10
218
2
2
2
4
1
1
14
2
9
8
215
2
4
1
4
54
5
2
4
4
11 12
2
15
11
1
7
6
5
4
12
13
4
10
1
7
7
24
8
1
1
2
4
4
5
3
2
4
3
62
15
4
1
4
2
18
9
3
1 4
13
9
7
12
7
1
2
6
1
1 19
8
5
2
14
1 10
1
3
6
14
4
3
6
711
1
13
1
8
BEE0795-1
BEE0254-1
BEE0248-1
BEE0726-1
BEE0093-1
BEE0015-1
BEE0201-1
BEE0197-1
BEE0095-1
BEE0098-1
BEE0110-1
BEE0577-1
BEE0327-1
BEE0442-1
BEE0323-1
BEE0394-1
BEE0037-1
BEE0036-1
BEE0198-1
BEE0843-1
BEE0840-1
BEE0240-1
BEE0826-1
BEE0728-1
BEE0032-1
BEE0175-1
BEE0009-1
BEE0215-1BEE0804-1
BEE0527-1
BEE0564-1
BEE0738-1
BEE0023-1
BEE0112-1BEE0351-1
BEE0321-1
BEE0427-1
BEE0331-1
BEE0426-1
BEE0809-1
BEE0109-1
BEE0102-1
BEE0205-1BEE0100-1
BEE0259-1BEE0042-1BEE0529-1
BEE0234-1
BEE0096-1
BEE0241-1
BEE0243-1BEE0193-1
BEE0200-1 BEE0146-1
BEE0117-1
BEE0174-1
BEE0735-1
BEE0043-1
BEE0062-1 BEE0115-1
BEE0018-1
BEE0347-1
BEE0567-1
BEE0122-1
BEE0542-1
BEE0541-1BEE0019-1
BEE0141-1BEE0803-1
BEE0260-1 BEE0791-1
BEE0237-1
BEE0592-1BEE0181-1
BEE0017-1 BEE0758-1
BEE0139-1
BEE0057-1
BEE0107-1
BEE0014-1 BEE0767-1
BEE0114-1
BEE0108-1
BEE0345-1
BEE0206-1
BEE0395-1
BEE0799-1
BEE0368-1
BEE0223-1BEE0012-1
BEE0464-1BEE0474-1
BEE0341-1
BEE0160-1
BEE0161-1
BEE0164-1
BEE0573-1
BEE0154-1
BEE0454-1
BEE0532-1
BEE0130-1
BEE0736-1 BEE0807-1
BEE0290-1
BEE0322-1
BEE0470-1
BEE0213-1
BEE0662-1
BEE0305-1
BEE0031-1
BEE0534-1
BEE0273-1
BEE0555-1
BEE0794-1 BEE0808-1
BEE0131-1
BEE0092-1
BEE0222-1
BEE0545-1 BEE0162-1
BEE0144-1BEE0572-1BEE0877-1 BEE0065-1BEE0231-1BEE0097-1
BEE0121-1
BEE0465-1
BEE0780-1
BEE0398-1
BEE0539-1
BEE0797-1
BEE0385-1
BEE0473-1
BEE0221-1BEE0050-1
BEE0220-1
BEE0560-1
BEE0734-1
BEE0040-1
BEE0258-1
BEE0284-1
BEE0849-1
BEE0867-1
BEE0053-1
BEE0301-1
BEE0353-1
BEE0696-1
BEE0789-1 BEE0709-1
BEE0168-1
BEE0547-1 BEE0723-1BEE0148-1
BEE0574-1 BEE0016-1BEE0851-1BEE0230-1BEE0225-1
BEE0725-1
BEE0182-1
BEE0178-1
BEE0552-1
BEE0669-1
BEE0059-1
BEE0710-1
BEE0657-1
BEE0219-1
BEE0285-1
BEE0330-1
BEE0798-1
BEE0120-1 BEE0214-1 BEE0584-1BEE0538-1BEE0733-1
BEE0686-1
BEE0034-1
BEE0543-1BEE0557-1
BEE0253-1
BEE0147-1
BEE0335-1
BEE0318-1
BEE0599-1 BEE0263-1
BEE0049-1
BEE0247-1
BEE0255-1
BEE0233-1
BEE0406-1
BEE0428-1
BEE0086-1
BEE0085-1
BEE0886-1
BEE0381-1
BEE0865-1
BEE0854-1BEE0212-1
BEE0885-1
BEE0563-1
BEE0287-1
BEE0386-1
BEE0409-1
BEE0437-1
BEE0821-1
BEE0741-1
BEE0815-1 BEE0811-1
BEE0196-1
BEE0142-1 BEE0227-1BEE0153-1 BEE0265-1
Matthew Hall (Oxford) Transmission tree reconstruction February 2017 26 / 29
Conclusions
The transmission tree can be viewed as an augmentation of theinternal nodes of a phylogeny with host information, subject tovarious sets of rules.
Considerable recent work in reconstructing it for smaller datasets,usually using MCMC.
Phyloscanner allows reconstruction with big, NGS datasets.
Work remains to be done for rigorous treatment of the output.
Matthew Hall (Oxford) Transmission tree reconstruction February 2017 27 / 29
Matthew Hall (Oxford) Transmission tree reconstruction February 2017 28 / 29
Acknowledgements
Oxford
Christophe Fraser
Chris Wymant
Imperial
Oliver Ratmann
Edinburgh
Andrew Rambaut
Mark Woolhouse
Matthew Hall (Oxford) Transmission tree reconstruction February 2017 29 / 29