Spatial Mapping of Malaria Parasite Genetics
Challenges and Opportunities of High Diversity Genetic Loci
Maxwell Murphy – UC Berkeley/UCSF
4/17/2019

Spatial Mapping of Malaria Parasite Genetics · 2020-07-20 · Spatial Mapping of Malaria Parasite Genetics: Challenges and Opportunities of High Diversity Genetic Loci · Maxwell Murphy – UC Berkeley/UCSF

Outline

• Applying Gaussian process regression to estimate allele frequency surfaces and classify the origin of clinical infections

• Challenges of utilizing polygenomic data

• Approaches to utilizing polygenomic data with high diversity genetic loci

Estimating Allele Frequencies Using Gaussian Processes

Motivation

• Classifying malaria cases as local vs. imported, and determining the origin of infection, is of great interest

• Spatial models of parasite diversity and distributions would make useful inputs to other spatial models

• Integrating spatial information with genetic data remains challenging

Application to VivaxGEN Data

• Publicly available microsatellite data from VivaxGEN

• 617 samples genotyped at 9 microsatellites

• Polyclonal samples were restricted to dominant alleles

Application to Regional Clinical Data

• 2585 samples collected from the northern region of Namibia (Tessema et al., eLife 2019)

• Genotyped at 26 microsatellite markers

• Polyclonal samples were restricted to dominant alleles

Takeaways

• Using vanilla Gaussian process (GP) regression with an exponential kernel and sufficiently deep sampling, enough spatial signal can be extracted to be useful for origin classification

• More nuanced approaches to modeling spatial covariance would likely be very fruitful in exposing the spatial connectivity of parasite populations

• Publicly available regional databases of genetic data will be critical for developing allele frequency maps
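A minimal sketch of vanilla GP regression with an exponential kernel applied to allele frequencies, assuming one GP per allele fitted to a 0/1 indicator of whether each sample's (dominant) allele is that allele, with fixed, untuned hyperparameters. The function names (`exponential_kernel`, `gp_mean`, `allele_frequency_surface`, `classify_origin`) are hypothetical, and the classifier simply picks the location under which a sample's alleles are most probable, treating loci as independent. This is an illustration under those assumptions, not the exact model used in the talk.

```python
import numpy as np


def exponential_kernel(a, b, variance=1.0, length_scale=1.0):
    """Exponential covariance k(d) = variance * exp(-d / length_scale)."""
    # Euclidean distance between every row of a and every row of b.
    d = np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1))
    return variance * np.exp(-d / length_scale)


def gp_mean(x_train, y_train, x_test, variance=1.0, length_scale=1.0, noise=0.1):
    """Posterior mean of GP regression with Gaussian noise, centred on mean(y)."""
    offset = y_train.mean()
    k = exponential_kernel(x_train, x_train, variance, length_scale)
    k_star = exponential_kernel(x_test, x_train, variance, length_scale)
    alpha = np.linalg.solve(k + noise * np.eye(len(x_train)), y_train - offset)
    return offset + k_star @ alpha


def allele_frequency_surface(coords, alleles, n_alleles, grid, **kernel_args):
    """Hypothetical helper: frequency of each allele at each grid point.

    One GP per allele on the indicator "this sample's allele is a"; the
    means are clipped to stay positive and renormalised to sum to 1.
    """
    freqs = np.stack(
        [gp_mean(coords, (alleles == a).astype(float), grid, **kernel_args)
         for a in range(n_alleles)],
        axis=1,
    )
    freqs = np.clip(freqs, 1e-3, None)
    return freqs / freqs.sum(axis=1, keepdims=True)


def classify_origin(sample_alleles, surfaces):
    """Index of the most likely grid location for one sample.

    surfaces[j] has shape (n_locations, n_alleles_j) for locus j; loci are
    treated as independent, so log-probabilities add across loci.
    """
    scores = sum(np.log(s[:, a]) for s, a in zip(surfaces, sample_alleles))
    return int(np.argmax(scores))
```

In practice the kernel variance, length scale and noise would be estimated rather than fixed, and clipping then renormalizing is only a crude way to keep the GP means on the probability simplex.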

The Dirty Little Secret of Malaria Genomics

Challenges of Complex Infections and High Diversity Genetic Loci

• “Polyclonal samples were restricted to dominant alleles”
• “Analysis was restricted to monoclonal infections”
• These restrictions are primarily motivated by the use of statistics that were not designed for mixed DNA populations
  • They are also a consequence of noisy genotyping methods, e.g. microsatellite data
• Statistical convenience is being prioritized at the cost of bias, due either to sampling or to the properties of the estimator

Example: Estimating Complexity of Infection

[Figure: COI estimate for each simulated sample (x-axis: Sample, 0 to 300; y-axis: Estimate), comparing naive_coi with true_coi]

• Simulated data using 12 loci ranging from 5 to 25 alleles

• FP rate: .03
• FN rate: .1
• Complexity of infection estimated by taking the maximum number of alleles observed at any locus
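The naive estimator on this slide can be sketched as a small simulation. This assumes allele frequencies drawn from a flat Dirichlet at each locus, a false negative rate applied independently to each strain carrying an allele, and a false positive rate applied independently to each absent allele; the function name `simulate_naive_coi` is hypothetical.

```python
import numpy as np


def simulate_naive_coi(true_coi, allele_counts, fp=0.03, fn=0.1, rng=None):
    """Simulate one infection and return the naive COI estimate.

    The naive estimate is the largest number of alleles observed at any locus.
    Allele frequencies are drawn from a flat Dirichlet (an assumption).
    """
    rng = np.random.default_rng() if rng is None else rng
    naive = 0
    for n_alleles in allele_counts:
        freqs = rng.dirichlet(np.ones(n_alleles))
        # True genotype: how many of the true_coi strains carry each allele.
        strains = rng.multinomial(true_coi, freqs)
        present = strains > 0
        # A present allele is missed only if every strain carrying it is missed.
        detected = present & (rng.random(n_alleles) >= fn ** strains)
        # An absent allele is falsely called with probability fp.
        false_calls = ~present & (rng.random(n_alleles) < fp)
        naive = max(naive, int((detected | false_calls).sum()))
    return naive
```

For the slide's setting, `allele_counts` could be `rng.integers(5, 26, size=12)`. With error-free data the naive estimate can never exceed the true COI, since strains sharing an allele are counted once; false positive calls can push it above.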

Challenges of Complex Infections and High Diversity Genetic Loci

• Some tools exist to estimate parameters such as complexity of infection (COI), allele frequencies, and population structure

  • COIL, THE REAL McCOIL
    • Restricted to SNP data; incorporate a genotyping error model
  • MALECOT
    • Supports multi-allelic data; does not incorporate a genotyping error model

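Constructing Models to Account for Complex Infections Using Multi-allelic Data

[Figure: graphical model with plates over samples $i$ and loci $j$. The observed genotype $g_{i,j}$ depends on the true genotype $g^*_{i,j}$ and the error rates $\varepsilon^+$ and $\varepsilon^-$; $g^*_{i,j}$ depends on the complexity of infection $\mu_i$ and the allele frequencies $\pi_j$.]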

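$$L(\pi, \varepsilon^-, \varepsilon^+, \mu \mid G) = \prod_{i=1}^{n} \prod_{j=1}^{k} \sum_{g^*_{i,j} \in G^*} P(g_{i,j} \mid g^*_{i,j}, \varepsilon^+, \varepsilon^-)\, P(g^*_{i,j} \mid \mu_i, \pi_j)$$

$$P(g_{i,j} \mid g^*_{i,j}, \varepsilon^+, \varepsilon^-) = \prod_{l=1}^{a} \begin{cases} 1 - (\varepsilon^-)^{g^*_{i,j,l}} & \text{if } g_{i,j,l} = 1 \text{ and } g^*_{i,j,l} > 0 \\ (\varepsilon^-)^{g^*_{i,j,l}} & \text{if } g_{i,j,l} = 0 \text{ and } g^*_{i,j,l} > 0 \\ \varepsilon^+ & \text{if } g_{i,j,l} = 1 \text{ and } g^*_{i,j,l} = 0 \\ 1 - \varepsilon^+ & \text{if } g_{i,j,l} = 0 \text{ and } g^*_{i,j,l} = 0 \end{cases}$$

$$P(g^*_{i,j} \mid \mu_i, \pi_j) = \frac{\mu_i!}{g^*_{i,j,1}! \cdots g^*_{i,j,a}!}\, \pi_{j,1}^{g^*_{i,j,1}} \cdots \pi_{j,a}^{g^*_{i,j,a}}$$

where $a$ is the number of alleles at locus $j$, $g_{i,j,l} \in \{0, 1\}$ records whether allele $l$ was observed, and $g^*_{i,j,l}$ is the number of strains carrying allele $l$.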
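The two factors above and the sum over $G^*$ can be written directly for one sample at one locus. This is a brute-force sketch with hypothetical function names, useful mainly for checking faster approximations on small examples; it enumerates every vector of allele counts summing to $\mu_i$, so its cost is exactly the number of terms in the sum.

```python
import itertools
import math


def p_obs_given_true(g, g_true, eps_pos, eps_neg):
    """P(g | g*, eps+, eps-) for one sample at one locus.

    g: 0/1 vector, allele observed or not; g_true: strains carrying each allele.
    """
    p = 1.0
    for obs, count in zip(g, g_true):
        if count > 0:
            miss = eps_neg ** count  # every copy of the allele is missed
            p *= (1.0 - miss) if obs == 1 else miss
        else:
            p *= eps_pos if obs == 1 else 1.0 - eps_pos
    return p


def p_true_given_coi(g_true, coi, freqs):
    """Multinomial P(g* | mu, pi)."""
    coef = math.factorial(coi)
    for count in g_true:
        coef //= math.factorial(count)
    return coef * math.prod(f ** c for f, c in zip(freqs, g_true))


def compositions(total, n_parts):
    """Every vector of n_parts non-negative integers summing to total (G*)."""
    # Stars and bars: choose where the n_parts - 1 dividers go.
    for cuts in itertools.combinations(range(total + n_parts - 1), n_parts - 1):
        bounds = (-1,) + cuts + (total + n_parts - 1,)
        yield tuple(bounds[i + 1] - bounds[i] - 1 for i in range(n_parts))


def locus_likelihood(g, coi, freqs, eps_pos, eps_neg):
    """Exact marginal likelihood: sum over G* of P(g | g*) P(g* | mu, pi)."""
    return sum(
        p_obs_given_true(g, g_true, eps_pos, eps_neg) * p_true_given_coi(g_true, coi, freqs)
        for g_true in compositions(coi, len(freqs))
    )
```

A quick sanity check: summed over all $2^a$ possible observed genotypes, `locus_likelihood` should equal 1.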

Computational Complexity

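$$L(\pi, \varepsilon^-, \varepsilon^+, \mu \mid G) = \prod_{i=1}^{n} \prod_{j=1}^{k} \sum_{g^*_{i,j} \in G^*} P(g_{i,j} \mid g^*_{i,j}, \varepsilon^+, \varepsilon^-)\, P(g^*_{i,j} \mid \mu_i, \pi_j)$$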

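Number of possible true genotypes $g^*$ at one locus, $\binom{a + \mu - 1}{\mu}$ (rows: number of alleles $a$; columns: COI $\mu$):

| Alleles | µ = 1 | µ = 2 | µ = 3 | µ = 4 | µ = 5 | µ = 6 |
|---|---|---|---|---|---|---|
| 2 | 2 | 3 | 4 | 5 | 6 | 7 |
| 4 | 4 | 10 | 20 | 35 | 56 | 84 |
| 8 | 8 | 36 | 120 | 330 | 792 | 1716 |
| 16 | 16 | 136 | 816 | 3876 | 15504 | 54264 |
| 32 | 32 | 528 | 5984 | 52360 | 376992 | 2324784 |
| 64 | 64 | 2080 | 45760 | 766480 | 10424128 | 119877472 |
| 128 | 128 | 8256 | 357760 | 11716640 | 309319296 | 6856577728 |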
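Each entry of the table is the number of ways to distribute $\mu$ strains over $a$ alleles, which a short sketch can reproduce (the function name is hypothetical). Because the likelihood factorizes over samples and loci, the per-sample cost is a sum of these counts over loci rather than a product, but a single 128-allele locus at COI 6 already requires billions of terms.

```python
from math import comb


def n_true_genotypes(n_alleles, coi):
    """Number of count vectors g* with n_alleles entries summing to coi."""
    return comb(n_alleles + coi - 1, coi)


for a in (2, 4, 8, 16, 32, 64, 128):
    print(a, [n_true_genotypes(a, mu) for mu in range(1, 7)])
```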

Tackling Complex Infections with High Diversity Genetic Loci

Estimating Marginal Likelihoods

• We do not need to calculate the exact marginal likelihood for each sample

• Replacing the exact density with an unbiased, non-negative estimate in the Metropolis–Hastings acceptance ratio still yields a Markov chain with the exact target as its stationary distribution (Andrieu and Roberts, 2009)

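Computationally intractable:

$$r := \frac{P(x')\, g(x \mid x')}{P(x)\, g(x' \mid x)}$$

Tractable:

$$\hat{r} := \frac{\hat{P}(x')\, g(x \mid x')}{\hat{P}(x)\, g(x' \mid x)}$$

Here $g$ is the MCMC proposal density (not a genotype) and $\hat{P}$ is an unbiased estimate of $P$.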
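A minimal pseudo-marginal Metropolis–Hastings sketch on a toy model, not the COI model: each observation is $y_i = \theta + z_i + e_i$ with $z_i, e_i \sim N(0, 1)$, a flat prior on $\theta$ and a symmetric random-walk proposal, so $\hat{r}$ reduces to a ratio of likelihood estimates. The toy is chosen because its exact posterior, $N(\bar{y}, 2/n)$, is known. Each observation's likelihood is estimated by averaging over draws of $z_i$ from its prior, mirroring per-sample estimates. The key detail is that the estimate for the current state is stored and reused rather than recomputed. Function names are hypothetical.

```python
import numpy as np


def log_likelihood_estimate(theta, y, n_draws, rng):
    """Log of an unbiased estimate of p(y | theta) for the toy model.

    Each observation gets its own importance-sampling average over z ~ N(0, 1);
    the product of independent unbiased estimates is unbiased for the product.
    """
    z = rng.normal(size=(len(y), n_draws))
    log_dens = -0.5 * (y[:, None] - theta - z) ** 2 - 0.5 * np.log(2 * np.pi)
    top = log_dens.max(axis=1, keepdims=True)  # log-mean-exp for stability
    per_obs = top[:, 0] + np.log(np.exp(log_dens - top).mean(axis=1))
    return float(per_obs.sum())


def pseudo_marginal_mh(y, n_iter=5000, n_draws=50, step=0.5, seed=0):
    rng = np.random.default_rng(seed)
    theta = 0.0
    log_lik = log_likelihood_estimate(theta, y, n_draws, rng)
    samples = np.empty(n_iter)
    for t in range(n_iter):
        proposal = theta + step * rng.normal()  # symmetric, so g terms cancel
        log_lik_prop = log_likelihood_estimate(proposal, y, n_draws, rng)
        # Flat prior: log r_hat = log P_hat(x') - log P_hat(x).
        if np.log(rng.random()) < log_lik_prop - log_lik:
            theta, log_lik = proposal, log_lik_prop
        # On rejection the stored estimate for theta is kept, not re-drawn;
        # re-estimating it would break exactness of the stationary distribution.
        samples[t] = theta
    return samples
```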

Estimating Marginal Likelihoods

• For each sample at each locus, we can generate an unbiased estimate of the likelihood instead of calculating the exact likelihood

$$P(g \mid \pi, \varepsilon^-, \varepsilon^+, \mu) = \sum_{g^* \in G^*} P(g \mid g^*, \varepsilon^+, \varepsilon^-)\, P(g^* \mid \mu_i, \pi_j)$$

Importance Sampling

• Importance sampling provides a computationally tractable method to generate unbiased estimates of the marginal likelihood of a given data point

• This makes intuitive sense, as the vast majority of potential true genotypes contribute very little to the likelihood

$$\hat{P}(g \mid \pi, \varepsilon^-, \varepsilon^+, \mu) = \frac{1}{K} \sum_{k=1}^{K} \frac{P(g \mid g^*_k, \varepsilon^+, \varepsilon^-)\, P(g^*_k \mid \mu_i, \pi_j)}{q(g^*_k)}, \qquad g^*_k \sim q$$
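A sketch of the estimator above for one sample at one locus, assuming the proposal $q$ is itself a multinomial over alleles, $g^*_k \sim \text{Multinomial}(\mu_i, q)$, with per-allele probabilities `q_probs` (a hypothetical parameter) that are positive wherever $\pi_j$ is, and assuming all allele frequencies are positive. Setting `q_probs` equal to $\pi_j$ recovers sampling from the prior, where each weight reduces to $P(g \mid g^*_k, \varepsilon^+, \varepsilon^-)$. A brute-force `exact` function is included only to check the estimate on a small example.

```python
import itertools
import math

import numpy as np


def log_multinomial(counts, n, probs):
    """log Multinomial(counts | n, probs); zero counts contribute nothing."""
    out = math.lgamma(n + 1)
    for c, p in zip(counts, probs):
        if c > 0:
            out += c * math.log(p) - math.lgamma(c + 1)
    return out


def p_obs_given_true(g, g_true, eps_pos, eps_neg):
    """Genotyping error model P(g | g*, eps+, eps-)."""
    p = 1.0
    for obs, count in zip(g, g_true):
        if count > 0:
            miss = eps_neg ** count
            p *= (1.0 - miss) if obs else miss
        else:
            p *= eps_pos if obs else 1.0 - eps_pos
    return p


def is_estimate(g, coi, freqs, q_probs, eps_pos, eps_neg, n_draws, rng):
    """Importance-sampling estimate of P(g | pi, eps-, eps+, mu)."""
    draws = rng.multinomial(coi, q_probs, size=n_draws)  # g*_k ~ q
    weights = [
        p_obs_given_true(g, d, eps_pos, eps_neg)
        * math.exp(log_multinomial(d, coi, freqs) - log_multinomial(d, coi, q_probs))
        for d in draws
    ]
    return float(np.mean(weights))


def exact(g, coi, freqs, eps_pos, eps_neg):
    """Brute-force sum over G*, for checking small examples only."""
    total = 0.0
    for d in itertools.product(range(coi + 1), repeat=len(freqs)):
        if sum(d) == coi:
            total += p_obs_given_true(g, d, eps_pos, eps_neg) * math.exp(
                log_multinomial(d, coi, freqs))
    return total
```

Any valid `q_probs` gives an unbiased estimate; the choice only changes its variance.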

Choosing an Importance Sampling Distribution

• The choice of importance sampling distribution has a significant impact on the variance of the estimate, and consequently on the efficiency of sampling

• A good heuristic for choosing a sampling distribution is to reweight the estimated allele frequency distribution based on the observed genotype and the false negative rate

• Example:
  • π = [.1, .2, .3, .4]
  • ε− = .1
  • g = [1, 0, 0, 0]
  • q = [.9, .02, .03, .04]
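The slide gives the heuristic only by example. One rule that reproduces those numbers (before normalization) is: observed alleles share mass $1 - \varepsilon^-$ in proportion to their frequencies, and each unobserved allele keeps weight $\varepsilon^- \pi_k$. This rule is an assumption inferred from the example, not a formula stated in the talk, and the function name is hypothetical.

```python
import numpy as np


def heuristic_proposal(freqs, g, eps_neg):
    """Reweight allele frequencies toward the observed alleles (assumed rule).

    Observed alleles share mass (1 - eps_neg) in proportion to their
    frequencies; unobserved alleles keep eps_neg * pi_k.
    Returns the unnormalised weights and the normalised proposal q.
    """
    freqs = np.asarray(freqs, dtype=float)
    observed = np.asarray(g, dtype=bool)
    weights = eps_neg * freqs
    if observed.any():
        weights[observed] = (1.0 - eps_neg) * freqs[observed] / freqs[observed].sum()
    return weights, weights / weights.sum()
```

The normalized `q` would then serve as the per-allele probabilities of a multinomial proposal. Because $\varepsilon^- > 0$, every allele with $\pi_k > 0$ keeps positive proposal mass, which the importance weights require.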

Demonstration: Low Diversity

[Figure: estimated COI vs. true COI (x-axis: True COI; y-axis: Estimated COI)]

• Simulated data using 12 loci with 5 alleles

• FP rate: .03
• FN rate: .1

Demonstration: High Diversity

[Figure: estimated COI vs. true COI (x-axis: True COI; y-axis: Estimated COI)]

• Simulated data using 12 loci with between 5 and 25 alleles

• FP rate: .03
• FN rate: .1

Future Directions

• With this computational framework, we will extend our spatial modeling of allele frequencies to incorporate all observed genetic data

• Further develop this framework for estimating other parameters of interest

Thank you

[email protected]

github.com/m-murphy
