mStruct: Structure under mutations Suyash Shringarpure and Eric Xing Carnegie Mellon University mStruct: Inference of population structure in the presence of genetic admixing and allele mutations

mStruct: Structure under mutations

Suyash Shringarpure and Eric Xing
Carnegie Mellon University

mStruct: Inference of population structure in the presence of genetic admixing and allele mutations

Significance

Genetic Population Structure

• Structure (Pritchard et al. 2000)

[Figure: Genetic structure of human populations (Rosenberg et al. 2002). Regions shown: Africa, Europe, Mid-East, Cent./S. Asia, East Asia, Oceania. Label: ancestral proportion.]

Generative model - Structure

[Diagram: Structure generative model. Labels: α (for the dataset); example probability vectors (0.3, 0.7) and (0.8, 0.2); all the alleles observed at this locus.]
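The generative process behind this diagram can be sketched in a few lines. This is a minimal illustration, not the Structure implementation: it assumes a single chromosome copy per individual, and the names `sample_dirichlet`, `sample_individual`, `allele_freqs`, and the example values are hypothetical. Which quantity the slide's vectors (0.3, 0.7) and (0.8, 0.2) stand for is also an assumption here.

```python
import random

def sample_dirichlet(alpha, rng):
    # Draw from Dirichlet(alpha) by normalizing independent Gamma(alpha_k, 1) draws.
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_individual(alpha, allele_freqs, alleles, rng):
    """Sample one individual under an admixture model in the style of Structure.

    alpha        -- Dirichlet parameter shared by the whole dataset
    allele_freqs -- allele_freqs[k][l] is the frequency vector of population k at locus l
    alleles      -- alleles[l] lists all the alleles observed at locus l
    """
    theta = sample_dirichlet(alpha, rng)  # ancestry proportions of this individual
    genotype = []
    for l, locus_alleles in enumerate(alleles):
        # Pick the population of origin for this locus, then an allele from it.
        k = rng.choices(range(len(alpha)), weights=theta)[0]
        genotype.append(rng.choices(locus_alleles, weights=allele_freqs[k][l])[0])
    return theta, genotype

# Hypothetical example: two ancestral populations, two microsatellite loci.
example_alleles = [[9, 10], [2, 9]]
example_freqs = [
    [[0.8, 0.2], [0.3, 0.7]],  # population 0
    [[0.2, 0.8], [0.7, 0.3]],  # population 1
]
theta, genotype = sample_individual([1.0, 1.0], example_freqs, example_alleles, random.Random(0))
```

Note that the allele drawn at each locus depends only on the population of origin; the numeric distance between alleles plays no role in this model.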

Modeling allele similarity

• Microsatellite: repeats of a small DNA unit

[Figure: example alleles labeled Allele 2, Allele 9, Allele 10]

• Allele 9 is much more similar to allele 10 than to allele 2.
• Allele 10 might be a mutation of allele 9.
• Mathematically encode this idea in the model.
• mStruct – Structure under mutations

Hypothesis

• Individual genomes in modern populations are a result of
  – Admixture of ancestral populations.
  – Mutations from ancestral alleles.

• Ancestral populations have fewer alleles
  – (Mostly) true for microsatellites

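Generative model - mStruct

[Diagram: mStruct generative model. Labels: α (for the dataset); example probability vectors (0.3, 0.7) and (0.8, 0.2); all the alleles observed at this locus; mutation parameters δ1 and δ2.]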
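One way to read this diagram is as the admixture model with an extra mutation step between the ancestral allele and the observed allele. The sketch below illustrates that reading for a single locus. It assumes one mutation parameter per ancestral population (one possible interpretation of δ1 and δ2) and uses weights of the form δ^|b-a| over the observed alleles; all function and variable names are hypothetical.

```python
import random

def mutation_weights(ancestral_allele, observed_alleles, delta):
    # Unnormalized weights delta ** |b - a| for each observed allele b.
    return [delta ** abs(b - ancestral_allele) for b in observed_alleles]

def sample_allele_mstruct(theta, ancestral, ancestral_freqs, deltas, observed_alleles, rng):
    """Sample one observed allele at one locus.

    theta            -- ancestry proportions of the individual
    ancestral        -- ancestral[k] lists the ancestral alleles of population k
    ancestral_freqs  -- ancestral_freqs[k] gives their frequencies in population k
    deltas           -- deltas[k] is the mutation parameter of population k (assumption)
    observed_alleles -- all the alleles observed at this locus
    """
    k = rng.choices(range(len(theta)), weights=theta)[0]            # population of origin
    a = rng.choices(ancestral[k], weights=ancestral_freqs[k])[0]    # ancestral allele
    weights = mutation_weights(a, observed_alleles, deltas[k])
    b = rng.choices(observed_alleles, weights=weights)[0]           # allele after mutation
    return k, a, b

# Hypothetical example: two populations with one or two ancestral alleles each.
k, a, b = sample_allele_mstruct(
    theta=[0.3, 0.7],
    ancestral=[[9], [2, 10]],
    ancestral_freqs=[[1.0], [0.8, 0.2]],
    deltas=[0.2, 0.4],
    observed_alleles=[2, 3, 9, 10, 11],
    rng=random.Random(1),
)
```

In this sketch each population needs only a few ancestral alleles, and the remaining observed alleles arise as mutations of them.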

Mutation models

• How to derive descendant alleles from ancestral alleles?

• Distribution based on the single step model

• $P(b \mid a) \propto \delta^{|b-a|}$, $\delta < 1$
• Computationally “easy”
• δ is NOT a conventional mutation rate.
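As a small worked illustration of the distribution above, the weights δ^|b-a| can be normalized over the alleles observed at a locus. The choice of normalizing over the observed alleles, the function name `mutation_distribution`, and the value δ = 0.5 are assumptions made for this sketch.

```python
def mutation_distribution(a, alleles, delta):
    """Return P(b | a) for every allele b, proportional to delta ** |b - a|."""
    weights = {b: delta ** abs(b - a) for b in alleles}
    normalizer = sum(weights.values())
    return {b: w / normalizer for b, w in weights.items()}

# Ancestral allele 9, with alleles 2, 9 and 10 observed at the locus.
probs = mutation_distribution(9, [2, 9, 10], delta=0.5)
```

With these inputs, allele 10 (one step from 9) receives far more probability than allele 2 (seven steps away), which encodes the idea that nearby alleles are the likelier descendants.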

Finding ancestral alleles

• Fit mixtures of mutation distributions

• Try using 1, 2, 3, … ancestral alleles

• Use information theory to decide how many ancestral alleles are appropriate

[Figure: Histogram of observed alleles]
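The procedure above can be sketched as a small model-selection loop. The slides do not say which information-theoretic criterion is used, so this sketch substitutes the Bayesian information criterion (BIC) as a stand-in. It also assumes a fixed δ, candidate ancestral alleles taken from the observed values, an exhaustive search over candidate sets, and EM over the mixture weights only. All names are hypothetical.

```python
import itertools
import math
from collections import Counter

def component_pmf(center, support, delta):
    # Mutation distribution around one ancestral allele, normalized over the support.
    weights = [delta ** abs(b - center) for b in support]
    normalizer = sum(weights)
    return [w / normalizer for w in weights]

def fit_weights(counts, pmfs, iters=200):
    """EM for the mixture weights only; the component shapes stay fixed."""
    k = len(pmfs)
    pi = [1.0 / k] * k
    for _ in range(iters):
        resp_totals = [0.0] * k
        for j, c in enumerate(counts):
            mix = sum(pi[i] * pmfs[i][j] for i in range(k))
            for i in range(k):
                resp_totals[i] += c * pi[i] * pmfs[i][j] / mix
        total = sum(resp_totals)
        pi = [r / total for r in resp_totals]
    loglik = sum(
        c * math.log(sum(pi[i] * pmfs[i][j] for i in range(k)))
        for j, c in enumerate(counts)
    )
    return pi, loglik

def choose_ancestral_alleles(observed, delta=0.3, max_k=3):
    """Return (bic, ancestral_alleles, weights) for the lowest-BIC mixture."""
    hist = Counter(observed)  # histogram of observed alleles
    support = sorted(hist)
    counts = [hist[b] for b in support]
    n = len(observed)
    best = None
    for k in range(1, min(max_k, len(support)) + 1):
        for centers in itertools.combinations(support, k):
            pmfs = [component_pmf(c, support, delta) for c in centers]
            pi, loglik = fit_weights(counts, pmfs)
            # k centers plus (k - 1) free mixture weights.
            bic = -2.0 * loglik + (2 * k - 1) * math.log(n)
            if best is None or bic < best[0]:
                best = (bic, centers, pi)
    return best

# Hypothetical locus with two clusters of alleles, around 5 and around 12.
sample = [5] * 40 + [6] * 8 + [4] * 7 + [12] * 35 + [11] * 6 + [13] * 4
bic, ancestral_alleles, weights = choose_ancestral_alleles(sample)
```

The penalty term grows with the number of ancestral alleles, so an extra allele is only accepted when it improves the fit to the histogram by enough to pay for its added parameters.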

Comparing population structure maps

Phylogenetic Trees from the Structural Maps

Phylogenetic Trees from the Structural Maps

[Figure panels: mStruct, Structure]

HGDP SNP results

Implications of Inconsistency

• Simplistic mutation model
• SNP mutations harder to discover from data
• The model reduces to Structure
• Fundamental difference
  – Different markers treated differently

• Structure’s treatment of alleles is almost categorical

Contour of Empirical Mutation

Conclusion

• Generative model for population structure
• Modeling mutations from ancestral alleles
• Gives mutational information apart from population structure.
• (in press) Genetics
• Online version up now.

Graphical model representations

[Figure panels: Structure, mStruct]