Genetic Admixture, Human Population History and Local Adaptation
Shuhua Xu
CAS-MPG Partner Institute for
Computational Biology (PICB)
Otto Warburg International Summer School and Research Symposium 2013
1. A brief introduction to genetic admixture
2. Inference of population history with recombination
information and using admixture analysis
3. Local adaptation in admixed populations
Outline
A brief introduction to genetic admixture
1
• Genetic admixture refers to the result of interbreeding
between two or more previously isolated populations within a
species.
unpublished data Xu et al., AJHG 2009
Parental populations | Admixed populations
European x African | African American
European x Amerindian x African | Latino, Mexican, Hispanic, etc.
European x East Asian | Uyghur, Kazakh, etc.
Genetic Admixture and Admixed Populations
Evolutionary studies
Increased population genetic diversity: evolutionary impact
Shed light on human migration history
Local adaptation
Medical studies
Increased individual genome heterozygosity: medical impact
Mapping genes — admixture mapping
Why interested in admixed populations?
Disease mapping strategies
Family-based linkage studies
Population-based association studies
Unlikely to exist
No good strategy
[Figure axes: magnitude of effect vs. frequency in population]
Principle of Association Study
[Figure: marker genotypes (e.g. 3/6, 2/4, 6/6) of controls and cases]
Allele 6 is ‘associated’ with disease
• Population stratification in Epidemiology.
• Analysis of mixed samples having different allele frequencies is a primary concern in human genetics, as it leads to false evidence for allelic association.
Population structure causes trouble in association studies
Association due to population structure
[Figure: genotypes (AA, Aa, aa) of cases and controls in Population 1 and Population 2]
Odds Ratio (OR)
Exposure | Disease: yes | Disease: no | total
yes | a | b | a + b
no | c | d | c + d
total | a + c | b + d | a + b + c + d
Odds for case: a/c
Odds for control: b/d
$\mathrm{OR} = \dfrac{a/c}{b/d} = \dfrac{ad}{bc}$
OR>1: exposure factors increase the risk of disease; positive association
OR<1: exposure factors decrease the risk of disease; negative association
OR=1: no association
Explanation of OR
Example
Allele | Case (+) | Control (-)
A | 50 | 20
a | 50 | 80
Odds for case: 50:50 = 1
Odds for control: 20:80 = 0.25
Odds ratio = (50:50)/(20:80) = 1/0.25 = 4
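The example above can be checked with a few lines of Python. This is only a minimal sketch; the function name `odds_ratio` and its argument names are illustrative and do not come from the slides.

```python
def odds_ratio(case_with, control_with, case_without, control_without):
    """Odds ratio for a 2x2 table: (a/c) / (b/d).

    a = cases carrying the allele, b = controls carrying it,
    c = cases not carrying it,    d = controls not carrying it.
    """
    odds_case = case_with / case_without            # a / c
    odds_control = control_with / control_without   # b / d
    return odds_case / odds_control


# Counts from the slide example: allele A vs. allele a.
example_or = odds_ratio(case_with=50, control_with=20,
                        case_without=50, control_without=80)
print(example_or)  # 1 / 0.25
```

An OR above 1, as here, is read as a positive association between allele A and the disease.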
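Heterogeneity/Stratification
Total population
Allele | case | control | total
A | 51 | 59 | 110
a | 549 | 1,341 | 1,890
total | 600 | 1,400 | 2,000
Odds for case: 51/549 ≈ 9.3%; odds for control: 59/1,341 ≈ 4.4%; OR = 2.11 !
Subpopulation 1
Allele | case | control | total
A | 50 | 50 | 100
a | 450 | 450 | 900
total | 500 | 500 | 1,000
OR = 1
Subpopulation 2
Allele | case | control | total
A | 1 | 9 | 10
a | 99 | 891 | 990
total | 100 | 900 | 1,000
OR = 1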
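The stratification effect in these tables can be reproduced numerically. The sketch below pools the two subpopulations and compares the pooled odds ratio with the per-stratum values. The Mantel-Haenszel estimator is added here as one common way to combine strata; it is an assumption of this sketch and is not mentioned in the slides. All function names are illustrative.

```python
def odds_ratio(a, b, c, d):
    """OR = (a/c) / (b/d) = ad / bc for one 2x2 table."""
    return (a * d) / (b * c)


def pool(tables):
    """Add 2x2 tables cell by cell, ignoring which stratum they came from."""
    return tuple(sum(cells) for cells in zip(*tables))


def mantel_haenszel_or(tables):
    """Stratum-adjusted OR (Mantel-Haenszel); not taken from the slides."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den


# Cell order: (case A, control A, case a, control a), counts from the slide.
sub1 = (50, 50, 450, 450)
sub2 = (1, 9, 99, 891)

pooled = pool([sub1, sub2])
pooled_or = odds_ratio(*pooled)
stratum_ors = [odds_ratio(*t) for t in (sub1, sub2)]
adjusted_or = mantel_haenszel_or([sub1, sub2])

print(pooled, round(pooled_or, 2))  # pooled table and its inflated OR
print(stratum_ors, adjusted_or)     # no association within either stratum
```

The pooled table suggests an association only because the two subpopulations differ in both allele frequency and case/control ratio.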
Xu et al. AJHG 2009
[Figure panels: geographic distribution; sample size; PC1 distribution; heterozygosity] (Xu et al. AJHG 2009)
[Figure: PCA of world-wide samples; PCA of East Asian samples] (Xu et al. AJHG 2009)
[Figure: after removing populations sampled from Beijing, Shanghai, Guangzhou, Anhui, and Jiangsu] (Xu et al. AJHG 2009)
Genomic control
• Devlin and Roeder (1999) used theoretical arguments to propose that, under population structure, the genome-wide distribution of Cochran-Armitage trend test statistics is inflated by a constant multiplicative factor λ.
• We can estimate the multiplicative inflation factor using the statistic $\lambda = \mathrm{median}(X_i^2)/0.456$, where 0.456 is the approximate median of the χ² distribution with 1 degree of freedom.
• Inflation factor λ > 1 indicates population structure and/or genotyping error.
• We can carry out an adjusted test of association that takes account of any mismatching of cases/controls at any SNP using the statistic $X_i^2/\lambda$.
[Figure annotation: inflation factor λ = 1.11]
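As a rough illustration of genomic control, the sketch below estimates λ from a list of per-SNP test statistics and rescales them. It divides the median by 0.456, the approximate median of a χ² distribution with 1 degree of freedom. The input values are made up for the example and the function names are not from the slides.

```python
from statistics import median

# Approximate median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.456


def inflation_factor(chi2_stats):
    """Genomic-control lambda: median of the observed statistics / 0.456."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN


def genomic_control_adjust(chi2_stats):
    """Divide every statistic by lambda, as in the adjusted test X^2 / lambda."""
    lam = inflation_factor(chi2_stats)
    return [x / lam for x in chi2_stats]


# Hypothetical trend-test statistics for five SNPs (illustration only).
toy_stats = [0.1, 0.3, 0.912, 2.5, 6.0]
lam = inflation_factor(toy_stats)
adjusted = genomic_control_adjust(toy_stats)
print(lam, adjusted)
```

In practice λ is estimated from many thousands of SNPs; five values are used here only to keep the arithmetic visible.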
Xu et al. AJHG 2009
• Human populations are generally not homogeneous; taken as a whole, a structured population is not in Hardy-Weinberg equilibrium (HWE)
• Population substructure can cause false positive or false negative results in association studies
• Population substructure can be controlled and corrected with ancestry informative markers (AIMs), but only globally
• Controlling local structure due to population admixture is challenging, but the information itself is useful for both evolutionary and medical studies
Summary
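The first summary point, that a structured population is not in HWE as a whole, can be illustrated with a small calculation. The sketch assumes two equally sized subpopulations that are each in HWE but have different allele frequencies; the frequencies 0.1 and 0.9 are made up for the example.

```python
def hwe_genotype_freqs(p):
    """Expected (AA, Aa, aa) frequencies under HWE for allele frequency p."""
    q = 1.0 - p
    return (p * p, 2.0 * p * q, q * q)


def pooled_genotype_freqs(p1, p2, w1=0.5):
    """Genotype frequencies after pooling two subpopulations, each in HWE."""
    g1 = hwe_genotype_freqs(p1)
    g2 = hwe_genotype_freqs(p2)
    return tuple(w1 * x + (1.0 - w1) * y for x, y in zip(g1, g2))


p1, p2 = 0.1, 0.9                      # hypothetical allele frequencies
observed = pooled_genotype_freqs(p1, p2)
p_bar = 0.5 * p1 + 0.5 * p2            # allele frequency in the pooled sample
expected = hwe_genotype_freqs(p_bar)   # what HWE would predict for the pool

print("observed heterozygotes:", observed[1])
print("HWE-expected heterozygotes:", expected[1])
```

The pooled sample shows fewer heterozygotes than HWE predicts from the pooled allele frequency, even though each subpopulation is in equilibrium on its own.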
Population structure vs population admixture
[Figure: chromosomes traced from 4 generations ago, through 3, 2, and 1 generations ago, to today, with the disease locus marked. Legend: two African chromosomes; two European chromosomes; one African and one European chromosome]
Admixture Mapping
[Figure: y-axis 0% to 100%; x-axis position on chromosome (centimorgans), 20 cM to 140 cM]
• Controls are not necessary!
• The perfect control is the rest of each person's genome
• ~2,000 SNPs suffice for genome-wide mapping
• This reduces the multiple-testing and computational burden
Human population admixture in Asia is common
Population Structure and Genetic History of Uyghurs
Xu et al. Am.J.Hum.Genet. 2008a
Xu & Jin Am.J.Hum.Genet. 2008b
Xu et al. Mol.Biol.Evol. 2009
Genetic relationship of Uyghurs and HapMap populations
Xu et al. AJHG 2008
Cluster relationship of populations
Xu & Jin. AJHG 2008
Population Genetic Structure
Southern Uyghurs Northern Uyghurs
European East Asian
Xu & Jin. AJHG 2008
Inference of population history with recombination
information and using admixture analysis
2
Genetic information used to
reconstruct human phylogeny
① Mutation
Kivisild et al. 2002 (YCC, 2002)
mtDNA Y chromosome
② Drift (allele frequency)
• The recombination events accumulated in the genome are expected to provide additional information for studies of human genetic relationships.
• The vast recombination information in the human genome is generally ignored or deliberately avoided in studies of human population genetic relationships.
③ Recombination
►4Ner: population recombination parameter.
►Alternatively denoted by ρ, 4Nec or C
– r or c is the recombination rate across the region of interest;
– Ne is the effective population size.
Population recombination rate (4Ner)
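A worked number may help fix the definition ρ = 4Ner. The values below (Ne = 10,000, a per-base rate of 1e-8 per generation, a 100 kb region) are illustrative assumptions, not figures from the slides.

```python
def population_recombination_rate(effective_size, recomb_rate):
    """rho = 4 * Ne * r, with r the recombination rate across the region."""
    return 4 * effective_size * recomb_rate


# Illustrative assumptions only.
ne = 10_000                  # effective population size
per_bp_rate = 1e-8           # recombination rate per bp per generation
region_length_bp = 100_000   # region of interest

r_region = per_bp_rate * region_length_bp
rho = population_recombination_rate(ne, r_region)
print(rho)  # about 40 for these inputs
```

Because ρ combines Ne and r, a larger estimate for the same physical region can reflect either more recombination or a larger effective population size.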
• Estimating the population recombination parameter 4Ner from genotyping data is computationally challenging.
• The theory of optimal estimation is not fully worked out.
• Estimators rely on assumptions about demography and selective neutrality.
Challenges in Studies on Recombination
• Full sequence data
• Polymorphisms
• Rare mutations
• CNVs
• Small indels
• Recombination
Information from NGS
Reconstruct human phylogeny using
recombination information
[Figure: timeline with the admixture point and "Now"; pre-admixture: intra-ancestral segments; post-admixture: inter-ancestral and intra-ancestral segments]
Modified from Xu et al. (AJHG 2008a)
Recombination info in admixed genomes
Dating Austronesian Expansion
Xu et al, PNAS 2012
Geographical distribution and relationship of genetic components
• Because genetic recombination breaks down parental
genomes into segments of different sizes, the genome of
a descendant of an admixture event is composed of
different combinations of these ancestral segments.
• Admixture time can be estimated from the information
based on the distribution of ancestral segments and the
recombination breakpoints in an admixed genome.
• Admixture time can be considered an estimate of
the expansion time of the population of the second wave
of migration.
Principles and Methods
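The link between segment length and admixture time can be sketched with a deliberately simple model. The code assumes a single pulse of admixture T generations ago and exponentially distributed segment lengths with mean 1 / ((1 - m) T) Morgans, where m is the genome-wide proportion of the ancestry whose segments are measured. This is a textbook approximation chosen for illustration, not the estimator used in Xu et al., PNAS 2012, and all input values are hypothetical.

```python
from statistics import mean


def admixture_time_generations(segment_lengths_cm, ancestry_proportion):
    """Rough single-pulse estimate: T = 1 / ((1 - m) * mean length in Morgans).

    segment_lengths_cm: lengths (cM) of segments from one ancestry.
    ancestry_proportion: genome-wide proportion m of that same ancestry.
    """
    mean_length_morgans = mean(segment_lengths_cm) / 100.0
    return 1.0 / ((1.0 - ancestry_proportion) * mean_length_morgans)


# Hypothetical segment lengths (cM) and ancestry proportion.
segments_cm = [1.0, 2.0, 3.0]
t_hat = admixture_time_generations(segments_cm, ancestry_proportion=0.2)
print(t_hat)  # generations since admixture under the stated assumptions
```

Shorter segments imply more generations of recombination since admixture, which is the intuition behind the bullets above.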
Xu et al, PNAS 2012
Dating Population Admixture using recombination information
Xu et al, PNAS 2012
Estimating admixture time using different methods
Xu et al, PNAS 2012
Estimation of recombination parameter and admixture time
Xu et al, PNAS 2012
A “cline” of admixture time decreasing from west to east
• We provided the first genetic dating for Austronesian
expansion using recombination and admixture analysis.
• Our analysis indicates a cline of decreasing time of
admixture across E. Indonesia, with oldest time in the
west and youngest time in the east.
• The Austronesian expansion is estimated to have begun about
4,000 years ago, in excellent agreement with inferences
based on linguistic and archeological information.
Xu et al, PNAS 2012
Summary
Local adaptation in admixed populations
3
Population Genomics
Population Genetics
Local adaptation (positive selection)
Functional restriction (negative selection)
Disease (negative selection)
High Altitude Adaptation in Tibetans
The Tibetan Plateau, known as "the roof of the world" and with an average elevation of over 4,500 meters, is the highest plateau in the world.
Identification of HAA genes
HIF2A encodes HIF-2α, a transcription factor involved in the induction of genes regulated by oxygen.
HIFPH2 encodes HIF-prolyl hydroxylase 2, which catalyzes the post-translational formation of 4-hydroxyproline in HIF-α.
Detecting selection in admixed population based on biased ancestry contribution
[Figure: proportion of ancestry (0% to 100%) against position on chromosome (cM), 20 cM to 140 cM]
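One simple way to look for biased ancestry contribution is to compare the average ancestry proportion at each locus with the genome-wide distribution. The sketch below flags loci whose value lies more than three standard deviations from the mean. The z-score rule, the threshold, and the input values are all assumptions made for illustration; the slides do not specify the test statistic.

```python
from statistics import mean, pstdev


def ancestry_outliers(local_ancestry, threshold=3.0):
    """Return indices of loci whose ancestry proportion deviates strongly
    from the genome-wide mean (simple z-score rule, illustrative only)."""
    mu = mean(local_ancestry)
    sd = pstdev(local_ancestry)
    return [i for i, x in enumerate(local_ancestry)
            if abs(x - mu) / sd > threshold]


# Hypothetical population-average ancestry proportions at 20 loci.
ancestry = [0.80, 0.81, 0.79, 0.80, 0.82, 0.78, 0.80, 0.81, 0.79, 0.80,
            0.95,
            0.80, 0.81, 0.79, 0.80, 0.82, 0.78, 0.80, 0.81, 0.79]

flagged = ancestry_outliers(ancestry)
print(flagged)  # indices of candidate loci
```

A flagged locus is only a candidate: drift and errors in local ancestry inference can also produce deviations of this kind.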
[Figure labels: European; African; Population admixture; Current African American; Selection before admixture; Tough living conditions; Middle Passage; Selection after admixture; Local environmental challenges; N generations]
Schematic of possible natural selection in African Americans
Schematic of local ancestry inference and genome partition
African component European component
AfA
Local ancestry inference
Partitioning admixed genomes and detecting natural selection
Genome Research, 2009
Developing Methods for detecting local adaptation signatures in admixed genomes
[Figure labels: HBB, HLA-C, CD36]
Regions with highly differentiated allele frequency between AAF and African
Regions or SNPs | Position | Size (bp) | SNPs | Highest FST | Genes | Pathways | Related disease
1p21 | chr1:100125058..100183875 | 58817 | 2 | 0.0562 | AGL* | Metabolism of carbohydrate | Glycogen storage disease
1q22 | chr1:153401959..153464086 | 62127 | 4 | 0.0692 | THBS3*, MUC1*, MTX1, TRIM46, KRTCAP2 | Signaling by PDGF | Stomach cancer, breast cancer, osteosarcoma
rs12094201 | chr1:236509336 | 1 | 1 | 0.0561 | (ZP4* 389 kb) | NA | Hypertension, non-alcoholic fatty liver
rs7642575 | chr3:31400165 | 1 | 1 | 0.0453 | (STT3B, OSBPL10* 149 kb) | NA | Peripheral arterial disease
6p21-p22 | chr6:26554684..33961049 | 7406365 | 11 | 0.0711 | HLA-B*, HLA-C, EHMT2*, HLA-DPA1*, HLA-DRB5, EHM, BTN3A3, et al. | Signaling by GPCR, signaling in immune system, HIV infection, diabetes pathway | HIV, Crohn’s disease, rheumatoid arthritis, juvenile idiopathic arthritis, colorectal cancer, systemic sclerosis
6q25 | chr6:151555551..151569258 | 13707 | 2 | 0.0545 | (AKAP12* 40 kb) | Cell growth | Hypertension, hemorrhagic stroke
rs10499542 | chr7:22235870 | 1 | 1 | 0.04606 | RAPGEF5* | GTP/GDP-regulation | Thyroid stimulating hormone
7q21 | chr7:79768487..80482597 | 714110 | 10 | 0.0946 | CD36*, SEMA3C | Metabolism of lipids and lipoprotein | Metabolic syndrome, malaria
8q24 | chr8:143754039..143758933 | 4894 | 2 | 0.04679 | PSCA* | NA | Prostate cancer, bladder cancer, gastric cancer
11p15 | chr11:5034229..5421456 | 387227 | 3 | 0.0617 | HBB*, HBD*, HBE1*, HBG2, OR51I1, et al. | Signaling by GPCR | Sickle cell disease, beta-thalassemia, malaria
rs6015945 | chr20:59319574 | 1 | 1 | 0.0627 | CDH4* | Cell junction organization | Alzheimer's disease
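For readers unfamiliar with the FST column, the sketch below computes a basic FST for one biallelic SNP from the allele frequencies in two populations, using FST = (H_T - H_S) / H_T with equal population weights. This is a simple textbook form chosen for illustration; the estimator behind the table is not stated in the slides, and the frequencies used are hypothetical.

```python
def fst_two_populations(p1, p2):
    """Basic FST for one biallelic SNP: (H_T - H_S) / H_T.

    H_S is the mean expected heterozygosity within the two populations and
    H_T the expected heterozygosity at their average allele frequency.
    """
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    if h_total == 0.0:
        return 0.0  # monomorphic in both populations: no differentiation
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies in two populations.
fst = fst_two_populations(0.1, 0.3)
print(round(fst, 4))
```

Identical allele frequencies give FST = 0, and the value grows as the two populations diverge at the SNP.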
Ingenuity pathway analysis (IPA)
• Diseases and Disorders
– Metabolic diseases (P = 1.51 × 10^-16)
– Endocrine system disorder (P = 2.23 × 10^-16)
– Immunological diseases (P = 9.30 × 10^-12)
– Genetic disorder (P = 5.67 × 10^-11)
• Pathways (all related to the immune system)
– Antigen presentation pathway (P = 1.95 × 10^-4)
– Allograft rejection signaling (P = 4.69 × 10^-3)
– Graft-versus-host disease signaling (P = 4.69 × 10^-3)
– Autoimmune thyroid diseases signaling (P = 5.35 × 10^-3)
Evolutionary analysis of human disease related genes
• Human disease genes have been subjected to both purifying and positive selection.
Andreas Dress, Wenfei Jin; Haiyi Lou; Administration staff; IT staff …
Li Jin; Shilin Li; Yajun Yang…
Mark Stoneking Irina Pugach
Children’s Hospital Boston,
Harvard Medical School Bailin Wu; Yiping Shen
Edison T. Liu
Mark Seielstad
Guoping Zhao; Wei Huang; Ying Wang; Haifeng Wang
Max-Planck EVA
Chinese National Human Genome Center at Shanghai
Fudan University CAS-MPG PICB
Manfred Kayser
Erasmus MC University
Maude Phipps Zilfalil Bin Alwi Boon Peng Hoh
The HUGO PanAsian
SNP Consortium
93 scientists, >10 countries
Anhui Medical University
Xuejun Zhang; Xianyong Yin
Acknowledgements
Thank you!