1
2
Plant and Animal Association Mapping
• Introduce yourselves:
- What species do you work on
- What kind of population
- What traits interest you
- What marker resources do you have
3
Objective
• Convey ideas and tools to help you think about your population to decide what resources and methods will help you identify loci that affect the traits that interest you
- Are we going to have the same problems as human geneticists are having?
• This is a grand objective that we will fail to achieve
• Think about what we present in that light
4
Association Mapping
[Concept map linking three themes (LD, Methods, Germplasm) to these topics: definition and causes of LD; haplotype blocks; marker density; recombination hotspots; model-based or PCA?; candidate loci or whole genome?; sub-population structure; extent of LD; breeding system; gene identification or marker-assisted selection?; regression; genomic selection; multiple testing vs. shrinkage; signatures of selection; species; panel diversity; confounded structure and polymorphism]
5
Outline
• All mapping requires linkage disequilibrium
• Why association over linkage mapping?
• Refresher on LD: measures and causes
• Whole-genome scan marker densities
• Extent of LD in plants
6
Association Mapping
• It’s the same thing as linkage mapping in a bi-parental population but in a population that has not been carefully designed and generated experimentally.
7
QTL detection
[Figure: distribution of agronomic performance in the population as a function of genotype at marker A (aa, AA) or marker B (bb, BB), shown for the parents and for individuals 1 to 6 of the population, which carry QTL alleles Q or q]
• There is a QTL near marker A
• There is no QTL near marker B
8
Linkage Disequilibrium <–> Association
Jannink, J.-L. et al. 2001. Trends Plant Sci 6:337-342
9
Linkage Disequilibrium <–> Association
• $p_{QM} = p_Q p_M \iff p(Q \mid M) = p(Q \mid m) = p_Q$
• $p_{QM} \ne p_Q p_M \iff p(Q \mid M) \ne p(Q \mid m)$
• Lines carrying M do not carry Q at the same frequency as lines carrying m.
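The equivalence above can be checked numerically. This is a minimal sketch with hypothetical haplotype frequencies; the function name and the numbers are illustrative and do not come from the slides. When the joint frequency equals the product of the allele frequencies, the two conditional frequencies coincide; otherwise they differ.

```python
def conditional_q(p_QM, p_Qm, p_M):
    """Return (p(Q|M), p(Q|m)) from haplotype frequencies.

    p_QM: frequency of haplotypes carrying both Q and M
    p_Qm: frequency of haplotypes carrying Q and m
    p_M:  frequency of allele M (allele m has frequency 1 - p_M)
    """
    return p_QM / p_M, p_Qm / (1.0 - p_M)


# Hypothetical case 1: independence with p_Q = 0.3 and p_M = 0.4
print(conditional_q(0.3 * 0.4, 0.3 * 0.6, 0.4))  # both close to p_Q

# Hypothetical case 2: Q is over-represented on M haplotypes (LD)
print(conditional_q(0.20, 0.10, 0.4))
```

In the second case lines carrying M carry Q half of the time, while lines carrying m carry it about one time in six, which is the signal that association mapping exploits.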
10
Why Association Mapping?
• Sometimes you can’t generate a population experimentally…
• Mapping efficiency
• Fine mapping
• Link to plant / livestock breeding
11
Dissecting A Quantitative Trait: Resolution Versus Time
[Figure: research time in years (1 to 5) against resolution in bp (1 to 1x10^7) for Associations, Positional Cloning, NILs, RI QTL Mapping, Pedigree, and F2 or RIL Mapping]
Yu and Buckler, Curr Opin Biotechnol 17: 1-6 (2006)
12
Resolution Versus Allelic Range
[Figure: alleles evaluated (1 to >40) against resolution in bp (1 to 1x10^7) for Associations In Diverse Germplasm, Associations In Narrow Germplasm, Positional Cloning, NIL, Pedigree, and F2 or RIL Mapping]
Yu and Buckler, Curr Opin Biotechnol 17: 1-6 (2006)
13
Link to plant / livestock breeding
• Phenotyping is not getting cheaper
- use data collected in breeding for discovery: association mapping does not require the generation of experimental mapping populations
• Dense genome-wide markers together can predict polygenic breeding values
- Predict yield in the greenhouse before seed increase for field testing
- Predict performance of embryo / immature animal
14
Refresher on LD
• Definition: alleles at different loci not co-inherited independently
• Association with the phenotype
• Parameter D; min and max of D
• Standardize D: D’ and $r^2$
• Causes of LD
- mutation, drift / sampling, structure, selection
• Decay of LD: recombination
15
Linkage disequilibrium
• Alleles are co-inherited either more or less often than predicted by “chance”:
• Loci M and Q. Alleles {M, m} and {Q, q}
• $p_{MQ}$: probability that a parent transmits a gamete carrying both alleles M and Q
• “Chance” = “Alleles are independent”
16
Independence in a Table
17
Non-independence
18
Non-Independence in a Table
• Algebra shows $D = ru - st$
• By convention, $q_A$ and $q_B$ are the minor allele frequencies
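The identity $D = ru - st$ can be verified with a few lines of Python. This is a minimal sketch that assumes the cell layout implied by the bounds on D given elsewhere in these slides (r = freq(AB), s = freq(Ab), t = freq(aB), u = freq(ab)); the function name and the example frequencies are hypothetical.

```python
def ld_d(r, s, t, u):
    """Return D computed two ways from the four gamete frequencies.

    Assumed layout: r = freq(AB), s = freq(Ab), t = freq(aB), u = freq(ab).
    """
    if abs(r + s + t + u - 1.0) > 1e-9:
        raise ValueError("gamete frequencies must sum to 1")
    p_A = r + s                  # frequency of allele A
    p_B = r + t                  # frequency of allele B
    d_product = r * u - s * t    # D = ru - st
    d_excess = r - p_A * p_B     # D = freq(AB) - pA * pB
    return d_product, d_excess


# Hypothetical gamete frequencies
print(ld_d(0.5, 0.1, 0.2, 0.2))  # the two values agree (about 0.08)
```

Both expressions give the same D because the four frequencies sum to one, so $r - (r+s)(r+t)$ simplifies to $ru - st$.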
19
Linkage Disequilibrium <–> Association
• $p_{QM} = p_Q p_M \iff p(Q \mid M) = p(Q \mid m) = p_Q$
• $p_{QM} \ne p_Q p_M \iff p(Q \mid M) \ne p(Q \mid m)$
• Lines carrying M do not carry Q at the same frequency as lines carrying m.
20
Minimal and maximal values of D
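• If $D < 0$:
- $r = p_A p_B + D \ge 0 \Rightarrow D \ge -p_A p_B$
- $u = q_A q_B + D \ge 0 \Rightarrow D \ge -q_A q_B$
• $D \ge \max(-p_A p_B, -q_A q_B)$
• If $D > 0$:
- $s = p_A q_B - D \ge 0 \Rightarrow D \le p_A q_B$
- $t = q_A p_B - D \ge 0 \Rightarrow D \le q_A p_B$
• $D \le \min(p_A q_B, q_A p_B)$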
21
Standardize D between 0 and 1
• Define $D_{max} = \max(-p_A p_B, -q_A q_B)$ if $D < 0$, or $D_{max} = \min(p_A q_B, q_A p_B)$ if $D > 0$; then $0 \le D' = D / D_{max} \le 1$
• When is |D| maximized?

          $p_A$         $q_A$
$p_B$     $r = 0$       $t = p_B$
$q_B$     $s = p_A$     $u = q_A q_B - p_A p_B$

=> $D = -p_A p_B$
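The standardization can be sketched in Python as follows. The function name and the example frequencies are hypothetical; the bounds are the ones stated above for negative and positive D.

```python
def d_prime(D, p_A, p_B):
    """Standardize D by its bound given allele frequencies p_A and p_B."""
    q_A, q_B = 1.0 - p_A, 1.0 - p_B
    if D < 0:
        d_max = max(-p_A * p_B, -q_A * q_B)  # (negative) lower bound on D
    else:
        d_max = min(p_A * q_B, q_A * p_B)    # upper bound on D
    return D / d_max if d_max != 0 else 0.0


# Hypothetical case: the AB gamete is absent (r = 0), so D = -pA * pB
p_A, p_B = 0.2, 0.3
print(d_prime(-p_A * p_B, p_A, p_B))  # 1.0

# Hypothetical case with all four gametes present
print(d_prime(0.08, 0.6, 0.7))
```

Dividing a negative D by a negative bound keeps D' between 0 and 1, so D' = 1 whenever one of the four gametes is missing.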
22
Recombination and Maximal D
• After a new mutation, one of the four gametes is missing so D’ = 1
• The missing gamete can be created by recombination
• D’ = 1 until recombination occurs: series of loci with D’ = 1 can define a haplotype block
Number of alleles:
A1B1: N11     A2B1: N21
A1B2: 1       A2B2: 0
Slatkin, M. 2008. Linkage disequilibrium – understanding the evolutionary past and mapping the medical future. Nat Rev Genet 9:477-485.
23
Other standardized LD measure
• See D as a covariance of allelic states:
- $\mathrm{cov}(X, Y) = E(XY) - E(X)E(Y)$
- If allele A1 (B1) is present, X (Y) = 1, else 0
- $D = p_{AB} - p_A p_B$
• To standardize a covariance, turn it into a correlation
24
Coefficient of Determination
• Allele at locus M can predict allele at Q
• Correlation between allelic states:
• This standardization is more useful for association mapping purposes
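As a hedged illustration of turning the covariance into a correlation: each 0/1 allelic indicator is a Bernoulli variable with variance p(1 - p), so the squared correlation is $D^2$ divided by the product of the two variances. The sketch below assumes that standard definition; the function name and the example values are hypothetical.

```python
def r_squared(D, p_M, p_Q):
    """Squared correlation between 0/1 allelic states at loci M and Q."""
    var_M = p_M * (1.0 - p_M)  # variance of the indicator at locus M
    var_Q = p_Q * (1.0 - p_Q)  # variance of the indicator at locus Q
    return D * D / (var_M * var_Q)


# Hypothetical values: D = 0.08 with allele frequencies 0.6 and 0.7
print(r_squared(0.08, 0.6, 0.7))
```

Unlike D', this measure reaches 1 only when the two loci also have matching allele frequencies, which is why it tracks how well the marker predicts the QTL allele.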
25
Variance explained by marker
• Non-independence leads the mean of lines carrying M versus m to differ: the marker explains phenotypic variance
• variance explained by the marker
• variance generated by the QTL
Long, A.D. and Langley, C.H. 1999. The Power of Association Studies to Detect the Contribution of Candidate Genetic Loci to Variation in Complex Traits. Genome Res. 9: 720-731
26
$r^2$ and Sample Size
• If you typed the causal polymorphism, you would need a sample N1 to detect it
• Then to identify a marker in LD with the cause, you need a sample of size $N_2 \cong N_1 / r^2$
Pritchard, J.K., and M. Przeworski. 2001. Linkage disequilibrium in humans: models and data. Am J Hum Genet 69:1-14.
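A short sketch of the sample-size inflation implied by $N_2 \cong N_1 / r^2$. The function name and the starting sample size of 500 are hypothetical and chosen only for illustration.

```python
def required_n_marker(n_causal, r2):
    """Approximate sample size needed when testing a marker in LD
    with the cause instead of the causal polymorphism itself."""
    if not 0.0 < r2 <= 1.0:
        raise ValueError("r2 must be in (0, 1]")
    return n_causal / r2


# Hypothetical: 500 lines suffice when the causal polymorphism is typed
for r2 in (1.0, 0.8, 0.5, 0.2):
    print(r2, required_n_marker(500, r2))
```

Halving $r^2$ doubles the required sample, so a marker with $r^2 = 0.2$ needs roughly five times as many lines as the causal polymorphism would.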
27
$r^2_{max}$
[Figure: $r^2_{max}$ against associated marker frequency ($p_B$, 0 to 1) for focal locus frequencies $p_A$ = 0.2 and $p_A$ = 0.5]
28
$r^2_{max}$
[Figure, two panels: $r^2_{max}$ against associated marker frequency ($p_B$). One panel spans $p_B$ from 0 to 1 for focal locus frequencies $p_A$ = 0.2 and $p_A$ = 0.5; the other spans $p_B$ from 0 to 0.2 for focal locus frequency $p_A$ = 0.02]
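The dependence of $r^2_{max}$ on allele frequencies can be sketched by plugging the largest permissible |D| into the squared correlation. This is an illustration under the assumption that $r^2_{max}$ means the maximum over both signs of D; whether the plotted curves used exactly this definition is not confirmed by the slides, and the function name is hypothetical.

```python
def r2_max(p_A, p_B):
    """Largest r^2 attainable between two biallelic loci with
    allele frequencies p_A and p_B (both strictly between 0 and 1)."""
    q_A, q_B = 1.0 - p_A, 1.0 - p_B
    d_pos = min(p_A * q_B, q_A * p_B)  # bound on D when D > 0
    d_neg = min(p_A * p_B, q_A * q_B)  # bound on |D| when D < 0
    d_max = max(d_pos, d_neg)
    return d_max ** 2 / (p_A * q_A * p_B * q_B)


# Focal locus at frequency 0.2, markers at a range of frequencies
for p_B in (0.02, 0.1, 0.2, 0.3, 0.5):
    print(p_B, round(r2_max(0.2, p_B), 3))
```

The maximum equals 1 only when the two frequencies match (or are complementary), and it drops quickly as they diverge.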
29
QTL & Marker frequencies must match
[Figure: $r^2_{max}$ as a function of associated marker frequency and focal locus frequency]
30
Multi-allelic LD
• l is the smaller of the numbers of alleles at loci A and B, that is, l = min(k, m)
• If A and B are biallelic
Zhao, H., D. Nettleton, M. Soller, and J.C.M. Dekkers. 2005. Evaluation of linkage disequilibrium measures between multi-allelic markers as predictors of linkage disequilibrium between markers and QTL. Genetical Research 86:77-87
31
Causes of LD
• Four categories
- Mutation
- Drift / Sampling
- Structure
- Selection
Slatkin, M. 2008. Linkage disequilibrium – understanding the evolutionary past and mapping the medical future. Nat Rev Genet 9:477-485.
32
Mutation
• Locus M is polymorphic {M, m}
• Locus Q is monomorphic {q}
• Allele Q appears on gamete carrying M
• M and Q are not independent since Q always appears with M
33
Drift / Sampling
• Linkage equilibrium: MQ, Mq, mQ, mq all represented in expected frequencies
• Sampling differentially increases or reduces certain combinations by chance
• LD has appeared since combinations are no longer in expected frequencies
• One-time event: founder effects
• Recurrent: finite population, effective size Ne
34
Structure
• Differential relatedness among individuals in the sample
• Four sub-categories
- Subpopulation structure
- Admixture
- Migration
- Hybridization / Pedigree relatedness / “Familial structure”
35
Subpopulation structure
• Mating is random within sub-populations but very little occurs between sub-populations
• “Very little” means $0 < 4Nm < 1$
- N: effective population size
- m: migration rate (number of migrants / population size)
• No / low migration –> allele frequencies drift: $p_A^{(1)} \ne p_A^{(2)}$ AND $p_B^{(1)} \ne p_B^{(2)}$
36
Subpopulation Structure
37
Extreme / General cases
• Extreme case:
- One subpopulation fixed for A and B, the other for a and b => the Ab and aB gametes never occur
• Each subpopulation is contributing an excess of its “major two-locus haplotype”
• General two-subpopulation case: $D = k(1 - k)[p_A^{(1)} - p_A^{(2)}][p_B^{(1)} - p_B^{(2)}]$
- k: proportion of subpopulation 1 in total population
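The two-subpopulation formula can be checked by pooling two subpopulations that are each in linkage equilibrium. This is a minimal sketch; the function name and the example frequencies are hypothetical.

```python
def mixture_d(k, pA1, pB1, pA2, pB2):
    """D in a pooled sample of two subpopulations, each in linkage
    equilibrium internally; k is the proportion of subpopulation 1.

    Returns (D computed directly, D from the closed-form formula).
    """
    # Pooled frequencies of the AB gamete and of each allele
    p_AB = k * pA1 * pB1 + (1 - k) * pA2 * pB2
    p_A = k * pA1 + (1 - k) * pA2
    p_B = k * pB1 + (1 - k) * pB2
    direct = p_AB - p_A * p_B
    formula = k * (1 - k) * (pA1 - pA2) * (pB1 - pB2)
    return direct, formula


# Extreme case: one subpopulation fixed for A and B, the other for a and b
print(mixture_d(0.5, 1.0, 1.0, 0.0, 0.0))  # (0.25, 0.25)

# Hypothetical intermediate case
print(mixture_d(0.3, 0.8, 0.6, 0.2, 0.1))
```

D is non-zero only when both loci differ in frequency between the subpopulations, even though neither subpopulation has any LD of its own.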
38
One gamete per subpopulation reduces structure-induced LD
• 18 unlinked SSR: of 149 wheats, 95 retained based on diversity
• Low “repeat contributions” from one subpop
• <=> single pop with high Ne
Breseghello, F., and M.E. Sorrells. 2006. Association Mapping of Kernel Size and Milling Quality in Wheat (Triticum aestivum L.) Cultivars. Genetics 172:1165-1177.
39
Admixture
• Just like subpopulation structure, but reproductive barriers between subpopulations have recently broken down
Migration
• Occasional gametes with “non-equilibrium” linkage phases arrive in the population
40
Hybridization
• Start with a double heterozygote
• It produces gametes with
• So:
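A sketch of the gamete frequencies produced by a double heterozygote, assuming it is in coupling phase AB/ab (the phase and the function names are assumptions made for illustration). The resulting D agrees with the value $D = \frac{1}{2}(\frac{1}{2} - c)$ quoted for the double heterozygote hybrid elsewhere in these slides.

```python
def hybrid_gametes(c):
    """Gamete frequencies from an AB/ab double heterozygote.

    c is the recombination rate between the two loci (0 <= c <= 0.5).
    """
    non_rec = (1.0 - c) / 2.0  # each parental gamete: AB, ab
    rec = c / 2.0              # each recombinant gamete: Ab, aB
    return {"AB": non_rec, "ab": non_rec, "Ab": rec, "aB": rec}


def hybrid_d(c):
    """D among the gametes of the double heterozygote."""
    g = hybrid_gametes(c)
    return g["AB"] * g["ab"] - g["Ab"] * g["aB"]


for c in (0.0, 0.1, 0.5):
    print(c, hybrid_d(c), 0.5 * (0.5 - c))
```

With c = 0.5 (unlinked loci) all four gametes are equally frequent and D is zero.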
41
Selection
• Two sub-categories
- LD between different causal loci
o Sub-sub categories:
o Selection under an additive model that changes the additive variance
o Selection under an epistatic model
- LD between causal and nearby neutral loci
42
Selection that changes the variance
• Variance conferred by two-locus AB gametes:
$\mathrm{var}(AX + BY) = A^2\,\mathrm{var}(X) + B^2\,\mathrm{var}(Y) + 2AB\,\mathrm{cov}(X, Y) = A^2\,\mathrm{var}(X) + B^2\,\mathrm{var}(Y) + 2AB\,D$
• LD modulates additive variance
[Figure labels: Directional Selection, Bulmer Effect; Disruptive Selection]
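To make the role of the covariance term concrete, here is a minimal sketch of the variance formula with 0/1 allelic indicators, whose variances are p(1 - p). The function name, the effect sizes a and b, and the values of D are hypothetical.

```python
def two_locus_variance(a, b, p_A, p_B, D):
    """Variance of a*X + b*Y, where X and Y are 0/1 allelic indicators
    with frequencies p_A and p_B and covariance D."""
    var_X = p_A * (1.0 - p_A)
    var_Y = p_B * (1.0 - p_B)
    return a * a * var_X + b * b * var_Y + 2.0 * a * b * D


# Equal effects, both alleles at frequency 0.5, three levels of LD
for D in (-0.1, 0.0, 0.1):
    print(D, two_locus_variance(1.0, 1.0, 0.5, 0.5, D))
```

With effects of the same sign, negative D (the kind of LD that directional selection builds up, as in the Bulmer effect) lowers the variance relative to linkage equilibrium, and positive D raises it.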
43
Selection under epistasis
• Favored gamete will have higher frequency than expected
44
LD caused by hitchhiking
• Selection increases frequency of a novel mutation
• Combinations of neutral loci are co-inherited and reach higher-than-equilibrium frequencies
Smith, J.M., and J. Haigh. 1974. The hitch-hiking effect of a favorable gene. Genetical Research 23:23-35
45
Selection example (Bovine)
Hayes, B.J. et al. 2008. Animal Genetics 39:105-111.
46
Selection example (Human)
Sabeti, P.C. et al. 2007. Nature 449:913-918
47
Decay of LD
• One systematic process: recombination
• Generation 0
- $r_0 = \Pr(A_1B_1)$ and $D = r_0 - p_A p_B$
• What are the origins of Generation 1 $A_1B_1$ gametes?
- non-recombinant from A1B1/AB parent: $r_0(1 - c)$
- recombinant from A1B/AB1 parent: $p_A p_B c$
- $r_1 = r_0(1 - c) + p_A p_B c$
48
Recombination decay of LD
• In Generation 1
- $r_1 = r_0(1 - c) + p_A p_B c$
- $D_1 = r_1 - p_A p_B = D_0(1 - c)$
• In Generation t
- $D_t = D_0(1 - c)^t$
• Valid for
- Random mating
- No drift
[Figure: $D_t / D_0$ against generation (0 to 25) for c = 0.001, 0.005, 0.01, 0.05, 0.1, and 0.5]
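The geometric decay can be tabulated directly. This is a minimal sketch under the same assumptions as the slide (random mating, no drift); the function names are hypothetical, and the half-life is simply the solution of $(1 - c)^t = 0.5$.

```python
import math


def ld_remaining(c, t):
    """Fraction D_t / D_0 left after t generations: (1 - c) ** t."""
    return (1.0 - c) ** t


def half_life(c):
    """Generations needed for D to halve, for 0 < c < 1."""
    return math.log(0.5) / math.log(1.0 - c)


# Recombination rates shown in the decay figure
for c in (0.001, 0.005, 0.01, 0.05, 0.1, 0.5):
    print(c, round(ld_remaining(c, 25), 3), round(half_life(c), 1))
```

For unlinked loci (c = 0.5) the half-life is a single generation, whereas at c = 0.001 most of the initial LD is still present after 25 generations.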
49
Hybrid-Source vs. Population LD
• Double heterozygote hybrid: $D = \frac{1}{2}(\frac{1}{2} - c)$
- Unlinked D goes to zero in one generation
• Population-wide: $D_1 = D_0(1 - c)$
- Unlinked D is reduced by half in one generation
• In the population, gametes are produced by individuals for which recombination is ineffective, e.g.
50
Generation + Decay => Equilibrium
• Random mating population of constant size:
- Mutation and drift are constantly generating LD
- Recombination removes it as a function of distance between loci
• $E(r^2) = 1/(1 + 4N_e c)$
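A short sketch of the equilibrium expectation $E(r^2) = 1/(1 + 4N_e c)$. The function name and the values of Ne are hypothetical; c is taken as the recombination fraction, which for short distances is close to the map distance in Morgans.

```python
def expected_r2(Ne, c):
    """Equilibrium expectation E(r^2) = 1 / (1 + 4 * Ne * c)."""
    return 1.0 / (1.0 + 4.0 * Ne * c)


# Hypothetical effective sizes, at 0.2 cM (c = 0.002) and 1 cM (c = 0.01)
for Ne in (50, 100, 1000):
    print(Ne, round(expected_r2(Ne, 0.002), 3), round(expected_r2(Ne, 0.01), 3))
```

Larger effective population sizes push the expected $r^2$ down at every distance, so more diverse panels need denser markers.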
51
Linked IBD and $E(r^2)$
Sved J.A. (2009) Genetics Research 91:183-192
52
Relationship between LIBD and LD
• LD does not require non-recombination between loci, LIBD does
• For tightly linked loci, LD ≈ LIBD
• For loosely linked loci, LD ≠ LIBD
53
LIBD recurrence equation
• If loci have not recombined, they are perfectly correlated, else they are uncorrelated:
• If loci have not recombined over two independent pathways, they are LIBD:
• From one generation to the next:
54
Extension to population subdivision
• Both α and β depend on migration between subpopulations
• Any LIBD across subpopulations is generated within subpopulations
• Barring migration, common linkage phase will be ancestral to subpopulation divergence
• Structure is a case where LD and LIBD would not be expected to be similar
55
Variation around $E(r^2)$
• $E(r^2) = 1/(1 + 4N_e c)$
• This is an expectation. There is a LOT of variability around it.
56
Simulated / expected LD
[Figure panels: Mutation / Drift LD: Ne = 100; LD in RIL (Hybridization)]
57
Marker Density (chicken example)
About 140 cM
• $E(r^2)$ still only about 0.4 at 0.2 cM
• And when you are that close, you still have some probability of a very low $r^2$
Andreescu, C. et al. 2007. Genetics 177:2161-2169
58
Diverse 2-row barley example
• $E(r^2)$ about 0.3 at 0.2 cM
• But when you are that close, you still have a good probability of a very low $r^2$ (< 0.2)
59
Mean versus $P(r^2 > 0.5)$
[Figure: $P(r^2 > 0.5)$, Elite N. American spring oat dataset]
60
QTL & Marker frequencies must match
[Figure: $r^2_{max}$ as a function of associated marker frequency and focal locus frequency]
61
Association Mapping: a search in 2D
[Figure: axes are position in the genome and MAF (0.0 to 0.5)]
Associated markers need to be close in the genome to be in high LD, but they also need to have comparable allele frequencies
62
Extent of LD and marker density
• Power of detection is a function of QTL effect size, number of observations, and LD between QTL and marker
• Use this relationship to choose the desired $r^2$
• “Extent of LD” analyses show the expected $r^2$ at a given distance
• Combine to determine the required density
• Hedge because of variability
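One way to combine these steps is sketched below. It assumes the equilibrium expectation $E(r^2) = 1/(1 + 4N_e c)$ as the "extent of LD" curve, evenly spaced markers, and distances short enough that map distance in Morgans approximates the recombination fraction. The function names, the genome length, and the value of Ne are hypothetical, and an empirical LD decay curve would replace the formula in practice.

```python
def distance_for_r2(target_r2, Ne):
    """Recombination fraction c at which E(r^2) falls to target_r2,
    obtained by inverting E(r^2) = 1 / (1 + 4 * Ne * c)."""
    if not 0.0 < target_r2 < 1.0:
        raise ValueError("target_r2 must be strictly between 0 and 1")
    return (1.0 / target_r2 - 1.0) / (4.0 * Ne)


def markers_needed(genome_morgans, target_r2, Ne):
    """Rough marker count so that no locus is farther than c from a marker."""
    c = distance_for_r2(target_r2, Ne)
    spacing = 2.0 * c  # a QTL midway between two markers is c from each
    return genome_morgans / spacing


# Hypothetical: 10 Morgan genome, Ne = 100, desired expected r^2 of 0.5
print(distance_for_r2(0.5, 100))     # 0.0025, i.e. about 0.25 cM
print(markers_needed(10, 0.5, 100))  # about 2000 markers
```

Because realized $r^2$ varies widely around its expectation, a count obtained this way is a lower bound to be padded rather than a target.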
63
Coverage ≠ Power
Spencer, C.C.A. et al. 2009. Designing Genome-Wide Association Studies: Sample Size, Power, Imputation, and the Choice of Genotyping Chip. PLoS Genet 5:e1000477.
64
Whole Genome or Candidate Loci?
• Focusing on candidate loci imposes the bias of our current biological knowledge
• Whole Genome imposes two burdens
- Higher genotyping cost (usually)
- Higher multiplicity of testing
65
Extent of LD in plants
• Breeding system (selfing / outcrossing)
• Selection at the loci assayed
• Diversity of the panel
- $E(r^2) = 1/(1 + 4N_e c)$
- More diverse => larger $N_e$
• Population structure in the sample
66
LD affected by selfing rate s
• Recombination is ineffective in homozygotes
• Ceteris paribus, LD decays more slowly in (partial) selfers than in outcrossers
[Figure panels: S = 0.00; S = 0.95]
Nordborg, M. 2000. Genetics 154:923-929
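The effect of selfing can be illustrated with a common approximation: at selfing rate s the equilibrium inbreeding coefficient is F = s / (2 - s), and recombination is discounted to an effective rate c(1 - F). That this approximation matches the figure on the slide exactly is an assumption, and the function names are hypothetical.

```python
def inbreeding_coefficient(s):
    """Equilibrium inbreeding coefficient under partial selfing: s / (2 - s)."""
    return s / (2.0 - s)


def effective_recombination(c, s):
    """Recombination rate discounted for selfing: c * (1 - F)."""
    return c * (1.0 - inbreeding_coefficient(s))


# The two selfing rates shown on the slide, for loci 1 cM apart
for s in (0.0, 0.95):
    print(s, effective_recombination(0.01, s))
```

At s = 0.95 the effective recombination rate is roughly a tenth of the nominal rate, because most individuals are homozygous and crossovers in them shuffle nothing.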
67
Maize: Outcrosser; Diverse vs. Elite
Tenaillon, M.I. et al. 2002. Patterns of Diversity and Recombination Along Chromosome 1 of Maize. Genetics 162:1401-1413
68
Maize
Remington, D.L. et al. 2001. Structure of linkage disequilibrium and phenotypic associations in the maize genome. PNAS USA 98:11479-11484.
Ching, A. et al. 2002. SNP frequency, haplotype structure and linkage disequilibrium in elite maize inbred lines. BMC Genetics 3:19.
69
Arabidopsis: Selection; Founder effects
Nordborg, M. et al. 2002. The extent of linkage disequilibrium in Arabidopsis thaliana. Nat Genet 30:190-193.
Very approximately 10 Mbp
70
LD decay in Pinus taeda
• Loblolly pine has pollen that moves > 100 km
• Summary of LD at 19 candidate genes
Brown et al. 2004 PNAS 101:15255
71
Douglas fir
Krutovsky, K. V. et al. Genetics 2005. 171:2029-2041
Copyright © 2007 by the Genetics Society of America
72
Aspen (Populus)
Ingvarsson, P. K. Genetics 2005. 169:945-953
73
Barley: Selfer; Wild vs Cultivated
Caldwell, K.S. et al. 2006. Extreme Population-Dependent Linkage Disequilibrium Detected in an Inbreeding Plant Species, Hordeum vulgare. Genetics 172:557-567.
Wild
Cultivated
74
More Wild Barley
Steffenson, B.J. et al. 2007. Aust. J Agric. Res. 58:532-544
75
Barley: North American Elite
Very approximately 200 Mbp
Hamblin et al. 2010. Crop Science 50:556-566
76
Rice
Mather, K.A. et al. 2007. The Extent of Linkage Disequilibrium in Rice (Oryza sativa L.). Genetics 177:2223-2232.
77
Rice
McNally et al. in preparation
78
Sorghum
Hamblin, M. T. et al. Genetics 2005;171:1247-1256