1
QTL mapping
Ades 2008, NHGRI
Simple Mendelian traits are caused by a single locus, and come in the ‘all-or-none’ flavor.
A Quantitative Trait is one in which many loci contribute. The phenotype can therefore vary in a ‘quantitative’ manner.
Modified from Mike White slides, 2010
2
Goals of QTL mapping
To identify the loci that contribute to phenotypic variation
1. Cross two parents with extreme phenotypes
2. Score the progeny for the phenotype
3. Genotype the progeny at markers across the genome
4. Associate the observed phenotypic variation with the underlying genetic variation
5. Ultimate goal: identify causal polymorphisms that explain the phenotypic variation
3
Backcross
Broman and Sen 2009
Phenotype: Drug tolerance (80% vs. 20% viability)
Usually have at least 100 individuals
4
Intercross
Broman and Sen 2009
Phenotype: Drug tolerance (80% vs. 20% viability)
5
Backcross vs. Intercross
• An intercross recovers all three possible genotypes (AA, AB, BB). This allows detection of dominance between the two alleles and provides estimates of the degree of dominance.
• A backcross has more power to detect QTL with fewer individuals.
• A backcross may be the only possible scheme when crossing two different species.
6
Genetic map: specific markers spaced across the genome
Markers can be:
• SNPs at particular loci
• Variable-length repeats, e.g. Alu repeats
• All polymorphisms (if whole genomes are available)
Ideally, markers should be spaced every 10-20 cM and span the whole genome
7
Genotype data: Determine allele at all markers in each F2
8
Phenotype data
9
Broman and Sen 2009
1. Missing Data Problem: use marker data to infer intervening genotypes
2. Model Selection Problem: how do the QTL across the genome combine with the covariates to generate the phenotype?
Test which markers correlate with the phenotype
10
Marker regression: simple T-test (or ANOVA) at each marker
Marker 1: no QTL
Marker 2: significant QTL (population means are different)
Test which markers correlate with the phenotype
11
Marker regression
Advantages:
• Simple test: standard t-test/ANOVA
• Covariates (e.g. Gender, Environment) are easy to incorporate
• No genetic map necessary, since the test is done separately on each marker
Disadvantages:
• Any individuals with missing marker data must be omitted from analysis
• Does not effectively consider positions between markers
• Does not test for genetic interactions (e.g. epistasis)
• The apparent effect size of the QTL (and therefore the power to detect it) is reduced by incomplete linkage to the marker
• Difficult to pinpoint QTL position, since only the marker positions are considered
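A minimal sketch of marker regression in a backcross, assuming two genotype classes coded 'AA' and 'AB' and simulated data (the function names, the simulated effect size, and the sample size are illustrative assumptions, not from the slides). It computes a pooled-variance two-sample t statistic separately at each marker; a p-value would then come from the t distribution with n - 2 degrees of freedom.

```python
import random
from statistics import mean, variance

def t_statistic(group_a, group_b):
    """Pooled-variance two-sample t statistic (mean of a minus mean of b)."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / (pooled_var * (1 / na + 1 / nb)) ** 0.5

def marker_regression(genotype_matrix, phenotypes):
    """One t statistic per marker; genotype_matrix[m][i] is individual i at marker m.

    Individuals whose genotype is neither 'AA' nor 'AB' (e.g. None for missing
    data) are dropped at that marker, as in plain marker regression.
    """
    results = []
    for marker in genotype_matrix:
        aa = [y for g, y in zip(marker, phenotypes) if g == "AA"]
        ab = [y for g, y in zip(marker, phenotypes) if g == "AB"]
        results.append(t_statistic(aa, ab))
    return results

# Simulated backcross (assumption): marker 1 is unlinked to any QTL,
# marker 2 shifts the phenotype by one standard deviation in AB individuals.
rng = random.Random(1)
n = 200
marker1 = [rng.choice(("AA", "AB")) for _ in range(n)]
marker2 = [rng.choice(("AA", "AB")) for _ in range(n)]
phenotypes = [rng.gauss(0.0, 1.0) + (1.0 if g == "AB" else 0.0) for g in marker2]

t_values = marker_regression([marker1, marker2], phenotypes)
print(t_values)
```

Each marker is tested on its own, so no genetic map is needed, and nothing is learned about positions between markers.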
12
Interval mapping
• In addition to examining phenotype-genotype associations at markers, look for associations between markers by inferring the genotype at a putative QTL (Q)
• The methods for calculating genotype probabilities between markers typically use hidden Markov models to account for additional factors, such as genotyping errors
• Lander and Botstein 1989
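A minimal sketch of how a genotype between two markers can be inferred in a backcross. It is the simplest special case only: two fully typed flanking markers, no genotyping errors, and no crossover interference, with map distance converted to recombination fraction by the Haldane map function (these are assumptions of the sketch; a full hidden Markov model uses all markers and allows for errors). Function names are hypothetical.

```python
import math

def haldane_r(distance_cm):
    """Recombination fraction for a map distance in cM (Haldane, no interference)."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_cm / 100.0))

def prob_aa_between(left, right, d_left_cm, d_right_cm):
    """Pr(genotype AA at a position | flanking marker genotypes) in a backcross.

    left, right: genotypes 'AA' or 'AB' at the flanking markers.
    d_left_cm, d_right_cm: distances from the position to each marker.
    """
    r1 = haldane_r(d_left_cm)
    r2 = haldane_r(d_right_cm)

    def step(a, b, r):
        # Probability of going from genotype a to genotype b across an interval
        return 1.0 - r if a == b else r

    # Pr(Q | L, R) is proportional to Pr(Q | L) * Pr(R | Q); the 1/2 prior cancels
    p_aa = step(left, "AA", r1) * step("AA", right, r2)
    p_ab = step(left, "AB", r1) * step("AB", right, r2)
    return p_aa / (p_aa + p_ab)

# Position midway between markers that are 10 cM apart
print(prob_aa_between("AA", "AA", 5, 5))  # both flanking markers AA: close to 1
print(prob_aa_between("AA", "AB", 5, 5))  # recombinant interval, midpoint
```

When the flanking markers disagree, the probability shifts smoothly from one genotype to the other as the position moves across the interval. This is what lets interval mapping test positions where nothing was genotyped.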
13
Interval mapping
Broman and Sen 2009
14
Interval mapping
Advantages:
• Takes account of missing genotype information, so all individuals are included
• Can scan for QTL at locations in between markers
• QTL effects are better estimated
Disadvantages:
• More computation time required
• Still only a single-QTL model: cannot separate linked QTL or test for interactions among QTL
15
LOD scores
• Measure of the strength of evidence for the presence of a QTL at each marker location
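LOD(λ) = log10 likelihood ratio comparing the hypothesis of a QTL at position λ versus that of no QTL

$$\mathrm{LOD}(\lambda) = \log_{10}\left\{\frac{\Pr(y \mid \text{QTL at } \lambda,\ \mu_{AA\lambda},\ \mu_{AB\lambda},\ \sigma_{\lambda})}{\Pr(y \mid \text{no QTL},\ \mu,\ \sigma)}\right\}$$

A LOD of 3 means that the data are 10^3 (1000) times more likely under the TOP model (QTL at λ) than under the BOTTOM model (no QTL)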
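A minimal sketch of the LOD calculation at a single, fully genotyped marker. It assumes a normal phenotype with one residual variance shared by the genotype classes. Under that assumption the maximized likelihood ratio reduces to (RSS_null / RSS_QTL)^(n/2), so LOD = (n/2) · log10(RSS_null / RSS_QTL). The function names and the toy data are illustrative.

```python
import math
from statistics import mean

def rss(values):
    """Residual sum of squares around the group mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def lod_score(phenotypes, genotypes):
    """LOD at one marker: log10 likelihood ratio of 'QTL here' vs 'no QTL'."""
    groups = {}
    for y, g in zip(phenotypes, genotypes):
        groups.setdefault(g, []).append(y)
    rss_null = rss(phenotypes)                       # one mean for everyone
    rss_qtl = sum(rss(ys) for ys in groups.values())  # one mean per genotype
    return (len(phenotypes) / 2.0) * math.log10(rss_null / rss_qtl)

# Toy backcross data (assumption): AB individuals score about 2 units higher
phenotypes = [1.0, 1.2, 0.8, 3.0, 3.2, 2.8]
linked = ["AA", "AA", "AA", "AB", "AB", "AB"]
unlinked = ["AA", "AB", "AA", "AB", "AA", "AB"]

print(lod_score(phenotypes, linked))    # large: genotype explains the phenotype
print(lod_score(phenotypes, unlinked))  # near zero
```

Because the per-genotype model can never fit worse than the single-mean model, the LOD is never negative. The question is only how large it must be to count as evidence.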
16
LOD curves
How do you know which peaks are really significant?
17
LOD threshold
Broman and Sen 2009
• Consider the null hypothesis that there are no QTLs genome-wide
[Figure panels: one location; genome-wide]
1. Randomize the phenotype labels relative to the genotypes
2. Conduct interval mapping and determine what the maximum LOD score is genome-wide
3. Repeat a large number of times (1000-10,000) to generate a null distribution of maximum LOD scores
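The permutation procedure can be sketched as follows. To stay short, this sketch scans markers only (marker regression under a normal model) rather than running interval mapping, and it uses simulated genotypes with no true QTL. Both are simplifications, and all names and sizes are illustrative assumptions.

```python
import math
import random
from statistics import mean

def rss(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def lod_score(phenotypes, marker):
    """LOD at one marker under a normal model with a shared residual variance."""
    groups = {}
    for y, g in zip(phenotypes, marker):
        groups.setdefault(g, []).append(y)
    rss_qtl = sum(rss(ys) for ys in groups.values())
    return (len(phenotypes) / 2.0) * math.log10(rss(phenotypes) / rss_qtl)

def max_lod(phenotypes, genotype_matrix):
    """Largest LOD across all markers (genotype_matrix[m][i] = individual i)."""
    return max(lod_score(phenotypes, marker) for marker in genotype_matrix)

def permutation_threshold(phenotypes, genotype_matrix, n_perm=1000, alpha=0.05, seed=0):
    """LOD cutoff exceeded by the genome-wide maximum in about alpha of null scans."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    null_max = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # step 1: break the phenotype-genotype link
        null_max.append(max_lod(shuffled, genotype_matrix))  # step 2
    null_max.sort()  # step 3: null distribution of maximum LOD scores
    return null_max[min(n_perm - 1, int((1.0 - alpha) * n_perm))]

# Simulated backcross with no QTL (assumption): 100 individuals, 20 markers
rng = random.Random(42)
n_ind, n_markers = 100, 20
genotype_matrix = [[rng.choice(("AA", "AB")) for _ in range(n_ind)] for _ in range(n_markers)]
phenotypes = [rng.gauss(0.0, 1.0) for _ in range(n_ind)]

thr10 = permutation_threshold(phenotypes, genotype_matrix, n_perm=200, alpha=0.10)
thr05 = permutation_threshold(phenotypes, genotype_matrix, n_perm=200, alpha=0.05)
print(thr10, thr05)
```

Taking the maximum over the whole genome in every permutation is what makes the cutoff genome-wide: it accounts for the many positions tested in a single scan.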
18
LOD threshold
• 1000 permutations
  - 10% ‘Genome-wide Error Rate’ (GWER) = LOD 3.19 (at this LOD cutoff, 10% of genome scans with no true QTL would still show a peak by chance)
  - 5% GWER = LOD 3.52
• Boundary of the peak is often taken as the points where the LOD curve crosses (Max LOD – 1.5) (or Max LOD – 1.8 for an intercross)
• Often these regions are very large and encompass many (hundreds of) genes
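A minimal sketch of the peak-boundary rule, assuming the LOD curve is given as values at sorted positions along one chromosome. The function name and the example curve are illustrative. It reports the contiguous stretch around the peak where the LOD stays within the chosen drop of the maximum; it does not interpolate between scanned positions.

```python
def lod_support_interval(positions, lods, drop=1.5):
    """Contiguous region around the peak where LOD >= (max LOD - drop)."""
    peak = max(range(len(lods)), key=lambda i: lods[i])
    cutoff = lods[peak] - drop
    left = peak
    while left > 0 and lods[left - 1] >= cutoff:
        left -= 1
    right = peak
    while right < len(lods) - 1 and lods[right + 1] >= cutoff:
        right += 1
    return positions[left], positions[right]

# Example LOD curve (made-up values), positions in cM
positions = [0, 10, 20, 30, 40, 50]
lods = [0.5, 2.0, 4.0, 5.0, 3.8, 1.0]

print(lod_support_interval(positions, lods, drop=1.5))
print(lod_support_interval(positions, lods, drop=1.8))
```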
19
Lessons from QTL mapping studies about Genetic Architecture
* Often have a few big-effect QTL and many small modifier QTL with small effects on the phenotype
  - need lots of power (good phenotypic measurements and many individuals) to detect QTLs with small effects
* Recombination in F2s can reveal negative effects segregating in the parents
  - e.g. can find resistant-parent allele associated with sensitivity
  - Mackay review: often have loci with complementary effects found nearby
* Effects of an allele can be context dependent
  - Environment-specific effects: Gene x Environment (GxE) interactions
  - Genomic context: epistatic (i.e. gene-gene) interactions are likely very common … but difficult to detect
An alternative approach: Genome Wide Association Studies (GWAS)
Here the phenotypes and genotypes come from many different individuals from a population
Identify SNPs that are significantly associated with the trait across a bunch of individuals
An alternative approach: Genome Wide Association Studies (GWAS) across many individuals
[Figure: genotypes for 65 strains; phenotypes for 65 strains]
Typically use a mixed linear model to test for significance
y = μ + a + other stuff + Error
• y: phenotype (the variation to be explained)
• μ: phenotypic mean
• a: additive genetic effects across all involved genes
• other stuff: e.g. population structure, phylogenetic relatedness
• Error: random error
[Figure: phenotype vs. genotype (AA, TT)]
Identify SNPs that are significantly associated with the trait
23
A very important control for both types of mapping: controlling for covariates
Sometimes a SNP can appear correlated with phenotypic variation … but it can be due to some other feature that co-varies with the SNP and the phenotype
The clearest example: population structure
Other examples:
- gender of the individuals
- shared environments for subgroups
- an example from our yeast studies: ploidy differences when some F2s are haploid and some are diploid
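A toy illustration of why the covariate matters, using made-up numbers. Two hypothetical subpopulations ('pop1', 'pop2') differ in both allele frequency and mean phenotype, while the SNP has no effect within either one. Pooling everyone produces an apparent genotype effect; comparing genotypes within each subpopulation removes it. This is simple stratification rather than a mixed linear model, and all names are illustrative.

```python
from statistics import mean

def genotype_difference(records):
    """Mean phenotype of TT minus mean phenotype of AA."""
    tt = [y for _, g, y in records if g == "TT"]
    aa = [y for _, g, y in records if g == "AA"]
    return mean(tt) - mean(aa)

def stratified_difference(records):
    """Average of the within-population TT minus AA differences."""
    populations = sorted({pop for pop, _, _ in records})
    return mean([genotype_difference([r for r in records if r[0] == pop])
                 for pop in populations])

# (population, genotype, phenotype); values are invented for illustration
records = [
    ("pop1", "AA", 1.0), ("pop1", "AA", 1.2), ("pop1", "AA", 0.8),
    ("pop1", "AA", 1.0), ("pop1", "TT", 1.0),
    ("pop2", "AA", 3.0), ("pop2", "TT", 3.0), ("pop2", "TT", 3.2),
    ("pop2", "TT", 2.8), ("pop2", "TT", 3.0),
]

print(genotype_difference(records))    # pooled: apparent effect of about 1.2
print(stratified_difference(records))  # within populations: about 0
```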
24
Example: S. cerevisiae strains (Liti et al. 2009)
[Figure: phenotype vs. genotype (AA, TT) for oak strains and vineyard strains]
Mixed linear model identifies SNPs with a significant p-value.
Often plot the –log10(p) across the genome (Manhattan plot)
Again, the p-value cutoff comes from permutations (randomize the strain-phenotype labels and perform mapping on randomized data 10,000 times)
How to find the causative SNP/polymorphism in giant regions?
Often very challenging to find which SNP(s) or polymorphisms (copy-number differences, rearrangements, etc.) are causal
Some strategies people use:
- Look at what’s known about the genes in the peak
  CAUTION: very easy to get led by what ‘seems likely’
- Look at signatures of selection within the population, e.g. differences in FST
- Look for derived alleles
- Look for coding changes, or genes in the region with severe expression differences
- Combine with other data, e.g. other mapping studies (QTL + GWAS), genomic datasets