QTL mapping


Ades 2008, NHGRI

Simple Mendelian traits are caused by a single locus, and come in the ‘all-or-none’ flavor.

A Quantitative Trait is one in which many loci contribute. The phenotype can therefore vary in a ‘quantitative’ manner.

Modified from Mike White slides, 2010


Goals of QTL mapping

Ades 2008, NHGRI

To identify the loci that contribute to phenotypic variation

1. Cross two parents with extreme phenotypes

2. Score the progeny for the phenotype

3. Genotype the progeny at markers across the genome

4. Associate the observed phenotypic variation with the underlying genetic variation

5. Ultimate goal: identify causal polymorphisms that explain the phenotypic variation

Modified from Mike White slides, 2010


Backcross

Broman and Sen 2009

Phenotype: Drug tolerance

80% 20% viability

Usually have at least 100 individuals


Intercross

Broman and Sen 2009

Phenotype: Drug tolerance

80% 20% viability


Backcross vs. Intercross

• An intercross recovers all three possible genotypes (AA, AB, BB). This allows dominance of either allele to be detected and provides estimates of the degree of dominance.

• A backcross has more power to detect QTL with fewer individuals.

• A backcross may be the only possible scheme when crossing two different species.
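The genotype classes recovered by each design can be enumerated directly. The sketch below assumes a single locus, an F1 of genotype AB, and a backcross to the AA parent; the function name is illustrative only.

```python
from fractions import Fraction
from itertools import product

def offspring_genotypes(parent1, parent2):
    """Expected single-locus genotype frequencies among offspring,
    assuming each parent transmits either of its two alleles with
    probability 1/2."""
    freqs = {}
    weight = Fraction(1, len(parent1) * len(parent2))
    for a, b in product(parent1, parent2):
        genotype = "".join(sorted(a + b))  # "BA" and "AB" are the same class
        freqs[genotype] = freqs.get(genotype, Fraction(0)) + weight
    return freqs

f1 = "AB"
backcross = offspring_genotypes(f1, "AA")   # F1 x parent A
intercross = offspring_genotypes(f1, f1)    # F1 x F1

print("backcross :", backcross)
print("intercross:", intercross)
```

Only the intercross produces the BB class, which is why it can separate additive from dominance effects.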


Genetic map: specific markers spaced across the genome

Markers can be:

• SNPs at particular loci

• Variable-length repeats, e.g. Alu repeats

• ALL polymorphisms (if whole genomes are available)

Ideally, markers should be spaced every 10-20 cM and span the whole genome
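To relate marker spacing in cM to the chance of recombination between adjacent markers, a map function is needed. The slides do not name one; the sketch below assumes Haldane's map function, which itself assumes no crossover interference.

```python
import math

def haldane_recomb_fraction(distance_cM):
    """Recombination fraction for a genetic distance in centimorgans,
    under Haldane's map function (assumes no crossover interference)."""
    d_morgans = distance_cM / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))

for spacing in (10, 20):
    r = haldane_recomb_fraction(spacing)
    print(f"{spacing} cM -> recombination fraction {r:.4f}")
```

At 10-20 cM spacing the recombination fraction between neighboring markers is roughly 0.09 to 0.16, so adjacent markers remain strongly correlated.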


Genotype data: Determine allele at all markers in each F2


Phenotype data


Statistical framework

Broman and Sen 2009

1. Missing data problem: use marker data to infer intervening genotypes

2. Model selection problem: how do the QTL across the genome combine with the covariates to generate the phenotype?


Marker regression: simple t-test (or ANOVA) at each marker

Marker 1: no QTL

Marker 2: significant QTL (population means are different)
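A minimal sketch of the test at one marker, assuming a backcross (two genotype classes) and a pooled-variance two-sample t-test. The function name and toy data are hypothetical; individuals with a missing genotype (None) are dropped.

```python
import math
import statistics

def marker_t_statistic(phenotypes, genotypes):
    """Pooled-variance t statistic comparing phenotype means between the
    two genotype classes at one marker. Returns (t, degrees_of_freedom).
    Individuals whose genotype is None are omitted."""
    groups = {}
    for y, g in zip(phenotypes, genotypes):
        if g is None:
            continue  # missing marker data: individual cannot be used
        groups.setdefault(g, []).append(y)
    if len(groups) != 2:
        raise ValueError("expected exactly two genotype classes")
    (_, y1), (_, y2) = sorted(groups.items())
    n1, n2 = len(y1), len(y2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * statistics.variance(y1)
                  + (n2 - 1) * statistics.variance(y2)) / df
    t = (statistics.mean(y1) - statistics.mean(y2)) / math.sqrt(
        pooled_var * (1.0 / n1 + 1.0 / n2))
    return t, df

# Toy data: the last individual has no genotype call and is ignored.
pheno = [1.0, 2.0, 3.0, 5.0, 6.0, 7.0, 4.0]
geno = ["AA", "AA", "AA", "AB", "AB", "AB", None]
t_stat, dof = marker_t_statistic(pheno, geno)
print(t_stat, dof)
```

Repeating this at every marker gives the marker-regression scan; converting t to a p-value would need a t-distribution function, which is left out here.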


Marker regression

Advantages:

• Simple test: standard t-test/ANOVA

• Covariates (e.g. gender, environment) are easy to incorporate

• No genetic map necessary, since the test is done separately on each marker

Disadvantages:

• Any individuals with missing marker data must be omitted from analysis

• Does not effectively consider positions between markers

• Does not test for genetic interactions (e.g. epistasis)

• The apparent effect size of the QTL (and therefore the power to detect it) is reduced by incomplete linkage to the marker

• Difficult to pinpoint QTL position, since only the marker positions are considered
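The loss of effect size under incomplete linkage can be made concrete. In a backcross, if the marker and the QTL recombine with frequency r, the expected difference between the marker genotype means is (1 - 2r) times the true difference between the QTL genotype means. A small sketch, with an illustrative function name:

```python
def apparent_effect(true_effect, r):
    """Expected difference in phenotype means between the two marker
    genotype classes in a backcross, when the marker recombines with
    the QTL at frequency r (0 <= r <= 0.5)."""
    if not 0.0 <= r <= 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5]")
    return (1.0 - 2.0 * r) * true_effect

for r in (0.0, 0.1, 0.25, 0.5):
    print(f"r = {r:4.2f}: apparent effect = {apparent_effect(10.0, r):.1f}")
```

A marker that is unlinked to the QTL (r = 0.5) shows no effect at all, whatever the true effect size.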


Interval mapping

• In addition to examining phenotype-genotype associations at markers, look for associations at positions between markers by inferring the genotype at a putative QTL (Q) from the flanking markers

• The methods for calculating genotype probabilities between markers typically use hidden Markov models to account for additional factors, such as genotyping errors

• Lander and Botstein 1989
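For intuition, the simplest version of this calculation uses only the two flanking markers. The sketch below assumes a backcross, no crossover interference, and no genotyping errors, so it is a simplification of the hidden Markov model approach; the function and argument names are hypothetical.

```python
def prob_qtl_AA(left, right, r_left, r_right):
    """P(QTL genotype is AA | flanking marker genotypes) in a backcross.

    left, right : genotypes ("AA" or "AB") at the flanking markers
    r_left      : recombination fraction between left marker and QTL
    r_right     : recombination fraction between QTL and right marker
    Assumes no interference and error-free genotypes.
    """
    def transition(g_from, g_to, r):
        # Genotype is unchanged unless a recombination occurred.
        return 1.0 - r if g_from == g_to else r

    path_AA = transition(left, "AA", r_left) * transition("AA", right, r_right)
    path_AB = transition(left, "AB", r_left) * transition("AB", right, r_right)
    return path_AA / (path_AA + path_AB)

print(prob_qtl_AA("AA", "AA", 0.1, 0.1))  # both flanking markers AA
print(prob_qtl_AA("AA", "AB", 0.1, 0.1))  # recombinant interval, QTL midway
```

When the flanking markers agree, the QTL genotype is almost certain; when they disagree and the QTL sits midway, the two genotypes are equally likely.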


Interval mapping

Broman and Sen 2009


Interval mapping – maximum likelihood

1. Calculate genotype probabilities at intervening locations for every individual

2. At a given position, calculate the conditional probability that an individual is in one of the two QTL genotype groups (AA or AB), given their phenotype and the current estimates $\mu_{AA}^{(s-1)}$ and $\mu_{AB}^{(s-1)}$ (Expectation step)

3. Calculate new estimates $\mu_{AA}^{(s)}$ and $\mu_{AB}^{(s)}$ by combining the genotype probabilities of each individual with their phenotypic values (Maximization step)

4. Repeat steps 2 and 3 until the estimates converge, i.e. until $\mu_{AA}^{(s)}$ and $\mu_{AB}^{(s)}$ no longer change from $\mu_{AA}^{(s-1)}$ and $\mu_{AB}^{(s-1)}$
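The steps above can be sketched as a small EM loop at a single position. This is an illustration under stated assumptions, not the implementation used in any particular package: a backcross, normally distributed phenotypes with a common standard deviation in both genotype groups, and per-individual probabilities `p_AA` (from step 1) that are not all 0 or all 1. All names are hypothetical.

```python
import math

def normal_pdf(y, mu, sigma):
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def em_interval_mapping(y, p_AA, tol=1e-8, max_iter=1000):
    """EM estimates of (mu_AA, mu_AB, sigma) at one putative QTL position.

    y    : phenotypes
    p_AA : P(QTL genotype = AA | marker data) for each individual
    """
    n = len(y)
    mean_y = sum(y) / n
    sigma = math.sqrt(sum((v - mean_y) ** 2 for v in y) / n)
    # Starting values: means weighted by the marker-based probabilities.
    mu_AA = sum(p * v for p, v in zip(p_AA, y)) / sum(p_AA)
    mu_AB = sum((1 - p) * v for p, v in zip(p_AA, y)) / sum(1 - p for p in p_AA)
    for _ in range(max_iter):
        # E step: posterior probability of AA given phenotype and markers.
        w = []
        for v, p in zip(y, p_AA):
            a = p * normal_pdf(v, mu_AA, sigma)
            b = (1 - p) * normal_pdf(v, mu_AB, sigma)
            w.append(a / (a + b))
        # M step: weighted means and pooled residual variance.
        new_AA = sum(wi * v for wi, v in zip(w, y)) / sum(w)
        new_AB = sum((1 - wi) * v for wi, v in zip(w, y)) / sum(1 - wi for wi in w)
        sigma = math.sqrt(sum(wi * (v - new_AA) ** 2 + (1 - wi) * (v - new_AB) ** 2
                              for wi, v in zip(w, y)) / n)
        converged = abs(new_AA - mu_AA) < tol and abs(new_AB - mu_AB) < tol
        mu_AA, mu_AB = new_AA, new_AB
        if converged:
            break
    return mu_AA, mu_AB, sigma

pheno = [1.0, 2.0, 3.0, 5.0, 6.0, 7.0]
complete = em_interval_mapping(pheno, [1, 1, 1, 0, 0, 0])            # genotypes known
partial = em_interval_mapping(pheno, [0.9, 0.9, 0.5, 0.5, 0.1, 0.1])  # genotypes uncertain
print(complete)
print(partial)
```

With fully known genotypes the loop returns the ordinary group means after one pass; with uncertain genotypes the phenotypes themselves help sharpen the assignment.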


Interval mapping

Advantages:

• Takes account of missing genotype information: all individuals are included

• Can scan for QTL at locations in between markers

• QTL effects are better estimated

Disadvantages:

• More computation time required

• Still only a single-QTL model – cannot separate linked QTL or examine for interactions among QTL


LOD scores

• Measure of the strength of evidence for the presence of a QTL at each marker location

LOD(λ) = log10 likelihood ratio comparing the hypothesis of a QTL at position λ versus that of no QTL

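$$\mathrm{LOD}(\lambda) = \log_{10}\left\{ \frac{\Pr(y \mid \text{QTL at } \lambda,\ \mu_{AA\lambda},\ \mu_{AB\lambda},\ \sigma_{\lambda})}{\Pr(y \mid \text{no QTL},\ \mu,\ \sigma)} \right\}$$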


A LOD of 3 means that the data are 10^3 times more likely under the QTL model (numerator) than under the no-QTL model (denominator)
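When genotypes at the tested position are fully known and phenotypes are modeled as normal with a common variance, the likelihood ratio reduces to a ratio of residual sums of squares, giving LOD = (n/2) log10(RSS0 / RSS1). A minimal sketch under those assumptions (function name and toy data are illustrative):

```python
import math

def lod_score(phenotypes, genotypes):
    """LOD at a fully genotyped position under a normal model:
    (n/2) * log10(RSS0 / RSS1), where RSS0 is the residual sum of squares
    around the overall mean (no QTL) and RSS1 around the genotype-group
    means (QTL). Assumes RSS1 > 0."""
    n = len(phenotypes)
    grand_mean = sum(phenotypes) / n
    rss0 = sum((y - grand_mean) ** 2 for y in phenotypes)
    groups = {}
    for y, g in zip(phenotypes, genotypes):
        groups.setdefault(g, []).append(y)
    rss1 = 0.0
    for ys in groups.values():
        group_mean = sum(ys) / len(ys)
        rss1 += sum((y - group_mean) ** 2 for y in ys)
    return (n / 2.0) * math.log10(rss0 / rss1)

pheno = [1.0, 2.0, 3.0, 5.0, 6.0, 7.0]
geno = ["AA", "AA", "AA", "AB", "AB", "AB"]
example_lod = lod_score(pheno, geno)
print(round(example_lod, 3))
```

Here RSS0 = 28 and RSS1 = 4, so the LOD is 3 × log10(7), about 2.5: the data are roughly 340 times more likely under the QTL model.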


LOD curves

How do you know which peaks are really significant?


LOD threshold

Broman and Sen 2009

• Consider the null hypothesis that there are no QTL genome-wide

[Figure panels: one location; genome-wide]

1. Randomize the phenotype labels relative to the genotypes

2. Conduct interval mapping and determine the maximum LOD score genome-wide

3. Repeat a large number of times (1000-10,000) to generate a null distribution of maximum LOD scores
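The permutation procedure can be sketched as follows. To stay short, this version scans markers only (marker regression with complete genotypes) rather than running full interval mapping, and all names and the simulated data are hypothetical.

```python
import math
import random

def lod(phenotypes, genotypes):
    """LOD at one fully genotyped marker: (n/2) * log10(RSS0 / RSS1)."""
    n = len(phenotypes)
    grand_mean = sum(phenotypes) / n
    rss0 = sum((y - grand_mean) ** 2 for y in phenotypes)
    groups = {}
    for y, g in zip(phenotypes, genotypes):
        groups.setdefault(g, []).append(y)
    rss1 = 0.0
    for ys in groups.values():
        m = sum(ys) / len(ys)
        rss1 += sum((y - m) ** 2 for y in ys)
    return (n / 2.0) * math.log10(rss0 / rss1)

def max_lod(phenotypes, markers):
    """Largest LOD across all markers (the genome-wide maximum)."""
    return max(lod(phenotypes, genotypes) for genotypes in markers)

def permutation_threshold(phenotypes, markers, n_perm=1000, alpha=0.05, seed=0):
    """LOD cutoff exceeded by a fraction alpha of the permuted
    genome-wide maxima."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    null_max = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)            # break phenotype-genotype pairing
        null_max.append(max_lod(shuffled, markers))
    null_max.sort()
    idx = max(0, min(n_perm - 1, math.ceil((1.0 - alpha) * n_perm) - 1))
    return null_max[idx]

# Simulated null data: 40 individuals, 5 unlinked markers, no QTL.
data_rng = random.Random(1)
n_ind = 40
markers = [[data_rng.choice(["AA", "AB"]) for _ in range(n_ind)] for _ in range(5)]
pheno = [data_rng.gauss(0.0, 1.0) for _ in range(n_ind)]

threshold_10 = permutation_threshold(pheno, markers, n_perm=200, alpha=0.10)
threshold_5 = permutation_threshold(pheno, markers, n_perm=200, alpha=0.05)
print(threshold_10, threshold_5)
```

Shuffling whole phenotype vectors keeps the correlation structure among markers intact, which is why the maximum across the genome is the right statistic to record in each permutation.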

Genetics Colloquium: Leonie Moyle, Indiana University, “Dissecting Speciation via the Genetics of Isolation and Adaptation”

Wednesday, March 14, 3:30 pm, Biotech Center Auditorium, Room 1111


LOD threshold

• 1000 permutations:

10% False Discovery Rate = LOD 3.19 (means that at this LOD cutoff 10% of peaks could be random chance)

5% FDR = LOD 3.52

• The boundaries of a peak are often taken as the points where the LOD curve falls to (max LOD - 1.5), or (max LOD - 1.8) for an intercross
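A minimal sketch of reading such an interval off a LOD curve evaluated on a grid of positions. It simply extends outward from the peak while the LOD stays at or above the cutoff, without interpolating between grid points; the function name and example curve are made up for illustration.

```python
def lod_support_interval(positions, lods, drop=1.5):
    """Return (left, peak, right) positions of the contiguous region
    around the maximum LOD where LOD >= max LOD - drop.
    Use drop=1.5 for a backcross or drop=1.8 for an intercross."""
    peak = max(range(len(lods)), key=lods.__getitem__)
    cutoff = lods[peak] - drop
    left = peak
    while left > 0 and lods[left - 1] >= cutoff:
        left -= 1
    right = peak
    while right < len(lods) - 1 and lods[right + 1] >= cutoff:
        right += 1
    return positions[left], positions[peak], positions[right]

positions_cM = [0, 10, 20, 30, 40, 50]
lod_curve = [0.5, 2.0, 3.6, 4.2, 3.0, 1.0]
interval = lod_support_interval(positions_cM, lod_curve)
print(interval)
```

For this toy curve the peak is at 30 cM with LOD 4.2, the cutoff is 2.7, and the interval runs from 20 to 40 cM.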


LOD curves – Marker regression vs. interval mapping

[Figure legend: IM = interval mapping, MR = marker regression]

• With complete marker genotype information, marker regression would give the same results as interval mapping


Other mapping methods

• Methods discussed assume single-QTL models

  • Multiple QTL on a chromosome are not estimated correctly

  • Cannot detect a QTL whose effect is dependent on the genotype at a second QTL (epistasis)

• Two-dimensional two-QTL scans

  • Consider all pairs of markers across the genome

• Multiple-QTL models

  • Jointly estimate all sets of QTL, interactions, and covariates in a single, coherent model

  • Focus on the model selection problem of QTL mapping

Other models can also be applied


From QTL to candidate genes

• F2 mapping results in large loci associated with the phenotype

  • Mapping a QTL that explains 5% of the phenotypic variance in 300 F2 animals will yield a region approximately 40 cM in size (800 genes in mice!)

• 2050 mouse and 700 rat QTL have been mapped (reviewed in Flint et al. 2005)

  • ~20 underlying genes have been identified

Strategies for getting to causal loci:

1. Generate additional recombinants to fine-map the QTL

  • Effect sizes of QTL can be overestimated

  • Often one large QTL is composed of many tightly linked QTL of small effect

2. Identify candidate genes from known mutants, tissue-specific expression, etc.

3. Identify candidate genes through comparison to association mapping studies or population genomics studies

  • Are the results repeatable across environments?

  • Association mapping and population genomics approaches only identify alleles with large effect sizes
