
Signatures of Selection

Different types of selection leave behind different signatures on the genome

Positive selection through recent selective sweep: reduces variation at sites flanking the selected site (even if those sites are neutral) due to hitchhiking

Diversifying selection can increase variation since >1 extreme alleles are selected, e.g. selection for diverse viral antigens to evade the host immune system

Balancing selection can increase variation by maintaining >1 allele in the population, e.g. maintained heterozygosity (sickle cell anemia)

OR different alleles in different subpopulations due to fluctuating environments

Negative selection: reduces variation at the affected site(s) but also at neighboring sites through background selection


There are also different methods of looking for these signatures:

1. Evolutionary rate within species vs. between species
e.g. Ka/Ks ratio & McDonald-Kreitman tests for coding sequences

HKA and multi-locus HKA tests for non-coding sequences

2. Frequency spectrum: frequency of different alleles in the population
e.g. Tajima’s D … Fay & Wu’s H … Fu & Li’s D*

3. Linkage disequilibrium & haplotype structure

For all of these tests: compare REAL DATA to a MODEL of what data should look like under neutral evolution …

can also compare test results at specific loci vs. a scan across the genome



Methods based on the Allele Frequency Spectrum

1. For each ‘derived’ (= non-ancestral) allele at a given locus, calculate the frequency. Some alleles will be at high frequencies in the population, some at low frequencies (i.e. very uncommon)

2. Make a histogram of the % of alleles with different frequencies, looking for an excess of rare alleles or of common alleles

From Nielsen Nat Rev Gen 2005 review
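The two steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sequences in `toy` are hypothetical, alleles are coded '0' (ancestral) and '1' (derived), and the function name `site_frequency_spectrum` is made up for this example.

```python
from collections import Counter


def site_frequency_spectrum(haplotypes):
    """Count segregating sites by the number of sequences carrying the derived allele.

    haplotypes: equal-length strings of '0' (ancestral) and '1' (derived).
    Returns {k: number of sites where exactly k sequences are derived}, k = 1 .. n-1.
    """
    n = len(haplotypes)
    counts = Counter()
    for column in zip(*haplotypes):      # walk through the alignment site by site
        k = column.count("1")            # derived allele count at this site
        if 0 < k < n:                    # skip sites that are not segregating
            counts[k] += 1
    return {k: counts.get(k, 0) for k in range(1, n)}


# Hypothetical sample of 5 sequences, 5 sites (assumption for illustration only)
toy = ["10010", "01000", "00110", "00010", "01010"]
sfs = site_frequency_spectrum(toy)

# Text histogram: derived allele frequency vs. number of sites
for k, n_sites in sfs.items():
    print(f"{k}/{len(toy)}  {'#' * n_sites}  ({n_sites} sites)")
```

An excess of sites in the k = 1 class would correspond to an excess of rare alleles; an excess near k = n-1 would correspond to high-frequency derived alleles.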


Tajima’s D (F. Tajima, 1989): takes the # of segregating sites within species (S) and also the average # of differences between each pair of sequences (π)

Example with 5 sequences (10 pairwise comparisons):

$$S = 3$$

$$\pi = \frac{(2 + 2 + 1 + 2) + (2 + 1 + 0) + (1 + 2) + (1)}{10} = 1.4$$

where π is the avg. # of differences between each pair of sequences

Tajima’s D compares S and π to estimate the proportion of low/high-frequency alleles
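A minimal Python sketch of how S and the average pairwise difference could be computed from aligned sequences. The five sequences in `toy_seqs` are hypothetical: the original alignment figure is not available here, so they were chosen only so that the pairwise counts match the worked numbers on this slide. Function names are likewise made up for illustration.

```python
from itertools import combinations


def segregating_sites(seqs):
    """Number of alignment columns where at least two sequences differ (S)."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def mean_pairwise_differences(seqs):
    """Average number of differences over all pairs of sequences (pi)."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs)


# Hypothetical 5-sequence, 3-site sample (assumption, not the slide's data)
toy_seqs = ["100", "010", "001", "000", "010"]

S = segregating_sites(toy_seqs)
pi = mean_pairwise_differences(toy_seqs)
print(S, pi)
```

With 5 sequences there are 10 pairs, so the total of 14 pairwise differences gives an average of 1.4.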


S versus π reflects the allele frequency spectrum

Multiple ways to calculate θ … π vs. S/a, where a = \sum_{i=1}^{n-1} 1/i for n sequences

Negative Tajima’s D (π < S/a) = excess of low-frequency alleles (= reduced variation)

Indicates positive selection, OR recent deleterious alleles, OR population expansion**

Positive Tajima’s D (π > S/a) = excess of intermediate-frequency alleles

(low amounts of both high- and low-frequency alleles)

Indicates balancing selection OR partial sweep OR population bottleneck**

How can you get a p-value? Difficult to estimate - best to compare across loci
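For reference, the comparison of the average pairwise difference with S/a can be written out as a short Python function. This is a sketch of the standard Tajima (1989) statistic, with the usual normalizing constants; the function name `tajimas_d` and the example inputs (n = 5, S = 3, average pairwise difference 1.4, taken from the worked example earlier in these slides) are used purely for illustration.

```python
import math


def tajimas_d(n, S, pi):
    """Tajima's D from sample size n, segregating sites S, and mean pairwise differences pi."""
    if n < 2 or S < 1:
        raise ValueError("need at least 2 sequences and 1 segregating site")
    a1 = sum(1 / i for i in range(1, n))          # the 'a' in S/a
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # numerator: difference between the two estimates of theta
    # denominator: estimated standard deviation of that difference
    return (pi - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))


D = tajimas_d(n=5, S=3, pi=1.4)
print(round(D, 3))
```

For this tiny sample the average pairwise difference (1.4) is just below S/a (1.44), so D comes out slightly negative. With so few sequences that says nothing about selection, which is why comparison across loci is recommended.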


From Nielsen Nat Rev Gen 2005 review

Empirical model for significance of Tajima’s D

Sliding window across a locus OR Compare to several other loci

From Will et al. PLoS Genetics 2010
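One simple way to read "compare to several other loci" is an empirical p-value: rank the candidate locus against the values observed at other loci. The sketch below assumes that interpretation; the Tajima’s D values in `background_d` and the candidate value are invented for illustration, and `empirical_p_value` is a hypothetical helper, not a function from any package.

```python
def empirical_p_value(observed, background, tail="lower"):
    """Fraction of background loci whose statistic is at least as extreme as `observed`.

    tail="lower" asks how often the background is <= observed (e.g. very negative D);
    tail="upper" asks how often the background is >= observed (e.g. very positive D).
    """
    if not background:
        raise ValueError("background must contain at least one locus")
    if tail == "lower":
        extreme = sum(1 for value in background if value <= observed)
    elif tail == "upper":
        extreme = sum(1 for value in background if value >= observed)
    else:
        raise ValueError("tail must be 'lower' or 'upper'")
    return extreme / len(background)


# Invented Tajima's D values at 10 other loci (assumption for illustration)
background_d = [-0.4, 0.1, 0.3, -0.1, 0.6, 0.2, -0.2, 0.0, 0.5, -1.9]
candidate_d = -1.8

p_lower = empirical_p_value(candidate_d, background_d, tail="lower")
print(p_lower)
```

Because demographic events such as expansions or bottlenecks shift D at all loci, ranking against the rest of the genome controls for demography in a way a purely theoretical null does not.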


Genome-wide scans of FST

FST is a measure of population subdivision: it compares the genetic variation within a subpopulation (S) to the total genetic variation in the species (T)

$$F_{ST} = \frac{\pi_T - \pi_S}{\pi_T}$$

where π = average # of pairwise nucleotide differences per site

If π_S = π_T (i.e. amount of variation in the subpopulation is the same as in the total population), FST = 0 … NO population subdivision

If there’s variation in the total sample, but NO variation within each subpopulation, π_S = 0 and FST = 1 … COMPLETE differentiation between subpopulations
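A small Python sketch of this FST calculation from aligned sequences. Assumptions, labeled as such: the within-subpopulation diversity is taken as the unweighted mean across subpopulations, the total diversity is computed on the pooled sample, and all sequences and function names are hypothetical.

```python
from itertools import combinations


def pi_per_site(seqs):
    """Average number of pairwise nucleotide differences per site."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least 2 sequences")
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * len(seqs[0]))


def fst(subpopulations):
    """FST = (pi_T - pi_S) / pi_T.

    pi_S: unweighted mean of within-subpopulation diversity (an assumption).
    pi_T: diversity of the pooled sample.
    """
    pooled = [seq for pop in subpopulations for seq in pop]
    pi_t = pi_per_site(pooled)
    if pi_t == 0:
        raise ValueError("no variation in the total sample; FST is undefined")
    pi_s = sum(pi_per_site(pop) for pop in subpopulations) / len(subpopulations)
    return (pi_t - pi_s) / pi_t


# Hypothetical data: a fixed difference between two subpopulations
fixed = [["AAAA", "AAAA"], ["TAAA", "TAAA"]]
# Hypothetical data: shared variation plus some differentiation
partial = [["AAAA", "AAAA", "TAAA"], ["TAAA", "TAAA", "TAAT"]]

print(fst(fixed))     # no variation within either subpopulation
print(fst(partial))   # intermediate differentiation
```

With very small samples this simple estimator can dip below 0 when subpopulations are undifferentiated, so values near zero should not be over-interpreted.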


FST = 1: very strong population subdivisions … there may be little gene flow between those populations


Difficult to interpret what a given FST means (FST = 0.15 means ???)

But, can use variation in FST across the genome to look for evidence of partial selective sweeps in specific sub-populations:

i.e. little gene flow at specific loci only



From Akey et al. 2002: FST across each human chromosome


LD & Haplotype Structure

Linkage equilibrium: when segregation of two different alleles is independent of one another

Linkage disequilibrium (LD): segregation of two alleles is NOT independent
- two SNPs in close proximity are linked physically
- can measure the distance over which their association breaks down

LD break-down depends on the number of generations (and hence generation time) and the recombination rate

SNPs very close together will take many generations to get separated
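These definitions can be made concrete with the usual pairwise LD measures. The sketch below uses the standard coefficient D = p_AB - p_A p_B and r², plus the textbook decay D_t = D_0 (1 - r)^t under random mating with recombination fraction r per generation. Haplotypes, function names, and the example numbers are hypothetical.

```python
def ld_stats(haplotypes):
    """Return (D, r2) for two biallelic SNPs.

    haplotypes: list of 2-character strings, allele at SNP A then SNP B,
    each coded '0' or '1'.
    """
    n = len(haplotypes)
    p_a = sum(h[0] == "1" for h in haplotypes) / n     # frequency of allele 1 at SNP A
    p_b = sum(h[1] == "1" for h in haplotypes) / n     # frequency of allele 1 at SNP B
    p_ab = sum(h == "11" for h in haplotypes) / n      # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                               # 0 under linkage equilibrium
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else float("nan")  # undefined if a SNP is fixed
    return d, r2


def ld_after(d0, recomb_rate, generations):
    """Expected D after a number of generations of recombination."""
    return d0 * (1 - recomb_rate) ** generations


equilibrium = ["00", "01", "10", "11"]   # alleles at A and B combine independently
complete_ld = ["00", "00", "11", "11"]   # allele at A fully predicts allele at B

print(ld_stats(equilibrium))
print(ld_stats(complete_ld))
print(ld_after(0.25, 0.001, 100))        # tightly linked SNPs: LD decays slowly
```

The last line illustrates why closely spaced SNPs take many generations to separate: with a small recombination fraction, most of the initial association is still present after 100 generations.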


Haplotype: block of linked SNPs


[Figure: Haplotypes 1, 2 and 3 at Locus A]


Remember that a recent selective sweep can reduce variation flanking the advantageous site.

The strength of selection and the time since the sweep affect the degree and length of reduced variation.

This effectively creates an unusually long haplotype (compared to others in the genome)


EHH: Extended Haplotype Homozygosity test for RECENT positive selection

Recent positive selection through partial selective sweep:
* extended haplotype length
* high frequency in subpopulation

must account for regional differences in recombination rates

[Figure labels: European, Asian, Yoruban, Beni, Shona, African]


EHH = % of individuals sharing the CORE haplotype that remain identical out to a distance of x

Defined Core Haplotype
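A minimal Python sketch of an EHH calculation. It is written under several assumptions: it uses the pairwise form of the statistic (the probability that two randomly chosen chromosomes carrying the core haplotype are identical across the extended region), which is, to my understanding, how EHH is usually formalized; distance is counted in markers rather than physical or genetic distance; and extension is only to the right of the core. The haplotypes and the function name `ehh` are hypothetical.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core_start, core_end, core_allele, distance):
    """EHH for one core haplotype, extended `distance` markers to the right.

    haplotypes: equal-length marker strings, one per chromosome.
    The core occupies positions [core_start, core_end).
    """
    carriers = [h for h in haplotypes if h[core_start:core_end] == core_allele]
    n = len(carriers)
    if n < 2:
        raise ValueError("need at least 2 chromosomes carrying the core haplotype")
    # Group carriers by their sequence over core + extension
    extended = Counter(h[core_start:core_end + distance] for h in carriers)
    # Fraction of carrier pairs that are still identical at this distance
    return sum(comb(count, 2) for count in extended.values()) / comb(n, 2)


# Hypothetical chromosomes; the core haplotype is "11" at positions 0-1
haps = ["110010", "110011", "110111", "111011", "001010"]

for x in range(5):
    print(x, ehh(haps, 0, 2, "11", x))
```

EHH starts at 1 at the core and decays toward 0 with distance; a core haplotype under a recent sweep would be expected to decay more slowly than other haplotypes at the same locus.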


Relative EHH: normalize EHH for one haplotype to EHH of all others at that locus; internally controls for locus-specific effects


African haplotype


EHH: Extended Haplotype Homozygosity (& other methods) test for RECENT positive selection

Related test from Jonathan Pritchard: iHS test

Benefits of EHH & iHS scans:
* Don’t have to know populations a priori … define by haplotypes
* More sensitive than traditional tests for selection

Remaining challenges:
* Often have no idea WHY - how to link to phenotypes of interest?
  Stinchcombe & Hoekstra review: combining scans with QTL mapping
* Often unclear what SNP was selected for … identifies huge regions


CMS (Composite of Multiple Signals) incorporates results of 5 different tests:
* FST
* iHS
* XP-EHH
* ΔDAF (looking at derived allele frequencies)
* ΔiHH (looking at absolute haplotype length)

Science. February 12, 2010


CMS outperforms single tests in simulated data