Population Genetics I. Bio5488 - 2016 Don Conrad [email protected]

Page 1: Population Genetics I. Bio5488 - 2016

Population Genetics I. Bio5488 - 2016

Don Conrad [email protected]

Page 2: Population Genetics I. Bio5488 - 2016

Why study population genetics?

•  Functional inference
•  Demographic inference:
    –  The history of mankind is written in our DNA. We can learn about our species’ population size changes, migrations, etc.
•  Complex disease:
    –  What approaches for analysis make sense?
•  Molecular biology:
    –  Measure rates of biological processes like mutation and recombination; learn about gene regulation and speciation.
•  Sequence era: a framework for understanding these sequences.
•  You will have your own genome sequence.

Page 3: Population Genetics I. Bio5488 - 2016

Outline for Part I and Part II

•  Theory
    –  Hardy-Weinberg
    –  Forward models: Wright-Fisher model
    –  Backward models: coalescent
•  Data
    –  Mutation, mutation rates
    –  Global diversity, serial bottleneck model
    –  Recombination, LD blocks, hotspots, PRDM9
    –  Natural selection

Page 4: Population Genetics I. Bio5488 - 2016

Hardy-Weinberg

•  What is the fate of a neutral genetic variant at a biallelic locus in an infinite population?

•  Udny Yule: individuals with dominant traits will increase in the population over time.

•  Hardy: Yule is wrong; expected genotype frequencies are simply the product of the underlying allele frequencies, assuming independence.

Page 5: Population Genetics I. Bio5488 - 2016

Hardy’s Argument: Generation 1

                        Males
                        A (100%)     a (0%)
Females   A (0%)        AA (0%)      Aa (0%)
          a (100%)      Aa (100%)    aa (0%)

Page 6: Population Genetics I. Bio5488 - 2016

Hardy’s Argument: Generation 2

                        Males
                        A (50%)      a (50%)
Females   A (50%)       AA (25%)     Aa (25%)
          a (50%)       Aa (25%)     aa (25%)

$p^2 + 2pq + q^2 = 1$

$p = (2 \times 25 + 1 \times 50) / 200 = 0.5$
$q = 1 - p = 0.5$
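The arithmetic on this slide can be checked with a few lines of Python. This is a minimal sketch; the function names `hardy_weinberg` and `allele_frequency` are hypothetical and do not come from the course material.

```python
def hardy_weinberg(p):
    """Expected genotype frequencies (AA, Aa, aa) when allele A has frequency p."""
    q = 1.0 - p
    return p * p, 2.0 * p * q, q * q


def allele_frequency(n_AA, n_Aa, n_aa):
    """Frequency of A from genotype counts: each AA carries two A copies, each Aa one."""
    total_alleles = 2 * (n_AA + n_Aa + n_aa)
    return (2 * n_AA + n_Aa) / total_alleles


# Generation 2 of Hardy's argument for 100 individuals: 25 AA, 50 Aa, 25 aa.
p = allele_frequency(25, 50, 25)
print(p, hardy_weinberg(p))
```

Feeding the resulting p back into `hardy_weinberg` returns the same genotype proportions, which is the sense in which the frequencies stay put from one generation to the next.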

Page 7: Population Genetics I. Bio5488 - 2016

Gcbias.org

Page 8: Population Genetics I. Bio5488 - 2016

Modern Synthesis

•  Reconciliation of Mendelian genetics with the observations of the biometricians
•  Reconciliation of Mendelian genetics with Darwinian evolution

R.A. Fisher, Sewall Wright, J.B.S. Haldane

Page 9: Population Genetics I. Bio5488 - 2016

Wright-Fisher Model

Assumptions:
•  Two allele system
•  N diploid individuals in each generation
•  2N gametes
•  Random mating, no selection
•  Discrete generations

[Figure: individuals (e.g. Aa) in generation t contribute A and a gametes to a gamete pool, from which generation t + 1 is formed.]

Page 10: Population Genetics I. Bio5488 - 2016

Let’s play a round of this game

Page 11: Population Genetics I. Bio5488 - 2016

The game is faster by computer

I = 400 A = 200 R = 100 G = 100

I = Number of generations
A = Population size (gametes)
G = Count of the G allele
R = Count of the R allele
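The simulator used in lecture is not shown on the slides, so the following is only a sketch of how such a game could be coded, using the slide's parameters (I generations, A gametes, G starting copies). The function name `wright_fisher` and the seed are assumptions made for illustration.

```python
import random


def wright_fisher(generations, pop_size, g_count, rng):
    """Simulate neutral drift; return the count of the G allele in each generation.

    generations: number of generations to simulate (I on the slides)
    pop_size:    number of gametes in the population (A on the slides)
    g_count:     starting count of the G allele (R starts at pop_size - g_count)
    """
    trajectory = [g_count]
    for _ in range(generations):
        freq = trajectory[-1] / pop_size
        # Draw pop_size gametes with replacement from the previous generation.
        next_count = sum(1 for _ in range(pop_size) if rng.random() < freq)
        trajectory.append(next_count)
    return trajectory


rng = random.Random(5488)  # arbitrary seed so that a run can be repeated
run = wright_fisher(generations=400, pop_size=200, g_count=100, rng=rng)
print("final G count:", run[-1])
```

Changing the seed gives a different trajectory each time, which is the point of the game: identical starting conditions can end with either allele taking over.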

Page 12: Population Genetics I. Bio5488 - 2016

I = 400 A = 200 R = 100 G = 100

Page 13: Population Genetics I. Bio5488 - 2016

I = 400 A = 200 R = 100 G = 100

Page 14: Population Genetics I. Bio5488 - 2016

Let’s investigate this phenomenon

•  Change Population Size

•  Change allele frequencies

Page 15: Population Genetics I. Bio5488 - 2016

I = 40 A = 20 R = 10 G = 10

Page 16: Population Genetics I. Bio5488 - 2016

I = 40 A = 20 R = 10 G = 10

Page 17: Population Genetics I. Bio5488 - 2016

I = 1000 A = 2000 R = 1000 G = 1000

Page 18: Population Genetics I. Bio5488 - 2016

I = 1000 A = 2000 R = 1000 G = 1000

Page 19: Population Genetics I. Bio5488 - 2016

I = 400 A = 200 R = 150 G = 50

Page 20: Population Genetics I. Bio5488 - 2016

I = 400 A = 200 R = 150 G = 50

Page 21: Population Genetics I. Bio5488 - 2016

I = 400 A = 200 R = 150 G = 50

Page 22: Population Genetics I. Bio5488 - 2016

Can we deduce general rules?

•  Larger population size = alleles stick around longer; the population is less susceptible to the “random walk”.

•  The probability of winning seems related to the initial frequencies. At 50/50, each allele has a 50% chance of winning. Hypothesis: the probability of winning is proportional to the initial frequency.

•  Hypothesis: one allele must always win.
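One way to probe the second hypothesis is to repeat the game many times from different starting counts and record how often G wins. The sketch below assumes neutral drift in a pool of 20 gametes; the function names, trial count and seed are hypothetical choices.

```python
import random


def g_allele_wins(pop_size, g_count, rng):
    """Run neutral drift until one allele is fixed; return True if G wins."""
    while 0 < g_count < pop_size:
        freq = g_count / pop_size
        g_count = sum(1 for _ in range(pop_size) if rng.random() < freq)
    return g_count == pop_size


def win_fraction(pop_size, g_count, trials, rng):
    """Fraction of independent runs in which the G allele is fixed."""
    wins = sum(g_allele_wins(pop_size, g_count, rng) for _ in range(trials))
    return wins / trials


rng = random.Random(2016)  # arbitrary seed
for start in (5, 10, 15):
    estimate = win_fraction(pop_size=20, g_count=start, trials=2000, rng=rng)
    print(f"start {start}/20: G wins in {estimate:.2f} of runs")
```

If the hypothesis holds, the printed fractions should sit close to 0.25, 0.50 and 0.75, up to sampling noise.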

Page 23: Population Genetics I. Bio5488 - 2016

•  Each generation, the new population is made by sampling with replacement from the previous generation

[Figure: gametes (A, a) sampled with replacement from the gamete pool are paired to form the genotypes of the next generation (aA, Aa, AA, Aa).]

Let:
$p_t$ = freq(A) among gametes
$p_{t+1}$ = … in the next generation
$n_{t+1}$ = count of (A) …

Then: $n_{t+1} \sim \mathrm{Binomial}(2N, p_t)$

$\Pr(n_{t+1} = m) = \binom{2N}{m} \, p_t^{m} \, (1 - p_t)^{2N - m}$

$E(p_{t+1}) = p_t$

$\mathrm{Var}(p_{t+1}) = \dfrac{p_t (1 - p_t)}{2N}$

Implications: sampling variance (“genetic drift”) is dependent on population size. Allele frequency is a random sequence of numbers: $p_1, p_2, p_3, \ldots$ Eventually p = 1 or p = 0. It stays “fixed” until a new mutation arises.
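The mean and variance above can be checked numerically by summing over the binomial probabilities directly. This is a small sketch with hypothetical function names; it uses exact binomial coefficients, so it is only practical for modest 2N.

```python
from math import comb


def next_generation_pmf(two_n, p_t):
    """Pr(n_{t+1} = m) for m = 0..2N under Binomial(2N, p_t) sampling."""
    return [comb(two_n, m) * p_t**m * (1 - p_t) ** (two_n - m)
            for m in range(two_n + 1)]


def frequency_moments(two_n, p_t):
    """Mean and variance of p_{t+1} = n_{t+1} / 2N, summed directly from the pmf."""
    pmf = next_generation_pmf(two_n, p_t)
    mean = sum(prob * m / two_n for m, prob in enumerate(pmf))
    var = sum(prob * (m / two_n - mean) ** 2 for m, prob in enumerate(pmf))
    return mean, var


for two_n in (20, 200):
    mean, var = frequency_moments(two_n, 0.5)
    # The last column is the closed form p(1 - p) / 2N for comparison.
    print(two_n, mean, var, 0.5 * 0.5 / two_n)
```

The variance shrinks by a factor of ten when 2N grows from 20 to 200, which is the quantitative version of "drift is weaker in larger populations".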

Page 24: Population Genetics I. Bio5488 - 2016

An important concept: Drift

•  Drift – stochastic fluctuations in allele frequency due to random sampling in a finite population.

Page 25: Population Genetics I. Bio5488 - 2016

Drift versus Darwin

•  How can we add selection to our game?

•  We need to account for dominant and recessive alleles!

Page 26: Population Genetics I. Bio5488 - 2016

The Wright Fisher Game v0.2

•  Define a relative fitness for each possible individual:
    Fitness RR = 1
    Fitness RB = 1.1
    Fitness BB = 2
•  Modify the rules: pick an individual with probability proportional to the fitness of her genotype. A given BB individual is twice as likely to be picked as a given RR individual. Now choose one of her chromosomes and put it into the next generation.
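The modified rule could be coded roughly as follows. This is a sketch under stated assumptions: the slide only describes how one chromosome is chosen, so pairing two chosen chromosomes into each new individual is an assumption here, as are the function name `next_generation` and the starting population.

```python
import random

# Relative fitness of each genotype, as given on the slide.
FITNESS = {"RR": 1.0, "RB": 1.1, "BB": 2.0}


def next_generation(individuals, rng):
    """Build the next generation of diploid individuals under selection.

    individuals: list of genotype strings "RR", "RB" or "BB".
    Each new chromosome comes from a parent picked with probability
    proportional to the fitness of her genotype; one of her two
    chromosomes is then chosen at random.
    """
    weights = [FITNESS[genotype] for genotype in individuals]
    offspring = []
    for _ in range(len(individuals)):
        parents = rng.choices(individuals, weights=weights, k=2)
        # Sort in reverse so a heterozygote is always written "RB".
        alleles = sorted((rng.choice(parent) for parent in parents), reverse=True)
        offspring.append("".join(alleles))
    return offspring


rng = random.Random(26)  # arbitrary seed
population = ["RR"] * 45 + ["RB"] * 10 + ["BB"] * 45
for _ in range(20):
    population = next_generation(population, rng)
b_freq = sum(g.count("B") for g in population) / (2 * len(population))
print("B allele frequency after 20 generations:", b_freq)
```

Setting all three fitness values to 1 recovers the original neutral game, which makes it easy to compare drift alone against drift plus selection.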

Page 27: Population Genetics I. Bio5488 - 2016

What relative fitness should we select?

•  Conserved elements: <0.01% increase in fitness

Page 28: Population Genetics I. Bio5488 - 2016

Drift versus Darwin

I = 100 A = 100 R = 99 G = 1 fG = 2*fR

Page 29: Population Genetics I. Bio5488 - 2016

I = 100 A = 100 R = 99 G = 1 fG = 3*fR

Page 30: Population Genetics I. Bio5488 - 2016

I = 100 A = 100 R = 99 G = 1 fG = 3*fR

Page 31: Population Genetics I. Bio5488 - 2016

I = 100 A = 100 R = 99 G = 1 fG = 3*fR

Page 32: Population Genetics I. Bio5488 - 2016

I = 100 A = 2000 R = 1999 G = 1 fG = 3*fR

Page 33: Population Genetics I. Bio5488 - 2016

Some startling results!

•  Survival of the luckiest, not just the fittest.

•  Sometimes drift can overcome selection. Depends on allele frequency, population size.

•  Most new advantageous mutations are not fixed!
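The fate of a single new advantageous copy can be estimated by repeating the game many times. The sketch below assumes selection acts at the gamete level (fG = advantage * fR, as in the slide captions) and starts from one G copy among 100 gametes; the function names, trial count and seed are hypothetical.

```python
import random


def advantageous_allele_fixes(pop_size, advantage, rng):
    """Start with one G copy among pop_size gametes; return True if G is fixed.

    advantage is the fitness of G relative to R (fG = advantage * fR).
    """
    g_count = 1
    while 0 < g_count < pop_size:
        freq = g_count / pop_size
        # Selection reweights the chance that a sampled gamete is G.
        pick_g = advantage * freq / (advantage * freq + (1.0 - freq))
        g_count = sum(1 for _ in range(pop_size) if rng.random() < pick_g)
    return g_count == pop_size


def fixation_fraction(pop_size, advantage, trials, rng):
    """Fraction of independent runs in which the new G allele is fixed."""
    fixed = sum(advantageous_allele_fixes(pop_size, advantage, rng)
                for _ in range(trials))
    return fixed / trials


rng = random.Random(33)  # arbitrary seed
for advantage in (1.0, 1.1, 2.0):
    frac = fixation_fraction(pop_size=100, advantage=advantage, trials=500, rng=rng)
    print(f"fG = {advantage} * fR: fixed in {frac:.3f} of runs")
```

With a modest advantage such as 1.1, one would expect the new allele to be lost in most runs, because it is usually eliminated by chance while still rare.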

Page 34: Population Genetics I. Bio5488 - 2016

Mutation

•  Infinite alleles model – Assumptions

Page 35: Population Genetics I. Bio5488 - 2016

I = 5000 U = 0.0001 Start as homozygous for allele A

U = mutation rate

Page 36: Population Genetics I. Bio5488 - 2016

Summary thus far

•  Chance can play a large role in determining which polymorphisms are fixed in a population.
•  The fittest don’t always survive.
•  These findings are/were not obvious.
•  They become (more) obvious with quantitative investigation.
•  And we’ve only scratched the surface.

Page 37: Population Genetics I. Bio5488 - 2016

Further explorations of this model

•  To date our approach has been based on observations of simulations. But the model is simple, so an analytic approach may prove fruitful.

•  Our hypotheses:
    –  Can we prove them?
    –  Can we quantify them?

•  Let’s explore this hypothesis: one allele must always win.

Page 38: Population Genetics I. Bio5488 - 2016

The Decay of Heterozygosity

•  Define $G_t$, the homozygosity at generation t: the probability that two genomes picked from the population carry the same allele.

•  Then the heterozygosity is $H_t = 1 - G_t$.

Page 39: Population Genetics I. Bio5488 - 2016

What is G0?

[Figure: Generation 0 gamete pool: R B R B B B B B]

1. Pick R then R
   = (number of R’s / 2N) * (number of R’s - 1) / (2N - 1)
2. Pick B then B
   = (number of B’s / 2N) * (number of B’s - 1) / (2N - 1)

G0 is the sum of these two probabilities.
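The two products above can be evaluated exactly with fractions. This sketch assumes the pool drawn on the slide holds 2 R and 6 B gametes (2N = 8), which is a reading of the figure rather than a number stated in the text; the function name `homozygosity` is hypothetical.

```python
from fractions import Fraction


def homozygosity(counts):
    """Probability that two gametes drawn without replacement carry the same allele.

    counts: mapping from allele name to its number of copies in the pool.
    """
    total = sum(counts.values())
    return sum(Fraction(n, total) * Fraction(n - 1, total - 1)
               for n in counts.values())


# Assumed pool read off the slide's figure: 2 R gametes and 6 B gametes.
g0 = homozygosity({"R": 2, "B": 6})
print("G0 =", g0, " H0 =", 1 - g0)
```

For this pool the two terms are 2/56 and 30/56, so G0 = 4/7 and H0 = 3/7.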

Page 40: Population Genetics I. Bio5488 - 2016

What is G1?

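•  The two gametes picked in generation 1 are copies of the same gamete in generation 0: Probability = 1/2N

•  The two gametes are copies of different gametes in generation 0 that carry the same allele: Probability = (1 - 1/2N) * G0

[Figure: two panels, each tracing a pair of gametes from Generation 1 back to Generation 0.]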

Page 41: Population Genetics I. Bio5488 - 2016

Proof of decay of heterozygosity
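The slide's own derivation is not preserved in this transcript. A sketch of the standard argument, built from the two probabilities given for G1 and the definition H = 1 - G, would run as follows:

```latex
\begin{align*}
G_1 &= \frac{1}{2N} + \left(1 - \frac{1}{2N}\right) G_0 \\
H_1 &= 1 - G_1
     = \left(1 - \frac{1}{2N}\right)\left(1 - G_0\right)
     = \left(1 - \frac{1}{2N}\right) H_0 \\
H_t &= \left(1 - \frac{1}{2N}\right)^{t} H_0
\end{align*}
```

The last line follows by applying the same step t times. Because the factor in parentheses is less than 1 for any finite N, heterozygosity shrinks geometrically in the absence of new mutation.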

Page 42: Population Genetics I. Bio5488 - 2016

What is the half life of H?

•  $H_0 / 2 = H_0 (1 - 1/2N)^t$

•  $t = -\ln 2 / \ln(1 - 1/2N) \approx 2N \ln 2$

•  For $N = 10^4$: $t \approx 1.4 \times 10^4$ generations
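The half-life can be computed both exactly and with the large-N approximation. The sketch below uses hypothetical function names and takes N as the number of diploid individuals, as in the model assumptions.

```python
from math import log


def half_life_exact(n_diploid):
    """Generations t solving (1 - 1/2N)^t = 1/2."""
    return log(0.5) / log(1.0 - 1.0 / (2 * n_diploid))


def half_life_approx(n_diploid):
    """Large-N approximation t = 2N ln 2, from ln(1 - x) being close to -x."""
    return 2 * n_diploid * log(2.0)


for n in (10, 100, 10**4):
    print(n, round(half_life_exact(n), 1), round(half_life_approx(n), 1))
```

For N = 10^4 both expressions come to roughly 13,863 generations, and the two agree more closely as N grows.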

Page 43: Population Genetics I. Bio5488 - 2016

What does this mean?

•  Even in a large population, every allele will eventually have descended from a single allele in the founding population! All but 1 allele will have “died off”.

•  Drift-Mutation-Selection balance.

Page 44: Population Genetics I. Bio5488 - 2016

-Genealogical Analysis of all 131K Icelanders born after 1972

Page 45: Population Genetics I. Bio5488 - 2016

Analysis of selection

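Genotype                  AA              Aa               aa              Total
Freq in generation t      $q^2$           $2pq$            $p^2$           $1 = q^2 + 2pq + p^2$
Fitness                   $w_{11}$        $w_{12}$         $w_{22}$
Freq (after selection)    $q^2 w_{11}$    $2pq\,w_{12}$    $p^2 w_{22}$    $\hat{w} = q^2 w_{11} + 2pq\,w_{12} + p^2 w_{22}$

$p_{t+1} = \dfrac{p^2 w_{22} + pq\,w_{12}}{\hat{w}}$

$q_{t+1} = \dfrac{q^2 w_{11} + pq\,w_{12}}{\hat{w}}$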

“Recursion equations”

Assumptions in this example: no drift or mutation, discrete generations, random mating
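The recursion equations can be iterated numerically to see where the allele frequency ends up. This sketch follows the table's labelling (p is the frequency of allele a, and aa has fitness w22); the function names and the fitness values in the example are hypothetical and chosen only for illustration.

```python
def selection_step(p, w11, w12, w22):
    """One generation of the recursion; p is the frequency of allele a.

    Follows the table's labelling: AA has frequency q^2 and fitness w11,
    Aa has 2pq and w12, aa has p^2 and w22.
    """
    q = 1.0 - p
    mean_fitness = q * q * w11 + 2.0 * p * q * w12 + p * p * w22
    return (p * p * w22 + p * q * w12) / mean_fitness


def iterate(p0, generations, w11, w12, w22):
    """Return the trajectory [p_0, p_1, ..., p_generations]."""
    trajectory = [p0]
    for _ in range(generations):
        trajectory.append(selection_step(trajectory[-1], w11, w12, w22))
    return trajectory


# Hypothetical fitness values: allele a is favoured, with additive effect.
path = iterate(p0=0.01, generations=200, w11=1.0, w12=1.05, w22=1.1)
print(round(path[0], 4), round(path[100], 4), round(path[-1], 4))
```

Because there is no drift in this model, the same inputs always give the same trajectory. Setting w12 above both homozygote fitnesses instead drives p toward an intermediate equilibrium.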

Page 46: Population Genetics I. Bio5488 - 2016

Evolutionary dynamics in a simplex for a biallelic locus

[Figure: simplex with vertices AA, Aa and aa.]

Modified from Gokhale C S, Traulsen A. PNAS 2010;107:5500-5504. ©2010 by National Academy of Sciences

Page 47: Population Genetics I. Bio5488 - 2016

Dynamics: Topics covered

•  Selection (additive, balancing, frequency-dependent)
•  Altruism, kin selection
•  Structural variation (inversions)
•  Multiple loci (recombination, epistatic selection)
•  Population structure (island model, stepping stone model, isolation by distance, metapopulation models)
•  Assortative mating
•  Sex-specific effects (migration, selection)
•  Variable environments, etc…

Page 48: Population Genetics I. Bio5488 - 2016

Sampling with Replacement

•  Some alleles pass on no copies to the next generation, while some pass on more than one.

[Figure: time axis labelled from Past to Present.]

Page 49: Population Genetics I. Bio5488 - 2016

The Coalescent Process

•  “Backward in time process”

•  Discovered by J.F.C. Kingman, F. Tajima, R. R. Hudson c. 1980

•  DNA sequence diversity is shaped by genealogical history

•  Genealogies are unobserved but can be estimated

•  Conceptual framework for population genetic inference: mutation, recombination, demographic history

[Figure: genealogy relating the sequences ACTT; ACGT, ACGT, ACTT, ACTT, AGTT, with base changes (T, G, C, G) marked on the tree.]