Lecture 3: Allele Frequencies and Hardy-Weinberg Equilibrium August 27, 2012


Last Time

Review of genetic variation and Mendelian Genetics

Methods for detecting variation

Morphology

Allozymes

DNA Markers

Anonymous

Sequence-tagged


Today

Sequence probability calculation

Molecular markers: DNA sequencing

Introduction to statistical distributions

Estimating allele frequencies

Introduction to Hardy-Weinberg Equilibrium

Using Hardy-Weinberg: Estimating allele frequencies for dominant loci


If nucleotides occur randomly in a genome, which sequence should occur more frequently?

AGTTCAGAGT

AGTTCAGAGTAACTGATGCT

What is the expected probability of each sequence to occur once?

How many times would each sequence be expected to occur by chance in a 100 Mb genome?


AGTTCAGAGT

What is the expected probability of each sequence to occur once?

What is the sample space for the first position? {A, T, G, C}

Probability of “A” at that position?

$P(\mathrm{A}) = \frac{1}{4}$

Probability of “A” at position 1, “G” at position 2, “T” at position 3, etc.?

$\frac{1}{4} \times \frac{1}{4} \times \cdots \times \frac{1}{4} = \left(\frac{1}{4}\right)^{10} = 0.25^{10} = 9.54 \times 10^{-7}$

AGTTCAGAGTAACTGATGCT

$0.25^{20} = 9.09 \times 10^{-13}$


AGTTCAGAGT

How many times would each sequence be expected to occur in a 100 Mb genome?

$9.54 \times 10^{-7} \times 10^{8} = 95.4$

AGTTCAGAGTAACTGATGCT

$9.09 \times 10^{-13} \times 10^{8} = 9.1 \times 10^{-5}$

Why is this calculation wrong?
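
The naive calculation on these slides can be reproduced with a short Python sketch. It assumes, as the slides do, that the four bases are equally likely and independent at every position, and it multiplies by the genome size directly, as the slide does. The function names and the `base_prob` parameter are illustrative, not part of the lecture.

```python
def random_sequence_probability(seq: str, base_prob: float = 0.25) -> float:
    """Probability of one specific sequence if every base has probability base_prob."""
    return base_prob ** len(seq)


def expected_occurrences(seq: str, genome_size: int, base_prob: float = 0.25) -> float:
    """Naive expected count: per-site probability times genome size in bases."""
    return random_sequence_probability(seq, base_prob) * genome_size


GENOME_SIZE = 100_000_000  # 100 Mb

for seq in ("AGTTCAGAGT", "AGTTCAGAGTAACTGATGCT"):
    prob = random_sequence_probability(seq)
    count = expected_occurrences(seq, GENOME_SIZE)
    print(f"{seq}: probability = {prob:.3g}, expected occurrences = {count:.3g}")
```

The result depends entirely on the equal-frequency, independent-bases assumption, which is exactly what the question above invites you to challenge.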


$P(A \cap B) = P(A \mid B)\,P(B)$

$P(A \cup B) = P(A) + P(B) - P(A \cap B)$

[Venn diagram of events A and B]

AGTTCAGAGTAACTGATGCT

AGT TCA GAG TAA CTG ATG CT

UCA AGU CUC AUU GAC UAC GA

Ser Ser Leu Ile Asp Tyr

UGA AGU CUC AUU GAC UAG GA

Stop Ser Leu Ile Asp Stop
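
The codon reading on this slide can be checked with a small Python sketch. It follows the slide's simplification of writing the RNA as the base-by-base complement of the DNA string, read left to right. The codon table is deliberately partial: it holds only the codons that appear on the slide, with assignments from the standard genetic code (in which AGU encodes Ser and CUC encodes Leu). All names are illustrative.

```python
# Partial codon table: only the codons shown on the slide (standard genetic code).
CODON_TABLE = {
    "UCA": "Ser", "AGU": "Ser", "CUC": "Leu", "AUU": "Ile",
    "GAC": "Asp", "UAC": "Tyr", "UGA": "Stop", "UAG": "Stop",
}

DNA_TO_RNA_COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


def complement_rna(dna: str) -> str:
    """Base-by-base RNA complement of a DNA string (no reversal, as on the slide)."""
    return "".join(DNA_TO_RNA_COMPLEMENT[base] for base in dna)


def split_codons(rna: str) -> list:
    """Split into complete triplets, dropping any trailing partial codon."""
    usable = len(rna) - len(rna) % 3
    return [rna[i:i + 3] for i in range(0, usable, 3)]


def translate(rna: str) -> list:
    """Look up each complete codon in the partial table."""
    return [CODON_TABLE[codon] for codon in split_codons(rna)]


rna = complement_rna("AGTTCAGAGTAACTGATGCT")
print(split_codons(rna))
print(translate(rna))
print(translate("UGAAGUCUCAUUGACUAGGA"))  # variant from the slide with two stop codons
```

Stop codons are one reason sequences in coding regions are not random draws of four equally likely bases.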


DNA Sequencing

Direct determination of sequence of bases at a location in the genome

Shotgun versus PCR sequencing

Dye terminators (Sanger) and capillaries revolutionized DNA sequencing

Modern sequencing methods (sequencing by synthesis, pyrosequencing) have catapulted sequencing into the realm of population genetics

The human genome originally took 10 years and hundreds of millions of dollars to sequence

Now we can do it in a week for <$2,000


SNPs

A Single Nucleotide Polymorphism (SNP) is a single base mutation in DNA.

The most common source of genetic polymorphism (e.g., 90% of all human DNA polymorphisms).

Identify SNPs by screening a sample of individuals from the study population: usually 16 to 48

Once identified, SNPs are assayed in populations using high-throughput methods


Genotyping by Sequencing

New sequencing methods generate 10’s of millions of short sequences per run

Combine restriction digests with sequencing and pooling to genotype thousands of markers covering the genome at very high density

http://www.maizegenetics.net/images/stories/GBS_CSSA_101102sem.pdf

Generate 10’s of thousands of markers for <$100 per sample

Presence-Absence Polymorphism

SNP


Genotyping by Sequencing Cost Example

http://www.maizegenetics.net/gbs-overview


Statistical Distributions: Normal Distribution

Many types of estimates follow normal distribution

Can be visualized as a frequency distribution (histogram)

Can interpret as a probability density function

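Expected Value (Mean):

$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

where n is the number of samples

Variance (Vx): A measure of the dispersion around the mean:

$V_x = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$

Standard Deviation (sd): A measure of dispersion around the mean that is on the same scale as the mean

$sd = \sqrt{V_x}$

[Figure: normal curve marked at 1 sd and 2 sd from the mean]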


Standard Error of Mean

Standard Deviation is a measure of how individual points differ from the mean estimates in a single sample

Standard Error is a measure of how much the estimate differs from the true parameter value (in the case of means, μ)

If you repeated the experiment, how close would you expect the mean estimate to be to your previous estimate?

Standard Error of the Mean (se):

$se = \sqrt{\frac{V_x}{n}}$

95% Confidence Interval:

$\bar{x} \pm 1.96\,(se)$
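
The summary statistics from the last two slides (mean, variance with an n - 1 denominator, standard deviation, standard error, and the 95% confidence interval) can be sketched in Python as follows. The sample values are made up for illustration, and the function name `summarize` is hypothetical.

```python
import math
import statistics


def summarize(sample):
    """Return mean, sample variance, sd, se, and a 95% CI for the mean."""
    n = len(sample)
    mean = statistics.mean(sample)
    variance = statistics.variance(sample)  # sample variance, divides by n - 1
    sd = math.sqrt(variance)
    se = math.sqrt(variance / n)
    ci95 = (mean - 1.96 * se, mean + 1.96 * se)
    return {"n": n, "mean": mean, "variance": variance, "sd": sd, "se": se, "ci95": ci95}


# Hypothetical measurements, not data from the lecture.
measurements = [4.0, 5.0, 6.0, 7.0, 8.0]
for name, value in summarize(measurements).items():
    print(name, value)
```

The sd describes the spread of individual points in the sample, while the se shrinks as n grows because it describes the uncertainty of the mean estimate itself.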


Estimating Allele Frequencies, Codominant Loci

Measured allele frequency is maximum likelihood estimator of the true frequency of the allele in the population (See Hedrick, pp 82-83 for derivation)

$p = \frac{N_{11} + \frac{1}{2}N_{12}}{N}$

Expected number of observations of allele A1: E(Y)=np

Where n is number of samples

For diploid organisms, n = 2N , where N is number of individuals sampled

Expected number of observations of allele A1 is analogous to the mean of a sample from a normal distribution

Allele frequency can also be interpreted as an estimate of the mean


Allele Frequency Example

Assume a population of Mountain Laurel (Kalmia latifolia) at Cooper’s Rock, WV

Red buds (A1A1): 5000

Pink buds (A1A2): 3000

White buds (A2A2): 2000

Phenotype is determined by a single, codominant locus: Anthocyanin

What is frequency of “red” alleles (A1), and “white” alleles (A2)?

Frequency of A1 = p

$p = \frac{N_{11} + \frac{1}{2}N_{12}}{N} = \frac{2N_{11} + N_{12}}{2N}$

Frequency of A2 = q

$q = \frac{N_{22} + \frac{1}{2}N_{12}}{N} = \frac{2N_{22} + N_{12}}{2N}$
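
As a worked check of these formulas, the following Python sketch counts allele copies from the genotype counts in the Mountain Laurel example. The function name and argument names are illustrative.

```python
def allele_frequencies(n11: int, n12: int, n22: int):
    """Return (p, q) for a two-allele codominant locus from genotype counts.

    n11, n12, n22 are the numbers of A1A1, A1A2, and A2A2 individuals.
    """
    total_individuals = n11 + n12 + n22
    if total_individuals == 0:
        raise ValueError("at least one individual is required")
    total_alleles = 2 * total_individuals  # diploid: two allele copies per plant
    p = (2 * n11 + n12) / total_alleles
    q = (2 * n22 + n12) / total_alleles
    return p, q


# Red (A1A1) = 5000, pink (A1A2) = 3000, white (A2A2) = 2000
p, q = allele_frequencies(5000, 3000, 2000)
print(f"p = {p:.2f}, q = {q:.2f}")
```

For these counts the result agrees with the values p = 0.65 and q = 0.35 used later in the lecture.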


Allele Frequencies are Distributed as Binomials

Binomials are variables that can be interpreted as the number of successes and failures in a series of trials

Based on samples from a population

For two-allele system, each sample is like a “trial”

Does the individual contain Allele A1?

Remember, q=1-p, so only one parameter is estimated

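Probability of observing y positive results in n trials:

$P(Y = y) = \binom{n}{y}\, s^{y} f^{\,n-y}$

Number of ways of observing y positive results in n trials:

$\binom{n}{y} = C_{y}^{n} = \frac{n!}{y!\,(n-y)!}$

Probability of observing y positive results in n trials once:

$s^{y} f^{\,n-y}$

where s is the probability of a success, and f is the probability of a failure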


Given the allele frequencies that you calculated earlier for Cooper’s Rock Kalmia latifolia, what is the

probability of observing two “white” alleles in a sample of two plants?
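
One way to set up this question is with the binomial formula. The Python sketch below makes an assumption that should be checked against the intended reading of the question: two diploid plants are treated as n = 4 independently sampled allele copies, and "two white alleles" is read as exactly y = 2 copies of A2, with q = 0.35 from the earlier example. The function name is illustrative.

```python
import math


def binomial_probability(y: int, n: int, s: float) -> float:
    """P(Y = y): probability of exactly y successes in n trials with success probability s."""
    f = 1.0 - s  # probability of a failure
    return math.comb(n, y) * s ** y * f ** (n - y)


q_white = 0.35       # frequency of the "white" allele A2
allele_copies = 4    # assumption: 2 diploid plants = 4 sampled allele copies

prob = binomial_probability(2, allele_copies, q_white)
print(f"P(exactly 2 white alleles among {allele_copies} copies) = {prob:.4f}")
```

Under a different reading of the question (for example, at least two white alleles, or two white alleles in one plant), the same function applies with different values of y and n.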


Variation in Allele Frequencies, Codominant Loci

Binomial variance is pq or p(1-p)

Variance in number of observations of A1: V(Y) = np(1-p)

Variance in allele frequency estimates (codominant, diploid):

$V_p = \frac{p(1-p)}{2N}$

Standard Error of allele frequency estimates:

$SE_p = \sqrt{\frac{p(1-p)}{2N}}$

Notice that estimates get better as sample size increases

Notice also that variance is maximum at intermediate allele frequencies
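
Both observations can be illustrated with a short Python sketch of the variance and standard error formulas. The example uses p = 0.65 and N = 10000 from the Mountain Laurel population; the smaller sample sizes and the function names are illustrative.

```python
import math


def allele_frequency_variance(p: float, n_individuals: int) -> float:
    """Variance of an allele frequency estimate for a codominant, diploid locus."""
    return p * (1.0 - p) / (2 * n_individuals)


def allele_frequency_se(p: float, n_individuals: int) -> float:
    """Standard error of an allele frequency estimate."""
    return math.sqrt(allele_frequency_variance(p, n_individuals))


# Standard error shrinks as the number of sampled individuals grows.
for n in (10, 100, 10000):
    print(f"N = {n:>5}: SE = {allele_frequency_se(0.65, n):.5f}")

# For a fixed N, variance is largest at intermediate allele frequencies.
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"p = {p}: V = {allele_frequency_variance(p, 100):.6f}")
```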


Maximum variance as a function of allele frequency for a codominant locus

[Figure: p(1-p) on the y-axis (0 to 0.3) plotted against p on the x-axis (0 to 1)]


Why is variance highest at intermediate allele frequencies?

If this were a target, how variable would your outcome be in each case (red versus white hits)?

[Figure panels: p = 0.5 and p = 0.125]

Variance is constrained when value approaches limits (0 or 1)


What if there are more than 2 alleles?

General formula for calculating allele frequencies in multiallelic system with codominant alleles:

$p_i = \frac{N_{ii} + \frac{1}{2}\sum_{j=1}^{n} N_{ij}}{N}, \quad i \neq j$

Variance and Standard Error of allele frequency estimates remain:

$V_{p_i} = \frac{p_i(1-p_i)}{2N}$

$SE_{p_i} = \sqrt{\frac{p_i(1-p_i)}{2N}}$
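
A minimal Python sketch of the multiallelic calculation is shown below. It counts allele copies directly, which is equivalent to the formula: each homozygote contributes two copies of its allele and each heterozygote contributes one copy of each. The genotype counts are hypothetical, as are the function and variable names.

```python
import math
from collections import Counter


def multiallelic_frequencies(genotype_counts):
    """Allele frequencies from {(allele_a, allele_b): number_of_individuals}."""
    allele_copies = Counter()
    total_individuals = 0
    for (allele_a, allele_b), count in genotype_counts.items():
        allele_copies[allele_a] += count  # homozygotes are counted twice
        allele_copies[allele_b] += count
        total_individuals += count
    total_alleles = 2 * total_individuals
    return {allele: copies / total_alleles for allele, copies in allele_copies.items()}


def standard_error(p_i: float, n_individuals: int) -> float:
    """Standard error of one allele frequency estimate."""
    return math.sqrt(p_i * (1.0 - p_i) / (2 * n_individuals))


# Hypothetical three-allele sample of 100 individuals.
counts = {
    ("A1", "A1"): 10, ("A1", "A2"): 20, ("A1", "A3"): 10,
    ("A2", "A2"): 30, ("A2", "A3"): 20, ("A3", "A3"): 10,
}
n_total = sum(counts.values())
for allele, freq in sorted(multiallelic_frequencies(counts).items()):
    print(f"{allele}: p = {freq:.3f}, SE = {standard_error(freq, n_total):.4f}")
```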


How do we estimate allele frequencies for dominant loci?

[Figure: banding patterns (+ / -) for genotypes A1A1, A1A2, and A2A2 at a codominant locus versus a dominant locus]


Hardy-Weinberg Law

After one generation of random mating, single-locus genotype frequencies can be represented by a binomial (with 2 alleles) or a multinomial function of allele frequencies

$(p + q)^2 = p^2 + 2pq + q^2$

Frequency of A1A1 (P) = $p^2$

Frequency of A1A2 (H) = $2pq$

Frequency of A2A2 (Q) = $q^2$
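
For two alleles, the expected genotype frequencies follow directly from p. The Python sketch below is illustrative (the function name is not from the lecture) and uses p = 0.65 as an example value.

```python
def hardy_weinberg_genotype_frequencies(p: float) -> dict:
    """Expected genotype frequencies P, H, Q for a two-allele locus under Hardy-Weinberg."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    q = 1.0 - p
    return {"A1A1": p * p, "A1A2": 2 * p * q, "A2A2": q * q}


expected = hardy_weinberg_genotype_frequencies(0.65)
for genotype, frequency in expected.items():
    print(f"{genotype}: {frequency:.4f}")
print("sum:", round(sum(expected.values()), 10))
```

The three frequencies always sum to 1 because they are the terms of the expansion of (p + q)^2 with p + q = 1.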


How does Hardy-Weinberg Work?

Reproduction is a sampling process

Example: Mountain Laurel at Cooper’s Rock

Red Flowers (A1A1): 5000

Pink Flowers (A1A2): 3000

White Flowers (A2A2): 2000

Frequency of A1 = p = 0.65

Frequency of A2 = q = 0.35

What are expected numbers of phenotypes and genotypes in a sample of 20 trees?

Phenotypes: Red 10 : Pink 6 : White 4

Genotypes: A1A1 10 : A1A2 6 : A2A2 4

Alleles: A1 = 26 : A2 = 14

What are expected frequencies of alleles in pollen and ovules?


What will be the genotype and phenotype frequencies in the next generation?

What assumptions must we make?


Hardy-Weinberg Equilibrium

After one generation of random mating, genotype frequencies remain constant, as long as allele frequencies remain constant

Provides a convenient Neutral Model to test for departures from assumptions

Allows genotype frequencies to be represented by allele frequencies: simplification of calculations
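
The claim that genotype frequencies settle after one generation of random mating can be illustrated with a small Python sketch. It starts from the Mountain Laurel genotype frequencies (0.5, 0.3, 0.2) and assumes random mating with no change in allele frequencies between generations. The function name is illustrative.

```python
def next_generation(P: float, H: float, Q: float):
    """Genotype frequencies (A1A1, A1A2, A2A2) after one generation of random mating."""
    p = P + H / 2    # frequency of A1 among gametes
    q = 1.0 - p      # frequency of A2 among gametes
    return p * p, 2 * p * q, q * q


generation = (0.5, 0.3, 0.2)  # red, pink, white: 5000, 3000, 2000 out of 10000
for label in ("parents", "generation 1", "generation 2"):
    P, H, Q = generation
    print(f"{label}: P = {P:.4f}, H = {H:.4f}, Q = {Q:.4f}, p = {P + H / 2:.2f}")
    generation = next_generation(P, H, Q)
```

In this sketch the parental genotype frequencies differ from the Hardy-Weinberg proportions, the first offspring generation reaches them, and later generations stay there, while the allele frequency p stays at 0.65 throughout.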


Hardy-Weinberg Assumptions

Diploid

Large population

Random Mating: equal probability of mating among genotypes

No mutation

No gene flow

Equal allele frequencies between sexes

Nonoverlapping generations


Graphical Representation of Hardy-Weinberg Law

$(p+q)^2 = p^2 + 2pq + q^2 = 1$


Relationship Between Allele Frequencies and Genotype Frequencies under Hardy-Weinberg