Introduction to Basic and Quantitative Genetics. Darwin & Mendel Darwin (1859) Origin of Species –Instant Classic, major immediate impact –Problem: Model

Introduction to Basic and Quantitative Genetics

Darwin & Mendel
• Darwin (1859) Origin of Species
– Instant classic, major immediate impact
– Problem: model of inheritance
• Darwin assumed blending inheritance
• Offspring = average of both parents
• zo = (zm + zf)/2
• Fleeming Jenkin (1867) pointed out the problem
– Var(zo) = Var[(zm + zf)/2] = [Var(zm) + Var(zf)]/4 = (1/2) Var(parents)
– Hence, under blending inheritance, half the variation is removed each generation and this must somehow be replenished by mutation.
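As a rough check on Jenkin's argument, the sketch below simulates blending inheritance. It rests on assumptions that are not on the slide: random mating, a constant population size, and normally distributed starting phenotypes. The name blend_generation and the population size are illustrative only.

```python
import random
import statistics

def blend_generation(pop, rng):
    """Return the next generation: each offspring is the average of two
    parents drawn at random (with replacement) from the current population."""
    return [(rng.choice(pop) + rng.choice(pop)) / 2 for _ in range(len(pop))]

rng = random.Random(1)  # fixed seed so the run is repeatable
pop = [rng.gauss(0.0, 1.0) for _ in range(20000)]  # assumed starting phenotypes
variances = [statistics.pvariance(pop)]
for _ in range(5):
    pop = blend_generation(pop, rng)
    variances.append(statistics.pvariance(pop))

# Ratio of successive variances; blending predicts a value close to 1/2
ratios = [variances[t + 1] / variances[t] for t in range(5)]
print([round(r, 2) for r in ratios])
```

Each ratio should sit near 1/2, up to sampling noise, so after five generations only about 1/32 of the original variance would remain.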

Mendel
• Mendel (1865), Experiments in Plant Hybridization
• No impact, paper essentially ignored

– Ironically, Darwin had an apparently unread copy in his library

– Why ignored? Perhaps too mathematical for 19th century biologists

• The rediscovery in 1900 (by three independent groups)

• Mendel’s key idea: Genes are discrete particles passed on intact from parent to offspring

Mendel’s experiments with the Garden Pea

7 traits examined

Mendel crossed a pure-breeding yellow pea line with a pure-breeding green line.

Let P1 denote the pure-breeding yellow line (parental line 1) and P2 the pure-breeding green line (parental line 2)

The F1, or first filial, generation is the cross of P1 x P2 (yellow x green).

All resulting F1 were yellow

The F2, or second filial, generation is a cross of two F1’s

In F2, 1/4 are green, 3/4 are yellow

This outbreak of variation blows the theory of blending inheritance right out of the water.

Mendel also observed that the P1, F1 and F2 Yellow lines behaved differently when crossed to pure green

P1 yellow x P2 (pure green) --> all yellow

F1 yellow x P2 (pure green) --> 1/2 yellow, 1/2 green

F2 yellow x P2 (pure green) --> 2/3 yellow, 1/3 green

Mendel’s explanation

Genes are discrete particles, with each parent passing one copy to its offspring.

Let an allele be a particular copy of a gene. In diploids, each parent carries two alleles for every gene

Pure Yellow parents have two Y (or yellow) alleles

We can thus write their genotype as YY

Likewise, pure green parents have two g (or green) alleles

Their genotype is thus gg

Since there are lots of genes, we refer to a particular gene by a given name, say the pea-color gene (or locus)

Each parent contributes one of its two alleles (at random) to its offspring

Hence, a YY parent always contributes a Y, while a gg parent always contributes a g

In the F1, YY x gg --> all individuals are Yg

An individual carrying only one type of allele (e.g. YY or gg) is said to be a homozygote

An individual carrying two types of alleles is said to be a heterozygote.

The phenotype of an individual is the trait value we observe

For this particular gene, the map from genotype to phenotype is as follows:

YY --> yellow

Yg --> yellow

gg --> green

Since the Yg heterozygote has the same phenotypic value as the YY homozygote, we say (equivalently)

Y is dominant to g, or

g is recessive to Y

Explaining the crosses

F1 x F1 --> Yg x Yg

Prob(YY) = Pr(Y from dad)*Pr(Y from mom) = (1/2)*(1/2) = 1/4

Prob(gg) = Pr(g from dad)*Pr(g from mom) = (1/2)*(1/2) = 1/4

Prob(Yg) = Pr(Y from dad)*Pr(g from mom) + Pr(g from dad)*Pr(Y from mom) = 1/4 + 1/4 = 1/2

Equivalently, Prob(Yg) = 1 - Pr(YY) - Pr(gg) = 1/2

Hence, Prob(Yellow phenotype) = Pr(YY) + Pr(Yg) = 3/4

Prob(green phenotype) = Pr(gg) = 1/4
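The same bookkeeping can be done by enumerating the four equally likely allele pairings. The sketch below is only an illustration: the helper names cross and prob_yellow are made up here, and exact fractions are used so the ratios can be read off directly. It also reproduces the test-cross results for F1 and F2 yellow plants crossed to pure green.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def cross(parent1, parent2):
    """Offspring genotype probabilities at one locus. Each parent is a
    two-letter genotype such as "Yg"; each allele is passed on with
    probability 1/2, so every pairing has probability 1/4."""
    probs = Counter()
    for a, b in product(parent1, parent2):
        genotype = "".join(sorted((a, b)))  # "gY" and "Yg" are the same genotype
        probs[genotype] += Fraction(1, 4)
    return dict(probs)

def prob_yellow(genotype_probs):
    """Y is dominant, so any genotype carrying a Y allele is yellow."""
    return sum(p for g, p in genotype_probs.items() if "Y" in g)

f2 = cross("Yg", "Yg")          # F1 x F1
f1_testcross = cross("Yg", "gg")  # F1 yellow x pure green

# Yellow F2 plants are 1/3 YY and 2/3 Yg (YY : Yg = 1/4 : 1/2)
f2_yellow_x_green = (Fraction(1, 3) * prob_yellow(cross("YY", "gg"))
                     + Fraction(2, 3) * prob_yellow(cross("Yg", "gg")))

print(f2, prob_yellow(f2), prob_yellow(f1_testcross), f2_yellow_x_green)
```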

Dealing with two (or more) genes

For his 7 traits, Mendel observed Independent Assortment

The genotype at one locus is independent of the second

RR, Rr - round seeds, rr - wrinkled seeds

Pure round, green (RRgg) x pure wrinkled yellow (rrYY)

F1 --> RrYg = round, yellow

What about the F2?

Let R- denote RR and Rr. R- are round. Note in F2, Pr(R-) = 1/2 + 1/4 = 3/4

Likewise, Y- are YY or Yg, and are yellow

Phenotype          Genotype   Frequency
Yellow, round      Y-R-       (3/4)*(3/4) = 9/16
Yellow, wrinkled   Y-rr       (3/4)*(1/4) = 3/16
Green, round       ggR-       (1/4)*(3/4) = 3/16
Green, wrinkled    ggrr       (1/4)*(1/4) = 1/16

Or a 9:3:3:1 ratio
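Under independent assortment the two-locus table is just the product of the two single-locus tables. A minimal sketch, with dictionary names chosen here for illustration:

```python
from fractions import Fraction
from itertools import product

# Single-locus F2 phenotype probabilities (dominant 3/4, recessive 1/4)
color = {"yellow": Fraction(3, 4), "green": Fraction(1, 4)}
shape = {"round": Fraction(3, 4), "wrinkled": Fraction(1, 4)}

# Independent assortment: joint probability = product of the marginals
f2 = {(c, s): pc * ps
      for (c, pc), (s, ps) in product(color.items(), shape.items())}

# Express each class in sixteenths to read off the ratio
ratio = {phenotype: int(p * 16) for phenotype, p in f2.items()}
print(ratio)
```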

Probabilities for more complex genotypes

Cross AaBBCcDD X aaBbCcDd

What is Pr(aaBBCCDD)?

Under independent assortment, Pr(aaBBCCDD) = Pr(aa)*Pr(BB)*Pr(CC)*Pr(DD) = (1/2*1)*(1*1/2)*(1/2*1/2)*(1*1/2) = 1/2^5 = 1/32

What is Pr(AaBbCc)?

= Pr(Aa)*Pr(Bb)*Pr(Cc) = (1/2)*(1/2)*(1/2) = 1/8
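These products can be checked mechanically. The sketch below assumes independent assortment, as in the text; the helper names locus_prob and multilocus_prob and the labels mom and dad are hypothetical and not from the slides.

```python
from fractions import Fraction
from itertools import product

def locus_prob(parent1, parent2, target):
    """Probability that an offspring has genotype `target` at one locus.
    Genotypes are two-letter strings; allele order is ignored."""
    hits = sum(1 for a, b in product(parent1, parent2)
               if sorted(a + b) == sorted(target))
    return Fraction(hits, 4)  # four equally likely allele pairings

def multilocus_prob(parent1, parent2, target):
    """Multiply per-locus probabilities (independent assortment assumed).
    Each argument is a list of per-locus genotypes."""
    prob = Fraction(1)
    for g1, g2, t in zip(parent1, parent2, target):
        prob *= locus_prob(g1, g2, t)
    return prob

mom = ["Aa", "BB", "Cc", "DD"]
dad = ["aa", "Bb", "Cc", "Dd"]
p_aaBBCCDD = multilocus_prob(mom, dad, ["aa", "BB", "CC", "DD"])
p_AaBbCc = multilocus_prob(mom[:3], dad[:3], ["Aa", "Bb", "Cc"])
print(p_aaBBCCDD, p_AaBbCc)
```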

Mendel was wrong: Linkage

Phenotype      Genotype   Observed   Expected
Purple long    P-L-       284        215
Purple round   P-ll       21         71
Red long       ppL-       21         71
Red round      ppll       55         24

Bateson and Punnett looked at

flower color: P (purple) dominant over p (red)

pollen shape: L (long) dominant over l (round)

Excess of PL, pl gametes over Pl, pL

Departure from independent assortment

Linkage

If genes are located on different chromosomes they (with very few exceptions) show independent assortment.

Indeed, peas have only 7 pairs of chromosomes, so was Mendel lucky in choosing seven traits at random that happen to all be on different chromosomes? Problem: compute this probability.
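One way to approach this problem is to assume that each of the seven genes is equally likely to fall on any of the seven chromosomes, independently of the others. That assumption is not made in the text, and it ignores differences in chromosome size. Under it, the answer is 7!/7^7:

```python
from fractions import Fraction
from math import factorial

n_chromosomes = 7
n_genes = 7

# Favorable placements: 7 * 6 * ... * 1 = 7!; all placements: 7**7
prob_all_different = Fraction(factorial(n_genes), n_chromosomes ** n_genes)
print(prob_all_different, float(prob_all_different))
```

Under this simple model the probability is well below 1%.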

However, genes on the same chromosome, especially if they are close to each other, tend to be passed onto their offspring in the same configuration as on the parental chromosomes.

Consider the Bateson-Punnett pea data

Let PL / pl denote that in the parent, one chromosome carries the P and L alleles (at the flower color and pollen shape loci, respectively), while the other chromosome carries the p and l alleles.

Unless there is a recombination event, one of the two parental chromosome types (PL or pl) is passed onto the offspring. These are called the parental gametes.

However, if a recombination event occurs, a PL/pl parent can generate Pl and pL recombinant chromosomes to pass onto its offspring.

Let c denote the recombination frequency --- the probability that a randomly-chosen gamete from the parent is of the recombinant type (i.e., it is not a parental gamete).

For a PL/pl parent, the gamete frequencies are

Gamete type   Frequency   Expectation under independent assortment
PL            (1-c)/2     1/4
pl            (1-c)/2     1/4
pL            c/2         1/4
Pl            c/2         1/4

Parental gametes in excess, as (1-c)/2 > 1/4 for c < 1/2

Recombinant gametes in deficiency, as c/2 < 1/4 for c < 1/2
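The gamete table can be written as a small function of c. This is a sketch only: the function name gamete_freqs and the phase argument are illustrative, with "coupling" meaning a PL/pl parent and "repulsion" a Pl/pL parent.

```python
def gamete_freqs(c, phase="coupling"):
    """Gamete frequencies from a double heterozygote with
    recombination frequency c (0 <= c <= 1/2)."""
    parental = (1 - c) / 2   # each of the two parental gamete types
    recombinant = c / 2      # each of the two recombinant gamete types
    if phase == "coupling":      # parent is PL/pl
        return {"PL": parental, "pl": parental, "Pl": recombinant, "pL": recombinant}
    if phase == "repulsion":     # parent is Pl/pL
        return {"Pl": parental, "pL": parental, "PL": recombinant, "pl": recombinant}
    raise ValueError("phase must be 'coupling' or 'repulsion'")

print(gamete_freqs(0.24))   # linked loci: parental types in excess
print(gamete_freqs(0.5))    # c = 1/2 recovers independent assortment
```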

Expected genotype frequencies under linkage

Suppose we cross PL/pl X PL/pl parents

What are the expected frequencies in their offspring?

Pr(PPLL) = Pr(PL|father)*Pr(PL|mother) = [(1-c)/2]*[(1-c)/2] = (1-c)^2/4

Likewise, Pr(ppll) = (1-c)^2/4

Recall from previous data that freq(ppll) = 55/381 = 0.144

Hence, (1-c)^2/4 = 0.144, or c = 0.24
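Solving (1-c)^2/4 = freq(ppll) for c gives c = 1 - 2*sqrt(freq(ppll)). The sketch below applies this to the observed counts. It is a simple estimate based on the ppll class alone, not a full likelihood analysis of all four classes.

```python
import math

# Observed counts from the Bateson-Punnett data
observed = {"P-L-": 284, "P-ll": 21, "ppL-": 21, "ppll": 55}

n = sum(observed.values())
freq_ppll = observed["ppll"] / n

# (1 - c)^2 / 4 = freq(ppll)  =>  c = 1 - 2 * sqrt(freq(ppll))
c_hat = 1 - 2 * math.sqrt(freq_ppll)
print(n, round(freq_ppll, 3), round(c_hat, 2))
```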

A (slightly) more complicated case

Again, assume the parents are both PL/pl. Compute Pr(PpLl)

Two situations, as PpLl could be PL/pl or Pl/pL

Pr(PL/pl) = Pr(PL|dad)*Pr(pl|mom) + Pr(PL|mom)*Pr(pl|dad) = [(1-c)/2]*[(1-c)/2] + [(1-c)/2]*[(1-c)/2] = (1-c)^2/2

Pr(Pl/pL) = Pr(Pl|dad)*Pr(pL|mom) + Pr(Pl|mom)*Pr(pL|dad) = (c/2)*(c/2) + (c/2)*(c/2) = c^2/2

Thus, Pr(PpLl) = (1-c)^2/2 + c^2/2
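The result for Pr(PpLl) can be confirmed by summing over all sixteen gamete pairings from two PL/pl parents. A brute-force sketch, with function names chosen here for illustration:

```python
from itertools import product

def coupling_gametes(c):
    """Gamete frequencies from a PL/pl parent; keys are (color allele, shape allele)."""
    return {("P", "L"): (1 - c) / 2, ("p", "l"): (1 - c) / 2,
            ("P", "l"): c / 2, ("p", "L"): c / 2}

def prob_double_het(c):
    """Pr(PpLl) among offspring of PL/pl x PL/pl, by direct enumeration."""
    gametes = coupling_gametes(c)
    total = 0.0
    for (dad, p_dad), (mom, p_mom) in product(gametes.items(), gametes.items()):
        het_color = {dad[0], mom[0]} == {"P", "p"}
        het_shape = {dad[1], mom[1]} == {"L", "l"}
        if het_color and het_shape:
            total += p_dad * p_mom
    return total

c = 0.24
print(prob_double_het(c), (1 - c) ** 2 / 2 + c ** 2 / 2)
```

At c = 1/2 the enumeration gives 1/4, which is Pr(Pp)*Pr(Ll) under independent assortment.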

Generally, to compute the expected genotype probabilities, we need to consider the frequencies of gametes produced by both parents.

Suppose dad = Pl/pL, mom = PL/pl

Pr(PPLL) = Pr(PL|dad)*Pr(PL|mom) = [c/2]*[(1-c)/2]

Notation: when a parent is PL/pl, we say that alleles P and L are in coupling

When a parent is Pl/pL, we say that P and L are in repulsion

Molecular Markers

You and your neighbor differ at roughly 22,000,000 nucleotides (base pairs) out of the roughly 3 billion bp that comprise the human genome

Hence, LOTS of molecular variation to exploit

SNP -- single nucleotide polymorphism. A particular position on the DNA (say base 123,321 on chromosome 1) that has two different nucleotides (say G or A) segregating

STR -- short tandem repeat (also called a simple tandem array). An STR locus consists of a number of short repeats, with alleles defined by the number of repeats. For example, you might have 6 and 4 copies of the repeat on your two chromosome 7s

SNPs vs STRs

SNPs

Cons: Less polymorphic (at most 2 alleles)

Pros: Low mutation rates, alleles very stable

Excellent for looking at historical long-term associations (association mapping)

STRs

Cons: High mutation rate

Pros: Very highly polymorphic

Excellent for linkage studies within an extended Pedigree (QTL mapping in families or pedigrees)

Quantitative Genetics

The analysis of traits whose variation is determined by both a number of genes and environmental factors

Phenotype is highly uninformative as to underlying genotype

Complex (or Quantitative) trait

• No (apparent) simple Mendelian basis for variation in the trait

• May be a single gene strongly influenced by environmental factors

• May be the result of a number of genes of equal (or differing) effect

• Most likely, a combination of both multiple genes and environmental factors

• Example: Blood pressure, cholesterol levels
– Known genetic and environmental risk factors
• Molecular traits can also be quantitative traits
– mRNA level on a microarray analysis
– Protein spot volume on a 2-D gel

Phenotypic distribution of a trait

Consider a specific locus influencing the trait

For this locus, mean phenotype = 0.15, while overall mean phenotype = 0

Basic model of Quantitative Genetics

Basic model: P = G + E

P = Phenotypic value -- we will occasionally also use z for this value
G = Genotypic value
E = Environmental value

G = average phenotypic value for that genotype if we are able to replicate it over the universe of environmental values, G = E[P]

G x E interaction --- G values are different across environments. Basic model now becomes P = G + E + GE

Contribution of a locus to a trait

Genotype          Q1Q1    Q2Q1         Q2Q2
Genotypic value   C       C + a(1+k)   C + 2a
(equivalently)    C       C + a + d    C + 2a
(equivalently)    C - a   C + d        C + a

2a = G(Q2Q2) - G(Q1Q1)

d = ak = G(Q1Q2) - [G(Q2Q2) + G(Q1Q1)]/2

d measures dominance, with d = 0 if the heterozygote is exactly intermediate to the two homozygotes

k = d/a is a scaled measure of the dominance

Example: Apolipoprotein E & Alzheimer’s

Genotype               ee     Ee     EE
Average age of onset   68.4   75.5   84.3

2a = G(EE) - G(ee) = 84.3 - 68.4 --> a = 7.95

ak = d = G(Ee) - [G(EE) + G(ee)]/2 = -0.85

k = d/a = -0.11 Only small amount of dominance

Example: Booroola (B) gene

Genotype              bb     Bb     BB
Average litter size   1.48   2.17   2.66

2a = G(BB) - G(bb) = 2.66 - 1.48 --> a = 0.59

ak = d = G(Bb) - [G(BB) + G(bb)]/2 = 0.10

k = d/a = 0.17
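Both examples follow the same arithmetic, so a small helper can compute a, d and k from the three genotypic means. The function name additive_dominance is made up for this sketch. Note that d, and hence k, comes out negative for the apolipoprotein E data, because the heterozygote lies slightly below the midpoint of the two homozygotes.

```python
def additive_dominance(g_low, g_het, g_high):
    """Return (a, d, k) from the genotypic values of the two homozygotes
    and the heterozygote: 2a = G(high) - G(low), d = G(het) - midpoint, k = d/a."""
    a = (g_high - g_low) / 2
    d = g_het - (g_high + g_low) / 2
    return a, d, d / a

apoe = additive_dominance(68.4, 75.5, 84.3)       # ee, Ee, EE: age of onset
booroola = additive_dominance(1.48, 2.17, 2.66)   # bb, Bb, BB: litter size

print([round(x, 2) for x in apoe])
print([round(x, 2) for x in booroola])
```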

Fisher’s (1918) Decomposition of G

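One of Fisher’s key insights was that the genotypic value consists of a fraction that can be passed from parent to offspring and a fraction that cannot.

Consider the genotypic value $G_{ij}$ resulting from an $A_iA_j$ individual

$G_{ij} = \mu_G + \alpha_i + \alpha_j + \delta_{ij}$

Mean value, with $\mu_G = \sum G_{ij} \cdot \mathrm{freq}(Q_iQ_j)$

$\alpha_i$ = average contribution to genotypic value for allele i

Since parents pass along single alleles to their offspring, the $\alpha_i$ (the average effect of allele i) represent these contributions

The genotypic value predicted from the individual allelic effects is thus

$\hat{G}_{ij} = \mu_G + \alpha_i + \alpha_j$

Dominance deviations --- the difference (for genotype $A_iA_j$) between the genotypic value predicted from the two single alleles and the actual genotypic value,

$G_{ij} - \hat{G}_{ij} = \delta_{ij}$

Fisher’s decomposition is a Regression

$G_{ij} = \mu_G + \alpha_i + \alpha_j + \delta_{ij}$

Predicted value: $\mu_G + \alpha_i + \alpha_j$
Residual error: $\delta_{ij}$

A notational change clearly shows this is a regression,

$G_{ij} = \mu_G + 2\alpha_1 + (\alpha_2 - \alpha_1)N + \delta_{ij}$

Independent (predictor) variable: N = # of Q2 alleles
Regression slope: $\alpha_2 - \alpha_1$
Intercept: $\mu_G + 2\alpha_1$
Regression residual: $\delta_{ij}$

$2\alpha_1 + (\alpha_2 - \alpha_1)N = \begin{cases} 2\alpha_1 & \text{for } N = 0, \text{ e.g., } Q_1Q_1 \\ \alpha_1 + \alpha_2 & \text{for } N = 1, \text{ e.g., } Q_1Q_2 \\ 2\alpha_2 & \text{for } N = 2, \text{ e.g., } Q_2Q_2 \end{cases}$

[Figure: genotypic values G11, G21, G22 plotted against N = 0, 1, 2, with the fitted regression line of slope $\alpha_2 - \alpha_1$. Cases shown: allele Q1 common, $\alpha_2 > \alpha_1$; allele Q2 common, $\alpha_1 > \alpha_2$; both Q1 and Q2 frequent, $\alpha_1 = \alpha_2 = 0$]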

Consider a diallelic locus, where p1 = freq(Q1)

Genotype          Q1Q1   Q2Q1     Q2Q2
Genotypic value   0      a(1+k)   2a

Mean: $\mu_G = 2 p_2 a (1 + p_1 k)$

Allelic effects:

$\alpha_2 = p_1 a [1 + k(p_1 - p_2)]$

$\alpha_1 = -p_2 a [1 + k(p_1 - p_2)]$

Dominance deviations: $\delta_{ij} = G_{ij} - \mu_G - \alpha_i - \alpha_j$
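To see the regression view at work, the sketch below regresses genotypic value on N (the number of Q2 alleles), weighting each genotype by its frequency, and recovers the mean, the average effects and the dominance deviations. It assumes random mating (Hardy-Weinberg genotype frequencies). The function name and the values of a, k and p1 are hypothetical.

```python
def fisher_decomposition(a, k, p1):
    """Decompose genotypic values 0, a(1+k), 2a at a diallelic locus.
    Assumes random mating (Hardy-Weinberg genotype frequencies)."""
    p2 = 1.0 - p1
    G = {0: 0.0, 1: a * (1 + k), 2: 2 * a}           # keyed by N = number of Q2 alleles
    freq = {0: p1 ** 2, 1: 2 * p1 * p2, 2: p2 ** 2}
    mu = sum(G[n] * freq[n] for n in G)               # population mean
    mean_n = sum(n * freq[n] for n in G)
    cov_ng = sum((n - mean_n) * (G[n] - mu) * freq[n] for n in G)
    var_n = sum((n - mean_n) ** 2 * freq[n] for n in G)
    slope = cov_ng / var_n                            # regression slope = alpha2 - alpha1
    alpha1, alpha2 = -p2 * slope, p1 * slope          # mean average effect is zero
    predicted = {0: mu + 2 * alpha1, 1: mu + alpha1 + alpha2, 2: mu + 2 * alpha2}
    delta = {n: G[n] - predicted[n] for n in G}       # dominance deviations (residuals)
    return mu, alpha1, alpha2, delta

a, k, p1 = 1.0, 0.5, 0.3   # hypothetical values, for illustration only
mu, alpha1, alpha2, delta = fisher_decomposition(a, k, p1)
print(mu, alpha1, alpha2, delta)
```

The regression output should agree with the closed-form expressions for the mean and the allelic effects, and the residuals are the dominance deviations.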

Average effects and Additive Genetic Values

The $\alpha$ values are the average effects of an allele

A key concept is the Additive Genetic Value (A) of an individual

$A(G_{ij}) = \alpha_i + \alpha_j$

$A = \sum_{k=1}^{n} \left( \alpha_i^{(k)} + \alpha_j^{(k)} \right)$

Why all the fuss over A?

Suppose father has A = 10 and mother has A = -2 for (say) blood pressure

Expected blood pressure in their offspring is (10-2)/2 = 4 units above the population mean. Offspring A = average of parental A’s

KEY: parents only pass single alleles to their offspring. Hence, they only pass along the A part of their genotypic value G

Genetic Variances

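$G_{ij} = \mu_G + (\alpha_i + \alpha_j) + \delta_{ij}$

$\sigma^2(G) = \sigma^2(\mu_G + (\alpha_i + \alpha_j) + \delta_{ij}) = \sigma^2(\alpha_i + \alpha_j) + \sigma^2(\delta_{ij})$

As $\mathrm{Cov}(\alpha, \delta) = 0$

$\sigma^2(G) = \sum_{k=1}^{n} \sigma^2(\alpha_i^{(k)} + \alpha_j^{(k)}) + \sum_{k=1}^{n} \sigma^2(\delta_{ij}^{(k)})$

$\sigma^2_G = \sigma^2_A + \sigma^2_D$

$\sigma^2_A$: Additive Genetic Variance (or simply Additive Variance)

$\sigma^2_D$: Dominance Genetic Variance (or simply dominance variance)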

Key concepts (so far)
• $\alpha_i$ = average effect of allele i
– Property of a single allele in a particular population (depends on genetic background)
• A = Additive Genetic Value
– A = sum (over all loci) of average effects
– Fraction of G that parents pass along to their offspring
– Property of an individual in a particular population
• Var(A) = additive genetic variance
– Variance in additive genetic values
– Property of a population
• Can estimate A or Var(A) without knowing any of the underlying genetical detail (forthcoming)

$\sigma^2_A = 2E[\alpha^2] = 2 \sum_{i=1}^{m} \alpha_i^2 p_i$

Since $E[\alpha] = 0$, $\mathrm{Var}(\alpha) = E[(\alpha - \mu_\alpha)^2] = E[\alpha^2]$

One locus, 2 alleles:

Genotype          Q1Q1   Q1Q2     Q2Q2
Genotypic value   0      a(1+k)   2a

$\sigma^2_A = 2 p_1 p_2 a^2 [1 + k(p_1 - p_2)]^2$

Dominance affects additive variance. When dominance is present, this is an asymmetric function of allele frequencies.

$\sigma^2_D = E[\delta^2] = \sum_{i=1}^{m} \sum_{j=1}^{m} \delta_{ij}^2 p_i p_j$

One locus, 2 alleles:

$\sigma^2_D = (2 p_1 p_2 a k)^2$

Equals zero if k = 0. This is a symmetric function of allele frequencies.
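A quick numerical check of the one-locus, two-allele formulas: the additive and dominance variances should add up to the total genetic variance computed directly from the genotypic values. The sketch assumes random mating; the function names and parameter values are illustrative.

```python
def variance_components(a, k, p1):
    """Closed-form additive and dominance variances for one locus, two alleles."""
    p2 = 1.0 - p1
    var_a = 2 * p1 * p2 * a ** 2 * (1 + k * (p1 - p2)) ** 2
    var_d = (2 * p1 * p2 * a * k) ** 2
    return var_a, var_d

def total_genetic_variance(a, k, p1):
    """Variance of the genotypic values 0, a(1+k), 2a under random mating."""
    p2 = 1.0 - p1
    values = [0.0, a * (1 + k), 2 * a]
    freqs = [p1 ** 2, 2 * p1 * p2, p2 ** 2]
    mu = sum(g * f for g, f in zip(values, freqs))
    return sum(f * (g - mu) ** 2 for g, f in zip(values, freqs))

# Complete dominance (k = 1) at a few allele frequencies
for p1 in (0.1, 0.5, 0.9):
    var_a, var_d = variance_components(a=1.0, k=1.0, p1=p1)
    print(p1, round(var_a, 4), round(var_d, 4),
          round(total_genetic_variance(1.0, 1.0, p1), 4))
```

With k = 1 the additive variance differs between p1 = 0.1 and p1 = 0.9, while the dominance variance is the same at both frequencies.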

[Figure: additive variance, VA, as a function of allele frequency, p, with no dominance (k = 0)]

[Figure: VA and VD as functions of allele frequency, p, under complete dominance (k = 1)]

Epistasis

$G_{ijkl} = \mu_G + (\alpha_i + \alpha_j + \alpha_k + \alpha_l) + (\delta_{ij} + \delta_{kl}) + (\alpha\alpha_{ik} + \alpha\alpha_{il} + \alpha\alpha_{jk} + \alpha\alpha_{jl}) + (\alpha\delta_{ikl} + \alpha\delta_{jkl} + \alpha\delta_{kij} + \alpha\delta_{lij}) + (\delta\delta_{ijkl})$

$= \mu_G + A + D + AA + AD + DD$

A: Additive Genetic value

D: Dominance value -- interaction between the two alleles at a locus

AA: Additive x Additive interactions -- interactions between a single allele at one locus with a single allele at another

AD: Additive x Dominant interactions -- interactions between an allele at one locus with the genotype at another, e.g. allele $A_i$ and genotype $B_kB_l$

DD: Dominance x dominance interaction --- the interaction between the dominance deviation at one locus with the dominance deviation at another.

These components are defined to be uncorrelated (or orthogonal), so that

$\sigma^2_G = \sigma^2_A + \sigma^2_D + \sigma^2_{AA} + \sigma^2_{AD} + \sigma^2_{DD}$

Resemblance Between Relatives

Heritability
• Central concept in quantitative genetics
• Proportion of variation due to additive genetic values (Breeding values)
– h^2 = VA/VP
– Phenotypes (and hence VP) can be directly measured
– Breeding values (and hence VA) must be estimated
• Estimates of VA require known collections of relatives

[Figure: pedigree diagrams of ancestral relatives (e.g., parent and offspring) and collateral relatives (e.g., sibs), with parents 1, 2, 3 and offspring o1, o2, ..., ok]

[Figure: family designs for full-sibs and half-sibs, with families 1, 2, ..., n and offspring o1, o2, ..., ok in each]

Key observations

• The amount of phenotypic resemblance among relatives for the trait provides an indication of the amount of genetic variation for the trait.

• If trait variation has a significant genetic basis, the closer the relatives, the more similar their appearance

Genetic Covariance between relatives

Genetic covariances arise because two related individuals are more likely to share alleles than are two unrelated individuals.

Sharing alleles means having alleles that are identical by descent (IBD): both copies can be traced back to a single copy in a recent common ancestor.

[Figure: father and mother with pairs of offspring sharing no alleles IBD, one allele IBD, and both alleles IBD]

Parent-offspring genetic covariance

Cov(Gp, Go) --- Parents and offspring share EXACTLY one allele IBD

Denote this common allele by A1

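$G_p = A_p + D_p = \alpha_1 + \alpha_x + D_{1x}$

$G_o = A_o + D_o = \alpha_1 + \alpha_y + D_{1y}$

IBD allele: $\alpha_1$. Non-IBD alleles: $\alpha_x$, $\alpha_y$.

$\mathrm{Cov}(G_p, G_o) = \mathrm{Cov}(\alpha_1 + \alpha_x + D_{1x}, \alpha_1 + \alpha_y + D_{1y})$

$= \mathrm{Cov}(\alpha_1, \alpha_1) + \mathrm{Cov}(\alpha_1, \alpha_y) + \mathrm{Cov}(\alpha_1, D_{1y}) + \mathrm{Cov}(\alpha_x, \alpha_1) + \mathrm{Cov}(\alpha_x, \alpha_y) + \mathrm{Cov}(\alpha_x, D_{1y}) + \mathrm{Cov}(D_{1x}, \alpha_1) + \mathrm{Cov}(D_{1x}, \alpha_y) + \mathrm{Cov}(D_{1x}, D_{1y})$

All covariance terms other than $\mathrm{Cov}(\alpha_1, \alpha_1)$ are zero.

• By construction, $\alpha$ and D are uncorrelated

• By construction, $\alpha$ from non-IBD alleles are uncorrelated

• By construction, D values are uncorrelated unless both alleles are IBD

$\mathrm{Cov}(\alpha_x, \alpha_y) = \begin{cases} 0 & \text{if } x \neq y, \text{ i.e., not IBD} \\ \mathrm{Var}(A)/2 & \text{if } x = y, \text{ i.e., IBD} \end{cases}$

$\mathrm{Var}(A) = \mathrm{Var}(\alpha_1 + \alpha_2) = 2\mathrm{Var}(\alpha_1)$, so that $\mathrm{Var}(\alpha_1) = \mathrm{Cov}(\alpha_1, \alpha_1) = \mathrm{Var}(A)/2$

Hence, relatives sharing one allele IBD have a genetic covariance of Var(A)/2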

The resulting parent-offspring genetic covariance becomes Cov(Gp,Go) = Var(A)/2

Half-sibs

[Figure: a common father mated to mothers 1 and 2, producing half-sibs o1 and o2]

Each sib gets exactly one allele from the common father, and different alleles from the different mothers

The half-sibs share one allele IBD • occurs with probability 1/2

The half-sibs share no alleles IBD • occurs with probability 1/2

Hence, the genetic covariance of half-sibs is just (1/2)Var(A)/2 = Var(A)/4

Full-sibs

[Figure: father and mother with full-sib offspring]

Each sib gets exactly one allele from each parent

Paternal allele IBD [Prob = 1/2], maternal allele IBD [Prob = 1/2] -> Prob(both alleles IBD) = 1/2*1/2 = 1/4

Paternal allele not IBD [Prob = 1/2], maternal allele not IBD [Prob = 1/2] -> Prob(zero alleles IBD) = 1/2*1/2 = 1/4

Prob(exactly one allele IBD) = 1 - Prob(0 IBD) - Prob(2 IBD) = 1/2

IBD alleles   Probability   Contribution
0             1/4           0
1             1/2           Var(A)/2
2             1/4           Var(A) + Var(D)

Resulting Genetic Covariance between full-sibs

Cov(Full-sibs) = Var(A)/2 + Var(D)/4

Genetic Covariances for General Relatives

Let r = (1/2)Prob(1 allele IBD) + Prob(2 alleles IBD)

Let u = Prob(both alleles IBD)

General genetic covariance between relatives

Cov(G) = rVar(A) + uVar(D)

When epistasis is present, additional terms appear

r^2 Var(AA) + ruVar(AD) + u^2 Var(DD) + r^3 Var(AAA) + ...
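Ignoring the epistatic terms, the general formula can be applied to the three relationships worked out earlier by plugging in their IBD probabilities. A minimal sketch: the dictionary and function names are illustrative, and the variance components in the usage lines are made-up numbers.

```python
from fractions import Fraction as F

# Probabilities of sharing 0, 1, or 2 alleles IBD
IBD = {
    "parent-offspring": (F(0), F(1), F(0)),
    "half-sibs": (F(1, 2), F(1, 2), F(0)),
    "full-sibs": (F(1, 4), F(1, 2), F(1, 4)),
}

def coefficients(relationship):
    """Return (r, u): r = (1/2)Prob(1 IBD) + Prob(2 IBD), u = Prob(2 IBD)."""
    _, p1, p2 = IBD[relationship]
    return p1 / 2 + p2, p2

def genetic_cov(relationship, var_a, var_d):
    """Cov(G) = r Var(A) + u Var(D), with no epistasis."""
    r, u = coefficients(relationship)
    return r * var_a + u * var_d

# Hypothetical variance components: Var(A) = 8, Var(D) = 4
for rel in IBD:
    print(rel, coefficients(rel), genetic_cov(rel, 8, 4))
```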

Components of the Environmental Variance

E = Ec + Es

E = Total environmental value
Ec = Common environmental value experienced by all members of a family, e.g., shared maternal effects
Es = Specific environmental value, any unique environmental effects experienced by the individual

The environmental variance can thus be written in terms of variance components as

VE = VEc + VEs

One can decompose the environmental variance further, if desired. For example, plant breeders have terms for the location variance, the year variance, and the location x year variance.

Shared Environmental Effects contribute to the phenotypic covariances of relatives

Cov(P1,P2) = Cov(G1+E1,G2+E2) = Cov(G1,G2) + Cov(E1,E2)

Shared environmental values are expected when sibs share the same mom, so that Cov(Full sibs) and Cov(Maternal half-sibs) not only contain a genetic covariance, but an environmental covariance as well, VEc

Cov(Full-sibs) = Var(A)/2 + Var(D)/4 + VEc
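To make the role of VEc concrete, the sketch below compares the expected phenotypic covariances of three kinds of sibs. It assumes, beyond what the text states, that paternal half-sibs (different mothers) share no common environment and that there is no epistasis. The function name and the numbers are hypothetical.

```python
def sib_phenotypic_cov(relationship, var_a, var_d, var_ec):
    """Expected phenotypic covariance = genetic covariance + shared environment.
    Assumes VEc arises only through a shared mother (an assumption of this sketch)."""
    genetic = {
        "full-sibs": var_a / 2 + var_d / 4,
        "maternal half-sibs": var_a / 4,
        "paternal half-sibs": var_a / 4,
    }[relationship]
    shares_mother = relationship in ("full-sibs", "maternal half-sibs")
    return genetic + (var_ec if shares_mother else 0.0)

# Hypothetical variance components: VA = 8, VD = 4, VEc = 2
for rel in ("full-sibs", "maternal half-sibs", "paternal half-sibs"):
    print(rel, sib_phenotypic_cov(rel, 8.0, 4.0, 2.0))
```

Under these assumptions, the difference between the maternal and paternal half-sib covariances equals VEc.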
