Page 1: Comparative Genomics and Evolution Pollard, K.S., et al., Forces Shaping the Fastest Evolving Regions in the Human Genome. PLoS Genetics 2(10), 2006. McLean,

Comparative Genomics and Evolution

Pollard, K.S., et al., Forces Shaping the Fastest Evolving Regions in the Human Genome. PLoS Genetics 2(10), 2006.

McLean, C., and Bejerano, G., Dispensability of Mammalian DNA. Genome Research 18, 1743-1751 (2008).

Image source: http://mbbnet.umn.edu

Image from: McLean, C., and Bejerano, G., Dispensability of Mammalian DNA. Genome Research 18, 1743-1751 (2008).

Page 2

“Forces shaping the fastest evolving regions in the human genome”

by Katherine S. Pollard et al.

Page 3

What’s the difference?

Image sources: http://pro.corbis.com, http://www.science.psu.edu

Page 4

What’s the difference?

• Humans have higher “brainpower”

• Examples: creativity, problem solving, language

• What part of the genome is the cause?

Image source: http://www.spaceflight.esa.int

Page 5

What’s the difference?

• Human and chimpanzee DNA is 98% similar

• The 2% difference is 29 million bases (mostly in non-coding DNA)

Image source: http://en.wikipedia.org

Page 6

Comparative Genomics

• Human and rodent genomes are often compared to identify conserved (presumably functional) elements.

• Humans and chimpanzees are compared to understand what is uniquely human about our genome.

Image source: http://genome.ucsc.edu

Page 7

Comparative Genomics

• Look at HARs in the human genome

• HAR: human accelerated region. A conserved element with a high rate of nucleotide substitution on the human lineage but a low rate in other vertebrates.

• The fastest-evolving is HAR1, a novel RNA gene expressed during development of the neocortex (language, conscious thought).

Page 8

HARs

• ~100 bp long, mostly non-coding

• Function is likely to be gene regulation.

• They seem to have been under strong negative selection up to the common ancestor of chimp and human.

• Rapid positive selection then began on the human lineage only.

Image source: http://www.shutterstock.com

Page 9

Finding HARs

• Evolutionary tree based on the comparison of conserved regions in whole-genome alignments between species.

Branch lengths given in substitutions per base, or in millions of years

Evolution of vertebrates

Image from: Pollard, K.S., et al., Forces Shaping the Fastest Evolving Regions of the Human Genome.

Page 10

Finding HARs

• Find HARs by using LRT, the likelihood ratio test.

• In statistical hypothesis testing, the likelihood ratio (Λ) compares the maximum likelihood of the data under the alternative hypothesis with the maximum likelihood under the null hypothesis.

• The LRT decides between the two hypotheses based on the value of the likelihood ratio.
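
In symbols, one common way to write the statistic is sketched below. It follows the sign convention these slides use for the per-region calculation, Λ = log(P2 / P1), so larger values favor the alternative. The symbols D (the aligned data), θ (model parameters), and Θ1, Θ2 (the parameter spaces of the null and alternative models) are notation assumed here, not taken from the slides.

```latex
\Lambda \;=\; \log \frac{\max_{\theta \in \Theta_2} P(D \mid \theta)}
                        {\max_{\theta \in \Theta_1} P(D \mid \theta)}
        \;=\; \log \frac{P_2}{P_1}
```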

Page 11

Finding HARs

• Two models were used for the genomic LRT.

Model 1: the human substitution rate is held proportional to the other substitution rates in the evolutionary tree.

Model 2: the human substitution rate can be accelerated relative to the rates in the rest of the tree.
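
The difference between the two models can be pictured as two ways of rescaling the branch lengths of a neutral tree. The sketch below is only an illustration: the branch names, the branch lengths, and the parameter names `rho` and `lam` are hypothetical and are not values from Pollard et al.

```python
# Toy illustration of the two rate parameterisations.
# Branch names and lengths are hypothetical, not values from the paper.

def scale_model1(branches, rho):
    """Model 1: every branch is scaled by the same factor rho."""
    return {name: rho * length for name, length in branches.items()}


def scale_model2(branches, rho, lam, human="human"):
    """Model 2: all branches are scaled by rho, and the human branch gets an
    extra factor lam (lam > 1 means human-specific acceleration)."""
    scaled = scale_model1(branches, rho)
    scaled[human] *= lam
    return scaled


neutral = {"human": 0.006, "chimp": 0.007, "mouse": 0.08, "chicken": 0.35}
m1 = scale_model1(neutral, rho=0.1)
m2 = scale_model2(neutral, rho=0.1, lam=5.0)
print(m1["human"], m2["human"])
```

With `lam = 1` the second model reduces to the first, which is what makes the two models nested and therefore suitable for a likelihood ratio test.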

Page 12

Finding HARs

[Figure: aligned sequences for human and other vertebrates across all the conserved alignments]

Page 13

Finding HARs

Model 1

[Figure: the same alignments; determine the 1st, 2nd, and 3rd sets of rates, then scale all by the same amount]

Page 14

Finding HARs

Model 2

[Figure: the same alignments; scale all rates by the same amount, and scale the human rates separately]

Page 15

Identify regions conserved between human and other vertebrates (34,498 of them)

Page 16

Identify regions conserved between human and other vertebrates (34,498 of them)

For all regions, fit model 1 and determine the proportional rates that maximize the likelihood of the tree

Obtain P1(max probability 1)

Page 17

Identify regions conserved between human and other vertebrates (34,498 of them)

For all regions, fit model 1 and determine the proportional rates that maximize the likelihood of the tree

Loop over all conserved regions. For each region, do:

Obtain P1(max probability 1)

Page 18

Identify regions conserved between human and other vertebrates (34,498 of them)

For all regions, fit model 1 and determine the proportional rates that maximize the likelihood of the tree

Loop over all conserved regions. For each region, do:

Obtain P1 (max probability under model 1)

Fit model 2 to the region in human, find the acceleration for that region that maximizes the likelihood of the tree

Obtain P2 (max probability under model 2)

Calculate the LRT statistic for the region as Λ = log(P2 / P1)
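
The per-region loop can be sketched with a deliberately simplified stand-in for the phylogenetic likelihood. The assumption here, which is not the method of the paper, is that substitution counts on the human branch and on the rest of the tree are Poisson-distributed with means proportional to the neutral branch lengths; all function and parameter names are hypothetical.

```python
import math


def poisson_loglik(k, mu):
    """Log-probability of k events under a Poisson distribution with mean mu."""
    if mu == 0:
        return 0.0 if k == 0 else float("-inf")
    return k * math.log(mu) - mu - math.lgamma(k + 1)


def lrt_statistic(h, r, b_h, b_r, n):
    """Toy LRT for one conserved region.

    h, r     : substitutions observed on the human branch / rest of the tree
    b_h, b_r : neutral branch lengths (substitutions per base)
    n        : region length in bases
    """
    # Model 1: one scale factor rho shared by every branch (maximum likelihood).
    rho1 = (h + r) / ((b_h + b_r) * n)
    log_p1 = poisson_loglik(h, rho1 * b_h * n) + poisson_loglik(r, rho1 * b_r * n)

    # Model 2: the human branch may be faster than the rest, never slower.
    rho2 = r / (b_r * n)
    human_rate = h / (b_h * n)
    if human_rate <= rho2:
        return 0.0  # no acceleration: model 2 collapses to model 1
    # At the model 2 optimum each fitted mean equals the observed count.
    log_p2 = poisson_loglik(h, h) + poisson_loglik(r, r)
    return log_p2 - log_p1  # Lambda = log(P2 / P1)


regions = {
    "accelerated": dict(h=10, r=2, b_h=0.01, b_r=1.0, n=100),
    "ordinary": dict(h=0, r=50, b_h=0.01, b_r=1.0, n=100),
}
for name, counts in regions.items():
    print(name, lrt_statistic(**counts))
```

The statistic is zero whenever the human branch is no faster than the rest of the tree and grows as the human-specific excess grows, which is the behaviour the slides rely on when ranking regions.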

Page 19

Finding HARs

• Big LRT value indicates an HAR. How big is big?

• Do 1 million simulations of the 34,498 conserved alignments.

• To create each simulation, use the model 1 proportional rates.

• Repeat the LRT calculation for each simulation.

• Then for each region, find the proportion of simulated LRT values that are larger than its observed LRT value.

• That proportion is an empirical p-value; a small value indicates that the region is an HAR.
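
The last two bullets amount to an empirical p-value. A minimal sketch, assuming the simulated LRT values for a region are already available as a list (the numbers below are made up for illustration):

```python
def empirical_p_value(observed_lrt, simulated_lrts):
    """Fraction of null-simulated LRT values that exceed the observed one."""
    exceed = sum(1 for s in simulated_lrts if s > observed_lrt)
    return exceed / len(simulated_lrts)


# Made-up null values standing in for LRTs computed on simulated alignments.
null_lrts = [0.0, 0.1, 0.3, 0.0, 1.2, 0.4, 0.0, 2.5, 0.2, 0.7]
p = empirical_p_value(2.0, null_lrts)
print(p)
```

The resolution of such a p-value is limited by the number of simulations, which is one reason to run a very large number of them.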

Page 20

Finding HARs

• Note on methods: the vertebrates used to select the conserved regions (chimp, macaque, mouse, rat, rabbit) were omitted from the LRT analysis.

• This ensures that the LRT is independent of the method used to select the conserved regions.

Page 21

Finding HARs

• Result: 202 HARs were found in the human genome.

Image source: http://www.3dscience.com

Page 22

Results for Conserved Elements

• 80.4% of the 34,498 conserved regions are non-coding.

• 45.4% of non-coding regions are intronic, 31% are intergenic.

• Genes near the non-coding regions are enriched for transcription factors, DNA-binding proteins, and regulators of nucleic acid metabolism.

Page 23

Results for HARs

• 202 HARs have p < 0.1, 49 of them have p < 0.05

• HAR1 through HAR5 have p < 4.5e-4, very accelerated

• Most HARs are non-coding

• 66.3% are intergenic, 31.7% are intronic, only 1.5% are coding

• Results support the hypothesis (King and Wilson) that most chimp-human differences are regulatory.

Page 24

Results: Confirming Accelerated Selection in HARs

• Are the HARs just due to relaxation of negative selection?

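• No. Relaxed negative selection could raise the substitution rate only up to the neutral rate, so compare HAR rates with the neutral rate at 4D (fourfold degenerate) sites.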

Negative selection

Positive selection

Image source: http://cs273a.stanford.edu [Bejerano Aut 08/09]

Page 25

The chimp rates in all five elements fall well below the human rates, which exceed the background rates by as much as an order of magnitude. H, human; C, chimp.

Genome-wide neutral rate for 4D sites in human and chimp

Genome-wide neutral rate for 4D sites in human and chimp in chromosome end bands

Image from: K.S. Pollard et al., Forces Shaping the Fastest Evolving Regions of the Human Genome.

Page 26

Results: W→S Bias in HARs

• A dramatic AT→GC (weak-to-strong) bias was observed in HARs.

[Figure: AT→GC versus GC→AT substitution bias for HAR1 – HAR5, HAR6 – HAR49, HAR50 – HAR202, and the rest of the ~34,000 conserved elements]

Image from: Pollard, K.S., et al., Forces Shaping the Fastest Evolving Regions of the Human Genome.
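
A weak-to-strong bias of this kind can be measured by classifying each substitution between an inferred ancestral sequence and the human sequence. The sketch below assumes two aligned, equal-length strings; the sequences are made-up toy data, not real HAR sequence.

```python
WEAK, STRONG = set("AT"), set("GC")


def count_ws_substitutions(ancestral, derived):
    """Count weak-to-strong and strong-to-weak changes between two aligned,
    equal-length sequences. Gaps and other symbols are ignored."""
    if len(ancestral) != len(derived):
        raise ValueError("sequences must be aligned to the same length")
    w_to_s = s_to_w = 0
    for a, d in zip(ancestral.upper(), derived.upper()):
        if a in WEAK and d in STRONG:
            w_to_s += 1
        elif a in STRONG and d in WEAK:
            s_to_w += 1
    return w_to_s, s_to_w


# Made-up toy sequences for illustration only.
ancestral = "ATTAGCATTA"
derived = "GTCAGCGTTG"
print(count_ws_substitutions(ancestral, derived))
```

Comparing the two counts per element, and then across groups of elements, gives the kind of summary shown in the figure.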

Page 27

Results: W→S Bias in HARs

• Top 49 HARs are 2.7 times as likely to be located near final chromosomal bands as the other conserved elements

• Interestingly, HAR1 and HAR5 are also in end regions in other mammals, but are not accelerated.

Image source: http://www.intelihealth.com

Page 28

Results: W→S Bias in HARs

• HARs tend to be located in regions of high recombination in humans.

• All of this evidence points to biased gene conversion (BGC) as the driving force behind HARs.

Page 29

Genetic Recombination

• Paired chromosomes can exchange homologous pieces

• Typically occurs during meiosis

Page 30

Meiosis

[Figure: diploid germ cell with maternal chromosome A and paternal chromosome A]

Page 31

Meiosis

[Figure: diploid germ cell with maternal chromosome A and paternal chromosome A; after DNA replication, labels show the centromere and sister chromatids]

Page 32

Meiosis

[Figure: diploid germ cell; DNA replication (centromere, sister chromatids), then recombination]

Page 33

Meiosis

[Figure: diploid germ cell; DNA replication (centromere, sister chromatids), then recombination, then segregation]

Page 34

Meiosis

[Figure: diploid germ cell; DNA replication (centromere, sister chromatids), then recombination, then segregation, ending in haploid gametes]

Page 35

Recombination

Recombination hotspot

Page 36

Genetic Recombination

[Figure: duplex 1 and duplex 2; formation of a Holliday junction intermediate; then either vertical resolution with crossover, or horizontal resolution with gene conversion followed by mismatch repair]

Image source: http://www.sanger.ac.uk

Page 37

Genetic Recombination: Chromosomal Crossover

• Chromosomal crossover results in exchange of DNA pieces

Homologous chromosomes

Recombinant chromatids

Image source: http://www.emc.maricopa.edu

Page 38

Genetic Recombination: Gene Conversion

• Gene conversion results in nonreciprocal transfer of DNA

Mismatch repair causes DNA to revert back to its original form

Recombinant chromatids

Image source: http://www.emc.maricopa.edu

Page 39

Genetic Recombination: Gene Conversion

• The result is a nonstandard ratio of alleles, such as 3:1

• This causes homogenization of a species’ gene pool

haploid gametes

Image source: http://www.emc.maricopa.edu

Page 40

Biased Gene Conversion

• During gene conversion, the DNA repair machinery tends to replace weak pairings with strong pairings.

A – T is a weak pairing (two hydrogen bonds)

G – C is a strong pairing (three hydrogen bonds)

Image source: http://commons.wikimedia.org

Page 41

Biased Gene Conversion

• Biased gene conversion results in G – C enrichment of a species’ gene pool (in addition to causing homogenization)

[Figure: recombinant chromatids; A – T replaced by G – C during mismatch repair]
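
One way to see how a small repair bias can enrich G – C alleles across a gene pool is the simple one-locus recursion sketched below. It is an illustrative assumption rather than a model taken from these slides: in an AT/GC heterozygote the GC allele is transmitted with probability (1 + b) / 2 instead of 1/2, which under random mating gives p' = p + b·p·(1 − p) for the GC allele frequency p. The bias `b` and the starting frequency are hypothetical values.

```python
def bgc_trajectory(p0, b, generations):
    """Deterministic GC-allele frequency under biased gene conversion.

    p0 : starting GC-allele frequency (illustrative)
    b  : transmission bias in heterozygotes; 0 means fair segregation
    """
    freqs = [p0]
    p = p0
    for _ in range(generations):
        # Heterozygotes occur at frequency 2p(1-p) and over-transmit GC.
        p = p + b * p * (1.0 - p)
        freqs.append(p)
    return freqs


unbiased = bgc_trajectory(0.01, 0.0, 2000)
biased = bgc_trajectory(0.01, 0.005, 2000)
print(unbiased[-1], biased[-1])
```

In this toy model even a bias well under one percent carries a rare GC allele to high frequency, which mimics the signature of positive selection without any fitness advantage being involved.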

Page 42

HARs and Recombination Hotspots

• HARs tend to be located near recombination hotspots in humans

Page 43

Recombination Hotspots

• Poorly understood

• Extremely different between chimps and humans (change rapidly during evolution)

• Not caused by the local DNA sequence (it is the same in human and chimp)

Page 44

[Diagram: “Recombination hotspots” and “Some HARs”, linked with a question mark]

Page 45

Possible Conclusion

• Recombination-caused BGC (often seen negatively) played a big role in the development of our species.

Page 46

Alternative Explanation

• Isochore – DNA region (~100 kb) with high gene concentration

• Isochores are stabilized by many strong (GC) pairings

HAR HAR

Isochore

Page 47

Alternative Explanation

• Theory (Bernardi et al.): weakly deleterious changes drive an isochore to a critical point of destabilization.

• At the critical point, GC content cannot decrease further; otherwise the isochore becomes unstable.

• An AT→GC substitution in the isochore suddenly gains a selective advantage and sweeps through the population.

Page 48

Alternative Explanation

• Isochore selective sweep theory vs. the BGC theory.

• An isochore sweep has a different DNA signature than BGC.

[Figure: isochore selective sweep, with GC changes spread over ~100 kb; biased gene conversion, with GC changes within ~100 bases]

Page 49

Alternative Explanation

• Evidence so far favors the BGC explanation for HARs

• However, the results are not yet conclusive

Page 50

“Dispensability of Mammalian DNA”

by Gill Bejerano and Cory McLean

Page 51

Are mammalian CNEs dispensable?

• CNE – conserved non-exonic element

• Examples: cis-regulatory DNA, ultraconserved DNA

?

Image source: http://apps.co.marion.or.us

Page 52

Cis-regulatory DNA elements

promoter or inhibitor

Image source: http://cnx.org

Page 53

Cis-regulatory DNA elements

Image source: http://cnx.org

Page 54

Ultraconserved elements

• 200 bp and up, many seem to be regulatory

• “100% identity with no insertions or deletions between orthologous regions of the human, rat, and mouse genomes.”

• “Nearly all of these segments are also conserved in the chicken and dog genomes, with an average of 95 and 99% identity, respectively. Many are also significantly conserved in fish.”

(quotes from “Ultraconserved elements in the human genome” by Bejerano et al.)

Page 55

Are mammalian CNEs dispensable?

• About 20% of gene knockout experiments, including cis-regulatory and ultraconserved knockouts, produce no phenotype measurable in lab settings.

Image source: http://www.sciencedaily.com

Page 56

Are mammalian CNEs dispensable?

Do CNEs have functional redundancy?

OR

Are CNEs indispensable, but in a way that cannot be observed in the lab?

• Approach: look at CNEs lost in rodents due to evolution

Page 57

Finding CNEs lost by rodents

Computational Pipeline

Identify conserved mammalian sequences

Pick out the ones absent in rodents

Remove artifacts due to assembly, alignment, structural RNA migration

Page 58

Image from: McLean, C., and Bejerano, G., Dispensability of Mammalian DNA.

Page 59

Image from: McLean, C., and Bejerano, G., Dispensability of Mammalian DNA.

Page 60

Use UCSC chains and nets

To avoid assembly artifacts

Ignore multi-level nets

Image from: McLean, C., and Bejerano, G., Dispensability of Mammalian DNA.

Page 61

Identify lost DNA

Validate quality of results

Image from: McLean, C., and Bejerano, G., Dispensability of Mammalian DNA.

Page 62

Identifying DNA lost by rodents

Look at the aligned orthologous sequences in primates (human, macaque), dog, and rodents (mouse, rat).

[Figure: aligned primate, dog, and rodent sequences; columns with different bases between primates and dog (e.g. A vs. G) are marked]

Page 63

Identifying DNA lost by rodents

Compute primate-dog %id (percentage of identical alignment columns) in a 100 bp window

[Figure: 100 bp window over the primate, dog, and rodent alignment]

Page 64

Identifying DNA lost by rodents

Compute primate-dog %id

[Figure: primate, dog, and rodent alignment]

Page 65

Identifying DNA lost by rodents

Compute primate-dog %id

Deletion in rodents!

[Figure: primate, dog, and rodent alignment with the rodent sequence missing]

Page 66

Identifying DNA lost by rodents

Ultraconserved-like element between primates-dog

[Figure: primate, dog, and rodent alignment]

Page 67

Identifying DNA lost by rodents

Ultraconserved-like element that was lost in rodents!

[Figure: primate, dog, and rodent alignment with the rodent sequence missing]
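
The window scan in these slides can be sketched as follows. This is a simplified illustration, not the authors' pipeline: it assumes a single primate row, a dog row, and a rodent row given as equal-length aligned strings, and it treats a run of gap characters in the rodent row as a deletion, whereas the actual analysis works from UCSC chains and nets. Function names, the `min_id` threshold, and the toy sequences are hypothetical.

```python
def percent_identity(primate, dog):
    """Percentage of alignment columns where primate and dog bases match."""
    matches = sum(1 for p, d in zip(primate, dog) if p == d and p != "-")
    return 100.0 * matches / len(primate)


def scan_windows(primate, dog, rodent, window=100, min_id=100.0):
    """Yield (start, %id) for windows that are conserved between primate and
    dog and in which the rodent row consists only of gap characters."""
    for start in range(0, len(primate) - window + 1):
        end = start + window
        pid = percent_identity(primate[start:end], dog[start:end])
        rodent_absent = set(rodent[start:end]) == {"-"}
        if pid >= min_id and rodent_absent:
            yield start, pid


# Toy alignment with a 5-column window so the example stays readable.
primate = "ACGTACGTAC"
dog = "ACGTACGTAT"
rodent = "AC------AC"
hits = list(scan_windows(primate, dog, rodent, window=5))
print(hits)
```

Lowering `min_id` would pick up less conserved windows as well, which is how the same scan can be used to compare loss rates across conservation levels.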

Page 68

Results for non-exonic ultras

• 1,691,090 bp of ultraconserved-like sequences were found

• 1147 bp of these sequences were lost in rodents

• Thus only 0.086% of ultras is lost in rodents

• In comparison, ¼ of neutrally-evolving DNA (50%id – 65%id) is lost in rodents

• Thus ultraconserved-like sequences are about 300 times less likely to be lost (more indispensable) than neutrally-evolving DNA
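
The arithmetic behind these bullets can be checked directly from the figures quoted on this slide. Note that dividing the two base counts gives a lost fraction just under 0.07%, a little below the 0.086% quoted, so one of the quoted figures may have been rounded or transcribed differently; the ratio to neutral DNA stays in the few-hundred-fold range either way. Variable names are chosen here for illustration.

```python
ultra_total_bp = 1_691_090     # ultraconserved-like bases found (from the slide)
ultra_lost_bp = 1_147          # of which lost in rodents (from the slide)
neutral_lost_fraction = 0.25   # "1/4 of neutrally-evolving DNA" (from the slide)

ultra_lost_fraction = ultra_lost_bp / ultra_total_bp
fold = neutral_lost_fraction / ultra_lost_fraction

print(f"ultraconserved-like DNA lost: {100 * ultra_lost_fraction:.3f}%")
print(f"neutral / ultraconserved-like loss ratio: {fold:.0f}x")
```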

Page 69

Results for neutral DNA

• Expected: a uniform loss rate for neutrally-evolving DNA

• Observed: less conserved sequences are retained more often

Image from: McLean, C., and Bejerano, G., Dispensability of Mammalian DNA.

Page 70

Results for neutral DNA

• Phenomenon due to poorly conserved sequences being adjacent to exons, and thus shielded from being lost

• Larger deletions are biased away from gene structures

Image from: McLean, C., and Bejerano, G., Dispensability of Mammalian DNA.

Page 71

Separating DNA under selection from neutral DNA

• Moving away from 100%id, there is a mixing of DNA under purifying selection and neutrally evolving DNA

Image from: McLean, C., and Bejerano, G., Dispensability of Mammalian DNA.

Page 72

Separating DNA under selection from neutral DNA

• To distinguish neutral DNA from conserved DNA in the mix, use longer evolutionary tree branch lengths

Image from: McLean, C., and Bejerano, G., Dispensability of Mammalian DNA.

Page 73

Separating DNA under selection from neutral DNA

• Example: the human-dog-horse alignment has a longer cumulative branch length than human-macaque-dog

Image from: McLean, C., and Bejerano, G., Dispensability of Mammalian DNA.

Page 74

Separating DNA under selection from neutral DNA

• Example: the human-dog-horse alignment has a longer cumulative branch length than human-macaque-dog

Image from: McLean, C., and Bejerano, G., Dispensability of Mammalian DNA.

Page 75

Separating DNA under selection from neutral DNA

• Thus the human-dog-horse alignment has lower %id for neutral DNA than human-macaque-dog

• This shifts the neutral DNA curve to the right

Image from: McLean, C., and Bejerano, G., Dispensability of Mammalian DNA.
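
Why a longer cumulative branch length lowers the %id of neutral DNA can be illustrated with a standard substitution model. The sketch below assumes the Jukes-Cantor model and a star-shaped tree, both simplifications introduced here and not taken from the paper; the branch lengths are hypothetical.

```python
import math


def jc_same(t):
    """Jukes-Cantor probability that a base is unchanged after branch length t."""
    return 0.25 + 0.75 * math.exp(-4.0 * t / 3.0)


def jc_diff(t):
    """Jukes-Cantor probability of ending in one specific different base."""
    return 0.25 - 0.25 * math.exp(-4.0 * t / 3.0)


def expected_identity(branch_lengths):
    """Expected fraction of neutral columns identical in all species, for a
    star tree whose leaves sit at the given distances from the root."""
    all_match_root = math.prod(jc_same(t) for t in branch_lengths)
    all_match_other = 3.0 * math.prod(jc_diff(t) for t in branch_lengths)
    return all_match_root + all_match_other


# Hypothetical branch lengths (substitutions per site), for illustration only.
human_macaque_dog = [0.02, 0.03, 0.15]
human_dog_horse = [0.12, 0.15, 0.10]
print(expected_identity(human_macaque_dog), expected_identity(human_dog_horse))
```

DNA under purifying selection changes far more slowly than this neutral expectation, so lengthening the tree pulls the neutral distribution away from the conserved one and makes the two easier to separate.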

Page 76

Results for DNA under purifying selection

Image from: McLean, C., and Bejerano, G., Dispensability of Mammalian DNA.

Page 77

Results for DNA under purifying selection

• 80%id to 100%id identified as DNA under purifying selection

• As is visible from the figure, practically none of this DNA is lost in the primates (only 0.154% of bases are lost)

Image from: McLean, C., and Bejerano, G., Dispensability of Mammalian DNA.

Page 78

Results for DNA under purifying selection

• The previous results were for CNEs

• Those results are comparable to the numbers for lost coding DNA:

Fraction of lost CNEs: 0 at 100%id, 0.00122 at 80%id

Fraction of lost exons: 0 at 100%id, 0.0000861 at 80%id

Image from: McLean, C., and Bejerano, G., Dispensability of Mammalian DNA.

Page 79

Results for DNA under purifying selection

• Thus CNEs under purifying selection are indispensable, similarly to coding elements.

Page 80

CNE dispensability ranking

In rodents In primates

Deepest in vertebrate tree, so corresponds to the most indispensable CNEs

Region of high conservation (CNEs)

• Left plot explanation (right plot is similar): take the h-m-d alignments, find their conservation %id in each of the shown species. Then for each of those species, plot the fraction of DNA lost in rodents vs the %id.

Image from: McLean, C., and Bejerano, G., Dispensability of Mammalian DNA.

Page 81

CNE dispensability ranking

Image from: McLean, C., and Bejerano, G., Dispensability of Mammalian DNA.

Page 82

Conclusion

• Many mammalian CNE knockouts produce no observable phenotype in the lab, suggesting great functional redundancy.

• However, evolutionary analysis shows that the CNEs, and particularly ultraconserved regions, are indispensable.

• The phenotype in knockouts seems to be subtle, but very important.

Image source: http://apps.co.marion.or.us