http://bejerano.stanford.edu 1
Ultraconservation and Living Fossils: Mysteries of the Human Genome
Gill Bejerano, PhD
Assistant Professor, Dept. of Developmental Biology & Dept. of Computer Science, Stanford University (2007)
Postdoc with David Haussler, School of Engineering, UC Santa Cruz (2006)
This is “the Century of Biology”
We can now cast Biology in “our” terms
strings
time series
circuits
The Meaning of Life (abridged)
DNA: Functional and Non-Functional
DNA = a linear molecule that carries the genetic instructions for making living organisms ~ a long string over a small alphabet.
Alphabet of four: {A,C,G,T}. Strings of length 10^4 to 10^11.
...ACGTACGACTGACTAGCATCGACTACGACTAGCAC...
Genetic instructions: how to... when to... where to...
“Junk” DNA
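The string view of DNA can be made concrete in a few lines. A minimal Python sketch, assuming plain uppercase sequence text (real sequence files also carry lowercase letters and N for unknown bases, which this ignores); `is_dna` and `base_composition` are illustrative names, not part of any library:

```python
from collections import Counter

# The four-letter DNA alphabet from the slide.
DNA_ALPHABET = frozenset("ACGT")


def is_dna(s: str) -> bool:
    """True if s is a non-empty string over {A, C, G, T}."""
    return len(s) > 0 and set(s) <= DNA_ALPHABET


def base_composition(s: str) -> dict:
    """Count occurrences of each DNA letter in s."""
    if not is_dna(s):
        raise ValueError("not a DNA string over {A,C,G,T}")
    counts = Counter(s)
    return {base: counts[base] for base in "ACGT"}


fragment = "ACGTACGACTGACTAGCATCGACTACGACTAGCAC"  # the example substring above
print(len(fragment), base_composition(fragment))
```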
One Cell, One Genome, One Replication
Every cell holds a single copy of all its DNA = its genome. The genome is replicated at every cell division. The human body is made of ~10^14 cells, all originating from a single cell through cell division.
[Figure: a cell; its genome = all of its DNA, a DNA string. An egg becomes a chicken through cell division: chicken ≈ 10^14 copies (DNA) of egg (DNA).]
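As a rough worked example of “all originate from a single cell”: under the simplifying assumption that every cell divides in lockstep, going from 1 cell to ~10^14 cells takes log2(10^14) ≈ 46.5 rounds of doubling. A Python sketch of that arithmetic:

```python
import math

CELLS = 1e14  # approximate number of cells in the human body (slide)

# Simplifying assumption: all cells divide in lockstep, so the population
# doubles each round. Real lineages divide asynchronously and cells die,
# so this is a lower bound: some lineage must pass through at least this
# many divisions to yield ~10^14 cells.
rounds_of_doubling = math.log2(CELLS)

print(f"~{rounds_of_doubling:.1f} rounds of doubling")  # prints ~46.5 rounds of doubling
```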
Comparative Genomics
[Figure: evolutionary tree of sequenced species: human, chimp, macaque, mouse, rat, dog, cow, opossum, platypus, chicken, fugu, tetraodon, zebrafish]
“Nothing in Biology Makes Sense Except in the Light of Evolution”
Theodosius Dobzhansky
Intelligent Designer
DNA Replication is Imperfect
Small Scale: single letters are substituted, erased, added
[Figure: the egg's DNA string ...ACGTACGACTGACTAGCATCGACTACGA... is copied into the chicken; changes (e.g. TT, CAT) appear along the way.]
Junk DNA: “anything goes”. Functional DNA: many changes are not tolerated.
Thus, sequence conservation ⇒ function!
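The small-scale copying errors above can be simulated on a string. A toy Python sketch, assuming the three event types are equally likely and uniformly placed (real mutation rates and spectra are neither); `mutate` is a hypothetical helper:

```python
import random

BASES = "ACGT"


def mutate(seq: str, n_events: int, rng: random.Random) -> str:
    """Apply n_events random small-scale changes to seq:
    substitution, deletion or insertion of a single letter."""
    s = list(seq)
    for _ in range(n_events):
        kind = rng.choice(("substitute", "delete", "insert"))
        if kind == "insert" or not s:
            # an empty string can only grow, so fall back to insertion
            s.insert(rng.randrange(len(s) + 1), rng.choice(BASES))
        elif kind == "delete":
            del s[rng.randrange(len(s))]
        else:
            i = rng.randrange(len(s))
            # pick a letter different from the current one
            s[i] = rng.choice([b for b in BASES if b != s[i]])
    return "".join(s)


egg = "ACGTACGACTGACTAGCATCGACTACGA"
chicken = mutate(egg, 3, random.Random(1))
print(egg)
print(chicken)
```

Selection is not modeled here: in junk DNA every outcome would be kept, while in functional DNA most of these changes would be filtered out.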
Conservation implies Function
Comparative genomics of distantly related species:
[Figure: human and mouse descend from a common mammalian ancestor; their aligned sequences:]
...CTTTGCGA-TGAGTAGCATCTACTATTT...
...ACGTGGGACTGACTA-CATCGACTACGA...
Conserved stretch ⇒ functional region! (but which function/s?...)
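One simple way to spot a conserved stretch in a pairwise alignment is to slide a fixed-width window along it and report windows whose identity passes a threshold. A minimal Python sketch, assuming '-' marks a gap and that a gap never counts as a match; the window width and threshold are arbitrary illustrative parameters, and published conservation scores use phylogenetic models over many species rather than this raw count:

```python
def column_matches(a: str, b: str) -> list[bool]:
    """Per-column identity of two aligned rows; '-' marks a gap, which never matches."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    return [x == y and x != "-" for x, y in zip(a, b)]


def conserved_windows(a: str, b: str, width: int, min_identity: float) -> list[tuple[int, float]]:
    """(start, identity) for each window of `width` columns with identity >= min_identity."""
    matches = column_matches(a, b)
    hits = []
    for start in range(len(matches) - width + 1):
        identity = sum(matches[start:start + width]) / width
        if identity >= min_identity:
            hits.append((start, identity))
    return hits


# The two aligned rows shown on the slide (which row is human and which is mouse is not stated).
row1 = "CTTTGCGA-TGAGTAGCATCTACTATTT"
row2 = "ACGTGGGACTGACTA-CATCGACTACGA"
print(conserved_windows(row1, row2, width=8, min_identity=0.75))
```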
The Human Genome is Full of Mysteries
[Figure: frequency vs. conservation level (lo to hi) for all human-mouse DNA and for human-mouse junk DNA [Mouse Consortium 2002]]
Difference: 5% of human.
Human genome: 3×10^9 letters; 1.5% known function; >50% junk.
3x more functional DNA than known! But what do these 10^7 substrings do?..
[Science 2004 Breakthrough of the Year, 5th runner up]
Why bother?
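The fractions on this slide can be checked with back-of-the-envelope arithmetic. A Python sketch, treating the 5% difference as the constrained fraction of the genome (an assumption about how to read the figure):

```python
GENOME_LETTERS = 3e9         # human genome size (slide)
CONSTRAINED_FRACTION = 0.05  # excess human-mouse conservation over junk (slide: "5% of human")
KNOWN_FRACTION = 0.015       # DNA with known function (slide: 1.5%)

constrained = GENOME_LETTERS * CONSTRAINED_FRACTION  # about 1.5e8 letters
known = GENOME_LETTERS * KNOWN_FRACTION              # about 4.5e7 letters
unexplained = constrained - known                    # about 1.05e8 letters
ratio = constrained / known                          # about 3.3

print(f"constrained {constrained:.2e}, known {known:.2e}, "
      f"unexplained {unexplained:.2e}, ratio {ratio:.1f}")
```

The ratio of about 3.3 matches the slide's “3x more functional DNA than known”, and leaves roughly 10^8 conserved letters without a known function.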
Genes, Proteins and Gene Control
Gene (how to); control region (when & where)
DNA
Proximal: within 10^3 letters
[genome.ucsc.edu, 3 kb]
Ultraconserved Elements
[Bejerano et al., Science 2004]
HOXA4 exon
Why is Perfect Conservation So Surprising?
If a substring is identical between enough distant species, it must have rejected many different changes over time.
But... all functions we understand in our genome are encoded using redundant codes.
Coding: 3 DNA letters → 1 protein letter. E.g. protein-coding genes: DNA – 10^8 letters over an alphabet of 4; protein – 10^2 letters over an alphabet of 20.
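The redundancy of the protein code is easy to quantify. The Python sketch below uses the standard genetic code (NCBI translation table 1, which is real but is not given on the slide) to translate codons and to measure what fraction of single-letter substitutions in sense codons leave the amino acid unchanged; `translate` and `synonymous_fraction` are illustrative helpers:

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks stop codons.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base T
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon (length must be a multiple of 3)."""
    if len(dna) % 3:
        raise ValueError("length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def synonymous_fraction() -> float:
    """Fraction of single-letter substitutions in sense codons that keep the amino acid."""
    same = total = 0
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # skip stop codons
        for pos in range(3):
            for b in BASES:
                if b == codon[pos]:
                    continue
                mutant = codon[:pos] + b + codon[pos + 1:]
                total += 1
                same += CODON_TABLE[mutant] == aa
    return same / total


print(translate("ATGGCT"), round(synonymous_fraction(), 3))
```

Because a sizable share of coding changes is silent, even functional protein-coding DNA is not expected to stay letter-for-letter identical for long, which is the point of the slide.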
Genes, Proteins and Gene Control Revisited
Gene (how to); control region (when & where)
Distal: within 10^6 letters
DNA
Proximal: within 10^3 letters
DNA-binding proteins
Vertebrate Gene Regulation
Gene (how to); control region (when & where): ~10^6 letters!!!
DNA
~10^3 letters
Crucial regulation: many thousands of elements, previously invisible.
Ultraconserved Elements
481 regions perfectly conserved over 200 DNA bases or more between human, mouse and rat (P < 10^-22 in "junk").
• Evolve 20-fold slower than the human average.
• Most do not overlap protein-coding DNA.
• Those that do not code cluster spatially near genes encoding DNA-binding proteins. Dozens have since been validated as controlling genes.
• Those that do code are found in genes coding for a specific type of protein.
• They are the tip of a continuum of very slowly evolving elements.
• The ultras cannot be found beyond vertebrates.
[Bejerano et al., Science 2004; Chicken Consortium, Nature 2004]
[Figure: frequency vs. conservation level]
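The 481 elements were defined as runs of at least 200 letters with perfect identity across human, mouse and rat. A minimal Python sketch of that scan, assuming a precomputed gap-padded multiple alignment (one string per species, '-' for gaps); how the original alignments were built and filtered is not reproduced here, and `find_ultraconserved` is a hypothetical name:

```python
def find_ultraconserved(rows: list[str], min_len: int = 200) -> list[tuple[int, int]]:
    """Half-open [start, end) column intervals where every aligned row carries
    the same base (no gaps) for at least min_len consecutive columns."""
    if not rows or len({len(r) for r in rows}) != 1:
        raise ValueError("need one or more aligned rows of equal length")
    hits, run_start = [], None
    for i, column in enumerate(zip(*rows)):
        identical = column[0] != "-" and all(c == column[0] for c in column)
        if identical and run_start is None:
            run_start = i  # a perfectly conserved run begins
        elif not identical and run_start is not None:
            if i - run_start >= min_len:
                hits.append((run_start, i))
            run_start = None
    # close a run that extends to the end of the alignment
    if run_start is not None and len(rows[0]) - run_start >= min_len:
        hits.append((run_start, len(rows[0])))
    return hits


# Toy alignment (human, mouse, rat rows are hypothetical) with a small threshold.
print(find_ultraconserved(["AACGTTA", "AACGTCA", "AACGTTA"], min_len=3))
```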
Origins of Ultraconserved Elements?
[Figure: ultraconserved elements]
Origins of Ultraconserved Elements
uc.338
Coelacanth Homologs to uc.338 Closer than Human Ones
[Bejerano et al., Nature 2006]
Coelacanth “the Living Fossil” Fish
Fossil Record: Appeared >360 Mya, Peaked 240 Mya, Disappeared 80 Mya. Rediscovered (by science) in 1938. Possible Explanation: Habitat Switch.
Repeats / Mobile Elements ("selfish DNA")
Human Genome: 3×10^9 letters; 1.5% known function; >50% junk
>360My Old and Going Strong
Up to 80% identity between the coelacanth repeat and its human instances, including uc.338.
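Percent identity understates how much two copies have diverged, because one site can change more than once. As an illustration only (the slide does not say which model was used), the Jukes–Cantor correction d = -(3/4) ln(1 - (4/3)p), with p the fraction of differing sites, converts the 80% identity above into an estimated number of substitutions per site:

```python
import math


def jukes_cantor_distance(identity: float) -> float:
    """Estimated substitutions per site from observed fractional identity,
    under the Jukes-Cantor model (all substitutions equally likely)."""
    p = 1.0 - identity  # observed fraction of differing sites
    if not 0.0 <= p < 0.75:
        raise ValueError("JC distance is undefined for identity <= 0.25 or > 1")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


print(f"{jukes_cantor_distance(0.80):.3f}")  # prints 0.233
```

So 20% observed difference corresponds to slightly more than 0.2 substitutions per site once hidden multiple hits are accounted for.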
Cis-reg & Ultra Elements from Mobile Elements
Co-option event, probably due to favorable genomic context
All other copies are destined to decay over time at a neutral rate
[Yass is a small town in New South Wales, Australia.]
[Bejerano et al., Nature 2006]
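The contrast between the co-opted copy and its neutrally decaying relatives can be illustrated with a toy simulation. A Python sketch under loudly simplifying assumptions: uniform per-site substitution rates, no insertions or deletions, and a co-opted copy that rejects every change; the sequence length, rate, number of generations and number of copies are arbitrary, hypothetical values:

```python
import random

BASES = "ACGT"


def evolve(seq: str, generations: int, sub_rate: float, constrained: bool, rng: random.Random) -> str:
    """Per generation, each site is substituted with probability sub_rate.
    A constrained copy rejects every change (idealised purifying selection)."""
    if constrained:
        return seq
    s = list(seq)
    for _ in range(generations):
        for i in range(len(s)):
            if rng.random() < sub_rate:
                s[i] = rng.choice([b for b in BASES if b != s[i]])
    return "".join(s)


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length strings."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


rng = random.Random(42)
ancestor = "".join(rng.choice(BASES) for _ in range(300))
# Hypothetical scenario: copy 0 was co-opted (constrained); copies 1-9 are neutral.
copies = [evolve(ancestor, 100, 0.005, constrained=(k == 0), rng=rng) for k in range(10)]
print([round(identity(ancestor, c), 2) for c in copies])
```

The constrained copy keeps 100% identity to the ancestor while the neutral copies drift downward, which is the pattern that lets an exapted instance stand out against its decayed relatives.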
Exapted Into Which Cellular Roles?
Human instances cluster together and are found <1 Mb from 35 TFs (transcription factors) (P < 3×10^-6).
No evidence for transcription (Tx) as small RNAs, no orientation preference in introns, not in antisense Tx.
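Testing whether elements sit near a gene set (such as the 35 TFs above) reduces to a nearest-neighbour query on sorted coordinates. A Python sketch, assuming single-chromosome integer coordinates and using gene start positions as a stand-in for gene location; the proximal/distal cutoffs follow the ~10^3 and ~10^6 letter scales from the earlier gene-control slides, and the significance test behind the P-value is not reproduced:

```python
from bisect import bisect_left

PROXIMAL = 10**3  # proximal control: within ~10^3 letters (earlier slide)
DISTAL = 10**6    # distal control: within ~10^6 letters (earlier slide)


def nearest_gene_distance(pos: int, sorted_starts: list[int]) -> int:
    """Distance from pos to the closest gene start (sorted_starts must be sorted)."""
    if not sorted_starts:
        raise ValueError("no genes given")
    i = bisect_left(sorted_starts, pos)
    candidates = []
    if i < len(sorted_starts):
        candidates.append(sorted_starts[i] - pos)      # next gene to the right
    if i > 0:
        candidates.append(pos - sorted_starts[i - 1])  # previous gene to the left
    return min(candidates)


def classify(distance: int) -> str:
    if distance <= PROXIMAL:
        return "proximal"
    if distance < DISTAL:
        return "distal"
    return "beyond 1 Mb"


def elements_near_genes(elements: list[int], gene_starts: list[int], window: int = DISTAL) -> list[int]:
    """Elements lying closer than `window` letters to some gene start."""
    starts = sorted(gene_starts)
    return [e for e in elements if nearest_gene_distance(e, starts) < window]


# Hypothetical coordinates, for illustration only.
print(elements_near_genes([10, 400_000, 5_000_000], [0, 1_200_000]))
```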
Transient Transgenics
Eddy Rubin’s Lab, LBNL
Construct: reporter gene + minimal promoter + conserved element
[Figure: in situ vs. transgenic]
The construct is injected into 1-cell embryos. Embryos are taken out at embryonic day 10.5 to 14.5 and assayed for reporter gene activity.
Instance 500kb Downstream of ISL1
ISL1 is a neuro-developmental gene, also expressed in testis. Three previously known enhancers are conserved in all vertebrates.
Mouse Isl1 in situ (B) vs. LacZ driven by LF SINE region (C)
Matched staining in genital eminence
Matched staining in dorsal apical ectodermal ridge (part of limb bud)
Nadav Ahituv, Eddy Rubin
Matched Level Sections
Bryan King, Sofie Salama, Nadav Ahituv, Eddy Rubin
In situ vs. transgenic. Corresponding expression patterns in: (a, b) the developing thalamus (Th) and basal plate (BP) in the brain; (c, d) the trigeminal (V) ganglion and facio-acoustic (VII/VIII) ganglia in the head region; (e, f) the dorsal root ganglion (DRG) and the lateral region of the ventral horn (VH) of the spinal cord in thoracic sections.
DNA Replication is Imperfect (contd)
Medium Scale: substrings are duplicated, deleted, inverted
Large Scale: whole DNA strings are duplicated, deleted
[Figure: substring duplication turns one functional region into two copies (...ACGTACGACTGACTAGCATCGACTACGA........TCTGACTAGCATCGACTACGA...); functional divergence then yields functional′ and functional″.]
So...More Genes...More Complexity!!...Right?
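The medium-scale events can also be written as string operations. A Python sketch, assuming duplication means a tandem copy placed right after the original and that an inverted segment is read from the opposite strand (reverse complement); all function names are illustrative:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(s: str) -> str:
    """The same segment read from the opposite DNA strand."""
    return s.translate(COMPLEMENT)[::-1]


def duplicate(seq: str, start: int, end: int) -> str:
    """Tandem duplication: insert a copy of seq[start:end] right after it."""
    return seq[:end] + seq[start:end] + seq[end:]


def delete(seq: str, start: int, end: int) -> str:
    """Remove the substring seq[start:end]."""
    return seq[:start] + seq[end:]


def invert(seq: str, start: int, end: int) -> str:
    """Inversion: replace seq[start:end] by its reverse complement."""
    return seq[:start] + reverse_complement(seq[start:end]) + seq[end:]


seq = "ACGTACGACTGACTAGCATCGACTACGA"
print(duplicate(seq, 8, 14))
print(delete(seq, 8, 14))
print(invert(seq, 8, 14))
```

Duplicating a functional substring leaves one copy free to diverge while the other keeps the original job, which is the substring duplication then functional divergence scenario in the figure above.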
Genes & Complexity
Gene numbers do not correlate with organism complexity. Many gene families are surprisingly old.
[Figure: # genes vs. number of cells (10^3 to 10^14) for worm, fly, weed, rice, fish and human]
Pre-genomic era: “100,000 genes to the human genome”
The Evolution of Morphological Diversity
“Regulatory sequence evolution must be the major contribution to the evolution of form.” [Sean Carroll, PLoS Bio 2005]
[Figure: # genes in fly, worm, human, weed, fish and rice, across the in/vertebrate divide]
Hold on... junk DNA can contribute these elements
From junk DNA to recruitment into pathway?
[Davidson & Erwin, 2006]
[Britten & Davidson, 1971]
Same Junk, Different Functional Elements
[Figure: one repeat giving rise to both protein-coding and gene-regulating elements]
Additional Mysteries Abound
Genome in Flux
Human Genome
Copied out to make ... ???
Copied out to make protein-coding genes
Bejerano Lab: Research Interests
Many thousands of human conserved elements congregate en masse near developmental genes. [Dog Genome Paper, Nature, 2005; Bejerano et al., Nature Methods, 2005]
Contribution to Human Disease; Origins & Evolution; Functions & Encoding
[Figure: development]
[Ernst Haeckel, 1866]
Break the regulatory code:
• syntax
• grammar
• meaning
Understand our evolution:
• Reconstruct ancient genomes
• Track the histories of regulatory regions
Make a difference:
• “bench to bedside”
Discovery tools:
• large databases
• heterogeneous, noisy data
• statistical correlations
• human interfaces
[Figures: thousands and thousands of page requests served daily; exponential growth of public data]
Summary
We are only beginning to understand the complexity unearthed by observing whole genomes.
Technology (genome sequencing, gene chips, etc.) is flooding us with different forms of whole-genome measurements: extremely valuable, if challenging.
Some of the challenges discussed today:
• Explain ultraconservation in particular, and the myriad of unexplained constrained elements in our genome.
• Understand the evolution of morphological diversity (how much have repeats contributed to it, quantitatively and qualitatively?).
• Understand why so much of our genome is transcribed.
Kudos
UC Santa Cruz: David Haussler, Sofie Salama, Jim Kent, Craig Lowe, Bryan King, Adam Siepel, Jakob Pedersen, Katie Pollard, Courtney Onodera, Rachel Harte, Genomics/Browser Group
Lawrence Berkeley Labs: Eddy Rubin, Nadav Ahituv
McGill U.: Mathieu Blanchette
Penn State U.: Webb Miller’s group
U. Queensland: John Mattick’s group
Genome Sequencing Consortia; all GenBank contributors
Gill [email protected]