Encyclopedia Of DNA Elements
A consortium of 440 scientists, 32 laboratories
Sucheta Tripathy, IICB, 17th Sept. 2012
http://www.nature.com/encode/
http://www.encodeproject.org/ENCODE/
http://www.factorbook.org/
http://encodeproject.org/ENCODE/dataStandards.html
http://1000genomes.org
http://genome.ucsc.edu/ENCODE/
Some of the useful links:
http://www.gencodegenes.org/data.html
http://homes.gersteinlab.org/people/rar62/subwaymap/SubwayMap8_16_12.pdf
Characterization of intergenic region and gene definition
http://homes.gersteinlab.org/people/rar62/subwaymap/SubwayMap8_16_12.pdf
A Road map
October 1990: Human Genome Project started
2000: first publication
2003: finished paper
NHGRI solicited pilot proposals for ENCODE
2007: first report on ENCODE published
RFAs were sought for full ENCODE
2012: ENCODE published
GWAS: 90% of hits lie outside coding regions
2005
http://www.nature.com/nature/journal/v489/n7414/full/489049a.html
“It is like Google Maps,” says Eric Lander: a map of the Earth from outer space.
Treasure Hunt?
95% of the genome is “junk”.
◦ 2.94% of the genome is coding
cis-regulatory elements occur within a limited genomic distance.
Most of the genome consists of transposable elements of obscure origin that are decaying.
Transcribed elements are more often translated than not.
What we knew
80% of the human genome is active!!
◦ 70,000 promoters and 400,000 enhancers
75% of the genome is transcribed in some tissue or other during a lifetime.
The environment plays a major role in switching many genes on or off. [Epigenetics]
Most diseases lie not with the genes but with the switches!!
The “dark matter” elements that control genes are physically close to the genes they control.
Key Findings:
Genes and switches do not have a one-to-one relationship!
4 million switches controlling 21,000 genes!!
Identical twins are NOT identical – they are greatly influenced by their environments.
Astronomy and genetic biology look similar (about 95% of the universe is dark matter and dark energy, which we don’t understand).
Key Findings:
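For a rough sense of scale, the switch and gene counts quoted on this slide can be turned into an average. This is only back-of-the-envelope arithmetic on the slide's round numbers; the variable names are illustrative, and the average says nothing about how switches are actually distributed (the slide itself stresses that the relationship is not one-to-one).

```python
# Round numbers taken from the slide; everything else is illustrative.
switches = 4_000_000   # "4 million switches"
genes = 21_000         # "21,000 genes"

# Naive average, assuming (unrealistically) that switches are spread evenly.
switches_per_gene = switches / genes

print(f"about {round(switches_per_gene)} switches per gene on average")
```

In practice one switch may act on several genes and one gene may answer to many switches, so the figure is a scale indicator rather than a biological estimate.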
“This explains why 6.5 billion people on earth don’t look alike.”
Intelligent Design (Creationism) believers are excited that it is the handiwork of God.
Natural selectionists (Darwinists) are excited that it is natural selection at its best.
◦ This has started a war between Democrats and Republicans, as usual.
Junk DNA is an “oxymoron”.
Some are still wondering about the remaining 20%.
Who said What (common people)
‘I hope this information stirs the mind of those researchers that have ignored "trace minerals" in food as part of the nutritional package’.
The closer we think we are to finding an answer, the farther we find ourselves from it. Reminds me of Aristotle, who once said “The more you know, the more you know you don't know”.
Who said What Contd…
Most of the DNA was considered “garbage” but was later upgraded to “junk”.
Most people are actually happy because it is happening during their lifetime.
Switches are software and genes are hardware.
Ancient Egyptians considered that the torso had a divine role and discarded the grey matter in the head as “junk”.
Historically “Junk” Vs “Garbage”
Sean Eddy: “At least 40% of the human genome is composed of the decaying DNA remains of transposable elements (TEs), different species of which have replicated in great waves during the evolution of our genome.”
“I sure wish I’d gotten the memo, because this week a collaboration of labs led by myself, Arian Smit, and Jerzy Jurka just released a new data resource that annotates nearly 50% of the human genome as transposable element-derived, and transposon-derived repetitive sequence is the poster child for what we colloquially call “junk DNA”.”
http://cryptogenomicon.org/
Some people are upset
PLoS Biol. 2011 April; 9(4): e1001046.
Cell Type         Tier  Description                                                      Source
GM12878           1     B-Lymphoblastoid cell line                                       Coriell GM12878
K562              1     Chronic Myelogenous/Erythroleukemia cell line                    ATCC CCL-243
H1-hESC           1     Human Embryonic Stem Cells, line H1                              Cellular Dynamics International
HepG2             2     Hepatoblastoma cell line                                         ATCC HB-8065
HeLa-S3           2     Cervical carcinoma cell line                                     ATCC CCL-2.2
HUVEC             2     Human Umbilical Vein Endothelial Cells                           Lonza CC-2517
Various (Tier 3)  3     Various cell lines, cultured primary cells, and primary tissues  Various
PLoS Biol. 2011 April; 9(4): e1001046.
The Cell Types
DNase I -> transcription factor binding sites (2.9 million sites; one third found in only one cell type, the remainder in others)
ChIP-seq -> sequencing of transcription factor and histone binding sites (HeLa and GM12878 – qualify to be called new species)
5C technology -> finding proximity between regulatory and regulated regions
High-density 5 bp tiling DNA microarrays
The Experiments
Cap Analysis of Gene Expression (CAGE)
Paired-End diTag (PET)
Reduced Representation Bisulphite Sequencing (RRBS)
Contd.
33.45% exon and 66.55% intron.
62% of the genome is transcribed reproducibly.
231 Mb of the genome has protein binding sites.
◦ 80% of which are low affinity sites (http://www.factorbook.org/)
◦ Many are highly conserved and cell-type selective
96% of the CpGs exhibited differential methylation patterns.
GWAS SNPs overlapped with ENCODE elements.
The Main Nature paper
Chromosome conformation capture carbon copy (5C)
◦ 1% of the genome is distally regulated (>1000 bp)
◦ On average, 3.9 distal elements interacted with each TSS.
◦ Distances could range from several kb to Mb
Chromosome Interacting Regions
Sanyal et al., Nature 489, 109–113 (06 September 2012)
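The "average number of distal elements per TSS" figure on this slide is a simple summary statistic over interaction pairs. A minimal sketch of how such an average could be computed is given below. The function name, the input format (pairs of positions on one chromosome), and the toy data are all assumptions made for illustration; this is not the pipeline used by Sanyal et al. Only the >1000 bp cutoff comes from the slide.

```python
from collections import defaultdict

MIN_DISTAL_BP = 1000  # "distal" cutoff taken from the slide (>1000 bp)


def mean_distal_partners(interactions, min_distance=MIN_DISTAL_BP):
    """Average number of distinct distal fragments per TSS.

    interactions: iterable of (tss_position, fragment_position) pairs,
    all assumed to lie on the same chromosome (hypothetical format).
    TSSs that end up with no distal partner are not counted.
    """
    partners = defaultdict(set)
    for tss_pos, frag_pos in interactions:
        # Keep only contacts farther away than the distal cutoff.
        if abs(frag_pos - tss_pos) > min_distance:
            partners[tss_pos].add(frag_pos)
    if not partners:
        return 0.0
    return sum(len(p) for p in partners.values()) / len(partners)


# Toy data, invented purely for illustration.
toy_interactions = [
    (10_000, 10_500),      # too close to count as distal
    (10_000, 60_000),
    (10_000, 250_000),
    (10_000, 250_000),     # duplicate contact, counted once
    (500_000, 1_500_000),
]

mean_partners = mean_distal_partners(toy_interactions)
print(mean_partners)
```

Using a set per TSS means repeated observations of the same contact are counted once, which matches the idea of counting distinct distal elements rather than reads.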
cis-regulatory elements - Enhancers, promoters, insulators, silencers.
2.9 million DHS encompassing 125 diverse cell and tissue types.
20-50 bp length DHS mapped uniquely to 86.9% of the genome
◦ 580,000 distal DHS with target promoters
◦ 3% lie in TSS
◦ 5% lie within 2.5 kb of TSS
◦ 95% lie distally (introns and intergenic regions)
◦ Strongly enriched in LTRs
DNase Hypersensitive Site Studies
Thurman et al., Nature 489, 75–82
http://www.nature.com/nature/journal/v489/n7414/full/nature11232.html
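The TSS / within 2.5 kb / distal breakdown on this slide amounts to binning each DHS by its distance to the nearest transcription start site. The sketch below shows one way this could be done. It assumes a single chromosome, half-open (start, end) intervals, and a sorted list of TSS coordinates; the function names and toy coordinates are hypothetical, and the rule "a DHS is at a TSS if it overlaps one" is an assumption rather than the definition used by Thurman et al.

```python
from bisect import bisect_left
from collections import Counter

PROXIMAL_BP = 2500  # "within 2.5 kb of TSS", from the slide


def distance_to_nearest_tss(start, end, sorted_tss):
    """Distance in bp from the half-open interval [start, end) to the
    nearest TSS; 0 if a TSS falls inside the interval.
    sorted_tss must be a non-empty, ascending list of positions."""
    i = bisect_left(sorted_tss, start)  # first TSS at or after start
    if i < len(sorted_tss) and sorted_tss[i] < end:
        return 0
    distances = []
    if i < len(sorted_tss):
        distances.append(sorted_tss[i] - (end - 1))  # nearest TSS to the right
    if i > 0:
        distances.append(start - sorted_tss[i - 1])  # nearest TSS to the left
    return min(distances)


def classify_dhs(start, end, sorted_tss, proximal_bp=PROXIMAL_BP):
    """Label a DHS as 'TSS', 'proximal', or 'distal'."""
    d = distance_to_nearest_tss(start, end, sorted_tss)
    if d == 0:
        return "TSS"
    if d <= proximal_bp:
        return "proximal"
    return "distal"


# Toy coordinates, invented for illustration only.
tss_positions = [1_000, 50_000]
dhs_intervals = [(990, 1_010), (2_000, 2_050), (20_000, 20_050), (52_400, 52_450)]

counts = Counter(classify_dhs(s, e, tss_positions) for s, e in dhs_intervals)
print(counts)
```

Binary search over a sorted TSS list keeps each lookup logarithmic, which matters when classifying millions of sites; a real analysis would repeat this per chromosome and take strand into account.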
Three quarters of the genome is capable of transcription – redefine the concept of a gene?
◦ 62.1% and 74.7% of the genome are covered by processed and primary transcripts, respectively.
◦ 10-12 expressed isoforms per gene per cell.
◦ Coding and non-coding transcripts are localized in the cytoplasm and nucleus, respectively.
◦ 6% of the coding and non-coding transcripts overlap with small RNAs – precursors?
◦ Most of the novel transcripts lacked protein coding ability.
Landscape of Transcription
Djebali et al., Nature 489, 101–108
http://www.nature.com/nature/journal/v489/n7414/full/nature11233.html
Mapping job is only half done.
Characterizing everything a genome does is 10% done.
Finding the network of switches for genes.
A number of correlations…
What is yet to be done
Where does gene therapy go from here?
Is our fundamental understanding of genes as the functional units flawed??
Epigenetics becomes the key player…
Gives impetus to a holistic approach to treating disease.
Do we still believe that the human genome is the most efficient?
Future Implications: