
Human encodeproject


Page 1: Human encodeproject

Encyclopedia Of DNA Elements

A consortium of 440 scientists, 32 laboratories

Sucheta Tripathy, IICB, 17th Sept. 2012

Page 2: Human encodeproject

Some of the useful links:

http://www.nature.com/encode/

http://www.encodeproject.org/ENCODE/

http://www.factorbook.org/

http://encodeproject.org/ENCODE/dataStandards.html

http://1000genomes.org

http://genome.ucsc.edu/ENCODE/

Page 3: Human encodeproject

http://www.gencodegenes.org/data.html

Page 4: Human encodeproject

Characterization of intergenic region and gene definition

http://homes.gersteinlab.org/people/rar62/subwaymap/SubwayMap8_16_12.pdf

Page 5: Human encodeproject

http://homes.gersteinlab.org/people/rar62/subwaymap/SubwayMap8_16_12.pdf

Page 6: Human encodeproject

A Road map

In October 1990 the Human Genome Project started

First publication in 2000

Finished paper in 2003

NHGRI solicited pilot proposals for ENCODE

First report on ENCODE published in 2007

RFAs were sought for full ENCODE

ENCODE published in 2012

GWAS: 90% lies outside coding regions

2005

Page 7: Human encodeproject

http://www.nature.com/nature/journal/v489/n7414/full/489049a.html

Page 8: Human encodeproject

Treasure Hunt?

“It is like Google Maps,” says Eric Lander: a map of the Earth from outer space

Page 9: Human encodeproject

What we knew

95% of the genome is “junk”.
◦ 2.94% of the genome is coding

cis-regulatory elements occur within a limited genomic distance.

Most of the genome consists of transposable elements of obscure origin that are decaying.

Transcribed elements are more often translated than not.

Page 10: Human encodeproject

Key Findings:

80% of the human genome is active!
◦ 70,000 promoters and 400,000 enhancers

75% of the genome is transcribed in some tissue or other during a lifetime.

The environment plays a great role in switching many genes on or off. [Epigenetics]

Most diseases lie not with the genes but with the switches!

The “dark matter” controlling the genes is physically close to the genes it controls.

Page 11: Human encodeproject

Key Findings:

Genes and switches don’t have a one-to-one relationship!

4 million switches control 21,000 genes!

Identical twins are NOT identical: they are greatly influenced by their environments.

Astronomy and genetics look similar (95% of the Universe is dark matter and dark energy, which we don’t understand)
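To put the two figures on this slide side by side, here is a small arithmetic sketch. It uses only the numbers quoted above (4 million switches, 21,000 genes) and yields a plain average, so it says nothing about how switches are actually distributed across genes.

```python
# Figures quoted on the slide; the result is only an average ratio.
switches = 4_000_000
genes = 21_000

switches_per_gene = switches / genes
print(f"about {switches_per_gene:.0f} switches per gene on average")
```

An average of this size is one way to see why the relationship between genes and switches cannot be one to one.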

Page 12: Human encodeproject

Who said What (common people)

“This explains why 6.5 billion people on earth don’t look alike.”

Intelligent Design (Creationism) believers are excited that it is the handiwork of God.

Natural selectionists (Darwinists) are excited that it is natural selection at its best.
◦ This has sparked a war between Democrats and Republicans, as usual.

Junk DNA is an “oxymoron”.

Some are still wondering about the remaining 20%.

Page 13: Human encodeproject

Who said What Contd…

‘I hope this information stirs the mind of those researchers that have ignored "trace minerals" in food as part of the nutritional package’.

The more we think we are close to finding an answer, the farther away we find ourselves. Reminds me of Aristotle, who once said “The more you know, the more you know you don't know”.

Page 14: Human encodeproject

Historically “Junk” Vs “Garbage”

Most of the DNA was considered “garbage” but was later upgraded to “junk”.

Most people are actually happy because it is happening during their lifetime.

Switches are software and genes are hardware.

Ancient Egyptians considered that the “torso” had a divine role and discarded the grey matter in the head as “junk”.

Page 15: Human encodeproject

Some people are upset

Sean Eddy: “At least 40% of the human genome is composed of the decaying DNA remains of transposable elements (TEs), different species of which have replicated in great waves during the evolution of our genome.”

“I sure wish I’d gotten the memo, because this week a collaboration of labs led by myself, Arian Smit, and Jerzy Jurka just released a new data resource that annotates nearly 50% of the human genome as transposable element-derived, and transposon-derived repetitive sequence is the poster child for what we colloquially call “junk DNA”.”

http://cryptogenomicon.org/

Page 19: Human encodeproject

The Cell Types

Cell Type         Tier  Description                                                      Source
GM12878           1     B-Lymphoblastoid cell line                                       Coriell GM12878
K562              1     Chronic Myelogenous/Erythroleukemia cell line                    ATCC CCL-243
H1-hESC           1     Human Embryonic Stem Cells, line H1                              Cellular Dynamics International
HepG2             2     Hepatoblastoma cell line                                         ATCC HB-8065
HeLa-S3           2     Cervical carcinoma cell line                                     ATCC CCL-2.2
HUVEC             2     Human Umbilical Vein Endothelial Cells                           Lonza CC-2517
Various (Tier 3)  3     Various cell lines, cultured primary cells, and primary tissues  Various

PLoS Biol. 2011 April; 9(4): e1001046.

Page 20: Human encodeproject

The Experiments

DNase I -> transcription factor binding sites (2.9 million sites, one third found in only one cell type and the remainder in others)

ChIP-seq -> sequencing of transcription factor and histone binding sites (HeLa and GM12878: qualified to be called new species)

5C technology -> finding proximity between regulatory and regulated regions

High-density 5 bp tiling DNA microarrays

Page 21: Human encodeproject

The Experiments Contd.

Cap Analysis of Gene Expression (CAGE)

Paired-End diTag (PET)

Reduced Representation Bisulphite Sequencing (RRBS)

Page 22: Human encodeproject

The Main Nature paper

33.45% exon and 66.55% intron.

62% of the genome is transcribed reproducibly.

231 Mb of the genome has protein binding sites.
◦ 80% of which are low-affinity sites (http://www.factorbook.org/)
◦ Many are highly conserved and cell-type selective

96% of the CpGs exhibited a differential methylation pattern.

GWAS SNPs overlapped with ENCODE elements.

Page 23: Human encodeproject

Chromosome Interacting regions
Sanyal et al., Nature 489, 109–113 (06 September 2012)

Chromosome conformation capture carbon copy (5C)
◦ 1% of the genome is distally regulated (>1000 bp)
◦ On average, 3.9 distal elements interacted with a TSS.
◦ Distance could be several kb to Mb
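As a rough illustration of how an "average number of distal elements per TSS" can be computed from 5C-style interaction calls, here is a minimal Python sketch. The interaction pairs and the function name `mean_distal_per_tss` are hypothetical toy values, not data or code from Sanyal et al., and the average is taken only over TSSs that have at least one interaction.

```python
from collections import defaultdict

# Hypothetical toy data: (TSS id, distal element id) interaction pairs.
interactions = [
    ("TSS_A", "enh_1"), ("TSS_A", "enh_2"), ("TSS_A", "enh_3"),
    ("TSS_B", "enh_2"),
    ("TSS_C", "enh_4"), ("TSS_C", "enh_5"),
]

def mean_distal_per_tss(pairs):
    """Average number of distinct distal elements per interacting TSS."""
    partners = defaultdict(set)
    for tss, element in pairs:
        partners[tss].add(element)  # a set ignores duplicate pairs
    if not partners:
        return 0.0
    return sum(len(elems) for elems in partners.values()) / len(partners)

print(mean_distal_per_tss(interactions))
```

Note that one element (`enh_2` here) can contact more than one TSS, which is consistent with the point that genes and switches are not paired one to one.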

Page 24: Human encodeproject

DNase Hypersensitive Site studies
Thurman et al., Nature 489, 75–82
http://www.nature.com/nature/journal/v489/n7414/full/nature11232.html

cis-regulatory elements: enhancers, promoters, insulators, silencers.

2.9 million DHSs encompassing 125 diverse cell and tissue types.

20-50 bp length DHS mapped uniquely to 86.9% of genome
◦ 580,000 distal DHSs with target promoters
◦ 3% lie in TSSs
◦ 5% lie within 2.5 kb of a TSS
◦ 95% lie distally (introns and intergenic regions)
◦ Strongly enriched in LTRs
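The split into TSS, within 2.5 kb of a TSS, and distal can be sketched as a distance-to-nearest-TSS classification. This is a minimal Python sketch on a single chromosome with hypothetical coordinates. The 2.5 kb cutoff comes from the slide, while the 100 bp `tss_window` is an assumption made for illustration, since the slide does not say how "in TSS" was defined.

```python
from bisect import bisect_left

def distance_to_nearest_tss(position, sorted_tss):
    """Distance in bp from a DHS midpoint to the closest TSS (same chromosome)."""
    if not sorted_tss:
        raise ValueError("need at least one TSS position")
    i = bisect_left(sorted_tss, position)
    candidates = []
    if i < len(sorted_tss):
        candidates.append(sorted_tss[i] - position)      # nearest TSS at or to the right
    if i > 0:
        candidates.append(position - sorted_tss[i - 1])  # nearest TSS to the left
    return min(candidates)

def classify_dhs(position, sorted_tss, tss_window=100, proximal=2500):
    """Label a DHS as 'tss', 'proximal' or 'distal' (tss_window is an assumed value)."""
    d = distance_to_nearest_tss(position, sorted_tss)
    if d <= tss_window:
        return "tss"
    if d <= proximal:
        return "proximal"
    return "distal"

# Hypothetical coordinates, for illustration only.
tss_positions = [1000, 50000]
for dhs in [1020, 2500, 30000, 50900]:
    print(dhs, classify_dhs(dhs, tss_positions))
```

The sorted list plus binary search keeps each lookup logarithmic in the number of TSSs, which matters when millions of sites are classified.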

Page 25: Human encodeproject

Landscape of Transcription
Djebali et al., Nature 489, 101–108
http://www.nature.com/nature/journal/v489/n7414/full/nature11233.html

Three quarters of the genome is capable of transcription: redefine the concept of a gene?
◦ 62.1% and 74.7% of the genome are covered by processed and primary transcripts, respectively.
◦ 10-12 expressed isoforms per gene per cell.
◦ Coding and non-coding transcripts are localized in the cytoplasm and nucleus, respectively.
◦ 6% of the coding and non-coding transcripts overlap with small RNAs: precursors?
◦ Most of the novel transcripts lacked protein-coding ability.
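A "fraction of the genome transcribed" figure is, in essence, the length of the union of transcript intervals divided by the genome length. Here is a minimal Python sketch of that calculation. The intervals, the genome length of 100 and the function name `covered_fraction` are hypothetical toy values, not data or code from Djebali et al.

```python
def covered_fraction(intervals, genome_length):
    """Fraction of a genome covered by the union of half-open (start, end) intervals."""
    covered = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Gap before this interval: close the previous merged block.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent interval: extend the current block.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered / genome_length

# Hypothetical toy transcripts on a 100 bp "genome".
transcripts = [(0, 40), (30, 60), (80, 95)]
print(covered_fraction(transcripts, 100))
```

Merging before summing matters because overlapping transcripts and isoforms would otherwise be counted more than once.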

Page 26: Human encodeproject

What is yet to be done

The mapping job is only half done.

Characterizing everything a genome does is 10% done.

Finding the network of switches for genes.

A number of correlations…

Page 27: Human encodeproject

Future Implications:

Where does gene therapy go from here?

Is our fundamental understanding of genes as the functional units flawed?

Epigenetics becomes the key player…

Gives impetus to a holistic approach in treating a disease.

Do we still believe that the human genome is the most efficient?