Upload
others
View
1
Download
0
Embed Size (px)
Citation preview
1
Mario Cáceres
TRANSCRIPTOMICS(or the analysis of the transcriptome)
• Identify and annotate the complete set of genes encoded within a genome
• Determine the entire DNA sequence of an organism
• Characterize the gene-expression profiles in different tissues and cell types on a genome-wide scale
• Ascertain the function of each and every gene product
• Understand the genetic basis of the phenotypic differences between individuals and species
Main objectives of genomics
2
Central dogma of molecular biology
Central dogma of molecular biology
Genome
Proteome Transcriptome
Complete DNA content of an organism with all its genesand regulatory sequences
Complete set of transcripts andrelative levels of expression ina particular cell or tissue under
defined conditions at a given time
Complete collection of proteinsand their relative levels
in each cell
Transcription
Translation
3
RNA profiling provides clues to:
- Functional differences between tissues and cell types
- Function and interaction between genes
- Gene regulation and regulatory sequences
- Identification of candidate genes for any givenprocess or disease
- Expressed sequences and genes of a genome
Why study of RNA is so important?
Beyond the genome
Gene Number ≠ Complexity
Co
mp
lexity
Gene regulation
Gene
Transcriptome
4
1. Methods for transcriptome analysis
2. Genome transcription landscape
3. Types of RNAs and functions
4. Analysis of gene structure
5. Transcriptome profiling (tissues, individuals)
6. Gene expression and evolution
Overview
Methods for transcriptome analysis
Gene expression arrays- Quantification of transcript abundance- Single/multiple 3’ probes
Genome tiling arrays- Identification of transcribed sequences- Multiple probes along the genome
Alternative splicing arrays- Quantification of different RNA isoforms- Probes in exons and exon-exon junctions
End of CDS or 3’ UTR(100-500 bp)
Inclusion isoform
Exclusion isoform
+
5
Methods for transcriptome analysis
• RNA-tag sequencing- Quantification of transcript abundance- Single end reads of each RNA species
• Whole RNA sequencing- Identification of transcribed sequences- Multiple reads along each RNA species
The ENCODE project
• Pilot phase (2003-2007)Identification of all functional elements in 1% of the human genome
• Production phase (2007-?)Identification of all functional elements in the whole human genome
ENCyclopedia of DNA Elements
6
Main results from ENCODE project
Pervasive transcription of the human genome(14.7% of the analyzed bases are transcribed in at least one tissue sample, but as much as 93% could be present in primary transcripts)
Identification of many novel non-protein transcripts (ncRNAs either overlapping protein-coding loci or in regions thought to be transcriptionally silent)
Numerous unrecognized transcription start sites
Great complexity and overlap of transcribed regions(multiple overlapping transcripts of different sizes from the same or opposite DNA strands)
The human transcription landscape
(Kapranov et al.,Science 2002)
7
The human transcription landscape
(Kapranov et al.,Science 2002)
Most transcribed nucleotides are outside annotated exons
The human transcription landscape
(Kapranov et al., Science 2007)
Genomic distribution of long RNAs
8
Genome transcription relevance?
1. Transcriptional noise
2. Functional
Types of RNAs
Figure 1.12 Genomes 3 (© Garland Science 2007)
(Non‐coding)
piRNA
PASR PALR PASR
Structural RNAs Regulatory RNAs
9
Types of RNAs and characteristics
Regulation, imprinting~35000>200 bpLong non-codingRNAs
lncRNAs
Gametogenesis, transposon defense~3300026-30 ntpiwi RNAspiRNAs
(rasi RNAs)
Gene expressionregulation~100021-23 ntmicro RNAsmiRNAs
RNA modification~37560-300 ntsmall nucleolarRNAs
snoRNAs
Splicing?100-300 ntsmall nuclear RNAs
snRNAs
Translation~100074-95 nttransfer RNAstRNAs
Component ofribosome~300-400114-5000 ntribosomal RNAsrRNAs
Coding for proteins~30,000Several kbmessenger RNAmRNA
FunctionNumberSizeNameType
Messenger RNAs
Typical mRNA
10
Micro RNAs
Small non-coding RNAs (21-23 nt) involved in post-transcriptional regulation of gene expression by binding to the 3’ UTR of target mRNAs
Identified in the early 1990s, but not recognized as a distinct class of regulators until the early 2000s
Abundant in many cell types and may be involved in different developmental processes (complex organisms)
Target around 60% of mammalian genes and each miRNA can repress hundreds of genes
Micro RNAs pathway
11
Other new non-coding RNAS
Transcriptional complexity
(61%-72% genes)
12
Long range transcription
Long range transcription could be more frequent thanpreviously thought and involve both exons located very faraway (up to Mb) and exons in different chromosomes
(Gingeras et al., Nat. Rev. Genet. 2007)
Mapping of CAGE-tags to the human genome
Identification of new TSS
Carninci et al., Nat. Genet. 2006
(all) (>2)
13
Alternative splicing affords extensive proteomic andregulatory diversity from a limited repertoire of genes
Cassette exons: Exons that are spliced out of certain mRNAs to alter the amino acid sequence of the expressed protein.
Alternative splicing (AS)
Large-scale analysis of alt. splicing
14
Alternative splicing facts
92-98% of multi exon genes are subjected to alternative splicing (almost 86% of human genes)
~100,000 AS events of different frequency detected in human tissues and many show variation between them
Less than 20% of alternative splicing events appear to be conserved between humans and mice
Alternative splicing could be a key mechanism in generating proteomic diversity
Alternative splicing in human tissues
Wang et al.
Pan et al.
15
Mouse
Rat
Su et al. 2004
Transcript profiling across tissues
Human
http://biogps.gnf.org/
Transcript profiling in the brain
Relationship between tissues
Functional differences between tissues
Identification of tissue specific promoters
16
Transcriptome
reflects Biology
ALL – acute lymphoblastic leukemia
AML – acute myeloid leukemia
Transcript profiling in cancer
Molecular classificationof cancer types
- Predictive
- Treatment
Transcript profiling across individuals
Characterization of expression levels in lymphoblastoidcell lines from individuals of hapMap populations
Many genes with variation accross individuals
Gene expression phenotypes are highly heritable
Higher proportion of expression differences related tocis-variants (although they are also easier to detect)
17
Transcript profiling across individuals
Mapping gene expression variants
(moderate effects, more genes)
(bigger effects, few genes)
18
Mapping gene expression variants
Mapping gene expression variants
19
Persistence of lactase expression
Lactase production in adults shows large variability in human populations and seems related with pastoralism:
Persistence of lactase expression
20
Persistence of lactase expression
Coding vs. regulatory evolution
Regulatory changes have unique properties that could make them especially important in phenotypic evolution:
• Reduced pleiotropical effects
• Fine-tuning of gene function
• Co-dominance and more efficient selection
Expression analysis are an important tool in evolution study
21
Multiple levels of regulatory evolution
- Basal promoter- Tissue-specific element- Distant enhancer
- Gene duplication or deletion
- Chromosomal inversion- TE insertion
- Transcription factor- microRNA
- Alternative splicing- Alternative 5’ or 3’ UTRs- RNA editing
Epigeneticchromatin
modifications
GeneGeneexpressionexpression
levelslevelsRegulatory change in cis
Structuralchanges Changes in
trans-actingregulator
Changes inthe mRNAsequence
- DNA methylation- Histone modifications
Comparison of gene-expression levels in the brain during human evolution
22
Microarrays and brain evolution
Table 1 | Comparative studies of human and non-human primate gene expression in the brain by oligonucleotide arrays Study Platform Coverage Tissue
compared Species compareda
Sex Age (years)
Tissue origin
Follow-up validation
3 humans all males adults 3 chimps all males adults
Enard et al. (2002)
HG_U95A arrays
12533 probe sets (~9100 genes)
Brain frontal cortex (area 9), plus liver
1 orangutan male adult
autopsy (PMI = 2-6 h)
Northern
5 humans 3 mal./2 fem. 30-71 4 chimps 1 mal./3 fem. 1-24
Cáceres et al. (2003)
HG_U95Av2 arrays
12558 probe sets (~9100 genes)
Brain frontal, and temporal cortex (several regions), plus heart
4 rhesus 2 mal./1 fem. 1-9
surgery and autopsy (PMI = 2-14 h)
cDNA arrays, real-time PCR and in-situ hybridization
3 humans all males. 45-70 3 chimps all males. 12-40
Khaitovich et al. (2004)
HG_U95Av2 arrays
12558 probe sets (~9100 genes)
Brain frontal, temporal and occipital cortex (several regions)
autopsy -
Uddin et al. (2004)
HG_U133A and B arrays
44792 probe sets (~16400 genes)
Brain anterior cingulate cortex
3 humans 1 chimp 1 gorilla 3 rhesus
all females female male 2 mal./1 fem.
14-28 34 41 4-11
autopsy (PMI < 12 h.)
-
Khaitovich et al. (2005)
HG_U133 Plus2 arrays
54613 probe sets (~20700 genes)
Brain frontal cortex (area 9), plus liver, heart, kidney and testis
6 humans 5 chimps
all males all males
45-70 7-40
autopsy -
Khaitovich et al. (2006)
ENCODE 0.1 Fwd. and Rev. arrays
30 Mb of DNA
Brain frontal cortex (area 9), plus heart and testis
6 humans 5 chimps
all males all males
45-70 7-40
autopsy -
Several studies have compared gene-expression levelsin the adult brain of humans and non-human primates:
(Modified from Preuss, Cáceres et al., Nat. Rev. Genet. 2004)
2. Increased expression levels in human brain compared to non-human primates
1. Acceleration of gene-expression changes in the brain during human evolution
(Preuss, Cáceres et al., Nat. Rev. Genet. 2004)
(Enard, Khaitovich et al., Science 2002)
Gene-expression evolution patterns
23
Gene-expression changes in cortex
Bayes t-test(PPDE > 0.99%)
612 genes
SAM (t –test FDR < 0.1%)
577 genes
Metanalysis identified 533 differentially expressed genes in human-chimp cortex with good confirmation by RT-PCR:
89%Sensitivity (TP/TP+FN)
64%Specificity (TP/TP+FP)
78 (61)Total probe sets (genes)
All PCRsRT-PCR validation
y = 0.4775x
R2 = 0.5023
-6
-4
-2
0
2
4
6
-4 -3 -2 -1 0 1 2 3 4 5
Diff. exp. ps.
Array log2 (Hs/Pt ratio)
PC
R lo
g2
(Hs
/Pt
rati
o)
533 4479
40.2% Dec. / 59.8% Inc.(Cáceres, Oldham et al., in prep.)
Functional classification of genes
Genes differentially expressed in human and chimp cortexbelong to a wide diversity of functional categories:
Cell adhesion and signaling
Signal transduction
Cell cycle and proliferation
Cell organization and biogenesis
Nucleic acid metabolism
Carbohydrate metabolism
Lipid metabolism
Protein metabolism
Response to stress
Transport
Development
Unknown
Biological process categories 38
71
45
24
18 33
82
86
46
54
45
76
(Cáceres, Oldham et al., in prep.)
**
24
Follow-up of expression differences
mRNA levels
Protein levels
Cellulardistribution
Phenotypicchanges
Westernblots
cDNAmicroarrays
Northernblots
Real-timeRT-PCR
Immunohistochemistry
In situhybridization
Regulatorystudies
Molecularevolution
Functionalanalysis
Why thrombospondins?
Thrombospondins are extracellular matrix proteins thatregulate cell interactions in multiple tissues
- THBS1 and THBS2(Subfamily A)
- THBS3, THBS4 and COMP(Subfamily B)
(Christopherson et al., Cell 2005)
Synapse formation
THBS domains
In rodents, THBSs have beenimplicated in the induction ofsynapse formation by astrocytesboth in vitro and in vivo
25
THBSs expression patterns
Real-time RT-PCR
mRNA and protein levels of THBS4 and THBS2 are specifically upregulated in human cortex:
~6 foldincrease
Western blot analysis
~2 foldincrease
******
*** ***
(Cáceres et al., Cerebral Cortex 2006)
Cortical distribution of THBS4
Chimp RhesusHuman
Human
Chimp
(Cáceres et al., Cerebral Cortex 2006)
In situ hybridization(THBS4 mRNA)
Immunohistochemistry(THBS4 protein)
Chimp RhesusHuman
Humans sections exhibit denser THBS4labeling of the neuropil, consistent with a
potential role in synaptic activity and plasticity
26
– Greater density of synapses
– Higher synapses turnover
– Increased neurite growth
Increased levels of thrombospondins could be related to changes in synaptic organization and plasticity in the cortex:
Hu
man
Mac
aqu
e
(Duan et al., Cereb.Cortex 2003)
Possible functional consequences?