Gramene Meeting, PAG 2015
Expression Atlas - a New Resource for Baseline and Differential Gene Expression for Plants
Robert Petryszak
Gene Expression Team Leader
Functional Genomics Group
Hinxton Genome Campus
ArrayExpress
www.ebi.ac.uk/arrayexpress
Expression Atlas
www.ebi.ac.uk/gxa
The two databases
Curation, Quality Control, Ontology Annotation, Statistical Analysis
Experimental Factor Ontology (EFO)
http://www.ebi.ac.uk/efo
• Systematically organized experimental factor terms
• Controlled vocabulary + hierarchy (relationships)
• Used in EBI databases and external projects (e.g. NHGRI GWAS Catalogue)
• Combines terms from a subset of well-maintained and compatible ontologies, e.g.
• Gene Ontology (cellular component + biological process terms)
• NCBI Taxonomy
• UBERON
• Plant Ontology (~50 terms, but new terms being added regularly)
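The benefit of "controlled vocabulary + hierarchy" can be illustrated with a small sketch: a query for a general term also retrieves samples annotated with more specific terms. The term names, the `PARENTS` mapping and the helper functions below are hypothetical and are not taken from EFO or from any Atlas code.

```python
# Toy is_a hierarchy: child term -> list of parent terms.
# Illustrative names only; this is not an excerpt from EFO.
PARENTS = {
    "leaf": ["plant organ"],
    "root": ["plant organ"],
    "flag leaf": ["leaf"],
    "plant organ": [],
}


def descendants(term, parents=PARENTS):
    """Return the term itself plus every term that reaches it via is_a links."""
    found = {term}
    changed = True
    while changed:
        changed = False
        for child, child_parents in parents.items():
            if child not in found and any(p in found for p in child_parents):
                found.add(child)
                changed = True
    return found


def expand_query(term, annotations):
    """Return IDs of samples annotated with the term or any of its descendants."""
    wanted = descendants(term)
    return sorted(s for s, t in annotations.items() if t in wanted)


# Hypothetical sample annotations.
SAMPLES = {"s1": "flag leaf", "s2": "root", "s3": "leaf"}
print(expand_query("leaf", SAMPLES))  # samples annotated "leaf" or "flag leaf"
```

Without the hierarchy, a search for "leaf" would miss the sample annotated only as "flag leaf".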
Expression Atlas
Baseline Expression: e.g. which rice genes are expressed in a healthy leaf?
Differential Expression: e.g. which maize genes are up-regulated in moderate drought stress versus control?
Plant data sets in Atlas
Data sets per species, shown as All (Baseline): 226 (1), 2 (2), 5 (2), 9 (1)
Brassica oleracea 1 --
Oryza sativa Indica 3 --
Populus trichocarpa 2 --
Vitis vinifera 5 --
Atlas Search
Search Atlas by gene, organism, condition (in any combination)
Baseline Atlas – Gene Page
Corroboration, but different FPKMs
Baseline Atlas
Baseline Atlas - Hierarchical Clustering Plots
Differential Atlas
Sort by / display log2 fold change
Differential Atlas (Experiment Page)
Differential Atlas (Experiment Page Plots)
Gene set overlap: GO, Reactome, InterPro
MA plots
Genome Browser View of Expression
Expression data in Reactome Pathway Portal
Annotare – A submission tool for ArrayExpress
Automatically add sample characteristic column: cultivar?
Pokkali vs. IR29 Salt stress RNA-seq analysis
Points in red represent genes that are differentially expressed (DE) at FDR = 0.05
Each line corresponds to a gene (shown are 94 genes whose DE pattern across the two time points differs between sensitive and tolerant plants)
Which genes have a different DE pattern across time when comparing the salt tolerant and salt sensitive plants?
Plans for 2015 - General
• Deploy a good-enough normalisation method within (and across different?) RNA-seq experiments
• Show expression variability across biological replicates
• Report gene co-expression in single experiment
• Visualise expression of orthologues
• Automate QC for RNA-seq experiments
• Gene set overlap for user provided gene sets
• Search, presentation and branding improvements
Plans for 2015 - Gramene
✓ Baseline expression summary for pathways in Plant Reactome (Pathway Portal)
• Baseline resource publication for Gramene
• Atlas workflow
• Differential analysis of salt stress in Pokkali and IR29
• Add Brachypodium distachyon data sets to Atlas
• Procure more maize experiments
• Facilitate plant experiment submissions in Annotare
• …
Acknowledgements
Molecular Atlas Cluster Leader: Alvis Brazma
ArrayExpress Submissions, Curation: Eleanor Williams, Amy Tang, Catherine Snow
Expression Analysis, Statistics, RNA-seq: Maria Keays, Nuno Fonseca
Curation Tools, Ontology, RDF: Simon Jupp, James Malone, Tony Burdett, Julie McMurray
Research Guidance: John Marioni, Wolfgang Huber
Training, Outreach: Amy Tang
Annotare: Olga Melnichuk, Emma Hastings, Nikolay Kolesnikov, Tony Burdett
Web Services: Elisabet Barrera Casanova, Oliver Mannion, Alfonso Munoz-Pomer Fuentes, Ugis Sarkans
Gramene: Doreen Ware, Justin Preece, Pankaj Jaiswal, Matt Geniza
Reactome: Antonio Fabregat, Wolfgang Huber
Funding: Member States
Thank You
The two databases: how do they compare?
Feature | ArrayExpress (Archive) | Expression Atlas (Value added)
Central object | Experiment | Gene and condition
Microarray data | |
Sequencing data | | RNA-seq data
Query for… | Experiment details and associated data | Gene expression under certain conditions or in contrasting sample groups
Download data for further analysis | |
Submit data | | X
Private data | |
Curated data | Yes (direct submissions) / No (GEO-imported) | All curated and annotated to EFO ontology
Reporting standards - MAGE-TAB format
A simple spreadsheet format that uses a number of tab-delimited text files
• IDF (Investigation Description Format file): experiment title + description, submitter's details, all protocols
• SDRF (Sample and Data Relationship Format file): samples / sequencing libraries, hybridisation / sequencing assays, data files (e.g. Data1.txt, Data2.txt)
• ADF (Array Design Format file, microarray only): probe names, sequence, genomic mapping location
• Raw and processed data files: e.g. 1.fq.gz, 2.fq.gz, A1.CEL, Normalized.txt
MAGE-TAB in FGED: http://www.mged.org/mage-tab/index.html
Example SDRF: workflow from samples to data
Example IDF: expt. info and protocols
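Because an SDRF is a tab-delimited table, the sample-to-data workflow can be read with standard tools. The sketch below is only an illustration: the two-row SDRF fragment, the file name A2.CEL and the helper names are made up, although the column headers follow the usual MAGE-TAB style ("Source Name", "Characteristics[...]", "Factor Value[...]").

```python
import csv
import io

# A made-up two-sample SDRF fragment; real files carry many more columns.
SDRF_TEXT = (
    "Source Name\tCharacteristics[organism]\tCharacteristics[cultivar]\t"
    "Assay Name\tArray Data File\tFactor Value[compound]\n"
    "sample 1\tOryza sativa\tPokkali\tassay 1\tA1.CEL\tnone\n"
    "sample 2\tOryza sativa\tIR29\tassay 2\tA2.CEL\tsodium chloride\n"
)


def read_sdrf(text):
    """Parse tab-delimited SDRF text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def files_by_factor(rows, factor_column, file_column="Array Data File"):
    """Group data file names by the value of one experimental factor column."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row[factor_column], []).append(row[file_column])
    return grouped


rows = read_sdrf(SDRF_TEXT)
print(files_by_factor(rows, "Factor Value[compound]"))
```

Real SDRF files may repeat column names (for example several protocol columns), in which case a dictionary per row keeps only the last one; a positional reader would be safer there.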
Microarray Processing Pipeline (limma)
Input: metadata, CEL files
• Quality control: outlier samples
• Normalisation (RMA): normalised expression values per probe set
• Moderated t-test: fold change, p-value (insufficient replicates)
• FDR adjustment (Benjamini & Hochberg)
Output: Gene (Adj. P-Val, Log2Fold, T-statistic)
Manually curated "comparison", Disease: diabetes vs. normal
Samples | Disease
Sample 1 | normal
Sample 2 | diabetes
Sample 3 | diabetes
Sample 4 | normal
Sample 5 | normal
Sample 6 | diabetes
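Two of the quantities reported per gene can be sketched in a few lines: the log2 fold change for a curated comparison and the Benjamini & Hochberg adjustment of p-values. This is a minimal illustration and not the limma implementation; the moderated t-test itself is not reproduced, and the expression values and p-values used here are invented. It assumes the normalised values are already on the log2 scale, as RMA output is.

```python
from statistics import mean


def log2_fold_change(test_log2, reference_log2):
    """Difference of group means for values that are already on the log2 scale."""
    return mean(test_log2) - mean(reference_log2)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented log2 expression values for one probe set:
# diabetes = samples 2, 3, 6; normal = samples 1, 4, 5.
diabetes = [9.0, 9.5, 10.0]
normal = [7.5, 8.0, 7.0]
print(log2_fold_change(diabetes, normal))

# Invented raw p-values for four genes.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

A gene would then be called differentially expressed when its adjusted p-value falls below the chosen FDR threshold.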
RNA-seq Processing Pipeline (iRAP)
https://code.google.com/p/irap
Input: FASTQ files, metadata, genome references
• Quality control: low-quality reads (e.g. NNACGANNNNN); contamination (fungi, microbes)
• Mapping (TopHat2)
• Quantification and summarization (HTSeq)
• Baseline: Gene (FPKM)
• Differential (DESeq): Gene (Adj. P-Val, Log2Fold)
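The baseline unit, FPKM (fragments per kilobase of transcript per million mapped fragments), follows directly from the per-gene counts produced in the quantification step. The sketch below uses the standard definition; the function name and the numbers are illustrative and do not come from iRAP.

```python
def fpkm(fragment_count, gene_length_bp, total_mapped_fragments):
    """Fragments per kilobase of gene length per million mapped fragments."""
    if gene_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("gene length and library size must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (fragments per million)
    return fragment_count * 1e9 / (gene_length_bp * total_mapped_fragments)


# Invented example: 500 fragments on a 2,000 bp gene in a library of
# 10 million mapped fragments.
print(fpkm(500, 2000, 10_000_000))
```

Because FPKM scales by both gene length and library size, the same gene can receive different FPKM values in different experiments even when the experiments agree qualitatively.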