Gramene Meeting, PAG 2015 Expression Atlas - a New Resource for Baseline and Differential Gene Expression for Plants Robert Petryszak Gene Expression Team Leader Functional Genomics Group

Gramene Meeting, PAG 2015

Expression Atlas - a New Resource for Baseline and Differential Gene Expression for Plants

Robert Petryszak

Gene Expression Team Leader

Functional Genomics Group

Hinxton Genome Campus

ArrayExpress

www.ebi.ac.uk/arrayexpress

Expression Atlas

www.ebi.ac.uk/gxa

The two databases

Curation, Quality Control, Ontology Annotation, Statistical Analysis

Experimental Factor Ontology (EFO)

http://www.ebi.ac.uk/efo

• Systematically organized experimental factor terms

• controlled vocabulary + hierarchy (relationship)

• Used in EBI databases and external projects (e.g. NHGRI GWAS Catalogue)

• Combine terms from a subset of well-maintained and compatible ontologies, e.g.

• Gene Ontology (cellular component + biological process terms)

• NCBI Taxonomy

• UBERON

• Plant Ontology (~50 terms, but new terms being added regularly)
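
To illustrate why a controlled vocabulary with a hierarchy helps search, here is a minimal Python sketch. The term names, parent links, and sample IDs below are hypothetical and not taken from EFO; the sketch only shows that a query for a broad term can be expanded to the more specific terms beneath it.

```python
from collections import defaultdict

# Hypothetical is-a links (child -> parent); not actual EFO content.
IS_A = {
    "flag leaf": "leaf",
    "leaf": "plant organ",
    "root": "plant organ",
}

def descendants(term, is_a=IS_A):
    """Return the term plus every term below it in the hierarchy."""
    children = defaultdict(list)
    for child, parent in is_a.items():
        children[parent].append(child)
    found, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(children[current])
    return found

# Hypothetical sample annotations: specific terms are still matched
# when the query uses a broader term.
samples = {"S1": "flag leaf", "S2": "root", "S3": "leaf"}
hits = sorted(s for s, t in samples.items() if t in descendants("leaf"))
print(hits)
```

Without the hierarchy, a plain string match on "leaf" annotations would miss the sample annotated with the more specific term.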

Expression Atlas

Baseline Expression

e.g. which rice genes are expressed in a healthy leaf?

Differential Expression

e.g. which maize genes are up-regulated in moderate drought stress versus control?

Plant data sets in Atlas

226 (1)

2 (2)

All (Baseline)

5 (2)

9 (1)

Brassica oleracea 1 --

Oryza sativa Indica 3 --

Populus trichocarpa 2 --

Vitis vinifera 5 --

Atlas Search

Search Atlas by gene, organism, condition (in any combination)

Baseline Atlas – Gene Page

Corroboration, but different FPKMs

Baseline Atlas

Baseline Atlas - Hierarchical Clustering Plots

Differential Atlas

Sort by/display log2 fold change
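
As a rough illustration of what sorting by log2 fold change involves, the sketch below computes it from mean expression values and ranks genes by magnitude. The gene names and values are made up, and the pseudocount is an assumption added here to avoid division by zero; it is not a description of how Atlas computes its values.

```python
import math

def log2_fold_change(test_mean, control_mean, pseudocount=1.0):
    """log2 ratio of test vs. control; the pseudocount is an illustrative assumption."""
    return math.log2((test_mean + pseudocount) / (control_mean + pseudocount))

# Hypothetical mean expression values: gene -> (test, control)
expression = {"geneA": (7.0, 1.0), "geneB": (1.0, 3.0), "geneC": (3.0, 3.0)}

fold_changes = {g: log2_fold_change(t, c) for g, (t, c) in expression.items()}

# Sort by absolute log2 fold change, largest effect first.
ranked = sorted(fold_changes, key=lambda g: abs(fold_changes[g]), reverse=True)
for gene in ranked:
    print(gene, round(fold_changes[gene], 2))
```

A positive value means higher expression in the test group, a negative value means lower; a value of 1 corresponds to a doubling.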

Differential Atlas (Experiment Page)

Differential Atlas (Experiment Page Plots)

Gene set overlap: GO, Reactome, InterPro

MA plots

Genome Browser View of Expression

Expression data in Reactome Pathway Portal

Annotare – A submission tool for ArrayExpress

Annotare – A submission tool for ArrayExpress

Automatically add sample characteristic column: cultivar?

Pokkali vs. IR29 Salt stress RNA-seq analysis

Points in red represent genes that are differentially expressed (DE) at FDR = 0.05.

Each line corresponds to a gene (94 genes with a different DE pattern across two time points between sensitive and tolerant plants are shown).

Which genes have a different DE pattern across time when comparing the salt tolerant and salt sensitive plants?

Plans for 2015 - General

• Deploy a good-enough normalisation method within (and across different?) RNA-seq experiments

• Show expression variability across biological replicates

• Report gene co-expression in single experiment

• Visualise expression of orthologues

• Automate QC for RNA-seq experiments

• Gene set overlap for user provided gene sets

• Search, presentation and branding improvements

Plans for 2015 - Gramene

✓ Baseline expression summary for pathways in Plant Reactome (Pathway Portal)

• Baseline resource publication for Gramene

• Atlas workflow

• Differential analysis of salt stress in Pokkali and IR29

• Add Brachypodium distachyon data sets to Atlas

• Procure more maize experiments

• Facilitate plant experiment submissions in Annotare

• …

Acknowledgements

Molecular Atlas Cluster Leader

Expression Analysis, Statistics, RNA-seq

Curation Tools, Ontology, RDF

Funding: Member States

Alvis Brazma

Eleanor Williams, Amy Tang, Catherine Snow

Maria Keays, Nuno Fonseca

Simon Jupp, James Malone, Tony Burdett, Julie McMurray

Research Guidance: John Marioni, Wolfgang Huber

Training, Outreach: Amy Tang

Annotare: Olga Melnichuk, Emma Hastings, Nikolay Kolesnikov, Tony Burdett

ArrayExpress Submissions, Curation

Web Services: Elisabet Barrera Casanova, Oliver Mannion, Alfonso Munoz-Pomer Fuentes, Ugis Sarkans

Gramene: Doreen Ware, Justin Preece, Pankaj Jaiswal, Matt Geniza

Reactome: Antonio Fabregat, Wolfgang Huber

Thank You

The two databases: how do they compare?

ArrayExpress (Archive) vs. Expression Atlas (Value added)

Central object: Experiment (ArrayExpress); Gene and condition (Expression Atlas)

Microarray data

Sequencing data: RNA-seq data (Expression Atlas)

Query for…: Experiment details and associated data (ArrayExpress); Gene expression under certain conditions or in contrasting sample groups (Expression Atlas)

Download data for further analysis

Submit data: X (Expression Atlas)

Private data

Curated data: Yes (direct submissions) / No (GEO-imported) (ArrayExpress); All curated and annotated to EFO (Expression Atlas)

Reporting standards - MAGE-TAB format

A simple spreadsheet format that uses a number of tab-delimited text files

• IDF: Investigation Description Format file (experiment title + description, submitter's details, all protocols)

• SDRF: Sample Data Relationship Format file (Seq lib, Hyb/seq assays, data files, e.g. Data1.txt, Data2.txt)

• ADF (microarray only): Array Design Format file (probe names, sequence, genomic mapping location)

• Raw and processed data files (e.g. 1.fq.gz, 2.fq.gz, .CEL, A1.CEL, Normalized.txt)

MAGE-TAB in FGED: http://www.mged.org/mage-tab/index.html
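
Because MAGE-TAB files are tab-delimited text, an SDRF can be read with a standard CSV parser. The sketch below is illustrative only: the sample names, file names, and factor values are invented, and the column headers follow the common SDRF naming style rather than any specific submission.

```python
import csv
import io

# Illustrative SDRF content; every value below is made up.
SDRF_TEXT = (
    "Source Name\tCharacteristics[organism]\tAssay Name\t"
    "Array Data File\tFactor Value[growth condition]\n"
    "Sample 1\tOryza sativa\tAssay 1\tA1.CEL\tcontrol\n"
    "Sample 2\tOryza sativa\tAssay 2\tA2.CEL\tsalt stress\n"
)

def read_sdrf(handle):
    """Parse a tab-delimited SDRF into a list of row dictionaries."""
    return list(csv.DictReader(handle, delimiter="\t"))

rows = read_sdrf(io.StringIO(SDRF_TEXT))

# Group data files by experimental factor value (the sample-to-data workflow).
files_by_factor = {}
for row in rows:
    factor = row["Factor Value[growth condition]"]
    files_by_factor.setdefault(factor, []).append(row["Array Data File"])
print(files_by_factor)
```

Each SDRF row traces one path from a sample to its data file, which is why grouping rows by a factor column is enough to recover the sample groups for a comparison.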

Example SDRF: workflow from samples to data

Example IDF: expt. info and protocols

Microarray Processing Pipeline (limma)

Inputs: Metadata, CEL files

• Quality control

• Normalisation (RMA): normalised expression values per probe set

• Moderated t-test (fold-change, p-value) on a manually curated "comparison", e.g. Disease: diabetes vs. normal

• FDR adjustment (Benjamini & Hochberg)

Exclusions flagged in the diagram: outlier samples, insufficient replicates

Output: Gene (Adj. P-Val, Log2Fold, T-statistic)

Example metadata:

Samples     Disease
Sample 1    normal
Sample 2    diabetes
Sample 3    diabetes
Sample 4    normal
Sample 5    normal
Sample 6    diabetes
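
The FDR adjustment step can be sketched in a few lines of plain Python. This is a generic implementation of the Benjamini & Hochberg step-up procedure, not the pipeline's own code (the slide names limma, an R package), and the p-values in the example are invented.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical per-gene p-values from a diabetes vs. normal comparison.
p = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(p)
print([round(x, 3) for x in adj])
```

Genes whose adjusted p-value falls below the chosen threshold (for example 0.05) would then be reported as differentially expressed.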

RNA-seq Processing Pipeline (iRAP)

https://code.google.com/p/irap

Inputs: FASTQ files, Metadata, Genome References

• Quality control: low-quality reads (e.g. NNACGANNNNN); contamination: fungi, microbes

• Mapping (TopHat2)

• Quantification (HTSeq)

• Summarization

Baseline output: Gene (FPKM)

Differential (DESeq) output: Gene (Adj. P-Val, Log2Fold)
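
For readers unfamiliar with the FPKM unit reported for baseline expression, the sketch below applies the standard definition (fragments per kilobase of transcript per million mapped fragments). The numbers are invented, and iRAP's own calculation may differ in detail, for example in how gene length is defined.

```python
def fpkm(fragment_count, gene_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    if gene_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("gene length and library size must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (fragments per million)
    return fragment_count * 1e9 / (gene_length_bp * total_mapped_fragments)

# Hypothetical gene: 500 fragments, 2,000 bp long, 10 million mapped fragments.
value = fpkm(500, 2_000, 10_000_000)
print(value)
```

Because FPKM depends on library size and on the gene length used, the same gene can show different FPKM values in different experiments even when the expression patterns agree.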
