BDBM, Moscow June 30, 2014
An Introduction to MOPED Multi-Omics Profiling Expression Database
Eugene [email protected],
What is MOPED?
moped.proteinspire.org
Publicly accessible Multi-Omics Database
Protein, Gene, and Pathway expression data
Expression categorized by organism, tissue, condition, and localization
More info on kolkerlab.org
Thanks to Oxana Trifonova, Andrey Lisitsa!
What is MOPED?
The Multi-OMICS Cascade
GENOME: What can happen?
TRANSCRIPTOME: What appears to be happening?
PROTEOME: What has happened and what makes it happen?
METABOLOME: What has happened and what is happening?
Cell & Organism: ???
Modified from Hammock, 2007
Protein and Gene pages summarize expression, external
links, and pathway connections
Consistently processed expression from raw data
Relative expression experiments for comparisons across
tissues and conditions
Key MOPED Features
Discover expression of pathways within experiments
Experiment Metadata linked to expression data
Visualizations of expression data along the chromosome
Key MOPED Features, 2
Protein Details
Multi-omics connection to Gene
Connections to:
Pathways (from Reactome, BioCyc, and PANTHER)
External Databases (including GeneCards, UniProt, NCBI)
Protein concentrations in ppm, ng/mL, nM
Gene Chromosome Visualizations
Advanced Filtering
Relative Gene Expression Data
Experiment Metadata
Nature, 2013
OMICS, 2014
• >4 million Gene Expression Records
• >600,000 Protein Expression Records
• Data on Human, Mouse, Worm, Yeast
• >60,000 proteins
• >90,000 genes
• >5,000 pathways
• >22,000 users from 90 countries
Nature, 2014
Pandey: ~2,200 raw data sets, 1.2 TB
Kuster: half as much, plus other labs' data
Release Statistics
Volume, Veracity, Velocity,
Variety, and Value
Banking/Marketing/IT: Volume, Velocity
Value
Life Sciences/Healthcare: Veracity, Variety
5 Vs of Big Data
Big Data, 2013, 1(1)
What is DELSA?
Data-Enabled Life Sciences Alliance @ delsaglobal.org
Data → Knowledge → Action → Outcomes
Contact EK: [email protected] [email protected]
For more info: moped.proteinspire.org and kolkerlab.org
Thank you!
Questions?
Protein Relative Expression
Life Sciences and Fourth Paradigm
- Theory, Experimentation, Simulation, & Data-enabled Science
- Enormous increase in scale of data generation, vast data
diversity and complexity
- Development, improvement and sustainability of 21st Century
tools, databases, algorithms & cyberinfrastructure
- Past: 1 PI (Lab/Institute/Consortium) = 1 (Gene) Problem
- Future: Knowledge ecologies and New metrics to assess
scientists & outcomes (lab’s capabilities vs. ideas/impact)
- Unprecedented opportunities for scientific discovery and
solutions to major world problems
Urgent Need:
A Sustainable Supporting Ecosystem!
High-dimensional data are particularly prone to overfitting; as a
result, a computational model emerging from the research and
discovery phase may function well on the samples used for the
discovery research, but is inaccurate on any other sample.
Micheel, Nass, Omenn, US National Academies, 2012
The future of science will be influenced by the interconnectivity
of governments, research and educational institutions, and
individual citizens around the globe. Subra Suresh, NSF, 2012
From Data to Outcomes
What is the Local FDR (LFDR)?
• FDR measures the cumulative false rate above the threshold (shaded areas)
• LFDR measures the false rate at a specific threshold (heights)
• LFDR = b/(a+b)
• If there are many IDs above the threshold, FDR can be small (e.g. 2%) while LFDR is large (e.g. 20%)
• Using LFDR prevents bad IDs from being lumped with good IDs
Bioinformatics, 2008
Proteomics, 2010
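The difference between FDR and LFDR can be illustrated with a small sketch. It assumes that a and b are the heights of the correct-ID and incorrect-ID score distributions at the threshold, and it models both distributions as normal curves with made-up parameters. The distribution shapes, counts, and function names are hypothetical, chosen only for illustration, and are not taken from MOPED or the cited papers.

```python
from statistics import NormalDist

# Hypothetical score model (an assumption, not from the slides):
# incorrect and correct IDs each follow a normal distribution over the score.
incorrect = NormalDist(mu=0.0, sigma=1.0)
correct = NormalDist(mu=4.0, sigma=1.0)
n_incorrect = 1000  # assumed number of incorrect IDs
n_correct = 9000    # assumed number of correct IDs


def fdr(threshold):
    """Cumulative false rate among all IDs scoring above the threshold (areas)."""
    false_area = n_incorrect * (1.0 - incorrect.cdf(threshold))
    true_area = n_correct * (1.0 - correct.cdf(threshold))
    return false_area / (false_area + true_area)


def lfdr(threshold):
    """False rate exactly at the threshold (heights): b / (a + b)."""
    b = n_incorrect * incorrect.pdf(threshold)  # height of the incorrect-ID curve
    a = n_correct * correct.pdf(threshold)      # height of the correct-ID curve
    return b / (a + b)


if __name__ == "__main__":
    t = 1.5
    print(f"threshold={t}: FDR={fdr(t):.4f}, LFDR={lfdr(t):.4f}")
```

Under these assumed parameters, the many high-scoring correct IDs keep the cumulative FDR low, while the IDs sitting right at the threshold are far more likely to be wrong, which is what the LFDR reports.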