Publishing actionable data
Thomas Lemberger
Chief Editor, Molecular Systems Biology
Deputy Head, Scientific Publications, EMBO
Slide 2
Slide 3
Big Data
Slide 4
Structured data: structures, sequences, functional genomics, proteomics,
metabolomics, genotype, phenotype
Slide 5
Publishing papers Depositing datasets
Slide 6
The two vital components of the scientific endeavor, the idea
and the evidence, are too frequently separated.
Slide 7
Slide 8
Centralized vs distributed infrastructure
Diagram labels: Research data, Database, Journals, Users
Slide 9
Scientific publishing
Dominant channel for the dissemination of peer-reviewed data.
Journals function as a proxy for quality in research assessment.
The rate of publishing keeps increasing.
Papers are human-readable but poorly machine-readable.
Slide 10
Slide 11
Figure = Data; Text = Narrative
Slide 12
Data in figures
Use cases: Understand the data; Re-analyze the data; Give / claim
credit for the data; Search for specific evidence; Compare to related
data; Mine data systematically; Browse through the data & the literature
Issues: Complexity; Unstructured; No metadata standard; Source data
not available
Slide 13
SourceData
Tools to publish figures as structured digital objects that
link the human-readable illustrations with machine-readable
metadata and source data in order to:
improve data transparency;
make published data usable;
enable data-oriented search.
Slide 14
What is a figure?
A scientific result converted into a collection of pixels
Slide 15
Slide 16
Slide 17
Data archival service; Data reproducibility; Data reuse;
Data-oriented search
Slide 18
Reproducibility: figures as packages
descriptive metadata (RDF)
experimental data (CSV)
figure (JPEG)
manifest (XML)
caption (HTML)
code (PY)
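As a rough illustration of the "figure as package" idea, the sketch below bundles the listed components into a single archive together with a manifest. The file names, the `<manifest>`/`<component>` element names, and the use of a zip container are assumptions made for this example only; the slides do not specify the actual package layout.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Component roles and formats follow the slide; the file names are
# hypothetical placeholders chosen for this sketch.
COMPONENTS = [
    ("descriptive metadata", "metadata.rdf"),
    ("experimental data", "data.csv"),
    ("figure", "figure.jpg"),
    ("caption", "caption.html"),
    ("code", "plot.py"),
]


def build_manifest(components):
    """Return an XML string listing each component's role and file name."""
    root = ET.Element("manifest")  # element names are assumptions
    for role, filename in components:
        ET.SubElement(root, "component", role=role, href=filename)
    return ET.tostring(root, encoding="unicode")


def build_package(contents):
    """Zip a {filename: bytes} mapping together with manifest.xml.

    Components missing from `contents` are written as empty files so the
    archive always matches the manifest.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("manifest.xml", build_manifest(COMPONENTS))
        for _, filename in COMPONENTS:
            archive.writestr(filename, contents.get(filename, b""))
    return buffer.getvalue()


package_bytes = build_package({"data.csv": b"sample,value\nA,1.0\n"})
```

Keeping the manifest inside the archive means a reader of the package can discover the data, code and caption without relying on file-naming conventions.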
Slide 19
Reproducibility: figures as packages
Slide 20
Slide 21
Slide 22
Slide 23
Slide 24
(A) Primary early-passage MEFs were infected with
MSCV-Myc-ERTAM-IRES-GFP (Myc-ER) or MSCV-IRES-GFP (GFP) virus.
GFP+ cells were then left untreated (−) or were treated (+) with 2 μM
4-HT ± Chx pretreatment (30 min) for 24 h and assessed for their
expression of the indicated mRNAs (cks1, skp2, rcl and cdc) by
SYBR-green real-time PCR analysis. Levels of mRNA were standardized
to Ub.
Slide 25
Slide 26
Knowledge model
Type: chemical and biological OBJECTS: small molecules, genes,
proteins, sub-cellular structures, cell type, tissue, species.
Role: role of each in the experimental DESIGN
Assay: the type of ASSAY used to perform the measurements
Diagram labels: Measured object; Target of intervention;
Experimental system; Assay
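The type / role / assay model can be sketched as a small data structure. The class and field names below are hypothetical, and the role assignments for the example panel are an illustrative reading of a real-time PCR experiment of the kind described in the caption on slide 24, not the curated annotation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entity:
    name: str         # label as it appears in the figure
    entity_type: str  # e.g. "small molecule", "gene", "protein", "cell type"
    role: str         # e.g. "measured object", "target of intervention"


@dataclass
class Panel:
    assay: str  # type of assay used to perform the measurements
    entities: List[Entity] = field(default_factory=list)

    def with_role(self, role: str) -> List[str]:
        """Names of all entities that play the given role in this panel."""
        return [e.name for e in self.entities if e.role == role]


# Illustrative annotation only (roles assigned by assumption).
panel = Panel(
    assay="real-time PCR",
    entities=[
        Entity("Myc-ER", "protein", "target of intervention"),
        Entity("cks1", "gene", "measured object"),
        Entity("skp2", "gene", "measured object"),
        Entity("MEF", "cell type", "experimental system"),
    ],
)
```

For example, `panel.with_role("measured object")` lists what was measured, independently of how the figure was drawn.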
Slide 27
Entity types
Identifiers shown: Uniprot:P02340, Uniprot:Q61769, Uniprot:P12004,
Uniprot:P27661, Uniprot:Q6PDQ2, Uniprot:O09106, Gene:20613,
Gene:433759, Gene:9773, Gene:107932, Gene:6635, CHEBI:6635,
PubChem:72511
Figure labels: A, B, C, Y, HDAC 1, SN1, CHD4, HDAC1, p53, H2AX,
Phleo, E-cadherin, Ki67
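The identifiers on this slide follow a `namespace:accession` pattern. A minimal sketch of parsing and grouping such strings is shown below; the function names are hypothetical, and lowercasing the namespace is an assumption made here so that `Uniprot:` and `uniprot:` are treated alike.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Identifier(NamedTuple):
    namespace: str  # normalised to lower case (assumption of this sketch)
    accession: str


def parse_identifier(text: str) -> Identifier:
    """Split 'Namespace:ACCESSION' into its two parts."""
    namespace, sep, accession = text.strip().partition(":")
    if not sep or not namespace or not accession:
        raise ValueError(f"not a namespace:accession identifier: {text!r}")
    return Identifier(namespace.lower(), accession)


def group_by_namespace(texts: Iterable[str]) -> Dict[str, List[str]]:
    """Collect accessions under their normalised namespace."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for text in texts:
        identifier = parse_identifier(text)
        groups[identifier.namespace].append(identifier.accession)
    return dict(groups)


groups = group_by_namespace(
    ["Uniprot:P02340", "uniprot:P12004", "Gene:20613", "CHEBI:6635"]
)
```

Keeping the namespace alongside the accession matters: the same number, such as 6635, appears on the slide under both the Gene and CHEBI namespaces.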
Data workflow
SourceData curation
Flowchart decisions: Major issues? (REJECT); Positive decision?
(REJECT / ACCEPT)
Flowchart steps: Query author; Author response
Flowchart checks, each followed by "OK?": Check data integrity;
Check data presentation; Check plagiarism; Check expanded view files
Slide 33
Validation by authors
Slide 34
Application: Smart Figures
Use cases: Understand; Re-analyze; Credit attribution; Directed search;
Contextualization; Data mining; Browsing
Issues: Complexity; Unstructured; No metadata standard; Source data
not available
Smart Figures: Panels as coherent units; Descriptive metadata;
Standard identifiers; Source data files; Visual summarization;
Data-oriented queries; Actionable data viewer
Slide 35
Data-oriented search; Data viewer
Diagram labels: Paper 1, Paper 2
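A data-oriented search queries the experimental design behind each figure panel rather than the words of the paper. The sketch below assumes panels are annotated with what was perturbed and what was measured; the records, field names and the `search` function are hypothetical placeholders, not SourceData's actual schema or API.

```python
# Hypothetical panel annotations; names such as "drug Z" are placeholders.
PANELS = [
    {"paper": "Paper 1", "intervention": ["drug Z"], "measured": ["kinase Y"]},
    {"paper": "Paper 2", "intervention": ["kinase Y"], "measured": ["protein X"]},
    {"paper": "Paper 2", "intervention": ["drug Z"], "measured": ["protein X"]},
]


def search(panels, intervention=None, measured=None):
    """Return the papers whose panels match the requested design.

    A criterion left as None is not applied. Papers are returned once,
    in order of first match.
    """
    hits = []
    for panel in panels:
        if intervention is not None and intervention not in panel["intervention"]:
            continue
        if measured is not None and measured not in panel["measured"]:
            continue
        if panel["paper"] not in hits:
            hits.append(panel["paper"])
    return hits


papers_measuring_x = search(PANELS, measured="protein X")
```

A query such as `search(PANELS, intervention="drug Z", measured="kinase Y")` asks "where was the effect of drug Z on kinase Y measured?", which a keyword search over the text cannot express directly.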
Slide 36
Data integration
Paper 1: drug Z, kinase Y activity
Paper 2: kinase Y, protein X (marked P P)
Paper 3: gene x, disease D, tissue T
Resulting hypothesis: test drug Z in disease D.
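The integration step on this slide can be read as chaining findings from separate papers into one path. The sketch below represents each finding as a directed edge tagged with its source and searches for a chain breadth-first. The edge directions, the "identifier mapping" link between protein X and gene x, and the function name are assumptions made for illustration.

```python
from collections import deque

# (source entity, target entity, evidence). The relations are assumed for
# this sketch; the slide only shows which entities appear in which paper.
EDGES = [
    ("drug Z", "kinase Y", "Paper 1"),
    ("kinase Y", "protein X", "Paper 2"),
    ("protein X", "gene x", "identifier mapping"),
    ("gene x", "disease D", "Paper 3"),
]


def find_chain(edges, start, goal):
    """Breadth-first search for a chain of findings from start to goal.

    Returns a list of (source, target, evidence) steps, or None when the
    two entities are not connected.
    """
    graph = {}
    for source, target, evidence in edges:
        graph.setdefault(source, []).append((target, evidence))
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, chain = queue.popleft()
        if node == goal:
            return chain
        for target, evidence in graph.get(node, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, chain + [(node, target, evidence)]))
    return None


chain = find_chain(EDGES, "drug Z", "disease D")
```

Each step in the returned chain keeps its evidence label, so a hypothesis such as "test drug Z in disease D" stays traceable to the papers that support each link.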
Slide 37
Centralized vs distributed infrastructure
Diagram labels: Research data, Database, Journals, Users
Slide 38
Next Gen Open Access Search
Slide 39
Slide 40
search
Slide 41
What is a paper?
Title; Abstract; Synopsis; Main paper; Supplementary information;
Datasets & code