Thomas Lemberger Chief Editor, Molecular Systems Biology Deputy Head, Scientific Publications, EMBO...
- Slide 1
- Publishing actionable data. Thomas Lemberger, Chief Editor,
Molecular Systems Biology; Deputy Head, Scientific Publications, EMBO
- Slide 2
- Slide 3
- Big Data
- Slide 4
- Structured data: structures, sequences, functional genomics,
proteomics, metabolomics, genotype, phenotype
- Slide 5
- Publishing papers; depositing datasets
- Slide 6
- The two vital components of the scientific endeavor, the idea
and the evidence, are too frequently separated
- Slide 7
- Slide 8
- Centralized vs distributed infrastructure: research data,
databases, journals, users
- Slide 9
- Scientific publishing: the dominant channel for the dissemination
of peer-reviewed data. Journals function as a proxy for quality in
research assessment. The rate of publishing keeps increasing. Papers
are human-readable but poorly machine-readable.
- Slide 10
- Slide 11
- Figure = Data; Text = Narrative
- Slide 12
- Data in figures. Use cases: understand the data; re-analyze the
data; give / claim credit for the data; search for specific
evidence; compare to related data; mine data systematically; browse
through the data & the literature. Issues: complexity; unstructured;
no metadata standard; source data not available.
- Slide 13
- SourceData: tools to publish figures as structured digital objects
that link the human-readable illustrations with machine-readable
metadata and source data in order to improve data transparency,
make published data usable, and enable data-oriented search.
- Slide 14
- What is a figure? A scientific result converted into a collection
of pixels.
- Slide 15
- Slide 16
- Slide 17
- Data archival service; data reproducibility; data reuse;
data-oriented search
- Slide 18
- Reproducibility: figures as packages. Package contents:
descriptive metadata (RDF), experimental data (CSV), figure (JPEG),
manifest (XML), caption (HTML), code (PY).
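The package idea on this slide can be sketched in a few lines. This is only an illustration, not the SourceData format: the file names, the `build_manifest` helper, and the manifest element and attribute names are hypothetical. Only the pairing of each component with a file type (RDF, CSV, JPEG, XML, HTML, PY) comes from the slide.

```python
import xml.etree.ElementTree as ET

# Component -> file name. The component/format pairing follows the slide;
# the file names themselves are hypothetical placeholders.
PACKAGE_FILES = {
    "descriptive metadata": "metadata.rdf",
    "experimental data": "data.csv",
    "figure": "figure.jpg",
    "caption": "caption.html",
    "code": "analysis.py",
}


def build_manifest(files):
    """Return an XML manifest (as a string) listing each packaged file."""
    root = ET.Element("manifest")  # element names are assumptions
    for role, name in files.items():
        entry = ET.SubElement(root, "file")
        entry.set("role", role)
        entry.set("name", name)
    return ET.tostring(root, encoding="unicode")


manifest_xml = build_manifest(PACKAGE_FILES)
print(manifest_xml)
```

The manifest is the one machine-readable entry point: a tool that opens the package reads it first to find the data, the code, and the human-readable figure and caption.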
- Slide 19
- Reproducibility: figures as packages
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- (A) Primary early-passage MEFs were infected with
MSCV-Myc-ERTAM-IRES-GFP (Myc-ER) or MSCV-IRES-GFP (GFP) virus.
GFP+ cells were then left untreated (−) or were treated (+) with
2 µM 4-HT ± Chx pretreatment (30 min) for 24 h and assessed for their
expression of the indicated mRNAs (cks1, skp2, rcl and cdc) by
SYBR-green real-time PCR analysis. Levels of mRNA were standardized
to Ub.
- Slide 25
- Slide 26
- Knowledge model. Type: chemical and biological objects (small
molecules, genes, proteins, sub-cellular structures, cell type,
tissue, species). Role: the role of each object in the experimental
design. Assay: the type of assay used to perform the measurements.
Diagram labels: measured object, target of intervention,
experimental system, assay.
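One way to picture the type / role / assay model is as a small data structure. The sketch below is an assumption made for illustration, not the SourceData schema: the class names, field names, and the `with_role` helper are hypothetical, and the example entities and assay string are placeholders rather than curated data.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entity:
    label: str        # name as it appears in the figure
    entity_type: str  # e.g. "protein", "gene", "small molecule"
    role: str         # role of the object in the experimental design
    identifiers: List[str] = field(default_factory=list)


@dataclass
class Panel:
    assay: str  # type of assay used to perform the measurements
    entities: List[Entity] = field(default_factory=list)

    def with_role(self, role):
        """Return the entities that play the given experimental role."""
        return [e for e in self.entities if e.role == role]


# Placeholder example; labels and roles are illustrative only.
panel = Panel(
    assay="hypothetical assay",
    entities=[
        Entity("drug Z", "small molecule", "intervention"),
        Entity("kinase Y", "protein", "observation"),
    ],
)
print([e.label for e in panel.with_role("intervention")])
```

Keeping the role separate from the entity type lets the same protein be an intervention target in one panel and a measured object in another.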
- Slide 27
- Entity types (annotated figure). Identifiers shown: Uniprot:P02340,
Uniprot:Q61769, Gene:20613, Gene:433759, uniprot:P12004, Gene:9773,
Gene:107932, CHEBI:6635, Gene:6635, PubChem:72511, Uniprot:P27661,
Uniprot:Q6PDQ2, Uniprot:P02340, Uniprot:O09106. Labels shown: HDAC 1,
SN1, CHD4, HDAC1, p53, H2AX, Phleo, E-cadherin, Ki67. Other marks:
A, B, C, Y.
- Slide 28
- Experimental roles: intervention, observation. Annotated entities:
Gene:107932 (CHD4), Uniprot:Q6PDQ2 (CHD4), Uniprot:O09106 (HDAC1),
Uniprot:P02340 (p53), Uniprot:P27661 (H2AX), PubChem:72511 (Phleo).
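The `Namespace:accession` tags on these slides are easy to handle programmatically. The sketch below splits each tag and indexes the identifiers by figure label. The function names are hypothetical, the identifier/label pairs are read off the slide in the order they appear, and lower-casing the namespace is a choice made here because the slides mix `Uniprot` and `uniprot`.

```python
from collections import defaultdict


def parse_identifier(tag):
    """Split 'Namespace:accession' into (namespace, accession)."""
    namespace, _, accession = tag.partition(":")
    if not namespace or not accession:
        raise ValueError(f"not a namespaced identifier: {tag!r}")
    return namespace.lower(), accession


# (identifier, label) pairs as they appear on the experimental-roles slide.
ANNOTATIONS = [
    ("Gene:107932", "CHD4"),
    ("Uniprot:Q6PDQ2", "CHD4"),
    ("Uniprot:O09106", "HDAC1"),
    ("Uniprot:P02340", "p53"),
    ("Uniprot:P27661", "H2AX"),
    ("PubChem:72511", "Phleo"),
]


def index_by_label(annotations):
    """Group parsed identifiers under the label shown in the figure."""
    index = defaultdict(list)
    for tag, label in annotations:
        index[label].append(parse_identifier(tag))
    return dict(index)


label_index = index_by_label(ANNOTATIONS)
print(label_index["CHD4"])
```

A label such as CHD4 can carry both a gene and a protein identifier, which is why the index maps each label to a list.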
- Slide 29
- Curation tool for data editors
- Slide 30
- Slide 31
- Slide 32
- Data workflow (flowchart). Decision nodes: Major issues? (REJECT);
Positive decision? (REJECT / ACCEPT). Steps, each followed by an
"OK?" decision: SourceData curation; check data integrity; check
data presentation; check plagiarism; check expanded view files.
Author loop: query author, author response.
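The "check, then OK?" pattern in this flowchart can be sketched as a simple gate. The exact order and branching are not recoverable from the slide text, so everything below is an assumption: the check order, the `failed_checks` and `next_step` helpers, the rule that any failed check leads to querying the author, and the "continue" outcome are all illustrative.

```python
# Check names taken from the slide; their order here is an assumption.
CHECKS = [
    "data integrity",
    "data presentation",
    "plagiarism",
    "expanded view files",
]


def failed_checks(results):
    """Return the checks that did not pass (missing results count as failed)."""
    return [name for name in CHECKS if not results.get(name, False)]


def next_step(results):
    """Assumed rule: any failed check sends a query to the author."""
    return "query author" if failed_checks(results) else "continue"


results = {
    "data integrity": True,
    "data presentation": False,
    "plagiarism": True,
    "expanded view files": True,
}
print(failed_checks(results), next_step(results))
```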
- Slide 33
- Validation by authors
- Slide 34
- Application: Smart Figures. Use cases: understand; re-analyze;
credit attribution; directed search; contextualization; data mining;
browsing. Issues: complexity; unstructured; no metadata standard;
source data not available. Smart Figures: panels as coherent units;
descriptive metadata; standard identifiers; source data files;
visual summarization; data-oriented queries; actionable data viewer.
- Slide 35
- Paper 1, Paper 2; data viewer; data-oriented search
- Slide 36
- Data integration. Paper 1: drug Z, kinase Y activity. Paper 2:
kinase Y, protein X (P). Paper 3: gene x, disease D, tissue T.
Resulting hypothesis: test drug Z in disease D.
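The data-integration idea on this slide, chaining results from separate papers into a new hypothesis, can be sketched as a path search over a small graph. The assignment of entities to papers is read off the slide layout and is an assumption, as is the link between gene x and protein X, which the slide only implies. The function names are hypothetical.

```python
from collections import deque

# (entity, entity, source). Paper assignments are inferred from the slide.
EDGES = [
    ("drug Z", "kinase Y", "Paper 1"),
    ("kinase Y", "protein X", "Paper 2"),
    ("protein X", "gene x", "assumed: gene x encodes protein X"),
    ("gene x", "disease D", "Paper 3"),
]


def build_graph(edges):
    """Build an undirected adjacency list that keeps the source of each link."""
    graph = {}
    for a, b, source in edges:
        graph.setdefault(a, []).append((b, source))
        graph.setdefault(b, []).append((a, source))
    return graph


def find_path(graph, start, goal):
    """Breadth-first search; returns the shortest chain of entities or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor, _source in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


graph = build_graph(EDGES)
path = find_path(graph, "drug Z", "disease D")
print(" -> ".join(path))
```

No single paper connects drug Z to disease D. The chain exists only once the entities in all three papers share identifiers, which is what the structured figure metadata is meant to supply.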
- Slide 37
- Centralized vs distributed infrastructure: research data,
databases, journals, users
- Slide 38
- Next Gen Open Access Search
- Slide 39
- Slide 40
- search
- Slide 41
- What is a paper? Title; abstract; synopsis; main paper;
supplementary information; datasets & code.
- Slide 42
- [Figure; labels shown: Smad3, Hey1, TGFbeta, VE-cdh, Rad51 foci,
AR, Tsc2, Rad51, nuclear complexes, TGFb]