Publishing actionable data. Thomas Lemberger, Chief Editor, Molecular Systems Biology; Deputy Head, Scientific Publications, EMBO

  • Slide 1
  • Slide 2
  • Slide 3
  • Big Data
  • Slide 4
  • Structured data: structures, sequences, functional genomics, proteomics, metabolomics, genotype, phenotype
  • Slide 5
  • Publishing papers; depositing datasets
  • Slide 6
  • The two vital components of the scientific endeavor, the idea and the evidence, are too frequently separated.
  • Slide 7
  • Slide 8
  • Centralized vs distributed infrastructure: research data, databases, journals, users
  • Slide 9
  • Scientific publishing: the dominant channel for the dissemination of peer-reviewed data. Journals function as a proxy for quality in research assessment. The rate of publishing keeps increasing. Papers are human-readable but poorly machine-readable.
  • Slide 10
  • Slide 11
  • Figure = Data; Text = Narrative
  • Slide 12
  • Data in figures. Use cases: understand the data; re-analyze the data; give / claim credit for the data; search for specific evidence; compare to related data; mine data systematically; browse through the data & the literature. Issues: complexity; unstructured; no metadata standard; source data not available.
  • Slide 13
  • SourceData: tools to publish figures as structured digital objects that link the human-readable illustrations with machine-readable metadata and source data in order to improve data transparency, make published data usable, and enable data-oriented search.
  • Slide 14
  • What is a figure? A scientific result converted into a collection of pixels.
  • Slide 15
  • Slide 16
  • Slide 17
  • Data archival service; data reproducibility; data reuse; data-oriented search
  • Slide 18
  • Reproducibility: figures as packages. Descriptive metadata (RDF); experimental data (CSV); figure (JPEG); manifest (XML); caption (HTML); code (PY).
  • Slide 19
  • Reproducibility: figures as packages
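The "figures as packages" idea can be illustrated with a minimal sketch that bundles the six components named on the slide (RDF metadata, CSV data, JPEG figure, XML manifest, HTML caption, PY code) into one archive. The file names, the manifest element and attribute names, and the choice of a ZIP container are all assumptions made for illustration; the slides do not specify a package format.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Component roles and formats follow the slide; the file names are hypothetical.
COMPONENTS = {
    "descriptive metadata": "metadata.rdf",
    "experimental data": "data.csv",
    "figure": "figure.jpg",
    "caption": "caption.html",
    "code": "plot.py",
}


def build_manifest(components):
    """Return an XML manifest listing each component (hypothetical schema)."""
    root = ET.Element("manifest")
    for role, filename in components.items():
        ET.SubElement(root, "file", {"role": role, "href": filename})
    return ET.tostring(root, encoding="unicode")


def build_package(contents):
    """Bundle the manifest and all components into an in-memory ZIP archive.

    `contents` maps file name -> bytes; missing files are stored empty.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("manifest.xml", build_manifest(COMPONENTS))
        for filename in COMPONENTS.values():
            archive.writestr(filename, contents.get(filename, b""))
    return buffer.getvalue()
```

Calling `build_package({"data.csv": b"x,y\n1,2\n"})` returns the bytes of an archive whose manifest points at every component, so a reader or a script can locate the source data and the plotting code from the figure itself.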
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • (A) Primary early-passage MEFs were infected with MSCV-Myc-ERTAM-IRES-GFP (Myc-ER) or MSCV-IRES-GFP (GFP) virus. GFP+ cells were then left untreated (−) or were treated (+) with 2 μM 4-HT Chx pretreatment (30 min) for 24 h and assessed for their expression of the indicated mRNAs (cks1, skp2, rcl and cdc) by SYBR-green real-time PCR analysis. Levels of mRNA were standardized to Ub.
  • Slide 25
  • Slide 26
  • Knowledge model. Type: chemical and biological OBJECTS (small molecules, genes, proteins, sub-cellular structures, cell type, tissue, species). Role: role of each in the experimental DESIGN. Assay: the type of ASSAY used to perform the measurements. Diagram labels: measured object, target of intervention, experimental system, assay.
  • Slide 27
  • Entity types (figure with panels A, B, C). Identifiers: Uniprot:P02340, Uniprot:Q61769, Gene:20613, Gene:433759, uniprot:P12004, Gene:9773, Gene:107932, CHEBI:6635, Gene:6635, PubChem:72511, Uniprot:P27661, Uniprot:Q6PDQ2, Uniprot:O09106. Labels: HDAC 1, SN1, Y, CHD4, HDAC1, p53, H2AX, Phleo, E-cadherin, Ki67.
  • Slide 28
  • Experimental roles: intervention, observation. Gene:107932 (CHD4); Uniprot:Q6PDQ2 (CHD4); Uniprot:O09106 (HDAC1); Uniprot:P02340 (p53); Uniprot:P27661 (H2AX); PubChem:72511 (Phleo).
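A minimal sketch of how the type / role / assay model might be represented in code. The class names, the namespace-to-type mapping, and in particular the role assigned to each entity in the example panel are assumptions for illustration only; the slides list the identifiers and the two roles but this transcript does not say which entity plays which role.

```python
from dataclasses import dataclass

# Assumed mapping from identifier namespace to entity type (hypothetical).
NAMESPACE_TO_TYPE = {
    "Uniprot": "protein",
    "Gene": "gene",
    "PubChem": "small molecule",
    "CHEBI": "small molecule",
}


@dataclass(frozen=True)
class Entity:
    label: str
    identifier: str  # "namespace:accession", as written on the slides
    role: str        # "intervention" or "observation"

    @property
    def entity_type(self) -> str:
        """Infer the entity type from the identifier namespace."""
        namespace = self.identifier.split(":", 1)[0]
        return NAMESPACE_TO_TYPE.get(namespace, "unknown")


@dataclass
class Panel:
    assay: str
    entities: list

    def by_role(self, role):
        """Return the entities that play the given experimental role."""
        return [e for e in self.entities if e.role == role]


# Illustrative panel: the role assignments and the assay are NOT from the slides.
panel = Panel(
    assay="unspecified assay",
    entities=[
        Entity("Phleo", "PubChem:72511", "intervention"),
        Entity("H2AX", "Uniprot:P27661", "observation"),
    ],
)
```

With entities tagged this way, a data-oriented query such as "panels where a given compound is the intervention and a given protein is observed" reduces to filtering on `role` and `identifier`.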
  • Slide 29
  • Curation tool for data editors
  • Slide 30
  • Slide 31
  • Slide 32
  • Data workflow (flowchart). Steps: check data integrity; check data presentation; check plagiarism; check expanded view files; SourceData curation. Decision points: OK? Major issues? Positive decision? Outcomes: query author (author response); REJECT; ACCEPT.
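The checks in this workflow can be sketched as a simple triage function. The branching shown here is an assumption: the transcript preserves the labels of the flowchart but not its arrows, so the order of the checks, the rule that a failed check leads to an author query, and the function and variable names are all hypothetical.

```python
# Check names are taken from the slide; their order here is assumed.
CHECKS = [
    "data integrity",
    "data presentation",
    "plagiarism",
    "expanded view files",
]


def triage(results, major_issues=False):
    """Return (outcome, failed_checks) for a manuscript.

    `results` maps check name -> True (OK) or False (not OK).
    Assumed logic: major issues lead to rejection, any failed check
    leads to an author query, otherwise the paper moves on to curation.
    """
    if major_issues:
        return ("REJECT", [])
    failed = [check for check in CHECKS if not results.get(check, False)]
    if failed:
        return ("QUERY AUTHOR", failed)
    return ("SOURCEDATA CURATION", [])
```

For example, `triage({name: True for name in CHECKS})` passes every check and hands the manuscript to curation, while a missing or failed check is reported back so it can be raised with the author.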
  • Slide 33
  • Validation by authors
  • Slide 34
  • Application: Smart Figures. Use cases: understand; re-analyze; credit attribution; directed search; contextualization; data mining; browsing. Issues: complexity; unstructured; no metadata standard; source data not available. Smart Figures: panels as coherent units; descriptive metadata; standard identifiers; source data files; visual summarization; data-oriented queries; actionable data viewer.
  • Slide 35
  • Data-oriented search: Paper 1, Paper 2, data viewer
  • Slide 36
  • Data integration. Paper 1: drug Z, kinase Y activity. Paper 2: kinase Y, protein X (P). Paper 3: gene x, tissue T, disease D. Resulting hypothesis: test drug Z in disease D.
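The data integration example can be sketched as a path search over links extracted from the three papers. The direction of each link, the merging of gene x and protein X into a single node, and the function name are assumptions for illustration; the slide gives only the entities and the resulting hypothesis.

```python
from collections import deque

# (source, target, paper) links; directions are assumed, and gene x and
# protein X are treated as one node for simplicity.
LINKS = [
    ("drug Z", "kinase Y", "Paper 1"),
    ("kinase Y", "protein X", "Paper 2"),
    ("protein X", "disease D", "Paper 3"),
]


def find_chain(links, start, goal):
    """Breadth-first search for a chain of links from `start` to `goal`.

    Returns the list of (source, target, paper) links along the shortest
    chain, or None if the two entities are not connected.
    """
    graph = {}
    for source, target, paper in links:
        graph.setdefault(source, []).append((target, paper))
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for target, paper in graph.get(node, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, path + [(node, target, paper)]))
    return None
```

`find_chain(LINKS, "drug Z", "disease D")` walks through Paper 1, Paper 2 and Paper 3 in turn, which is the chain of evidence behind the hypothesis "test drug Z in disease D". No single paper contains that link; it only appears once the figure-level data are connected through shared identifiers.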
  • Slide 37
  • Centralized vs distributed infrastructure: research data, databases, journals, users
  • Slide 38
  • Next Gen Open Access Search
  • Slide 39
  • Slide 40
  • search
  • Slide 41
  • What is a paper? Title; abstract; synopsis; main paper; supplementary information; datasets & code.
  • Slide 42
  • [Figure: labels Smad3, Hey1, TGFbeta, VE-cdh, Rad51 foci, AR, Tsc2, nuclear complexes; numeric annotations not recoverable]