The Chemtools LaBLog: Recording research in the real world. Cameron Neylon. Contributions from Jeremy Frey, Andrew Milsted, Steve Wilson, Simon Coles, Mark Borkum, Jenny Hale, and others.

DESCRIPTION

Presentation on the Chemtools LaBLog electronic notebook system and how I think it could fit into a larger ecosystem of tools and services.

Citation preview

1. The Chemtools LaBLog: Recording research in the real world. Cameron Neylon. Contributions from Jeremy Frey, Andrew Milsted, Steve Wilson, Simon Coles, Mark Borkum, Jenny Hale, and others.

2. Goals

  • A complete and useable record for the researcher and research team
  • Enable a human reader to fully reproduce all experiments and replicate all data analysis in detail
  • New functionality (video, search, communication, links, visualisation)
  • Enable machine reading for automated aggregation and analysis

3. A small challenge

Can anyone name or identify a paper in which it is possible to completely and precisely replicate the data analysis, including availability of raw data; full details of the tools, versions, and parameters used for data analysis; and the version (or date) of any databases used in the analysis?

4. A blog as the lab book

  • http://chemtools.chem.soton.ac.uk/projects/blog/
  • Bio Blogs
  • http://blogs.openwetware.org/scienceintheopen
  • Discussion

5. One item one post (1I-1P) system

6. 1I-1P gives every sample a URI

7. 1I-1P relationships between posts

  • An RDF dump of posts and the links between them, rendered using Welkin (simile.mit.edu/welkin)

8. 1I-1P relationships between posts

9. What about semantics?

  • System is semantically unaware
  • Arbitrary key-value pairs stored as XML
  • Complete freedom to add or modify metadata
  • Complete freedom to muck it up
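The "arbitrary key-value pairs stored as XML" idea can be illustrated with a minimal sketch. The element names (`metadata`, `item`) and the layout are assumptions made for illustration; the slides do not show the XML schema LaBLog actually uses. The example also shows how easily near-duplicate keys slip in when nothing constrains the vocabulary.

```python
import xml.etree.ElementTree as ET


def metadata_to_xml(pairs):
    """Serialise arbitrary key-value pairs as an XML fragment (hypothetical layout)."""
    root = ET.Element("metadata")
    for key, value in pairs.items():
        item = ET.SubElement(root, "item", {"key": key})
        item.text = value
    return ET.tostring(root, encoding="unicode")


def xml_to_metadata(xml_text):
    """Read the key-value pairs back out of the fragment."""
    root = ET.fromstring(xml_text)
    return {item.get("key"): item.text for item in root.findall("item")}


# Nothing stops two spellings of what is probably the same key.
pairs = {
    "Section": "Procedure",
    "Procedure_Type": "electrophoresis_agarose",
    "procedure type": "gel",
}
xml_text = metadata_to_xml(pairs)
round_trip = xml_to_metadata(xml_text)
```

The store accepts both `Procedure_Type` and `procedure type` without complaint, which is the flexibility and the risk the slide describes.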

10. Templates provide ease of use and consistent metadata

  [table]
  [row] Lane[col]Sample[col]ul [/row]
  [row] 4[col] [[Dna:%]] [col] [[box]] [/row]
  [/table]
  [[Section>Procedure]]
  [[Procedure_Type>electrophoresis_agarose]]
  [[Sandpit_group>DrexelDemo]]

11. System to date

  • Our main laboratory notebook system
  • Around 4000 posts, 800 Gb of data
  • Used for biochemistry, synthetic chemistry, biophysics
  • Also used as a collaboration and management tool in other projects
  • Currently rolling out onto other sites
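The template on slide 10 appears to carry metadata as `[[key>value]]` tags. The sketch below shows one way such tags could be pulled out into a dictionary. Reading `[[key>value]]` as a key-value pair is an interpretation of the slide, and the regular expression and function name are hypothetical; the real LaBLog parser is not shown in the deck.

```python
import re

# Matches [[key>value]] but not placeholders such as [[Dna:%]] or [[box]].
TAG = re.compile(r"\[\[([^\]>]+)>([^\]]+)\]\]")


def extract_metadata(template_text):
    """Return the key-value pairs found in a template body."""
    return {key.strip(): value.strip() for key, value in TAG.findall(template_text)}


template = """[row] 4[col] [[Dna:%]] [col] [[box]] [/row]
[[Section>Procedure]]
[[Procedure_Type>electrophoresis_agarose]]
[[Sandpit_group>DrexelDemo]]"""

metadata = extract_metadata(template)
```

Because every post created from the same template carries the same keys, the metadata stays consistent without the user typing it by hand.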

12. Goals

  • A complete and useable record for the researcher and research team
  • Enable a human reader to fully reproduce all experiments and replicate all data analysis in detail
  • New functionality (video, search, communication, links, visualisation)
  • Enable machine reading for automated aggregation and analysis

13. Versioning and provenance for analysis using workflows and API

  • Workflow enacted online (MyExperiment)
  • Pull down data from lab book and process
  • Write results and record back to blog
  • Provenance of workflow, versioning, and sharing via MyExp
  • Record of enactment in LaBLog
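The pull, process, and write-back loop described above can be sketched as follows. `InMemoryBlog` is a hypothetical stand-in, not the LaBLog or MyExperiment API, and the "workflow" is just an average; the point is only that the result post records which source post and which workflow version produced it.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Post:
    post_id: int
    title: str
    data: list
    links: list = field(default_factory=list)


class InMemoryBlog:
    """Hypothetical stand-in for a lab-book API; not the real LaBLog interface."""

    def __init__(self):
        self.posts = {}

    def add(self, title, data, links=()):
        post_id = len(self.posts) + 1
        self.posts[post_id] = Post(post_id, title, list(data), list(links))
        return post_id

    def get(self, post_id):
        return self.posts[post_id]


def run_workflow(blog, source_id, workflow_version):
    """Pull data from one post, process it, and write the result back as a new post."""
    source = blog.get(source_id)
    result = mean(source.data)
    title = f"Mean of post {source_id} (workflow v{workflow_version})"
    # The link back to the source post is the provenance record.
    return blog.add(title, [result], links=[source_id])


blog = InMemoryBlog()
raw_id = blog.add("Raw absorbance readings", [0.50, 0.75, 1.00])  # illustrative values
result_id = run_workflow(blog, raw_id, workflow_version="1.0")
result_post = blog.get(result_id)
```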

14. Automatic Blogging by Machines

15. Automatic Blogging by Sensors

  • Continuous log of environmental conditions in a laboratory
  • Instant detection of erroneous events
  • Correlate with inconsistencies in datasets
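A minimal sketch of how a sensor log might be checked and correlated with experiments, assuming a simple threshold rule. The readings, thresholds, and experiment names are invented for illustration; the deck does not describe the detection method actually used.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    minute: int           # minutes since the log started
    temperature_c: float


def find_excursions(readings, low, high):
    """Return the readings that fall outside the allowed temperature range."""
    return [r for r in readings if not (low <= r.temperature_c <= high)]


def experiments_affected(excursions, experiments):
    """experiments maps a name to its (start_minute, end_minute) window."""
    return sorted(
        name
        for name, (start, end) in experiments.items()
        if any(start <= r.minute <= end for r in excursions)
    )


# Illustrative data only.
log = [Reading(0, 20.1), Reading(10, 20.4), Reading(20, 27.9), Reading(30, 20.2)]
excursions = find_excursions(log, low=18.0, high=25.0)
affected = experiments_affected(
    excursions, {"gel_run_A": (0, 15), "gel_run_B": (15, 35)}
)
```

An excursion found this way could be posted automatically and linked to the posts for the experiments running at the time, so that an odd dataset can be traced back to the conditions in the room.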

16. Goals

  • A complete and useable record for the researcher and research team
  • Enable a human reader to fully reproduce all experiments and verify all data analysis in detail
  • New functionality (video, search, communication, links, visualisation)
  • Enable machine reading for automated aggregation and analysis

17. Visualisations and communication

18.

19.

20. Pictorial commenting

  • Annotation tools allow comments and foster collaboration and / or communication
  • Need for more advanced Blog tools / technology around data

21. Goals

  • A complete and useable record for the researcher and research team
  • Enable a human reader to fully reproduce all experiments and verify all data analysis in detail
  • New functionality (video, search, communication, links, visualisation)
  • Enable machine reading for automated aggregation and analysis

22. RDF to real RDF?

  • Currently just links and post titles
  • Include metadata
  • Infer a vocabulary (probably human driven process)
  • Refactor to generate a rich RDF version
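One possible shape for a richer RDF export is sketched below: each post becomes a subject with a title, its links, and its metadata written out as N-Triples. The base URI, the `linksTo` predicate, and the use of metadata keys as predicate names are all hypothetical; only the Dublin Core `title` term is an existing vocabulary item. Keys containing spaces or other special characters would need encoding before they could be used this way.

```python
BASE = "http://example.org/lablog/"          # hypothetical base URI
VOCAB = "http://example.org/lablog/vocab#"   # hypothetical vocabulary namespace
DC_TITLE = "http://purl.org/dc/terms/title"


def escape_literal(text):
    """Escape backslashes and double quotes for an N-Triples string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def post_to_ntriples(post):
    """Turn one post (a dict with id, title, links, metadata) into N-Triples lines."""
    subject = f"<{BASE}post/{post['id']}>"
    title = escape_literal(post["title"])
    lines = [f'{subject} <{DC_TITLE}> "{title}" .']
    for target in post["links"]:
        lines.append(f"{subject} <{VOCAB}linksTo> <{BASE}post/{target}> .")
    for key, value in post["metadata"].items():
        lines.append(f'{subject} <{VOCAB}{key}> "{escape_literal(value)}" .')
    return lines


triples = post_to_ntriples(
    {
        "id": 42,
        "title": 'Agarose gel "A"',
        "links": [7],
        "metadata": {"Procedure_Type": "electrophoresis_agarose"},
    }
)
```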

23. Linking it all up

  • Unstructured / Unfiltered / Arbitrary vocabulary
  • Structured / Filtered / Controlled vocab
  • Primary lab book
  • Autoblogging instrument
  • Published paper
  • Database entry
  • Personal journal
  • Raw data
  • Data processing

24. What could it look like?

  • GO Ontology Browser
  • Raw SANS Data - D22 run #29483 from D22 at the Institut Laue-Langevin

25.