Presentation on the Chemtools LaBLog electronic notebook system and how I think it could fit into a larger ecosystem of tools and services.
1. The Chemtools LaBLog: Recording research in the real world
Cameron Neylon. Contributions from Jeremy Frey, Andrew Milsted,
Steve Wilson, Simon Coles, Mark Borkum, Jenny Hale, and others
2. Goals
- A complete and useable record for the researcher and research
team
- Enable a human reader to fully reproduce all experiments and
replicate all data analysis in detail
- New functionality (video, search, communication, links,
visualisation)
- Enable machine reading for automated aggregation and
analysis
3. A small challenge
Can anyone name or identify a paper in which it is possible to
completely and precisely replicate the data analysis, including
availability of raw data, full details of tools, version, and
parameters for data analysis, and version (or date) of any
databases used in the analysis?
4. A blog as the lab book
http://chemtools.chem.soton.ac.uk/projects/blog / Bio Blogs
http://blogs.openwetware.org/scienceintheopen Discussion
5. One item one post (1I-1P) system
6. 1I-1P gives every sample a URI
7. 1I-1P relationships between posts
An RDF dump of posts and links between them rendered using Welkin
(simile.mit.edu/welkin)
8. 1I-1P relationships between posts
9. What about semantics?
- System is semantically unaware
- Arbitrary key-value pairs stored as XML
- Complete freedom to add or modify metadata
- Complete freedom to muck it up
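A minimal sketch of what "one item, one post" with arbitrary key-value metadata stored as XML might look like. The base URI, the element names, and the `post_to_xml` helper are assumptions made for illustration; they are not the actual LaBLog schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical base address, not the real LaBLog URL scheme.
BASE_URI = "http://example.org/lablog/post/"


def post_uri(post_id):
    """One item, one post: every item gets its own address."""
    return f"{BASE_URI}{post_id}"


def post_to_xml(post_id, title, metadata, links=()):
    """Serialise a post, its free-form metadata and its links as XML."""
    post = ET.Element("post", {"uri": post_uri(post_id)})
    ET.SubElement(post, "title").text = title
    meta = ET.SubElement(post, "metadata")
    for key, value in metadata.items():
        # No vocabulary is enforced: any key is accepted, typos included.
        ET.SubElement(meta, "item", {"key": key}).text = str(value)
    for target in links:
        # A link is just a pointer to another post's URI.
        ET.SubElement(post, "link", {"href": post_uri(target)})
    return ET.tostring(post, encoding="unicode")


xml_text = post_to_xml(
    101,
    "Agarose gel of sample 42",
    {"Section": "Procedure", "Procedure_Type": "electrophoresis_agarose"},
    links=[42],
)
print(xml_text)
```

Because nothing checks the keys, a misspelt key is stored just as happily as a correct one, which is the "freedom to muck it up" noted above.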
10. Templates provide ease of use and consistent metadata
[table]
[row] Lane[col]Sample[col]ul [/row]
[row] 4[col] [[Dna:%]] [col] [[box]] [/row]
[/table]
[[Section>Procedure]]
[[Procedure_Type>electrophoresis_agarose]]
[[Sandpit_group>DrexelDemo]]
11. System to date
- Our main laboratory notebook system
- Around 4000 posts, 800 GB of data
- Used for biochemistry, synthetic chemistry, biophysics
- Also used as a collaboration and management tool in other
projects
- Currently rolling out onto other sites
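Returning to the template markup on slide 10: a rough sketch of how tags of the form `[[Key>Value]]` could be turned into consistent key-value metadata. The reading of that syntax as "key, then value" and the `extract_metadata` helper are assumptions; the real LaBLog parser may behave differently.

```python
import re

# Assumed syntax: [[Key>Value]] carries one metadata pair.
# Tags without ">" (such as [[box]]) are ignored by this pattern.
TAG = re.compile(r"\[\[([^\]>]+)>([^\]]+)\]\]")


def extract_metadata(template_text):
    """Return the key-value pairs found in [[Key>Value]] tags."""
    return {key.strip(): value.strip() for key, value in TAG.findall(template_text)}


template = """
[[Section>Procedure]]
[[Procedure_Type>electrophoresis_agarose]]
[[Sandpit_group>DrexelDemo]]
"""

metadata = extract_metadata(template)
print(metadata)
```

Since every post created from the same template carries the same keys, the metadata stays consistent without the user typing it in by hand.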
12. Goals
- A complete and useable record for the researcher and research
team
- Enable a human reader to fully reproduce all experiments and
replicate all data analysis in detail
- New functionality (video, search, communication, links,
visualisation)
- Enable machine reading for automated aggregation and
analysis
13. Versioning and provenance for analysis using workflows and
API
- Workflow enacted online (MyExperiment)
- Pull down data from lab book and process
- Write results and record back to blog
- Provenance of workflow, versioning, and sharing via MyExp
- Record of enactment in LaBLog
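The pull, process, write-back loop above can be sketched as follows. `LabBook` is an in-memory stand-in invented for this example; it is not the LaBLog or MyExperiment API, and the workflow name, version string, and record fields are likewise hypothetical.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class LabBook:
    """In-memory stand-in for the lab book; not the real LaBLog API."""
    posts: dict = field(default_factory=dict)

    def read(self, post_id):
        return self.posts[post_id]

    def write(self, content):
        post_id = len(self.posts) + 1
        self.posts[post_id] = content
        return post_id


def run_workflow(book, input_id, workflow_name, workflow_version, func):
    """Pull data from the lab book, process it, write back result and record."""
    data = book.read(input_id)["data"]
    result = func(data)
    result_id = book.write({"data": result})
    record = {
        "workflow": workflow_name,
        "version": workflow_version,
        "input_post": input_id,
        "output_post": result_id,
        # Fingerprint of the input so the enactment can be checked later.
        "input_sha256": hashlib.sha256(repr(data).encode()).hexdigest(),
        "enacted_at": datetime.now(timezone.utc).isoformat(),
    }
    record_id = book.write({"enactment": record})
    return result_id, record_id


book = LabBook()
raw_id = book.write({"data": [1.0, 2.0, 3.0, 6.0]})
result_id, record_id = run_workflow(
    book, raw_id, "mean_intensity", "1.0", lambda xs: sum(xs) / len(xs)
)
print(book.read(result_id))
print(book.read(record_id)["enactment"])
```

Keeping the enactment record as its own post means the result and the account of how it was produced each have an address that other posts can link to.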
14. Automatic Blogging by Machines
15. Automatic Blogging by Sensors
- Continuous log of environmental conditions in a laboratory
- Instant detection of erroneous events
- Correlate with inconsistencies in datasets
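One way the sensor points above might be sketched: log every reading, flag those outside an allowed range, and list the datasets recorded at a flagged time. The temperature limits, timestamps, and dataset names are all invented for illustration, and matching on identical timestamps is a deliberate simplification.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    timestamp: str
    temperature_c: float


def log_lines(readings, low=18.0, high=25.0):
    """Return one log line per reading, marking values outside [low, high]."""
    lines = []
    for r in readings:
        status = "ok" if low <= r.temperature_c <= high else "OUT OF RANGE"
        lines.append(f"{r.timestamp} temperature={r.temperature_c:.1f}C {status}")
    return lines


def flagged_times(readings, low=18.0, high=25.0):
    """Timestamps of readings outside the allowed range."""
    return [r.timestamp for r in readings if not low <= r.temperature_c <= high]


def suspect_datasets(datasets, flagged):
    """Datasets recorded at a time when the sensor reported a problem."""
    bad = set(flagged)
    return sorted(name for name, when in datasets.items() if when in bad)


# Made-up readings and datasets for the example.
readings = [
    Reading("2009-01-01T10:00", 21.5),
    Reading("2009-01-01T10:05", 29.0),
    Reading("2009-01-01T10:10", 22.0),
]
datasets = {"dataset_a": "2009-01-01T10:00", "dataset_b": "2009-01-01T10:05"}

flagged = flagged_times(readings)
suspects = suspect_datasets(datasets, flagged)
for line in log_lines(readings):
    print(line)
print(suspects)
```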
16. Goals
- A complete and useable record for the researcher and research
team
- Enable a human reader to fully reproduce all experiments and
verify all data analysis in detail
- New functionality (video, search, communication, links,
visualisation)
- Enable machine reading for automated aggregation and
analysis
17. Visualisations and communication
18.
19.
20. Pictorial commenting
- Annotation tools allow comments and foster collaboration and/or
communication
- Need for more advanced Blog tools / technology around data
21. Goals
- A complete and useable record for the researcher and research
team
- Enable a human reader to fully reproduce all experiments and
verify all data analysis in detail
- New functionality (video, search, communication, links,
visualisation)
- Enable machine reading for automated aggregation and
analysis
22. RDF to real RDF?
- Currently just links and post titles
- Infer a vocabulary (probably a human-driven process)
- Refactor to generate a rich RDF version
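A sketch of the current "links and post titles" level of RDF, written out as N-Triples with plain string formatting. The base URI is hypothetical, and using the Dublin Core terms `title` and `references` as predicates is an assumption for this example, not the vocabulary the LaBLog dump actually uses.

```python
DCTERMS_TITLE = "http://purl.org/dc/terms/title"
DCTERMS_REFERENCES = "http://purl.org/dc/terms/references"
# Hypothetical base address, not the real LaBLog URL scheme.
BASE_URI = "http://example.org/lablog/post/"


def escape_literal(text):
    """Escape backslashes, quotes and newlines for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def posts_to_ntriples(posts):
    """Emit one title triple per post and one triple per link between posts."""
    lines = []
    for post_id, post in sorted(posts.items()):
        subject = f"<{BASE_URI}{post_id}>"
        title = escape_literal(post["title"])
        lines.append(f'{subject} <{DCTERMS_TITLE}> "{title}" .')
        for target in post.get("links", []):
            lines.append(f"{subject} <{DCTERMS_REFERENCES}> <{BASE_URI}{target}> .")
    return "\n".join(lines)


# Made-up posts for the example.
posts = {
    1: {"title": "DNA sample 42", "links": []},
    2: {"title": 'Gel of "sample 42"', "links": [1]},
}

ntriples = posts_to_ntriples(posts)
print(ntriples)
```

A richer version would replace the generic link predicate with terms from the inferred vocabulary, so that a link could say "is a product of" or "was analysed by" rather than only "references".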
23. Linking it all up
[Diagram labels: Unstructured, Unfiltered, Arbitrary vocabulary;
Structured, Filtered, Controlled vocab; Primary lab book;
Autoblogging instrument; Published paper; Database entry; Personal
journal; Raw data; Data processing]
24. What could it look like?
[Screenshot labels: GO Ontology Browser; Raw SANS Data - D22 run
#29483 from D22 at the Institut Laue-Langevin]
25.