Sharing the knowledge of electrophysiology data
Phillip Lord, Frank Gibson and the CARMEN Consortium
“In the standard model, one collects data, publishes a paper or papers and then gradually loses the original dataset.”
THE NEW KNOWLEDGE ECONOMY AND SCIENCE AND TECHNOLOGY POLICY Geoffrey Bowker, University of California, San Diego
The need for clear metadata
• Most neuroscience data is relatively simple in structure
• But often contextually complex
• Sometimes associated with behavioural features
Neuroscience spike data
• The raw data is normally a waveform
• But, advances in instrumentation
• High-throughput methods
• But what is the experiment for?
• What stimulus is the organism/tissue receiving?
• Which channel is which?
• The data sets being produced are (reasonably) large (tens of Gb, or 1 Tb in three months)
Information Extraction
• How do we extract the information?
http://en.wikipedia.org/wiki/Image:Brain_090407.jpg
http://en.wikipedia.org/wiki/Image:ATTtelephone-large.jpg
istockphoto.com
Multi-Author data
Author PMID Type Size
1 Davierwala et al 16155567 Synthetic_Lethality 627
2 Krogan et al 14759368 Affinity_Capture-MS 164
3 Hazbun et al 14690591 Affinity_Capture-MS 3210
4 Gavin et al 11805826 Affinity_Capture-MS 3596
5 Ho et al 11805837 Affinity_Capture-MS 733
6 Ito et al 11283351 Two-hybrid 275
From Katherine James, NCL
Author PMID Type Size
1 Davierwala et al 16155567 Synthetic_Lethality 627
2 Krogan et al 14759368 Affinity_Capture-MS 164
3 Hazbun et al 14690591 Affinity_Capture-MS 3210
4 Gavin et al 11805826 Affinity_Capture-MS 3596
5 Ho et al 11805837 Affinity_Capture-MS 733
6 Ito et al 11283351 Two-hybrid 275
7 Tong et al 11743205 Synthetic_Lethality 3411
8 Tong et al 14764870 Synthetic_Lethality 823
9 Uetz et al 10688190 Two-hybrid 1941
10 Miller et al 16093310 Two-hybrid 104
11 Lindstrom et al 12556496 Affinity_Capture-MS 134
12 Nissan et al 12374754 Affinity_Capture-MS 456
13 Grandi et al 12150911 Affinity_Capture-MS 150
14 Ohi et al 11884590 Affinity_Capture-MS 630
15 Krogan et al 14690608 Affinity_Capture-MS 370
16 Sanders et al 12052880 Affinity_Capture-MS 102
17 Baetz et al 14729968 Affinity_Capture-MS 258
18 Fromont-Racine et al 10900456 Two-hybrid 160
19 Fromont-Racine et al 9207794 Two-hybrid 182
20 Drees et al 11489916 Two-hybrid 232
21 Tong et al 11743162 Affinity_Capture-Western 125
22 Allen et al 11387327 Affinity_Capture-MS 116
23 Panse et al 15292183 Affinity_Capture-MS 181
24 Krogan et al 15353583 Affinity_Capture-MS 113
25 Kong et al 15563457 Protein-peptide 157
26 Hannich et al 15590687 Two-hybrid 134
27 Newman et al 11087867 Two-hybrid 464
28 Zhao et al 15766533 Reconstituted_Complex 125
29 Millson et al 15879519 Affinity_Capture-Western 369
30 Ubersax et al 14574415 Biochemical_Activity 138
31 Ingvarsdottir et al 15657441 Affinity_Capture-Western 175
32 Lesage et al 15166135 Synthetic_Lethality 292
33 Lesage et al 15715908 Synthetic_Lethality 323
34 Pan et al 15525520 Synthetic_Lethality 124
35 Loeillet et al 15725626 Synthetic_Lethality 214
36 Daniel et al 16157669 Synthetic_Lethality 4535
37 Pan et al 16487579 Synthetic_Growth_Defect 7076
38 Krogan et al 16554755 Affinity_Capture-MS 6531
39 Gavin et al 16429126 Affinity_Capture-MS 107
40 Milgrom et al 16118188 Synthetic_Lethality 215
41 Measday et al 16172405 Two-hybrid 477
42 Graumann et al 14660704 Affinity_Capture-MS 4179
43 Ptacek et al 16319894 Biochemical_Activity 103
44 Frazier et al 16476776 Co-fractionation 290
45 Ye et al 16729061 Synthetic_Rescue 3416
46 Schuldiner et al 16269340 Phenotypic_Enhancement 14421
47 Collins et al 17314980 Phenotypic_Enhancement 9064
48 Collins et al 17200106 Affinity_Capture-MS 117
49 Aronova et al 17507646 Co-fractionation 576
50 Wong et al 17634282 Two-hybrid 234
How do we represent…
Laboratory Experiments
In silico Analysis
Derived data
View from microarrays
Content Standard – Minimal Information
MAGE -- Structure
MO -- Terminology
From the MGED society
The CARMEN approach
Content Standard – Minimal Information about a Neuroscience Investigation
FuGE -- Structure
OBI -- Ontology for Biomedical Investigations
Minimal Information About a Neuroscience Investigation: What do I have to tell you, for you to understand what I did?
• Subdivided as:
– Contact and context
– Study subject
– Recording location
– Task
– Stimulus
– Behavioural event
– Recording
– Time series data
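The subdivisions above can be read as a checklist. A minimal sketch, assuming a dataset description is held as a plain dictionary keyed by section name: the section names come from the slide, while the dictionary layout, the `missing_sections` helper and the example values are hypothetical and not part of MINI itself. Treating every section as required is also an assumption made for illustration.

```python
# Section names follow the slide; the record layout is a hypothetical choice.
MINI_SECTIONS = (
    "contact_and_context",
    "study_subject",
    "recording_location",
    "task",
    "stimulus",
    "behavioural_event",
    "recording",
    "time_series_data",
)


def missing_sections(record):
    """Return the checklist sections that are absent or empty in `record`."""
    return [name for name in MINI_SECTIONS if not record.get(name)]


# Invented example values, for illustration only.
example = {
    "contact_and_context": "A. Researcher, example lab",
    "study_subject": "rat",
    "recording": "multi-electrode array",
}

print(missing_sections(example))
```

A report like this answers the question on the slide from the submitter's side: which parts of "what I did" have not yet been told.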
Study inputs
Assay inputs
Assay procedures
Data
MINI in relation to the life-science reporting requirements
[Table: reporting-requirement specialisations (columns) against data analysis activities (rows); individual cell marks are not recoverable]
Columns (SPECIALISATION): CIMR [†], MIACA, MIAME, MIAME/Env, MIAME/Nutr, MIAME/Plant, MIAME/Tox, MIAPA, MIAPE [†], MIARE, MIFlowCyt, MIGen, MIGS/MIMS, MIMIx, MIMPP, MINI, MIQAS, MIqPCR, MIRIAM, MISFISHIE, STRENDA, Row totals
Rows (with row totals): generic data analysis 11; clustering 2; image characterisation 7; population genetic analysis 2; gene function assignment 11; genetic association/mapping 2; relative/absolute quantitation 8; sequence assembly 1
MIAOWS
• Describe essential metadata for your analysis code
– What does it do? (Objective)
– What type of input does it need?
– What type of output does it produce?
– If this information is not described, your code is of most value to yourself and much less value to the community
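The three questions above map naturally onto a small record. The following is a sketch only: the `ServiceDescription` class, its field names and the `undescribed` helper are hypothetical illustrations and are not taken from MIAOWS.

```python
from dataclasses import dataclass, fields


@dataclass
class ServiceDescription:
    """Hypothetical record answering the three questions on the slide."""
    objective: str    # what does the code do?
    input_type: str   # what type of input does it need?
    output_type: str  # what type of output does it produce?


def undescribed(desc):
    """Return the names of fields that were left blank."""
    return [f.name for f in fields(desc) if not getattr(desc, f.name).strip()]


# Invented example: a spike detection service with no output description yet.
spike_detector = ServiceDescription(
    objective="Detect spikes in a raw waveform",
    input_type="time series (waveform)",
    output_type="",
)

print(undescribed(spike_detector))
```

Keeping the description next to the code makes it cheap to check before sharing, which is the point of the slide: undescribed code is mostly useful to its author.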
The CARMEN approach
Content Standard – Minimal Information about a Neuroscience Investigation
FuGE -- Structure
OBI -- Ontology for Biomedical Investigations
Functional Genomics Experiment (FuGE)
How do I tell you, for you to understand?
• Model of common components in science investigations, such as materials, data, protocols, equipment and software.
• Provides a framework for capturing complete laboratory workflows, enabling the integration of pre-existing data formats.
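To make the idea of "common components" and "complete workflows" concrete, here is a minimal sketch. It is not the FuGE object model or its API: the `Protocol` and `Step` classes, the `workflow_is_connected` helper and all example names are hypothetical, chosen only to show protocols (with their equipment and software) being applied to materials and data in sequence.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Protocol:
    """A named procedure plus the equipment and software it relies on."""
    name: str
    equipment: List[str] = field(default_factory=list)
    software: List[str] = field(default_factory=list)


@dataclass
class Step:
    """One application of a protocol: materials or data in, data out."""
    protocol: Protocol
    inputs: List[str]
    outputs: List[str]


def workflow_is_connected(steps):
    """True if every step after the first uses an output of an earlier step."""
    produced = set()
    for i, step in enumerate(steps):
        if i > 0 and not produced.intersection(step.inputs):
            return False
        produced.update(step.outputs)
    return True


# Invented example: a recording followed by an in silico analysis.
record = Step(
    Protocol("extracellular recording", equipment=["multi-electrode array"]),
    inputs=["tissue sample"],
    outputs=["raw_waveform.dat"],
)
analyse = Step(
    Protocol("spike detection", software=["hypothetical-detector"]),
    inputs=["raw_waveform.dat"],
    outputs=["spike_times.csv"],
)

print(workflow_is_connected([record, analyse]))
```

The design choice worth noting is that laboratory and in silico steps share one structure, so derived data can be traced back through the analysis to the original recording.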
Model Driven Architecture -- SyMBA
[Diagram labels: UML, XML, XML, Java objects, database]
FuGE community of users
• MGED (transcriptomics)
• Proteomics Standards Initiative
• Metabolomics Standards Initiative (NMR and sample processing groups)
• Genomics Standards Consortium (MIGS)
• CARMEN, Code Analysis, Repository and Modelling for e-Neuroscience
• Flow Informatics and Computational Cytometry Society
• MIARE: Minimum Information About an RNAi Experiment
The CARMEN approach
Content Standard – Minimal Information about a Neuroscience Investigation
FuGE -- Structure
OBI -- Ontology for Biomedical Investigations
OBI – Ontology for Biomedical Investigations
OBI branches: development work
Diverse communities: from Nutrition to Metabolomics, from Environmental to Genomics to Immunology, Imaging and Data analysis
Protocol application branch
Biomaterial branch
Function branch
Data Transformation branch
Role branch
Digital entity branch
Molecular entities
Instrument branch
Adapted from Philippe Rocca-Serra, 2008
Summary
• We are generating metadata “standards” for neurosciences
• We are following a well-trodden path from bioinformatics
• We adopted FuGE and have built MINI
Future Work
• More neuroscience experimental datatypes.
• Minimal Information about a Service
– Describe analysis software as well as lab experiments.
• Outreach!
Acknowledgements
MINI: Frank Gibson, Paul G Overton, Tom V Smulders, Simon R Schultz, Stephen J Eglen, Colin D Ingram, Stefano Panzeri, Phil Bream, Evelyne Sernagor, Mark Cunningham, Christopher Adams, Christoph Echtermeyer, Jennifer Simonotto, Marcus Kaiser, Daniel C Swan, Martyn Fletcher, Phillip Lord
CISBAN: Anil Wipat (PI), Allyson Lister (Research Associate)
FuGE: The FuGE consortium
OBI: The OBI consortium
CARMEN: http://www.carmen.org.uk
SyMBA: http://symba.sourceforge.net
FuGE: http://fuge.sourceforge.net
OBI: http://obi.sourceforge.net
CARMEN Consortium