The European Bioinformatics Institute
MGED ontology for consistent annotation of microarray experiments
Manchester Bioinformatics Week, Ontologies Workshop
March 23-24th 2002
Philippe Rocca-Serra
Microarray Informatics Team
EBI-EMBL, Hinxton Cambridge
ArrayExpress: a database for Gene Expression Studies
[Figure: gene expression data matrix; axes: samples, genes]
ArrayExpress goals
To create a public repository for gene expression data:
apply a standard format
apply curation to the data (high quality control)
easy access to information
search and retrieve information
To compare experiments.
To perform analysis and data mining using complex querying
What kind of data should be stored?
Gene expression data matrix
Experiment (platform, conditions…)
Samples
Genes & transcription units
Annotations
Important issues about data annotation
Sufficient annotation of the experiment, genes and samples
Efficient annotation:
• Machine processable: effective mining agents
• Homogeneous: consistent annotation
• Unambiguous: accurate description, sample discrimination
MIAME requirements: addressing the issue of sufficient annotation
Experimental design: the set of hybridisation experiments as a whole
Array design: each array used and each element (spot) on the array
Samples: samples used, extract preparation and labelling
Hybridisations: procedures and parameters
Measurements: images, quantitation, specifications
Normalisation controls: types, values, specifications
(Brazma et al., Nature Genetics, 2001)
Recorded info should be sufficient to interpret and replicate the experiment
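As a rough illustration of what "sufficient annotation" could mean in practice, the sketch below checks a submission record for the six MIAME sections listed above. The key names and the sample record are hypothetical and are not taken from MIAME or ArrayExpress; this is only one way such a completeness check might look.

```python
# The six MIAME sections named on the slide.
# The key spellings are illustrative assumptions, not an official schema.
MIAME_SECTIONS = (
    "experimental_design",
    "array_design",
    "samples",
    "hybridisations",
    "measurements",
    "normalisation_controls",
)


def missing_miame_sections(record):
    """Return the MIAME sections that are absent or empty in a record."""
    return [section for section in MIAME_SECTIONS if not record.get(section)]


# Hypothetical, deliberately incomplete submission.
submission = {
    "experimental_design": "set of hybridisations described as a whole",
    "array_design": "hypothetical array, each spot described",
    "samples": "samples used, extract preparation and labelling",
    "hybridisations": "",  # present but empty, so it counts as missing
}

print(missing_miame_sections(submission))
```

A record passes only when every section is present and non-empty; what counts as sufficient detail within each section is a curation question that a check like this cannot settle.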
Second challenge: addressing the issue of annotation efficiency
Requires machine-understandable annotations:
– Avoid free text and natural language
– Avoid synonyms: adrenaline / epinephrine
– General use of CVs and ontologies
Gene annotation using e.g. GO and pathway analysis
Create a new ontology where necessary:
– Task assigned to MGED for Biomaterial (sample) description
One of the main MGED goals: to facilitate the adoption of standards for DNA-array experiment annotation and data representation
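The synonym problem can be sketched as a lookup from free-text terms to a single preferred term in a controlled vocabulary. The vocabulary below is hypothetical, and choosing "epinephrine" as the preferred term is an arbitrary assumption for illustration; the slide only says that the two words are synonyms.

```python
# Hypothetical controlled vocabulary: preferred term -> known synonyms.
# Which term is "preferred" is an assumption made for this example.
CONTROLLED_VOCABULARY = {
    "epinephrine": {"adrenaline"},
}


def build_lookup(vocabulary):
    """Build a case-insensitive map from every accepted spelling to its preferred term."""
    lookup = {}
    for preferred, synonyms in vocabulary.items():
        lookup[preferred.lower()] = preferred
        for synonym in synonyms:
            lookup[synonym.lower()] = preferred
    return lookup


def normalise(term, lookup):
    """Return the preferred term, or raise KeyError for terms outside the vocabulary."""
    key = term.strip().lower()
    if key not in lookup:
        raise KeyError(f"term not in controlled vocabulary: {term!r}")
    return lookup[key]


LOOKUP = build_lookup(CONTROLLED_VOCABULARY)
print(normalise("Adrenaline", LOOKUP))
print(normalise("epinephrine ", LOOKUP))
```

Rejecting unknown terms rather than storing them as free text is the design choice that keeps annotations homogeneous and machine processable.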
ArrayExpress DB is an implementation of the MAGE-OM model (a UML model)
By construction, the MAGE model includes the use of ontology entries:
- 37 locations for an “Ontology Entry”
- 36 cases of simple controlled vocabularies, e.g. Image Format (TIFF, JPEG)
- 1 case required the development of specific modelling: Biomaterial (sample) description
Ontology integration in the object model describing ArrayExpress database
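To make the idea of an ontology entry slot concrete, here is a minimal sketch of how such a slot might be represented. The class and field names are a simplification chosen for illustration and should not be read as the actual MAGE-OM class definitions. The first example is a simple controlled vocabulary (the image formats named above); the second points at an external resource, using the NCBI Taxonomy identifier quoted later in this deck.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OntologyReference:
    """Pointer to a term in an external resource (illustrative fields)."""
    database: str
    accession: str


@dataclass(frozen=True)
class OntologyEntry:
    """A category/value pair, optionally backed by an external reference (illustrative)."""
    category: str
    value: str
    reference: Optional[OntologyReference] = None


# Simple controlled vocabulary: only the values listed on the slide.
IMAGE_FORMATS = {"TIFF", "JPEG"}


def image_format_entry(value):
    """Create an entry for the image format slot, rejecting values outside the CV."""
    if value not in IMAGE_FORMATS:
        raise ValueError(f"unknown image format: {value!r}")
    return OntologyEntry(category="ImageFormat", value=value)


tiff = image_format_entry("TIFF")
organism = OntologyEntry(
    category="Organism",
    value="Mus musculus musculus",
    reference=OntologyReference(database="NCBI Taxonomy", accession="39442"),
)
print(tiff)
print(organism)
```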
MAGE BioMaterial Model
Facts about MGED biomaterial ontology
Authors: developed by Chris Stoeckert (U. Penn) and Helen Parkinson (EBI)
Coordinated with the ArrayExpress database model (mapping available)
Technical choices: use of the OIL language
– A new standard for building ontologies that provides support for formal semantics and reasoning
– Class/property modelling primitives based on frame-based systems
– Semantics capture based on description logics
– Syntax for encoding primitives and semantics based on existing Web languages: XML
Availability: http://mged.sourceforge.net/Ontologies.shtml
MGED ontology: features & complexity
Facts about the ontology:
– 75 classes
– 70 slots
– 98 individuals
– more individuals to be added
Using MGED Ontology: a Browseable Form
MGED defined concepts: internal terms
Linking to external ontologies: an application
MGED Ontology class: instance or external reference
BioMaterialDescription
Biosource Property
Organism: Mus musculus musculus, id: 39442 (external reference: NCBI Taxonomy)
Age: 7 weeks after birth
DevelopmentStage: Stage 28 (external reference: Mouse Anatomical Dictionary)
Sex: Female
StrainOrLine: C57BL/6 (external reference: International Committee on Standardized Genetic Nomenclature for Mice)
BiosourceProvider: Charles River, Japan
OrganismPart: Liver (external reference: Mouse Anatomical Dictionary)
BioMaterialManipulation
EnvironmentalHistory
CultureCondition
Temperature: 22 ± 2 °C
Humidity: 55 ± 5%
Light: 12 hours light/dark cycle
PathogenTests: Specified pathogen free conditions
Water: ad libitum
Nutrients: MF, Oriental Yeast, Tokyo, Japan
Treatment
CompoundBasedTreatment
(Compound): Fenofibrate, CAS 49562-28-9 (external reference: ChemIDplus)
(Treatment_application): in vivo, oral gavage
(Measurement): 100 mg/kg body weight
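The sample description on this slide can be sketched as a list of class/value pairs, some of which carry an external source. The pairing of values with classes below assumes that values appear in the same order as the classes they annotate, which is an interpretation of the slide layout; the `Annotation` type and `split_by_origin` helper are hypothetical names introduced for illustration, and only a subset of the slide's entries is shown.

```python
from typing import NamedTuple, Optional


class Annotation(NamedTuple):
    """One annotated property of a sample (illustrative structure)."""
    ontology_class: str
    value: str
    source: Optional[str] = None  # external resource, if the term is not MGED-internal


# Subset of the slide's example; class/value pairing is assumed from slide order.
SAMPLE = [
    Annotation("Organism", "Mus musculus musculus, id: 39442", "NCBI Taxonomy"),
    Annotation("Age", "7 weeks after birth"),
    Annotation("DevelopmentStage", "Stage 28", "Mouse Anatomical Dictionary"),
    Annotation("Sex", "Female"),
    Annotation(
        "StrainOrLine",
        "C57BL/6",
        "International Committee on Standardized Genetic Nomenclature for Mice",
    ),
    Annotation("OrganismPart", "Liver", "Mouse Anatomical Dictionary"),
    Annotation("Compound", "Fenofibrate, CAS 49562-28-9", "ChemIDplus"),
]


def split_by_origin(annotations):
    """Separate MGED-internal terms from terms that reference an external resource."""
    internal = [a for a in annotations if a.source is None]
    external = [a for a in annotations if a.source is not None]
    return internal, external


internal, external = split_by_origin(SAMPLE)
for annotation in external:
    print(f"{annotation.ontology_class}: {annotation.value} [{annotation.source}]")
```

Keeping the source alongside the value is what lets a curation tool resolve the term against the external resource instead of treating it as free text.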
Referencing external ontologies
NCBI taxonomy database
Jackson Lab mouse strains and genes
Edinburgh mouse atlas anatomy
GO (Gene Ontology)
HUGO nomenclature for human genes
Chemical and compound ontologies: Merck Index
TAIR
FlyBase
… and many more: www.mged.org/ontology/
Planning MGED ontology’s future
Ontology availability made simple?
[Diagram labels: ArrayExpress DB; Direct submission in MAGE-ML; Large centres' LIMS; Submission via MIAMExpress; Curation DB; Other submitters; MGED/ArrayExpress ontology; External ontologies]
Planning MGED ontology’s future
Making the ontology available where it’s needed:
– Develop browser or other interface for the ontology and link to LIMS
– Incorporate the ontology into submission/annotation and curation tools (MIAMExpress)
Further ontology development: new instances, class refinement
Better integration of available ontologies
Writing guidelines on how to use ontologies for annotating data:
– Developing use cases (a non-trivial task)
Resources
List of ontology resources from MGED pages
MAGE-MIAME-ontology mappings, MIAME glossary
Schemas for both ArrayExpress and MIAMExpress
Annotation examples in MAGE-ML
URL: www.mged.org | www.ebi.ac.uk/microarray
Mailing lists: microarray-ontol-request@ebi.ac.uk, microarray-annot-request@ebi.ac.uk
Acknowledgements
EBI-EMBL: H. Parkinson, S. Sansone, E. Holloway, A. Brazma
University of Pennsylvania: C. Stoeckert
And the Microarray Informatics Team.