The European Bioinformatics Institute: MGED ontology for consistent annotation of microarray experiments. Manchester Bioinformatics Week, Ontologies Workshop 1, March 23-24th 2002. Philippe Rocca-Serra, Microarray Informatics Team, EBI-EMBL, Hinxton, Cambridge.


MGED ontology for consistent annotation of microarray experiments

Manchester Bioinformatics Week, Ontologies Workshop 1

March 23-24th 2002

Philippe Rocca-Serra

Microarray Informatics Team

EBI-EMBL, Hinxton Cambridge


ArrayExpress: a database for gene expression studies

[Figure: the gene expression data matrix, indexed by genes and samples]


ArrayExpress goals

• To create a public repository for gene expression data that:
– applies a standard format
– applies curation to the data (high-quality control)
– provides easy access to information
– supports searching and retrieving information
• To compare experiments
• To perform analysis and data mining using complex queries


What kind of data should be stored?

[Figure: the gene expression data matrix (genes & transcription units × samples), with annotations attached to the samples, to the genes & transcription units, and to the experiment (platform, conditions…)]
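To make the three kinds of annotation concrete, here is a minimal sketch of a container holding a genes × samples matrix together with experiment, gene and sample annotations. The class name `AnnotatedMatrix`, its fields and all example values are hypothetical illustrations, not the ArrayExpress schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AnnotatedMatrix:
    """Hypothetical container: expression values plus the three kinds of annotation."""

    experiment: dict[str, str]   # experiment-level annotation (platform, conditions...)
    genes: list[str]             # row identifiers (genes & transcription units)
    samples: list[str]           # column identifiers
    values: list[list[float]]    # values[i][j]: measurement for genes[i] in samples[j]
    gene_annotation: dict[str, dict[str, str]] = field(default_factory=dict)
    sample_annotation: dict[str, dict[str, str]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # The matrix must have one row per gene and one column per sample.
        if len(self.values) != len(self.genes) or any(
            len(row) != len(self.samples) for row in self.values
        ):
            raise ValueError("values must be a genes x samples matrix")

    def value(self, gene: str, sample: str) -> float:
        # Look up one measurement by gene and sample identifier.
        return self.values[self.genes.index(gene)][self.samples.index(sample)]


# Hypothetical example data.
matrix = AnnotatedMatrix(
    experiment={"platform": "hypothetical cDNA array", "condition": "treated vs control"},
    genes=["gene_1", "gene_2"],
    samples=["sample_A", "sample_B"],
    values=[[2.1, 1.0], [0.4, 0.5]],
    sample_annotation={"sample_A": {"OrganismPart": "Liver"}},
)
print(matrix.value("gene_1", "sample_A"))  # 2.1
```

Without the annotation dictionaries the numbers alone could not be interpreted or compared across experiments, which is why annotation is stored alongside the matrix here.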


Important issues about data annotation

• Sufficient annotation of the experiment, genes and samples
• Efficient annotation:
– Machine processable: enables effective mining agents
– Homogeneous: consistent annotation
– Unambiguous: accurate description, allowing samples to be discriminated
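A minimal sketch of how a controlled vocabulary supports machine-processable, homogeneous annotation: differently typed entries collapse to one canonical term, and anything outside the vocabulary is rejected rather than stored as free text. The vocabulary `SEX_TERMS` and the function `check_term` are hypothetical; they are not the actual MGED term list or tooling.

```python
from __future__ import annotations

# Hypothetical controlled vocabulary for one category; not the actual MGED term list.
SEX_TERMS = frozenset({"female", "male", "unknown"})


def check_term(value: str, vocabulary: frozenset[str]) -> str:
    """Normalise case and whitespace, then accept only terms from the vocabulary."""
    term = value.strip().lower()
    if term not in vocabulary:
        raise ValueError(f"{value!r} is not an allowed term")
    return term


print(check_term("Female", SEX_TERMS))  # female
```

An ambiguous abbreviation such as "F" raises `ValueError` instead of silently entering the database.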


MIAME requirements: addressing the issue of sufficient annotation

• Experimental design: the set of hybridisation experiments as a whole
• Array design: each array used and each element (spot) on the array
• Samples: samples used, extract preparation and labelling
• Hybridisations: procedures and parameters
• Measurements: images, quantitation, specifications
• Normalisation controls: types, values, specifications

(Brazma et al., Nature Genetics, 2001)

The recorded information should be sufficient to interpret and replicate the experiment.
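The six MIAME sections can be used as a simple completeness checklist. The sketch below only checks that each section is present and non-blank; it is a hypothetical helper (`missing_sections`), and it cannot judge whether the content is actually sufficient to replicate the experiment, which still needs curation.

```python
from __future__ import annotations

# The six MIAME sections listed above.
MIAME_SECTIONS = (
    "experimental design",
    "array design",
    "samples",
    "hybridisations",
    "measurements",
    "normalisation controls",
)


def missing_sections(submission: dict[str, str]) -> list[str]:
    """Return the MIAME sections that are absent or blank in a submission."""
    return [s for s in MIAME_SECTIONS if not submission.get(s, "").strip()]


# Hypothetical, incomplete submission.
submission = {
    "experimental design": "time course, 3 time points",
    "array design": "hypothetical array, 10,000 spots",
    "samples": "",
}
print(missing_sections(submission))
# ['samples', 'hybridisations', 'measurements', 'normalisation controls']
```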


Second challenge: addressing the issue of annotation efficiency

• Requires machine-understandable annotations:
– Avoid free text and natural language
– Avoid synonyms, e.g. adrenaline / epinephrine
– Make general use of controlled vocabularies (CVs) and ontologies, e.g. gene annotation using GO, and pathway analysis
• Create a new ontology where necessary:
– Task assigned to MGED for BioMaterial (sample) description

One of MGED's main goals is to facilitate the adoption of standards for DNA-array experiment annotation and data representation.
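Avoiding synonyms amounts to mapping every accepted spelling onto one preferred term before storage, so that queries for either word find the same samples. A minimal sketch, assuming a hypothetical synonym table; the choice of "epinephrine" as the preferred term is an assumption for illustration only.

```python
# Hypothetical synonym table; choosing "epinephrine" as the preferred term is an assumption.
SYNONYMS = {
    "adrenaline": "epinephrine",
    "epinephrine": "epinephrine",
}


def preferred_term(text: str) -> str:
    """Map a synonym to its single preferred term; reject anything else as free text."""
    key = text.strip().lower()
    if key not in SYNONYMS:
        raise ValueError(f"{text!r} has no controlled term")
    return SYNONYMS[key]


print(preferred_term("Adrenaline"), preferred_term("epinephrine"))  # epinephrine epinephrine
```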


The ArrayExpress DB is an implementation of the MAGE-OM model (a UML model).

By construction, the MAGE model includes the use of ontology entries:

– 37 locations for an "OntologyEntry"
– 36 cases of simple controlled vocabularies, e.g. image format (TIFF, JPEG)
– 1 case that required specific modelling: BioMaterial (sample) description

Ontology integration in the object model describing the ArrayExpress database
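The split between simple controlled vocabularies and richer modelling can be sketched with a small class loosely inspired by MAGE-OM's OntologyEntry, which pairs a category with a value. The field names, the `SIMPLE_CVS` table and the reference string format are simplifications made up for this example, not the exact MAGE-OM definition.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical simple controlled vocabularies, keyed by category.
SIMPLE_CVS: dict[str, frozenset[str]] = {
    "ImageFormat": frozenset({"TIFF", "JPEG"}),
}


@dataclass(frozen=True)
class OntologyEntry:
    category: str                          # e.g. "ImageFormat"
    value: str                             # e.g. "TIFF"
    ontology_reference: str | None = None  # hypothetical pointer into an external ontology

    def __post_init__(self) -> None:
        # Categories with a simple CV are checked against it; other categories
        # (such as BioMaterial description) need richer modelling and are not checked here.
        allowed = SIMPLE_CVS.get(self.category)
        if allowed is not None and self.value not in allowed:
            raise ValueError(f"{self.value!r} is not a valid {self.category}")


image = OntologyEntry("ImageFormat", "TIFF")
organism = OntologyEntry("Organism", "Mus musculus musculus", "NCBI Taxonomy:39442")
```

`OntologyEntry("ImageFormat", "BMP")` raises `ValueError`, while categories without a simple CV are accepted as-is and left to the ontology-based description.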


MAGE BioMaterial Model


Facts about the MGED BioMaterial ontology

• Authors: developed by Chris Stoeckert (U. Penn) and Helen Parkinson (EBI)
• Coordinated with the ArrayExpress database model (mapping available)
• Technical choices: use of the OIL language, a new standard for building ontologies that provides support for formal semantics and reasoning:
– class/property modelling primitives based on frame-based systems
– semantics captured using description logics
– syntax for encoding primitives and semantics based on existing Web languages: XML
• Availability: http://mged.sourceforge.net/Ontologies.shtml
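To illustrate what "class/property modelling primitives based on frame-based systems" means, here is a minimal frame sketch in which a class inherits the slots of its parents. This is not OIL syntax or an OIL reasoner: in OIL, subsumption can be inferred by a description-logic reasoner from class definitions, whereas this sketch only reads it off the declared parents. The slot placement is hypothetical, using class names from the MGED BioMaterial hierarchy.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)
class Frame:
    """A class frame: a name, its declared parent classes and its own slots."""

    name: str
    parents: list[Frame] = field(default_factory=list)
    own_slots: set[str] = field(default_factory=set)

    def all_slots(self) -> set[str]:
        # Slots declared on any ancestor are inherited.
        slots = set(self.own_slots)
        for parent in self.parents:
            slots |= parent.all_slots()
        return slots

    def is_subclass_of(self, other: Frame) -> bool:
        # Read off the declared hierarchy only; no logical inference is done here.
        return self is other or any(p.is_subclass_of(other) for p in self.parents)


# Hypothetical slot placement.
treatment = Frame("Treatment", own_slots={"treatment_application"})
compound_treatment = Frame(
    "CompoundBasedTreatment", parents=[treatment], own_slots={"compound", "measurement"}
)
print(sorted(compound_treatment.all_slots()))
# ['compound', 'measurement', 'treatment_application']
```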


MGED ontology: features & complexity

Facts about the ontology:
– 75 classes
– 70 slots
– 98 individuals
– more individuals to be added


Using MGED Ontology: a Browseable Form


MGED defined concepts: internal terms


Linking to external ontologies: an application


MGED ontology classes with their instances; values taken from MGED ontology external references are followed by the source in parentheses.

BioMaterialDescription > BiosourceProperty
– Organism: Mus musculus musculus, id: 39442 (NCBI Taxonomy)
– Age: 7 weeks after birth
– DevelopmentStage: Stage 28 (Mouse Anatomical Dictionary)
– Sex: Female
– StrainOrLine: C57BL/6 (International Committee on Standardized Genetic Nomenclature for Mice)
– BiosourceProvider: Charles River, Japan
– OrganismPart: Liver (Mouse Anatomical Dictionary)

BioMaterialDescription > BioMaterialManipulation > EnvironmentalHistory > CultureCondition
– Temperature: 22 ± 2 °C
– Humidity: 55 ± 5%
– Light: 12 hours light/dark cycle
– PathogenTests: specified pathogen free conditions
– Water: ad libitum
– Nutrients: MF, Oriental Yeast, Tokyo, Japan

BioMaterialDescription > BioMaterialManipulation > Treatment > CompoundBasedTreatment
– (Compound): Fenofibrate, CAS 49562-28-9 (ChemIDplus)
– (Treatment_application): in vivo, oral gavage
– (Measurement): 100 mg/kg body weight
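The sample annotation above mixes values defined inside the MGED ontology with values drawn from external sources. A minimal sketch of recording both, keeping the external source and accession alongside the value, using the values from the slide; the `Annotation` class and the `external_references` helper are hypothetical names for this illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    category: str                 # MGED ontology class, e.g. "Organism"
    value: str                    # instance recorded for this sample
    source: str | None = None     # external ontology, if the value comes from one
    accession: str | None = None  # identifier within that source, when known


SAMPLE = [
    Annotation("Organism", "Mus musculus musculus", "NCBI Taxonomy", "39442"),
    Annotation("Age", "7 weeks after birth"),
    Annotation("DevelopmentStage", "Stage 28", "Mouse Anatomical Dictionary"),
    Annotation("Sex", "Female"),
    Annotation(
        "StrainOrLine",
        "C57BL/6",
        "International Committee on Standardized Genetic Nomenclature for Mice",
    ),
    Annotation("OrganismPart", "Liver", "Mouse Anatomical Dictionary"),
    Annotation("Compound", "Fenofibrate", "ChemIDplus", "49562-28-9"),  # CAS number
]


def external_references(annotations: list[Annotation]) -> dict[str, list[str]]:
    """Group annotation values by the external ontology they are drawn from."""
    grouped: dict[str, list[str]] = {}
    for a in annotations:
        if a.source is not None:
            grouped.setdefault(a.source, []).append(a.value)
    return grouped


print(external_references(SAMPLE)["Mouse Anatomical Dictionary"])  # ['Stage 28', 'Liver']
```

Keeping the source next to each value means that a query tool could, in principle, resolve the value against the external ontology instead of matching strings.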


Referencing external ontologies

• NCBI Taxonomy database
• Jackson Lab mouse strains and genes
• Edinburgh Mouse Atlas anatomy
• GO (Gene Ontology)
• HUGO nomenclature for human genes
• Chemical and compound ontologies, e.g. the Merck Index
• TAIR
• FlyBase
• …and many more: www.mged.org/ontology/


Planning the MGED ontology’s future

• Making the ontology available where it’s needed:
– Develop a browser or other interface for the ontology and link it to LIMS
– Incorporate the ontology into submission/annotation and curation tools (MIAMExpress)


Planning the MGED ontology’s future

[Figure: routes into the ArrayExpress DB. Large centres' LIMS: direct submission in MAGE-ML. Other submitters: submission via MIAMExpress. Also shown: curation DB, MGED/ArrayExpress ontology, external ontologies.]

Ontology availability made simple?


Planning the MGED ontology’s future

• Making the ontology available where it’s needed:
– Develop a browser or other interface for the ontology and link it to LIMS
– Incorporate the ontology into submission/annotation and curation tools (MIAMExpress)
• Further ontology development: new instances, class refinement
• Better integration of available ontologies
• Writing guidelines on how to use ontologies for annotating data:
– Developing use cases (a non-trivial task)


Resources

• List of ontology resources from the MGED pages
• MAGE-MIAME-ontology mappings, MIAME glossary
• Schemas for both ArrayExpress and MIAMExpress
• Annotation examples in MAGE-ML

URLs: www.mged.org | www.ebi.ac.uk/microarray

Mailing lists: [email protected], [email protected]


Acknowledgements

EBI-EMBL: H. Parkinson, S. Sansone, E. Holloway, A. Brazma and the Microarray Informatics Team

University of Pennsylvania: C. Stoeckert