EMBL Outstation - The European Bioinformatics Institute
MIAME and ArrayExpress: a standard for microarray data annotation and a database to store it
Helen Parkinson
Microarray Informatics Team
European Bioinformatics Institute, Hinxton
Parts of my talk
• Microarray data standards
• Ontologies for gene expression data
• ArrayExpress - a public database for microarray data
• Analysis tools at the EBI
The size of the datasets
• Experiments:
– ~100 000 different transcripts in human
– ~320 cell types
– 2000 compounds
– 3 time points
– 2 concentrations
– 2 replicates
• Data:
– 8 x 10^11 data points
– 1 x 10^15 bytes = 1 petabyte for Affymetrix
(data from Jerry Lanfear)
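As a rough check on the figures above, the data-point count is simply the product of the experimental factors listed. The sketch below is illustrative only: the dictionary keys are names chosen here, and the bytes-per-data-point figure is merely what the two slide numbers imply when divided, not a value stated in the talk.

```python
from math import prod

# Experimental factors as listed on the slide (approximate counts).
factors = {
    "transcripts": 100_000,
    "cell_types": 320,
    "compounds": 2_000,
    "time_points": 3,
    "concentrations": 2,
    "replicates": 2,
}

# One data point per combination of all factors.
data_points = prod(factors.values())
print(f"{data_points:.2e} data points")  # 7.68e+11, i.e. roughly 8 x 10^11

# Assumption: dividing the quoted 10^15 bytes by the data-point count
# gives the storage per data point that the slide implicitly assumes.
implied_bytes_per_point = 1e15 / data_points
print(f"about {implied_bytes_per_point:.0f} bytes per data point")
```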
Microarray data
• Microarrays are widely used in experiments and are already producing massive amounts of data
• These data have to be stored in a well organised and standard way if they are to be accessed and analysed by the wider research community
• There is a general consensus that there is a need for a public repository for microarray data
• It is much less clear what exactly should be stored in such a repository
A gene expression database from the data analyst’s point of view
[Figure: gene expression matrix (axes: genes, samples; cells: gene expression levels) with attached sample annotations and gene annotations]
Three parts of a gene expression database
• Gene annotation – can be given by links to gene sequence databases and GO (function, process, cell compartment) – not perfect, but let's not worry about it
• Sample annotation – we do not have any external databases for sample description (except species taxonomy) – problem 1
• Gene expression matrix – what are the measurement units for gene expression levels? – problem 2
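The three parts can be pictured as one small data structure: a matrix of expression levels plus two annotation tables keyed by gene and by sample. This is a minimal sketch, not the ArrayExpress schema; the class name, field names, identifiers and numeric values are all hypothetical, and the levels are deliberately unitless because the units are exactly what problem 2 leaves open.

```python
from dataclasses import dataclass, field


@dataclass
class ExpressionDataset:
    """Hypothetical container for the three parts of an expression database."""

    genes: list[str]
    samples: list[str]
    # matrix[i][j] is the expression level of genes[i] in samples[j]
    matrix: list[list[float]]
    gene_annotations: dict[str, dict[str, str]] = field(default_factory=dict)
    sample_annotations: dict[str, dict[str, str]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject matrices whose shape does not match the two axes.
        if len(self.matrix) != len(self.genes):
            raise ValueError("matrix needs one row per gene")
        if any(len(row) != len(self.samples) for row in self.matrix):
            raise ValueError("each row needs one value per sample")

    def level(self, gene: str, sample: str) -> float:
        """Look up one cell of the gene expression matrix."""
        return self.matrix[self.genes.index(gene)][self.samples.index(sample)]


# Made-up identifiers and values, for illustration only.
dataset = ExpressionDataset(
    genes=["geneA", "geneB"],
    samples=["sample1", "sample2"],
    matrix=[[1.5, 0.2], [0.9, 0.7]],
    gene_annotations={"geneA": {"link": "placeholder sequence database entry"}},
    sample_annotations={"sample1": {"organism": "Mus musculus"}},
)
print(dataset.level("geneB", "sample2"))
```

Keeping the annotations outside the matrix mirrors the slide's point that gene annotation can be delegated to external databases, while sample annotation has to be supplied with the data.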
Problem/consideration 1 – sample annotation
Gene expression data only have meaning in the context of detailed sample descriptions
If the data is going to be interpreted by independent parties, sample information has to be searchable and in the database
Controlled vocabularies and ontologies (species, cell types, compound nomenclature, treatments, etc) are needed for unambiguous sample description
Sample annotation: what can be done?
• Few CVs and ontologies for sample description are available (species taxonomy, model organisms)
• Some use of free text descriptions is unavoidable (curation workload)
• Existing efforts to create such ontologies should be coordinated (MGED ontology working group)
• Use existing ontologies and CVs wherever possible
Problem 2 – the lack of gene expression measurement units
• What we would like to have:
– gene expression levels expressed in some standard units (e.g. molecules per cell)
– a reliability measure associated with each value (e.g. standard deviation)
• What we have got:
– each experiment using different units
– no reliability information
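To make the "reliability measure" concrete: when replicate measurements exist, each reported level can carry a standard deviation alongside it. The sketch below assumes two replicates per gene, as in the dataset-size slide; the gene names and values are invented, and the units are left arbitrary because no standard unit is available.

```python
from statistics import mean, stdev

# Invented replicate measurements per gene, in arbitrary units.
replicates = {
    "geneA": [2.0, 4.0],
    "geneB": [10.0, 10.5],
}

# Report each gene as (mean level, sample standard deviation).
summary = {
    gene: (mean(values), stdev(values))
    for gene, values in replicates.items()
}

for gene, (level, sd) in summary.items():
    print(f"{gene}: level={level:.2f} sd={sd:.2f}")
```

With only two replicates the standard deviation is a very coarse estimate, which is one reason the talk expects replicate measurements to become more common.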
Comparing expression data
[Figure: comparison of values expressed in different units (labels: cm, inch)]
What to do in the absence of standard measurement units?
• Record raw, intermediate and final analysis data together with a detailed annotation of how the analysis has been performed
• This effectively passes the responsibility for interpreting the final analysis data on to the user
Three levels of microarray data processing
[Figure: three levels of data]
• Raw data: array scans
• Spot quantitations: quantitation matrices (spots x quantitations)
• Gene expression data: gene expression levels (genes x samples)
Measurement units
• In perspective:
– standard controls for experiments (on chips and in the samples) should be introduced
– replicate measurements will become the norm
• Temporary solution:
– storing intermediate analysis results (including the images) and annotations of how they were obtained
– standards within the experiments themselves (standard controls and protocols)
Standards for microarray data
• Standards are needed to build a well organised microarray database:
– Standards for annotation
– Standards for data exchange
– Standards for controls in the experiment and data normalisation
www.dnachip.org/mged/normalization.html
How to create microarray data standards
1. To understand thoroughly what minimum information about a microarray experiment is needed to interpret it unambiguously, and how this information is structured (objects and relationships)
2. To create a technical data format able to capture this information
3. To find appropriate controlled vocabularies
Standardisation of microarray data and annotations: the MGED group
• The goal of the group is to facilitate the adoption of standards for DNA-array experiment annotation and data representation, as well as the introduction of standard experimental controls and data normalisation methods.
• Includes most of the world's largest microarray laboratories and companies (TIGR, Affymetrix, Stanford, Sanger, Agilent, etc.)
www.mged.org
MGED
• MGED 2 meeting in Heidelberg in 2000, MGED 3 in Stanford in 2001, both ~300 participants
• Minimum Information About a Microarray Experiment – MIAME version 1.0 posted
• Collaboration with OMG on data formats: MAML + GEML = MAGE-ML and MAGE-OM
• MGED 4 meeting in February 2002, Boston
• MGED will become an ISCB Special Interest Group
MIAME – Minimum Information About a Microarray Experiment
6 parts of a microarray experiment:
• Experiment
• Array
• Sample
• Hybridisation
• Data
• Normalisation
External links: Publication, Gene (e.g., EMBL), Source (e.g., Taxonomy)
www.mged.org
MIAME Section on Sample Source and Treatment
• sample source and treatment ID as used in section 1
• organism (NCBI taxonomy)
• additional "qualifier, value, source" list; the list includes:
– cell source - provider type (if derived from primary source(s))
– sex
– age
– growth conditions
– development stage
– organism part (tissue)
– animal/plant strain or line
– genetic variation (e.g., gene knockout, transgenic variation)
– individual
– individual genetic characteristics (e.g., disease alleles, polymorphisms)
– disease state or normal
– target cell type
– cell line and source (if applicable)
– in vivo treatments (organism or individual treatments)
– in vitro treatments (cell culture conditions)
– treatment type (e.g., small molecule, heat shock, cold shock, food deprivation)
– compound
– is additional clinical information available (link)
– separation technique (e.g., none, trimming, microdissection, FACS)
• laboratory protocol for sample treatment
• …
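The "qualifier, value, source" list maps naturally onto a list of triplets, where the source names the vocabulary or authority a value is drawn from. The sketch below is one possible representation, not a MIAME or MAGE-ML format; the type name, the dictionary layout and the helper function are hypothetical. The example values are taken from the sample description excerpt shown later in this talk.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    """One 'qualifier, value, source' entry (hypothetical representation)."""

    qualifier: str
    value: str
    source: str  # vocabulary or authority the value is drawn from


# Values from the sample description excerpt later in the talk.
sample = {
    "organism": "Mus musculus",  # NCBI taxonomy
    "annotations": [
        Annotation("sex", "female", "MGED"),
        Annotation("organism part", "thymus", "GXD Mouse Anatomical Dictionary"),
        Annotation("treatment", "in vivo", "MGED"),
    ],
}


def find(annotations: list[Annotation], qualifier: str) -> list[Annotation]:
    """Return every annotation recorded under the given qualifier."""
    return [a for a in annotations if a.qualifier == qualifier]


for entry in find(sample["annotations"], "organism part"):
    print(f"{entry.qualifier}: {entry.value} [{entry.source}]")
```

Recording the source with every value is what lets a later query tell a controlled-vocabulary term from free text.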
What is an ontology?
• An ontology is a specification of concepts that includes the relationships between those concepts.
• Provides semantics and constraints
• Allows for computational inferences and reliable comparisons
MGED Biomaterial Ontology
• Under construction by Chris Stoeckert
– Using OilEd (may use others)
• Motivated by MIAME and coordinated with the database model
• Extend classes, provide constraints, define terms, provide terms to use, develop CVs for submissions (EBI)
Use case scenario: OWG Use Cases
• Return a summary of all experiments that use a specified type of biosource.
– Group the experiments according to treatment.
• Return a summary of all experiments done examining effects of a specified treatment.
– Group the experiments according to biosource.
• Return a summary of all experiments measuring the expression of a specified gene.
– Indicate when experiments confirm results, provide new information, or conflict.
• Generate a distance metric for experiment types
• Generate an error estimation for experimental descriptions
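The first use case (all experiments on a given biosource, grouped by treatment) can be sketched over a flat list of annotated experiments. Everything here is hypothetical: the record layout, the experiment IDs and the function name are invented for illustration and do not reflect an actual ArrayExpress query interface.

```python
from collections import defaultdict

# Invented experiment records with controlled-vocabulary style annotations.
experiments = [
    {"id": "E1", "biosource": "thymus", "treatment": "dexamethasone"},
    {"id": "E2", "biosource": "thymus", "treatment": "heat shock"},
    {"id": "E3", "biosource": "thymus", "treatment": "dexamethasone"},
    {"id": "E4", "biosource": "liver", "treatment": "dexamethasone"},
]


def summarise_by_treatment(records: list[dict], biosource: str) -> dict[str, list[str]]:
    """Select experiments on one biosource and group their IDs by treatment."""
    groups: dict[str, list[str]] = defaultdict(list)
    for record in records:
        if record["biosource"] == biosource:
            groups[record["treatment"]].append(record["id"])
    return dict(groups)


print(summarise_by_treatment(experiments, "thymus"))
```

The exact-match comparison only works if every submitter uses the same term for the same biosource, which is the practical argument for controlled vocabularies over free text.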
Ontology Example
• Concept=Age
– def=in standard units referenced to an identifiable time point from (class) developmental stage
• Age=6 {units=days}, {dev_stage}=dauer
• Hierarchy=Dev_stage->larva->dauer
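The example can be read as a small constraint: an age is valid only if it has standard units and is tied to a term that sits under the developmental-stage class. The sketch below encodes the hierarchy from the slide as child-to-parent links; the set of allowed units, the class name and the helper functions are assumptions made for illustration, not part of the MGED ontology.

```python
from dataclasses import dataclass

# Child -> parent links for the slide's hierarchy: Dev_stage -> larva -> dauer.
PARENT = {"larva": "Dev_stage", "dauer": "larva"}

# Assumption: a hypothetical set of "standard units" for age.
ALLOWED_UNITS = {"hours", "days", "weeks", "years"}


def ancestors(term: str) -> list[str]:
    """Walk up the hierarchy from a term to the root."""
    chain = []
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain


def is_a(term: str, ancestor: str) -> bool:
    """True if term equals ancestor or lies below it in the hierarchy."""
    return term == ancestor or ancestor in ancestors(term)


@dataclass(frozen=True)
class Age:
    value: float
    units: str
    dev_stage: str

    def __post_init__(self) -> None:
        # Constraints taken from the concept definition on the slide.
        if self.units not in ALLOWED_UNITS:
            raise ValueError(f"non-standard units: {self.units}")
        if not is_a(self.dev_stage, "Dev_stage"):
            raise ValueError(f"not a developmental stage: {self.dev_stage}")


age = Age(6, "days", "dauer")
print(age, ancestors(age.dev_stage))
```

Because dauer is known to be a kind of larva, a query for larval samples can also return this one, which is the sort of computational inference an ontology makes possible.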
Excerpts from a Sample Description
courtesy of M. Hoffman, S. Schmidtke, Lion BioSciences
Organism: Mus musculus [NCBI taxonomy browser]
Cell source: in-house bred mice (contact: [email protected])
Sex: female [MGED]
Age: 3 - 4 weeks after birth [MGED]
Growth conditions: normal
– controlled environment
– 20 - 22 °C average temperature
– housed in cages according to EU legislation
– specified pathogen free conditions (SPF)
– 14 hours light cycle
– 10 hours dark cycle
Developmental stage: stage 28 (juvenile (young) mice) [GXD "Mouse Anatomical Dictionary"]
Organism part: thymus [GXD "Mouse Anatomical Dictionary"]
Strain or line: C57BL/6 [International Committee on Standardized Genetic Nomenclature for Mice]
Genetic Variation: Inbr (J) 150. Origin: substrains 6 and 10 were separated prior to 1937. This substrain is now probably the most widely used of all inbred strains. Substrain 6 and 10 differ at the H9, Igh2 and Lv loci. Maint. by J, N, Ola. [International Committee on Standardized Genetic Nomenclature for Mice]
Treatment: in vivo [MGED] intraperitoneal injection of Dexamethasone into mice, 10 microgram per 25 g bodyweight of the mouse
Compound: drug [MGED] synthetic glucocorticoid Dexamethasone, dissolved in PBS
ArrayExpress conceptual model
[Figure: conceptual model linking Experiment, Array, Sample, Hybridisation, Data and Normalisation, with external links to Publication, Gene (e.g., EMBL) and Source (e.g., Taxonomy)]
ArrayExpress object model
ArrayExpress – the state of the art
• ArrayExpress object model supporting MIAME requirements developed
• Data model implemented in Oracle
• Data loader from MAML file format
• Expression Profiler – data analysis tool already available
ArrayExpress – plans and schedule
• EU grant – new staff being recruited
• A web based query interface – under development
• A web based submission tool – under test
• Participation in OMG – MAGE-OM & MAGE-ML
• MAGE-ML will replace MAML in October
• Full scale database operation expected to start at the beginning of 2002
• Expression Profiler to link to ArrayExpress
Microarray data analysis
Expression Profiler – a web based tool for microarray data analysis: http://www.ebi.ac.uk/microarray/
[Figure: Expression Profiler components]
• EPCLUST (cluster expression profiles)
• GENOMES: sequence, function, annotation
• SPEXS (Sequence Pattern Exhaustive Search): novel patterns
• PATMATCH: known patterns
• URLMAP: provide links
• Inputs: expression data; external data and tools (pathways, function, etc.)
Conclusions
• Microarray standardisation is a challenge and an imperative
• Join MGED to contribute to this process: www.mged.org
• Participate in the development of ontologies and controlled vocabularies
• Send me your protocols
• Make your data available
• Give feedback on MIAME; it's up for discussion
Acknowledgments
• Microarray Informatics Team, EBI: Alvis Brazma, Katja Kivinen, Helen Parkinson, Olga Perez, Johan Rung, Ugis Sarkans, Thomas Schlitt, Mohammad Shojatalab, Lev Soinov, Koichi Tazaki, Jaak Vilo
• Industry Support team, EBI: Alan Robinson
• MGED steering committee
• MIAME working group
• Chris Stoeckert, U. Penn. and MGED
Useful URLs
www.mged.org
www.tigr.org
www.ebi.ac.uk/array
www.geneontology.org
www.hgmp.mrc.ac.uk
www.dnachip.org/mged/normalization.html
[email protected]