1
ArrayExpress
Ugis Sarkans, EBI
2
Overview
• Underlying standards
  – MIAME
  – MAGE*
• Data submission
• Data access
  – annotations
  – actual data
  – array design descriptions
• Some technical details
• Future developments
3
What information should be exchanged?
• MIAME - Minimum Information About a Microarray Experiment
  – informal specification
  – paper published in Nature Genetics
  – goal - to initiate discussion:
    • which details are important and which may not be
  – ArrayExpress can store MIAME data (and more)
4
MAGE-OM
• MAGE-OM: MicroArray Gene Expression Object Model
  – in January 2002 became an “adopted” OMG specification
  – January to August 2002 - finalization process
  – in September became an “available” specification
  – should be set in stone for the next 2 years
  – thinking about MAGE v2 started
    • user feedback
    • support for other types of functional genomics data
    • more precise handling of data manipulation
5
UML Packages of MAGE
[Diagram: MAGE packages arranged under the labels "what was used", "what was done", "results" and "miscellaneous". Packages shown: BioEvent, Experiment, ArrayDesign, BioMaterial, BioAssayData, BioAssay, DesignElement, HigherLevelAnalysis, BioSequence, Array, QuantitationType, Description, Protocol, Measurement, AuditAndSecurity, BQS]
6
MAGE-ML
• MAGE-ML: MicroArray Gene Expression Markup Language
  – generated from MAGE-OM, therefore evolved automatically
  – translation from Jan 2002 to Sep 2002 DTD quite easy
7
ArrayExpress: data
• currently - 9 experiments, 4 array designs:
  – from EMBL - human, yeast
  – from Sanger - pombe
• coming:
  – array descriptions: Affymetrix, Agilent
  – labs: TIGR, Utrecht, more from Sanger, ...
  – export from existing DBs: SMD, RAD
  – tools - MAGE-ML export: Jexpress, BASE, ...
  – ILSI project
• journal requirements: Nature, Lancet, ...
8
Help with MAGE-ML: MAGEstk
• MAGE-ML - the only way of getting data into ArrayExpress
• MAGEstk: MicroArray Gene Expression Software ToolKit
  – Jamboree IV in Stanford, beginning of December
  – used in MIAMExpress (MAGE-ML export)
9
MAGEstk
• Programming APIs
• Mapping of MAGE-OM to language-specific OMs
• APIs are automatically generated from the OM specifications
  – get/set methods for associations
  – get/set methods for attributes
• XML <=> language-specific OM marshallers/unmarshallers - also automatically generated
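As a rough illustration of what get/set methods for attributes and associations, plus an XML marshaller/unmarshaller pair, could look like, here is a minimal Python sketch. The class names, method names and XML layout are hypothetical and heavily simplified; they are not the actual MAGEstk API and the XML is not real MAGE-ML.

```python
import xml.etree.ElementTree as ET


class Person:
    """Hypothetical stand-in for a generated object-model class."""

    def __init__(self, identifier):
        self._identifier = identifier
        self._last_name = None

    # get/set methods for attributes
    def get_identifier(self):
        return self._identifier

    def get_last_name(self):
        return self._last_name

    def set_last_name(self, value):
        self._last_name = value


class Experiment:
    """Hypothetical class with one attribute and one association."""

    def __init__(self, identifier):
        self._identifier = identifier
        self._name = None
        self._providers = []  # association: Experiment -> Person

    def get_identifier(self):
        return self._identifier

    def get_name(self):
        return self._name

    def set_name(self, value):
        self._name = value

    # get/set methods for associations
    def get_providers(self):
        return list(self._providers)

    def set_providers(self, persons):
        self._providers = list(persons)

    def add_provider(self, person):
        self._providers.append(person)


def marshal(experiment):
    """Object model -> XML text (invented element and attribute names)."""
    root = ET.Element("Experiment",
                      identifier=experiment.get_identifier(),
                      name=experiment.get_name() or "")
    providers = ET.SubElement(root, "Providers")
    for person in experiment.get_providers():
        ET.SubElement(providers, "Person",
                      identifier=person.get_identifier(),
                      lastName=person.get_last_name() or "")
    return ET.tostring(root, encoding="unicode")


def unmarshal(xml_text):
    """XML text -> object model."""
    root = ET.fromstring(xml_text)
    experiment = Experiment(root.get("identifier"))
    experiment.set_name(root.get("name"))
    for node in root.findall("Providers/Person"):
        person = Person(node.get("identifier"))
        person.set_last_name(node.get("lastName"))
        experiment.add_provider(person)
    return experiment
```

In a generated toolkit, code of this shape would be produced mechanically from the object model rather than written by hand, which is what keeps the language bindings and the XML format in step.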
10
MAGEstk (cont.)
• Use opensource/standard modules/packages
  – Xerces, JDBC, etc.
• Implementation in Java, C++, Perl, Python
• database access modules on top of these APIs
  – Postgres schema
  – DB access layer
• annotation tools - planned
11
ArrayExpress data retrieval
• main objective - help in finding and initial exploration of data; download for detailed analysis
• data repository (now) + data warehouse (in development)
12
Queries - logical structure
[Diagram: entities that can be queried, with their searchable fields in parentheses: Array Design (accession, name), Protocol (accession), Experiment (accession), Organisation (name), Person (last name), Array, Species, Sample, Hybridisation, ExperimentDesign, ExperimentType, ExperimentalFactor, Protocol Type]
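One way to read this logical structure is as a set of optional filters over experiment annotations, combined with AND. The sketch below assumes a small in-memory list of records; the field names, the accessions and the `find_experiments` helper are made up for illustration and do not reflect the actual ArrayExpress schema or query interface.

```python
from dataclasses import dataclass


@dataclass
class ExperimentRecord:
    # Hypothetical, flattened view of an experiment's annotations
    accession: str
    species: str
    experiment_type: str
    array_design_accession: str
    submitter_last_name: str


def find_experiments(records, **criteria):
    """Return records whose fields equal every given criterion (case-insensitive)."""
    def matches(record):
        return all(
            str(getattr(record, field)).lower() == str(value).lower()
            for field, value in criteria.items()
        )
    return [record for record in records if matches(record)]


# Made-up example data
records = [
    ExperimentRecord("EXP-1", "Homo sapiens", "time course", "ARR-1", "Smith"),
    ExperimentRecord("EXP-2", "Saccharomyces cerevisiae", "dose response", "ARR-2", "Jones"),
    ExperimentRecord("EXP-3", "Homo sapiens", "dose response", "ARR-1", "Jones"),
]

hits = find_experiments(records, species="homo sapiens", array_design_accession="ARR-1")
print([record.accession for record in hits])
```

Calling `find_experiments(records)` with no criteria returns every record, which mirrors a query form where all fields are left blank.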
13
Query form
14
Annotation browsing
15
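Data representation
[Diagram comparing two data layouts]
• in Expression Profiler: a two-dimensional matrix of spots by measurements
• in MAGE/ArrayExpress: a cube with three dimensions
  – BioAssays (hybridizations, data transformations)
  – QuantitationTypes (signal intensity, ratio etc.)
  – DesignElements (spots, genes)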
16
Exporting data to Expression Profiler
[Diagram: BioAssayData cubes (BioAssayData1, BioAssayData2) with dimensions BioAssays (hybridizations, data transformations), QuantitationTypes (signal intensity, ratio etc.) and DesignElements (spots), exported as a matrix of DesignElements by (QT,BA) pairs]
• select BioAssayData cubes
• select QuantitationTypes
• select BioAssays
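The export can be pictured as flattening a three-dimensional cube into a two-dimensional matrix with one row per DesignElement and one column per (QT,BA) pair. The following is a minimal sketch with made-up values; the nested-list layout `cube[ba][qt][de]` and the `export_matrix` helper are assumptions for illustration, not how ArrayExpress stores or exports data.

```python
def export_matrix(cube, bioassays, quantitation_types, design_elements,
                  selected_qts, selected_bas):
    """Flatten cube[ba][qt][de] into rows per DesignElement, columns per (QT, BA) pair."""
    columns = [(qt, ba) for qt in selected_qts for ba in selected_bas]
    rows = []
    for de_index, de in enumerate(design_elements):
        values = [
            cube[bioassays.index(ba)][quantitation_types.index(qt)][de_index]
            for qt, ba in columns
        ]
        rows.append((de, values))
    return columns, rows


# Made-up labels for the three dimensions
bioassays = ["hyb1", "hyb2"]
quantitation_types = ["signal", "ratio"]
design_elements = ["spot1", "spot2", "spot3"]

# cube[ba][qt][de], made-up numbers
cube = [
    [[100.0, 200.0, 300.0], [1.0, 2.0, 0.5]],  # hyb1: signal, ratio
    [[110.0, 190.0, 310.0], [1.1, 1.9, 0.6]],  # hyb2: signal, ratio
]

columns, rows = export_matrix(
    cube, bioassays, quantitation_types, design_elements,
    selected_qts=["ratio"], selected_bas=["hyb1", "hyb2"],
)
for design_element, values in rows:
    print(design_element, values)
```

Selecting only the "ratio" QuantitationType here yields a three-row, two-column matrix, which is the spots-by-measurements shape a two-dimensional analysis tool expects.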
17
Data export form
18
Array representation - ADF format
19
Experiment plan display
20
ArrayExpress Infrastructure
[Diagram: components and data flows around ArrayExpress. Labelled components: EBI; ArrayExpress (Oracle + Tomcat); Expression Profiler; MIAMExpress (MySQL); local MIAMExpress installations; other microarray databases; external bioinformatics databases; array manufacturers; LIMS; microarray software; data analysis software. Labelled flows: submissions; data pipelines; MAGE-ML; MAGE-ML import, export; queries (www); data analysis (www)]
21
ArrayExpress architecture
[Diagram: labelled components: ArrayExpress (Oracle); Tomcat; Java servlets; Velocity template engine; web page templates; Castor object/relational mapping; MAGE-OM; MAGE-ML (DTD); MAGE-ML (doc) files; MAGEvalidator; MAGEloader; MAGEunloader; error.log]
22
ArrayExpress: other technical details
• Data matrices - stored in NetCDF format:
  – binary format for efficient storage of multidimensional arrays
• Arrays - stored as ADF spreadsheets (in addition to normal MAGE structures)
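To illustrate why a binary layout suits multidimensional data, the sketch below packs a small three-dimensional array into bytes in row-major order and reads a single value back by computing its offset, without parsing the rest. This is not the NetCDF format or its API; it is a simplified stand-in using Python's standard `struct` module, and the header layout is invented for the example.

```python
import struct

HEADER = "<3I"  # three little-endian unsigned ints: the dimension sizes (invented header)
VALUE_SIZE = struct.calcsize("<d")  # bytes per stored double


def pack_cube(cube):
    """Serialise cube[i][j][k] as a header followed by doubles in row-major order."""
    d0, d1, d2 = len(cube), len(cube[0]), len(cube[0][0])
    flat = [value for plane in cube for row in plane for value in row]
    return struct.pack(HEADER, d0, d1, d2) + struct.pack(f"<{len(flat)}d", *flat)


def read_value(blob, i, j, k):
    """Read one element directly from its computed byte offset."""
    d0, d1, d2 = struct.unpack_from(HEADER, blob, 0)
    if not (0 <= i < d0 and 0 <= j < d1 and 0 <= k < d2):
        raise IndexError("index out of range")
    offset = struct.calcsize(HEADER) + VALUE_SIZE * ((i * d1 + j) * d2 + k)
    return struct.unpack_from("<d", blob, offset)[0]


# Made-up 2 x 2 x 3 array
cube = [
    [[100.0, 200.0, 300.0], [1.0, 2.0, 0.5]],
    [[110.0, 190.0, 310.0], [1.25, 1.75, 0.625]],
]

blob = pack_cube(cube)
print(len(blob), read_value(blob, 1, 0, 2))
```

Because every value has a fixed size, any element can be located by arithmetic on its indices, which is the property that makes fixed-layout binary formats efficient for large data matrices.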
23
In development
• Immediate:
  – interface efficiency improvements
  – BioAssays - graphical display
  – better integration with Expression Profiler
• Medium-term:
  – user management
    • non-public data (e.g., for reviewers)
  – MAGE-ML export
• Curation tool
24
Data warehouse - for gene- and data-driven queries
[Diagram: warehouse entities, several with attached "Properties" boxes. Attribute labels shown: ratio, absolute change, confidence measure; name, design element type; species, sample type; bioassay type; performer lab, exper. type; array design name, platform type, provider; name, biological entity type]
25
Microarray Informatics team at EBI
Alvis Brazma - group leader
ArrayExpress Curation MIAMExpress
• Ugis Sarkans
• Gonzalo Garcia
• Helen Parkinson
• Mohammadreza Shojatalab
Expression Profiler
• Jaak Vilo
Research, students
• Thomas Schlitt
• Katja Kivinen
• Johan Rung
• Patrick Kemmeren
• Misha Kapushesky
• Lev Soinov
• Koichi Tazaki
• Anastasia Samsonova
• Susanna Sansone
• Philippe Rocca-Serra
• Ele Holloway
• Niran Abeygunawardena
• Ahmet Oezcimen
• Gaurab Mukherjee
• Sergio Contrino
• Anjan Sharma
• Aurora Torrente