caArray: Cancer Array Informatics
Open Source Tools for Microarray Data Management, Analysis and Annotation
http://caarray.nci.nih.gov/
caArray overview & demo: Mervi Heiskanen (15 min)
caArray architecture: Scott Gustafson (15 min)
webCGH overview & demo: David Hall (15 min)
caArray Data Portal & Data Analysis Tools
1. Data Portal: promotes data sharing through submission of original, raw data files with associated experiment and sample information.
2. Data analysis and visualization tools: webCGH (NCICB/RTI), XpressionWay (NCICB/SAIC)
caBIG tools:
1. caWorkbench - Columbia
2. DWD - UNC Lineberger
3. GenePattern - MIT/Broad ?
4. Magellan - UC San Francisco
5. VISDA - Georgetown
6. Cancer Molecular Pages - Burnham
7. Function Express - Wash U Siteman
8. GoMiner - NCI/CCR
caArray version 1.0
Key features:
1. MIAME 1.1 compliant data annotation forms
2. Support for Affymetrix and GenePix native files
3. MAGE-ML import and export
4. Controlled vocabularies (MGED Ontology)
5. Access to data via the MAGE-OM API
caArray installations:
1. The NCICB caArray instance supports NCI-funded programs.
2. Local installations at the cancer centers: caBIG-funded caArray adopters (Lombardi, Wistar, NYU)
caArray listservs:
1. caArray developers
2. caArray users
3. caArray team
caArray: Compliance with Standardization Efforts
MIAME: Minimum Information About a Microarray Experiment 1.1 Draft 6 (April 1, 2002) http://www.mged.org/Workgroups/MIAME/miame_1.1.html
MAGE-ML: MicroArray and GeneExpression Object Model and Markup Language 1.1 (October 2003) http://www.omg.org/docs/formal/03-10-01.pdf
MGED Ontology: Microarray Gene Expression Data Ontology 1.1.8 (April 2004) http://mged.sourceforge.net/ontologies/MGEDontology.php
caBIG compatibility guidelines: http://cabig.nci.nih.gov/guidelines_documentation/caBIG_Compatibility_Document
class CellLineDatabase
namespace: http://mged.sourceforge.net/ontologies/MGEDOntology.daml#
documentation: Database of cell line information.
type: primitive
superclasses: Database
used in classes: CellLine
used in individuals: ATCC_Cultures, CABRI_Human_and_Animal_Cell_lines

class TechnologyType
namespace: http://mged.sourceforge.net/ontologies/MGEDOntology.daml#
documentation: The technology type or platform of the reporters on the array.
type: primitive
superclasses: ArrayDesignPackage
used in classes: FeatureGroup
used in individuals: in_situ_oligo_features, spotted_antibody_features, spotted_colony_features, spotted_ds_DNA_features, spotted_protein_features, spotted_ss_oligo_features
caArray Phase 2
caArray 1.2 (June 2005)
• Support for additional file formats via a software toolkit
• Public search without login
• Copy bio sample information
caArray 1.5 (September 2005)
• XpressionWay, pathway visualization tool
• Integration with caDSR 3.0
caArray 1.7 (December 2005)
• Store filtered and normalized data
• User management user interface
caArray 2.0 (March 2006)
• Embedded MAGE-ML validation
All releases: defect fixes and usability enhancements
Acknowledgements
NCICB
Sue Dubman, Mervi Heiskanen, Xioapeng Bian, Subha Madhavan, Carl Schaefer, Gilberto Fragoso, Denise Warzel… and Ken Buetow
NCICB/SAIC
Development team: Hangjiong Chen, Scott Gustafson, Juergen Lorenz, John Moy, Sumeet Muju, Beth Neuberger, Phu Tran, Jim Zhou
QA: Durga Addepalli, Andrew Shinohara, Ye Wu
NCICB/TerpSys
Don Swan, Jamie Keller
Research Triangle Institute
David Hall (webCGH)
caARRAY’s Architecture
Credits to Sumeet Muju, Phu Tran
caArray Architecture
[Figure: caArray architecture diagram. Component labels recovered from the slide:]
BROWSER; FTP APPLET; NATIVE DATA FILE; MAGE-ML (Experiment and ArrayDesign)
FTP STAGING AREA; FILE SHARE
TOMCAT WEB CONTAINER; STRUTS; JSP; SERVLET; DATA TRANSFER OBJECT (DTO)
EJB CONTAINER; VOCAB MGR EJB; SECURITY MGR EJB; PROTOCOL MGR EJB; EXPERIMENT MGR EJB; OTHER MGR EJB
MAGE-ML IMPORTER MDB; FILE UPLOADER MDB
MAGE MANAGER; MAGE-STK (MAGE OBJECTS)
VOCAB INTERFACE; SECURITY OBJECTS; OBJECT RELATIONAL BRIDGE (OJB); NETCDF API
MAGE-OM API JAR; MAGE-OM OBJECTS; MAGE-OM RMI MGR; MAGE-OM PERSISTENCE; NETCDF API
caCORE (caBIO, caDSR, EVS)
caARRAY DB; SECURITY DB
caArray Interfaces: caArray EJB API
caArray EJB API: Provides transaction control, asynchronous processes, service location, common security and distributed capabilities for submission and retrieval of microarray experiments.
The caArray presentation layer uses this functionality via the caArray EJB API.
Data Transfer Objects (DTOs) are used to transfer data between the calling application and the EJBs.
The APIs can be used for federated access and submission of transaction data.
caArray Interfaces: MAGE-OM API
MAGE-OM API: Provides fine-grained search and retrieval of all caArray data via a caBIO-like, RMI-based API.
The MAGE-OM API maps the MAGE objects to the new caArray database schema.
An RMI security module is incorporated for user/group-level data access.
NetCDF API logic is incorporated for faster retrieval of data.
Built to be grid enabled.
caArray Middleware
Data Representation
• Data Transfer Objects (DTO)
• MicroArray Gene Expression Software Toolkit (MAGE-stk)
• DTO - MAGE-stk Conversion
Data Persistence
• Data Access Layer
• ObJectRelationalBridge (OJB)
• OJB Abstraction Layer and Data Access Objects (DAO)
EJB Layer
• Stateless Session Façade
• Bean-managed Persistence
NETCDF Files
• Large Data Set
• Fast Binary Access
MAGE-ML Import and Export
• Message-Driven Beans
MAGE-ML Import and Export: An Example
<MAGE-ML identifier="gov.nih.nci.ncicb.caarray:MAGEML:123:1">
  <AuditAndSecurity_package>
    <Contact_assnlist>
      <Person identifier="gov.nih.nci.ncicb.caarray:Person:456:1" lastName="Doe" firstName="John">
      </Person>
    </Contact_assnlist>
  </AuditAndSecurity_package>
  <Experiment_package>
    <Experiment_assnlist>
      <Experiment identifier="gov.nih.nci.ncicb.caarray:Experiment:789:1" name="Sample Experiment">
        <Descriptions_assnlist>
          <Description text="This is a sample experiment."></Description>
        </Descriptions_assnlist>
        <Providers_assnreflist>
          <Person_ref identifier="gov.nih.nci.ncicb.caarray:Person:456:1"/>
        </Providers_assnreflist>
      </Experiment>
    </Experiment_assnlist>
  </Experiment_package>
</MAGE-ML>
Identifiable element: an element that carries its own identifier attribute, such as Person.
Referenced Identifiable element to be resolved: a reference element, such as Person_ref, whose identifier points at an Identifiable element.
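The distinction between Identifiable elements and references can be illustrated with a small SAX pass over a document like the one above. This is a minimal sketch, not the caArray importer: the IdentifierScanner class and its method names are hypothetical, and it assumes that every reference element name ends in "_ref", as Person_ref does in the example.

```java
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: records which identifiers are defined in a document
// and which are only referenced through *_ref elements (an assumed convention).
class IdentifierScanner extends DefaultHandler {
    final Set<String> defined = new LinkedHashSet<>();
    final Set<String> referenced = new LinkedHashSet<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        String id = attrs.getValue("identifier");
        if (id == null) {
            return; // not an Identifiable element or a reference
        }
        if (qName.endsWith("_ref")) {
            referenced.add(id);
        } else {
            defined.add(id);
        }
    }

    // References with no definition in this document; an importer would
    // have to look these up elsewhere, for example in its database.
    Set<String> unresolved() {
        Set<String> missing = new LinkedHashSet<>(referenced);
        missing.removeAll(defined);
        return missing;
    }

    static IdentifierScanner scan(String xml) {
        try {
            IdentifierScanner scanner = new IdentifierScanner();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), scanner);
            return scanner;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse MAGE-ML", e);
        }
    }
}
```

Run against the example document, unresolved() would be empty, because the Person referenced by Person_ref is defined in the same file.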
MAGE-ML Import and Export
The importer is modified from the MAGE-stk’s SAX-based MAGE-ML parser to include a persistence mechanism that inserts, updates and resolves (looks up) parsed objects.
Any valid MAGE-ML can be imported. The MAGE-ML is assumed to be valid; validation is typically done using ArrayExpress’s MAGEValidator.
Identifiable objects are first resolved from the database by matching their identifier; if an object is resolved, the incoming object is updated against the existing one.
The identifier is the globally unique key of a MAGE object across domains for its entire lifecycle.
The identifier is separate from the persisted MAGE-stk object’s primary key, which is internal to caARRAY.
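The resolve-then-update rule and the split between identifier and primary key can be sketched as follows. This is an illustration under stated assumptions, not caArray code: IdentifiableRecord and IdentifiableStore are hypothetical names, an in-memory map stands in for the database, and the sketch reads "updated against the existing one" as copying incoming values onto the stored record.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a persisted Identifiable object.
class IdentifiableRecord {
    final long primaryKey;    // internal key, never written to MAGE-ML
    final String identifier;  // globally unique MAGE identifier
    String name;

    IdentifiableRecord(long primaryKey, String identifier, String name) {
        this.primaryKey = primaryKey;
        this.identifier = identifier;
        this.name = name;
    }
}

// Hypothetical in-memory store illustrating resolve-then-update.
class IdentifiableStore {
    private final Map<String, IdentifiableRecord> byIdentifier = new HashMap<>();
    private long nextPrimaryKey = 1;

    // Resolve by identifier; update the stored record if found, otherwise insert.
    IdentifiableRecord importObject(String identifier, String name) {
        IdentifiableRecord existing = byIdentifier.get(identifier);
        if (existing != null) {
            existing.name = name; // update in place; the primary key is unchanged
            return existing;
        }
        IdentifiableRecord created =
                new IdentifiableRecord(nextPrimaryKey++, identifier, name);
        byIdentifier.put(identifier, created);
        return created;
    }

    int size() {
        return byIdentifier.size();
    }
}
```

Importing the same identifier twice leaves one record with one primary key, which is the behaviour that lets a MAGE-ML file be re-imported without creating duplicates.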
MAGE-ML Export
The entire object graph of an object (e.g., an ArrayDesign or Experiment) is traversed to collect all Identifiable objects.
The MAGE-stk’s MAGEJava object is used as the container for all the Identifiable objects collected.
When an Identifiable object is encountered, the appropriate method in the MAGEJava object is discovered and invoked using reflection to store the object in it.
Finally, MAGEJava.writeMAGEML(Writer) is invoked, which recursively invokes the same method on all the contained Identifiable objects.
Xerces’s XMLSerializer pretty-prints the XML content as it is written, with appropriate new lines and indentation.
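The traversal and reflective dispatch described above can be sketched in a few lines. The sketch does not use the MAGE-stk classes: Node, PersonNode, ExperimentNode, ExportContainer and GraphExporter are hypothetical, and the naming rule "add" plus the class name is an assumption chosen only to show how a method can be discovered at run time.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical base type: an identifier plus outgoing references.
abstract class Node {
    final String identifier;
    final List<Node> references = new ArrayList<>();
    Node(String identifier) { this.identifier = identifier; }
}

class PersonNode extends Node { PersonNode(String id) { super(id); } }
class ExperimentNode extends Node { ExperimentNode(String id) { super(id); } }

// Hypothetical container in the role the slide gives to MAGEJava:
// one typed "add" method per Identifiable class.
class ExportContainer {
    final List<PersonNode> persons = new ArrayList<>();
    final List<ExperimentNode> experiments = new ArrayList<>();
    public void addPersonNode(PersonNode p) { persons.add(p); }
    public void addExperimentNode(ExperimentNode e) { experiments.add(e); }
}

class GraphExporter {
    // Walk the graph once and file each node through a reflectively found method.
    static ExportContainer collect(Node root) {
        ExportContainer container = new ExportContainer();
        Set<Node> visited = Collections.newSetFromMap(new IdentityHashMap<>());
        try {
            visit(root, container, visited);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("no add method for a node type", e);
        }
        return container;
    }

    private static void visit(Node node, ExportContainer container, Set<Node> visited)
            throws ReflectiveOperationException {
        if (!visited.add(node)) {
            return; // already collected; this also guards against cycles
        }
        // Discover, for example, addPersonNode(PersonNode) from the runtime class.
        Method adder = ExportContainer.class.getMethod(
                "add" + node.getClass().getSimpleName(), node.getClass());
        adder.invoke(container, node);
        for (Node ref : node.references) {
            visit(ref, container, visited);
        }
    }
}
```

The identity-based visited set matters because an object graph such as Experiment to Person can contain shared or circular references; each object should be written once.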
A caArray Configuration
[Figure: caArray configuration diagram. Labels recovered from the slide: NCICB; caArray 1; caBIO; caDSR / EVS; Security; NCICB Security; caWorkbench; caArray schema; MAGE-OM API; caARRAY EJB; Java app; MAGE-ML; GRID (future)]
webCGH
A web application for the visualization and analysis of array-based CGH and gene expression data
David Hall, Ph.D.
Research Triangle Institute
arrayCGH
webCGH Functions
• Visualization of copy number and gene expression levels
• Interrogation of genome features
• Data normalization and analysis
• Virtual experiments
Whole-genome View
Ideograms
Chromosome 17
Chromosome 17
Zoom
Annotated Genes
Gene List
Gene Watch
Data Flow
[Figure: data flow diagram. Labels recovered from the slide: Database; Adaptor; Transformer; Cache; Analytical Pipeline (Op, Op, Op, Op); Plot Generator]
Analytical Pipelines
Architecture
[Figure: webCGH architecture diagram. Labels recovered from the slide: Client (HTML, SVG); Web Container (Tomcat); Struts; JSPs; DAO; POJOs; Cache; caArray; Cloudscape]
Key Design Features
«interface» AnalyticOperation
+perform(in data)
+validate(in data)
«interface» FilterOperation
«interface» NormalizationOperation
«interface» SummaryStatisticalOperation
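The interfaces above suggest pluggable operations chained into an analytical pipeline. The following is a minimal sketch of that idea, not webCGH source: the double[] data type, the assumption that FilterOperation and NormalizationOperation extend AnalyticOperation, and the DropMissingFilter, MedianCenterNormalization and AnalyticPipeline classes are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Mirrors the slide's two methods; the double[] type is an assumption,
// since the slide only says "in data".
interface AnalyticOperation {
    void validate(double[] data);
    double[] perform(double[] data);
}

// Assumed to specialize AnalyticOperation; the slide's arrows were lost.
interface FilterOperation extends AnalyticOperation {}
interface NormalizationOperation extends AnalyticOperation {}

// Hypothetical filter: drop NaN values standing in for missing measurements.
class DropMissingFilter implements FilterOperation {
    public void validate(double[] data) {
        if (data == null) throw new IllegalArgumentException("data must not be null");
    }
    public double[] perform(double[] data) {
        return Arrays.stream(data).filter(v -> !Double.isNaN(v)).toArray();
    }
}

// Hypothetical normalization: subtract the median so values centre on zero.
class MedianCenterNormalization implements NormalizationOperation {
    public void validate(double[] data) {
        if (data == null || data.length == 0) {
            throw new IllegalArgumentException("data must not be empty");
        }
    }
    public double[] perform(double[] data) {
        double[] sorted = data.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        double median = (n % 2 == 1)
                ? sorted[n / 2]
                : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
        return Arrays.stream(data).map(v -> v - median).toArray();
    }
}

// A pipeline is an ordered list of operations, each validated before it runs.
class AnalyticPipeline {
    private final List<AnalyticOperation> operations = new ArrayList<>();

    AnalyticPipeline add(AnalyticOperation op) {
        operations.add(op);
        return this;
    }

    double[] run(double[] data) {
        double[] current = data;
        for (AnalyticOperation op : operations) {
            op.validate(current);
            current = op.perform(current);
        }
        return current;
    }
}
```

Because the pipeline depends only on the AnalyticOperation interface, a new operation can be added without touching the code that runs the chain.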
Key Design Features
«interface» DaoFactory
«interface» Authenticator
«interface» ArrayExperimentDao
«interface» UserProfile
«interface» AnnotationDao
[Diagram edge labels: creates (four edges), uses (two edges)]
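One plausible reading of these interfaces is an abstract factory that hands out data access objects, so the web tier never names a concrete back end. The sketch below follows that reading under stated assumptions: only the five interface names come from the slide, while every method name, the InMemoryDaoFactory and ExperimentBrowser classes, and the choice of which interface creates which are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Interface names are from the slide; all method signatures are assumed.
interface UserProfile {
    String userName();
}

interface Authenticator {
    UserProfile authenticate(String user, String password);
}

interface ArrayExperimentDao {
    List<String> listExperiments(UserProfile profile);
}

interface AnnotationDao {
    List<String> genesOnChromosome(UserProfile profile, int chromosome);
}

interface DaoFactory {
    Authenticator createAuthenticator();
    ArrayExperimentDao createArrayExperimentDao();
    AnnotationDao createAnnotationDao();
}

// Hypothetical back end that returns placeholder values instead of querying a database.
class InMemoryDaoFactory implements DaoFactory {
    public Authenticator createAuthenticator() {
        // Accepts any non-empty user name; a real back end would check credentials.
        return (user, password) -> {
            if (user == null || user.isEmpty()) {
                throw new IllegalArgumentException("user required");
            }
            return () -> user;
        };
    }

    public ArrayExperimentDao createArrayExperimentDao() {
        return profile -> Arrays.asList("demo-experiment-1", "demo-experiment-2");
    }

    public AnnotationDao createAnnotationDao() {
        return (profile, chromosome) ->
                Collections.singletonList("demo-gene-chr" + chromosome);
    }
}

// Client code depends only on the interfaces, so the back end can be swapped.
class ExperimentBrowser {
    static List<String> experimentsFor(DaoFactory factory, String user, String password) {
        UserProfile profile = factory.createAuthenticator().authenticate(user, password);
        return factory.createArrayExperimentDao().listExperiments(profile);
    }
}
```

Under this reading, pointing the application at a different data source would mean supplying another DaoFactory implementation, with ExperimentBrowser left unchanged.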
Past, Present, Future
Dec. 2003 - Version 1.0: basic plots, analytics, GEDP
March 2005 - Version 2.0: more plots, analytics, caArray
Late April 2005 - Version 2.1: mouse/human plots, CGH/gene expression, SKY/M-FISH & CGH integration
webCGH Team
NCICB: Mervi Heiskanen
RTI: David Hall, Vesselina Bakalov, Ying Chen, Matt Westlake, Bing Liu, Laxminarayana Ganapathi, Sheping Li, Stuart Allen