Research Collaboratory for Structural Bioinformatics http://openmms.sdsc.edu
Macromolecular Structure Middleware
OpenMMS
An Ontology Driven Architecture
Overview
The mmCIF Ontology
OpenMMS Toolkit
– Macromolecular Structure (MMS) Metamodel
– Parser, XML
– SQL / Corba Servers and Clients
Corba
UML and the future...
How do we “Enable” Science?
Promote well-defined Macromolecular Structure (MMS) Specifications
Distribution – Open Interfaces
– Now:
  • flat files
  • W3 browsing and searching
– Future:
  • XML, SQL, CORBA
Why OpenMMS?
Allow programmers to more easily create efficient, high-performance and robust applications.
A Java-only toolkit that creates XML, CORBA and relational DB representations of the mmCIF macromolecular structure data.
Source code is publicly available so users can easily modify the metamodel or create an entirely new one.
What Do We Mean by an Ontology Driven Architecture?
What do we mean by an Ontology?
A bridge between our world of natural language and the world of machines.
mmCIF Dictionary and Data Files
Based on Ontology for Macromolecular Structure defined by the International Union of Crystallography
Replaces the older 80-column PDB files
mmCIF Dictionary contains over 140 Category and 1600 Item definitions
Open, Extensible
Provides a well-defined reference standard for data distribution
OpenMMS Toolkit Data Flow
[Diagram components: mmCIF Data Files (Reference Standard), mmCIF Parsers, XML Files, Relational Database, Corba Server, Applications]
Metamodel Information Flow
[Diagram components: mmCIF Dictionary; mmCIF Ontology Metamodel; Metamodel Framework; Corba IDL, SQL Schema, XML DTD; Java Data Loaders, JDBC Loaders]
What can OpenMMS do?
PDBase program will load any or all PDB files into any SQL-92 compatible database (Oracle, MySQL, Sybase...)
Translate any PDB file into an XML file.
Contains two Corba servers:
– Reference server will cache and serve data read from PDB flat files.
– DB server will cache and serve data read from a SQL database (very quickly...)
All source code written in Java and publicly available.
Some Advantages of Using an Ontology Driven Architecture
Scales to very large Ontologies
More reliable and maintainable code
Transfer between representations
Scientific Correctness of representation
Help in maintaining backward compatibility
How does one actually represent an ontology? (OpenMMS Internal Metamodel Overview)
[Diagram: a tree of metamodel nodes (Root, Module, Interface, Struct, Field) shown with a Visitor abstract class and a Visitor subclass]
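The tree-plus-visitor arrangement in the diagram can be sketched in Java roughly as follows. This is an illustrative sketch only, not OpenMMS source: the class names (`StructNode`, `FieldNode`, `MetamodelVisitor`, `IdlWriter`, `SqlWriter`), the reduced node set (structs and fields only) and the type mappings are all assumptions made for the example. It shows one metamodel being walked by two visitor subclasses, each emitting a different representation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical visitor base class; names are illustrative, not taken from OpenMMS.
abstract class MetamodelVisitor {
    abstract void visitStruct(StructNode s);
    abstract void visitField(FieldNode f);
    abstract void endStruct(StructNode s);
}

final class FieldNode {
    final String name;
    final String type; // "string" or "float" in this sketch
    FieldNode(String name, String type) { this.name = name; this.type = type; }
    void accept(MetamodelVisitor v) { v.visitField(this); }
}

final class StructNode {
    final String name;
    final List<FieldNode> fields = new ArrayList<>();
    StructNode(String name) { this.name = name; }
    StructNode field(String n, String t) { fields.add(new FieldNode(n, t)); return this; }
    // Walk this node and its children, calling back into the visitor.
    void accept(MetamodelVisitor v) {
        v.visitStruct(this);
        for (FieldNode f : fields) f.accept(v);
        v.endStruct(this);
    }
}

// Visitor subclass that emits an IDL-like struct declaration.
final class IdlWriter extends MetamodelVisitor {
    final StringBuilder out = new StringBuilder();
    @Override void visitStruct(StructNode s) { out.append("struct ").append(s.name).append(" { "); }
    @Override void visitField(FieldNode f) { out.append(f.type).append(' ').append(f.name).append("; "); }
    @Override void endStruct(StructNode s) { out.append("};"); }
}

// Visitor subclass that emits a CREATE TABLE statement (type mapping is assumed).
final class SqlWriter extends MetamodelVisitor {
    final StringBuilder out = new StringBuilder();
    private boolean first;
    @Override void visitStruct(StructNode s) {
        out.append("CREATE TABLE ").append(s.name).append(" (");
        first = true;
    }
    @Override void visitField(FieldNode f) {
        if (!first) out.append(", ");
        first = false;
        String sqlType = f.type.equals("float") ? "REAL" : "VARCHAR(80)";
        out.append(f.name).append(' ').append(sqlType);
    }
    @Override void endStruct(StructNode s) { out.append(");"); }
}

class MetamodelDemo {
    static StructNode sample() {
        return new StructNode("atom_site").field("id", "string").field("occupancy", "float");
    }
    static String toIdl(StructNode s) { IdlWriter w = new IdlWriter(); s.accept(w); return w.out.toString(); }
    static String toSql(StructNode s) { SqlWriter w = new SqlWriter(); s.accept(w); return w.out.toString(); }
    public static void main(String[] args) {
        System.out.println(toIdl(sample()));
        System.out.println(toSql(sample()));
    }
}
```

Adding another output format in this layout means adding another visitor subclass; the metamodel classes stay unchanged.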
mmCIF Parsers
General purpose, low-level access to data
Parsers available in many languages
OpenMMS toolkit includes Java Parser
– Uses “Builder” Design Pattern
– An application subclasses the abstract Builder class and stores data into its own data structures
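The Builder arrangement described above might look like the following in Java. This is a minimal sketch under stated assumptions: `CifBuilder`, `TinyCifParser` and `MapBuilder` are hypothetical names, not the OpenMMS classes, and the parser handles only `data_` lines and single-line `_category.item value` pairs (real mmCIF also has `loop_` tables, quoted strings and multi-line values). The sample data is made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical callback class; the real OpenMMS Builder may look different.
abstract class CifBuilder {
    abstract void startDataBlock(String name);
    abstract void item(String category, String item, String value);
}

// Deliberately simplified parser: it reports what it reads to the builder
// and keeps no data of its own.
final class TinyCifParser {
    static void parse(String text, CifBuilder builder) {
        for (String raw : text.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) continue; // skip blanks and comments
            if (line.startsWith("data_")) {
                builder.startDataBlock(line.substring(5));
            } else if (line.startsWith("_")) {
                String[] parts = line.split("\\s+", 2);
                if (parts.length < 2) continue;      // value on a later line: not handled here
                int dot = parts[0].indexOf('.');
                if (dot < 0) continue;               // not in _category.item form
                builder.item(parts[0].substring(1, dot), parts[0].substring(dot + 1), parts[1]);
            }
        }
    }
}

// The application's subclass decides what to keep and how to store it.
final class MapBuilder extends CifBuilder {
    String block;
    final Map<String, String> values = new LinkedHashMap<>();
    @Override void startDataBlock(String name) { block = name; }
    @Override void item(String category, String item, String value) {
        values.put(category + "." + item, value);
    }
}

class BuilderDemo {
    static MapBuilder run() {
        // Made-up sample input, not taken from a real entry.
        String cif = "data_DEMO\n# comment\n_entry.id DEMO\n_cell.length_a 10.0\n";
        MapBuilder b = new MapBuilder();
        TinyCifParser.parse(cif, b);
        return b;
    }
    public static void main(String[] args) {
        MapBuilder b = run();
        System.out.println(b.block + " " + b.values);
    }
}
```

The point of the pattern is that the parser never commits to a data structure: an application that only wants cell parameters can ignore every other callback.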
MMS in XML
Large flat files (open and close tags)
Tables can be grouped by rows or columns
XML from SQL query
– Many requests from Web browsers don’t really need or want all the data
– SW available from DB vendors and ISVs for creating XML files from SQL result sets
– Smaller files load faster
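To make the row-versus-column grouping concrete, here is a small Java sketch that serializes the same two-record table both ways. The element names (`row`, `atom_site`, `id`, `type_symbol`) and the space-separated column layout are invented for illustration and do not come from an OpenMMS DTD.

```java
import java.util.Arrays;
import java.util.List;

// Element names and layouts below are assumptions made for this example.
final class XmlTableDemo {
    // Row grouping: one <row> element per record, one child element per column.
    static String byRows(String table, List<String> cols, List<List<String>> rows) {
        StringBuilder sb = new StringBuilder();
        sb.append('<').append(table).append('>');
        for (List<String> row : rows) {
            sb.append("<row>");
            for (int c = 0; c < cols.size(); c++) {
                sb.append('<').append(cols.get(c)).append('>')
                  .append(escape(row.get(c)))
                  .append("</").append(cols.get(c)).append('>');
            }
            sb.append("</row>");
        }
        return sb.append("</").append(table).append('>').toString();
    }

    // Column grouping: one element per column, holding all values in record order.
    static String byColumns(String table, List<String> cols, List<List<String>> rows) {
        StringBuilder sb = new StringBuilder();
        sb.append('<').append(table).append('>');
        for (int c = 0; c < cols.size(); c++) {
            sb.append('<').append(cols.get(c)).append('>');
            for (int r = 0; r < rows.size(); r++) {
                if (r > 0) sb.append(' ');
                sb.append(escape(rows.get(r).get(c)));
            }
            sb.append("</").append(cols.get(c)).append('>');
        }
        return sb.append("</").append(table).append('>').toString();
    }

    // Minimal escaping of XML special characters in text content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    static List<String> cols() { return Arrays.asList("id", "type_symbol"); }
    static List<List<String>> rows() {
        return Arrays.asList(Arrays.asList("1", "N"), Arrays.asList("2", "C"));
    }

    public static void main(String[] args) {
        System.out.println(byRows("atom_site", cols(), rows()));
        System.out.println(byColumns("atom_site", cols(), rows()));
    }
}
```

Column grouping writes each tag pair once per column rather than once per value, which is one way the open-and-close-tag overhead of large files can be reduced.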
Relational DB Expression
SQL-92 compatible schemas for all the standard DB vendors
Fast and flexible keyword searches
PDBase loader allows structures to be selectively loaded
Oracle instance tested
– 14,556 structures
– 16GB, 88 million atom records
A very high-level (and very rough) classification of communication
Person-to-Person communication
– email
Person-to-Machine communication
– HTTP/HTML
Machine-to-Machine communication
– CORBA, SQL, .NET, SOAP
Not Communications -> Data Formats
– XML, mmCIF (STAR), many more …
What is CORBA?
Common Object Request Broker Architecture
Defines a family of open software interface specifications for distributed object computing.
http://www.omg.org
What is an Object?
“A Data Structure with an Attitude”
Programs = Algorithms + Data Structures
Object Oriented Programming Principle: Partition the parts of algorithms with the data structures they use
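As a small illustration of that principle, the Java class below keeps a data structure (three coordinates) together with the one algorithm that uses it (a distance calculation). `Point3` is a made-up class for this example.

```java
// Data structure and the algorithm that operates on it, packaged as one class.
final class Point3 {
    final double x, y, z;

    Point3(double x, double y, double z) {
        this.x = x;
        this.y = y;
        this.z = z;
    }

    // Euclidean distance between this point and another.
    double distanceTo(Point3 other) {
        double dx = x - other.x;
        double dy = y - other.y;
        double dz = z - other.z;
        return Math.sqrt(dx * dx + dy * dy + dz * dz);
    }
}

class PointDemo {
    public static void main(String[] args) {
        System.out.println(new Point3(0, 0, 0).distanceTo(new Point3(3, 4, 0)));
    }
}
```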
Side View of a Distributed Application
[Diagram: a Client (e.g. a Java Applet) and a Server (e.g. Mainframe Computer Server), each shown with IDL and MiddleWare layers, connected by the Network: Internet (TCP/IP)]
The “Hourglass” view of the Internet
[Diagram labels: Applications; HTTP, Corba, .NET OO High-Level Interface; Reliable Bitstream; TCP, RTP, ...; Unreliable Datagrams; IP; (ATM, Ethernet, V.90, SONET...); Copper, Glass, Radio Spectrum]
Where is Corba?
Inside every Java Runtime Environment.
Commonly used in middle tier and backend (e.g. database) connections.
Open Source and Commercial implementations available
Usually buried deep inside the software
– Difficult or impossible to tell when it is being used
What is Distributed Object Computing?
Extends the benefits of object-oriented technology across process and machine boundaries to encompass entire networks.
Attempts to make remote objects appear to programmers as if they were local objects in the same process. This is called location transparency.
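Location transparency can be imitated in plain Java with a dynamic proxy standing in for a generated stub. This is a sketch under assumptions, not CORBA itself: `EntryService` and its method are hypothetical, and the "remote" call is forwarded inside one process rather than marshalled over a network as a real ORB would do. The point is only that the client method `describe` is the same for both objects.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical service contract; in CORBA this would be written in IDL.
interface EntryService {
    String title(String entryId);
}

// The "real" object, which would live in the server process.
final class EntryServiceImpl implements EntryService {
    public String title(String entryId) { return "entry " + entryId; }
}

final class TransparencyDemo {
    // Records the calls that would have crossed the network.
    static final List<String> wire = new ArrayList<>();

    // Stand-in for a stub: intercepts each call and forwards it to the target.
    // A real ORB would marshal the request and send it to another process here.
    static EntryService stubFor(EntryService target) {
        InvocationHandler handler = (proxy, method, args) -> {
            wire.add(method.getName());
            return method.invoke(target, args);
        };
        return (EntryService) Proxy.newProxyInstance(
                EntryService.class.getClassLoader(),
                new Class<?>[] { EntryService.class },
                handler);
    }

    // Client code: it cannot tell whether the object is local or "remote".
    static String describe(EntryService service) {
        return service.title("4hhb");
    }

    public static void main(String[] args) {
        EntryService local = new EntryServiceImpl();
        System.out.println(describe(local));
        System.out.println(describe(stubFor(local)));
        System.out.println("forwarded calls: " + wire);
    }
}
```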
Advantages of Distributed Object Computing
Easier (and faster) for programmers to create distributed applications
Increases Reliability
Increases Maintainability
Increases Portability
Increases Extensibility
The Alphabet Soup
OMG = Object Management Group
– Consortium of 800+ companies founded in 1989.
IDL = Interface Definition Language
Boundaries, Interfaces
The key is to focus on boundaries, interfaces, how things fit together
Not on the internal details of how they’re built; assume that will be diverse & changing
Shape of boundary is defined in IDL
Boundaries, Interfaces
The interface to an object can be distributed over a network
The glue that binds parts together is the ORB
Shape of boundary is defined in IDL
Corba Independence
Open Standard for Distributed Object Oriented Design
Independent of Hardware Platform
Independent of Operating System
Independent of Programming Language
Independent of Object Location
Object Request Broker
[Diagram labels: Client, IDL, Object Request Broker, IDL, Object]
ORBs mediate between objects and things that use them (clients)
Terminology
IIOP
– The Internet Inter-ORB Protocol, defined in the CORBA specification as a vendor-independent, wire-level network protocol on top of TCP/IP. This allows ORB implementations of different vendors to interoperate.
ORBs: Medium for Integration
[Diagram: several ORBs linked by Corba / IIOP (Internet Inter-ORB Protocol); language labels: Java, Perl, C++, C, Ada, VB, ActiveX]
Corba Facilities: Industry Standards in Vertical Markets
Manufacturing
Finance
Life Sciences Research
C4I
Many others...
Using Corba to access Macromolecular Structure Data
No Parsing of Flat Files
Direct Access to Binary Data Structures
Strongly Typed Data
Granularity of Access
Indices and Presence Flags Pre-computed
Highest Performance
OMG/LSR Macromolecular Structure Adoption Process
August 1999 RFP issued
March 2000 Initial Submission
September 2000 Revised Submission
February 2001 Adopted Spec by the OMG
4Q 2001 OpenMMS LSR/MMS 1.0 compliant implementation source code publicly available
February 2002 Approved as a Formal OMG Available Specification.
Using the CORBA MMS Server
An excerpt from legacy PDB Formatted File ATOM Record (4hhb.ent)
...
ATOM 6 CG1 VAL A 1 7.009 20.127 5.418 6.00 61.79 ...
ATOM 7 CG2 VAL A 1 5.246 18.533 5.681 6.00 80.12 ...
ATOM 8 N LEU A 2 9.096 18.040 3.857 7.00 26.44 ...
ATOM 9 CA LEU A 2 10.600 17.889 4.283 6.00 26.32 ...
ATOM 10 C LEU A 2 11.265 19.184 5.297 6.00 32.96 ...
ATOM 11 O LEU A 2 10.813 20.177 4.647 8.00 31.90 ...
ATOM 12 CB LEU A 2 11.099 18.007 2.815 6.00 29.23 ...
ATOM 13 CG LEU A 2 11.322 16.956 1.934 6.00 37.71 ...
ATOM 14 CD1 LEU A 2 11.468 15.596 2.337 6.00 39.10 ...
ATOM 15 CD2 LEU A 2 11.423 17.268 .300 6.00 37.47 ...
...
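For contrast with the CORBA access that follows, this is roughly the per-line work a flat-file consumer has to do. The sketch assumes whitespace-separated tokens in the order they appear in the excerpt above; real PDB files are fixed-column, so adjacent fields can run together and a production parser would slice by column position instead. `PdbAtom` is a hypothetical class for this example.

```java
// Hypothetical holder for the leading fields visible in the excerpt above.
final class PdbAtom {
    final int serial;
    final String name;
    final String residue;
    final String chain;
    final int residueNumber;
    final double x, y, z;

    PdbAtom(int serial, String name, String residue, String chain,
            int residueNumber, double x, double y, double z) {
        this.serial = serial;
        this.name = name;
        this.residue = residue;
        this.chain = chain;
        this.residueNumber = residueNumber;
        this.x = x;
        this.y = y;
        this.z = z;
    }

    // Whitespace splitting: adequate only for lines shaped like the excerpt.
    static PdbAtom parse(String line) {
        String[] t = line.trim().split("\\s+");
        if (t.length < 9 || !t[0].equals("ATOM")) {
            throw new IllegalArgumentException("not an ATOM record: " + line);
        }
        return new PdbAtom(
                Integer.parseInt(t[1]), t[2], t[3], t[4], Integer.parseInt(t[5]),
                Double.parseDouble(t[6]), Double.parseDouble(t[7]), Double.parseDouble(t[8]));
    }
}

class PdbAtomDemo {
    public static void main(String[] args) {
        PdbAtom a = PdbAtom.parse("ATOM 8 N LEU A 2 9.096 18.040 3.857 7.00 26.44");
        System.out.println(a.serial + " " + a.name + " " + a.residue + a.residueNumber
                + " (" + a.x + ", " + a.y + ", " + a.z + ")");
    }
}
```

Every value arrives as text and has to be converted and validated by the client, which is the cost that strongly typed access over an interface removes.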
LSR/MMS “ATOM Record”
DsLSRMacromolecularStructure.idl excerpt:
struct AtomSite {
    string id;
    IndexId type_symbol;
    AtomIndex label;
    IndexId label_entity;
    VectorXYZ cartn;
    float occupancy;
    float b_iso_or_equiv;
};
Example Code and Resulting Output
Entry e = entryFactory.get_entry_from_id("4hhb");
AtomSite[] a = e.get_atom_site_list();
// Print each atom's id, element symbol and Cartesian coordinates
for (int i = 0; i < a.length; i++) {
    System.out.println(a[i].id + " " + a[i].type_symbol.id + " (" +
        a[i].cartn.x + ", " + a[i].cartn.y + ", " + a[i].cartn.z + ")");
}
produces:
1 N (11.065, 7.352, 9.598)
2 C (12.436, 7.764, 9.902)
3 C (12.883, 7.09, 11.208)
4 O (12.088, 7.0, 12.147)
5 C (12.611, 9.264, 10.06)
...
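The loop above needs a running server and the generated stubs. To try the formatting logic without either, the sketch below uses hand-written stand-ins for the three types the loop touches. These classes are assumptions: they model only the members used above (`id`, `type_symbol.id`, `cartn.x/y/z`) as plain public fields, and the coordinates are declared as `float` on the assumption that the IDL type is `float`. The real generated classes have more members.

```java
// Hand-written stand-ins for IDL-generated types; only the members the
// slide's loop reads are modelled here.
final class IndexId {
    public String id;
}

final class VectorXYZ {
    public float x, y, z;
}

final class AtomSite {
    public String id;
    public IndexId type_symbol;
    public VectorXYZ cartn;
}

class AtomSiteDemo {
    // Same string expression as the println in the slide's loop.
    static String format(AtomSite a) {
        return a.id + " " + a.type_symbol.id + " ("
                + a.cartn.x + ", " + a.cartn.y + ", " + a.cartn.z + ")";
    }

    // Helper (hypothetical) that fills in a stand-in record.
    static AtomSite atom(String id, String symbol, float x, float y, float z) {
        AtomSite a = new AtomSite();
        a.id = id;
        a.type_symbol = new IndexId();
        a.type_symbol.id = symbol;
        a.cartn = new VectorXYZ();
        a.cartn.x = x;
        a.cartn.y = y;
        a.cartn.z = z;
        return a;
    }

    public static void main(String[] args) {
        // Values copied from the first and fourth output lines shown above.
        AtomSite[] atoms = {
            atom("1", "N", 11.065f, 7.352f, 9.598f),
            atom("4", "O", 12.088f, 7.0f, 12.147f)
        };
        for (int i = 0; i < atoms.length; i++) {
            System.out.println(format(atoms[i]));
        }
    }
}
```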
What are the alternatives to Corba?
TCP/IP Sockets - Byte stream
DCOM, COM+, OLE, .NET (Microsoft only)
– DCOM Corba bridges are available from several vendors
SOAP (Simple Object Access Protocol)
– XML based
Unified Modeling Language – UML
What do all those arrows and boxes mean?
Schematic Language for Defining SW Graphics Representations
UML = Things, Relations and Diagrams
9 types of Diagrams
The most commonly used diagram is the “Class Diagram”
UML Class Diagram Example
[Diagram: class EntryFactory with methods get_version(), get_entry_id_list(), get_entry_modification_dates(), native_formats_supported(), get_native_entry_representation(); related classes EntryIdList (* EntryId), Identifier, ModificationDateList (* ModificationDate); ModificationDate has attributes Entry_id : EntryId and date : TimeBase::TimeT]
UML Class Diagram Basics
[Diagram: a class box labelled Class_Name (underlined for class instances, italics for abstract classes), with Variables (var1: Type, var2: Type) and Methods (method1(), method2(), method3())]
Details may be omitted if not important
UML Relationships
[Diagram: notations for Dependency, Association, Generalization (Inheritance) and Aggregation, with multiplicity labels * and 0..1]
UML Example
[Diagram: class EntryFactory with methods get_version(), get_entry_id_list(), get_entry_modification_dates(), native_formats_supported(), get_native_entry_representation(); related classes EntryIdList (* EntryId), Identifier, ModificationDateList (* ModificationDate); ModificationDate has attributes Entry_id : EntryId and Date : TimeBase::TimeT]
XMI: XML Metadata Interchange
UML is a graphical representation; need some way to exchange UML models between applications
XMI is used to store and transmit UML models
XML based
Defines XML tags for classes, relationships between classes etc.
OMG MDA
Platform Independent Models (PIMs) that define the interface are defined in UML
The PIMs are translated to Platform Specific Models (PSMs) such as Corba, SOAP, .NET or XML Schemas
The Corba servers and clients may be the same, but now the interface is defined in UML and the IDL is then generated from the UML
MDA Platform Independent to Platform Dependent Translation
[Diagram: UML translated to Corba, SOAP, XML and .NET]
Thanks and Acknowledgments
Phil Bourne
John Westbrook
David Benton
Karl Konnerth
Lynn TenEyck