Upload
melinda-merritt
View
214
Download
1
Tags:
Embed Size (px)
Citation preview
Peter CornillonPeter CornillonGraduate School of OceanographyGraduate School of Oceanography
University of Rhode IslandUniversity of Rhode Island
Presented at the NSF Sponsored Presented at the NSF Sponsored Cyberinfrastructure MeetingCyberinfrastructure Meeting
31 October 200231 October 2002
OPeNDAP: Accessing Data in a Distributed, Heterogeneous
Environment
OutlineOutline
DODS NVODS & OPeNDAP
Interoperability: The Core Infrastructure
How OPeNDAP is being used
Lessons learned – also throughout
Distributed Oceanographic Data System (DODS)Distributed Oceanographic Data System (DODS)
Conceived in 1992 at a workshop held at URI.
Objectives were:– to facilitate access to PI held data as well as data
held in national archives and– to allow the data user to analyze data using the
application package with which he or she is the most familiar.
Basic system designed and implemented in 1993-1994 by Gallagher and Flierl
Distributed Oceanographic Data SystemDistributed Oceanographic Data System
DODS consisted of two fundamental parts:
a discipline independent core infrastructure for moving data on the net,
a discipline specific portion related to data –population, location, specialized clients, etc.
DODS DODS OPeNDAP & NVODS OPeNDAP & NVODS
To isolate the discipline independent part of the system from the discipline specific part, two entities have been formed:
Open Source Project for a Network Data Access Protocol (OPeNDAP)
National Virtual Ocean Data System (NVODS)
DODS NVODS/OPeNDAP
OPeNDAP was formed to maintain and evolve the DODS core infrastructure
OPeNDAP is a non-profit corporation
OPeNDAP focuses on the discipline neutral parts of the DODS data access protocol
Objective of OPeNDAP
To provide a data access protocol allowing for machine-to-machine interoperability with semantic meaning in a distributed, heterogeneous data environment
The scripted exchange of data between computers, without human intervention.
Considerations with regard to the development OPeNDAP
Many data providers
Many data formats
Many different semantic representations of the data
Many different client types
Interoperability - Metadata
The degree to which machine-to-machine interoperability is achieved depends on the metadata associated with the data.
Metadata Types
We define two classes of metadata:
• Use metadata –needed to actually use the data.
• Search metadata – used to locate data sets of interest in a distributed data system.
Use Metadata
We divide use metadata into two classes:
• Syntactic use metadata
• Semantic use metadata
Syntactic Use Metadata
Information about the data types and structures at the computer level - the syntax of the data;
– e.g., variable T represents a 20x40 element floating point array.
Semantic Use Metadata
Information about the contents of the data set.
e.g., variable T represents • sea surface temperature • with units of ºC
Semantic Use Metadata
We divide semantic use metadata into two classes:
• Descriptive Semantic Use Metadata
• Translational Semantic Use Metadata
Translational Semantic Use Metadata
Metadata required to make use of the data; e.g., to properly label a plot of the data
Define the translation from received values to semantically meaningful values
Examples
• Units of the data: 0.125 C+4 C
• Variable names in the data set: t SST
• Missing value flags: -99 missing value
Interoperability – Data Exchange
Interoperability may be defined at any one of a number of levels ranging from:
the lowest (hardware) - how computers are linked electronically, to
the highest – semantically meaningful, machine-to-machine exchanges.
Organizational Complexity
Example: Consider the different ways of organizing a multi-year data set consisting of one global sea surface temperature (SST) field per day:
one 2-d file per day sst(lat,lon) - URI
one 3-d file sst(lon,lat,time) - PMEL
one file per year with one variable per day 365 variables per file, n files for n year - GSFC
Structure Layer
Provide the capability to reorganize data so that they are in a consistent structural form.
Objective is to reduce the granularity of the data set
Example: one 3-d file sst(lon,lat,time)
Format Layer
Data values are not modified
Format transformation only between server and client
The organizational structure of the data is not modified
An OPeNDAP Structural Layer Component – The Aggregation Server
Developed by John Caron of Unidata
Is for the aggregation of grids and arrays only
Operates in the Syntactic Structural Level
Projects Using OPeNDAP
GODAE (Global Ocean Data Assimilation Experiment)
NOMADS (NOAA Operational Model Archive and Distribution System) AOIMPS
ESG II - Earth System Grid II
Ocean. US (US-GOOS)
High Altitude Observatory Community
Institutions Making Heavy use of OPeNDAP2
Ingrid - Columbia University
COLA - Center for Ocean-Land-Atmosphere
Goddard DAAC
CDC - Climate Diagnostic Center
PMEL - Pacific Marine Environment Lab
OPeNDAP Monthly Accesses (2002)
Site/Month April May June July August
URI 4,856 19,504 3,691 26,693 7,440
LDEO 80,709 62,930 46,092 93,088 32,084
CDC 102,518 153,362 62,395 181,974 107,512
JPL 3,068 34,028 63,309 8,260 13,282
COLA 347,506 412,991 337,310 400,314 638,376
TOTAL 535,589 648,787 502,797 702,069 785,412
OPeNDAP Unique Users (2002)
Site April May June July August
URI 73 68 72 44 69
CDC 124 105 91 116 111
JPL 122 152 173 198 174
COLA 158 199 317 197 408
Interesting OPeNDAP Access Statistics
• IRI data accesses for 1st quarter of 2002
Type Requests % Volume (gb) %
OPeNDAP
191,611 8.5 375.2 69.4
Other 2,062,681 91.5 165.2 30.6
Total 2,254,292 100.0 540.4 100.0
•PMEL OPeNDAP2 ~ 35,000 with ~26,000 internal.
Lessons (Re)Learned
1. Modularity provides for flexibility
The more modular the underlying infrastructure the more flexible the system. This is particularly important for network based systems for which the technology, software and hardware, are changing rapidly.
Lessons (Re)Learned
2. Data of interest will be stored in a variety of formats.
Regardless of how much one might want to define the format to be used by system participants, in the end the data will be stored in a variety of formats.
2a. The same is true of translational use metadata!
Lessons Learned
3. Structural representation of sequence data sets is a major obstacle to interoperability
Care must be given to the organizational structure (as opposed to the format) of the data. This is the single largest constraint to the use of profile data in NVODS.
Lessons (Re)Learned
4. “Not invented here”
Avoid the “not invented here” trap. The basic concepts of a data system are relatively straightforward to define. Implementing these concepts ALWAYS involves substantially more work than originally anticipated. The “Devil’s in the details”.
Take advantage of existing software wherever possible.
Lessons (Re)Learned
5. Work with those who adopt the system for their own needs.
Take advantage of those who are interested in contributing to the system because the system addresses their needs as opposed to those who are simply doing the work for the associated funding. => Open source.
Lessons Learned
6. There is no well defined funding structure for community based operational systems.
It is much easier to obtain funding to develop a system than it is to obtain funding to maintain and evolve a system. This is a major obstacle to development of a stable cyberinfrastructure that meets the needs of the research community.
Lessons Learned
7. It is relatively more difficult to obtain funding for applied system development than for research related to data systems.
This is another obstacle to the development of cyberinfrastructure that meets the needs of the research community.
Lessons (Re)Learned
8. “Tough to teach old dogs new tricks”
Introducing new technology often requires a cultural change in usage that is difficult to effect. This can negatively effect system development.
Lesser Lessons Learned
9. Some surprises encountered in the NVODS/ OPeNDAP effort
Heavy within organization usage.
Metadata focus in the past is appropriate for interoperability at the data level.
Number of variables increases almost linearly with the number of data sets.
Users will take advantage of all of the flexibility offered by a system sometimes to the disadvantage of all.
Incredible variability in the structural organization of data.
Lessons Learned
10. Metrics suggest Increasing use of scripted requests Large volume transfers
As data systems offering machine-to- machine interoperability with semantic meaning take hold, we could well see an explosive growth in the use of the web.
Lessons Learned
11. Time to maturity is order 10 years not 3
Developing new infrastructure takes time, both to iron out all of the %^*% little details and adoption of the infrastructure takes time.
Peter’s Law
The more metadata required the less data delivered
Of course, the less metadata, the harder it is to use the data