Development of DAP4 (a Data Access Protocol) Describing Progress and Seeking Input at the ESIP Summer Meeting 2012 by Dave Fulker (OPeNDAP President)


OPeNDAP-Unidata Development of DAP4 (a Data Access Protocol)

Describing Progress and Seeking Input at the ESIP Summer Meeting 2012

by Dave Fulker (OPeNDAP President)

2

Overarching Concept of OPeNDAP’s Data Access Protocol (DAP): Clients Get Only Needed Data, When They Need Them

Accessing data through web services (i.e., URL ≈ dataset)

Appending query strings to invoke server functions, esp. subsetting

Getting responses of 2 major types:

Metadata - dataset descriptions & catalogs (textual)

Content - values and metadata (binary or textual)

Using responses in diverse client contexts, e.g.,

MATLAB maps DAP responses directly to its internal math types

DAP libraries (netCDF, e.g.) simplify the programming of apps
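
The request pattern on this slide can be sketched in a few lines of Python. This is a minimal illustration, assuming DAP2-style response suffixes (.dds and .das for textual metadata, .dods for binary content, .ascii for textual content). The server address, dataset path, and variable name are hypothetical, and `build_request` is a helper invented for this example rather than part of any DAP library.

```python
from urllib.parse import quote

# Hypothetical dataset URL; a real DAP server address would take its place.
DATASET_URL = "http://example.org/opendap/sst.nc"

# DAP2-style response suffixes: two metadata responses and two content responses.
RESPONSE_SUFFIXES = {
    "structure": ".dds",    # textual description of variables and types
    "attributes": ".das",   # textual attribute metadata
    "binary": ".dods",      # values plus metadata, binary encoding
    "text": ".ascii",       # values, textual encoding
}


def build_request(dataset_url, response, constraint=""):
    """Return the URL for one response type, with an optional query string."""
    url = dataset_url + RESPONSE_SUFFIXES[response]
    if constraint:
        # Leave the characters used by constraint expressions unescaped.
        url += "?" + quote(constraint, safe="[]:,&=<>!~.()_")
    return url


if __name__ == "__main__":
    # Metadata only: no data values are transferred.
    print(build_request(DATASET_URL, "structure"))
    # Content for a subset only: the query string names one variable and index ranges.
    print(build_request(DATASET_URL, "text", "sst[0:1:0][10:1:20][30:1:40]"))
```

The point of the sketch is that the dataset URL stays fixed while the suffix chooses the response type and the query string narrows what the server returns.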

3

Some of DAP Users’ Distinguishing Needs

Data often depict (scientific) phenomena where

Geospatial maps are among the useful views

But other views are important as well

Coordinates often are 2-, 3-, 4- & even 5-dimensional

These may include (time-dependent) coordinate-proxies

Users often wish to use data whose source files

Are in a variety of inconvenient formats

With insufficient or obsolete metadata

4

Present State of DAP

The DAP2 specification (after nearly 2 decades!) has multiple contemporary realizations on servers and clients

Clients include: MATLAB, GrADS, IDL, IDV...

Python apps that employ the PyDAP library

Fortran, C, C++ & Java apps that employ the netCDF library

Servers include: PyDAP, ERDDAP... (often with augmented services)

Most widely deployed: TDS (Unidata) & Hyrax (OPeNDAP)

Widely used by data providers and users, including cases where DAP servers provide translations of inconveniently formatted source files

5

Branching: Hyrax & THREDDS

Multiple implementations of a protocol are often considered a good thing (per IETF, e.g.)

This can be a problem, however, if the implementations embody excessive redundancy or confuse users

Our view: co-existence of TDS (Unidata) & Hyrax (OPeNDAP) reflects some redundancy & creates some inconsistencies for users

Need #1: achieve conformance ⇒ consistency for users

Need #2: more software reuse ⇒ more advancement

6

NOAA/BAA grant for

OPeNDAP-Unidata Linked Servers (OPULS)

Goal 1: OPeNDAP/Unidata conformance & linkage

New data-model/protocol specs (DAP4), with conformance tests & extensibility demos:

Modes of asynchronous access (to near-line data, e.g.)

Server-side subsetting of data on irregular meshes

Goal 2: common software for OPeNDAP & Unidata servers

Work yet to begin...

7

OPeNDAP Data-Type Philosophy (reflected in DAP2 & now DAP4)

Data model has few data types

For simplified programming & lowered risk of errors

Data types are deliberately domain-neutral

For better trans-domain utility & programmer uptake

But they allow both syntactic & semantic structures/metadata

These types do in fact support domain needs

NetCDF-like (can represent functions on 4-D domains, e.g.)

Sequences & selections match DBMS sensibilities

8

DAP4 Data Model (simplified)

dataset ≈ unique URL (with no query string)

a dataset holds a hierarchy of groups, each a namespace/container for variables, dimensions & attributes

each variable comprises

a name (unique in the group)

a type (which applies to all values)

value(s) (organized as dimensioned arrays)

attributes* (optional)

*Attributes are like variables but with a semantic purpose, making a variable or a group more meaningful. E.g., variables often have an attribute (of type string) named “units.”
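
The simplified data model above can be mirrored in a handful of Python dataclasses. This is only a sketch of the relationships listed on the slide, not the DAP4 specification's own representation. The class and field names, the type strings ("Float32", "String"), and the example URL are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class Attribute:
    name: str
    type: str
    value: Any


@dataclass
class Variable:
    name: str                      # unique within its group
    type: str                      # one type, applied to every value
    dimensions: Tuple[str, ...]    # names of the dimensions that shape the values
    values: List[Any] = field(default_factory=list)
    attributes: Dict[str, Attribute] = field(default_factory=dict)  # optional


@dataclass
class Group:
    """A namespace/container for variables, dimensions, attributes and child groups."""
    name: str
    dimensions: Dict[str, int] = field(default_factory=dict)
    variables: Dict[str, Variable] = field(default_factory=dict)
    attributes: Dict[str, Attribute] = field(default_factory=dict)
    groups: Dict[str, "Group"] = field(default_factory=dict)

    def add_variable(self, var: Variable) -> None:
        # Enforce the "name unique in the group" rule from the slide.
        if var.name in self.variables:
            raise ValueError(f"duplicate variable name in group: {var.name}")
        self.variables[var.name] = var


@dataclass
class Dataset:
    url: str                       # dataset is identified by a URL with no query string
    root: Group = field(default_factory=lambda: Group(name="/"))


if __name__ == "__main__":
    ds = Dataset(url="http://example.org/opendap/sst.nc")  # hypothetical URL
    ds.root.dimensions["time"] = 3
    temp = Variable("temp", "Float32", ("time",), [280.1, 281.4, 279.9])
    temp.attributes["units"] = Attribute("units", "String", "K")
    ds.root.add_variable(temp)
    print(ds.root.variables["temp"].attributes["units"].value)
```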

9

DAP4 Data Types & Relations

as in C or Java, e.g., a variable’s type may be structured or atomic: integer, float, byte, string...

DAP variables may be (semantically) related to one another via two key grouping constructs

relations link 1-D variables as columns in a table;

sampled functions link coordinate-map variables (domain) to function-value variables (ranges) having common indexes

in turn, relations can be linked via variables that serve as foreign keys
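
The two grouping constructs can be illustrated with plain Python lists and dictionaries. This is a toy sketch with made-up variable names and values. It shows the idea of shared indexes and foreign keys, not how any DAP server stores or encodes them.

```python
# A sampled function: coordinate-map variables (domain) and a function-value
# variable (range) share common indexes.
time = [0, 6, 12]            # coordinate-map variable
lat = [10.0, 20.0]           # coordinate-map variable
temp = [                     # function-value variable, indexed [time][lat]
    [280.1, 281.0],
    [282.3, 283.1],
    [279.8, 280.6],
]


def sample(i_time, i_lat):
    """Return the domain point and the range value at a shared index pair."""
    return (time[i_time], lat[i_lat]), temp[i_time][i_lat]


# Relations: 1-D variables of equal length act as the columns of a table.
stations = {"station_id": [1, 2], "name": ["A", "B"]}
observations = {"station_id": [2, 1, 2], "value": [5.0, 7.5, 6.1]}


def rows(relation):
    """Turn the column variables of a relation into a list of row dictionaries."""
    names = list(relation)
    return [dict(zip(names, vals)) for vals in zip(*relation.values())]


def join(left, right, key):
    """Link two relations through a variable that serves as a foreign key."""
    index = {row[key]: row for row in rows(right)}
    return [{**row, **index[row[key]]} for row in rows(left)]


if __name__ == "__main__":
    print(sample(1, 0))
    for row in join(observations, stations, "station_id"):
        print(row)
```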

10

DAP4 Operations (invoked as query strings)

3 kinds of constraint expressions (i.e. query strings) yield subsets or invoke (server-side) processing

projection (returns a subset): specify included variables (by name) as well as indices of included array elements

selection (returns a subset): limit tuples (rows) of a relation to those with variable values satisfying a DBMS-style predicate

function (today’s town hall!): invoke server functions to calculate a return [we intend to target critical needs]
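
To make projection and selection concrete, the sketch below applies both to a small in-memory relation and also composes the matching query string. It assumes the DAP2-style layout (projected names separated by commas, followed by predicates each introduced with `&`); DAP4 syntax may differ. The function names and sample rows are hypothetical.

```python
import operator

# Comparison operators accepted by this sketch's predicates.
OPS = {
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
    "=": operator.eq, "!=": operator.ne,
}


def select(rows, name, op, value):
    """Selection: keep only rows whose variable satisfies the predicate."""
    return [row for row in rows if OPS[op](row[name], value)]


def project(rows, names):
    """Projection: keep only the named variables in each row."""
    return [{n: row[n] for n in names} for row in rows]


def constraint_expression(names, predicates):
    """Compose a query string: projected names, then '&'-prefixed predicates."""
    projection = ",".join(names)
    selection = "".join(f"&{n}{op}{v}" for n, op, v in predicates)
    return projection + selection


if __name__ == "__main__":
    observations = [
        {"station_id": 2, "value": 5.0},
        {"station_id": 1, "value": 7.5},
        {"station_id": 2, "value": 6.1},
    ]
    subset = project(select(observations, "value", ">", 6.0), ["station_id"])
    print(subset)
    print(constraint_expression(["station_id", "value"], [("value", ">", 6.0)]))
```

A client would send only the query string; the filtering shown in `select` and `project` is what the server would do on its side before responding.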

11

OPeNDAP Projection Operators

Like netCDF, but as a Web service, users may

Skip indices

Limit index ranges

Reduce dimensionality
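
The three projection operations map onto the index notation used in DAP2-style constraint expressions, `[start:stride:stop]`, where the stop index is inclusive. The sketch below is an illustration under that assumption. The helper names and the variable `temp` are hypothetical, and the Python slice is shown only as a local analogy.

```python
def hyperslab(start, stop, stride=1):
    """Format one dimension as [start:stride:stop]; stop is inclusive."""
    return f"[{start}:{stride}:{stop}]"


def as_slice(start, stop, stride=1):
    """Equivalent Python slice; Python's stop is exclusive, hence the +1."""
    return slice(start, stop + 1, stride)


def project_variable(name, *dims):
    """Build a projection for one variable from (start, stop[, stride]) tuples."""
    return name + "".join(hyperslab(*d) for d in dims)


if __name__ == "__main__":
    data = list(range(10))
    # Limit the index range: indices 2 through 5.
    print(hyperslab(2, 5), data[as_slice(2, 5)])
    # Skip indices: every second element from 0 through 8.
    print(hyperslab(0, 8, 2), data[as_slice(0, 8, 2)])
    # Fix the first dimension to a single index, leaving an extent of one there.
    print(project_variable("temp", (0, 0), (0, 8, 2)))
```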

12

Other DAP-Related Services

Note: these were not part of the DAP2 specification...

Many DAP-based servers (from Unidata & OPeNDAP, e.g.)

Accept multiple types of data as inputs

Offer several views of them over the web

Native DAP web services: for DAP-enabled clients

Source format (lossless): netCDF-to-netCDF or HDF4-to-HDF4, e.g.

Alternative web services: html (browser views), XML, WCS, etc.

Town-Hall: what other services should be offered?

13

Other OPULS Accomplishments

Irregular mesh subsetting

Progress with U WA (Bill Howe)

To be released soon...

Asynchronous access

Preliminary trials...

Cloud-based service provision (with parallelism)

MODIS reprojection (related, but not OPULS funding)

14

OPULS Process

Transparency

Public documentation updated weekly (just Google OPULS!)

Advisory committee

Jeff de La Beaujardiere, James Frew, Mike Folk, Steve Hankin, Eric Kihn, Rich Signell

Welcoming input (per this town hall)

15

Town-Hall Questions

What server functions ought to be specified in the DAP4 protocol?

Simple point-wise mathematics

Mathematics on sampled functions

Truly domain-specific functions (involving the datum, e.g.)

Which (other) web-service protocols should be leveraged by DAP servers, & what are the pertinent use cases?

To facilitate open search (exploiting ATOM), e.g.

To facilitate semantic analysis (providing RDF output, e.g.)

Others?

16

I thank you

• OPeNDAP, Inc

• http://opendap.org

• increasing data’s visibility
