Unidata’s Common Data Model John Caron Unidata/UCAR Nov 2006


Page 1: Unidata's Common Data Model

Unidata’s Common Data Model

John Caron

Unidata/UCAR

Nov 2006

Page 2: Unidata's Common Data Model

Goals / Overview

• Look at the landscape of scientific datasets from a few thousand feet up.

• What semantics are needed to make these useful?
– georeferencing
– specialized subsetting

Page 3: Unidata's Common Data Model

What’s a Data Model?

• An Abstract Data Model describes data objects and what methods you can use on them.

• An API is the interface to the Data Model for a specific programming language

• A file format is a way to persist the objects in the Data Model.

• An Abstract Data Model removes the details of any particular API and the persistence format.

Page 4: Unidata's Common Data Model

Common Data Model Layers

• Scientific Datatypes: Grid, Point, Radial, Trajectory, Swath, Station Profile

• Coordinate Systems

• Data Access

Page 5: Unidata's Common Data Model

NetCDF-Java version 2.2 architecture

[Diagram. Layers, top to bottom: Application; Scientific Datatypes (Datatype Adapter); NetcdfDataset (CoordSystem Builder, NcML); NetcdfFile; I/O service provider. Data sources shown: OPeNDAP, ADDE, THREDDS Catalog.xml, NetCDF-3, NetCDF-4, HDF5, GRIB, GINI, NIDS, Nexrad, DMSP, …]

Page 6: Unidata's Common Data Model

NetCDF-4 and Common Data Model (Data Access Layer)

Page 7: Unidata's Common Data Model

I/O Service Provider Implementations

• General: NetCDF, HDF5, OPeNDAP

• Gridded: GRIB-1, GRIB-2

• Radar: NEXRAD level 2 and 3, DORADE

• Point: BUFR, ASCII

• Satellite: DMSP, GINI

• In development
– NOAA: GOES (Knapp/Nelson), many others

Page 8: Unidata's Common Data Model

Coordinate Systems needed

• NetCDF, OPeNDAP, HDF data models do not have integrated coordinate systems
– so georeferencing is not part of the API
– need conventions to specify it (e.g. CF-1, COARDS)

• Contrast GRIB, HDF-EOS, other specialized formats

Page 9: Unidata's Common Data Model

NetCDF Coordinate Variables

dimensions:
    lat = 64;
    lon = 128;

variables:
    float lat(lat);
    float lon(lon);
    double temperature(lat,lon);

Page 10: Unidata's Common Data Model

Coordinate Variables

– One-dimensional variable with the same name as its dimension

– Strictly monotonic values

– No missing values

The coordinates of a point (i,j,k) are

{CV1(i), CV2(j), CV3(k)}
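The rules above can be sketched in a few lines of Java. This is a minimal illustration, not NetCDF-Java code: the class and method names are hypothetical, and treating NaN as the "missing value" marker is an assumption made for the example.

```java
public class CoordinateVariableSketch {

    /** True if the values are strictly increasing or strictly decreasing and contain no NaN. */
    public static boolean isStrictlyMonotonic(double[] values) {
        if (values.length < 2) {
            return true;
        }
        boolean increasing = values[1] > values[0];
        for (int i = 1; i < values.length; i++) {
            double step = values[i] - values[i - 1];
            // NaN (assumed here to mark a missing value), a repeated value,
            // or a change of direction all break strict monotonicity.
            if (Double.isNaN(step) || step == 0.0 || (step > 0.0) != increasing) {
                return false;
            }
        }
        return true;
    }

    /** Coordinates of grid point (i, j) for temperature(lat, lon): {lat(i), lon(j)}. */
    public static double[] coordinatesOf(double[] lat, double[] lon, int i, int j) {
        return new double[] {lat[i], lon[j]};
    }

    public static void main(String[] args) {
        double[] lat = {-90.0, -45.0, 0.0, 45.0, 90.0};
        double[] lon = {0.0, 90.0, 180.0, 270.0};
        double[] point = coordinatesOf(lat, lon, 3, 1);
        System.out.println("lat=" + point[0] + ", lon=" + point[1]);
        System.out.println("lat is a valid coordinate variable: " + isStrictlyMonotonic(lat));
    }
}
```

Each index of the data variable looks up exactly one 1D coordinate variable, which is why the scheme is simple but limited.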

Page 11: Unidata's Common Data Model

Limitations of 1D Coordinate Variables

• Non lat/lon horizontal grids:
    float temperature(y,x);
    float lat(y,x);
    float lon(y,x);

• Trajectory data:
    float NKoreaRadioactivity(pt);
    float lat(pt);
    float lon(pt);
    float altitude(pt);
    float time(pt);

Page 12: Unidata's Common Data Model

General Coordinates in CF-1.0

float P(y,x);
    P:coordinates = "lat lon";
float lat(y,x);
float lon(y,x);

float Sr90(pt);
    Sr90:coordinates = "lat lon altitude time";

Page 13: Unidata's Common Data Model

Coordinate Systems (abstract)

• A Coordinate System for a data variable is a set of Coordinate Variables such that the coordinates of the (i,j,k) data point are

{CV1(i,j,k), CV2(i,j,k), CV3(i,j,k), CV4(i,j,k), …}

(previously {CV1(i), CV2(j), CV3(k)})

• The dimensions of each Coordinate Variable must be a subset of the dimensions of the data variable.
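The subset rule is easy to check mechanically. The sketch below is illustrative only: `isValidCoordinate` is a hypothetical helper, not part of any library, and dimensions are represented simply by their names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CoordinateSystemCheck {

    /** True if every dimension of the coordinate variable is also a dimension of the data variable. */
    public static boolean isValidCoordinate(List<String> coordDims, List<String> dataDims) {
        Set<String> allowed = new HashSet<>(dataDims);
        return allowed.containsAll(coordDims);
    }

    public static void main(String[] args) {
        List<String> gridData = Arrays.asList("t", "z", "y", "x");
        // lat(y,x) and time(t) qualify as coordinates for gridData(t,z,y,x); lat(pt) does not.
        System.out.println("lat(y,x): " + isValidCoordinate(Arrays.asList("y", "x"), gridData));
        System.out.println("time(t):  " + isValidCoordinate(Arrays.asList("t"), gridData));
        System.out.println("lat(pt):  " + isValidCoordinate(Arrays.asList("pt"), gridData));
    }
}
```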

Page 14: Unidata's Common Data Model

Need Coordinate Axis Types

float gridData(t,z,y,x);
float time(t);
float y(y);
float x(x);
float lat(y,x);
float lon(y,x);
float height(t,z,y,x);

float radialData(radial, gate);
float distance(gate);
float azimuth(radial);
float elevation(radial);
float time(radial);

Page 15: Unidata's Common Data Model

The same??

float stationObs(pt);
float lat(pt);
float lon(pt);
float z(pt);
float time(pt);

float trajectory(pt);
float lat(pt);
float lon(pt);
float z(pt);
float time(pt);

Page 16: Unidata's Common Data Model

Revised Coordinate Systems

1. Specify Coordinate Variables

2. Specify Coordinate Types (time, lat, lon, projection x, y, height, pressure, z, radial, azimuth, elevation)

3. Specify connectivity (implicit or explicit) between data points
– Implicit: Neighbors in index space are (connected) neighbors in coordinate space. Allows efficient searching.
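Why implicit connectivity allows efficient searching can be shown with a small sketch. It assumes a strictly increasing, non-empty 1D axis for the connected case; the class and method names are hypothetical and chosen only for this illustration.

```java
public class ConnectivitySearch {

    /** Connected axis (strictly increasing): binary search for the nearest index, O(log n). */
    public static int nearestOnAxis(double[] axis, double target) {
        int lo = 0;
        int hi = axis.length - 1;
        while (hi - lo > 1) {
            int mid = (lo + hi) >>> 1;
            if (axis[mid] <= target) {
                lo = mid;
            } else {
                hi = mid;
            }
        }
        // lo and hi now bracket the target (or sit at one end of the axis).
        return (Math.abs(axis[lo] - target) <= Math.abs(axis[hi] - target)) ? lo : hi;
    }

    /** Unconnected points: index order says nothing about location, so every point is visited, O(n). */
    public static int nearestUnconnected(double[] values, double target) {
        int best = 0;
        for (int i = 1; i < values.length; i++) {
            if (Math.abs(values[i] - target) < Math.abs(values[best] - target)) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] axis = {0.0, 10.0, 20.0, 30.0};       // e.g. x(x) on a grid
        double[] scattered = {30.0, 0.0, 20.0, 10.0};  // e.g. lon(pt) for point data
        System.out.println("grid index:  " + nearestOnAxis(axis, 12.0));
        System.out.println("point index: " + nearestUnconnected(scattered, 12.0));
    }
}
```

Both calls find the value 10.0, but only the connected axis lets the search skip most of the data.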

Page 17: Unidata's Common Data Model

Gridded Data

Connected means: neighbors in index space are neighbors in coordinate space

float gridData(t,z,y,x);
float time(t);       // Time
float y(y);          // GeoY
float x(x);          // GeoX
float z(t,z,y,x);    // Height or Pressure

• Cartesian coordinates

• All dimensions are connected

Page 18: Unidata's Common Data Model

Coordinate Systems UML

Page 19: Unidata's Common Data Model

Scientific Data Types

• Based on datasets Unidata is familiar with
– APIs are evolving

• How are data points connected?

• Intended to scale to large, multifile collections

• Intended to support “specialized queries”
– Space, Time

• Corresponding “standard” NetCDF file conventions

Page 20: Unidata's Common Data Model

Gridded Data

float gridData(t,z,y,x);
float time(t);
float y(y);
float x(x);
float lat(y,x);
float lon(y,x);
float height(t,z,y,x);

• Cartesian coordinates

• All dimensions are connected

• x, y, z, time

• recently added runtime and ensemble

• refactored into GridDatatype interface

Page 21: Unidata's Common Data Model

GridDatatype methods

CoordinateAxis getTaxis();
CoordinateAxis getXaxis();
CoordinateAxis getYaxis();
CoordinateAxis getZaxis();
Projection getProjection();

int[] findXYindexFromCoord(double x_coord, double y_coord);

LatLonRect getLatLonBoundingBox();

Array getDataSlice(Range[] …)
GridDatatype makeSubset(Range[] …)
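To make the coordinate-to-index lookup concrete, here is a toy version for the simplest case: evenly spaced x and y axes. It is a sketch under stated assumptions, not the library's implementation. `RegularGridSketch` is a hypothetical class, and returning {-1, -1} for points outside the grid is a convention chosen for this example.

```java
public class RegularGridSketch {
    private final double xStart;
    private final double xStep;
    private final int nx;
    private final double yStart;
    private final double yStep;
    private final int ny;

    public RegularGridSketch(double xStart, double xStep, int nx,
                             double yStart, double yStep, int ny) {
        this.xStart = xStart;
        this.xStep = xStep;
        this.nx = nx;
        this.yStart = yStart;
        this.yStep = yStep;
        this.ny = ny;
    }

    /** Returns {xIndex, yIndex} of the nearest grid point, or {-1, -1} if outside the grid. */
    public int[] findXYindexFromCoord(double xCoord, double yCoord) {
        // On an evenly spaced axis the index follows directly from the spacing; no search is needed.
        int i = (int) Math.round((xCoord - xStart) / xStep);
        int j = (int) Math.round((yCoord - yStart) / yStep);
        if (i < 0 || i >= nx || j < 0 || j >= ny) {
            return new int[] {-1, -1};
        }
        return new int[] {i, j};
    }

    public static void main(String[] args) {
        // A 2.5-degree global grid: 144 longitudes from 0, 73 latitudes from -90.
        RegularGridSketch grid = new RegularGridSketch(0.0, 2.5, 144, -90.0, 2.5, 73);
        int[] index = grid.findXYindexFromCoord(10.0, 0.0);
        System.out.println("x index=" + index[0] + ", y index=" + index[1]);
    }
}
```

An irregular axis would need a search instead of the arithmetic shortcut, and a projected grid would first convert lat/lon to projection x/y.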

Page 22: Unidata's Common Data Model

Radial Data

radialData(radial, gate):
    distance(gate)
    azimuth(radial)
    elevation(radial)
    time(radial)

• Polar coordinates

• All dimensions are connected

• No separate time dimension

Page 23: Unidata's Common Data Model

Swath

swathData(line,cell):
    lat(line,cell)
    lon(line,cell)
    time(line)
    z(line,cell) ??

• lat/lon coordinates

• No separate time dimension

• All dimensions are connected

Page 24: Unidata's Common Data Model

Point Observation Data

Structure {
    lat, lon, z, time;
    v1, v2, ...
} obs(pt);

• Set of measurements at the same point in space and time

• Point dimension not connected

float obs1(pt);
float obs2(pt);
float lat(pt);
float lon(pt);
float z(pt);
float time(pt);

Page 25: Unidata's Common Data Model

PointObsDataset Methods

// Iterator<StructureData>
Iterator getData(LatLonRect boundingBox, Date start, Date end);
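A space/time query of this shape can be sketched with plain Java collections. Everything here is hypothetical and for illustration only: the `Obs` record stands in for StructureData, the bounding box is passed as four numbers instead of a LatLonRect, and longitude wrap-around at the date line is ignored.

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.Iterator;
import java.util.List;

public class PointObsQuerySketch {

    /** Hypothetical observation record: location, time and one measured value. */
    public static class Obs {
        public final double lat;
        public final double lon;
        public final Date time;
        public final double value;

        public Obs(double lat, double lon, Date time, double value) {
            this.lat = lat;
            this.lon = lon;
            this.time = time;
            this.value = value;
        }
    }

    /** Iterates over the observations inside the box and time range (all bounds inclusive). */
    public static Iterator<Obs> getData(List<Obs> all,
                                        double latMin, double latMax,
                                        double lonMin, double lonMax,
                                        Date start, Date end) {
        List<Obs> hits = new ArrayList<>();
        // The point dimension is not connected, so every observation has to be tested.
        for (Obs o : all) {
            boolean inBox = o.lat >= latMin && o.lat <= latMax
                    && o.lon >= lonMin && o.lon <= lonMax;
            boolean inTime = !o.time.before(start) && !o.time.after(end);
            if (inBox && inTime) {
                hits.add(o);
            }
        }
        return hits.iterator();
    }

    public static void main(String[] args) {
        List<Obs> all = new ArrayList<>();
        all.add(new Obs(40.0, -105.0, new Date(1000L), 1.0));
        all.add(new Obs(10.0, 20.0, new Date(1000L), 2.0));
        Iterator<Obs> it = getData(all, 30.0, 50.0, -110.0, -100.0, new Date(0L), new Date(2000L));
        while (it.hasNext()) {
            System.out.println("matched value: " + it.next().value);
        }
    }
}
```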

Page 26: Unidata's Common Data Model

Time series Station Data

Structure {
    name;
    lat, lon, z;
    Structure {
        time;
        v1, v2, ...
    } obs(*);    // connected
} stn(stn);      // not connected

Page 27: Unidata's Common Data Model

StationObs Methods

// List<Station>
List getStations(LatLonRect boundingBox);

// Iterator<StructureData>
Iterator getData(Station s, Date start, Date end);
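The nested station structure suggests a two-step query: pick stations by location, then pull a time range from one station. The sketch below is an assumption-laden illustration, not the library's API. `Station` is a hypothetical class, times are milliseconds since the epoch, and each station keeps its observations in a sorted map so that the connected time dimension supports range lookups without a full scan.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class StationObsSketch {

    /** Hypothetical station: fixed location plus a time-ordered series of values. */
    public static class Station {
        public final String name;
        public final double lat;
        public final double lon;
        // time (ms since epoch) -> value; sorted by time, i.e. connected along time
        public final NavigableMap<Long, Double> obs = new TreeMap<>();

        public Station(String name, double lat, double lon) {
            this.name = name;
            this.lat = lat;
            this.lon = lon;
        }
    }

    /** Stations are not connected to each other, so the station list is scanned linearly. */
    public static List<Station> getStations(List<Station> all,
                                            double latMin, double latMax,
                                            double lonMin, double lonMax) {
        List<Station> hits = new ArrayList<>();
        for (Station s : all) {
            if (s.lat >= latMin && s.lat <= latMax && s.lon >= lonMin && s.lon <= lonMax) {
                hits.add(s);
            }
        }
        return hits;
    }

    /** Observations of one station with start <= time <= end (requires start <= end). */
    public static NavigableMap<Long, Double> getData(Station s, long start, long end) {
        return s.obs.subMap(start, true, end, true);
    }

    public static void main(String[] args) {
        Station boulder = new Station("Boulder", 40.0, -105.3);
        boulder.obs.put(100L, 1.5);
        boulder.obs.put(200L, 2.5);
        boulder.obs.put(300L, 3.5);
        List<Station> all = new ArrayList<>();
        all.add(boulder);
        for (Station s : getStations(all, 30.0, 50.0, -110.0, -100.0)) {
            System.out.println(s.name + ": " + getData(s, 150L, 300L));
        }
    }
}
```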

Page 28: Unidata's Common Data Model

Trajectory Data

Structure {
    lat, lon, z, time;
    v1, v2, ...
} obs(pt);       // connected

Structure {
    name;
    Structure {
        lat, lon, z, time;
        v1, v2, ...
    } obs(*);    // connected
} traj(traj);    // not connected

• pt dimension is connected

• Collection dimension not connected

Page 29: Unidata's Common Data Model

Profiler/Sounding Station Data

Structure {
    name;
    lat, lon, time;
    Structure {
        z;
        v1, v2, ...
    } obs(*);    // connected
} loc(nloc);     // not connected

Structure {
    name;
    lat, lon;
    Structure {
        time;
        Structure {
            z;
            v1, v2, ...
        } obs(*);    // connected
    } time(*);       // connected
} stn(stn);          // not connected

Page 30: Unidata's Common Data Model

Unstructured Grid

float unstructGrid(t,z,pt);
float lat(pt);
float lon(pt);
float time(t);
float height(z);

• Pt dimension not connected

• Looks the same as point data

• Need to specify the connectivity explicitly

Page 31: Unidata's Common Data Model

Data Types Summary

• Data access through a standard API

• Convenient georeferencing

• Specialized subsetting methods
– Efficiency for large datasets

Page 32: Unidata's Common Data Model

Payoff

N + M instead of N * M things on your TODO List!

[Diagram: File Format #1, File Format #2, … File Format #N feed into the CDM; the CDM feeds Visualization & Analysis, NetCDF file, OPeNDAP Server, WCS Service, Web Service]
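The N + M payoff comes from routing every format and every consumer through one shared model. The interfaces below are hypothetical stand-ins invented for this sketch (they are not the CDM or NetCDF-Java types); the two counting methods just spell out the arithmetic.

```java
import java.util.Arrays;

public class CommonModelPayoff {

    /** Shared in-memory model that every reader produces and every consumer accepts (hypothetical). */
    public interface Dataset {
        double[] read(String variableName);
    }

    /** One implementation per file format: N of these. */
    public interface FormatReader {
        Dataset open(String location);
    }

    /** One implementation per application or service: M of these. */
    public interface Consumer {
        String consume(Dataset dataset);
    }

    /** Every consumer needs its own code for every format. */
    public static int piecesWithoutCommonModel(int formats, int consumers) {
        return formats * consumers;
    }

    /** Every format and every consumer is written once, against the shared model. */
    public static int piecesWithCommonModel(int formats, int consumers) {
        return formats + consumers;
    }

    public static void main(String[] args) {
        // A fake reader and a trivial consumer, coupled only through Dataset.
        FormatReader fakeReader = location -> variableName -> new double[] {1.0, 2.0, 3.0};
        Consumer summer = dataset -> "sum=" + Arrays.stream(dataset.read("temperature")).sum();
        System.out.println(summer.consume(fakeReader.open("example-location")));

        System.out.println("10 formats, 4 consumers, direct: " + piecesWithoutCommonModel(10, 4));
        System.out.println("10 formats, 4 consumers, via model: " + piecesWithCommonModel(10, 4));
    }
}
```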

Page 33: Unidata's Common Data Model

THREDDS Data Server

[Diagram: an HTTP Tomcat Server at hostname.edu runs the THREDDS Server application, built on the NetCDF-Java library. Datasets (including IDD Data) are described by Catalog.xml and served through OPeNDAP, HTTPServer, and WCS.]

Page 34: Unidata's Common Data Model

Next: DataType Aggregation

• Work at the CDM DataType level, know (some) data semantics

• Forecast Model Collection
– Combine multiple model forecasts into single dataset with two time dimensions
– With NOAA/IOOS (Steve Hankin)

• Point/Station/Trajectory/Profile Data
– Allow space/time queries, return nested sequences
– Start from / standardize “Dapper conventions”

Page 35: Unidata's Common Data Model

Forecast Model Collections
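One way to picture a dataset with two time dimensions is a table indexed by model run and by forecast offset. The sketch assumes those are the two dimensions (run time and hours into the forecast), which the slides do not spell out; the class and method names are hypothetical.

```java
import java.util.Arrays;

public class ForecastModelCollectionSketch {

    /**
     * Builds validTimes[r][f] = runTimes[r] + forecastOffsets[f], all in hours.
     * Rows are model runs, columns are forecast offsets within a run.
     */
    public static double[][] validTimes(double[] runTimes, double[] forecastOffsets) {
        double[][] valid = new double[runTimes.length][forecastOffsets.length];
        for (int r = 0; r < runTimes.length; r++) {
            for (int f = 0; f < forecastOffsets.length; f++) {
                valid[r][f] = runTimes[r] + forecastOffsets[f];
            }
        }
        return valid;
    }

    public static void main(String[] args) {
        double[] runTimes = {0.0, 12.0};       // two model runs, 12 hours apart
        double[] offsets = {0.0, 6.0, 12.0};   // each run forecasts 0, 6 and 12 hours ahead
        double[][] valid = validTimes(runTimes, offsets);
        for (int r = 0; r < runTimes.length; r++) {
            System.out.println("run at " + runTimes[r] + "h -> valid times " + Arrays.toString(valid[r]));
        }
    }
}
```

The same valid time can appear in several runs (here hour 12 is both the 12-hour forecast of the first run and the analysis of the second), which is what makes a combined collection more than a simple concatenation.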

Page 36: Unidata's Common Data Model

Conclusion

• Standardized Data Access in good shape
– HDF5, NetCDF, OPeNDAP
– Write an IOSP for proprietary formats (Java)

• But that’s not good enough!

• To do:
– Standard representations of coordinate systems
– Classifications of data types, standard services for them