
Three Flavors of Data


Page 1: Three Flavors of Data

Three Flavors of Data

Science Data: simulations and sensor readings.

Catalog Data: metadata; descriptors of datasets, data products, and other processing artifacts.

Active Data: data associated with logging, monitoring, and scheduling compute tasks.

Page 2: Three Flavors of Data

Three Flavors of Data (1)

Science Data
  Simulation data: solutions to the partial differential equations governing the physics of the Columbia River Estuary
  Sensor data: measurements of the physical characteristics, used to guide and validate simulations

Wanted: a simple means of specifying new data products from these raw data and computing them efficiently

Approach: a data manipulation language based on a GridField data model

Page 3: Three Flavors of Data

Three Flavors of Data (2)

Catalog Data: explicit metadata to describe system artifacts

Wanted:
  Tools to locate artifacts given descriptors (query)
  A metadata collection facility that tolerates change
    The metadata we wish to collect may change (e.g., new product ‘lines’ are developed)
    The source of the metadata may change (e.g., file naming conventions or directory structures evolve)

Approach: generic database; custom collection scripts

Page 4: Three Flavors of Data

Three Flavors of Data (3)

Active Data: data describing past, current, and future compute tasks

Wanted: tools for scheduling, monitoring, and managing...
  individual tasks (e.g., a single data product derivation)
  groups of interdependent tasks (e.g., a daily forecast run)
  campaigns (e.g., a series of calibration runs followed by a re-computation of the runs of 2002 with a different implicitness)

Approach: undecided

Page 5: Three Flavors of Data

Simulation Data: GridFields

The data product suite exhibits recurring processing idioms:
  larger grids reduced to smaller grids
    Ex: ‘estuary’ data products vs. ‘far’ data products
  grids mapped to other grids
    Ex: 3D grid mapped to a 2D slice
  grids combined
    Ex: 1D depth grid ‘crossed’ with a 2D horizontal grid
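The three idioms above can be sketched in a few lines of Python. This is only an illustration under strong simplifying assumptions: a grid is modeled as a flat list of coordinate tuples with no cell topology, and the function names restrict, cross, and slice_at are hypothetical, not the GridField operators themselves.

```python
from itertools import product

# A grid here is just a list of cells; each cell is a tuple of coordinates.
# This is a simplified stand-in, not the GridField data model.

def restrict(grid, keep):
    """Reduce a larger grid to a smaller one by keeping cells that satisfy `keep`."""
    return [cell for cell in grid if keep(cell)]

def cross(grid_a, grid_b):
    """Combine two grids: every cell of grid_a paired with every cell of grid_b."""
    return [a + b for a, b in product(grid_a, grid_b)]

def slice_at(grid, axis, value):
    """Map a grid to a lower-dimensional one by fixing one coordinate."""
    return [cell[:axis] + cell[axis + 1:] for cell in grid if cell[axis] == value]

horizontal = [(x, y) for x in range(3) for y in range(2)]  # 2D horizontal grid
depth = [(z,) for z in range(4)]                           # 1D depth grid

volume = cross(horizontal, depth)             # 1D depth 'crossed' with 2D horizontal
surface = slice_at(volume, axis=2, value=0)   # 3D grid mapped to a 2D slice
near = restrict(volume, lambda c: c[0] < 2)   # larger grid reduced to a smaller one
```

Slicing the crossed grid at a single depth index recovers the original horizontal grid, which is the kind of equivalence an optimizer could exploit.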

Page 6: Three Flavors of Data

Simulation Data: GridFields (2)

We’re expressing these idioms as operators over a grid-based data model.

Advantages:
  Simpler recipes
    5 ops for all the data products (plus helper functions)
  Flexible model; fewer maintenance troubles
    N dimensions: uniform handling of space and time (maybe more...)
    Any cell type: segments, triangles, quadrangles, arbitrary polytopes
  Optimization opportunities
    operators prescribe semantics, but not implementation
    topological equivalences exposed and exploited

Page 7: Three Flavors of Data

Simulation Data: GridFields (3)

Status:
  Core operators functional
  Simple examples hooked to XMVIS for viewing

Todo:
  Examples hooked to VTK
  Write/test examples from the current product suite
  Support GridFields too large for memory
  Expose a nice syntax for writing recipes

Page 8: Three Flavors of Data

Catalog Data: Collection

Where is the Metadata?
  File Name
  File Path
  File Content
  Other Files?

Examples shown on the slide:
  1_salt.63
  /forecasts/2003-184/run/images/isosal_estuary7/anim-sal_estuary_7.gif
  Version: 1.04
  Variable: salt
  :

Page 9: Three Flavors of Data

Collection scripts

For each file type the metadata collection mechanism is different:
  gifs
  binary output
  Param.in

Use a script for each file type that will emit metadata for that type of file. Only these simple scripts need to change as the system evolves.

Page 10: Three Flavors of Data

Example: gif animation

/forecasts/2003-184/.../isosal_estuary7/anim-sal_estuary_7.gif

  CorieDate = “2003-184”
  Region = “Estuary”
  Lat = xxxx
  Long = xxxx
  Variable = “Salinity”
  Type = “Animation”
  Depth = “7”
  product line = “isoline”

Here, a script can just parse the path and file name.
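As a rough illustration of parsing the path and file name, here is a Python sketch (the actual collection scripts are described as Perl). The regular expression is inferred from the single full example path shown earlier in the deck, and the lookup tables KINDS and VARIABLES are hypothetical. Lat, Long, and product line are left out because the slides do not show how they are derived.

```python
import re

# Pattern inferred from one example path on the slides; real paths may differ.
GIF_PATH = re.compile(
    r"^/forecasts/(?P<date>\d{4}-\d{3})/(?:.*/)?"
    r"(?P<kind>[a-z]+)-(?P<var>[a-z]+)_(?P<region>[a-z]+)_(?P<depth>\d+)\.gif$"
)

# Hypothetical lookup tables from file-name abbreviations to descriptor values.
KINDS = {"anim": "Animation"}
VARIABLES = {"sal": "Salinity"}

def parse_gif_path(path):
    """Return a dict of descriptors parsed from the path, or None if it does not match."""
    m = GIF_PATH.match(path)
    if m is None:
        return None
    return {
        "CorieDate": m["date"],
        "Region": m["region"].capitalize(),
        "Variable": VARIABLES.get(m["var"], m["var"]),
        "Type": KINDS.get(m["kind"], m["kind"]),
        "Depth": m["depth"],
    }

meta = parse_gif_path(
    "/forecasts/2003-184/run/images/isosal_estuary7/anim-sal_estuary_7.gif"
)
```

If the naming convention changes, only the pattern and the lookup tables need to be updated, which is the point of keeping each collection script small.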

Page 11: Three Flavors of Data

Example: Binary output

Need a different mechanism than for gif animations; might be convenient to implement it in a different script.

/forecasts/2003-184/run/1_salt.gif

Variable= “Salinity”

What about number of nodes? Mean Sea Level?

We need to access the file’s content

1_salt.63

nodes: 55817
msl: 4285
:
:

Page 12: Three Flavors of Data

Architecture

Reflector creates an XML file containing metadata for each file and also stores the metadata in the database.

Reflector determines the file type (based on regular expressions) and calls the appropriate collection script.

The collection script uses an “AddItem” Perl function to return the metadata to the reflector.

[Diagram: Reflector invokes Collection Script; Collection Script returns metadata to Reflector; Reflector writes to DB and XML]
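The flow described on this slide might look roughly like the following Python sketch (the real system uses Perl). Everything named here is an assumption for illustration: the collector functions, the file-name patterns, the XML element names, and the single-table database layout. The four-argument add_item callback mirrors the AddItem(Name,Value,Notes,Type) signature given later in the deck, and the XML is returned as a string rather than written to a file.

```python
import re
import sqlite3
import xml.etree.ElementTree as ET

def collect_gif(path, add_item):
    # Hypothetical collection script: metadata comes from the file name alone.
    add_item("type", "animation", "from file extension", "string")

def collect_binary(path, add_item):
    # Hypothetical collection script: a real one would read the file's content.
    add_item("variable", "salt", "placeholder value", "string")

# File type is decided by regular expression; these patterns are assumptions.
COLLECTORS = [
    (re.compile(r"\.gif$"), collect_gif),
    (re.compile(r"\.6\d$"), collect_binary),
]

def reflect(path, db):
    """Run the matching collection script, store rows in db, return XML text."""
    items = []

    def add_item(name, value, notes, type_):
        # Collection scripts hand metadata back through this callback.
        items.append((name, value, notes, type_))

    for pattern, script in COLLECTORS:
        if pattern.search(path):
            script(path, add_item)
            break

    root = ET.Element("artifact", path=path)
    for name, value, notes, type_ in items:
        ET.SubElement(root, "item", name=name, type=type_, notes=notes).text = value
        db.execute("INSERT INTO metadata VALUES (?, ?, ?)", (path, name, value))
    return ET.tostring(root, encoding="unicode")

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE metadata (path TEXT, name TEXT, value TEXT)")
xml_text = reflect("/forecasts/2003-184/run/1_salt.63", db)
```

Adding support for a new file type then means adding one pattern and one small collector, without touching the reflector.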

Page 13: Three Flavors of Data

Metadata in XML and DB?

These XML files give you filesystem-based access to the metadata for an artifact.

Use “info” to present the XML in a readable form:

/../run> info 1_salt.63
variable: salt
version: 1.04
msl: 4285
nodes: 55817

Also useful if DB is inaccessible.
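A tool like “info” could be as small as the Python sketch below. The XML layout in SAMPLE is an assumption, since the slides do not show the actual schema, and the function name info is borrowed from the command only for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed metadata file layout; the slides do not show the real XML schema.
SAMPLE = """<artifact>
  <item name="variable">salt</item>
  <item name="version">1.04</item>
  <item name="msl">4285</item>
  <item name="nodes">55817</item>
</artifact>"""

def info(xml_text):
    """Return 'name: value' lines for every item in the metadata XML."""
    root = ET.fromstring(xml_text)
    return [f"{item.get('name')}: {item.text}" for item in root.iter("item")]

lines = info(SAMPLE)
print("\n".join(lines))
```

Because it reads only the XML file, such a tool keeps working when the database is unreachable.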

Page 14: Three Flavors of Data

Minor Technical Change

Previously we had suggested that the collection scripts should emit metadata on standard output.

Instead, we now provide a Perl function: AddItem(Name, Value, Notes, Type)

Page 15: Three Flavors of Data

How does this help?

Find artifacts via descriptors (query)
  ‘find animations showing the estuary where we used a constant bottom friction coefficient’
  where region = “estuary” and type = “animation” and ntau = “0”

Write robust metadata-driven programs
  Chris’ low bandwidth zoom web app
  Stay-Fresh PowerPoint slides
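The descriptor query above (region, type, ntau) can be tried against a generic name/value table. The Python sketch below uses an in-memory SQLite database. The table layout, the find helper, and the artifact names a.gif, b.gif, and c.gif are all hypothetical, and the sketch assumes each artifact has at most one value per descriptor name.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Generic layout: one row per (artifact, descriptor name, value).
db.execute("CREATE TABLE metadata (artifact TEXT, name TEXT, value TEXT)")
db.executemany("INSERT INTO metadata VALUES (?, ?, ?)", [
    ("a.gif", "region", "estuary"), ("a.gif", "type", "animation"), ("a.gif", "ntau", "0"),
    ("b.gif", "region", "estuary"), ("b.gif", "type", "animation"), ("b.gif", "ntau", "1"),
    ("c.gif", "region", "far"), ("c.gif", "type", "animation"), ("c.gif", "ntau", "0"),
])

def find(db, **wanted):
    """Return artifacts matching every name=value pair (at least one pair required)."""
    clauses = " OR ".join("(name = ? AND value = ?)" for _ in wanted)
    params = [x for pair in wanted.items() for x in pair]
    sql = (f"SELECT artifact FROM metadata WHERE {clauses} "
           "GROUP BY artifact HAVING COUNT(*) = ? ORDER BY artifact")
    return [row[0] for row in db.execute(sql, params + [len(wanted)])]

hits = find(db, region="estuary", type="animation", ntau="0")
```

New descriptors become new rows rather than new columns, so the schema does not need to change when a new product line introduces new metadata.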