Three Flavors of Data
Science Data: simulations and sensor readings.
Catalog Data: metadata; descriptors of datasets, data products, and other processing artifacts.
Active Data: data associated with logging, monitoring, and scheduling compute tasks.
Three Flavors of Data (1)
Science Data
  Simulation Data: solutions to partial differential equations governing the physics of the Columbia River Estuary
  Sensor Data: measurements of the physical characteristics used to guide and validate simulations
Wanted: a simple means for specifying new data products from these raw data and computing them efficiently
Approach: a data manipulation language based on a GridField data model
Three Flavors of Data (2)
Catalog Data
  Explicit metadata to describe system artifacts
Wanted:
  Tools to locate artifacts given descriptors (query)
  A metadata collection facility that tolerates change
    The metadata we wish to collect may change (e.g., new product ‘lines’ are developed)
    The source of the metadata may change (e.g., file naming conventions or directory structures evolve)
Approach: generic database; custom collection scripts
Three Flavors of Data (3)
Active Data
  Data describing past, current, and future compute tasks
Wanted: tools for scheduling, monitoring, and managing...
  individual tasks (e.g., a single data product derivation)
  groups of interdependent tasks (e.g., a daily forecast run)
  campaigns (e.g., a series of calibration runs followed by a re-computation of the runs of 2002 with a different implicitness)
Approach: undecided
Simulation Data: GridFields
The data product suite exhibits recurring processing idioms:
  larger grids reduced to smaller grids
    Ex: ‘estuary’ data products vs. ‘far’ data products
  grids mapped to other grids
    Ex: 3D grid mapped to a 2D slice
  grids combined
    Ex: 1D depth grid ‘crossed’ with a 2D horizontal grid
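The three idioms above can be illustrated with a toy sketch. This is not the GridField model: it is written in Python for illustration, it treats a grid as a bare list of node coordinates, and it ignores cells and the data bound to them. The names `cross`, `restrict`, and `slice_at` are hypothetical.

```python
from itertools import product

# Hypothetical toy grids: a grid here is just a list of node coordinates.
horizontal = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]  # 2D horizontal nodes (x, y)
depths = [0.0, -5.0, -10.0]                         # 1D depth levels (z)

def cross(grid_2d, grid_1d):
    """Combine: pair every horizontal node with every depth level."""
    return [(x, y, z) for (x, y), z in product(grid_2d, grid_1d)]

def restrict(grid, keep):
    """Reduce: keep only the nodes that satisfy a predicate."""
    return [node for node in grid if keep(node)]

def slice_at(grid_3d, z):
    """Map: project the nodes at one depth level onto a 2D slice."""
    return [(x, y) for (x, y, node_z) in grid_3d if node_z == z]

grid_3d = cross(horizontal, depths)
near_surface = restrict(grid_3d, lambda node: node[2] >= -5.0)
surface_slice = slice_at(grid_3d, 0.0)
```

Each function takes grids and returns a grid, so the idioms compose, which is the property an operator-based recipe relies on.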
Simulation Data: GridFields (2)
We’re expressing these idioms as operators over a grid-based data model.
Advantages:
  Simpler recipes
    5 ops for all the data products (plus helper functions)
  Flexible model; fewer maintenance troubles
    N dimensions: uniform handling of space and time (maybe more...)
    Any cell type: segments, triangles, quadrangles, arbitrary polytopes
  Optimization opportunities
    operators prescribe semantics, but not implementation
    topological equivalences exposed and exploited
Simulation Data: GridFields (3)
Status:
  Core operators functional
  Simple examples hooked to XMVIS for viewing
Todo:
  Examples hooked to VTK
  Write/test examples from the current product suite
  Support GridFields too large for memory
  Expose a nice syntax for writing recipes
Catalog Data: Collection
Where is the Metadata?
  File name: 1_salt.63
  File path: /forecasts/2003-184/run/images/isosal_estuary7/anim-sal_estuary_7.gif
  File content:
    Version: 1.04
    Variable: salt
    :
  Other files?
Collection scripts
For each file type, the metadata collection mechanism is different:
  gifs
  binary output
  Param.in
Use a script for each file type that will emit metadata for that type of file.
Only these simple scripts need to change as the system evolves.
Example: gif animation
/forecasts/2003-184/.../isosal_estuary7/anim-sal_estuary_7.gif
  CorieDate = “2003-184”
  Region = “Estuary”
  Lat = xxxx
  Long = xxxx
  Variable = “Salinity”
  Type = “Animation”
  Depth = “7”
  Product line = “isoline”
Here, a script can just parse the path and file name.
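A minimal sketch of such a path-parsing script follows, in Python for illustration. The naming convention is inferred from the single example path in this deck, so the regular expression, the lookup tables, and the function name `collect_gif_metadata` are all assumptions. Lat and Long are left out because the slides do not say where they come from.

```python
import re

# Assumed convention, inferred from one example path:
# /forecasts/<date>/.../<line><variable>_<region><depth>/anim-<name>.gif
PATH_RE = re.compile(
    r"^/forecasts/(?P<date>\d{4}-\d{3})/(?:.*/)?"
    r"(?P<line>iso)(?P<var>[a-z]+)_(?P<region>[a-z]+)(?P<depth>\d+)/"
    r"anim-[^/]+\.gif$"
)
VARIABLES = {"sal": "Salinity"}      # hypothetical abbreviation table
PRODUCT_LINES = {"iso": "isoline"}   # hypothetical abbreviation table

def collect_gif_metadata(path):
    """Return descriptors parsed from the path, or None if it does not match."""
    m = PATH_RE.match(path)
    if m is None:
        return None
    return {
        "CorieDate": m.group("date"),
        "Region": m.group("region").capitalize(),
        "Variable": VARIABLES.get(m.group("var"), m.group("var")),
        "Type": "Animation",
        "Depth": m.group("depth"),
        "product line": PRODUCT_LINES[m.group("line")],
    }

meta = collect_gif_metadata(
    "/forecasts/2003-184/run/images/isosal_estuary7/anim-sal_estuary_7.gif"
)
```

If the directory layout changes, only the pattern and the two tables need to be updated.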
Example: Binary output
Need a different mechanism than for gif animations; it might be convenient to implement it in a different script.
/forecasts/2003-184/run/1_salt.gif
  Variable = “Salinity”
What about number of nodes? Mean Sea Level?
We need to access the file’s content:
1_salt.63
  nodes: 55817
  msl: 4285
  :
  :
Architecture
Reflector creates an XML file containing metadata for each file and also stores the metadata in the database.
Reflector determines the file type (based on regular expressions) and calls the appropriate collection script.
Collection script uses an “AddItem” Perl function to return the metadata to the reflector.
[Diagram: Reflector invokes Collection Script; metadata goes to DB and XML]
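The flow described above might look roughly like the following sketch. It is written in Python rather than the Perl the slides mention, and it is not the actual reflector: the class and method names, the XML layout, and the table schema are all assumptions. For brevity it returns the XML as a string instead of writing a file.

```python
import re
import sqlite3
import xml.etree.ElementTree as ET

class Reflector:
    def __init__(self, db):
        self.db = db
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS metadata "
            "(path TEXT, name TEXT, value TEXT, notes TEXT, type TEXT)"
        )
        self.scripts = []  # (compiled regex, collection function) pairs
        self.items = []    # items added for the file being processed

    def register(self, pattern, script):
        """Associate a file-name pattern with a collection script."""
        self.scripts.append((re.compile(pattern), script))

    def add_item(self, name, value, notes="", type_="string"):
        """Stand-in for the AddItem(Name, Value, Notes, Type) function."""
        self.items.append((name, str(value), notes, type_))

    def reflect(self, path):
        """Collect metadata for one file; return XML text, or None if no script matches."""
        for regex, script in self.scripts:
            if regex.search(path):
                self.items = []
                script(path, self.add_item)
                break
        else:
            return None
        root = ET.Element("artifact", path=path)
        for name, value, notes, type_ in self.items:
            item = ET.SubElement(root, "item", name=name, notes=notes, type=type_)
            item.text = value
            self.db.execute(
                "INSERT INTO metadata VALUES (?, ?, ?, ?, ?)",
                (path, name, value, notes, type_),
            )
        self.db.commit()
        return ET.tostring(root, encoding="unicode")

def gif_script(path, add_item):
    # A real script would parse the path; the value is hard-coded here.
    add_item("type", "animation")

db = sqlite3.connect(":memory:")
reflector = Reflector(db)
reflector.register(r"\.gif$", gif_script)
xml_text = reflector.reflect(
    "/forecasts/2003-184/run/images/isosal_estuary7/anim-sal_estuary_7.gif"
)
```

Because the scripts only ever call `add_item`, the reflector alone decides where the metadata ends up.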
Metadata in XML and DB?
These XML files give you filesystem-based access to the metadata for an artifact.
Use “info” to present the XML in a readable form:
  /../run> info 1_salt.63
  variable: salt
  version: 1.04
  msl: 4285
  nodes: 55817
Also useful if DB is inaccessible.
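A rough Python sketch of what an “info”-style reader could do is shown here. The slides do not show the real XML schema, so the `artifact`/`item` layout and the `info` function below are hypothetical; only the four name/value pairs come from the example output.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML layout; the values are those shown in the example output.
SAMPLE = """<artifact path="1_salt.63">
  <item name="variable">salt</item>
  <item name="version">1.04</item>
  <item name="msl">4285</item>
  <item name="nodes">55817</item>
</artifact>"""

def info(xml_text):
    """Return one 'name: value' line per metadata item, in document order."""
    root = ET.fromstring(xml_text)
    return [f"{item.get('name')}: {item.text}" for item in root.iter("item")]

lines = info(SAMPLE)
print("\n".join(lines))
```

Nothing in this reader touches the database, which is why the XML copy remains usable when the DB is down.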
Minor Technical Change
Previously we had suggested that the collection scripts should emit metadata on standard output.
Instead, we now provide a Perl function: AddItem(Name, Value, Notes, Type)
How does this help?
Find artifacts via descriptors (query)
  ‘find animations showing the estuary where we used a constant bottom friction coefficient’
  where region = “estuary” and type = “animation” and ntau = “0”
Write robust metadata-driven programs
  Chris’ low bandwidth zoom web app
  Stay-Fresh Powerpoint Slides
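The descriptor query above could be run against a generic database along these lines. This Python sketch uses an in-memory SQLite database; the `artifacts` table, its columns, and the three rows are made up for illustration and do not reflect the real catalog schema.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Hypothetical one-row-per-artifact table with descriptor columns.
db.execute("CREATE TABLE artifacts (path TEXT, region TEXT, type TEXT, ntau TEXT)")
db.executemany(
    "INSERT INTO artifacts VALUES (?, ?, ?, ?)",
    [
        ("a.gif", "estuary", "animation", "0"),  # made-up rows
        ("b.gif", "estuary", "animation", "1"),
        ("c.63", "estuary", "binary", "0"),
    ],
)

# Same condition as on the slide, with the values passed as parameters.
rows = db.execute(
    "SELECT path FROM artifacts WHERE region = ? AND type = ? AND ntau = ?",
    ("estuary", "animation", "0"),
).fetchall()
```

A metadata-driven program would build the parameter tuple from user input rather than hard-coding file locations.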