Cyber-Folk
OHSU: Bill Howe, Charles Seaton, Paul Turner, Antonio Baptista
Utah: Juliana Freire, Claudio Silva
Portland State: David Maier, Nirupama Bulusu, Wu-Chi Feng
+ grad & undergrad students
Activities
• Data Mart
• VisTrails
• Quarry
• RoboCMOP
• Network Optimization
• Cruise Dashboard
• Ocean Appliance
• Data Policies
Data Mart Design Principles
• 100% visibility of data assets
• On-demand generation of products
• Can always download data behind a product
• Highly configurable: navigation, data selection, products, product parameters
Have a look, leave comments
The VisTrails Project (Utah)
• Vision: Provenance-enable the world
• Comprehensive provenance infrastructure for computational tasks
  – Captures provenance transparently
  – Provides intuitive query interfaces for exploring provenance data
  – Supports collaboration
• Designed to support exploratory tasks such as visualization and data mining
  – Task specification iteratively refined as users generate and test hypotheses
• VisTrails system is open source: www.vistrails.org
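As a rough illustration of what "captures provenance transparently" can mean, the sketch below wraps ordinary Python functions so that each call is logged with its inputs and output, and then answers a simple query over that log. This is not the VisTrails API; `PROVENANCE_LOG`, `track`, `steps_that_used`, and the two example steps are hypothetical names chosen for this sketch.

```python
import functools
import time

# Hypothetical in-memory provenance store; a real system would persist this.
PROVENANCE_LOG = []

def track(func):
    """Record every call to `func` (name, arguments, result, timestamp)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        PROVENANCE_LOG.append({
            "step": func.__name__,
            "args": args,
            "kwargs": kwargs,
            "result": result,
            "time": time.time(),
        })
        return result
    return wrapper

@track
def smooth(values, window=2):
    """Moving average over `window` consecutive values."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

@track
def threshold(values, cutoff):
    """Keep only values at or above `cutoff`."""
    return [v for v in values if v >= cutoff]

def steps_that_used(param_name):
    """Simple provenance query: which logged steps received a given keyword?"""
    return [rec["step"] for rec in PROVENANCE_LOG if param_name in rec["kwargs"]]

# The user writes a normal pipeline; the log fills in as a side effect.
result = threshold(smooth([1.0, 3.0, 5.0, 7.0], window=2), cutoff=4.0)
```

The pipeline code itself never mentions provenance; the decorator is the only point of contact, which is the sense of "transparent" assumed here.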
Integrating Tools and Libraries
SCIRun
Workflow that combines 5 different libraries
Value added: provenance, query, parameter-space exploration, easier sharing & collaboration
Quarry
Structured browse capability for model products
– Harvest fine-grained meta-data
– Automatically design efficient database schema based on data patterns
– Can explore space of products via alternating property, value selections
http://www.stccmop.org/quarry
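The "alternating property, value selections" style of browsing can be sketched as a small faceted filter: pick a property, see which values remain, pick a value, repeat. The sketch below is only an in-memory illustration, not Quarry's implementation; the product records, property names, and function names are all made up.

```python
# Hypothetical product metadata; the property names and values are made up.
PRODUCTS = [
    {"variable": "salinity", "region": "estuary", "year": 2006},
    {"variable": "salinity", "region": "plume", "year": 2007},
    {"variable": "temperature", "region": "estuary", "year": 2007},
]

def matching(products, selections):
    """Products whose metadata agrees with every (property, value) chosen so far."""
    return [p for p in products
            if all(p.get(k) == v for k, v in selections.items())]

def open_properties(products, selections):
    """Properties not yet constrained that still occur in the matching products."""
    props = set()
    for p in matching(products, selections):
        props.update(k for k in p if k not in selections)
    return sorted(props)

def values_for(products, selections, prop):
    """Values of `prop` that remain available given the current selections."""
    return sorted({p[prop] for p in matching(products, selections) if prop in p})

selections = {}
selections["variable"] = "salinity"            # choose a property, then a value
remaining_regions = values_for(PRODUCTS, selections, "region")
selections["region"] = "plume"                 # narrow further
hits = matching(PRODUCTS, selections)
```

Each selection shrinks the set of matching products, so the user only ever sees properties and values that lead to at least one product.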
Our Trajectory: RoboCMOP
Vision: Lift scientific C-I to an active participant in the scientific process, acting autonomously to provide the data, products, and context you need, right when needed.
Stages
– Locate existing products (based on “cues” in conversation)
– Instantiate existing product types on demand
– Propose new product variants (cf. VisTrails “Creating workflows by analogy”)
– Task observatory systems to collect relevant data (serendipitous gap-filling, active direction of assets)
Network Optimization: Nirupama Bulusu
• Sensor stations are deployed based on
  – Physical intuition: sensing coverage, flow dynamics
  – Physical limitations: power and communication wiring
• Little understanding of which sensors are important
  – Is the current deployment optimal?
  – If not, which sensors should we remove, and which should we keep?
  – If we want to deploy more sensors, where should we deploy them?
Sensor Selection Problem
• Find the network configuration that gives the largest error reduction in the data assimilation process:
  $\max_{S} D(S)$ subject to $S \subseteq A$ and $|S| = n$ $(n \le |A|)$
• Set of all sensor configurations: $A = \{s_1, s_2, \ldots, s_n\}$
• Sensor configuration: $s_i = \{\mathit{type}, x, y, z, \delta\}$
  – type: salinity, elevation, temperature
  – x, y, z: sensor location
  – δ: sensor standard deviation
• Error reduction in data assimilation: $D(S)$
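For a very small network, the selection problem can be solved by simply scoring every subset of size n. The sketch below does that with a placeholder objective: `toy_error_reduction` is an assumption standing in for D(S), which in the real setting comes from the data assimilation system and is not specified here. The sensor values are invented for illustration.

```python
from collections import namedtuple
from itertools import combinations

# One sensor configuration s_i = {type, x, y, z, delta}.
Sensor = namedtuple("Sensor", ["type", "x", "y", "z", "delta"])

def toy_error_reduction(subset):
    """Placeholder for D(S), NOT the real assimilation-based objective.

    Each (type, location) pair counts once, credited with the inverse
    variance (1 / delta^2) of its most precise sensor, so a redundant
    co-located sensor adds nothing.
    """
    best = {}
    for s in subset:
        key = (s.type, s.x, s.y, s.z)
        best[key] = max(best.get(key, 0.0), 1.0 / s.delta ** 2)
    return sum(best.values())

def select_sensors(all_sensors, n, error_reduction):
    """Exhaustively find the size-n subset S of A that maximizes D(S)."""
    if n > len(all_sensors):
        raise ValueError("n must not exceed |A|")
    return max(combinations(all_sensors, n), key=error_reduction)

# Made-up network: b duplicates a's type and location with worse precision.
a = Sensor("salinity", 0, 0, 1, 0.5)
b = Sensor("salinity", 0, 0, 1, 1.0)
c = Sensor("temperature", 1, 0, 1, 1.0)
d = Sensor("elevation", 2, 0, 0, 2.0)

best_pair = select_sensors([a, b, c, d], 2, toy_error_reduction)
```

Exhaustive search evaluates every one of the C(|A|, n) subsets, so it is only practical for a handful of sensors; larger networks need a heuristic search.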
Results
• Reducing the number of sensors by 26% reduces accuracy by 1.55%
Exploring a genetic-algorithms approach
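The slides only say that a genetic-algorithms approach is being explored, so the following is a generic sketch of how such a search over fixed-size sensor subsets might look, not the project's method. The objective `toy_error_reduction`, the operators, the parameter defaults, and the sensor list are all assumptions made for illustration.

```python
import random

def toy_error_reduction(subset, sensors):
    """Placeholder D(S): co-located sensors of one type count once (best 1/delta^2)."""
    best = {}
    for i in subset:
        stype, x, y, z, delta = sensors[i]
        key = (stype, x, y, z)
        best[key] = max(best.get(key, 0.0), 1.0 / delta ** 2)
    return sum(best.values())

def mutate(subset, num_sensors, rng):
    """Swap one selected sensor for one that is currently unselected."""
    unselected = [i for i in range(num_sensors) if i not in subset]
    if not unselected:
        return subset
    drop = rng.choice(sorted(subset))
    add = rng.choice(unselected)
    return frozenset((subset - {drop}) | {add})

def crossover(parent_a, parent_b, n, rng):
    """Child draws n sensors from the union of both parents."""
    pool = sorted(parent_a | parent_b)
    return frozenset(rng.sample(pool, n))

def genetic_select(sensors, n, generations=30, pop_size=8, seed=0):
    """Search for a size-n subset of sensor indices with high D(S)."""
    rng = random.Random(seed)
    indices = range(len(sensors))

    def fitness(subset):
        return toy_error_reduction(subset, sensors)

    population = [frozenset(rng.sample(indices, n)) for _ in range(pop_size)]
    history = []                                   # best fitness per generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: pop_size // 2]      # truncation selection
        children = [population[0]]                 # elitism: keep the best
        while len(children) < pop_size:
            pa, pb = rng.sample(parents, 2)
            children.append(mutate(crossover(pa, pb, n, rng), len(sensors), rng))
        population = children
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history

# Made-up sensors as (type, x, y, z, delta) tuples.
sensors = [
    ("salinity", 0, 0, 1, 0.5),
    ("salinity", 0, 0, 1, 1.0),
    ("temperature", 1, 0, 1, 1.0),
    ("elevation", 2, 0, 0, 2.0),
]
best, history = genetic_select(sensors, n=2)
```

Because the best individual is always carried into the next generation, the best score never decreases from one generation to the next; whether the search reaches the true optimum depends on the objective and the parameters.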
Cruise Dashboard
Project of Nick Hagerty, summer REU
– Fast visibility of collected data
– With appropriate information context
One of the drivers of pluggable products
Interface
• Cast-specific interface fully functional
• First deployed (successfully) on July 2007 cruise
• Useful simply as convenient grouping of relevant data, graphs, information
• Hope to link with workflow
Ocean Appliance
• We must “IOOS-enable” local data providers
• Someone has to write the code
• Responsibility usually falls to RAs
• The cost of hardware is falling
• The cost of software support is rising
• Provision complete platforms to control cost
IOOS: System of Systems (of Systems …)
[Figure: tiers labeled National Service Nodes; RA, RA, Univ.; and Local Prov. (three nodes). Link labels: DMAC standards, DMAC standards, Ad hoc protocols.]
Value-add services: Discovery, Brokerage, Aggregation, Fusion, Applications
http://www.ocean.us/
System of Systems (of Systems …)
[Figure: RA, RA, and Univ. nodes; three Local Prov. nodes]
Ad Hoc Protocols
-- FTP
-- screen scraping
-- ASCII
-- netCDF
How can we “DMAC-enable” the Local Data Providers, quickly and inexpensively?
The Ocean Appliance
Software
– Linux Fedora Core 6
– web server (Apache)
– database (PostgreSQL)
– ingest/QC system (Python)
– telemetry system (Python)
– web-based visualization (Drupal, Python)
Hardware
– 2.6GHz Dual
– 2GB RAM
– 250 GB SATA
– 4 serial ports
– ~$500
– ~1’x1’x1.5’
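To give a feel for what a Python ingest/QC step might do, here is a minimal sketch that parses comma-separated observations and attaches a range-check flag. It is not the appliance's actual code: the record layout, the variable names, the limits in `VALID_RANGES`, and the flag vocabulary are all assumptions.

```python
# Hypothetical plausible ranges per variable; real limits depend on instrument and site.
VALID_RANGES = {
    "salinity": (0.0, 40.0),       # practical salinity units
    "temperature": (-2.0, 35.0),   # degrees Celsius
}

def qc_flag(variable, value):
    """Return 'good', 'bad', or 'unchecked' for a single observation."""
    if variable not in VALID_RANGES:
        return "unchecked"
    low, high = VALID_RANGES[variable]
    return "good" if low <= value <= high else "bad"

def ingest(lines):
    """Parse 'timestamp,variable,value' records and attach a QC flag to each."""
    records = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue                               # skip blanks and comments
        timestamp, variable, raw = line.split(",")
        try:
            value = float(raw)
        except ValueError:
            # Unparseable value: keep the record but mark it bad.
            records.append((timestamp, variable, None, "bad"))
            continue
        records.append((timestamp, variable, value, qc_flag(variable, value)))
    return records

sample = [
    "# made-up telemetry",
    "2007-07-14T12:00:00,salinity,31.2",
    "2007-07-14T12:00:00,temperature,99.0",
    "2007-07-14T12:00:00,turbidity,4.1",
    "2007-07-14T12:01:00,salinity,NaN?",
]
records = ingest(sample)
```

Flagging rather than discarding suspicious values keeps the raw data available, so the thresholds can be revisited later without re-collecting anything.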
SWAP Network; collaboration of:
- OSU
- OHSU
- UNOLS
Deployed on Multi-ship Coordinated Cruise
Wecoma
Forerunner
Barnes
Data Standards
• What counts as data?
• What are the standard procedures for collecting data during cruises?
• How are new data sources added?
• What external data archives will we use?
• What are our QA/QC procedures for each data source?
• How is instrument calibration information handled?
• How will data processing levels and data release versioning be handled?
Charles Seaton