EN 600.619: Adv. Storage and TP Systems
MauveDB: Model-based User Views
Problem
• Databases are unusable for scientific data
  – Data are incomplete, imprecise, and erroneous
  – Need to be filtered/synthesized using models
• Scientists use them in the most rudimentary ways
  – As a backing store for raw data
  – Run few or no queries
• User-defined functions are inadequate
  – Static models, insufficient for many applications
  – Let’s discuss this later?
Approach
• Define user views based on a model syntax
  – Extend traditional SQL-view model
• User views provide access to synthesized data
  – Data independence
• Present stable view of system
  – When sites don’t report data (missing values)
  – When network changes
  – Report data at different locations than sampled
• View maintenance
  – Issues of whether to materialize or not
Processing Scientific Data
• Without model-based views
  – Export to MATLAB, then apply models
  – Use custom, programmatic querying tools
  – Can’t use SQL
    • Getting data back into the database is awkward and inefficient
• With model-based views
  – Models update themselves as data changes
  – Standard SQL queries against synthesized data
Example
• Benefits
  – Network changes are transparent
  – Spatial or temporal biases removed (e.g., for aggregates)
• What about model errors?
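The bias point above can be illustrated with a small sketch. It assumes a one-dimensional deployment with three sensors clustered at one end and a single sensor at the other, and it uses piecewise-linear interpolation as a stand-in model; the sensor positions, the readings, and the helper names `interpolate` and `mean` are all made up for illustration and do not come from MauveDB.

```python
def interpolate(sensors, x):
    """Piecewise-linear estimate at x from (position, value) sensor readings."""
    pts = sorted(sensors)
    for (x0, v0), (x1, v1) in zip(pts, pts[1:]):
        if x0 <= x <= x1:
            return v0 + (v1 - v0) * (x - x0) / (x1 - x0)
    raise ValueError("x lies outside the sampled range")


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Hypothetical readings: three sensors near x = 0, only one at x = 10.
sensors = [(0.0, 10.0), (1.0, 12.0), (2.0, 14.0), (10.0, 30.0)]

# Uniform virtual grid covering the same region.
grid = [float(x) for x in range(11)]

# AVG over raw readings is pulled toward the densely sampled end.
raw_avg = mean(v for _, v in sensors)

# AVG over the model-based view weights every location equally.
grid_avg = mean(interpolate(sensors, x) for x in grid)
```

With these numbers the raw average is 16.5 while the grid average is 20.0, which is the midpoint value of the underlying linear field. The grid answer is only as good as the model, which is where the question about model errors comes in.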
Architecture
View Creation: Regression
• Select a virtual grid on which data are reported
  – Using MATLAB-style syntax
• Create a unique model at each time T
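A rough sketch of what such a regression view computes, not MauveDB's actual implementation: group the raw readings by time, fit one least-squares model per time step, and report the fitted values on a virtual grid. The one-dimensional position `x`, the basis functions 1, x, x^2, the sample data, and all function names are assumptions chosen for illustration.

```python
from collections import defaultdict


def solve(a, b):
    """Solve the square system a * w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - s) / m[i][i]
    return w


# Assumed basis functions: 1, x, x^2.
BASES = [lambda x: 1.0, lambda x: x, lambda x: x * x]


def fit(points):
    """Least-squares weights for temp ~ sum_k w_k * BASES[k](x), via the normal equations.

    Needs at least len(BASES) readings at distinct positions.
    """
    k = len(BASES)
    ata = [[sum(BASES[i](x) * BASES[j](x) for x, _ in points) for j in range(k)]
           for i in range(k)]
    atb = [sum(BASES[i](x) * t for x, t in points) for i in range(k)]
    return solve(ata, atb)


def regression_view(raw, grid):
    """raw: (time, x, temp) readings. Returns (time, x, temp) rows on the grid, one model per time."""
    by_time = defaultdict(list)
    for time, x, temp in raw:
        by_time[time].append((x, temp))
    rows = []
    for time in sorted(by_time):
        w = fit(by_time[time])
        for gx in grid:
            rows.append((time, gx, sum(wk * b(gx) for wk, b in zip(w, BASES))))
    return rows


# Hypothetical readings: temp = 1 + x + x^2 at time 0, temp = 2 + x^2 at time 1.
raw = [(0, 1.0, 3.0), (0, 2.0, 7.0), (0, 4.0, 21.0), (0, 5.0, 31.0),
       (1, 0.0, 2.0), (1, 1.0, 3.0), (1, 3.0, 11.0)]
reg_view = regression_view(raw, grid=[0.0, 2.5, 5.0])
```

The view rows sit at grid positions 0.0, 2.5, and 5.0 regardless of where the sensors actually are, so a query against `reg_view` never sees the sampling locations.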
View Creation: Interpolation
• Interpolate missing values from nearby sites
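One way to picture this, as a sketch only: fill in each non-reporting site with an inverse-distance-weighted average of the sites that did report. The slide does not say which interpolation method is used, so inverse-distance weighting is an assumption here, as are the site coordinates, the readings, and the function names.

```python
import math


def idw(target, readings, power=2.0):
    """Inverse-distance-weighted estimate at `target` from {(x, y): value} readings."""
    num = den = 0.0
    for pos, value in readings.items():
        d = math.dist(target, pos)
        if d == 0.0:
            return value  # a site at exactly this position reported
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


def interpolation_view(sites, readings):
    """One value per site: the reported reading if present, otherwise an IDW estimate."""
    return {s: readings[s] if s in readings else idw(s, readings) for s in sites}


# Hypothetical deployment: the site at (1, 0) did not report this round.
sites = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
readings = {(0.0, 0.0): 20.0, (2.0, 0.0): 24.0}
interp_view = interpolation_view(sites, readings)
```

Here the missing site is equidistant from its two neighbors, so its estimate is their plain average, 22.0. Queries against `interp_view` see a value for every site whether or not it reported.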
Case Study 1: Temp Regression
Case Study 2: Temp Interpolation
The AS Clause
• AS clause specifies each model
  – AS FIT
  – AS INTERPOLATE
• Probably needs extended syntax for model methods
  – INTERPOLATE with splines, nearest neighbor, regression
• User views are only as flexible as the models pre-programmed into the syntax
  – How does this compare with UDFs, table-valued functions?
  – Is this the appropriate level for this kind of customization?
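To make the flexibility question concrete, here is a hypothetical sketch of a method registry: the view syntax can only name methods that someone has registered, while a UDF-style hook (the `register` decorator) lets users add their own. None of these names come from MauveDB; `MODELS`, `register`, and `evaluate` are invented for illustration.

```python
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]                      # (x, value)
Model = Callable[[List[Point], float], float]

MODELS: Dict[str, Model] = {}                    # method name -> implementation


def register(name: str):
    """Decorator that makes a method selectable by name (the UDF-like extension hook)."""
    def deco(fn: Model) -> Model:
        MODELS[name.upper()] = fn
        return fn
    return deco


@register("nearest")
def nearest(points, x):
    """Value of the sample closest to x."""
    return min(points, key=lambda p: abs(p[0] - x))[1]


@register("linear")
def linear(points, x):
    """Piecewise-linear interpolation; assumes distinct sample positions."""
    pts = sorted(points)
    for (x0, v0), (x1, v1) in zip(pts, pts[1:]):
        if x0 <= x <= x1:
            return v0 + (v1 - v0) * (x - x0) / (x1 - x0)
    raise ValueError("x outside the sampled range")


def evaluate(method: str, points: List[Point], x: float) -> float:
    """Dispatch on the method name, as an 'INTERPOLATE with <method>' clause might."""
    try:
        model = MODELS[method.upper()]
    except KeyError:
        raise ValueError(f"unsupported method: {method}") from None
    return model(points, x)


samples = [(0.0, 10.0), (2.0, 20.0)]
linear_value = evaluate("linear", samples, 0.5)
nearest_value = evaluate("nearest", samples, 0.5)
```

Asking for `evaluate("spline", samples, 0.5)` raises `ValueError` until someone registers a spline method, which is the limitation the last bullet raises.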
View Maintenance
• Options
  – Logical: build results for each query
  – Materialized: pre-compute all results for each model
  – Partial/Cached: store results generated by queries
  – Model-based: models often have fixed costs
    • Building basis functions, matrix inversions, linear solutions
• Tradeoff between query latency and overhead
• Is implementing model logic at such a low level reasonable?
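The first three options can be compared with a toy sketch that counts how often the model is evaluated under each strategy. The `ModelView` class, its method names, and the stand-in model `2 * x + 1` are hypothetical; the model-based option, which would store intermediate results such as the solved regression weights, is not shown.

```python
class ModelView:
    STRATEGIES = ("logical", "materialized", "cached")

    def __init__(self, model, grid, strategy):
        if strategy not in self.STRATEGIES:
            raise ValueError(f"unknown strategy: {strategy}")
        self.model = model
        self.grid = list(grid)
        self.strategy = strategy
        self.evaluations = 0      # how many times the model was run
        self.store = {}           # stored view rows, keyed by grid point
        self.refresh()

    def _compute(self, point):
        self.evaluations += 1
        return self.model(point)

    def refresh(self):
        """Call after the raw data changes: drop stored rows, rebuild if materialized."""
        self.store.clear()
        if self.strategy == "materialized":
            for point in self.grid:
                self.store[point] = self._compute(point)

    def query(self, point):
        if self.strategy == "logical":
            return self._compute(point)           # recompute on every query
        if point not in self.store:               # cache miss (or an off-grid point)
            self.store[point] = self._compute(point)
        return self.store[point]


def run(strategy):
    """Query the same point twice and report the answers and the evaluation count."""
    view = ModelView(lambda x: 2 * x + 1, grid=range(4), strategy=strategy)
    answers = [view.query(1), view.query(1)]
    return answers, view.evaluations


logical_result = run("logical")
materialized_result = run("materialized")
cached_result = run("cached")
```

Two identical queries cost two evaluations under the logical strategy, four under materialization (the whole grid, paid up front), and one under caching. Which is cheapest depends on how often the data changes relative to how often it is queried.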
Outcomes/Opinions
• Is MauveDB the technology that will make scientists use databases?