EN 600.619: Adv. Storage and TP Systems
MauveDB: Model-based User Views
Problem
• Databases are unusable for scientific data
  – Data are incomplete, imprecise, and erroneous
  – Data need to be filtered/synthesized using models
• Scientists use databases in the most rudimentary ways
  – As a backing store for raw data
  – Run few or no queries
• User-defined functions are inadequate
  – Static models, insufficient for many applications
  – Let’s discuss this later?
Approach
• Define user views based on a model syntax
  – Extend the traditional SQL view model
• User views provide access to synthesized data
  – Data independence
• Present a stable view of the system
  – When sites don’t report data (missing values)
  – When the network changes
  – Report data at different locations than sampled
• View maintenance
  – Issues of whether to materialize or not
Processing Scientific Data
• Without model-based views
  – Export to MATLAB, then apply models
  – Use custom, programmatic querying tools
  – Can’t use SQL
  – Getting data back into the database is awkward and inefficient
• With model-based views
  – Models update themselves as data change
  – Standard SQL queries run against synthesized data
Example
• Benefits
  – Network changes are transparent
  – Spatial or temporal biases are removed (e.g., for aggregates)
• What about model errors?
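To make the point about biased aggregates concrete, here is a minimal Python sketch. It is not MauveDB code: the sensor layout, the readings, and the helper `fit_line` are all made up for illustration, and the model is assumed to be a simple one-dimensional linear regression.

```python
# Hypothetical 1-D deployment: four sensors are clustered near x = 0 and one
# sits at x = 100. The readings follow temp = 20 + 0.1 * x (an assumption made
# for illustration, not data from the slides).
readings = [(0, 20.0), (2, 20.2), (4, 20.4), (6, 20.6), (100, 30.0)]


def fit_line(points):
    """Ordinary least squares fit of temp = a + b * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_t = sum(t for _, t in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxt = sum((x - mean_x) * (t - mean_t) for x, t in points)
    b = sxt / sxx
    a = mean_t - b * mean_x
    return a, b


# Raw aggregate: weighted toward the region where sensors are dense.
raw_avg = sum(t for _, t in readings) / len(readings)

# Model-based aggregate: evaluate the fitted model on a uniform grid, then average.
a, b = fit_line(readings)
grid = range(0, 101, 10)
grid_avg = sum(a + b * x for x in grid) / len(grid)

print(f"raw average:  {raw_avg:.2f}")
print(f"grid average: {grid_avg:.2f}")
```

With these made-up numbers the raw average is about 22.2, pulled toward the cluster of sensors, while the average over the uniform grid is 25. The sketch does nothing about the slide's open question: if the fitted model is wrong, the grid average is wrong too.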
Architecture
View Creation: Regression
• Select a virtual grid on which data are reported
  – Using MATLAB-style syntax
• Create a unique model at each time T
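The two steps above can be sketched in plain Python. This is an illustration under stated assumptions, not MauveDB's view syntax: the function names `fit_line` and `regression_view`, the one-dimensional layout, and the sample rows are all hypothetical, and `range(0, 11, 5)` stands in for a MATLAB-style grid `0:5:10`.

```python
from collections import defaultdict


def fit_line(points):
    """Ordinary least squares fit of temp = a + b * x for one timestamp."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_t = sum(t for _, t in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxt = sum((x - mean_x) * (t - mean_t) for x, t in points)
    b = sxt / sxx
    return mean_t - b * mean_x, b


def regression_view(raw_rows, grid):
    """Turn raw (time, x, temp) rows into (time, x, predicted_temp) rows on the grid."""
    by_time = defaultdict(list)
    for time, x, temp in raw_rows:
        by_time[time].append((x, temp))

    view = []
    for time in sorted(by_time):
        a, b = fit_line(by_time[time])  # one model per time T
        for x in grid:
            view.append((time, x, a + b * x))
    return view


# Hypothetical raw readings; sensor positions differ between timestamps.
raw = [
    (0, 1.0, 21.0), (0, 9.0, 29.0),
    (1, 2.0, 12.0), (1, 4.0, 14.0), (1, 8.0, 18.0),
]
grid = range(0, 11, 5)  # analogous to MATLAB's 0:5:10
view = regression_view(raw, grid)

for row in view:
    print(row)
```

Every timestamp yields the same grid positions regardless of which sensors reported, which is the stability property the view is meant to provide.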
View Creation: Interpolation
• Interpolate missing values from nearby sites
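As a rough illustration of filling missing values from nearby sites, the sketch below uses linear interpolation along one dimension. The function `interpolate_missing`, the site positions, and the readings are hypothetical; the choice of linear interpolation is an assumption made here and is only one of several possible methods.

```python
def interpolate_missing(sites):
    """Fill missing readings from neighbouring sites.

    sites: list of (position, value_or_None) with distinct positions.
    A missing value is linearly interpolated between the nearest reporting
    sites on either side; at the edges it copies the single nearest one.
    """
    known = [(p, v) for p, v in sites if v is not None]
    if not known:
        raise ValueError("no site reported a value")

    filled = []
    for pos, val in sites:
        if val is not None:
            filled.append((pos, val))
            continue
        # Nearest reporting site on each side, or None if there is none.
        left = max((k for k in known if k[0] < pos), key=lambda k: k[0], default=None)
        right = min((k for k in known if k[0] > pos), key=lambda k: k[0], default=None)
        if left is None:
            est = right[1]
        elif right is None:
            est = left[1]
        else:
            w = (pos - left[0]) / (right[0] - left[0])
            est = left[1] + w * (right[1] - left[1])
        filled.append((pos, est))
    return filled


# Hypothetical sites: the ones at positions 5 and 15 did not report.
sites = [(0, 10.0), (5, None), (10, 20.0), (15, None)]
filled = interpolate_missing(sites)
print(filled)
```

The site at position 5 receives the midpoint of its two neighbours, and the site at position 15, which has no neighbour on its right, copies the value from position 10.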
Case Study 1: Temp Regression
Case Study 2: Temp Interpolation
The AS Clause
• The AS clause specifies each model
  – AS FIT
  – AS INTERPOLATE
• Probably needs extended syntax for model methods
  – INTERPOLATE with splines, nearest neighbor, regression
• User views are only as flexible as the models pre-programmed into the syntax
  – How does this compare with UDFs and table-valued functions?
  – Is this the appropriate level for this kind of customization?
View Maintenance
• Options
  – Logical: build results for each query
  – Materialized: pre-compute all results for each model
  – Partial/Cached: store results generated by queries
  – Model-based: often models have fixed costs
    • Building basis functions, matrix inversions, linear solutions
• Tradeoff between query latency and overhead
• Is implementing model logic at such a low level reasonable?
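The latency versus overhead tradeoff among the first three options can be sketched by counting model evaluations. The class `ModelView`, its strategy names, and the toy model are hypothetical and only illustrate the idea; the fourth, model-based option is not covered here.

```python
class ModelView:
    """Hypothetical view wrapper that counts how often the model is evaluated."""

    def __init__(self, model, grid, strategy="logical"):
        self.model = model
        self.grid = list(grid)
        self.strategy = strategy
        self.evaluations = 0
        self.store = {}
        if strategy == "materialized":
            # Pay the full cost up front: evaluate every grid point once.
            for x in self.grid:
                self.store[x] = self._evaluate(x)

    def _evaluate(self, x):
        self.evaluations += 1
        return self.model(x)

    def query(self, x):
        if self.strategy == "logical":
            return self._evaluate(x)  # recompute on every query
        if x not in self.store:  # cache miss, or a point outside the grid
            self.store[x] = self._evaluate(x)
        return self.store[x]


def model(x):
    """Toy model, assumed for illustration."""
    return 20 + 0.1 * x


grid = range(0, 101, 10)  # 11 grid points
views = [ModelView(model, grid, s) for s in ("logical", "materialized", "cached")]
for view in views:
    for x in (10, 10, 20):  # the same small workload for each strategy
        view.query(x)

counts = {view.strategy: view.evaluations for view in views}
print(counts)
```

For this three-query workload the logical view evaluates the model three times, the cached view twice, and the materialized view eleven times, all of them before any query arrives. Which strategy wins depends on how many queries hit the view and how often the underlying data change.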
Outcomes/Opinions
• Is MauveDB the technology that will make scientists use databases?