EN 600.619: Adv. Storage and TP Systems MauveDB: Model-based User Views

Problem

• Databases are unusable for scientific data
  – Data are incomplete, imprecise, and erroneous
  – Need to be filtered/synthesized using models

• Scientists use them in the most rudimentary ways
  – As a backing store for raw data
  – Run few or no queries

• User-defined functions are inadequate
  – Static models, insufficient for many applications
  – Let’s discuss this later?

Approach

• Define user-views based on a model syntax
  – Extend traditional SQL-view model

• User views provide access to synthesized data
  – Data independence

• Present stable view of system
  – When sites don’t report data (missing values)
  – When network changes
  – Report data at different locations than sampled

• View maintenance
  – Issues of whether to materialize or not

Processing Scientific Data

• Without model-based views
  – Export to MATLAB, then apply models
  – Use custom, programmatic querying tools
  – Can’t use SQL

• Getting data back into the database is awkward and inefficient

• With model-based views
  – Models update themselves as data changes
  – Standard SQL queries against synthesized data
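One way to picture a self-updating model is a fit that keeps only running sums, so each new reading adjusts the model in constant time. The sketch below is illustrative only: the class name `RunningLinearFit` and the sample readings are assumptions, and the slides do not say how MauveDB updates its models internally.

```python
class RunningLinearFit:
    """Least-squares line y = a + b*x maintained from running sums."""

    def __init__(self):
        self.n = 0
        self.sx = self.sy = self.sxx = self.sxy = 0.0

    def insert(self, x, y):
        # Called whenever a new raw reading arrives; O(1) work per reading.
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.sxy += x * y

    def coefficients(self):
        # Closed-form simple linear regression from the running sums.
        denom = self.n * self.sxx - self.sx ** 2
        if self.n < 2 or denom == 0:
            raise ValueError("need at least two readings with distinct x")
        b = (self.n * self.sxy - self.sx * self.sy) / denom
        a = (self.sy - b * self.sx) / self.n
        return a, b

    def predict(self, x):
        a, b = self.coefficients()
        return a + b * x


# Hypothetical readings (x, y); queries always see the current model.
fit = RunningLinearFit()
for x, y in [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]:
    fit.insert(x, y)
```

A query such as `fit.predict(3.0)` reflects every reading inserted so far, without exporting the data or refitting from scratch.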

Example

• Benefits
  – Network changes are transparent
  – Spatial or temporal biases removed (e.g., for aggregates)

• What about model errors?

Architecture

View Creation: Regression

• Select a virtual grid on which data are reported
  – Using MATLAB-style syntax

• Create a unique model at each time T
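The two steps above can be sketched in plain Python: group raw readings by time, fit one model per time step, and report predictions on a fixed virtual grid. This is a sketch under stated assumptions, not MauveDB's implementation: a one-dimensional straight-line fit stands in for whatever regression the view declares, and the names `fit_line`, `regression_view`, and the sample readings are hypothetical.

```python
from collections import defaultdict


def fit_line(points):
    """Ordinary least squares for y = a + b*x over (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("need readings at two or more distinct x positions")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b = sxy / sxx
    return mean_y - b * mean_x, b


def regression_view(readings, grid):
    """Return rows (t, x, temp) on the virtual grid, one model per time t."""
    by_time = defaultdict(list)
    for t, x, temp in readings:
        by_time[t].append((x, temp))
    rows = []
    for t in sorted(by_time):
        a, b = fit_line(by_time[t])  # a separate model for each time step
        rows.extend((t, x, a + b * x) for x in grid)
    return rows


# Hypothetical raw readings: (time, sensor position, temperature).
readings = [(0, 1.0, 20.0), (0, 3.0, 24.0), (1, 1.0, 21.0), (1, 4.0, 27.0)]
grid = [0.0, 2.0, 4.0]  # virtual grid, in the spirit of MATLAB's 0:2:4
view = regression_view(readings, grid)
```

The rows in `view` sit at grid positions rather than sensor positions, which is what lets queries report data at locations other than those sampled.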

View Creation: Interpolation

• Interpolate missing values from nearby sites
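As a rough illustration of filling a missing value from nearby sites, the sketch below uses inverse-distance weighting over the k nearest reporting sites. The weighting scheme is an assumption chosen for brevity (the slide does not name an interpolation method), and `interpolate_missing` and the site data are hypothetical.

```python
import math


def interpolate_missing(sites, k=2):
    """Fill None values from the k nearest reporting sites.

    sites maps a site name to ((x, y), value); value is None when the
    site did not report. Weights are 1/distance (an assumed scheme).
    """
    reporting = [(pos, val) for pos, val in sites.values() if val is not None]
    if not reporting:
        raise ValueError("no site reported a value")
    filled = {}
    for name, (pos, val) in sites.items():
        if val is not None:
            filled[name] = val
            continue
        # (distance, value) pairs for the k closest reporting sites.
        nearest = sorted((math.dist(pos, p), v) for p, v in reporting)[:k]
        if nearest[0][0] == 0:
            # A reporting site at the same position: use its value directly.
            filled[name] = nearest[0][1]
            continue
        weights = [1.0 / d for d, _ in nearest]
        total = sum(w * v for w, (_, v) in zip(weights, nearest))
        filled[name] = total / sum(weights)
    return filled


# Hypothetical sites; "b" failed to report.
sites = {
    "a": ((0.0, 0.0), 20.0),
    "b": ((2.0, 0.0), None),
    "c": ((4.0, 0.0), 24.0),
    "d": ((10.0, 0.0), 30.0),
}
filled = interpolate_missing(sites)
```

Here site "b" is equidistant from "a" and "c", so its filled value is their plain average; the distant site "d" is ignored because k is 2.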

Case Study 1: Temp Regression

Case Study 2: Temp Interpolation

The AS Clause

• AS clause specifies each model
  – AS FIT
  – AS INTERPOLATE

• Probably needs extended syntax for model methods
  – INTERPOLATE with splines, nearest neighbor, regression

• User views are only as flexible as the models pre-programmed into the syntax
  – How does this compare with UDFs and table-valued functions?
  – Is this the appropriate level for this kind of customization?

View Maintenance

• Options
  – Logical: build results for each query
  – Materialized: pre-compute all results for each model
  – Partial/Cached: store results generated by queries
  – Model-based: often models have fixed costs
    • Building basis functions, matrix inversions, linear solutions

• Tradeoff between query latency and overhead

• Is implementing model logic at such a low level reasonable?
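The latency-versus-overhead tradeoff among the first three options can be made concrete by counting model evaluations. The toy class below is a sketch only: `ModelView`, its strategy names, and the stand-in model are assumptions, and it ignores invalidation when the underlying data changes.

```python
class ModelView:
    """Toy model-based view with three maintenance strategies."""

    def __init__(self, model, grid, strategy="logical"):
        if strategy not in ("logical", "materialized", "cached"):
            raise ValueError(f"unknown strategy: {strategy}")
        self.model = model
        self.strategy = strategy
        self.evaluations = 0  # how many times the model was run
        self.store = {}
        if strategy == "materialized":
            # Pay the full cost up front, before any query arrives.
            for x in grid:
                self.store[x] = self._evaluate(x)

    def _evaluate(self, x):
        self.evaluations += 1
        return self.model(x)

    def query(self, x):
        if self.strategy == "logical":
            return self._evaluate(x)  # recompute on every query
        if x not in self.store:
            # Cache miss, or an off-grid point under "materialized".
            self.store[x] = self._evaluate(x)
        return self.store[x]


def model(x):
    # Stand-in for an expensive model evaluation.
    return 18.0 + 2.0 * x


grid = range(100)
logical = ModelView(model, grid, "logical")
materialized = ModelView(model, grid, "materialized")
cached = ModelView(model, grid, "cached")
for view in (logical, materialized, cached):
    for x in (3, 3, 7):
        view.query(x)
```

After the same three queries, the logical view has run the model once per query, the materialized view once per grid point, and the cached view once per distinct point queried. Which strategy wins depends on how often queries repeat and how often the data changes.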

Outcomes/Opinions

• Is MauveDB the technology that will make scientists use databases?