Transcript
Page 1: EN 600.619: Adv. Storage and TP Systems MauveDB: Model-based User Views

Page 2:

Problem

• Databases are unusable for scientific data
  – Data are incomplete, imprecise, and erroneous
  – Need to be filtered/synthesized using models

• Scientists use them in the most rudimentary ways
  – As a backing store for raw data
  – Run few or no queries

• User-defined functions (UDFs) are inadequate
  – Static models, insufficient for many applications
  – Let’s discuss this later?

Page 3:

Approach

• Define user views based on a model syntax
  – Extend traditional SQL-view model

• User views provide access to synthesized data
  – Data independence

• Present stable view of system
  – When sites don’t report data (missing values)
  – When network changes
  – Report data at different locations than sampled

• View maintenance
  – Issues of whether to materialize or not

Page 4:

Processing Scientific Data

• Without model-based views
  – Export to MATLAB, then apply models
  – Use custom, programmatic querying tools
  – Can’t use SQL

• Getting data back into the database is awkward and inefficient

• With model-based views
  – Self-updating models as data changes
  – Standard SQL queries against synthesized data

Page 5:

Example

• Benefits
  – Network changes are transparent
  – Spatial or temporal biases removed (e.g., for aggregates)

• What about model errors?

Page 6:

Architecture

Page 7:

View Creation: Regression

• Select a virtual grid on which data are reported
  – Using MATLAB-style syntax

• Create a unique model at each time T
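
The figure with the view definition did not survive in this transcript, so what follows is only a minimal Python sketch of the idea on this slide, not MauveDB's syntax or implementation: fit a separate least-squares model for each time T and report its predictions on a virtual grid. The basis functions (1, x, x^2, y, y^2), the function names, and the (time, x, y, temp) tuple layout are all assumptions made for illustration.

```python
import numpy as np


def basis(x, y):
    """Hypothetical basis functions: 1, x, x^2, y, y^2 (an assumption)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.column_stack([np.ones_like(x), x, x**2, y, y**2])


def regression_view(readings, grid_x, grid_y):
    """Fit one regression model per time step and evaluate it on a grid.

    readings: iterable of (time, x, y, temp) tuples (assumed layout).
    Returns a dict mapping time -> array of shape (len(grid_x), len(grid_y)).
    """
    readings = np.asarray(readings, dtype=float)
    gx, gy = np.meshgrid(grid_x, grid_y, indexing="ij")
    view = {}
    for t in np.unique(readings[:, 0]):
        rows = readings[readings[:, 0] == t]
        # Least-squares fit of temp against the basis functions for this time only.
        coeffs, *_ = np.linalg.lstsq(
            basis(rows[:, 1], rows[:, 2]), rows[:, 3], rcond=None
        )
        # Report predictions at grid points, not at the sampled sensor locations.
        view[float(t)] = (basis(gx.ravel(), gy.ravel()) @ coeffs).reshape(gx.shape)
    return view
```

A query against the view would then read grid values such as `regression_view(readings, range(0, 101, 10), range(0, 101, 10))[t]` instead of the raw sensor tuples.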

Page 8:

View Creation: Interpolation

• Interpolate missing values from nearby sites
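
As a rough illustration of filling in missing values from nearby sites, here is a small Python sketch that uses inverse-distance weighting. The slide does not say which interpolation method the view uses, so the weighting scheme, the `power` parameter, and the function name are assumptions, not the system's actual method.

```python
import numpy as np


def interpolate_missing(sites, temps, power=2.0):
    """Fill missing readings (NaN) from the sites that did report.

    sites: (n, 2) array-like of site coordinates.
    temps: length-n array-like, with NaN where a site reported nothing.
    Uses inverse-distance weighting (an assumed choice of method).
    """
    sites = np.asarray(sites, dtype=float)
    temps = np.asarray(temps, dtype=float)
    known = ~np.isnan(temps)
    if not known.any():
        raise ValueError("need at least one reported value")
    filled = temps.copy()
    for i in np.flatnonzero(~known):
        dist = np.linalg.norm(sites[known] - sites[i], axis=1)
        if np.any(dist == 0):
            # A reporting site at the same location: use its value directly.
            filled[i] = temps[known][dist == 0].mean()
            continue
        # Closer sites get larger weights.
        weights = 1.0 / dist**power
        filled[i] = np.sum(weights * temps[known]) / np.sum(weights)
    return filled
```

Reported values pass through unchanged; only the NaN entries are replaced.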

Page 9:

Case Study 1: Temp Regression

Page 10:

Case Study 2: Temp Interpolation

Page 11:

The AS Clause

• AS clause specifies each model
  – AS FIT
  – AS INTERPOLATE

• Probably needs extended syntax for model methods
  – INTERPOLATE with splines, nearest neighbor, regression

• User views are only as flexible as the models pre-programmed into the syntax
  – How does this compare with UDFs, table-valued functions?
  – Is this the appropriate level for this kind of customization?

Page 12:

View Maintenance

• Options
  – Logical: build results for each query
  – Materialized: pre-compute all results for each model
  – Partial/Cached: store results generated by queries
  – Model-based: often models have fixed costs
    • Building basis functions, matrix inversions, linear solutions

• Tradeoff between query latency and overhead

• Is implementing model logic at such a low level reasonable?
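
One way to picture the model-based option for a regression view is to keep the small matrices behind the fit up to date on every insert and to pay the fixed cost of solving the linear system only when a query arrives. The Python sketch below assumes a linear regression model and invents the class and method names for illustration; it is not taken from MauveDB.

```python
import numpy as np


class IncrementalRegression:
    """Maintains X^T X and X^T y so inserts are cheap and fits happen on demand."""

    def __init__(self, n_bases):
        self.xtx = np.zeros((n_bases, n_bases))
        self.xty = np.zeros(n_bases)
        self._coeffs = None  # cached solution, invalidated by every insert

    def insert(self, features, value):
        """Fold one new training tuple into the running sums."""
        f = np.asarray(features, dtype=float)
        self.xtx += np.outer(f, f)
        self.xty += f * value
        self._coeffs = None

    def predict(self, features):
        """Solve the normal equations lazily, then reuse the cached coefficients."""
        if self._coeffs is None:
            self._coeffs = np.linalg.lstsq(self.xtx, self.xty, rcond=None)[0]
        return float(np.asarray(features, dtype=float) @ self._coeffs)
```

In this sketch the per-insert work is a small matrix update, and the solve is shared by all queries until the next insert, which is one point on the latency versus overhead tradeoff listed above.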

Page 13:

Outcomes/Opinions

• Is MauveDB the technology that will make scientists use databases?

