Black-box Determination of Cost Models’ Parameters for Federated Stream-Processing Systems
Michael Daum, Frank Lauterwald, Philipp Baumgärtel, Niko Pollner, Klaus Meyer-Wegener
Chair for Computer Science 6 (Data Management), Friedrich-Alexander-University of Erlangen-Nuremberg
IDEAS 2011, 2011-09-23
Agenda
Problem Statement
Calibration of Cost Models
Function Approximation
Estimating the Costs of Single Operators
Evaluation
Summary
Perspective: Cost Estimation for Federated DSMS
Problem Statement
DSAM: heterogeneous distributed data stream processing
Automatic cost-based query distribution
Problem: hardware- and DSMS-specific cost models are needed
Things we know a priori
Operator graph
Topology
Data rates
Selectivity
Distribution of certain values
For some operators: cost model → Calibration of Cost Models
Stream characteristics
Things we do not know a priori
Hardware- and DSMS-specific parameters of cost models
System costs
For some operators: cost model → Function approximation
Calibration of Cost Models - Parameter Estimation
Cost model consists of:
Stream- and operator-dependent parameters
Constant values
Hardware-, system-, and implementation-dependent values
Test queries and input streams:
Different values for the stream- and operator-dependent parameters
Cost Measurements
Least squares
Outlier detection (e.g. RANSAC)
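The calibration step above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the cost formula `cpu = c0 + c1 * rate + c2 * rate * selectivity`, the synthetic measurements, and the function names are assumptions. It fits the unknown parameters by least squares and uses a small hand-written RANSAC loop to ignore outliers.

```python
import numpy as np

def fit_least_squares(X, y):
    """Return the parameter vector p minimizing ||X p - y||^2."""
    p, *_ = np.linalg.lstsq(X, y, rcond=None)
    return p

def fit_ransac(X, y, n_iter=200, threshold=1.0, seed=0):
    """Minimal RANSAC: fit on random minimal subsets, keep the largest inlier set."""
    rng = np.random.default_rng(seed)
    n, k = X.shape
    best_inliers = np.zeros(n, dtype=bool)
    for _ in range(n_iter):
        idx = rng.choice(n, size=k, replace=False)
        p = fit_least_squares(X[idx], y[idx])
        inliers = np.abs(X @ p - y) < threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refit on all inliers of the best candidate.
    return fit_least_squares(X[best_inliers], y[best_inliers]), best_inliers

# Synthetic measurements for a hypothetical cost model (assumption):
# cpu = c0 + c1 * rate + c2 * rate * selectivity
rng = np.random.default_rng(42)
rate = rng.uniform(100, 1000, size=60)
selectivity = rng.uniform(0.1, 0.9, size=60)
X = np.column_stack([np.ones_like(rate), rate, rate * selectivity])
true_params = np.array([2.0, 0.01, 0.005])
cpu = X @ true_params + rng.normal(0, 0.05, size=60)
cpu[:5] += 20.0  # inject outliers, e.g. measurement spikes

params, inliers = fit_ransac(X, cpu, threshold=0.5)
```

The test queries supply the rows of `X` (known stream- and operator-dependent values); the fitted `params` play the role of the hardware- and system-dependent values.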
Function Approximation – Nonparametric Models
No appropriate cost model:
Operator without existing cost model
Existing cost models could not be fitted to a specific system
Solution: function approximation
Radial Basis Function Network (RBFN):
Function approximation instead of interpolation
Fewer centers than input points
Least-squares solution via the Moore-Penrose pseudoinverse
Improving the function approximation:
Iterative approach
1. Naive function approximation
2. Improving areas of interest (e.g. discontinuities, high gradient)
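A minimal sketch of the naive approximation step, assuming Gaussian basis functions, evenly spaced centers, and a width equal to the center spacing (none of these choices are stated on the slide). The synthetic data with a step stands in for a measured cost curve.

```python
import numpy as np

def rbf_design(x, centers, width):
    """Gaussian basis matrix with a constant bias column."""
    d2 = (x[:, None] - centers[None, :]) ** 2
    phi = np.exp(-d2 / (2.0 * width ** 2))
    return np.column_stack([np.ones(len(x)), phi])

def fit_rbfn(x, y, n_centers=10, width=None):
    """Fit output weights with the Moore-Penrose pseudoinverse."""
    centers = np.linspace(x.min(), x.max(), n_centers)
    if width is None:
        width = centers[1] - centers[0]  # assumption: width = center spacing
    weights = np.linalg.pinv(rbf_design(x, centers, width)) @ y
    return centers, width, weights

def predict_rbfn(x, centers, width, weights):
    return rbf_design(x, centers, width) @ weights

# Synthetic cost curve with a step (illustrative data, not a measurement).
x = np.linspace(0, 100, 200)
y = 10 + 0.05 * x + 5.0 * (x > 60)

# 15 centers for 200 input points: approximation, not interpolation.
centers, width, weights = fit_rbfn(x, y, n_centers=15)
y_hat = predict_rbfn(x, centers, width, weights)
rmse = float(np.sqrt(np.mean((y_hat - y) ** 2)))
```

The second, iterative step would add measurements or centers where the residual `y_hat - y` is largest, for example around the step.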
Estimating the Costs of Single Operators
Assumptions:
Only the system costs can be measured
The costs of a single operator are independent of other operators (additivity)
System costs depend linearly on the number of operators
Parallel instances of the same operator
Latency:
With parallel operators, latency does not depend on the number of operators
Operators have to be connected in series
Evaluation
Coral8
Test setting:
Synthetic input streams with constant properties (rate, attribute value distribution)
Every test query running for two minutes
The test data collected in the first minute is discarded
Measured values:
Latency
Memory consumption (resident set size)
CPU usage
Coral8 status stream
Input and output rate
Query latency
Application Memory
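The warm-up rule in the test setting (two-minute runs, first minute discarded) can be sketched as follows. The sample layout of (seconds since query start, value) pairs and the function name are assumptions for illustration; they do not reflect the format of the Coral8 status stream.

```python
from statistics import mean

def steady_state_mean(samples, warmup_s=60.0):
    """Average the measurements taken after the warm-up period.

    samples: list of (seconds_since_start, value) pairs (assumed layout).
    """
    kept = [value for t, value in samples if t >= warmup_s]
    if not kept:
        raise ValueError("no samples after the warm-up period")
    return mean(kept)

# Illustrative run: high CPU usage while the query starts up, then steady.
samples = [(t, 80.0 if t < 60 else 20.0) for t in range(0, 120, 5)]
cpu = steady_state_mean(samples)
```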
Coral8 Measurements
Filter operator:
Application memory
CPU usage
Unexpected behavior: steps and peaks
Costs of Single Operators
CPU usage depends linearly on the number of operators
The slope equals the cost of a single operator
[Figure: two plots, each with the x-axis "Operators"]
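A small sketch of how the per-operator cost could be read off as a slope. The CPU values below are made up for illustration and are not the measurements from the talk; only the system-level cost is assumed to be observable.

```python
import numpy as np

# Number of parallel instances of the same operator per test query.
n_ops = np.array([10, 20, 40, 60, 80, 100])
# System-level CPU usage in percent (made-up illustrative values).
cpu_pct = np.array([6.1, 11.0, 21.2, 30.9, 41.1, 50.8])

# Linear fit: cpu = base_cost + per_operator_cost * n_ops
per_operator_cost, base_cost = np.polyfit(n_ops, cpu_pct, deg=1)

# Under the additivity assumption, a graph of n operators costs:
def estimate_cpu(n):
    return base_cost + n * per_operator_cost
```

The intercept absorbs the system overhead that is present regardless of the number of operators.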
Model Calibration and RBFN
Application memory of the aggregate operator
Left side: calibrated cost model (linear cost model)
Right side: function approximation (adapts to the steps)
Cost Estimation for Operator Graphs
Operator graph consisting of 100 parallel filter operators
Cost estimation using function approximation
Summary
Cost estimation for black-box systems without cost estimators:
Calibration of a cost model
Default cost model
System-specific cost model
Function approximation
Calibration of a cost model for unknown systems:
Behavior conforming to the cost model is required
Nonconforming behavior can be detected (automatically) after some measurements
Evaluation:
CPU usage and memory consumption can be estimated
Latency: Queuing theory
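The slides do not say how nonconforming behavior is detected. One possible check, shown here purely as an assumption, is to calibrate the model and flag it when the residual is large relative to the range of the measured costs. The threshold of 5% and the function name are hypothetical.

```python
import numpy as np

def conforms(X, y, max_rel_rmse=0.05):
    """Fit a linear-in-parameters cost model and test the residual size.

    Returns (conforming, relative_rmse). Assumes y is not constant.
    """
    p, *_ = np.linalg.lstsq(X, y, rcond=None)
    rmse = np.sqrt(np.mean((X @ p - y) ** 2))
    rel = rmse / (np.max(y) - np.min(y))
    return bool(rel <= max_rel_rmse), float(rel)

x = np.linspace(0, 100, 50)
X = np.column_stack([np.ones_like(x), x])  # linear cost model: c0 + c1 * x

linear_cost = 10 + 0.2 * x            # behaves as the model predicts
stepped_cost = 10 + 10.0 * (x >= 50)  # step, as seen in some measurements

ok_linear, rel_linear = conforms(X, linear_cost)
ok_stepped, rel_stepped = conforms(X, stepped_cost)
```

A failed check would be the point to fall back from calibration to function approximation.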
Application: Cost Estimation for Federated DSMS
Cost formulas as metadata:
Cost formulas containing constants, variables and parameters
Cost estimation:
Hardware-dependent and system-dependent parameters loaded from metadata catalog
Operator-dependent variables by a metadata provider
Stream-dependent variables by a monitoring component or an estimator
Interpreter to calculate costs
Advantages:
Both default and system-specific cost formulas possible
Cost models interchangeable at runtime
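The idea of storing cost formulas as metadata and evaluating them with an interpreter can be sketched as follows. The formula text, the catalog key, and all parameter names are hypothetical; the interpreter is a small arithmetic evaluator built on Python's `ast` module, not the component described in the talk.

```python
import ast
import operator

_BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
            ast.Mult: operator.mul, ast.Div: operator.truediv,
            ast.Pow: operator.pow}

def evaluate(formula, bindings):
    """Evaluate an arithmetic formula; names are looked up in bindings."""
    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name):
            return bindings[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -ev(node.operand)
        raise ValueError(f"unsupported expression: {ast.dump(node)}")
    return ev(ast.parse(formula, mode="eval"))

# Hypothetical metadata catalog entry: a formula stored as text.
catalog = {"filter.cpu": "c_base + c_tuple * rate * (1 + selectivity)"}

params = {"c_base": 0.5, "c_tuple": 0.001}  # hardware/system-dependent (calibrated)
operator_vars = {"selectivity": 0.25}       # from a metadata provider
stream_vars = {"rate": 2000.0}              # from monitoring or an estimator

cost = evaluate(catalog["filter.cpu"], {**params, **operator_vars, **stream_vars})
```

Because the formula is data, swapping the catalog entry changes the cost model at runtime without touching the interpreter.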
Any questions…?
Generating Test Data and Test Queries
Identifying parameters
Cost model based:
Identifying query- or stream-dependent parameters
Generating a set of test data for the parameters
Mapping the parameters to the query language and stream properties
Operator or query language based:
No existing cost model
Function approximation
Identifying important parameters based on the query language and possible stream properties
Generating a set of test data
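A sketch of generating test cases from parameter values, assuming a full grid over two parameters. The query template is generic SQL-like text chosen for illustration and is not the syntax of any particular DSMS; the parameter names `rate` and `threshold` are hypothetical.

```python
from itertools import product

def generate_test_cases(param_values, query_template):
    """Build one test case per combination of parameter values.

    Each case maps the parameters to stream properties and to query text.
    """
    names = list(param_values)
    cases = []
    for combo in product(*(param_values[n] for n in names)):
        setting = dict(zip(names, combo))
        cases.append({
            "stream": {"rate": setting["rate"]},         # stream property
            "query": query_template.format(**setting),   # query-language mapping
        })
    return cases

param_values = {"rate": [100, 1000, 10000], "threshold": [10, 50, 90]}
template = "SELECT * FROM input WHERE value < {threshold}"  # hypothetical syntax
cases = generate_test_cases(param_values, template)
```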
Michael Daum, Frank Lauterwald, Philipp Baumgärtel, Klaus Meyer-Wegener
Problem statement
[Figure: global query graph with operators Op1 to Op6, input streams Stream1 and Stream2, and output Out, distributed over Node 1 to Node 3 (distributed query processing). The label "Data Rate, Density, Statistics" appears on two streams; the remaining streams are marked "???".]
Relevant metadata about inner streams unknown
SSDBM 2010
Propagation of Densities
Propagation of input streams' statistics
Propagation of statistics for inner streams between operators
Propagation of statistics for output streams
Statistical objective: Attribute Value Distribution (Density)
Analytic operator model: accurate formulas
Numerical operator model: discrete mappings
Training of mapping relation
[Figure: an operator between an input stream and an output stream, each annotated with data rate, density, and statistics; the operator model is either analytic or numerical]
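As an illustration of an analytic operator model, the sketch below propagates data rate and density through a filter `value < threshold`. Representing the density as a histogram with uniform mass inside each bin is an assumption made here, as are all names; the slides do not specify the representation.

```python
import numpy as np

def filter_less_than(rate, edges, density, threshold):
    """Analytic model of a filter `value < threshold`.

    edges: histogram bin edges (length n + 1)
    density: probability mass per bin (length n, sums to 1)
    Returns (output_rate, output_density, selectivity).
    """
    lo, hi = edges[:-1], edges[1:]
    # Fraction of each bin below the threshold, assuming uniform mass per bin.
    frac = np.clip((threshold - lo) / (hi - lo), 0.0, 1.0)
    passed = density * frac
    selectivity = passed.sum()
    out_density = passed / selectivity if selectivity > 0 else passed
    return rate * selectivity, out_density, selectivity

# Input stream: 1000 tuples/s, values uniformly distributed on [0, 100).
edges = np.linspace(0.0, 100.0, 11)
density = np.full(10, 0.1)
out_rate, out_density, sel = filter_less_than(1000.0, edges, density, 25.0)
```

The output rate and density can then serve as the input statistics of the next operator in the graph. A numerical operator model would replace the formula with a mapping learned from observed input and output statistics.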