HighLoad++ 2013
Contents
● Real Data and Ideal Models
● Load Testing (Tuning)
● Production Monitoring
● Correlation
● Tools
Real Data vs. Ideal Models
● noise (human actions)
● outliers
● missing data
● different resolutions
● counter update frequencies
● quantization
● not Gaussian and not random walk
● what is normal for the system?
Load Testing (Tuning)
● goal
● beware of transient response
● find failure
● filter data
● find bottleneck and fix
● rinse and repeat
Filtration
● constants
● coefficient of variation (sd/mean)
● apply system knowledge
– tasks migrated by scheduler
– dependent (disk used/free)
– interface traffic < 10 packets/s
– load average < 0.5
– …
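The filtering rules above could be sketched roughly as follows. This is only an illustration: the function name is_informative and the cut-offs min_cv and min_mean are hypothetical, not taken from the talk.

```python
from statistics import mean, pstdev

def is_informative(values, min_cv=0.05, min_mean=None):
    """Return True if a metric series looks worth analysing further.

    min_cv and min_mean are illustrative thresholds (assumptions).
    """
    data = [v for v in values if v is not None]  # drop missing samples
    if len(data) < 2:
        return False
    if max(data) == min(data):                   # constant series
        return False
    m = mean(data)
    # System knowledge, e.g. "load average < 0.5" is not interesting.
    if min_mean is not None and abs(m) < min_mean:
        return False
    if m == 0:
        return True  # varies around zero: the ratio is undefined, keep it
    return pstdev(data) / abs(m) >= min_cv       # sd/mean dispersion ratio
```

For example, is_informative([0.1, 0.2, 0.1, 0.3], min_mean=0.5) rejects an idle load-average series, while a series that swings between 10 and 80 passes.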
Production Classics
● Control charts
– fixed window moving average (MA)
– exponentially weighted moving average (EWMA)
● Holt-Winters
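A minimal sketch of an EWMA control chart, assuming the first warmup samples are a clean baseline and that samples are roughly independent. The function name and the parameters alpha, width and warmup are illustrative choices, not values from the talk.

```python
from statistics import mean, pstdev

def ewma_alerts(series, alpha=0.3, width=3.0, warmup=10):
    """Return indices where the EWMA leaves the band mu +/- limit."""
    base = series[:warmup]
    mu, sigma = mean(base), pstdev(base)
    # Asymptotic standard deviation of an EWMA of i.i.d. samples
    # is sigma * sqrt(alpha / (2 - alpha)).
    limit = width * sigma * (alpha / (2 - alpha)) ** 0.5
    ewma, alerts = mu, []
    for i in range(warmup, len(series)):
        ewma = alpha * series[i] + (1 - alpha) * ewma
        if abs(ewma - mu) > limit:
            alerts.append(i)
    return alerts
```

A fixed-window MA chart would look the same with the EWMA update replaced by the mean of the last few samples.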
Holt-Winters
triple exponential smoothing
● needs a lot of data
● sensitive to outliers
● can't handle 3 seasons + holidays
● overfitting
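For reference, an additive form of triple exponential smoothing might be sketched like this. The initialisation (level and seasonal terms from the first season, trend from the first two) is one common simple choice and an assumption here; the smoothing constants are arbitrary.

```python
def holt_winters_additive(series, period, alpha=0.5, beta=0.1, gamma=0.1):
    """One-step-ahead forecasts for series[period:] (additive seasonality)."""
    if len(series) < 2 * period:
        raise ValueError("need at least two full seasons")
    first = series[:period]
    second = series[period:2 * period]
    level = sum(first) / period
    trend = (sum(second) - sum(first)) / period ** 2
    season = [x - level for x in first]
    forecasts = []
    for t in range(period, len(series)):
        s = season[t % period]
        forecasts.append(level + trend + s)   # forecast made before seeing series[t]
        prev_level = level
        level = alpha * (series[t] - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season[t % period] = gamma * (series[t] - level) + (1 - gamma) * s
    return forecasts
```

Even this toy version needs two complete seasons before it can start, and a single outlier feeds into the level, the trend and the seasonal term at once, which is where the data-hunger and outlier sensitivity listed above come from.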
Autocorrelation
Ljung-Box Test
● non-stationary
● mean shift
● trends
● seasonal
● periodic (cron jobs, sampling)
● aggregated (MA, EWMA)
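The Ljung-Box statistic itself is short enough to sketch by hand. The version below computes only Q = n(n+2) * sum(rho_k^2 / (n-k)); turning it into a p-value needs a chi-square distribution with max_lag degrees of freedom, which is left out. The function name and the default max_lag are assumptions for illustration.

```python
import math

def ljung_box_q(series, max_lag=10):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(series)
    m = sum(series) / n
    dev = [x - m for x in series]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    q = 0.0
    for k in range(1, max_lag + 1):
        # Sample autocorrelation at lag k.
        rho = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        q += rho * rho / (n - k)
    return n * (n + 2) * q

# A cron-like periodic signal is strongly autocorrelated, so Q is large.
periodic = [math.sin(2 * math.pi * t / 12) for t in range(120)]
q_periodic = ljung_box_q(periodic)
```

With 10 lags, the 5% critical value of the chi-square distribution is about 18.31; the periodic example lands far above it, as any of the series types listed above would.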
2-Sample Tests: Good
Kolmogorov–Smirnov, Cramér–von Mises
● good for request size and latency (unaggregated)
● work on periodic data
● outlier resistant
● good for data exploration
2-Sample Tests: Bad
Kolmogorov–Smirnov, Cramér–von Mises
● false positives on trends and seasonal changes
● need many unique values
● computational complexity
● bad for alerting
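A hand-rolled two-sample Kolmogorov–Smirnov check, as a sketch only. It uses the large-sample critical value c(alpha) * sqrt((n + m) / (n * m)) with c(alpha) = sqrt(-ln(alpha / 2) / 2), which is an approximation; the function names are made up for this example.

```python
import math
from bisect import bisect_right

def ks_statistic(a, b):
    """Largest gap between the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in a + b:  # the maximum gap is attained at a sample point
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d

def ks_rejects(a, b, alpha=0.05):
    """True if the samples differ at level alpha (asymptotic threshold)."""
    n, m = len(a), len(b)
    c = math.sqrt(-0.5 * math.log(alpha / 2))
    return ks_statistic(a, b) > c * math.sqrt((n + m) / (n * m))
```

Because only the ranks of the values matter, a few extreme latencies barely move the statistic. Heavily quantized metrics produce many ties and coarse CDF steps, which is one reason these tests want many unique values.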
Finding Similar Graphs
● correlation (Pearson, Spearman)
● Euclidean distance
● dynamic time warping (DTW)
● discrete Fourier transform (DFT)
● discrete wavelet transform (DWT)
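Three of the similarity measures above, sketched in plain Python for equally sampled series. The DTW here is the textbook O(n*m) dynamic program with absolute difference as the local cost and no window constraint; production tools usually add constraints or approximations.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equally long, non-constant series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def euclidean(x, y):
    """Point-by-point Euclidean distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def dtw(x, y):
    """Dynamic time warping distance (unconstrained)."""
    inf = float("inf")
    cost = [[inf] * (len(y) + 1) for _ in range(len(x) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            d = abs(x[i - 1] - y[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[len(x)][len(y)]

spike = [0, 0, 1, 2, 1, 0, 0]
shifted = [0, 1, 2, 1, 0, 0, 0]  # same spike, one sample earlier
```

For the two spikes, the Euclidean distance is 2.0 while the DTW distance is 0, since warping absorbs the one-sample shift. That tolerance to small time offsets is the usual argument for DTW when comparing graphs from different hosts.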
Clustering
● non-Euclidean (ultrametric) space
● many small clusters
● local clustering around events
● false positives
– cron jobs (log rotation)
– human actions (restarts, reconfigurations)
– cache expirations
– …
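R
library(zoo)  # provides coredata() and index()

# Build a long-format data frame holding the raw and the block-averaged
# values for every series (column) of the zoo object m.
add.smooth <- function(m) {
  r <- nrow(m)
  # Average each series over blocks of max(3, r %/% 150) consecutive samples.
  ms <- sapply(m, function(y) {
    ave(coredata(y), seq.int(r) %/% max(3, r %/% 150),
        FUN = function(x) mean(x, na.rm = TRUE))
  })
  df <- data.frame(index(m)[rep.int(1:r, ncol(m))],
                   factor(rep(1:ncol(m), each = r), levels = 1:ncol(m)),
                   as.vector(coredata(m)),
                   as.vector(coredata(ms)))
  names(df) <- c("Index", "Series", "Value", "Smooth")
  df
}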
Skyline Algorithms
● median absolute deviation
● grubbs
● first hour average
● stddev from average
● stddev from moving average
● mean subtraction cumulation
● least squares
● histogram bins
● ks test
● second order anomalies
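As an illustration of the first algorithm in the list, a median-absolute-deviation check can be written in a few lines. This is a sketch in the spirit of Skyline's check, not a copy of its code; the threshold of 6 MADs is an assumption here and should be verified against the Skyline source before relying on it.

```python
from statistics import median

def median_absolute_deviation(values, threshold=6.0):
    """True if the latest point is more than `threshold` MADs from the median."""
    med = median(values)
    deviations = [abs(v - med) for v in values]
    mad = median(deviations)
    if mad == 0:
        return False  # no spread to compare against
    return deviations[-1] / mad > threshold
```

For example, median_absolute_deviation([10, 11, 9, 10, 12, 10, 11, 9, 10, 50]) flags the final value, while the same series ending in 10 does not. Medians make the check robust to earlier outliers in the window, unlike the stddev-based variants.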
Oculus Internals
● Skyline Import Script and Cronjob
● Resque workers
● ElasticSearch
● Sinatra (Ruby) Web App