Diffy: Automatic Testing of Microservices @Twitter
Puneet Khanduri, Arun Kejariwal(@pzdk, @arun_kejariwal)
1
Oct 8, 2014
Twitter, Inc. Down 2% Due To Broken Signup
2
Oct 8, 2014
Twitter, Inc. NOT Down 2% Due To NOT Broken Signup
3
“I just refactored a critical part of my service. How do I know I didn’t break anything?”
- Every Service Developer @ Twitter
4
“They just refactored a critical part of their service. How do I know they didn’t break anything?”
- Every Site Reliability Engineer @ Twitter
5
Tier #0 - Unit Tests
Cost: Writing good tests takes ~1.5x of development time
Limited scope: Testing classes/methods in isolation
High coverage % per test: e.g. a method has 5 independent code paths => 1 unit test yields 20% coverage
6
Tier #1 - Component Tests
Testing a service in isolation with a fully mocked environment
Cost of a single test: Same as unit tests
Low coverage % per test: Cyclomatic complexity is O(k^n) - impractical to target 100%
Handpicked test cases: e.g. a request path has 6 methods with 5 paths per method => 1 test = 0.03% coverage
7
Tier #2 - Integration Tests
Testing a service and its downstream dependencies in a real (staging) environment
Cost: Same as unit tests + amortized cost of a staging environment
Negligible coverage per test: Much less than component tests; e.g. a request path has 4 services, 6 methods/service, 5 paths/method
9
... emerging pattern ...
Super-exponential cost of coverage
10
Diffy Approach: Higher coverage for free
11
Diffy Approach
Free test inputs: Sample production traffic or whatever traffic source you prefer
Free assertions: Use "known good" versions of your code to generate assertions
12
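The idea above can be sketched in a few lines of Scala (an illustrative sketch, not Diffy's actual API: the Response type and diff function here are hypothetical). A "known good" primary response serves as the free assertion, and any field where the candidate disagrees is a raw difference.

```scala
// Illustrative sketch (not Diffy's API): a "known good" primary response acts
// as the assertion; any field where the candidate disagrees is a raw difference.
case class Response(status: Int, body: Map[String, String])

def diff(a: Response, b: Response): Set[String] = {
  val statusDiff = if (a.status != b.status) Set("status") else Set.empty[String]
  val fieldDiffs = (a.body.keySet ++ b.body.keySet).filter(k => a.body.get(k) != b.body.get(k))
  statusDiff ++ fieldDiffs
}

val primary   = Response(200, Map("user" -> "alice", "ts" -> "1001"))
val candidate = Response(200, Map("user" -> "alice", "ts" -> "1002"))
println(diff(primary, candidate)) // only the "ts" field differs
```

No tests are written by hand here: the inputs come from sampled traffic, and the expected values come from the known-good version.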
What about the noise?
Server generated timestamps
Random number generators
Downstream non-determinism
Race conditions
13
Diffy Topology
[Diagram: Diffy multicasts sampled production traffic to candidate, primary, and secondary instances; raw differences, after non-deterministic noise is removed, yield filtered differences]
14
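The filtering step in the topology can be sketched as set subtraction (an illustrative sketch; the function and field names are hypothetical): primary and secondary run the same "known good" code, so any field that differs between them is non-deterministic noise and is excluded from the candidate-vs-primary differences.

```scala
// Illustrative sketch of noise filtering: fields that differ between two
// instances of the SAME code (primary vs. secondary) are noise; drop them
// from the candidate-vs-primary raw differences.
def filterNoise(candidateVsPrimary: Set[String], secondaryVsPrimary: Set[String]): Set[String] =
  candidateVsPrimary -- secondaryVsPrimary

val raw   = Set("ts", "user.name") // candidate vs. primary (raw differences)
val noise = Set("ts")              // secondary vs. primary (same code, still differs)
println(filterNoise(raw, noise))   // only "user.name" survives filtering
```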
15
Automation
Compare latest in master against last deploy to production
Automatically deploy master as candidate
Automatically deploy prod tag as primary and secondary
16
Automation (contd.)
Reporting
Diffy e-mails a report with highlighted critical endpoints and fields
Sample requests and responses are available for further analysis
17
18
Performance Regression
Why is it challenging?
Software: New release
Hardware performance: Uncontrolled parameter
Large variability across nodes makes robust analysis challenging
19
Performance Regression: Diffy Approach
Observation: All target service instances see identical load
Key Idea:
Discover all performance metrics (thousands of time series)
Compare reference instances to test instances
Report metrics with significant deviations
20
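The comparison step can be sketched per metric (an illustrative sketch: the metric values, the 20% threshold, and the deviates function are hypothetical, not Diffy's implementation): since reference and test instances see identical load, a large relative gap between their means is a candidate regression.

```scala
// Illustrative sketch: flag a metric when the test instances' mean deviates
// from the reference instances' mean by more than a relative threshold.
// The 20% threshold is an arbitrary choice for this example.
def mean(xs: Seq[Double]): Double = xs.sum / xs.size

def deviates(reference: Seq[Double], test: Seq[Double], threshold: Double = 0.2): Boolean =
  math.abs(mean(test) - mean(reference)) > threshold * math.abs(mean(reference))

val referenceLatency = Seq(100.0, 102.0, 98.0)  // ms, from reference instances
val testLatency      = Seq(131.0, 128.0, 135.0) // ms, from test instances
println(deviates(referenceLatency, testLatency)) // ~31% slower: flagged
```

In practice this naive rule is noisy, which is what the classifiers later in the deck address.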
Performance Regression (contd.)
Visual analysis: Error prone (false negatives)
21
Common Statistical Methods
Welch's t-Test: Two-sample test
H0: Means of the two populations are equal
22
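The statistic itself is straightforward to compute (a minimal sketch): t = (mean(a) - mean(b)) / sqrt(s_a^2/n_a + s_b^2/n_b), using unbiased sample variances, with no equal-variance assumption.

```scala
// Welch's two-sample t statistic: compares means without assuming equal variances.
def mean(xs: Seq[Double]): Double = xs.sum / xs.size

def sampleVariance(xs: Seq[Double]): Double = { // unbiased (n - 1 in the denominator)
  val m = mean(xs)
  xs.map(x => (x - m) * (x - m)).sum / (xs.size - 1)
}

def welchT(a: Seq[Double], b: Seq[Double]): Double =
  (mean(a) - mean(b)) / math.sqrt(sampleVariance(a) / a.size + sampleVariance(b) / b.size)

// Identical samples: the means are equal, so t is exactly 0.
println(welchT(Seq(1.0, 2.0, 3.0), Seq(1.0, 2.0, 3.0)))
```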
Common Statistical Methods (contd.)
F-Test: H0: Means of a set of populations are equal
Two groups: F = t^2, where t is Student's t statistic
Assumptions: Normally distributed populations [1], equal variance (homoscedastic), independent samples
[1] "Power Function of the F-Test Under Non-Normal Situations", by M. L. Tiku. In Journal of the American Statistical Association, Vol. 66, No. 336 (Dec., 1971), pp. 913-916.
23
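The two-group identity F = t^2 can be checked numerically (an illustrative sketch using the classical pooled-variance Student's t and the one-way ANOVA F; these are textbook formulas, not code from the deck):

```scala
// Numerical check of "two groups => F = t^2" for the pooled-variance
// Student's t statistic and the one-way ANOVA F statistic.
def mean(xs: Seq[Double]): Double = xs.sum / xs.size

def pooledT(a: Seq[Double], b: Seq[Double]): Double = {
  val (ma, mb) = (mean(a), mean(b))
  val ssw = a.map(x => (x - ma) * (x - ma)).sum + b.map(x => (x - mb) * (x - mb)).sum
  val sp2 = ssw / (a.size + b.size - 2) // pooled variance
  (ma - mb) / math.sqrt(sp2 * (1.0 / a.size + 1.0 / b.size))
}

def anovaF(groups: Seq[Seq[Double]]): Double = {
  val all = groups.flatten
  val grand = mean(all)
  val ssb = groups.map(g => g.size * math.pow(mean(g) - grand, 2)).sum // between-group
  val ssw = groups.map { g => val m = mean(g); g.map(x => (x - m) * (x - m)).sum }.sum // within-group
  (ssb / (groups.size - 1)) / (ssw / (all.size - groups.size))
}

val a = Seq(1.0, 2.0, 3.0)
val b = Seq(2.0, 4.0, 6.0)
val t = pooledT(a, b)
println(math.abs(anovaF(Seq(a, b)) - t * t)) // ~0: F equals t squared
```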
Other Previous Work
Similarity based: Match count, longest subsequence based
Clustering: k-Means, phased k-Means, EM, dynamic clustering, k-Medoids, single linkage clustering, PCA, SVM
24
Diffy-Performance Topology
[Diagram: Diffy sends sampled production traffic to a reference cluster and a test cluster; a classifier labels each metric PASSED, IGNORED, or FAILED]
25
Classifiers
Sample count: Minimum number of samples
Relative threshold: Variance within reference vs. distance between reference and test
Absolute threshold: Distance between reference and test vs. median of reference
26
Classifiers (contd.)
MAD: Median Absolute Deviation
A robust statistic
27
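MAD is simple to compute (a minimal sketch of the standard definition: the median of absolute deviations from the median). Its robustness is what makes it useful here: a single pathological node barely moves it, whereas it would inflate a mean or variance.

```scala
// Median Absolute Deviation: median(|x_i - median(x)|), a robust dispersion measure.
def median(xs: Seq[Double]): Double = {
  val s = xs.sorted
  val n = s.size
  if (n % 2 == 1) s(n / 2) else (s(n / 2 - 1) + s(n / 2)) / 2.0
}

def mad(xs: Seq[Double]): Double = {
  val m = median(xs)
  median(xs.map(x => math.abs(x - m)))
}

// An extreme outlier barely moves the MAD, while it would blow up the variance.
println(mad(Seq(1.0, 2.0, 3.0, 4.0, 5.0)))   // 1.0
println(mad(Seq(1.0, 2.0, 3.0, 4.0, 500.0))) // still 1.0
```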
Classifiers (contd.)
Ensemble of Composable Classifiers
val classifier = {
  SampleCountClassifier(40) and (
    RelativeThresholdClassifier(50, 0.1) or
    AbsoluteThresholdClassifier(50, 0.1) or
    MadClassifier
  )
}
28
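One way to model such a composable ensemble (an illustrative sketch: the Classifier trait, its combinators, and the two concrete classifiers below are hypothetical simplifications, not Diffy's actual classes): a classifier is a predicate over reference and test samples, and `and` / `or` build compound verdicts.

```scala
// Illustrative model of composable classifiers (not Diffy's actual trait):
// a classifier judges (reference, test) samples; `and` / `or` compose verdicts.
trait Classifier { self =>
  def passes(ref: Seq[Double], test: Seq[Double]): Boolean
  def and(other: Classifier): Classifier = new Classifier {
    def passes(r: Seq[Double], t: Seq[Double]): Boolean = self.passes(r, t) && other.passes(r, t)
  }
  def or(other: Classifier): Classifier = new Classifier {
    def passes(r: Seq[Double], t: Seq[Double]): Boolean = self.passes(r, t) || other.passes(r, t)
  }
}

// Pass only if both sides have enough samples to judge at all.
case class SampleCountClassifier(min: Int) extends Classifier {
  def passes(ref: Seq[Double], test: Seq[Double]): Boolean = ref.size >= min && test.size >= min
}

// Pass if the test mean stays within `frac` of the reference mean (hypothetical).
case class RelativeMeanClassifier(frac: Double) extends Classifier {
  def passes(ref: Seq[Double], test: Seq[Double]): Boolean = {
    def mean(xs: Seq[Double]) = xs.sum / xs.size
    math.abs(mean(test) - mean(ref)) <= frac * math.abs(mean(ref))
  }
}

val classifier = SampleCountClassifier(3) and RelativeMeanClassifier(0.1)
println(classifier.passes(Seq(10.0, 10.0, 10.0), Seq(10.5, 10.2, 10.1))) // true
println(classifier.passes(Seq(10.0, 10.0, 10.0), Seq(20.0, 21.0, 19.0))) // false
```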
DEMO
29
Open Source (@diffyproject)
Github
https://github.com/twitter/diffy
Blog
https://blog.twitter.com/2015/diffy-testing-services-without-writing-tests
30
31