Overview of PhD Work

Performance Problems:
• Diagnosis: PLDI-14, SRDS-13, IWPD-16
• Detection: PACT-15, SRDS-16
• Mitigation: EuroSys-16, CGO-17, Middleware-14, ICAC-15
Performance Problem Diagnosis (PLDI-14, SRDS-13, IWPD-16)
• Diagnosis technique for large-scale parallel applications
• Compares application progress to find the root cause
• Can handle complex loop nesting structures
• Fully dynamic analysis – no source code needed
• Highly accurate
Performance Problem Detection (PACT-15, SRDS-16)
• A tool-chain for detecting performance problems
• Gracefully handles unseen inputs
• Automatically calibrates the detection threshold according to input properties
• Good accuracy and very low false-alarm rate
Performance Problem Mitigation (EuroSys-16, CGO-17, Middleware-14, ICAC-15)
• EuroSys: in the context of erasure-coded distributed storage
• CGO: in the context of error-resilient applications
Partial-Parallel-Repair (PPR): A Distributed Technique for Repairing Erasure Coded Storage
EuroSys 2016
Subrata Mitra, Rajesh Panta (AT&T), Moo-Ryong Ra (AT&T), Saurabh Bagchi
Need for storage redundancy

Data center storage is frequently affected by unavailability events:
• Unplanned unavailability: component failures, network congestion, software glitches, power failures
• Planned unavailability: software/hardware updates, infrastructure maintenance

How does storage redundancy help?
• Prevents permanent data loss (reliability)
• Keeps the data accessible to the user (availability)
Replication for storage redundancy
• Keep multiple copies of the data on different machines
• Data is divided into chunks
• Each chunk is replicated multiple times

Replication is not suitable for large amounts of data.
Erasure coded (EC) storage
• Reed-Solomon (RS) is the most popular coding method
• A stripe consists of k data chunks and m parity chunks
• Can survive up to m chunk failures

Example for 10 TB of data:

Redundancy method    | Total storage required | Reliability
Triple replication   | 30 TB                  | 2 failures
RS (k=6, m=3)        | 15 TB                  | 3 failures
RS (k=12, m=4)       | 13.33 TB               | 4 failures

• Erasure coding has much lower storage overhead while providing the same or better reliability.
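The storage figures above follow directly from the redundancy factor: n-way replication stores the full data n times, while RS (k, m) stores (k+m)/k times the data. A minimal sketch of that arithmetic (helper names are illustrative):

    def replication_storage(data_tb, copies=3):
        """Total storage for n-way replication; tolerates (copies - 1) failures."""
        return data_tb * copies, copies - 1

    def rs_storage(data_tb, k, m):
        """Total storage for RS (k, m); tolerates m chunk failures per stripe."""
        return data_tb * (k + m) / k, m

    print(replication_storage(10))     # (30, 2)
    print(rs_storage(10, 6, 3))        # (15.0, 3)
    print(rs_storage(10, 12, 4))       # (13.33..., 4)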
The repair problem in EC storage

[Diagram: servers S1-S7 holding the data and parity chunks of a (4, 2) RS code with chunk size 256 MB; after a crash, the lost chunk is re-created at a new destination server, and all k chunks must converge on that server's link.]

The network bottleneck slows down the repair process.
The repair problem in EC storage (2)

Repair time in EC is much longer than in replication.

Example for 10 TB of data:

Redundancy method    | Total storage required | Reliability | # chunks transferred during a repair
Triple replication   | 30 TB                  | 2 failures  | 1
RS (k=6, m=3)        | 15 TB                  | 3 failures  | 6
RS (k=12, m=4)       | 13.33 TB               | 4 failures  | 12

For a chunk size of 256 MB, RS (k=12, m=4) means 12 x 256 MB (24 Gbits) of data transfer over a particular link!
What triggers a repair?
• A monitoring process finds unavailable chunks (regular repairs): the chunk is re-created on a new server
• A client finds missing or corrupted chunks (degraded reads): the chunk is re-created at the client, on the critical path of the user application
Existing solutions
• Keep additional parities: needs additional storage. Huang et al. (ATC-2012), Sathiamoorthy et al. (VLDB-2013)
• Mix of replication and erasure code: higher storage overhead than EC. Xia et al. (FAST-2015), Ma et al. (INFOCOM-2013)
• Repair-friendly codes: restricted parameters. Khan et al. (FAST-2012), Xiang et al. (SIGMETRICS-2010), Hu et al. (FAST-2012), Rashmi et al. (SIGCOMM-2014)
• Delay repairs: depends on policy; immediate repair is still needed for degraded reads. Silberstein et al. (SYSTOR-2014)
Network transfer time dominates

[Figure: breakdown of total reconstruction time (%) into computation, disk read, and network transfer, for RS codes 6+3 and 12+4 with chunk sizes 8 MB to 64 MB.]

Network transfer time takes up to 94% of the total repair time.
Our solution approach

We introduce Partial Parallel Repair (PPR):
• A distributed repair technique, targeted at reducing network transfer time
• No additional storage
• No restrictions on the code
• Significantly lower repair time
Key insight: partial calculations
• The encoding and repair equations are associative
• Individual terms can be calculated in parallel
Partial Parallel Repair Technique

Traditional repair: the new destination fetches every needed chunk and computes a2C2 + a3C3 + a4C4 + a5C5 by itself, so its network link becomes the bottleneck.

Partial Parallel Repair: partial sums are computed in parallel along a tree. First, pairs of servers compute a2C2 + a3C3 and a4C4 + a5C5; the destination then combines these into a2C2 + a3C3 + a4C4 + a5C5. Note that |a2C2| = |a2C2 + a3C3|: a partial result is the same size as a chunk, so combining never inflates the amount of data any link must carry.
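Because the combination operator is associative, the tree-structured schedule produces exactly the same chunk as the traditional one. A minimal sketch, using XOR in place of the Galois-field arithmetic that real RS codes use, and assuming the scaled chunks aiCi have already been computed locally:

    from functools import reduce

    def xor(x, y):
        return bytes(a ^ b for a, b in zip(x, y))

    def traditional_repair(scaled_chunks):
        # All k terms travel to one node, which does all the combining:
        # O(k) chunk transfers over that one node's link.
        return reduce(xor, scaled_chunks)

    def ppr_repair(scaled_chunks):
        # Combine pairwise in rounds; each round halves the number of
        # partial results, so the critical path is O(log k) transfers.
        level = list(scaled_chunks)
        while len(level) > 1:
            nxt = [xor(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
            if len(level) % 2:
                nxt.append(level[-1])
            level = nxt
        return level[0]

    chunks = [bytes([i] * 8) for i in range(2, 6)]  # stand-ins for a2C2 .. a5C5
    assert traditional_repair(chunks) == ppr_repair(chunks)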
PPR communication patterns

                            | Traditional repair | PPR
Network transfer time       | O(k)               | O(log2(k+1))
Repair traffic flow         | Many to one        | More evenly distributed
Amount of transferred data  | Same               | Same
[Figure: PPR repair time relative to traditional repair time, as a function of k (2 to 20); the relative time drops as k grows.]
When is PPR most useful?

Network transfer times during repair:
• Traditional RS (k, m): (chunk size / bandwidth) * k
• PPR RS (k, m): (chunk size / bandwidth) * ceil(log2(k+1))

PPR is useful when:
• k is large
• the network is the bottleneck
• the chunk size is large
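Plugging numbers into those two formulas shows the gap; a minimal sketch (values illustrative):

    import math

    def transfer_time_s(chunk_mb, bandwidth_mbps, k, ppr=False):
        """Network transfer time on the critical path, per the formulas above."""
        hops = math.ceil(math.log2(k + 1)) if ppr else k
        return (chunk_mb * 8 / bandwidth_mbps) * hops

    # 256 MB chunks over a 1 Gb/s link, RS (k=12, m=4):
    print(transfer_time_s(256, 1000, 12))            # traditional: ~24.6 s
    print(transfer_time_s(256, 1000, 12, ppr=True))  # PPR: ~8.2 s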
Additional benefits of PPR
• Maximum data transferred to/from any node is logarithmically lower
  - Implication: less repair bandwidth reservation per node
• Computation is parallelized across multiple nodes
  - Implication: lower memory footprint per node and computation speedup
• PPR works whenever the encoding/decoding operations are associative
  - Implication: compatible with a wide range of codes, including RS, LRC, RS-Hitchhiker, Rotated-RS, etc.
Can we try to reduce the repair time a bit more?
• Disk I/O is the second dominant factor in the total repair time
• Use caching to bypass disk I/O time

[Diagram: the Repair Manager keeps a table of (chunk ID, last access time, server); after a client reads C1 from server A and C2 from server B, those chunks remain in the servers' caches, so a later repair can read them from cache instead of disk.]
Multiple simultaneous failures

[Diagram: chunk failures C1, C2, C3 are reported to the Repair Manager, which schedules each repair on a chosen set of servers.]

m-PPR: a scheduling mechanism for running multiple PPR-based repair jobs. It is a greedy approach that attempts to minimize resource contention.
Multiple simultaneous failures (2)
• A weight is calculated for each server, representing the "goodness" of the server for scheduling the next repair:

  Wsrc = a1*hasCache - a2*(#reconstructions) - a3*userLoad
  Wdst = -b1*(#repairDestinations) - b2*userLoad

• The best k servers are chosen as the source servers; similarly, the best destination server is chosen (a sketch follows below)
• All selections are subject to reliability constraints, e.g., chunks of the same stripe must be in separate failure domains/update domains
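A minimal sketch of that greedy selection, with illustrative coefficients a1..a3, b1, b2 and a simplified server record (the real scheduler also enforces the failure/update-domain constraints above):

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Server:
        name: str
        has_cache: bool       # does this server already cache the needed chunk?
        reconstructions: int  # ongoing repair jobs this server participates in
        user_load: float      # foreground (user) load
        repair_dests: int     # repairs already writing to this server

    def src_weight(s, a1=1.0, a2=0.5, a3=0.5):
        return a1 * s.has_cache - a2 * s.reconstructions - a3 * s.user_load

    def dst_weight(s, b1=0.5, b2=0.5):
        return -b1 * s.repair_dests - b2 * s.user_load

    def schedule_repair(candidates, k):
        # Greedy: pick the k best sources, then the best remaining destination.
        sources = sorted(candidates, key=src_weight, reverse=True)[:k]
        dest = max((s for s in candidates if s not in sources), key=dst_weight)
        return sources, dest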
Implementation and evaluation
• Implemented on top of the Quantcast File System (QFS), which has an architecture similar to HDFS
• The Repair Manager is implemented inside the Meta Server of QFS
• Evaluated with various coding parameters and chunk sizes
• Evaluated PPR with the Reed-Solomon code and two repair-friendly codes (LRC and Rotated-RS)
Repair time improvements
• PPR becomes more effective for higher values of k
• Larger chunk sizes also yield higher benefits

[Figure: % reduction in repair time w.r.t. traditional repair, for RS codes (6, 3), (8, 3), (10, 4), (12, 4) and chunk sizes 8 MB to 64 MB.]

Overall: 30% - 60% reduction in repair time.
Improvements for degraded reads

PPR becomes more effective under constrained network bandwidth.

[Figure: degraded read throughput (MBytes/sec) vs. available network bandwidth (200-1024 Mbits/sec), for PPR and traditional repair with 6+3 and 12+4 codes.]
Compatibility with existing codes
• PPR on top of LRC (Huang et al., ATC-2012) provides 19% additional savings
• PPR on top of Rotated Reed-Solomon (Khan et al., FAST-2012) provides 35% additional savings

[Figure: repair time (sec) for RS, RS + PPR, LRC, LRC + PPR, Rotated RS, and Rotated RS + PPR.]
Improvements from m-PPR

[Figure: total repair time (sec) vs. number of simultaneous failures (30 to 150), for traditional RS repair and PPR.]

• m-PPR can reduce repair time by 31%-47%
• Its effectiveness decreases at higher numbers of simultaneous failures, because the overall network transfers are then already more evenly distributed
Summary
• Partial Parallel Repair (PPR): a technique that distributes the repair task over multiple nodes and exploits concurrency
• PPR can reduce the total repair time by up to 60%
• Theoretically, the network transfer time is reduced by a factor of log(k)/k
• PPR is more attractive for higher k and larger chunk sizes
• PPR is compatible with any erasure code whose operations are associative
Phase-Aware Optimization in Approximate Computing
CGO 2017
Subrata Mitra, Manish K. Gupta, Sasa Misailovic (UIUC), Saurabh Bagchi
We can do much better

Many application domains can tolerate some imprecision: computer vision, data analytics, media applications, image processing, machine learning, and scientific simulations.
Output quality degradation in Sobel (0%, 5%, and 10% quality loss)

A 10% quality loss is nearly indiscernible to the eye, and yet provides 57% energy savings (Rahimi et al., DATE-2015).
Approximate computing: trade accuracy for energy savings or computation speedup

[Figure: LULESH, a hydrodynamic simulation; accepting a 10% accuracy loss (90% accuracy) yields roughly a 10x speedup and energy reduction.]

Adjust knobs to control the approximation levels of the computation.
Approximate computing: various prior approaches
• Software: Sage (MICRO-2013), Capri (ASPLOS-2016), Dynamic Knobs (ASPLOS-2011)
• Hardware: Esmaeilzadeh (ASPLOS-2012), Chippa (DAC-2010), Raha (CASES-2014)
• Compilers / PL: Ansel (CGO-2011), PetaBricks (PLDI-2009), Misailovic (OOPSLA-2014), EnerJ (PLDI-2011)
• Input sensitivity: Ansel (PLDI-2015), Ding (PLDI-2015), Laurenzano (PLDI-2016)
Assumption of a monolithic execution
• The general approach has been to use a single approximation configuration throughout the entire execution, trading output quality for speedup uniformly across the whole application run
Application with tunable approximation levels

Loop perforation (skip iterations):

    for (i = 0; i < n; i = i + approx_level) { result = computeresult(); }

Loop truncation (stop early):

    for (i = 0; i < (n - approx_level); i++) { result = computeresult(); }

Loop memoization (recompute only every approx_level-th iteration, otherwise reuse the cached result):

    for (i = 0; i < n; i++) {
        if (i % approx_level == 0) cached_result = result = computeresult();
        else result = cached_result;
    }

Parameter tuning: tune algorithmic controls exposed by the application.
Modeling to capture phase behavior
• Collect training data for different phase-specific approximation settings.
• Build phase-specific speedup and QoS-degradation models using polynomial regression.
• The approximation knobs of the different approximation blocks are the regression inputs; the final speedup or QoS degradation is the output.

Example: two approximation blocks with knobs a1 and a2; a degree-2 polynomial model for speedup:

S = c0 + c1*a1 + c2*a2 + c3*a1^2 + c4*a2^2 + c5*a1*a2
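A minimal sketch of fitting such a model with scikit-learn; the knob settings and measured speedups here are illustrative stand-ins for real training runs:

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures

    # Each row: knob settings (a1, a2) of the two approximation blocks.
    knobs = np.array([[1, 1], [1, 2], [2, 1], [2, 2], [3, 1], [3, 3]])
    speedup = np.array([1.0, 1.3, 1.4, 1.9, 1.8, 2.7])  # measured speedups

    # Degree-2 polynomial regression:
    # S = c0 + c1*a1 + c2*a2 + c3*a1^2 + c4*a2^2 + c5*a1*a2
    model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
    model.fit(knobs, speedup)

    print(model.predict(np.array([[2, 3]])))  # speedup at an unseen setting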
Control-flow-path-specific models

The application's speedup / QoS-degradation characteristics may change when the control-flow path changes. Example: changing the filter ordering in FFmpeg.
• Use decision trees to predict input-parameter-dependent control-flow paths
• Build speedup and QoS models per unique control-flow path
Finding phase-specific optimization settings
• For a user-provided QoS-degradation budget, find the best phase-specific optimization settings (a sketch of the budget division follows below).
• First, divide the application into phases and obtain their speedup and QoS characteristics.
• Divide the error budget among the phases in proportion to their "return on investment" (mean speedup over mean error).
• Solve a polynomial optimization problem for each phase, with its sub-budget as the constraint, to find the best approximation settings for that phase.
• Redistribute any unused budget to the remaining phases.
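A minimal sketch of the budget-division step, with hypothetical Phase records carrying pre-profiled mean speedup and mean error (the per-phase polynomial optimization is elided):

    from dataclasses import dataclass

    @dataclass
    class Phase:
        name: str
        mean_speedup: float  # from the phase-specific speedup model
        mean_error: float    # from the phase-specific QoS model

    def divide_budget(phases, total_error_budget):
        # Return on investment: speedup gained per unit of error spent.
        roi = {p.name: p.mean_speedup / p.mean_error for p in phases}
        total = sum(roi.values())
        return {name: total_error_budget * r / total for name, r in roi.items()}

    phases = [Phase("p1", 2.0, 0.05), Phase("p2", 1.2, 0.02)]
    print(divide_budget(phases, total_error_budget=0.10))
    # {'p1': 0.04, 'p2': 0.06}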
Speedup obtained by Opprox

Phase-specific approximation is most attractive when operating under a small error budget.

[Figure: speedup under high, medium, and small error budgets.]
Summary
• Beyond "where" and "how much" to approximate, we show that controlling "when" to approximate can boost application performance and fine-tune the expected outcome.
• The main computation typically sits inside a large outer loop, which can be divided into "phases" to achieve fine-grained control over when to approximate.
• We present Opprox, a technique to characterize, model, and optimize the gains from such phase-specific approximation.
• Opprox is particularly useful compared to traditional methods when operating under a low error budget.
List of all papers during PhD

Full Papers:
1. "Phase-Aware Optimization in Approximate Computing" by S. Mitra, M. K. Gupta, S. Misailovic, S. Bagchi, in CGO, 2017
2. "Partial-parallel-repair (PPR): a distributed technique for repairing erasure coded storage" by S. Mitra, R. Panta, M.-R. Ra, S. Bagchi, in EuroSys, 2016
3. "Sirius: Neural network based probabilistic assertions for detecting silent data corruption in parallel programs" by T. Thomas, A. J. Bhattad, S. Mitra, S. Bagchi, in SRDS, 2016
4. "A Study of Failures in Community Clusters: The Case of Conte" by S. Mitra*, S. Javagal*, A. K. Maji, T. Gamblin, A. Moody, S. Harrell, S. Bagchi, in IWPD@ISSRE, 2016
5. "Dealing with the Unknown: Resilience to Prediction Errors" by S. Mitra, G. Bronevetsky, S. Javagal, S. Bagchi, in PACT, 2015
6. "VIDalizer: An Energy Efficient Video Streamer" by A. Raha*, S. Mitra*, V. Raghunathan, S. Rao, in IEEE WCNC, 2015
7. "ICE: An Integrated Configuration Engine for Interference Mitigation in Cloud Services" by A. Maji, S. Mitra, S. Bagchi, in ICAC, 2015
8. "Accurate application progress analysis for large-scale parallel debugging" by S. Mitra, I. Laguna, D. H. Ahn, S. Bagchi, M. Schulz, T. Gamblin, in PLDI, 2014
9. "Mitigating Interference in Cloud Services by Middleware Reconfiguration" by A. Maji, S. Mitra, B. Zhou, S. Bagchi, A. Verma, in Middleware, 2014
10. "Automatic Problem Localization via Multi-dimensional Metric Profiling" by I. Laguna, S. Mitra, F. Arshad, N. Theera-Ampornpunt, Z. Zhu, S. Bagchi, S. P. Midkiff, M. Kistler, A. Gheith, in SRDS, 2013

Posters / Fast Abstracts:
11. "Scalable Parallel Debugging via Loop-Aware Progress Dependence Analysis" in SC, 2013
12. "Cluster Workload Analytics Revisited" in DSN, 2016
A big thanks to all the collaborators!
Sasa (UIUC); Greg (Google); Todd, Martin, Ignacio, Dong (LLNL); Rajesh, Moo-Ryong (AT&T); Suhas, Amiya (Purdue); and many more...
Diagnosis of performance problems at massive scale

"Accurate application progress analysis for large-scale parallel debugging"
By: S. Mitra, I. Laguna, D. H. Ahn, S. Bagchi, M. Schulz, T. Gamblin
In: Programming Language Design and Implementation (PLDI), 2014
Debugging large-scale parallel programs is challenging
• Applications run with hundreds of thousands of processes; inspecting the state of a massive number of threads/processes overwhelms developers.
• Serial debugging techniques don't work: they do not capture the communication dependencies between processes.
• Most debugging techniques are manual.

We need to design more automatic and scalable debugging tools.

(http://www.wired.com/2013/01/million-core-supercomputer/)
An error in a process propagates quickly to all processes
• MPI is widely used in large-scale HPC applications; processes communicate among themselves to compute the solution of a problem.
• MPI processes are tightly coupled: a process needs to receive data from another process to make progress.

Error propagation example:

    // computation code
    for (...) MPI_Send()
    // computation code
    for (...) MPI_Recv()
    // computation code
    MPI_Reduce()
    // computation code
    MPI_Barrier()

If an error strikes one process early in this sequence, some processes end up waiting at the receive while all the others wait at the barrier: hangs and slow execution are common bug manifestations.
Finding the least-progressed (LP) task often helps to identify the root cause of bugs

Tools to analyze the progress of tasks:
• STAT [SC, 2009]: a static analysis technique based on the temporal ordering of tasks; it identifies loop order variables (LOVs), but not all LOVs can be identified.
• AutomaDeD [PACT, 2012]: a probabilistic technique that captures control flow via a Markov model, but cannot infer progress dependencies within loops.

[Figure: progress-dependence graph among tasks A-D, identifying the least-progressed task.]
PRODOMETER: a loop-aware progress-dependence analysis tool
• Purely based on dynamic analysis of the application's execution control flow
• Control flow is summarized as a Markov model (same as AutomaDeD)
• Overcomes the limitations of AutomaDeD by resolving progress dependencies within loops
Each MPI task is modeled as a Markov model

Sample code:

    foo() {
        MPI_Gather()
        // Computation code
        for (...) {
            // Computation code
            MPI_Send()
            // Computation code
            MPI_Recv()
            // Computation code
        }
    }

[Figure: Markov model with states for MPI_Gather, MPI_Send, and MPI_Recv, connected by transition probabilities such as 1.0, 0.7, 0.3, and 0.75.]

MPI call wrappers gather the call stack and create states in the model. Nodes represent the execution state before and after MPI calls.
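A minimal sketch of how such a model can be estimated from an observed sequence of states (illustrative only; the real tool intercepts MPI calls via wrappers and keys states by call stack):

    from collections import defaultdict

    def build_markov_model(state_sequence):
        """Return per-state transition probabilities and raw edge counts."""
        counts = defaultdict(lambda: defaultdict(int))
        for src, dst in zip(state_sequence, state_sequence[1:]):
            counts[src][dst] += 1
        probs = {
            src: {dst: c / sum(dsts.values()) for dst, c in dsts.items()}
            for src, dsts in counts.items()
        }
        return probs, counts

    # States recorded by the wrappers as one task executes:
    trace = ["MPI_Gather", "MPI_Send", "MPI_Recv", "MPI_Send", "MPI_Recv"]
    probs, counts = build_markov_model(trace)
    print(probs["MPI_Send"])  # {'MPI_Recv': 1.0}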
Workflow of PRODOMETER
• The progress of the application is monitored by a helper thread.
• When a hang (or slow code region) is detected, it freezes the Markov model and starts the analysis phase.
• Different tasks/processes wait at different nodes of the Markov model.
Probabilistic inference of the Progress-Dependence Graph

[Figure: sample Markov model with nodes 1-10 and transition probabilities; tasks A-E wait at different nodes, e.g., task B at node 3 and task C at node 5.]

Progress dependence between tasks B and C?
• Probability(3 -> 5) = 1.0 and Probability(5 -> 3) = 0
• A task at node 3 always reaches node 5, so C has progressed further than B, and task C is likely waiting for task B.
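A minimal sketch of computing such reachability probabilities from the model's transition probabilities (an iterative fixed point; illustrative only):

    def reach_probability(probs, src, dst, iters=200):
        """Probability that a walk starting at src ever reaches dst."""
        states = set(probs) | {d for dsts in probs.values() for d in dsts}
        r = {s: 0.0 for s in states}
        r[dst] = 1.0
        for _ in range(iters):
            for s in states:
                if s != dst:
                    r[s] = sum(p * r[d] for d, p in probs.get(s, {}).items())
        return r[src]

    # Toy model: every path from node 3 reaches node 5; no path leads back.
    model = {3: {4: 1.0}, 4: {5: 1.0}, 5: {6: 0.9, 7: 0.1}}
    print(reach_probability(model, 3, 5))  # 1.0
    print(reach_probability(model, 5, 3))  # 0.0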
Progress dependencies cannot be identified within loops

Dependence between tasks C and E?
• Probability(7 -> 5) = 1.0 and Probability(5 -> 7) = 0.9
• Both directions have high probability, so within the loop it is ambiguous which task has made more progress.

[Figure: the same sample Markov model; nodes 5 and 7 lie on a loop, with tasks C and E waiting at them.]
Infer loop iteration counts from edge transition counts
• In each per-task Markov model, an edge carries a transition probability (e.g., P = 0.5) and a raw edge-transition count (e.g., C = 321); the transition counts are added on the edges.
• The Markov models from all tasks are merged; merging may create new edges (e.g., task 1 contributes A-B-C, task 2 contributes A-B-D, and the merged model contains both paths).
• We use a binomial-tree reduction for scalable model merging. Complexity: O(log #models).
• The PDG analysis is then done in a single task.
Loop characteristic edge to identify loop iterations

[Figure: model of one task with nodes A-F and transition counts on the edges; it contains two loops: A-B-C-D-E-F-A and B-C-D-B.]

• Transition counts may belong to multiple nested loops: a count is the sum over several loops. How do we identify the number of iterations per loop?
• Characteristic edge: an edge that is not part of any other loop. We use the backedge as the characteristic edge of a loop.
• Assumption: loops are reducible (e.g., the code does not use "goto" statements).
Lexicographic comparison of nested loops

Example: outer loop L1 (A-B-C-D-E-F-A) with inner loop L2 (B-C-D-B):
• In L2 (inner): iterations for task X = 200, for task Y = 100
• In L1 (outer): iterations for task X = 50, for task Y = 60

Lexicographical order: compare in order from the outer to the inner loop. Task X has made more progress on the inner loop (L2), but task Y has made more progress on the outer loop (L1), so task Y has made more progress overall.
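A minimal sketch of that comparison; iteration counts are listed from the outermost to the innermost loop, and Python's tuple comparison is already lexicographic:

    def more_progressed(counts_x, counts_y):
        """Each argument: per-loop iteration counts, outermost loop first."""
        if counts_x == counts_y:
            return "equal progress"
        return "X" if counts_x > counts_y else "Y"

    # From the example above: (L1, L2) iteration counts per task.
    print(more_progressed((50, 200), (60, 100)))  # Y (further along the outer loop)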
Fault injection in six HPC benchmarks
• HPC benchmarks: AMG, LAMMPS, IRS, LULESH, BT, SP
• Faults injected in a random MPI process, at a random function call
• We only inject inside loops: HPC applications spend most of their time (>90%) inside loops
• Experimental runs use 128, 256, and 512 MPI processes
Metrics to compare the performance of the tools
• Accuracy: the fraction of cases in which a tool correctly identifies the least-progressed (LP) tasks
• Precision: the fraction of the identified LP tasks that are actually where the fault was injected
• Evaluated tools: (a) PRODOMETER, (b) AutomaDeD

Examples (tasks 1-4, hang injected at task 2):
• Case 1: the tool reports LP task (2): accurate, precision 100%
• Case 2: the tool reports LP tasks (2, 3): accurate, precision 50%
Accuracy and precision results
• Accuracy of PRODOMETER is on average 93%, versus 64% for AutomaDeD
• Precision is always higher for PRODOMETER than for AutomaDeD

[Figure: accuracy and precision of PRODOMETER (PR) vs. AutomaDeD (AU) across the benchmarks.]
Scalability and slowdown results
• It takes only a few seconds for PRODOMETER to perform the analysis with thousands of tasks
• Application slowdown is between 1.3x and 2.4x

[Figure: analysis time and aggregation time (seconds) vs. number of processes (512 to 16384), for AMG and LULESH.]
Input-aware performance anomaly detection

"Dealing with the Unknown: Resilience to Prediction Errors"
By: S. Mitra, G. Bronevetsky, S. Javagal, S. Bagchi
In: PACT, 2015
Complex software has too many factors influencing its performance
• Factors = configuration parameters, execution environment, input data
• It is not possible to test all the combinations
• When performance degrades, it is hard to tell whether the slowdown was expected or something went wrong
A performance bug hiding behind an untested value of a command-line parameter

doMoreCalculations is an expensive routine; invoking it 10,000 times has a serious performance impact.
Statistical models are often created to predict the performance of an application. But...
• Many configuration parameters, each taking a range of values: it is almost impossible to cover all combinations
• Performance changes with the size of the input
• Performance changes with the characteristics of the input (e.g., density of a graph, sparsity of a matrix)
• The model is often created with limited training runs, while the parameters/inputs used in production are drastically different

Errors in performance prediction models must be characterized.
Systematic characterization of prediction errors is useful in many scenarios
• A scheduler might consider it while predicting execution time and resource usage
• In approximate computing, such error characteristics might guide the decision to replace actual code regions with prediction models, for speedup and reduced energy consumption
• An anomaly detection tool may use it when distinguishing normal behavior from anomalous behavior, to reduce false alarms
Tool chain: Sight - Sculptor - E-Analyzer - Guardian
• We propose a technique for characterizing prediction errors in performance models
• We built a tool chain for input-aware anomaly detection in production
SIGHT: the instrumentation and data collection framework
• Simple APIs for annotating the code regions to model
• Tracks the input features and observation features of each module
• Input features = configuration parameters, properties of the data, size, number of threads/processes, etc. Examples: sparsity of the matrix in SpMV, bit rate in FFmpeg, number of threads in LINPACK
• Observation features = execution time, total number of instructions, load instructions, cache miss rate, residue value, etc.
SCULPTOR: the modeling tool
• Creates models for code regions identified by the developers
• Chooses the most useful input and observation features using a maximal information coefficient (MIC) based analysis ("Detecting Novel Associations in Large Data Sets", Reshef et al., Science, Dec 2011); MIC identifies whether any relationship exists between two variables, and works even for non-linear relationships
• Models using polynomial regression up to degree 3
E-ANALYZER: the error characterization tool
• Prediction models are not perfect; their errors are characterized as interpolation error (within the trained region) and extrapolation error (beyond it)
How to calculate the distance?

Example: trained with two input features, f1 ranging over 0-1000 and f2 ranging over 0-10.
• How far away is the production point (f1=2000, f2=11)?
• Which is closer to the training region: (f1=2000, f2=11) or (f1=1100, f2=20)?

Practical limitation: individual input features cannot simply be normalized, because their valid ranges are not known (no a priori bound, e.g., on size).
Our proposed solution:
• Consider one input-feature dimension at a time and create an error profile
• Combine these individual profiles (at production time) to estimate the overall error
• Distance along one input-feature dimension = how many standard deviations the production point lies from the mean of the training set
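A minimal sketch of that per-dimension distance (values illustrative):

    import statistics

    def feature_distance(training_values, production_value):
        """Distance in standard deviations from the training-set mean."""
        mu = statistics.mean(training_values)
        sigma = statistics.stdev(training_values)
        return abs(production_value - mu) / sigma

    f1_train = [100, 300, 500, 700, 900]     # f1 values seen during training
    print(feature_distance(f1_train, 2000))  # distance of production f1 = 2000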
Input features: f1, f2. Observation feature: T1.
• Sort the data w.r.t. f1, train with the first few points, predict for the rest: this gives an error profile (% error vs. distance Δf1) w.r.t. f1
• Sort the data w.r.t. f2, train with the first few, predict for the rest: an error profile w.r.t. f2
• Curve-fit the percentage of error with respect to distance
• To estimate the overall error at a production point (f1 = X1, f2 = X2), combine the projected errors from the feature-specific error profiles
How do we combine these individual error components?
• Errors might be correlated, so a simple RMS formula would not work
• We calculate the overall error as:

  E = sqrt( (E_f1)^2 + ((1 - m_f1f2) * E_f2)^2 + ... )

  where E_f1, E_f2 are the extrapolation errors due to f1 and f2 at the production point, and m_f1f2 is the MIC (correlation) of the extrapolation errors due to f1 and f2
• When the errors are uncorrelated (m_f1f2 = 0.0), this reduces to the RMS error
• When the errors are fully correlated (m_f1f2 = 1.0), it reduces to the error coming from any one component
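A minimal sketch of the two-feature case (m_f1f2 would come from the MIC analysis; the numbers are illustrative):

    import math

    def combined_error(e_f1, e_f2, m_f1f2):
        """Overall extrapolation error from two per-feature error estimates."""
        return math.sqrt(e_f1**2 + ((1 - m_f1f2) * e_f2)**2)

    print(combined_error(0.3, 0.4, 0.0))  # uncorrelated: RMS = 0.5
    print(combined_error(0.3, 0.4, 1.0))  # fully correlated: 0.3, one component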
GUARDIAN: the anomaly detection tool
• During production runs, it calculates a probability of anomaly for each observation feature in a code region
• How do extrapolation and interpolation error fit together? The interpolation error distribution is applied on top of the extrapolation error characteristics (see GUARDIAN B in the experiments below)
Experiment: false positive rates (lower is better)

GUARDIAN A: only uses the extrapolation error characteristics
GUARDIAN B: full functionality, interpolation error distribution on top of extrapolation error

[Figure: % of time a false alarm is raised, for GUARDIAN A and GUARDIAN B.]
Experiment: detection accuracy (higher is better)

Fault: an injected extra computation loop. Another experiment injected extra code that allocates an array and reads from it randomly and multiple times; the results are similar.

GUARDIAN A: only uses the extrapolation error characteristics
GUARDIAN B: full functionality, interpolation error distribution on top of extrapolation error

[Figure: % of time the injected fault is detected.]