Upload
others
View
6
Download
0
Embed Size (px)
Citation preview
Cross-Domain Fault Localization: A Case for a Graph Digest Approach
William Fischer Geoffrey Xie Joel Young
Department of Computer Science
Naval Postgraduate School
2
Network Fault Localization
Fault localization: integral part of network management/troubleshooting
Not always easy to locate a fault• Large number of devices• Stale topology databases• Human-introduced errors tough to find
Detect LocateRemedy/
TestFault
Routerlost aroute
Can’t reachweb server/ alarms in
NOC
Find the problem
Add the route back
into the router
3
Cross-Domain Fault Localization
Networks are highly connected• Some faults can affect many domains
• E.g. DNS failure, link congestion• Correlating observations across domains intuitively
increases accuracy of locating these types of faults
Domains can represent fault propagation with a graph• Common Ground
4
Challenges and Assumption
Challenges• Domain managers are reluctant to share internal information
• Topology, state, sensitive properties, etc• May even view other domains as adversaries
• Accuracy versus Privacy: competing goals• Scalability: measured by the size of the aggregated inference graph
Assumption• Domain managers will want to participate if
– Privacy preserved– Fault localization accuracy improved
5
Related Work
Intra-domain: Significant recent advances in fault localization • SHRINK [Kandula et al., 2005] and SCORE [Kompella et al., 2005]
– bipartite causal graph model• Sherlock [Bahl et al., 2007]
– Multi-level causal graph model
Cross-domain: under-researched• End-to-end approach for hierarchical organizations [Steinder et al.,
2008]– Constrained environment
6
Our Graph Digest Approach
What is a graph digest?
A reduction of a fault propagation model (eg causal graph) to a digest
representation of nodes and edges
7
Our Graph Digest Approach
Gj
G1
G21. Fault detected in jth domain, Gj constructed2. Domain j asks for digest from domains 1 and 23. Domains 1 and 2 send digests G1, G2 to domain j4. Domain j imports the digests and runs inference
( )n
i ji j
f G G¹
æ ö= ç ÷è øU U+ +Gj
A general framework for creating and using a digest that explicitly
models the inference accuracy and privacy requirements
8
Performance Criteria
Heart of the approach is quantifiable metrics for accuracy and privacy
Accuracy
Privacy: let S model adversary’s knowledge of sensitive property
| || |d u
d
B Bh
BÇ
=| |
| |d u
u
B Bc
BÇ
=
2 for 0, otherwise 0
h ch c
h ca a* *= + > =
+
2
Pr( | )(( | ), ) Pr( | ) log
Pr( )x X
S x digestKL S digest S S x digest
S xÎ
== =
=å
Bu: Best explanation using undigested graphs
Bd: Best explanation using 1 undigested graph, 1+ digests
9
Practical Privacy Metrics
KL Distance an ideal metric for a privacy criterion, but hard to quantify
In this work we look at protecting a domain against causal graph attacks
From causal graphs, can infer
• In-degree, out-degree, path lengths, reachability, etc
Sample privacy metrics:
• Reachability, number of routers, maximum node degree, diameter
10
IllustrationSHRINK
Assumptions• Bipartite model• Independent SRG failures• No more than 3 simultaneous failures
arg max Pr(< S1,…,Sn > | < L1,…,Lm >)
Adds “noisy” edges to form complete bipartite graph• d = .0001 edge strength for a noisy edge• Subtract d from any edge strength of 1.0
Returns most probable explanation for the observations
<S1,…,Sn>
Pr( | ) .0001j iL S =
Pr( | ) .9999j iL S =
11
IllustrationSHRINK
Topology
R3
R1
R2
O1 O2
R6
R4
R5
F2
F4P2
P3
P1
P4
P5
Domain 2 (Blue Inc.)Customer
Domain 1 (Red Inc.)Provider
F1
F3
Identifiers:R* IP RouterP* Point to Point linkC* Leased Optical CircuitF* Optical FiberO* Optical Switch
C2
C1
C3
12
IllustrationSHRINK
IP View
R3
R1
R2
R6
R4
R5
L2
L3
L1
L7
L8
L5
L6L4
13
IllustrationSHRINK
Provider Causal Graph(with respect to customer)
F1 O1 F2 O2 F3 F4
C1 C2 C3
14
IllustrationSHRINK
Customer Causal Graphwith observation state
R2 P2 R3 P3 C1 C2 C3 R4 P4 R5 P5 R6R1 P1
L3 L4 L5 L6 L8L7L1 L2
ü üüü ü
ü Best explanation
15
IllustrationSHRINK
Nodes R1, P1, P2, P4 pruned
R2 R3 P3 C1 C2 C3 R4 R5 P5 R6
L3 L4 L5 L6 L8L7L1 L2
16
IllustrationSHRINK
Combing all “up” observations
R2 R3 P3 C1 C2 C3 R4 R5 P5 R6
Lu L4 L5 L6
17
IllustrationSHRINK
Aggregating nodes that areindistinguishable in the causal graph
U1 R3 U2 C1 C2 C3 R4 R5
Lu L4 L5 L6
U1 = R2,R6U2 = P3,P5
18
IllustrationSHRINK
Renaming all but the “shared attribute” nodes and Lu
S1 S2 S3 C1 C2 C3 S4 S5
Lu L1 L2 L3
19
IllustrationSHRINK
Multilevel union
F1 O1 F2 O2 F3 F4
S1 S2 S3 C1 C2 C3 S4 S5
Lu L1 L2 L3
20
IllustrationSHRINK
Model Specific Bipartite Union
F1 O1 F2 O2 F3 F4S1 S2 S3 S4 S5
Lu L1 L2 L3
ü
ü Best explanation
21
R1
R3
R2
R4
R6R5
R7
R9
R8
R10
R13
R11
R12
R14
R17
R15
R18
R16
P1
P2P3P4
P5 P6P7
C1
C2
C3
P8
P9 P10
P11P12
P13P14
P15
P20
P18
P22
P19
P17P16
P21Identifiers:R* IP RouterP* Point to Point linkC* Leased Optical Circuit
Continuing WorkCustomer Network Physical Topology
22
R1
R3
R2
R4
R6R5
R7
R9
R8
R10
R13
R11
R12
R14
R17
R15
R18
R16
L1
L3L2L4
L5 L6
L7
L11
L12 L13
L14L15
L16L17
L18
L23
L21
L25
L22
L20L19
L24Identifiers:R* IP RouterL* IP Link
Continuing WorkCustomer Network IP View
L8
L10
L9
L26
L27
L28
VPN
23
Accuracy Results for All Single Failures
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
Alpha (harmonic mean)
CD
F Accuracy
Localized the fault 100% of the time
24
Privacy Results for All Single Failures
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
Ratio to the true value
CD
F
Total Routers Total Diameter Highest Degree Reachability
25
Adding Scalability Results
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
Metrics
CD
F
Total Routers Total Diameter Highest DegreeReachability Accuracy Scalability
26
Summary and Future Work
A general framework for reasoning and discussing cross-domain fault localization
Initial results demonstrating utility of the framework
Future Work:• Validation of the generality for the approach
• Impacts of observation and model errors must be determined
• Privacy protection given a series of digests from the same domain
• Digest-sharing format and strategies must be explored
Thank you!
Questions?