ABRA: APPROXIMATING BETWEENNESS CENTRALITY IN STATIC AND DYNAMIC GRAPHS WITH RADEMACHER AVERAGES
Matteo Riondato and Eli Upfal, 22nd ACM SIGKDD Conference, August 2016
Murata Lab - Paper reading seminar
Presented by: Kaushalya Madhawa (25th November 2016)
OUTLINE
1. INTRODUCTION
2. RANDOM SAMPLING FOR APPROXIMATIONS
3. STATISTICAL LEARNING THEORY
‣ representativeness of a sample
‣ Rademacher averages
4. EXPERIMENTS AND RESULTS
BETWEENNESS CENTRALITY (BC)
▸ unweighted graph G = (V, E)
▸ n = |V|, m = |E|
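$b(w) = \frac{1}{|V|(|V|-1)} \sum_{(u,v) \in V \times V} \frac{\sigma_{uv}(w)}{\sigma_{uv}}$
▸ $\sigma_{uv}$: number of shortest paths from u to v
▸ $\sigma_{uv}(w)$: number of shortest paths from u to v passing through w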
▸ fastest exact betweenness calculation algorithm runs in O(nm) [Brandes 2001]
▸ requires O(n+m) space
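The definition of b(w) can be checked on small graphs with a brute-force sketch. This is not Brandes' O(nm) algorithm: it runs one BFS per vertex and then loops over all triples, and it assumes w counts only when it is an interior vertex of a shortest path (w ≠ u, v). The function names and the adjacency-dict representation are illustrative choices, not taken from the paper.

```python
from collections import deque


def shortest_path_counts(adj, source):
    """BFS from source; sigma[v] is the number of shortest source-v paths."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                sigma[y] = 0
                queue.append(y)
            if dist[y] == dist[x] + 1:
                sigma[y] += sigma[x]
    return dist, sigma


def betweenness(adj):
    """Normalized betweenness b(w) for every vertex (needs at least 2 vertices)."""
    nodes = list(adj)
    n = len(nodes)
    info = {u: shortest_path_counts(adj, u) for u in nodes}
    b = {w: 0.0 for w in nodes}
    for u in nodes:
        dist_u, sigma_u = info[u]
        for v in nodes:
            if v == u or v not in dist_u:
                continue
            for w in nodes:
                if w == u or w == v or w not in dist_u:
                    continue
                dist_w, sigma_w = info[w]
                # w is on a shortest u-v path iff d(u,w) + d(w,v) = d(u,v)
                if v in dist_w and dist_u[w] + dist_w[v] == dist_u[v]:
                    b[w] += sigma_u[w] * sigma_w[v] / sigma_u[v]
    return {w: b[w] / (n * (n - 1)) for w in nodes}


path = {0: [1], 1: [0, 2], 2: [1]}
print(betweenness(path))  # the middle vertex gets 2 / (3 * 2) = 1/3
```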
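APPROXIMATE BC FOR LARGE NETWORKS
▸ exact computation is too expensive on large networks, so approximation methods estimate betweenness centrality from a random sample, with an accuracy controlled by ε and δ
▸ problem definition
▸ given ε, δ ∈ (0, 1), an (ε, δ)-approximation to $B = \{b(w), w \in V\}$ is a collection $\tilde{B} = \{\tilde{b}(w), w \in V\}$ such that
$\Pr\left( \forall w \in V : |\tilde{b}(w) - b(w)| \le \varepsilon \right) \ge 1 - \delta$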
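As a rough illustration of sampling-based estimation, the sketch below draws m ordered pairs (u, v) with u ≠ v uniformly at random and averages, for every w, the fraction of shortest u-v paths through w. It is a naive estimator written for readability, not ABRA itself: the function names, the BFS cache, and the fixed sample size m are assumptions made for this example.

```python
import random
from collections import deque


def bfs_counts(adj, source):
    """BFS from source; sigma[v] is the number of shortest source-v paths."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                sigma[y] = 0
                queue.append(y)
            if dist[y] == dist[x] + 1:
                sigma[y] += sigma[x]
    return dist, sigma


def estimate_bc(adj, m, seed=0):
    """Estimate b(w) for all w from m uniformly sampled ordered pairs."""
    rng = random.Random(seed)
    nodes = list(adj)
    cache = {}

    def info(x):
        if x not in cache:
            cache[x] = bfs_counts(adj, x)
        return cache[x]

    est = {w: 0.0 for w in nodes}
    for _ in range(m):
        u, v = rng.sample(nodes, 2)  # two distinct vertices in random order
        dist_u, sigma_u = info(u)
        if v not in dist_u:
            continue  # no u-v path: this pair contributes 0 to every w
        for w in nodes:
            if w == u or w == v or w not in dist_u:
                continue
            dist_w, sigma_w = info(w)
            if v in dist_w and dist_u[w] + dist_w[v] == dist_u[v]:
                est[w] += sigma_u[w] * sigma_w[v] / sigma_u[v] / m
    return est


# Star graph: center 0, leaves 1..4. Exact b(0) = 12 / 20 = 0.6.
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
print(estimate_bc(star, 2000, seed=1)[0])
```

Each sampled pair contributes a value in [0, 1] whose expectation is b(w), so the average concentrates around b(w) as m grows; how large m must be is exactly the question the rest of the talk addresses.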
CONTRIBUTIONS OF THIS PAPER
▸ progressive-sampling-based BC approximation within an additive error ε
▸ first algorithm to approximate BC without depending on any global property of the graph
▸ e.g., the RK algorithm [Riondato and Kornaropoulos 2016] depends on the vertex diameter of the graph
RANDOM SAMPLING TO APPROXIMATE BETWEENNESS
PROGRESSIVE SAMPLING
▸ What is a good stopping condition?
▸ guarantees that the computed approximation fulfills the desired quality properties
▸ can be evaluated efficiently
▸ is tight (satisfied at small sample sizes)
▸ Determining sampling schedule
▸ minimize the number of iterations that are needed before the stopping condition is satisfied
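The interplay between stopping condition and sampling schedule can be sketched as a generic loop. Everything here is a toy under stated assumptions: `progressive_sampling`, `hoeffding_stop`, and the doubling schedule are hypothetical names and choices for illustration, and the stopping rule is a plain Hoeffding bound that depends only on the sample size, whereas ABRA's condition is data-dependent.

```python
import math
import random


def progressive_sampling(draw, stop, next_size, initial_size, max_size):
    """Grow a sample until stop(sample) holds or max_size is reached."""
    sample = []
    size = initial_size
    while True:
        while len(sample) < size:
            sample.append(draw())
        if stop(sample) or size >= max_size:
            return sample
        # Always make progress, but never exceed the cap.
        size = min(max_size, max(size + 1, next_size(sample)))


def hoeffding_stop(eps, delta):
    """Toy stopping rule for [0, 1]-valued draws: sqrt(ln(2/delta) / (2m)) <= eps."""
    def stop(sample):
        return math.sqrt(math.log(2 / delta) / (2 * len(sample))) <= eps
    return stop


rng = random.Random(0)
sample = progressive_sampling(
    draw=lambda: rng.random() < 0.3,      # biased coin, true mean 0.3
    stop=hoeffding_stop(0.1, 0.1),
    next_size=lambda s: 2 * len(s),       # geometric (doubling) schedule
    initial_size=10,
    max_size=10_000,
)
print(len(sample), sum(sample) / len(sample))
```

With ε = δ = 0.1 the toy condition first holds at m ≥ 150, so the doubling schedule 10, 20, 40, 80, 160 stops after five iterations with 160 draws. A schedule that jumped straight to 150 would need one iteration, which is the motivation for choosing the schedule carefully.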
RECAP OF STATISTICAL LEARNING THEORY
▸ A training set S is called ε-representative (w.r.t. domain Z, hypothesis class H, loss function l, and distribution D) if
$\sup_{h \in H} |L_D(h) - L_S(h)| \le \varepsilon$
▸ given f ∈ F, the true error and the empirical error on S = (z_1, ..., z_m) are
$L_D(f) = \mathbb{E}_{z \sim D}[f(z)], \qquad L_S(f) = \frac{1}{m} \sum_{i=1}^{m} f(z_i)$
▸ representativeness of sample S with respect to F is defined as the largest gap between the true error of a function f and its empirical error
$\mathrm{Rep}_D(F, S) = \sup_{f \in F} \left( L_D(f) - L_S(f) \right)$
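REPRESENTATIVENESS OF A SAMPLE
▸ how to estimate the representativeness of S using a single sample?
▸ split S into two disjoint halves, $S = S_1 \cup S_2$, and use the gap between their empirical errors
$\sup_{f \in F} \left( L_{S_1}(f) - L_{S_2}(f) \right) = \frac{2}{m} \sup_{f \in F} \sum_{i=1}^{m} \sigma_i f(z_i)$
▸ $\sigma = (\sigma_1, \ldots, \sigma_m) \in \{\pm 1\}^m$, with $\sigma_i = +1$ if $z_i \in S_1$ and $\sigma_i = -1$ if $z_i \in S_2$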
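RADEMACHER AVERAGE
‣ the Rademacher complexity measure captures this idea by taking the expectation of the quantity above over a random choice of σ
$F \circ S = \{ (f(z_1), \ldots, f(z_m)) : f \in F \}$
$R(F \circ S) = \frac{1}{m} \mathbb{E}_{\sigma \sim \{\pm 1\}^m} \left[ \sup_{f \in F} \sum_{i=1}^{m} \sigma_i f(z_i) \right]$
‣ the $\sigma_i$ are i.i.d. with $P[\sigma_i = 1] = P[\sigma_i = -1] = 0.5$
‣ with probability at least 1 − δ, for all f ∈ F with |f(z)| ≤ c,
$L_D(f) - L_S(f) \le 2\, \mathbb{E}_{S' \sim D^m} R(F \circ S') + c \sqrt{\frac{2 \ln(2/\delta)}{m}}$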
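For a tiny sample, the Rademacher average of F∘S can be computed exactly by enumerating all 2^m sign vectors. The sketch below does that for a finite set of vectors (one vector per function f); the function name is illustrative, and the enumeration is only feasible for small m.

```python
from itertools import product


def rademacher_average(vectors):
    """Exact R(F o S) = (1/m) E_sigma[ max_v sum_i sigma_i * v_i ] by enumeration."""
    m = len(vectors[0])
    total = 0.0
    for sigma in product((-1, 1), repeat=m):
        total += max(sum(s * x for s, x in zip(sigma, v)) for v in vectors)
    return total / (2 ** m) / m


print(rademacher_average([(1, 1)]))            # 0.0: a single function cannot fit the signs
print(rademacher_average([(1, 0), (0, 1)]))    # 0.25
print(rademacher_average([(1, 1), (-1, -1)]))  # 0.5
```

The richer the set of vectors, the better some vector correlates with a random σ, and the larger the average becomes.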
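BACK TO BC
‣ for each node w, $f_w(u, v) = \sigma_{uv}(w) / \sigma_{uv}$ is the fraction of shortest paths from u to v going through w
‣ with $D = \{(u, v) \in V \times V, u \ne v\}$ sampled uniformly, so that $|D| = |V|(|V| - 1)$, the true error of $f_w$ is the betweenness of w
$L_D(f_w) = \frac{1}{|D|} \sum_{(u,v) \in V \times V,\, u \ne v} \frac{\sigma_{uv}(w)}{\sigma_{uv}} = b(w)$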
RADEMACHER AVERAGE: HOW TO CALCULATE?
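▸ exact calculation is not straightforward and can be time-consuming
$R(F \circ S) = \frac{1}{m} \mathbb{E}_{\sigma \sim \{\pm 1\}^m} \left[ \sup_{f \in F} \sum_{i=1}^{m} \sigma_i f(z_i) \right]$
▸ an upper bound to the Rademacher average is used in place of $R(F \circ S)$
$R(F \circ S) \le \min_{s \in \mathbb{R}^+} \omega(s)$
$\omega(s) = \frac{1}{s} \ln \sum_{\mathbf{v} \in \mathcal{V}_S} \exp\left( \frac{s^2 \|\mathbf{v}\|^2}{2 m^2} \right)$
$\mathbf{v}_w = (f_w(u_1, v_1), \ldots, f_w(u_m, v_m))$
$\mathcal{V}_S = \{ \mathbf{v}_w, w \in V \} \quad (|\mathcal{V}_S| \le |V|)$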
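A small numerical sketch of the bound R(F∘S) ≤ min_s ω(s): it evaluates ω(s) with a log-sum-exp for stability, approximates the minimum by a log-spaced grid search (an assumption made here for simplicity; any s > 0 gives a valid upper bound, so a coarse minimizer only loosens the bound), and compares against the exact average obtained by enumeration. Function names are illustrative.

```python
import math
from itertools import product


def omega(s, vectors):
    """omega(s) = (1/s) * ln sum_v exp(s^2 * ||v||^2 / (2 m^2)), via log-sum-exp."""
    m = len(vectors[0])
    exps = [s * s * sum(x * x for x in v) / (2 * m * m) for v in vectors]
    top = max(exps)
    return (top + math.log(sum(math.exp(e - top) for e in exps))) / s


def omega_star(vectors, grid=400):
    """Approximate min over s > 0 on a log-spaced grid in [1e-3, 1e5]."""
    distinct = list(set(map(tuple, vectors)))  # V_S is a set: drop duplicate vectors
    candidates = (10 ** (-3 + 8 * k / (grid - 1)) for k in range(grid))
    return min(omega(s, distinct) for s in candidates)


def rademacher_exact(vectors):
    """Exact (1/m) E_sigma[ max_v <sigma, v> ] by enumerating all sign vectors."""
    m = len(vectors[0])
    total = sum(
        max(sum(s * x for s, x in zip(sigma, v)) for v in vectors)
        for sigma in product((-1, 1), repeat=m)
    )
    return total / (2 ** m) / m


vs = [(1.0, 0.0, 0.5, 0.0), (0.0, 1.0, 0.0, 0.0), (0.5, 0.5, 0.0, 1.0)]
print(rademacher_exact(vs), omega_star(vs))  # exact value, then its upper bound
```

When all vectors have the same norm ‖v‖ and there are N of them, the minimum has the closed form ‖v‖·sqrt(2 ln N)/m, which is a convenient sanity check for the grid search.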
STOPPING CONDITION OF BC CALCULATION
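▸ a tighter upper bound $\Delta_S$ on the maximum deviation is computed [Oneto et al. 2013], with $\omega^* = \min_s \omega(s)$ and sample size m
$\Delta_S = \frac{\omega^*}{1 - \alpha} + \frac{\ln(2/\delta)}{2 m \alpha (1 - \alpha)} + \sqrt{\frac{\ln(2/\delta)}{2m}}$
$\alpha = \frac{\ln(2/\delta)}{\ln(2/\delta) + \sqrt{\left( 2 m R(F \circ S) + \ln(2/\delta) \right) \ln(2/\delta)}}$
‣ stopping condition: $\Delta_S \le \varepsilon$
‣ when this holds, the collection of estimates is returned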
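The stopping check can be sketched numerically. The code below assumes the bound reads Δ_S = ω*/(1−α) + ln(2/δ)/(2mα(1−α)) + sqrt(ln(2/δ)/(2m)), with m the sample size, and it plugs the computable bound ω* into α in place of R(F∘S); both the reading of the formula and that substitution are assumptions of this sketch, and the function names are illustrative.

```python
import math


def alpha(omega_star, m, delta):
    """alpha = L / (L + sqrt((2 m omega* + L) L)), with L = ln(2/delta)."""
    log_term = math.log(2 / delta)
    return log_term / (log_term + math.sqrt((2 * m * omega_star + log_term) * log_term))


def deviation_bound(omega_star, m, delta):
    """Delta_S as assumed in the lead-in (omega* used in place of R(F o S))."""
    log_term = math.log(2 / delta)
    a = alpha(omega_star, m, delta)
    return (
        omega_star / (1 - a)
        + log_term / (2 * m * a * (1 - a))
        + math.sqrt(log_term / (2 * m))
    )


def should_stop(omega_star, m, delta, eps):
    """Stopping condition: Delta_S <= eps."""
    return deviation_bound(omega_star, m, delta) <= eps


for m in (1_000, 4_000, 16_000):
    print(m, deviation_bound(0.01, m, 0.1), should_stop(0.01, m, 0.1, 0.05))
```

Since α lies in (0, 1/2], every term is finite and the bound shrinks as the sample grows for a fixed ω*; in practice ω* also changes with the sample, so the check is re-evaluated at every iteration.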
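SAMPLING SCHEDULE
▸ initial sample size: the smallest size at which $\Delta_S \le \varepsilon$ could hold at all (i.e., with $\omega^* = 0$, so $\alpha = 1/2$)
$|S_1| \ge \frac{\left( 1 + 8\varepsilon + \sqrt{1 + 16\varepsilon} \right) \ln(2/\delta)}{4 \varepsilon^2}$
▸ next sample size ($|S_{i+1}|$) is calculated assuming that $\omega_i^*$, which is an upper bound to $R(F \circ S_i)$, is also an upper bound to $R(F \circ S_{i+1})$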
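The initial sample size can be sanity-checked numerically. Assuming the stopping bound is Δ_S = ω*/(1−α) + ln(2/δ)/(2mα(1−α)) + sqrt(ln(2/δ)/(2m)), setting ω* = 0 gives α = 1/2 and Δ_S = 2 ln(2/δ)/m + sqrt(ln(2/δ)/(2m)); solving Δ_S ≤ ε for m yields m ≥ (1 + 8ε + sqrt(1 + 16ε)) ln(2/δ) / (4ε²). The sketch below, with illustrative function names, verifies that this closed form is the first m that passes.

```python
import math


def best_case_bound(m, delta):
    """Delta_S with omega* = 0 (alpha = 1/2): 2L/m + sqrt(L / (2m)), L = ln(2/delta)."""
    log_term = math.log(2 / delta)
    return 2 * log_term / m + math.sqrt(log_term / (2 * m))


def initial_sample_size(eps, delta):
    """Smallest integer m with best_case_bound(m, delta) <= eps (closed form)."""
    log_term = math.log(2 / delta)
    return math.ceil((1 + 8 * eps + math.sqrt(1 + 16 * eps)) * log_term / (4 * eps ** 2))


m0 = initial_sample_size(0.05, 0.1)
print(m0, best_case_bound(m0, 0.1), best_case_bound(m0 - 1, 0.1))
```

Starting below this size would waste an iteration, because the stopping condition cannot be satisfied there no matter how small the Rademacher bound turns out to be.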
DYNAMIC GRAPH BC APPROXIMATION (ABRA-D)
▸ vertex and edge insertions and deletions are allowed
▸ two data structures introduced by Hayashi et al. (2015) are used
▸ Hypergraph sketch: weighted hyper edge representation of shortest paths
▸ Two-ball index: to efficiently detect the parts of the Hypergraph sketch that need to be modified
EXPERIMENTAL EVALUATION
▸ performance measured using
▸ runtime
▸ sample size
▸ accuracy
▸ algorithms compared
▸ BA [Brandes 2001] - exact algorithm
▸ RK [Riondato and Kornaropoulos 2016]
EXPERIMENTAL RESULTS
▸ δ is fixed to 0.1
▸ given the logarithmic dependence of the sample size on δ, its impact on the results is limited
REFERENCES
[1] U. Brandes. A faster algorithm for betweenness centrality. J. Math. Sociol., 25(2):163–177, 2001. doi: 10.1080/0022250X.2001.9990249.
[2] M. Riondato and E. M. Kornaropoulos. Fast approximation of betweenness centrality through sampling. Data Mining and Knowledge Discovery, 30(2):438–475, 2015. ISSN 1573-756X. doi: 10.1007/s10618-015-0423-0.
[3] T. Hayashi, T. Akiba, and Y. Yoshida. Fully dynamic betweenness centrality maintenance on massive networks. Proceedings of the VLDB Endowment, 9(2), 2015.
[4] L. Oneto, A. Ghio, D. Anguita, and S. Ridella. An improved analysis of the Rademacher data-dependent bound using its self bounding property. Neural Networks, 44:107–111, 2013.