Construction of Near-optimal Vertex Clique Covering for Real-world Networks David Chalupa Institute of Applied Informatics Faculty of Informatics and Information Technologies Slovak University of Technology chalupa@fiit.stuba.sk October 14, 2013 David Chalupa (FIIT SUT) Near-optimal Vertex Clique Covering October 14, 2013 1 / 48

Construction of Near-optimal Vertex Clique Covering for Real-world Networks

David Chalupa

Institute of Applied Informatics
Faculty of Informatics and Information Technologies

Slovak University of Technology

chalupa@fiit.stuba.sk

October 14, 2013

Overview

the vertex clique covering problem (CCP)

some properties of real-world networks

motivation and relations to different fields

the approach: iterated greedy (IG) clique covering; randomized local search (RLS) for maximum independent set

experimental results

(a sketch of a few) theoretical results

conclusions and discussion

Clique Covering and Community Detection

Figure : Clique covering and clustering: Illustration on a small social network.

The (Vertex) Clique Covering Problem (CCP) - Illustration

Figure : Two solutions to CCP in a small sparse uniform random graph (on the left) and in a small sample of a social network (on the right).

The (Vertex) Clique Covering Problem (CCP) - Definition

objective: minimize k ≤ |V|, such that there are pairwise disjoint classes V1, V2, ..., Vk ⊂ V, which:
- cover the whole vertex set, i.e. V1 ∪ V2 ∪ ... ∪ Vk = V and
- induce cliques, i.e. ∀i = 1..k: d(G(Vi)) = 1,

where $d(G) = \frac{2|E|}{|V|(|V|-1)}$ is the density of G

CCP is NP-hard: the decision problem for a fixed k is NP-complete (Karp, 1972)

equivalence: a clique covering of G with k cliques corresponds to a graph coloring of the complement of G with k colors

similarity to max clique / independent set: adaptations of heuristics between these problems were proposed in the past (Gendreau et al., 1993)
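The definition above can be checked mechanically. The following is a minimal sketch, not the author's code: the function names, the edge-list representation and the convention that a single vertex counts as a clique are assumptions made for illustration.

```python
def density(vertices, edges):
    """Density d = 2|E| / (|V|(|V|-1)) of the subgraph induced by `vertices`."""
    vs = set(vertices)
    n = len(vs)
    if n < 2:
        return 1.0  # assumed convention: a single vertex is a clique
    m = sum(1 for u, v in edges if u in vs and v in vs)
    return 2 * m / (n * (n - 1))


def is_clique_covering(vertices, edges, classes):
    """True if the classes are pairwise disjoint, cover V, and each induces a clique."""
    seen = [v for c in classes for v in c]
    if len(seen) != len(set(seen)) or set(seen) != set(vertices):
        return False
    return all(density(c, edges) == 1.0 for c in classes)


# Hypothetical example: a triangle 1-2-3 with a pendant vertex 4 attached to 3.
V = [1, 2, 3, 4]
E = [(1, 2), (2, 3), (1, 3), (3, 4)]
print(is_clique_covering(V, E, [[1, 2, 3], [4]]))   # a valid covering with k = 2
print(is_clique_covering(V, E, [[1, 2, 4], [3]]))   # {1, 2, 4} is not a clique
```

The edge list is assumed to contain each undirected edge exactly once.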

Clique Covering Problem (CCP)

Minimizing the number of classes of a vertex partition, under the constraint that each class induces a clique.

An exact solution is possible, although computationally intensive algorithms are needed (Karp, 1972).

However, does it really hold that CCP in social networks is as hard as for general graphs? Maybe not...

Sparseness / Asymptotic Sparseness

a class of graphs is called asymptotically sparse if and only if, for n vertices and m(n) edges (as a function of the number of vertices), it holds that:

$m(n) \prec \binom{n}{2} \equiv \delta(n) \prec n,$

where δ(n) = 2m(n)/n is the average degree of a vertex

Figure : The average degree δ(n) in a growing sample from a Slovak social network and its difference function δ′(n). Both panels are plotted for n from 0 to 20000.

Degree Distribution / Scale-free Structure

degree distribution P(k): the fraction of vertices in the network with degree k

scale-free network: a network for which $P(k) \sim k^{-\gamma}$, where γ is a coefficient of steepness of the distribution

higher γ means that the network is sparser; for many real-world networks: γ ∈ [2, 3]
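As a small illustration of P(k), the sketch below computes the empirical degree distribution from an edge list. The function name and the assumption that vertices are numbered 0..n-1 are illustrative choices, not part of the slides.

```python
from collections import Counter


def degree_distribution(n, edges):
    """Return {k: fraction of the n vertices that have degree k}."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    counts = Counter(deg.get(v, 0) for v in range(n))
    return {k: c / n for k, c in sorted(counts.items())}


# Hypothetical example: a star with centre 0 and four leaves.
star = [(0, 1), (0, 2), (0, 3), (0, 4)]
print(degree_distribution(5, star))
```

Plotting such a dictionary on log-log axes is the usual way to eyeball whether a network looks scale-free.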

Figure : Degree distributions in log-log scale for an artificial scale-free network, BA2_10000 (on the left), and a snapshot of the Internet, as-22july06 (on the right).

Motivation and Applications

data mining (Sun et al., 2008) and web mining (Tang et al., 2011)

research citation network analysis (Sun et al., 2008)

protein interaction and gene regulatory networks in bioinformatics (Gao et al., 2009); (Boyer et al., 2005)

analysis of terrorist organization networks (Patillo et al., 2012)

infectious diseases epidemiology (Rothenberg et al., 1996)

scheduling and timetabling (Burke et al., 2007)

frequency assignment in mobile radio networks (Smith et al., 1998)

Relation to Different Fields

Artificial Intelligence:
designing / choosing an efficient heuristic

Graph Mining:
knowledge discovery from raw network data

Theoretical Computer Science:
understanding how well (badly) the algorithm performs (and why)

Statistical Mechanics:
understanding the relation to how the network was created

Currently Available Algorithms: Graph Coloring Algorithms

Brelaz's graph coloring heuristic (Brelaz, 1979) - generally a good tradeoff quality ↔ speed
O(n^2) time; O(n^2) space complexity

Leighton's graph coloring heuristic (Leighton, 1979) - more suitable for some graph classes
O(n^3) time; O(n^2) space complexity

Culberson and Luo's iterated greedy heuristic (Culberson and Luo, 1996) - repeated construction of solutions by a greedy algorithm combined with a stochastic improvement mechanism
O(n^2 − m) time per iteration; O(n^2) space complexity¹

¹ The time and space complexities of all these algorithms already take into account that CCP is equivalent to graph coloring of the complementary graph (i.e. CCP for a sparse graph leads to coloring of a dense graph).
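To make the footnote concrete, here is a minimal sketch that builds the complement of a graph and shows that it has n(n−1)/2 − m edges, which is why coloring-based methods pay a quadratic cost on sparse inputs. The function name and the 0..n-1 vertex numbering are assumptions for illustration.

```python
from itertools import combinations


def complement_edges(n, edges):
    """Edges of the complement of a simple graph on vertices 0..n-1."""
    present = {frozenset(e) for e in edges}
    return [(u, v) for u, v in combinations(range(n), 2)
            if frozenset((u, v)) not in present]


# Hypothetical example: a path 0-1-2-3 (3 edges) has a complement with 6 - 3 = 3 edges.
path = [(0, 1), (1, 2), (2, 3)]
print(complement_edges(4, path))
```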

Greedy Clique Covering (GCC)

not just a greedy algorithm - also a “genotype-phenotype mapping”

permutation of vertices → clique covering

let Γ(v, c) be the number of neighbors of v with label c

if Γ(v, c) = |Vc|, i.e. all vertices in clique G(Vc) are neighbors of v, then we can put v into this clique

if more than one clique is suitable, we choose the one with minimum c - this is called the First Fit strategy (Welsh and Powell, 1967)

O(m) time; O(n) space complexity

Figure : Illustration graph G for greedy clique covering (GCC).
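The GCC rule described above can be sketched as follows. This is an illustrative reading of the slide rather than the author's implementation; the adjacency-dictionary representation and the function name are assumptions.

```python
def greedy_clique_covering(adj, perm):
    """Map a vertex permutation to a clique covering using First Fit.

    adj  : dict mapping each vertex to the set of its neighbours
    perm : order in which the vertices are processed
    """
    label = {}      # vertex -> index c of its clique
    cliques = []    # cliques[c] = list of vertices labelled c
    for v in perm:
        # gamma[c] = number of already labelled neighbours of v with label c
        gamma = {}
        for u in adj[v]:
            if u in label:
                gamma[label[u]] = gamma.get(label[u], 0) + 1
        # v fits into clique c exactly when gamma[c] == |V_c|
        fits = [c for c, g in gamma.items() if g == len(cliques[c])]
        if fits:
            c = min(fits)           # First Fit: the smallest suitable label
        else:
            c = len(cliques)        # otherwise open a new clique
            cliques.append([])
        cliques[c].append(v)
        label[v] = c
    return cliques


# Hypothetical example: a triangle 0-1-2 with a pendant vertex 3 attached to 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(greedy_clique_covering(adj, [0, 1, 2, 3]))
print(greedy_clique_covering(adj, [3, 2, 0, 1]))
```

Each vertex only inspects its own neighbours, so the work per run is proportional to the number of edges plus vertices, in line with the O(m) bound quoted on the slide.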

Block-based Mutation Operators and Iterated Greedy

block-based property: If we put k cliques as blocks in a permutation and run GCC once again, then we obtain at most k cliques (Culberson and Luo, 1996); (Chalupa, 2012)

block-based mutation: We shuffle the blocks and re-run GCC. Such an algorithm behaves like typical local search

every shuffling operation is equivalent to a sequence of consecutive block jump operations

motivation: Great experimental results on real-world networks (especially social networks)

Figure : Illustration of the block jump(j, 1, P) operator.
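The block-based property can be observed experimentally. The sketch below is an assumption-laden illustration: `gcc`, `block_jump`, `flatten` and `random_graph` are hypothetical helper names, positions are 0-based (so the slide's jump to position 1 is position 0 here), and the random test graph is made up.

```python
import random


def gcc(adj, perm):
    """Greedy clique covering with First Fit; adj maps vertex -> set of neighbours."""
    label, cliques = {}, []
    for v in perm:
        gamma = {}
        for u in adj[v]:
            if u in label:
                gamma[label[u]] = gamma.get(label[u], 0) + 1
        fits = [c for c, g in gamma.items() if g == len(cliques[c])]
        c = min(fits) if fits else len(cliques)
        if c == len(cliques):
            cliques.append([])
        cliques[c].append(v)
        label[v] = c
    return cliques


def block_jump(blocks, j, i=0):
    """Move block j to position i, keeping the order of the other blocks."""
    blocks = list(blocks)
    blocks.insert(i, blocks.pop(j))
    return blocks


def flatten(blocks):
    """Concatenate the blocks into one vertex permutation."""
    return [v for block in blocks for v in block]


def random_graph(n, p, seed):
    """Uniform random graph on vertices 0..n-1 with edge probability p."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj


adj = random_graph(30, 0.3, seed=1)
blocks = gcc(adj, list(adj))
k = len(blocks)
rng = random.Random(2)
# Re-run GCC after 50 random block jumps; the clique count should never exceed k.
sizes = [len(gcc(adj, flatten(block_jump(blocks, rng.randrange(k))))) for _ in range(50)]
print(k, max(sizes))
```

The reason the count cannot grow is that, while one block is being processed, at most one new clique is opened: every later vertex of that block is adjacent to everything in it.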

Iterated Greedy

Algorithm 1: The IG Algorithm for CCP

Input: graph G = [V, E]
Output: clique covering S of G

P = random_permutation(1, 2, ..., |V|)
while stopping criterion is not met
    [V1, V2, ..., Vk] = greedy_clique_covering(G, P)
    if ϑ∗(G) is known and k = ϑ∗(G)
        return S = {V1, V2, ..., Vk}
    P = [V1, V2, ..., Vk]
    P = random_permutation(V1, V2, ..., Vk)
return S = {V1, V2, ..., Vk}
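A runnable sketch of Algorithm 1 might look as follows. It is only one possible reading: the iteration budget `max_iters`, the optional `target` (standing in for a known ϑ∗) and the seeded generator are assumptions introduced here, and `gcc` is a compact hypothetical First Fit implementation.

```python
import random


def gcc(adj, perm):
    """Greedy clique covering with First Fit; adj maps vertex -> set of neighbours."""
    label, cliques = {}, []
    for v in perm:
        gamma = {}
        for u in adj[v]:
            if u in label:
                gamma[label[u]] = gamma.get(label[u], 0) + 1
        fits = [c for c, g in gamma.items() if g == len(cliques[c])]
        c = min(fits) if fits else len(cliques)
        if c == len(cliques):
            cliques.append([])
        cliques[c].append(v)
        label[v] = c
    return cliques


def iterated_greedy(adj, max_iters=1000, target=None, seed=0):
    """Iterated greedy clique covering with block shuffling as the mutation."""
    rng = random.Random(seed)
    perm = list(adj)
    rng.shuffle(perm)                       # initial random permutation
    cliques = gcc(adj, perm)
    for _ in range(max_iters):              # stopping criterion: iteration budget
        if target is not None and len(cliques) == target:
            break                           # a known optimum has been reached
        rng.shuffle(cliques)                # random permutation of the blocks
        perm = [v for block in cliques for v in block]
        cliques = gcc(adj, perm)
    return cliques


# Hypothetical example: two triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
cover = iterated_greedy(adj, max_iters=200, target=2)
print(cover)
```

Because of the block-based property, the number of cliques never increases from one iteration to the next, so the last covering is also the best one seen.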

Iterated Greedy: An Interlude with Experimental Results

Table : The comparison of the approximations of ϑ(G), i.e. the chromatic number of the complement of G, for each graph obtained by Brelaz's heuristic (BRE), saturation-based GCC (SAT-GCC) and the IG heuristic with GCC (IG-GCC).

G                   BRE      SAT-GCC   IG-GCC

Erdős-Rényi uniform random graphs
unif1000_0.1        299      310       243
unif5000_0.1        1241     1288      1066
unif10000_0.1       2326     2389      2025
unif20000_0.01      7640     7817      6387

Leighton graphs from DIMACS instances
le450_15a           85       89        80
le450_15b           92       90        82
le450_15c           68       74        57
le450_15d           73       73        57
le450_25a           91       92        91
le450_25b           81       82        80
le450_25c           61       59        54
le450_25d           60       59        51

Social graphs
soc2000             1471     1473      1471
soc10000            6619     6633      6618
soc20000            12770    12804     12764

Lower Bounds for Clique Covering Number

Lemma. Let G be an undirected graph with minimum degree δmin(G), clique covering number ϑ(G), maximum independent set size α(G) and maximum clique size ω(G). Then, ϑ(G) is bounded in the following way:

$\max\left\{ \alpha(G), \frac{|V|}{\omega(G)} \right\} \le \vartheta(G) \le |V| - \delta_{\min}(G). \quad (1)$

More generally:

$\alpha_L(G) \le \vartheta(G) \le \vartheta_U(G). \quad (2)$

In practice, αL(G) will be the better lower bound. In the context of social networks, it is the size of some large group of people in which nobody knows anybody else.

Randomized Local Search (RLS) for Maximum Independent Set

permutation of vertices → independent set (IS)

we begin with independent set S = ∅

in each iteration, we take the next vertex from the permutation...

... and add it to S if it can be added without violating the IS property
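The permutation-to-independent-set mapping described above can be sketched in a few lines. The function name and the adjacency-dictionary input are assumptions for illustration.

```python
def greedy_independent_set(adj, perm):
    """Scan perm and keep every vertex that has no neighbour among those already kept."""
    chosen = set()
    for v in perm:
        if not (adj[v] & chosen):   # adding v keeps the set independent
            chosen.add(v)
    return chosen


# Hypothetical example: a star with centre 0 and leaves 1..4.
adj = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
print(greedy_independent_set(adj, [0, 1, 2, 3, 4]))   # centre first: only {0}
print(greedy_independent_set(adj, [1, 0, 2, 3, 4]))   # a leaf first: all four leaves
```

The star shows why the permutation matters: the same greedy rule yields a set of size 1 or size 4 depending only on the order.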

Randomized Local Search (RLS) for Maximum Independent Set

Algorithm 2: RLS$^1_p$ Algorithm for the Maximum Independent Set Size

Input: graph G = [V, E]
Output: the size α(G) of the maximum independent set

P = random_permutation(1, 2, ..., |V|), P∗ = P, k∗ = 1
while stopping criterion is not met
    k = |greedy_independent_set(G, P)|
    if k ≥ k∗
        k∗ = k, P∗ = P
    j = uniformly_random(2, |V|)
    P = jump(j, 1, P∗)
return α(G) = k∗
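Algorithm 2 can be sketched as below. This is a hedged reading, not the author's code: the iteration budget, the seed and the helper names are assumptions, positions are 0-based, and the returned value is in general only a lower bound on α(G). The graph is assumed to have at least two vertices.

```python
import random


def greedy_independent_set(adj, perm):
    """Keep each vertex of perm that has no neighbour among those already kept."""
    chosen = set()
    for v in perm:
        if not (adj[v] & chosen):
            chosen.add(v)
    return chosen


def rls_independent_set_size(adj, max_iters=1000, seed=0):
    """Randomized local search over permutations using the jump(j, 1, P) operator."""
    rng = random.Random(seed)
    perm = list(adj)
    rng.shuffle(perm)
    best_perm, best_k = perm, 1
    for _ in range(max_iters):                  # stopping criterion: iteration budget
        k = len(greedy_independent_set(adj, perm))
        if k >= best_k:                         # accept improvements and plateau moves
            best_k, best_perm = k, perm
        j = rng.randrange(1, len(perm))         # positions 2..|V| in 1-based numbering
        # jump: move the vertex at position j to the front of the best permutation
        perm = [best_perm[j]] + best_perm[:j] + best_perm[j + 1:]
    return best_k


# Hypothetical examples: a star with four leaves and a path on four vertices.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(rls_independent_set_size(star), rls_independent_set_size(path))
```

Accepting equal values (k ≥ k∗) lets the search drift across plateaus instead of freezing on the first permutation that reaches the current best size.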

Iterated Greedy Clique Covering: Instances for Experiments

web-based social network extracts

network science instances: adjective-noun network (Newman, 2002), collaboration network (Newman, 2002), social network (Zachary, 1977), college football network (Girvan and Newman, 2002), computer network

coappearance networks: network of coappearances of literary characters (Knuth, 1993)

Leighton graphs: quasirandom graphs modeling large scheduling problems (Leighton, 1979)

Erdős-Rényi uniform random graphs

Iterated Greedy Clique Covering: Results on Complex Networks

Table : Detailed computational results of our approach on complex network instances.

source of G                     file name    ϑ∗           succ.    iter.     CPU

Web-based social network extracts
Social network I.               soc500       ϑ ≤ 377      30/30    1888      < 1 s
|V| = 500, |E| = 924                         ϑ ≥ 377      30/30    3764      < 1 s
Social network I.               soc1000      ϑ ≤ 759      30/30    3801      1 s
|V| = 1000, |E| = 1876                       ϑ ≥ 759      30/30    7960      < 1 s
Social network I.               soc2000      ϑ ≤ 1471     30/30    7372      4 s
|V| = 2000, |E| = 4124                       ϑ ≥ 1470     30/30    17430     < 1 s
Social network I.               soc10000     ϑ ≤ 6618     30/30    33276     89 s
|V| = 10000, |E| = 28675                     ϑ ≥ 6618     17/30    124120    31 s
Social network I.               soc20000     ϑ ≤ 12764    30/30    64651     366 s
|V| = 20000, |E| = 63245                     ϑ ≥ 12764    25/30    274529    147 s
Social network II.              soc52        ϑ ≤ 15       30/30    78        < 1 s
|V| = 52, |E| = 822                          ϑ ≥ 15       30/30    508       < 1 s

Iterated Greedy Clique Covering: Results on Complex Networks

Table : Detailed computational results of our approach on complex network instances.

source of G                     file name      ϑ∗           succ.    iter.     CPU

Network science instances
Adjective-noun adjacencies      adjnoun        ϑ ≤ 55       30/30    364       < 1 s
|V| = 112, |E| = 425                           ϑ ≥ 53       30/30    1145      < 1 s
Network science collaborations  netscience     ϑ ≤ 630      30/30    3453      1 s
|V| = 1589, |E| = 2742                         ϑ ≥ 630      30/30    11874     < 1 s
Les Miserables network          lesmis         ϑ ≤ 35       30/30    176       < 1 s
|V| = 77, |E| = 254                            ϑ ≥ 35       30/30    546       < 1 s
Zachary Karate Club             zachary        ϑ ≤ 20       30/30    101       < 1 s
|V| = 34, |E| = 78                             ϑ ≥ 20       30/30    232       < 1 s
American College Football       football       ϑ ≤ 22       22/30    118       < 1 s
|V| = 115, |E| = 616                           ϑ ≥ 21       30/30    1215      < 1 s
Snapshot of the Internet        as-22july06    ϑ ≤ 19661    30/30    98312     556 s
|V| = 22963, |E| = 48436                       ϑ ≥ 19660    26/30    192136    128 s

Iterated Greedy Clique Covering: Results on Complex Networks

Table : Detailed computational results of our approach on complex network instances.

source of G                     file name    ϑ∗         succ.    iter.    CPU

Characters' coappearance networks (Johnson and Trick, 1996)
Anna Karenina                   anna         ϑ ≤ 80     30/30    402      < 1 s
|V| = 138, |E| = 986                         ϑ ≥ 80     30/30    1022     < 1 s
David Copperfield               david        ϑ ≥ 36     30/30    182      < 1 s
|V| = 87, |E| = 812                          ϑ ≤ 36     30/30    715      < 1 s
Huckleberry Finn                huck         ϑ ≤ 27     30/30    136      < 1 s
|V| = 74, |E| = 602                          ϑ ≥ 27     30/30    516      < 1 s
Iliad and Odyssey               homer        ϑ ≤ 341    30/30    1711     < 1 s
|V| = 561, |E| = 3258                        ϑ ≥ 341    30/30    4219     < 1 s
Jean Valjean                    jean         ϑ ≤ 38     30/30    192      < 1 s
|V| = 80, |E| = 508                          ϑ ≥ 38     30/30    574      < 1 s

Iterated Greedy Clique Covering: Overview of the Results

Table : Summary of the upper and lower bounds for ϑ obtained by IG for clique covering and RLS for maximum independent sets on complex network instances.

source of G                     file name      ϑL(G)    ϑU(G)    |V|/ϑU(G)

Web-based social network extracts
Social network I.               soc500         377      377      1.33
Social network I.               soc1000        759      759      1.32
Social network I.               soc2000        1470     1471     1.36
Social network I.               soc10000       6618     6618     1.51
Social network I.               soc20000       12764    12764    1.57
Social network II.              soc52          15       15       3.47

Network science instances
Adjective-noun adjacencies      adjnoun        53       55       2.04
Network science collaborations  netscience     690      690      2.30
Les Miserables network          lesmis         35       35       2.20
Zachary Karate Club             zachary        20       20       1.70
American College Football       football       21       22       5.23
Snapshot of the Internet        as-22july06    19660    19661    1.17

Characters' coappearance networks (Johnson and Trick, 1996)
Anna Karenina                   anna           80       80       1.73
David Copperfield               david          36       36       2.42
Huckleberry Finn                huck           27       27       2.74
Iliad and Odyssey               homer          341      341      1.65
Jean Valjean                    jean           38       38       2.11

Iterated Greedy Clique Covering: Results on Artificial Graphs

Table : Summary of the upper and lower bounds for ϑ obtained by our approach on synthetic graphs following Leighton's model.

source of G                      file name    ϑL(G)    ϑU(G)    |V|/ϑU(G)

Leighton graphs from DIMACS coloring instances (Johnson and Trick, 1996)
Leighton graph (15-colorable)    le450_15a    75       80       5.63
Leighton graph (15-colorable)    le450_15b    78       82       5.49
Leighton graph (15-colorable)    le450_15c    41       57       7.76
Leighton graph (15-colorable)    le450_15d    41       57       7.76
Leighton graph (25-colorable)    le450_25a    91       91       4.95
Leighton graph (25-colorable)    le450_25b    78       80       5.63
Leighton graph (25-colorable)    le450_25c    47       54       8.33
Leighton graph (25-colorable)    le450_25d    43       51       8.82

Iterated Greedy Clique Covering: Results on Artificial Graphs

Table : Summary of the upper and lower bounds for ϑ obtained by our approach on synthetic graphs following the Erdős-Rényi model.

source of G             file name         ϑL(G)    ϑU(G)    |V|/ϑU(G)

Erdős-Rényi uniform random graphs
Uniform random graph    unif1000_0.1      147      243      4.12
Uniform random graph    unif5000_0.1      617      1066     4.69
Uniform random graph    unif10000_0.1     1154     2025     4.94
Uniform random graph    unif20000_0.01    3796     6387     3.13

Instances: Degree and Clique Size Distributions

Figure : The visualization of degree and clique size distributions for chosen real-world network test instances and the obtained solutions in log-log scale. Panels: soc2000, soc20000 and netscience (degree distribution and clique size distribution for each).

Instances: Degree and Clique Size Distributions

Figure : The visualization of degree and clique size distributions for chosen real-world network test instances and the obtained solutions in log-log scale (part II). Panels: football, as-22july06 and homer (degree distribution and clique size distribution for each).

Instances: Degree and Clique Size Distributions

Figure : The visualization of degree and clique size distributions for chosen synthetic test instances in log-log scale. Panels: unif20000_0.01, le450_15c and le450_25b (degree distribution and clique size distribution for each).

Interesting Points

It seems to be easy to find a good clique covering of a real-world social network, despite the fact that CCP is NP-hard.

This seems to be due to structural / statistical properties of the networks.

What if the approach gives only an interval [ϑL, ϑU]? How to cope with this?

Scaling to larger networks (10^5 - 10^6 vertices)?

Generalization to communities in general?
Is it easy or hard to find "high-level" communities in real-world graphs?

How Iterated Greedy Really Works

IG is a local search algorithm: it works like walking down some stairs- fitness levels

plateaus: on each stair, the algorithm moves randomly until it findsthe edge - random walk

local optima: no guarantee that there is a way to a lower step fromthe current one

Analytical Results on GCC and Iterated Greedy

Table : An overview of analytical results on GCC and iterated greedy.

graph class                        GCC                               IG
paths                              approximation ratio 4/3           optimal, O(n^5)
trees                              approximation ratio ∈ [4/3, 2]    optimal, O(n^5)
growing networks                   constant approx. ratio            constant approx. ratio
complements of bipartite graphs    differs based on density          can get stuck with small prob.
worst-case result                  unknown (probably very bad)       can get stuck with prob. Ω(1)

Behavior of GCC and Block-based Mutation on Paths

Theorem. For IG with the block jump operator on paths, the expected time to obtain the optimal clique covering is upper bounded by O(n^5).

Sketch of proof.
- We have at most O(n) extra cliques - these represent the fitness levels.
- On each of them, there is a random walk of 1-cliques, until they meet and form a 2-clique.
- This random walk is almost fair - it takes O(n^3) time to upgrade to a better fitness level.
- The complexity of GCC is O(n).
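One way to read the proof sketch is as a product of three factors. This reading assumes that the O(n^3) term counts IG iterations per fitness level and that each iteration costs one GCC run on a path; the slide does not spell this out.

```latex
\underbrace{O(n)}_{\text{fitness levels}}
\cdot
\underbrace{O(n^{3})}_{\text{iterations per level}}
\cdot
\underbrace{O(n)}_{\text{cost of one GCC run}}
= O(n^{5})
```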

Figure : Illustration of the case, when we have only two vertices to be joined, i.e.we have a solution with ϑ+ 1 cliques.

Sparse Complements of Bipartite Graphs (Biclique Graphs)

Theorem. Let G = [V, E] be a graph with 2 planted cliques of size r. Let the set Eout of edges between the planted cliques satisfy |Eout| < r. Then, IG with GCC and random reorderings will converge to the optimal solution in O(n^3) time.

Sketch of proof. By induction from the simple case of two triangles.

Figure : Two triangles with 1 inter-clique edge and their coverings with 2 and 3 cliques (on the left and in the middle) and two triangles with 2 inter-clique edges and their covering with 4 cliques (on the right).

Complements of Bipartite Graphs

Lemma. On graph H, a uniformly random permutation will lead to a clique covering that is not improvable by block jump with probability at least 1/15.

Proof. The labeling should induce the three inter-clique edges, instead of the two triangles. In permutations with "embedded" inter-clique edges, there are 3 blocks (the wrong cliques) and 2^3 possible internal orderings of the vertices in these blocks. Thus, the probability of generating such a situation is at least $\frac{3! \cdot 2^3}{6!} = \frac{1}{15}$.

Figure : The illustration of graph H, on which the IG can fail to converge withprobability at least 1/15.
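The counting argument in the proof can be reproduced with exact arithmetic. The function below is a hypothetical helper; generalizing 2^3 to (block_size!)^blocks is an assumption that coincides with the slide only for blocks of size 2.

```python
from fractions import Fraction
from math import factorial


def stuck_probability_bound(blocks=3, block_size=2):
    """Fraction of permutations made of `blocks` contiguous blocks of the given size."""
    n = blocks * block_size
    # orderings of the blocks times internal orderings inside each block
    favourable = factorial(blocks) * factorial(block_size) ** blocks
    return Fraction(favourable, factorial(n))


print(stuck_probability_bound())   # 3! * 2^3 / 6! = 48 / 720
```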

Worst-case Result

Theorem. On graphs from class H_{ϑ/2}, a uniformly random permutation will lead to a clique covering that is not improvable by block jump with probability at least $1 - \left(\frac{14}{15}\right)^{|V|/6}$.

Proof. Since the H subgraphs are disjoint, we can treat their vertices in the permutation as independent. The independence of the components implies that the probability that all subpermutations are the right ones is at most $\left(\frac{14}{15}\right)^{|V|/6}$. Thus, the complementary probability is at least $1 - \left(\frac{14}{15}\right)^{|V|/6}$.

Figure : The illustration of graph H_{ϑ/2}, which consists of ϑ/2 disjoint H subgraphs.
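To get a feeling for how quickly this bound approaches 1, it can be evaluated for a few graph sizes. The function name is an assumption, and |V| is assumed to be a multiple of 6 (one copy of H per 6 vertices).

```python
from fractions import Fraction


def failure_probability_bound(num_vertices):
    """Lower bound 1 - (14/15)^(|V|/6) on the probability of a non-improvable covering."""
    copies = num_vertices // 6          # number of disjoint H subgraphs
    return 1 - Fraction(14, 15) ** copies


for n in (6, 60, 600):
    print(n, float(failure_probability_bound(n)))
```

With a single copy of H the bound reduces to the 1/15 of the lemma, and it grows monotonically with the number of copies.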

David Chalupa (FIIT SUT) Near-optimal Vertex Clique Covering October 14, 2013 40 / 48

Page 41: Construction of Near-optimal Vertex Clique Covering for ...kvasnicka/Seminar_of_AI/... · 10/14/2013  · 1996) - repeated construction of solutions by a greedy algorithm combined



Conclusions

iterated greedy (IG) clique covering: a constructive heuristic for CCP with a stochastic improvement mechanism

randomized local search (RLS) for maximum independent set: the independent set it finds serves as a lower bound on the number of cliques (since we do not know the optimum for real-world graphs)

results: on 13 out of 17 real-world graphs, our approach solved the problem optimally; on the remaining graphs, a near-optimal solution was found
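The role of the independent set as a lower bound can be illustrated in a few lines: no clique can contain two vertices of an independent set, so any clique covering needs at least as many cliques as the independent set has vertices. The sketch below is a hedged illustration only. The example graph (two triangles joined by three disjoint edges) and all function names are assumptions, and the covering and the independent set are given by hand rather than computed by IG or RLS.

```python
def is_clique(vertices, adj):
    """True if every pair of the given vertices is adjacent."""
    vs = list(vertices)
    return all(vs[j] in adj[vs[i]]
               for i in range(len(vs)) for j in range(i + 1, len(vs)))


def is_independent_set(vertices, adj):
    """True if no pair of the given vertices is adjacent."""
    vs = list(vertices)
    return all(vs[j] not in adj[vs[i]]
               for i in range(len(vs)) for j in range(i + 1, len(vs)))


def is_clique_cover(cover, adj):
    """True if the cliques partition the vertex set of the graph."""
    covered = sorted(v for clique in cover for v in clique)
    return covered == sorted(adj) and all(is_clique(c, adj) for c in cover)


def optimality_gap(cover, independent_set, adj):
    """Upper bound (cover size) minus lower bound (independent set size)."""
    if not is_clique_cover(cover, adj):
        raise ValueError("not a valid clique covering")
    if not is_independent_set(independent_set, adj):
        raise ValueError("not an independent set")
    return len(cover) - len(independent_set)


# Illustrative graph: triangles {0,1,2} and {3,4,5} plus edges 0-3, 1-4, 2-5.
adj = {
    0: {1, 2, 3}, 1: {0, 2, 4}, 2: {0, 1, 5},
    3: {0, 4, 5}, 4: {1, 3, 5}, 5: {2, 3, 4},
}

gap_triangles = optimality_gap([[0, 1, 2], [3, 4, 5]], [0, 4], adj)
gap_edges = optimality_gap([[0, 3], [1, 4], [2, 5]], [0, 4], adj)
```

A gap of zero certifies that the covering is optimal. A positive gap only bounds how far the covering can be from the optimum.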


Open Problems

theoretical analysis of IG on models of complex networks

analytical results on RLS for maximum independent set in general

study of the impact of further scaling: what if we try to solve the problem for networks with $10^5$ or $10^6$ vertices?

generalization to more “loose” community detection problems


Thank you for your attention!

chalupa@fiit.stuba.sk


References I

(Chalupa, 2013a) Chalupa, D.: Construction of Near-optimal Vertex Clique Covering for Real-world Networks. In: Computing and Informatics, to appear.

(Chalupa, 2013b) Chalupa, D.: An Analytical Investigation of Block-based Mutation Operators for Order-based Stochastic Clique Covering Algorithms. In: Blum, C., Alba, E. (eds.) Proceedings of the 15th annual conference on Genetic and evolutionary computation. pp. 495–502. GECCO ’13, ACM, New York, NY, USA (2013).

(Chalupa, 2012) Chalupa, D.: On the efficiency of an order-based representation in the clique covering problem. In: Soule, T., Moore, J. (eds.) Proceedings of the 14th annual conference on Genetic and evolutionary computation. pp. 353–360. GECCO ’12, ACM, New York, NY, USA (2012).

(Chalupa, 2011) Chalupa, D.: On the Ability of Graph Coloring Heuristics to Find Substructures in Social Networks. In: Information Sciences and Technologies Bulletin of ACM Slovakia, 3(2):51–54 (2011).

(Leskovec, 2010) Leskovec, J., Lang, K. J., Mahoney, M. W.: Empirical comparison of algorithms for network community detection. In: Rappa, M., Jones, P., Freire, J., Chakrabarti, S. (eds.) Proceedings of the 19th International Conference on World Wide Web, WWW 2010, pp. 631–640. ACM, New York, NY, USA (2010).


References II

(Girvan and Newman, 2002) Girvan, M., Newman, M. E. J.: Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99(12):7821–7826 (2002).

(Newman, 2006) Newman, M. E. J.: Finding community structure in networks using the eigenvectors of matrices. arXiv:physics/0605087 (2006).

(Zachary, 1977) Zachary, W. W.: An information flow model for conflict and fission in small groups. Journal of Anthropological Research, 33:452–473 (1977).

(Knuth, 1993) Knuth, D. E.: The Stanford GraphBase: A Platform for Combinatorial Computing. Addison-Wesley, Reading, MA (1993).

(Karp, 1972) Karp, R. M.: Reducibility among combinatorial problems. In: Miller, R., Thatcher, J. (eds.) Complexity of Computer Computations, pp. 85–103. Plenum Press, New York, NY, USA (1972).

(Johnson and Trick, 1996) Johnson, D. S., Trick, M.: Cliques, Coloring, and Satisfiability: Second DIMACS Implementation Challenge, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, Vol. 26. American Mathematical Society (1996).

(Schaeffer, 2007) Schaeffer, S. E.: Graph clustering. Computer Science Review, 1(1):27–64 (2007).


References III

(Brelaz, 1979) Brelaz, D.: New methods to color vertices of a graph. Communications of the ACM, 22(4):251–256 (1979).

(Culberson and Luo, 1996) Culberson, J. C., Luo, F.: Exploring the k-colorable landscape with iterated greedy. In: Johnson, D. S., Trick, M. (eds.) Cliques, Coloring, and Satisfiability: Second DIMACS Implementation Challenge. pp. 245–284. American Mathematical Society (1996).

(Gendreau et al., 1993) Gendreau, M., Soriano, P., Salvail, L.: Solving the maximum clique problem using a tabu search approach. Annals of Operations Research, 41:385–403 (1993).

(Welsh and Powell, 1967) Welsh, D. J. A., Powell, M. B.: An upper bound for the chromatic number of a graph and its application to timetabling problems. The Computer Journal, 10(1):85–86 (1967).

(Boyer et al., 2005) Boyer, F., Morgat, A., Labarre, L., Pothier, J., Viari, A.: Syntons, metabolons and interactons: an exact graph-theoretical approach for exploring neighbourhood between genomic and functional data. Bioinformatics, 21(23):4209–4215 (2005).

(Burke et al., 2007) Burke, E. K., McCollum, B., Meisels, A., Petrovic, S., Qu, R.: A graph-based hyper-heuristic for educational timetabling problems. European Journal of Operational Research, 176(1):177–192 (2007).


References IV

(Gao et al., 2009) Gao, L., Sun, P., Song, J.: Clustering algorithms for detecting functional modules in protein interaction networks. Journal of Bioinformatics and Computational Biology, 7(1):217–242 (2009).

(Leighton, 1979) Leighton, F. T.: A graph coloring algorithm for large scheduling problems. Journal of Research of the National Bureau of Standards, 84(6):489–503 (1979).

(Smith et al., 1998) Smith, D. H., Hurley, S., Thiel, S. U.: Improving heuristics for the frequency assignment problem. European Journal of Operational Research, 107(1):76–86 (1998).

(Sun et al., 2008) Sun, J., Xie, Y., Zhang, H., Faloutsos, C.: Less is more: Sparse graph mining with compact matrix decomposition. Statistical Analysis and Data Mining, 1(1):6–22 (2008).

(Tang et al., 2011) Tang, J., Wang, T., Wang, J., Lu, Q., Li, W.: Using complex network features for fast clustering in the web. In: Sadagopan, S., Ramamritham, K., Kumar, A., Ravindra, M. P., Bertino, E., Kumar, R. (eds.) Proceedings of the 20th international conference companion on World wide web, WWW ’11, pp. 133–134. ACM, New York, NY, USA (2011).

(Rothenberg et al., 1996) Rothenberg, R. B., Potterat, J. J., Woodhouse, D. E.: Personal Risk Taking and the Spread of Disease: Beyond Core Groups. The Journal of Infectious Diseases, Supplement 2, 174:S144–S149 (1996).
