UBI529
3. Distributed Graph Algorithms
Distributed Algorithms Models
• Interprocess communication method: accessing shared memory, point-to-point or broadcast messages, or remote procedure calls.
• Timing model: synchronous or asynchronous models.
• Failure models: reliable or faulty behavior; Byzantine failures (a failed processor can behave arbitrarily).
We assume
• A distributed network is modeled as a graph. Nodes are processors and edges are communication links.
• Nodes can communicate directly (only) with their neighbors through the edges.
• Nodes have unique processor identities.
• Synchronous model: Time is measured in rounds (time steps).
• One message (typically of size O(log n)) can be sent through an edge in a time step. A node can send messages simultaneously through all its edges at once in a round.
• No failure of nodes or edges. No malicious nodes.
2.1 Vertex and Tree Coloring
• Vertex Coloring
• Sequential Vertex Coloring Algorithms
• Distributed Synchronous Vertex Coloring Algorithm
• Distributed Tree Coloring Algorithms
Preliminaries
Vertex Coloring Problem: Given an undirected graph G = (V,E), assign a color cu to each vertex u ∈ V such that if e = (v,w) ∈ E, then cv ≠ cw. The aim is to use the minimum number of colors.
Definition 2.1.1 : Given an undirected graph G, the chromatic number χ(G) is the minimum number of colors needed to color it. A vertex k-coloring uses exactly k colors. If χ(G) = k, G is k-colorable but not (k-1)-colorable. Calculating χ(G) is NP-hard. The 3-coloring decision problem is NP-complete.
Applications :
• Assignment of radio frequencies : Colors represent frequencies, transmitters are the vertices. Two stations are neighbors if they interfere.
• University course scheduling : Vertices are courses, students are the edges (two courses are neighbors if a student takes both).
• Fast register allocation for computer programming : Vertices are variables, they are neighbors if they can be active at the same time.
Sequential Algorithm for Vertex Coloring
Algorithm 2.1.1 : Sequential Vertex Coloring
Input : G with v1,v2, ..., vn
Output : Vertex Coloring f : VG -> {1,2,3,..}
1. For i = 1 to n do
2.    f(vi) := smallest color number that does not conflict with any of the already colored neighbors of vi
3. Return Vertex Coloring f
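The sequential algorithm can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the adjacency-dictionary representation, the function name greedy_coloring, and the small example graph are all assumptions made for the sketch.

```python
def greedy_coloring(adj, order):
    """Color vertices in the given order with the smallest free color (1, 2, ...)."""
    color = {}
    for v in order:
        # Colors already taken by colored neighbors of v.
        used = {color[w] for w in adj[v] if w in color}
        c = 1
        while c in used:
            c += 1
        color[v] = c
    return color

# Hypothetical example: a 5-cycle 1-2-3-4-5 with the chord 2-5 (maximum degree 3).
adj = {1: [2, 5], 2: [1, 3, 5], 3: [2, 4], 4: [3, 5], 5: [1, 2, 4]}
coloring = greedy_coloring(adj, order=[1, 2, 3, 4, 5])
```

On this graph the result uses three colors, which is within the Δ+1 = 4 bound discussed on the next slide.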
Vertex Coloring Algorithms
Definition 2.1.2 : The number of neighbors of a vertex v is called the degree of v, written δ(v). The maximum vertex degree in a graph G is called the graph degree Δ(G) = Δ.
Theorem 2.1.1 : The algorithm is correct and terminates in O(n) steps. The algorithm uses at most Δ+1 colors.
Proof: Correctness and termination are straightforward. Since each node has at most Δ neighbors, there is always at least one color free in the range {1, …, Δ+1}.
Remarks:
• For many graphs coloring can be done with far fewer than Δ+1 colors.
• This algorithm is not distributed; only one processor is active at a time. But: Use idea of Algorithm 1.4 to define “local” coloring subroutine 1.7
Heuristic Vertex Coloring Algorithm : Largest Degree First
Idea : (Two observations) A vertex of large degree is more difficult to color than a vertex of smaller degree. Also, a vertex with more colored neighbors will be more difficult to color later.
Algorithm 2.1.2 : Largest Degree First Algorithm
Input : G with v1,v2, ..., vn
Output : Vertex Coloring f : VG -> {1,2,3,..}
1. While there are uncolored vertices of G
2.    Among the uncolored max. degree vertices, choose vertex v with the max. colored degree
3.    Assign smallest possible k to v : f(v) := k
4. Return Vertex Coloring f
Colored degree : # of different colors used to color neighbors of v
The coloring order in the diagram is v3,v1,v2,v4,v8,v6,v7,v5
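A possible Python rendering of the largest-degree-first heuristic follows. It is a sketch under assumptions: the graph is an adjacency dictionary, ties are broken by dictionary order, and the example graph is hypothetical (it is not the graph from the slide's diagram).

```python
from itertools import count

def largest_degree_first(adj):
    """Repeatedly color an uncolored max-degree vertex, preferring high colored degree."""
    color = {}
    while len(color) < len(adj):
        uncolored = [v for v in adj if v not in color]
        max_deg = max(len(adj[v]) for v in uncolored)
        candidates = [v for v in uncolored if len(adj[v]) == max_deg]

        def colored_degree(u):
            # Number of different colors used on neighbors of u.
            return len({color[w] for w in adj[u] if w in color})

        v = max(candidates, key=colored_degree)
        used = {color[w] for w in adj[v] if w in color}
        color[v] = next(k for k in count(1) if k not in used)
    return color

# Hypothetical example graph: 5-cycle with the chord 2-5.
adj = {1: [2, 5], 2: [1, 3, 5], 3: [2, 4], 4: [3, 5], 5: [1, 2, 4]}
coloring = largest_degree_first(adj)
```

The heuristic only changes the order in which vertices are colored; the color chosen for each vertex is still the smallest one not used by its neighbors.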
Coloring Trees : A Distributed Algorithm
Lemma 2.1.1: χ(Tree) <= 2.
Proof: If the distance of a node to the root is odd (even), color it 1 (0). An odd node has only even neighbors and vice versa.
If we assume that each node knows its parent (root has no parent) and children in a tree, this constructive proof gives a very simple algorithm.
Algorithm 2.1.3 [Slow tree coloring]:
1. Root sends color 0 to children. (Root is colored 0)
2. When receiving a message x from parent, a node u picks color cu = 1-x, and sends cu to its children
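The slow tree coloring can be simulated round by round. The sketch below assumes a synchronous model and a tree given as a children dictionary; the names slow_tree_coloring and children, and the example tree, are illustrative only.

```python
def slow_tree_coloring(children, root):
    """Simulate the slow tree coloring; returns the colors and the number of rounds."""
    color = {root: 0}            # the root is colored 0
    frontier = [root]            # nodes that send their color in the current round
    rounds = 0
    while frontier:
        receivers = []
        for u in frontier:
            for child in children.get(u, []):
                color[child] = 1 - color[u]   # child picks the other color
                receivers.append(child)
        if receivers:
            rounds += 1
        frontier = receivers
    return color, rounds

# Hypothetical tree of height 3: r -> a, b; a -> c; c -> d.
children = {"r": ["a", "b"], "a": ["c"], "c": ["d"]}
color, rounds = slow_tree_coloring(children, "r")
```

The number of rounds equals the height of the tree, which matches the time complexity stated in the remarks.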
Distributed Tree Coloring
Remarks:
• With the proof of Lemma 2.1.1, Algorithm 2.1.3 is correct.
• The time complexity of the algorithm is the height of the tree.
• When the root is chosen randomly, this can be up to the diameter of the tree.
2.2 Distributed Tree based Communication Algorithms
• Broadcast
• Convergecast
• BFS Tree Construction
Broadcast
Broadcasting means sending a message from a source node to all other nodes of the network.
Two basic broadcasting approaches are flooding and spanning tree-based broadcast.
Flooding:
A source node s wants to send a message to all nodes in the network. s simply forwards the message over all its edges.
Any vertex v != s, upon receiving the message for the first time (over an edge e), forwards it on every other edge.
Upon receiving the message again it does nothing.
Broadcast
Definition 2.2.1 [Broadcast]: A broadcast operation is initiated by a single processor, the source. The source wants to send a message to all other nodes in the system.
Definition 2.2.2 [Distance, Radius, Diameter]:
• The distance between two nodes u, v in an undirected graph is the number of hops of a minimum path between u and v.
• The radius of a node u in a graph is the maximum distance between u and any other node. The radius of a graph is the minimum radius of any node in the graph.
• The diameter of a graph is the maximum distance between two arbitrary nodes.
Broadcast
Theorem 2.2.1 [Lower Bound]: The message complexity of a broadcast is at least n-1. The radius of the graph is a lower bound for the time complexity.
Proof: Every node must receive the message.
Remarks:
• You can use a pre-computed spanning tree to do the broadcast with tight message complexity.
• If the spanning tree is a breadth-first spanning tree (for a given source), then also the time complexity is tight.
Definition 2.2.3 : A graph (system/network) is clean if the nodes do not know the topology of the graph.
Theorem 2.2.2 [Clean Lower Bound]: For a clean network, the number of edges is a lower bound for the broadcast message complexity.
Proof: If you do not try every edge, you might miss a whole part of the graph behind it.
Flooding
Algorithm 2.2.1 [Flooding]: The source sends the message to all neighbors. Each node receiving the message the first time forwards to all (other) neighbors.
Remarks:
• If node v receives the message first from node u, then node v calls node u “parent”. This parent relation defines a spanning tree T. If the flooding algorithm is executed in a synchronous system, then T is a breadth-first spanning tree (with respect to the root).
• More interestingly, also in asynchronous systems the flooding algorithm terminates after r time units, where r is the radius of the source. (But note that the constructed spanning tree need not be breadth-first.)
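A round-based simulation makes the flooding behavior concrete. The sketch below assumes the synchronous model; the function name flood, the adjacency dictionary, and the four-node example graph are assumptions introduced for illustration.

```python
def flood(adj, source):
    """Synchronous flooding; returns parent pointers, hop counts, and message total."""
    parent = {source: None}
    depth = {source: 0}
    frontier = [source]          # nodes that received the message for the first time
    messages = 0
    while frontier:
        deliveries = []
        for u in frontier:
            for w in adj[u]:
                if w != parent[u]:           # forward on every other edge
                    deliveries.append((u, w))
        messages += len(deliveries)
        frontier = []
        for u, w in deliveries:
            if w not in parent:              # first receipt: the sender becomes the parent
                parent[w] = u
                depth[w] = depth[u] + 1
                frontier.append(w)
    return parent, depth, messages

# Hypothetical graph with 4 nodes and 5 edges.
adj = {"s": ["a", "b"], "a": ["s", "b", "c"], "b": ["s", "a", "c"], "c": ["a", "b"]}
parent, depth, messages = flood(adj, "s")
```

In this run the hop counts equal the breadth-first distances from s, and the message count lies between |E| and 2|E|.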
Flooding Analysis
Theorem : The message complexity of flooding is Θ(|E|) and the time complexity is Θ(D), where D is the diameter of G.
Proof. The message complexity follows from the fact that each edge delivers the message at least once and at most twice (once in each direction). To show the time complexity, we use induction on t to show that after t time units, the message has already reached every vertex at a distance of t or less from the source.
Broadcast Over a Rooted Spanning Tree
Suppose processors already have information about a rooted spanning tree of the communication topology
• tree: connected graph with no cycles
• spanning tree: contains all processors
• rooted: there is a unique root node
Implemented via parent and children local variables at each processor, which indicate which incident channels lead to parent and children in the rooted spanning tree
Broadcast Over a Rooted Spanning Tree: A Simple Algorithm
1. root initially sends msg to its children
2. when a node receives msg from its parent
   • sends msg to its children
   • terminates (sets a local boolean to true)
Synchronous model:
• time is depth of the spanning tree, which is at most n - 1
• number of messages is n - 1, since one message is sent over each spanning tree edge
Asynchronous model:
• same time and messages
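The message and time counts above can be checked with a small simulation. This is a sketch assuming a synchronous model and a tree stored as a children dictionary; tree_broadcast and the example tree are hypothetical names and data.

```python
def tree_broadcast(children, root):
    """Broadcast over a rooted spanning tree; returns arrival rounds and message count."""
    arrival = {root: 0}
    messages = 0
    frontier = [root]
    while frontier:
        next_frontier = []
        for u in frontier:
            for child in children.get(u, []):
                arrival[child] = arrival[u] + 1   # msg arrives one round after the parent got it
                messages += 1                     # one message per tree edge
                next_frontier.append(child)
        frontier = next_frontier
    return arrival, messages

# Hypothetical rooted spanning tree with n = 7 nodes and depth 3.
children = {"a": ["b", "c"], "b": ["d"], "c": ["e", "f"], "e": ["g"]}
arrival, messages = tree_broadcast(children, "a")
```

Here messages is n - 1 = 6 and the last arrival round equals the depth of the tree.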
Tree Broadcast
Assume that a spanning tree has been constructed.
Theorem. For every n-vertex graph G with a spanning tree T rooted at r0, the message complexity of broadcast is n−1 and time complexity is depth(T).
A broadcast algorithm can be used to construct a spanning tree in G.
The message complexity of broadcast is asymptotically equivalent to the message complexity of spanning tree construction.
Using a breadth-first spanning tree, we get the optimal message and time complexities for broadcast.
Convergecast
Again, suppose a rooted spanning tree has already been computed by the processors
• parent and children variables at each processor
Do the opposite of broadcast:
• leaves send messages to their parents
• non-leaves wait to get message from each child, then send combined info to parent
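A convergecast can be sketched as follows. The code is an illustration under assumptions: the tree is given by parent pointers, every node holds a local value, and combine is any associative function (addition here). The names convergecast, parent, and value are not from the slides.

```python
def convergecast(parent, value, combine):
    """Leaves report first; a node reports once all its children have reported."""
    waiting = {u: 0 for u in parent}          # number of children still to report
    for u, p in parent.items():
        if p is not None:
            waiting[p] += 1
    acc = dict(value)                          # combined info collected so far
    ready = [u for u in parent if waiting[u] == 0]   # the leaves
    messages = 0
    while ready:
        u = ready.pop()
        p = parent[u]
        if p is None:                          # the root has heard from every child
            return acc[u], messages
        acc[p] = combine(acc[p], acc[u])       # u sends its combined info to its parent
        messages += 1
        waiting[p] -= 1
        if waiting[p] == 0:
            ready.append(p)

# Hypothetical tree rooted at "a"; each node contributes the value 1.
parent = {"a": None, "b": "a", "c": "a", "d": "b", "e": "c"}
value = {u: 1 for u in parent}
total, messages = convergecast(parent, value, lambda x, y: x + y)
```

With addition as the combining function the root learns the number of nodes, using one message per tree edge.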
Convergecast
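[Figure: convergecast example on a tree with nodes a to h rooted at a. Messages sent upward carry combined information, e.g. "d", "e,g", "f,h", "c,f,h", "b,d". Solid arrows: parent-child relationships. Dotted lines: non-tree edges.]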
Finding a Spanning Tree Given a Root
• a distinguished processor is known, to serve as the root
• root sends M to all its neighbors
• when non-root first gets M
   - set the sender as its parent
   - send "parent" msg to sender
   - send M to all other neighbors
• when get M otherwise
   - send "reject" msg to sender
• use "parent" and "reject" msgs to set children variables and know when to terminate
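The rules above can be simulated with a pool of in-transit messages delivered in arbitrary order, which models asynchrony. This is a sketch, not the slides' algorithm text: build_spanning_tree, the seed parameter, and the example graph are assumptions, and termination detection at each node is not modeled (the simulation simply stops when no message is in transit).

```python
import random

def build_spanning_tree(adj, root, seed=0):
    """Asynchronous spanning tree construction with M, "parent" and "reject" messages."""
    rng = random.Random(seed)
    parent = {root: None}
    children = {u: [] for u in adj}
    in_transit = [(root, w, "M") for w in adj[root]]
    sent = len(in_transit)
    while in_transit:
        # Deliver any pending message next (arbitrary delays).
        sender, receiver, kind = in_transit.pop(rng.randrange(len(in_transit)))
        if kind == "M":
            if receiver not in parent:         # first M: adopt the sender as parent
                parent[receiver] = sender
                out = [(receiver, sender, "parent")]
                out += [(receiver, w, "M") for w in adj[receiver] if w != sender]
            else:                              # M received again (or by the root)
                out = [(receiver, sender, "reject")]
            in_transit += out
            sent += len(out)
        elif kind == "parent":
            children[receiver].append(sender)
    return parent, children, sent

# Hypothetical graph with 4 nodes and 5 edges.
adj = {"s": ["a", "b"], "a": ["s", "b", "c"], "b": ["s", "a", "c"], "c": ["a", "b"]}
parent, children, sent = build_spanning_tree(adj, "s", seed=1)
```

Different seeds can produce different trees, which illustrates why the asynchronous result is not necessarily a BFS tree, while the message count stays O(m).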
Execution of Spanning Tree Alg.
[Figure: two executions of the algorithm on a graph with nodes a to h.]
Synchronous: always gives a breadth-first search (BFS) tree.
Asynchronous: not necessarily a BFS tree.
Both models: O(m) messages, O(diam) time.
2.3 Distributed Minimum Spanning Tree Algorithms
Minimum Spanning Tree
Minimum spanning tree. Given a connected graph G = (V, E) with real-valued edge weights ce, an MST is a subset of the edges T ⊆ E such that T is a spanning tree whose sum of edge weights is minimized.
Cayley's Theorem. There are n^(n-2) spanning trees of Kn, so the problem can't be solved by brute force.
[Figure: a weighted graph G = (V, E) and an MST T whose edge costs ce sum to 50.]
Applications
MST is a fundamental problem with diverse applications.
Network design
– telephone, electrical, hydraulic, TV cable, computer, road
Approximation algorithms for NP-hard problems
– traveling salesperson problem, Steiner tree
Indirect applications
– max bottleneck paths
– LDPC codes for error correction
– image registration with Renyi entropy
– learning salient features for real-time face verification
– reducing data storage in sequencing amino acids in a protein
– model locality of particle interactions in turbulent fluid flows
– autoconfig protocol for Ethernet bridging to avoid cycles in a network
Cluster analysis.
Greedy Algorithms
Kruskal's algorithm. Start with T = ∅. Consider edges in ascending order of cost. Insert edge e in T unless doing so would create a cycle.
Reverse-Delete algorithm. Start with T = E. Consider edges in descending order of cost. Delete edge e from T unless doing so would disconnect T.
Prim's algorithm. Start with some root node s and greedily grow a tree T from s outward. At each step, add the cheapest edge e to T that has exactly one endpoint in T.
Remark. All three algorithms produce an MST.
Greedy Algorithms
Simplifying assumption. All edge costs ce are distinct.
Cut property. Let S be any subset of nodes, and let e be the min cost edge with exactly one endpoint in S. Then the MST contains e.
Cycle property. Let C be any cycle, and let f be the max cost edge belonging to C. Then the MST does not contain f.
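[Figure: left, a cut S with its minimum cost crossing edge e (e is in the MST); right, a cycle C with its maximum cost edge f (f is not in the MST).]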
Cycles and Cuts
Cycle. Set of edges of the form a-b, b-c, c-d, …, y-z, z-a.
Cutset. A cut is a subset of nodes S. The corresponding cutset D is the subset of edges with exactly one endpoint in S.
[Figure: example graph on nodes 1 to 8, shown once with the cycle and once with the cut highlighted.]
Cycle C = 1-2, 2-3, 3-4, 4-5, 5-6, 6-1
Cut S = { 4, 5, 8 }
Cutset D = 5-6, 5-7, 3-4, 3-5, 7-8
Cycle-Cut Intersection
Claim. A cycle and a cutset intersect in an even number of edges.
Pf. (by picture)
[Figure: the cycle C crossing between S and V - S.]
Cycle C = 1-2, 2-3, 3-4, 4-5, 5-6, 6-1
Cutset D = 3-4, 3-5, 5-6, 5-7, 7-8
Intersection = 3-4, 5-6
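The cutset and the even-intersection claim can be checked mechanically for this example. The sketch below is an assumption-laden illustration: the full edge set of the pictured graph is not recoverable from the text, so the edge list contains only the edges named in the captions.

```python
def cutset(edges, S):
    """Edges with exactly one endpoint in S."""
    return {e for e in edges if (e[0] in S) != (e[1] in S)}

# Only the edges named on the slide (the figure may contain more).
cycle = {(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 1)}
edges = cycle | {(3, 5), (5, 7), (7, 8)}

D = cutset(edges, {4, 5, 8})
common = D & cycle
```

For S = { 4, 5, 8 } this reproduces the cutset D listed above, and the intersection with the cycle has two edges, an even number.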
Greedy Algorithms
Simplifying assumption. All edge costs ce are distinct.
Cut property. Let S be any subset of nodes, and let e be the min cost edge with exactly one endpoint in S. Then the MST T* contains e.
Pf. (exchange argument)
• Suppose e does not belong to T*, and let's see what happens.
• Adding e to T* creates a cycle C in T*.
• Edge e is both in the cycle C and in the cutset D corresponding to S ⇒ there exists another edge, say f, that is in both C and D.
• T' = T* ∪ { e } - { f } is also a spanning tree.
• Since ce < cf, cost(T') < cost(T*).
• This is a contradiction. ▪
[Figure: tree T*, cut S, and edges e and f crossing the cut.]
Greedy Algorithms
Simplifying assumption. All edge costs ce are distinct.
Cycle property. Let C be any cycle in G, and let f be the max cost edge belonging to C. Then the MST T* does not contain f.
Pf. (exchange argument)
• Suppose f belongs to T*, and let's see what happens.
• Deleting f from T* creates a cut S in T*.
• Edge f is both in the cycle C and in the cutset D corresponding to S ⇒ there exists another edge, say e, that is in both C and D.
• T' = T* ∪ { e } - { f } is also a spanning tree.
• Since ce < cf, cost(T') < cost(T*).
• This is a contradiction. ▪
[Figure: tree T*, cut S, and edges e and f crossing the cut.]
Prim's Algorithm: Proof of Correctness
Prim's algorithm. [Jarník 1930, Dijkstra 1957, Prim 1959]
• Initialize S = any node.
• Apply cut property to S.
• Add min cost edge in cutset corresponding to S to T, and add one new explored node u to S.
Implementation: Prim's Algorithm
Implementation. Use a priority queue à la Dijkstra.
• Maintain set of explored nodes S.
• For each unexplored node v, maintain attachment cost a[v] = cost of cheapest edge v to a node in S.
• O(n^2) with an array; O(m log n) with a binary heap.

Prim(G, c) {
   foreach (v ∈ V) a[v] ← ∞
   Initialize an empty priority queue Q
   foreach (v ∈ V) insert v onto Q
   Initialize set of explored nodes S ← ∅

   while (Q is not empty) {
      u ← delete min element from Q
      S ← S ∪ { u }
      foreach (edge e = (u, v) incident to u)
         if ((v ∉ S) and (ce < a[v]))
            decrease priority a[v] to ce
   }
}
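A runnable variant of Prim's algorithm is sketched below. Python's heapq has no decrease-priority operation, so this version swaps in the common "lazy deletion" technique: edges are pushed onto the heap and stale entries are skipped when popped. The graph format, the function name prim, and the example weights are assumptions.

```python
import heapq

def prim(adj, start):
    """Prim's algorithm with a binary heap and lazy deletion; returns the tree edges."""
    explored = {start}
    tree = []
    heap = [(cost, start, v) for v, cost in adj[start]]
    heapq.heapify(heap)
    while heap and len(explored) < len(adj):
        cost, u, v = heapq.heappop(heap)
        if v in explored:
            continue                      # stale entry: v was attached by a cheaper edge
        explored.add(v)
        tree.append((u, v, cost))         # cheapest edge leaving the explored set
        for w, c in adj[v]:
            if w not in explored:
                heapq.heappush(heap, (c, v, w))
    return tree

# Hypothetical weighted graph: adjacency lists of (neighbor, cost) pairs.
adj = {
    "a": [("b", 4), ("c", 1)],
    "b": [("a", 4), ("c", 2), ("d", 5)],
    "c": [("a", 1), ("b", 2), ("d", 8)],
    "d": [("b", 5), ("c", 8)],
}
tree = prim(adj, "a")
```

Each popped edge that reaches an unexplored node is the minimum cost edge in the current cutset, so the cut property justifies adding it.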
Kruskal's Algorithm: Proof of Correctness
Kruskal's algorithm. [Kruskal, 1956]
• Consider edges in ascending order of weight.
• Case 1: If adding e to T creates a cycle, discard e according to cycle property.
• Case 2: Otherwise, insert e = (u, v) into T according to cut property where S = set of nodes in u's connected component.
[Figure: Case 1, edge e closes a cycle; Case 2, edge e = (u, v) leaves the component S of u.]
Implementation: Kruskal's Algorithm
Implementation. Use the union-find data structure.
• Build set T of edges in the MST.
• Maintain set for each connected component.
• O(m log n) for sorting and O(m α(m, n)) for union-find.
  (m ≤ n^2 ⇒ log m is O(log n); α(m, n) is essentially a constant)

Kruskal(G, c) {
   Sort edges weights so that c1 ≤ c2 ≤ ... ≤ cm.
   T ← ∅

   foreach (u ∈ V) make a set containing singleton u

   for i = 1 to m
      (u,v) = ei
      if (u and v are in different sets) {      (are u and v in different connected components?)
         T ← T ∪ {ei}
         merge the sets containing u and v      (merge two components)
      }
   return T
}
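Kruskal's algorithm with union-find can be sketched in Python as follows. The node numbering 0..n-1, the (cost, u, v) edge tuples, and the example weights are assumptions made for this illustration; path halving and union by rank are one standard way to implement union-find.

```python
def kruskal(n_nodes, edges):
    """Kruskal's algorithm; edges are (cost, u, v) tuples over nodes 0..n_nodes-1."""
    parent = list(range(n_nodes))
    rank = [0] * n_nodes

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    tree = []
    for cost, u, v in sorted(edges):        # ascending order of cost
        ru, rv = find(u), find(v)
        if ru != rv:                        # u and v are in different components
            if rank[ru] < rank[rv]:
                ru, rv = rv, ru
            parent[rv] = ru                 # merge the two components
            if rank[ru] == rank[rv]:
                rank[ru] += 1
            tree.append((cost, u, v))
    return tree

# Hypothetical weighted graph on 4 nodes.
edges = [(4, 0, 1), (1, 0, 2), (2, 1, 2), (5, 1, 3), (8, 2, 3)]
tree = kruskal(4, edges)
```

The edge of cost 4 is discarded because it would close the cycle 0-2-1, which is Case 1 of the correctness argument.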
Distributed Spanning tree construction
Chang-Roberts algorithm {The root is known}
Uses signals and acks, similar to the termination detection algorithm. Uses the same rule for sending acknowledgment.
[Figure: example graph with nodes 0 to 5 and a designated root.]
For a graph G=(V,E), a spanning tree is a maximally connected subgraph T=(V,E’), E’ ⊆ E, such that if one more edge is added, then the subgraph is no longer a tree. Used for broadcasting in a network.
Question: What if the root is not designated?
Chang-Roberts Spanning Tree Algorithm
program probe-echo
define N : integer (no. of neighbors)
       C, D : integer;
initially parent := i; C=0; D=0;

{for the initiator}
send probes to each neighbor;
D := no. of neighbors;
do D≠0 ∧ echo -> D := D-1 od
{D=0 signals end}

{for a non-initiator process i>0}
do probe ∧ parent=i ∧ C=0 -> C := 1; parent := sender;
      if i is not a leaf -> send probes to non-parent neighbors;
         D := no. of non-parent neighbors
      fi
[] echo -> D := D-1
[] probe ∧ sender ≠ parent -> send echo to sender
[] C=1 ∧ D=0 -> send echo to parent; C := 0
od
Graph traversal
Many applications of exploring an unknown graph by a visitor (a token or mobile agent or a robot). The goal of traversal is to visit every node at least once, and return to the starting point.
- How efficiently can this be done?
- What is the guarantee that all nodes will be visited?
- What is the guarantee that the algorithm will terminate?
Consider web-crawlers, exploration of social networks, graph layouts for visualization or drawing etc.
Graph traversal and Spanning Tree Formation
Tarry’s algorithm is one of the oldest (1895)
Rule 1. Send the token towards each neighbor exactly once.
Rule 2. If rule 1 is not applicable, then send the token to the parent.
[Figure: example graph with nodes 0 to 6; node 0 is the root.]
A possible route is: 0 1 2 5 3 1 4 6 2 6 4 1 3 5 2 1 0
Nodes and their parent pointers generate a spanning tree that may not be DFS
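The two rules can be simulated directly. In the sketch below the edge set is inferred from the route on the slide (the figure itself is not available), and the neighbor orderings are chosen by assumption so that the simulation follows that same route; tarry and adj are illustrative names.

```python
def tarry(adj, root):
    """Tarry's traversal; returns the token route and the parent pointers."""
    parent = {root: None}
    sent = {v: set() for v in adj}     # neighbors each node has already sent the token to
    route = [root]
    cur = root
    while True:
        # Rule 1: send the token to a neighbor not used yet, keeping the parent for last.
        nxt = next((w for w in adj[cur]
                    if w not in sent[cur] and w != parent[cur]), None)
        if nxt is None:
            # Rule 2: no other choice left, so send the token to the parent.
            nxt = parent[cur]
            if nxt is None or nxt in sent[cur]:
                break                  # the token is back at the root and the traversal ends
        sent[cur].add(nxt)
        if nxt not in parent:
            parent[nxt] = cur          # first visit defines the parent pointer
        route.append(nxt)
        cur = nxt
    return route, parent

# Edges inferred from the route on the slide: 0-1, 1-2, 2-5, 5-3, 3-1, 1-4, 4-6, 6-2.
adj = {0: [1], 1: [0, 2, 4, 3], 2: [1, 5, 6], 3: [5, 1],
       4: [1, 6], 5: [2, 3], 6: [4, 2]}
route, parent = tarry(adj, 0)
```

In the resulting tree the non-tree edge 6-2 joins two nodes where neither is an ancestor of the other, so the tree is not a DFS tree.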
Distributed MST
Def MST Fragment : In a weighted graph G = (V,E,w), a tree T in G is called an MST fragment of G iff there exists an MST of G such that T is a subgraph of that MST.
Def MWOE : An edge e is an outgoing edge of an MST fragment T iff exactly one of its endpoints belongs to T. The minimum weight outgoing edge is denoted MWOE(T).
Lemma : Consider an MST fragment T of a graph G = (V, E, w). Let e = MWOE(T). Then T ∪ e is an MST fragment as well.
Proof : Let TM be an MST containing T. If TM contains e we are done. Otherwise, adding e to TM creates a graph C with a cycle through e. Since e is an outgoing edge of T, this cycle contains another edge e’ that connects T to the rest of TM. Clearly, e’ is an outgoing edge of T and w(e’) >= w(e). Discarding e’ from C yields a new spanning tree T’M with w(T’M) <= w(TM). Since TM is an MST, T’M is also an MST, and it contains T ∪ e.
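The lemma suggests a simple way to grow an MST: every fragment repeatedly adds its MWOE. The sketch below is a sequential, Borůvka-style illustration of that idea only. It is not the GHS algorithm (there are no levels and no messages), it assumes distinct edge weights, and the names mwoe, grow_fragments, and the example graph are hypothetical.

```python
def mwoe(edges, fragment):
    """Minimum weight outgoing edge of a fragment (a set of nodes), or None."""
    outgoing = [(w, u, v) for (w, u, v) in edges
                if (u in fragment) != (v in fragment)]
    return min(outgoing, default=None)

def grow_fragments(nodes, edges):
    """Every fragment adds its MWOE until a single fragment remains."""
    name = {v: v for v in nodes}            # fragment name of each node
    tree = set()
    while len(set(name.values())) > 1:
        chosen = set()
        for f in set(name.values()):
            members = {v for v in nodes if name[v] == f}
            e = mwoe(edges, members)
            if e is not None:
                chosen.add(e)
        if not chosen:                       # graph is disconnected
            break
        for w, u, v in sorted(chosen):
            if name[u] != name[v]:
                tree.add((w, u, v))
                old, new = name[v], name[u]
                for x in nodes:              # both fragments take the same name
                    if name[x] == old:
                        name[x] = new
    return tree

# Hypothetical weighted graph with distinct weights; edges are (weight, u, v).
edges = [(4, 0, 1), (1, 0, 2), (2, 1, 2), (5, 1, 3), (8, 2, 3)]
tree = grow_fragments([0, 1, 2, 3], edges)
```

Because every added edge is the MWOE of some fragment, each intermediate forest consists of MST fragments, which is what the lemma guarantees.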
Minimum Spanning Tree
Given a weighted graph G = (V, E), generate a spanning tree T = (V, E’) such that the sum of the weights of all the edges is minimum.
Applications
• On Euclidean plane, approximate solutions to the traveling salesman problem. (The traveling salesman problem asks for the shortest route to visit a collection of cities and return to the starting point.)
• Lease phone lines to connect the different offices with a minimum cost
• Visualizing multidimensional data (how entities are related to each other)
We are interested in distributed algorithms only
Sequential algorithms for MST
Review (1) Prim’s algorithm and (2) Kruskal’s algorithm.
Theorem. If the weight of every edge is distinct, then the MST is unique.
[Figure: weighted example graph (nodes 0 to 6, edge weights 1 to 9) with two fragments T1 and T2 and an edge e between them.]
Gallager-Humblet-Spira (GHS) Algorithm
GHS is a distributed version of Prim’s algorithm.
Bottom-up approach. MST is recursively constructed by fragments joined by an edge of least cost.
[Figure: two fragments, with edge weights 3, 7, and 5 shown.]
Challenges
[Figure: the example graph with fragments T1 and T2 and edge e.]
Challenge 1. How will the nodes in a given fragment identify the edge to be used to connect with a different fragment?
A root node in each fragment is the coordinator
Challenges
[Figure: the example graph with fragments T1 and T2 and edge e.]
Challenge 2. How will a node in T1 determine if a given edge connects to a node of a different tree T2 or the same tree T1? Why will node 0 choose the edge e with weight 8, and not the edge with weight 4?
Nodes in a fragment acquire the same name before augmentation.
Two main steps
Each fragment has a level. Initially each node is a fragment at level 0.
(MERGE) Two fragments at the same level L combine to form a fragment of level L+1
(ABSORB) A fragment at level L is absorbed by another fragment at level L’ (L < L’)
Least weight outgoing edge
To test if an edge is outgoing, each node sends a test message through a candidate edge. The receiving node may send accept or reject.
Root broadcasts initiate in its own fragment, collects the report from other nodes about eligible edges using a convergecast, and determines the least weight outgoing edge.
[Figure: the example graph with fragments T1 and T2; a node sends test over a candidate edge and receives reject or accept.]
Accept or reject?
Let i send test to j.
Case 1. If name (i) = name (j) then send reject
Case 2. If name (i) ≠ name (j) ∧ level (i) ≤ level (j) then send accept
Case 3. If name (i) ≠ name (j) ∧ level (i) > level (j) then wait until level (j) = level (i).
Levels can only increase.
Question: Can fragments wait for ever and lead to a deadlock?
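The three cases can be written as a small decision function. This is only a sketch of the rule as stated on the slide: respond_to_test and its parameters are hypothetical names, and the "wait" result stands for deferring the reply until the receiver's level has caught up.

```python
def respond_to_test(name_i, level_i, name_j, level_j):
    """Reply of node j (fragment name_j, level_j) to a test sent by node i."""
    if name_i == name_j:
        return "reject"        # Case 1: same fragment, the edge is not outgoing
    if level_i <= level_j:
        return "accept"        # Case 2: different fragment, sender's level is not higher
    return "wait"              # Case 3: delay the reply until level_j reaches level_i

replies = [
    respond_to_test("T1", 2, "T1", 2),
    respond_to_test("T1", 2, "T2", 3),
    respond_to_test("T1", 5, "T2", 3),
]
```

The third call corresponds to the delayed-response situation: a level 5 fragment tests a node whose fragment is still at level 3.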
Delayed response
[Figure: fragments A (level 5) and B (level 3), with join, initiate, and test messages on the edge between them.]
B is about to change its level to 5. So B does not send an accept response to A in response to test.
The major steps
Repeat
   Test edges as outgoing or not
   Determine lwoe - it becomes a tree edge
   Send join (or respond to join)
   Update level & name & identify new coordinator
until done
Classification of edges
• Basic (initially all edges are basic)
• Branch (all tree edges)
• Rejected (not a tree edge)
Branch and rejected are stable attributes
Wrapping it up
Merge
The edge through which the join message is sent changes its status to branch, and becomes a tree edge.
Each root broadcasts an (initiate, L+1, name) message to the nodes in its own fragment.
[Figure: example of merge. (a) L = L’: fragments T (level L) and T’ (level L’) exchange (join, L, T) and (join, L’, T’). (b) L > L’: T’ sends (join, L’, T’) to T.]
Wrapping it up
Absorb
T’ receives an initiate message. This indicates that the fragment at the lower level has been absorbed by the other fragment at the higher level. They collectively search for the lwoe.
The edge through which the join message was sent changes its status to branch.
[Figure: example of absorb, with the same two panels as for merge. In panel (b), L > L’: T’ sends (join, L’, T’) to T and receives initiate in return.]
Example
[Figure: four snapshots of GHS on a weighted graph with nodes 0 to 6 and edge weights 1 to 9: the initial graph, three merges, then a merge and an absorb, then a final absorb.]
Message complexity
At least two messages (test + reject) must pass through each rejected edge. The upper bound is 2|E| messages.
At each of the log N levels, a node can receive at most (1) one initiate message, (2) one accept message, (3) one join message, (4) one test message not leading to a rejection, and (5) one changeroot message.
So, the total number of messages has an upper bound of 2|E| + 5N log N
MST Algorithms: Theory
Deterministic comparison based algorithms.
• O(m log n) [Jarník, Prim, Dijkstra, Kruskal, Boruvka]
• O(m log log n). [Cheriton-Tarjan 1976, Yao 1975]
• O(m β(m, n)). [Fredman-Tarjan 1987]
• O(m log β(m, n)). [Gabow-Galil-Spencer-Tarjan 1986]
• O(m α(m, n)). [Chazelle 2000]
Holy grail. O(m).
Notable.
• O(m) randomized. [Karger-Klein-Tarjan 1995]
• O(m) verification. [Dixon-Rauch-Tarjan 1992]
Euclidean.
• 2-d: O(n log n). compute MST of edges in Delaunay
• k-d: O(k n^2). dense Prim
Distributed MST Algorithms
Gallager, Humblet, & Spira ’83: O(n log n) running time; message: O(|E| + n log n) (optimal)
Chin & Ting ’85: O(n log log n) time
Gafni ’85: O(n log* n)
Awerbuch ’87: O(n), existentially optimal
Garay, Kutten, & Peleg ’98: O(D + n^0.61), Diameter D
Kutten & Peleg ’98: O(D + √n log* n)
Elkin ’04: Õ(μ + √n), μ is called MST radius
– Cannot detect termination unless μ is given as input.
Peleg & Rubinovich (’99) showed a lower bound of Ω̃(√n) for running time.
Distributed Graph Algs : Other areas of interest
Distributed Cycle/Knot Detection
Distributed Center Finding
Distributed Connected Dominating Set Construction in MANETs, WSNs
Distributed Clustering based on Graph Partitioning
References
Introduction to Graph Theory, Douglas West, Prentice Hall, 2000 (basics)
Graph Theory and Its Applications, Gross and Yellen, CRC Press, 1998 (basics)
Distributed Algorithm Course notes, J. Welch, TAMU (flooding and tree algorithms)
CS590A Fall 2007, G. Pandurangan, Purdue University
Distributed Computing Principles Course Notes, Roger Wattenhofer, ETH (coloring algorithms)
Algorithm Design, Kleinberg, Tardos, Addison-Wesley, 2005 (MST dependent)
22C:166 Distributed Systems and Algorithms Course, Sukumar Ghosh, University of Iowa (routing part heavily dependent)