

  • A NAME-INDEPENDENT COMPACT MULTICAST ROUTING (CMR) ALGORITHM

    TECHNICAL REPORT Version 1 (March 2011)

    Pedro Pedroso1, Dimitri Papadimitriou2 and Davide Careglio1

    1 Barcelona-Tech UPC, Barcelona, Spain {ppedroso,careglio}@upc.edu
    2 Alcatel-Lucent Bell, Antwerp, Belgium, [email protected]

    1 Problem Statement

    Motivations:

    Today’s Internet routing system of large backbone operators is facing scalability problems. With the boom of multimedia streaming and content, multicast distribution from a source to a set of destination nodes is (re-)gaining interest as a bandwidth-saving technique competing with or complementing cached content distribution. Nevertheless, the scaling problems faced in the 1990s, when multicast received the main attention of the research community, remain unaddressed. Indeed, routing protocol dependent multicast routing schemes (such as Distance Vector Multicast Routing Protocol and Multicast Open Shortest Path First) have been replaced by routing protocol independent schemes such as Protocol Independent Multicast (PIM) and Core Based Trees (CBT), but overlaying multicast routing on top of unicast suffers from the same scaling limitations as current unicast routing, with the addition of the level of indirection introduced by the multicast routing application.

    A multicast routing protocol enables routers to build a delivery tree between the sender(s) and receivers of a multicast group. The multicast routing table includes the Multicast Routing Information Base (MRIB) and the multicast Tree Information Base (TIB). The MRIB is the topology table, typically derived from the unicast routing table, which carries multicast-specific topology information. The TIB is the collection of router state created from the exchange of Join/Prune messages; it stores the state of all multicast distribution trees at that router. At the same time, the current size and growth rate of the core BGP routing table are reaching unbearable values. Neither full global information about all possible multicast sets nor the complete topological information can be maintained anymore if routing is to cope with the growth of Internet multimedia applications: the memory-space consumption is huge. Identifying the major factors that drive routing table growth, the constraints in router technology, and the limitations of today’s Internet addressing architecture is crucial to define the next steps towards effective solutions [RFC4984].

    Compact routing schemes address the fundamental tradeoff between the memory space required to store the routing table entries and the length of the routing paths that these schemes produce. This work introduces a compact routing scheme that allows the distribution of traffic from any source to any set of leaf nodes along a multicast routing path that defines a distribution tree. By means of the proposed scheme, a multicast distribution tree dynamically evolves according to the arrival of leaf-initiated join/leave requests. Two reference multicast routing schemes (the Shortest Path Tree and the Steiner Tree algorithm) are used to evaluate and compare the performance of the proposed scheme. The performance metrics considered include the stretch of the produced routing paths, the size and the number of routing table entries, and the communication cost. The results obtained by simulation, both on synthetic power-law graphs (modeling the Internet topology) and on real topologies such as the CAIDA Internet topology maps comprising 16k and 32k nodes, show that our scheme can successfully handle leaf-initiated dynamic setup of multicast distribution trees. While increasing the communication cost compared to the Shortest Path Tree, the proposed scheme achieves a considerable reduction of the routing table size compared to both reference schemes. Moreover, the stretch of the resulting multicast routing paths shows limited deterioration compared to the minimum value obtained with Steiner Trees. This work thus combines results from graph theory and networking.

    Our Contributions:

    The present work introduces the first known name-independent compact multicast routing (CMR) algorithm enabling the leaf-initiated, distributed and dynamic construction of point-to-multipoint (p2mp) routing paths from any source to any set of destinations (or leaves). These paths define a multicast traffic distribution tree (MDT), since the algorithm instantiates the local state so that each MDT node can derive the entries to forward multicast traffic received from the source to its leaves. We simulate the CMR algorithm on synthetic power-law graphs modeling the Internet topology (comprising 10k and 16k nodes) and on the CAIDA map of the Internet topology (comprising 16k and 32k nodes). To evaluate the CMR performance, we measure the stretch of the routing paths it produces, the memory space required to store the RT entries, and the communication cost, i.e., the number of messages exchanged to build the MDT. Two reference schemes, the Shortest Path Tree (SPT) and the Steiner Tree (ST) algorithm, are used to compare its performance over the same topologies.

    Simulation results confirm that the CMR can provide a suitable algorithmic basis for balanced stretch and memory space consumption. Substantial gains are obtained in terms of the number of RT entries and the memory space required to store them. Compared to the SPT, the gain in memory space consumption results from the elimination of the underlying unicast RT entries whereas, compared to the ST, this gain is mainly due to the elimination of the RT entries required at each step of the routing path construction. The proposed two-phase search process keeps the CMR communication cost within reasonable bounds compared to the reference SPT scheme and sub-linearly proportional to the multicast group size. Further work will nevertheless be conducted to decrease the communication cost of the CMR so that it reaches its saturation level for smaller multicast group sizes. Another main area of investigation is the CMR performance on real topologies such as the CAIDA Internet topology maps, which comprise 32k nodes.

    2 State-of-the-Art

    The problem of designing routing schemes with small routing table (RT) size was introduced at the very beginning of the Internet, in the 1970s, by Kleinrock and Kamoun in their seminal work “Hierarchical Routing for Large Networks; Performance Evaluation and Optimization, Computer Networks” [Kleinroch77], and the theoretical aspects of compact routing were mainly developed in the late 1980s by the work of Peleg and Upfal [Peleg89]. What we learn from these works, and from several subsequent ones, is that every weighted network with $n$ nodes has a routing scheme with RTs of approximately $n^{1/k}$ memory space per router such that the length of any route of the scheme is no more than $O(k)$ times the optimal length (i.e., the distance), where $k$ is any integer parameter $> 0$. Recent results lead to significant improvements on the stretch produced by the routing scheme, i.e., the factor $O(k)$, and on the capability of the scheme. [Thorup01] demonstrated for general weighted undirected networks that a name-dependent handshaking-based routing scheme that uses $O(n^{1/k})$ bits of memory at each router has stretch $2k - 1$ (for every integer $k \geq 2$). The authors also demonstrated that, without handshaking, the stretch of the routing scheme increases to $4k - 5$. In name-dependent (or labeled) routing schemes, addresses (or labels) encode some topological information. As labels cannot be arbitrary, any topological change implies a node address change (renaming).

    A lot of attention has been given to “name-independent” compact routing schemes. [Abraham08] presents the first compact name-independent routing scheme for arbitrary undirected weighted graphs. The routing scheme has stretch 3, and requires poly-logarithmic-bit headers and $O(n^{1/2})$ bits of routing information per node. When routing along its stretch-3 paths, each routing decision is performed in constant time. The fundamental result is that, with $O(n^{1/2})$ bits of routing information per node, topology-dependent node labels do not improve the stretch factor compared to topology-independent naming of nodes. Indeed, in name-independent routing schemes, the addressing space is arbitrary and topology-unaware, i.e., independent of the topology, and thus highly desirable for an Internet routing system supporting dynamic terminal multi-homing, mobility, etc.

    Recent schemes developed in [Abraham06c] [Abraham06d] can support "name-independence" with a stretch in $O(k)$ for $O(n^{1/k})$ memory space, where the hidden constant is about one hundred. Recent studies have applied compact routing schemes to Internet-like topologies by taking advantage of their topological properties (network diameter growing logarithmically in the number of nodes and node degree distribution following a power law). [Krioukov04] showed that the average performance of the stretch-3 compact routing scheme of [Thorup01] on Internet-like topologies is much better than its worst case: it achieves an average stretch of 1.1 (with up to 70% of all pair-wise paths being stretch-1 shortest paths). Nevertheless, the drawbacks resulting from the name-dependence of the routing scheme remain unaddressed and limit its applicability to static topologies (it is thus inapplicable to dynamic and evolving topologies such as the Internet). The name-independent general scheme of [Abraham08] has thus been trialed on Internet-like topologies, but it leads to an average stretch of 1.5 (thus worse on average than its name-dependent counterpart). Despite the amount of work devoted to the properties of large-scale networks, most of the previous results achieved with compact routing schemes consider only static networks. Unfortunately, existing compact routing schemes handle neither node/link insertion/removal (characteristic of the network evolution) nor failures (intermittent or permanent). Hence, compact routing schemes cannot cope with network topology dynamics.

    Another emerging topic, pursuing a similar objective, is the use of hyperbolic space for the Internet. In [Boguñá10], the authors present a method to map the Internet to a hyperbolic space, with the aim of resolving serious scaling limitations that the Internet faces today. Besides its immediate practical viability, such a network mapping method can provide a different perspective on the community structure in complex networks. According to the authors, existing Internet routing, which relies on topological information only and uses no geographical or geometrical information, is unsustainable. As the authors say, “We compare routing in the internet today to using a hypothetical road atlas, which is really just a long encoded list of road intersections and connections that would require drivers to pore through each line to plot a course to their destination without using any geographical, or geometrical, information which helps us navigate through the space in real life.”

    More interesting works:
    - Euler Project introduction documents.
    - Hierarchical Routing (see [Kleinroch77]).
    - Name-dependent: topology-aware in node addressing.
    - Check "Internet Observatory Annual 2009" [Arbor09]: analysis of traffic trends in the Internet.

    3 Benchmark Routing Algorithms

    We consider the Shortest-Path Tree (SPT) and the Steiner Tree (ST) algorithms to benchmark the proposed algorithm. The SPT is the best way to construct optimal source-based distribution trees: it yields the optimal path cost but a higher resource consumption. Thus, it provides the communication cost reference. On the other hand, the ST is the best way to create optimal shared distribution trees; here we consider an optimized algorithm based on shortest-path distances (i.e., the ST algorithm). It is the reference in terms of stretch. In order to obtain a near-optimal solution for the ST, we consider an ST Integer Linear Programming formulation. For this purpose, we have adapted the formulation provided in [SAGE] so that it can be computed on bi-directional graphs.

    3.1 Shortest-Path Tree Algorithm

    The SPT is a connected subgraph without cycles (i.e., a tree) of a given weighted graph such that the distance/cost between a selected source node and any other node of a multicast group, g, is minimal. It is rooted at the multicast source node, s. Dijkstra’s algorithm is used to compute the SPT from a given vertex.

    Whenever a node wants to join the SPT, it sends an (S,G) join message out on the proper interface towards the source node for that group, using the proper multicast protocol, to inform the upstream node that it wishes to join the distribution tree for that group. The upstream node receiving such a message adds the interface on which the message was received and then sends an (S,G) join message out of the interface towards the source. The process is repeated at every subsequent node, building the SPT as it goes. The process stops when the join message reaches i) the multicast source node or ii) a node that already has multicast forwarding state for this source-group pair. In either case, the branch is created, each of its nodes has multicast forwarding state for the source-group pair, and packets can flow down the distribution tree from source to receiver.
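The join procedure described above can be illustrated with a minimal Python sketch. It is not the report's implementation: the graph representation (a dict of neighbor-to-cost dicts), the function names `dijkstra_parents` and `join`, and the `state` dictionary standing in for the per-node (S,G) forwarding state are all assumptions made for illustration.

```python
import heapq

def dijkstra_parents(graph, source):
    """Map each reachable node to its upstream neighbor on a shortest path to source."""
    dist = {source: 0}
    parent = {source: None}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry, a shorter path was already found
        for v, w in graph[u].items():
            nd = d + w
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    return parent

def join(parent, state, leaf):
    """Propagate an (S,G) join from leaf towards the source.

    state maps each on-tree node to the set of downstream neighbors
    (hypothetical stand-in for its outgoing interfaces); it must initially
    contain the source. Returns the number of join messages sent.
    """
    messages = 0
    node = leaf
    if node in state:
        return messages  # already on the tree, nothing to send
    state[node] = set()
    while True:
        upstream = parent[node]
        messages += 1  # one join message sent to the upstream node
        had_state = upstream in state
        state.setdefault(upstream, set()).add(node)
        if had_state:
            break  # reached the source or a node with existing (S,G) state
        node = upstream
    return messages
```

In this sketch the number of join messages equals the number of hops between the joining leaf and the first node that already holds forwarding state, which matches the stopping conditions i) and ii) in the text.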

    3.2 Steiner Tree Algorithm

    The Steiner problem asks for a shortest network which spans a given set of points. Minimum spanning networks have been well-studied when all connections are required to be between the given points. More information about ST heuristics can be found in [Hwang92].

    3.2.1 Minimum-Path Heuristic

    The computation of a minimum-cost Steiner tree is an NP-complete problem [Garey77]. As a first step to approach our problem, we have implemented the minimum cost path heuristic algorithm (MPH) to compute a minimum-cost Steiner tree for a multicast connection. In MPH, starting from a source node, the tree is gradually grown until it spans all destination nodes belonging to a multicast group. The growth is usually based on the addition of shortest paths between destination nodes already in the tree and destination nodes not yet in the tree. A full description of the heuristic algorithm can be found in [Takahashi80].
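A minimal Python sketch of a minimum cost path heuristic of this kind is given below. It is an illustration under assumptions, not the implementation used in the report: it grows the tree by repeatedly attaching the not-yet-connected destination that is closest to any node already in the tree, and the function names, the dict-of-dicts graph representation and the use of a multi-source Dijkstra search are choices made for this example only.

```python
import heapq

def nearest_destination_path(graph, tree_nodes, remaining):
    """Multi-source Dijkstra from the current tree.

    Returns (cost, path) for the remaining destination closest to the tree;
    path starts at a tree node and ends at that destination.
    """
    dist = {u: 0 for u in tree_nodes}
    prev = {u: None for u in tree_nodes}
    heap = [(0, u) for u in tree_nodes]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        if u in remaining:
            path = [u]
            while prev[path[-1]] is not None:
                path.append(prev[path[-1]])
            return d, path[::-1]
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    raise ValueError("some destination is unreachable")

def mph_steiner_tree(graph, source, destinations):
    """Grow a tree from source until it spans all destinations (heuristic)."""
    tree_nodes = {source}
    tree_edges = set()
    remaining = set(destinations) - tree_nodes
    total_cost = 0
    while remaining:
        cost, path = nearest_destination_path(graph, tree_nodes, remaining)
        for u, v in zip(path, path[1:]):
            tree_edges.add(frozenset((u, v)))
        tree_nodes.update(path)
        remaining -= tree_nodes  # destinations crossed on the way are covered too
        total_cost += cost
    return tree_edges, total_cost
```

Because the result is heuristic, its cost can exceed that of the optimal Steiner tree, which is why the text also considers an ILP formulation as a reference.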

    3.2.2 Integer Linear Programming

    In order to obtain a near-optimal solution for the ST algorithm, we consider an ST Integer Linear Programming formulation. We here adapt the formulation given in [SAGE] so that it can be computed on a bi-directional graph. It is defined as follows:

    Given a graph G, a cost function c : E(G) → R and a set M of vertices, we want to find an acyclic sub-graph of minimum cost, T, linking all of them together. This sub-graph T of G has V = |V(T)| vertices and E = |E(T)| = |V(T)| − 1 edges and contains each vertex from M. Note that E is a set of bi-directional edges. For this reason, we denote by e_in and e_out any incoming edge and any outgoing edge, respectively.

    Notation:
    - E = edges
    - V = vertices
    - x_e = binary variable indicating if e ∈ sub-graph T
    - x_v = binary variable indicating if vertex v ∈ T
    - c_e = cost of an edge e ∈ E

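    \begin{align}
    \text{minimize} \quad & U = \sum_{e \in E} c_e \, x_e \tag{ILP} \\
    \text{subject to} \quad
    & \sum_{e_{out} \in E,\; e \sim v} x_e \geq 1, && \forall v \in M, \tag{1a} \\
    & \sum_{e_{in} \in E} x_e \leq 0, && v = s, \tag{1b} \\
    & \sum_{e_{in} \in E,\; e_{in} \sim v} x_e \leq 1, && \forall v \in V,\ \forall e_{in} \in E, \tag{1c} \\
    & x_e \leq x_v, && e \sim v,\ \forall v \in V,\ \forall e \in E, \tag{1d} \\
    & \sum_{e_{out} \in E,\; e \sim v} x_e \leq C \Bigl( x_v + \sum_{e_{in} \in E,\; e \sim v} x_e \Bigr), && \forall v \in V, \tag{1e} \\
    & \sum_{e_{out} \in E,\; e \sim v} x_e \geq -C \Bigl( \bigl(1 - [\ldots]\bigr) + \bigl(1 - \sum_{e_{in} \in E,\; e \sim v} x_e \bigr) \Bigr) + 1, && \forall v \in V, \tag{1f} \\
    & \sum_{v \in V} x_v - \sum_{e \in E} x_e = 1, \tag{1g} \\
    & x_e \in \{0, 1\}^{E}, && \forall e \in E, \tag{1h} \\
    & x_v \in \{0, 1\}^{V}, && \forall v \in V. \tag{1i}
    \end{align}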

    The objective of the optimization problem ILP is to minimize the total cost of the links used to connect all the vertices in M. (1a) states that each node of the multicast group has to have at least one of its links, either incoming or outgoing, as part of the final sub-graph; (1b) states that the source node cannot have any of its incoming links as part of the final sub-graph; (1c) states that only one incoming link of a node can be part of the final sub-graph; (1d) means that if a link is part of the sub-graph, then so is its node; (1e) and (1f) guarantee that if a node is not part of the multicast group and has one incoming link, it must have at least one outgoing link as part of the sub-graph too; (1g) states that the total number of nodes is equal to the number of links plus 1.

    ILP Time Complexity:

    \[ O(|M| + N \cdot V + 1 + V + N) = O(|M| + (1 + V)(N + 1)) \]

    4 Compact Routing

    A routing scheme is COMPACT if it is memory efficient. Its goodness is measured by its STRETCH. The main goal is to minimize the size of the routing table at each node. In other words, compact unicast routing aims to find the best tradeoff between the memory space required to store the routing table entries at each node and the stretch factor increase on the routing paths it produces. Such routing schemes have been extensively studied following the model developed in the late 1980s by Peleg and Upfal [Peleg89]. Since then, following the distinction made by Awerbuch [Awerbuch89], various labeled compact routing schemes (nodes are named by polylogarithmic-size labels encoding topological information) and name-independent compact routing schemes (the node name space is topologically independent) have been designed [Thorup01], [Abraham08].

    As recently formalized in [Abraham09], dynamic compact multicast routing algorithms enable the construction of point-to-multipoint (p2mp) routing paths from any source to any set of destinations, referred to as leaves. As mentioned above, such routing paths define a distribution tree referred to as the MDT. The routing algorithm creates and maintains the set of routing states used by each router that is part of the MDT to derive the necessary information to forward multicast traffic from the source to the leaves.

    4.1 Algorithm’s Performance Metrics

    In addition to the conventional stretch and memory-bit space tradeoff, the performance analysis of the proposed compact multicast routing algorithm considers its communication cost.

    4.1.1 Stretch

    The stretch (of a routing scheme) is defined as the ratio, over all source-destination pairs, between the routing scheme path cost/length and the minimum path cost/length for the same source-destination pair. Intuitively, the stretch of a routing scheme provides a quality measure of the path cost/length increase it produces compared to the shortest paths. Shortest path routing schemes, either AS-path length based (path vector routing) or cost-metric based (link-state routing), are stretch-1. This metric is interesting to measure because compact routing schemes, which produce reduced routing tables, are not always able to choose the minimum cost/length path for a given destination. The routing scheme should nevertheless favor the computation and/or selection of routes whose stretch remains close to 1.

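    \begin{equation}
    \text{stretch} = \frac{\text{cost}_{SPT}}{\text{cost}_{ST}}; \qquad \text{stretch} = \frac{\text{cost}_{CMR}}{\text{cost}_{ST}} \tag{2}
    \end{equation}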
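As a small worked illustration of equation (2), the Python sketch below computes the stretch of a tree as the ratio of its total edge cost to the cost of a reference Steiner tree. The function names and the edge-weight dictionary keyed by `frozenset` node pairs are assumptions made for this example.

```python
def tree_cost(edges, weight):
    """Total cost of a tree given as an iterable of (u, v) edges."""
    return sum(weight[frozenset(edge)] for edge in edges)

def stretch(tree_edges, steiner_edges, weight):
    """Ratio between the cost of a tree and the cost of the reference ST."""
    return tree_cost(tree_edges, weight) / tree_cost(steiner_edges, weight)

# Triangle s, a, b with destinations a and b (hypothetical example topology).
weight = {
    frozenset(("s", "a")): 2,
    frozenset(("s", "b")): 2,
    frozenset(("a", "b")): 1,
}
spt_edges = [("s", "a"), ("s", "b")]      # shortest paths from s: cost 4
steiner_edges = [("s", "a"), ("a", "b")]  # minimum-cost tree: cost 3
spt_stretch = stretch(spt_edges, steiner_edges, weight)
```

In this toy topology the SPT reaches each destination over its own shortest path, while the Steiner tree shares the link s-a, so the SPT has a stretch of 4/3 with respect to the ST.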

    4.1.2 Routing Table Size

    The routing table (RT) size is computed using the size of a single RT entry and the number of entries the RT comprises. The size of the RT is directly related to routing system scalability: the less memory a router needs to store its entries, the more scalable the routing system. Shortest path routing schemes are incompressible, i.e., for all nodes in all graphs, their lower bound equals their upper bound: $O(n \log n)$ bits are required to store their RT entries [Gavoille96], [Krioukov07]. Note that when designing a routing scheme, one must take into account the fundamental trade-off that exists between the stretch of a routing scheme and the size of the RT it produces. Some upper bounds on the maximum RT size are defined in previous papers: $\tilde{O}(n^{1/2})$ per node for the TZ scheme and $\tilde{O}(n^{2/3})$ per node for the Cowen scheme.

    The terminology used to model the multicast routing information base is borrowed from Protocol Independent Multicast (PIM). PIM defines the following Information Bases or Tables, as defined in [PIM]. The TIB (Tree Information Base) is the multicast routing table. It essentially stores the state of all multicast distribution trees necessary to forward multicast packets at a router. The MRIB (Multicast Routing Information Base) is the table used to route multicast control messages: it determines the next-hop neighbor to which any Join/Prune message is sent on the SP tree. This is the multicast topology table, usually derived from the underlying “unicast” routing table (e.g., PIM runs as an overlay on top of Shortest Path (SP) routing, producing a shortest-path tree (SPT)). The MFIB (Multicast Forwarding Information Base) is derived from the TIB: from the TIB entries (multicast routing table entries), forwarding entries are derived that are used to forward the multicast traffic. MRIB entries are constructed for building and maintaining the multicast distribution tree. For each of the algorithms considered in this work, we have the following number and format of RT entries:

    SPT Algorithm Three types of routing entries are involved in this algorithm, namely URIB, MRIB and TIB: i) the URIB entries are maintained by every node in the network and consist of one entry indicating the path towards the multicast source as a sequence of Autonomous System (AS) interfaces, plus M routing entries enabling communication with direct neighbors (this is because a route received from one interface does not necessarily propagate to all other interfaces); ii) the MRIB entries are maintained by each node of the tree and are derived from the URIB entries; and iii) the TIB entries are also maintained by each node of the tree to forward data packets towards the destination nodes of the multicast group.

    | Type | Number of entries | Description | Format |
    |------|-------------------|-------------|--------|
    | URIB | 1 | AS-Path, defined as a sequence of ASes. This gives the path towards the multicast source. | AS_s + AS_k1 + ... + AS_d |
    | URIB | M | This enables the communication with the direct neighbors. | |
    | MRIB | 1 entry per node of SPT\{s} | It indicates the upstream neighbor to which to send the join requests (toward the source s along the SP tree). | <(S,G), address of the next-hop neighbor toward the source S along the SP tree> |
    | TIB | 1 entry per multicast group G | One entry (state) per multicast distribution tree, i.e., each state corresponding to the local entry of the SP tree for source S and group G. | (S,G) state |

    ST Algorithm In the absence of unicast traffic, the URIB entries are not created. More precisely, the MRIB is constructed by the algorithm itself and not from the URIB as in the SPT case. The dissemination of current MDT information to remote nodes of the network is done by flooding and not by directed propagation of routing information as in the SPT case. Thus, the M URIB entries are discarded here. We could indeed assume directed forwarding in the ST case too, but this would make the processing at intermediate nodes more complex; hence the flooding procedure is kept for the ST case.

    | Type | Number of entries | Description | Format |
    |------|-------------------|-------------|--------|
    | MRIB | 1 per node of MDT\{s} | MDT topology description, i.e., it indicates the upstream neighbor to which to send the join requests along the MDT. | <(S,G), address of the next-hop neighbor towards the MDT> |
    | MRIB | 1 per node of the network | It indicates the best next-hop neighbor to which to send the join requests towards the MDT. | <(S,G), address of the best next-hop node towards the MDT> |
    | TIB | 1 per node of MDT | One entry (state) per multicast distribution tree, i.e., each state corresponding to the local entry of the shared tree for source S and group G. | (S,G) state |

    CMR Algorithm This algorithm needs to keep only MRIB and TIB entries. No URIB entries are created or maintained (there is NO global view of the topology), which is one of its main advantages. The MRIB is constructed by the algorithm itself, and information is disseminated by flooding through the node’s interfaces. A clear distinction must be made between a "port" and a routing table entry to a non-adjacent neighbor; otherwise, the comparison with respect to the overall classification of entries becomes problematic. The gain of the CMR comes, with respect to the SPT, from its independence from unicast routing and, with respect to the ST, from the absence of (locally stored) dissemination information. The structures are maintained per interface in the MRIB during the construction but released afterwards. A non-selected node keeps no state for subsequent searches, which are re-initiated. The trade-off is in fact between keeping full structures per multicast source, in which case the entries become part of the MRIB, and releasing and rediscovering them for each request, in which case the communication cost is rather high.

    | Type | Number of entries | Description | Format |
    |------|-------------------|-------------|--------|
    | MRIB | 1 per node of MDT\{s} | MDT topology description, i.e., it indicates the upstream neighbor to which to send the join requests along the MDT. | <(S,G), address of the next-hop neighbor towards the MDT> |
    | TIB | 1 per node of MDT | One entry (state) per multicast distribution tree, i.e., each state corresponding to the local entry of the MDT for source S and group G. | (S,G) state necessary to forward the multicast packets. |

    Data Structure of each Routing Entry Each routing entry must be encoded using a proper data structure, from which its size in number of bits is derived. For instance, let us consider an interface encoded over 32 bits, an address over 32 bits, an AS over 16 bits (an AS path being defined as a sequence of ASes) and a cost/distance metric over 16 bits. These values are extracted from [RFC4601].

    In this work, we consider a compact TIB entry, which may contain several outgoing interface addresses (in the multicast case only). Multicast forwarding is performed according to the Reverse Path Forwarding (RPF) mechanism, in which a data packet is accepted for forwarding only if it is received on the interface used to reach the source in unicast.

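    | Type | Format | Size (in bits) |
    |------|--------|----------------|
    | URIB | <$\sum^{S}$ AS_address> | $\lvert S \rvert \cdot 16$ |
    | URIB | <..., cost> | 32 + 16 = 48 |
    | MRIB | <..., interface_address> | 32 · 3 = 96 |
    | TIB | <..., $\sum$ interface_addr> | $32 \cdot 3 + \sum^{n} \text{address}$ |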
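The entry sizes in the table can be checked with a short Python sketch. The field widths are those quoted in the text (32-bit interface, 32-bit address, 16-bit AS, 16-bit cost). The function names are hypothetical, and two interpretations are assumptions of this example rather than statements of the report: that the three 32-bit fields of an MRIB or TIB entry are the source, the group and one interface address, and that the TIB sum term counts n outgoing interface addresses of 32 bits each.

```python
INTERFACE_BITS = 32
ADDRESS_BITS = 32
AS_BITS = 16
COST_BITS = 16

def urib_as_path_bits(path_length):
    """URIB AS-path entry: one 16-bit AS number per AS on the path."""
    return path_length * AS_BITS

def urib_neighbor_bits():
    """URIB neighbor entry: one 32-bit field plus a 16-bit cost."""
    return INTERFACE_BITS + COST_BITS

def mrib_bits():
    """MRIB entry: three 32-bit fields (assumed: S, G, interface address)."""
    return 3 * ADDRESS_BITS

def tib_bits(n_outgoing):
    """TIB entry: three 32-bit fields plus n outgoing interface addresses (assumed 32 bits each)."""
    return 3 * ADDRESS_BITS + n_outgoing * INTERFACE_BITS
```

For example, under these assumptions a TIB entry with two outgoing interfaces occupies 96 + 2 · 32 = 160 bits, and an AS path of four ASes occupies 64 bits.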

    4.1.3 Communication Cost

    The dynamic nature of routing protocols such as those currently deployed over the Internet allows each router to be kept up to date with respect to non-local topological changes (resulting from topological failures and the addition/withdrawal of routes and ASes). This information is exchanged between routers by means of routing information updates (each router distributes to its own peers, in a timely manner and following specific selection criteria, the routing information received from other peers). Communication cost is defined as the number of routing update messages that need to be exchanged between routers to converge after a topology change. Recently, [Korman06] showed that the communication cost lower bound for scale-free graphs is at best linear up to logarithmic factors. The number of routing updates may change according to the advertisement technique (time-driven or event-driven).

    4.2 Compact Multicast Routing Algorithm

    The concept of compact multicast routing is introduced in [Abraham09]. Compact multicast routing schemes are distributed algorithms that route multicast groups while storing and maintaining the state of all multicast distribution trees at a router. This state is used to derive the information needed to forward multicast packets at that router.

    The present work proposes the CMR algorithm, a name-independent compact multicast routing algorithm for leaf-initiated, distributed and dynamic construction of MDT. In this context, “leaf-initiated” means that the join/leave requests are initiated by the leaves; “distributed” implies that transit nodes process the join/leave requests and compute the routing table entries (no centralized processing by the root); and “dynamic” refers to the on-line capability to process the join/leave requests as they arrive, in a timely manner, without re-computing and re-building the MDT from scratch. The proposed scheme is also characterized by its independence from any underlying unicast routing topology required by leaf-initiated multicast routing schemes such as PIM [PIM]. In other terms, the local knowledge of the cost to direct neighbor nodes is sufficient for the proposed routing scheme to operate properly. As such, it is actually a true “protocol independent” multicast routing scheme.

    The following performance metrics are considered. The memory complexity (expressed in terms of memory-bit space) of a multicast routing scheme is defined as for its unicast counterpart: the maximum number of memory bits required to locally store the routing table entries (the {next-hop, destination} information associated with any routing path) produced by the routing algorithm. However, the stretch is now defined as the total weight of the edges used by the algorithm to deliver the multicast packet from source s to all leaf nodes D ⊆ V, where V is the set of nodes or vertices, divided by the weight of the minimum ST sourced at s ∈ V. In the present context, an additional metric shall be minimized: the communication cost, defined as the number of messages triggered by the sequence of joining/leaving nodes and exchanged for the algorithm to build the MDT.

    Aiming to mitigate the communication cost, the proposed algorithm segments the search space into a local and a global space. This segmentation makes it possible to devise a two-stage search process. The joining leaf first searches locally in its neighborhood, called vicinity, for a node belonging to the MDT. If its vicinity does not include any node belonging to the MDT, the leaf node then initiates a global search on the remaining part of the topology. As later shown in Section IV, this two-stage process considerably decreases the communication cost induced by the algorithm.
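The multicast stretch defined above can be made concrete with a minimal Python sketch. The function name, the edge-list representation and the assumption that the weight of the minimum Steiner tree is supplied by the caller (computing it is a separate, hard problem) are illustrative choices, not part of the scheme.

```python
def multicast_stretch(tree_edges, cost, steiner_weight):
    """Total weight of the edges used to reach all leaf nodes, divided by the
    weight of the minimum Steiner tree sourced at the same node.

    tree_edges     -- iterable of (u, v) pairs used by the routing scheme
    cost           -- dict mapping frozenset({u, v}) to the link cost
    steiner_weight -- weight of the minimum ST, assumed to be known
    """
    if steiner_weight <= 0:
        raise ValueError("steiner_weight must be positive")
    # Undirected graph: count each edge once, whatever its orientation.
    used = {frozenset(edge) for edge in tree_edges}
    total = sum(cost[edge] for edge in used)
    return total / steiner_weight
```

A stretch of 1.0 means the produced tree weighs as much as the minimum ST; larger values quantify the extra weight paid by the scheme.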

    Compared to [Abraham09], the present paper proposes a dynamic, leaf-initiated, name-independent compact multicast routing algorithm with distributed computation of the routing table entries, and analyses its performance over large-scale power-law graphs of 10k nodes. A distinctive property of the MDT construction proposed in [Abraham09] is that it is oblivious: the routing path from the source to a given target is irrespective of the current set of other leaves. The proposed algorithm is oblivious if and only if routing state minimization (shared tree) is not part of the metric set. Indeed, the main (known) limitation of the proposed routing scheme is that it is not totally oblivious, in the sense that leave events may result in MDT re-organizations. Means by which such triggered adaptation can be performed are still under investigation. One possible way to address this limitation consists in notifying downstream nodes of the leave event(s), at the expense of increased communication cost. Also, leaf nodes shall be configured with a maximum path cost to limit the search phase for reaching the MDT (in particular, at the very initial steps of the tree construction) independently of the waiting time. Being considered as part of the bootstrapping process, this initial configuration is not perceived as an actual limitation (even if a cost that adapts to the size of the tree would decrease communication cost).

    4.2.1 Preliminaries

    Consider a network topology modeled by an undirected weighted graph G = (V, E, c), with |V| = ν, where V represents the finite set of nodes or vertices (all with multicast capabilities), |E| = µ, where E represents the finite set of links or edges, and c a non-negative link cost function c : E → Z+ that associates a cost c(e(u, v)) to each link e(u, v) ∈ E. Let S be the finite set of source nodes, S ⊂ V, and D the finite set of destination nodes of a multicast group, where D ⊆ V \ {S}, |S|

    As stated before, the reduction in memory space consumed by the routing table comes at the price of a higher communication cost compared to the reference algorithms, namely the SPT and the ST. A higher cost may hinder the applicability of CMR to large-scale topologies such as the Internet. Hence, to keep the communication cost as low as possible, the algorithm’s search process is segmented in two different stages. The rationale is to put tighter limits and search locally before searching globally. Indeed, the likelihood of finding a node of the MDT within a few hops of the joining leaf is high in large topologies (whose diameter is logarithmically proportional to their number of nodes), and it increases with the size of the MDT. Hence, searching the entire topology every time a leaf node decides to join an MDT may be too costly from a communication perspective. Therefore, we segment the algorithm’s search process, executing first a local search covering the leaf’s neighborhood and, if unsuccessful, a global search over the remaining topology.

    4.2.3 Algorithm Basic Operations

    The MDT construction, Ts,D, is leaf-initiated and processed iteratively. Thus, at each step ω, ω = 1, 2, ..., |D|, a randomly selected node u joins Ts,M, M ⊆ D. If node u is already part of Ts,M (u ∈ VT), then it is either a transit or a branching node of the MDT. Otherwise, node u is not part of Ts,M (u ∈ D \ {VT}) and it must search for the least cost branching path towards a node v ∈ Ts,M.

    Let node i be any node ∈ N and K be the set of upstream neighbor nodes of node i such that |K| = deg(i). Then, let ti,k denote the link cost between node i and any of its neighbors k ∈ K, called tangent cost, ti,k = c(e(i, k)), and rk,v denote the sum of the link costs between i’s neighbor node k and any node v part of Ts,M (v ∈ VT), called radial cost. Note that, at this step of the execution, none of the network nodes n ∈ N stores any routing information besides those |K| entries. Thus, every node i knows the tangent cost ti,k to each of its neighbor nodes k but is not aware of the radial cost rk,v to Ts,M. Finally, let ci,v denote the cost of a branching path pi,v, where ci,v = ti,k + rk,v. Among the set Pi,v of possible paths from node i ∉ Ts,M to node v ∈ Ts,M, the least cost branching path is denoted by p∗i,v = min{ci,v | pi,v ∈ Pi,v}.

    Two types of messages are involved in this process, namely the request (type-R) messages flowing in the upstream direction, i.e. towards the multicast source s, and the response (type-A) messages sent in the downstream direction towards the joining leaf node u. A full description of the message content is given further in this document (Section 4.2.4). Type-R messages comprise three main fields: i) a sequence number SN = {uid, <s, g>}, where uid identifies the leaf node and <s, g> encodes the multicast source/group pair, which prevents duplication of request messages; ii) the leaf node u’s timer value τ(u), used to set the waiting time at intermediate nodes before answering back to the downstream neighbor node (pred(u)); and iii) a variable path budget π, i.e. a maximum (dissemination) path budget assigned to each message so that messages with a too long and unneeded range are discarded (to keep the communication cost as low as possible). This path budget is bounded by a threshold set to the graph diameter (the length of the longest shortest path), for which approximation algorithms exist, as well as methods for computing a lower and an upper bound [6]. Type-A messages comprise the locally selected (i.e. radial) cost r∗k,v, where k is the local node and v ∈ Ts,M.
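The selection of the least cost branching path from the tangent and radial costs can be sketched in a few lines of Python. The function name and the dictionary-based representation are hypothetical; a neighbor whose radial cost has not been received is treated here as having infinite cost, which is an assumption of this sketch.

```python
import math


def least_cost_branch(tangent, radial):
    """Select the upstream neighbor k minimising c_iv = t_ik + r_kv.

    tangent -- dict {k: t_ik}, link cost from node i to each neighbor k
    radial  -- dict {k: r_kv}, radial cost reported by neighbor k
               (a missing entry is treated as an infinite cost)
    Returns (best_neighbor, best_cost); (None, inf) if no finite cost exists.
    """
    best_k, best_cost = None, math.inf
    for k, t_ik in tangent.items():
        c_iv = t_ik + radial.get(k, math.inf)
        if c_iv < best_cost:
            best_k, best_cost = k, c_iv
    return best_k, best_cost
```

For instance, with tangent costs {B: 2, C: 1} and radial costs {B: 1, C: 4}, the branch through B (cost 3) is preferred over the one through C (cost 5), even though C is the closer neighbor.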

    Forward Direction: Type-R message

    The process starts with the leaf node u (i = u) sending type-R messages to all of its direct neighbors k (succ(i)) to find the least cost branching path to a branching node v ∈ Ts,M by asking for their radial cost rk,v (cost of the sub-path from node k to v). If k = v, v ∈ Ts,M, node v sends back to its downstream node(s) a type-A message indicating the radial cost metric value (in this case rk=v,v = 0). Otherwise i = k, and node k decrements the path budget π by 1 and sets its local waiting time τ(i) = wmax − w, where w is incremented at each node to account for propagation and processing delay. Node k then forwards a type-R message to its neighbor nodes, |K| = deg(k), except to the node from which the incoming type-R message has been received (split horizon). We say that node k sends a type-R message to its upstream neighbor nodes. To decrease communication cost, before sending a type-R message, node k checks the resulting path budget and discards any message whose path budget reaches 0. Moreover, node k keeps the received sequence number {uid, <s, g>} to prevent duplication of the type-R messages (for the same request) toward its upstream neighbor nodes and consequently avoid loops. When node k receives another type-R message requesting to join the same multicast source s as part of the multicast group g, this message is not further propagated to its upstream neighbor nodes. Node k simply records the incoming edge (to subsequently send a type-A message when all responses are collected). Note that the number of incoming type-R messages can be at most equal to deg(k). This process continues until the type-R message reaches a node v such that v ∈ Ts,M (for the first leaf, v corresponds to the multicast source s).
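The forward propagation of type-R messages can be illustrated by a simplified, centralised simulation. This is only a sketch under stated assumptions: messages are processed in FIFO order, timers are ignored, the path budget is decremented by 1 per hop, and the names (`flood_type_r`, `adj`) are hypothetical.

```python
from collections import deque


def flood_type_r(adj, leaf, tree_nodes, budget):
    """Simulate type-R propagation from `leaf`.

    adj        -- dict {node: list of neighbor nodes}
    tree_nodes -- set of nodes already in the MDT
    budget     -- initial path budget (hop based)
    Returns (set of MDT nodes reached, number of type-R messages sent).
    """
    seen = {leaf}                 # nodes that already forwarded this request
    hits = set()
    messages = 0
    queue = deque()
    for k in adj[leaf]:
        queue.append((leaf, k, budget))
        messages += 1
    while queue:
        sender, node, pi = queue.popleft()
        if node in tree_nodes:
            hits.add(node)        # would answer with radial cost 0
            continue
        if node in seen:
            continue              # duplicate request: recorded, not re-propagated
        seen.add(node)
        pi -= 1                   # budget decremented at the receiving node
        if pi <= 0:
            continue              # budget exhausted: nothing is forwarded
        for nxt in adj[node]:
            if nxt != sender:     # split horizon
                queue.append((node, nxt, pi))
                messages += 1
    return hits, messages
```

On a five-node topology where A is the leaf, C neighbors D and E, and D is on the MDT, a budget of 3 reaches D with five request messages, whereas a budget of 1 stops at A's direct neighbors after two messages.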

    Backward Direction: Type-A message

    Before answering back to its downstream node(s), a node i (i ≠ u, v ∈ Ts,M) must verify one of the following conditions: i) it has received the entire set of answers from its upstream neighbor nodes before its waiting timer τ(i) expires, or ii) it has waited until the expiration of its waiting timer τ(i), which is started upon reception of the first type-R message directed to a given source s. Once one of these two conditions is met and |type-R message| ≥ 1, node i computes the branching path cost ci,v from itself to any node v ∈ Ts,M using the radial cost rk,v received from its upstream neighbor nodes and the proper tangent cost value ti,k. It then selects the least cost branching path p∗i,v and sends the corresponding cost value c∗i,v to its downstream node(s). If |type-A message| = 0 at waiting timer expiration, the cost value ci,v is set to infinity, indicating that the multicast source s is unreachable. Nodes further downstream in the direction of the leaf node u may ignore these messages if they receive type-A message(s) from other upstream nodes with a finite cost value c∗i,v.

    The algorithm terminates when the leaf node u has received all type-A messages (in response to the type-R messages it initiated) and determines the upstream neighbor node along the least-cost branching path p∗u,v towards the MDT (p∗u,v = min{cu,v | pu,v ∈ Pu,v}). If |type-A message| = 0 at waiting timer expiration or the cost value cu,v in all received type-A messages is set to infinity, node u declares the multicast source s unreachable. Otherwise, node u proceeds by sending to this upstream neighbor a Join request message φu requesting connectivity to Ts,M. At the end of this process, the routing table of each node v belonging to Ts,M (v ∈ VT) includes per MDT: i) one routing entry indicating the upstream neighbor node to which any Join/Prune message is to be sent for that MDT (stored in the multicast routing information base or MRIB), and ii) one multicast traffic routing entry (stored in the tree information base or TIB) as defined by the pair state <s, g>, where s is the source of the multicast traffic and g the multicast group. Equivalently, the pair identifies the MDT Ts,M.

    Fig. 1 illustrates a simple example of the search process. Normal arrows represent type-R messages and dotted arrows type-A messages. Let node A be any node ∈ V sending type-R messages to its upstream neighbor nodes B and C (t = 1). Because neither of them belongs to Ts,M, each of them repeats the same process as node A (forwarding type-R messages towards a node of Ts,M). Let us now focus on the neighborhood of node C. It sends type-R messages to both node D and node E (t = 2). Node D is part of the MDT and answers back to node C with the proper type-A message (t = 3). Node E is not, and forwards type-R messages to its neighbors (except node C, by split horizon; t = 3). Note that node C does not reply to node A as long as it has not received a type-A message from E (or until τ(C) expires). Assuming node E has only 2 interfaces, as shown, it receives a type-A message from node D at t = 4 and consequently node C receives it at t = 5. Having |type-A message received| = |type-R message sent|, node C computes the least cost branching path (either pc,d or pc,e,d) and then replies to node A at t = 6.

    Figure 1: Join Process: normal arrows represent the Type-R messages and dotted arrows the Type-A messages. Node D is part of the MDT.

    4.2.4 MDT building related messages

    The number of exchanged messages plays an important role in the complex trade-off of the MDT build-up process, which involves several factors such as the multicast cost (i.e. stretch), the communication cost and the routing table size. These messages are responsible for the MDT build-up process: they "dig" into the network aiming to find the least-cost branching path. Their "modus operandi" should be carefully defined, as it determines how complex and accurate the algorithm is, with a strong impact on parameters such as complexity, convergence time and computational cost. Two types of messages are generated in the process, namely request (type-R) and answer (type-A) messages:

    Type-R message: The request message is responsible for finding the MDT. The message comprises three main fields: a) a sequence number that uniquely identifies the current request, defined as SN = {u_id, request_identifier}, used to detect convergence and consequently prevent duplication of messages, where u_id identifies the leaf node and request_identifier encodes the multicast source/group pair; b) a leaf timer value, which sets the maximum time that the node has to answer back to the sender (upper node); and c) a maximum path budget, which keeps the communication cost as low as possible by helping to discard messages with a too long and unneeded range, set in accordance with the current criterion.

    Type-A message: The answer message is the response to a request message. It is originally generated when a type-R message reaches a node that belongs to the MDT (positive answer), when the node does not belong to the MDT but has no outgoing interface left, or when the request’s search budget has expired (negative answer). In the case of a positive answer, it carries towards the leaf the sub-path p∗k,j and the (radial) cost metric r∗k,j associated with this sub-path, where k is a local node and j the node of the tree, k ≠ j, together with the identifier of the vicinity edge node when flag e=0 (see below).
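The two message types described above could be represented as follows. This is a minimal sketch: the class and field names are hypothetical, and encoding a negative answer as an infinite radial cost is an assumption borrowed from the way unreachable sources are signalled elsewhere in the text.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class SequenceNumber:
    """SN = {u_id, request identifier}; hashable so it can be kept in a set."""
    u_id: str                       # identifies the joining leaf node
    source_group: Tuple[str, str]   # multicast source/group pair


@dataclass
class TypeR:
    """Request message sent upstream, towards the MDT."""
    sn: SequenceNumber
    timer: float                    # leaf timer value
    pbudget: int                    # maximum path budget
    e: int = 0                      # 0: local search, 1: global search


@dataclass
class TypeA:
    """Answer message sent downstream, towards the joining leaf."""
    sn: SequenceNumber
    radial_cost: float              # math.inf encodes a negative answer (assumption)
    sub_path: List[str] = field(default_factory=list)
    edge_node: Optional[str] = None  # vicinity edge node identifier (when e == 0)
    e: int = 0

    @property
    def positive(self) -> bool:
        return math.isfinite(self.radial_cost)
```

Because `SequenceNumber` is frozen, a node can store the sequence numbers it has already seen in a set and drop duplicate requests with a simple membership test.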

    4.2.5 Search Segmentation: Two-stage procedure

    The communication cost of the proposed algorithm can be a serious constraint in large topologies (more than 1k nodes). As said before, the algorithm’s search process is segmented in two different stages in order to mitigate it. The rationale is to put tighter limits and search locally before searching globally. Indeed, the likelihood of finding a node of the MDT within a few hops of the joining leaf is high in large topologies (whose diameter is logarithmically proportional to their number of nodes), and it increases with the size of the MDT. Hence, searching the entire topology every time a leaf node decides to join an MDT is too costly from a communication perspective. Therefore, we segment the algorithm’s search process, executing first a local search covering the leaf’s neighborhood and, if unsuccessful, a global search over the remaining topology. The flag e distinguishes the messages exchanged during the search


    Algorithm 1 Dynamic Compact Multicast Routing (CMR) Algorithm
    Require: G, Ts,M, τ(u) = τmax, π = πmax, u ∉ Ts,M, v ∈ Ts,M
    Ensure: p∗u,v ≠ ∅, x∗u,v > 0
      set E = ∅
      K = neighbors(u)
      // Send type-R message to each upstream neighbor node k
      for k ∈ K do
        sender = u; receiver = k
        m = {"request", sender, receiver, π, τ}
        E ← E ∪ {m}
      end for
      // Process each generated message m
      while E ≠ ∅ do
        m ← E
        if m.type equals "request" then
          processingRequestMessage(m)
        else
          processingAnswerMessage(m)
        end if
        // Update node waiting time values
        for n ∈ N do
          if τ(n).enable then
            if τ(n).expired then
              Ω{pi,j, xi,j} = computeSubBranchingPath(n)
              if n ≠ u then
                sender = n; receiver = downstream(n)
                m = {"answer", sender, receiver, Ω}
                E ← E ∪ {m}
              end if
            else
              Update waiting time
            end if
          end if
        end for
      end while
      Return p∗u,v and x∗u,v


    Procedure 2 Processing Request Message
    Require: m, m.type == "request"
    Ensure: mnew, mnew.sender == m.receiver, mnew.receiver ≠ m.sender
      i = m.sender
      j = m.receiver
      // Check if the receiver is a node of Ts,M
      if j ∈ Ts,M then
        path = ∅
        path ← path ∪ {j}
        radial = 0
        Ω = {path, radial}
        sender = j; receiver = i
        m = {"answer", sender, receiver, Ω}
        E ← E ∪ {m}
      else
        if m.pbudget > 0 then
          K = neighbors(j) \ {i}
          if K = ∅ then
            m = {"negative answer", j, i, ∅}   // send negative answer
            E ← E ∪ {m}
          else
            if j.firsttime then
              downstream(j) = i   // save the downstream node i at node j
              pbudget = updated m.pbudget
              T′ = updated Timer T
              enable Timer T = T′
              for k ∈ K do
                sender = j; receiver = k
                m = {"request", sender, receiver, pbudget, T′}
                E ← E ∪ {m}
              end for
            else
              enable secondary timer
              save the Answer to this holding Type-R message
            end if
          end if
        else
          m = {"negative answer", j, i, ∅}   // send negative answer
          E ← E ∪ {m}
        end if
      end if


    Procedure 3 Processing Answer Message
    Require: Message m, m.type == "answer"
    Ensure: new message m, m.type == "answer"
      i = m.receiver
      k = m.sender
      if Timer T has not expired then
        A ← A ∪ {m}   // save Type-A message at node i, set A
        K = neighbors(i)
        if |A| = |K| − 1 then
          Ω{pi,j, xi,j} = computeSubBranchingPath(i)
          sender = i; receiver = downstream(i)
          m = {"answer", sender, receiver, Ω}
          E ← E ∪ {m}
        else
          Keep on holding stage
        end if
      end if

    Procedure 4 Compute Sub Branching Path x∗i,j
    Require: A, i
    Ensure: Ω{p∗i,j, x∗i,j}
      x∗i,j = 10000
      for m ∈ A do
        k = m.sender
        j ∈ Ts,M
        radialk,j = m.Ω.radial
        pk,j = m.Ω.path
        xi,j = tangenti,k + radialk,j
        if xi,j < x∗i,j then
          x∗i,j = xi,j
          p∗i,j ← pk,j ∪ {i}
        end if
      end for
      Return Ω{p∗i,j, x∗i,j}


    stages: both type-R and type-A messages are flagged as internal, e=0, if they belong to the local search procedure, and as external, e=1, otherwise.

    Local Search: This first stage consists of a limited search within a certain perimeter of the topology around the joining leaf u. As illustrated in Fig. 2, the contiguous set of nodes covered during this first stage is called the vicinity, B ⊆ V, where nodes b ∈ B are referred to as vicinity nodes. The vicinity B is delimited by vicinity edge nodes, bv, i.e., nodes at a given hop-count distance determined by one of the two following criteria: i) a cost threshold or ii) a number of vicinity nodes proportional to n^0.5 / log(n). In Section 6, we show that for power law graphs this proportionality leads to the minimum communication cost. During this stage, the pbudget of each type-R message carries the criterion value (set at leaf node u) that delimits the vicinity of leaf node u, B(u). If the criterion is set to the cost threshold, starting from node u, the pbudget value is decremented at each hop by the cost of the traversed link; nodes with pbudget > 0 determine the nodes b ∈ B(u). On the other hand, if the criterion is set to the maximum number of nodes in the vicinity B(u), pbudget is decremented at each hop by the vicinity node’s out-degree. In both cases, nodes setting pbudget < 0 are identified as vicinity edge nodes of B(u). For instance, Fig. 1 assumes a maximum pbudget of 8 at node u. At its neighboring node b1, pbudget = 8 − (deg(u) = 5) = 3. Hence, when the vicinity node b1 forwards a type-R message to its neighbor nodes (except, by application of split horizon, to the node from which the type-R message has been received), the value pbudget = 3 − out-deg(b1) = 0. Applying this procedure to node b2 leads to the same result, since the out-degree of this node is also equal to 3. This procedure sets the maximum reach of type-R messages with flag e=0 by determining the size of the vicinity |B|, whenever pbudget = 0.
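The budget bookkeeping of the local search can be checked with a small Python sketch that reproduces the worked example (a budget of 8 at the leaf, a leaf degree of 5, an out-degree of 3 at b1). Function names are hypothetical; the use of the natural logarithm and of the test pbudget ≤ 0 to detect a vicinity edge node (covering both the "< 0" and "= 0" cases mentioned in the text) are assumptions of this sketch.

```python
import math


def vicinity_size(n: int, scale: float = 1.0) -> float:
    """Target number of vicinity nodes, proportional to n^0.5 / log(n).
    The natural logarithm and the `scale` factor are assumptions."""
    if n < 2:
        raise ValueError("n must be at least 2")
    return scale * math.sqrt(n) / math.log(n)


def decrement_by_degree(pbudget: int, out_degree: int) -> int:
    """Node-count criterion: subtract the out-degree at each hop."""
    return pbudget - out_degree


def decrement_by_cost(pbudget: float, link_cost: float) -> float:
    """Cost-threshold criterion: subtract the cost of the traversed link."""
    return pbudget - link_cost


def is_vicinity_edge(pbudget: float) -> bool:
    """A node where the budget is exhausted delimits the vicinity."""
    return pbudget <= 0


# Worked example from the text: 8 - 5 = 3 at b1, then 3 - 3 = 0.
at_b1 = decrement_by_degree(8, 5)
after_b1 = decrement_by_degree(at_b1, 3)
```

With these values `at_b1` is 3 and `after_b1` is 0, so the neighbors of b1 are the vicinity edge nodes in that direction.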

    The local search starts with the leaf node u sending internal type-R messages (i.e., flag e=0) to all its direct neighbor nodes b (upstream nodes) to find the least cost branching path to a branching node v ∈ Ts,M (v ∈ VT). Referring to Fig. 1, leaf node u sends type-R messages to nodes b1, ..., b5. This process continues until the type-R message reaches i) a node v ∈ Ts,M with pbudget > 0 or ii) a node v ∉ Ts,M with pbudget = 0. In the former case, a node belonging to the tree is found; in the latter, a vicinity edge node is reached (node v = bv) but no node belonging to the tree is found. At this point, node v replies with a type-A message to the neighbor node(s) from which it has received the type-R message(s). If node v = bv, then the radial cost is set to infinity. If not, the radial cost is computed as follows.

    Each downstream node w (w ≠ bv, w ∉ Ts,M) computes all the branching path costs cw,v from itself to node v (where v ∈ Ts,M or v = bv ∉ Ts,M). The cost cw,v is defined as the sum of the cost of the edge joining node w to one of its upstream nodes i and the cost of the path between node i (i ∉ Ts,M) and v (v ∈ Ts,M). The latter, referred to as the radial cost, is included in the type-A message sent from node i to w. Node w then selects the least cost branching path p∗w,v and sends the corresponding cost value c∗w,v to its own downstream node(s). Receiving nodes process this value as the new radial cost and the computation starts again. This stage terminates when node w = u and the leaf node u has received all type-A messages (in response to the type-R messages it initiated). If |type-A message| = 0 at waiting timer expiration or the cost value cw,v in all received type-A messages is set to infinity, node u declares the multicast source s unreachable and launches the global search method (see Section III.C). Otherwise, the process is completed and the leaf node u determines the upstream neighbor node along the least cost branching path p∗u,v (= min{cu,v | pu,v ∈ Pu,v}) to Ts,M. Leaf node u then proceeds by sending to this upstream neighbor node a join request message requesting connectivity to Ts,M.

    Global Search: This stage represents the search for the MDT’s branching node outside the vicinity of the leaf node. This process is triggered by the leaf node when the local search phase ends by declaring the multicast source s unreachable in its vicinity. The global search phase comprises a set of distributed search processes triggered by the leaf node u and started at each vicinity edge node bv (see Fig. 3). Type-R messages marked as external (i.e., flag e=1) are used in this search phase. Two issues can arise here. The first one is that the external type-R messages have to reach the vicinity edge nodes without being flooded inside B(u) again. For this purpose, the leaf node u sends the external type-R messages (flag e=1) directly to each of its vicinity edge nodes bv using a single possible path. Indeed, during the local search phase, the internal type-A messages received by the leaf node u include the identifier of the node bv that initiated them. In addition, vicinity nodes b ∈ B(u) keep, per vicinity edge node bv, a single active interface from which type-A messages with infinite radial cost have been received (indicating that the neighbor node sits along the path from leaf node u to a given edge node bv). Moreover, to prevent a node b ∈ B(u) within the vicinity from receiving back external type-R messages during the global search stage, vicinity edge nodes bv filter incoming type-R messages (flag e=1). The type-A messages sent during the local search in response to the reception of type-R messages (flag e=0) are tagged with the flag e=0. Interfaces sending such type-A messages are removed from the list of interfaces used for forwarding type-R messages (flag e=1). The exception is for interfaces having received a type-R message (flag e=1) with leaf node u as sender, which enables vicinity edge nodes to send back the answer to node u once the global search completes for that node bv.

    During the global search phase, the pbudget value is bounded at node u by a threshold set to the graph diameter (length of the longest shortest path). Approximation algorithms exist to compute this value, as well as methods for computing a lower and an upper bound [8]. Each node bv sets the maximum waiting timer wt (TMAIN in Fig. 3), and the subsequent search process proceeds as follows. For instance, assume that node bv sends external type-R messages to each of its neighbor nodes except to its downstream node, as explained above. It then waits to receive the same number of external type-A messages (e=1). Upon reception, node bv determines the least-cost branching path p∗u,v to Ts,M (p∗u,v = min{cu,v | pu,v ∈ Pu,v}), where u = bv. Node bv is ready to answer back to leaf node u once either of the following conditions is met: i) it receives the entire set of type-A messages from its upstream neighbor nodes (before its waiting timer wt expires) or ii) the waiting timer wt, started upon reception of the first type-R message (e=1) from leaf node u, expires. Once one of these two conditions is met and |type-R message| > 1, node bv computes the branching path cost cu,v from itself to any node v ∈ Ts,M using the radial cost cw,v received from its upstream neighbor nodes w and the cost of the link from itself to node w. Node bv then selects the least cost branching path p∗u,v and sends the corresponding cost value c∗u,v directly to the joining leaf node u.

    If |type-A message| = 0 at waiting timer expiration, the cost value cu,v is set to infinity, indicating that the multicast source s is unreachable. Hence, as soon as this search phase terminates, each node bv returns a unique type-A message directly to the leaf node u from which it initially received an external type-R message. Thus, no routing decisions are taken at the vicinity nodes along the route taken by the type-A messages (e=1) towards the leaf node u. This route is given by the downstream interface maintained by each node b ∈ B(u) when the type-R message is received. Fig. 3 shows the node b1 receiving two type-A messages from bv1 and bv2. In contrast to the previous stage, here b1 does not perform any computation or routing decision. It just forwards the incoming type-A messages received from nodes bv1 and bv2 towards the leaf u. At leaf node u, as many type-A messages as there are vicinity edge nodes can be received. A global timer is set at the joining leaf node u to prevent an excessive waiting time. Note that i) the records locally created during the local search phase are subsequently deleted by the node sending a type-A message (flag e=0) that does not include an infinite cost to a vicinity edge node bv, and ii) the records remaining and/or locally created during the global search phase are deleted by the node sending a type-A message (flag e=1).

    4.2.6 Algorithm Computational Cost

    The computational cost is defined with respect to time and resource complexity. While the time complexity is limited by the maximum waiting time of the joining leaf node, the resource complexity consists of the CPU and memory consumption. The RT "gains" come at the expense of higher communication and processing complexity. The problem thus becomes a trade-off between memory (to keep the RT) and CPU (to process the messages).

    4.2.7 Algorithm Convergence Time

    One of the main advantages of the proposed algorithm is its independence from any underlying unicast routing information and thus from any topological changes. The algorithm terminates when the leaf node u has the upstream entry towards the MDT, Ts,M. This entry is computed upon reception of all the answers from its direct neighbors, which happens within a maximum (worst-case) time T(u). This T(u) value is set to cover the maximum round-trip time (RTT) of a type-R/type-A message exchange (worst case) and to ensure that the leaf node u joins the MDT through the least cost branching path, p∗i,j = min{ci,j : pi,j ∈ Pi,j}.

    This maximum RTT is determined by the diameter of the graph G, diameter(G) = max{e(v) : v ∈ V(G)}, i.e., the greatest distance between any pair of nodes. T(u) is defined as follows:

    $$T(u) = \mathrm{RTT} = 2 \cdot \mathrm{diameter}(G) \cdot \left(t_{\mathrm{propagation}} + t_{\mathrm{transmission}}\right) \qquad (3)$$


    Procedure 5 Two-Stage Search Procedure
    Require: G, N, Ts,M, diameter, u
    Ensure: Ωglobal ≠ ∅ if Ωlocal = ∅, or Ωlocal ≠ ∅
      pbudget = f(n)
      flag e = 0
      Ωlocal ← CMRA_Algorithm(G, N, Ts,M, pbudget, u, flag e)   // LOCAL search
      if Ωlocal = ∅ then
        pbudget = diameter
        flag e = 1
        set E = ∅
        L = active upstream interfaces to forward type-R flag e=1 messages
        for l ∈ L(u) do
          // Send request m
          m = {"request", u, l, pbudget}
          E ← E ∪ {m}
        end for
        while E ≠ ∅ do
          if pbudget > 0 then
            if L(m.receiver) ≠ ∅ then
              // Vicinity edge node is "found"
              Ωglobal ← CMRA_Algorithm(G, N, Ts,M, pbudget, u, flag e)   // GLOBAL search
            else
              // Forward request message, flag e=1
              E ← E ∪ {m}
            end if
          end if
        end while
      end if
      Return the least cost branching path towards the MDT, Ts,M
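The control flow of the two-stage procedure can be summarised with a short Python sketch. The search itself is abstracted behind a caller-supplied function, so `search`, its `(pbudget, flag_e)` signature and the stage labels are all hypothetical names introduced for illustration.

```python
from typing import Callable, Optional, Tuple

Path = Tuple[str, ...]
# (pbudget, flag_e) -> least cost branching path, or None if nothing is found
SearchFn = Callable[[int, int], Optional[Path]]


def two_stage_search(search: SearchFn, local_budget: int, diameter: int):
    """Run the local search first and fall back to the global search.

    Returns (path, stage) where stage is "local", "global" or "unreachable".
    """
    path = search(local_budget, 0)        # stage 1: vicinity only, flag e = 0
    if path is not None:
        return path, "local"
    path = search(diameter, 1)            # stage 2: budget bounded by the diameter
    if path is not None:
        return path, "global"
    return None, "unreachable"
```

The global stage is only paid for when the vicinity contains no MDT node, which is the case the segmentation is designed to make rare as the MDT grows.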

    Figure 2: Local Search stage: search the node of the MDT within a limited perimeter called vicinity.

    Figure 3: Global Search stage: If local search fails to find a node of the MDT, a search outside the vicinity must be performed.

    where

    $$t_{\mathrm{propagation}} = \frac{\text{link distance}}{\text{light speed}}, \qquad t_{\mathrm{transmission}} = \frac{\text{message size (bits)}}{\text{link rate}} \qquad (4)$$

    The processing time can safely be neglected: the time taken by routing decisions is constant and the volume of information (message size) to process is very small.
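The worst-case waiting time of Eq. (3), with the delays of Eq. (4), can be evaluated numerically as in the sketch below. It assumes that every link has the same length and rate (worst-case values chosen by the caller) and that signals travel at the vacuum speed of light; the function and parameter names are hypothetical.

```python
LIGHT_SPEED_M_PER_S = 299_792_458.0  # vacuum value; an upper bound for real links


def propagation_delay(link_distance_m: float,
                      signal_speed: float = LIGHT_SPEED_M_PER_S) -> float:
    """t_propagation = link distance / light speed (seconds)."""
    return link_distance_m / signal_speed


def transmission_delay(message_bits: float, link_rate_bps: float) -> float:
    """t_transmission = message size (bits) / link rate (seconds)."""
    return message_bits / link_rate_bps


def max_waiting_time(diameter_hops: int, link_distance_m: float,
                     message_bits: float, link_rate_bps: float) -> float:
    """T(u) = 2 * diameter(G) * (t_propagation + t_transmission)."""
    per_hop = (propagation_delay(link_distance_m)
               + transmission_delay(message_bits, link_rate_bps))
    return 2 * diameter_hops * per_hop
```

For example, a diameter of 10 hops with a per-hop propagation delay of 1 s and a per-hop transmission delay of 1 s gives T(u) = 2 × 10 × 2 = 40 s; realistic link parameters yield values several orders of magnitude smaller.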

    4.2.8 Timer Setting Mechanism

    The timer mechanism is implemented to prevent infinite waiting time and reduce the algorithm convergence time. Each upstream node v, v ∉ D, v ∉ Ts,M, may receive as many type-R messages as it has incoming interfaces (one per interface, though), possibly lagged in time. To guarantee that the senders of those messages always obtain a response whenever information is available (even if it is not the best at the moment), one timer is enabled per interface receiving a type-R message (instead of a global timer per node set by the first arriving request). Hence, the timer of the first type-R message is the "main" timer and all subsequent timers are "secondary", Tmain > ∀Tsec. Those "secondary" type-R messages stay on hold. This means that, in a worst-case situation, node v waits Tmain before it replies to its downstream node (i.e. to have all the answers from its neighbors) and Tsec before it answers (with empty or incomplete information) those type-R messages on hold.

    For instance, let us take the example depicted in Fig. 4. Imagine that node B has received a type-A message from both nodes C and E at a time tarrival ≪ T1 ≪ T2. Yet it cannot answer back to node A because it is still waiting for a message from node D. In the worst case, node D would answer at a time close to T2. In the case where T2 = T′2, by the time the type-A message from B gets to C, C’s timer towards A would have already elapsed and the path solution (E-B-C) would be lost. Note that all this becomes particularly problematic only in a topology with unequal link costs, and assuming an equal timer decrement x (same propagation and transmission times).

    Identified problems. Particular cases, due to topology characteristics, constrain the whole process and may reduce the performance of the algorithm, mainly its convergence time. Here we highlight the shortcomings identified so far, together with some proposals to overcome them.

    In Fig.4, the following case may occur: nodes D and E set the timers of their request messages to the same value as the Tsec that B holds for the request from C, i.e. T2 = T′2. Node B should be able to detect this situation and reset T′2. The new T′2 value must satisfy T1 < T′2 > T2 and T3 {Problem 1}. A second problem arises here too. Even if B holds both type-A messages from D and E, it may not reply to node A because the type-A message from C could be missing; B therefore has to wait for Tsec(C) = T′2 to elapse before it answers node A {Problem 2}.

    Figure 4: Main and Secondary Timer Labelling.

    Fig.4.2.8 (left) shows a case where the convergence time may be maximal, i.e. the leaf node waiting time is T(A) = Tmax. This happens iff the type-R message exhausts its path budget (p_budget) before finding a node of the MDT and the distance of p_{i,j} is equal to diameter(G). The backward process is then driven by a chain of elapsing timers (i.e. nodes answer because their timers elapse). As a consequence, the leaf may sometimes not receive the optimal branching path. In the case depicted, by the time node E receives the type-A message from node F, the timer with respect to node D, which was on hold, has already elapsed. The answer from E to D is then either empty (if the other interfaces have not answered yet) or contains a non-optimal branching path, if E has another interface besides F and has received a type-A message from it {Problem 3}.

    There is also the case shown in Fig.?? (right), where a deadlock occurs either between E and F or between D and E (depending on which type-R message E processes first). Besides receiving an empty type-A message from its F interface, node X has to wait Tmain before answering Y (i.e. until the timer to answer Y elapses). The branching path (A-B-C-D-E-F-X) is thus excluded from the evaluation at A {Problem 4}.


    Speeding up the answering process. The convergence time of the algorithm can be considerably improved by adding optimized procedures to handle the particular cases resulting from the timer setting problems described above. Across the following proposals there is a common improvement, which consists in trading the optimality of the branching path for convergence time. That is, every time a node receives a type-A message, it can speed up the countdown of its waiting time in order to accelerate the process towards the leaf node. Consequently, the leaf node may sometimes not receive the best branching path to join the tree, but a lower convergence time is achieved.

    {Solution to prob.1}: a new T′2 value must be found between T1 and T2 if

    $$T'_2 - T_1 = (Y - 2x) - (Y - x) = -x, \qquad (5)$$

    {Solution to prob.2}: whenever the following two conditions hold,

    $$\sum \text{sent\_requests} = \sum \text{received\_answers} + \sum \text{holding\_requests}, \qquad (6)$$

    and

    $$|\text{outgoing\_interfaces}| = |\text{sent\_requests}| \neq 1, \qquad (7)$$

    B will send an incomplete message to node C, expecting to receive a type-A message from C before T1, which unblocks the situation. The second condition guarantees that the type-A message is not empty (see the solution to problem 4).

    {Solution to prob.3}: If F answers at T′4, there is no solution. Otherwise, there is a non-empty answer to forward.

    {Solution to prob.4}: Problem 4 can be solved with an auxiliary type of message ("on hold" or negative hold): an announcement sent in the backward direction of the holding request to inform the next branch node (with degree > 2) that it will have to forward a type-A message, as soon as it has one, through the same interface on which this auxiliary message arrived. Fig.5 shows the two variations of the problem and Fig.6 describes the proposed solution (case A of the figure is selected to present the proposal). On the left-hand side of that figure, node E sends an N message to F, which forwards it to X. The message stops at node X because X is a branch node (degree > 2), which means it is waiting for replies on other interfaces too: X is the node to be informed. This way, X knows that as soon as it receives a type-A message coming from the source or from a node of the tree, it has to forward it on that interface as well (i.e. X-F), even though it would not normally do so (a request has been sent on it before). Owing to this mechanism, X can speed up its answer to Y and, at the same time, E has a non-empty answer to send to D (the sub-path E-F-X-{sub-path}). Note that F does the same when it receives a request to put on hold. However, we know that in this direction no useful information will be received (in this example the message would stop at node A). In order to avoid unnecessary communication cost, a path budget can be introduced, as in the request messages.

    Figure 5: Problem 4: the two possible situations that may occur.

    Figure 6: Solution to problem 4. An auxiliary message ("on hold" or negative hold) is used to inform the branch node X to send its type-A message through that interface as well.
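Conditions (6) and (7) of the solution to problem 2 can be checked with a small predicate. The sketch below is one possible reading: it treats the sums as counts of interfaces and represents each argument as a collection of neighbor identifiers. The function name and this data model are assumptions, not the report's implementation.

```python
def may_send_incomplete(sent_requests, received_answers,
                        holding_requests, outgoing_interfaces):
    """Return True when a node may answer a held request with an
    incomplete type-A message (conditions (6) and (7))."""
    # (6): every sent request is either answered or matched by a held request.
    cond6 = len(sent_requests) == len(received_answers) + len(holding_requests)
    # (7): requests were sent on all outgoing interfaces, and there is more
    # than one of them, so the incomplete answer is not empty.
    cond7 = len(outgoing_interfaces) == len(sent_requests) != 1
    return cond6 and cond7


if __name__ == "__main__":
    # Hypothetical state of node B in Fig.4: answers from D and E,
    # request from C on hold.
    print(may_send_incomplete(
        sent_requests={"C", "D", "E"},
        received_answers={"D", "E"},
        holding_requests={"C"},
        outgoing_interfaces={"C", "D", "E"},
    ))
```

When only one outgoing interface exists, condition (7) fails and the node keeps waiting, since any answer it could send would be empty.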

    4.3 Compact AnyTraffic Routing Algorithm

    4.3.1 AnyTraffic Labeled Concept

    The concept of AnyTraffic data group [GC09],[ICC10] refers to a group of destination nodes receiving unicast and multicast traffic over the same source-initiated network entity (Fig. 7). The heuristic algorithm, which is described mathematically in [GC09], attempts to construct a single network entity per AnyTraffic data group. Hence, the aim of the heuristic is to find, at minimum cost, a set of branch nodes that takes into account unicast and multicast traffic constraints. At these designated branch nodes, several P2P data path segments are appended to reach the destination nodes that belong to a given set of one or more AnyTraffic data groups. Branch nodes are selected according to a given pruning condition in order to guarantee a low increase both in bandwidth consumption and in the length of the unicast paths. In fact, the created network entity is a root-initiated P2MP tree which is signaled using the technique described in [RFC4875]. Note that this single scheme for AnyTraffic data simplifies the management of the P2MP tree when considering dynamic multicast sessions. In particular, our heuristic places itself between the Shortest Path and Steiner Tree algorithms, achieving better overall performance when forwarding an offered load consisting of combined unicast and multicast traffic in meshed and switched optical networks, as demonstrated in [GC09],[ICC10].

    Figure 7: AnyTraffic Concept: single routing entity (i.e. tree) to forward both unicast and multicast traffic.

    Here, we aim to combine both concepts in a Compact AnyTraffic Routing (CAR) algorithm. Each joining leaf u has a parameter setting the maximum stretch of the path towards the multicast source s according to some unicast traffic requirements (e.g. delay). This is the Maximum Deficit factor Δ^{s,d}_{max} for each initial (P2P) path p_{s,d}, d ∈ M, given by [GC09]:

    $$\Delta^{s,d}_{max} = x_{s,d}\, e^{-f(x_{s,d})}\, d_{max}. \qquad (8)$$

    This implies that each node of the MDT must maintain one additional entry recording the cost from itself towards the source, i.e. |MDT nodes| additional entries per MDT. A good way to assess the unicast path stretch is the following ratio:

    $$Stretch_{unicast} = \frac{cost_{CAR}}{cost_{SPT}}, \quad \forall (s,d), \qquad (9)$$

    evaluated under different criteria such as cost/state/bx. This gives an idea of how stretched the paths are for each individual leaf/client.
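The ratio in (9) can be computed per leaf once the cost towards the source along the tree is known. The sketch below assumes the MDT is stored as upstream pointers with link costs, and that shortest-path costs to the source are available separately. Both data structures and all names are illustrative, not taken from the report.

```python
def cost_to_source(parent, leaf):
    """Sum the link costs from `leaf` up to the source along the tree.

    `parent` maps node -> (upstream node, link cost); the source has no entry.
    """
    total = 0.0
    node = leaf
    while node in parent:
        node, cost = parent[node]
        total += cost
    return total


def unicast_stretch(parent, spt_cost, leaf):
    """Stretch_unicast = cost_CAR / cost_SPT for the pair (source, leaf)."""
    return cost_to_source(parent, leaf) / spt_cost[leaf]


if __name__ == "__main__":
    # Hypothetical tree s <- b <- d, while the shortest s-d path costs 2.
    tree = {"b": ("s", 1.0), "d": ("b", 2.0)}
    print(unicast_stretch(tree, {"d": 2.0}, "d"))
```

The per-node cost entry mentioned in the text would let a joining leaf obtain `cost_to_source` without walking the tree, at the price of one extra entry per MDT node.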

    (ON GOING WORK)

    5 Simulation Environment

    We compare the performance of the CMR algorithm to the Steiner Tree (ST) and the Shortest-Path Tree (SPT) executed over the same topologies. The following performance metrics are considered. The stretch is defined as the total cost of the edges used by the algorithm to deliver a packet from source s to all nodes of D ⊆ V, divided by the cost of the minimum ST sourced at s ∈ S with D as leaf node set. The memory complexity, expressed in terms of memory-bit space consumption, is defined as the maximum over all nodes of the number of memory bits required to locally store the RT entries associated with the routing paths produced by the routing algorithm. The communication cost measures the number of messages triggered by the sequence of joining/leaving nodes and exchanged by the algorithm to build the MDT.

    The SPT algorithm provides the reference for the communication cost. It is constructed from a loop-avoidance path-vector routing algorithm carrying the identifier of the multicast source s and the routing path to reach that source. Each node thus keeps a routing table entry per neighbor node (to exchange messages) and a routing table entry per path to the multicast source s. The ST algorithm provides the reference in terms of stretch. In order to obtain a near-optimal solution for the ST, we consider an ST Integer Linear Programming formulation.
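The stretch and memory complexity metrics defined above reduce to two short computations. The sketch below assumes trees are given as lists of `(u, v, cost)` edges and that the per-node routing table size in bits is already known. These representations and the function names are hypothetical.

```python
def tree_cost(edges):
    """Total cost of the edges of a distribution tree."""
    return sum(cost for _, _, cost in edges)


def multicast_stretch(tree_edges, steiner_edges):
    """Cost of the tree built by the algorithm over the minimum ST cost."""
    return tree_cost(tree_edges) / tree_cost(steiner_edges)


def memory_complexity(rt_bits_per_node):
    """Maximum, over all nodes, of the bits needed to store the RT entries."""
    return max(rt_bits_per_node.values())


if __name__ == "__main__":
    # Hypothetical trees spanning the same source and leaf set.
    cmr_tree = [("s", "a", 1.0), ("a", "d1", 2.0), ("s", "d2", 2.0)]
    steiner = [("s", "a", 1.0), ("a", "d1", 2.0), ("a", "d2", 1.0)]
    print(multicast_stretch(cmr_tree, steiner))
    print(memory_complexity({"s": 96, "a": 160, "d1": 64}))
```

Taking the maximum rather than the sum makes the memory metric a per-router bound, which is the quantity that matters when a single node's table is the bottleneck.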

    5.1 AS-Internet Representative Graphs

    We want our algorithm to run on graphs representative of the Internet topology at the Autonomous System (AS) level. On the one hand, we use synthetic power-law topologies ranging from 1k up to 16k nodes, generated according to [Bu02]. This evolving topology generation model, which relies on generalized linear preferential attachment, produces power-law graphs representative of the Internet AS topology (in particular in terms of clustering coefficient). On the other hand, we also run it on real topologies such as the CAIDA Internet topology maps, which comprise 16k and 32k nodes [CAIDA].

    ASes are an important abstraction because they are the "unit of routing policy" in the routing system of the global Internet (each under a single administrative control). ASes peer with each other to exchange traffic, and these peering relationships define the high-level global Internet topology. For the purposes of analysis, these peering relationships are represented with an AS graph, where nodes represent ASes and links represent peering relationships.

    5.2 Communication Cost Accounting

    As said before, the communication cost accounts for the total number of messages exchanged in response to any non-local topological change. Those messages either carry topological and routing information, as in the SPT and ST routing cases, or help the tree building process itself, as in the case of the proposed CMR algorithm. Note that the communication cost is assessed per multicast distribution tree (MDT). We can thus define the communication cost metric as follows:

    $$\text{communication\_cost} = m_{update} + m_{join} + m_{release} + m_{routing-dependent} \qquad (10)$$

    where m_update is the total number of routing update messages (e.g. of the OSPF LS update type), m_routing-dependent is the number of routing-algorithm-dependent messages, and m_join and m_release are the numbers of request messages to join or release an MDT, respectively. The join and release messages are common to all the algorithms. They carry the leaf request to either join or release the MDT.
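Equation (10) amounts to counting messages by type for one MDT. The sketch below assumes the simulator produces a flat log of message-type labels; the label strings and the function name are hypothetical.

```python
from collections import Counter

# The four message classes of (10); the label strings are illustrative.
MESSAGE_KINDS = ("update", "join", "release", "routing_dependent")


def communication_cost(message_log):
    """Sum m_update + m_join + m_release + m_routing-dependent for one MDT."""
    counts = Counter(message_log)
    return sum(counts[kind] for kind in MESSAGE_KINDS)


if __name__ == "__main__":
    log = ["join", "routing_dependent", "routing_dependent", "update", "release"]
    print(communication_cost(log))
```

Keeping the four counters separate until the final sum makes it easy to report the join/leave share, which is common to all compared algorithms, apart from the algorithm-dependent share.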

    How to account for the communication cost of the offline-computed ST? Although the ST is computed statically and centrally (i.e. source routing), its communication cost accounts for the total number of messages that would be exchanged during the MDT building process in a dynamic scenario. The communication cost for the ST measures, at each step of its construction, the number of messages initiated by nodes that are part of the MDT. These messages contain the minimal information needed by remote nodes not (yet) belonging to the MDT to join it. Using this information, each node knows how to reach the closest node of the MDT. The communication cost is thus computed as follows:

    1. Define a hypothetical leaf join sequence for the computed ST from D.
    2. For each u ∈ D, get the branching path along the MDT. All the nodes belonging to that path must inform the rest of the network that they are potential branching nodes, where Q is the set of nodes already announced as potential branching nodes.
    3. If n ∉ Q, node n generates such a message and floods it.

    At every leaf join request, every node of the branching path must flood tree information throughout the network. This way, each network node knows how to reach the least costly node of the MDT.

    Fig. 8 and 9 show the communication process on three different occasions. First, when D1 joins the MDT through the multicast source, M = {D1}, this information must be spread to all the other nodes of the network (excluding node I). The same must be done when D2 decides to leave the MDT. In this case, two procedures are possible: i) D2 announces that it is leaving and every remaining node of the tree floods new information across the network, or ii) every remaining node of the tree directly announces that D2 is leaving at the same time as it announces its position.
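The three accounting steps for the offline-computed ST can be emulated as below. The sketch assumes that Q is the set of nodes that have already announced themselves, and that one flood costs a fixed number of messages (`flood_cost`, for example one message per link). Both points are assumptions made for illustration, as are all names.

```python
def st_communication_cost(join_sequence, branching_path, flood_cost):
    """Count the flooding messages triggered by a hypothetical join sequence.

    join_sequence  -- leaves of D in the order they are assumed to join
    branching_path -- leaf -> list of MDT nodes on its branching path
    flood_cost     -- assumed number of messages generated by one flood
    """
    announced = set()  # the set Q of nodes that have already flooded
    messages = 0
    for leaf in join_sequence:
        for node in branching_path[leaf]:
            if node not in announced:  # step 3: n not in Q
                announced.add(node)
                messages += flood_cost
    return messages


if __name__ == "__main__":
    # Hypothetical example: D2 branches off D1's path at node I.
    paths = {"D1": ["S", "I", "D1"], "D2": ["I", "J", "D2"]}
    print(st_communication_cost(["D1", "D2"], paths, flood_cost=10))
```

Under this reading, node I floods only once even though it lies on both branching paths, so the total depends on the number of distinct tree nodes rather than on the sum of the path lengths.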

    Figure 8: ST communication cost emulation process: the first leaf D1 has just joined the MDT.

    Figure 9: ST communication cost: dissemination process when a second leaf D2 joins the MDT. On the left-hand side, D2 does not initially belong to the MDT; on the right-hand side, D2 is already part of the MDT as a transit/branching node.

    6 Results and Performance Analysis

    The performance of the proposed compact multicast routing algorithm is assessed through extensive simulations under a non-blocking dynamic scenario, considering join events only in a first phase and then the full scenario (i.e. an interleaved sequence of join/leave events). The simulations are executed on the ad-hoc simulator developed in [GC09, ICC10], extended here to implement the novel compact multicast routing algorithm. In order to run on topologies of more than 1k nodes, sparse matrix compression techniques had to be applied to the shortest path (SP) computation methods.
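The report does not say which sparse matrix compression was used. As one plausible illustration, the sketch below stores the graph in compressed sparse row (CSR) form and runs Dijkstra's algorithm directly on the compressed arrays. The choice of CSR, the function names and the integer node numbering are all assumptions.

```python
import heapq


def to_csr(num_nodes, edges):
    """Build CSR arrays (row_ptr, col_idx, weights) for an undirected graph."""
    degree = [0] * num_nodes
    for u, v, _ in edges:
        degree[u] += 1
        degree[v] += 1
    row_ptr = [0] * (num_nodes + 1)
    for n in range(num_nodes):
        row_ptr[n + 1] = row_ptr[n] + degree[n]
    col_idx = [0] * row_ptr[-1]
    weights = [0.0] * row_ptr[-1]
    cursor = list(row_ptr[:-1])  # next free slot in each row
    for u, v, w in edges:
        for a, b in ((u, v), (v, u)):
            col_idx[cursor[a]] = b
            weights[cursor[a]] = w
            cursor[a] += 1
    return row_ptr, col_idx, weights


def dijkstra(csr, source):
    """Shortest-path costs from `source` over the CSR representation."""
    row_ptr, col_idx, weights = csr
    dist = [float("inf")] * (len(row_ptr) - 1)
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for k in range(row_ptr[u], row_ptr[u + 1]):
            v, nd = col_idx[k], d + weights[k]
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


if __name__ == "__main__":
    graph = to_csr(3, [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 5.0)])
    print(dijkstra(graph, 0))
```

For a sparse AS-level graph, this layout needs storage proportional to the number of links rather than to the square of the number of nodes, which is the motivation for compression at the 16k-node scale.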

    6.1 Simulation Performance Metrics and CFD ratios

    Different performance metrics are used to analyse the proposed algorithm. These metrics are listed below:

    - Cost: max cost path | min cost path | average cost path (the end-to-end path is considered)
    - State: #states (per tree), #branching nodes (per tree)
    - Communication Cost:
        - number of discovery messages: total, request and response msgs
        - number of join/leave messages: total (one-way)
    - RT Size: URIB, MRIB and TIB entries per execution (S,G), in terms of:
        - Number of entries
        - Number of bits

    6.2 Results Per Topologies

    The obtained results must be within certain bounds. When comparing SPT to CMR the following premises should hold: i) cost: SP-CMR >0 and ii) state: SP-CMR >0 or = 0 and ii) state: CMR-ST >= 0 or =0 and ii) state SPT - ST >=0 or

    As depicted in Fig.11b, the gain in memory space consumption of the CMR with respect to the ST varies from 10.1 to 3.8 when the multicast group size increases from 500 to 2000. With respect to the SPT, this gain decreases from about 8.8 to 3.4. In both figures, even though the gain decreases with increasing multicast group size, the results show a significant benefit as long as the multicast group size remains relatively small.

    [Fig.11 plots. Panels: "Routing Table (number of entries)" and "Routing Table (memory)". Horizontal axis: number of nodes of the mcast group (500, 1000, 1500, 2000). Series: SPT/CMRA and ST/CMRA.]

    Fig.11: (a) RT size ratio (number of entries) - (b) RT size ratio (memory-bit space)

    6.2.2 16k-nodes synthetic power-law graph and 16k-nodes CAIDA Internet map with Search Segmentation

    The CMR algorithm is simulated on i) synthetic power-law graphs (16k nodes and 36k links) generated by means of GLP, which produces topologies representative of the Internet at the Autonomous System (AS) level, and ii) the CAIDA Internet AS-topology map (16k nodes and 48k links). The simulation scenarios build P2MP routing paths for multicast groups of increasing size, from 500 to 4000 nodes (selected randomly).

    For the GLP topology (Fig.12a), the stretch of the CMR is slightly higher than 1 (1.03). The same trend is observed for the CAIDA map (moving from 1.05 to 1.02). In both cases, it remains almost constant when the multicast group size increases. The relative gain (max: 0.1, min: 0.01) compared to the SPT decreases with increasing group size; this trend is more pronounced for the CAIDA topology (see Fig.13a). For both topologies, the number of RT entries and the memory space required to store them show a substantial but decreasing gain for the CMR compared to the ST as the multicast group size increases. For a multicast group size of 500|4000 nodes, the number of RT entries produced by the CMR (1453|9271) is 12.0|2.7 times smaller than the number of RT entries produced by the ST (17407|24997) (see Fig.12b and 13b). Similar results are observed for the memory space consumption. Compared to the SPT, the increase in communication cost for the CMR ranges from 14 to 18 times (for the CAIDA map) and from 17 to 50 times (for the GLP topology), although both ranges are much smaller than those of the ST (see Fig.12c and 13c). The difference between topologies can be explained by the higher number of edges in the GLP topology. Despite this noticeable difference, the SPT communication cost grows linearly with the multicast group size whereas the CMR curve is concave, implying sub-linear dependence. Moreover, the CMR curve flattens as the multicast group size increases, leading to a saturation ratio of around 18 (for the CAIDA map) and 50 (for the GLP topology).
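The RT size ratios quoted above follow directly from the entry counts given in the text, as the short check below shows. Only the four counts come from the report; the dictionary layout is illustrative.

```python
# Number of RT entries reported for multicast groups of 500 and 4000 nodes.
rt_entries = {
    500: {"CMR": 1453, "ST": 17407},
    4000: {"CMR": 9271, "ST": 24997},
}

# ST/CMR ratio per group size, rounded to one decimal as in the text.
ratios = {
    size: round(counts["ST"] / counts["CMR"], 1)
    for size, counts in rt_entries.items()
}

if __name__ == "__main__":
    print(ratios)
```

Between the two group sizes the CMR entry count grows by a factor of about 6.4 while the ST count grows by about 1.4, which is why the gain shrinks as the group size increases.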

    [Fig.12 plots. Panels: multiplicative stretch (series SPT/ST and CMRA/ST), RT size ratio in memory space (series SPT/CMRA and ST/CMRA), and communication cost ratio (series ST/SPT and CMRA/SPT). Horizontal axis: multicast group size, from 500 to 4000 nodes.]

    Fig.12 (GLP Topology): (a) Stretch - (b) RT size ratio (memory-bit space) - (c) Communication cost ratio

    [Fig.13 plots. Panels: multiplicative stretch (series SPT/ST and CMRA/ST), RT size ratio in memory space (series SPT/CMRA and ST/CMRA), and communication cost ratio (series ST/SPT and CMRA/SPT). Horizontal axis: multicast group size, from 500 to 4000 nodes.]

    Fig.13 (CAIDA map): (a) Stretch - (b) RT size ratio (memory-bit space) - (c) Communication cost ratio

    References

    [RFC4984] RFC 4984, "Report from the IAB Workshop on Routing and Addressing"

    [Kleinroch77] L. Kleinrock and F. Kamoun, "Hierarchical routing for large networks: Performance evaluation and optimization," Computer Networks, vol. 1, pp. 155-174, 1977

    [Peleg89] D. Peleg and E. Upfal, "A trade-off between space and efficiency for routing tables," J. ACM, vol. 36, no. 3, pp. 510-530, Jul. 1989

    [Thorup01] M. Thorup and U. Zwick, "Compact routing schemes," Proc. 13th Annual ACM SPAA'01, Heraklion, Crete, Greece, pp. 1-10, Jul. 2001

    [Abraham08] I. Abraham, C. Gavoille, D. Malkhi, N. Nisan, M. Thorup, "Compact name-independent routing with minimum stretch," ACM Trans. Alg., vol. 4, no. 3, art. 37, Jun. 2008

    [Abraham06] I. Abraham, C. Gavoille, and D. Malkhi, "On space-stretch trade-offs: Lower bounds," in SPAA, 2006

    [Krioukov04] D. Krioukov, K. Fall, and X. Yang, "Compact routing on Internet-like graphs," in INFOCOM, 2004

    [Boguna10] M. Boguñá, F. Papadopoulos, D. Krioukov, "Sustaining the Internet with hyperbolic mapping," Nature Communications, Sept. 2010

    [Arbor09] "Internet Observatory Annual 2009", Arbor

    [SAGE] Sage's Graph Library. Available at http://www.sagemath.org/.

    [Garey77] M.R. Garey, R.L. Graham, D.S. Johnson, "The complexity of computing Steiner minimal trees," SIAM J. Appl. Math., vol. 32, no. 4, pp. 835-859, 1977

    [Takahashi80] H. Takahashi, A. Matsuyama, "An approximate solution for the Steiner problem in graphs," Math. Japonica, pp. 573-577, 1980

    [Hwang92] F.K. Hwang, D.S. Richards, P. Winter, "The Steiner Tree Problem (Annals of Discrete Mathematics)," North-Holland, 1992

    [Awerbuch89] B. Awerbuch, A. Bar-Noy, N. Linial, D. Peleg, "Compact distributed data structures for adaptive routing," Proc. 21st annual ACM STOC'89, Seattle, WA, United States, pp. 479-489, May 1989

    [Abraham09] I. Abraham, D. Malkhi, D. Ratajczak, "Compact multicast routing," Proc. 23rd Int. Symp. DISC'09, Elche, Spain, pp. 364-378, Sep. 2009

    [Gavoille96] C. Gavoille and S. Pérennès, "Memory requirement for routing in distributed networks," in PODC, 1996

    [Krioukov07] D. Krioukov, et al., "On Compact Routing for the Internet," ACM SIGCOMM Computer Communication Review, vol. 37, no. 3, July 2007

    [PIM] B. Fenner, M. Handley, H. Holbrook, I. Kouvelas, "Protocol Independent Multicast - Sparse Mode (PIM-SM)," Internet Engineering Task Force (IETF), RFC 4601, Aug. 2006

    [GC09] P. Pedroso, O. Pedrola, D. Papadimitriou, M. Klinkowski, D. Careglio, "Anytraffic routing algorithm for label-based forwarding," IEEE Globecom 2009, Hawaii, US

    [ICC10] D. Papadimitriou, P. Pedroso, D. Careglio, "AnyTraffic Labeled Routing," IEEE ICC 2010, South Africa

    [RFC4875] R. Aggarwal, Ed., "Extensions to Resource Reservation Protocol - Traffic Engineering (RSVP-TE) for Point-to-Multipoint TE Label Switched Paths (LSPs)," RFC 4875, May 2007

    [Bu02] T. Bu, D. Towsley, "On distinguishing between Internet power law topology generators," Proc. IEEE Infocom'02, pp. 638-647, New York, NY, USA, Jun. 2002

    [CAIDA] CAIDA, http://www.caida.org

    Appendixes

    Definition 1 (Network Diameter) In graph theory, the distance between two vertices in a graph is the number of edges in a shortest path connecting them. Let G be a graph and v be a vertex of G. The eccentricity of the vertex v is the maximum distance from v to any vertex, that is, e(v) = max{d(v,w) : w in V(G)}. The radius of G is the minimum eccentricity among the vertices of G: radius(G) = min{e(v) : v in V(G)}. The diameter of G is the maximum eccentricity among the vertices of G: diameter(G) = max{e(v) : v in V(G)}. That is, it is the greatest distance between any pair of vertices. To find the diameter of a graph, first find the shortest path between each pair of vertices; the greatest length of any of these paths is the diameter of the graph. See [http://math.fau.edu/locke/Center.htm].
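Definition 1 translates into a few lines of code for an unweighted graph. The sketch below assumes a connected graph given as an adjacency dictionary and uses breadth-first search for the hop distances. The function names are illustrative.

```python
from collections import deque


def bfs_distances(adj, start):
    """Hop distance d(start, w) to every vertex w reachable from start."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def eccentricity(adj, v):
    """e(v) = max{d(v, w) : w in V(G)}; the graph is assumed connected."""
    return max(bfs_distances(adj, v).values())


def radius(adj):
    """Minimum eccentricity among the vertices of G."""
    return min(eccentricity(adj, v) for v in adj)


def diameter(adj):
    """Maximum eccentricity among the vertices of G."""
    return max(eccentricity(adj, v) for v in adj)


if __name__ == "__main__":
    # Path graph a - b - c - d.
    path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
    print(radius(path), diameter(path))
```

Running one breadth-first search per vertex costs on the order of N(N + L) operations, which remains feasible offline for topologies of the sizes considered here.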

    A.1. Path-DV Daemons

    Distance/path-vector scheme:

    We can run OSPF-like flooding, but this becomes rather unrealistic on a flat network of 10k nodes, for instance. We can run OSPF below 1k nodes, but above that size a distance/path-vector scheme is a better comparison criterion. We are therefore interested in a simple DV implementation augmented with the distance to the mcast source: while a pure distance-vector approach gives the distance only, the augmented approach (path vector) also gives the path to the source. The behavior is that of a simplified BGP routing update propagation with the AS_Path length as metric and selection criterion.

    We have implemented a split-horizon dissemination of distance/path in which the node that starts the entire process is the mcast source. The multicast source sends messages (announcing itself) through all its outgoing interfaces. Then, each neighbor of the source forwards the message (with the updated path) through all its outgoing interfaces except the one on which it was received (the one towards the source), and so on.

    In practice, each node additionally maintains reachability/path information to every other node (thus N-1 routing entries, as needed for unicast traffic), but this is not required for multicast purposes only. So, in a first step we do not count these N-1 entries in the comparison, since we do not account for "unicast" entries in our counting of RT entries. The total number of messages is equal to

    $$\text{Comm.cost} = degree(source) + \sum_{i=0}^{N} \left( degree(node_i) - 1 \right) \qquad (11)$$

    i.e., more simply, assuming that the average node degree is equal to 2L/N, the number of messages is proportional to L (the number of links). This count does not take differential delays into account, i.e. a shorter path having a higher delay than a longer path during the propagation and processing of messages (e.g. path a, b, c has a shorter delay than path a, c), which results in c re-issuing another message announcing that a shorter path is available. We will extend the model to take this effect into account.
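The message count of (11) can be checked on a small graph. The sketch below assumes that the sum runs over the non-source nodes only and that every node forwards exactly once (no re-issued messages). Under these assumptions the total equals 2L - (N - 1), which makes the proportionality to L explicit. The function name and the example graph are hypothetical.

```python
def path_vector_messages(adj, source):
    """Messages sent by a split-horizon dissemination started at `source`.

    The source sends on all its interfaces; every other node forwards once
    on all interfaces except the one the message arrived on.
    """
    return len(adj[source]) + sum(
        len(neighbors) - 1 for node, neighbors in adj.items() if node != source
    )


if __name__ == "__main__":
    # Hypothetical graph: triangle s-a-b plus a pendant node c attached to b.
    adj = {
        "s": ["a", "b"],
        "a": ["s", "b"],
        "b": ["s", "a", "c"],
        "c": ["b"],
    }
    num_nodes = len(adj)
    num_links = sum(len(n) for n in adj.values()) // 2
    print(path_vector_messages(adj, "s"), 2 * num_links - (num_nodes - 1))
```

Both printed values coincide because the degrees of all nodes sum to 2L and each of the N - 1 non-source nodes skips exactly one interface.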
