14
770 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 5, NO. 6, DECEMBER 1997 A Quantitative Comparison of Graph-Based Models for Internet Topology Ellen W. Zegura, Member, IEEE, Kenneth L. Calvert, Member, IEEE, and Michael J. Donahoo Abstract—Graphs are commonly used to model the topological structure of internetworks in order to study problems ranging from routing to resource reservation. A variety of graphs are found in the literature, including fixed topologies such as rings or stars, “well-known” topologies such as the ARPAnet, and randomly generated topologies. While many researchers rely upon graphs for analytic and simulation studies, there has been little analysis of the implications of using a particular model or how the graph generation method may affect the results of such studies. Further, the selection of one generation method over another is often arbitrary, since the differences and similarities between methods are not well understood. This paper considers the problem of generating and selecting graphs that reflect the properties of real internetworks. We review generation methods in common use and also propose several new methods. We consider a set of metrics that characterize the graphs produced by a method, and we quantify similarities and differences among several generation methods with respect to these metrics. We also consider the effect of the graph model in the context of a specific problem, namely multicast routing. Index Terms— Internetworking, multicast, network modeling, scalability. I. INTRODUCTION A. Background T HE explosive growth of the Internet has been accompa- nied by a wide range of internetworking problems related to routing, resource reservation, and administration. The study of algorithms and policies to address such problems often involves simulation or analysis using an abstraction or model of the actual network structure and applications. The reason is clear; networks that are large enough to be interesting are also expensive and difficult to control, therefore they are rarely available for experimental purposes. Moreover, it is generally more efficient to assess solutions using analysis or simulation, provided the model is a “good” abstraction of the real network and application. It is therefore remarkable that studies based on randomly generated or trivial network topologies are so common, while rigorous analyses of how the results scale or how they might change with a different topology are so rare. The state of the art in network modeling includes: 1) regular topologies, such as rings, trees, and stars (e.g., [6], [13], [23]); Manuscript received August 26, 1996; revised July 1, 1997; approved by IEEE/ACM TRANSACTIONS ON NETWORKING Editor D. Estrin. The authors are with the College of Computing, Georgia Institute of Technology, Atlanta, GA 30332 USA (e-mail: [email protected]). Publisher Item Identifier S 1063-6692(97)08486-0. 2) “well-known” topologies, such as the ARPAnet or NSFnet backbone (e.g., [1], [18], [24]); 3) randomly generated topologies (e.g., [19]–[21]). The limitations of each of these are obvious. Well-known and regular topologies reflect only parts of current or past real net- works; random topologies may not reflect any (past, present, or future) real network. Also clear from the cited references is the diverse set of problems that relies on network models to evaluate performance. Furthermore, most researchers seem to be aware of the perils of reaching conclusions about real networks based on these models; it is typical for papers to include a disclaimer to this effect. To illustrate the important role that the network model can play in assessing algorithms, consider the following results. 1) Doar and Leslie found that the efficiency of their dy- namic multicasting algorithms was reduced by as much as half when using random graphs versus using hierar- chically structured graphs designed to reflect some of the properties of real internetworks. (See Fig. 5 in [11].) 2) Wei and Estrin found that the traffic concentration in core-based multicast routing trees is comparable to traf- fic concentration in shortest path trees for a network model with average node degree of about 3.0, but the traffic concentration is almost 30% higher in core-based trees when average node degree increased to 8.0. (See Fig. 9(a) in [22].) 3) Mitzel and Shenker found that multicast resource reser- vation styles compared quite differently in the quantity of resources reserved for linear, tree and star topologies. (See Tables IV and V in [13].) It should be clear from these examples that the network model does matter. The conclusions reached about the suitability and performance of algorithms may vary depending on the methods used to model the network. A variety of criteria may be applied to assessing a network model, depending in large part on the intended use. For ex- ample, if the purpose is to stress test the algorithm, the model should generate instances which are, in some sense, “difficult.” For the problem of routing, this may mean topologies with moderate node degree and many routing choices. If the purpose is to model a particular (static) network (e.g., a campus or corporate network), then the model should accurately reflect the current topology. Historically, large networks such as the Public Switched Telephone Network have grown according to a topological design developed by some central authority or administra- 1063–6692/97$10.00 1997 IEEE

A quantitative comparison of graph-based models for Internet topology

  • Upload
    mj

  • View
    215

  • Download
    3

Embed Size (px)

Citation preview

770 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 5, NO. 6, DECEMBER 1997

A Quantitative Comparison ofGraph-Based Models for Internet TopologyEllen W. Zegura,Member, IEEE, Kenneth L. Calvert,Member, IEEE, and Michael J. Donahoo

Abstract—Graphs are commonly used to model the topologicalstructure of internetworks in order to study problems rangingfrom routing to resource reservation. A variety of graphs arefound in the literature, including fixed topologies such as ringsor stars, “well-known” topologies such as the ARPAnet, andrandomly generated topologies. While many researchers relyupon graphs for analytic and simulation studies, there has beenlittle analysis of the implications of using a particular model orhow the graph generation method may affect the results of suchstudies. Further, the selection of one generation method overanother is often arbitrary, since the differences and similaritiesbetween methods are not well understood.

This paper considers the problem of generating and selectinggraphs that reflect the properties of real internetworks. We reviewgeneration methods in common use and also propose severalnew methods. We consider a set of metrics that characterizethe graphs produced by a method, and we quantify similaritiesand differences among several generation methods with respectto these metrics. We also consider the effect of the graph modelin the context of a specific problem, namely multicast routing.

Index Terms—Internetworking, multicast, network modeling,scalability.

I. INTRODUCTION

A. Background

T HE explosive growth of the Internet has been accompa-nied by a wide range of internetworking problems related

to routing, resource reservation, and administration. The studyof algorithms and policies to address such problems ofteninvolves simulation or analysis using anabstractionor modelof the actual network structure and applications. The reasonis clear; networks that are large enough to be interesting arealso expensive and difficult to control, therefore they are rarelyavailable for experimental purposes. Moreover, it is generallymore efficient to assess solutions using analysis or simulation,providedthe model is a “good” abstraction of the real networkand application. It is therefore remarkable that studies basedon randomly generated or trivial network topologies are socommon, while rigorous analyses of how the results scale orhow they might change with a different topology are so rare.

The state of the art in network modeling includes:

1) regular topologies, such as rings, trees, and stars (e.g.,[6], [13], [23]);

Manuscript received August 26, 1996; revised July 1, 1997; approved byIEEE/ACM TRANSACTIONS ON NETWORKING Editor D. Estrin.

The authors are with the College of Computing, Georgia Institute ofTechnology, Atlanta, GA 30332 USA (e-mail: [email protected]).

Publisher Item Identifier S 1063-6692(97)08486-0.

2) “well-known” topologies, such as the ARPAnet orNSFnet backbone (e.g., [1], [18], [24]);

3) randomly generated topologies (e.g., [19]–[21]).

The limitations of each of these are obvious. Well-known andregular topologies reflect only parts of current or past real net-works; random topologies may not reflect any (past, present,or future) real network. Also clear from the cited referencesis the diverse set of problems that relies on network modelsto evaluate performance. Furthermore, most researchers seemto be aware of the perils of reaching conclusions about realnetworks based on these models; it is typical for papers toinclude a disclaimer to this effect.

To illustrate the important role that the network model canplay in assessing algorithms, consider the following results.

1) Doar and Leslie found that the efficiency of their dy-namic multicasting algorithms was reduced by as muchas half when using random graphs versus using hierar-chically structured graphs designed to reflect some ofthe properties of real internetworks. (See Fig. 5 in [11].)

2) Wei and Estrin found that the traffic concentration incore-based multicast routing trees is comparable to traf-fic concentration in shortest path trees for a networkmodel with average node degree of about 3.0, but thetraffic concentration is almost 30% higher in core-basedtrees when average node degree increased to 8.0. (SeeFig. 9(a) in [22].)

3) Mitzel and Shenker found that multicast resource reser-vation styles compared quite differently in the quantityof resources reserved for linear, tree and star topologies.(See Tables IV and V in [13].)

It should be clear from these examples that the network modeldoes matter. The conclusions reached about the suitabilityand performance of algorithms may vary depending on themethods used to model the network.

A variety of criteria may be applied to assessing a networkmodel, depending in large part on the intended use. For ex-ample, if the purpose is to stress test the algorithm, the modelshould generate instances which are, in some sense, “difficult.”For the problem of routing, this may mean topologies withmoderate node degree and many routing choices. If the purposeis to model a particular (static) network (e.g., a campus orcorporate network), then the model should accurately reflectthe current topology.

Historically, large networks such as the Public SwitchedTelephone Network have grown according to a topologicaldesign developed by some central authority or administra-

1063–6692/97$10.00 1997 IEEE

ZEGURA et al.: COMPARISON OF GRAPH-BASED MODELS FOR INTERNET TOPOLOGY 771

Fig. 1. The dangers of visual representation.

tion [2]. In contrast, there is no central administration thatcontrols—or even keeps track of—the detailed topology ofthe Internet. Although general characteristics of its topologyare known, it is impractical, if not impossible, to construct adetailed topological map of the Internet due to its decentralizedadministration and sheer scale.1 Further, the structure of theInternet is changing at a rapid rate, limiting the accuracy of anysnapshot of topology. These considerations make it difficult torigorously validate the “realism” of any graph model. In suchcases, where we have some, albeit incomplete, knowledge ofthe current and expected future structure of the network, themodel should reflect the known properties, and instantiate theremainder of the topology in some reasonable and randomfashion.

B. Overview and Roadmap

In this paper we consider the question of how to generategraph models that have path characteristics like those of theInternet. We also address the issue of the significance ofdifferences between graphs generated with different methods.Our goal is to provide information that can be used in selectinga graph to use for a particular purpose. We begin by describingthe general topological characteristics of the Internet (SectionII). It is this structure that the reader should keep in mindas we describe the common methods to model internetworktopology.

In Section III, we describe three classes of graph generationmethods: flat random, regular, and hierarchical. Commonmethods in the literature fall primarily into the flat randomand regular classes. Within the hierarchical class, we propose anew transit-stub model having properties of hierarchy, locality,and routing policy, as found in real internetworks.

We then analyze the outputs of the various generationmethods (Section IV), with the goal of providing a basisfor selecting a generation method, comparing one methodto another, or comparing the graphs generated by a methodto (a set of) known real topologies. We propose a set oftopological properties as a basis for such comparisons. Notethat a rigorous system for quantifying the characteristics ofmodels is essential; relying on visual representation (e.g.,

1Through interdomain routing protocols, it is possible to obtain informationabout high-level connectivity covering a significant fraction of the Internet, butassembling complete end-to-end information—i.e., down to the level of last-hop routers—would require the cooperation of hundreds of different domainadministrations.

Fig. 2. Example of Internet domain structure.

noting that a layout bears a visual resemblance to geographicmaps of the Internet) can be terribly misleading. For example,the two topologies in Fig. 1 each have 100 vertices andapproximately the same number of edges (230 versus 231),a fact that is counter to the visual impression.

Given a set of generation methods (Section III) and a set ofmetrics on the generated graphs (Section IV), an obvious taskis the comparison of one method to another. Before tacklingthat question, however, we must determine the effect of themethod parameters on the metrics of the generated graphswithin a particular method; Section V describes our investi-gation of this relationship. A statistical comparison of metricdistributionsacrossthe different generation methods providesa basis for some observations about method similarities anddifferences; these are presented in Section VI. We augmentthesegeneral observations by investigating the effect of thegeneration method in the context of aspecificproblem typicalof those for which graph models are used, namely multicastrouting. These results are presented in Section VII.

II. I NTERNET DOMAIN STRUCTURE

Throughout this paper, it is assumed that the goal is to modelthe paths (i.e., sequences of nodes) along which informationflows between nodes in an internetwork. Typically, undirectedgraphs are used to represent the set of these paths, withnodes representingswitchesor routers, and edges representingforwarding paths between switches. A forwarding path may bea direct (physical) link or it may be a shared medium; we donot distinguish these two cases. For example, an FDDI ringto which four IP routers are connected would be representedas a clique of four nodes. In an internetwork there may bemany paths between a pair of nodes; at any time one of theseis considered the “best,” or primary, path, typically because itis the shortest (by some metric) of the possible paths. Whenwe mention “the path between two nodes,” we refer to thisprimary path.

We do not model individual hosts; it is therefore sensibleto have a switching node with an edge to only one othernode, a situation that is not uncommon in actual internetworks.We also do not model link characteristics (e.g., bandwidth);such information can be easily included as an overlay to thebasic topological structure, as needed for studying particularproblems.

772 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 5, NO. 6, DECEMBER 1997

Today’s Internet can be viewed as a collection of intercon-nectedrouting domains[9], which are groups of nodes that areunder a common administration and share routing information.A primary characteristic of these domains isrouting locality:the path between any two nodes in a domain stays entirelywithin the domain. Each routing domain in the Internet canbe classified as either astub domain or atransit domain. Adomain is a stub domain if the path connecting any two nodes

and goes through that domain only if eitheror is inthat domain; transit domains do not have this restriction. Thepurpose of transit domains is to interconnect stub domainsefficiently; without them, every pair of stub domains wouldneed to be directly connected to each other. (See Fig. 2.)Routing domains range in size from one or two nodes tohundreds or thousands of nodes.

A transit domain comprises a set ofbackbonenodes, whichare typically fairly well connected to each other. In a transitdomain each backbone node also connects to a number ofstub domains, viagatewaynodes in the stubs. Some backbonenodes also connect to other transit domains. Stub domainscan be further classified as single- or multi-homed. Multi-homed stub domains have connections to more than onetransit domain. Single-homed stubs connect to only one transitdomain. Some stubs also have links to other stubs.

The routing characteristics of the current Internet can besummarized by the following general principles regarding thepath between two arbitrary nodesand

1) if and are in the same domain, the path betweenthem remains entirely within that domain;

2) if is in domain and is in domain the pathfrom to goes from through zero or more transitdomains, to

To summarize, the primary structural characteristic affectingthe paths between nodes in the Internet is the distinctionbetween stub domains and transit domains (a path connectingtwo nodes in different stub domains will never pass throughany stub domain other than those two). In other words, thereis a hierarchy imposed on nodes.2 In the next section, weconsider methods of generating graphs that model the structureof internetworks; as we shall see, some commonly used modelsfail to capture these restrictions.

III. GRAPH GENERATION METHODS

When modeling an entity with some details that are un-known or that may vary, the standard approach is to sub-stitute nondeterminism for the unknowns, generate multipleinstances, and analyze the set of results. In modeling Internettopology, this corresponds to generating multiple graphs tobe used in simulation or analytic studies. In this section,we consider three classes of generation methods. The firstare flat randomgraph methods, which construct a graph by

2Two refinements of this structure in the present Internet are worthmentioning. First, there may be multiple strata of transit domains, i.e., sometransit domains connect only to other transit domains and not to stub domains.This gives rise to a three-level (or even deeper) hierarchy. Second, top-leveltransit domains are not interconnected pairwise randomly, but rather cometogether in cliques calledexchanges. Such an exchange is implemented by asingle network, to which nodes from many transit domains are connected.

probabilistically adding edges to a given set of nodes, withno structure among the nodes. The second class of generationmethods areregular, in the sense that they generate graphswith specific structure (for example, rings and meshes), andthus have no randomness at all. The third class of models arehierarchical, building larger network structure from smallernetwork components. These methods represent a compromisebetween the extremes of the flat random and regular methods;they impose some high-level structure and fill in details atrandom.3

A. Four Flavors of Flat Random Methods

The networking literature contains a variety of flat (i.e.,nonhierarchical) random methods used to model internetworks.All are variations on the same basic method. A set of verticesis distributed in a plane, and an edge is added between eachpair of vertices with some probability. In thepure random(orsimply random) method, this probability is a fixed number

While the pure random method does not explicitly attemptto reflect the structure of real internetworks, it is attractivefor its simplicity and is commonly used to study networkingproblems.

Other methods add edges with a probability that is somefunction of the distance between the nodes. After the purerandom method, perhaps the most common generation methodis the Waxmanmethod [21], with the probability of an edgefrom to given by

where is the Euclidean distance from to, and is the maximum distance between any two nodes.

An increase in increases the number of edges in the graph,while an increase in increases the ratio of long edges toshort edges [21]. Several variations on the Waxman methodhave been proposed [11], [22]; because they are essentiallyequivalent to the Waxman method, our study includes onlythe original method.

We propose two new methods that are also intended to cap-ture locality by relating edge probability to distance betweenvertices. Ourexponentialmethod uses

so that the probability of an edge approaches zero as thedistance between the two vertices approachesOur localitymethod partitions the (potential) edges based on length, andassigns a different (fixed) probability for each equivalenceclass of edge lengths. For the case of two equivalence classes,the parameter defines the boundary

ifif

One nice feature of the locality method is that we have beenable to extend some analytic results from the large body of

3Georgia Tech Internet Topology Models (GT-ITM) is apackage for generating the flat random and hierarchical mod-els described in this section. It is publically available athttp://www.cc.gatech.edu/projects/gtitm/ .

ZEGURA et al.: COMPARISON OF GRAPH-BASED MODELS FOR INTERNET TOPOLOGY 773

TABLE IFLAT RANDOM GRAPH METHODS

Fig. 3. Examples of regular graphs.

work on pure random graphs [4] to this model [3]; these resultsdeal with properties of the graphs (e.g., connectedness) as thenumber of nodes becomes large.

To summarize, Table I indicates the probability of an edgebetween two vertices at Euclidean distancefor each flatrandom method in our study.

Note that for most of these methods, the effect of the variousparameters on properties of interest is indirect. For example,suppose one is modeling a real network with known averagenode degree of 4.0. What values ofand should be chosenin the Waxman method? The exponential method? We returnto such questions in Section V.

Note also that in each of these methods, every pair ofvertices is treated equally with respect to addition of edges.Thus, although it is possible to control the approximate numberof edges, it isnot possible to control theconfiguration ofthe edges. This has implications for the generation of largesparsely connectedgraphs using these methods. In particular,for any of the edge probability functions in Table I, theprobability that a graph is connected decreases as the numberof nodes in the graph increases. As a result, it is difficult tocreate connected graphs with large numbers of nodesand lowaverage node degree. One alternative is to generate partiallyconnected graphs and then apply various “repair” techniques toconnect the graph. Obviously this will produce graphs whosestructure is not entirely random, because it has been influencedby the repair process. Another option is to use the randommethods to generate smaller graphs that are then connectedinto a hierarchical structure. (See Section C.)

B. Regular Graphs

Regular graphs are often used in analytic studies ofalgorithm performance because their structure makes themtractable. We consider linear chains, rings, stars and meshes;Fig. 3 gives an example of each topology with nine nodes.The general structure should be clear from these examples.

C. Hierarchical Methods

The flat random and regular graph methods represent ex-tremes in the sense that the former offer little control over the

structure of the resulting graphs, while the latter are extremelyrigid in their structure. Neither captures the hierarchy thatis present in real internetworks, though both may reflectsome notion oflocality if certain nodes are more likely tobe connected than others. We now describe two methods ofcreating hierarchical graphs by connecting smaller componentstogether according to a larger scale structure.

1) N-Level Method:The N-level hierarchical method con-structs a graph by iteratively expanding individual nodes intographs. Beginning with a connected graph, each node in thegraph is replaced by a connected graph. The edges of theoriginal graph are then reattached to nodes in the replacementgraphs.

More precisely, in constructing the flat random graphs,we divide a unit square in the Euclidean plane into equal-sized square sectors, the number of which is determinedby a scale parameter each node in the graph is thenassigned to one of the squares. In constructing anN-levelhierarchical graph, the top-level graph is constructed in thisfashion using scale parameter Then each square containinga node is subdivided again, according to the second-level scaleparameter and a graph is constructed using that top-levelsquare as the unit plane. Fig. 4 illustrates the initial iterationsof the process. The result is that the scale of the final graphis the product of the scales of the individual levels, and edgelengths are roughly determined by edge level. For example, ina three-level graph, “top-level” edges (i.e., part of the originalgraph) are typically longer than second-level edges, while thesecond-level edges are typically longer than third-level edges.

It is possible to define a simple “routing policy” based onthe principles outlined in Section II and to add informationto the graph to reflect it. This generally takes the form of anedge metric designed so that the “shortest” path between twonodes, in terms of that metric, obeys the hierarchy constraints.For example, in theN-level method, the path between nodeswithin the same level-domain should stay entirely within thatdomain. We implement this by associating with each edge arouting policy weight, in addition to the Euclidean edge length,to use in constructing policy-based shortest paths. One of theadvantages of theN-level method is its simplicity. Becauseof the way nodes are laid out in the plane with this method,Euclidean edge lengths are a good approximation for policyweights that enforce domain locality. A disadvantage of thismethod is that thenodesare of only one type, and thus pathsin these graphs may not have the form described in Section II.

2) Transit-Stub Method:Our transit-stub method is a newway to produce hierarchical graphs by interconnecting transitand stub domains. We first generate a connected random graphusing any one of the methods discussed earlier; each node inthat graph represents an entire transit domain. Each node inthat graph is then replaced by another connected random graph,representing the backbone topology of one transit domain.Next, for each node in each transit domain, we generate anumber of connected random graphs representing the stubdomains attached to that node. Each of these stub domainshas an edge to its transit node. Finally, we add some “extra”connectivity, in the form of edges between pairs of nodes,one from a transit domain and one from a stub or one from

774 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 5, NO. 6, DECEMBER 1997

Fig. 4. N-level hierarchical layout.

each of two different stub domains. The number of extraedges of each type is controlled by parameters to the method.Clearly, if the random graphs generated are all connected,this construction results in a connected graph. Doar hasindependently proposed a graph generation method that hasa somewhat similar hierarchical structure to this method [7].

The size of the graph (number of nodes) and the distributionof nodes between transit and stub domains in this method iscontrolled by the following parameters:

Param. Meaning Ex.

As the table shows, it is not difficult to generate rather largegraphs with this method. Moreover, the average node degree ofthe whole graph depends mainly on the number of edges in thecomponent graphs; if the component graphs are small enough,it is possible to generate large graphs with small average nodedegrees. In the example above, if the average node degree is3.0 for transit nodes and 2.5 for stub nodes, the average nodedegree of the entire graph will be about 2.75.

Note that the parameters given areaverages. In our imple-mentation, we randomize the numbers of nodes in the transitdomains while preserving the average.4 The values of and

for each transit node and stub domain are also randomized.Our implementation allows different random graph methodsand parameters to be used to generate transit and stub domainsubgraphs with different characteristics.

Once the complete graph is constructed, integer edgeweights are assigned to the edges so that the hierarchicalrouting policies of Section II are enforced when shortest pathsare calculated using the weights. The calculation of theseweights is described in the Appendix.

4This is done by iteratively choosing a transit domain, and if its currentnumber of nodes exceeds one, decrementing it and then choosing anotherrandom domain and incrementing its number of nodes. This process “spreadsout” the values while keeping the mean constant.

The transit-stub method, like the other methods, assignseach node of the graph to a point in the Euclidean plane so thata “length” metric may be assigned to each edge. This is donein such a way that nodes in the same domain are “near” eachother, but domains may overlap in the plane. In particular, stubdomains may overlap with the transit domains to which theyare connected. This means that the policy weights assigned tothe edges may have little to do with the Euclidean “length” ofthe edges—unlike graphs generated using the-level method.

IV. M ETRICS

In this section, we consider several metrics that can be usedto compare and evaluate graphs. The metrics fall into thefollowing two categories: 1)topological metrics, which areindependent of any particular application and 2)application-specificmetrics, which depend on topology and application.In the next section, the effect of method parameters on thetopological metrics is discussed; in Section VI, the differentgeneration methods are compared with respect to these metrics.In Section VII, the methods are compared using some metricsspecific to multicast routing.

A. Topological Metrics

We first consider attributes inherent in the graphs them-selves; these are useful for characterizing and classifyingmethods independent of any particular application. Some ofthese metrics are derived from shortest paths; we considerboth “hop” metrics, in which each edge has unit weight, and“length” metrics, in which each edge has weight equal to itsEuclidean length. We further consider composite metrics inwhich the shortest paths aredeterminedusing one metric,but evaluatedusing a different metric. For example, in thehierarchical methods, we might determine the shortest pathsusing the routing policy weights, then evaluate the maximumhop count for these (policy-determined) routes between nodes.

For a graph with nodes and edges, we consider thefollowing topological properties.

• Average Node Degree—• Diameter—The diameterof a graph is the length of the

longest shortest path between any two vertices [5]. In-formally, a low diameter generally corresponds to shorterpaths. We consider several variations on the diameter,depending on the metric used to construct and evaluatethe shortest paths.

— The hop diameteris the length of the longest short-est path between any two vertices, where the short-est paths are computed and evaluated using hopcount as the metric.

— The length diameteris the length of the longestshortest path between any two vertices, where theshortest paths are computed and evaluated usingEuclidean length as the metric.

— The hop-length diameteris a composite metric.The shortest paths are determined using hop count,then measured using Euclidean length. The diameteris the longest Euclidean length amongst the pathsfound using hop count. This metric is interesting

ZEGURA et al.: COMPARISON OF GRAPH-BASED MODELS FOR INTERNET TOPOLOGY 775

TABLE IIPROPERTIES OFREGULAR GRAPHS

because routing algorithms often use hop count tofind paths, but propagation delay is proportionalto physical path length (represented by Euclideandistances).

For the hierarchical methods, thepolicy-hop di-ameteruses the routing policy weights to constructthe shortest paths, then uses the hop count to eval-uate the length of the paths. The diameter is thelength of the longest path.

Similarly, the policy-length diameteruses therouting policy weights to construct the shortestpaths, then uses the Euclidean length to evaluatethe length of the paths. The diameter is the lengthof the longest path.

• Number of Biconnected Components—A biconnectedcomponent (or bicomponent) is a maximal set of edgessuch that any two edges in the set are on a common simplecycle [5]. The number of bicomponents is a measure ofthe degree of “connectedness” or “edge redundancy” ina graph. Generally, a smaller number of bicomponentscorresponds to a larger number of paths between nodesin the graph.

We considered a number of other possible metrics, includingsome that were not scalar (e.g., node degree distributions).We chose to focus on these characteristics because they arequantitative abstractions of some aspects of the structure ofthe graph, they are relatively easy to measure, and theyare informative. They are not necessarily independent. Inparticular, increasing average node degree correlates withdecreasing diameter and fewer bicomponents.

As an example, Table II gives selected properties of linearchains, rings, stars and meshes, withnodes. While a ring anda chain differ by only one edge, they have substantially differ-ent hop diameter and number of bicomponents. (Removing oneedge in a ring breaks the symmetry.) A star has the smallesthop diameter of these four regular graphs. In all cases, theproperties are completely determined by the number of nodes

B. Example Application Metrics

To more closely relate the evaluation of network models toa specific intended use, we also consider two metrics that arerelevant to the performance of multicast routing algorithms.Briefly, a multicast routinginstanceis a subset of the graphnodes designated as multicast groupsourcesand another subset

designated as multicast groupreceivers; these two sets mayintersect.

For a given graph and instance, a multicast routing algorithmdefines a set ofdistribution trees, one per source, which

determine the path followed by each packet sent by a sourceon its way to all the receivers. A distribution tree is generallyderived from the shortest path trees provided by the underlyinginternetwork (unicast) routing protocol.

For a group with senders and receiverswe measure the following quantities for multicast routingalgorithm .

• Packet Hops Ratio.This metric considers the packet-hops,that is, the number of edges traversed by multicast packetsin the routing trees defined by from all sources to alldestinations. The actual value of the metric is the ratio ofthis quantity to the packet hops used by another routingalgorithm that constructs a distribution tree by using ashortest path tree rooted at each source and including allreceivers as leaves.5

• Delay Ratio.This metric considers the maximum delay,measured in hops, from any source to any receiver in thedistribution trees defined by As above, the “raw” valueis normalized to get the metric value by dividing by themaximum delay obtained using a shortest path algorithm.Note that this ratio is always at least 1.0.

V. PARAMETER SELECTION

We have described methods for generating graphs andmetrics that can be used to evaluate the properties of a graph.We now proceed to use these metrics to evaluate the graphsgenerated by each method. To compare the metricsacrossmethods, we must have some reasonable way of “normal-izing” so that the differences that we observe are (in somesense) inherent to the method. In this section, we describe amethodology for comparing methods that consists of fixingthe number of nodes and the average number of edges. Formost methods, a target number of edges can be achieved byvarious combinations of parameter values, so we investigatehow the metric values vary with the parameters. We concludethis section by selecting parameters for each method, avoidingthe extremes of the parameter space.

A. Evaluation Methodology

To compare methods, we normalize by fixing a value forthe number of vertices , the number of edges and themaximum distance between two nodes,. Specifically, weselect and this corresponds to an averagenode degree of 3.5. The scale of the Euclidean plane in whichthe nodes of each graph are placed is 100 by 100, thus

Later in the paper we investigate theeffect of changing and

In each method, we explore the combinations of parametersthat yield the desired fixed number of edges, while alsoproducing connected graphs with reasonable frequency. (Aftergenerating each graph, we check it for connectivity, keeping

5There are a couple of reasons for measuring the ratio to the shortestpath tree value, rather than the absolute number. For one, the ratio servesto normalize results, allowing comparisons across instances with differentnumbers of sources and receivers. In addition, shortest path trees are the basisfor the presently deployed Internet multicast routing algorithm DVMRP [10],[16], thus the ratio allows an evaluation of how algorithmA compares to anaccepted standard.

776 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 5, NO. 6, DECEMBER 1997

Fig. 5. Effect of� and� on number of edges in Waxman method.

only those graphs that are connected.) After characterizingthe combinations of parameters that can produce connectedgraphs with the target number of edges, we pick one particularset of parameters for each method and determine the addi-tional properties of interest for graphs generated using theseparameters.

B. Flat Random Methods

For all of the properties we will examine, the most usefulresult would be an analytic expression giving the value ofthe property as a function of the model parameters and thegraph size. Unfortunately, most of the methods do not admit asimple analytic expression, even for simple properties. Thereare two exceptions to this. First, a wide body of work onthe properties of (pure) random graphs has been developed,particularly graphs with large numbers of nodes[4]; second, we have extended a number of these results tothe locality method [3]. For the remaining methods, we relyon empirical results.

The expected node degree in the pure random method isgiven by thus will produce graphs with

edges. The expected degree of a node in the exponentialmethod is where isa random variable giving the distance between two vertices.The expected node degree in the exponential model scaleslinearly with Empirical results indicate that the averagenumber of edges is approximately 175 when aboutthree out of 100 of these graphs are connected.

The relationship between the number of edges and theparameters of the Waxman method is somewhat more com-plex. Through empirical studies, we have determined that theparameter has essentiallyno effecton the number of edges,for fixed and because a change inis accompanied by achange in the distancebetween an arbitrary pair of vertices,keeping the quantity essentially constant. The practicalimplication is that a user of this method can generate verticesin any convenient scale without affecting the graphs that aregenerated.

For a target number of edges, many combinations ofandcan achieve the target. Fig. 5 shows the contour lines of

equinumber of edges as a function ofand for

Fig. 6. Effect of parameters in Waxman method.

We explore the effect of the parametersand by selectingfour combinations that produce edges. We generate100 connected graphs for each combination and measure themetrics described earlier. Fig. 6 shows the comparison usinga box plot to indicate the median over the 100 values (witha white line), the 25 and 75% range boundaries (with a box),the 5 and 95% range boundaries (with “whiskers”), and anyoutliers with single lines. The average node degree plot servesas a check that the graphs achieve the target of 175 edges (i.e.,average node degree 3.5). The other properties are the numberof bicomponents and the diameter measured in both hops andlength. The length diameter has been normalized by dividingthe Euclidean length by the scale.

We will use box plots extensively to present scalar metrics,since they neatly summarize information about the distribution,and also support comparisons across methods. The readershould be careful, however, in interpreting the data for discretemetrics. For example, the hop diameter for thewax0 methodis almost always either eight or nine; it does not take oncontinuous values as one might infer from the box plot.

The value of ranges from 0.1 (wax0) to 0.4 (wax3). Thevalues are chosen to achieve the target of 175 edges. For

for As increases (andthus decreases), the graphs have more short edges, leadingto longer hop diameter, shorter length diameter, and slightlymore bicomponents. For comparisons to other methods, weselect and

For the locality model, we determined that a choice of, and will produce an

average node degree of 3.5. We have also experimented withvariations in and in the locality method. We observed thesame general trends as in the Waxman method: more shortedges (higher results in longer hop diameter, shorter lengthdiameter, and more bicomponents. The differences in lengthdiameter can be extreme in the locality method, sinceand

ZEGURA et al.: COMPARISON OF GRAPH-BASED MODELS FOR INTERNET TOPOLOGY 777

TABLE IIISELECTED PARAMETERS FOR EACH FLAT RANDOM METHOD

can be chosen to force the extremes ofno edges of lengthgreater (or less) than

We summarize the choice of parameters for each methodin Table III.

C. Hierarchical Methods

One of the advantages of the hierarchical methods is theirability to generatelarge graphs efficiently while maintaininglow average node degree. The flat methods require the averagenode degree to grow asgrows in order to generate connectedgraphs with reasonable efficiency. The increase in averagenode degree in turn affects other parameters such as diameterand number of bicomponents. In contrast, the formula for theaverage node degree of a transit-stub graph (ignoring “extra”edges) is

where and are the edge densities (number of edges pernode) of the transit and stub domains, respectively.6 Thus, it ispossible to parameterize the random graph generation methodsto achieve almost any overall average node degree.

Consider now the effect of varying the number and sizesof the domains for the transit-stub and-level methods, with

Note that there is no obvious way to “normalize” theedge probabilities when the number and size of domains isvaried.7 Thus, we make observations about the general trendsthat hold over large portions of the edge probability space;some extreme values of edge probability may generate graphsthat do not follow these general trends.

In the transit-stub method, we vary the number and size oftransit domains, while keeping the number and size of stubsfixed, then vary the number and size of stub domains whilekeeping the number and size of transits fixed. For example,Fig. 7 shows metrics for three configurations of 200-nodetransit-stub graphs with average node degree approximately3.5. Note that for this comparison, we are using the diametermetrics that construct the routes using the routing policy

6In our implementation, the randomization of the number of nodes in thedomains tends to increase the number of edges slightly over what the aboveformula predicts. This is because the number of possible edges in a graphgrows quadratically with the number of nodes. Thus, although the meannumber of nodes per domain is preserved by the randomization process,the number of additional edges contributed by domains with a greater-than-average number of nodes will be greater than the deficit left by domains withfewer-than-average edges. If parameters are always chosen so that domainsize is limited, this randomization effect will also be limited.

7One possibility is to keep constant the ratio of the number of intra- tointer-domain edges. However, one can easily come up with other plausiblealternatives. We are not convinced that there is a “best” way to do thisnormalization.

Fig. 7. Varying domains in transit-stub graphs.

weights. The configurations are as follows:

If we call ts0 the baseline, thents1 correspondsto fewer/larger transit domains andts2 corresponds tomore/smaller stub domains.

From this example, and others, we observe the following.

• More/smaller transit domains yield more bicomponentsand larger hop and length diameter. This is consistentwith intuition (more and smaller transit domains result inlonger paths to get from one stub domain to another).

• More/smaller stubs yield more bicomponents andsmallerhop and length diameter. Thus the effect on diameters isopposite for stubs than for transit domains. This is alsoconsistent with intuition (smaller stub domains result inshorter paths within the stub, while the length of the pathsacross the transit domains remain constant).

The two-level method has just one type of domain. Moreand smaller domains in this model have the same effect asmore/smaller stubs; that is, more/smaller domains lead to morebicomponents and smaller hop and length diameter.

We selected the following parameters to use in comparingthe hierarchical methods to the flat random methods. Eachtransit-stub graph has one transit domain of four nodes, twostub domains per transit node, and 12 nodes per stub domain.None of these graphs has any extra transit-stub or stub–stubedges, so each stub domain connects to exactly one transitdomain. Each two-level graph has ten domains with an average

778 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 5, NO. 6, DECEMBER 1997

Fig. 8. Comparison of flat random and hierarchical methods.

of ten nodes each. In both hierarchical methods, the purerandom method was used to generate all of the domain graphs.

VI. M ETHOD COMPARISONS

In this section, we compare one graph generation methodto another. We pay particular attention to the metrics thatserve asdiscriminators, demonstrating one method to beclearly different from another. Our comparison is based on 100connected 100-node graphs for each method, using parameterschosen so that the average of the graph average node degreesis 3.5 across the graphs of each type, as described in Section V.

A. Graph Properties

Fig. 8 shows box plots of topological metrics for the fourrandom methods (rand, wax, local, exp) and the two hierarchi-cal methods (tstub, 2lev). The salient features of these plotsare the following:

• The most significant difference in the flat methods is inthe length diameter. For the parameters chosen, the purerandom method is clearly different from the other models,with much longer length diameter. The random method isinsensitive to edge length when adding edges to a graph,while the other methods (with the given parameters) favorshorter edges over longer edges. The random method istherefore more likely than the other methods to have longedges and long paths.

• The exponential method is next most likely to have longpaths. The exponential method does include length in theprobability of an edge; however, for the parameters cho-sen, the probability falls off more slowly with increasingedge length than in the other methods.

• The hierarchical methods differ from the flat methodsmost significantly in the bicomponents metric, with nearlytwice as many bicomponents, on average, in hierarchi-

Fig. 9. Comparison of hierarchical methods.

cal graphs. The hierarchical graphs also tend to havelonger hop diameter and shorter length diameter. Thehierarchical structure explains the diameter results: edgesare mostly constrained to be (short) intra-domain edges,resulting in longer hop-based paths but more direct (thusshorter) length-based paths.

• The two types of hierarchy reflected in transit-stub andtwo-level methods differ from one another on several met-rics, with transit-stub graphs tending to have more bicom-ponents and longer length diameter. For these parameters,the transit domains serve to “separate” stub domains,leading to more bicomponents and longer length-basedpaths.

To further compare the two hierarchical methods, we lookat a larger topology.8 We consider graphs of 600 nodes,with average node degree 4.1. We normalize across the twohierarchical methods by fixing the number and size of thedomains to 75 domains of eight nodes each. In the transit-stubmethod, three of these domains are transit domains and 72are stub domains, with an average of three stubs per transitnode. We further select the edge probability parameters sothat the average number of “lowest” level intradomain edgesare approximately the same. Thus, the number of intrastubedges in transit-stub is approximately equal to the number ofintraneighborhood edges in two-level.

Fig. 9 shows properties for the hierarchical methods, in-cluding diameter using policy weights to construct the paths.Two transit-stub results are shown. Those labeled tstub haveno extra edges, while those labeled tstub-ex have five extratransit-stub edges and five extra stub–stub edges. These resultsshow that transit-stub graphs tend to have more bicomponents

8Note that the hierarchical methods are well suited to the generation oflarge topologies; indeedn = 100 nodes is somewhat too small to generatean “interesting” hierarchical graph.

ZEGURA et al.: COMPARISON OF GRAPH-BASED MODELS FOR INTERNET TOPOLOGY 779

than two-level graphs, though as expected the difference isreduced as extra edges are included. The policy-hop diameteris essentially the same for all three results, but the policy-length diameter is more than 50% larger in the two-levelmethod. The difference in policy-length diameter can beattributed to the lack of a “backbone” in the two-level graphs.Routes in a two-level graph may need to traverse multipleneighborhoods to get from source to destination, while thetransit-stub routes are more likely to make good use of thebackbone to construct shorter routes. Further, the policy-lengthdiameter is insensitive to the addition of extra edges; the extraedges will only reduce the diameter of a given graph if theyhappen to provide an alternate, shorter route between nodesthat are currently separated by the longest shortest path.

B. Statistical Analysis

We now turn to statistical analysis of the topological metricdata for the models. Our aim is to go beyond the qualitativevisual comparisons of the previous section to provide a rig-orous test for determining the extent to which two methodsdiffer with respect to the characteristics of the graphs theygenerate. We quantify the similarity of the graphs generatedwith different methods by performing pairwise comparisonsof the methods for each metric. Given two methods and ametric, these comparisons are designed to answer the followingquestions:

• Does the metric data come from the same distribution?• What is the probability that a random instance generated

by one method has a smaller value for the metric than arandom instance from the other method?

1) Statistical Methods:To answer the first question, weutilize the Kolmogorov–Smirnov two-sample test (KS test)[12] which tests the hypothesis that two independent samplesare drawn from the same population. For a particular metric,if the distribution of all methods appears identical, then theselection of the method for that metric is less important;conversely, for methods whose metric values are not the same,extra consideration must be given to the appropriate methodto use. To determine method interrelationships, we perform atwo-sided KS test with for each metric, pairwiseover the generation methods. Thus, the null hypothesis is thatthe samples are drawn from the same distribution, and the testwill indicate whether the null hypothesis can be rejected withconfidence level

Distribution testing allows only a yes/no answer about thesimilarity of the metric value distributions for two methods.We want to extend our analysis to answer the question ofhowsimilar or dissimilar the graphs generated by the methodsare. In addition we want to derive a measure of confidencein conclusions of metric value relationships between methods.That is, how confident are we in concluding that the metricvalue for one method is less than the metric value for another?To determine this, we assess the probability that a randomlygenerated instance from methodhas a lower value of metric

than a randomly generated instance from methodThecloser thisintermetric probability is to 0.5, the more similarthe methods are, relative to metric A high (low) probability

indicates that the value for metric of an instance of islikely to be lower (higher) than the value for 9

Let and be the probability distribution functionsof generation methods and respectively, for metricFor any random instance of graph modelsand with metricvalue and respectively, the probability that is at most

is given by

We used the measured distribution data from the experimentsin the previous section to numerically perform the integration.

2) Results: For conciseness, we combine the results of theKS test and intermetric probability into a single-table format.We opt for a pictographic representation of the intermetricprobability data using a “pie” that is proportionally more filledin to represent larger probabilities.10 For the KS-test, we addan asterisk to those tabular entries that do not reject the nullhypothesis of two sample sets having the same distribution.

Each table contains results comparing a variety of genera-tion methods for a particular metric. Each row correspondsto a particular graph size, and each column correspondsto a particular pair of methods.11 Within a column, twovalues for average node degree are considered, 3.5 and 5.0.Each entry in the table gives the pie chart representing theintermetric probability. Pie charts with an asterisk indicate thatthe two methods being compared have equivalent metric valuedistributions according to the KS test. N/A entries indicate thatwe did not perform experiments for this particular graph sizeand node degree.

As an example, consider the top entry in the first columnof Table IV. For this entry, 100 random method graphsare compared to 100 locality method graphs, all with 50nodes and average node degree 3.5. The pie indicates thata random graph has approximately a 62.5% chance of havinga smaller hop diameter than a locality graph, assuming bothgraphs have 50 nodes and an average node degree of 3.5.The lack of an asterisk indicates that the KS-test rejectedthe hypothesis that these random and locality graphs havethe same distribution of hop diameter.

Tables IV–VI give the flat-to-flat comparison for threediameter metrics: 1) the hop diameter; 2) the length diameter;and 3) the hop-length diameter. Recall that the hop-lengthdiameter is determined by first constructing shortest pathsusing hop count as the distance metric, then determining thelength of the longest resulting path using the Euclidean lengthof the edges as the length metric. Hop-length diameter is usefulsince routing algorithms often rely on hop count to constructpaths, but Euclidean length is proportional to propagationdelay on the path.

9Other methods exist for comparison of sample distribution (e.g.,�2 [17]).We chose to use a method that is somewhat more simple to evaluate and isnot sensitive to factors such as selection of bin size.

10Clearly some information is lost by this representation, however, webelieve it actually makes it easier to reach meaningful conclusions about theresults.

11The columns are labeled Pr(X<Y ) whereX andY are the first letter ofthe method names being compared. For example, Pr(R<L) means that thetable entry gives the intermetric probability that random is less than locality.

780 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 5, NO. 6, DECEMBER 1997

TABLE IVHOP DIAMETER FOR FLAT-TO-FLAT COMPARISON

Tables VII and VIII compare the flat methods to thehierarchical methods for the hop diameter and the lengthdiameter, with transit-stub denotedT and two-level denotedN.For the hierarchical-to-hierarchical comparisons, we only use600-node graphs with node degree 4.2. Table IX compares thetransit-stub and two-level hierarchical methods using the hopand length diameters, in addition to the bicomponent metric.

The interesting observations which can be made from thesetables are the following.

• We confirm, using statistical methods, some of the ob-servations we made in Section VI-A based on visualinspection of the boxplot data. For example, the differ-ences between the flat methods are more profound for thelength diameter measure than the hop diameter measure.The pies in Table V are mostly empty or nearly entirelyfilled, while the pies in Table IV are closer to half filled.

• The comparisons tend to be preserved with changes ingraph size and average node degree for the parametersselected.

• The comparison of the flat methods based on lengthdiameter is strikingly similar to the comparison based onhop-length diameter, with both metrics revealing signifi-cant differences across methods.

• The ordering of the methods differs significantly withrespect to length- and hop-based evaluation metrics. Infact, the most likely ordering is generally inverted ingoing from one to the other. For example in Table VII(hop diameter) all entries are close to one, while in TableVIII (length diameter) almost all are close to zero. Thereason for the inversion is the difference in probabilityof long paths for the various generation methods. That is,graphs with the most long edges will most likely have thesmallest hop diameter; however, graphs with mostly longedges will most likely not yield the most direct Euclideanpath.

• There are significant differences between the two-levelmethod and the transit-stub method as demonstrated inthe pair-wise comparisons of Table IX. Note again theinversion of probabilities for hop- and length-based met-rics.

VII. M ULTICAST ROUTING

Comparisons of methods using topological metrics hasprovided evidence of differences between graph generation

TABLE VLENGTH DIAMETER FOR FLAT-TO-FLAT COMPARISON

TABLE VIHOP-LENGTH DIAMETER FOR FLAT-TO-FLAT COMPARISON

TABLE VIIHOP DIAMETER FOR FLAT-TO-HIERACHICAL COMPARISON

TABLE VIIILENGTH DIAMETER FOR FLAT-TO-HIERACHICAL COMPARISON

TABLE IXPr(T <N): HIERARCHICAL COMPARISON

methods; however, ultimately it is important to know if thesedifferences affect performance for real networking problems.That is, how do methods compare for problems of interest?

ZEGURA et al.: COMPARISON OF GRAPH-BASED MODELS FOR INTERNET TOPOLOGY 781

To determine this effect, we consider the problem of multicastrouting. We assume the reader is familiar with the concepts ofmulticasting and center-based (or shared-tree) multicast rout-ing. (See [8] for details.) Many different generation methodshave been used to study multicast routing algorithms, includingthe Waxman method [21] and well-known topologies such asthe ARPAnet [22]. Thus, it is important to determine the effect,if any, the method has on results.

A. Multicast Routing

Many metrics exist for evaluating multicast routing algo-rithms and policies. For simplicity, we consider the packet-hops and maximum delay, as defined in Section IV. In ourexperiments, we consider ten graphs, each 200 nodes withaverage node degree 5.0, generated by three methods (ran-dom, exponential, and transit-stub). We chose these particularmethods to have representation that includes the pure randommethod, a flat method that is sensitive to edge length, and ahierarchical method.

To test the effect of graph type on performance, we performa series of “runs” on each graph. Each run consists ofgenerating a multicast instance with 1–50 sources and 1–50receivers. The total number and identity of the sources andreceivers are chosen randomly. After an instance has beengenerated, we evaluate the performance (packet hops andmaximum delay) with each node as the center of the sharedtree. We record the performance for the optimal center, aswell as the performance averaged over all centers. Finally,we normalize the optimal and average performance valuesby taking the ratio to the performance of the shortest pathmulticast routing trees for the instance. For each graph, weconstruct 100 such runs. Within a method, we combine theresults of the 100 runs over each of the ten graphs to get 1000data points per method.

B. Results

Fig. 10 gives the box plots for the ratio of the optimal andaverage center performance to the shortest path performancefor packet hops and maximum delay. Note that while theaverage center ratios are somewhat similar for all methods,the optimal center ratios differ significantly between the flatand hierarchical methods in the following manner.

• For optimal packet hops and delay, the means differbetween the random and transit-stub by 13% and betweenthe exponential and transit-stub by 12%; however, themeans for optimal packet hops and delay between the flatgraphs only differ by 1%.

• The optimal packet hops and delay results for the transit-stub graphs exhibit a much tighter distribution than eitherflat method.

• The hierarchical variance is an order of magnitude lessthan either flat variance for the optimal measures and halffor the average measures. This indicates that, at least forthese measures, fewer transit-stub graphs may be neededto achieve a significant sample size in analyzing resultsfrom the transit-stub graphs than from the flat graphs.

Fig. 10. Multicast performance.

• Using mean equivalence hypothesis testing (-test [15])with we determine the means of the flatmethods for the average packet hops and delay measuresto be equivalent, but the means of either flat methodcompared to the transit-stub method are not equivalent.Note that our large sample size (1000 graphs/runs) relaxesthe need for normality in this type of hypothesis testing[14].

Fig. 11 shows the distribution of multicast performance foreach metric. Note that, as expected, the frequency distributionsfor the two flat methods are very similar, and the transit-stubfrequency distribution is dissimilar to the flat methods. Alsonote that the frequency distribution for the transit-stub methodis tightly centered around 1.0 for both optimal metrics. Withoutmore extra transit-stub or stub–stub edges, the structure of thetransit-stub graph limits the routing tree possibilities; therefore,the optimal center routing tree is similar to the shortest pathrouting tree.

This demonstrates that generation method does have asignificant effect on performance in an actual networkingproblem. Specifically, problems involving optimality shouldpay close attention to the method selected because the methodaffects relative performance as well as performance variability.

VIII. C ONCLUSIONS AND FUTURE WORK

We set out to examine and improve upon methods forgenerating graphs to model Internet topology. We have ac-complished the following.

• Surveyed and analyzed the behavior of graph generationmethods commonly used in studies of networks.

782 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 5, NO. 6, DECEMBER 1997

The literature contains numerous random and regulartopologies used to study networking algorithms, withlittle guidance for the user of network models. We haveoutlined the role of the parameters in each method andincluded some plots that can be used for parameter lookupby users of these methods.

We conclude that the pure random method is signif-icantly different from the three other random methods.The other methods can all be parameterized to favor shortedges over long edges. The pure random method has nosuch control, therefore it generates edges that are (com-paratively) longer, leading to longer paths (in terms ofEuclidean distance). Given the tendency toward locality ininternetworking connectivity, methods that include edgelength in determining edge probability are more realistic.For the parameters we chose, the exponential methodis next most likely to have long edges; the other twomethods are similar in all properties we considered.

• Identified a fundamental limitation in all of the randomgraph methods.

We found that it is not practical to generate evenmoderate-sized random graphs (e.g., ) that areconnected and have realistic average node degrees (e.g.,less than about six), without resorting to methods ofrepairing unconnected graphs. Based on what we knowof real topologies, this limitation is significant, especiallyif a model is to be used to obtain quantitative results thatare valid for the Internet.

• Developed a hybrid generation method, the transit-stubmethod, capable of creating large graphs by composingsmaller random graphs.

By imposing a structure resembling the administrativestructure of the Internet, the transit-stub method allowscreation of large random graphs having realistic averagenode degree. Moreover, edge weights can be assigned tothese graphs in such a way that intra- and inter-domainpaths in the graph behave in a realistic manner. (See theAppendix for details on the assignment of edge weights.)Finally, the transit-stub method allows relatively directcontrol over attributes such as hop diameter and averagenode degree.

• Compared flat random and hierarchical methods based onquantitative measures, including topological metrics andmetrics specific to multicast routing.

As the Internet continues to grow in size and importance,realistic models of network topology will be critical for mean-ingful assessment of all kinds of algorithms and policies. Onegoal of our future work is to understand how measurementstaken on modest-sized graphs can bescaled to apply tomuch larger networks. Work is in progress to establish apublic repository of information about real network topologies,including both graph models and measurements. The graph-generating tools described in this paper are already in placeand being used by various research groups. Fig. 11. Multicast Performance Distribution.

ZEGURA et al.: COMPARISON OF GRAPH-BASED MODELS FOR INTERNET TOPOLOGY 783

APPENDIX

TRANSIT-STUB EDGE WEIGHT CALCULATION

The edge weights to be calculated and assigned are:

Weight of a transit–transit interdomain edge;Weight of a transit-stub interdomain edge;Weight of a stub–stub interdomain edge.

Note that all edges of the same type are given the same weight.Also, all intradomain edges are given unit edge weight. Bymaking the weights of theinterdomain edges sufficiently large,we guarantee that the path between any two nodes in the samedomain will remain within that domain.

In calculating the weights of the interdomain edges, thefollowing quantities, taken from the constructed graph, areused:

Parameter Meaning

Diameter of transit-domain connectivity

Maximum diameter of any transitdomain graph.Maximum diameter of any stub domain

The following constraints guarantee the desired localitycharacteristics:

Constraint Result

Intrastub preferred over anyinterstub path.Intratransit preferred over anyintertransit path.No shortcut via a stubnode connected to any pair oftransit nodes.Every all-transit path preferredover any path with three stubs.Direct connection betweentwo stubs may be preferredsometimes.

By assigning the following values, the above constraintsare satisfied:

ACKNOWLEDGMENT

The authors would like to thank the anonymous reviewersfor their helpful comments.

REFERENCES

[1] J. Behrens and J. J. Garcia-Luna-Aceves, “Distributed, scalable routingbased on link-state vectors,” inProc. ACM SIGCOMM’94, 1994, pp.136–147.

[2] J. Bellamy,Digital Telephony. New York: Wiley, 1991.[3] S. Bhattacharjee, K. Calvert, and E. Zegura, “Extending random graph

theory to topologies with hierarchy and locality (in preparation),”College of Computing, Georgia Tech., Tech. Rep., 1997.

[4] B. Bollobas, Random Graphs. Orlando, FL: Harcourt Brace Jo-vanovich, 1985.

[5] J. A. Bondy and U. S. R. Murty,Graph Theory with Applications.Amsterdam: North-Holland, 1976.

[6] L. Brakmo, S. O’Malley, and L. Peterson, “TCP Vegas: New techniquesfor congestion detection and avoidance,” inProc. ACM SIGCOMM’94,1994, pp. 24–35.

[7] K. Calvert, E. Zegura, and M. Doar, “Modeling Internet topology,”IEEE Commun. Mag., June 1997.

[8] K. L. Calvert, E. W. Zegura, and M. J. Donahoo, “Core selectionmethods for multicast routing,” K. Makki and N. Pissiou, Eds.,Proc.ICCCN’95, IEEE Computer Society Press, Sept. 1995, pp. 638–642.

[9] D. Clark, “Policy routing in Internet protocols,” Internet Request forComments 1102, May 1989.

[10] S. E. Deering and D. R. Cheriton, “Multicast routing in datagraminternetworks and extended LANs,”ACM Trans. Comput. Syst., vol.8, no. 2, pp. 85–110, May 1990.

[11] M. Doar and I. Leslie, “How bad is naive multicast routing?,” inProc.IEEE INFOCOM’93, 1993, pp. 82–89.

[12] J. E. Hill and A. Kerber,Models, Methods, and Analytical Proceduresin Education Research. Detroit, MI: Wayne State Univ., 1967.

[13] D. Mitzel and S. Shenker, “Asymptotic resource consumption in mul-ticast reservation styles,” inProc. ACM SIGCOMM’94, 1994, pp.226–233.

[14] D. C. Montgomery,Design and Analysis of Experiments, 3rd ed. NewYork: Wiley, 1991.

[15] B. Ostle and R. W. Mensing,Statistics in Research, 3rd ed. Ames, IA:Iowa State Univ., 1975.

[16] C. Partridge, D. Waitzman, and S. Deering, “Distance vector multicastrouting protocol,” Internet Request for Comments 1075, Nov. 1988.

[17] V. Paxson, “Empiricially derived analytic models of wide-area TCPconnections,”IEEE/ACM Trans. Networking, vol. 2, no. 4, pp. 316–336,Aug. 1994.

[18] S. Sibal and A. DeSimone, “Controlling alternate routing in general-mesh packet flow networks,” inProc. ACM SIGCOMM’94, 1994, pp.136–147.

[19] D. Sidhu, T. Fu, S. Abdallah, R. Nair, and R. Roltun, “Open shortest pathfirst (OSPF) routing protocol simulation,” inProc. ACM SIGCOMM’93,1993, pp. 53–62.

[20] D. C. Verma and P. M. Gopal, “Routing reserved bandwidth multi-pointconnections,” inProc. ACM SIGCOMM’93, 1993.

[21] B. M. Waxman, “Routing of multipoint connections,”IEEE J. Select.Areas Commun., vol. 6, no. 9, pp. 1617–1622, 1988.

[22] L. Wei and D. Estrin, “The trade-offs of mutlicast trees and algorithms,”in Int. Conf. Computer Communications and Networks, Aug. 1994.

[23] C. Williamson, “Optimizing file transfer response time using theloss-load curve congestion control mechanism,” inProc. ACMSIGCOMM’93, 1993, pp. 117–126.

[24] W. T. Zaumen and J. J. Garcia-Luna Aceves, “Dynamics of distributedshortest-path routing algorithms,” inProc. ACM SIGCOMM’91, 1991.

Ellen W. Zegura (M’93) received the B.S. degree in computer science andthe B.S. degree in electrical engineering, both in 1987, the M.S. degree incomputer science in 1990, and the D.Sc. degee in computer science in 1993,from Washington University, St. Louis, MO.

She has been a faculty member in the College of Computing at GeorgiaTech. since 1993. Her research interests are wide-area Internet workingand high-speed switching, with an emphasis on tailoring the networkingenvironment to the application.

Kenneth L. Calvert (M’91), for a photograph and biography, see p. 459 ofthe August 1997 issue of this TRANSACTIONS.

Michael J. Donahooreceived the B.S. and M.S. degrees in computer sciencefrom Baylor University, Waco, TX, in 1991 and 1993, respectively. Currently,he is working toward the Ph.D. degree in computer science at Georgia Tech.

His research interests include adaptation of applications and networkservices to facilitate the use of multicast.

Mr. Donahoo is a member of ACM.