13
1 On the Feasibility of Identifying Elephants in Internet Backbone Traffic K. Papagiannaki, N. Taft, S. Bhattacharya, P. Thiran, K. Salamatian, C. Diot Abstract—The elephants and mice phenomenon is known as one of the few invariants of Internet traffic. No systematic method has been proposed so far for defining a boundary to separate flows into two distinct classes. We consider four single-feature classification methods and examine network- level prefix traces collected from a Tier-1 backbone. We il- lustrate that one cannot simultaneously achieve consistency in the fraction of total load due to elephants and the num- ber of elephants. These methods also result in short aver- age holding times in the elephant state. To avoid reclassifi- cation due to short term fluctuations we propose a two fea- ture classification method that incorporates a metric, called latent heat, that is a function of the difference in elephant bandwidth and the class separation threshold over a num- ber of time intervals. Our first contribution is to illustrate the volatility of elephants and hence the difficulty in isolat- ing them for traffic engineering purposes, especially if one wants to achieve consistency with respect to load, number and duration of elephant flows. Our second contribution is to propose a classification method that achieves a reasonable tradeoff between fluctuations in elephant load and fluctua- tions in the number of elephants while simultaneously yield- ing longer average holding times. I. I NTRODUCTION Traffic engineering of IP networks has attracted a lot of at- tention in the recent past. Network operators are seeking ways in which they could make optimal use of their net- works’ capacity, without introducing complexity and main- taining the simplicity of the packet switching paradigm. With that in mind, researchers have proposed different al- gorithms for traffic engineering tasks that could lead to bet- ter utilized networks by exploiting certain properties of the underlying traffic [2], [6], [10], [12]. Studies of the Internet traffic at the level of network pre- fixes, fixed length prefixes, TCP flows, AS’s, and WWW traffic, have all shown that a very small percentage of the flows carries the largest part of the information [2], [6], [10], [12]. This behavior is known as “the elephants and mice phenomenon”, and is considered to be one of the few invariants of Internet traffic [7]. In statistics, this phe- nomenon is also called the “mass-count disparity”. The “mass-count disparity” states that for heavy-tailed distri- butions the majority of the mass in a set of observations is concentrated in a very small subset of the observations [3]. In order to be able to exploit such a property for traffic en- gineering purposes, one has to be able to identify which flows carry the majority of the bytes. Examples of traffic engineering applications include re-routing [10] and load balancing of elephant flows. The attraction of this ap- proach to traffic engineering is that by treating a relatively small number of flows differently one can affect a large portion of the overall traffic. For this to be practical, ele- phant flows would need to remain elephant flows for rea- sonably long periods of time, so that a traffic engineering application need not change its policy or state frequently. To the best of our knowledge no systematic way has been proposed so far, for choosing the criterion that isolates the high-volume flows. The criterion should allow one to con- sider any particular flow and to decide if it should be cate- gorized as an elephant or a mouse. Other studies have se- lected a particular criterion and then examined other prob- lems given this fixed criterion. For example, in [5] an ele- phant is any flow whose rate is larger than 1% of the link utilization; and in [9] an elephant is any flow whose peak rate exceeds the mean plus three standard deviations of the aggregate link flow. These studies do not address the ques- tion of how to choose the criterion for defining an elephant. Instead they focus on, given a fixed criterion, how should one sample high-volume flows [5], and on how to mathe- matically model the two types of flows [9]. Our aim is to separate out a small number of high-volume flows that account for most of the traffic, from a very large

Internet Backbone Traffic

Embed Size (px)

Citation preview

1

OntheFeasibilityof IdentifyingElephantsin

InternetBackboneTraffic

K. Papagiannaki,N. Taft, S.Bhattacharya,P. Thiran,K. Salamatian,C. Diot

Abstract—The elephantsand mice phenomenonis known asone of the few invariants of Inter net traffic. No systematicmethod has been proposedso far for defining a boundaryto separateflows into two distinct classes.We consider foursingle-feature classificationmethodsand examinenetwork-level prefix traces collectedfr om a Tier-1 backbone. We il-lustrate that onecannot simultaneouslyachieve consistencyin the fraction of total load due to elephantsand the num-ber of elephants. Thesemethods also result in short aver-ageholding times in the elephant state. To avoid reclassifi-cation due to short term fluctuations we proposea two fea-tur e classificationmethod that incorporatesa metric, calledlatent heat, that is a function of the differ encein elephantbandwidth and the classseparation thr esholdover a num-ber of time intervals. Our first contribution is to illustratethe volatility of elephantsand hencethe difficulty in isolat-ing them for traffic engineeringpurposes,especiallyif onewants to achieve consistencywith respectto load, numberand duration of elephant flows. Our secondcontribution isto proposea classificationmethodthat achievesa reasonabletradeoff betweenfluctuations in elephant load and fluctua-tions in the number of elephantswhile simultaneouslyyield-ing longer averageholding times.

I . INTRODUCTION

Traffic engineeringof IP networkshasattracteda lot of at-tentionin the recentpast. Network operatorsareseekingways in which they could make optimal useof their net-works’ capacity, withoutintroducingcomplexity andmain-taining the simplicity of the packet switching paradigm.With that in mind, researchershave proposeddifferental-gorithmsfor traffic engineeringtasksthatcouldleadto bet-terutilizednetworksby exploiting certainpropertiesof theunderlyingtraffic [2], [6], [10], [12].

Studiesof the Internettraffic at the level of network pre-fixes,fixed lengthprefixes,TCP flows, AS’s, andWWWtraffic, have all shown thata very small percentageof theflows carriesthe largestpart of the information [2], [6],

[10], [12]. This behavior is known as“the elephantsandmice phenomenon”,and is consideredto be one of thefew invariantsof Internettraffic [7]. In statistics,this phe-nomenonis also called the “mass-countdisparity”. The“mass-countdisparity” statesthat for heavy-tailed distri-butionsthemajority of themassin a setof observationsisconcentratedin averysmallsubsetof theobservations[3].

In orderto beableto exploit suchapropertyfor traffic en-gineeringpurposes,onehasto be able to identify whichflows carry the majority of the bytes. Examplesof trafficengineeringapplicationsincludere-routing[10] andloadbalancingof elephantflows. The attractionof this ap-proachto traffic engineeringis thatby treatinga relativelysmall numberof flows differently one can affect a largeportion of the overall traffic. For this to be practical,ele-phantflows would needto remainelephantflows for rea-sonablylong periodsof time, so thata traffic engineeringapplicationneednot changeits policy or statefrequently.

To thebestof our knowledgeno systematicway hasbeenproposedsofar, for choosingthecriterionthat isolatesthehigh-volumeflows. Thecriterionshouldallow oneto con-siderany particularflow andto decideif it shouldbecate-gorizedasanelephantor a mouse.Otherstudieshave se-lecteda particularcriterionandthenexaminedotherprob-lemsgiventhis fixedcriterion. For example,in [5] anele-phantis any flow whoserateis larger than1% of the linkutilization; andin [9] an elephantis any flow whosepeakrateexceedsthemeanplusthreestandarddeviationsof theaggregatelink flow. Thesestudiesdonotaddresstheques-tion of how to choosethecriterionfor defininganelephant.Insteadthey focuson, given a fixedcriterion,how shouldonesamplehigh-volumeflows [5], andon how to mathe-maticallymodelthetwo typesof flows [9].

Our aim is to separateout a smallnumberof high-volumeflows thataccountfor mostof thetraffic, from avery large

SPRINTATL TECHNICAL REPORT TR01-ATL-110918 2

numberof flows that contribute negligibly to the overallload. In therestof thepaperwe refer to theformerastheelephant� class,andthelatterasthemouseclass. Thetermelephantsand mice hasbeenuseddifferently throughouttheliterature.In thiswork, elephantsandmicearedefinedaccordingto their averagebandwidth(as opposedto filesizesor flow durations).In [9] they usetheterms � and

�flowswhenthisphenomenonis appliedto bandwidthratesof 5-tupleflows.

We investigatea numberof schemesfor defininga thresh-old valuethatallowsoneto separateout theelephantsfromthemice.Wecollectpacketmeasurementsfrom Sprint’s IPbackbone,andstudyflow bandwidthsat thenetwork prefixlevel. Initially weconsidertwosingle-featureclassificationmethods. The first methodexploits the propertythat thenetwork-level prefix flow bandwidthdistribution is heavytailed. It identifiesthethresholdseparatingelephantsfrommice,asa cut-off point in theComplementaryCumulativeDistribution Function(CCDF). This techniquemakes noassumptionsabouthow high or low the thresholdshouldbe. Instead,it definesthe thresholdas the first point intheflow bandwidthdistribution afterwhich power-law be-havior canbewitnessed.Thesecondmethodclassifiesaselephantsthe high-volumeflows whosecumulative band-width is equalto a certainfraction of the overall traffic.This is motivatedby practicalityasit is simplefor carriersto implement.

We use four performance metrics to evaluate theseschemes.Our findingsillustratethevolatility of elephantsin anumberof ways.Thenumberof elephantsat any timeis highly variable,elephantflows needfrequentreclassifi-cation,andelephantsdo not remainelephantssufficientlylong for traffic engineeringapplications.We quickly con-sidertwo othersinglefeatureclassificationschemes(with-out examiningall four metricsin detail), to illustratethattheseother methodsalso yield highly volatile elephants.A comparisonof all theseschemesrevealsan interestingtradeoff, namely that it is difficult to keepboth the ele-phant load and the numberof elephantsconsistent(i.e.,smallfluctuations)at thesametime.

Becausepersistencein time is an importantproperty, andsimplesinglefeatureclassificationschemescannotachievethis, we next proposea two featureclassificationmethod.We suggesta metric, called latent heat, that incorporatestemporalbehavior of aflow. By addingthis into theclassi-ficationwe avoid unnecessaryreclassificationof flowsdueto short-termfluctuationsthat do not changea flow’s es-sentialnature. We show that this schemeachievesa bet-ter tradeoff in termsof fluctuationsin the elephantloadandthe numberof elephants,while simultaneouslyyield-

ing elephantflowswith muchlargeraverageholdingtimesin theelephantstate.

In SectionII, wedescribethedataanalyzedthroughoutthepaper. In SectionIII we presentour methodology, andinSectionIV, we presentinitial results.We improve on theproposedapproachesin SectionV, and characterizetheidentifiedelephantnetwork prefixesin SectionVI. Wecon-cludein SectionVII.

I I . MEASUREMENT ENVIRONMENT

Thedatausedin this papercomesfrom packet tracescol-lectedin the coreof Sprint’s Tier-1 IP backbonenetwork[8]. Optical splitters are usedin conjunctionwith pas-sivemonitoringequipmentto collect44-byteheadersfromevery IP packet traversingthe monitoredlink. Monitor-ing equipmenthasbeeninstalledin threemajor POPsintheUSA. Theanalysisdescribedthroughoutthepaperhasbeenperformedon6 tracescollectedonOC-12andOC-48links, yielding similar results.In this paperwe presentre-sultsonly from two differentOC-12 links. The specificsof thosetwo tracesare presentedin Table I. The linksusedare two hops away from the peripheryof the net-work, and traffic is capturedon its way towardsthe coreof thenetwork. Therefore,traffic towardsspecificdestina-tionsshouldexhibit a sufficient level of aggregation. Theutilization levels of thesetwo links aregiven in Figure1(The times displayedin the figuresarealways in EDT.).Weselectedthosetwo particularlinks becausethey exhibitdifferentbehavior, which hasimplicationson theclassifi-cationprocessaswill beseenin latersections.Thefirst oftheselinks providesadatasetthatvariesin asmoothfash-ion throughoutthe day. The secondlink exhibits a morepronouncedburstyor busybehavior.

Tracelabel

Starttime Endtime LinkSpeed

westcoast

Jul 24 08:00:352001EDT

Jul 29 02:42:552001EDT

OC-12

eastcoast

Jul 24 08:00:342001EDT

Jul 25 13:21:552001EDT

OC-12

TABLE IDESCRIPTION OF COLLECTED TRACES.

The first issueaddressedis the granularity level of theflowsto beclassified.For thepurposeof intra-domaintraf-fic engineeringthenaturalgranularitylevel is that of net-work prefixesappearingin routingtables.Othercandidates

2

SPRINTATL TECHNICAL REPORT TR01-ATL-110918 3

05:00 09:00 13:00 17:00 21:00 01:00 05:00 09:000

10

20

30

40

50

60

70

Jul 24, 2001 Jul 25, 2001

Thr

ough

put o

f tot

al a

ggre

gate

(M

bits

/sec

)

west−coasteast−coast

Fig. 1. Link utilization for thewestandeastcoasttrace.

05:00 09:00 13:00 17:00 21:00 01:00 05:00 09:000

2000

4000

6000

8000

10000

12000

Jul 24, 2001 Jul 25, 2001

Num

ber

of a

ctiv

e ne

twor

k pr

efix

es

west−coasteast−coast

Fig. 2. Numberof active network prefixesfor thewestandeastcoasttrace.

could include flows definedby their destinationaddress,a fixed prefix length,the usualfive-tuple(sourceaddress,destinationaddress,sourceport, destinationport, protocolid), or theAS level.

Network prefixes is a natural candidatefor a few rea-sons.First,suchflowscaneasilybemanipulatedby simplychangingthenext hopaddressin theroutingtablefor thatparticularflow. Second,routingpoliciestendto beappliedto a network prefix asa whole,sincenetwork prefixesarethesmallestroutableentitiesin theInternet.Thereforeanychangein policy would affect theentireflow aswe defineit. This would not thecase,for example,for flows definedby destinationaddressor prefixesof a fixedlength.Third,it hasbeenobservedthatelephantsandmicedoexist at thislevel of granularity[6].

In parallel to the packet tracecollection, we collect theBGProutingtablesatthecorrespondingPOPs.ThoseBGPtablesaredefault-freeandcontainapproximately120Ken-tries. We calculatethe volume of traffic headedtowardseachBGPdestinationandcomputetheaveragebandwidthof eachflow over5 minutetimeintervals.Weselect5 min-

0 200 400 600 800 10000

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Number of top flows

Fra

ctio

n of

tota

l ban

dwid

th

Fig. 3. Cumulative Distribution Functionof flow bandwidthsfor thefirst 125-minuteintervalsof theeastcoasttrace.

100

102

104

106

10−4

10−3

10−2

10−1

100

Bandwidth (Bytes/sec)

Com

plem

enta

ry C

umul

ativ

e D

istr

ibut

ion

Fun

ctio

n

west coasteast coast

Fig. 4. ComplementaryCumulative Distribution Functionofflow bandwidthsfor theeastandwestcoasttrace.

utesasthe measurementinterval, sincethis is the defaulttime interval at which SNMP information is usuallycol-lected. We find that in any given measurementinterval,approximately90%of thenetwork prefixeshave no traffictraveling towardsthem. We definea flow to beactiveif itreceivesat leastonepacket duringthemeasurementinter-val. Figure2 presentsthenumberof activeprefixesateachmeasurementinterval over a two day period. The obser-vation that thereareonly a few flows active at any giventime slot is furtherconfirmationthatthis aggregationlevelis appropriatefor the identificationof elephantsin that itproducesalimited numberof flowsonwhichcomputationsneedto becarriedout.

The cumulative distribution function (cdf) for the band-width of the derived flows is presentedin Figure3. Eachcurve correspondsto a different5 minute interval withinthefirst hourof the trace. We seethatat any time duringthishour, thetop100flowsaccountfor 50%to 70%of thetotal traffic, while thetop 400flows accountfor 75%-85%

3

SPRINTATL TECHNICAL REPORT TR01-ATL-110918 4

of the total traffic. This confirmsthat elephantsandmicedo exist in ourdatasetat thechosengranularitylevel.

Wefurtherexamineourdatasetfor heavy-tail properties.InFigure4 we presentthecomplementarycumulative distri-bution functionplot of thebandwidthvaluesof our flows.Thestraightline at theendof thedistribution indicatesthatit may be heavy-tailed. We usethe aesttestdesignedbyCrovella andTaqqu(describedin [4]) to estimatethescal-ing parameterof thedistribution of theflows’ bandwidths.We find a scalingparameterwhosevaluesvary between1.03 and1.2, verifying that the bandwidthdistribution isindeedheavy-tailed. In SectionIII, wedescribetheproper-tiesof theheavy-tail distributions,andwedeviseamethodin which we makesuseof thesepropertiesin theclassifi-cationprocess.

I I I . SINGLE FEATURE CLASSIFICATION

Using our packet-level measurements,we aggregatetraf-fic into “flows”, basedon its network prefix asannouncedthroughBGP[11]. This givesusa setof flows,andanas-sociatedbandwidthfor eachflow. We now describetech-niquesthatcouldbeusedto classifytheseflows into “ele-phants”,and“mice”.

Beforeproceedingwe introducesomenotation. Let � de-notetheindex of anetwork prefixflow, i.e.,aBGProutingtable entry. Let � denotethe length of the time intervalover which measurementsare taken. Time is discretizedinto theseintervals, and � is the index of time intervals.Wedefine����� � to betheaveragebandwidthof thetrafficdestinedto aparticularnetwork prefix � duringthe ����� timeslotof length � . Weuse ��� to representthesetof all flowsconsideredelephants,and ��� to representthesetof flowsconsideredmice.

Our methodologyconsistsof two phases:1) thresholdde-tectionphase,and2) thresholdupdatephase.In the firstphase,we seekfor a bandwidthvalue ���� � , suchthat if aflow’s averagebandwidthin the chosentime interval ex-ceedsthe threshold,it is classifiedasan elephant.Other-wise, it is considereda mouse. In the secondphase,weupdatethethreshold,i.e., at time ����� , basedon thecur-rentthresholddetectedpluspastthresholdvalues.

A. ThresholdDetectionPhase

Previous attemptsat classifyingflows into elephantsandmice usea singleclassificationfeature,which is either1)bandwidth[6], [12], or 2) duration[10]. Given that weare interestedin high-volume flows, for the first stepof

ourmethodologywe focuson techniquesthatidentify ele-phantsbasedon their bandwidthalone. Issuesregardingthedurationof theelephantflows areinvestigatedin laterstages.

In thedetectionphaseof ourclassificationprocess,we an-alyzethepropertiesof thedatasetcollectedandcalculateathresholdvalue ���� � accordingly. Thisvalue ���� � is likelyto changewith time alwaysisolatingthoseflows thatcon-tribute thehighestamountof informationin eachtime in-terval.

We proposetwo differenttechniquesto identify the initialthresholdvalue.

� The first method takes into accountthe heavy tailpropertyof thecollecteddataset.Thethresholdvalueis setin sucha way sothata flow is characterizedasanelephant,only if it is locatedin thetail of theflowbandwidthdistribution.� Thesecondtechniqueis moreintuitive, andcouldbepreferredin operationalenvironments.It requiresthesetting of an input parametercorrespondingto thefraction of total traffic we would like to placein theelephantclass.Thethresholdis setin suchaway thatall the flows exceedingit accountfor the requestedfractionof thetotal traffic.

1) AEST approach: A random variable X follows aheavy-taileddistribution (with tail index � ) if

�! �#"%$'&�(*)+$-,'.�/1032�$54768/19;:<�=:<> (1)

where ) is a positive constant,and where ( meansthatthe ratio of the two sidestendsto 1 as $?4 6 . AESTis a methodfor the estimationof the scalingparameter�proposedby Crovella andTaqqu[4].

The particularvalue of � is important in many practicalsituations,andanumberof methodsarecommonlyusedtoestimateits value. For instance,valuesof � lower than1indicatethat the randomvariableX hasan infinite mean.Valuesof � between1 and2 indicateinfinite variance.

TheAEST methodidentifiestheportionof a dataset’s tailthatexhibitspower-law behavior. It’sbasedonthefactthattheshapeof thetail determinesthescalingpropertiesof thedatasetwhenit is aggregated. By observingthe distribu-tionalpropertiesof theaggregatesonecanmakeinferencesaboutwherein thetail power-law behavior begins.Weusethismethodto identify thefirst point in thedistribution af-ter which power law propertiescanbe witnessed.We setthis particularbandwidthvalueasthethresholdvaluesep-aratingelephantsandmice in that given measurementin-

4

SPRINTATL TECHNICAL REPORT TR01-ATL-110918 5

terval.

In otherwords, if we define @A �$�� as the complementarycumulative distribution function of the flow bandwidths,i.e. @A �$�� = 1 - F(x) = P[X " x], then ���� � = B$ , sothat

B$DCFE;G�HI JLKNMPO @A �$��JLKNMPO $ (RQ��1�

TheAEST methodologyhastheadvantagesof beingnon-parametricandeasyto apply. Moreover, it hasbeenshownto berelatively accurateonsyntheticdatasets.Moreimpor-tantly, therehasbeenevidencethatthescalingestimator, ascalculatedby AEST, appearsto increasein accuracy asthesizeof the datasetgrows. Given that our datasetsfeatureat leasttwo thousandmeasurements,sucha techniquewillprovide us with reasonablyaccuratevaluesfor the cutoffpoints in the heavy tail flow bandwidthdistribution mea-suredin eachtime interval.

2) Constantload approach: This secondapproachis amorepragmaticapproachtowardsidentifying the thresh-old bandwidthvaluesseparatingelephantsfrom mice. Foreachtime interval � , we definelocal threshold���� � , suchthat theflows exceedingthis thresholdaccountfor � % ofthe total traffic on the link. We sort the flows andstart-ing from the largest, we proceeddown the list addingflows into the elephantclass. We stop when the volumeof the elephantclassreaches� % of the link load. If werank bandwidthsin ascendingorder, let S denotethe in-dex of a flow, and T denotethe highestbandwidthflowin timeslot n, then ���� �UCV��W , � �� � where X is cho-sensuchthat: Y WZ\[�] � Z �� �^:_�a`bY �Z\[�] � Z �� � , and

Y W , �Z+[�] � Z �� �c"dC*�e`cY �Z+[�] � Z �� �\f

B. ThresholdUpdatePhase

To make useof thethresholdderived for engineeringpur-poses,weclassifythedatain timeslot ��d�g�h� accordingto���� � . This maynot beadequatebecausewe maynot havea sufficiently large numberof measurementsin eachtimeinterval to yield representative estimates.In addition,weexpectthatcorrelationsexist acrosstimeslots(shown laterin SectionIV-B) andarethusmotivatedto allow pastdatato influenceour choiceof threshold. We thuscalculateamoregeneralclassificationthreshold B���� � usinga simplefirst orderauto-regressive filter on ���� � asfollows.

B����i�*�h�jCkl�mQ � �n`oB���� ��� � `j���� � (2)

Wecall this theupdaterule. We foundthatavalueof� C

93fp> leadsto a sufficiently smooth B���� � . We now usethethreshold B���� � to classifythe traffic during the ��q�R�h�lrs�

time interval. In otherwords,thefinal decisionrule is thefollowing.

if � � �� �t"kB���� � then �juq��� else��uq� � (3)

C. ThresholdEvaluation

In mostapplicationsof classificationeachdataobject in-herentlybelongsto someparticularclass.Thisclassis un-known to theobserver whosejob is to guesscorrectlythetrue stateof nature. Our situationis different in that theflows do not naturallybelongto a particularclass.Insteadweareimposingacategorizationontoourflowsbecauseitis usefulfor traffic engineeringpurposes.Becauseof this,thetraditionalperformancemeasuresbasedonmisclassifi-cationerrorscannotbeused.Theevaluationof thetypeofclassificationtechniquesproposedhereshouldbebasedonthe needsof the particulartraffic engineeringapplicationusingthe classification.The first two metricswe usefortheidentificationof theelephantsarethefollowing.

1) The fraction of total traffic apportionedto the ele-phantclassshouldbesufficiently large.

2) Thenumberof elephantflows shouldbereasonablysmall, say lessthan a number

A, where

Ais con-

strainedby theroutercapabilities,anddenotesarea-sonablenumberof flows for which the router cankeepstate.

D. Persistenceof Classificationin Time

We explainedearlier that for traffic engineeringpurposeswe would like flows to retaintheir classaslong aspossi-ble. Ideally the majority of the elephantsshouldremainelephantsfor long periodsof time, i.e. in theorderof fewhours. Intuitively this meansthat we would like to clas-sify aselephantsthoseflows whosevolume is above theidentifiedthresholdconsistently.

Thelengthof timethatanelephantcanremainanelephantis both a function of the flow itself andof the classifica-tion. It is a functionof theclassificationin thesensethata particularhigh-volumeflow will remainin the elephantclassas long as the continuallyadjustingthresholdstayslower that its averagebandwidth. Becausetraffic on theInternetis highly variable,we don’t expectflows to retaintheir classificationindefinitely. Weknow thattherewill bemany flows thatwill burst for a smallamountof time,andbecomequietafterwards.

To definea performancemetric to capturethepersistenceof theclassificationin time, we proceedasfollows. Notethat the classificationschemewe have proposed,induces

5

SPRINTATL TECHNICAL REPORT TR01-ATL-110918 6

the following underlyingtwo-stateprocesson eachflow.Let vw��� �xCy� if �tug� � , and vw�l�� ��Cz9 if �tu{��� . At eachclassification| time interval, theprocesseithertransitionstotheotherstateor staysin thesamestate.Wedefinethefol-lowing two performancemetricsin termsof thisprocess.

3) For eachflow � , we calculatetheautocovariancefora lag of 1, namely � M �}�l�h�tCR~ vw��� �n`tvw�l��D�F�h�s& .Thismetricreflectshow predictabletheclassof flow� is at time �e�R� basedon its classat time � . Theclassificationschemewith the largerautocovarianceis theonein whichtheflowsaremorelikely to retaintheir classification.

4) For eachflow � , we calculatethe averageholdingtime the flow spendsin the elephantstate,and itsaverageholding time in the mousestate. The clas-sificationschemethat yields larger holding timesismoreattractive for ourpurposes.

IV. RESULTS

In this sectionwe presentthe resultsderived applyingthetwo methodsdescribedin the previous section. We com-parethetwo methodswith respectto thefour performancemetricsidentified. We discusstemporalissuesrelatedtoclassificationsuch as the duration of the validity of theclassification,and the timescaleof the measurementandclassificationintervals.

A. ClassificationResults

Figures 5, 6, and 7 presentthe thresholdvalues com-putedusing the two proposedtechnques,the numberofelephants,and the fraction of total traffic apportionedtoelephantsfor bothtraces.

For the west coasttrace,the thresholdobtainedwith the“constantload” approachfluctuatesthroughoutthedaybe-tween1 KBytes/secand2.5KBytes/sec.As alreadyshownin SectionII, thewestcoasttraceexhibits high utilizationlevels during the peakperiod (Figure 1). A burst whichis notaccompaniedby similarburstybehavior in thenum-berof active network prefixes(Figure2). This impliesthattheburst in link utilization is not dueto new network pre-fixesbecomingactive, but ratherdueto existing networkprefixes receiving traffic at higher rates. Given this risein thebandwidthsachieved by thealreadyactive networkprefixes,it is expectedthatnow thethresholdvalueisolat-ing theflows contributing 80%of the total traffic is muchhigher.

The “aest” approachprovides a lower estimatefor thethreshold. The reasonwhy the thresholdis lower‘ in the

09:00 13:00 17:00 21:00 01:00 05:00 09:00 13:000

500

1000

1500

2000

2500

3000

3500

4000

Jul 24, 2001 Jul 25, 2001

Thr

esho

ld (

Byt

es/s

ec)

constant load (west coast)aest (west coast)constant load (east coast)aest (east coast)

Fig. 5. Derivedthresholds�������� for 0.8-constantload, andaest.

10:00 13:00 17:00 21:00 01:00 05:00 09:00 13:000

200

400

600

800

1000

1200

1400

Jul 24, 2001 Jul 25, 2001

Num

ber

of e

leph

ants

constant load (west coast)aest (west coast)constant load (east coast)aest (east coast)

Fig. 6. Numberof elephantflows for 0.8-constantload, andaest.

“aest” approachis becausemany of the flows in thepeakperiodhave now beenpositionedin the tail of the band-width distribution.

The effect of the burstinessin the utilization of the westcoasttraceis that thenumberof elephantsidentifiedwithboth techniquesis much larger. Given that the “aest”thresholdis muchlower thanthe “constantload” one,thenumberof elephantsidentifiedwith “aest” is muchhigher,thusaccountingfor asmuchas90%of the total traffic onthelink.

For the smoothereastcoasttrace,wherethe trend in thelink utilizationis accompaniedby similar trendin thenum-berof active network prefixes,the thresholdvaluescalcu-latedwith bothtechniquesbehave similarly. Thethresholdvaluesareslightly higherfor the“aest”approach,account-ing for lesselephants,andsmallerfractionof thetotal traf-fic. It is importantto notice,though,thatfor theeastcoasttrace,the total amountof traffic apportionedto elephants

6

SPRINTATL TECHNICAL REPORT TR01-ATL-110918 7

10:00 13:00 17:00 21:00 01:00 05:00 09:00 13:000

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Fra

ctio

n of

tota

l tra

ffic

appo

rtio

ned

to e

leph

ants

constant load (west coast)aest (west coast)constant load (east coast)aest (east load)

Fig. 7. Fractionof total traffic accountedto elephantsfor 0.8-constantload, andaest.

is aproximatelyconstantwith time, andequalto 80% forthe “constantload” (imposedby thetechniqueitself), and70%for the“aest” approachrespectively.

From the previous discussionit can be easily seenthatin casesof bursty link utilization, maintaininga constantamountof traffic in the elephantclassleadsto a volatilenumberof elephantflows,andequalyburstythresholdval-ues. Moreover, in suchcases,the performanceachievedusingthe two suggestedtechniquesis muchdifferent. Inthecaseof smoothtraffic thetwo approachesachievecom-parableresults.

B. TimeBehaviorof Classification

As discussedin SectionIII-C, we theflows to retaintheirclassificationaslong aspossible.We say“as long aspos-sible” becauseon theonehandwe wantto avoid updatingstatetoofrequently, but on theotherhandwedon’t wanttokeepflows in a particularstateif their bandwidthchangessignificantly. To assessthis issueof thepersistency of theclassificationwith time, we examineour last two metrics,namely, theautocorrelationvalueat lag 1, andtheaverageholdingtimesin theelephantstate.Wecomputetheseval-uesfor the peakhours,which is the periodduring whichpersistency is perhapsmostlyneeded.Thepeakhoursarebetween12pmand5pmEDT.

TableII shows thatthe vw��� � processesachievehighvaluesof autocorrelationfor bothapproachesandfor bothtraces.Whathasto benotedis that thevaluespresentedin TableII arehighly biasedtowardsthe performanceof the miceflows. The total numberof destinationprefixes that be-comeactive at somepoint throughoutthe durationof thewestcoasttraceis 30,241out of which theelephantsvarybetween454 and994 (dependingon the time interval � ).In otherwords,only asmallnumberof theautocorrelation

EastCoast WestCoast

AEST constantload AEST constantload

min. 0.8611 0.8687 0.8497 0.85835% 0.9789 0.9786 0.9737 0.9769avg. 0.9827 0.9823 0.9814 0.981995% 0.9833 0.9833 0.9833 0.9833max. 0.9913 0.9913 0.9906 0.9912

TABLE IIAUTOCORRELATION VALUES AT LAG 1 FOR PEAK HOURS.

EastCoast WestCoast

AEST constantload AEST constantload

min. 1 1 1 120% 1 1 1 150% 1.5 1.5 2 2avg. 3.72 4.1 8.63 8.0980% 3.5 3.5 12 10.5max. 60 60 60 60

TABLE IIIAVERAGE HOLDING TIMES IN THE ELEPHANT STATE FOR PEAK

HOURS (IN 5-MINUTES SLOTS).

valuescorrespondto elephantflows. This is oneof therea-sonswe weremotivatedto look at theholdingtimesof theclassificationprocess.

Wecomputetheaverageholdingtimesfor eachflow in theelephantstatefor thefive hourbusy period. Thedistribu-tion of theaverageholdingtimesis shown in Figure8. Thex-axisvaluesrepresenttime slotsof 5 minuteseach(so avalueof 12 correspondsto 1 hour). TableIII presentsthebasicstatisticsfor theaverageholdingtimesin theelephantstate.We seefrom this tablethat theaverageholdingtimeis approximately20 minutesfor the eastcoasttrace,and40 minutesfor thewestcoasttrace. Moreover, morethan80%of theflowshaveaverageholdingtimeslessthanhalfthemaximum.

It is clear from Table III andFigure8 that few flows re-main elephants“forever” (i.e., the durationof the wholepeakperiod). It is naturalto askhow well pastbehaviorcanpredictfuturebehavior. In particularwe would like toknow what is thelikelihoodthatanelephantthathasbeenanelephantfor acertainamountof timewill remainanele-phant“forever”. To measurethiswecalculatetheprobabil-ity thatanelephantwill remainanelephantfor thewholedurationof thepeakperiod,giventhat it hasstayedin the

7

SPRINTATL TECHNICAL REPORT TR01-ATL-110918 8

0 10 20 30 40 50 6010

0

101

102

103

104

Average holding time in the elephant state

Num

ber

of o

ccur

ence

s (lo

g)constant load (west coast)aest (west coast)constant load (east coast)aest (east coast)

Fig. 8. Averageholding time in the elephantstatefor peakhours(in 5-minutesslots).

elephantstatefor a givennumberof intervals. In Figure9,we presentthoseprobabilitiesfor bothtracesandfor bothapproaches.

Figure9 shows thatfor theburstywestcoasttracethepre-dictability of theelephantsis very limited. Flows that re-main elephantsfor 36 intervals (i.e. 3 hours)during thepeakperiodhave a probabilityof lessthan0.2 of remain-ing in that statefor the whole 5 hours. For theeastcoasttraceelephantsshow ahigherlevel of predictability, andifthey remainelephantsfor 3 hours,they have a probabilitygreaterthan0.5of remainingelephantsfor thewholedura-tion. Therefore,Figure9 providesanotherindicationthatnetwork prefixflowsareveryvolatile,andoncepersistencyis themostdesirablepropertyof theclassification,simplyclassifyingflows basedon theirbandwidthat eachintervalis not sufficient. In practicalsettingssuchsmallprobabil-ities of successfullyidentifying elephantsin thefuturehaslimited applicability.

Theseresultshave implications for other methodspro-posedin the literature. For example, in [5] the authorscreatestatefor eachflow thatexceedstheir threshold,andthey keepthatstatefor thelifetime of theapplication.Thatwouldn’t makesensefor ourdatasincelessthan5%of theflows classifiedaselephantsin any instantremainedele-phantsfor theentire5-hourperiod.Weconcludethatflowsneedto bereclassifiedfairly frequently.

We saw in the previous subsectionthat in order to main-tain a consistentamountof load in theelephantclass,thenumberof elephantsis quitevolatile. Thefactthataverageholding timesareshortis anotherindicatorof thevolatil-ity of elephants.If elephantsareindeedvolatile, thentheycouldneedto bereclassifiedoftenif thegoalis to maintainaconsistentloadin theelephantclass.

0 10 20 30 40 50 600

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Number of sequential intervals in the elephant state

Pro

babi

lity

of "

alw

ays"

bei

ng a

n el

epha

nt

constant load (west coast)aest (west coast)constant load (east coast)aest (east coast)

Fig. 9. Probability of “always” remainan elephantfor 0.8-constantload, andaest.

108 110 112 114 116 118 120 1220

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Time interval (5 mins)

Fra

ctio

n of

tota

l tra

ffic

appo

rtio

ned

to e

leph

ants

constant load (update = 1hr)aest (update = 1hr)constant load (update = 30 mins)aest (update = 30 mins)

Fig. 10. Effect of infrequentclassificationon the load of theelephantclass.

Until now we have consideredreclassifyingflows every 5minutes.In orderto evaluateothertimescalesfor reclassi-fication,weconsidertwo othercases,namelyreclassifyingevery 30 minutesand1 hour. Even thoughwe reclassify-ing at theseintervals,we continueto calculatethesepara-tion thresholdvaluesevery5 minutes.Figure10shows theamountof traffic apportionedto elephantsusingbothtech-niqueswith theselongerreclassificationtimes. This helpsusto understandhow fastthelist of flows belongingto theelephantclassbecomesstale. Regardlessof the methodusedfor classification,we seethatwhenflowsareupdatedat longerintervals, the fractionof total traffic apportionedto the elephantclassmay drop dramatically, in particularit canfall aslow as40% within 25 minutesfrom the lastclassificationinterval. Flowsclassifiedaselephantsat timeslot 109 (Figure 10) experiencelower bandwidthsin thenext intervals,while flows thatmayexperiencehigh loadscannotbeplacedin theelephantclass,until thenew updateperiod. Moreover if we updatetoo slowly, we canexperi-encea significantjump in theelephantloadat the time ofreclassification(e.g.,at time interval 115). Note that not

8

SPRINTATL TECHNICAL REPORT TR01-ATL-110918 9

updatingthelist of elephantssufficiently oftenmayendan-ger the efficacy of the traffic engineeringapplicationbe-causeit may no longerbe giving differential treatmenttothelargestflows.

C. OtherSingle-Feature ClassificationTechniques

Thetwo metricsfor thenumberof elephantsidentified,andthefractionof total traffic apportionedto theelephantclassareclearly related.The targetof our secondclassificationapproachis to maintaina constantelephantload. In thissection,we report resultsaboutsimple classificationap-proachesthat couldbeselectedandwould targetat main-taininga constantnumberof elephantsor a thresholdcon-sistentlyequalto a specificfraction of the link utilizationor capacity. Thissectionshouldnotbeconceivedasacom-pleteevaluationof the techniquesdescribed,but a simpleillustrationof how alternative techniqueswouldperform.

A simpleway to identify the high-volumeflows in a net-work is to collect thebandwidthdistribution for thedesti-nationprefixesin eachinterval, andto classifythe top �flowsateachinterval aselephants.Forourwestcoasttrace,weset � equalto 100(approximatelythenumberof flowsconsistentlyin theelephantclassfor the5-peakhours),andmeasurethefractionof total traffic apportionedto theele-phantclass1. For comparisonreasons,we also calculatethe thresholdvalueas the bandwidthof the smallestele-phant. This is the bandwidthvaluesflows would have toexceedto beclassifiedaselephantsin thethreshold-basedapproaches.

Ourresultsindicatethatwith suchatechniquetheelephantloadmayfluctuatebetween35and94%of thetotal traffic.This pointsto an importanttradeoff. We saw in Figures6and7 thatmethodsthatsucceedin keepingthefractionoftotaltraffic in theelephantclassreasonablyconsistentyieldlargefluctuationsin thenumberof elephants.In this alter-nateapproachthat focuseson keepingthenumberof ele-phantsconstant,we observe large fluctuationsin the loaddueto elephantflows. It thusappearsdifficult to keepboththeelephantloadandthenumberof elephantfrom experi-encingfluctuationssimultaneously.

The correspondingthresholdrangefor this methodwas1to 12 KBytes/secdependingon the time of day, exhibit-ing similarbusyperiods,astheutilizationof therespectivelink. An importantresultof this classificationapproachisthatonly 11flowsremainedelephantsfor thewhole5-hourpeakperiod, while 97% of the other elephantflows fea-turedholdingtimesof lessthan24 intervals (i.e. 2 hours).

�Thenatureof our resultsdid notchangefor differentvaluesof � .

During this 5-hourperiod,approximately1800flows be-cameelephantsat somepoint (in both traces).Sinceonly11 of theseremainedelephantsfor the entireperiod, theprobability that a given flow in the setof top 100 at anygiveninterval remainsamongthetop 100flows for a verylongdurationis minimal.

Wepresentresultswhenthethresholdvalueis setequalto1% of theutilization of themonitoredlink, asproposedin[5] (1% of thecapacityof our OC-12links wasnever ex-ceededby a singleflow in our traces). Oncewe set thethresholdvalue in such a way, then the numberof ele-phantsidentified is limited between3 and 23 flows, andtheamountof traffic they correspondto fluctuatesbetween10 and60%of the total traffic. Themaximumfractionoftraffic is apportionedto the elephantclassin the off-peakhours,when the utilization is much lower, and thereforeeasierto exceed.Suchaclassificationprocessoffersmuchlower autocorrelationvaluesthanany otherclassificationtechnique,while themaximumholdingtimeattheelephantclassis 60 5-minuteintervals, achieved by only 2 flows.All theotherelephantflows achieve holding timesof lessthan6 intervals (i.e. 36 minutes).Thestatisticsfor theav-erageholding timesachieved with the four single-featureclassificationmethods,described,aresummarizedin TableIV.

Averageholdingtimestatistics

min. 20% 50% avg. 80% max.AEST 1 1 2 8.63 12 60constantload 1 1 2 8.09 10.5 60constantnum-ber

1 1 1.2 3.47 2.75 60

1%utilization 1 1 1 4.12 3 60

TABLE IVSTATISTICS FOR THE HOLDING TIMES AT THE ELEPHANT STATE

FOR ALL FOUR SINGLE-FEATURE CLASSIFICATION METHODS

(WEST COAST IN 5-MINUTE SLOTS).

D. OtherTimescalesfor MeasurementandClassification

Throughoutthe paperwe have consideredmeasurementand classificationintervals of ��C�� minuteseach. Asstatedin Section III, we chose5 minutes for practicalreasonsbecausethis correspondsto the interval at whichSNMP informationis normally collected. In order to as-sesstheutility of this time scalewe alsoconsidered�iC��and �!CF�}9 minutes.

9

SPRINTATL TECHNICAL REPORT TR01-ATL-110918 10

Table V presentsthe statisticsfor autocorrelationvaluesachieved at lag 1, calculatedover the five peak hours.Thesevaluesaremuchsmallerthan the onesin TableII.At �=C��}9 we seemoreof a distinctionbetweenthe twoschemesthanwe did when �=C�� . We alsocalculatethemainstatisticsfor theaverageholdingtime at theelephantstate,andweobserve thatthe80thpercentileof thederiveddistribution is equalto 3 intervals(i.e. 3 hours).

Therefore, it is evident that even at higher timescales,whenflows’ bandwidthwould beexpectedto behave in asmootherfashion,volatility is themainpropertyof theele-phantflows. We noticethat theautocorrelationat ��Cy�}9is approximately10% lessthanwhat it is at �FC�� , andthenumberof intervalsanelephantmayremainasanele-phantis muchsmallerthantheoneachievedwhen �eCk� .Thoseremarksgive us confidencein that the resultspre-sentedso far arenot an artifact of the timescaleusedbutratherapropertyof thetraffic itself.

We alsoconsiderthe interval ��C�� minute. We find thattheautocorrelationfor ��Cy� and �5Cz� arealmostidenti-cal, indicatingthatthereis notany needto go lower than5minutes.This is only aninitial examinationinto theappro-priatetime scalefor sucha classification;however it givesusconfidencethatwe areusingreasonablevalues.

East-Coast� = 60 mins

AEST 0.8-constantloadmin. 0.5833 0.83335% 0.8333 0.8333avg. 0.8318 0.837395% 0.8333 0.8571max. 0.8889 0.8571

TABLE VAUTOCOVARIANCE VALUES AT LAG 1 FOR THE PEAK 5 HOURS.

V. TWO FEATURE CLASSIFICATION

It is evident from the previous sectionthat regardlessofthe criterion we useto definea separationthreshold,westill seealot of volatility in theelephants.Whatwewanttoavoid is reclassificationof aflow if it transitionsto theothersideof thethresholdfor ashorttime. Largeflowscanhaveshorttransientperiodsin which their volumedrops. Sim-ilarly, small flows canhave short-lived bursts. Althoughmany flows may spendmost their time on the samesideof the separationthreshold,every so often they crossthethresholdfor justoneor two timeintervals.Wedonotwantto reclassifyflowsdueto shorttransientburstsor dips.

This motivatedus to includea secondfeaturein theclas-sificationthatwouldbebasedona temporalcomponentofan elephant’s behavior. For traffic engineeringpurposes,it is important to be able to differentiatebetweenflowsthatcontribute largeamountsof traffic yet mayexperienceshort-lived periodsof reducedbandwidth,and flows thatfall below the thresholdand remaintherefor the rest oftheir lifetime. As a consequence,our reactionto transientdropsbelow the identified thresholdshouldexhibit suffi-cientlatency.

In thermodynamics,“latent heat” is the heat energy re-quiredto changea substancefrom onestateto another. Inorder to be able to identify the points in time, whenele-phantflowsbelow thethresholdchangestateinto mice,wedefinea new metric,which we call “latent heat”. An ele-phantis assumedto accumulate“latent heat”aslong asitexceedsthe identified threshold. Onceit falls below thethreshold,it startslosing“heat”, andchangesstate(i.e. be-comesamouse)whenits “latentheat”becomeslower thanapresetvalue.

In order to achieve this, at eachtime interval, apartfromtheflow’sbandwidthwecalculatethedistancebetweenthebandwidthachievedandthecorrespondingthresholdvalue,derivedusingtheproposedapproachessuchasthe“aest”,andthe “constantload”. We definethe “latent heat” of aflow asthesumof thosedistancesin thepast12 timeslots,i.e. theprevioushour.

�x� ��� ��C � [���� [�� , � �

��������1Q�B������

In eachclassificationinterval �q�?� , we classifyflows asfollows.

1) if ������i�*�h�c"�B���� � and�j� �l�� ��"a9 , �xuq� �

2) if ������i�*�h�c"�B���� � and�j� �l�� ��:a9 , �xuq���

3) if ������i�*�h�c:�B���� � and�j� �l�� ��"a9 , �xuq� �

4) if � � ��i�*�h�c:�B���� � and�j� � �� ��:a9 , �xuq� �

With thefirst condition,we cantrackflows thatsystemati-cally transmitovertheidentifiedthreshold.If theflow hap-pensto transmitunderthethresholdfor a smallnumberofintervals,thenwith condition(3) wemakesurethatit doesnot changeits state.With thefourth condition,we classifyasmicethoseflows thatfall consistentlybelow thethresh-old. If thoseflowshappento transmitoverthethresholdforasmallamountof intervals,thenthey remainmice(condi-tion 2). The “latent heat” metric takes into accounthowmuchbeyond or below the thresholda flow may transmitandthusaccountsfor majorchangesin theflow’sbehavior.In otherwords,if a flow hasbeenidle andstartstransmit-

10

SPRINTATL TECHNICAL REPORT TR01-ATL-110918 11

ting at threetimesthe identifiedthreshold,thenit will beclassifiedas an elephantwithin at least4 time intervals,assuming� thethresholdremainsconstant.

Weclassifyflows basedon thethresholdvaluescalculatedusingthe “aest” approach,andtheir “latent heat” in time.Theresultingnumberof elephantsandfractionof totaltraf-fic apportionedto theelephantclassis presentedin Figures11and12for bothtraces.If wecomparethesefigureswiththoseof Figures6 and7, we seethat with this classifica-tion approachwe cankeepthe load reasonablyconsistent(Figure12) while simultaneouslyexperiencinglessfluctu-ationsin the elephantload (Figure 11). Henceour two-featureclassificationschemeusing latentheatachieves abettertradeoff betweenreducingfluctuationsin eitherele-phantloador thenumberof elephants,thanany of thesin-gle featureclassificationschemes.

Theempiricalprobabilitydensityfunctionfor theaverageholdingtimein theelephantstateis presentedin Figure13.Specificstatisticsaregivenin TableVI. Thereis no longera peakat smallholdingtime values.Recallthat for singlefeatureclassificationwe have averageholdingtimesof 20and40 minutes(dependinguponthetrace),while hereweseeaverageholding timesfor both tracesare1 1/2 hoursor longer. We believe that classificationschemessuchasthis one,thatavoid reclassificationfor short-termfluctua-tions, identifiesthe type of long-lived elephantflows thataregoodcandidatesfor traffic engineeringapplications.

Averageholdingtime statistics

min. 20% 50% avg. 80% max.westcoast

1 7 16.5 23.56 44 60

eastcoast 1 6 11.5 17.42 24 60

TABLE VISTATISTICS FOR THE HOLDING TIMES AT THE ELEPHANT STATE

FOR THE “ LATENT HEAT” APPROACH ON THE AEST THRESHOLDS

(IN 5-MINUTE SLOTS).

VI. PROFILE OF ELEPHANT FLOWS

In thissectionweexaminetheprofileof theelephants.Weconsideredthoseflows thatwereclassifiedaselephantsbyour two featureclassificationmethodthroughouttheentire5 hour peakperiod. As a preliminaryassessmenton theprofile of the elephants,we examinetheir network prefixlengthsandthetypeof organizationthey belongto.

09:00 13:00 17:00 21:00 01:00 05:00 09:00 13:000

200

400

600

800

1000

1200

1400

Jul 24, 2001 Jul 25, 2001

Num

ber

of e

leph

ants

west coasteast coast

Fig. 11. Numberof elephantsderived using the two-featureclassificationmethod.

09:00 13:00 17:00 21:00 01:00 05:00 09:00 13:000

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Jul 24, 2001 Jul 25, 2001

Fra

ctio

n of

tota

l tra

ffic

appo

rtio

ned

to e

leph

ants west coast

east coast

Fig. 12. Fractionof total traffic apportionedto elephantsusingthetwo-featureclassificationmethod.

0 10 20 30 40 50 6010

0

101

102

103

104

Average holding time in the elephant state

Num

ber

of o

ccur

ence

s (lo

g)

west coasteast coast

Fig. 13. Averageholding times in the elephantstate,whenclassifyingbasedon theaestthreshold,andthelatentheatof theflow bandwidthin anhour.

11

SPRINTATL TECHNICAL REPORT TR01-ATL-110918 12

0 5 10 15 20 25 3010

0

101

102

103

104

105

106

Network Prefix Length

Em

piric

al D

ensi

ty D

istr

ibut

ion

Fun

ctio

n

west coast active prefixeswest coast elephant prefixeseast coast active prefixeseast coast elephant prefixes

Fig. 14. Network prefix lengthdistribution for theelephants.

One would expect that the network prefixes of the ele-phants would correspondto large domains advertisedthroughBGP. As alreadymentioned,the monitoredlinksare OC-12 links interconnectingbackbonerouters in aPOP, andtraffic is capturedas it travels towardsthe coreof thenetwork. In Figure14 we presentthedensityfunc-tion for the lengthof network prefixesthatbecomeactiveatsomepoint in our trace.Weobserve thatmostof ourac-tivenetwork prefixescomefrom /24networks,followedby/16 and/18. Active networks with prefixesof /8, and/10arelimited to lessthan100. In thesamefigurewe presentsimilarstatisticsfor thenetwork prefixesthatgetclassifiedaselephants.Thelengthsfor theelephantprefixesarecon-tainedin a smallerregion, namelybetween/12 and /26.For the west-coastwe even witnessa /32 network alwaysin the elephantclass. This is a customernetwork usingNAT, advertisinga single IP addresswithin the Sprint IPbackbone.

Finding only two or three(dependingupon the trace)/8flows amongourelephantsis anunexpectedresult.Sucharesultcannotbegeneralizedfor thewholeInternetbecauseit is influencedby theway our network is engineered,andthe way BGP policies impact traffic flow throughoutournetwork. For instance,we comparedthe elephantsiden-tified on the westandthe eastcoast,andwe found that afew network prefixeswereelephantsononly oneof thetwocoasts.This resultwasexpectedsinceBGPpoliciesguidethe entry andexit pointsof our network for traffic that isnotdestinedfor ourcustomers.

In order to get a betterunderstandingabout the type ofbusinessescontributing the most to the overall load, weclassify elephantsinto the following categories: Tier-1,Tier-2, Tier-3 ISPs,corporations,universities,andgovern-mentinstitutions.Thedefinitionof whatconstitutesaTier-1 network is rathercontroversial.Themostpopulardefini-tion is thatTier-1 ISPsarethosenetworksthathave access

to the global InternetRoutingTableanddo not purchasenetwork capacityfrom other providers [1]. The secondtier generallyhasa smallernationalpresencethana Tier1 ISPandmay leasepartor all of its network from a Tier1. Lastly, aTier 3 ISPis typically a regionalprovider withnonationalbackbone.

Accordingto theabove categorization,we profile theele-phantsin ournetwork asshown in TableVII. Thevastma-jority of the elephantnetwork prefixesbelongsto Tier-1,Tier-2, andTier-3 ISP providers. We believe that profil-ing elephantflows is essentialonceone intendsto applyspecifictraffic engineeringpolicieson the traffic destinedtowardsthem.

Tier-1 Tier-2 Tier-3 corp. .edu gov.

westcoast

40 42 28 32 19 6

eastcoast

22 72 27 40 7 7

TABLE VIIPROFILE OF THE ELEPHANT NETWORK PREFIX FLOWS.

Noticethatcertainbusinessesmayadvertisemorethanonenetwork prefixesfor their network. Therefore,the resultspresentedin TableVII do not correspondto uniquebusi-nesses,but ratherto the numberof elephantnetwork pre-fixesin eachcategory.

VII . CONCLUSIONS

In thispaperwe consideredclassificationschemesfor net-work prefix flows that identify elephantsby separatingtheflows into two classescalledelephantsandmice. We con-sideredtwo singlefeatureschemesin depth,andtwo oth-ers summarily. We comparedthesemethodsin termsofthe numberof elephantsthey yield, and the total load intheelephantclass.Weshowedthatsinglefeatureschemesthat lead to persistency for one of thesemetrics, lead tofluctuationsfor theothermetric. Hencethereis a tradeoffbetweenthesetwo metrics.

We evaluatedmetricsbasedon the temporalbehavior ofelephants.We illustratedthe volatility of elephantsthreeways: 1) we found thatflows remainin theelephantstatefor surprisinglyshort periodsof time; 2) the numberofelephantsvariesconsiderablyoverdifferenttime intervals;and3) if elephantsarenot reclassifiedfrequently, thetotalloadin theelephantclasscanchangesubstantially.

12

SPRINTATL TECHNICAL REPORT TR01-ATL-110918 13

Elephantflowsthatarenotpersistentfor sufficientamountsof time, suchasperiodsof time larger thanroutingproto-col con� vergencetimes,maynot beusefulfor traffic engi-neeringapplications.Therefore,classificationtechniquesshould make sure that flows are reclassifiedonly whennecessaryandnot whenthey exhibit short-termtransitionsabove or below thethreshold.

For that reason,we proposeda new classificationmech-anismbasedon two features,a flow’s bandwidthand its“latent heat”. We calculatethe “latent heat” of a flowas follows. We measurethe differencebetweena flow’sbandwidthandtheseparationthresholdat eachtime inter-val, andsuma sequenceof thesedifferencesover multipletime intervals.Thismeasuresmoothesoutshort-termtran-sitionsbecausean elephant’s latentheatwill remainpos-itive despitea brief transitionbelow the threshold.Simi-larly, a mouse’s latentheatwill remainnegative despiteabrief transitionabove the separationthreshold. We showthatwith this two featureclassificationschemeweachievea bettertradeoff betweenour first two metrics. The ele-phantload remainsreasonablyconsistentwhile the num-berof elephantsfluctuateslessthanin single-featureclas-sificationschemes.More importantly, the propsoedtwo-featureschemeyields elephantsthat remainelephantsformuchlongerperiods.With singlefeatureclassificationele-phantsremainedelephantsonaveragebetween20-40min-utes,while using the two featureclassificationapproach,elephantsremainedelephantsfor 1 1/2 to 2 hours.

In profiling the elephantsfrom both our traces,we foundtheelephants’prefixesgenerallylie within a limited rangeof network prefix lengthsbetween/16 and /26. Surpris-ingly, we found few large prefixessuchas/8 and/10. Al-thoughtherewerea few corporations,governmentagen-ciesanduniversitiesamongour elephants,we learnedthatthevastmajority of themareTier-2 andTier-3 ISPs.

Overall, we concludethat while single featureclassifica-tion schemesare attractive in their simplicity, they areprobablyinsufficient for mosttraffic engineeringapplica-tions. While we believe thatour two featureclassificationschemeis more attractive to traffic engineeringapplica-tions, we also concludethat identifying elephantsis notstraightforward despitetheir heavy-hitter nature. This isimportantbecausetheideaof isolatingelephantsfor trafficengineeringpurposeshasbeenwidely proposed,but therehasbeenno prior effort on assessingthefeasibility andis-suesinvolvedin doingso.

REFERENCES

[1] D. Allen. Theimpactof peeringon isp performance:What’s bestfor you? NetworkMagazine, 2001.

[2] S. Bhattacharyya,C. Diot, J. Jetcheva, and N Taft. POP-levelandAccess-Link-Level Traffic Dynamicsin a Tier-1 POP. ACMSIGCOMMInternetMeasurementWorkshop, November2001.

[3] M. Crovella. Performanceevaluationwith heavy taileddistribu-tions. In Lecture Notesin ComputerScience1786, March2000.

[4] M. CrovellaandM. Taqqu.EstimatingtheHeavy Tail Index fromScalingProperties.MethodologyandComputingin AppliedProb-ability, 1999.

[5] C. EstanandG. Varghese. New Directionsin Traffic Measure-ment and Accounting. ACM SIGCOMMInternet MeasurementWorkshop, August2001.

[6] W. FangandL. Peterson.Inter-AS Traffic PatternsandTheir Im-plications.Proc.IEEE Globecom, December1999.Brazil.

[7] S.Floyd. Simulationis Crucial. IEEESpectrum, January2001.[8] C. Fraleigh,C. Diot, B. Lyles, S. Moon, P. Owezarski,D. Pa-

pagiannaki,andF. Tobagi. Designanddeploymentof a passivemonitoring infrastructure. In Proceedingsof Passiveand ActiveMeasurementWorkshop, Amsterdam,April 2001.

[9] S.Sarvotham,R.Riedi,andR.Baraniuk.Connection-Level Anal-ysisandModelingof Network Traffic. ACM SIGCOMMInternetMeasurementWorkshop, August2001.

[10] A. Shaikh,J. Rexford, andK. Shin. Load-Sensitive RoutingofLong-LivedIP Flows. Proc.ACM SIGCOMM, September1999.

[11] J.W. Steward. BGP4: Inter-DomainRoutingin theInternet. Ad-disonWesley, 1999.

[12] S.Uhlig andO. Bonaventure.OntheCostof UsingMPLSfor In-terdomainTraffic. Qualityof Future InternetServices, September2000.Berlin.

13