Inter-Domain MPLS Restoration - Carleton University

Design of Reliable Communication Networks (DRCN) 2003, Banff, Alberta. Canada, October 19-22,2003

Inter-Domain MPLS Restoration

Changcheng Huang, Donald MessierCarleton University, 1125 Colonel By Drive, Ottawa, ON K1S 5B6 Canada

E-mail: [email protected]

Abstract - Several MPLS (Multi-Protocol Label Switcbing)

based protection mecbanisms bave been proposed, sucb asend-to-end patb protection and local repair mecbanism. Tbosemecbanisms are designed for intra-domain recoveries andlittle or no attention bas been given to tbe case of crossing non-bomogenous independent domains. Tbis paper presents anovel solution for tbe setup and maintenance of independentprotection mecbanisms witbin individual domains and mergedat tbe domain boundaries. Tbis innovative solution offerssignificant advantages including fast recovery across multiplenon-bomogeneous domains and bigb scalability. Simulationresults using OPNET are presented sbowing tbe advantageand feasibility of our proposed approach.

Index Terms - Protection, MPLS, Inter-Domain

I. INTRODUCTION

The latency in Internet path failure, failover, and repairhas been well documented over the years. This is especiallytrue in the inter-domain case due to excessively longconvergence properties of the Border Gateway Protocol(BGP) [1]. Research by C. Labovitz et al. [2] presentsresults, supported by a two year study, demonstrating thedelay in Internet inter-domain path failovers averaged threeminutes and some percentage of failover recoveriestriggered routing table fluctuations lasting up to fJ.fteenminutes. Furthermore the report states that "Internet pathfailover has significant deleterious impact on end-to-endperformance-measured packet loss grows by a factor of 30and latency by a factor of four during path restore".Although networks are becoming more and more resilientthere are still frequent network failures that are becoming acause for concern [3][4][5][6]. Future optical networks willcarry a tremendous amount of traffic. A failure in an opticalnetwork will have a disastrous effect. The FCC has reportedthat network failures in the United States with impact onmore than 30,000 customers happen with a frequency in theorder of one every two days and the mean time to repairthem is in th:: order of five to ten hours [7]. With opticalcommunications technologies the problem is made worseand now a single failure may affect millions of users.Strategic planning at Gartner Group suggests at [8] thatthrough 2004, large U.S. enterprises will have lost morethan $500 million in potential revenue due to networkfailures that affect critical business functions. This Internet

0-7803-8118-1103/$17.00 <0 2003 IEEE 341

path failover latency is one of the driving factors behindadvances in MPLS protection mechanism.

Protection and restoration issues have been widelystudied under various contexts such as SONET rings, A TMand optical networks [9][10][1l]. Several recoverymechanisms have been proposed over the last few years.End-to-end schemes provide protection on disjoint pathsfrom source to destination and may rely on fault signalingto effect recovery switching at the source[12]. Local repairmechanisms for their part effect protection switching at theupstream node from the point of failure, the point of localrepair (PLR) and do not require fault signaling[l3][14].Local repair has the advantage of fast recovery, but ingeneral is not efficient in capacity. Path protection, on theother hand, can optimize !pIiTe capacity allocation on anend-t~end basis. Therefore it is typically more efficient.

MPLS being a relatively new technology, the research inadvanced protection mechanism for MPLS is still in itsinfancy. This is especially true for inter-domain protectionmechanism. The research conducted and still ongoing hasidentified several possible solutions to the MPLS intra-domain recovery problem[15]. Each of those solutionspresents its own strengths and weaknesses. As a first cut,MPLS restoration schemes can be separated into on-demand and pre-established mechanism. On-demandmechanism relies on the establishments of new paths afterthe failure event while p~established mechanismcomputes and maintains restoration paths for the duration ofthe communication session. Due to the fast recovery timessought, this work focuses exclusively on pre~stablishedprotection switching. Of the pre~stablished recoverymechanisms, one of the first being implemented in acommercial product is Cisco Systems' Fast Re-route (FRR)algorithm in the Gigabit Switch Router family. FRRprovides very fast link failure protection and is based on theestablishment of pre-established bypass tunnels for allLabel Switch Routers. The FRR algorithm can switchtraffic on a failed link to a recovery path within 20 ms but islimited to the global label assignment case. Several othermethods have been proposed based on individual backupLSPs (Label Switched Path) established on disjoint pathsfrom source to destination. An immediate benefit of end-to-end mechanism is scalability. Reference [16] shows thatgiven a network of N nodes, local repair schemes require

Design of Reliable Conununication Networks (DRCN) 2003, Banff, Alberta, Canada, October 19-22,2003

N*L*(L-l) backup paths to protect a network if each nodehas L bi-directional links. For end-to-end schemes, anetwork with M edge nodes, the total number of backuppaths is proportional to M*{M-l). If M is kept small, asignificant scalability advantage is realized. The followingparagraphs provide an overview of the most promisingintra-domain protection schemes.

The proposal at [17] is an improvement over the one hopCISCO FRR and describes mechanisms to locally recoverftom link and node failures. Several extensions to RSVP-TE are introduced to enable appropriate signaling for theestablishment, maintenance and switchover operations ofbypass tunnels and detour paths. The Fast Reroute methodwill be referred to as Local Fast Reroute (LFR) in thispaper. In the Local Fast Reroute, one-to-one backup LSPscan be established to locally bypass a point of failure.

A key part of this proposal is to setup backup LSPs bymaking use of a label stack. Instead of creating a separateLSP for every backed-up LSP, a single LSP is createdwhich serves to backup a set ofLSPs. Such an LSP backingup a set of primary LSPs is called a bypass tunnel.

The key advantage of LFR is the very fast recovery timewhile its disadvantages are scalability issues due to thepotential large number of bi-directional links andcomplexity in maintaining all the necessary labelassociations for the various protected paths.

The first end-to-end path protection scheme is presentedat [16] and uses signaling ftom the point of failure toinform the upstream LSRs (Label Switching Router) that apath has failed. Here a Reverse Notification Tree (RNT) isestablished and maintained to distribute the fault andrecovery notifications to all ingress nodes which may behidden due to label merging o~ons along the path. TheRNT is based on the establishment of a Path Switch LSR(PSL) and a Path Merge LSR (PML). The PSL is the originof the recovery path while the PML is its destination. In thecase of multipoint-to-point tree the PSLs form leaves andthe PMLs roots of multicast trees. The main advantages ofRNT protection are scalability and efficient use of resourceswhile its disadvantages are long recovery time due to thepropagation of failure notification messages.

Another end-to-end path protection mechanism presentedat [18] is called End-to-end Fast Reroute (EFR). It canachieve nearly the same protection speed as LFR., but isextremely inefficient in terms of bandwidth resource. Itrequires about two times of the bandwidth of the protectedpath for protection path. For more about this approach,readers are referred to [18].

Current research and recent proposals deal with the intra-domain case or assume homogeneity and full cooperationbetween domains. Recognizing the growing need toprovide a solution for the more general case, this paperproposes a new and innovative solution to solve the inter-domain protection problem for LSPs spanning multipleinhomogeneous and independent domains. The proposedsolution is based on the use of concatenated primary and

backup LSPs, protection signaling and a domain boundaryprotection scheme using local repair bypass tunnels. Wewill call our scheme Inter-Domain Boundary Local BypassTunnel (IBLBT) in this paper to differentiate it with othersolutions.

II. PROBLEM STATEMENT AND PROPOSEDSOLunON

The MPLS protection mechanisms presented in Section Iinclude LFR. EFR and RNT. All were designed for intra-domain failure recovery and will generally not functionwhen the primary LSP is not bmmded to a singleadministrative domain. The scalability problem with LFRwill be stretched further if multiple domains are involvedbecause each domain may have hundreds of nodes and linksthat require bypass tunnels for protection. While both EFRand RNT suffer longer delays due to the long LSPs thatspan several domains, EFR becomes more inefficientcompared to RNT because its extra bandwidthrequirements.

A unique issue for inter-domain protection is that,separate domains may not cooperate with each otb:r. Eachdomain is administered through a different authority. Someauthorities, such as carriers, are not willing to shareinformation with each other. Certain critical informationmay have significant impact on the operation of publiccarriers if it is disclosed. For example, failure information istypically considered negative on the image of a publiccarrier and can be exploited by competitors for theiradvantages. Most carriers will consider this informationconfidential and will not likely share this information withtheir customers and other carriers. When an internal failurehappens, a carrier will try to contain this information withinits own domain and try to recover ftom the failure by itself.Both end-to-end RNT and end-to-end EFR require somekind of failure signaling to all the upstream domains.Containing this failure signaling to the originating domainwill make end-to-end RNT and EFR almost impossible.

A complete solution to the inter-domain protectionproblem can be found if we turn the apparent difficulties inend-to-end RNT into advantages. Such is the case for theindependence of domains. Accepting the fact that domainswill be independent and inhomogeneous leads to the idea ofestablishing an independent path protection mechanismwithin each domain while at the same time being able toguarantee protection throughout the path ftom end to end.What is required is a solution at the domain boundaries toensure protection continuity. For the solution to work, eachdomain must provide its O\\n RNT protection schemewhich it initiates, establishes, maintains and hands over tothe next protection domain at the domain boundary. Adomain protection scheme must therefore be executedcompletely within that domain with no involvement fromother domains. The first step towards this solution is to

342

Design of Reliable Communication Networks (DRCN) 2003, Banff, Alberta, Canada. October 19-22, 2003

allow the primary and backup LSPs to be concatenated atthe domain boWldaries. Concatenation in this context isused to mean that specific actions must be taken at thispoint in the LSP to ensure continuity of service andprotection across domain boundaries. Figure 1 illustratesthe fundamental principles behind this solution. Theprimary path PI is protected through three separate backuppaths namely Bl, B2 and B3. Bl is initiated in the sourcedomain, B2 at the domain boWldary and B3 in thedestination domain. Each of those backup paths isindependent of the others and does not require faultnotification beyond its own domain.

This innovative solution permits the isolation of theprotection mechanism to a single domain or domainboWldary. Furthermore, domains can now use independentprotection mechanisms and signaling schemes and do notneed to propagate their intemal failure notifications toadjacent domains. This solution combines the advantages offast local repair at the domain boWldaries and the scalabilityadvantages of end-to-end protection within domains.

In summary, the proposed solution to the inter-domainMPLS recovery problem is based on the establismnent ofindependent protection mechanisms wi thin domains usingconcatenated primary and backup LSPs, minimal protectionsignaling between domains and, local repair at the domainboWldaries. Viewed from end-to-end at figure 1, theprimary LSP is protected by three or more distinct andindependent protection regions merged at their respectiveboWldaries. Those protection regions are the SourceProtection Domain, the Domain Interface Protection and theDestination!rransit Protection Domain. In addition to thosethree protection regions, transit protection regions are alsopossible when a protected LSP transits one or moreindependent domains before reaching its destination. Insuch a case, there would be several domain interfaceprotections in place.

Our solution introduces and makes use of Gateway LSRsand Concatenation Path Switch LSRs (CPSLs) as well asProxy Concatenation PSLs (PCPSL) and Proxy GatewayLSRs (PGL). Those new protection elements are used topre-establish inter-domain local bypass tunnels andguarantee protection against node and link failures whensufficient protection elements are present.

In the following discussions, we assume that there are atleast two separate links connecting two pairs of borderrouters between any neighboring domains. This will allowus to provide protection for two neighboring domainswithout counting on the support of a third domain Wlder thecontext of single point failure. One example that satisfiesthis requirement is shown in figure 2. Our focus is thereforeon removing the interdependency among domains that arenot directly linked and further limiting the dependencybetween neighboring domains as discussed in the nextsection. We will use the scenario of Dual Exit LSRs FullyMeshed (figure 1,) as our example case. The principles ofour solution can be readily applied to all other scenarios

that satisfy the condition stated at the beginning of thisparagraph.

Figure 2 illustrates the topology where a primaryprotected LSP PIA is protected in Domain A via backuppath BIA, protected at the boundary via local backup pathBI and protected in Domain B through backup path BIB.LSR 0 is the selected Gateway LSR for path PI while LSRI is its corresponding POL. LSR 2 is the CPSL for the sameprimary path while LSR 3 is the PCPSL. The POL 81dPCPSL are responsible to maintain end-to~d pathintegrity in the event of a Gateway or CPSL failure. Theselection of the PCPSL and its significance in the recoveryprocess is critical for the operation of this scheme. Thispoint is evident when looking at figure 2. In the figure wenote that Bl and BIB are routed through the PCPSL LSR 3.Although the identification of the PCPSL is simple in aDual Exit LSR topology its role is nevertheless important. Itis the merging of the local inter-domain backup path BI andthe destination domain backup path BIB at the PCPSL LSR3 that permits full and continuous protection acrossdomains. Without this action, recovery traffic on BI wouldbe dropped at LSR 3 since it could not make the necessarylabel association. The merging action of the PCPSL ensureslabel binding between BI and BIB, enabling the recoverytraffic from the Gateway LSR to be switched to thedestination.

ill S IMULA nON RESULTS

To verify the potential for the proposed ffiLBT solution,two separate models were built. The first model implementsMPLS recovery using an end-to-end path protectionmechanism. The model was built using dynamic LSPs. Forthe end-to-end recovery to work it is necessary for all nodesin the model to share a common signaling and recoverymechanism. This is necessary in the extended intra-domainend-t~end scheme since domains have to fully cooperate inthe recovery process. As discussed in previous chapters,this naive extended intra-domain solution would likely notbe found in real networks. Nevertheless, the model is usefulto serve as a comparison point with IBLBT solutionproposed in this paper. In contrast to end-t~end recovery,ffiLBT isolates recovery to the domain boundary or to anindividual domain. The second model built is the modelimplementing the proposed solution with its inter-domainboundary local repair tunnels. All models are run withvarious failure scenarios to collect data on recovery time forfurther analysis and comparison.A. MPLS End-To-End Protection Model

The first MPLS model was built to measure recoverytime for an end-to~d protection case and is represented atfigure 3. As stated earlier, it is recognized that such aninter-domain end-to~d protection mechanism is naive forthe reasons discussed in Section U. However, to obtaincomparative data from such a scheme LSPs wereconfigured using dynamic LSPs and all nodes share the

343

Design of Reliable Communication Networks (DRCN) 2003, Banff, Alberta, Canada, October 19-22, 2003

same signaling and protection mechanism. Traffic wasgenerated using two separate Gigabit Ethernet LANs eachwith menty-five users running high-resolution videoconferencing applications over UDP. Additionalapplications were configured such as heavy database accessand email, file transfers, and print sessions using TCP.Traffic entered the MPLS network at the ingress node LSRO. The Egress and Ingress LSRs were modeled as CISCO7609 while the transit LSRs were CISCO 7000 routers. TheEgress and Ingress LSRs were selected based on thenumber of Gigabit Ethernet ports available for the sourceLANs. IP forwarding processor speeds were increased to50000 packets/see on all nodes to permit higher trafficvolumes for the simulation. High traffic volume wasnecessary to ensure high link utilization for measurementpurposes. Traffic was switched between LSRs based on theForward Equivalency Class (FEe) associated with theincoming traffic and the established Paths. The selectedPrimary Path is shown at figure 3 and follows path LSRs 0-1-4-5-7-8 while the pre-established end-to~d backup LSPfollows LSRs 1-3-6-7-8 (LSR 1 is 1he PSt). All modellinks are OC-12 with 0.8 ms delay for inter-domain linksand 4 ms delay for intra-domain links. This approximates1200 km intra-domain links and 240 km inter-domain links.The average load on the network was kept at approximately125 Mbps.

Several failure scenarios were studied as follows:a) Source domain failure (Link 1-4 failure);b) Domain interface Failure (Link ~5 and node

5 failure);c) Destination domain failure (Link 5-7 failure).

A failure and recovery process was configured in Opnetto effect the planned failures at 170 sec from the simulationstart time. All simulations were run for a total of 180seconds. The 170 seconds time before failure was selectedto ensure sufficient time for all routing processes tocomplete their initial convergence, for traffic generationprocesses to reach steady state prior to the network failure,and for the MPLS processes to establish LSPs after theinitial layer three routing protocol convergence. Thesimulation end time is selected to allow sufficient time forrecovery. and steady states to return while being kept at aminimum to reduce the simulation run time. The largeamount of application traffic generated during thesimulation caused the simulation run time to be in excess ofone hour.

This model makes use of CR-LDP keep-alive messagesto detect node failures while link failures are detectedthrough lower layer Loss of Signal (LOS). The keep-alivemessage interval was configured for 10 ms while the holdoff timer was set at 30 Ms. Those short intervals wereselected arbitrarily but taking into account the OC-12 linerate with a view to reduce packet loss during failover. Upondetecting a failure the node upstream from the point offailure sends an LDP notification message to the sourcenode informing it of the failure and the affected LSPs. Base

on this notification message, the source node switches tothe recovery path. This LDP notification message isencapsulated in an IP packet and forwarded to the ingressnode for action. Several network probes were configured tocollect data on recovery times, routing tables, link statedatabases, traffic in and out of all LSPs as well asforwarding buffer utilization.

For this work, recovery time was measured at the mergepoint of the primary and backup paths (ie: PML). Thisrecovery time is uom the receiver's perspective andrepresents the difference in time between the reception ofthe last packets on the primary path and reception of thef11'8t packets on the recovery path. The recovery timeincludes failure detection time, time for the transmission offailure notification messages, protection-switching time,and transmission delay trom the recovery point to themerge point. To obtain the necessary LSP traffic data forthe measurement of recovery time, LSP output traffic forprimary and backup LSPs at the merge point was sampleevery 100 J.ISec. This sampling time was selected to providesufficient granularity into the recovery process whilemaintaining simulation output files to a manageable size.B. Inter-Domain Boundary Bypass Tunnel Model

In the IBLBT model, the primary and backup paths wereestablished following the proposed inter-domain protectionalgorithms. As described in previous sections and depictedat figure 4, concatenated LSPs were setup within eachdomain with backup paths using bypass tunnels establishedmanually as described in section II. The simulations wererun for 130 seconds with failures programmed for 125seconds. Shorter simulation time is possible with thismodel because static LSPs are used and no setup time isrequired during the simulation. Other than the recoverymechanism, the model was setup identically to the previousend-t~end MPLS model.C. Results Summary

Comparing end-t~end recovery with the IBLBT case isshown in table 1 (results are different with statisticallysignificance). The recovery speed benefits of IBLBT overthe end-~end case would have been much more evidenthad the simulation model included several moreindependent domains. Of course the further away thefailure is trom the point of repair the longer the recoverytime. Given the simplicity of the models in this work, thesignificant advantages of IBLBT could not be exploitedfully against the end-to-end case.

N. CONCLUSIONS

The growing demand for QoS has led to significantinnovations and improvements on the traditional best effortIP networks. Technologies such as MPLS provideimportant advantages over the classical ho~by-hop routingdecision processes. The ability of MPLS to apply equallywell to various layer I technologies including Wavelength

344

Design of Reliable Communication Networks (DRCN) 2003, Banff, Alberta., Canada, October 19-22,2003

Division Multiplexing (WDM) makes this technology astrong contender for current leading edge and futurenetworks. Furthermore, due to its label switchingarchitecture, MPLS can provide very fast recoverymechanism complementing existing lower layer protectionschemes. The development of new techniques to providepath protection at the MPLS layer will certainly continue.The proposed IBLBT protection mechanism presented inthis paper is an innovative and unique scheme to provideprotection across multiple independent domains. It relies ononly a very basic amount of information provided byneighboring domains and makes no assumption onprotection mechanisms of other domains and level ofcooperation. Simulation results show recovery times of afew milliseconds and point to the potential for this proposedsolution for MPLS inter-domain protection.

In general, our solution permits the isolation of theprotection mechanism to a single domain or domainboundary. Furthermore, domains can now use independentprotection mechanisms and signaling schemes and do notneed to propagate their internal failure notifications toadjacent domains. This solution combines the advantages offast local repair at the domain boundaries and the scalabilityadvantages of path protection within domains.

REFERENCBS

[1] Y. Rekhter, TJ. Watson, T. Li, "A Border GatewayProtocol 4 (BGP4)," IETF RFC 1771, March 1995.

[2] C. Labovits, A. Ahuja, A. Bose, F. Jahanian "DelayedInternet Routing Convergence," IEEE/ACMTransactions on Networking, Vol. 9, No.3, June 2001.

[3] T. G. Griffin, G. Wilfong, " An Analysis of BGP

Convergence Properties," in ACM SIGCOMM 1999,Cambridge, September 1999.

[4] S. J. Jeong, C. H. Youn, T. S. Cboi, T.S. Jeong, D. Lee,KS. MiD, " Policy Management for BGP Routing

Convergence Using Inter-AS Relationship," Journal ofCommunications and Networks, Vol. 3, No.4,December 200 I.

[5] R. Mahajan, D. WetberaIl, and T. Anderson,"Understanding BOP Misconfiguration," in ACMSIGCOMM 2002, Pittsburgh, August 2000.

[6] K. Varadban, R. Govindan, and D. Estrin, "PersistentRoute Oscillations in Inter-Domain Routing,"Computer Networks. 32(1),1999.

[7] P. Demeester, T. Wu, N. Yoshikai, "SurvivableCommunications Networks," IEEE Communications,Vol. 37 No. 8,August 1999.

[8] B. Hafuer, J. Pultz, "Network Failures: Be afraid, bevery afraid," Gartner Group, Note SPA.09.1285, 15September 1999.

[9] O. J. Wasem, "An Algorithm for Designing Rings forSurvivable Fiber Networks," IEEE Transactions on

[ll]B. T. Doshi, S. Dravida, P. Harshavardhana, O. Hauser,and Y. Wang, "Optical Network Design andRestoration," Bell Labs Technical Journal, January-March 1999.

[12]P.-H. Ho and H. T.Spare Capacity for MPLS-based Recovery," To appearin IEEE/ACM Transaction on Networking.

Reliability, vol. 40,1991.

[lO]T. Frisanco, "Optimal SpareVarious Switching MethodsICC '97, Montreal, June 1997.

Capacityin ATM

Design forNetworks,"

Mouftah, "Reconfiguration of

[13]M. Kodialam and T. V. Laksman, "Dynamic Routingof Locally Restorable Bandwidth Guaranteed TunnelsUsing Aggregate Link Infonnation," IEEE Infocom'OI,Anchorage, April, 200 1.

[14]W. D. Grover and D. Stamatelakis, "Cycle-OrientedDistributed Precontiguration: Ring-like Speed withMesh-like Capacity for Self-planning NetworkRestoration," IEEE ICC'98, Dresden, June 1998

[lS]V. Shanna and F. Hellstrand, "Framework for Multi-Protocol Label Switching (MPLS) Based Recovery, "

IETF RFC 3469, February, 2003.[16]C. Huang, V. Sharma, S. Makam, K. Owens, "Building

Reliable MPLS Networks Using a Path ProtectionMechanism," IEEE Communications Magazine,March, 2002.

[17]P. Pan, D.H. Gan, G. Swallow, J.P. Vasseur, D.Cooper, A. Atlas, M. Jork, "Fast Reroute Extensions toRSVP-TE for LSP Tunnels", IETF Draft, Work inprogress, <draft- ietf-mpls-rsvp-lsp- fastreroute-OO. txt>,Jan 2002.

[l8]D. Haskin and R. Krishnan, "A method for setting analternative label switched paths to handle fast reroute",IETF Draft, Work in progress <draft-haskin-mpls-fast-reroute-OS.txt>, November 2000.

345


Source Domain

Primary Path -Backup Path ----

Domain A Domain B

P1

Source OSPFArea 1

-"""" B1A"..

----------.Proxy Gateway ~

Source Domain

Primary Path-Backup Path -----

DOMAIN BDOMAIN A

83DestinationIT ransit

Domain

Figure 1 - Concatenated Primary and Backup LSPs

OSPFArea 2

B1B

DestinationlTransitDomain

I nterface Domain

Figure 2 - Inter-Domain Protection

346

Design of Reliable Communication Networks (DRCN) 2003, Banff, Alberta, Canada, October 19-22,2003

source

Primary Pat~

Backup PattT---.

source

Primary Path--

Backup Path---'

P1B'"

ffB1B

Proxy GatewayLSR

Proxy Concatenation PSLdestination

Figure 3 - MPLS En~to-end Protection Model

DOMAI N A DOMAI N B

~

Proxy GatewayLSR

Proxy Concatenation PSLdestination

Figure 4 - MPLS Domain Boundary Local Bypus tunnel Model

347


TABLE 1- COMPARISONS OF ENI>-TO-END RECOVERY AND IBLBT

348

Documents

Inter-Domain MPLS Restoration - Carleton University