On Understanding of Transient Interdomain Routing Failures

Feng Wang, Lixin Gao, Jia Wang, and Jian Qiu

Department of Electrical and Computer Engineering

University of Massachusetts, Amherst

MA 01002

AT&T Labs-research

180 Park Ave, Florham Park, NJ 07869

Outline

• What are transient routing failures?

• When can transient routing failures occur?

• How long can transient routing failures last?

• Measurement results

Internet Routing

• Autonomous systems (ASes)

– Internet Service Providers (ISPs)

– Companies

– Universities

• Intradomain Routing Protocols

– Static Routing, OSPF, IS-IS

• Interdomain Routing Protocol

– Border Gateway Protocol (BGP)

Long Convergence Delay

• Long convergence delay (Labovitz et al., TON 2001)

– Bringing a route back (Tup): < shortest path length × MRAI

– Disconnecting a route (Tdown): < longest path length × MRAI

• Fail-over: rerouting from Path A to Path B

– While Path B is being discovered, routers might experience transient routing failures, i.e., no route is available
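To give a feel for the size of the Tup and Tdown bounds, here is a minimal sketch. It assumes each bound is read as a path length in AS hops multiplied by the MRAI value, and it uses the 30-second eBGP default quoted later in these slides. The function name and the example path lengths are hypothetical.

```python
MRAI_EBGP_SECONDS = 30.0  # default eBGP MRAI value quoted in the slides


def convergence_bound(path_length_hops: int,
                      mrai_seconds: float = MRAI_EBGP_SECONDS) -> float:
    """Upper bound on convergence delay, assuming bound = path length x MRAI."""
    if path_length_hops < 0:
        raise ValueError("path length must be non-negative")
    return path_length_hops * mrai_seconds


# Hypothetical path lengths, chosen only for illustration.
t_up_bound = convergence_bound(3)    # shortest path of 3 AS hops
t_down_bound = convergence_bound(7)  # longest path of 7 AS hops
print(t_up_bound, t_down_bound)
```

Even with short paths, the bounds are on the order of minutes, which is why a fail-over can leave routers without a route for a noticeable time.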

An Example of Transient Routing Failure

[Figure: AS0, AS1, AS2, and AS3 with destination d. Legend: traffic on data plane; BGP update. The updates W: 2 0 (withdrawal) and A: 1 0 (announcement) propagate between the ASes, and one BGP routing table is marked "losing reachability".]
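The idea behind this example can be sketched as a toy replay of BGP updates at a single router: if a withdrawal removes the only known route before the announcement of an alternate route arrives, the router has no route during the gap. This is only an illustration, not the authors' model. The function name, the event format, and the 30-second gap are assumptions.

```python
from typing import Dict, List, Optional, Tuple

AsPath = Tuple[int, ...]
# (time in seconds, "W" for withdrawal or "A" for announcement, neighbor, AS path)
Event = Tuple[float, str, str, Optional[AsPath]]


def unreachable_intervals(initial_rib: Dict[str, AsPath],
                          events: List[Event]) -> List[Tuple[float, float]]:
    """Return the (start, end) periods during which the router holds no route."""
    rib = dict(initial_rib)  # routes currently known, keyed by neighbor
    intervals: List[Tuple[float, float]] = []
    lost_at: Optional[float] = None
    for time, kind, neighbor, path in sorted(events, key=lambda e: e[0]):
        if kind == "W":
            rib.pop(neighbor, None)
        elif kind == "A":
            if path is None:
                raise ValueError("an announcement needs an AS path")
            rib[neighbor] = path
        else:
            raise ValueError(f"unknown update kind: {kind!r}")
        if not rib and lost_at is None:
            lost_at = time                      # last route just disappeared
        elif rib and lost_at is not None:
            intervals.append((lost_at, time))   # a route is available again
            lost_at = None
    return intervals


# The router initially knows only path "2 0"; the withdrawal arrives at t=0
# and the alternate path "1 0" is announced 30 seconds later (illustrative).
events = [(0.0, "W", "AS2", None), (30.0, "A", "AS1", (1, 0))]
print(unreachable_intervals({"AS2": (2, 0)}, events))
```

If the router had already learned the alternate path before the withdrawal, the same replay would report no unreachable interval. A failure that never recovers is not reported, since it is not transient.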

Our Contributions

• Identify transient routing failures

– Sufficient conditions

• Bound transient routing failure duration

Outline

• What are transient routing failures?

• When can transient routing failures occur?

• How long can transient routing failures last?

• Measurement results

• Two sufficient conditions under which a node must experience a transient routing failure (a guaranteed transient routing failure).

• One sufficient condition under which a node may experience a transient routing failure (a potential transient routing failure).

When Can Transient Routing Failures Occur?

[Figure: example topology annotated with AS paths and a withdrawal message (w)]

When Can Transient Routing Failures Occur? (contd.)

[Figure: example topology annotated with AS paths, a withdrawal message (w), and an announcement (A)]

Outline

• What are transient routing failures?

• When can transient routing failures occur?

• How long can transient routing failures last?

• Measurement results

How Long Can Transient Routing Failures Last?

[Figure: destination d; the updates W: 2 0 and A: 1 0 are held back by MRAI timers as they propagate]

MRAI Timers

• Minimum Route Advertisement Interval (MRAI) timer

– Minimum amount of time that must elapse between routing updates

– Applied to BGP announcements or withdrawals

• Default MRAI value

– eBGP session: 30 seconds

– iBGP session: 5 seconds
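The effect of the timer can be sketched as a simple per-session rate limiter. This is a simplified illustration with simulated time, not any router's actual implementation. The class and method names are hypothetical, and, following the slide, the limiter holds back announcements and withdrawals alike.

```python
from typing import Optional


class MraiLimiter:
    """Toy per-session MRAI rate limiter driven by explicit timestamps."""

    def __init__(self, mrai_seconds: float = 30.0) -> None:
        self.mrai = mrai_seconds
        self.last_sent: Optional[float] = None  # time of the last update sent
        self.pending: Optional[str] = None      # latest update held back

    def submit(self, now: float, update: str) -> Optional[str]:
        """Send the update if the timer allows it, otherwise hold it back."""
        if self.last_sent is None or now - self.last_sent >= self.mrai:
            self.last_sent = now
            self.pending = None  # the new update supersedes anything held
            return update
        self.pending = update    # only the most recent update is kept
        return None

    def flush(self, now: float) -> Optional[str]:
        """Send the held update once the MRAI interval has elapsed."""
        if (self.pending is not None and self.last_sent is not None
                and now - self.last_sent >= self.mrai):
            update, self.pending = self.pending, None
            self.last_sent = now
            return update
        return None


limiter = MraiLimiter(mrai_seconds=30.0)   # eBGP default from the slide
print(limiter.submit(0.0, "A: 1 0"))       # sent immediately
print(limiter.submit(5.0, "W: 2 0"))       # held back by the timer
print(limiter.flush(30.0))                 # released when the timer expires
```

Each AS hop can add up to one MRAI interval of delay in this way, which is what makes the failure duration grow with path length.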

Upper Bound for Transient Routing Failure Duration

Transient routing failure duration ≤ min(d_u^0 + d_u^v) × MRAI

[Figure: nodes 0, u, and v, with the distances d_u^0 and d_u^v marked]
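As a rough numerical illustration, the sketch below assumes the bound has the form "minimum over candidate alternatives of the sum of two hop distances, multiplied by MRAI". The exact indices on the distance terms are an assumption here, and the function name and input distances are hypothetical.

```python
from typing import Iterable, Tuple


def failure_duration_bound(distance_pairs: Iterable[Tuple[int, int]],
                           mrai_seconds: float = 30.0) -> float:
    """Bound = min over pairs of (first distance + second distance) x MRAI.

    Each pair holds the two hop distances entering the sum for one
    candidate alternative; what they measure is assumed, not confirmed.
    """
    pairs = list(distance_pairs)
    if not pairs:
        raise ValueError("at least one pair of distances is required")
    return min(a + b for a, b in pairs) * mrai_seconds


# Two hypothetical alternatives: (1 hop + 2 hops) and (2 hops + 3 hops).
print(failure_duration_bound([(1, 2), (2, 3)]))
```

Under these assumed distances and the 30-second eBGP default, the bound is 90 seconds.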

Occurrence of Transient Failures in a Typical BGP System

• In a typical BGP system, transient failures are prevalent.

– Tier-1 ASes can experience transient routing failures, where alternate routes come from their edge routers.

– Non-tier-1 ASes can experience transient routing failures, where alternate routes are obtained from other ASes.

Outline

• What are transient routing failures?

• When can transient routing failures occur?

• How long can transient routing failures last?

• Measurement results

Measuring Transient Failures within a Tier-1 AS

Percentage of transient failures among all routing failures that last less than 30 seconds

Cumulative distribution of transient failure duration

BGP updates, BGP tables, and router configuration files were collected during July 2004
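A cumulative distribution of failure durations like the one plotted here can be computed as sketched below. The durations in the example are made-up values for illustration, not the measured data, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def empirical_cdf(durations: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (duration, fraction of failures lasting at most that long)."""
    if not durations:
        raise ValueError("at least one duration is required")
    ordered = sorted(durations)
    n = len(ordered)
    # Repeated durations yield repeated x-values; the last one is the CDF value.
    return [(d, (i + 1) / n) for i, d in enumerate(ordered)]


def fraction_at_most(durations: Sequence[float], threshold: float) -> float:
    """Fraction of failures whose duration does not exceed the threshold."""
    if not durations:
        raise ValueError("at least one duration is required")
    return sum(1 for d in durations if d <= threshold) / len(durations)


sample_durations = [30.0, 5.0, 10.0, 5.0]  # seconds, illustrative only
print(empirical_cdf(sample_durations))
print(fraction_at_most(sample_durations, 10.0))
```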

Measuring Transient Failures (contd.)

• Transient failures in tier-2 ASes, measured using Oregon RouteViews BGP updates (July 2004)

Popularity of Prefixes Experiencing Transient Failures

• We aggregate the NetFlow data collected in the tier-1 AS during the week of January 2 to January 8, 2005

• Transient routing failures can affect both popular and unpopular prefixes

[Figure: y-axis: fraction of transient routing failures]

Conclusions

• Transient routing failures are prevalent in the Internet, and can last for a significant period of time.

• The majority of transient failures occur under commonly applied routing policy settings.

• Popular and unpopular prefixes can experience transient failures.

Thanks
