On Understanding of Transient Interdomain Routing Failures

Feng Wang, Lixin Gao, Jia Wang, and Jian Qiu

Department of Electrical and Computer Engineering, University of Massachusetts, Amherst, MA 01002
AT&T Labs-Research, 180 Park Ave, Florham Park, NJ 07869



Outline

• What are transient routing failures?

• When can transient routing failures occur?

• How long can transient routing failures last?

• Measurement results

Internet Routing

• Autonomous systems (ASes)
– Internet Service Providers (ISPs)
– Companies
– Universities

• Intradomain routing protocols
– Static routing, OSPF, IS-IS

• Interdomain routing protocol
– Border Gateway Protocol (BGP)

Long Convergence Delay

• Long convergence delay (Labovitz et al., TON 2001)
– Bringing a route back (Tup): < shortest path length × MRAI
– Disconnecting a route (Tdown): < longest path length × MRAI

• Fail-over: rerouting from Path A to Path B
– While Path B is being discovered, routers might experience transient routing failures, i.e., no route is available
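As a rough illustration of the two bounds above, the following sketch multiplies a path length by the MRAI value. The path lengths (3 and 7 hops) are hypothetical, and the 30-second default is the eBGP MRAI value quoted elsewhere in these slides; treat it as back-of-the-envelope arithmetic, not a model of real convergence.

```python
MRAI_EBGP_SECONDS = 30.0  # default eBGP MRAI value quoted in the slides


def convergence_bound(path_length_hops: int,
                      mrai_seconds: float = MRAI_EBGP_SECONDS) -> float:
    """Upper bound on convergence delay: path length (in AS hops) times MRAI."""
    if path_length_hops < 0:
        raise ValueError("path length must be non-negative")
    return path_length_hops * mrai_seconds


# Hypothetical topology: shortest path of 3 hops, longest path of 7 hops.
t_up_bound = convergence_bound(3)    # bound for bringing a route back
t_down_bound = convergence_bound(7)  # bound for disconnecting a route
print(f"Tup   < {t_up_bound:.0f} s")
print(f"Tdown < {t_down_bound:.0f} s")
```

With these assumed path lengths, the Tdown bound is more than twice the Tup bound, which is why withdrawals dominate the convergence delay.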

An Example of Transient Routing Failure

[Figure: example topology with AS0, AS1, AS2, and AS3 and destination d. Legend: traffic on data plane; BGP update (W and A messages carrying AS paths); BGP routing table. One router is marked as losing reachability.]
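To make the idea of "losing reachability" concrete, here is a minimal sketch that replays a time-ordered stream of updates for one destination and reports the windows during which no route is installed. It assumes a single router that holds at most one route, with "W" meaning withdrawal and "A" meaning announcement; the timestamps and AS paths are hypothetical and are not taken from the figure.

```python
from typing import List, Optional, Tuple

Path = Tuple[int, ...]
# (timestamp in seconds, "A" or "W", AS path for "A" / None for "W")
Update = Tuple[float, str, Optional[Path]]


def no_route_intervals(updates: List[Update],
                       initial_route: Optional[Path]) -> List[Tuple[float, float]]:
    """Return (start, end) windows during which the router has no route."""
    route = initial_route
    lost_at: Optional[float] = None
    intervals: List[Tuple[float, float]] = []
    for ts, kind, path in sorted(updates, key=lambda u: u[0]):
        if kind == "W":
            if route is not None:
                route = None      # the only known route is withdrawn
                lost_at = ts
        elif kind == "A":
            if route is None and lost_at is not None:
                intervals.append((lost_at, ts))  # reachability restored
                lost_at = None
            route = path
        else:
            raise ValueError(f"unknown update type: {kind!r}")
    return intervals


# Hypothetical replay: the route via AS2 is withdrawn at t=10 s and the
# alternate route via AS1 is only announced at t=35 s.
updates = [(10.0, "W", None), (35.0, "A", (1, 0))]
print(no_route_intervals(updates, initial_route=(2, 0)))
```

In this toy replay the router has no route to d for 25 seconds even though an alternate path exists the whole time.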

Our Contributions

• Identify transient routing failures
– Sufficient conditions

• Bound transient routing failure duration

Outline

• What are transient routing failures?

• When can transient routing failures occur?

• How long can transient routing failures last?

• Measurement results

When Can Transient Routing Failures Occur?

• Two sufficient conditions under which a node must experience a transient routing failure (a transient routing failure for sure).

• One sufficient condition under which a node may experience a transient routing failure (a potential transient routing failure).

[Figure: example topology with AS-path labels at the nodes and messages marked w.]

When Can Transient Routing Failures Occur? (contd.)

[Figure: example topology with AS-path labels at the nodes and messages marked w and A.]

Outline

• What are transient routing failures?

• When can transient routing failures occur?

• How long can transient routing failures last?

• Measurement results

How Long Do Transient Routing Failures Last?

[Figure: example topology for destination d, with BGP updates labeled W: 2 0 and A: 1 0 and MRAI timer annotations.]

MRAI Timers

• Minimum Route Advertisement Interval (MRAI) timer
– Minimum amount of time that must elapse between routing updates
– Applied to BGP announcements or withdrawals

• Default MRAI value
– eBGP session: 30 seconds
– iBGP session: 5 seconds
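The rate-limiting effect of the timer can be sketched as follows. This is a simplified, hypothetical model with one timer per session and no jitter; the class name `MraiTimer` and its methods are invented for illustration, and real implementations differ in how they scope the timer.

```python
from typing import Optional


class MraiTimer:
    """Toy MRAI timer: consecutive updates are at least `interval` apart."""

    def __init__(self, interval_seconds: float) -> None:
        self.interval = interval_seconds
        self.last_sent: Optional[float] = None

    def earliest_send_time(self, now: float) -> float:
        """Earliest time an update that is ready at `now` may be sent."""
        if self.last_sent is None:
            return now  # nothing sent yet, so no waiting is needed
        return max(now, self.last_sent + self.interval)

    def send(self, now: float) -> float:
        """Record a send and return the time it actually goes out."""
        send_at = self.earliest_send_time(now)
        self.last_sent = send_at
        return send_at


ebgp = MraiTimer(30.0)  # default eBGP value from the slide
print(ebgp.send(0.0))    # first update goes out immediately
print(ebgp.send(4.0))    # ready at 4 s, held until the timer expires
print(ebgp.send(100.0))  # timer already expired, sent immediately
```

The second update shows how an alternate route that is ready early can still be held back for most of an MRAI interval.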

Upper Bound for Transient Routing Failure Duration

$\text{Transient routing failure duration} \le \min_{u}\,(d_{u,0} + d_{u,v}) \times \text{MRAI}$

[Figure: nodes 0, u, and v, with the distances $d_{u,0}$ and $d_{u,v}$ marked.]
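A worked instance may help here. The sketch below assumes the bound on this slide reads as the minimum over candidate nodes u of (d_{u,0} + d_{u,v}) × MRAI, with both d terms taken as hop distances; that reading is an interpretation of the slide, and the node names and distances are hypothetical.

```python
from typing import Dict, Tuple


def failure_duration_bound(distances: Dict[str, Tuple[int, int]],
                           mrai_seconds: float = 30.0) -> float:
    """Bound = min over u of (d_u0 + d_uv) * MRAI.

    `distances` maps each candidate node u to (d_u0, d_uv), both in hops.
    """
    if not distances:
        raise ValueError("at least one candidate node u is required")
    return min(d_u0 + d_uv for d_u0, d_uv in distances.values()) * mrai_seconds


# Hypothetical candidates: u1 is 2 hops from node 0 and 1 hop from v;
# u2 is 1 hop from node 0 and 3 hops from v.
candidates = {"u1": (2, 1), "u2": (1, 3)}
print(failure_duration_bound(candidates))  # min(3, 4) * 30 = 90.0
```

Under these assumed distances, u1 gives the tighter bound, so the failure at v would last at most 90 seconds with the default eBGP MRAI.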

Occurrence of Transient Failures in a Typical BGP System

• In a typical BGP system, transient failures are prevalent.

– Tier-1 ASes can experience transient routing failures, where alternate routes come from their edge routers.

– Non-tier-1 ASes can experience transient routing failures, where alternate routes are obtained from other ASes.

Outline

• What are transient routing failures?

• When can transient routing failures occur?

• How long can transient routing failures last?

• Measurement results

Measuring Transient Failures within a Tier-1 AS

[Figure: percentage of transient failures among all routing failures that last less than 30 seconds.]

[Figure: cumulative distribution of transient failure duration.]

BGP updates, BGP tables, and router configuration files were collected during July 2004.

Measuring Transient Failures (contd.)

• Transient failures in tier-2 ASes, measured using Oregon RouteViews BGP updates (July 2004)

Popularity of Prefixes Experiencing Transient Failures

• We aggregate the Netflow data collected in the tier-1 AS during the week of 1/2/2005 to 1/8/2005.

• Transient routing failures can affect both popular and unpopular prefixes.

[Figure: plot with y-axis "Fraction of transient routing failures".]

Conclusions

• Transient routing failures are prevalent in the Internet, and can last for a significant period of time.

• The majority of transient failures occur under commonly applied routing policy settings.

• Popular and unpopular prefixes can experience transient failures.

Thanks