21
National Institute of Advanced Industrial Science and Technology Availability of US R&E network - viewpoint from IGP - Katsushi Kobayashi [email protected]

Availability of US R&E network, viewpoint from IGP

Embed Size (px)

Citation preview

Page 1: Availability of US R&E network, viewpoint from IGP

National Institute of Advanced Industrial Science and Technology

Availability of US R&E network

- viewpoint from IGP -Katsushi Kobayashi

[email protected]

Page 2: Availability of US R&E network, viewpoint from IGP

National Institute of Advanced Industrial Science and Technology

Toward End-to-end Non-disruptive Operation and Service

• Emerging new activities to improve Internet reliability.

• IETF IPFRR

• Overlay (Detour, RON, ....)

• TENOS (It’s our goal :))

• Data set required for quantitative estimation.

• Nation-wide production quality network.

Page 3: Availability of US R&E network, viewpoint from IGP

National Institute of Advanced Industrial Science and Technology

IS-IS collector in Abilene

• Major US Research and Education (R&E) network.

• IS-IS collector is part of Abilene observatory activity. • Contributed by Shu Zhang [ZK06] • Similar to Cengiz Alaettinoglu’s work on Qwest [AC02]

• Deployed all Abilene nodes for multi observation points. • Synchronized with CDMA timer (GPS based) • From Aug. ’04 to Apr. ’07 data set is available. • This presentation based on preliminary analysis of this IS-

IS data, possibly including flaws.

[ZK06] S. Zhang and K. Kobayashi, “Rtanaly: A System to Detect and Measure IGP Routing Changes”

Page 4: Availability of US R&E network, viewpoint from IGP

National Institute of Advanced Industrial Science and Technology

Abilene Network Map

Seattle

DenverSunnyvale

Los Angels

Kansas City

Chicago

Indianapolis

Atlanta

Washington

New York City

Houston

Seattle

Denver

Sunnyvale

Los Angels

Kansas City

Chicago

Indianapolis

Atlanta

Washington

New York City

Houston

11 nodes with T640 routers, and 14 OC192 circuits.

Page 5: Availability of US R&E network, viewpoint from IGP

National Institute of Advanced Industrial Science and Technology

Abilene IS-IS operation

• 9 sec. Hello interval, lost ISIS adjacency after missing 3 hellos • 22.5 sec. failure detection delay is supposed. • More faster failure detection is possible, e.g., shorter hello

interval, carrier loss with circuit failure.

• IGP maintains infrastructure information only.

• Minimize IGP database

• Not import any BGP route into IS-IS.

Page 6: Availability of US R&E network, viewpoint from IGP

National Institute of Advanced Industrial Science and Technology

• Network availability in hereafter:

• All network works without any failure. • From Network operator’s viewpoint.

• Don’t care specific source destination path availability. • Not from customer’s viewpoint.

• Timeframe:

• May include more than one event at same time.

ATLA

IPLS

Network

Timeframe of failure Timeframe of double failure

..........

Page 7: Availability of US R&E network, viewpoint from IGP

National Institute of Advanced Industrial Science and Technology

Abilene IS-IS overview ’05-’06 play back

• Node failure: timeout node LSP, or seq. number reset.

• Only 1 times on ’05 (53 sec. downtime), 2 on ’06 (1,298 sec. )

• Circuit failure: adjacency away from list in LSP

• Usually found, 635 timeframe on ’05, 513 on ’06.

• Ext. route failure: Route away from LSP

• Represent edge troubles ?

• Difficult to identify whether serious or trivial.To focus this failure.

Page 8: Availability of US R&E network, viewpoint from IGP

National Institute of Advanced Industrial Science and Technology

Lost adjacency event

Note that above histograms are drawn with IS-IS captured data at Atlanta. Few details are different with other IS-IS observatory point.

2005/Jan.-Dec. 2006/Jan.-Dec.single−failure

Monitor duration: 365 (days)Total disrupt(count): 635, Availability: 0.95443

log_10(Disrupt time (sec.))

Freq

uenc

y

0 1 2 3 4 5 6

010

020

030

040

0

single−failureMonitor duration: 365 (days)

Total disrupt(count): 513, Availability: 0.98424

log_10(Disrupt time (sec.))

Freq

uenc

y

0 1 2 3 4 5 6

010

020

030

040

0

60 sec. 1 hour 1 day 60 sec. 1 hour 1 day

Page 9: Availability of US R&E network, viewpoint from IGP

National Institute of Advanced Industrial Science and Technology

Breakdown in ‘05ATLA−IPLS

Monitor duration: 365 (days)Disrupt(count) 288, Avail: 0.99137

log_10(Disrupt time (sec.))

Freq

uenc

y

0 1 2 3 4 5 6

050

100

150

200

250

CHIN−IPLSMonitor duration: 365 (days)

Disrupt(count) 34, Avail: 0.99981

log_10(Disrupt time (sec.))

Freq

uenc

y

0 1 2 3 4 5 60

24

68

CHIN−NYCMMonitor duration: 365 (days)

Disrupt(count) 64, Avail: 0.99947

log_10(Disrupt time (sec.))

Freq

uenc

y

0 1 2 3 4 5 6

05

1015

DNVR−KSCYMonitor duration: 365 (days)

Disrupt(count) 4, Avail: 0.99997

log_10(Disrupt time (sec.))

Freq

uenc

y

0 1 2 3 4 5 6

0.0

0.5

1.0

1.5

2.0

DNVR−SNVAMonitor duration: 365 (days)

Disrupt(count) 12, Avail: 0.99302

log_10(Disrupt time (sec.))

Freq

uenc

y

0 1 2 3 4 5 6

01

23

45

DNVR−STTLMonitor duration: 365 (days)

Disrupt(count) 8, Avail: 0.99997

log_10(Disrupt time (sec.))

Freq

uenc

y

0 1 2 3 4 5 60.

00.

51.

01.

52.

02.

53.

0

Page 10: Availability of US R&E network, viewpoint from IGP

National Institute of Advanced Industrial Science and Technology

Availability Map (05/01-12)

Seattle

Denver

Sunnyvale

Los Angels

Kansas City

Chicago

Indianapolis

Atlanta

Washington

New York City

Houston

0.9999/11/800

0.9991/122/12,4940.9730/24/819,803

0.9930/12/183,352

0.9913/288/170,303

0.9998/34/1,364

0.9994/64/5,090

0.9998/10/3,940

Availability / Disrupt count / Longest down time (sec.)

0.9997/54/1,194

0.9999/4/398

0.9992/16/7,071

0.9997/12/2,349

0.9999/8/501

0.9993/18/14,192

Hurricane KatrinaAug. ‘05

Page 11: Availability of US R&E network, viewpoint from IGP

National Institute of Advanced Industrial Science and Technology

Yearly summary ’05 - ’062005/Jan.- Dec. 2006/Jan.- Dec.

Avail. Disrupt cnt. Avail. Disrupt cnt.

ATLA-HSTN 0.9738 24 0.9990 39

ATLA-IPLS 0.9914 288 0.9975 48

ATLA-WASH 0.9998 12 0.9994 25

CHIN-IPLS 0.9998 34 0.9998 14

CHIN-NYCM 0.9995 64 0.9999 30

DNVR-KSCY 1.0000 4 0.9999 18

DNVR-SNVA 0.9930 12 0.9922 51

DNVR-STTL 1.0000 8 0.9999 5

HSTN-KSCY 0.9993 18 0.9990 19

HSTN-LOSA 0.9991 121 0.9996 40

IPLS-KSCY 0.9998 10 0.9998 17

LOSA-SNVA 0.9997 54 0.9993 128

NYCM-WASH 0.9993 17 0.9989 113

SNVA-STTL 1.0000 11 1.0000 129

Total(*) 0.9544 677 0.9842 676

Page 12: Availability of US R&E network, viewpoint from IGP

National Institute of Advanced Industrial Science and Technology

Critical event

• 2 or more lost adjacency at same timeframe • Some combination makes serious impact. But, not all event

lead split graph condition.

• 32 timeframes (47 disrupt) in ’05, 58 (61) in ’06

• 26/47 timeframes in ’05, 49/61 in ’06, are attributed as missing a specific node,

Page 13: Availability of US R&E network, viewpoint from IGP

National Institute of Advanced Industrial Science and Technology

2 or more links failure (1) - Not critical -

Seattle

Denver

Sunnyvale

Los Angels

Kansas City

Chicago

Indianapolis

Atlanta

Washington

New York City

Houston

Not serious 05/12/19 11:35 - 11:41

Page 14: Availability of US R&E network, viewpoint from IGP

National Institute of Advanced Industrial Science and Technology

2 or more links failure (2) - Missing Node -

Seattle

Denver

Sunnyvale

Los Angels

Kansas City

Chicago

Indianapolis

Atlanta

Washington

New York City

Houston

Missing IPLS router at ...........

06/02/19 05:31-05:56 06/02/19 06:30-06:35 06/02/19 15:47-15:51

............

Page 15: Availability of US R&E network, viewpoint from IGP

National Institute of Advanced Industrial Science and Technology

Two or more failure in ‘05

single−failureMonitor duration: 365 (days)

Total disrupt(count): 637, Availability: 0.95435

log_10(Disrupt time (sec.))

Freq

uenc

y

0 1 2 3 4 5 6

010

020

030

040

0

2005/Jan.-Dec.double−failure

Monitor duration: 365 (days)Total disrupt(count): 47, Availability: 0.99976

log_10(Disrupt time (sec.))

Freq

uenc

y

0 1 2 3 4 5 6

05

1015

2025

All lost adjacency events Two or more missing

3 nines!!60 sec. 1 hour 1 day 60 sec. 1 hour 1 day

Page 16: Availability of US R&E network, viewpoint from IGP

National Institute of Advanced Industrial Science and Technology

Two or more failure in ‘062006/Jan.-Dec.

double−failureMonitor duration: 365 (days)

Total disrupt(count): 61, Availability: 0.99959

log_10(Disrupt time (sec.))

Freq

uenc

y

0 1 2 3 4 5 6

010

2030

40

single−failureMonitor duration: 365 (days)

Total disrupt(count): 514, Availability: 0.98419

log_10(Disrupt time (sec.))

Freq

uenc

y

0 1 2 3 4 5 6

010

020

030

040

0

All lost adjacency events Two or more missing

3 nines!!60 sec. 1 hour 1 day 60 sec. 1 hour 1 day

Page 17: Availability of US R&E network, viewpoint from IGP

National Institute of Advanced Industrial Science and Technology

Single link failure is trivial ? (1)

• Lost two or more adjacency events are rare, more than 99.95% availability, < 5 hours/year downtime.

• More than 500 lost single adjacency are founded. • 637 times in ’05, and 514 in ’06

• 3-4 hours/year downtime are estimated : • Suppose 22 sec. downtime for each lost adjacency. • Other delays, e.g. LSP propagation, SPF computation,

degrade more.

Page 18: Availability of US R&E network, viewpoint from IGP

National Institute of Advanced Industrial Science and Technology

Single link failure is trivial ? (2)

• 22 sec. downtime for each lost adjacency is overestimated ? • Router can detect circuit failure more faster triggered with

lower layer information, e.g., loss of optical, framer error. • Sub-second IGP convergence already presented [AC02].

• Sub-second is derived from propagation delay limit, impossible to reduce it.

• However, even if downtime is reduced into 1/10, downtime is still in order, i.e., from 3 hours/year to 20 min./year.

Page 19: Availability of US R&E network, viewpoint from IGP

National Institute of Advanced Industrial Science and Technology

“O” approach can improve availability on IP network ?

• Abilene IP network availability achieves more than three nines (99.9%) in previous analysis. Q: Is it possible to achieve six nines (99.9999 %) with building redundant path approach, i.e., Overlay or variant ? A1: No, not ensured disjointness, 90% multihomed paths are overlapped [HWJ06] A2: No, Each routing sub-system must be independent. Overlay infrastructure shares same routing infrastructure.

[HWJ06] J.Han et al., “An Experimental Study of Internet Path Diversity”

Page 20: Availability of US R&E network, viewpoint from IGP

National Institute of Advanced Industrial Science and Technology

Conclusion

• Introduce Abilene IS-IS trace data, and our preliminary availability evaluation with ’05 and ’06.

• Network availability is more than 99.95 % with IGP restoration.

Page 21: Availability of US R&E network, viewpoint from IGP

National Institute of Advanced Industrial Science and Technology

Acknowledge

• Zhang Shu

• Randy Bush