Incident analysis using RIPE NCC tools - RIPE 61 and LINX 71

Presentation given at RIPE 61 and LINX 71

Incident analysis with RIPE NCC tools

Analysing the RIS/Duke BGP incident

Erik Romijn <eromijn@ripe.net>

Senior Software Engineer

RIPE NCC information collection

• Routing Information Service (RIS)

_ Listens to and stores all BGP updates

_ Receives data from 600 peers

• DNS monitoring service (DNSMON)

_ Monitors critical DNS infrastructure

_ About 100 vantage points worldwide

• Test Traffic Measurements (TTM)

_ One-way latency/jitter/loss & traceroutes

_ About 100 nodes in full mesh

Case study: RIPE NCC / Duke University BGP experiment

RIS experiments & announcements

• RIS has a long tradition of supporting research

• Second AS in the world to announce 4-byte AS numbers

• Beacon prefixes from RIS available since 2002

_ Also a vital part of debogonizing

Case study: RIPE NCC BGP experiment

• RIPE NCC conducted an experiment on 27-08-2010

_ An optional BGP attribute was announced

_ This was an optional transitive attribute of 3000 bytes

_ The announcement was valid according to RFC 4271

• Some routers corrupted the route and passed it on

_ Peers that received the corrupted route dropped the session

• This caused disruption to some Internet traffic
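To make the size of the attribute concrete, here is a minimal sketch of how a path attribute is laid out on the wire under RFC 4271. The type code 255 and the zero-filled payload are placeholders chosen for illustration; the slides do not state the type code or the contents used in the experiment.

```python
import struct

# Attribute flag bits from RFC 4271, section 4.3
FLAG_OPTIONAL = 0x80
FLAG_TRANSITIVE = 0x40
FLAG_PARTIAL = 0x20
FLAG_EXTENDED_LENGTH = 0x10

BGP_MAX_MESSAGE_SIZE = 4096  # maximum BGP message size in RFC 4271


def encode_path_attribute(type_code: int, value: bytes, flags: int) -> bytes:
    """Encode one path attribute as flags, type code, length, value."""
    if len(value) > 255:
        # Values longer than 255 bytes need the two-octet length field.
        flags |= FLAG_EXTENDED_LENGTH
        header = struct.pack("!BBH", flags, type_code, len(value))
    else:
        flags &= ~FLAG_EXTENDED_LENGTH
        header = struct.pack("!BBB", flags, type_code, len(value))
    return header + value


# Hypothetical type code and payload, for illustration only.
attr = encode_path_attribute(255, bytes(3000), FLAG_OPTIONAL | FLAG_TRANSITIVE)
print(len(attr), hex(attr[0]))
```

A 3000-byte value forces the extended-length form, so the encoded attribute is 3004 bytes (a 4-byte header plus the value). That still fits within the 4096-byte maximum BGP message size, which is consistent with the announcement being valid.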

Case study: 27 August 2010

• Announcement active from 08:41 to 09:08 UTC, using 93.175.144.0/24

• We later observed some negative impact

• Immediately started an extensive investigation

_ This pointed towards a Cisco IOS XR bug

_ Sent out a very detailed private announcement

_ Also provided Cisco with all details

• Cisco released cisco-sa-20100827-bgp

Propagation of the announcement

[Diagram, built up over six slides, with nodes labelled RIS AS65550, RIS @ AMS-IX AS12654, "Other router(s)" and "Faulty router"]

Goal of the experiment

• A research group from Duke University approached the RIPE NCC for help

• Their goal was to measure support for long optional transitive attributes

_ Intended to be used for certificates for secure routing

• They did not have an AS number or address space of their own

• They provided the RIPE NCC with a patched version of Quagga

Expected results

A. The route propagates with the attribute intact

B. The route propagates, with some AS in the path removing the attribute

C. The route propagates, but takes a different path because some ASes drop the route

A and B were seen in 4-byte AS number tests.
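As an illustration only, the three outcomes could be told apart per observing peer along the following lines. The `Observation` record, its field names and the idea of comparing against a control announcement without the attribute are assumptions made for this sketch, not something described in the slides.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Observation:
    """What one peer reported for the test prefix (hypothetical record)."""
    as_path: Optional[Tuple[int, ...]]  # None if the route was not seen
    has_attribute: bool


def classify(obs: Observation, control_path: Tuple[int, ...]) -> str:
    """Map an observation onto outcomes A, B or C from the slide.

    control_path is the AS path the same peer saw for an announcement
    without the attribute (an assumption of this sketch).
    """
    if obs.as_path is None:
        return "not seen"
    if obs.as_path != control_path:
        return "C"  # route arrived over a different path
    return "A" if obs.has_attribute else "B"
```

None of the three outcomes involves a router corrupting the route or a peer dropping the session, so what actually happened falls outside this classification.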

Impact of the experiment on the Internet

Unstable prefixes

[Figure: percentage of total prefixes (320 000) that were unstable, on a 0% to 100% scale, against time (UTC) from 8:00 to 9:50]

E-mails per hour - 28-29 August

[Figure: traffic on the AMS-IX, LINX & NANOG mailing lists in mails per hour (0 to 60) against time (UTC) over two days, with markers for the first NANOG post, the first LINX post, and the initial RIPE NCC announcement / first AMS-IX post]

Unstable prefixes

[Figure: percentage of total prefixes (320 000) on a 0% to 1.5% scale, against time (UTC) from 8:00 to 9:50]

Length of invisibilities

[Figure: percentage of total prefixes (320 000), on a 0.00% to 0.15% scale, against duration of invisibility (0 to 100 minutes), with one series per measurement window:]

• 8-10 UTC, July 30, 2010 (total: 0.24%)

• 8-10 UTC, Aug 20, 2010 (total: 0.11%)

• 8-10 UTC, Aug 26, 2010 (total: 0.26%)

• 8-10 UTC, Aug 27, 2010 (total: 0.69%)
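For a sense of scale, the totals in the legend can be converted to approximate prefix counts. This is a rough sketch that assumes the 320 000 figure on the axis is the denominator for every window and that each "total" is the share of prefixes that were invisible at some point in that window.

```python
TOTAL_PREFIXES = 320_000  # from the axis label

# Total share of prefixes with an invisibility, per window (from the legend)
totals_percent = {
    "July 30, 2010": 0.24,
    "Aug 20, 2010": 0.11,
    "Aug 26, 2010": 0.26,
    "Aug 27, 2010": 0.69,
}

counts = {
    day: round(TOTAL_PREFIXES * pct / 100)
    for day, pct in totals_percent.items()
}
for day, n in counts.items():
    print(f"{day}: about {n} prefixes")
```

On these assumptions, roughly 2200 prefixes were affected on 27 August, against roughly 350 to 830 on the three comparison days.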

Critical DNS infrastructure (from DNSMON)

• Root servers unaffected

• 57% of TLDs unaffected

• Minor effects for 38% of the TLDs

_ Some dropped queries for one or two servers

• More significant effects on 5% of the TLDs

Critical DNS infrastructure

View from a TTM probe in Prague, CZ

Updates for IPv4 vs IPv6

[Figure: updates per minute per 1000 prefixes (0 to 30) for IPv4 and IPv6, against time (UTC) from 8:00 to 9:50]

Most affected BGP sessions did not carry IPv6 routes
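The unit on the vertical axis, updates per minute per 1000 prefixes, allows IPv4 and IPv6 to be compared despite their very different table sizes. A minimal sketch of that normalisation follows; the function name, the input format (one timestamp per update) and the sample numbers are assumptions, not the RIPE NCC's actual processing.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable


def updates_per_minute_per_1000(
    update_times: Iterable[datetime], total_prefixes: int
) -> Dict[datetime, float]:
    """Count updates in one-minute buckets, scaled to 1000 prefixes."""
    per_minute = Counter(
        t.replace(second=0, microsecond=0) for t in update_times
    )
    scale = 1000 / total_prefixes
    return {
        minute: count * scale
        for minute, count in sorted(per_minute.items())
    }


# Made-up sample: three updates in one minute, one in the next,
# for an address family with 2000 prefixes in the table.
sample = [
    datetime(2010, 8, 27, 8, 41, 5),
    datetime(2010, 8, 27, 8, 41, 30),
    datetime(2010, 8, 27, 8, 41, 59),
    datetime(2010, 8, 27, 8, 42, 10),
]
rates = updates_per_minute_per_1000(sample, total_prefixes=2000)
```

Running the same function separately over IPv4 and IPv6 updates, each with its own table size, would give two series on a common scale.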

Unstable prefixes vs number of updates

[Figure: updates per minute per 1000 prefixes (0 to 30) against time (UTC) from 8:00 to 9:50]

View from a TTM probe in Prague, CZ

Locality of effects - updates

[Figure: BGP updates on all RIS locations (IPv4), as average updates per second per full-table peer (0 to 1000), from 8:25 to 9:35, with one line per location: LINX, London; AMS-IX/NL-IX/GN-IX, Amsterdam; CIXP, Geneva; VIX, Vienna; DIX-IE, Tokyo; Netnod, Stockholm; MIX, Milan; NYIIX, New York; DE-CIX, Frankfurt; MSK-IX, Moscow; PAIX, Palo Alto; PTT, Sao Paulo; NOTA, Miami]

Locality of effects - withdrawals

[Figure: BGP withdrawals on all RIS locations (IPv4), as average updates per second per full-table peer (0 to 400), from 8:25 to 9:35, with one line per location: LINX, London; AMS-IX/NL-IX/GN-IX, Amsterdam; CIXP, Geneva; VIX, Vienna; DIX-IE, Tokyo; Netnod, Stockholm; MIX, Milan; NYIIX, New York; DE-CIX, Frankfurt; MSK-IX, Moscow; PAIX, Palo Alto; PTT, Sao Paulo; NOTA, Miami]

Locality of effects - vendors per IX

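[Figure: pie charts of router vendors (Cisco, Juniper, Brocade, Intel, Other) per IX. Slice values in extraction order, without the mapping of slices to vendors: VIX 9%, 4%, 2%, 18%, 67%; AMS-IX 9%, 2%, 14%, 33%, 42%; NYIIX 9%, 4%, 9%, 23%, 54%; LINX 4%, 5%, 5%, 28%, 58%; DE-CIX 7%, 2%, 7%, 34%, 50%]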

Lessons learned

• Future experiments should be pre-announced with sufficient lead time

• Detected vulnerabilities should be handled with more care

• More comprehensive impact assessments are needed

• Your input is welcome: <ris@ripe.net>

Questions?

Erik Romijn <eromijn@ripe.net>

Updates per prefix range

[Figure: updates per 1000 prefixes per minute (0 to 3000) against time (UTC) from 8:00 to 9:50, with two series: updates for prefixes 0-90 and updates for prefixes 100-255]

AS path length

[Figure: average AS path length in updates (0 to 5) against time (UTC) from 8:00 to 10:50]
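A minimal sketch of how an average AS path length per time bucket might be computed from a stream of updates. The input format (a timestamp paired with a space-separated AS path string), the ten-minute bucket and the choice to count prepended ASes individually are all assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Tuple


def average_path_length(
    updates: Iterable[Tuple[datetime, str]], bucket_minutes: int = 10
) -> Dict[datetime, float]:
    """Average number of ASes in the AS path, per time bucket."""
    lengths = defaultdict(list)
    for when, as_path in updates:
        # Round the timestamp down to the start of its bucket.
        minute = when.minute - when.minute % bucket_minutes
        bucket = when.replace(minute=minute, second=0, microsecond=0)
        lengths[bucket].append(len(as_path.split()))
    return {b: sum(v) / len(v) for b, v in sorted(lengths.items())}


# Made-up updates using documentation AS numbers.
sample = [
    (datetime(2010, 8, 27, 8, 41), "64496 64497 64498"),
    (datetime(2010, 8, 27, 8, 45), "64496 64499 64499 64500 64501"),
    (datetime(2010, 8, 27, 9, 2), "64496 64500"),
]
averages = average_path_length(sample)
```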
