
IP Routing

A Router’s Job

• So I got a packet, where to next?
– Which “next hop”?
- The outgoing router interface to use when forwarding traffic to the destination. It may also include the IP address of the next router (if any) on the path towards the destination.

[Figure: a packet with destination address 83.125.35.131 arrives at the router, which must decide where to send it next.]

The Lookup Operation

• Two basic pieces
1. Forwarding table
2. Longest prefix match rule
• Forwarding table
– List of all the routes (destinations I know) and their associated next hops
• Longest prefix match
– For any destination address, among the routes that match it, pick the one with the longest prefix, i.e., the route that agrees with the destination address over the largest number of leading bits.
- More specific is better!

Route               Next Hop
155.23.0.0/16       3
155.23.14.0/24      4
162.26.0.0/16       1
112.0.0.0/8         2
183.30.0.0/16       7
83.0.0.0/8          3
83.125.0.0/16       11
83.125.35.0/24      9
83.125.35.128/25    12
193.2.0.0/16        1
0.0.0.0/0           2

Example destination: 83.125.35.131. It matches 0.0.0.0/0, 83.0.0.0/8, 83.125.0.0/16, 83.125.35.0/24 and 83.125.35.128/25; the /25 is the longest match, so the packet goes to next hop 12.
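The lookup above can be sketched in Python with the standard `ipaddress` module. This is a minimal illustration under simplifying assumptions, not how routers implement it: it scans the table linearly, whereas real forwarding engines use tries or TCAMs. The table contents are copied from the slide; the function name is hypothetical.

```python
import ipaddress

# Forwarding table copied from the slide: (route prefix, next hop).
FORWARDING_TABLE = [
    ("155.23.0.0/16", 3),
    ("155.23.14.0/24", 4),
    ("162.26.0.0/16", 1),
    ("112.0.0.0/8", 2),
    ("183.30.0.0/16", 7),
    ("83.0.0.0/8", 3),
    ("83.125.0.0/16", 11),
    ("83.125.35.0/24", 9),
    ("83.125.35.128/25", 12),
    ("193.2.0.0/16", 1),
    ("0.0.0.0/0", 2),
]


def longest_prefix_match(destination, table=FORWARDING_TABLE):
    """Return the next hop of the most specific route that contains `destination`."""
    addr = ipaddress.ip_address(destination)
    best_len, best_hop = -1, None
    for prefix, next_hop in table:
        net = ipaddress.ip_network(prefix)
        # A route matches if all of its prefix bits agree with the address;
        # among matches, the longest prefix (largest prefixlen) wins.
        if addr in net and net.prefixlen > best_len:
            best_len, best_hop = net.prefixlen, next_hop
    return best_hop


print(longest_prefix_match("83.125.35.131"))  # → 12 (via 83.125.35.128/25)
print(longest_prefix_match("8.8.8.8"))        # → 2 (only the default route matches)
```

The default route 0.0.0.0/0 matches every address with a prefix length of 0, so it is chosen only when nothing more specific matches.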

So Where Does Routing Fit in All Of This?

• Routing builds the forwarding table.
– Manual entries (static routes) take you only so far.
– You need a process that:
- Automates populating the routing table.
- Adapts to changes.
• Routing protocols are responsible for:
– Distributing known routes and the latest routing changes to a router’s neighbors.
– Deciding the next hop(s) to be used for each route.
- What’s the best choice?
- Ensuring consistency between decisions at different routers (no loops!)

Key Problem

• How to make correct local decisions?
– Each router/switch must know something about global state.
• Global state
– Inherently large
– Dynamic
– Hard to collect
• A routing protocol must intelligently acquire, summarize, and maintain relevant information.
– How do I find out about other routers and links?
– How do I use that information to generate routes?
– How do I maintain routes in the presence of changes?

Design Direction

• Designing a single solution that spans the entire Internet is unlikely to work.
– Computational cost, protocol and bandwidth overhead
• Heterogeneous environment
– Domain sizes and requirements, router capabilities
– Different constraints on internal and external connectivity
• Basic design approach
– Hierarchy of domains (reflects address hierarchy)
- Ensures scalability
– Independence of routing protocols in different domains
- Support for heterogeneity
– Gateways between domains for an end-to-end solution

A Bird’s Eye View of the Internet

• Basic “two-level” hierarchy
– Federation of interconnected islands: Autonomous Systems (AS)
- Each island has its own internal rules.
- Islands collaborate to offer end-to-end connectivity.
• Some islands are bigger and more powerful than others.
– Willingness to carry traffic for others
- Peering or transit agreements

[Figure: map of interconnected Autonomous Systems: AS 1, AS 2, AS 3, AS 121, AS 123, AS 168, AS 321, AS 376, AS 441, AS 524 and AS 3411]

Delivering Ubiquitous Connectivity

• Basic principles
– No global knowledge
– Hop-by-hop decisions
• Map analogy
– Detailed map of your neighborhood
– Coarse knowledge of how to exit your neighborhood to reach remote destinations
– Ongoing validation of the map each time you reach an intersection
- It could change…

[Figure: map of interconnected Autonomous Systems, as on the previous slide]

Routing Protocol Overview

• Routing protocols follow the two-level hierarchy of the Internet.
– Interior Gateway Protocols (IGPs) control routing within an AS/domain.
– Exterior Gateway Protocols (EGPs) control routing between ASes.
• Different goals and constraints for each family of protocols
– IGP: Ability to fine tune internal operation and shielding from outside “noise”
– EGP: Scalability and ability to accommodate a broad range of administrative policies

[Figure: map of interconnected Autonomous Systems, as on the previous slides]

Interior Gateway Protocols

Protocol Design Goals and Requirements

• Minimize routing table space
– Faster look-up (although the forwarding table can be more compact)
– Less to exchange (lower processing and bandwidth overhead)
– Lower storage cost
• Minimize number and frequency of control messages
– Lower processing and bandwidth overhead
– But responsiveness to changes must be taken into account
• Robustness
– Avoid black holes (unable to reach destination)
– Prevent or recover from loops (inconsistent decisions among routers)
– Limit instability (oscillation between possible routes)
• Optimize use of network resources and overall performance
– Best possible and/or most efficient path

General Design Choices

• Where are routes computed?
– Centralized vs. distributed
- Centralized is simpler, but may not scale and is more prone to failure.
- Distributed requires collaboration between routers and has more complex transient behavior.
• How are routes computed?
– Distributed computations vs. distribution of information from which routes can be computed
- Distance vector: routers exchange results of computations
– Routing table and costs
- Link state: routers exchange information on which computations can be performed independently
– Topological and reachability information
• What criteria are used when computing routes?
– What metrics to optimize (hop count, bandwidth, delay, etc.)?
– Static vs. dynamic metrics?
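To make the distance vector side of this contrast concrete, here is a minimal sketch of the step a DV router performs when a neighbor’s vector arrives: add the cost of the link to that neighbor to each advertised cost (a Bellman-Ford relaxation) and keep the result if it is better, or if it comes from the current next hop. The function name and table layout are hypothetical; real protocols such as RIP add timers, split horizon and a maximum “infinity” metric that are omitted here.

```python
def dv_update(my_table, neighbor, link_cost, neighbor_vector):
    """Merge a neighbor's advertised distance vector into my routing table.

    my_table:        destination -> (cost, next hop); modified in place
    neighbor_vector: destination -> cost, as advertised by `neighbor`
    Returns True if anything changed (which would trigger my own advertisement).
    """
    changed = False
    for dest, adv_cost in neighbor_vector.items():
        new_cost = link_cost + adv_cost          # Bellman-Ford relaxation
        current = my_table.get(dest)
        # Take the route if it is new or cheaper, or if it comes from my current
        # next hop (its latest report overrides, even when the cost got worse).
        if current is None or new_cost < current[0] or current[1] == neighbor:
            if current != (new_cost, neighbor):
                my_table[dest] = (new_cost, neighbor)
                changed = True
    return changed


table = {"X": (0, None)}                     # router X initially knows only itself
dv_update(table, "Y", 2, {"Y": 0, "Z": 3})   # via Y: Y at 2, Z at 2 + 3 = 5
dv_update(table, "W", 1, {"W": 0, "Z": 1})   # W offers Z at 1 + 1 = 2
print(table)  # → {'X': (0, None), 'Y': (2, 'Y'), 'Z': (2, 'W'), 'W': (1, 'W')}
```

Note that each router only ever sees its neighbors’ *results*; in a link state protocol the same router would instead receive the raw topology and run the computation itself.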

IGP Job Description

• Ensure consistent and efficient (optimized) forwarding for destinations within a routing domain.
• Enable selection of appropriate exit points for destinations outside the domain.
• Two major families of protocols
– Distance vector (distributed computation), examples:
- Routing Information Protocol version 2 (RIPv2)
- Enhanced Interior Gateway Routing Protocol (EIGRP), Cisco proprietary
– Link state (distributed information), examples:
- Open Shortest Path First (OSPF)
- Intermediate System to Intermediate System (IS-IS)

Link State (LS) Routing

• Common features with DV protocols
– Relies on communication with neighbors
– Supports destination-based shortest path forwarding
• Everything else is different: distributing information versus distributing computations
– Type of information exchanged between neighbors
- Topology and link costs vs. route costs
- Used to build a common domain map in all routers
– Route computations
- Performed independently in each router vs. distributed
• Two major components of LS protocols
– Topology dissemination and maintenance (flooding)
– Route computation algorithm executed in each router

OSPF Overview

• Two-level hierarchy
– Backbone area (area 0) connects other areas (hub and spoke).
– Costs are assigned to internal links and external routes for fine tuning of traffic distribution.
• Link state operation
– Routers broadcast inside their area knowledge of their local neighborhood.
- Routers glue local neighborhood pieces together to create an area map.
– Information about other areas is summarized and broadcast into an area by Area Border Routers (ABRs).
– Autonomous System Boundary Routers (ASBRs) are responsible for injecting information on how to reach external destinations.
• Routing table construction
– Routers rely on their domain map and on summary information about other areas and the outside world to compute consistent best paths to known destinations.
– Multiple paths can exist for any given route.

A Day in the Life of an OSPF Router

1. Establish adjacency to neighbor routers through the HELLO protocol.
2. Build and maintain topology database.
a. Synchronize databases with neighbors.
b. Advertise router’s local “neighborhood” to others.
c. Process and forward advertisements received from neighbors.
3. Compute routing table.
a. Intra-area destinations (routers and transit networks, and then stub networks)
b. Inter-area destinations
c. External destinations
4. Begin forwarding packets.

[Figure label: “Change notification”]

OSPF HELLO Protocol (1)

• The HELLO protocol is a liveness protocol between adjacent routers.
– HELLO packets are periodically exchanged.
• The HELLO protocol serves multiple purposes:
– Dynamic discovery of neighbors
– Advertising of router attributes
– Identification of Designated Router (DR) and DR election (more on this later)
– Detection of link and router failures
- Frequency of advertising varies with network technology (from 10 sec to 30 sec).
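The liveness part of HELLO can be sketched as a table of “last heard” timestamps: a HELLO from an unknown router discovers a neighbor, later HELLOs refresh it, and a neighbor silent for longer than a dead interval is declared down. The class is hypothetical, and the 10 s HELLO interval with a dead interval of four HELLO intervals is an assumption modeled on commonly used OSPF defaults for broadcast networks.

```python
import time
from typing import Dict, List, Optional

HELLO_INTERVAL = 10.0               # seconds (assumed; slide: 10 s to 30 s)
DEAD_INTERVAL = 4 * HELLO_INTERVAL  # assumed: neighbor declared down after 4 missed HELLOs


class NeighborTable:
    """Hypothetical helper tracking when a HELLO was last heard per neighbor."""

    def __init__(self, dead_interval: float = DEAD_INTERVAL):
        self.dead_interval = dead_interval
        self.last_heard: Dict[str, float] = {}

    def hello_received(self, router_id: str, now: Optional[float] = None) -> bool:
        """Record a HELLO; return True if the sender is a newly discovered neighbor."""
        now = time.monotonic() if now is None else now
        is_new = router_id not in self.last_heard
        self.last_heard[router_id] = now
        return is_new

    def expire(self, now: Optional[float] = None) -> List[str]:
        """Remove and return neighbors not heard from within the dead interval."""
        now = time.monotonic() if now is None else now
        dead = [rid for rid, t in self.last_heard.items() if now - t > self.dead_interval]
        for rid in dead:
            del self.last_heard[rid]    # would trigger new LSAs reporting the failure
        return dead


nbrs = NeighborTable()
print(nbrs.hello_received("10.0.0.2", now=0.0))   # → True (new neighbor)
print(nbrs.hello_received("10.0.0.2", now=10.0))  # → False (liveness refresh)
print(nbrs.expire(now=45.0))                      # → [] (last HELLO 35 s ago)
print(nbrs.expire(now=51.0))                      # → ['10.0.0.2'] (41 s > 40 s)
```

Explicit `now` values keep the example deterministic; in a running router the default `time.monotonic()` clock would be used.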


Database Synchronization (2a)

• Each router maintains a database that is used to store a “complete” map of the network (more on how this map/database is actually constructed later on).
• When routers boot, or when the link between routers comes up, routers perform database synchronization.
– The goal is to quickly ensure a common view of the world.
- The synchronization process determines what each router knows and does not know, and which router has the most recent information.
– Database entries are characterized by sequence number and age.
- Only newer or unknown entries are exchanged.
- Neighbor routers end up with a common domain map.
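A minimal sketch of the “only newer or unknown entries are exchanged” rule, assuming each database summary is reduced to a map from a (hypothetical) LSA identifier to its sequence number. This simplification ignores the checksum and age tie-breakers that RFC 2328 (Sec. 13.1) applies when sequence numbers are equal, as well as the master/slave exchange of database description packets; names and values are illustrative only.

```python
from typing import Dict, Set, Tuple

# Hypothetical, simplified database summary: LSA identifier -> sequence number.
Summary = Dict[str, int]


def entries_to_request(mine: Summary, theirs: Summary) -> Set[str]:
    """LSAs the neighbor has that I lack, or of which my copy is older."""
    return {lsa for lsa, seq in theirs.items() if lsa not in mine or seq > mine[lsa]}


def synchronize(db_a: Summary, db_b: Summary) -> Tuple[Set[str], Set[str]]:
    """Return (what A must request from B, what B must request from A)."""
    return entries_to_request(db_a, db_b), entries_to_request(db_b, db_a)


# Two neighbors whose databases diverged while the link between them was down.
router_a = {"rtr-1": 5, "rtr-2": 8, "rtr-3": 4}
router_b = {"rtr-2": 7, "rtr-3": 5, "rtr-8": 2}
need_a, need_b = synchronize(router_a, router_b)
print(sorted(need_a), sorted(need_b))  # → ['rtr-3', 'rtr-8'] ['rtr-1', 'rtr-2']
```

Entries that both sides hold with the same sequence number (none here) are not transferred at all, which is what keeps resynchronization after a short partition cheap.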

Database Synchronization

[Figure: Routers 1 to 8 connected in a network, with three link failures labeled 1, 2 and 3]

• During a partition, routers’ databases can become unsynchronized.
• After failure 1, the network is partitioned.
– Routers 1 to 4 don’t know about failure 2.
– Routers 5 to 8 don’t know about failure 3.
• When routers 4 and 5 reconnect, database synchronization is required to ensure consistent views.
• Databases are described by database descriptor records exchanged between master and slave.
– Routers on each side of a newly restored link talk to each other to update databases (determine missing and out-of-date pieces).

Building the Topology Database (2b/c)

• An OSPF router advertises itself and its view of its neighborhood with Link State Advertisements (LSAs).
– Puzzle piece from which the full map can be built
- Router ID identifies the originator of each puzzle piece.
– Several types of advertisements (in OSPF)
- Multiple LSA types keep LSA size small (40 bytes on average for OSPF).
- Small granularity ensures updates of minimal size.
- Different types of LSAs can have different scopes of distribution: link local, area local, domain-wide.
• A router forwards LSAs it originates or receives from its neighbors (flooding process).
• The sum of all received LSAs makes up a router’s topology database.
– Common network map shared by all routers in an area

LSA Distribution: Flooding

• New LSAs are forwarded on all eligible (type-dependent) links except the link on which they were received (if applicable).
– Dissemination of information is independent of routing.
– Each LSA is transmitted at most twice on each link.
• New LSAs replace previous ones in the local database.
• LSAs are transmitted reliably (acknowledged).
• Flooding rate is bounded through a minimum gap between LSAs.
• Periodic (30 min.) flooding refreshes the database (aging of LSAs).
– Why is periodic refreshing necessary if transmission is reliable?

All routers end up with a “complete” domain map.
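A toy simulation of the flooding rule: a router that receives an LSA it has not seen before installs it and forwards it on every link except the one it arrived on; duplicates are only acknowledged. Acknowledgments, losses and retransmissions are not modeled, all links are assumed to have equal delay, and the four-router topology is hypothetical. Counting transmissions per link illustrates the “at most twice per link” property, since each endpoint forwards a given LSA over a link at most once.

```python
from collections import Counter, deque


def flood(links, origin):
    """Flood one new LSA from `origin` over undirected `links`.

    Returns (routers holding the LSA, Counter of transmissions per link).
    Simplifying assumptions: equal link delays, no losses, and
    acknowledgments/retransmissions are not modeled.
    """
    have_lsa = {origin}
    sends = Counter()                         # key: frozenset({a, b}) = one link
    queue = deque()
    for nbr in links[origin]:                 # originator sends on all its links
        sends[frozenset((origin, nbr))] += 1
        queue.append((origin, nbr))
    while queue:
        sender, receiver = queue.popleft()
        if receiver in have_lsa:
            continue                          # duplicate: acknowledged, not re-flooded
        have_lsa.add(receiver)                # new LSA: install in local database
        for nbr in links[receiver]:
            if nbr != sender:                 # skip the link it arrived on
                sends[frozenset((receiver, nbr))] += 1
                queue.append((receiver, nbr))
    return have_lsa, sends


LINKS = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["B", "C"],
}
reached, sends = flood(LINKS, "A")
print(sorted(reached), max(sends.values()))  # → ['A', 'B', 'C', 'D'] 2
```

Note that flooding reaches every router without consulting any routing table, matching the point that dissemination is independent of routing.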

Flooding Example

[Figure, three animation steps: an LSA being flooded through an example network, with numbered labels (1 to 5) on the links]

LSA Sequence Numbers

• Used to identify the most recent update.
– A greater sequence number is newer.
– A newer LSA replaces the old one.
• Two problems to deal with:
– Wrap-around of the sequence number counter
- A smaller sequence number is now newer.
– Choice of number at boot-up time
- Need to be able to overwrite previous LSAs

OSPF Sequence Numbering and LSA Refreshes

• 32-bit sequence number (signed integer), with N = 2^31
– InitialSequenceNumber of -N+1
– Increment by 1 (up to N-1) for each new update
• Wrap-around is handled by premature aging of the entry before flooding the new value (when reaching N-1).
– Flood the LSA with an age of MaxAge (1 hour) to purge it.
– Send a new LSA with sequence number of -N+1.
– Rare event (over 100,000 years with a 30 min. refresh period)
• Receiving a self-originated LSA with a larger sequence number than the last transmitted LSA
– Caused by the presence of a residual LSA originated prior to the last router restart
– Jump to one more than the received number and flood again
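The numbering scheme above can be sketched in Python. The constants follow RFC 2328 (Sec. 12.1.6): InitialSequenceNumber is 0x80000001 and MaxSequenceNumber is 0x7FFFFFFF as signed values, i.e., -N+1 and N-1 with N = 2^31. The helper names are hypothetical, and the purge-and-reflood step is reduced to a boolean flag. The last function checks the “over 100,000 years” claim.

```python
from typing import Tuple

# OSPFv2 LS sequence numbers are signed 32-bit integers (RFC 2328, Sec. 12.1.6).
N = 2**31
INITIAL_SEQUENCE_NUMBER = -N + 1   # 0x80000001 as a signed value
MAX_SEQUENCE_NUMBER = N - 1        # 0x7FFFFFFF


def is_newer(seq_a: int, seq_b: int) -> bool:
    """Plain signed comparison: a larger sequence number is more recent."""
    return seq_a > seq_b


def next_sequence_number(current: int) -> Tuple[int, bool]:
    """Return (next sequence number, whether the old instance must be purged first)."""
    if current == MAX_SEQUENCE_NUMBER:
        # Wrap-around: flood the current LSA with age MaxAge to purge it,
        # then originate the new instance with InitialSequenceNumber.
        return INITIAL_SEQUENCE_NUMBER, True
    return current + 1, False


def years_until_wrap(refresh_minutes: float = 30.0) -> float:
    """Time to exhaust the sequence space if the LSA changes only at each refresh."""
    updates = MAX_SEQUENCE_NUMBER - INITIAL_SEQUENCE_NUMBER   # 2**32 - 2
    return updates * refresh_minutes / (60 * 24 * 365)


print(round(years_until_wrap(30)))  # → 245147
```

Because the comparison is a plain signed comparison rather than modular arithmetic, wrap-around cannot be handled implicitly; that is why the old instance must be purged before the counter restarts.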

Aging of LSAs

• An LSA has an age field.
– Incremented at each transmission and while stored
– MaxAge determines time-to-live.
- Gets reset at each refresh
• Two uses of MaxAge: flooding of a MaxAge LSA for
– Removal of timed-out or invalid LSAs
– Removal of the current entry before counter wrap-around
• Impact of MaxAge on the boot-up problem
– If a router waits for MaxAge, old LSAs will be purged.
– Selection of MaxAge is a difficult choice.
- A small MaxAge minimizes boot-up latency.
- A small MaxAge imposes higher flooding overhead and may prevent full LSA distribution in large networks.
• Initial database synchronization handles boot-up.

Securing LSA Databases

• Consistency of LSA databases is critical to avoid routing loops.
• Need to protect against multiple error scenarios:
– Link errors
- LSA checksum and acknowledgment ensure reliable transmission.
– Injection of spurious LSAs (errors or malice)
- Support for authentication of OSPF packets (OSPFv2)

What Are Link State Advertisements?

• Five basic types of LSAs:
– Router LSA: a router’s neighborhood (intra-area)
– Network LSA: connectivity through a broadcast network (intra-area)
– T3 Summary LSA: reachability to a route in another area (inter-area)
– T4 Summary LSA: reachability to an ASBR in another area (inter-area)
– T5 LSA: AS-external LSA to a route in another AS (external)
• Several additional “special” LSA types:
– T7 LSA: support of “not-so-stubby area” (NSSA)
– Opaque LSAs: support extensibility of functionality
- Type 9: link local
- Type 10: area local
- Type 11: throughout the whole AS

What Is a Router LSA?

• A Router LSA advertises to other routers in my “area” what my neighborhood looks like.
– What kind of links, to which neighbors, at what cost?
- Point-to-point, point-to-multipoint, broadcast, etc.
– What set of local destinations can I reach?
- Stub networks and transit networks
– What kind of router am I?
- Internal, Area Border Router, AS Boundary Router

[Figure: a router with stub networks (including 19.2.4.0/24 and 19.2.5.0/24), point-to-point links, and a transit interface to the broadcast network 19.2.0.0/16 served by a Designated Router; link costs are shown on each link]

What Is a Network LSA?

• A Network LSA identifies all the routers attached to the same broadcast network.
– Advertised by the Designated Router (DR), which is chosen through an election process.
– A Backup DR is also elected and takes over if the DR fails.
– All routers exchange HELLO packets on the network, but form full adjacencies (and synchronize databases) only with the DR and the Backup DR.

[Figure: transit network 19.2.0.0/16 with its Designated Router and Backup DR]

Why A Network LSA?

• Multiple routers attached to a broadcast LAN
– They can all reach each other.
• Direct advertising of complete router connectivity is expensive.
– Every router specifies connectivity to N-1 routers.
– ~N² state overhead (bandwidth and storage)
• A Network LSA provides more compact advertising.
– A single copy of full router connectivity is sent by the designated router.
– The Backup DR reduces sensitivity to DR failures.

[Figure: routers A to G on a LAN, drawn as a full mesh of router-to-router links versus a star around a single transit node T]
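A quick count makes the saving concrete. Assume N routers on one LAN and count the router-to-router link entries that appear in LSAs (each router describes its own links, so every adjacency is listed once per direction):

```latex
\underbrace{N(N-1)}_{\text{full mesh of Router LSA links}}
\quad\text{vs.}\quad
\underbrace{N}_{\text{one link to } T \text{ per Router LSA}}
+ \underbrace{N}_{\text{routers listed in the Network LSA}} = 2N
```

For the seven routers A to G in the figure, that is 7 × 6 = 42 entries versus 2 × 7 = 14.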

Building an Area Topology Map

• Putting the puzzle pieces together

[Figure: R1 with stub networks 19.2.4.0/24 and 19.2.5.0/24, a point-to-point link to R2, and a transit interface to 19.2.0.0/16]

1. Router LSA from R1: 2 stub interfaces, 1 transit interface, 1 pt-to-pt link

[Figure: the transit network 19.2.0.0/16 is added, with Designated Router R3 and attached routers R1, R3 and R4]

2. Network LSA from DR R3: attached routers R1, R3, R4

[Figure: R2 is added, with its point-to-point links and stub network 19.2.6.0/24]

3. Router LSA from R2: 2 pt-to-pt links, 1 stub interface

[Figure: R4 is added, with stub networks 19.2.6.0/24 and 19.2.7.0/24]

4. Router LSA from R4: 1 pt-to-pt link, 1 transit interface, 2 stub interfaces

• Note: Stub network 19.2.6.0/24 is dual-homed, but packets won’t “transit” through it. No HELLO protocol packets are sent by R2 and R4 on those interfaces (passive interface config.).

[Figure: R3 is added, with stub network 19.2.7.0/24]

5. Router LSA from R3: 1 transit interface, 1 stub interface

• Core Network Graph

[Figure: graph with nodes R1, R2, R3, R4 and transit node T; router links carry their advertised costs, and links from T to its attached routers cost 0]

[Figure: final network graph: the core graph with the four stub networks attached]

6. Final Network Graph
– Stub networks are added.


Computing the Intra-Area Routing Table (3a)

• Basic setting
– The topology database has stabilized.
– The router computes shortest paths from itself to all routers and transit networks, and then to stub networks.
• Path computation based on Dijkstra’s algorithm
– Maintain two sets of nodes:
- Nodes to which the shortest path is known (set S)
- Nodes to which candidate shortest paths are known (set C)
- Initially only the origin router is in set S.
– Iterate, and at each iteration:
- Consider all neighbors of the last node X added to S, and add them to C if they are in neither S nor C.
- Update the candidate paths of all neighbors of X if the path through X is shorter than their current path.
- If C is empty, the algorithm terminates.
- Otherwise, move to S the node in C with the smallest candidate path cost (i.e., the node closest to the origin router), and iterate.
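The procedure can be sketched in Python, with a priority queue (`heapq`) standing in for a scan of set C. The edge costs are read off the example’s core network graph and the trace on the “Dijkstra’s Operation At R1” slide (R1→R2 = 1, R1→T = 12, R2→R4 = 5, R4→T = 6, T→any attached router = 0); reverse-direction costs, which cannot shorten any path from R1, are omitted. The sketch also keeps equal-cost next hops and, on a cost tie, labels transit networks before routers. Function and variable names are hypothetical.

```python
import heapq

# Directed edge costs of the example's core graph, as seen from R1.
# Reverse-direction costs (e.g. R2->R1, R4->R2, R3->T) are omitted: with
# non-negative costs they cannot shorten any path that starts at R1.
GRAPH = {
    "R1": {"R2": 1, "T": 12},
    "R2": {"R4": 5},
    "R4": {"T": 6},
    "T": {"R1": 0, "R3": 0, "R4": 0},  # transit network -> attached routers
    "R3": {},
}
TRANSIT = {"T"}


def spf(graph, source, transit=frozenset()):
    """Dijkstra with equal-cost next hops: return {node: (cost, next_hops)}."""
    cost = {source: 0}
    next_hops = {source: set()}
    labeled = set()                  # set S: nodes whose shortest path is final
    # Heap entries (cost, rank, node): rank 0 for transit networks, 1 for
    # routers, so that on a cost tie transit networks are labeled first.
    heap = [(0, 1, source)]          # the heap plays the role of set C
    while heap:
        c, _, node = heapq.heappop(heap)
        if node in labeled:
            continue                 # stale entry left by an earlier update
        labeled.add(node)
        for nbr, w in graph.get(node, {}).items():
            if nbr in labeled:
                continue
            new_cost = c + w
            # Neighbors of the source are their own next hop; everyone
            # else inherits the next hop(s) used to reach `node`.
            hops = {nbr} if node == source else next_hops[node]
            if nbr not in cost or new_cost < cost[nbr]:
                cost[nbr] = new_cost
                next_hops[nbr] = set(hops)
                heapq.heappush(heap, (new_cost, 0 if nbr in transit else 1, nbr))
            elif new_cost == cost[nbr]:
                next_hops[nbr] |= hops   # equal-cost path: keep both next hops
    return {n: (cost[n], next_hops[n]) for n in labeled}


for node, (c, hops) in sorted(spf(GRAPH, "R1", TRANSIT).items()):
    print(node, c, sorted(hops))
# R1 0 []
# R2 1 ['R2']
# R3 12 ['R2', 'T']
# R4 6 ['R2']
# T 12 ['R2', 'T']
```

The result reproduces the final set S of the worked trace, including the two equal-cost next hops (T and R2) towards T and R3.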

Dijkstra Shortest Path Computation

• What intra-area routing table does router R1 come up with, and how does it do it?

[Figure: the final network graph of the area (R1 to R4, transit node T and stub networks), with link costs]

Dijkstra’s Operation At R1

[Figure: the core graph R1, R2, R3, R4 and T, with link costs]

Entries are (node, path cost, next hop(s)); an empty cost with * means no candidate path yet.

S = {(R1,0,R1)};
C = {(R2,1,R2); (T,12,T); (R3,,*); (R4,,*)}

S = {(R1,0,R1); (R2,1,R2)};
C = {(T,12,T); (R3,,*); (R4,6,R2)}

S = {(R1,0,R1); (R2,1,R2); (R4,6,R2)};
C = {(T,12,T,R2); (R3,,*)}

S = {(R1,0,R1); (R2,1,R2); (R4,6,R2); (T,12,T,R2)};
C = {(R3,12,T,R2)}

S = {(R1,0,R1); (R2,1,R2); (R4,6,R2); (T,12,T,R2); (R3,12,T,R2)};
C = {}

Adding Transit Networks First

• The order in which nodes are added to the labeled set can affect the number of paths discovered to some nodes, because once a node is added to the labeled set it is never revisited.
– If E is added first to the set of labeled nodes, the path A-C-E of cost 2 is not discovered.
– If C is added first to the set of labeled nodes, the path A-B-E-C of cost 2 is not discovered.
• In OSPF, transit network nodes always have outgoing costs of 0, and therefore must be added first to the set of labeled nodes.

[Figure: two four-node graphs (A, B, C, E) with link costs of 0, 1 and 2, illustrating the two cases above]

Adding Stub Networks

• For each stub network:
– Identify all routers that advertise the stub network.
– Retrieve the shortest path to those routers.
– Add the cost of the shortest path to each router to the cost of the stub network link advertised by that router in its Router LSA.
– Pick the router(s) that yield the smallest cost.
– Add the stub network to the routing table with the same next hop(s) as the selected router(s).
• Four stub networks in the previous example:
– 19.2.4.0/24 and 19.2.5.0/24 are directly connected to router R1.
– 19.2.6.0/24 is reachable from both R2 and R4, and R2 is the lower cost option (total cost of 1+1=2 vs 1+5+2=8).
– 19.2.7.0/24 is reachable from both R3 and R4, and R4 is the lower cost option (total cost of 1+5+1=7 vs 12+3=15).
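The stub-network step can be sketched as follows. The router costs and next hops are those of the Dijkstra trace for R1 (R2: 1 via R2, R4: 6 via R2, R3: 12 via T and R2), and the stub link costs are the ones quoted above (R2 and R4 advertise 19.2.6.0/24 at 1 and 2; R4 and R3 advertise 19.2.7.0/24 at 1 and 3). Function and variable names are hypothetical.

```python
# Shortest paths from R1 to the routers, as in the Dijkstra trace: (cost, next hops).
ROUTER_PATHS = {
    "R1": (0, set()),
    "R2": (1, {"R2"}),
    "R4": (6, {"R2"}),
    "R3": (12, {"T", "R2"}),
}
# Stub advertisements: prefix -> {advertising router: stub link cost in its Router LSA}.
STUBS = {
    "19.2.6.0/24": {"R2": 1, "R4": 2},
    "19.2.7.0/24": {"R4": 1, "R3": 3},
}


def add_stubs(router_paths, stubs):
    """Return {prefix: (total cost, next hops)} for each reachable stub network."""
    table = {}
    for prefix, advertisers in stubs.items():
        best_cost, best_hops = None, set()
        for router, link_cost in advertisers.items():
            if router not in router_paths:
                continue                      # advertising router not reachable
            path_cost, hops = router_paths[router]
            total = path_cost + link_cost     # cost to router + its stub link cost
            if best_cost is None or total < best_cost:
                best_cost, best_hops = total, set(hops)
            elif total == best_cost:
                best_hops |= hops             # tie: inherit next hops of both routers
        if best_cost is not None:
            table[prefix] = (best_cost, best_hops)
    return table


print(add_stubs(ROUTER_PATHS, STUBS))
# → {'19.2.6.0/24': (2, {'R2'}), '19.2.7.0/24': (7, {'R2'})}
```

Since stubs are leaves, this step never changes the paths to routers, which is why it can run after Dijkstra instead of inside it.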

What Intra-Area Routing Table at R1?

Routes          Next Hop(s)
19.2.4.0/24     19.2.4.1 (IP address of local interface at R1)
19.2.6.0/24     0.0.0.3 (MIB index of pt-to-pt link between R1 and R2)
                0.0.0.3 (MIB index of pt-to-pt link between R1 and R2)

From One To Multiple Areas

• Why can’t we keep increasing the number of routers in an area?
– Topology database size and flooding overhead increase.
– Most importantly, route computation can become very onerous.
- Cost and frequency of Dijkstra computations increase.
• The basic solution is to partition a domain into multiple areas.
– Two-level hierarchy:
- Backbone area as a hub to which other areas connect
- Area Border Routers (ABRs) interconnect areas.
• Full topology is maintained only within an area.
– Flooding of Router and Network LSAs is limited to within an area.
– Dijkstra computation is limited to one area.
• Domain-wide shortest paths are computed using a DV-like approach.
– ABRs advertise their cost to remote destinations.
– The shortest path is computed by concatenating costs to and from the ABR.


Router’s View of a Multi-Area Domain

[Figure: R1’s area map (routers R1 to R4, a Designated Router on the transit network, stub networks 19.2.4.0/24 to 19.2.7.0/24) plus T3 summary LSAs injected by an ABR for 19.1.0.0/16, 19.1.2.0/23, 19.1.5.0/24, 19.1.6.0/24 and 19.3.0.0/16, with advertised costs including 14, 17, 19, 21 and 26]

Generating Summary LSAs

• Summary LSAs advertise the cost to routes or routers (ASBRs) in other areas.
• Area Border Routers (ABRs) are responsible for generating summary LSAs.
– ABRs advertise to other routers the results of their own shortest path computations for remote (but within the same AS) destinations.
- Essentially a “distance vector” type of approach

[Figure: an ABR at cost 12 from 12.3.4.0/24 advertises the T3 summary LSA [12.3.4.0/24; 12] into each of its other attached areas]


Computing Inter-Area Paths (3b)

• Two cost components (similar to handling of stubs):
1. Cost from ABR to remote destination (as advertised in the corresponding T3 summary LSA)
2. Cost from source to ABR in the local area
• Path selection
– For all ABRs that advertise the target route (longest prefix match), add the “cost to the ABR” to the “cost from that ABR to the remote destination.”
– Pick the ABR(s) with the smallest total cost as the target exit point(s) to reach the remote destination.
– Set next hop(s) for the remote destination to the next hop(s) of the shortest path(s) to the selected ABR(s).

Example of Inter-Area Path Computation

• Step 1
– Router 14 advertises a T3 summary with cost of 3 for r into area 0.
– Router 13 advertises a T3 summary with cost of 5 for r into area 0.
• Step 2
– Router 4 advertises a T3 summary with cost of 13 (10+3) for r in area 1.
– Router 6 advertises a T3 summary with cost of 9 (6+3) for r in area 1.
• Step 3
– Router 1 identifies router 6 as the best exit point to reach r (4+9 < 4+13).
– Router 1 identifies router 3 and router 5 as its next hops to reach r.

[Figure: Routers 1 to 19 spread over Area 1, Area 0 and Area 2, with route r in Area 2; most links have cost 2, with a few higher-cost links (6 and 20)]
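The path selection rule can be sketched with the numbers of this example: Router 1 is at cost 4 from both ABRs (Routers 4 and 6), which advertise r at 13 and 9. The function name and data layout are hypothetical; next hops would then be copied from the shortest path(s) to the selected ABR (Routers 3 and 5 here).

```python
from typing import Dict, Set, Tuple


def pick_exit_abrs(cost_to_abr: Dict[str, int],
                   t3_cost: Dict[str, int]) -> Tuple[int, Set[str]]:
    """Return (best total cost, ABR(s) achieving it) for one remote route.

    cost_to_abr: intra-area shortest-path cost from this router to each ABR
    t3_cost:     cost each ABR advertises for the route in its T3 summary LSA
    """
    totals = {abr: cost_to_abr[abr] + t3_cost[abr]
              for abr in t3_cost if abr in cost_to_abr}
    if not totals:
        raise ValueError("no reachable ABR advertises this route")
    best = min(totals.values())
    return best, {abr for abr, total in totals.items() if total == best}


# Router 1's view of route r: both ABRs are at cost 4 inside area 1.
print(pick_exit_abrs({"Rtr 4": 4, "Rtr 6": 4}, {"Rtr 4": 13, "Rtr 6": 9}))
# → (13, {'Rtr 6'})
```

This is the same “cost to X plus cost advertised by X” concatenation used for stub networks, applied one level up with ABRs in place of stub-advertising routers.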

What Intra & Inter-Area Routing Table at R1?

[Figure: R1’s multi-area view: its area map plus the ABR’s T3 summary LSAs for 19.1.0.0/16, 19.1.2.0/23, 19.1.5.0/24, 19.1.6.0/24 and 19.3.0.0/16]

Intra and Inter-Area Routing Table at R1

Routes          Next Hop(s)

From Multiple Areas To The “World”

• How to reach destinations in other domains/ASes?
• AS Boundary Routers (ASBRs) are the gateways to the outside world.
– They “summarize” the world in much the same way as ABRs summarize other areas.
– ASBRs advertise external routes (T5 LSAs) that indicate their ability to reach remote destinations.
- T5 LSAs are flooded throughout the AS.
– Reachability to ASBRs located in other areas is advertised by ABRs through T4 LSAs. (Why is this needed?)
• The cost of AS-external LSAs can be of two types:
– Type 1 cost is compatible with the link costs within the AS.
– Type 2 cost is incompatible with the link costs within the AS and trumps any internal cost.


A Router’s View of The “World”

[Figure: R1’s multi-area view, extended with a T4 summary LSA from the ABR for ASBR R5 and a T5 LSA from R5 for the default route 0.0.0.0/0 with external cost <2,20> (type 2, cost 20)]


Computing Paths to External Routes (3c)

• Two cost components (again, as for inter-area routes and stubs):
1. Cost from ASBR to external route
2. Cost from source to ASBR
• Path selection
– Smallest type 2 cost wins, independent of internal cost.
– If type 1 cost, or equal type 2 cost:
- Prefer a non-backbone intra-area path to the ASBR (or forwarding address), and use cost to break ties.
- If there is no such path, use cost to select the path.
– Cost computation
- If (equal) type 2 cost: cost to ASBR
- If type 1 cost: cost to ASBR + cost from ASBR
- Cost to ASBR:
– Direct cost if within the same area
– Cost to ABR plus T4 summary cost advertised by the ABR (cost from ABR to ASBR in the remote area)
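These selection rules can be sketched as a sort key, assuming each candidate path is summarized by its external type and cost, its internal cost to the ASBR (direct, or via an ABR plus the T4 cost), and whether the ASBR is reached over an intra-area path in a non-backbone area. The ordering follows RFC 2328 (Sec. 16.4) with RFC 1583 compatibility disabled, but forwarding addresses and other details are not modeled; class and field names are hypothetical. The usage example reproduces the later “Another Example of Non-Shortest Path” slide: Routers 5 and 8 both advertise type 2 cost 10, and Router 2 picks Router 5 despite an internal cost of 22 versus 10.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ExternalPath:
    asbr: str
    ext_type: int             # 1 or 2, from the T5 LSA
    ext_cost: int             # advertised external cost
    cost_to_asbr: int         # direct intra-area cost, or cost to ABR + T4 cost
    intra_nonbackbone: bool   # ASBR reached over an intra-area, non-backbone path?


def preference_key(p: ExternalPath):
    # 1. Type 1 beats type 2; among type 2 paths the smallest advertised cost wins.
    type_rank = (p.ext_type, p.ext_cost if p.ext_type == 2 else 0)
    # 2. Then prefer an intra-area path to the ASBR in a non-backbone area.
    area_rank = 0 if p.intra_nonbackbone else 1
    # 3. Finally cost: type 1 adds both components, type 2 uses the cost to the ASBR.
    cost = p.cost_to_asbr + (p.ext_cost if p.ext_type == 1 else 0)
    return (type_rank, area_rank, cost)


def select_external(paths: List[ExternalPath]) -> List[ExternalPath]:
    """Return the most preferred path(s); several survive only on a full tie."""
    best = min(preference_key(p) for p in paths)
    return [p for p in paths if preference_key(p) == best]


paths = [
    ExternalPath("Rtr 5", ext_type=2, ext_cost=10, cost_to_asbr=22, intra_nonbackbone=True),
    ExternalPath("Rtr 8", ext_type=2, ext_cost=10, cost_to_asbr=10, intra_nonbackbone=False),
]
print([p.asbr for p in select_external(paths)])  # → ['Rtr 5']
```

For a router in the backbone area, no path qualifies as a non-backbone intra-area path, so step 2 never separates candidates and cost alone decides, as the slides describe.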

Examples of External Path Computation

• Router 1 selects router 5 to reach external route r in spite of its higher cost (22 instead of 10 through router 10), because router 5 is reached over an intra-area path in non-backbone area 1.
• Router 8 selects router 17 to reach external route r’, as router 8 is in the backbone area and uses cost to identify the best path.

[Figure: Routers 1 to 19 in Area 1, Area 0 and Area 2; ASBRs advertise external routes r and r’ with labels <1,20> and <1,2>; most links have cost 2]

<x,y>: <cost type, cost> for an external route

What Routing Table at Router R1?

[Figure: R1’s view of the world: its area map, the ABR’s T3 summary LSAs, the T4 summary LSA for ASBR R5, and R5’s T5 LSA for 0.0.0.0/0 with <2,20>]

Routing Table at R1

Routes          Next Hop(s)

Summarizing OSPF Operation

[Figure: AS 123 with backbone Area 0 and Areas 1 to 4 joined by ABRs, and an ASBR at the edge of the AS; example routes r1, r2 and r3, with a legend distinguishing intra-area, inter-area and external routes]


Packet Forwarding (4)

• For each known route, paths (next hops) are installed in the final routing table as follows:
– Intra-area paths are preferred.
– Inter-area paths are next.
– Type 1 external paths follow.
– Type 2 external paths are the least preferred.
– Cost is used as a last resort to break ties.
• The forwarding table is constructed from the routing table.
• Upon receipt of a packet, a longest prefix match search is performed on the forwarding table to determine where to send the packet next.
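The installation order above can be expressed as a comparison key: path type first, cost only as a tie-breaker within the best type. This is a simplified sketch with hypothetical names; for type 2 external paths the actual comparison also involves the advertised type 2 cost, as described in the external-route rules.

```python
from enum import IntEnum
from typing import List, Tuple


class PathType(IntEnum):
    # Lower value = more preferred, in the order listed on the slide.
    INTRA_AREA = 0
    INTER_AREA = 1
    EXTERNAL_TYPE1 = 2
    EXTERNAL_TYPE2 = 3


def install(candidates: List[Tuple[PathType, int, str]]) -> List[str]:
    """Pick next hop(s) for ONE prefix from (path type, cost, next hop) candidates.

    Path type decides first; cost only breaks ties within the best path type.
    """
    best = min((ptype, cost) for ptype, cost, _ in candidates)
    return sorted(hop for ptype, cost, hop in candidates if (ptype, cost) == best)


print(install([
    (PathType.INTER_AREA, 5, "R3"),    # cheaper, but only an inter-area path
    (PathType.INTRA_AREA, 20, "R2"),   # two equal-cost intra-area paths win
    (PathType.INTRA_AREA, 20, "R5"),
]))  # → ['R2', 'R5']
```

The resulting next hops are what the forwarding table holds for that prefix; the longest prefix match at packet time then picks among prefixes, not among path types.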

In Case I Have Made It Look Too Simple!

• Understanding the flow of traffic in OSPF networks is not trivial.
• There are many “exceptions” to the basic rule of using shortest paths. Some of them are clear, some are less obvious.
– Intra-area paths have preference over inter-area paths.
- If my packet happens to hit a router with an intra-area route, that router will divert it from the intended exit point.
– ABRs ignore each other’s advertisements within a given non-backbone area.
- Once an exit point is reached, there is no turning back.
– Address summarization can hide the best path.
- The advertised cost for a summary route is the maximum cost across all individual routes (RFC 2328, Sec. 3.5).
– The choice of path to an external route is not always cost driven.
- Prefer the path to a local ASBR, except in the backbone area, where selection is based on cost (needed to avoid routing loops in some cases; see RFC 2178 and RFC 2328 for details).

Example of Non-Shortest Path

• ABRs ignore T3 summary LSAs received in non-backbone areas (they consider only summaries received in the backbone).
– The shortest path from Router 2 to Router 8 has cost 6 (Router 6 is the intended exit point).
– The actual path from Router 2 to Router 8 has cost 8 (Router 4 diverts packets over a higher cost intra-area path to Router 8).

[Figure: Routers 1 to 19 in Area 1, Area 0 and Area 2; most links have cost 2, with links of cost 6 and 20]

Another Example of Non-Shortest Path

• Both Router 5 and Router 8 advertise reachability to external route E with type 2 cost of 10.
– Router 5 is selected by Router 2 in spite of the higher (internal) cost to reach it (22 vs 10), because of the preference for intra-area paths to ASBRs.

[Figure: the same three-area topology, with link costs (mostly 2, some 4, 6, 8 and 20)]

A Tricky Corner Case

[Figure: Routers 1 to 16 in Area 1, Area 0 and Area 2; most links have cost 2, with links of cost 16 and 20]

• What is (are) the path(s) between Router 2 and Router 16?
– Hint: Router 10 has interfaces in Areas 0, 1 and 2.
• Remember the basic rule of distance vector protocols
– Only advertise a route you are using.

A Tricky Corner Case

[Figure: same topology as on the previous slide]

• Two exit points from Area 1
– Router 10 advertises a cost of 24 to Router 16, even though there is a better/cheaper path through Router 8.
– Router 7 advertises a cost of 8 to Router 16.
• The minimum cost through either Router 10 or Router 7 is 28.
• BUT there are two paths of total cost 46!

A Few Other Extensions That Can Further Complicate Things

• Virtual links and transit areas
– Useful to enhance backbone connectivity, but rarely used, and they add complexity to understanding packet forwarding decisions
• Stub areas
– No flooding of AS-external LSAs (T5)
– Default T3 summary route flooded in the area
• Not-so-stubby areas (NSSA)
– I don’t want to receive T5s but want to originate T5s.
– Use a new LSA type (T7) for flooding of AS-external LSAs within the NSSA, plus “translation” of T7 to T5 at ABRs and a few other tweaks

Virtual Links & Transit Areas

• Virtual links are meant to facilitate backbone connectivity.
– Connecting remote areas
– Increasing backbone robustness to link failures
• The two end points of a virtual link are Area Border Routers.
– A virtual link can be configured between two ABRs that have an interface to a common non-backbone area.
– A virtual link is treated as an unnumbered pt-to-pt link.
– The cost of a virtual link is the cost of the path connecting the two ABRs that have been configured as ends of the virtual link.
- Note that this implies that a virtual link only comes up after the shortest path computation has completed.
• An area that has one or more active virtual links is called a transit area (stub areas cannot be transit areas).
– It can be used to carry transit traffic between other areas.
– It can make understanding how packets are “actually” forwarded pretty complicated...

74
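Because a virtual link's cost is the cost of the intra-area path between its two ABRs, it can be sketched as a plain Dijkstra run restricted to the transit area. This is a minimal Python sketch, not how any particular OSPF implementation does it: the topology, router names and costs are hypothetical, and a real router would derive the graph from the transit area's link-state database rather than from a dictionary.

```python
import heapq


def intra_area_cost(area_graph, src, dst):
    """Dijkstra restricted to one area: area_graph maps router -> {neighbor: cost}.

    Returns the cost of the shortest intra-area path, or None if dst is
    unreachable (in which case the virtual link would stay down).
    """
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            return d
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, cost in area_graph.get(node, {}).items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return None


# Hypothetical transit area shared by ABR1 and ABR2 (costs are made up).
transit_area = {
    "ABR1": {"R1": 2, "R2": 5},
    "R1": {"ABR1": 2, "R2": 1, "ABR2": 6},
    "R2": {"ABR1": 5, "R1": 1, "ABR2": 2},
    "ABR2": {"R1": 6, "R2": 2},
}

print(intra_area_cost(transit_area, "ABR1", "ABR2"))  # 5, via R1 and R2
```

Returning None for an unreachable ABR mirrors the remark that the virtual link only comes up once the shortest path computation has found a path.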

Two Examples of The Use of Virtual Links

[Figure: two example topologies with Areas 0, 1, 2 and 3 and link costs, illustrating the use of virtual links]

Another Virtual Link Example

• Or why virtual links can make understanding packet forwarding more complicated
– When computing the shortest path to a destination, ABRs attached to a transit area are allowed to consider summary LSAs advertised in the transit area by other ABRs

[Figure: example with Areas 0, 1 and 2 and link costs]

Another Link State Protocol: IS-IS

• IS-IS = Intermediate System to Intermediate System
– Intermediate System = Router
– End System = Host

• Just like OSPF
– A link-state protocol that relies on flooding
- Each node advertises its (local) view of the world in a Link State Packet (LSP)
– Entries are regularly refreshed
- Nodes maintain a topology database on which they run Dijkstra to compute shortest paths
– Database entries are aged out if not refreshed
– Hello protocol for liveness and neighbor discovery
– Broadcast network represented as a “pseudo-node” with an elected Designated IS (DIS) responsible for impersonating it
– A two-level hierarchy for improved scalability

Just like OSPF – Maybe Not

• IS-IS runs directly on the link layer, i.e., below and independent of IP
– No exposure to IP “insecurity”, but no benefit from IP functionality, e.g., fragmentation (common MTU across the entire network)

• IS-IS relies on different area and level definitions
– Area = level 1 domain based on local sharing of area address
– Routers belong to one area (area boundary through links, not routers, i.e., no ABR but “connected” routers)
– Level 2 backbone defines logical connectivity based on link types and not as a separate area (OSPF area 0)

• IS-IS has a different LSP structure (just one per router)
– LSP originator identified by the System ID of the router (IS)
- Not an IP address…
– Extensive use of Type/Length/Value (TLV) format to encode everything a router knows in one long list
- Affects update process: one small change triggers an update for everything
- Affects extensibility: new information is easily encoded in a new TLV without requiring a new packet type

More on IS-IS Differences – (1)

• Areas and levels (level 1 and level 2)
– Routers form an adjacency over a link if and only if
- They both agree it is a level 2 link, or
- They both agree it is a level 1 link AND they share at least one area address
– Area IDs are link local, which facilitates merging and splitting areas

• Level 2 and level 1 structure
– Links can be level 1, level 2 or BOTH level 1 and level 2
– Level 2 backbone is mostly a connectivity concept
- L1/L2 links are somewhat similar to OSPF virtual links
– Connectivity between level 1 areas is provided through an “attached” router
- Router with connectivity to the L2 backbone, as indicated in its L1 LSP (Attached bit)
- Default behavior routes inter-area packets to the closest attached router (route leaking extensions)
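The adjacency rule above can be written as a small predicate. In this Python sketch the `LinkConfig` type and its fields are hypothetical stand-ins for per-interface configuration, and the area address strings are made up; a real implementation also checks authentication, MTU (via Hello padding) and other conditions before bringing an adjacency up.

```python
from dataclasses import dataclass, field


@dataclass
class LinkConfig:
    """Hypothetical per-interface settings on one end of a link."""
    levels: set                                   # subset of {1, 2}
    area_addresses: set = field(default_factory=set)


def adjacency_levels(a: LinkConfig, b: LinkConfig) -> set:
    """Levels at which the two routers on a link form an adjacency."""
    levels = set()
    if 2 in a.levels and 2 in b.levels:
        levels.add(2)  # level 2: no area address match is needed
    if 1 in a.levels and 1 in b.levels and a.area_addresses & b.area_addresses:
        levels.add(1)  # level 1: must share at least one area address
    return levels


# Link A-B while merging areas 7 and 8: L1/L2 on both ends, area 7 added on B.
a = LinkConfig({1, 2}, {"49.0007"})
b = LinkConfig({1, 2}, {"49.0007", "49.0008"})
print(adjacency_levels(a, b))  # {1, 2}
```

Because the area check is done per link, adding a common area ID on both ends is enough to extend the L1 flooding domain, which is what makes merging and splitting areas incremental.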

Merging, Splitting, Renumbering Areas

• Merging area 7 and area 8
– Reconfigure link A-B to be an L1/L2 link with, say, area 7 as the area ID common to A & B
– L1 flooding domain now extends to routers A, B, C and D
– Add area 7 as an area ID for link B-D on both routers B and D
– Reconfigure link A-B to be L1 only
– Remove the area 8 ID from routers B and D for link B-D

• Similar (symmetric) processes can be followed for area splitting or renumbering

[Figure: routers A, B, C and D in three stages: Area 7 and Area 8 joined by an L2 link A-B; link A-B reconfigured as L1/L2 with area 7; link A-B as L1 with all routers in Area 7]

Level 2 Backbone

• The L2 backbone forms a logical connected topology based on L2 and L1/L2 links
– Can facilitate use of high-speed backbone links for intra-area traffic (similar to OSPF virtual links, but link rather than area specific)

• Inter-area traffic is sent to the closest attached L2 router
– Closest exit point to the backbone
- No need to deliver traffic to C1
– Scalable, but can result in sub-optimal routing

• Route-leaking extensions
– Similar to the OSPF T3 summary routes approach
– Use of the Up/Down bit to ensure that routes are not propagated back out

[Figure: Areas 7, 9 and 10 with routers A1, A2, B1 to B4, C1 to C4 and D, connected by L1, L2 and L1/L2 links]

The Penalty of Choosing The Closest Exit Point

[Figure: 19.1.1.0/24 reachable through two exit points, with link costs]
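The penalty can be made concrete with a few lines of arithmetic. The costs and router names below are hypothetical and are not read off the figure; the point is only that an L1 router minimizes the cost to an attached router, while the end-to-end cost also depends on the part of the path it cannot see without route leaking.

```python
# Hypothetical costs for one L1 router and two attached (L1/L2) exit routers.
# to_exit: IGP cost from the L1 router to each exit (visible inside the area)
# beyond:  cost from each exit to 19.1.1.0/24 (not visible inside the L1 area
#          without route leaking)
to_exit = {"X1": 2, "X2": 5}
beyond = {"X1": 20, "X2": 4}


def closest_exit(to_exit):
    """Default L1 behavior: send inter-area traffic to the nearest attached router."""
    return min(to_exit, key=to_exit.get)


def best_exit(to_exit, beyond):
    """What full (leaked) information would allow: minimize the end-to-end cost."""
    return min(to_exit, key=lambda x: to_exit[x] + beyond[x])


for pick in (closest_exit(to_exit), best_exit(to_exit, beyond)):
    print(pick, to_exit[pick] + beyond[pick])
# X1 22
# X2 9
```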


• HELLO protocol
– Local hold timer for each link (carried in Hello messages)
– Used for MTU check through padding of Hello messages
– Three-way handshake to test for bidirectional connectivity
- Hello messages list the addresses of the ISs from which Hellos have been heard

• Broadcast network
– DIS election is preemptive (the IS with the highest priority wins)
– LAN represented by a pseudonode LSP from the DIS
- Identified by a non-zero pseudonode ID
– No backup DIS (the DIS can reduce its Hello hold timer – JUNOS)

• Database synchronization
– On LANs, the DIS sends Complete Sequence Numbers PDUs (CSNPs) every 10s (unreliable transmissions – implicit ACKs)
– On P2P links, an initial CSNP is sent only when the adjacency comes up
- JUNOS implements periodic resending of CSNPs on P2P links
– Other routers request specific LSPs using Partial Sequence Numbers PDUs (PSNPs) or reflood missing/old LSPs
- ACKs (PSNP or CSNP) are needed for received LSPs on P2P links
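Two pieces of the broadcast-network machinery can be sketched in Python: DIS election, and the comparison a router makes between a received CSNP and its own database. This is a simplified sketch under stated assumptions: the tie-break on the highest SNPA (MAC) address is the usual ISO 10589 rule but is not stated on the slide, the LSP IDs are hypothetical, and real implementations also compare remaining lifetime and checksum, not only sequence numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LanCandidate:
    system_id: str
    priority: int  # IS-IS priorities range from 0 to 127
    snpa: int      # subnetwork point of attachment, e.g. the MAC as an integer


def elect_dis(candidates):
    """Highest priority wins; ties are broken by the highest SNPA (MAC) address.

    Preemptive: rerunning this whenever a better router appears replaces the DIS.
    """
    return max(candidates, key=lambda c: (c.priority, c.snpa))


def compare_csnp(local_db, csnp):
    """local_db and csnp map LSP ID -> sequence number.

    Returns (to_request, to_flood): LSPs described in the CSNP that are missing
    or newer than our copy (request them with a PSNP), and LSPs we hold that
    are missing from the CSNP or newer than its entry (reflood them).
    """
    to_request = sorted(lsp for lsp, seq in csnp.items() if seq > local_db.get(lsp, 0))
    to_flood = sorted(lsp for lsp, seq in local_db.items() if seq > csnp.get(lsp, 0))
    return to_request, to_flood


lan = [LanCandidate("R1", 64, 0x0A), LanCandidate("R2", 64, 0x0B), LanCandidate("R3", 10, 0xFF)]
print(elect_dis(lan).system_id)  # R2
print(compare_csnp({"R1.00-00": 3, "R2.00-00": 5, "R4.00-00": 1},
                   {"R1.00-00": 4, "R2.00-00": 5, "R3.00-00": 1}))
```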


• Fragmentation
– Remember that IP fragmentation is not available
– IS-IS does application-level fragmentation (assumes a minimum MTU size of 1492 bytes, verified using Hello padding)
– CSNP fragmentation
- Based on using the Start-LSP-ID and End-LSP-ID fields to indicate the beginning and end of synchronization
– LSP fragmentation
- Based on a fragment ID byte (up to 256 fragments)
– Extension based on assigning additional IDs to an IS
- Fragment zero is mandatory for the others to be considered
- Fragments are atomic (they arrive independently): care is needed in packaging information into fragments to avoid churning in the presence of changes

LSP Fragmentation

[Figure: an LSP whose header and adjacencies #1 to #22 are split across Fragment 00 and Fragment 01, shown before and after a change]

Adjacency goes away in fragment 00

Repackaging of fragments requires re-advertising both fragments 00 and 01

Preserving fragment structure ensures that only fragment 00 is re-advertised
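The difference between repackaging and preserving fragment structure can be sketched in Python. Fragments are modeled as lists of adjacency numbers with a hypothetical capacity of 20 per fragment; real LSP fragments are limited by bytes (TLV sizes against the LSP buffer size), not by a count of items.

```python
def pack_naive(items, per_fragment):
    """Repack from scratch: fill fragment 0, then 1, and so on, in order."""
    return [items[i:i + per_fragment] for i in range(0, len(items), per_fragment)]


def pack_stable(old_fragments, items, per_fragment):
    """Keep every surviving item in the fragment it was in before.

    Removed items simply disappear from their fragment; new items go to the
    first fragment with room, or to a new fragment at the end.
    """
    wanted = set(items)
    frags = [[x for x in frag if x in wanted] for frag in old_fragments]
    placed = {x for frag in frags for x in frag}
    for x in items:
        if x in placed:
            continue
        for frag in frags:
            if len(frag) < per_fragment:
                frag.append(x)
                break
        else:
            frags.append([x])
    return frags


def changed_fragments(old, new):
    """Fragment numbers whose content differs and must be re-advertised."""
    def get(frags, i):
        return frags[i] if i < len(frags) else None
    return [i for i in range(max(len(old), len(new))) if get(old, i) != get(new, i)]


adjacencies = list(range(1, 23))                 # Adj #1 .. #22
old = pack_naive(adjacencies, 20)                # fragment 00: #1-#20, fragment 01: #21-#22
remaining = [a for a in adjacencies if a != 5]   # one adjacency in fragment 00 goes away

print(changed_fragments(old, pack_naive(remaining, 20)))        # [0, 1]
print(changed_fragments(old, pack_stable(old, remaining, 20)))  # [0]
```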


• LSP structure
– Link State Packet as a container
– Container content based on packing independent entities (TLVs) that each provide a different type of information

• Benefits
– Protocol machinery associated with the handling of the containers is independent of their content (reusable)
– Carrying new information in the container only requires the definition of new TLVs
– LSPs can still be parsed even if they contain unknown TLVs (just skip the number of bytes specified in the Length field)

• Drawbacks
– Container imposes rather coarse information granularity
- Whole container is resent in the presence of changes
- Need to exercise caution when information is spread over multiple fragments
– Purge mechanism is required
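The “skip unknown TLVs” property follows directly from the encoding: each TLV carries its own length, so a parser can step over anything it does not understand. A minimal Python sketch, assuming the input is the TLV area of an LSP with the fixed header already removed; the unknown type code 250 and the sample values are made up for illustration.

```python
import struct


def parse_tlvs(data: bytes, known: dict):
    """Walk a sequence of IS-IS TLVs: 1-byte Type, 1-byte Length, Length bytes of Value.

    `known` maps TLV type codes to names. Unknown types are not interpreted:
    the parser just jumps over their Length bytes and carries on.
    """
    parsed, skipped = [], []
    offset = 0
    while offset < len(data):
        if offset + 2 > len(data):
            raise ValueError("truncated TLV header")
        tlv_type, length = struct.unpack_from("!BB", data, offset)
        value = data[offset + 2:offset + 2 + length]
        if len(value) != length:
            raise ValueError(f"truncated value in TLV {tlv_type}")
        if tlv_type in known:
            parsed.append((known[tlv_type], value))
        else:
            skipped.append(tlv_type)
        offset += 2 + length
    return parsed, skipped


# Hostname (137), a TLV type this parser does not know (250), Area Address (1).
lsp_body = (bytes([137, 2]) + b"R1"
            + bytes([250, 3, 0xAA, 0xBB, 0xCC])
            + bytes([1, 4, 3, 0x49, 0x00, 0x01]))
print(parse_tlvs(lsp_body, {1: "Area Address", 137: "Dynamic Host Name"}))
```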

Purging Old Fragments

• Adjacencies #21 & #22 go away
– A new fragment 00 is issued
– Fragment 01 is no longer needed

• How does a router know fragment 01 is gone?
– It will persist for a period of “lifetime”

• A router can issue a “purge” LSP
– Contains only the header, with zeroed-out lifetime and checksum

[Figure: old fragments 00 and 01 (Adj #1 to #22) and the new fragment 00 (Adj #1 to #20)]

Main IS-IS TLVs – (1)

TLV                        | TLV # | Where Used?
---------------------------|-------|-----------------------
Area Address               | 1     | Hello, LSP
IS Neighbors               | 6     | Hello (LAN)
Padding                    | 8     | Hello
Authentication             | 10    | Hello, LSP, CSNP, PSNP
Checksum                   | 12    | Hello, CSNP, PSNP
Protocols Supported        | 129   | Hello, LSP
IP Interface Address       | 132   | Hello, LSP
Dynamic Host Name          | 137   | LSP
Multi-Topology Supported   | 229   | Hello, LSP

Main IS-IS TLVs – (2)

TLV                              | TLV # | Where Used?
---------------------------------|-------|------------
IS Reachability                  | 2     | LSP
Extended IS Reachability         | 22    | LSP
Multi-Topology IS Reachability   | 222   | LSP
IP Internal Reachability         | 128   | LSP
IP External Reachability         | 130   | LSP
Extended IP Reachability         | 135   | LSP
Multi-Topology IP Reachability   | 235   | LSP
Multi-Topology IPv6 Reachability | 237   | LSP

IS Reachability TLVs

• IS Reachability (TLV #2)
– Hardly used anymore
– Fixed length
– Identifies connectivity to neighbors (independent of IP)
– Multiple possible metrics (default, delay, expense, error), but a very limited range (only 6 bits, i.e., from 0 to 63)

• Extended IS Reachability (TLV #22)
– Motivated by the need for bigger metrics and support (room) for extensions (sub-TLVs)
– 24-bit metric field typically encodes a value relative to a reference bandwidth
- Reference bandwidth of 1 Tbit/s, so that a 1 Gbit/s link has a value of 1000 (smallest granularity is 64 kbit/s)
– Variable-length TLV
– Ability to include a variety of sub-TLVs, e.g., for Traffic Engineering extensions and MPLS
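The reference-bandwidth scheme is a simple division clamped to the 24-bit field. This Python sketch assumes integer division and a floor of 1, which are common conventions but vary between implementations. It also leaves out the all-ones value, which RFC 5305 reserves for links that must not be used in normal SPF. The sample bandwidths show why the slide quotes roughly 64 kbit/s as the smallest granularity: 64 kbit/s still fits, while 56 kbit/s is already clamped.

```python
REFERENCE_BW_BPS = 10**12       # 1 Tbit/s reference bandwidth, as in the example above
MAX_24_BIT = 2**24 - 1          # largest value the 24-bit metric field can hold


def wide_metric(link_bw_bps: int) -> int:
    """Reference-bandwidth metric for a link, clamped to the 24-bit field.

    The all-ones value is left out because RFC 5305 reserves it: a link
    advertised with 2**24 - 1 is excluded from normal SPF.
    """
    metric = REFERENCE_BW_BPS // link_bw_bps
    return max(1, min(metric, MAX_24_BIT - 1))


for bw in (10**9, 10 * 10**9, 64_000, 56_000):
    print(bw, wide_metric(bw))
```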

Some IS-IS Sub-TLVs – (1)

Sub-TLV                    | Sub-TLV #
---------------------------|----------
Maximum Link Bandwidth     | 9
Reservable Link Bandwidth  | 10
Unreserved Bandwidth       | 11
Traffic Engineering Metric | 18
Link Protection Type       | 20

IP Reachability TLVs

• IP Internal/External Reachability (TLV #128/130)
– Hardly used anymore
– Identifies IP routes directly connected to the router (internal) or learned from another protocol (external)
– Same metrics and metric limitations as the IS Reachability TLV

• Extended IP Reachability (TLV #135)
– Specifies IP routes reachable by the router
- 6 bits for the prefix length: from 0 to 32 (33 values)
– Subsumes both TLV #128 and #130
– 32-bit metric field (compatibility with other protocols)
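A TLV #135 entry packs the 6-bit prefix length into a control byte together with the Up/Down bit used by route leaking, and sends only the prefix octets the length requires. The layout below follows RFC 5305 as best I recall it: a 4-byte metric, a control byte (Up/Down bit, sub-TLV-present bit, 6-bit prefix length), then the prefix octets. Treat it as a sketch rather than a validated encoder. It handles IPv4 only and omits sub-TLVs.

```python
import ipaddress
import struct


def encode_ext_ip_reach(prefix: str, metric: int, down: bool = False) -> bytes:
    """One TLV 135 entry without sub-TLVs.

    Layout: 4-byte metric, control byte (up/down bit, sub-TLV-present bit,
    6-bit prefix length), then only the prefix octets the length requires.
    """
    net = ipaddress.IPv4Network(prefix)
    control = (0x80 if down else 0x00) | net.prefixlen  # sub-TLV bit (0x40) left clear
    n_octets = (net.prefixlen + 7) // 8
    return struct.pack("!IB", metric, control) + net.network_address.packed[:n_octets]


def encode_tlv135(entries) -> bytes:
    """Wrap entries of (prefix, metric, down) in a single Type/Length/Value."""
    body = b"".join(encode_ext_ip_reach(*entry) for entry in entries)
    if len(body) > 255:
        raise ValueError("entries do not fit in one TLV; split across TLVs")
    return bytes([135, len(body)]) + body


print(encode_tlv135([("10.1.0.0/16", 10, False), ("0.0.0.0/0", 1, True)]).hex())
```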

Multi-Topology TLVs

• Enables specification of multiple logical topologies, each with its own routing
– Per-topology metrics and SPF computations, e.g., VoIP traffic routed differently from data traffic

• TLV #229 identifies the topologies a router supports
– IPv4 Unicast (#0)
– In-Band Management (#1)
– IPv6 Unicast (#2)
– Multicast (#3)

• Multi-topology support is validated during Hello exchanges

• Multi-topology IPv4/6 Reachability TLVs “duplicate” the Extended IP Reachability and IPv6 Reachability TLVs


• SPF computations
– Explicitly structured as a two-phase process

• Phase 1
– Compute shortest paths from IS to IS (on a graph with routers/ISs and “networks” as vertices and links/adjacencies as edges)
- One SPF per topology
– Independent of any underlying “reachability” information

• Phase 2
– Add shortest paths to all “leaves” (reachability information)

• Phase 1 can be reused across protocols and is only triggered in the presence of connectivity changes

• Phase 2 is reachability specific
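The two-phase split can be sketched in Python. Phase 1 runs Dijkstra over the IS graph alone. Phase 2 hangs prefixes off the resulting tree, so a change in reachability only reruns the cheap second step. The graph, costs and prefixes are hypothetical. The sketch also ignores pseudonodes, equal-cost multipath, the overload bit and the separation of levels.

```python
import heapq


def spf(adjacency, root):
    """Phase 1: Dijkstra over the IS graph only.

    adjacency maps IS -> {neighbor IS: link cost}. Returns IS -> (cost, first hop),
    where the first hop is the root's neighbor on the shortest path.
    """
    tree = {}
    heap = [(0, root, None)]
    while heap:
        cost, node, first_hop = heapq.heappop(heap)
        if node in tree:
            continue
        tree[node] = (cost, first_hop)
        for nbr, link_cost in adjacency.get(node, {}).items():
            if nbr not in tree:
                heapq.heappush(heap, (cost + link_cost, nbr, nbr if node == root else first_hop))
    return tree


def attach_leaves(tree, reachability):
    """Phase 2: hang prefixes (leaves) off the IS tree without rerunning Dijkstra.

    reachability maps IS -> {prefix: metric advertised by that IS}.
    """
    routes = {}
    for node, prefixes in reachability.items():
        if node not in tree:
            continue  # advertising IS is unreachable
        node_cost, first_hop = tree[node]
        for prefix, metric in prefixes.items():
            total = node_cost + metric
            if prefix not in routes or total < routes[prefix][0]:
                routes[prefix] = (total, first_hop)
    return routes


adjacency = {"A": {"B": 10, "C": 1}, "B": {"A": 10, "D": 1},
             "C": {"A": 1, "D": 2}, "D": {"B": 1, "C": 2}}
tree = spf(adjacency, "A")                       # rerun only on connectivity changes
reach = {"B": {"10.0.1.0/24": 5}, "D": {"10.0.1.0/24": 20, "10.0.2.0/24": 1}}
print(attach_leaves(tree, reach))                # rerun alone when prefixes change
```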


Routing Protocol Overview


– Interior Gateway Protocols (IGP) control routing within an AS/domain

– Exterior Gateway Protocol(s) (EGP) control routing between AS’s

• Different goals and constraints for each family of protocols

– IGP: Ability to fine tune internal operation and shielding from outside “noise”

– EGP: Scalability and ability to

[Figure: the Internet as a set of interconnected autonomous systems (AS 1, AS 2, AS 3, AS 121, AS 123, AS 168, AS 321, AS 376, AS 441, AS 524, AS 3411)]

Growing Up From One to Many Domains

• Goal
– Enable connectivity between domains (Internet-wide)

• Requirements
– Operational flexibility and scalability, and scalability, and scalability,…
- Autonomous systems are typically operated by different administrative entities
- Cooperation but no “trust” between domains

• Border Gateway Protocol (BGP4) is the dominant (only!) Exterior Gateway Protocol (EGP)

BGP Routing Table Growth

From http://bgp.potaroo.net

Telstra’s table (AS 1221)


Some Basic Remarks Before Jumping Into BGP

• A link state type of approach would simply not work
– Requires building and maintaining a map of the entire Internet in every router...
– The need for consistent information and decisions cannot be satisfied as the network size grows
- Things are always changing somewhere in the Internet

• Distance vector protocols are the only realistic option
– Better scalability by limiting the level of topology information that each router maintains
– Preserve the ability to use different route selection criteria at each router
- No need for consistent metrics
- Seamless support for policies
– Control of what routing information is sent to whom

Border Gateway Protocol

• DV protocol for inter-domain routing
– Supports arbitrary topology (but no overlapping domains)
– Governs exchange of information between internal and external border routers (BGP peers)
- Internal peers: within the same domain
- External peers: in two adjacent domains
- Each domain is characterized by a unique autonomous system number

• Major BGP characteristics
– Selection of the “best” path (avoid stupid choices and support strong administrative control)
- Multiple path attributes
– Loop avoidance (path vectors)
– Scalability through route aggregation

• BGP as a protocol is relatively simple (86 pages for the latest draft vs. 244 for RFC 2328, the OSPFv2 specification), but its configuration can be complex and errors can have far-reaching implications

BGP Operation Overview

Three major phases:
– Neighbor acquisition and reachability, exchange of routing information, and path selection (steady state)

1. Neighbor acquisition and reachability
– Initiated through an OPEN message and maintained by KEEPALIVE messages
– Neighbor declared unreachable if no KEEPALIVE (or UPDATE) is received within the Hold Time

2. Routing information exchanged through UPDATE messages
– Incremental updates to advertise & withdraw routes
- Requires reliable transmission (uses TCP, port 179)

3. Path selection uses the information received in UPDATE messages to select the best path for a route and construct the routing table

The BGP State Machine

[Figure: BGP finite state machine with states Idle, Connect, Active, OpenSent, OpenConfirm and Established]

“Normal” sequence: Idle → Connect → OpenSent → OpenConfirm → Established
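A table-driven sketch of the state machine makes the “normal” sequence and a few common detours explicit. This is a deliberate simplification in Python: the event names are informal labels rather than the event list of RFC 4271, and every unlisted event (hold timer expiry, NOTIFICATION, and so on) simply drops the session back to Idle.

```python
# Simplified BGP session FSM covering the "normal" sequence plus a few detours.
# Event names are informal labels, not the event list of RFC 4271.
TRANSITIONS = {
    ("Idle", "start"): "Connect",
    ("Connect", "tcp_established"): "OpenSent",       # send OPEN
    ("Connect", "tcp_failed"): "Active",              # wait, listen for the peer
    ("Active", "tcp_established"): "OpenSent",        # send OPEN
    ("Active", "retry_timer_expired"): "Connect",     # try connecting again
    ("OpenSent", "open_received"): "OpenConfirm",     # reply with KEEPALIVE
    ("OpenConfirm", "keepalive_received"): "Established",
    ("Established", "keepalive_received"): "Established",
    ("Established", "update_received"): "Established",
}


def run(events, state="Idle"):
    """Apply events in order and return the states visited.

    Any event not listed (hold timer expiry, NOTIFICATION, ...) sends the
    session back to Idle in this sketch.
    """
    trace = [state]
    for event in events:
        state = TRANSITIONS.get((state, event), "Idle")
        trace.append(state)
    return trace


print(" -> ".join(run(["start", "tcp_established", "open_received", "keepalive_received"])))
```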

BGP Information Flow and Sources

• Different peering sessions with internal (same AS) and external (different AS) neighbors
– External BGP neighbors communicate via eBGP
– Internal BGP neighbors communicate via iBGP
- All BGP peers in an AS are typically connected in a full mesh (more on this later)

[Figure: AS 1 (Routers A1, B1), AS 2 (Routers A2, B2, C2, D2) and AS 3 (Routers A3, B3), with iBGP sessions inside each AS and eBGP sessions between ASes]

BGP Processing Steps

• Phase 1: Determines degree of preference

• Phase 2: Select best routes to install in the Local RIB

• Phase 3: Determine which routes to advertise based on policies

[Figure: at Router D2, per-peer RIB_In tables (Routers A2, B2, C2, A3, B3) feed the Local RIB, which feeds per-peer RIB_Out tables; both iBGP and eBGP peers are shown]

BGP UPDATE Message

• UPDATE message is the basic unit of route advertisement

– Can contain multiple routes that are being withdrawn

– Path Attributes describe a number of key properties of the advertised route that are used to select the best path

– NLRI is a list of IP address prefixes associated with a given BGP route (common set of Path Attributes)

Unfeasible Route Length (2 bytes)

Withdrawn Routes (variable)

Total Path Attribute Length (2 bytes)

Path Attributes (variable)

Network Layer Reachability Information (NLRI) (variable)
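The layout above can be turned into bytes with a short sketch. It assumes IPv4 prefixes and takes the Path Attributes as an already-encoded byte string, since attribute encoding is not shown here. An UPDATE that announces NLRI would also need the mandatory ORIGIN, AS_PATH and NEXT_HOP attributes. The example therefore builds a pure withdrawal, which needs neither attributes nor NLRI. The 19-byte header (16-byte all-ones marker, 2-byte length, 1-byte type) is the standard BGP-4 message header.

```python
import ipaddress
import struct

MARKER = b"\xff" * 16          # 16-byte all-ones marker at the start of every message
MSG_UPDATE = 2                 # OPEN=1, UPDATE=2, NOTIFICATION=3, KEEPALIVE=4
HEADER_LEN = 19                # marker + 2-byte length + 1-byte type


def encode_prefix(prefix: str) -> bytes:
    """Withdrawn-route / NLRI encoding: 1-byte length in bits, then only the needed octets."""
    net = ipaddress.IPv4Network(prefix)
    return bytes([net.prefixlen]) + net.network_address.packed[:(net.prefixlen + 7) // 8]


def build_update(withdrawn, path_attributes: bytes, nlri) -> bytes:
    withdrawn_bytes = b"".join(encode_prefix(p) for p in withdrawn)
    nlri_bytes = b"".join(encode_prefix(p) for p in nlri)
    body = (struct.pack("!H", len(withdrawn_bytes)) + withdrawn_bytes
            + struct.pack("!H", len(path_attributes)) + path_attributes
            + nlri_bytes)
    return MARKER + struct.pack("!HB", HEADER_LEN + len(body), MSG_UPDATE) + body


# A pure withdrawal of 10.1.0.0/16: no path attributes and no NLRI are needed.
msg = build_update(["10.1.0.0/16"], b"", [])
print(len(msg), msg[HEADER_LEN:].hex())  # 26 0003100a010000
```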


Path Attributes: General Characteristics

• Several categories of attributes
– Optional or well-known, mandatory or discretionary, transitive or not, partial or not

• Well-known attributes must be recognized by all BGP implementations
– Mandatory well-known attributes must be included in every UPDATE message that carries NLRI, while discretionary well-known attributes may or may not be sent based on the content of the message
– Well-known attributes MUST be passed along (after updating) to other BGP peers

• Optional attributes need not be recognized by all BGP implementations
– Unrecognized transitive attributes SHOULD be passed to other BGP peers with the partial bit set
– Unrecognized non-transitive attributes are ignored
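These categories map onto bits of the attribute flags octet carried with every path attribute. The Python sketch below shows how a speaker might treat an attribute type it does not recognize. The bit values are the BGP-4 ones. Error handling for an unknown well-known attribute is reduced to raising an exception, whereas a real speaker would send a NOTIFICATION.

```python
# Bits of the attribute flags octet in a path attribute header.
OPTIONAL, TRANSITIVE, PARTIAL, EXTENDED_LENGTH = 0x80, 0x40, 0x20, 0x10


def handle_unrecognized(flags: int):
    """Decide what to do with a path attribute this speaker does not recognize.

    Returns the flags to re-advertise it with, or None to drop it.
    """
    if not flags & OPTIONAL:
        # Every implementation must know the well-known attributes; getting an
        # unknown one is an error (answered with a NOTIFICATION, not modeled).
        raise ValueError("unrecognized well-known attribute")
    if flags & TRANSITIVE:
        return flags | PARTIAL  # pass it along, marked as partial
    return None                 # optional non-transitive: ignore it


print(hex(handle_unrecognized(OPTIONAL | TRANSITIVE)))  # 0xe0
```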

Path Attributes (1)

• AS_PATH
– Well-known, mandatory
– Sequence of path segments of type AS_SET (1) or AS_SEQUENCE (2)
- AS_SET: Unordered list of autonomous systems traversed by the route
- AS_SEQUENCE: Ordered list of autonomous systems traversed by the route
– Updated by “pre-pending” own AS number when advertising to a BGP speaker in another AS
- Loop prevention

• NEXT_HOP
– Well-known, mandatory
– IP address of the border router to be used as next hop towards the destination identified by the NLRI field
– Typically chosen to ensure that the “shortest” path is taken

• ORIGIN
– Well-known, mandatory
– Characterizes where the path first originated
- IGP: 0; EGP: 1; Other (INCOMPLETE): 2
– Should not be changed by other BGP speakers
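The three AS_PATH operations used throughout (prepending, loop detection, and path length with AS_SET counting as one) fit in a few functions. In this Python sketch an AS_PATH is modeled as a list of (segment type, [AS numbers]) pairs. It ignores the 255-AS limit per segment and 4-byte AS number handling.

```python
AS_SET, AS_SEQUENCE = 1, 2   # path segment type codes


def prepend(as_path, my_as):
    """Prepend own AS before advertising to a speaker in another AS.

    as_path is a list of (segment type, [AS numbers]); the input is not modified.
    """
    if as_path and as_path[0][0] == AS_SEQUENCE:
        return [(AS_SEQUENCE, [my_as] + as_path[0][1])] + as_path[1:]
    return [(AS_SEQUENCE, [my_as])] + as_path


def has_loop(as_path, my_as):
    """Loop prevention: drop a route whose AS_PATH already contains our AS."""
    return any(my_as in asns for _, asns in as_path)


def path_length(as_path):
    """Length used in route selection: an AS_SET counts as one, whatever its size."""
    return sum(1 if seg_type == AS_SET else len(asns) for seg_type, asns in as_path)


path = prepend(prepend([], 1), 2)   # originated in AS 1, then passed through AS 2
print(path, has_loop(path, 1), path_length(path))  # [(2, [2, 1])] True 2
```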

Path Attributes (2)

• LOCAL_PREF
– Well-known, discretionary
– Advertisement to other BGP speakers in the same AS (iBGP) of the degree of preference of a route by the advertising router (higher value is preferred)

• MULTI_EXIT_DISC (MED)
– Optional, non-transitive
– Used to give some preference to different exit/entry points in a neighboring AS (lower value is preferred)

• COMMUNITY
– Optional, transitive, used to simplify routing policies
- Common property used to determine which routes to accept, prefer, and pass to BGP neighbors
– Some well-known communities:
- NO_EXPORT: do not advertise outside of the AS (or confederation)
- NO_EXPORT_SUBCONFED: do not advertise to external peers (including peers in other autonomous systems within a confederation)
- NO_ADVERTISE: do not advertise to any BGP peer
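The well-known communities translate into a simple per-peer filter. The Python sketch below uses the standard 32-bit values of NO_EXPORT, NO_ADVERTISE and NO_EXPORT_SUBCONFED. It packs an ordinary community as ASN:value, with the ASN in the first two bytes. The peer classification strings and the ASN 65000 are hypothetical.

```python
NO_EXPORT = 0xFFFFFF01
NO_ADVERTISE = 0xFFFFFF02
NO_EXPORT_SUBCONFED = 0xFFFFFF03


def community(asn: int, value: int) -> int:
    """Pack ASN:value into one 32-bit community (ASN in the first two bytes)."""
    return (asn << 16) | value


def may_advertise(communities, peer_kind):
    """peer_kind: 'ibgp', 'confed' (another member AS of our confederation) or 'ebgp'."""
    if NO_ADVERTISE in communities:
        return False
    if NO_EXPORT in communities and peer_kind == "ebgp":
        return False
    if NO_EXPORT_SUBCONFED in communities and peer_kind in ("ebgp", "confed"):
        return False
    return True


tags = {community(65000, 445), NO_EXPORT}       # hypothetical ASN 65000
print(may_advertise(tags, "confed"), may_advertise(tags, "ebgp"))  # True False
```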

Path Attributes (3)

• AGGREGATOR
– Optional, transitive
– Contains the IP address and AS number of the BGP speaker that formed the aggregate route

• ATOMIC_AGGREGATE
– Well-known, discretionary (should be propagated)
– Informs other BGP speakers that the advertiser aggregated several routes and may have removed some autonomous system numbers from the AS_SET (the loop-free property must be maintained, though)
- As a result, the actual path may differ from AS_PATH
- Basically used to signal a possible loss of information
– NLRI field must not be modified by adding a more specific prefix, i.e., the route must not be de-aggregated (loop prevention)

Path Attributes (4)

• ORIGINATOR_ID
– Optional, non-transitive
– Used by Route Reflectors (more on this later)
– Identifies the local router (within the local AS) that originally advertised the route

• CLUSTER_LIST
– Optional, non-transitive
– Used by Route Reflectors to detect looping of routing information in an AS because of misconfiguration
- Each Route Reflector prepends its CLUSTER_ID to the CLUSTER_LIST
- Route Reflectors ignore advertisements that carry their CLUSTER_ID in the CLUSTER_LIST

BGP Decision Process

• Three-phase process
– Phase 1: Calculates a “degree of preference” for each route in a given RIB_In (locks the associated RIB_In)
- If the route is learned from an internal peer, the LOCAL_PREF attribute is usually taken as the degree of preference
- If the route is learned from an external peer, the degree of preference is computed based on local policy
- The resulting value is used as LOCAL_PREF in any iBGP re-advertisement
– Phase 2: Selects the “best” route out of all those available for each distinct destination (locks all RIB_Ins)
- Excludes routes with an unresolvable NEXT_HOP or a loop in the AS_PATH attribute
- Best routes are installed in the Local RIB
– Phase 3: Decides, based on policies, which routes in the Local RIB to advertise to which peer (blocks execution of Phase 2)
- Route aggregation can be performed at this stage

BGP Tie Breaking Rules

• BGP selects a SINGLE route
– Remove all routes that don’t have the smallest number of AS numbers in AS_PATH (each AS_SET counts only as one!)
– Remove all routes that don’t have the lowest ORIGIN value
– Among routes learned from the same neighboring AS, remove routes with less desirable (higher) MED values
– If at least one route was learned through eBGP, remove all routes learned through iBGP
– Remove all routes with a non-minimum IGP cost to NEXT_HOP
– Remove all routes that were not advertised by the BGP speaker with the lowest BGP identifier
– Prefer the route received from the lowest peer address
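The elimination steps can be sketched as a chain of filters. This Python sketch assumes the degree of preference from Phase 1 (LOCAL_PREF) is applied first, as in the decision process described earlier, and then follows the tie-breaking list in order. Real implementations add further steps and configurable MED behavior. The `Route` record, its fields and the sample values are hypothetical.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Route:
    name: str
    local_pref: int      # degree of preference from Phase 1
    as_path_len: int     # each AS_SET already counted as one
    origin: int          # 0 = IGP, 1 = EGP, 2 = INCOMPLETE
    neighbor_as: int     # AS the route was learned from
    med: int
    ebgp: bool
    igp_cost: int        # IGP cost to NEXT_HOP
    bgp_id: str          # BGP identifier of the advertising speaker
    peer_addr: str


def keep_min(routes, key):
    best = min(key(r) for r in routes)
    return [r for r in routes if key(r) == best]


def select_best(routes):
    rs = keep_min(routes, lambda r: -r.local_pref)  # highest preference first
    rs = keep_min(rs, lambda r: r.as_path_len)
    rs = keep_min(rs, lambda r: r.origin)
    # MED is compared only among routes learned from the same neighboring AS.
    lowest_med = {}
    for r in rs:
        lowest_med[r.neighbor_as] = min(r.med, lowest_med.get(r.neighbor_as, r.med))
    rs = [r for r in rs if r.med == lowest_med[r.neighbor_as]]
    if any(r.ebgp for r in rs):
        rs = [r for r in rs if r.ebgp]
    rs = keep_min(rs, lambda r: r.igp_cost)
    rs = keep_min(rs, lambda r: ipaddress.IPv4Address(r.bgp_id))
    return min(rs, key=lambda r: ipaddress.IPv4Address(r.peer_addr))


# Route r as seen by Router C2: MED 0 via A2 and MED 50 via B2, both from AS 1.
via_a2 = Route("via A2", 100, 1, 0, 1, 0, False, 10, "10.0.0.2", "10.0.0.2")
via_b2 = Route("via B2", 100, 1, 0, 1, 50, False, 5, "10.0.0.3", "10.0.0.3")
print(select_best([via_a2, via_b2]).name)  # via A2: lower MED beats lower IGP cost
```

Comparing identifiers as `IPv4Address` objects rather than strings avoids a lexicographic ordering bug (for example, "10.0.0.9" would otherwise sort after "10.0.0.10").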

Using LOCAL_PREF to Pick an Exit Point

• Choosing between a primary and a backup provider
– Used to influence internal decisions

[Figure: AS 1, AS 2, AS 3, AS 10 and AS 11 with a primary and a backup provider link; routes are assigned LOCAL_PREF=100 (primary) and LOCAL_PREF=20 (backup)]

AS_PATH Padding to Discourage the Use of Certain Links (A Hack!)

• Used externally to influence the choice of inbound links
– Choosing between a primary and a backup link
– Tuning inbound traffic for load-balancing purposes

• Can be overridden by local decisions (LOCAL_PREF)

[Figure: AS 1 (prefixes 1.2.0.0/16 and 1.3.0.0/16) connected to AS 10 over two links; over one link it advertises 1.2.0.0/16; <AS1> and 1.3.0.0/16; <AS1,AS1>, over the other 1.2.0.0/16; <AS1,AS1> and 1.3.0.0/16; <AS1>]

Another Way to Influence Entry Points

• MED allows a crude selection ability
– Avoid low-speed internal links

• But not always taken into account

[Figure: AS 1, AS 111 and AS 55; prefixes 19.2.1.0/24 and 19.2.2.0/24 sit on either side of a low-speed RF link; one border link advertises 19.2.1.0/24 with MED 5 and 19.2.2.0/24 with MED 100, the other 19.2.1.0/24 with MED 100 and 19.2.2.0/24 with MED 5]

Ignoring MED Values

• Hot potato routing
– Basic rule between ISPs
– “I won’t carry your bits for you…”

[Figure: MCI and AT&T networks interconnected at multiple points, with AT&T customers and MCI customers attached]

Propagating Path Attributes (1)

[Figure: the AS 1 / AS 2 / AS 3 topology shown earlier, with routes r and r’ located in AS 1]

• Let us follow UPDATEs for routes r and r’ located in AS 1

• Router A1 originates updates for routes r and r’ and advertises them over its eBGP session to Router A2
– ORIGIN is set to 0 as routes r and r’ were learned through IGP
– AS_PATH type set to AS_SEQUENCE and initialized with AS 1
– Router A1 sets NEXT_HOP to be the IP address of its interface on the link to Router A2
– MED values of 0 and 50 for routes r and r’, respectively, as Router A1 is the desired entry point for r but not r’ (Router B1 will use MED values of 50 and 0 when advertising routes r and r’ to Router B2)

Propagating Path Attributes (2)

[Figure: same AS 1 / AS 2 / AS 3 topology, with routes r and r’ in AS 1]

• Router A2 processes the updates it received from Router A1 for routes r and r’ and decides to advertise them over its iBGP sessions to Routers B2, C2 and D2
– ORIGIN is kept unchanged
– AS_PATH is propagated unchanged
– Router A2 has been configured with NEXT_HOP self, so it sets NEXT_HOP to be its own IP address
– MED values are propagated unchanged
– Router A2 sets LOCAL_PREF for r and r’ to 50 and 20, respectively (Router B2 advertises both as 50 – more on this later)

Propagating Path Attributes (3)

[Figure: same AS 1 / AS 2 / AS 3 topology, with routes r and r’ in AS 1]

• Router D2 processes updates received from Routers A2 and B2 for routes r and r’ and advertises a single UPDATE for the aggregate route r* over its eBGP session to Router A3
– ORIGIN is kept unchanged
– Router D2 generates new AS_PATH attributes for r and r’ by pre-pending AS 2 to the AS_PATH (value is now <AS2,AS1>), and because both AS_PATH attributes are identical, the AS_PATH of r* is set to the same value and type
– Router D2 adds an AGGREGATOR attribute <AS 2; own IP address> but no ATOMIC_AGGREGATE attribute, as there was no information loss
– Router D2 sets NEXT_HOP to its own IP address
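The per-hop rules from this walk-through can be condensed into two functions, one for eBGP and one for iBGP advertisements. This Python sketch follows route r from A1 to A3. It leaves out the aggregation step at D2 (and so the AGGREGATOR attribute). It models AS_PATH as a single AS_SEQUENCE, and the IP addresses are hypothetical documentation addresses. Dropping a received MED on eBGP follows the rule that a MED from one neighboring AS is not passed on to another.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Attrs:
    origin: int                     # 0 = IGP
    as_path: Tuple[int, ...]        # one AS_SEQUENCE, most recent AS first
    next_hop: str
    med: Optional[int] = None
    local_pref: Optional[int] = None


def to_ebgp(attrs, my_as, my_addr, med=None):
    """eBGP: prepend own AS, NEXT_HOP = own address, no LOCAL_PREF.

    A MED received from one neighboring AS is not passed on to another, so the
    only MED sent is the one this speaker chooses to set (or none).
    """
    return replace(attrs, as_path=(my_as,) + attrs.as_path, next_hop=my_addr,
                   med=med, local_pref=None)


def to_ibgp(attrs, my_addr, local_pref, next_hop_self=False):
    """iBGP: ORIGIN, AS_PATH and MED unchanged; LOCAL_PREF set by local policy."""
    return replace(attrs, local_pref=local_pref,
                   next_hop=my_addr if next_hop_self else attrs.next_hop)


# Route r as received by A2, then D2, then A3 (addresses are hypothetical).
r_at_a1 = Attrs(origin=0, as_path=(), next_hop="")
r_at_a2 = to_ebgp(r_at_a1, my_as=1, my_addr="192.0.2.1", med=0)
r_at_d2 = to_ibgp(r_at_a2, my_addr="198.51.100.2", local_pref=50, next_hop_self=True)
r_at_a3 = to_ebgp(r_at_d2, my_as=2, my_addr="203.0.113.4")
print(r_at_a3)
```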

Decision Process Example (1)

[Figure: same AS 1 / AS 2 / AS 3 topology, with routes r and r’ in AS 1]

• In AS 1, both routes r and r’ are learned from IGP

• In AS 2, routers hear about r and r’ from Router A2 and Router B2, and both routes have the same AS_PATH count and ORIGIN value
– For routes r and r’, Router A1 advertises MED values of 0 and 50, and Router B1 advertises MED values of 50 and 0
– If LOCAL_PREF values are equal, Routers C2 and D2 in AS 2 rely on MED values and pick Router A2 as the NEXT_HOP for r and Router B2 as the NEXT_HOP for r’ (Routers A2 and B2 pick Routers A1 and B1, respectively)

• In AS 3, Router A3 will pick Router D2 (eBGP from Router D2 vs. iBGP from Router B3); Router B3 will pick Router C2 (smaller BGP ID); other BGP speakers pick Routers A3 or B3 based on IGP cost

Decision Process Example (1’)

[Figure: same AS 1 / AS 2 / AS 3 topology, with routes r and r’ in AS 1]

• In AS 2, routers hear about r and r’ from Router A1 and Router B1, and both routes have the same AS_PATH count and ORIGIN value, but different MED values
– For routes r and r’, Router A1 advertises MED values of 0 and 50, and Router B1 advertises MED values of 50 and 0

• For routes r and r’, Router A2 advertises LOCAL_PREF values of 50 and 20, while Router B2 advertises 50 for both
– Routers C2 and D2 pick Router B2 for r’, and select either Router A2 or Router B2 for r based on their IGP cost (MED is ignored)

Another Aggregation Example

• Routes r and r’ are aggregated into route r* by Router R when advertised into AS 8
– AS_PATH attribute type changed to AS_SET
– Unordered list of ASes <AS 1; AS 2; AS 3; AS 4; AS 5; AS 6; AS 7>
– May omit some AS numbers if there is no risk of loop, e.g., advertise AS_SET <AS 1; AS 2; AS 3; AS 7>
- ATOMIC_AGGREGATE attribute is added
- Actual path need not follow AS_PATH

[Figure: routes r and r’ reached through AS 1 to AS 7; Router R aggregates them into r* advertised into AS 8]

De-Aggregation and Loops

[Figure: ASes 1 to 6 with route r/16 and its more specific route r’/24 (r’ is contained in r); advertisements shown include r’; <AS1,AS2,AS3,AS4>, r; <AS5>, r; <AS5,AS6> and r’; <AS5,AS6>. Illegal de-aggregation creates a routing loop for packets destined for route r’]

Policies – One Example

• Transit (customer) vs. non-transit (peer) agreements between providers (routing domains)
– In a transit agreement, I will accept traffic from you that is intended for any destination
– In a non-transit agreement, I will only accept traffic from you that is destined to my customers

• Associated routing policies
– I advertise to you all routes I can reach and for which I am willing to carry your traffic
– I only advertise to you routes to my own customers
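The two routing policies reduce to a one-line export rule keyed on where a route was learned and who it is being sent to. This is a minimal Python sketch. The relationship labels are hypothetical names. Treating providers like peers (only own and customer routes) goes beyond what the slide states and is an assumption.

```python
# Business relationship with a neighbor, seen from this AS.
CUSTOMER, PEER, PROVIDER = "customer", "peer", "provider"


def should_export(learned_from, send_to):
    """Should a route learned from a `learned_from` neighbor be advertised to `send_to`?

    learned_from is None for routes this AS originates itself.
    - Transit customers get every route we can reach (we carry their traffic anywhere).
    - Non-transit peers get only our own and our customers' routes.
    Treating providers like peers is an assumption beyond the slide.
    """
    if send_to == CUSTOMER:
        return True
    return learned_from in (None, CUSTOMER)


print(should_export(PEER, CUSTOMER), should_export(PEER, PEER))  # True False
```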

Controlling Route Advertisements Through Policies

[Figure: ASes 1 to 7, with default routes (0.0.0.0/0) advertised over some links and specific routes (AS 1, AS 6) over others]

Controlling Route Advertisements Through Communities

• COMMUNITY attribute
– First two bytes carry the ASN and the last two bytes carry community values used for local policy routing
– 444: I2 routes; 445: Univ. X; 446: UUNET; 447: Co. X Research

[Figure: a GigaPOP connecting Internet 2 (444), Univ. X (445), UUNET, Company X Research (447), Company X Corporate, Company Y and Univ. Y; each link is labeled with the community values whose routes are advertised over it (e.g., 444; 445; 446; 447, or 445; 447, or 445)]

Enhancing BGP Scalability

• What is wrong with this picture?

• The need for an iBGP mesh creates many problems
– N-1 TCP connections at every router
– Every new router requires configuration updates at all other routers
– Every router maintains N-1 RIB_Ins and RIB_Outs
– Every change at one router needs to be processed by all other routers

• Solutions
– Break it up into smaller pieces
- Route Reflectors
- Confederations
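The cost of the full mesh is easy to quantify: N routers with N-1 sessions each give N(N-1)/2 sessions in total. The Python sketch compares this with one hypothetical route reflector layout: all RRs fully meshed with each other, and every client peering with each RR of its own cluster. Other designs give other counts.

```python
def full_mesh_sessions(n: int) -> int:
    """Every iBGP speaker peers with the other n - 1; each session joins two routers."""
    return n * (n - 1) // 2


def rr_sessions(clusters: int, clients_per_cluster: int, rrs_per_cluster: int = 1) -> int:
    """One hypothetical route reflector design: all RRs fully meshed with each
    other, and every client peering with each RR of its own cluster."""
    rrs = clusters * rrs_per_cluster
    return full_mesh_sessions(rrs) + clusters * clients_per_cluster * rrs_per_cluster


print(full_mesh_sessions(100))  # 4950
print(rr_sessions(4, 23, 2))    # 100 routers: 8 RRs + 92 clients -> 28 + 184 = 212
```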

Route Reflector

• Simple solution, compatible with current BGP operation, and supports easy migration
– Some BGP speakers, Route Reflectors (RRs), can redistribute to iBGP peers routes learned from other iBGP peers

• Route Reflectors have two types of iBGP peers
– Client peers and non-client peers
- Non-client peers must be fully meshed, but client peers need not be
– An RR and its clients form a cluster identified by a CLUSTER_ID
- Multiple RRs are allowed in a cluster (redundancy)

• Two attributes: ORIGINATOR_ID and CLUSTER_LIST
– RR sets ORIGINATOR_ID to be the ROUTER_ID of the router that originated the route
- Routers ignore routes with ORIGINATOR_ID equal to their ROUTER_ID
– RR prepends the local CLUSTER_ID to the CLUSTER_LIST when reflecting a route
- Used to detect looping of routing information
- Routes with the local CLUSTER_ID in the CLUSTER_LIST are ignored

Route Reflector Operation

• When an RR receives a route from an iBGP peer:

– Selects the best path based on its path selection rule

– If the best path is from a non-client peer, reflect to all clients

– If the best path is from a client peer, reflect to all client and non-client peers

• Note that path selection need not be identical to that of a full iBGP mesh.
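The reflection rules and the two loop checks can be sketched together. In this Python sketch peers are identified by hypothetical router-ID strings. Best-path selection is assumed to have already happened. `accepts` models the receiver-side ORIGINATOR_ID check. Not reflecting a route back to the peer it came from is also an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Route:
    prefix: str
    originator_id: Optional[str] = None
    cluster_list: Tuple[str, ...] = ()


def reflect(route, learned_from, cluster_id, clients, non_clients):
    """Return {peer: reflected route} for a best path learned from iBGP peer `learned_from`.

    Peers are identified by router ID. Returns {} if the route has already
    passed through this cluster (CLUSTER_LIST loop check).
    """
    if cluster_id in route.cluster_list:
        return {}
    if learned_from in clients:
        targets = (clients | non_clients) - {learned_from}  # client -> everyone
    else:
        targets = set(clients)                              # non-client -> clients only
    reflected = Route(route.prefix,
                      route.originator_id or learned_from,  # set once, never overwritten
                      (cluster_id,) + route.cluster_list)   # prepend own CLUSTER_ID
    return {peer: reflected for peer in targets}


def accepts(route, my_router_id):
    """Receiver-side check: ignore routes that we originated ourselves."""
    return route.originator_id != my_router_id


clients, non_clients = {"C1", "C2"}, {"N1"}
out = reflect(Route("10.0.0.0/8"), "C1", "0.0.0.1", clients, non_clients)
print(sorted(out))  # ['C2', 'N1']
```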


Confederations

• Basic principle
– Break up one big autonomous system into smaller internal autonomous systems

• But this arrangement increases:
– Complexity of routing policy based on AS_PATH information
– External overhead when internal topology changes

• Autonomous system confederation
– Collection of autonomous systems advertised as a single autonomous system to BGP speakers outside of the confederation
- Confederation is identified externally by a single autonomous system confederation identifier
- Each member of the confederation is given a member autonomous system number that is used only inside the confederation
– Two additional AS_PATH segment types:
- AS_CONFED_SEQUENCE: Ordered set of member autonomous system numbers that an UPDATE message has traversed inside the confederation
- AS_CONFED_SET: Unordered set of member autonomous system numbers

Confederation Operation

• AS_PATH update rules:
– Different handling of speakers in ASes inside and outside the confederation
– Basically hide the confederation structure when advertising AS_PATH to the outside, and otherwise follow essentially the same update rules

• Within a confederation
– NEXT_HOP, MED and LOCAL_PREF can be advertised unchanged to neighboring member ASes

[Figure: confederation AS 1 made up of member ASes 111, 112, 113 and 114]
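Hiding the confederation structure amounts to two AS_PATH rules. Inside the confederation, member AS numbers go into a leading AS_CONFED_SEQUENCE. When the route leaves, those segments are stripped and the confederation identifier is prepended like any AS number. In this Python sketch the segment type codes are the standard ones (AS_CONFED_SEQUENCE = 3, AS_CONFED_SET = 4). Treating AS 1 from the figure as the confederation identifier, and the external origin AS 5, are assumptions made for illustration.

```python
AS_SET, AS_SEQUENCE, AS_CONFED_SEQUENCE, AS_CONFED_SET = 1, 2, 3, 4


def to_member_as(as_path, member_as):
    """Advertising to a neighboring member AS: record our member AS number in a
    leading AS_CONFED_SEQUENCE segment."""
    if as_path and as_path[0][0] == AS_CONFED_SEQUENCE:
        return [(AS_CONFED_SEQUENCE, [member_as] + as_path[0][1])] + as_path[1:]
    return [(AS_CONFED_SEQUENCE, [member_as])] + as_path


def to_outside(as_path, confed_id):
    """Leaving the confederation: drop the confederation segments, then prepend
    the confederation identifier like any ordinary AS number."""
    outside = [seg for seg in as_path if seg[0] not in (AS_CONFED_SEQUENCE, AS_CONFED_SET)]
    if outside and outside[0][0] == AS_SEQUENCE:
        return [(AS_SEQUENCE, [confed_id] + outside[0][1])] + outside[1:]
    return [(AS_SEQUENCE, [confed_id])] + outside


# A route from external AS 5 enters member AS 111, goes via 112 to 114, then leaves.
path = [(AS_SEQUENCE, [5])]
path = to_member_as(path, 111)   # 111 -> 112
path = to_member_as(path, 112)   # 112 -> 114
print(path)                      # [(3, [112, 111]), (2, [5])]
print(to_outside(path, 1))       # [(2, [1, 5])]
```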

From BGP to Packet Forwarding Decisions

• Recursive lookup at Router 1.1.1.1
– BGP routing table identifies Router 1.1.5.1 as the NEXT_HOP for route r
– IGP routing table identifies interface 10.2.1.1 on Router 1.1.2.1 as the next hop towards Router 1.1.5.1
– Forwarding table entry for route r points to 10.2.1.1 on Router 1.1.2.1 as the next hop

[Figure: AS 1, AS 2 and AS 3; Router 1.1.1.1 and Router 1.1.5.1 (the exit towards route r) are iBGP peers, connected over the IGP through Routers 1.1.2.1 (interface 10.2.1.1), 1.1.3.1 and 1.1.4.1]
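The recursive lookup is two longest-prefix matches in a row: BGP maps the destination to an exit router, and the IGP maps the exit router to a forwarding next hop. In this Python sketch route r is a hypothetical 203.0.113.0/24, while the router and interface addresses follow the slide. The linear scan is for clarity only; real forwarding tables use tries or similar structures.

```python
import ipaddress


def longest_match(table, address):
    """Longest-prefix match over a {prefix string: value} table."""
    ip = ipaddress.ip_address(address)
    best_len, best_value = -1, None
    for prefix, value in table.items():
        net = ipaddress.ip_network(prefix)
        if ip in net and net.prefixlen > best_len:
            best_len, best_value = net.prefixlen, value
    return best_value


def resolve(destination, bgp_table, igp_table):
    """Two lookups: BGP gives the exit router (NEXT_HOP), the IGP says how to reach it."""
    bgp_next_hop = longest_match(bgp_table, destination)
    if bgp_next_hop is None:
        return None
    return longest_match(igp_table, bgp_next_hop)  # None: unresolvable NEXT_HOP


bgp_table = {"203.0.113.0/24": "1.1.5.1"}   # route r (prefix is hypothetical)
igp_table = {"1.1.5.1/32": "10.2.1.1"}      # towards Router 1.1.5.1 via 10.2.1.1
print(resolve("203.0.113.7", bgp_table, igp_table))  # 10.2.1.1
```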

End-to-End Connectivity: Gluing BGP and IGP Decisions Together

• Two cases
1. All routers are BGP speakers (BGP mesh, common in ISPs)
2. Some internal routers do not speak BGP

• Case 1: BGP mesh
– Forwarding table can be constructed simply based on recursive lookup
- IGP provides connectivity between routers
- BGP associates routes to routers

• Case 2: Mix of BGP speakers and IGP-only routers
– BGP speakers participate in IGP
– BGP speakers “export” routes into IGP
- Example of OSPF ASBRs

From Routing Table to Forwarding Table

• OK, we got to Router 1.1.2.1. Where to next?
– Case 1: BGP full or partial mesh
- Routers 1.1.2.1, 1.1.3.1, 1.1.4.1 also participate in iBGP
– Partial mesh means that only those routers on the path between 1.1.1.1 and 1.1.5.1 need to participate in BGP
– Dangerous (why?) but not uncommon (why?)
- They all know that 1.1.5.1 is the desired exit point and can forward packets

[Figure: same topology as on the previous slide]

From Routing Table to Forwarding Table

• OK, we got to Router 1.1.2.1. Where to next?

– Case 2: BGP routes imported into IGP, e.g., OSPF

- Routers 1.1.1.1 and 1.1.5.1 are ASBRs.

- Router 1.1.5.1 advertises a type 1/2 external route r.

- Routers 1.1.2.1, 1.1.3.1, and 1.1.4.1 learn about r through a type 5 External LSA advertised by 1.1.5.1.

- Router 1.1.1.1 learns about r through both BGP and OSPF (consistency, precedence?)

[Figure: AS 1, AS 2, AS 3; Routers 1.1.1.1 through 1.1.5.1; route r; interface 10.2.1.1; type 5 LSA "T5: < r >" from 1.1.5.1]


Forwarding Table Challenges

• With today’s CPUs, SPF (Phase 1) computations are no longer the dominant challenge, even in large networks

– Less than 50 ms per run on a 400-router network

• Processing load of Phase 2 (route/stub updates) can be more significant for full Internet routing tables (stepping through all entries in the routing table)

– But what are the odds that IS-IS or OSPF will carry a full Internet routing table?

• Which brings us to the true challenge(s):

– Impact of dependencies across protocols, e.g., BGP and IS-IS

– Volume of data to be pushed/modified

- Full Internet routing table >200 MB and ~300k prefixes

- Forwarding table size ~2 MB


Impact of Protocol Dependencies

• BGP tells A that B and C can both reach the Internet

– IGP costs to B and C are the tie-breakers with d1<d2

– B is the selected exit point to reach the Internet through port #1 on A

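[Figure: Router A connected to exit routers B and C, each advertising 300k routes; IGP costs d1 (to B) and d2 (to C); A's ports #1 (toward B) and #3 (toward C)]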
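The IGP-cost tie-break (often called hot-potato routing) can be sketched in a few lines. It assumes B and C already tie on every earlier BGP decision step; the function name and the cost values are hypothetical.

```python
def select_exit(candidates, igp_cost):
    """Pick the BGP exit with the lowest IGP cost (hot-potato tie-break).

    candidates: BGP Next_Hops that tie on all earlier BGP decision steps.
    igp_cost: mapping from each Next_Hop to this router's IGP distance to it.
    The name is a secondary key only to make equal-cost ties deterministic.
    """
    return min(candidates, key=lambda nh: (igp_cost[nh], nh))


d1, d2 = 10, 20                                        # hypothetical costs, d1 < d2
print(select_exit(["B", "C"], {"B": d1, "C": d2}))     # B
print(select_exit(["B", "C"], {"B": 15, "C": d2}))     # B (a new d'1 still below d2)
print(select_exit(["B", "C"], {"B": 30, "C": d2}))     # C (a new d'1 above d2)
```

The decision depends only on the IGP costs, which is exactly why an internal link failure can force A to revisit every BGP route.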






• Internal link failure affects path from A to B

– Exits through port #2 with IGP cost d’1<d2

• A needs to step through the full BGP table to determine that the IGP change did not affect the BGP decision (d’1<d2)

• A needs to update all 300k entries in its forwarding table to point to the new forwarding next hop for B, now reachable over port #2

– A better option: recursive lookup

[Figure: Router A reaches B over port #2 with new IGP cost d’1 and C with cost d2; B and C each advertise 300k routes; ports #1, #2, #3 on A]


Recursive Forwarding Structure

• A change in the forwarding decision for a Next_Hop does not require modifications of individual prefix entries

– Only the Next_Hop forwarding information is updated

– One vs 300,000 updates!

• Unfortunately, this won’t help if the Next_Hop itself changes

– Still need to update up to 300,000 entries in that case

[Figure: 300k prefixes pointing to tens of Next_Hops]
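The difference between a flat forwarding table and the recursive structure described above can be sketched by counting writes. Both classes, the placeholder prefix names, and the scaled-down count of 1,000 prefixes (standing in for ~300k) are hypothetical.

```python
class FlatFib:
    """Each prefix stores its BGP Next_Hop and output port directly."""

    def __init__(self):
        self.entries = {}                   # prefix -> [bgp_next_hop, port]

    def add_route(self, prefix, bgp_next_hop, port):
        self.entries[prefix] = [bgp_next_hop, port]

    def port(self, prefix):
        return self.entries[prefix][1]

    def move_next_hop(self, bgp_next_hop, new_port):
        """IGP path to a Next_Hop changed: rewrite every prefix that uses it."""
        writes = 0
        for entry in self.entries.values():
            if entry[0] == bgp_next_hop:
                entry[1] = new_port
                writes += 1
        return writes


class RecursiveFib:
    """Prefixes point to a Next_Hop; only the Next_Hop entry stores the port."""

    def __init__(self):
        self.next_hop_of = {}               # prefix -> BGP Next_Hop
        self.port_of = {}                   # BGP Next_Hop -> output port

    def add_route(self, prefix, bgp_next_hop, port):
        self.next_hop_of[prefix] = bgp_next_hop
        self.port_of[bgp_next_hop] = port

    def port(self, prefix):
        return self.port_of[self.next_hop_of[prefix]]

    def move_next_hop(self, bgp_next_hop, new_port):
        """IGP path to a Next_Hop changed: one write, however many prefixes."""
        self.port_of[bgp_next_hop] = new_port
        return 1


N = 1000                                    # scaled-down stand-in for ~300k prefixes
flat, recursive = FlatFib(), RecursiveFib()
for i in range(N):
    flat.add_route(f"prefix-{i}", "B", 1)
    recursive.add_route(f"prefix-{i}", "B", 1)

print(flat.move_next_hop("B", 2))           # 1000
print(recursive.move_next_hop("B", 2))      # 1
print(flat.port("prefix-999"), recursive.port("prefix-999"))  # 2 2
```

The lookup cost of the recursive version is one extra indirection per packet, traded for a single write when the IGP path to B moves to port #2.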






• Internal link failure affects path from A to B

– We now have d’1>d2

• A needs to step through the full BGP table to determine that the IGP change affects the BGP decision (d’1>d2)

• A needs to update all 300k entries in its forwarding table to point to the new BGP Next_Hop, C, reachable over port #3

[Figure: Router A reaches B with new IGP cost d’1 and C with cost d2; B and C each advertise 300k routes; ports #2 and #3 on A]
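Why a BGP Next_Hop change still costs a full table walk can be sketched as follows: every prefix must be re-evaluated, and every prefix whose selected Next_Hop changed must be re-pointed, even in a recursive forwarding structure. The function name, the prefix names, and the costs are hypothetical, and all candidates are assumed to tie on earlier BGP decision steps.

```python
def rerun_bgp_decisions(bgp_candidates, igp_cost, installed_next_hop):
    """Walk the whole BGP table after an IGP change and re-point changed prefixes.

    bgp_candidates: prefix -> candidate BGP Next_Hops (assumed to tie on all
        earlier BGP decision steps, so the IGP cost decides).
    installed_next_hop: prefix -> BGP Next_Hop currently in the FIB; updated in place.
    Returns (prefixes_examined, fib_entries_rewritten).
    """
    examined = rewritten = 0
    for prefix, candidates in bgp_candidates.items():
        examined += 1
        best = min(candidates, key=lambda nh: (igp_cost[nh], nh))
        if installed_next_hop.get(prefix) != best:
            installed_next_hop[prefix] = best
            rewritten += 1
    return examined, rewritten


N = 1000                                   # stand-in for ~300k prefixes
table = {f"prefix-{i}": ["B", "C"] for i in range(N)}
fib = {}
print(rerun_bgp_decisions(table, {"B": 10, "C": 20}, fib))  # (1000, 1000) initial install
print(rerun_bgp_decisions(table, {"B": 15, "C": 20}, fib))  # (1000, 0)    d'1 < d2
print(rerun_bgp_decisions(table, {"B": 30, "C": 20}, fib))  # (1000, 1000) d'1 > d2
```

In both failure cases all 1,000 prefixes are examined; only when d’1 exceeds d2 do all of them also need a new Next_Hop.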


Dealing with Multiple Protocols

• Routers often learn routes from multiple protocols that use different, incompatible metrics

– Which one to prefer?

• Administrative distance specifies the degree of preference of a protocol

– Smaller is better

• Default administrative distances can be vendor-specific, and they can be changed by configuration

Protocol Distance

Connected interface 0

Static route 1

EIGRP summary route 5

eBGP 20

Internal EIGRP 90

OSPF 110

IS-IS 115

RIP 120

EGP 140

iBGP 200

Unknown 255
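Selecting among routes from different protocols by administrative distance can be sketched as below. The dictionary holds a subset of the default values listed above; since these are vendor-specific and configurable, treat the mapping, the protocol keys, and the next hops as illustrative assumptions.

```python
# A subset of the default distances listed above; values are vendor-specific
# and configurable, so treat this mapping as an example, not a reference.
DEFAULT_DISTANCE = {
    "connected": 0, "static": 1, "ebgp": 20, "ospf": 110,
    "isis": 115, "rip": 120, "egp": 140, "ibgp": 200,
}
UNKNOWN = 255  # a route with this distance is never installed


def choose_route(candidates, distance=DEFAULT_DISTANCE):
    """Pick one route for a prefix among routes learned from different protocols.

    candidates: list of (protocol, next_hop) pairs for the same prefix.
    Lower administrative distance wins; metrics are never compared across protocols.
    """
    usable = [(distance.get(proto, UNKNOWN), proto, next_hop)
              for proto, next_hop in candidates]
    usable = [c for c in usable if c[0] < UNKNOWN]
    if not usable:
        return None
    _, proto, next_hop = min(usable)
    return proto, next_hop


# Router 1.1.1.1 learning route r from both iBGP and OSPF (hypothetical next hops).
print(choose_route([("ibgp", "1.1.5.1"), ("ospf", "10.2.1.1")]))  # ('ospf', '10.2.1.1')
```

With these defaults, the OSPF external route to r (110) would win over the iBGP route (200) at Router 1.1.1.1, which is one answer to the earlier "consistency, precedence?" question.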


Back to BGP: VPN Support

[Figure: MY BLUE NETWORK (R21 to R24) and MY GREEN NETWORK (R11 to R14) realized as MY BLUE/GREEN VIRTUAL NETWORKS over a shared provider network of CE, PE, and P routers]


VPN Definition and Scope

• A set of “sites” is attached to a common backbone network

• Subsets of this set form VPNs

– The common backbone delivers IP connectivity to sites belonging to the same VPN

• Many possible VPN types

– Intranet: all sites belong to the same enterprise

– Extranet: sites belong to different enterprises

• Sites can belong to multiple VPNs

– e.g., an intranet and several different extranets

• VPNs can span broad geographical areas

• Routers within a site communicate directly (not through the common backbone network)

• Policies determine which VPNs a site belongs to and which routes it learns and can use

• Supporting all these requirements in a scalable and efficient manner is challenging

– BGP/MPLS defines mechanisms to effectively realize VPNs


BGP/MPLS VPNs

• Two main components

– MPLS as the tunneling technology (implementing VPN connectivity)

- Label stacking (two levels) for scalable backbone forwarding and easy VPN association

- Outer label identifies the egress backbone router connecting to the customer site; it is stripped upon reception at the egress router

- Inner label points to the VPN Routing and Forwarding (VRF) table for the customer site at the egress router

– BGP as the route distribution and installation mechanism (controlling connectivity)

- Several extensions allow transport and selective installation and use of VPN routes across provider routers

- Which route goes into which VRF?


VPN Terminology and Configurations

• Three types of routers

– Provider (P), or backbone-only, routers

– Provider Edge (PE) routers interface to customer sites

– Customer Edge (CE) routers attach to Service Provider routers

• P and PE routers form the Service Provider network

• CE routers belong to customer VPNs

– But they do not peer directly with each other (they peer with PE routers)

• Sample VPN configurations

– VPN1 and VPN2 are intranets; VPN3 is an extranet

– VPN2 sites connect to servers at R11 and R12 through a firewall at R13

[Figure: sites R11 to R14 and R21 to R24 attached through CE, PE, and P routers]

VPN1: R11, R12, and R13

VPN2: R21, R22, and R23

VPN3: R21, R22, and R23 connect to R11 and R12 through R13


VPN Forwarding Overview

• PEs maintain multiple forwarding tables

– Default forwarding table

– VPN Routing and Forwarding tables (VRFs)

- Each VRF contains a specific subset of VPN routes

• At the ingress PE, each PE-CE connection is associated with a VRF

– Incoming (from CE) packets are forwarded by looking up the destination address in the corresponding (ingress) VRF

– Local (attached to the same PE) packets are forwarded directly

– Remote (to another VPN site) packets are forwarded as MPLS packets

- VPN route label is assigned based on VRF content

- Tunnel label is pushed on top of the label stack to enable delivery of the packet to the “next hop” (PE) across the backbone

• Backbone (P) routers forward packets based on the outer label

• At the egress PE, the tunnel label is removed and the route label is used to access the appropriate VRF

– Forwarding may or may not require an additional VRF lookup
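The ingress PE steps above can be sketched as a function over a hypothetical dictionary layout: interface-to-VRF association, per-VRF routes, and a table of tunnel labels toward remote PEs. Label values such as "L0" and "L(PE4)" are placeholders, and the local site 10.5.0.0/16 is invented for illustration.

```python
import ipaddress


def lpm(table, address):
    """Longest prefix match over a {prefix: value} dict; None if nothing matches."""
    ip = ipaddress.ip_address(address)
    matches = [(ipaddress.ip_network(p), v) for p, v in table.items()
               if ip in ipaddress.ip_network(p)]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


def ingress_pe_forward(pe, in_interface, destination):
    """Forward a packet arriving from a CE at the ingress PE.

    pe is a hypothetical dict with:
      "vrf_of_interface": interface -> VRF name (PE-CE association)
      "vrfs": VRF name -> {prefix: route}, where a route is either
              {"local": out_interface} or {"remote": bgp_next_hop, "label": vpn_label}
      "tunnel_label": BGP next hop (remote PE) -> outer MPLS label
    Returns ("ip", out_interface), ("mpls", [outer, inner]) or ("drop", None).
    """
    vrf = pe["vrf_of_interface"][in_interface]
    route = lpm(pe["vrfs"][vrf], destination)
    if route is None:
        return ("drop", None)
    if "local" in route:
        return ("ip", route["local"])             # same PE: no label pushed
    outer = pe["tunnel_label"][route["remote"]]
    return ("mpls", [outer, route["label"]])      # top of stack listed first


pe1 = {
    "vrf_of_interface": {"if-CE1": "VRF1", "if-CE5": "VRF1"},
    "vrfs": {"VRF1": {
        "10.0.0.0/16": {"remote": "PE4", "label": "L0"},
        "10.5.0.0/16": {"local": "if-CE5"},       # hypothetical second local site
    }},
    "tunnel_label": {"PE4": "L(PE4)"},
}
print(ingress_pe_forward(pe1, "if-CE1", "10.0.0.1"))  # ('mpls', ['L(PE4)', 'L0'])
```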


VPN Route Distribution

• PEs learn VPN routes from attached CEs

– Can use static routes or a routing protocol (RIP, OSPF, or BGP)

– Routes are installed in the associated VRF

• PEs convert routes into VPN-IPv4 routes by prepending a Route Distinguisher (RD) to each of them

– Distinguishes between addresses from different VPNs

• PEs redistribute VPN routes to other PEs using MP-BGP

– PEs use their own address as the “BGP next hop”

– PEs assign an MPLS label to each route

- Multiple options for assigning labels to routes from the same VRF

• Export policies determine the set of Route Targets (RT, a BGP attribute similar to Community) associated with each route

• Import policies specify the RTs of routes eligible to be installed in a given VRF
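The export side of route distribution can be sketched as a small conversion step. The `VpnIpv4Route` record, the RD strings "65000:1" and "65000:2", the RT names, the PE address 192.0.2.1, and the per-route label allocator starting at 100 are all hypothetical.

```python
from dataclasses import dataclass
from itertools import count
from typing import Callable, FrozenSet, Iterable, List


@dataclass(frozen=True)
class VpnIpv4Route:
    rd: str                        # Route Distinguisher, written "admin:number"
    prefix: str                    # IPv4 prefix learned from the CE
    label: int                     # MPLS label assigned by this PE
    next_hop: str                  # this PE's own address ("next hop self")
    route_targets: FrozenSet[str]  # RTs chosen by the VRF's export policy


def export_vrf(prefixes: Iterable[str], rd: str, export_rts: Iterable[str],
               pe_address: str, allocate_label: Callable[[str], int]) -> List[VpnIpv4Route]:
    """Convert one VRF's CE-learned prefixes into VPN-IPv4 routes for MP-BGP."""
    rts = frozenset(export_rts)
    return [VpnIpv4Route(rd, p, allocate_label(p), pe_address, rts) for p in prefixes]


labels = count(100)


def per_route_label(prefix: str) -> int:
    """Hypothetical per-route allocation: a fresh label for every prefix."""
    return next(labels)


blue = export_vrf(["10.1.0.0/16"], "65000:1", ["T-blue"], "192.0.2.1", per_route_label)
green = export_vrf(["10.1.0.0/16"], "65000:2", ["T-green"], "192.0.2.1", per_route_label)
# Same IPv4 prefix in two VPNs, but two distinct VPN-IPv4 routes thanks to the RD.
print(len({(r.rd, r.prefix) for r in blue + green}))  # 2
```

Passing a different `allocate_label` (for example, one returning the same label for the whole VRF) models the other label-assignment options mentioned above.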


BGP Extensions in Support of VPNs

• MP-BGP (RFC 4760)

– Multi-Protocol BGP (MP-BGP) as a generic extension to BGP to support protocols other than IPv4, including multiple address families

– VPN routes are viewed as a separate address family

• Carrying (MPLS) labels in BGP updates (RFC 3107)

– Where and how to associate one or more labels with a prefix in a BGP update

– Binds routes to tunnels

• BGP/MPLS VPNs (RFC 4364)

– Use of BGP as a prefix distribution mechanism in support of multiple VPNs over a common MPLS network

– Full specification of VPN support with MPLS and MP-BGP


Multiprotocol Extensions to BGP

• BGP-4 specifications include only three pieces of information that are tied to IPv4:

– NEXT_HOP (IPv4 address)

– AGGREGATOR (IPv4 address)

– NLRI (IPv4 prefix)

• Extending BGP to handle multiple protocols is achieved by introducing two new optional, non-transitive attributes (speakers that do not support them can ignore them):

– Multiprotocol Reachable NLRI (MP_REACH_NLRI)

- Carries a set of reachable destinations together with the NEXT_HOP

– Multiprotocol Unreachable NLRI (MP_UNREACH_NLRI)

- Carries a set of unreachable destinations

• Multiprotocol support is specified in capability advertisement

– Capability code set to 1

– Each capability instance carries one supported address family (AFI/SAFI pair); a speaker advertises one instance per supported address family
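One Multiprotocol capability can be encoded with Python's `struct` module, assuming the layout from RFC 4760: a 4-byte value made of AFI (2 bytes), a reserved byte, and SAFI (1 byte). Capabilities travel inside the OPEN message's Capabilities optional parameter (RFC 5492); that outer wrapping is not shown, and the constant names are hypothetical.

```python
import struct

MULTIPROTOCOL_CAPABILITY = 1   # capability code from the text
AFI_IPV4 = 1
SAFI_UNICAST = 1
SAFI_LABELED_VPN = 128         # labeled VPN-IPv4


def multiprotocol_capability(afi: int, safi: int) -> bytes:
    """Encode one capability: code (1), length (1), AFI (2), reserved (1), SAFI (1)."""
    value = struct.pack("!HBB", afi, 0, safi)
    return struct.pack("!BB", MULTIPROTOCOL_CAPABILITY, len(value)) + value


# One capability instance per supported address family.
caps = (multiprotocol_capability(AFI_IPV4, SAFI_UNICAST)
        + multiprotocol_capability(AFI_IPV4, SAFI_LABELED_VPN))
print(caps.hex())  # 010400010001010400010080
```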


MP_REACH_NLRI

• Provides protocol-specific reachability information

– Advertises a feasible route together with the (network layer) address of the next hop router

• Encoding is as follows:

Address Family Identifier (AFI) – 2 bytes

Subsequent Address Family Identifier (SAFI) – 1 byte

Length of Next Hop Network Address – 1 byte

Network Address of Next Hop – Variable

Reserved – 1 byte

Network Layer Reachability Information (NLRI) – Variable
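The field order above can be packed directly with `struct`. The attribute header helper follows the standard BGP path attribute flags (Optional 0x80, Extended Length 0x10) and type code 14 for MP_REACH_NLRI; the helper names and the example next hop and prefix are hypothetical.

```python
import struct

MP_REACH_NLRI = 14  # path attribute type code


def mp_reach_nlri_value(afi: int, safi: int, next_hop: bytes, nlri: bytes) -> bytes:
    """Pack AFI, SAFI, next hop length, next hop, reserved byte, then NLRI."""
    return struct.pack("!HBB", afi, safi, len(next_hop)) + next_hop + b"\x00" + nlri


def optional_nontransitive_attribute(type_code: int, value: bytes) -> bytes:
    """Wrap a value in a path attribute header (flags, type, length)."""
    flags = 0x80                        # Optional bit set, Transitive bit clear
    if len(value) > 255:
        flags |= 0x10                   # Extended Length: 2-byte length field
        return struct.pack("!BBH", flags, type_code, len(value)) + value
    return struct.pack("!BBB", flags, type_code, len(value)) + value


# Hypothetical IPv4 unicast example: next hop 192.0.2.1 and one prefix,
# 198.51.100.0/24, encoded as <prefix length in bits, significant prefix bytes>.
nlri = bytes([24, 198, 51, 100])
value = mp_reach_nlri_value(1, 1, bytes([192, 0, 2, 1]), nlri)
attribute = optional_nontransitive_attribute(MP_REACH_NLRI, value)
print(attribute.hex())  # 800e0d00010104c00002010018c63364
```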


Specifying IPv4-VPN Routes

• The AFI/SAFI fields identify the network layer protocol to which the NEXT_HOP address belongs and specify the NLRI semantics

• VPN-IPv4 address family

– AFI=1 (IP) and SAFI=128 for labeled VPN-IPv4 addresses

– Address format: 12-byte quantity

- 8-byte Route Distinguisher (RD) + 4-byte IPv4 address/prefix

• Route Distinguisher (2-byte type field, 6-byte value)

– Three defined types:

- Type 0: 2-byte administrator subfield (AS number), 4-byte number field administered by the AS owner

- Type 1: 4-byte administrator subfield (IP address), 2-byte number field administered by the IP address owner

- Type 2: 4-byte administrator subfield (4-byte AS number), 2-byte number field administered by the AS owner
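The three RD layouts and the 12-byte VPN-IPv4 address can be encoded as follows; each type packs to exactly 8 bytes. The function names and the example values (AS 65000, 192.0.2.1, AS 4200000000) are hypothetical.

```python
import ipaddress
import struct


def encode_rd(rd_type: int, administrator, assigned: int) -> bytes:
    """Encode an 8-byte Route Distinguisher: 2-byte type + 6-byte value."""
    if rd_type == 0:    # 2-byte AS number, 4-byte assigned number
        return struct.pack("!HHI", 0, administrator, assigned)
    if rd_type == 1:    # 4-byte IPv4 address, 2-byte assigned number
        ip = ipaddress.IPv4Address(administrator).packed
        return struct.pack("!H", 1) + ip + struct.pack("!H", assigned)
    if rd_type == 2:    # 4-byte AS number, 2-byte assigned number
        return struct.pack("!HIH", 2, administrator, assigned)
    raise ValueError(f"unknown RD type {rd_type}")


def vpn_ipv4_address(rd: bytes, ipv4: str) -> bytes:
    """12-byte VPN-IPv4 address: the RD followed by the IPv4 address."""
    return rd + ipaddress.IPv4Address(ipv4).packed


print(encode_rd(0, 65000, 1).hex())                # 0000fde800000001
print(encode_rd(1, "192.0.2.1", 7).hex())          # 0001c00002010007
print(len(vpn_ipv4_address(encode_rd(2, 4200000000, 5), "10.1.0.0")))  # 12
```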


NLRI Encoding

• NLRI is encoded as one or more triplets of the form <length; label(s); prefix>

– Length is 1 byte and gives the number of bits for label(s) + prefix

– Each label is encoded as 3 bytes, with the high-order 20 bits carrying the label and the low-order bit serving as the “bottom of stack” indicator

– Prefix consists of RD + IPv4 prefix, followed by don’t-care bits to align on a byte boundary

• Prefix length and start position are “deduced” from the length field and the number of labels

– Keep stepping through labels until reaching the bottom-of-stack indicator in a label

– The remainder is prefix + padding bits, with the length field providing information on the number of padding bits
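The decoding procedure above (step through labels until bottom of stack, then take the RD, then derive the prefix length from what is left of the length field) can be sketched as below. The function name is hypothetical, it performs no validation of truncated input, and the example triplet is constructed by hand.

```python
import ipaddress


def decode_labeled_vpn_nlri(data: bytes):
    """Decode a run of <length; label(s); prefix> triplets for AFI=1, SAFI=128.

    Returns a list of (labels, rd_hex, prefix) tuples. Minimal sketch: no checks
    for truncated or malformed input.
    """
    routes, i = [], 0
    while i < len(data):
        bits = data[i]                      # label(s) + RD + prefix, in bits
        i += 1
        labels = []
        while True:                         # step through labels until bottom of stack
            field = int.from_bytes(data[i:i + 3], "big")
            i += 3
            bits -= 24
            labels.append(field >> 4)       # high-order 20 bits: the label
            if field & 0x1:                 # low-order bit: bottom of stack
                break
        rd = data[i:i + 8]                  # 8-byte Route Distinguisher
        i += 8
        bits -= 64
        nbytes = (bits + 7) // 8            # prefix bytes, including padding bits
        packed = data[i:i + nbytes] + bytes(4 - nbytes)
        i += nbytes
        # strict=False masks off the don't-care padding bits.
        prefix = ipaddress.IPv4Network((packed, bits), strict=False)
        routes.append((labels, rd.hex(), str(prefix)))
    return routes


# 10.1.0.0/16 with label 100 and a type 0 RD "65000:1": 24 + 64 + 16 = 104 bits.
example = bytes([104, 0x00, 0x06, 0x41,           # label 100, bottom of stack set
                 0, 0, 0xFD, 0xE8, 0, 0, 0, 1,     # RD
                 10, 1])                           # significant prefix bytes
print(decode_labeled_vpn_nlri(example))  # [([100], '0000fde800000001', '10.1.0.0/16')]
```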


Populating VRFs

• Routes installed in a given VRF come from:

– Routes “received” from local CE routers, e.g., through eBGP

- Corresponding VRF is determined from the router interface

– Routes learned from remote PE routers over iBGP

- New attribute (Route Target, RT) determines in which VRFs routes are installed based on local policies

• Policies and Route Targets for VRF construction

– Similar approach as in standard BGP, using Community attributes to implement policies

- RT defined as an Extended Community attribute

– For local routes (associated with CEs attached to the PE), export policies determine the value(s) of RTs

- Local route is converted into a VPN-IPv4 route and added to the corresponding VRF with one or more RT attributes

– Remote VPN-IPv4 routes received through BGP are installed in local VRFs if one of their RTs matches a local import policy
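RT-based import can be sketched as a set intersection per VRF. The function name, the tuple layout, and the RD and RT values are hypothetical; the sketch also reflects two points from the routing information flow: routes with the same prefix but different RDs can both be installed, and routes matching no VRF are discarded by a PE that is not a Route Reflector.

```python
def import_vpn_routes(received, vrf_import_rts):
    """Install received VPN-IPv4 routes into every VRF whose import policy matches.

    received: iterable of (rd, prefix, route_targets) tuples from MP-BGP.
    vrf_import_rts: VRF name -> set of Route Targets that VRF imports.
    Returns (vrf_tables, discarded): VRF name -> list of installed (rd, prefix),
    plus the routes no VRF wanted (a non-reflector PE drops these).
    """
    vrf_tables = {name: [] for name in vrf_import_rts}
    discarded = []
    for rd, prefix, rts in received:
        matched = False
        for name, import_rts in vrf_import_rts.items():
            if import_rts & set(rts):          # any shared RT is enough
                vrf_tables[name].append((rd, prefix))
                matched = True
        if not matched:
            discarded.append((rd, prefix))
    return vrf_tables, discarded


received = [
    ("RD1", "10.2.0.0/16", {"T1"}),
    ("RD3", "10.2.0.0/16", {"T1"}),   # same prefix, different RD: both installed
    ("RD9", "10.9.0.0/16", {"T9"}),   # no matching import target
]
tables, dropped = import_vpn_routes(received, {"VRF1": {"T1"}})
print(tables["VRF1"])  # [('RD1', '10.2.0.0/16'), ('RD3', '10.2.0.0/16')]
print(dropped)         # [('RD9', '10.9.0.0/16')]
```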


Routing Information Flow

• Egress PE learns the route associated with a given CE

– Corresponding VRF is identified

– Route is converted to a VPN-IPv4 route, and its RD value is assigned based on VRF configuration, e.g., each VRF has its own RD

– RT attributes are assigned to the route based on local export policies

• Egress PE communicates the VPN-IPv4 route to its MP-BGP peers

– Sets NEXT_HOP to its own address, encoded as a VPN-IPv4 address with an RD value of 0

– Assigns a label to the route

- One label per VRF, per outgoing interface, or per route

- Note that a PE can aggregate routes before distribution; a label identifying an aggregate route then calls for an L3 lookup in the VRF

• Ingress PE receives the VPN-IPv4 route over an MP-BGP session

– Route is installed in VRFs based on matching RT values to the import policies of each VRF

- Note that two VPN-IPv4 routes with the same prefix but different RD values can both be installed in a given VRF

- Note that unless it is a Route Reflector, a PE should discard all routes that have no RT attributes matching the import target of at least one VRF

– Tunnel (MPLS or not) is identified for the NEXT_HOP of the route


Forwarding Information Flow

• Packet arrives at the ingress PE over an interface associated with a given CE

– Corresponding VRF is identified based on the incoming interface

– If a match is found for the destination address, the “next hop” is retrieved

• If the next hop is on the same PE, the packet is forwarded without pushing any new label onto the packet’s label stack (if any)

– Note that if the egress interface is associated with a different VRF and the matching route is an aggregate, an additional lookup in the egress VRF may be required

• If the next hop is a remote BGP next hop:

– The packet is converted into an MPLS packet with the corresponding VPN route label

– The next hop “tunnel” information (MPLS label) is retrieved and pushed on top of the packet’s label stack

– The packet is forwarded to the tunnel’s next hop

• At the next hop (egress PE), the packet treatment depends on the label, which can identify:

– An egress interface together with the corresponding link layer header

– A VRF in which to look up the destination address

• The packet is ultimately forwarded on the egress interface
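The two possible meanings of the VPN route label at the egress PE can be sketched as follows. The label table layout, the label names "L0" and "L7", and the interface names are hypothetical; the sketch assumes the tunnel label has already been removed.

```python
import ipaddress


def egress_pe_forward(label_actions, vrfs, vpn_label, destination):
    """Treat a packet at the egress PE based on its (inner) VPN route label.

    label_actions: label -> ("interface", out_interface) or ("vrf", vrf_name)
    vrfs: vrf_name -> {prefix: out_interface}
    Returns the egress interface, or None if a VRF lookup finds no route.
    """
    kind, target = label_actions[vpn_label]
    if kind == "interface":
        return target                        # label alone picks the egress interface
    ip = ipaddress.ip_address(destination)   # label picks a VRF: do an L3 lookup
    matches = [(ipaddress.ip_network(p), out) for p, out in vrfs[target].items()
               if ip in ipaddress.ip_network(p)]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


# Hypothetical labels and interfaces at PE4.
labels = {"L0": ("interface", "if-CE4"), "L7": ("vrf", "VRF1")}
vrfs = {"VRF1": {"10.0.0.0/16": "if-CE4", "10.0.1.0/24": "if-CE4b"}}
print(egress_pe_forward(labels, vrfs, "L0", "10.0.0.1"))  # if-CE4
print(egress_pe_forward(labels, vrfs, "L7", "10.0.1.5"))  # if-CE4b
```

The VRF-lookup case corresponds to labels that identify aggregate routes, where the label alone is not enough to pick the outgoing interface.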


Route Distribution Through Reflectors

• Use of Route Reflectors is again motivated by scalability

• However, RRs need to maintain routing information for VPNs to which they have NO attachments

– In general, RRs accept ALL routes received from client PEs, provided they carry RT attributes from a “given” set

- The set can be configured or learned

- Routes with RTs not in that set can be filtered (inbound)

• The main difference from standard BGP route reflection is that RRs do not really apply a decision process to inbound routes and advertise the output of that process to clients

– Closer to passive reflectors


Sample VPNs – Closed Mesh (1)

• VPN1: 4 fully interconnected sites

• Basic configuration at PEs

– RD1 value identifies VPN1

– RT value of T1 for all VRF1 export and import policies

• VRF construction at PE1 (VRF1)

– Learns route 10.1.0.0/16 from CE1 and installs it in VRF1

– Exports <RD1,10.1.0.0/16;T1;L1;PE1> to BGP (Next_Hop self (PE1) and label L1)

– Advertises <RD1,10.1.0.0/16;T1;L1;PE1> to PE2, PE3, and PE4

– Receives the following routes and installs them in VRF1:

- <RD1,10.0.0.0/16;T1;L0;PE4> from PE4

- <RD1,10.3.0.0/16;T1;L3;PE3> from PE3

- <RD1,10.2.0.0/16;T1;L2;PE2> from PE2

[Figure: CE1 to CE4 attached to PE1 to PE4 around a P router; site prefixes 10.0.0.0/16, 10.1.0.0/16, 10.2.0.0/16, and 10.3.0.0/16]


Sample VPNs – Closed Mesh (2)

• Forwarding of a packet to 10.0.0.1 from PE1

– Packet received from CE1

– Lookup in VRF1 at PE1 yields 10.0.0.0/16 as the best route, with Next_Hop PE4

– Packet sent as an MPLS packet with label stack <L(PE4),L0>

– Packet delivered to PE4 through the MPLS backbone based on label L(PE4)

– PE4 pops the label stack to expose L0

– L0 identifies CE4 as the packet destination

– PE4 forwards the packet to CE4 as a standard IP packet (removes L0)

[Figure: CE1 to CE4 attached to PE1 to PE4 around a P router; site prefixes 10.0.0.0/16, 10.1.0.0/16, 10.2.0.0/16, and 10.3.0.0/16]


Sample VPNs – Hub and Spoke

• VPN2: all connectivity goes through CE1

• Basic configuration at PEs

– RD1 value identifies VPN2

– Two route targets are defined: TH (hub) and TS (spoke)

– At the VRFs attached to the hub site (PE1), TH is the export target and TS the import target

– At the VRFs attached to the spoke sites (PE2, PE3, and PE4), TS is the export target and TH the import target

• VRF construction

– PEs associated with spoke sites

- Receive routes from their CEs and export them to PE1 with target TS

- Receive routes from PE1 with target TH and import them into the VRF of their CEs

– PE1

- Receives routes from spoke PEs with target TS and installs them in CE1’s VRF

- Exports routes (back) to spoke PEs with target TH

[Figure: CE1 (hub) and CE2 to CE4 (spokes) attached to PE1 to PE4 around a P router; site prefixes 10.0.0.0/16, 10.1.0.0/16, 10.2.0.0/16, and 10.3.0.0/16]
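How the RT choices shape VRF contents can be sketched by computing which remote site prefixes each VRF imports. The site-to-prefix assignment is an assumption borrowed from the closed-mesh example, and the function name is hypothetical. The sketch does not model PE1 re-exporting spoke routes with TH; it only shows that the RTs alone keep spokes from importing each other's routes directly.

```python
def vrf_imports(site_prefix, export_rt, import_rts):
    """Which remote site prefixes each site's VRF installs, given RT policies.

    site_prefix: site -> prefix advertised by that site's CE
    export_rt: site -> Route Target attached when that site's routes are exported
    import_rts: site -> set of Route Targets that site's VRF imports
    """
    return {
        site: sorted(prefix for other, prefix in site_prefix.items()
                     if other != site and export_rt[other] in wanted)
        for site, wanted in import_rts.items()
    }


sites = {"CE1": "10.1.0.0/16", "CE2": "10.2.0.0/16",
         "CE3": "10.3.0.0/16", "CE4": "10.0.0.0/16"}
spokes = ["CE2", "CE3", "CE4"]

# Hub and spoke: the hub exports TH and imports TS; spokes do the reverse.
hub_export = {"CE1": "TH", **{s: "TS" for s in spokes}}
hub_import = {"CE1": {"TS"}, **{s: {"TH"} for s in spokes}}
hub = vrf_imports(sites, hub_export, hub_import)
print(hub["CE1"])  # ['10.0.0.0/16', '10.2.0.0/16', '10.3.0.0/16']
print(hub["CE2"])  # ['10.1.0.0/16']

# Closed mesh for comparison: every VRF exports and imports T1.
mesh = vrf_imports(sites, {s: "T1" for s in sites}, {s: {"T1"} for s in sites})
print(mesh["CE2"])  # ['10.0.0.0/16', '10.1.0.0/16', '10.3.0.0/16']
```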