IP Routing
A Router’s Job
• So I got a packet; where should it go next?
  – Which “next hop”?
    - The outgoing router interface to use when forwarding traffic to the destination. It may also include the IP address of the next router (if any) on the path toward the destination.
[Figure: a packet arrives with destination address 83.125.35.131; which next hop should the router pick?]
The Lookup Operation
• Two basic pieces
  1. Forwarding table
  2. Longest prefix match rule
• Forwarding table
  – List of all the routes (destinations I know) and their associated next hops
• Longest prefix match
  – For any destination address, find the matching route with the longest prefix, i.e., the route that agrees with the destination address on the most leading bits
    - More specific is better!
Route               Next Hop
155.23.0.0/16       3
155.23.14.0/24      4
162.26.0.0/16       1
112.0.0.0/8         2
183.30.0.0/16       7
83.0.0.0/8          3
83.125.0.0/16       11
83.125.35.0/24      9
83.125.35.128/25    12
193.2.0.0/16        1
0.0.0.0/0           2
[Figure: destination 83.125.35.131 looked up in the table above]
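A minimal sketch of the lookup in Python, using the standard `ipaddress` module and a linear scan over the table above. The function name is illustrative, and production routers use tries or TCAM hardware rather than a scan:

```python
import ipaddress

# Forwarding table from the slide: (prefix, next hop)
FORWARDING_TABLE = [
    ("155.23.0.0/16", 3),
    ("155.23.14.0/24", 4),
    ("162.26.0.0/16", 1),
    ("112.0.0.0/8", 2),
    ("183.30.0.0/16", 7),
    ("83.0.0.0/8", 3),
    ("83.125.0.0/16", 11),
    ("83.125.35.0/24", 9),
    ("83.125.35.128/25", 12),
    ("193.2.0.0/16", 1),
    ("0.0.0.0/0", 2),
]

def longest_prefix_match(table, destination):
    """Return the next hop of the most specific route containing destination."""
    addr = ipaddress.ip_address(destination)
    best_len, best_hop = -1, None
    for prefix, next_hop in table:
        net = ipaddress.ip_network(prefix)
        # A route matches if the destination lies inside its prefix;
        # among matching routes, the longest prefix wins.
        if addr in net and net.prefixlen > best_len:
            best_len, best_hop = net.prefixlen, next_hop
    return best_hop

print(longest_prefix_match(FORWARDING_TABLE, "83.125.35.131"))  # 12
```

The destination 83.125.35.131 matches 83.0.0.0/8, 83.125.0.0/16, 83.125.35.0/24, 83.125.35.128/25, and the default route 0.0.0.0/0; the /25 is the most specific, so the packet leaves through next hop 12.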
So Where Does Routing Fit in All Of This?
• Routing builds the forwarding table.
  – Manual entries (static routes) take you only so far.
  – You need a process that:
    - Automates populating the routing table.
    - Adapts to changes.
• Routing protocols are responsible for:
  – Distributing known routes and the latest routing changes to a router’s neighbors.
  – Deciding the next hop(s) to be used for each route.
    - What’s the best choice?
    - Ensuring consistency between decisions at different routers (no loops!)
Key Problem
• How to make correct local decisions?
  – Each router/switch must know something about global state.
• Global state
  – Inherently large
  – Dynamic
  – Hard to collect
• A routing protocol must intelligently acquire, summarize, and maintain relevant information.
  – How do I find out about other routers and links?
  – How do I use that information to generate routes?
  – How do I maintain routes in the presence of changes?
Design Direction
• Designing a single solution that spans the entire Internet is unlikely to work.
  – Computational cost, protocol and bandwidth overhead
• Heterogeneous environment
  – Domain sizes and requirements, router capabilities
  – Different constraints on internal and external connectivity
• Basic design approach
  – Hierarchy of domains (reflects address hierarchy)
    - Ensures scalability
  – Independence of routing protocols in different domains
    - Support for heterogeneity
  – Gateways between domains for an end-to-end solution
A Bird’s Eye View of the Internet
• Basic “two-level” hierarchy
  – Federation of interconnected islands: Autonomous Systems (AS)
    - Each island has its own internal rules.
    - Islands collaborate to offer end-to-end connectivity.
• Some islands are bigger and more powerful than others.
  – Willingness to carry traffic for others
    - Peering or transit agreements
[Figure: a federation of interconnected Autonomous Systems: AS 1, AS 2, AS 3, AS 121, AS 123, AS 168, AS 321, AS 376, AS 441, AS 524, AS 3411]
Delivering Ubiquitous Connectivity
• Basic principles
  – No global knowledge
  – Hop-by-hop decisions
• Map analogy
  – Detailed map of your neighborhood
  – Coarse knowledge of how to exit your neighborhood to reach remote destinations
  – Ongoing validation of the map each time you reach an intersection
    - It could change…
Routing Protocol Overview
• Routing protocols follow the two-level hierarchy of the Internet.
  – Interior Gateway Protocols (IGPs) control routing within an AS/domain.
  – Exterior Gateway Protocols (EGPs) control routing between ASes.
• Different goals and constraints for each family of protocols
  – IGP: ability to fine-tune internal operation and shielding from outside “noise”
  – EGP: scalability and ability to accommodate a broad range of administrative policies
Interior Gateway Protocols
• Routing protocols follow the two-level hierarchy of the Internet.
  – Interior Gateway Protocols (IGPs) control routing within an AS/domain.
  – Exterior Gateway Protocol(s) (EGP) control routing between ASes.
• Different goals and constraints for each family of protocols
  – IGP: ability to fine-tune internal operation and shielding from outside “noise”
  – EGP: scalability and ability to accommodate a broad range of administrative policies
Protocol Design Goals and Requirements
• Minimize routing table space
  – Faster lookup (although the forwarding table can be more compact)
  – Less to exchange (lower processing and bandwidth overhead)
  – Lower storage cost
• Minimize number and frequency of control messages
  – Lower processing and bandwidth overhead
  – But need to take responsiveness to changes into account
• Robustness
  – Avoid black holes (unable to reach destination)
  – Prevent or recover from loops (inconsistent decisions among routers)
  – Limit instability (oscillation between possible routes)
• Optimize use of network resources and overall performance
  – Best possible and/or most efficient path
General Design Choices
• Where are routes computed?
  – Centralized vs. distributed
    - Centralized is simpler but may not scale and is more prone to failure.
    - Distributed requires collaboration between routers and has more complex transient behavior.
• How are routes computed?
  – Distributed computation vs. distribution of information from which routes can be computed
    - Distance vector: routers exchange results of computations (routing table and costs).
    - Link state: routers exchange information from which computations can be performed independently (topological and reachability information).
• What criteria are used when computing routes?
  – What metrics to optimize (hop count, bandwidth, delay, etc.)?
  – Static vs. dynamic metrics?
IGP Job Description
• Ensure consistent and efficient (optimized) forwarding for destinations within a routing domain.
• Enable selection of appropriate exit points for destinations outside the domain.
• Two major families of protocols
  – Distance vector (distributed computation); examples:
    - Routing Information Protocol version 2 (RIPv2)
    - Enhanced Interior Gateway Routing Protocol (EIGRP), Cisco proprietary
  – Link state (distributed information); examples:
    - Open Shortest Path First (OSPF)
    - Intermediate System to Intermediate System (IS-IS)
Link State (LS) Routing
• Common features with DV protocols
  – Relies on communication with neighbors
  – Supports destination-based shortest path forwarding
• Everything else is different: distributing information versus distributing computations
  – Type of information exchanged between neighbors
    - Topology and link costs vs. route costs
    - Used to build a common domain map in all routers
  – Route computations
    - Performed independently in each router vs. distributed
• Two major components of LS protocols
  – Topology dissemination and maintenance (flooding)
  – Route computation algorithm executed in each router
OSPF Overview
• Two-level hierarchy
  – The backbone area (area 0) connects the other areas (hub and spoke).
  – Costs are assigned to internal links and external routes for fine-tuning of traffic distribution.
• Link state operation
  – Routers broadcast inside their area knowledge of their local neighborhood.
    - Routers glue local neighborhood pieces together to create an area map.
  – Information about other areas is summarized and broadcast into an area by Area Border Routers (ABRs).
  – Autonomous System Boundary Routers (ASBRs) are responsible for injecting information on how to reach external destinations.
• Routing table construction
  – Routers rely on their domain map and summary information about other areas and the outside world to compute consistent best paths to known destinations.
  – Multiple paths can exist for any given route.
A Day in the Life of an OSPF Router
1. Establish adjacency to neighbor routers through the HELLO protocol.
2. Build and maintain the topology database.
   a. Synchronize databases with neighbors.
   b. Advertise the router’s local “neighborhood” to others.
   c. Process and forward advertisements received from neighbors.
3. Compute the routing table.
   a. Intra-area destinations (routers and transit networks first, then stub networks)
   b. Inter-area destinations
   c. External destinations
4. Begin forwarding packets.
[Figure label: Change notification]
OSPF HELLO Protocol (1)
• The HELLO protocol is a liveness protocol between adjacent routers.
  – HELLO packets are periodically exchanged.
• The HELLO protocol serves multiple purposes:
  – Dynamic discovery of neighbors
  – Advertising of router attributes
  – Identification of the Designated Router (DR) and DR election (more on this later)
  – Detection of link and router failures
    - Frequency of advertising varies with network technology (from 10 sec to 30 sec).
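To make liveness detection concrete, here is a small Python sketch. The `NeighborTable` class is hypothetical, and a dead interval of four HELLO periods is an assumed setting (commonly used OSPF defaults are a 10 s HELLO interval and a 40 s dead interval on broadcast links): a neighbor is declared down once nothing has been heard from it for longer than the dead interval.

```python
from dataclasses import dataclass, field

HELLO_INTERVAL = 10.0               # seconds (assumed value)
DEAD_INTERVAL = 4 * HELLO_INTERVAL  # silence longer than this => neighbor down

@dataclass
class NeighborTable:
    """Hypothetical record of when each neighbor's last HELLO arrived."""
    last_heard: dict = field(default_factory=dict)

    def receive_hello(self, router_id: str, now: float) -> bool:
        """Record a HELLO; return True if the neighbor is newly discovered."""
        is_new = router_id not in self.last_heard
        self.last_heard[router_id] = now
        return is_new

    def expire(self, now: float) -> list:
        """Drop and return neighbors silent for longer than DEAD_INTERVAL."""
        dead = [rid for rid, t in self.last_heard.items() if now - t > DEAD_INTERVAL]
        for rid in dead:
            del self.last_heard[rid]
        return dead

table = NeighborTable()
table.receive_hello("R2", now=0.0)
table.receive_hello("R3", now=0.0)
table.receive_hello("R2", now=35.0)  # R2 keeps sending HELLOs; R3 falls silent
print(table.expire(now=45.0))        # ['R3']
```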
Database Synchronization (2a)
• Each router maintains a database that stores a “complete” map of the network (more on how this map/database is actually constructed later on).
• When routers boot, or the link between routers comes up, routers perform database synchronization.
  – The goal is to quickly ensure a common view of the world.
    - The synchronization process determines what each router knows and does not know, and which router has the most recent information.
  – Database entries are characterized by sequence number and age.
    - Only newer or unknown entries are exchanged. Neighbor routers end up with a common domain map.
Database Synchronization
[Figure: Routers 1 to 8 connected in a network; the numbers 1, 2, and 3 mark link failures]
• During a partition, routers’ databases can become unsynchronized.
• After failure 1, the network is partitioned.
  – Routers 1 to 4 don’t know about failure 2.
  – Routers 5 to 8 don’t know about failure 3.
• When routers 4 and 5 reconnect, database synchronization is required to ensure consistent views.
• Databases are described by Database Description packets exchanged between master and slave.
  – Routers on each side of a newly restored link talk to each other to update their databases (determine missing and out-of-date pieces).
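A simplified sketch of how two reconnecting routers decide what to request from each other after exchanging database descriptions. It assumes each database maps a hypothetical LSA key (type, originating router) to a (sequence number, age) pair and, unlike real OSPF, which also looks at checksum and age, compares only sequence numbers:

```python
def lsas_to_request(my_db: dict, neighbor_summary: dict) -> list:
    """Return keys of LSAs the neighbor has that are unknown to us or newer.

    Both arguments map an LSA key to (sequence_number, age).
    Only sequence numbers are compared in this sketch.
    """
    wanted = []
    for key, (seq, _age) in neighbor_summary.items():
        if key not in my_db or seq > my_db[key][0]:
            wanted.append(key)
    return sorted(wanted)

# Routers 4 and 5 reconnect after the partition (illustrative contents).
router4 = {("router", "R1"): (5, 100), ("router", "R4"): (9, 20)}
router5 = {("router", "R1"): (5, 100), ("router", "R4"): (7, 900),
           ("router", "R6"): (3, 50)}
print(lsas_to_request(router4, router5))  # [('router', 'R6')]
print(lsas_to_request(router5, router4))  # [('router', 'R4')]
```

Each side ends up requesting only the pieces it is missing or holds in an out-of-date form, which is what keeps resynchronization cheap.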
Building the Topology Database (2b/c)
• An OSPF router advertises itself and its view of its neighborhood with Link State Advertisements (LSAs).
  – Puzzle piece from which the full map can be built
    - Router ID identifies the originator of each puzzle piece
  – Several types of advertisements (in OSPF)
    - Multiple LSA types keep LSA size small (40 bytes on average for OSPF).
    - Small granularity ensures updates of minimal size.
    - Different types of LSAs can have different scopes of distribution: link local, area local, domain-wide
• A router forwards the LSAs it originates or receives from its neighbors (flooding process).
• The sum of all received LSAs makes up a router’s topology database.
  – Common network map shared by all routers in an area
LSA Distribution: Flooding
• New LSAs are forwarded on all eligible (type-dependent) links except the link on which they were received (if applicable).
  – Dissemination of information is independent of routing.
  – Each LSA is transmitted at most twice on each link.
• New LSAs replace previous ones in the local database.
• LSAs are transmitted reliably (acknowledged).
• Flooding rate is bounded through a minimum gap between LSAs.
• Periodic (30 min.) flooding refreshes the database (aging of LSAs).
  – Why is periodic refreshing necessary if transmission is reliable?
All routers end up with a “complete” domain map.
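The flooding rule can be simulated in a few lines of Python. This is a sketch under simplifying assumptions (a hypothetical four-router topology, a single LSA, no acknowledgments or LSA types): a router stores an LSA only if it is newer than its current copy, then forwards it on every link except the one it arrived on. Because a router forwards the LSA only when it accepts it, each link carries it at most twice, once in each direction.

```python
from collections import deque

def flood(topology: dict, origin: str, lsa_id: str, seq: int):
    """Flood one LSA from origin; return (databases, per-link transmission counts)."""
    databases = {r: {} for r in topology}   # router -> {lsa_id: seq}
    databases[origin][lsa_id] = seq
    link_uses = {}                          # frozenset({a, b}) -> transmissions
    queue = deque((origin, nbr) for nbr in topology[origin])
    while queue:
        sender, receiver = queue.popleft()
        link = frozenset((sender, receiver))
        link_uses[link] = link_uses.get(link, 0) + 1
        # Duplicate or older copy: (acknowledge and) drop it.
        if databases[receiver].get(lsa_id, -1) >= seq:
            continue
        databases[receiver][lsa_id] = seq
        # Forward on all links except the one the LSA arrived on.
        for nbr in topology[receiver]:
            if nbr != sender:
                queue.append((receiver, nbr))
    return databases, link_uses

topology = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"], "D": ["C"]}
dbs, uses = flood(topology, "A", "router-LSA-A", seq=1)
print(all("router-LSA-A" in db for db in dbs.values()))  # True
print(max(uses.values()))                                 # 2
```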
Flooding Example
[Figure: three-step animation of an LSA being flooded through the network; link labels 1 to 5 mark successive transmissions]
LSA Sequence Numbers
• Used to identify the most recent update.
  – A greater sequence number is newer.
  – A newer LSA replaces the old one.
• Two problems to deal with:
  – Wraparound of the sequence number counter
    - A smaller sequence number is now newer.
  – Choice of number at boot-up time
    - Need to be able to overwrite previous LSAs
OSPF Sequence Numbering and LSA Refreshes
• 32-bit sequence number (signed integer), with N = 2^31
  – InitialSequenceNumber of -N+1
  – Increment by 1 (up to N-1) for each new update
• Wraparound is handled by prematurely aging the entry before flooding the new value (when reaching N-1).
  – Flood the LSA with an age of MaxAge (1 hour) to purge it.
  – Send the new LSA with sequence number -N+1.
  – Rare event (over 100,000 years with a 30 min. refresh period)
• Receiving a self-originated LSA with a larger sequence number than the last transmitted LSA
  – Caused by the presence of a residual LSA originated prior to the last router restart
  – Jump to one more than the received number and flood again.
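The sequence-number rules can be sketched in Python, assuming N = 2^31 so that InitialSequenceNumber = -N+1 and the maximum is N-1 (the OSPFv2 values). The function names are hypothetical. Signed comparison is what lets a freshly booted router start at -N+1, the oldest possible value, and then jump past any stale copy of its own LSA:

```python
N = 2 ** 31
INITIAL_SEQ = -N + 1   # 0x80000001 viewed as an unsigned 32-bit value
MAX_SEQ = N - 1        # 0x7FFFFFFF

def is_newer(seq_a: int, seq_b: int) -> bool:
    """Signed comparison: a greater sequence number is a newer LSA."""
    return seq_a > seq_b

def next_seq(current: int) -> tuple:
    """Return (new_seq, must_purge_first) for the next self-originated LSA.

    At MAX_SEQ the old instance must first be flushed by flooding it with
    age MaxAge; numbering then restarts at INITIAL_SEQ.
    """
    if current == MAX_SEQ:
        return INITIAL_SEQ, True
    return current + 1, False

def on_self_originated(last_sent: int, received: int) -> int:
    """A residual copy from before a restart looks newer: jump past it.
    (The corner case received == MAX_SEQ is not handled here.)"""
    return received + 1 if is_newer(received, last_sent) else last_sent

print(next_seq(INITIAL_SEQ))                  # (-2147483646, False)
print(next_seq(MAX_SEQ))                      # (-2147483647, True)
print(on_self_originated(INITIAL_SEQ, 1000))  # 1001

# Counter space divided by one refresh every 30 minutes, in years.
years = (MAX_SEQ - INITIAL_SEQ) * 30 / (60 * 24 * 365)
print(years > 100_000)                        # True
```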
Aging of LSAs
• An LSA has an age field.
  – Incremented at each transmission and while stored
  – MaxAge determines time-to-live
    - The age gets reset at each refresh.
• Two uses of flooding an LSA with age MaxAge:
  – Removal of timed-out or invalid LSAs
  – Removal of the current entry before sequence number wraparound
• Impact of MaxAge on the boot-up problem
  – If a router waits for MaxAge, old LSAs will be purged.
  – Selection of MaxAge is a difficult choice.
    - A small MaxAge minimizes boot-up latency.
    - A small MaxAge imposes higher flooding overhead and may prevent full LSA distribution in large networks.
• Initial database synchronization handles boot-up.
Securing LSA Databases
• Consistency of LSA databases is critical to avoid routing loops.
• Need to protect against multiple error scenarios:
  – Link errors
    - LSA checksum and acknowledgment ensure reliable transmission.
  – Injection of spurious LSAs (errors or malice)
    - Support for authentication of the OSPF packets that carry LSAs (OSPFv2)
What Are Link State Advertisements?
• Five basic types of LSAs:
  – Router LSA: a router’s neighborhood (intra-area)
  – Network LSA: connectivity through a broadcast network (intra-area)
  – T3 Summary LSA: reachability to a route in another area (inter-area)
  – T4 Summary LSA: reachability to an ASBR in another area (inter-area)
  – T5 LSA: AS-external LSA to a route in another AS (external)
• Several additional “special” LSA types:
  – T7 LSA: support of “not-so-stubby area” (NSSA)
  – Opaque LSAs: support extensibility of functionality
    - Type 9: link local
    - Type 10: area local
    - Type 11: throughout the whole AS
What Is a Router LSA?
• A Router LSA advertises to other routers in my “area” what my neighborhood looks like.
  – What kind of links, to which neighbors, at what cost?
    - Point-to-point, point-to-multipoint, broadcast, etc.
  – What set of local destinations can I reach?
    - Stub networks and transit networks
  – What kind of router am I?
    - Internal, Area Border Router, AS Boundary Router
[Figure: a router’s neighborhood with stub networks 19.2.4.0/24 and 19.2.5.0/24, transit network 19.2.0.0/16 with its Designated Router, and link costs]
What Is a Network LSA?
• A Network LSA identifies all the routers attached to the same broadcast network.
  – Advertised by the Designated Router (DR), which is chosen through an election process.
  – A Backup DR is also elected and takes over if the DR fails.
  – All other routers form adjacencies (and synchronize databases) only with the DR and the backup DR.
[Figure: transit network 19.2.0.0/16 with its Designated Router and Backup DR]
Why A Network LSA?
• Multiple routers attached to a broadcast LAN
  – They can all reach each other.
• Directly advertising complete router connectivity is expensive.
  – Every router specifies connectivity to N-1 routers.
  – ~N² state overhead (bandwidth and storage)
• A Network LSA provides more compact advertising.
  – A single copy of full router connectivity is sent by the designated router.
  – A Backup DR minimizes sensitivity to DR failures.
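A back-of-the-envelope count of advertised adjacencies makes the savings concrete. The sketch assumes each advertised router-to-router or router-to-network relationship counts as one link: without a Network LSA every router lists the other N-1 routers, while with one each router lists a single link to the transit network and the DR lists all N attached routers.

```python
def full_mesh_links(n: int) -> int:
    """Every router advertises a link to each of the other n-1 routers."""
    return n * (n - 1)

def network_lsa_links(n: int) -> int:
    """Each router advertises one link to the transit network,
    and the DR's Network LSA lists all n attached routers."""
    return n + n

for n in (4, 7, 50):
    print(n, full_mesh_links(n), network_lsa_links(n))
# 4 12 8
# 7 42 14
# 50 2450 100
```

Quadratic growth becomes linear, which is why the pseudo-node representation matters on large LANs.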
[Figure: routers A to G on a shared LAN, drawn as a full mesh of pairwise links and as a star around a single transit-network node T]
Building an Area Topology Map
• Putting the puzzle pieces together
[Figure: R1 with stub networks 19.2.4.0/24 and 19.2.5.0/24, transit network 19.2.0.0/16, and a point-to-point link to R2]
1. Router LSA from R1: 2 stub interfaces, 1 transit interface, 1 pt-to-pt link
[Figure: R3 (Designated Router) and R4 added on transit network 19.2.0.0/16]
2. Network LSA from DR R3: attached routers R1, R3, R4
[Figure: R2 added with its point-to-point links and stub network 19.2.6.0/24]
3. Router LSA from R2: 2 pt-to-pt links, 1 stub interface
[Figure: R4 added with stub networks 19.2.6.0/24 and 19.2.7.0/24]
4. Router LSA from R4: 1 pt-to-pt link, 1 transit interface, 2 stub interfaces
• Note: Stub network 19.2.6.0/24 is dual-homed, but packets won’t “transit” through it. R2 and R4 send no HELLO protocol packets on those interfaces (passive interface configuration).
[Figure: R3’s stub network added to the area map]
5. Router LSA from R3: 1 transit interface, 1 stub interface
• Core network graph
[Figure: directed graph of R1, R2, R3, R4 and transit node T with link costs]
6. Final network graph
  – Stub networks are added.
[Figure: core network graph with the stub networks attached to R1, R2, R3, and R4]
Computing the Intra-Area Routing Table (3a)
• Basic setting
  – The topology database has stabilized.
  – The router computes shortest paths from itself to all routers and transit networks, and then to stub networks.
• Path computation is based on Dijkstra’s algorithm.
  – Maintain two sets of nodes:
    - Nodes to which the shortest path is known (set S)
    - Nodes to which candidate shortest paths are known (set C)
    - Initially only the origin router is in set S.
  – Iterate, and at each iteration:
    - Consider all neighbors of the last node X added to S, and add them to C if they are not already in S or C.
    - Update the candidate paths of all neighbors of X if the path through X is shorter than their current path.
    - If C is empty, the algorithm terminates.
    - Otherwise, add to S the node in C with the smallest candidate path cost (i.e., the node closest to the origin router) and iterate.
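The procedure above can be sketched in Python as a heap-based Dijkstra that also keeps equal-cost next hops. The directed link costs are an assumption reconstructed from R1’s worked example (R1→R2 = 1, R1→T = 12, R2→R4 = 5, R4→T = 6, T→R1/R3/R4 = 0, R3→T = 5); reverse-direction costs that cannot change R1’s result are omitted. The heap plays the role of set C and `done` the role of set S. Transit-first tie-breaking and stub networks are left out.

```python
import heapq

def dijkstra_ecmp(graph: dict, source: str):
    """Shortest-path costs and equal-cost next-hop sets from source.

    graph maps node -> {neighbor: directed link cost}. A next hop is the
    first node after source on a shortest path.
    """
    dist = {source: 0}
    next_hops = {source: set()}
    done = set()                      # set S: shortest path is final
    heap = [(0, source)]              # candidates (set C)
    while heap:
        cost, node = heapq.heappop(heap)
        if node in done:
            continue                  # stale heap entry
        done.add(node)
        for nbr, link_cost in graph.get(node, {}).items():
            if nbr in done:
                continue
            new_cost = cost + link_cost
            # Next hops via node: the neighbor itself if node is the source.
            via = {nbr} if node == source else next_hops[node]
            if nbr not in dist or new_cost < dist[nbr]:
                dist[nbr] = new_cost
                next_hops[nbr] = set(via)
                heapq.heappush(heap, (new_cost, nbr))
            elif new_cost == dist[nbr]:
                next_hops[nbr] |= via  # equal-cost path: keep both
    return dist, next_hops

graph = {
    "R1": {"R2": 1, "T": 12},
    "R2": {"R4": 5},
    "R4": {"T": 6},
    "T": {"R1": 0, "R3": 0, "R4": 0},
    "R3": {"T": 5},
}
dist, nh = dijkstra_ecmp(graph, "R1")
print(dist)              # {'R1': 0, 'R2': 1, 'T': 12, 'R4': 6, 'R3': 12}
print(sorted(nh["R3"]))  # ['R2', 'T']
```

The result matches the trace on the next slides: T and R3 are both at cost 12 with two next hops (the direct interface onto T and the link to R2).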
Dijkstra Shortest Path Computation
• What intra-area routing table does router R1 come up with, and how does it do it?
[Figure: final network graph of R1, R2, R3, R4, and transit network T with link costs]
Dijkstra’s Operation At R1
Each entry reads (node, cost, next hop(s)); ∞ and * mark a node with no candidate path yet.
S = {(R1,0,R1)};
C = {(R2,1,R2); (T,12,T); (R3,∞,*); (R4,∞,*)}

S = {(R1,0,R1); (R2,1,R2)};
C = {(T,12,T); (R3,∞,*); (R4,6,R2)}

S = {(R1,0,R1); (R2,1,R2); (R4,6,R2)};
C = {(T,12,T,R2); (R3,∞,*)}

S = {(R1,0,R1); (R2,1,R2); (R4,6,R2); (T,12,T,R2)};
C = {(R3,12,T,R2)}

S = {(R1,0,R1); (R2,1,R2); (R4,6,R2); (T,12,T,R2); (R3,12,T,R2)};
C = {} (empty, so the algorithm terminates)
Adding Transit Networks First
• The order in which nodes are added to the labeled set can affect the number of paths discovered to some nodes, because once a node is added to the labeled set it is never revisited.
  – If E is added first to the set of labeled nodes, the path A-C-E of cost 2 is not discovered.
  – If C is added first to the set of labeled nodes, the path A-B-E-C of cost 2 is not discovered.
• In OSPF, transit network nodes always have outgoing costs of 0, and therefore must be added first to the set of labeled nodes when costs are tied.
[Figure: two small graphs over nodes A, B, C, and E with link costs 0, 1, and 2, one for each case]
Adding Stub Networks
• For each stub network:
  – Identify all routers that advertise the stub network.
  – Retrieve the shortest path to those routers.
  – Add the cost of the shortest path to each router to the cost of the stub network link advertised by that router in its Router LSA.
  – Pick the router(s) that yield the smallest cost.
  – Add the stub network to the routing table with the same next hop(s) as the selected router(s).
• Four stub networks in the previous example:
  – 19.2.4.0/24 and 19.2.5.0/24 are directly connected to router R1.
  – 19.2.6.0/24 is reachable from both R2 and R4, and R2 is the lower cost option (total cost of 1+1=2 vs 1+5+2=8).
  – 19.2.7.0/24 is reachable from both R3 and R4, and R4 is the lower cost option (total cost of 1+5+1=7 vs 12+3=15).
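The stub-network step can be sketched on top of R1’s shortest-path results (R2 at cost 1, R4 at 6, R3 at 12, with next hops as in the Dijkstra trace). The stub link costs are the ones used in the totals above: R2 advertises 19.2.6.0/24 at cost 1 and R4 at cost 2; R4 advertises 19.2.7.0/24 at cost 1 and R3 at cost 3. The function name is illustrative.

```python
def add_stub_routes(dist: dict, next_hops: dict, stub_adverts: dict) -> dict:
    """For each stub network, pick the advertising router(s) with the smallest
    (cost to router + advertised stub cost) and inherit their next hops.

    stub_adverts maps prefix -> {advertising_router: stub link cost}.
    """
    routes = {}
    for prefix, adverts in stub_adverts.items():
        totals = {r: dist[r] + c for r, c in adverts.items() if r in dist}
        best = min(totals.values())
        hops = set()
        for router, total in totals.items():
            if total == best:
                hops |= next_hops[router]
        routes[prefix] = (best, hops)
    return routes

dist = {"R1": 0, "R2": 1, "R4": 6, "R3": 12}
next_hops = {"R2": {"R2"}, "R4": {"R2"}, "R3": {"T", "R2"}}
stubs = {
    "19.2.6.0/24": {"R2": 1, "R4": 2},
    "19.2.7.0/24": {"R4": 1, "R3": 3},
}
for prefix, (cost, hops) in add_stub_routes(dist, next_hops, stubs).items():
    print(prefix, cost, sorted(hops))
# 19.2.6.0/24 2 ['R2']
# 19.2.7.0/24 7 ['R2']
```

Both stubs end up behind the point-to-point link to R2, matching the routing table on the next slide.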
What Intra-Area Routing Table at R1?
Routes          Next Hop(s)
19.2.4.0/24     19.2.4.1 (IP address of local interface at R1)
19.2.5.0/24     19.2.5.1 (IP address of local interface at R1)
19.2.6.0/24     0.0.0.3 (MIB index of pt-to-pt link between R1 and R2)
19.2.7.0/24     0.0.0.3 (MIB index of pt-to-pt link between R1 and R2)
19.2.0.0/16     19.2.1.1 (IP address of local interface at R1)
                0.0.0.3 (MIB index of pt-to-pt link between R1 and R2)
From One To Multiple Areas
• Why can’t we keep increasing the number of routers in an area?
  – Topology database size and flooding overhead increase.
  – Most importantly, route computation can become very onerous.
    - Cost and frequency of Dijkstra computations increase.
• The basic solution is to partition a domain into multiple areas.
  – Two-level hierarchy:
    - A backbone area serves as a hub to which other areas connect.
    - Area Border Routers (ABRs) interconnect areas.
• Full topology is maintained only within an area.
  – Flooding of router and network LSAs is limited to within an area.
  – Dijkstra computation is limited to one area.
• Domain-wide shortest paths are computed using a DV-like approach.
  – ABRs advertise their cost to remote destinations.
  – The shortest path is computed by concatenating costs to and from an ABR.
Router’s View of a Multi-Area Domain
[Figure: R1’s area map (R1 to R4, transit network 19.2.0.0/16, stub networks 19.2.4.0/24 to 19.2.7.0/24) plus an ABR injecting T3 summary LSAs for 19.1.6.0/24, 19.1.5.0/24, 19.1.2.0/23, 19.3.0.0/16, and 19.1.0.0/16]
Generating Summary LSAs
• Summary LSAs advertise the cost to routes or routers (ASBRs) in other areas.
• Area Border Routers (ABRs) are responsible for generating summary LSAs.
  – ABRs advertise to other routers the results of their own shortest path computations for remote (but within the same AS) destinations.
    - Essentially a “distance vector” type of approach
[Figure: an ABR that reaches 12.3.4.0/24 at cost 12 floods the summary T3: [12.3.4.0/24; 12] to its neighbors]
Computing Inter-Area Paths (3b)
• Two cost components (similar to the handling of stubs):
  1. Cost from ABR to remote destination (as advertised in the corresponding T3 summary LSA)
  2. Cost from source to ABR in the local area
• Path selection
  – For all ABRs that advertise the target route (longest prefix match), add the “cost to the ABR” to the “cost from that ABR to the remote destination.”
  – Pick the ABR(s) with the smallest total cost as the target exit point(s) to reach the remote destination.
  – Set the next hop(s) for the remote destination to the next hop(s) of the shortest path(s) to the selected ABR(s).
Example of Inter-Area Path Computation
• Step 1
  – Router 14 advertises a T3 summary with cost of 3 for r into area 0.
  – Router 13 advertises a T3 summary with cost of 5 for r into area 0.
• Step 2
  – Router 4 advertises a T3 summary with cost of 13 (10+3) for r in area 1.
  – Router 6 advertises a T3 summary with cost of 9 (6+3) for r in area 1.
• Step 3
  – Router 1 identifies router 6 as the best exit point to reach r (4+9 < 4+13).
  – Router 1 identifies routers 3 and 5 as its next hops to reach r.
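Steps 2 and 3 can be reproduced numerically with a short sketch. It takes the figures stated above (Router 4’s backbone cost 10 and Router 6’s cost 6 to Router 14, Router 1’s cost 4 to each ABR, and Routers 3 and 5 as next hops toward Router 6); the next hop toward Router 4 is a hypothetical placeholder, and `best_exits` is an illustrative name.

```python
def best_exits(cost_to_abr: dict, advertised_cost: dict, next_hops_to_abr: dict):
    """Inter-area path selection for one destination.

    cost_to_abr: intra-area cost from this router to each ABR.
    advertised_cost: cost each ABR advertises in its T3 summary LSA.
    Returns (best total cost, chosen ABRs, next hops toward them).
    """
    totals = {abr: cost_to_abr[abr] + advertised_cost[abr]
              for abr in advertised_cost if abr in cost_to_abr}
    best = min(totals.values())
    chosen = sorted(abr for abr, t in totals.items() if t == best)
    hops = set()
    for abr in chosen:
        hops |= next_hops_to_abr[abr]
    return best, chosen, sorted(hops)

# Step 2: ABRs 4 and 6 add their backbone cost to Router 14's advertised 3.
t3_from_4 = 10 + 3   # 13
t3_from_6 = 6 + 3    # 9

# Step 3 at Router 1, which is at cost 4 from each ABR.
print(best_exits(
    cost_to_abr={"Rtr 4": 4, "Rtr 6": 4},
    advertised_cost={"Rtr 4": t3_from_4, "Rtr 6": t3_from_6},
    next_hops_to_abr={"Rtr 4": {"hypothetical next hop"},
                      "Rtr 6": {"Rtr 3", "Rtr 5"}},
))
# (13, ['Rtr 6'], ['Rtr 3', 'Rtr 5'])
```

Each ABR repeats the same “cost to me plus advertised cost” step before re-advertising, which is the distance-vector flavor of OSPF’s inter-area routing.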
[Figure: Routers 1 to 19 in Area 1, Area 0 (backbone), and Area 2, with link costs (mostly 2) and destination r]
What Intra & Inter-Area Routing Table at R1?
[Figure: R1’s area map plus T3 summary LSAs from the ABR for 19.1.6.0/24, 19.1.5.0/24, 19.1.2.0/23, 19.3.0.0/16, and 19.1.0.0/16]
Intra and Inter-Area Routing Table at R1
Routes          Next Hop(s)
19.2.4.0/24     19.2.4.1 (IP address of local interface at R1)
19.2.5.0/24     19.2.5.1 (IP address of local interface at R1)
19.2.6.0/24     0.0.0.3 (MIB index of pt-to-pt link between R1 and R2)
19.2.7.0/24     0.0.0.3 (MIB index of pt-to-pt link between R1 and R2)
19.2.0.0/16     19.2.1.1 (IP address of local interface at R1)
                0.0.0.3 (MIB index of pt-to-pt link between R1 and R2)
19.1.5.0/24     0.0.0.3 (MIB index of pt-to-pt link between R1 and R2)
19.1.6.0/24     0.0.0.3 (MIB index of pt-to-pt link between R1 and R2)
19.1.2.0/23     0.0.0.3 (MIB index of pt-to-pt link between R1 and R2)
19.1.0.0/16     0.0.0.3 (MIB index of pt-to-pt link between R1 and R2)
19.3.0.0/16     0.0.0.3 (MIB index of pt-to-pt link between R1 and R2)
From Multiple Areas To The “World”
• How do we reach destinations in other domains/ASes?
• AS Boundary Routers (ASBRs) are the gateways to the outside world.
  – They “summarize” the world in much the same way as ABRs summarize other areas.
  – ASBRs advertise external routes (T5 LSAs) that indicate their ability to reach remote destinations.
    - T5 LSAs are flooded throughout the AS.
  – Reachability to ASBRs located in other areas is advertised by ABRs through T4 LSAs. (Why needed?)
• The cost of AS-external LSAs can be of two types:
  – Type 1 cost is compatible with the link costs within the AS.
  – Type 2 cost is incompatible with the link costs within the AS and trumps any internal cost.
A Router’s View of The “World”
[Figure: R1’s area map with T3 summary LSAs from the ABR (19.1.6.0/24, 19.1.5.0/24, 19.1.2.0/23, 19.3.0.0/16, 19.1.0.0/16), a T4 summary LSA for ASBR R5, and a T5 LSA from R5 for 0.0.0.0/0 with cost <2,20> (type 2, cost 20)]
Computing Paths to External Routes (3c)
• Two cost components (again like inter-area routes and stubs):
  1. Cost from ASBR to external route
  2. Cost from source to ASBR
• Path selection
  – The smallest type 2 cost wins, independent of internal cost.
  – If type 1 cost or equal type 2 cost:
    - Prefer a non-backbone intra-area path to the ASBR (or forwarding address) and use cost to break ties.
    - If no such path exists, use cost to select the path.
  – Cost computation
    - If (equal) type 2 cost: cost to ASBR
    - If type 1 cost: cost to ASBR + cost from ASBR
    - Cost to ASBR:
      – Direct cost if within the same area
      – Cost to ABR plus T4 summary cost advertised by the ABR (cost from ABR to ASBR in the remote area)
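A sketch of these selection rules in Python. The `ExternalPath` record is hypothetical, and the sketch also assumes that type 1 paths beat type 2 paths, consistent with the installation order given under Packet Forwarding (4). The example reuses the numbers from the “Another Example of Non-Shortest Path” slide: Routers 5 and 8 both advertise E with type 2 cost 10, and Router 2 reaches them at internal costs 22 (inside area 1) and 10.

```python
from dataclasses import dataclass

@dataclass
class ExternalPath:
    asbr: str
    metric_type: int          # 1 or 2
    external_cost: int        # cost advertised by the ASBR in its T5 LSA
    cost_to_asbr: int         # internal cost from this router to the ASBR
    intra_non_backbone: bool  # reached over a non-backbone intra-area path?

def select_external(paths: list) -> list:
    """Return the ASBR(s) chosen for one external route."""
    type1 = [p for p in paths if p.metric_type == 1]
    if type1:
        pool = type1
    else:
        # Smallest type 2 cost wins regardless of internal cost.
        lowest = min(p.external_cost for p in paths)
        pool = [p for p in paths if p.external_cost == lowest]
    # Prefer non-backbone intra-area paths to the ASBR when any exist.
    preferred = [p for p in pool if p.intra_non_backbone]
    if preferred:
        pool = preferred

    def cost(p):
        # Type 1: internal + external; (equal) type 2: internal cost only.
        return p.cost_to_asbr + (p.external_cost if p.metric_type == 1 else 0)

    best = min(cost(p) for p in pool)
    return sorted(p.asbr for p in pool if cost(p) == best)

print(select_external([
    ExternalPath("Rtr 5", 2, 10, 22, True),
    ExternalPath("Rtr 8", 2, 10, 10, False),
]))  # ['Rtr 5']
```

The intra-area preference is applied before cost, which is exactly why Router 5 wins despite its higher internal cost.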
Examples of External Path Computation
• Router 1 selects router 5 to reach external route r in spite of its higher cost (22 instead of 10 through router 10), because router 5 is in area 1 and is reached over a non-backbone intra-area path.
• Router 8 selects router 17 to reach external route r’, because router 8 is in the backbone area and uses cost to identify the best path.
[Figure: Routers 1 to 19 in Area 1, Area 0, and Area 2 with link costs; ASBRs advertise external routes r and r’ with costs such as <1,20> and <1,2>]
<x,y>: <cost type, cost> for an external route
What Routing Table at Router R1?
[Figure: R1’s area map with T3 summary LSAs from the ABR (19.1.6.0/24, 19.1.5.0/24, 19.1.2.0/23, 19.3.0.0/16, 19.1.0.0/16), a T4 summary LSA for ASBR R5, and R5’s T5 LSA for 0.0.0.0/0 with cost <2,20>]
Routing Table at R1
Routes          Next Hop(s)
19.2.4.0/24     19.2.4.1 (IP address of local interface at R1)
19.2.5.0/24     19.2.5.1 (IP address of local interface at R1)
19.2.6.0/24     0.0.0.3 (MIB index of pt-to-pt link between R1 and R2)
19.2.7.0/24     0.0.0.3 (MIB index of pt-to-pt link between R1 and R2)
19.2.0.0/16     19.2.1.1 (IP address of local interface at R1)
                0.0.0.3 (MIB index of pt-to-pt link between R1 and R2)
19.1.5.0/24     0.0.0.3 (MIB index of pt-to-pt link between R1 and R2)
19.1.6.0/24     0.0.0.3 (MIB index of pt-to-pt link between R1 and R2)
19.1.2.0/23     0.0.0.3 (MIB index of pt-to-pt link between R1 and R2)
19.1.0.0/16     0.0.0.3 (MIB index of pt-to-pt link between R1 and R2)
19.3.0.0/16     0.0.0.3 (MIB index of pt-to-pt link between R1 and R2)
0.0.0.0/0       0.0.0.3 (MIB index of pt-to-pt link between R1 and R2)
Summarizing OSPF Operation
[Figure: AS 123 with backbone Area 0 and Areas 1 to 4 joined by ABRs, an ASBR, and example routes r1, r2, and r3 labeled as intra-area, inter-area, and external routes]
Packet Forwarding (4)
• For each known route, paths (next hops) are installed in the final routing table as follows:
  – Intra-area paths are preferred.
  – Inter-area paths are next.
  – Type 1 external paths follow.
  – Type 2 external paths are the least preferred.
  – Cost is used as a last resort to break ties.
• The forwarding table is constructed from the routing table.
• Upon receipt of a packet, a longest prefix match search is performed on the forwarding table to determine where to send the packet next.
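The installation order above amounts to a lexicographic sort key: path type first, cost only as a tie-breaker. The `PathType` enum and the candidate tuples below are hypothetical illustrations:

```python
from enum import IntEnum

class PathType(IntEnum):
    """Lower value = more preferred, in the order listed above."""
    INTRA_AREA = 0
    INTER_AREA = 1
    EXTERNAL_TYPE1 = 2
    EXTERNAL_TYPE2 = 3

def install(candidates: list) -> list:
    """Keep the next hop(s) with the best (path type, cost) for one route.

    candidates: list of (PathType, cost, next_hop).
    """
    best = min((ptype, cost) for ptype, cost, _ in candidates)
    return sorted(hop for ptype, cost, hop in candidates if (ptype, cost) == best)

print(install([
    (PathType.INTER_AREA, 5, "via ABR"),
    (PathType.INTRA_AREA, 40, "via R2"),
    (PathType.EXTERNAL_TYPE1, 1, "via ASBR"),
]))  # ['via R2']
```

The intra-area path wins even though its cost (40) is the highest, because cost is compared only among paths of the same type.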
In Case I Have Made It Look Too Simple!
• Understanding the flow of traffic in OSPF networks is not trivial.
• There are many “exceptions” to the basic rule of using shortest paths. Some of them are clear, some are less obvious.
– Intra-area paths have preference over inter-area paths.
- If my packet happens to hit a router with an intra-area route, that router will divert it from the intended exit point.
– ABRs ignore each other’s advertisements within a given non-backbone area.
- Once an exit point is reached, there is no turning back.
– Address summarization can hide the best path.
- The advertised cost for a summary route is the maximum cost across all individual routes (RFC 2328, Sec. 3.5).
– The choice of path to an external route is not always cost driven.
- Prefer a path to a local ASBR, except in the backbone area, where selection is based on cost (needed to avoid routing loops in some cases; see RFC 2178 and RFC 2328 for details).
Example of Non-Shortest Path
• ABRs ignore T3 Summary LSAs in their own area (except backbone)
– Shortest path from Router 2 to Router 8 has cost 6. (Router 6 is the intended exit point)
– Actual path from Router 2 to Router 8 has cost 8. (Router 4 diverts packets over higher cost intra-area path to Router 8)
[Figure: Routers 1 to 19 in Area 1, Area 0 and Area 2, with link costs]
Another Example of Non-Shortest Path
• Both Router 5 and Router 8 advertise reachability to external route E with type 2 cost of 10.
– Router 5 is selected by Router 2 in spite of the higher (internal) cost to reach it (22 vs 10), because of preference for intra-area paths to ASBRs.
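One way to reproduce the choice of Router 5 is to encode the AS-external path comparison of RFC 2328 (Sec. 16.4) as a sort key. The sketch below is a simplification: it assumes RFC1583Compatibility is disabled and ignores forwarding addresses. The class and field names are hypothetical, and labeling the path to Router 8 as inter-area is our assumption about the figure.

```python
from dataclasses import dataclass

# RFC 2328, Sec. 16.4.1 (RFC1583Compatibility disabled): an intra-area path to the
# ASBR through a non-backbone area is preferred; all other intra-AS paths tie.
ASBR_PATH_RANK = {"intra-nonbackbone": 0, "intra-backbone": 1, "inter-area": 1}


@dataclass
class ExternalPath:
    asbr: str
    ext_type: int       # 1 or 2
    ext_cost: int       # metric advertised by the ASBR
    internal_cost: int  # cost from the computing router to the ASBR
    asbr_path: str      # kind of intra-AS path to the ASBR (ASBR_PATH_RANK key)


def preference_key(p):
    rank = ASBR_PATH_RANK[p.asbr_path]
    if p.ext_type == 1:
        # Type 1 beats type 2; then the ASBR-path rule; then internal + external cost.
        return (0, 0, rank, p.internal_cost + p.ext_cost)
    # Type 2: smallest advertised metric; then the ASBR-path rule; then internal cost.
    return (1, p.ext_cost, rank, p.internal_cost)


def select_external(paths):
    return min(paths, key=preference_key)


# Hypothetical encoding of the slide: Router 2 reaches Router 5 over an intra-area
# path in non-backbone Area 1, and reaches Router 8 over an inter-area path.
via_5 = ExternalPath("Router 5", 2, 10, 22, "intra-nonbackbone")
via_8 = ExternalPath("Router 8", 2, 10, 10, "inter-area")
print(select_external([via_5, via_8]).asbr)  # Router 5
```

Because both type 2 metrics are equal, the ASBR-path rule is applied before internal cost, so the cheaper internal path to Router 8 never gets compared.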
[Figure: Routers 1 to 19 in Area 1, Area 0 and Area 2, with the link costs used in this example]
A Tricky Corner Case
[Figure: Routers 1 to 16 in Area 1, Area 0 and Area 2, with link costs]
• What is (are) the path(s) between Router 2 and Router 16?
– Hint: Router 10 has interfaces in Areas 0, 1 and 2
• Remember the basic rule of distance vector protocols
– Only advertise a route you are using
A Tricky Corner Case
[Figure: same topology as on the previous slide]
• Two exit points from Area 1
– Router 10 advertises a cost of 24 to Router 16, even though there is a better/cheaper path through Router 8
– Router 7 advertises a cost of 8 to Router 16
• Minimum cost through either Router 10 or Router 7 is 28
• BUT there are two paths of total cost 46!
A Few Other Extensions That Can Further Complicate Things
• Virtual links and transit areas
– Useful to enhance backbone connectivity, but rarely used, and they add complexity to understanding packet forwarding decisions
• Stub areas
– No flooding of AS-external LSAs (T5)
– Default T3-summary route flooded in the area
• Not-so-stubby areas (NSSA)
– I don’t want to receive T5’s but want to originate T5’s
– Use new LSA type (T7) for flooding of AS-external LSAs within the NSSA, plus “translation” of T7 to T5 at ABRs and a few other tweaks
Virtual Links & Transit Areas
• Virtual links are meant to facilitate backbone connectivity
– Connecting remote areas
– Increasing backbone robustness to link failures
• The two end points of a virtual link are Area Border Routers
– A virtual link can be configured between two ABRs that have an interface to a common non-backbone area
– A virtual link is treated as an unnumbered pt-to-pt link
– The cost of a virtual link is the cost of the path connecting the two ABRs that have been configured as ends of the virtual link
- Note that this implies that a virtual link only comes up after the shortest path computation has completed
• An area that has one or more active virtual links is called a transit area (stub areas cannot be transit areas)
– It can be used to carry transit traffic between other areas
– It can make understanding how packets are “actually” forwarded pretty complicated...
Two Examples of The Use of Virtual Links
[Figure: two uses of virtual links among Area 0, Area 1, Area 2 and Area 3, with link costs]
Another Virtual Link Example
• Or why virtual links can make understanding packet forwarding more complicated
– When computing the shortest path to a destination, ABRs attached to a transit area are allowed to consider summary LSAs advertised in the transit area by other ABRs
[Figure: Area 2, Area 0 and Area 1 with a virtual link, with link costs]
Another Link State Protocol: IS-IS
• IS-IS = Intermediate System to Intermediate System
– Intermediate System = Router
– End System = Host
• Just like OSPF
– A link-state protocol that relies on flooding
- Each node advertises its (local) view of the world in a Link State Packet (LSP)
- Entries are regularly refreshed
- Nodes maintain a topology database on which they run Dijkstra to compute shortest paths
- Database entries are aged out if not refreshed
– Hello protocol for liveness and neighbor discovery
– Broadcast network represented as a “pseudo-node” with an elected Designated IS (DIS) responsible for impersonating it
– A two-level hierarchy for improved scalability
Just like OSPF – Maybe Not
• IS-IS runs directly on the link layer, i.e., below and independent of IP
– No exposure to IP “insecurity”, but no benefit from IP functionality, e.g., fragmentation (common MTU across the entire network)
• IS-IS relies on different area and level definitions
– Area = level 1 domain based on local sharing of area address
– Routers belong to one area (area boundaries run through links, not routers, i.e., no ABR but “connected” routers)
– Level 2 backbone defines logical connectivity based on link types and not as a separate area (OSPF area 0)
• IS-IS has a different LSP structure (just one per router)
– LSP originator identified by the System ID of the router (IS), not an IP address
– Extensive use of Type/Length/Value (TLV) format to encode everything a router knows in one long list
- Affects update process: one small change triggers an update for everything
- Affects extensibility: new information is easily encoded in a new TLV without requiring a new packet type
More on IS-IS Differences – (1)
• Areas and levels (level 1 and level 2)
– Routers form an adjacency over a link if and only if
- They both agree it is a level 2 link, or
- They both agree it is a level 1 link AND they share at least one area address
– Area IDs are link local, which facilitates merging and splitting areas
• Level 2 and level 1 structure
– Links can be level 1, level 2, or BOTH level 1 and level 2
– Level 2 backbone is mostly a connectivity concept
- L1/L2 links are somewhat similar to OSPF virtual links
– Connectivity between level 1 areas is provided through an “attached” router
- Router with connectivity to the L2 backbone, as indicated in its L1 LSP (Attached bit)
- Default behavior routes inter-area packets to the closest attached router (route leaking extensions)
Merging, Splitting, Renumbering Areas
• Merging area 7 and area 8
– Reconfigure link A-B to be an L1/L2 link with, say, area 7 as the area ID common to A & B
– L1 flooding domain now extends to routers A, B, C and D
– Add area 7 as area ID for link B-D on both routers B and D
– Reconfigure link A-B to be L1 only
– Remove area 8 ID from routers B and D for link B-D
• Similar (symmetric) processes can be followed for area splitting or renumbering
[Figure: three stages of merging areas 7 and 8 with routers A, B, C and D: link A-B as L2, then L1/L2, then L1, with all links ending up in area 7]
Level 2 Backbone
• L2 backbone forms a logical connected topology based on L2 and L2/L1 links
– Can facilitate use of high-speed backbone links for intra-area traffic (similar to OSPF virtual links but link rather than area specific)
• Inter-area traffic is sent to the closest attached L2 router
– Closest exit point to the backbone
- No need to deliver traffic to C1
– Scalable, but can result in sub-optimal routing
• Route-leaking extensions
– Similar to the OSPF T3 summary routes approach
– Use of Up/Down bit to ensure that routes are not propagated back out
[Figure: Areas 7, 9 and 10 with routers A1, A2, B1 to B4, C1 to C4 and D, connected by L1, L2 and L1/L2 links]
The Penalty of Choosing The Closest Exit Point
[Figure: topology with link costs and prefix 19.1.1.0/24, comparing the path through the closest exit point with the cheapest path]
More on IS-IS Differences – (2)
• HELLO protocol
– Local hold timer for each link (carried in Hello messages)
– Used for MTU check through padding of Hello messages
– Three-way handshake to test for bidirectional connectivity
- Hello messages list the addresses of ISs from which Hellos have been heard
• Broadcast network
– DIS election is preemptive (IS with highest priority wins)
– LAN represented by a pseudonode LSP from the DIS
- Identified by a non-zero pseudonode ID
– No backup DIS (DIS can reduce its Hello hold timer – JUNOS)
• Database synchronization
– On LANs, the DIS sends Complete Sequence Numbers PDUs (CSNPs) every 10s (unreliable transmissions – implicit ACKs)
– On P2P links, an initial CSNP is sent only when the adjacency comes up
- JUNOS implements periodic resending of CSNPs on P2P links
– Other routers request specific LSPs using Partial Sequence Numbers PDUs (PSNPs) or reflood missing/old LSPs
- ACKs (PSNP or CSNP) are needed for received LSPs on P2P links
More on IS-IS Differences – (3)
• Fragmentation
– Remember that IP fragmentation is not available
– IS-IS does application-level fragmentation (assumes a minimum MTU size of 1492 bytes, verified using Hello padding)
– CSNP fragmentation
- Based on using Start-LSP-ID and End-LSP-ID fields to indicate the beginning and end of synchronization
– LSP fragmentation
- Based on a fragment ID byte (up to 256 fragments)
– Extension based on assigning additional IDs to an IS
- Fragment zero is mandatory for the others to be considered
- Fragments are atomic (they arrive independently). Care is needed when packaging information into fragments to avoid churn in the presence of changes.
LSP Fragmentation
[Figure: an LSP whose header and adjacencies #1 to #22 are packed into fragments 00 and 01, shown in three versions]
Adjacency goes away in fragment 00
Repackaging of fragments requires re-advertising both fragments 00 and 01
Preserving fragment structure ensures that only fragment 00 is re-advertised
More on IS-IS Differences – (4)
• LSP structure
– Link State Packet as a container
– Container content based on packing independent entities (TLVs) that each provide a different type of information
• Benefits
– Protocol machinery associated with the handling of the containers is independent of their content (reusable)
– Carrying new information in the container only requires the definition of new TLVs
– LSPs can still be parsed even if they contain unknown TLVs (just skip the number of bytes specified in the Length field)
• Drawbacks
– Container imposes rather coarse information granularity
- Whole container is resent in the presence of changes
- Need to exercise caution when information is spread over multiple fragments
– Purge mechanism is required
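The skipping rule for unknown TLVs can be illustrated with a small parser over a buffer of back-to-back TLVs (1-byte type, 1-byte length in bytes, value). This is a sketch only: it assumes the buffer holds nothing but TLVs (no LSP header), uses type code 250 as a stand-in for a TLV this router does not understand, and `parse_tlvs` is a hypothetical name.

```python
def parse_tlvs(data: bytes, known_types):
    """Split a TLV buffer into known and skipped (type, value) pairs."""
    known, skipped = [], []
    offset = 0
    while offset < len(data):
        if offset + 2 > len(data):
            raise ValueError("truncated TLV header")
        tlv_type, length = data[offset], data[offset + 1]
        value = data[offset + 2 : offset + 2 + length]
        if len(value) != length:
            raise ValueError(f"TLV {tlv_type} is truncated")
        # Unknown TLVs are not an error: the Length field says how far to skip.
        (known if tlv_type in known_types else skipped).append((tlv_type, value))
        offset += 2 + length
    return known, skipped


body = (bytes([1, 4, 3, 0x49, 0x00, 0x01])   # Area Address TLV: one 3-byte area address
        + bytes([250, 3, 0xAA, 0xBB, 0xCC])  # a type code treated here as unknown
        + bytes([137, 2]) + b"r1")           # Dynamic Host Name TLV
known, skipped = parse_tlvs(body, known_types={1, 137})
print(known)    # [(1, b'\x03I\x00\x01'), (137, b'r1')]
print(skipped)  # [(250, b'\xaa\xbb\xcc')]
```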
Purging Old Fragments
• Adjacencies #21 & #22 go away
– New fragment 00 is issued
– No more need for fragment 01
• How does a router know fragment 01 is gone?
– Will persist for a period of “lifetime”
• Router can issue a “purge” LSP
– Contains only the header with zeroed-out lifetime and checksum
[Figure: old fragments 00 and 01 (adjacencies #1 to #22) and the new fragment 00 (adjacencies #1 to #20)]
Main IS-IS TLVs – (1)
TLV                        TLV #   Where Used?
Area Address               1       Hello, LSP
IS Neighbors               6       Hello (LAN)
Padding                    8       Hello
Authentication             10      Hello, LSP, CSNP, PSNP
Checksum                   12      Hello, CSNP, PSNP
Protocols Supported        129     Hello, LSP
IP Interface Address       132     Hello, LSP
Dynamic Host Name          137     LSP
Multi-Topology Supported   229     Hello, LSP
Main IS-IS TLVs – (2)
TLV                                TLV #   Where Used?
IS Reachability                    2       LSP
Extended IS Reachability           22      LSP
Multi-Topology IS Reachability     222     LSP
IP Internal Reachability           128     LSP
IP External Reachability           130     LSP
Extended IP Reachability           135     LSP
Multi-Topology IP Reachability     235     LSP
Multi-Topology IPv6 Reachability   237     LSP
IS Reachability TLVs
• IS Reachability (TLV #2)
– Hardly used anymore
– Fixed length
– Identifies connectivity to neighbors (independent of IP)
– Multiple possible metrics (default, delay, expense, error), but very limited range (only 6 bits, i.e., from 0 to 63)
• Extended IS Reachability (TLV #22)
– Motivated by the need for bigger metrics and support (room) for extensions (sub-TLVs)
- 24-bit metric field typically encodes a value in reference to a Reference Bandwidth
- Reference bandwidth of 1 Terabit/s, so that a 1 Gbps link has a value of 1000 (smallest granularity is 64 kbps)
– Variable length TLV
- Ability to include a variety of sub-TLVs, e.g., for Traffic Engineering extensions and MPLS
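The reference-bandwidth convention amounts to a division and a clamp. The sketch assumes the 1 Tbit/s reference from the slide and clamps to 2^24 - 2, on our understanding that RFC 5305 reserves the all-ones metric for links that must not be used in normal SPF; `wide_metric` is a hypothetical helper, not a vendor knob.

```python
REFERENCE_BW_BPS = 10**12        # 1 Tbit/s reference bandwidth, as on the slide
MAX_USABLE_METRIC = 2**24 - 2    # 2**24 - 1 is treated as reserved (see lead-in)


def wide_metric(link_bw_bps: int) -> int:
    """Extended IS Reachability metric = reference bandwidth / link bandwidth, clamped."""
    metric = REFERENCE_BW_BPS // link_bw_bps
    return max(1, min(metric, MAX_USABLE_METRIC))


for bw in (10**9, 10 * 10**9, 64_000, 56_000):
    print(bw, wide_metric(bw))
# 1000000000 1000
# 10000000000 100
# 64000 15625000
# 56000 16777214
```

The last two lines show where the granularity limit comes from: a 64 kbps link still fits in 24 bits, while a 56 kbps link is clamped to the same value as anything slower.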
Some IS-IS Sub-TLVs – (1)
Sub-TLV                      Sub-TLV #
Maximum Link Bandwidth       9
Reservable Link Bandwidth    10
Unreserved Bandwidth         11
Traffic Engineering Metric   18
Link Protection Type         20
IP Reachability TLVs
• IP Internal/External Reachability (TLV #128/130)
– Hardly used anymore
– Identifies IP routes directly connected to the router (internal) or learned from another protocol (external)
– Same metrics and metric limitations as the IS Reachability TLV
• Extended IP Reachability (TLV #135)
– Specifies IP routes reachable by the router
- 6-bit prefix length field: from 0 to 32 (33 values)
- Subsumes both TLV #128 and #130
– 32-bit metric field (compatibility with other protocols)
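A TLV #135 entry places the 6-bit prefix length in a control byte next to the up/down bit, followed by only as many prefix bytes as the length requires. The encoder/decoder pair below is a sketch based on our reading of RFC 5305 and covers entries without sub-TLVs only; the function names are hypothetical.

```python
import ipaddress
import struct


def encode_ext_ip_reach(prefix: str, metric: int, down: bool = False) -> bytes:
    """One TLV #135 entry without sub-TLVs: 4-byte metric, control byte, packed prefix."""
    net = ipaddress.IPv4Network(prefix)
    # Control byte: up/down bit (0x80), sub-TLV present bit (0x40, unused here), 6-bit length.
    control = (0x80 if down else 0x00) | net.prefixlen
    nbytes = (net.prefixlen + 7) // 8  # only the significant prefix bytes are carried
    return struct.pack("!IB", metric, control) + net.network_address.packed[:nbytes]


def decode_ext_ip_reach(entry: bytes):
    metric, control = struct.unpack("!IB", entry[:5])
    plen = control & 0x3F
    nbytes = (plen + 7) // 8
    raw = entry[5 : 5 + nbytes] + bytes(4 - nbytes)  # pad back to 4 bytes
    addr = ipaddress.IPv4Address(raw)
    return f"{addr}/{plen}", metric, bool(control & 0x80)


entry = encode_ext_ip_reach("19.1.2.0/23", 10)
print(entry.hex())                 # 0000000a17130102
print(decode_ext_ip_reach(entry))  # ('19.1.2.0/23', 10, False)
```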
Multi-Topology TLVs
• Enables specification of multiple logical topologies, each with its own routing
– Per-topology metrics and SPF computations, e.g., VoIP traffic routed differently from data traffic
• TLV #229 identifies the topologies a router supports
– IPv4 Unicast (#0)
– In-Band Management (#1)
– IPv6 Unicast (#2)
– Multicast (#3)
• Multi-topology support validated during Hello exchanges
• Multi-topology IPv4/6 Reachability TLVs “duplicate” Extended IP Reachability and IPv6 Reachability TLVs
More on IS-IS Differences – (5)
• SPF computations
– Explicitly structured as a two-phase process
• Phase 1
– Compute shortest paths from IS to IS (on a graph with routers/IS and “networks” as vertices and links/adjacencies as edges)
- One SPF per topology
– Independent of any underlying “reachability” information
• Phase 2
– Add shortest paths to all “leaves” (reachability information)
• Phase 1 can be reused across protocols and is only triggered in the presence of connectivity changes
• Phase 2 is reachability specific
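The two phases can be sketched as two functions: a Dijkstra run over the IS graph, and a cheap pass that attaches prefixes to the resulting distances. This is an illustration under simplifying assumptions (a single topology, no pseudonodes, no next-hop tracking); the graph, the router names A to C and the prefixes are hypothetical.

```python
import heapq


def phase1_spf(graph, root):
    """Phase 1: Dijkstra over the IS graph; graph[u] = {v: link_cost}."""
    dist = {root: 0}
    heap = [(0, root)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue  # stale heap entry
        done.add(u)
        for v, cost in graph.get(u, {}).items():
            if d + cost < dist.get(v, float("inf")):
                dist[v] = d + cost
                heapq.heappush(heap, (d + cost, v))
    return dist


def phase2_leaves(dist, reachability):
    """Phase 2: attach prefixes (leaves) advertised by each reachable IS."""
    routes = {}
    for node, prefixes in reachability.items():
        if node not in dist:
            continue  # advertiser not reachable in phase 1
        for prefix, metric in prefixes:
            cost = dist[node] + metric
            if cost < routes.get(prefix, float("inf")):
                routes[prefix] = cost
    return routes


graph = {"A": {"B": 10, "C": 30}, "B": {"A": 10, "C": 10}, "C": {"A": 30, "B": 10}}
dist = phase1_spf(graph, "A")  # {'A': 0, 'B': 10, 'C': 20}
routes = phase2_leaves(dist, {
    "B": [("10.1.0.0/16", 5)],
    "C": [("10.1.0.0/16", 1), ("10.2.0.0/16", 2)],
})
print(routes)  # {'10.1.0.0/16': 15, '10.2.0.0/16': 22}
```

When only reachability changes, `phase2_leaves` can be rerun on the cached `dist`, which is the saving the two-phase structure is meant to provide.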
Routing Protocol Overview
• Routing protocols follow the two-level hierarchy of the Internet
– Interior Gateway Protocols (IGP) control routing within an AS/domain
– Exterior Gateway Protocol(s) (EGP) control routing between AS’s
• Different goals and constraints for each family of protocols
– IGP: Ability to fine tune internal operation and shielding from outside “noise”
– EGP: Scalability and ability to
[Figure: the Internet as interconnected autonomous systems, e.g., AS 1, AS 2, AS 3, AS 121, AS 123, AS 168, AS 321, AS 376, AS 441, AS 524 and AS 3411]
Growing Up From One to Many Domains
• Goal
– Enable connectivity between domains (Internet-wide)
• Requirements
– Operational flexibility and scalability, and scalability, and scalability,…
- Autonomous systems are typically operated by different administrative entities
- Cooperation but no “trust” between domains
• Border Gateway Protocol (BGP4) is the dominant (only!) Exterior Gateway Protocol (EGP)
BGP Routing Table Growth
From http://bgp.potaroo.net
Telstra’s table (AS 1221)
Some Basic Remarks Before Jumping Into BGP
• A link state type of approach would simply not work
– Requires building and maintaining a map of the entire Internet in every router...
– The need for consistent information and decisions cannot be satisfied as the network size grows
- Things are always changing somewhere in the Internet
• Distance vector protocols are the only realistic option
– Better scalability by limiting the level of topology information that each router maintains
– Preserve the ability to use different route selection criteria at each router
- No need for consistent metrics
- Seamless support for policies
– Control of what routing information is sent to whom
Border Gateway Protocol
• DV protocol for inter-domain routing
– Supports arbitrary topology (but no overlapping domains)
– Governs exchange of information between internal and external border routers (BGP peers)
- Internal peers: within the same domain
- External peers: in two adjacent domains
- Each domain is characterized by a unique autonomous system number
• Major BGP characteristics
– Selection of “best” path (avoid stupid choices and support strong administrative control)
- Multiple path attributes
– Loop avoidance (path vectors)
– Scalability through route aggregation
• BGP as a protocol is relatively simple (86 pages for the latest draft vs 244 for RFC 2328), but its configuration can be complex and errors can have far-reaching implications
BGP Operation Overview
Three major phases:
– Neighbor acquisition and reachability, exchange of routing information, and path selection (steady state)
1. Neighbor acquisition and reachability
– Initiated through OPEN message and maintained by KEEPALIVE messages
– Neighbor declared unreachable if no KEEPALIVE received within Holding Time
2. Routing information exchanged through UPDATE messages
– Incremental updates to advertise & withdraw routes
- Requires reliable transmission (uses TCP, port 179)
3. Path selection uses the information received in UPDATE messages to select the best path for a route and construct the routing table
The BGP State Machine
[Figure: BGP state machine with states Idle, Connect, Active, OpenSent, OpenConfirm and Established; the “normal” sequence is Idle → Connect → OpenSent → OpenConfirm → Established]
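The normal sequence, and the Connect/Active detour, can be traced with a small transition table. This is a simplified reading of the classic BGP FSM description: the event names are illustrative labels (not the named events of RFC 4271), and every error or unlisted event is collapsed into a return to Idle.

```python
# Simplified transitions; event names are illustrative labels only.
TRANSITIONS = {
    ("Idle", "start"): "Connect",
    ("Connect", "tcp_established"): "OpenSent",       # OPEN is sent on the new connection
    ("Connect", "tcp_failed"): "Active",              # keep listening, retry later
    ("Active", "tcp_established"): "OpenSent",
    ("Active", "connect_retry_timeout"): "Connect",
    ("OpenSent", "open_received"): "OpenConfirm",     # valid OPEN: answer with KEEPALIVE
    ("OpenConfirm", "keepalive_received"): "Established",
    ("Established", "keepalive_received"): "Established",
    ("Established", "update_received"): "Established",
}


def step(state, event):
    # Anything not listed (errors, NOTIFICATION, hold timer expiry) returns to Idle.
    return TRANSITIONS.get((state, event), "Idle")


def run(events, state="Idle"):
    trace = [state]
    for event in events:
        state = step(state, event)
        trace.append(state)
    return trace


print(run(["start", "tcp_established", "open_received", "keepalive_received"]))
# ['Idle', 'Connect', 'OpenSent', 'OpenConfirm', 'Established']
```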
BGP Information Flow and Sources
• Different peering sessions with internal (same AS) and external (different AS) neighbors
– External BGP neighbors communicate via eBGP
– Internal BGP neighbors communicate via iBGP
- All BGP peers in an AS are typically connected in a full mesh (more on this later)
[Figure: AS 1 (Routers A1, B1), AS 2 (Routers A2, B2, C2, D2) and AS 3 (Routers A3, B3), with iBGP sessions within ASes and eBGP sessions between them]
BGP Processing Steps
[Figure: BGP processing at Router D2. Routes from per-peer RIB_In tables (Routers A2, B2 and C2 over iBGP; Routers A3 and B3 over eBGP) go through Phase 1 (determine degree of preference), Phase 2 (select best routes to install in the Local RIB) and Phase 3 (determine which routes to advertise based on policies) into per-peer RIB_Out tables]
BGP UPDATE Message
• UPDATE message is the basic unit of route advertisement
– Can contain multiple routes that are being withdrawn
– Path Attributes describe a number of key properties of the advertised route that are used to select the best path
– NLRI is a list of IP address prefixes associated with a given BGP route (common set of Path Attributes)
Unfeasible Route Length (2 bytes)
Withdrawn Routes (variable)
Total Path Attribute Length (2 bytes)
Path Attributes (variable)
Network Layer Reachability Information (NLRI) (variable)
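The UPDATE layout above can be sketched as a byte encoder. It assumes IPv4 prefixes and takes the path attributes as an already-encoded byte string. The 19-byte BGP header (marker, length, type) is left out, and the function names are hypothetical.

```python
import ipaddress
import struct


def encode_prefix(prefix: str) -> bytes:
    """Prefix as carried in Withdrawn Routes and NLRI: length byte + minimal prefix bytes."""
    net = ipaddress.IPv4Network(prefix)
    nbytes = (net.prefixlen + 7) // 8
    return bytes([net.prefixlen]) + net.network_address.packed[:nbytes]


def encode_update_body(withdrawn, path_attributes: bytes, nlri) -> bytes:
    """UPDATE body following the field layout above (BGP header not included)."""
    wd = b"".join(encode_prefix(p) for p in withdrawn)
    nl = b"".join(encode_prefix(p) for p in nlri)
    return (struct.pack("!H", len(wd)) + wd                              # length + Withdrawn Routes
            + struct.pack("!H", len(path_attributes)) + path_attributes  # length + Path Attributes
            + nl)  # NLRI has no length field: it runs to the end of the message


# Withdraw-only UPDATE for 19.2.6.0/24: no path attributes, no NLRI.
print(encode_update_body(["19.2.6.0/24"], b"", []).hex())  # 0004181302060000
```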
Path Attributes: General Characteristics
• Several categories of attributes
– Optional or well-known, mandatory or discretionary, transitive or not, partial or not
• Well-known attributes must be recognized by all BGP implementations
– Mandatory well-known attributes must be included in every UPDATE message, while discretionary well-known attributes may or may not be sent based on the content of the message
– Well-known attributes MUST be passed along (after updating) to other BGP peers
• Optional attributes need not be recognized by all BGP implementations
– Unrecognized transitive attributes SHOULD be passed to other BGP peers with the partial bit set
– Unrecognized non-transitive attributes are ignored
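These handling rules map onto the attribute flag bits (optional 0x80, transitive 0x40, partial 0x20, extended length 0x10). The sketch below assumes that the type codes in `KNOWN_TYPES` are all this speaker understands, uses 99 as an arbitrary unrecognized type code, and gives the function a hypothetical name.

```python
OPTIONAL, TRANSITIVE, PARTIAL, EXTENDED_LENGTH = 0x80, 0x40, 0x20, 0x10

# Attribute type codes this sketch "recognizes".
KNOWN_TYPES = {1: "ORIGIN", 2: "AS_PATH", 3: "NEXT_HOP", 4: "MULTI_EXIT_DISC",
               5: "LOCAL_PREF", 6: "ATOMIC_AGGREGATE", 7: "AGGREGATOR",
               8: "COMMUNITY", 9: "ORIGINATOR_ID", 10: "CLUSTER_LIST"}


def flags_to_propagate(flags: int, type_code: int):
    """Flags to re-advertise an attribute with, or None if it is dropped."""
    if type_code in KNOWN_TYPES:
        return flags                  # recognized: normal processing, not modeled here
    if not flags & OPTIONAL:
        raise ValueError("unrecognized well-known attribute")  # UPDATE error
    if flags & TRANSITIVE:
        return flags | PARTIAL        # pass it on, marked as partial
    return None                       # optional non-transitive: ignored, not passed on


print(hex(flags_to_propagate(OPTIONAL | TRANSITIVE, 99)))  # 0xe0
print(flags_to_propagate(OPTIONAL, 99))                    # None
```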
Path Attributes (1)
• AS_PATH
– Well-known, mandatory
– Sequence of path segments of type AS_SET (1) or AS_SEQUENCE (2)
- AS_SET: Unordered list of autonomous systems traversed by the route
- AS_SEQUENCE: Ordered list of autonomous systems traversed by the route
– Updated by “pre-pending” own AS number when advertising to a BGP speaker in another AS
– Used for loop prevention
• NEXT_HOP
– Well-known, mandatory
– IP address of border router to be used as next hop towards the destination identified by the NLRI field
– Typically chosen to ensure that the “shortest” path is taken
• ORIGIN
– Well-known, mandatory
– Characterizes where the path first originated
- IGP: 0; EGP: 1; Other: 2
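AS_PATH prepending and the loop check can be sketched with AS_PATH modeled as a list of (segment type, AS list) pairs. This is an illustration only: it ignores the limit on the number of ASes per segment, and the function names are hypothetical.

```python
AS_SET, AS_SEQUENCE = 1, 2


def prepend(as_path, my_as, times=1):
    """as_path is a list of (segment_type, [asn, ...]); returns a new list."""
    if as_path and as_path[0][0] == AS_SEQUENCE:
        _, asns = as_path[0]
        return [(AS_SEQUENCE, [my_as] * times + asns)] + as_path[1:]
    return [(AS_SEQUENCE, [my_as] * times)] + as_path


def has_loop(as_path, my_as):
    """A speaker rejects a route whose AS_PATH already contains its own AS."""
    return any(my_as in asns for _, asns in as_path)


def path_length(as_path):
    """Each AS in an AS_SEQUENCE counts once; a whole AS_SET counts as one."""
    return sum(len(asns) if seg == AS_SEQUENCE else 1 for seg, asns in as_path)


p = prepend([], 1)   # AS 1 originates: <AS1>
p = prepend(p, 2)    # AS 2 re-advertises: <AS2,AS1>
print(p, has_loop(p, 1), path_length(p))  # [(2, [2, 1])] True 2
print(prepend([], 1, times=2))            # [(2, [1, 1])], i.e., padded <AS1,AS1>
```

The `times` parameter is the same mechanism behind the AS_PATH padding hack discussed later.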
Path Attributes (2)
• LOCAL_PREF
– Well-known, discretionary
– Advertisement to other BGP speakers in the same AS (iBGP) of the degree of preference of a route by the advertising router (higher value is preferred)
• MULTI_EXIT_DISC (MED)
– Optional, non-transitive
– Used to give some preference to different exit/entry points in a neighboring AS (lower value is preferred)
• COMMUNITY
– Optional, transitive, used to simplify routing policies
- Common property used to determine which routes to accept, prefer, and pass to BGP neighbors
– Some well-known communities:
- NO_EXPORT: do not advertise outside of the AS (or confederation)
- NO_EXPORT_SUBCONFED: do not advertise to external peers (including peers in other autonomous systems within a confederation)
- NO_ADVERTISE: not advertised to any BGP peer
Path Attributes (3)
• AGGREGATOR
– Optional, transitive
– Contains IP address and AS number of the BGP speaker that formed the aggregate route
• ATOMIC_AGGREGATE
– Well-known, discretionary (should be propagated)
– Informs other BGP speakers that the advertiser aggregated several routes and may have removed some autonomous system numbers from the AS_SET (the loop-free property must be maintained, though)
- As a result, the actual path may differ from AS_PATH
- Basically used to signal possible loss of information
– NLRI field must not be modified by adding a more specific prefix, i.e., route must not be de-aggregated (loop prevention)
Path Attributes (4)
• ORIGINATOR_ID
– Optional, non-transitive
– Used by Route Reflectors (more on this later)
– Identifies the local router (within the local AS) that originally advertised the route
• CLUSTER_LIST
– Optional, non-transitive
– Used by Route Reflectors to detect looping of routing information in an AS because of misconfiguration
- Each Route Reflector prepends its CLUSTER_ID to the CLUSTER_LIST
- Route Reflectors ignore advertisements that carry their CLUSTER_ID in the CLUSTER_LIST
BGP Decision Process
• Three-phase process
– Phase 1: Calculates a “degree of preference” for each route in a given RIB_In (locks the associated RIB_In)
- If the route is learned from an internal peer, the LOCAL_PREF attribute is usually taken as the degree of preference.
- If the route is learned from an external peer, the degree of preference is computed based on local policy. The resulting value is used as LOCAL_PREF in any iBGP re-advertisement.
– Phase 2: Selects the “best” route out of all those available for distinct destinations (locks all RIB_In)
- Excludes routes with an unresolvable NEXT_HOP or a loop in the AS_PATH attribute
- Best routes are installed in the Local RIB.
– Phase 3: Decides, based on policies, which routes in the Local RIB to advertise to which peer (blocks execution of Phase 2).
- Route aggregation can be performed at this stage.
BGP Tie Breaking Rules
• BGP selects a SINGLE route.
– Remove all routes that don’t have the smallest number of AS numbers in AS_PATH (each AS_SET counts only as one!)
– Remove all routes that don’t have the lowest ORIGIN value
– Among routes learned from the same neighboring AS, remove routes with less desirable (higher) MED values.
– If at least one route was learned through eBGP, remove all routes learned through iBGP.
– Remove all routes with a non-minimum IGP cost to NEXT_HOP.
– Remove all routes that were not advertised by the BGP speaker with the lowest BGP identifier.
– Prefer the route received from the lowest peer address.
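The tie-breaking rules read naturally as a sequence of filters. The sketch below assumes each candidate is summarized in a hypothetical `Route` record, starts with the Phase 1 degree of preference (LOCAL_PREF), and uses made-up addresses and costs; it is an illustration, not any router's implementation.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Route:
    via: str
    local_pref: int   # degree of preference from Phase 1
    as_path_len: int  # AS_SET already counted as one
    origin: int       # IGP 0, EGP 1, other 2
    neighbor_as: int  # AS the route was learned from (for MED comparison)
    med: int
    ebgp: bool
    igp_cost: int     # IGP cost to NEXT_HOP
    bgp_id: str
    peer_addr: str


def keep_min(routes, key):
    best = min(key(r) for r in routes)
    return [r for r in routes if key(r) == best]


def select_best(routes):
    routes = keep_min(routes, lambda r: -r.local_pref)  # highest degree of preference
    routes = keep_min(routes, lambda r: r.as_path_len)
    routes = keep_min(routes, lambda r: r.origin)
    # MED is only compared among routes learned from the same neighboring AS.
    min_med = {}
    for r in routes:
        min_med[r.neighbor_as] = min(r.med, min_med.get(r.neighbor_as, r.med))
    routes = [r for r in routes if r.med == min_med[r.neighbor_as]]
    if any(r.ebgp for r in routes):
        routes = [r for r in routes if r.ebgp]
    routes = keep_min(routes, lambda r: r.igp_cost)
    routes = keep_min(routes, lambda r: int(ipaddress.IPv4Address(r.bgp_id)))
    routes = keep_min(routes, lambda r: int(ipaddress.IPv4Address(r.peer_addr)))
    return routes[0]


# Two iBGP routes for the same prefix from neighboring AS 1: MED wins before IGP cost.
via_a = Route("A2", 100, 1, 0, 1, 0, False, 20, "10.0.0.1", "10.0.0.1")
via_b = Route("B2", 100, 1, 0, 1, 50, False, 5, "10.0.0.2", "10.0.0.2")
print(select_best([via_a, via_b]).via)  # A2
```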
Using LOCAL_PREF to Pick an Exit Point
• Choosing between a primary and a backup provider
– Used to influence internal decisions
[Figure: AS 1 with a primary and a backup provider (AS 2, AS 3, AS 10 and AS 11 shown); routes learned from the primary get LOCAL_PREF=100 and routes learned from the backup get LOCAL_PREF=20]
AS_PATH Padding to Discourage the Use of Certain Links - (A Hack!)
• Used externally to influence the choice of inbound links
– Choosing between a primary and a backup link
– Tuning inbound traffic for load-balancing purposes
• Can be over-ridden by local decisions (LOCAL_PREF)
[Figure: AS 1 (prefixes 1.2.0.0/16 and 1.3.0.0/16) connected to AS 10 over two links. Over one link AS 1 advertises 1.2.0.0/16; <AS1> and 1.3.0.0/16; <AS1,AS1>; over the other, 1.2.0.0/16; <AS1,AS1> and 1.3.0.0/16; <AS1>]
Another Way to Influence Entry Points
• MED allows crude selection ability
– Avoid low speed internal links
• But not always taken into account
[Figure: AS 1, AS 111 and AS 55; prefixes 19.2.1.0/24 and 19.2.2.0/24 joined by a low speed RF link; one entry point advertises 19.2.1.0/24 with MED 5 and 19.2.2.0/24 with MED 100, the other advertises 19.2.1.0/24 with MED 100 and 19.2.2.0/24 with MED 5]
Ignoring MED Values
• Hot potato routing
– Basic rule between ISPs
– “I won’t carry your bits for you…”
[Figure: MCI and AT&T networks, each with attached customers]
Propagating Path Attributes (1)
[Figure: AS 1 (Routers A1, B1; routes r and r’), AS 2 (Routers A2, B2, C2, D2) and AS 3 (Routers A3, B3), with iBGP and eBGP sessions]
• Let us follow UPDATEs for routes r and r’ located in AS 1.
• Router A1 originates updates for routes r and r’ and advertises them over its eBGP session to Router A2.
– ORIGIN is set to 0 as routes r and r’ were learned through IGP.
– AS_PATH type set to AS_SEQUENCE and initialized with AS 1.
– Router A1 sets NEXT_HOP to be the IP address of its interface on the link to Router A2.
– MED values of 0 and 50 for routes r and r’, respectively, as Router A1 is the desired entry point for r but not r’ (Router B1 will use MED values of 50 and 0 when advertising routes r and r’ to Router B2).
Propagating Path Attributes (2)
[Figure: same AS 1, AS 2 and AS 3 topology, with routes r and r’ in AS 1]
• Router A2 processes the updates it received from Router A1 for routes r and r’ and decides to advertise them over its iBGP sessions to Routers B2, C2 and D2.
– ORIGIN is kept unchanged.
– AS_PATH is propagated unchanged.
– Router A2 has been configured with NEXT_HOP self, so it sets NEXT_HOP to be its own IP address.
– MED values are propagated unchanged.
– Router A2 sets LOCAL_PREF for r and r’ to 50 and 20, respectively (Router B2 advertises both as 50; more on this later).
Propagating Path Attributes (3)
[Figure: same AS 1, AS 2 and AS 3 topology, with routes r and r’ in AS 1]
• Router D2 processes updates received from Routers A2 and B2 for routes r and r’ and advertises a single UPDATE for aggregate route r* over its eBGP session to Router A3.
– ORIGIN is kept unchanged.
– Router D2 generates new AS_PATH attributes for r and r’ by pre-pending AS2 to the AS_PATH (value is now <AS2,AS1>) and, because both AS_PATH attributes are identical, the AS_PATH of r* is set to the same value and type.
– Router D2 adds an AGGREGATOR attribute <AS 2; own IP address> but no ATOMIC_AGGREGATE attribute, as there was no information loss.
– Router D2 sets NEXT_HOP to its own IP address.
Decision Process Example (1)
[Figure: same AS 1, AS 2 and AS 3 topology, with routes r and r’ in AS 1]
• In AS 1, both routes r and r’ are learned from the IGP.
• In AS 2, routers hear about r and r’ from Router A2 and Router B2, and both routes have the same AS_PATH count and ORIGIN value.
– For routes r and r’, Router A1 advertises MED values of 0 and 50, and Router B1 advertises MED values of 50 and 0.
– If LOCAL_PREF values are equal, Routers C2 and D2 in AS 2 rely on MED values and pick Router A2 as the NEXT_HOP for r and Router B2 as the NEXT_HOP for r’ (Routers A2 and B2 pick Routers A1 and B1, respectively).
• In AS 3, Router A3 will pick Router D2 (eBGP from Router D2 vs iBGP from Router B3); Router B3 will pick Router C2 (smaller BGP ID); other BGP speakers pick Routers A3 or B3 based on IGP cost.
Decision Process Example (1’)
[Figure: same AS 1, AS 2 and AS 3 topology, with routes r and r’ in AS 1]
• In AS 2, routers hear about r and r’ from Router A1 and Router B1, and both routes have the same AS_PATH count and ORIGIN value, but different MED values.
– For routes r and r’, Router A1 advertises MED values of 0 and 50, and Router B1 advertises MED values of 50 and 0.
• For routes r and r’, Router A2 advertises LOCAL_PREF values of 50 and 20, while Router B2 advertises 50 for both.
– Routers C2 and D2 pick Router B2 for r’, and select either Router A2 or Router B2 for r based on their IGP cost (MED is ignored).
Another Aggregation Example
• Routes r and r’ are aggregated into route r* by Router R when advertised into AS 8
– AS_PATH attribute type changed to AS_SET
– Unordered list of ASes <AS 1;AS 2;AS 3;AS 4;AS 5;AS 6;AS 7>
– May omit some AS numbers if there is no risk of loop, e.g., advertise AS_SET <AS 1; AS 2; AS 3; AS 7>
- ATOMIC_AGGREGATE attribute is added
- Actual path need not follow AS_PATH
[Figure: routes r and r’ reached through ASes 1 to 7 and aggregated into r* by Router R toward AS 8]
De-Aggregation and Loops
[Figure: ASes 1 to 6; route r’/24 is more specific than r/16. Advertisements shown: r’; <AS1,AS2,AS3,AS4>, r; <AS5>, r; <AS5,AS6> and r’; <AS5,AS6>. An illegal de-aggregation creates a routing loop for packets destined for route r’]
Policies – One Example
• Transit (customer) vs. non-transit (peer) agreements between providers (routing domains)
– In a transit agreement, I will accept traffic from you that is intended for any destination.
– In a non-transit agreement, I will only accept traffic from you that is destined to my customers.
• Associated routing policies
– I advertise to you all routes I can reach and for which I am willing to carry your traffic.
– I only advertise to you routes to my own customers.
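The two policies can be expressed as an export filter. This sketch assumes each route is tagged only with whether it was learned from a customer or from a peer; the tags, the function name and the prefixes are hypothetical.

```python
def export_routes(rib, neighbor):
    """rib maps prefix -> 'customer' or 'peer' (where the route was learned).
    neighbor is 'customer' (transit agreement) or 'peer' (non-transit agreement)."""
    if neighbor == "customer":
        return sorted(rib)  # every route I can reach and will carry traffic for
    if neighbor == "peer":
        return sorted(p for p, learned_from in rib.items() if learned_from == "customer")
    raise ValueError(f"unknown relationship: {neighbor}")


rib = {"1.2.0.0/16": "customer", "1.3.0.0/16": "peer", "19.2.0.0/16": "customer"}
print(export_routes(rib, "customer"))  # ['1.2.0.0/16', '1.3.0.0/16', '19.2.0.0/16']
print(export_routes(rib, "peer"))      # ['1.2.0.0/16', '19.2.0.0/16']
```

Withholding the peer-learned route from other peers is what keeps a non-transit provider from carrying traffic between two of its peers.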
Controlling Route Advertisements Through Policies
[Figure: ASes 1 to 7; some links carry only a default route (0.0.0.0/0), others carry routes to AS 1 and AS 6]
Controlling Route Advertisements Through Communities
• COMMUNITY attribute
– First two bytes carry the ASN and the last two bytes carry community values used for local policy routing.
– 444: I2 routes; 445: Univ. X; 446: UUNET; 447: Co. X Research
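Because a community is just a 32-bit value, building one and filtering on it is simple. The sketch assumes a hypothetical GigaPOP ASN of 64512 (a private ASN) combined with the slide's local values 444 and 445, with documentation prefixes standing in for real routes. The well-known community values are those defined in RFC 1997; the helper names are hypothetical.

```python
NO_EXPORT, NO_ADVERTISE, NO_EXPORT_SUBCONFED = 0xFFFFFF01, 0xFFFFFF02, 0xFFFFFF03


def make_community(asn: int, value: int) -> int:
    """32-bit community: ASN in the high 16 bits, local value in the low 16 bits."""
    if not (0 <= asn <= 0xFFFF and 0 <= value <= 0xFFFF):
        raise ValueError("ASN and value must each fit in 16 bits")
    return (asn << 16) | value


def fmt(community: int) -> str:
    return f"{community >> 16}:{community & 0xFFFF}"


def routes_for_neighbor(routes, allowed, external):
    """Send routes tagged with an allowed community, honoring NO_ADVERTISE/NO_EXPORT."""
    out = []
    for prefix, comms in routes:
        if NO_ADVERTISE in comms or (external and NO_EXPORT in comms):
            continue
        if comms & allowed:
            out.append(prefix)
    return out


GIGAPOP_AS = 64512  # hypothetical ASN
I2, UNIV_X = make_community(GIGAPOP_AS, 444), make_community(GIGAPOP_AS, 445)
routes = [("198.51.100.0/24", {I2}),
          ("192.0.2.0/24", {UNIV_X}),
          ("203.0.113.0/24", {UNIV_X, NO_EXPORT})]
print(fmt(I2))                                                  # 64512:444
print(routes_for_neighbor(routes, {I2, UNIV_X}, external=True))  # ['198.51.100.0/24', '192.0.2.0/24']
```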
[Figure: a GigaPOP connecting Internet 2 (444), Univ. X (445), UUNET, Company X Research (447), Company X Corporate, Company Y and Univ. Y, with each link labeled by the community values of the routes sent over it]
Enhancing BGP Scalability
• What is wrong with this picture?
• The need for an iBGP mesh creates many problems.
– N-1 TCP connections at every router
– Every new router requires configuration updates at all other routers.
– Every router maintains N-1 RIB_In and RIB_Out.
– Every change at one router needs to be processed by all other routers.
• Solutions
– Break it up in smaller pieces
- Route Reflectors
- Confederations
Route Reflector
• Simple solution, compatible with current BGP operation, and supports easy migration
– Some BGP speakers, Route Reflectors (RR), can redistribute to iBGP peers routes learned from other iBGP peers.
• Route Reflectors have two types of iBGP peers:
– Client peers and non-client peers
- Non-client peers must be fully meshed; client peers need not be.
– RR and its clients form a cluster identified by a CLUSTER_ID.
- Multiple RRs are allowed in a cluster (redundancy).
• Two attributes: ORIGINATOR_ID and CLUSTER_LIST
– RR sets ORIGINATOR_ID to the ROUTER_ID of the router that originated the route in the local AS.
- Routers ignore routes whose ORIGINATOR_ID equals their own ROUTER_ID.
– RR prepends the local CLUSTER_ID to the CLUSTER_LIST when reflecting a route.
- Used to detect looping of routing information
- Routes with the local CLUSTER_ID in their CLUSTER_LIST are ignored.
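The two loop checks can be sketched as follows. This is a simplified model, assuming the first RR sets ORIGINATOR_ID to the ROUTER_ID of the iBGP peer it learned the route from. The router IDs, cluster IDs, and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class IbgpRoute:
    prefix: str
    originator_id: Optional[str] = None
    cluster_list: List[str] = field(default_factory=list)

def reflect(route: IbgpRoute, learned_from: str, cluster_id: str) -> IbgpRoute:
    """RR side: set ORIGINATOR_ID if absent and prepend the local CLUSTER_ID."""
    return IbgpRoute(
        route.prefix,
        route.originator_id or learned_from,
        [cluster_id] + route.cluster_list,
    )

def accept(route: IbgpRoute, router_id: str, cluster_id: Optional[str] = None) -> bool:
    """Receiver side: drop routes that looped back to their originator or cluster."""
    if route.originator_id == router_id:
        return False
    if cluster_id is not None and cluster_id in route.cluster_list:
        return False
    return True

r = reflect(IbgpRoute("10.1.0.0/16"), learned_from="1.1.1.1", cluster_id="0.0.0.1")
print(accept(r, "1.1.1.1"))             # False: back at the originator
print(accept(r, "2.2.2.2", "0.0.0.1"))  # False: an RR in the same cluster
print(accept(r, "2.2.2.2", "0.0.0.2"))  # True
```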
127
Route Reflector Operation
• When an RR receives a route from an iBGP peer:
– Selects the best path based on its path selection rule
– If the best path is from a non-client peer, reflect to all clients
– If the best path is from a client peer, reflect to all client and non-client peers
• Note that path selection need not be identical to that of a full iBGP mesh.
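The reflection rules above can be written as a small function. Peer names are hypothetical, and not sending a client's route back to that same client is an assumption of this sketch (the ORIGINATOR_ID check would discard such a copy anyway).

```python
def reflection_targets(source, clients, non_clients):
    """Peers to which an RR sends its best path learned from iBGP peer `source`."""
    clients, non_clients = set(clients), set(non_clients)
    if source in clients:
        # Best path from a client: reflect to all other clients and all non-clients.
        return (clients | non_clients) - {source}
    if source in non_clients:
        # Best path from a non-client: reflect to clients only.
        return clients
    raise ValueError(f"{source} is not an iBGP peer of this RR")

clients, non_clients = {"C1", "C2", "C3"}, {"N1", "N2"}
print(sorted(reflection_targets("C1", clients, non_clients)))  # ['C2', 'C3', 'N1', 'N2']
print(sorted(reflection_targets("N1", clients, non_clients)))  # ['C1', 'C2', 'C3']
```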
128
Confederations
• Basic principle
– Break up one big autonomous system into smaller internal autonomous systems
• But this arrangement increases:
– Complexity of routing policy based on AS_PATH information
– External overhead when internal topology changes
• Autonomous system confederation
– Collection of autonomous systems advertised as a single autonomous system to BGP speakers outside of the confederation
- Confederation is identified externally by a single autonomous system confederation identifier
- Each member of the confederation is given a member autonomous system number that is used only inside the confederation
– Two additional AS_PATH segment types:
- AS_CONFED_SEQUENCE: Ordered set of member autonomous system numbers that an UPDATE message has traversed inside the confederation
- AS_CONFED_SET: Unordered set of member autonomous system numbers that an UPDATE message has traversed inside the confederation
129
Confederation Operation
• AS_PATH update rules:
– Different handling of speakers in ASes inside and outside the confederation
– Basically hide the confederation structure when advertising AS_PATH to the outside, and otherwise follow essentially the same update rules.
• Within a confederation
– NEXT_HOP, MED, and LOCAL_PREFERENCE can be advertised unchanged to neighboring member ASes.
[Figure: confederation AS 1 made up of member ASes 111, 112, 113, and 114.]
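A simplified sketch of the AS_PATH handling, using the member ASes from the figure. It covers only prepending to AS_CONFED_SEQUENCE inside the confederation and stripping confederation segments (then prepending the confederation identifier) toward external peers. The external AS 7 and the function names are hypothetical, and AS_CONFED_SET creation is omitted.

```python
from typing import List, Tuple

Segment = Tuple[str, List[int]]  # (segment type, AS numbers)

def prepend(path: List[Segment], seg_type: str, asn: int) -> List[Segment]:
    """Prepend asn to a leading segment of seg_type, or start a new segment."""
    if path and path[0][0] == seg_type:
        return [(seg_type, [asn] + path[0][1])] + path[1:]
    return [(seg_type, [asn])] + path

def to_member_peer(path: List[Segment], member_as: int) -> List[Segment]:
    """Advertising to a peer in another member AS inside the confederation."""
    return prepend(path, "AS_CONFED_SEQUENCE", member_as)

def to_external_peer(path: List[Segment], confed_id: int) -> List[Segment]:
    """Advertising outside: strip confederation segments, prepend the confederation ID."""
    visible = [s for s in path if s[0] not in ("AS_CONFED_SEQUENCE", "AS_CONFED_SET")]
    return prepend(visible, "AS_SEQUENCE", confed_id)

path = [("AS_SEQUENCE", [7])]     # hypothetical route learned from external AS 7
path = to_member_peer(path, 111)  # AS 111 advertises to AS 112
path = to_member_peer(path, 112)  # AS 112 advertises to AS 113
print(path)                       # [('AS_CONFED_SEQUENCE', [112, 111]), ('AS_SEQUENCE', [7])]
print(to_external_peer(path, 1))  # [('AS_SEQUENCE', [1, 7])]
```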
130
From BGP to Packet Forwarding Decisions
• Recursive lookup at Router 1.1.1.1
– BGP routing table identifies Router 1.1.5.1 as the NEXT_HOP for route r.
– IGP routing table identifies interface 10.2.1.1 on Router 1.1.2.1 as the next hop towards Router 1.1.5.1.
– Forwarding table entry for route r points to 10.2.1.1 on Router 1.1.2.1 as the next hop.
[Figure: AS 1 containing Routers 1.1.1.1, 1.1.2.1 (interface 10.2.1.1), 1.1.3.1, 1.1.4.1, and 1.1.5.1, connected by iBGP and IGP; neighboring AS 2 and AS 3; route r.]
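The recursive lookup can be sketched with two longest-prefix-match tables: the BGP lookup yields the exit router, and a second lookup in the IGP table yields the actual next hop. The prefix for route r and the less specific IGP entry are hypothetical; 1.1.5.1 and 10.2.1.1 come from the slide.

```python
import ipaddress

def longest_prefix_match(table, address):
    """Return the entry for the most specific prefix in `table` covering `address`."""
    addr = ipaddress.ip_address(address)
    best = None
    for prefix, value in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, value)
    return None if best is None else best[1]

bgp_table = {"192.0.2.0/24": "1.1.5.1"}  # route r (hypothetical prefix) -> BGP NEXT_HOP
igp_table = {
    "1.1.5.1/32": "10.2.1.1",            # path toward Router 1.1.5.1
    "1.1.0.0/16": "10.9.9.9",            # less specific, hypothetical
}

def resolve(destination):
    """Recursive lookup: BGP gives the exit router, the IGP gives the next hop to it."""
    bgp_next_hop = longest_prefix_match(bgp_table, destination)
    if bgp_next_hop is None:
        return None
    return longest_prefix_match(igp_table, bgp_next_hop)

print(resolve("192.0.2.10"))    # 10.2.1.1
print(resolve("198.51.100.1"))  # None (no BGP route)
```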
131
End-to-End Connectivity: Gluing BGP and IGP Decisions Together
• Two cases
1. All routers are BGP speakers (BGP mesh, common in ISPs).
2. Some internal routers do not speak BGP.
• Case 1: BGP mesh
– Forwarding table can be constructed simply based on recursive lookup.
- IGP provides connectivity between routers.
- BGP associates routes to routers.
• Case 2: Mix of BGP speakers and IGP-only routers
– BGP speakers participate in IGP.
– BGP speakers “export” routes into IGP.
- Example of OSPF ASBRs
132
From Routing Table to Forwarding Table
• OK, we got to Router 1.1.2.1. Where to next?
– Case 1: BGP full or partial mesh
- Routers 1.1.2.1, 1.1.3.1, and 1.1.4.1 also participate in iBGP.
– Partial mesh means that only those routers on the path between 1.1.1.1 and 1.1.5.1 need to participate in BGP.
– Dangerous (why?) but not uncommon (why?)
- They all know that 1.1.5.1 is the desired exit point and can forward packets.
[Figure: same topology as the previous slide: AS 1 (Routers 1.1.1.1 through 1.1.5.1, interface 10.2.1.1), AS 2, AS 3, and route r.]
133
From Routing Table to Forwarding Table
• OK, we got to Router 1.1.2.1. Where to next?
– Case 2: BGP routes imported into IGP, e.g., OSPF
- Routers 1.1.1.1 and 1.1.5.1 are ASBRs.
- Router 1.1.5.1 advertises a type 1/2 external route r.
- Routers 1.1.2.1, 1.1.3.1, and 1.1.4.1 learn about r through a type 5 External LSA advertised by 1.1.5.1.
- Router 1.1.1.1 learns about r through both BGP and OSPF (consistency, precedence?)
[Figure: same topology, with Router 1.1.5.1 flooding a type 5 LSA for r (T5: < r >) inside AS 1.]
134
Forwarding Table Challenges
• With today’s CPUs, SPF (Phase 1) computations are no longer the dominant challenge, even in large networks.
– Less than 50 ms per run on a 400-router network
• Processing load of Phase 2 (route/stub updates) can be more significant for full Internet routing tables (stepping through all entries in the routing table).
– But what are the odds that IS-IS or OSPF will carry a full Internet routing table?
• Which brings us to the true challenge(s)
– Impact of dependencies across protocols, e.g., BGP and IS-IS
– Volume of data to be pushed/modified
- Full Internet routing table >200 MB and ~300k prefixes
- Forwarding table size ~2 MB
135
Impact of Protocol Dependencies
• BGP tells A that B and C can both reach the Internet
– IGP costs to B and C are the tie-breakers with d1<d2
– B is the selected exit point to reach the Internet through port #1 on A
[Figure: Router A with exit routers B and C, each advertising 300k routes; IGP cost d1 to B and d2 to C; ports 1 and 3 on A.]
136
Impact of Protocol Dependencies
• BGP tells A that B and C can both reach the Internet
– IGP costs to B and C are the tie-breakers with d1<d2
– B is the selected exit point to reach the Internet through port #1 on A
• Internal link failure affects path from A to B
– A now reaches B through port #2, with IGP cost d’1 < d2
• A needs to step through the full BGP table to determine that the IGP change did not affect the BGP decision (d’1 < d2)
• A needs to update all 300k entries in its forwarding table to point to the new forwarding next hop for B, now reachable over port #2
– A better option: recursive lookup
[Figure: same as before, with A now reaching B over port 2 at IGP cost d’1; ports 1, 2, and 3 on A.]
137
Recursive Forwarding Structure
• A change in forwarding decision for a Next_Hop does not require modifications of individual prefix entries
– Only the Next_Hop forwarding information is updated
– One vs. 300,000 updates!
• Unfortunately, this won’t help if the Next_Hop itself changes
– Still need to update up to 300,000 entries in that case
[Figure: 300k prefixes pointing to tens of Next_Hop entries.]
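The one-versus-300,000 contrast can be sketched with a shared object per Next_Hop. Prefix entries hold a reference to it, so an IGP change to the Next_Hop's port is a single write, while a change of BGP Next_Hop forces a rewrite of every entry. The `NextHop` class and the prefix labels are hypothetical.

```python
class NextHop:
    """Forwarding information shared by all prefixes using one BGP Next_Hop."""
    def __init__(self, router: str, port: int):
        self.router = router
        self.port = port

fib = {}                         # prefix -> shared NextHop object
nh_b = NextHop("B", port=1)
for i in range(300_000):
    fib[f"prefix-{i}"] = nh_b    # hypothetical prefix labels

# IGP change: B is now reachable over port 2. One update covers all prefixes.
nh_b.port = 2
print(fib["prefix-42"].port)     # 2

# BGP Next_Hop change (B -> C): every prefix entry must be rewritten.
nh_c = NextHop("C", port=3)
rewritten = 0
for prefix, nh in fib.items():
    if nh is nh_b:
        fib[prefix] = nh_c       # replacing values of existing keys while iterating is safe
        rewritten += 1
print(rewritten)                 # 300000
```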
138
Impact of Protocol Dependencies
• BGP tells A that B and C can both reach the Internet
– IGP costs to B and C are the tie-breakers with d1<d2
– B is the selected exit point to reach the Internet through port #1 on A
• Internal link failure affects path from A to B
– We now have d’1 > d2
• A needs to step through the full BGP table to determine that the IGP change affects the BGP decision (d’1 > d2)
• A needs to update all 300k entries in its forwarding table to point to the new BGP Next_Hop C, reachable over port #3
[Figure: A with exit routers B and C (300k routes each); IGP costs d’1 to B and d2 to C; ports 2 and 3 on A.]
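The three scenarios (d1 < d2, then d’1 < d2, then d’1 > d2) can be compared with a small sketch that counts prefix entries rewritten after the IGP change. The cost values are hypothetical, and the function assumes the port toward B changed as in the link-failure slides. It does not model the separate cost of scanning the BGP table to detect whether the decision changed.

```python
def select_exit(igp_cost):
    """BGP tie-break used on these slides: prefer the exit with the lowest IGP cost."""
    return min(igp_cost, key=igp_cost.get)

def prefix_rewrites(n_prefixes, old_cost, new_cost, recursive=True):
    """Prefix entries to rewrite after an IGP change that also moved the port to B.

    With a recursive FIB, only a change of exit (BGP Next_Hop) touches prefixes;
    with a flat FIB, every entry must be rewritten in both cases.
    """
    exit_changed = select_exit(old_cost) != select_exit(new_cost)
    if recursive and not exit_changed:
        return 0
    return n_prefixes

before = {"B": 10, "C": 20}      # hypothetical d1 < d2
after_ok = {"B": 15, "C": 20}    # d'1 < d2: still exit via B
after_flip = {"B": 25, "C": 20}  # d'1 > d2: exit moves to C
print(select_exit(after_flip))                            # C
print(prefix_rewrites(300_000, before, after_ok))         # 0
print(prefix_rewrites(300_000, before, after_ok, False))  # 300000
print(prefix_rewrites(300_000, before, after_flip))       # 300000
```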
139
Dealing with Multiple Protocols
• Routers often learn routes from multiple protocols that use different/incompatible metrics
– Which one to prefer?
• Administrative distance specifies the degree of preference of a protocol
– Smaller is better
• Default administrative distances are vendor specific (the values below are Cisco defaults) and can be changed by configuration
Protocol                 Distance
Connected interface      0
Static route             1
EIGRP summary route      5
eBGP                     20
OSPF                     110
IS-IS                    115
RIP                      120
EGP                      140
iBGP                     200
Unknown                  255
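Selecting among routes learned from several protocols then reduces to a minimum over administrative distance. This sketch uses the defaults from the table; the protocol keys, candidate next hops, and function name are hypothetical. It treats 255 as "never installed".

```python
# Default distances from the table above (other vendors may differ).
ADMIN_DISTANCE = {
    "connected": 0, "static": 1, "eigrp-summary": 5, "ebgp": 20,
    "ospf": 110, "isis": 115, "rip": 120, "egp": 140, "ibgp": 200,
}
UNTRUSTED = 255  # "Unknown": never installed

def preferred_route(candidates):
    """Pick the (protocol, next_hop) candidate with the smallest administrative distance."""
    if not candidates:
        return None
    best = min(candidates, key=lambda c: ADMIN_DISTANCE.get(c[0], UNTRUSTED))
    return None if ADMIN_DISTANCE.get(best[0], UNTRUSTED) >= UNTRUSTED else best

# Router 1.1.1.1 learning route r from both OSPF and iBGP (next hops hypothetical).
print(preferred_route([("ibgp", "1.1.5.1"), ("ospf", "10.2.1.1")]))  # ('ospf', '10.2.1.1')
print(preferred_route([("mystery", "192.0.2.1")]))                   # None
```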
140
Back to BGP: VPN Support
[Figure: MY BLUE NETWORK (R21, R22, R23, R24) and MY GREEN NETWORK (R11, R12, R13, R14) realized as MY BLUE/GREEN VIRTUAL NETWORKS over a shared backbone of CE, PE, and P routers.]
141
VPN Definition and Scope
• A set of “Sites” is attached to a common backbone network
• Subsets of this set form VPNs
– A common backbone delivers IP connectivity to sites belonging to the same VPN
• Many possible VPN types
– Intranet: All sites belong to the same enterprise
– Extranet: Sites belong to different enterprises
• Sites can belong to multiple VPNs
– Intranet and several different extranets
– Span broad geographical areas
• Routers within a site communicate directly (not through the common backbone network)
• Policies determine which VPNs a site belongs to and what routes it learns and can use
• Supporting all these requirements in a scalable and efficient manner is challenging
– BGP/MPLS defines mechanisms to effectively realize VPNs
142
BGP/MPLS VPNs
• Two main components
– MPLS as the tunneling technology (implementing VPN connectivity)
- Label stacking (two levels) for scalable backbone forwarding and easy VPN association
- Outer label identifies the egress backbone router connecting to the customer site; it is stripped upon reception at the egress router
- Inner label points to the VPN Routing and Forwarding (VRF) table for the customer site at the egress router
– BGP as the route distribution and installation mechanism (controlling connectivity)
- Several extensions to allow transport and selective installation and use of VPN routes across provider routers
- Which route goes into which VRF?
143
VPN Terminology and Configurations
• Three types of routers
– Provider (P) or backbone-only routers
– Provider Edge (PE) routers interface to customer sites
– Customer Edge (CE) routers attach to Service Provider routers
• P and PE routers form the Service Provider network
• CE routers belong to customer VPNs
– But do not peer directly with each other (they peer with PE routers)
• Sample VPN configurations
– VPN1 and VPN2 intranets
– VPN3 extranet
- VPN2 sites connect to servers at R11 and R12 through firewall at R13
[Figure: sites R11 to R14 and R21 to R24 attached to a backbone of CE, PE, and P routers.]
VPN1: R11, R12, and R13
VPN2: R21, R22, and R23
VPN3: R21, R22, and R23 connect to R11 and R12 through R13
144
VPN Forwarding Overview
• PEs maintain multiple forwarding tables
– Default forwarding table
– VPN Routing and Forwarding tables (VRFs)
• Each VRF contains a specific subset of VPN routes
• At ingress PE, each PE-CE connection is associated with a VRF
– Incoming (from CE) packets are forwarded by looking up the destination address in the corresponding (ingress) VRF
– Local (attached to same PE) packets are forwarded directly
– Remote (to other VPN site) packets are forwarded as MPLS packets
- VPN route label is assigned based on VRF content
- Tunnel label is pushed on top of the label stack to enable delivery of the packet to the “next hop” (PE) across the backbone
• Backbone (P) routers forward packets based on the outer label
• At egress PE, the tunnel label is removed and the route label is used to access the appropriate VRF
– Forwarding may or may not require an additional VRF lookup
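The ingress PE steps can be sketched as one function: choose the VRF from the incoming interface, do a longest-prefix match in that VRF, and then either forward locally or push the VPN route label with the tunnel label on top. The `ProviderEdge` class, interface names, and the 10.5.0.0/16 local site are hypothetical. The PE4/L0/L(PE4) values follow the closed-mesh example later in this section.

```python
import ipaddress

class ProviderEdge:
    """Hypothetical ingress PE model: per-interface VRFs plus tunnel labels."""

    def __init__(self, name, vrf_of_interface, vrfs, tunnel_label):
        self.name = name
        self.vrf_of_interface = vrf_of_interface  # CE-facing interface -> VRF name
        self.vrfs = vrfs                          # VRF name -> {prefix: route}
        self.tunnel_label = tunnel_label          # remote PE -> outer (tunnel) label

    def forward(self, in_interface, destination):
        vrf = self.vrfs[self.vrf_of_interface[in_interface]]
        addr = ipaddress.ip_address(destination)
        matches = [p for p in vrf if addr in ipaddress.ip_network(p)]
        if not matches:
            return ("drop",)
        route = vrf[max(matches, key=lambda p: ipaddress.ip_network(p).prefixlen)]
        if route["pe"] == self.name:
            return ("local", route["interface"])  # same PE: no label pushed
        # Remote site: VPN route label at the bottom, tunnel label on top (listed first).
        return ("mpls", [self.tunnel_label[route["pe"]], route["vpn_label"]])

pe1 = ProviderEdge(
    "PE1",
    {"if-ce1": "VRF1"},
    {"VRF1": {
        "10.0.0.0/16": {"pe": "PE4", "vpn_label": "L0"},
        "10.5.0.0/16": {"pe": "PE1", "interface": "if-ce5"},
    }},
    {"PE4": "L(PE4)"},
)
print(pe1.forward("if-ce1", "10.0.0.1"))  # ('mpls', ['L(PE4)', 'L0'])
print(pe1.forward("if-ce1", "10.5.1.1"))  # ('local', 'if-ce5')
```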
145
VPN Route Distribution
• PEs learn VPN routes from attached CEs
– Can use static routes or a routing protocol (RIP, OSPF, or BGP)
– Routes are installed in the associated VRF
• PEs convert routes into VPN-IPv4 routes by prepending a Route Distinguisher (RD) to each of them
– Distinguishes between addresses from different VPNs
• PEs redistribute VPN routes to other PEs using MP-BGP
– PEs use their own address as the “BGP next hop”
– PEs assign an MPLS label to each route
- Multiple options for assigning labels to routes from the same VRF
– Export policies determine the set of Route Targets (RT, a BGP attribute similar to Community) associated with each route
– Import policies specify the RTs of routes eligible to be installed in a given VRF
146
BGP Extensions in Support of VPNs
• MP-BGP (RFC 4760)
– Multi-Protocol BGP (MP-BGP) is a generic extension to BGP that supports protocols other than IPv4, including multiple address families
– VPN routes are viewed as a separate address family
• Carrying (MPLS) labels in BGP updates (RFC 3107)
– Where and how to associate one or more labels with a prefix in a BGP update
– Binds routes to tunnels
• BGP/MPLS VPNs (RFC 4364)
– Use of BGP as a prefix distribution mechanism in support of multiple VPNs over a common MPLS network
– Full specification of VPN support with MPLS and MP-BGP
147
Multiprotocol Extensions to BGP
• BGP-4 specifications include only three pieces of information that are tied to IPv4
– NEXT_HOP (IPv4 address)
– AGGREGATOR (IPv4 address)
– NLRI (IPv4 prefix)
• Extending BGP to handle multiple protocols is achieved by introducing two new optional, non-transitive attributes (which can be ignored by speakers that do not support them)
– Multiprotocol Reachable NLRI (MP_REACH_NLRI)
- Carries a set of reachable destinations together with NEXT_HOP
– Multiprotocol Unreachable NLRI (MP_UNREACH_NLRI)
- Carries a set of unreachable destinations
• Multiprotocol support is specified in capability advertisement
– Capability code set to 1
– Followed by list of supported address families (protocols)
148
MP_REACH_NLRI
• Provides protocol-specific reachability information
– Advertises a feasible route together with the (network layer) address of the next hop router
• Encoding is as follows:
Address Family Identifier (AFI) – 2 bytes
Subsequent Address Family Identifier (SAFI) – 1 byte
Length of Next Hop Network Address – 1 byte
Network Address of Next Hop – Variable
Reserved – 1 byte
Network Layer Reachability Information (NLRI) – Variable
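A sketch of packing and unpacking the attribute value in the field order listed above. The attribute header (flags, type code, length) is not included. The example values (IPv4 unicast with AFI=1 and SAFI=1, next hop 192.0.2.1, one prefix 10.1.0.0/16 encoded as a length byte plus its two significant bytes) and the function names are hypothetical.

```python
import ipaddress
import struct

def build_mp_reach_nlri(afi: int, safi: int, next_hop: bytes, nlri: bytes) -> bytes:
    """Attribute value in the field order above (attribute header not included)."""
    return (struct.pack("!HBB", afi, safi, len(next_hop))  # AFI, SAFI, next-hop length
            + next_hop
            + b"\x00"                                      # Reserved
            + nlri)

def parse_mp_reach_nlri(value: bytes):
    afi, safi, nh_len = struct.unpack("!HBB", value[:4])
    next_hop = value[4:4 + nh_len]
    nlri = value[4 + nh_len + 1:]                          # skip the Reserved byte
    return afi, safi, next_hop, nlri

nh = ipaddress.IPv4Address("192.0.2.1").packed
value = build_mp_reach_nlri(1, 1, nh, bytes([16, 10, 1]))
print(len(value))                      # 12
print(parse_mp_reach_nlri(value)[:2])  # (1, 1)
```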
149
Specifying IPv4-VPN Routes
• AFI/SAFI fields identify the network layer protocol to which the NEXT_HOP address belongs, and specify the NLRI semantics
• VPN-IPv4 address family
– AFI=1 (IP) and SAFI=128 for labeled VPN-IPv4 addresses
– Address format: 12-byte quantity
- 8-byte Route Distinguisher (RD) + 4-byte IPv4 address/prefix
• Route Distinguisher (2-byte type field, 6-byte value)
– Three defined types
- Type 0: 2-byte administrator subfield (AS number), 4-byte number field administered by the AS owner
- Type 1: 4-byte administrator subfield (IP address), 2-byte number field administered by the IP address owner
- Type 2: 4-byte administrator subfield (4-byte AS number), 2-byte number field administered by the AS owner
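The three RD layouts can be checked with `struct`: each one packs into exactly 8 bytes, and prepending the RD to a 4-byte IPv4 address gives the 12-byte VPN-IPv4 address. The AS number 65000, the address 192.0.2.1, the assigned numbers, and the function names are hypothetical.

```python
import ipaddress
import struct

def encode_rd(rd_type: int, administrator, assigned: int) -> bytes:
    """8-byte Route Distinguisher: 2-byte type field + 6-byte value."""
    if rd_type == 0:  # 2-byte AS number + 4-byte assigned number
        return struct.pack("!HHI", 0, administrator, assigned)
    if rd_type == 1:  # 4-byte IPv4 address + 2-byte assigned number
        ip = ipaddress.IPv4Address(administrator).packed
        return struct.pack("!H4sH", 1, ip, assigned)
    if rd_type == 2:  # 4-byte AS number + 2-byte assigned number
        return struct.pack("!HIH", 2, administrator, assigned)
    raise ValueError(f"unknown RD type {rd_type}")

def vpn_ipv4(rd: bytes, address: str) -> bytes:
    """12-byte VPN-IPv4 address: RD followed by the IPv4 address."""
    return rd + ipaddress.IPv4Address(address).packed

rd = encode_rd(0, 65000, 1)
print(rd.hex())                                # 0000fde800000001
print(encode_rd(1, "192.0.2.1", 7).hex())      # 0001c00002010007
print(len(vpn_ipv4(rd, "10.1.0.0")))           # 12
```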
150
NLRI Encoding
• NLRI is encoded as one or more triplets of the form <length; label(s); prefix>
– Length is 1 byte and gives the number of bits for label(s) + prefix
– Label(s) encoded as 3 bytes, with the high-order 20 bits carrying the label and the low-order bit serving as the “bottom of stack” indicator
– Prefix consists of RD + IPv4 prefix, followed by don’t-care bits to align on a byte boundary
– Prefix length and start position are “deduced” from the length field and the number of labels
- Keep stepping through labels until reaching the bottom-of-stack indicator in a label
- Remainder is prefix + padding bits, with the length field providing information on the number of padding bits
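The decoding procedure (step through labels until the bottom-of-stack bit, then treat the rest as RD plus prefix) can be sketched together with a matching encoder. The three bits between the label and the bottom-of-stack bit are set to zero here (an assumption). The label values, the RD, and the function names are hypothetical.

```python
import ipaddress
import math

def encode_nlri(labels, rd: bytes, prefix: str) -> bytes:
    """<length; label(s); prefix>, where the prefix part is RD + IPv4 prefix."""
    net = ipaddress.IPv4Network(prefix)
    out = bytearray()
    for i, label in enumerate(labels):
        bottom = 1 if i == len(labels) - 1 else 0
        out += ((label << 4) | bottom).to_bytes(3, "big")  # 20-bit label, BoS in lowest bit
    out += rd + net.network_address.packed[:math.ceil(net.prefixlen / 8)]
    length_bits = 24 * len(labels) + 64 + net.prefixlen
    return bytes([length_bits]) + bytes(out)

def decode_nlri(data: bytes):
    """Step through labels until bottom of stack; the remainder is RD + prefix."""
    length_bits, pos, labels = data[0], 1, []
    while True:
        entry = int.from_bytes(data[pos:pos + 3], "big")
        labels.append(entry >> 4)
        pos += 3
        if entry & 1:  # bottom-of-stack bit: last label
            break
    rd = data[pos:pos + 8]
    prefix_len = length_bits - 24 * len(labels) - 64
    n_bytes = math.ceil(prefix_len / 8)
    raw = data[pos + 8:pos + 8 + n_bytes] + bytes(4 - n_bytes)  # pad back to 4 bytes
    # strict=False masks any don't-care padding bits beyond prefix_len.
    net = ipaddress.IPv4Network((ipaddress.IPv4Address(raw), prefix_len), strict=False)
    return labels, rd, str(net)

rd = bytes.fromhex("0000fde800000001")  # hypothetical type 0 RD
msg = encode_nlri([1000], rd, "10.1.0.0/16")
print(msg[0])                            # 104 = 24 label bits + 64 RD bits + 16 prefix bits
labels, _, prefix = decode_nlri(msg)
print(labels, prefix)                    # [1000] 10.1.0.0/16
```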
151
Populating VRFs
• Routes installed in a given VRF come from
– Routes “received” from local CE routers, e.g., through eBGP
- Corresponding VRF is determined from the router interface
– Routes learned from remote PE routers over iBGP
- New attribute (Route Target, RT) determines in which VRFs routes are installed, based on local policies
• Policies and Route Targets for VRF construction
– Similar approach as in standard BGP in using Community attributes to implement policies
- RT defined as an Extended Community attribute
– For local routes (associated with CEs attached to the PE), export policies determine the value(s) of the RTs
- Local route is converted into a VPN-IPv4 route and added to the corresponding VRF with one or more RT attributes
– Remote VPN-IPv4 routes received through BGP are installed in local VRFs if one of their RTs matches a local import policy
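Import by Route Target amounts to a set intersection per VRF. In this sketch the VRF names, RDs, RTs, and prefixes are hypothetical. Routes that match no VRF are reported as discarded, which corresponds to the behavior of a PE that is not a Route Reflector. The example also shows two routes with the same prefix but different RDs landing in one VRF.

```python
def import_into_vrfs(vpn_routes, vrf_import_rts):
    """Install each VPN-IPv4 route into every VRF whose import RTs it matches.

    Returns the per-VRF tables and the prefixes that matched no VRF.
    """
    vrfs = {name: [] for name in vrf_import_rts}
    discarded = []
    for route in vpn_routes:
        matched = False
        for name, import_rts in vrf_import_rts.items():
            if route["rts"] & import_rts:
                vrfs[name].append((route["rd"], route["prefix"]))
                matched = True
        if not matched:
            discarded.append(route["prefix"])
    return vrfs, discarded

received = [
    {"rd": "RD1", "prefix": "10.2.0.0/16", "rts": {"T1"}},
    {"rd": "RD7", "prefix": "10.2.0.0/16", "rts": {"T7"}},  # same prefix, other VPN
    {"rd": "RD9", "prefix": "172.16.0.0/12", "rts": {"T9"}},
]
vrfs, dropped = import_into_vrfs(received, {"VRF1": {"T1"}, "VRF7": {"T7", "T1"}})
print(vrfs["VRF1"])  # [('RD1', '10.2.0.0/16')]
print(vrfs["VRF7"])  # [('RD1', '10.2.0.0/16'), ('RD7', '10.2.0.0/16')]
print(dropped)       # ['172.16.0.0/12']
```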
152
Routing Information Flow
• Egress PE learns route associated with a given CE
– Corresponding VRF is identified
– Route is converted to a VPN-IPv4 route, and an RD value is assigned based on VRF configuration, e.g., each VRF has its own RD
– RT attributes are assigned to the route based on local export policies
• Egress PE communicates VPN-IPv4 route to MP-BGP peer
– Sets NEXT_HOP to its own address, encoded as a VPN-IPv4 address with an RD value of 0
– Assigns a label to the route
- One label per VRF, per outgoing interface, or per route
– Note that the PE can aggregate routes before distribution
- A label identifying an aggregate route then calls for an L3 lookup in the VRF
• Ingress PE receives VPN-IPv4 route over MP-BGP session
– Route is installed in VRFs based on matching RT values to the import policies of each VRF
- Note that two VPN-IPv4 routes with the same prefix but different RD values can both be installed in a given VRF
- Note that unless it is a Route Reflector, a PE should discard all routes that have no RT attributes matching the import target of at least one VRF
– Tunnel (MPLS or not) is identified for the NEXT_HOP of the route
153
Forwarding Information Flow
• Packet arrives at ingress PE over interface associated with a given CE
– Corresponding VRF is identified based on incoming interface
– If a match is found for the destination address, the “next hop” is retrieved
• If the next hop is on the same PE, the packet is forwarded without pushing any new label onto the packet’s label stack (if any)
– Note that if the egress interface is associated with a different VRF, and the matching route is an aggregate, an additional lookup in the egress VRF may be required
• If the next hop is a remote BGP next hop
– The packet is converted into an MPLS packet with the corresponding VPN route label
– The next hop “tunnel” information (MPLS label) is retrieved and pushed on top of the packet’s label stack
– The packet is forwarded to the tunnel’s next hop
• At the next hop (egress PE), packet treatment depends on the label
– The label can identify
- An egress interface together with the corresponding link layer header
- A VRF in which to look up the destination address
– The packet is ultimately forwarded on the egress interface
154
Route Distribution Through Reflectors
• Use of Route Reflectors is again motivated by scalability
• However, RRs need to maintain routing information for VPNs for which they have NO attachments
– In general, RRs accept ALL routes received from client PEs, provided they carry RT attributes from a “given” set
- The set can be configured or learned
- Routes with RTs not in that set can be filtered (inbound)
• Main difference with BGP is that RRs do not really apply a decision process to inbound routes and advertise the output of that process to clients
– Closer to passive reflectors
155
Sample VPNs – Closed Mesh (1)
• VPN1: 4 fully interconnected sites
• Basic configuration at PEs
– RD1 value identifies VPN1
– RT value of T1 for all VRF1 export and import policies
• VRF construction at PE1 (VRF1)
– Learns route 10.1.0.0/16 from CE1, and installs it in VRF1
– Exports <RD1,10.1.0.0/16;T1;L1;PE1> to BGP (Next_Hop self (PE1) and label L1)
– Advertises <RD1,10.1.0.0/16;T1;L1;PE1> to PE2, PE3, and PE4
– Receives <RD1,10.0.0.0/16;T1;L0;PE4> from PE4, <RD1,10.3.0.0/16;T1;L3;PE3> from PE3, and <RD1,10.2.0.0/16;T1;L2;PE2> from PE2, and installs them in VRF1
[Figure: CE1, CE2, CE3, and CE4 attached to PE1, PE2, PE3, and PE4 around backbone router P; site prefixes 10.0.0.0/16, 10.1.0.0/16, 10.2.0.0/16, and 10.3.0.0/16.]
156
Sample VPNs – Closed Mesh (2)
• Forwarding of packet to 10.0.0.1 from PE1
– Packet received from CE1
– Lookup in VRF1 at PE1 yields 10.0.0.0/16 as best route, with Next_Hop of PE4
– Packet sent as MPLS packet with label stack <L(PE4),L0>
– Packet delivered to PE4 through the MPLS backbone based on label L(PE4)
– PE4 pops the label stack to expose L0
– L0 identifies CE4 as packet destination
– PE4 forwards the packet to CE4 as a standard IP packet (removes L0)
[Figure: same four-site topology (CE1 to CE4, PE1 to PE4, backbone router P).]
157
Sample VPNs – Hub and Spoke
• VPN2: All connectivity through CE1
• Basic configuration at PEs
– RD1 value identifies VPN2
– Two route targets are defined: TH (hub) and TS (spoke)
– At the VRFs attached to the hub site (PE1), TH is the export target and TS the import target
– At the VRFs attached to the spoke sites (PE2, PE3, and PE4), TS is the export target and TH the import target
• VRF construction
– PEs associated with spoke sites
- Receive routes from their CEs and export them to PE1 with target TS
- Receive routes from PE1 with target TH and import them into the VRF of their CEs
– PE1
- Receives routes from spoke PEs with target TS and installs them in CE1’s VRF
- Exports routes (back) to spoke PEs with target TH
[Figure: same four-site topology, with CE1 at PE1 as the hub and CE2, CE3, CE4 as spokes.]
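The effect of the TH/TS configuration can be simulated in two steps. This is a simplified sketch that reduces the hub to a single re-advertisement step. The prefix assigned to each spoke and the hub's own prefix are hypothetical, and the function names are illustrative. It shows that spokes never import each other's routes directly, so spoke-to-spoke traffic uses PE1 as its BGP next hop.

```python
# (export targets, import targets) per PE, as configured on the hub-and-spoke slide.
TARGETS = {
    "PE1": ({"TH"}, {"TS"}),  # hub
    "PE2": ({"TS"}, {"TH"}),  # spokes
    "PE3": ({"TS"}, {"TH"}),
    "PE4": ({"TS"}, {"TH"}),
}

def installed(advertisements):
    """advertisements: (origin PE, prefix, RTs). Returns PE -> {prefix: BGP next hop}."""
    tables = {pe: {} for pe in TARGETS}
    for origin, prefix, rts in advertisements:
        for pe, (_, imports) in TARGETS.items():
            if pe != origin and rts & imports:
                tables[pe][prefix] = origin
    return tables

# Step 1: spoke PEs export their CEs' routes with TS (prefixes hypothetical).
spokes = {"PE2": "10.2.0.0/16", "PE3": "10.3.0.0/16", "PE4": "10.0.0.0/16"}
step1 = installed([(pe, p, TARGETS[pe][0]) for pe, p in spokes.items()])
print(step1["PE3"])          # {}: spokes do not import each other's routes
print(sorted(step1["PE1"]))  # ['10.0.0.0/16', '10.2.0.0/16', '10.3.0.0/16']

# Step 2: the hub re-advertises what it learned, plus its own 10.1.0.0/16, with TH.
hub_routes = sorted(step1["PE1"]) + ["10.1.0.0/16"]
step2 = installed([("PE1", p, TARGETS["PE1"][0]) for p in hub_routes])
print(step2["PE3"]["10.2.0.0/16"])  # PE1: CE3-to-CE2 traffic goes through the hub
```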