Inter-domain Routing Don Fussell CS 395T Measuring Internet Performance


Inter-domain Routing

Don Fussell

CS 395T

Measuring Internet Performance

Internet Routing

• Two-level architecture, two protocol classes

– IGP: Interior Gateway Protocol

• Within an organization’s network

• Optimized protocol

• Intra-domain routing protocol

– EGP: Exterior Gateway Protocol

• Between organizations’ networks

• Policy routing

• Inter-domain routing protocol

Interior Gateway Protocol

• Runs within an Autonomous System (AS)

• An AS is a collection of routers (not a collection of IP addresses or prefixes)

• Can provide optimal paths between nodes (according to some cost metric)

• Examples

– RIP (Routing Information Protocol)

– OSPF (Open Shortest Path First)

– IS-IS (Intermediate System to Intermediate System)

– IGRP, EIGRP (Cisco proprietary)

Exterior Gateway Protocol

• Allows different ASs to exchange routing information

• Policy routing: control can be exerted over the information that crosses the border between ASs

• Based on cost metrics, but does not necessarily optimize paths the way IGPs do

• Examples

– BGP4 (Border Gateway Protocol version 4, de facto standard)

– EGP (Exterior Gateway Protocol, the specific protocol of that name, not the generic class)

– GGP (Gateway to Gateway Protocol)

– Hello

Distance Vector Protocols

• Simple to understand and implement

• Poor scalability, based on transmitting routing tables between routers

• Require periodic retransmission of routing information as routing tables expire

• Limited to small networks with simple topologies

• Can exhibit “counting to infinity” behavior in the presence of link failures

• Example – RIP (Routing Information Protocol)
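The table exchange described above can be sketched as a single update step. This is a minimal illustration, not RIP itself: the function name `dv_update`, the table layout (destination mapped to a cost and next hop), and the sample costs are all assumptions made for the example.

```python
INF = float("inf")

def dv_update(own_table, neighbor, link_cost, neighbor_vector):
    """Merge one neighbor's advertised distance vector into our own table.

    own_table maps destination -> (cost, next_hop).
    neighbor_vector maps destination -> cost as advertised by the neighbor.
    Returns True if any entry changed (which would trigger a re-advertisement).
    """
    changed = False
    for dest, advertised in neighbor_vector.items():
        new_cost = link_cost + advertised
        cur_cost, cur_hop = own_table.get(dest, (INF, None))
        better = new_cost < cur_cost
        # A route we already use through this neighbor must follow its cost,
        # even when the cost gets worse.
        same_hop_changed = (cur_hop == neighbor) and (new_cost != cur_cost)
        if better or same_hop_changed:
            own_table[dest] = (new_cost, neighbor)
            changed = True
    return changed

table = {"A": (0, None)}
dv_update(table, "B", 1, {"B": 0, "C": 2})
```

The second condition is the one that allows costs to creep upward step by step after a failure, which is where the counting to infinity behavior comes from.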

Link State Protocols

• Routers exchange Link State Packets (LSPs), not routing tables

• LSP information from a router flooded to rest of network

• A router regenerates its LSP only when the topology changes

• Good scalability - amount of information sent proportional to topology change, not number of IP prefixes

• Each router maintains local map of entire network (AS), called Link State Database (LSDB), and constructs shortest path information using Dijkstra’s algorithm

• Examples – OSPF, IS-IS
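As a rough sketch of the last step above, the following runs Dijkstra's algorithm over a small LSDB. The LSDB is modeled as a nested dictionary of link costs; that representation, the node names, and the function name `shortest_paths` are assumptions for illustration and do not reflect how OSPF or IS-IS store their databases.

```python
import heapq

def shortest_paths(lsdb, source):
    """Dijkstra over a link state database.

    lsdb maps router -> {neighbor: link_cost}, with non-negative costs.
    Returns (dist, first_hop): the cost to each reachable router and the
    neighbor of `source` to forward through.
    """
    dist = {source: 0}
    first_hop = {}
    heap = [(0, source, None)]
    while heap:
        cost, node, hop = heapq.heappop(heap)
        if cost > dist.get(node, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for nbr, weight in lsdb.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < dist.get(nbr, float("inf")):
                dist[nbr] = new_cost
                first_hop[nbr] = nbr if node == source else hop
                heapq.heappush(heap, (new_cost, nbr, first_hop[nbr]))
    return dist, first_hop

lsdb = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 2, "D": 5},
    "C": {"A": 4, "B": 2, "D": 1},
    "D": {"B": 5, "C": 1},
}
dist, first_hop = shortest_paths(lsdb, "A")
```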

Classless Inter-Domain Routing (CIDR)

• The Internet is a collection of networks – hence an IP address contains two parts, a network identifier and a host identifier

• Networks within the Internet have different numbers of hosts, hence originally networks were divided into classes

• Network classes

– Class A – 0 in high order bit, network id is in first octet, host address is in the last three octets

• 128 class A networks each with 16.7 million host addresses

– Class B – 10 in high order two bits, network id is in first two octets, host address is in the last two octets

• 16,384 class B networks each with 65,536 host addresses (65,534 usable)

– Class C – 110 in high order three bits, network id is in the first three octets, host address is in the last octet

• 2.1 million class C networks each with 256 host addresses (254 usable)

– Class D – for multicast

– Class E – reserved and unused

• This architecture is now obsolete
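For illustration only, the leading-bit rules can be written as a small classifier. The slide gives the bit patterns for classes A to C; the patterns used here for class D (1110) and class E (1111) are the standard ones and are not stated on the slide. The function name `address_class` is hypothetical.

```python
import ipaddress

def address_class(addr):
    """Return the historical class (A-E) of a dotted-quad IPv4 address."""
    first_octet = int(ipaddress.IPv4Address(addr)) >> 24
    if (first_octet & 0b10000000) == 0b00000000:
        return "A"  # leading bit 0
    if (first_octet & 0b11000000) == 0b10000000:
        return "B"  # leading bits 10
    if (first_octet & 0b11100000) == 0b11000000:
        return "C"  # leading bits 110
    if (first_octet & 0b11110000) == 0b11100000:
        return "D"  # leading bits 1110 (multicast)
    return "E"      # leading bits 1111 (reserved)
```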

Classless Addressing

• Rapid growth of Internet outpaced class based addressing

– Routing tables growing too large

– Running out of IP address space

– CIDR primarily addresses routing table problem

• Basic idea – get rid of implicit netmasks, pass explicit netmasks in inter-domain routing protocols

• CIDR allows service providers to aggregate classful networks and provide single summarized routing advertisements to other domains, thus controlling the growth of routing tables

• Addresses can overlap, forwarding must use longest matching prefix
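A minimal sketch of longest matching prefix forwarding, using Python's standard `ipaddress` module. The linear scan, the table contents, and the next-hop names are assumptions for the example; real routers use tries or hardware lookup tables rather than a loop.

```python
import ipaddress

def longest_prefix_match(table, dest):
    """Return (network, next_hop) for the most specific prefix covering dest.

    table maps a CIDR prefix string to a next-hop label.
    Returns None when no prefix covers the destination.
    """
    addr = ipaddress.ip_address(dest)
    best = None
    for prefix, next_hop in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return best

# Hypothetical forwarding table with overlapping prefixes.
table = {
    "0.0.0.0/0": "default",
    "10.0.0.0/8": "r1",
    "10.1.0.0/16": "r2",
}
```

Here 10.1.2.3 is covered by all three entries, and the /16 wins because it is the longest match.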

CIDR Advantages

• Reduced the size of the Internet routing table

• Reduced the growth rate of the Internet routing table

• Allows current generation routers to handle Internet addressing and forwarding

• Extended the lifetime of IPv4 addressing

CIDR Issues

• Address allocation must be done in such a way as to allow aggregation

• BGP4, which was created to support CIDR, must also be configured to support aggregation

• Multihoming – having more than one link to the Internet – how to aggregate

• Proxy aggregation – One AS performs aggregation of addresses contained within another
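The dependence of aggregation on address allocation can be shown with the standard library's `ipaddress.collapse_addresses`. The prefixes below are made-up examples and the wrapper name `aggregate` is hypothetical.

```python
import ipaddress

def aggregate(prefixes):
    """Collapse a list of CIDR prefix strings into the fewest covering prefixes."""
    nets = [ipaddress.ip_network(p) for p in prefixes]
    return [str(n) for n in ipaddress.collapse_addresses(nets)]

# Four contiguous, aligned /24s collapse into one advertisement.
aligned = aggregate(["10.4.0.0/24", "10.4.1.0/24", "10.4.2.0/24", "10.4.3.0/24"])

# Two adjacent /24s that do not share a /23 boundary cannot be merged.
unaligned = aggregate(["10.4.1.0/24", "10.4.2.0/24"])
```

The second call leaves two entries in the table, which is why allocations that ignore prefix boundaries defeat aggregation.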

BGP Outline

• Based on Distance Vector algorithms

• Uses TCP as transport protocol

• A BGP session involves two nodes

• Routers can be involved in several concurrent BGP sessions

• BGP message types

– Open session

– Activate new routes to prefixes

– Deactivate old routes to prefixes

– Report unusual conditions

– Close session

• Advertised routes are actively being used by advertiser

• Prefix advertisement attributes

– Next hops

– Route preference metrics

– AS path of routing announcement

– How the prefix entered the routing table of the source AS

• BGP is extensible – new attributes can be added as needed

BGP State Machine

[Figure: BGP finite state machine. States: Idle, Connect, Active, OpenSent, OpenConfirm, Established. Transition labels: TCP Connection Attempted, TCP Connection Established, TCP Connection Failed, Connection Accepted, Open Received, Connection Rejected or Error, Error.]

BGP Message Types

• Open

• Update

• Notification

• Keepalive

Open Message

• Version (1 octet)

• My Autonomous System (2 octets)

• Hold time (2 octets)

• BGP identifier (4 octets)

• Optional parameters length (1 octet)

• Optional parameters (variable length)

– Type (1 octet)

– Length (1 octet)

– Value (variable)
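The field layout above maps directly onto a fixed 10-octet structure followed by type-length-value parameters. The sketch below parses only the OPEN body and assumes the common message header has already been stripped; the function name `parse_open` and the returned dictionary keys are choices made for the example.

```python
import ipaddress
import struct

# Version (1), My AS (2), Hold time (2), BGP identifier (4), Opt params length (1)
OPEN_FIXED = struct.Struct("!BHH4sB")

def parse_open(body):
    """Parse the body of a BGP OPEN message (common header already removed)."""
    version, my_as, hold_time, bgp_id, opt_len = OPEN_FIXED.unpack_from(body)
    offset = OPEN_FIXED.size
    end = offset + opt_len
    if end > len(body):
        raise ValueError("optional parameters length exceeds message body")
    params = []
    while offset < end:
        if offset + 2 > end:
            raise ValueError("truncated optional parameter header")
        p_type, p_len = body[offset], body[offset + 1]
        if offset + 2 + p_len > end:
            raise ValueError("truncated optional parameter value")
        params.append((p_type, body[offset + 2:offset + 2 + p_len]))
        offset += 2 + p_len
    return {
        "version": version,
        "my_as": my_as,
        "hold_time": hold_time,
        "bgp_id": str(ipaddress.IPv4Address(bgp_id)),
        "params": params,
    }
```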

OPEN Optional Parameters

• 1 – Authentication information (1 octet authentication code and variable length information field. Not really used.)

• 2 – Capability negotiation

Update Message

• Withdrawn (unfeasible) routes length (2 octets)

• Withdrawn (unfeasible) routes (variable)

– IP prefix length in bits (1 octet)

– IP prefix (variable)

• Total path attributes length (2 octets)

• Path attributes (variable)

• Network layer reachability information (variable)
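A sketch of the prefix encoding used in the withdrawn routes field: one octet of prefix length in bits, then the prefix itself. The slide only says the prefix is variable length; the rule that it occupies the minimum number of whole octets needed to hold the prefix bits comes from the BGP-4 specification. The function names are hypothetical and only IPv4 is handled.

```python
import ipaddress

def encode_prefix(prefix):
    """Encode an IPv4 prefix as <length in bits (1 octet)><prefix octets>."""
    net = ipaddress.IPv4Network(prefix)
    nbytes = (net.prefixlen + 7) // 8  # whole octets needed for the prefix bits
    return bytes([net.prefixlen]) + net.network_address.packed[:nbytes]

def decode_prefixes(data):
    """Decode a run of encoded prefixes, such as a withdrawn routes field."""
    prefixes = []
    offset = 0
    while offset < len(data):
        bits = data[offset]
        if bits > 32:
            raise ValueError("prefix length out of range for IPv4")
        nbytes = (bits + 7) // 8
        raw = data[offset + 1:offset + 1 + nbytes]
        if len(raw) != nbytes:
            raise ValueError("truncated prefix")
        addr = ipaddress.IPv4Address(raw + bytes(4 - nbytes))
        # strict=False masks off any padding bits after the prefix.
        prefixes.append(ipaddress.IPv4Network((addr, bits), strict=False))
        offset += 1 + nbytes
    return prefixes
```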

Attribute Encoding

• Attribute Type (2 octets)

– Attribute Flags (1 octet)

– Attribute Type Code (1 octet)

• Attribute Length (1 or 2 octets)

• Attribute Value (variable)

Attribute Flags

• Bit 1 – Optional

– 0 = well-known, required in all BGP implementations

– 1 = optional

• Bit 2 – Transitive

– 0 = non-transitive, not passed to other peers

– 1 = transitive, must be passed on to others

• Bit 3 – Partial

– 1 = some router didn’t understand optional transitive attribute

– 0 = otherwise, must be 0 for well-known and optional nontransitive attributes

• Bit 4 – Extended Length

– 0 = attribute length represented in 1 octet

– 1 = attribute length represented in 2 octets
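The flag bits and the 1-or-2-octet length can be decoded together, as in the sketch below. It assumes the four flags are the four high-order bits of the flags octet (masks 0x80 down to 0x10), which matches the BGP-4 specification but is not spelled out on the slides. The function name and returned keys are hypothetical.

```python
import struct

FLAG_OPTIONAL = 0x80         # Bit 1 on the slide
FLAG_TRANSITIVE = 0x40       # Bit 2
FLAG_PARTIAL = 0x20          # Bit 3
FLAG_EXTENDED_LENGTH = 0x10  # Bit 4

def parse_attribute(data, offset=0):
    """Parse one path attribute; return (attribute dict, offset of the next one)."""
    flags, type_code = data[offset], data[offset + 1]
    if flags & FLAG_EXTENDED_LENGTH:
        (length,) = struct.unpack_from("!H", data, offset + 2)
        value_start = offset + 4
    else:
        length = data[offset + 2]
        value_start = offset + 3
    value = data[value_start:value_start + length]
    if len(value) != length:
        raise ValueError("attribute value truncated")
    attr = {
        "optional": bool(flags & FLAG_OPTIONAL),
        "transitive": bool(flags & FLAG_TRANSITIVE),
        "partial": bool(flags & FLAG_PARTIAL),
        "type_code": type_code,
        "value": value,
    }
    return attr, value_start + length
```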

Notification Message

• Error code (1 octet)

• Error subcode (1 octet)

• Data (variable)

Error Codes

• 1 – Message Header Error

• 2 – OPEN Message Error

• 3 – UPDATE Message Error

• 4 – Hold Timer Expired

• 5 – Finite State Machine Error

• 6 – Cease

Message Header Error Subcodes

• 1 – Connection Not Synchronized

• 2 – Bad Message Length

• 3 – Bad Message Type

OPEN Message Error Subcodes

• 1 – Unsupported Version Number

• 2 – Bad Peer AS

• 3 – Bad BGP Identifier

• 4 – Unsupported Optional Parameter

• 5 – Authentication Failure

• 6 – Unacceptable Hold Time

UPDATE Message Error Subcodes

• 1 – Malformed Attribute List

• 2 – Unrecognized Well-known Attribute

• 3 – Missing Well-known Attribute

• 4 – Attribute Flags Error

• 5 – Attribute Length Error

• 6 – Invalid ORIGIN Attribute

• 7 – AS Routing Loop

• 8 – Invalid NEXT-HOP Attribute

• 9 – Optional Attribute Error

• 10 – Invalid Network Field

• 11 – Malformed AS-PATH
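The code and subcode tables above can be turned into a small decoder for the body of a Notification message (error code, error subcode, data). The tables below copy the values listed on these slides; the function name `decode_notification` and the choice to return `None` for an unlisted subcode are assumptions for the example.

```python
ERROR_CODES = {
    1: "Message Header Error",
    2: "OPEN Message Error",
    3: "UPDATE Message Error",
    4: "Hold Timer Expired",
    5: "Finite State Machine Error",
    6: "Cease",
}

SUBCODES = {
    1: {1: "Connection Not Synchronized", 2: "Bad Message Length",
        3: "Bad Message Type"},
    2: {1: "Unsupported Version Number", 2: "Bad Peer AS",
        3: "Bad BGP Identifier", 4: "Unsupported Optional Parameter",
        5: "Authentication Failure", 6: "Unacceptable Hold Time"},
    3: {1: "Malformed Attribute List", 2: "Unrecognized Well-known Attribute",
        3: "Missing Well-known Attribute", 4: "Attribute Flags Error",
        5: "Attribute Length Error", 6: "Invalid ORIGIN Attribute",
        7: "AS Routing Loop", 8: "Invalid NEXT-HOP Attribute",
        9: "Optional Attribute Error", 10: "Invalid Network Field",
        11: "Malformed AS-PATH"},
}

def decode_notification(body):
    """Return (error name, subcode name or None, data) for a Notification body."""
    if len(body) < 2:
        raise ValueError("notification body needs at least code and subcode")
    code, subcode = body[0], body[1]
    code_name = ERROR_CODES.get(code, "Unknown error code %d" % code)
    subcode_name = SUBCODES.get(code, {}).get(subcode)
    return code_name, subcode_name, body[2:]
```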

Keepalive

• Common header, no data

Model of Operation

• Each BGP speaker keeps routes in three places

– Adj-RIB-In (Adjacent Routing Information Base In)

• 1 per peer (BGP session)

• Contains prefixes learned from that peer

– Loc-RIB (Local Routing Information Base)

• 1 per system

• Contains prefixes selected for use

– Adj-RIB-Out (Adjacent Routing Information Base Out)

• 1 per peer (BGP session)

• Contains prefixes advertised to that peer

Standard Attributes

• 1 – Origin (well-known)

– Indicates how a given prefix came into BGP at the AS originating the prefix announcement

– 0 – IGP: The prefix was learned from an IGP

– 1 – EGP: The prefix was learned through the EGP protocol

– 2 – INCOMPLETE: The prefix was learned through some mechanism other than IGP or EGP; in practice these are the static routes

Standard Attributes

• 2 – AS-PATH (well-known)

– Contains sequence of ASNs through which the announcement has passed

– Primarily used for loop detection/prevention

– If the receiving speaker’s own ASN appears in the AS-PATH, the announcement is generally rejected, although some implementations can be configured to accept such a route for partition healing

– Encoded as sequence of AS-PATH segments

• Each has a TYPE (1 octet), LENGTH (1 octet), VALUE (list of length LENGTH of 2 octet ASNs)

• TYPE is either AS-SET or AS-SEQUENCE, allows for aggregation of routes received via different AS-PATHs
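A sketch of the segment encoding and the loop check described above. The numeric segment type codes (AS-SET = 1, AS-SEQUENCE = 2) come from the BGP-4 specification rather than the slide, and the function names and ASNs are illustrative. Only 2 octet ASNs are handled, as on the slide.

```python
import struct

AS_SET = 1
AS_SEQUENCE = 2

def parse_as_path(value):
    """Split an AS-PATH attribute value into (segment type, [ASNs]) tuples."""
    segments = []
    offset = 0
    while offset < len(value):
        if offset + 2 > len(value):
            raise ValueError("malformed AS-PATH: truncated segment header")
        seg_type, count = value[offset], value[offset + 1]
        end = offset + 2 + 2 * count  # LENGTH counts ASNs, not octets
        if seg_type not in (AS_SET, AS_SEQUENCE) or end > len(value):
            raise ValueError("malformed AS-PATH")
        asns = struct.unpack_from("!%dH" % count, value, offset + 2)
        segments.append((seg_type, list(asns)))
        offset = end
    return segments

def has_loop(segments, local_asn):
    """True if our own ASN already appears anywhere in the path."""
    return any(local_asn in asns for _, asns in segments)
```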

Standard Attributes

• 3 – NEXT-HOP (well-known)

– Address of the node to which packets should be sent to reach the advertised prefix

– Often the same as the speaker’s IP address

– Can be different (third-party next hop), otherwise would be redundant

– Requires special configuration, need not be accepted by listener

– Can be useful when several routers are on a LAN but only some of them speak BGP

Standard Attributes

• 4 – MULTI-EXIT-DISCRIMINATOR (MED) (optional, nontransitive, 4-octet unsigned integer)

– Used when two ASs connect to each other at multiple places

– Carries a metric expressing a degree of preference for the link in the advertisement for routing to a prefix

– Sent by one AS, used by another, thus typically used in provider-subscriber relationships

Standard Attributes

• 5 – LOCAL-PREF (well-known, discretionary, 4 octet unsigned integer)

– Generally used locally by an AS to express preferences for routes to a prefix when multiple routes to different ASs are known

– Different from MED in that it isn’t passed by one AS to another, and doesn’t only apply to multiple connections between a pair of ASs

Standard Attributes

• 6 – ATOMIC-AGGREGATE (well-known, discretionary, 0 length used as a flag)

– Indicates that the advertised prefix has been aggregated

– Some parts of paths to parts of the aggregate address space advertised may not appear in the AS-PATH

– The receiver of the advertisement should not deaggregate the prefix into more specific BGP entries

Standard Attributes

• 7 – AGGREGATOR (optional, transitive, 2 octet ASN, 4 octet IP address)

– Indicates the AS and router that performed the aggregation of the announced prefix

Internal and External BGP

• How do multiple routers speaking BGP within a single AS exchange routing information?

– Could use IGP such as OSPF, but the volume of routing table information and frequency of updates typically transmitted by BGP would overwhelm the IGP’s link state flooding

– A preferred way is to use Internal BGP (I-BGP)

– Strictly speaking, we should call the typical EGP use of BGP E-BGP

– Basically, the two are the same, with the key difference that prefixes learned from an E-BGP neighbor can be advertised to an I-BGP neighbor and vice versa, but a prefix learned from an I-BGP neighbor cannot be advertised to another I-BGP neighbor

– This prevents looping routing announcements within an AS; the AS-PATH attribute is useless for this within one AS

– It also leads to the requirement of a full-mesh of logical connections between I-BGP peers within an AS
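The advertisement rule and its full-mesh consequence can be stated in a few lines. This is a sketch with hypothetical function names; session kinds are represented as the strings "ebgp" and "ibgp" purely for the example.

```python
from itertools import combinations

def may_advertise(learned_via, advertise_to):
    """Apply the I-BGP rule: a prefix learned from an I-BGP neighbor
    is never re-advertised to another I-BGP neighbor."""
    return not (learned_via == "ibgp" and advertise_to == "ibgp")

def full_mesh_sessions(routers):
    """List every I-BGP session needed so each router hears routes first-hand."""
    return list(combinations(routers, 2))

sessions = full_mesh_sessions(["r1", "r2", "r3", "r4"])
```

With n I-BGP speakers the mesh needs n(n-1)/2 sessions, so four routers need six sessions and the count grows quadratically.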

BGP Route Selection

• How does a system choose among multiple routes for the same (identical, not overlapping) prefix?

– The route with the highest LOCAL-PREF is selected first

– If no unique route is found, then the route with the shortest AS-PATH is selected from among those previously selected

– If this does not produce a unique route, then if the system accepts MED and the multiple routes were learned from a single neighboring AS, the route with the lowest MED value is selected

– If multiple routes are still available, then choose the route with the minimum cost to the NEXT-HOP according to the IGP in use

– If no unique route has been chosen, and exactly one of the routes was learned by E-BGP, choose that one.

– If no unique route has been chosen, and all routes were learned via I-BGP, then choose the route learned from the I-BGP neighbor with the lowest BGP ID
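The selection steps above can be sketched as successive filters over the candidate routes. This follows only the steps listed on this slide; real implementations add further tie-breakers. The `Route` fields, the function names, and the decision to return every surviving candidate when the listed steps leave a tie are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Route:
    local_pref: int
    as_path: List[int]
    med: int
    neighbor_as: int   # AS the route was learned from
    igp_cost: int      # IGP cost to reach NEXT-HOP
    is_ebgp: bool      # learned over E-BGP rather than I-BGP
    peer_bgp_id: int   # BGP ID of the advertising neighbor

def _keep_best(routes, key):
    """Keep only the routes that minimize key."""
    best = min(key(r) for r in routes)
    return [r for r in routes if key(r) == best]

def select_route(routes, accept_med=True):
    """Return the surviving candidates (ideally exactly one) for a prefix."""
    candidates = list(routes)
    if not candidates:
        return []
    candidates = _keep_best(candidates, lambda r: -r.local_pref)  # highest wins
    if len(candidates) > 1:
        candidates = _keep_best(candidates, lambda r: len(r.as_path))
    same_neighbor = len({r.neighbor_as for r in candidates}) == 1
    if len(candidates) > 1 and accept_med and same_neighbor:
        candidates = _keep_best(candidates, lambda r: r.med)
    if len(candidates) > 1:
        candidates = _keep_best(candidates, lambda r: r.igp_cost)
    if len(candidates) > 1:
        ebgp = [r for r in candidates if r.is_ebgp]
        if len(ebgp) == 1:
            candidates = ebgp
        elif not ebgp:
            candidates = _keep_best(candidates, lambda r: r.peer_bgp_id)
    return candidates
```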