Towards a Transparent and Proactively-Managed Internet

Ehab Al-Shaer

School of Computer Science

DePaul University

Yan Chen

EECS Department

Northwestern University

Motivations

• The Internet has evolved into an uncooperative, ossified network of networks
  – Each network has to be treated as a black box
    » Performance of even neighboring networks is opaque
    » Inter-domain routing is based on policies, not performance
    » Applications have to resort to overlay networks, which are suboptimal
  – Diagnosis and fault location are extremely hard
• Network configuration management is reactive and expensive
  – Reactive configurations: tune after deployment
  – Vulnerable: manually handled and subject to conflicts
  – Imperative & fragmented: several specific devices must be accessed in order to implement a service goal

Proposed Solution I: Transparent Internet

• Every network shares its measurement and management information with other networks when necessary (glass box)
  – Link-level performance: delay, loss rate, available bandwidth, etc.
  – Management info
    » Configuration: QoS setting, traffic policing
    » Middle box settings: firewalls, etc.
• The information sharing
  – As part of the inter-domain protocols: Transparent Gateway Protocols (TGP)
  – Other applications: leverage DHT

Analogy to the Airline Alliance

• When airlines compose multi-leg flights, they need more than just route info
  – Type of aircraft, # of vacancies, probability of punctuality, etc.
• Such an open model is mutually beneficial
  – Provides the best flight composition for clients
  – Similarly, an open network model can provide the best communications for applications

Proposed Solution II: Proactive Configuration Management

• Proactive verification: configurations are verified and translated for different vendor-specific devices

• Proactive validation: configuration changes are tested on real archived network traffic without interrupting the main operational network

• Autonomic configuration: configurations are auto-tuned dynamically to achieve the “objectives”

[Figure: configuration management cycle. Stages: Defining, Verifying, Deploying, Evaluating, Optimizing. Labels: Validation; Dynamic Validation: auto-tuning.]

Objectives

Provides a completely transparent view of the Internet to networks and applications
• Diagnosis & troubleshooting become extremely easy
  – No more Internet tomography needed
• Flexible inter-domain routing
  – Not just based on policy or # of AS/hops
  – Flexible metrics based on bandwidth, latency, etc.
• Global traffic engineering
  – Each AS performs its own local traffic engineering
  – Provides an AS path-level routing guide
• Unified framework that applications query (push/pull) for info as needed
  – Streaming media, content distribution
  – Anomaly/security applications

Flexible Inter-domain Routing

• Multiple routing paths with TGP
  – Incorporate measurement info into AS paths
  – Bandwidth-intensive and latency-sensitive applications can take different AS paths
• Challenge: inter-domain routing based on bandwidth without making reservations
• Solution: discretize the bandwidth for better stability
  – Stability is a classical problem, though, and not unique to TGP
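A minimal sketch of the discretization idea, assuming illustrative bandwidth levels and a hypothetical per-path record (`as_path`, `bandwidth_mbps`, `latency_ms`); none of these names or values come from TGP itself. Rounding the measured bandwidth down to a coarse level means small fluctuations leave the advertised value unchanged, so path choices flip less often.

```python
def discretize_bandwidth(available_mbps, levels=(10, 100, 500, 1000)):
    """Round a measured available bandwidth down to a coarse level.

    The level boundaries are illustrative assumptions, not TGP values.
    Anything below the lowest level is advertised as 0.
    """
    advertised = 0
    for level in sorted(levels):
        if available_mbps >= level:
            advertised = level
    return advertised


def pick_path(paths, prefer):
    """Choose an AS path by the metric the application cares about.

    paths: list of dicts with keys 'as_path', 'bandwidth_mbps', 'latency_ms'
    prefer: 'bandwidth' or 'latency'
    """
    if prefer == "bandwidth":
        # Highest advertised level wins; ties go to the lower latency.
        return max(
            paths,
            key=lambda p: (discretize_bandwidth(p["bandwidth_mbps"]), -p["latency_ms"]),
        )
    if prefer == "latency":
        return min(paths, key=lambda p: p["latency_ms"])
    raise ValueError("prefer must be 'bandwidth' or 'latency'")
```

With two candidate paths, a bulk transfer and an interactive session can end up on different AS paths, and a measurement that drifts from 640 to 655 Mbps still maps to the same advertised level.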

Global Traffic Engineering

• For the current Internet, only a local optimum is achieved in each AS
  – Allowing the network to handle all possible traffic patterns, within the network's ingress-egress capacity constraints (e.g. two-phase routing)
• With global information, we can potentially achieve a global optimum (or Nash equilibrium)
  – Each AS is a selfish individual
  – A center (or each AS) infers the Nash equilibrium
  – Each AS can try the Nash equilibrium, or attempt to benefit itself based on the inferred Nash equilibrium

Example of Benefit of Global TE

[Figure: five-AS topology (AS 1 to AS 5) with link capacities of 1G and 2G; two sources each send 1G of traffic to AS 1.]

• Without Global TE

[Figure: same topology annotated with link loads of 1G, 0.5G, 0.5G, 1.5G, and 0.5G.]

• With Global TE

[Figure: same topology annotated with link loads of 1G, 1G, 1G, and 1G.]

Unified Transparency Framework for Various Functionality

• Sharing of anomaly/security-related measurement
  – Various characteristics of traffic: heavy hitters, heavy changes, histograms, etc.
  – Self-diagnosis to survivability
• Adaptations
  – Routing adaptations at router level or application level

Practical Issues and Solutions

• Incentives for information sharing
  – Mandatory for next-generation Internet?
  – Alliance model for incremental growth
• Security/cheating: trust but verify
  – Trust most of the info shared but periodically verify
    » Much easier than the current Internet tomography unless many ASes collude
  – Verification is part of the protocol
    » Some fields in the packet headers are designed for that purpose
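The "trust but verify" idea can be sketched as a periodic spot check. This is only an illustration under stated assumptions: `verify_reports`, the `measure` callback, the sampling fraction, and the tolerance are all hypothetical, and the slides do not specify how verification is actually carried out.

```python
import random


def verify_reports(reported, measure, sample_fraction=0.2, tolerance=0.25, rng=None):
    """Spot-check a random subset of shared link metrics.

    reported: dict mapping a link id to the value its AS reported (e.g. delay in ms)
    measure:  callable returning an independent measurement for a link id
              (hypothetical; how the measurement is taken is out of scope here)
    Returns the checked links whose report deviates from the measurement
    by more than `tolerance`, relative to the measured value.
    """
    links = sorted(reported)
    if not links:
        return []
    rng = rng or random.Random()
    sample_size = max(1, round(len(links) * sample_fraction))
    suspicious = []
    for link in rng.sample(links, sample_size):
        observed = measure(link)
        claimed = reported[link]
        if abs(observed - claimed) > tolerance * max(observed, 1e-9):
            suspicious.append(link)
    return suspicious
```

Most reports are accepted as-is; only the sampled links cost a measurement, which is why this is cheaper than inferring everything from the outside.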

Backup Materials

Measurement Info to Share

• Basic metrics
  – Delay, loss rate, capacity, available bandwidth
  – Demand (or traffic volume) and application types
• Intra-AS Measurement Info
  – Link-level info
    » Queried only when necessary
  – Aggregated Info
    » OD flow level info
    » Path segment between entry and exit points in each AS
• Inter-AS Measurement Info
  – General AS relationship
  – AS-level topology
  – Inter-AS link metrics

Combined with routing info and exported to neighboring ASes through the TGP protocol

Provides a globally retrievable Management Information Base (MIB) with DHT

Network link-level monitoring

Transparent Internet Architecture

Methodology

[Figure: evaluation workflow. Inputs: network topology, Web workload, network end-to-end latency measurement. Stages shown: analytical evaluation, algorithm design, realistic simulation (with an "iterate" loop), PlanetLab tests.]

TGP MIB Dissemination Architecture

• Leverage Distributed Hash Table (Tapestry) for
  – Distributed, scalable location with guaranteed success
  – Search with locality

[Figure: dissemination architecture with a data plane and a network plane. Labels: data source, Web server, SCAN server, client, replica, always update, cache, adaptive coherence, DHT mesh. Functions: Replica Location; Dynamic Replication/Update and Replica Management.]

Overlay Network Monitoring

[Figure: streaming path from SERVER to CLIENT through OVERLAY RELAY NODEs, coordinated by an OVERLAY NETWORK OPERATION CENTER. Sites shown: UC Berkeley, UC San Diego, Stanford, HP Labs. An X marks the failed link. Steps:]

1. Set up connection
2. Register trigger
3. Network congestion/failure
4. Detect congestion/failure
5. Alert + new overlay path
6. Set up new path
7. Skip-free streaming media recovery

Adaptive Overlay Streaming Media

• Implemented with Winamp client and SHOUTcast server

• Congestion introduced with a Packet Shaper

• Skip-free playback: server buffering and rewinding

• Total adaptation time < 4 seconds
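The detect, alert, and re-route steps of the overlay monitoring figure can be sketched as a simple threshold trigger. The function name, the 5% loss threshold, and the path labels are assumptions made for illustration; the slides do not describe the actual trigger logic.

```python
def choose_overlay_path(current, loss_by_path, threshold=0.05):
    """Decide whether to keep the current overlay path or switch.

    current:      label of the path now in use
    loss_by_path: dict mapping each candidate path label to its estimated loss rate
    threshold:    loss rate above which the trigger fires (assumed value)
    Returns (path_to_use, switched).
    """
    if loss_by_path[current] <= threshold:
        return current, False
    # Trigger fired: move to the candidate with the lowest estimated loss.
    best = min(loss_by_path, key=loss_by_path.get)
    return best, best != current
```

Playback stays skip-free only if the switch completes before the client buffer drains, which is where the server-side buffering and rewinding comes in.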

Summary

• A tomography-based overlay network monitoring system
  – Selectively monitors a basis set of O(n log n) paths to infer the loss rates of O(n^2) paths
  – Works in real time, adapts to topology changes, has good load balancing, and tolerates topology errors
• Both simulation and real Internet experiments are promising
• Built an adaptive overlay streaming media system on top of TOM
  – Bypasses congestion/failures for smooth playback within seconds
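One way a basis set of paths can stand in for all paths is sketched below. It assumes link losses are independent, so the log of a path's success rate is the sum of its links' log success rates; every path is then a row of a 0/1 path-by-link matrix, and any path whose row is a linear combination of measured rows can be inferred rather than measured. The function names and the greedy elimination are illustrative, not the TOM implementation.

```python
import math
from fractions import Fraction


def select_basis(paths, num_links):
    """Pick linearly independent paths and express every path in terms of them.

    paths: list of sets of link indices (one set per path)
    Returns (basis, coeffs): basis lists the indices of the chosen paths;
    coeffs[i] maps a position in `basis` to the coefficient of that basis
    path in path i.
    """
    reduced = []  # (pivot column, reduced row, its expansion over the basis)
    basis, coeffs = [], []
    for idx, links in enumerate(paths):
        row = [Fraction(1 if l in links else 0) for l in range(num_links)]
        combo = {}
        for pivot, rvec, rcombo in reduced:
            if row[pivot] != 0:
                f = row[pivot] / rvec[pivot]
                row = [a - f * b for a, b in zip(row, rvec)]
                for k, v in rcombo.items():
                    combo[k] = combo.get(k, 0) + f * v
        if any(row):
            # Independent of everything chosen so far: measure this path.
            pos = len(basis)
            basis.append(idx)
            pivot = next(i for i, v in enumerate(row) if v != 0)
            rcombo = {k: -v for k, v in combo.items()}
            rcombo[pos] = Fraction(1)
            reduced.append((pivot, row, rcombo))
            coeffs.append({pos: Fraction(1)})
        else:
            # Fully explained by the basis: no measurement needed.
            coeffs.append(combo)
    return basis, coeffs


def infer_loss_rates(paths, num_links, measure_loss):
    """Measure only the basis paths and infer the loss rate of every path.

    measure_loss: callable taking a path (set of links) and returning its
    loss rate in [0, 1); stands in for a real probe.
    """
    basis, coeffs = select_basis(paths, num_links)
    logs = [math.log(1.0 - measure_loss(paths[i])) for i in basis]
    inferred = []
    for combo in coeffs:
        log_success = sum(float(c) * logs[pos] for pos, c in combo.items())
        inferred.append(1.0 - math.exp(log_success))
    return basis, inferred
```

The number of paths that need probing equals the rank of the path-by-link matrix, which is bounded by the number of links rather than by the number of node pairs.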

Tie Back to SCAN

[Figure: SCAN components. Provision: Dynamic Replication + Update Multicast Tree Building. Replica Management: (Incremental) Content Clustering. Network End-to-End Distance Monitoring: Internet Iso-bar (latency), TOM (loss rate). Network DoS Resilient Replica Location: Tapestry.]

Contribution of My Thesis

• Replica location
  – Proposed the first simulation-based network DoS resilience benchmark and quantified three types of directory services
• Dynamically place a close-to-optimal # of replicas
  – Self-organize replicas into a scalable app-level multicast tree for disseminating updates
• Cluster objects to significantly reduce the management overhead with little performance sacrifice
  – Online incremental clustering and replication to adapt to users’ access pattern changes
• Scalable overlay network monitoring

Existing CDNs Fail to Address these Challenges

Non-cooperative replication inefficient

No coherence for dynamic content

Unscalable network monitoring: O(M × N)

M: # of client groups, N: # of server farms

Problem Formulation

• Subject to a certain total replication cost (e.g., # of URL replicas)

• Find a scalable, adaptive replication strategy to reduce avg access cost
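To make the formulation concrete, here is a generic greedy baseline: with a budget of extra replicas, repeatedly add the single (URL, server) replica that lowers total access cost the most. This is a textbook heuristic used only for illustration, not SCAN's clustering-based strategy, and the `demand`, `distance`, and `origin` structures are assumed.

```python
def greedy_replication(demand, distance, origin, budget):
    """Place up to `budget` extra replicas to reduce total access cost.

    demand[client][url]      = number of requests
    distance[client][server] = access cost from client to server
    origin[url]              = server that initially holds the URL
    Returns a dict mapping each URL to the set of servers holding it.
    """
    placement = {url: {server} for url, server in origin.items()}
    servers = sorted({s for row in distance.values() for s in row})

    def cost(url, holders):
        # Each client fetches the URL from its cheapest holder.
        return sum(
            reqs.get(url, 0) * min(distance[client][s] for s in holders)
            for client, reqs in demand.items()
        )

    for _ in range(budget):
        best = None
        for url, holders in placement.items():
            current = cost(url, holders)
            for s in servers:
                if s in holders:
                    continue
                gain = current - cost(url, holders | {s})
                if gain > 0 and (best is None or gain > best[0]):
                    best = (gain, url, s)
        if best is None:
            break  # no remaining replica would reduce the cost
        placement[best[1]].add(best[2])
    return placement
```

Re-evaluating every candidate per step does not scale to many URLs, which is one reason to manage objects in clusters rather than individually.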

SCAN: Scalable Content Access Network

[Figure: SCAN layers. CDN Applications (e.g. streaming media); Provision: Cooperative Clustering-based Replication; User Behavior/Workload Monitoring; Coherence: Update Multicast Tree Construction; Network Performance Monitoring; Network Distance/Congestion/Failure Estimation. Legend: red: my work, black: out of scope.]

Comparison of Content Delivery Systems (cont’d)

| Properties | Web caching (client initiated) | Web caching (server initiated) | Pull-based CDNs (Akamai) | Push-based CDNs | SCAN |
|---|---|---|---|---|---|
| Distributed load balancing | No | Yes | Yes | No | Yes |
| Dynamic replica placement | Yes | Yes | Yes | No | Yes |
| Network-awareness | No | No | Yes, unscalable monitoring system | No | Yes, scalable monitoring system |
| No global network topology assumption | Yes | Yes | Yes | No | Yes |
