CS 4700 / CS 5700 Network Fundamentals Lecture 17: Network Modeling (Not Everyone has a Datacenter)


Wide-Area Network Research

Most research now focused on large-scale systems

Challenge: testing and evaluation
  How to perform wide-area tests in a repeatable, reliable manner
  ModelNet, Emulab

Challenge: understanding/capturing Internet topologies
  Graph characterization: dK-series


Outline

ModelNet
dK


A Case for Network Emulation

Need a way to test large-scale Internet services
  Peer-to-peer, overlay networks, novel protocols

Testing in the real world
  PlanetLab…
  Results not reproducible or predictable
  Difficult to deploy and administer research software

Simulation tools
  Allow control over the test environment
  May miss important system interactions

Emulation
  Emulators subject application traffic to the end-to-end bandwidth constraints, latency, and loss rate of a user-specified topology
  Previous implementations not scalable


ModelNet

A scalable, cluster-based, comprehensive network emulation environment


Design

Users run a configurable number of instances of the application on Edge Nodes within the cluster
  Each instance is a Virtual Edge Node (VN)
  Each VN has a unique IP address

Edge nodes route traffic through a cluster of Core Routers
  Equipped with large memories and modified FreeBSD kernels

Core routers route traffic through emulated links, or “pipes”
  Each pipe has its own packet queue and queuing discipline


ModelNet Phases

Create
  Generates a network topology as a graph
  From Internet traces, BGP dumps, synthetic topology generators, etc.
  Annotate graph with loss rates, failure distributions…

Distillation
  Transforms the GML graph into a pipe topology

Assignment
  Maps the pipe topology to core nodes, distributing emulation load across core nodes
  Finding the ideal mapping is NP-complete
  ModelNet uses greedy k-clusters assignment
    For k core nodes, randomly select k nodes in the distilled topology
    Greedily select links from the current connected component, in round-robin fashion
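The greedy k-clusters idea can be sketched in a few lines of Python. This is an illustrative sketch, not ModelNet's implementation: the function name `greedy_k_clusters`, the `(u, v)` link representation, and the fallback for links that no core can reach are all assumptions made for the example.

```python
import random
from collections import defaultdict

def greedy_k_clusters(links, k, seed=0):
    """Map each link of a distilled topology to one of k core nodes."""
    rng = random.Random(seed)
    links = list(dict.fromkeys(tuple(l) for l in links))  # dedupe, keep order
    incident = defaultdict(list)
    for link in links:
        for node in link:
            incident[node].append(link)

    # Step 1: pick k random seed nodes, one per core.
    seeds = rng.sample(sorted(incident), k)
    component = [[s] for s in seeds]   # nodes reached so far by each core
    owner = {}                         # link -> core index
    remaining = set(links)

    def claim(core, link):
        owner[link] = core
        remaining.discard(link)
        for node in link:
            if node not in component[core]:
                component[core].append(node)

    # Step 2: cores take turns claiming one unassigned link that touches
    # their current component.
    while remaining:
        progressed = False
        for core in range(k):
            pick = next((l for n in component[core]
                         for l in incident[n] if l in remaining), None)
            if pick is not None:
                claim(core, pick)
                progressed = True
        if not progressed:
            # Assumption: a leftover link that no core can reach goes to the
            # least-loaded core, so growth can resume from there.
            loads = [sum(1 for c in owner.values() if c == i) for i in range(k)]
            claim(loads.index(min(loads)), min(remaining))
    return owner
```

For a six-link ring and two cores, `greedy_k_clusters([(i, (i + 1) % 6) for i in range(6)], 2)` assigns every link to one of the two cores, with the turn-taking keeping the two loads close to equal.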


ModelNet Phases

Binding
  Multiplex multiple VNs onto each physical edge node
  Bind each physical edge node to a core router
  Generate shortest-path routes between all VNs and install them in the core routing tables

Run
  Executes target application code on edge nodes


Inside the Core

Route traffic through emulated “pipes”
  Each route is an ordered list of pipes
  Packets move through pipes by reference
  Routing table requires O(n^2) space

Packet Scheduling
  When a packet arrives, put it at the tail of the first pipe in its route
  Scheduler stores a heap of pipes sorted by earliest deadline (the exit time of the first packet in each pipe's queue)
  Once every clock tick:
    Traverse pipes in the heap for packets that are ready to exit
    Move packets to the tail of the next pipe, or schedule them for delivery
    Calculate new deadlines

Multi-core Configuration
  Next pipe in a route may be on a different machine
  If so, the core node tunnels the packet descriptor to the next node
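The scheduling loop described above can be sketched as a heap of pipes keyed by the exit time of each pipe's head packet. This is a simplified single-core sketch, not ModelNet's kernel code: the class names `Pipe`, `Packet`, and `Scheduler` are hypothetical, and each pipe is modeled with a fixed latency only (no bandwidth, loss, or queue limit).

```python
import heapq
from collections import deque
from itertools import count

class Pipe:
    def __init__(self, name, latency):
        self.name = name
        self.latency = latency      # assumption: fixed delay in clock ticks
        self.queue = deque()        # (exit_time, packet) in FIFO order

class Packet:
    def __init__(self, ident, route):
        self.ident = ident
        self.route = route          # ordered list of Pipe objects
        self.hop = 0                # index of the pipe currently holding it

class Scheduler:
    def __init__(self):
        self.heap = []              # (deadline, tiebreak, pipe)
        self.tiebreak = count()     # keeps heap entries comparable
        self.delivered = []         # (delivery_time, packet id)

    def inject(self, packet, now):
        """Place the packet at the tail of the current pipe in its route."""
        pipe = packet.route[packet.hop]
        exit_time = now + pipe.latency
        pipe.queue.append((exit_time, packet))
        if len(pipe.queue) == 1:    # pipe was idle: it needs a heap entry
            heapq.heappush(self.heap, (exit_time, next(self.tiebreak), pipe))

    def tick(self, now):
        """Move every packet whose exit time has passed; call once per tick."""
        while self.heap and self.heap[0][0] <= now:
            _, _, pipe = heapq.heappop(self.heap)
            _, packet = pipe.queue.popleft()
            packet.hop += 1
            if packet.hop < len(packet.route):
                self.inject(packet, now)            # tail of the next pipe
            else:
                self.delivered.append((now, packet.ident))
            if pipe.queue:                          # new deadline for this pipe
                heapq.heappush(self.heap,
                               (pipe.queue[0][0], next(self.tiebreak), pipe))
```

Only the packet reference moves between queues, which mirrors the "packets move through pipes by reference" point; in a multi-core setup the `inject` call for the next pipe would instead forward the descriptor to the core that owns that pipe.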


Scalability Issues

Traffic traversing the core is limited by the cluster’s physical internal bandwidth

ModelNet must buffer up to the full bandwidth-delay product of the target network
  250 MB of packet buffer space to carry flows at an aggregate bandwidth of 10 Gb/s with 200 ms round-trip latency

Assumes a perfect routing protocol
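The 250 MB buffer figure follows from the bandwidth-delay product if the aggregate bandwidth is read as 10 gigabits per second. A short check, using a hypothetical helper name and integer arithmetic to avoid rounding:

```python
def buffer_bytes(bandwidth_bits_per_s, rtt_ms):
    """Bandwidth-delay product in bytes: bits/s * seconds / 8."""
    return bandwidth_bits_per_s * rtt_ms // (1000 * 8)

# 10 Gb/s aggregate bandwidth, 200 ms round-trip latency
print(buffer_bytes(10_000_000_000, 200))   # 250000000 bytes = 250 MB
```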


Baseline Accuracy

Want to ensure that, under load, packets are subject to correct end-to-end delays

Used kernel logging to track ModelNet performance and accuracy

Results show that by running the ModelNet scheduler at the highest kernel priority
  Packets are delivered within 1 ms of the target end-to-end value
  Accuracy is maintained up to 100% CPU usage


Scalability

Additional Cores
  Adding core routers allows ModelNet to deliver higher throughput
  Communication between core routers introduces overhead
  Higher cross-core communication results in less throughput benefit

VN Multiplexing
  Higher degrees of multiplexing enable larger network emulations
  Inaccuracies introduced due to context switching, scheduling, resource contention, etc.


Accuracy vs. Scalability

Reduce overhead by deviating from target network requirements

Changes should minimally impact application behavior

Ideally, system reports degree and nature of emulation inaccuracy


Scalability via Distillation

Pure hop-by-hop emulation
  Distilled topology is isomorphic to the target network
  High per-packet overhead

End-to-end distillation
  Remove all interior network nodes
  Collapse each path into a single pipe
    Latency = sum of latencies along path
    Reliability = product of link reliabilities along path
  Low per-packet overhead
  Does not emulate link contention along path
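The two end-to-end distillation rules (latencies add, reliabilities multiply) can be written out directly. The function name and the dictionary keys `latency_ms` and `reliability` are assumptions chosen for this sketch.

```python
import math

def distill_path(links):
    """Collapse a path (list of per-link dicts) into one pipe description."""
    return {
        # Latency of the single pipe is the sum of link latencies.
        "latency_ms": sum(l["latency_ms"] for l in links),
        # A packet survives only if every link delivers it.
        "reliability": math.prod(l["reliability"] for l in links),
    }

path = [
    {"latency_ms": 10, "reliability": 0.99},
    {"latency_ms": 20, "reliability": 0.98},
    {"latency_ms": 5, "reliability": 1.0},
]
pipe = distill_path(path)   # latency 35 ms, reliability about 0.9702
```

Because the three links disappear into one pipe, two flows that shared only the middle link in the target network no longer compete with each other, which is the lost link contention the slide mentions.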


Time Dilation on ModelNet

The challenge
  Need to emulate networks with more resources
  E.g. fast CPUs (20 GHz), high-bandwidth networks (TB/s)
  But only commodity machines available

Solution
  ModelNet + time dilation via virtual machines
  Run application inside single VMs
  Slow down time inside the VM
  Result: everything looks faster/bigger/fatter
    More CPU cycles/time, packets/time, disk I/O/time


How It’s Done

Must isolate VM from outside measures of time
  Time based on shared data structure provided by VMM
  Scale data structure by a Time Dilation Factor (TDF)
  Also scale hardware timer by TDF

How do we scale only some resources?
  Slow the others back down!
  Example: speed up network by TDF = 10
    Bandwidth increases by 10×, but delay decreases by 10×
    So increase delay by 10×

[Figure labels: Virtual Machine Monitor (VMM); NodeMgr; LocalAdmin; VM1, VM2, …, VMn]
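The TDF example can be checked with a little arithmetic. The sketch below uses hypothetical helper names and assumes the simple model from the slide: inside a dilated VM, rates appear multiplied by the TDF and durations appear divided by it.

```python
def perceived(physical_bw_mbps, physical_delay_ms, tdf):
    """What the dilated VM observes for a given physical link."""
    return physical_bw_mbps * tdf, physical_delay_ms / tdf

def physical_settings(target_bw_mbps, target_delay_ms, tdf):
    """Physical link settings needed so the VM observes the target values."""
    return target_bw_mbps / tdf, target_delay_ms * tdf

# A 100 Mb/s, 20 ms link seen from a VM with TDF = 10:
print(perceived(100, 20, 10))            # (1000, 2.0): 10x bandwidth, 1/10 delay

# To make the VM see 1000 Mb/s with the delay still at 20 ms,
# the physical delay has to be stretched by the TDF:
print(physical_settings(1000, 20, 10))   # (100.0, 200)
```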


ModelNet Summary

ModelNet, antithesis of PlanetLab
  Testing of unmodified applications
  Reproducible results
  Experimentation using a broad range of network topologies and characteristics
  Large-scale experiments (thousands of nodes and gigabits of cross traffic)
  Can scale to emulate non-existent resource levels

But what if you want real deployment on demand?
  Emulab / NetBed


Emulab / NetBed

A shared, configurable, on-demand testbed
  What if you don’t have your own cluster?
  What if you need to test specific environments/HW?
  What if you need this in 5 mins?

Emulab / NetBed
  Hardware: 328 PCs, high-speed Gb Cisco switches
  Software: OS loader and manager via web interface
    Wipe all disks, load OS images, configure routers in <2 mins
    Reboot and give ssh access


Emulab Web Interface

[Figure: Emulab web interface]


Outline

ModelNet
dK


Importance of Network Topology

Access to real-world network topologies is vital for research

New routing and other protocol design, development, testing, etc.
  Analysis: performance of a routing algorithm strongly depends on topology
  Generation: empirical estimation of scalability

Network robustness, resilience under attack, worm spreading, etc.


Network Topology Research

[Figure labels: Static Topologies; Dynamic Topologies]


Trade Secrets

Unfortunately, large-scale network topologies are often proprietary
  Think about BGP
  ISPs want to hide their internal topology

Real datasets are rare
  Small scale
  Out of date
  Static (i.e. not dynamic)


Towards Synthetic Topologies

Question: can we use graph models to capture real network topologies?
  Fit a model to a real topology
  Use a generator to produce synthetic topologies that are similar, but not identical, to the real topology

Benefits
  Privacy – synthetic graphs are not proprietary
  Randomization – produce an infinite number of stochastic snapshots
  Scalable – generator can produce similar topologies of any size


Important Topology Metrics

Degree distribution
Clustering
Assortativity
Distance distribution
Betweenness distribution

Problems

No way to reproduce most of the important metrics

No guarantee that other, new metrics will not later be found important


The Approach

Look at inter-dependencies among topology characteristics

See if, by reproducing the most basic, simple characteristics, we can also reproduce all other characteristics, including the practically important ones

Try to find the characteristic(s) that define all others

Key observation: graphs are structures of connections between nodes


Definition of dK-distributions

dK-distributions are degree correlations within simple connected subgraphs of size d

For example
  1K distribution: the node degree distribution
  2K distribution: the joint node degree distribution (degree correlations across edges)
  3K distribution: degree correlations within three-node subgraphs, which capture the clustering coefficient


An Example of dK

dK is the distribution of d-node subgraphs with particular node degrees
  dK-1 describes the node degree distribution
  dK-2 describes the joint node degree distribution
  dK-3 captures the clustering coefficient

Example (counts of subgraphs by node degree)
  dK-0: average degree = 2
  dK-1: P(1)=1, P(2)=2, P(3)=1
  dK-2: P(1,3)=1, P(2,2)=1, P(2,3)=2
  dK-3: P(1,3,2)=2, P(2,2,3)=1
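The counts in this example are consistent with a four-node graph made of a triangle with one extra node hanging off it. The slide's figure is not preserved, so that graph is an assumption; the sketch below computes the dK-0 to dK-3 counts for it, treating the first dK-3 entry as open three-node paths (wedges, written end-centre-end) and the second as triangles.

```python
from collections import Counter
from itertools import combinations

# Assumed example graph: triangle a-c-d plus pendant node b attached to a.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("c", "d")]

def dk_distributions(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    deg = {n: len(nbrs) for n, nbrs in adj.items()}

    zero_k = 2 * len(edges) / len(adj)                 # average degree
    one_k = Counter(deg.values())                      # nodes per degree
    two_k = Counter(tuple(sorted((deg[u], deg[v]))) for u, v in edges)

    wedges, corners = Counter(), Counter()
    for centre, nbrs in adj.items():
        for u, v in combinations(sorted(nbrs), 2):
            if v in adj[u]:                            # closed: triangle corner
                corners[tuple(sorted((deg[u], deg[centre], deg[v])))] += 1
            else:                                      # open: wedge
                lo, hi = sorted((deg[u], deg[v]))
                wedges[(lo, deg[centre], hi)] += 1
    # Every triangle was counted once per corner.
    triangles = Counter({key: c // 3 for key, c in corners.items()})
    return zero_k, one_k, two_k, wedges, triangles

zero_k, one_k, two_k, wedges, triangles = dk_distributions(edges)
```

On the assumed graph this yields an average degree of 2, one node each of degree 1 and 3 and two of degree 2, edge counts (1,3)=1, (2,2)=1, (2,3)=2, two wedges of type (1,3,2), and one triangle of type (2,2,3).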


Nice properties of dK-series

Constructability: we can construct graphs having properties Pd (dK-graphs)

Inclusion: if a graph has property Pd, then it also has all properties Pi, with i < d (dK-graphs are also iK-graphs)

Convergence: the set of graphs having property Pn consists of only one element, G itself (dK-graphs converge to G)
  Guarantees that all (even not yet defined!) graph metrics can be captured by sufficiently high d


Inclusion and dK-randomness

[Figure labels: 0K, 1K, 2K, nK; given G; 0K-random, 1K-random, 2K-random]


How Do We Generate Graphs?

A number of different approaches
  Stochastic
  Pseudograph
  Matching
  Rewiring

Some are extensible to d=3, others are not

New research proposed d=2.5, to make generation tractable


Stochastic approach

Classical (Erdős-Rényi) random graphs are 0K-random graphs in the stochastic approach

Easily generalizable to any d
  Reproduce the expected value of the dK-distributions by connecting random d-plets of nodes with (conditional) probabilities extracted from G

Best for theory

Worst in practice
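For d = 0 the stochastic approach reduces to the classical random graph: connect every pair of nodes independently with a probability chosen so that the expected average degree matches the target. A minimal sketch, with a hypothetical function name and the choice p = avg_degree / (n - 1) as an assumption that makes the expectation exact:

```python
import random
from itertools import combinations

def stochastic_0k(n, avg_degree, seed=None):
    """0K-random graph: each node pair is linked with probability p."""
    rng = random.Random(seed)
    p = avg_degree / (n - 1)      # expected degree of a node = p * (n - 1)
    return [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]

edges = stochastic_0k(200, 6.0, seed=1)
realized = 2 * len(edges) / 200   # close to 6, but only in expectation
```

Only the expected value is reproduced, so any single generated graph deviates from the target, which is one reason this approach fares poorly in practice for higher d.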


Pseudograph approach

Reproduces dK-distributions exactly

Constructs not necessarily connected pseudographs

Extended for d = 2

Failed to generalize for d > 2: d-sized subgraphs start to overlap over edges at d = 3


Pseudograph details

1K
  1. dissolve graph into a random soup of nodes
  2. crystallize it back

2K
  1. dissolve graph into a random soup of edges
  2. crystallize it back

[Figure labels: k1, k2, k3, k4; k1-ends]
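The 1K "dissolve and crystallize" step resembles the well-known configuration model: cut every node into as many edge ends (stubs) as its degree, shuffle the stubs, and pair them up. The sketch below illustrates that idea under this assumption; the function name is hypothetical.

```python
import random

def pseudograph_1k(degrees, seed=None):
    """Build a random pseudograph with exactly the given degree sequence."""
    rng = random.Random(seed)
    # Dissolve: node i contributes degrees[i] loose edge ends.
    stubs = [node for node, k in enumerate(degrees) for _ in range(k)]
    if len(stubs) % 2:
        raise ValueError("degree sum must be even")
    # Crystallize: pair the shuffled edge ends two at a time.
    rng.shuffle(stubs)
    return list(zip(stubs[0::2], stubs[1::2]))

edges = pseudograph_1k([1, 2, 2, 3], seed=4)
```

The degree sequence is reproduced exactly, but the result may contain self-loops and parallel edges and need not be connected, hence "pseudograph".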


dK-Randomizing Rewiring

Can generate random graphs from the original

Generalizes to any d

But cannot generate a desired graph from dK-distributions
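For d = 1, randomizing rewiring can be illustrated with the standard double-edge swap: pick two edges (a, b) and (c, d) and replace them with (a, d) and (c, b), which leaves every node's degree unchanged. This is a sketch under that assumption; the function name and the attempt limit are hypothetical, and connectivity is not checked.

```python
import random

def rewire_1k(edges, swaps, seed=None):
    """Randomize a simple graph while preserving every node's degree."""
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    done = attempts = 0
    while done < swaps and attempts < 100 * swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        if rng.random() < 0.5:          # try both orientations of the swap
            c, d = d, c
        if len({a, b, c, d}) < 4:       # would create a self-loop
            continue
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue                    # would create a parallel edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges
```

The procedure needs an existing graph to start from, which matches the limitation noted on the slide: rewiring randomizes the original graph but does not build one from the distributions.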


Algorithms

All algorithms deliver consistent results for d = 0

All algorithms, except stochastic(!), deliver consistent results for d = 1 and d = 2

Both rewiring algorithms deliver consistent results for d = 3

Eventual choice
  Use pseudograph to construct 1K graphs
  Use targeted rewiring to build higher-d graphs


Skitter Scalar Metrics

Metric     0K      1K      2K      3K      skitter
<k>        6.31    6.34    6.29    6.29    6.29
r          0       -0.24   -0.24   -0.24   -0.24
<C>        0.001   0.25    0.29    0.46    0.46
d          5.17    3.11    3.08    3.09    3.12
σ_d        0.27    0.4     0.35    0.35    0.37
λ_1        0.2     0.03    0.15    0.1     0.1
λ_(n-1)    1.8     1.97    1.85    1.9     1.9


HOT Scalar Metrics

Metric     0K      1K      2K      3K      HOT
<k>        2.47    2.59    2.18    2.10    2.10
r          -0.05   -0.14   -0.23   -0.22   -0.22
<C>        0.002   0.009   0.001   0       0
d          8.48    4.41    6.32    6.55    6.81
σ_d        1.23    0.72    0.71    0.84    0.57
λ_1        0.01    0.034   0.005   0.004   0.004
λ_(n-1)    1.989   1.967   1.996   1.997   1.997


HOT 0K

[Figure panels: True HOT Graph; HOT 0K]


HOT 1K

[Figure panels: True HOT Graph; HOT 1K]


HOT 2K

[Figure panels: True HOT Graph; HOT 2K]


HOT 3K

[Figure panels: True HOT Graph; HOT 3K]