
Ningning Hu Carnegie Mellon University 1

Optimizing Network Performance In Replicated Hosting

Peter Steenkiste (CMU)

with Ningning Hu (CMU),

Oliver Spatscheck (AT&T),

Jia Wang (AT&T)


Motivation

The question of how to use latency to select a replicated web server has been well studied.

What about using available bandwidth instead?


Outline

Pathneck

Internet end-user RTT distribution and access-bandwidth distribution

Optimization results: for RTT, for bandwidth, and for data transmission time


Pathneck: Recursive Packet Train (RPT)

Two measurement packets are dropped at each router (their TTLs expire).

The resulting ICMP packets allow the source to estimate the train length at each hop.

Changes in train length provide bounds on the available bandwidth of each link.

[Figure: RPT layout. 20 measurement packets (60 B each, TTLs 1 to 20), then 60 load packets (500 B each, TTL 100), then 20 measurement packets (60 B each, TTLs 20 down to 1).]
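The train layout in the figure can be written down directly. The sketch below is a hypothetical illustration, not Pathneck's own code. It builds a train with the slide's parameters: 20 measurement packets of 60 B on each side and 60 load packets of 500 B with TTL 100. It then checks which packets expire at each hop. The sketch assumes that every router decrements each TTL by one, so a packet with initial TTL t is dropped at the t-th router. The names `Packet`, `build_rpt`, and `expired_at_hop` are invented for this example.

```python
from dataclasses import dataclass

@dataclass
class Packet:
    size_bytes: int
    ttl: int
    role: str  # "head", "load", or "tail" (hypothetical labels)

def build_rpt(n_measure=20, measure_size=60, n_load=60, load_size=500, load_ttl=100):
    """Recursive packet train: head TTLs 1..n, load packets, tail TTLs n..1."""
    head = [Packet(measure_size, ttl, "head") for ttl in range(1, n_measure + 1)]
    load = [Packet(load_size, load_ttl, "load") for _ in range(n_load)]
    tail = [Packet(measure_size, ttl, "tail") for ttl in range(n_measure, 0, -1)]
    return head + load + tail

def expired_at_hop(train, hop):
    """Packets dropped at router `hop` (1-based): their TTL reaches 0 there,
    so each one triggers an ICMP reply back to the source."""
    return [p for p in train if p.ttl == hop]

train = build_rpt()
for hop in (1, 2, 3):
    print(hop, [p.role for p in expired_at_hop(train, hop)])  # one head, one tail
```

Because the TTLs are mirrored, each of the first 20 routers drops exactly one head packet and one tail packet. The time between their two ICMP replies therefore tracks the train length at that hop.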


Pathneck Operation

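[Figure: an RPT traveling from source S through routers R1, R2, and R3. Each router decrements every TTL by one (the load packets go from 100 to 99 to 98 to 97) and drops the head and tail measurement packets whose TTL reaches 0. The gaps g1, g2, and g3 between the resulting pairs of ICMP replies indicate the train length at each hop.]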
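The sketch below shows how the gaps g1, g2, g3 can bound available bandwidth. It uses a deliberately simplified fluid model, which is an assumption and not Pathneck's actual estimation algorithm. Under this model, a link with available bandwidth A cannot forward the train's B load bits in less than B/A seconds. The train length after the link is therefore max(length before, B/A). If the gap grows at hop i, the model pins A_i = B/g_i. If the gap does not grow, the model only gives the lower bound A_i ≥ B/g_(i-1). The function names and the example bandwidths are hypothetical.

```python
def train_lengths(train_bits, initial_len_s, avail_bw_bps):
    """Train length (seconds) after each hop under the fluid model."""
    lengths, length = [], initial_len_s
    for bw in avail_bw_bps:
        length = max(length, train_bits / bw)  # a slow link stretches the train
        lengths.append(length)
    return lengths

def bandwidth_bounds(train_bits, initial_len_s, lengths):
    """Per hop, (lower, upper) bound on available bandwidth implied by the gaps."""
    bounds, prev = [], initial_len_s
    for g in lengths:
        if g > prev:
            est = train_bits / g               # gap grew: this link set the length
            bounds.append((est, est))
        else:
            bounds.append((train_bits / prev, float("inf")))  # only a lower bound
        prev = g
    return bounds

bits = 60 * 500 * 8          # the 60 load packets of 500 B
init = bits / 100e6          # hypothetical: source emits the train at 100 Mbps
lens = train_lengths(bits, init, [100e6, 50e6, 80e6, 10e6])
for hop, (lo, hi) in enumerate(bandwidth_bounds(bits, init, lens), start=1):
    print(hop, lo / 1e6, hi / 1e6)
```

In this toy path the largest gap increase happens at hop 4, the 10 Mbps link. The model would flag that link as the bottleneck.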


Pathneck Properties

Pathneck is an active probing tool designed to locate Internet bottlenecks. It is efficient and effective, and it also provides route, delay, and bandwidth information. For technical details, please see

www.cs.cmu.edu/~hnn/pathneck

We extended Pathneck to cover the last hop. This allows us to measure the RTT and the access bandwidth of many end users.


Methodology

Measurement sources: 18 nodes in a large tier-1 ISP (14 in the US, 3 in Europe, and 1 in East-Asia). A large fraction of the paths traverse other ISPs. These nodes play the role of possible replica sites.

Measurement destinations: 164,130 IP addresses from different prefixes, of which 67,271 correspond to real online hosts. Firewalls and similar devices sometimes require us to use an intermediate node as a "virtual" destination. These destinations play the role of clients accessing the web.


Results

Internet end-user RTT distribution and access-bandwidth distribution

Optimization results: for RTT, for bandwidth, and for data transmission time


RTT Distribution

The RTT "views" of Internet clients differ significantly depending on the geographical location of the measurement source.

[Figure: RTT distributions measured from US-NE, Europe, and East-Asia sources.]


Bandwidth Distribution

[Figure: bandwidth distributions measured from US-NE, Europe, and East-Asia sources.]

The bandwidth "views" are much more alike.


End Access Bandwidth Distribution

Low access bandwidth still dominates among end users:

40% < 2.2 Mbps

50% < 4.2 Mbps

62.5% < 10 Mbps

Measured values are limited by the downstream bandwidth of the measurement source.


Bottleneck Location Distribution

75% of bottleneck links are within the last two hops, so there is little chance to avoid these bottlenecks using replication.

However, when access bandwidth is higher than 40 Mbps, content replication can help improve performance.
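The reason replication cannot bypass a last-hop bottleneck can be seen with a common simplification. Treat a path's available bandwidth as that of its slowest segment. Then no choice of replica can raise a client above its own access bandwidth. The segment bandwidths below are hypothetical, and `path_bandwidth` and `replica_gain` are illustrative names.

```python
def path_bandwidth(core_bw_mbps, access_bw_mbps):
    """Simplified: end-to-end available bandwidth is that of the slowest segment."""
    return min(core_bw_mbps, access_bw_mbps)

def replica_gain(core_bws_mbps, access_bw_mbps):
    """Ratio of best-replica to worst-replica bandwidth seen by one client."""
    rates = [path_bandwidth(b, access_bw_mbps) for b in core_bws_mbps]
    return max(rates) / min(rates)

# Hypothetical bandwidths (Mbps) of the paths from three replicas up to the client's access link.
cores = [30, 60, 90]
print(replica_gain(cores, 2))    # 1.0: a 2 Mbps access link caps every replica
print(replica_gain(cores, 100))  # 3.0: a well-provisioned client sees real differences
```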


Results

Internet end-user RTT distribution and access-bandwidth distribution

Optimization results: for RTT, for bandwidth, and for data transmission time


Optimization Algorithm

We use a simple greedy algorithm to optimize the performance of our replication infrastructure. In each step, it selects the replication node that has the largest marginal utility.

Greedy algorithms have been shown to obtain results very close to the optimal ones. In our study, the greedy result is only 0.1% worse than the optimal result from brute-force search.
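Here is a minimal sketch of greedy replica placement for the RTT metric, with a brute-force search for comparison. It assumes the objective is the mean, over clients, of the RTT to the closest selected replica. The slides do not state the exact objective. The RTT matrix is made up, and the function names are hypothetical.

```python
from itertools import combinations

def avg_best_rtt(rtt, selected):
    """Mean over clients of the RTT to the closest selected site; rtt[site][client] in ms."""
    n_clients = len(rtt[0])
    return sum(min(rtt[s][c] for s in selected) for c in range(n_clients)) / n_clients

def greedy_placement(rtt, k):
    """Add sites one at a time, each time picking the one that lowers the mean RTT most."""
    selected = []
    for _ in range(k):
        candidates = [s for s in range(len(rtt)) if s not in selected]
        best = min(candidates, key=lambda s: avg_best_rtt(rtt, selected + [s]))
        selected.append(best)
    return selected

def brute_force_placement(rtt, k):
    """Try every k-subset of sites (feasible only for small inputs)."""
    return list(min(combinations(range(len(rtt)), k), key=lambda sel: avg_best_rtt(rtt, sel)))

# Hypothetical RTT matrix: 4 candidate sites (rows) x 3 clients (columns), in ms.
rtt = [
    [10, 80, 90],
    [70, 15, 85],
    [60, 50, 20],
    [30, 40, 50],
]
g = greedy_placement(rtt, 2)
b = brute_force_placement(rtt, 2)
print(g, round(avg_best_rtt(rtt, g), 2))  # [3, 2] 30.0
print(b, round(avg_best_rtt(rtt, b), 2))  # [0, 2] 26.67
```

This toy input was chosen so that greedy misses the optimum. Site 3 is the best single choice, but it is not part of the best pair. On the real measurement data, the slide reports a gap of only 0.1%.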


RTT Optimization

RTT optimization results have a clear geographical pattern

The first 5 replicas provide most of the benefit

[Figure: RTT optimization results, with replica locations labeled US-East, Europe, East-Asia, US-West, and US-Central.]


Marginal Utility of RTT Optimization

Each of the first 5 nodes provides a significant improvement (i.e., larger than 5%).

[Marginal utility: the relative performance improvement contributed by a specific node.]
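One way to write this definition down is shown below. It assumes the improvement is measured relative to the performance just before the node is added; the slide does not spell out the baseline. Let T_k be the mean RTT with the first k greedily chosen replicas, and B_k the mean bandwidth. The 5% threshold is the one stated above.

```latex
\mathrm{MU}^{\mathrm{RTT}}_k = \frac{T_{k-1} - T_k}{T_{k-1}}, \qquad
\mathrm{MU}^{\mathrm{BW}}_k = \frac{B_k - B_{k-1}}{B_{k-1}}, \qquad k \ge 2,
\qquad \text{node } k \text{ is significant if } \mathrm{MU}_k > 0.05 .
```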


Bandwidth Optimization

The first 2 replicas provide most of the benefit


Marginal Utility of Bandwidth Optimization

Only the first 2 (or 3) nodes provide a significant improvement.


For Well-provisioned Access Links

Replication can indeed improve bandwidth performance for end users with access bandwidth larger than 40 Mbps.

[Figure: annotated with 74%, 35%, and 54 Mbps.]


Data Transmission Time

End users' data transmission time depends on delay, bandwidth, and data size.

We estimate data transmission time using a simplified TCP model with a slow-start phase and a congestion-avoidance phase, assuming no packet loss. During slow start, transfer time is delay sensitive; during congestion avoidance, it is bandwidth sensitive.

The data size determines whether replication should optimize delay or bandwidth. We use the "slow-start size" as the crossover point.

Results: 70% of paths have a slow-start size larger than 10 KB, which is larger than the average web page.
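Here is a minimal sketch of a two-phase, loss-free model of this kind. The exact model behind the slide is not given, so every detail here is an assumption:

- the window starts at `init_cwnd` segments of `mss` bytes and doubles once per RTT;
- slow start ends when the window reaches the bandwidth-delay product;
- after that, data flows at the available bandwidth.

Under these assumptions, the "slow-start size" is the number of bytes sent before the window reaches the bandwidth-delay product. The function names and defaults are hypothetical. The model ignores connection setup and the overshoot in the last slow-start round.

```python
def slow_start_size(rtt_s, bw_bps, mss=1460, init_cwnd=1):
    """Bytes sent before the window reaches the bandwidth-delay product (no loss)."""
    bdp_segments = bw_bps * rtt_s / (8 * mss)
    size, cwnd = 0, init_cwnd
    while cwnd < bdp_segments:
        size += cwnd * mss   # one full window per RTT
        cwnd *= 2            # slow start doubles the window every RTT
    return size

def transfer_time(size_bytes, rtt_s, bw_bps, mss=1460, init_cwnd=1):
    """Estimated time: whole RTTs during slow start, then the rest at bw_bps."""
    bdp_segments = bw_bps * rtt_s / (8 * mss)
    t, sent, cwnd = 0.0, 0, init_cwnd
    while sent < size_bytes and cwnd < bdp_segments:
        sent += cwnd * mss
        t += rtt_s           # delay-sensitive phase
        cwnd *= 2
    if sent < size_bytes:
        t += (size_bytes - sent) * 8 / bw_bps  # bandwidth-sensitive phase
    return t

# Hypothetical path: 100 ms RTT, 8 Mbps available bandwidth, 1000-byte segments.
print(slow_start_size(0.1, 8e6, mss=1000))           # 127000
print(transfer_time(10_000, 0.1, 8e6, mss=1000))     # about 0.4 (four RTTs)
print(transfer_time(1_000_000, 0.1, 8e6, mss=1000))  # about 1.573
```

A 10,000-byte transfer fits inside the slow-start size, so its time is a whole number of RTTs, and a lower-RTT replica helps most. The 1,000,000-byte transfer spends most of its time in the bandwidth-limited phase.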


Data Transmission Time (2)

The transmission times for 10 KB, 100 KB, 1 MB, and 10 MB are 0.4 s, 1.1 s, 6.4 s, and 59.2 s, respectively.
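Dividing each size by its reported time gives an implied effective throughput. This assumes 1 KB = 1,000 bytes. If the reported times are averages over paths, the ratio is only a rough indicator. The result shows small transfers running far below the rate large transfers reach: about 0.2 Mbps for 10 KB versus about 1.35 Mbps for 10 MB. That gap is consistent with slow start dominating short transfers. The helper name is hypothetical.

```python
# Transfer sizes (bytes) and times (seconds) reported on the slide.
reported = {10e3: 0.4, 100e3: 1.1, 1e6: 6.4, 10e6: 59.2}

def effective_mbps(size_bytes, seconds):
    """Average throughput implied by moving size_bytes in the given time."""
    return size_bytes * 8 / seconds / 1e6

for size, secs in reported.items():
    print(f"{size / 1e3:>6.0f} KB: {effective_mbps(size, secs):.2f} Mbps")
```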


Related Work

Content replication with different optimization metrics: geographic location, network hops and latency, retrieval cost, update cost, storage cost, QoS guarantees, …

Greedy algorithms used in replica selection


Conclusion

We quantify the Internet end-node access-bandwidth distribution and the bottleneck location distribution.

Two differences distinguish bandwidth optimization from RTT optimization. First, geographic location is not important for bandwidth optimization. Second, for throughput, only well-provisioned end users can benefit from content replication.