Integrated Approach to Improving Web Performance

Page 1: Integrated Approach to Improving Web Performance

Integrated Approach to Improving Web Performance

Lili Qiu

Cornell University

Page 2: Integrated Approach to Improving Web Performance

Outline
  Motivation & Open Issues
  Solutions
    Study Web workload, and properly provision the content distribution networks
    Optimizing TCP performance for Web transfers
    Fast packet classification
  Summary
  Other Work

Page 3: Integrated Approach to Improving Web Performance

Motivation
  Web is the dominant traffic in the Internet today
  Web performance is often unsatisfactory
    WWW – World Wide Wait
    Consequence: losing potential customers!
  [Figure labels: Network congestion; Overloaded Web server]

Page 4: Integrated Approach to Improving Web Performance

Why is the Web so slow?
  Application layer
    Web servers are overloaded …
  Transport layer
    Web transfers are short and bursty, and interact poorly with TCP
  Network layer
    Routers are not fast enough
    Network congestion
    Route flaps and routing instabilities
    …
  Inefficiency in any layer of the protocol stack can slow down the Web!

Page 5: Integrated Approach to Improving Web Performance

Our Solutions
  Application layer
    Study Web workload
    Properly provision content distribution networks (CDNs)
  Transport layer
    Optimize TCP startup performance for Web transfers
  Network layer
    Speed up packet classification (useful for firewall & diff-serv)

Page 6: Integrated Approach to Improving Web Performance

Part I: Application Layer Approach
  Study the workload of busy Web servers
    The Content and Access Dynamics of a Busy Web Site: Findings and Implications. Proceedings of ACM SIGCOMM 2000, Stockholm, Sweden, August 2000. (Joint work with V. N. Padmanabhan)
  Properly provision content distribution networks
    On the Placement of Web Server Replicas. Submitted to INFOCOM'2001. (Joint work with V. N. Padmanabhan and G. M. Voelker)

Page 7: Integrated Approach to Improving Web Performance

Introduction
  Solid understanding of Web workload is critical for designing robust and scalable systems
  The workload of popular Web servers is not well understood
  Study the content and access dynamics of the MSNBC Web site
    A large news server
    One of the busiest sites on the Web
    25 million accesses a day (HTML content alone)
    Period studied: Aug. – Oct. 99 & Dec. 17, 98 flash crowd
  Properly provision content distribution networks
    Where to place the edge servers in the CDNs

Page 8: Integrated Approach to Improving Web Performance

Temporal Stability of File Popularity
  Methodology
    Consider the traces from a pair of days
    Pick the top n popular documents from each day
    Compute the overlap
  Results
    One day apart: significant overlap (80%)
    Two months apart: smaller overlap (20-80%)
    Ten months apart: very small overlap (mostly below 20%)
  [Figure: extent of overlap (0 to 1) versus number of popular documents picked (1 to 100000, log scale); curves for 17DEC98 - 18OCT99, 01AUG99 - 18OCT99, and 17OCT99 - 18OCT99]
  The set of popular documents remains stable for days
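The overlap methodology on this slide can be sketched in a few lines. This is a minimal illustration, not the study's actual tooling: the function names and the trace representation (a list of requested document names per day) are assumptions, and each trace is assumed to contain at least n distinct documents.

```python
from collections import Counter

def top_n(requests, n):
    """Return the set of the n most requested documents in one day's trace."""
    counts = Counter(requests)
    return {doc for doc, _ in counts.most_common(n)}

def overlap(day_a, day_b, n):
    """Fraction of the top-n documents that the two days have in common.

    Assumes both traces contain at least n distinct documents.
    """
    return len(top_n(day_a, n) & top_n(day_b, n)) / n

# Hypothetical toy traces: one entry per request.
day1 = ["a"] * 5 + ["b"] * 4 + ["c"] * 3 + ["d"]
day2 = ["a"] * 6 + ["e"] * 5 + ["c"] * 2 + ["b"]
print(overlap(day1, day2, 2))
```

Sweeping n from 1 up to the number of documents gives one curve of the kind plotted on the slide.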

Page 9: Integrated Approach to Improving Web Performance

Spatial Locality in Client Accesses
  [Figure, two panels ("Normal Day" and "Dec. 17, 1998"): fraction of requests shared versus domain ID; curves for Trace and Random]
  Domain membership is significant except when there is a “hot” event of global interest

Page 10: Integrated Approach to Improving Web Performance

Spatial Distribution of Client Accesses
  Cluster clients using network-aware clustering [KW00]
    IP addresses with the same address prefix belong to a cluster
  Top 10, 100, 1000, 3000 clusters account for about 24%, 45%, 78%, and 94% of the requests respectively
  A small number of client clusters contribute most of the requests.

Page 11: Integrated Approach to Improving Web Performance

The Applicability of Zipf-law to Web requests
  The Web requests follow a Zipf-like distribution
    Request frequency ∝ 1/i^α, where i is a document’s ranking
  The value of α is much larger in MSNBC traces
    1.4 – 1.8 in MSNBC traces
    Smaller than or close to 1 in the proxy traces
    Close to 1 in the small departmental server logs [ABC+96]
    Highest when there is a hot event
  [Figure: bar chart of α (0 to 2) for MSNBC, proxies, and less popular servers]

Page 12: Integrated Approach to Improving Web Performance

Impact of larger α
  Accesses in MSNBC traces are much more concentrated
  90% of the accesses are accounted for by
    Top 2-4% files in MSNBC traces
    Top 36% files in proxy traces (Microsoft proxies and the proxies studied in [BCF+99])
    Top 10% files in small departmental server logs reported in [AW96]
  Popular news sites like MSNBC see much more concentrated accesses
    Reverse caching and replication can be very effective!
  [Figure: percentage of requests versus percentage of documents (sorted by popularity); curves for 12/17/98 server traces, 08/01/99 server traces, and 10/06/99 proxy traces]
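To see why a larger Zipf exponent concentrates accesses, the sketch below computes what fraction of documents is needed to cover a given share of requests under an idealized Zipf-like model, where the document of rank i receives weight 1/i^α. The function name, the document count, and the idealized model itself are assumptions made for illustration; the percentages on the slide come from real traces and will not be reproduced exactly.

```python
def fraction_of_docs_needed(alpha, n_docs, share=0.9):
    """Fraction of the most popular documents that covers `share` of requests.

    Idealized model: the document of rank i gets weight 1 / i**alpha.
    """
    weights = [1.0 / (i ** alpha) for i in range(1, n_docs + 1)]
    total = sum(weights)
    running = 0.0
    for rank, weight in enumerate(weights, start=1):
        running += weight
        if running >= share * total:
            return rank / n_docs
    return 1.0

# Hypothetical population of 10,000 documents.
for alpha in (1.0, 1.4, 1.8):
    print(alpha, fraction_of_docs_needed(alpha, 10_000))
```

The fraction shrinks as α grows, which is the qualitative effect the slide describes.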

Page 13: Integrated Approach to Improving Web Performance

Introduction to Content Distribution Networks (CDNs)
  Content providers want to offer better service to their clients at lower cost
  Increasing deployment of content distribution networks (CDNs)
    Akamai, Digital Island, Exodus …
  Idea: a network of servers
  Features:
    Outsourcing infrastructure
    Improve performance by moving content closer to end users
    Flash crowd protection
  [Figure: content providers, CDN servers, and clients]

Page 14: Integrated Approach to Improving Web Performance

Placement of CDN servers
  Goal
    Minimize users’ latency or bandwidth usage
  Minimum K-median problem
    Select K centers to minimize the sum of assignment costs
    Cost can be latency or bandwidth or other metric we want to optimize
    NP-hard problem
  [Figure: content providers, CDN servers, and clients]
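One common way to write the minimum K-median objective is shown below. The symbols are assumptions introduced here, not notation from the slides: V is the set of candidate sites, C the set of clients, w_c the load generated by client c, and d(c, s) the cost (latency, bandwidth, or another metric) of serving c from site s.

```latex
\[
\min_{S \subseteq V,\; |S| = K} \;\; \sum_{c \in C} w_c \, \min_{s \in S} d(c, s)
\]
```

Each client is assigned to its cheapest chosen site, and the sum of these assignment costs is what the placement tries to minimize.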

Page 15: Integrated Approach to Improving Web Performance

Placement Algorithms
  Tree based algorithm [LGG+99]
    Assume the underlying topologies are trees, and model it as a dynamic programming problem
    O(N^3 M^2) for choosing M replicas among N potential places
  Random
    Pick the best among several random assignments
  Hot spot
    Place replicas near the clients that generate the largest load

Page 16: Integrated Approach to Improving Web Performance

Placement Algorithms (Cont.)
  Greedy algorithm
    Greedy(N, M) {
      for i = 1 .. M {
        for each remaining candidate site R {
          cost[R] = cost after placing an additional replica at R
        }
        select the site with the lowest cost
      }
    }
  Super Optimal algorithm
    Lagrangian relaxation + subgradient method
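The greedy pseudocode above can be made runnable as follows. This is a sketch under stated assumptions: the cost model (each client is served by its cheapest chosen site, weighted by client load), the matrix layout dist[client][site], and all names are illustrative choices, not the authors' implementation.

```python
def total_cost(dist, load, replicas):
    """Sum over clients of load times the distance to the nearest chosen replica."""
    return sum(
        load[c] * min(dist[c][r] for r in replicas)
        for c in range(len(load))
    )

def greedy_placement(dist, load, m):
    """Pick m sites one at a time, each time adding the site that lowers cost most."""
    chosen = []
    remaining = set(range(len(dist[0])))
    for _ in range(m):
        # Evaluate the cost of adding each remaining site to the current choice.
        best = min(
            sorted(remaining),
            key=lambda r: total_cost(dist, load, chosen + [r]),
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Hypothetical example: clients at positions 0, 1, 9, 10 and sites at 0, 5, 10 on a line.
clients, sites = [0, 1, 9, 10], [0, 5, 10]
dist = [[abs(c - s) for s in sites] for c in clients]
load = [1, 1, 1, 1]
placement = greedy_placement(dist, load, 2)
print(placement, total_cost(dist, load, placement))
```

In this toy case the greedy choice is not the best possible pair of sites, which is why the slides compare practical algorithms against a super-optimal bound rather than assuming optimality.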

Page 17: Integrated Approach to Improving Web Performance

Simulation Methodology
  Network topology
    Randomly generated topologies
      Using GT-ITM Internet topology generator
    Real Internet network topology
      AS level topology obtained using BGP routing data from a set of seven geographically dispersed BGP peers
  Web workload
    Real server traces
      MSNBC, ClarkNet, NASA Kennedy Space Center
  Performance metric
    Relative performance: cost_practical / cost_super-optimal

Page 18: Integrated Approach to Improving Web Performance

Simulation Results in Random Tree Topologies

Page 19: Integrated Approach to Improving Web Performance

Simulation Results in Random Graph Topologies

Page 20: Integrated Approach to Improving Web Performance

Simulation Results in Real Internet Topologies

Page 21: Integrated Approach to Improving Web Performance

Effects of Imperfect Knowledge about Input Data
  Predict load using moving window average
  (a) Perfect knowledge about topology
  (b) Knowledge about topology accurate within a factor of 2
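A moving window average predictor of the kind mentioned above can be sketched as follows. The function name, the window length, and the unit of the load samples are assumptions; the slides do not say which window size was used.

```python
def predict_next_load(history, window):
    """Predict the next load sample as the mean of the last `window` samples."""
    if window <= 0:
        raise ValueError("window must be positive")
    recent = history[-window:]
    if not recent:
        raise ValueError("history is empty")
    return sum(recent) / len(recent)

# Hypothetical per-interval request counts from one client cluster.
history = [10, 20, 30, 40]
print(predict_next_load(history, 2))
```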

Page 22: Integrated Approach to Improving Web Performance

Conclusion
  Characterize Web workload using MSNBC traces
  Placement of CDN servers
    Knowledge about client workload and topology is crucial for provisioning CDNs
    The greedy algorithm performs the best
      Within a factor of 1.1 – 1.5 of super-optimal
    The greedy algorithm is insensitive to noise
      Stays within a factor of 2 of the super-optimal when the salted error is a factor of 4
    The hot spot algorithm performs nearly as well
      Within a factor of 1.6 – 2 of super-optimal
    How to obtain inputs
      Moving window average for load prediction
      Using BGP router data to obtain topology information

Page 23: Integrated Approach to Improving Web Performance

Part II: Transport Layer Approach
  Speeding Up Short Data Transfers: Theory, Architectural Support, and Simulation Results. Proceedings of NOSSDAV 2000. (Joint work with Yin Zhang and Srinivasan Keshav)

Page 24: Integrated Approach to Improving Web Performance

Motivation
  Characteristics of Web data transfers
    Short & bursty [Mah97]
    Use TCP
  Problem: Short data transfers interact poorly with TCP!

Page 25: Integrated Approach to Improving Web Performance

TCP/Reno Basics
  Slow Start
    Exponential growth in congestion window
    Slow: log(n) round trips for n segments
  Congestion Avoidance
    Linear probing of BW
  Fast Retransmission
    Triggered by 3 duplicate ACKs
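The "log(n) round trips" cost of slow start can be checked with a small idealized model. The sketch assumes no losses, no delayed ACKs, an initial window of one segment, and a window that doubles every round trip; the function name is hypothetical.

```python
def slow_start_rounds(n_segments, init_cwnd=1):
    """Round trips needed to send n_segments if cwnd doubles every RTT."""
    rounds, cwnd, sent = 0, init_cwnd, 0
    while sent < n_segments:
        sent += cwnd   # one window's worth of segments per round trip
        cwnd *= 2      # exponential growth during slow start
        rounds += 1
    return rounds

for n in (1, 7, 8, 100):
    print(n, slow_start_rounds(n))
```

For short Web transfers of a few segments, these start-up round trips make up most of the completion time, which is the motivation for Part II.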

Page 26: Integrated Approach to Improving Web Performance

Related Work
  P-HTTP [PM94]
    Reuses a single TCP connection for multiple Web transfers, but still pays slow start penalty
  T/TCP [Bra94]
    Cache connection count, RTT
  TCP Control Block Interdependence [Tou97]
    Cache cwnd, but large bursts cause losses
  Rate Based Pacing [VH97]
  4K Initial Window [AFP98]
  Fast Start [PK98, Pad98]
    Need router support to ensure TCP friendliness

Page 27: Integrated Approach to Improving Web Performance

Our Approach
  Directly enter Congestion Avoidance
  Choose optimal initial congestion window
    A geometry problem: fitting a block to the service rate curve to minimize completion time

Page 28: Integrated Approach to Improving Web Performance

Optimal Initial cwnd
  Minimize completion time by having the transfer end at an epoch boundary.

Page 29: Integrated Approach to Improving Web Performance

Shift Optimization
  Minimize initial cwnd while keeping the same integer number of RTTs
  Before optimization: cwnd = 9
  After optimization: cwnd = 5
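The idea of shift optimization can be illustrated with a simplified model. The sketch assumes the sender starts directly in congestion avoidance, sends one window per round trip, grows cwnd by one segment per round trip, and sees no losses. The slides derive the initial window from a service rate curve that is not reproduced in this transcript, so this is only an approximation of the idea, with hypothetical function names.

```python
def ca_rounds(n_segments, cwnd):
    """Round trips to send n_segments when cwnd grows by one segment per RTT."""
    rounds, sent = 0, 0
    while sent < n_segments:
        sent += cwnd
        cwnd += 1      # linear growth in congestion avoidance
        rounds += 1
    return rounds

def shift_optimize(n_segments, cwnd):
    """Smallest initial cwnd that needs the same number of round trips."""
    target = ca_rounds(n_segments, cwnd)
    while cwnd > 1 and ca_rounds(n_segments, cwnd - 1) == target:
        cwnd -= 1
    return cwnd

# Under these assumptions, an 11-segment transfer started at cwnd = 9
# finishes in the same number of round trips when started at cwnd = 5.
print(shift_optimize(11, 9))
```

A smaller initial window for the same completion time means smaller bursts into the network, which is the benefit the optimization is after.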

Page 30: Integrated Approach to Improving Web Performance

Effect of Shift Optimization

Page 31: Integrated Approach to Improving Web Performance

TCP/SPAND
  Estimate network state by sharing performance information
    SPAND: Shared PAssive Network Discovery [SSK97]
  Directly enter Congestion Avoidance, starting with the optimal initial cwnd
  Avoid large bursts by pacing
  [Figure: Web servers, performance server, and the Internet]

Page 32: Integrated Approach to Improving Web Performance

Implementation Issues
  Scope for sharing and aggregation
    24-bit heuristic, network-aware clustering [KW00]
  Collecting performance information
    Performance reports, new TCP option, Windmill’s approach, …
  Information aggregation
    Sliding window average
  Retrieving estimation of network state
    Explicit query, active push, …
  Pacing
    Leaky bucket based pacing
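Leaky bucket pacing, listed above as the way to avoid large bursts, can be sketched as a function that spaces packet departures at a fixed drain rate. The function name, the time unit (seconds), and the choice of a fixed rate are assumptions; the slides do not give the pacer's parameters.

```python
def paced_departures(arrival_times, rate):
    """Departure times from a leaky bucket draining at `rate` packets per second.

    A packet leaves no earlier than it arrives and no sooner than 1/rate
    seconds after the previous departure.
    """
    departures = []
    next_free = 0.0
    for arrival in arrival_times:
        depart = max(arrival, next_free)
        departures.append(depart)
        next_free = depart + 1.0 / rate
    return departures

# A burst of four packets handed to the pacer at time 0, drained at 2 packets/s.
print(paced_departures([0.0, 0.0, 0.0, 0.0], 2.0))
```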

Page 33: Integrated Approach to Improving Web Performance

Opportunity for Sharing
  MSNBC: 90% of requests arrive within 5 minutes of the most recent request from the same client network (using 24-bit heuristic)
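The 24-bit heuristic treats clients whose IPv4 addresses share the first 24 bits as one client network. A minimal sketch using Python's standard ipaddress module follows; the function names and sample addresses are illustrative.

```python
import ipaddress
from collections import defaultdict

def client_network(ip):
    """Map an IPv4 address to its /24 network (the 24-bit heuristic)."""
    return ipaddress.ip_network(f"{ip}/24", strict=False)

def group_by_network(client_ips):
    """Group client addresses by their /24 network."""
    groups = defaultdict(list)
    for ip in client_ips:
        groups[client_network(ip)].append(ip)
    return dict(groups)

# Documentation-range addresses used as hypothetical clients.
groups = group_by_network(["192.0.2.10", "192.0.2.200", "198.51.100.7"])
print(len(groups), "client networks")
```

Counting the keys of such a grouping over a 5-minute window gives the number of distinct client networks a performance server would need to track.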

Page 34: Integrated Approach to Improving Web Performance

Cost for Sharing
  MSNBC: 15,000-25,000 different client networks in a 5-minute interval during peak hours (using 24-bit heuristic)

Page 35: Integrated Approach to Improving Web Performance

Simulation Results
  Methodology
    Download files in rounds
  Performance metric
    Average completion time
  TCP flavors considered
    reno-ssr: Reno with slow start restart
    reno-nssr: Reno w/o slow start restart
    newreno-ssr: NewReno with slow start restart
    newreno-nssr: NewReno w/o slow start restart

Page 36: Integrated Approach to Improving Web Performance

Simulation Topologies

Page 37: Integrated Approach to Improving Web Performance

T1 Terrestrial WAN Link with Single Bottleneck

Page 38: Integrated Approach to Improving Web Performance

T1 Terrestrial WAN Link with Multiple Bottlenecks

Page 39: Integrated Approach to Improving Web Performance

T1 Terrestrial WAN Link with Multiple Bottlenecks and Heavy Congestion

Page 40: Integrated Approach to Improving Web Performance

TCP Friendliness (I): Against reno-ssr with 50-ms Timer

Page 41: Integrated Approach to Improving Web Performance

TCP Friendliness (II): Against reno-ssr with 200-ms Timer

Page 42: Integrated Approach to Improving Web Performance

Conclusions
  TCP/SPAND significantly reduces latency for short data transfers
    35-65% compared to reno-ssr / newreno-ssr
    20-50% compared to reno-nssr / newreno-nssr
    Even higher for fatter pipes
  TCP/SPAND is TCP-friendly
  TCP/SPAND is incrementally deployable
    Server-side modification only
    No modification at client-side

Page 43: Integrated Approach to Improving Web Performance

Part III: Network Layer Approach
  Fast Packet Classification on Multiple Dimensions. Cornell CS Technical Report 2000-1805, July 2000. (Joint work with G. Varghese and S. Suri, in progress)

Page 44: Integrated Approach to Improving Web Performance

Motivation
  Traditionally, routers forward packets based on the destination field only
  Diff-serv and firewalls require layer 4 switching
    Forward packets based on multiple fields in the packet header, e.g. source IP address, destination IP address, source port, destination port, protocol, type of service (tos) …
  The general packet classification problem has poor worst-case cost:
    Given N arbitrary filters with k packet fields, either the worst-case search time is on the order of (log N)^(k-1) or the worst-case storage is O(N^k)

Page 45: Integrated Approach to Improving Web Performance

Problem Specification
  Given a set of filters (or rules), where each filter specifies
    a class of packet headers based on K fields
    an associated directive, which specifies how to forward the packet matching this filter
  Goal: Find the best matching filter for each incoming packet
    A packet P matches a filter F if every field of P matches the corresponding field of F
    Exact match, prefix match, or range match
    Assume prefix matching

Page 46: Integrated Approach to Improving Web Performance

Problem Specification (Cont.)
  Example of Cisco Access Control List (ACL)
    1. access-list 100 deny udp 26.145.168.192 255.255.255.255 74.199.168.192 255.255.255.0 eq 2049
    2. access-list 100 permit ip 74.199.191.192 255.255.0.0 255 74.199.168.192.255.0.0
    3. access-list 100 permit tcp 250.197.149.202 255.0.0.0 74.199.20.76 255.0.0.0
  Packet: tcp 250.19.34.34 74.23.5.12 matches filter 3
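The matching example above can be replayed with a small linear-scan classifier. The sketch assumes the masks in the slide act as ordinary netmasks (which is what the slide's matching example implies), that "eq 2049" refers to the destination port, and that the first matching rule wins. Rule 2 is left out because its fields are not fully legible in the slide. The tuple layout and function names are illustrative.

```python
import ipaddress

def net(addr, mask):
    """Build a network from an address and a netmask, ignoring host bits."""
    return ipaddress.ip_network(f"{addr}/{mask}", strict=False)

# (rule number, action, protocol, source net, destination net, destination port)
FILTERS = [
    (1, "deny", "udp", net("26.145.168.192", "255.255.255.255"),
     net("74.199.168.192", "255.255.255.0"), 2049),
    (3, "permit", "tcp", net("250.197.149.202", "255.0.0.0"),
     net("74.199.20.76", "255.0.0.0"), None),
]

def classify(proto, src, dst, dst_port=None):
    """Return (rule number, action) of the first matching filter, or None."""
    src_ip, dst_ip = ipaddress.ip_address(src), ipaddress.ip_address(dst)
    for number, action, f_proto, f_src, f_dst, f_port in FILTERS:
        if f_proto != proto:
            continue
        if src_ip not in f_src or dst_ip not in f_dst:
            continue
        if f_port is not None and f_port != dst_port:
            continue
        return number, action
    return None

print(classify("tcp", "250.19.34.34", "74.23.5.12"))
```

A linear scan like this costs time proportional to the number of filters, which is what the trie-based schemes in the following slides aim to avoid.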

Page 47: Integrated Approach to Improving Web Performance

Backtracking Search
  A trie is a binary branching tree, with each branch labeled 0 or 1
  The prefix associated with a node is the concatenation of all the bits from the root to the node
  [Figure: trie holding filters F1 = 00* and F2 = 10*, with nodes labeled D and E]
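A one-dimensional binary trie of the kind described on this slide can be sketched as follows. Prefixes are written as bit strings without the trailing "*", and the class and function names are illustrative rather than taken from the authors' code.

```python
class TrieNode:
    def __init__(self):
        self.children = {}       # maps "0" or "1" to a child node
        self.filter_name = None  # filter whose prefix ends at this node

def insert(root, prefix, filter_name):
    """Store filter_name at the node reached by following the prefix bits."""
    node = root
    for bit in prefix:
        node = node.children.setdefault(bit, TrieNode())
    node.filter_name = filter_name

def longest_prefix_match(root, bits):
    """Return the filter with the longest prefix matching `bits`, or None."""
    node, best = root, root.filter_name
    for bit in bits:
        node = node.children.get(bit)
        if node is None:
            break
        if node.filter_name is not None:
            best = node.filter_name
    return best

root = TrieNode()
insert(root, "00", "F1")  # F1 = 00*
insert(root, "10", "F2")  # F2 = 10*
print(longest_prefix_match(root, "0010"))
```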

Page 48: Integrated Approach to Improving Web Performance

Backtracking Search (Cont.)

Extend to multiple dimensions

Backtracking is a depth-first traversal of the tree which visits all the nodes satisfying the given constraints

Example: search for [00*,0*,0*]
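One way to extend the trie to multiple dimensions is a hierarchy of tries: a node that ends a prefix in one field points to the root of a trie over the next field. The sketch below is an assumed structure in that spirit, with a depth-first backtracking search that visits every node whose prefixes match the packet; it is not the data structure from the technical report, and the filters are made up for illustration.

```python
class Node:
    def __init__(self):
        self.child = {}       # "0"/"1" -> Node within the same field
        self.next_dim = None  # root of the trie for the next field
        self.filters = []     # filters stored here (last field only)

def insert(root, prefixes, name):
    """Insert a filter given one prefix (bit string, "" for *) per field."""
    node = root
    for d, prefix in enumerate(prefixes):
        for bit in prefix:
            node = node.child.setdefault(bit, Node())
        if d < len(prefixes) - 1:
            if node.next_dim is None:
                node.next_dim = Node()
            node = node.next_dim
        else:
            node.filters.append(name)

def search(root, fields):
    """Return all filters whose prefixes match the packet's fields."""
    matches = []

    def visit(node, d):
        bits, depth = fields[d], 0
        while node is not None:
            if d == len(fields) - 1:
                matches.extend(node.filters)
            elif node.next_dim is not None:
                visit(node.next_dim, d + 1)  # descend, then backtrack
            if depth == len(bits):
                break
            node = node.child.get(bits[depth])
            depth += 1

    visit(root, 0)
    return matches

root = Node()
insert(root, ("00", "0", "0"), "F1")  # [00*, 0*, 0*]
insert(root, ("0", "0", ""), "F2")    # [0*, 0*, *]
insert(root, ("1", "", ""), "F3")     # [1*, *, *]
print(search(root, ("0010", "0111", "0000")))
```

The repeated descents into next-field tries are the backtracking cost that the compression and selective push techniques in the following slides try to reduce.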

Page 49: Integrated Approach to Improving Web Performance

Trie Compression Algorithm
  If a path AB satisfies the Compressible Property:
    All nodes on its left point to the same place L
    All nodes on its right point to the same place R
  then we compress the entire branch into 3 edges
    Center edge with value (AB) pointing to B
    Left edge with value < (AB) pointing to L
    Right edge with value > (AB) pointing to R
  Advantages of compression: save time & storage

Page 50: Integrated Approach to Improving Web Performance

Trading Storage for Time
  Smoothly trade off storage for time
  Selective push
    Push down the filters with large backtracking time
    Iterate until the worst-case backtracking time satisfies our requirement
  [Figure: spectrum between exponential time and exponential space]

Page 51: Integrated Approach to Improving Web Performance

Example of Selective Push
  Goal: worst-case memory accesses < 12
  The filter [0*, 0*, 0000*] has 12 memory accesses.
  Push the filter down to reduce lookup time
  Now the search cost of the filter [0*,0*,001*] becomes 12 memory accesses. So we need to push it down. Done!

Page 52: Integrated Approach to Improving Web Performance

Using Available Hardware
  So far, we focus on software techniques for packet classification.
  Further improve the performance by taking advantage of limited hardware if it is available
    By moving some filters (or rules) from software to hardware
  Key issue: Which filters to move from software to hardware?
  Answer:
    To reduce lookup time, move the filters with the largest number of memory accesses when using software approach
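The selection rule above can be sketched as picking the filters with the highest software lookup cost. The function names and cost figures are hypothetical, and the sketch makes a simplifying assumption: the cost of the remaining filters is treated as unchanged after the move, whereas removing filters from a real software structure may change it.

```python
import heapq

def filters_to_offload(software_cost, hardware_slots):
    """Pick the filters with the most memory accesses in software."""
    return heapq.nlargest(hardware_slots, software_cost, key=software_cost.get)

def worst_case_after_offload(software_cost, hardware_slots):
    """Worst-case software cost among filters that stay in software."""
    moved = set(filters_to_offload(software_cost, hardware_slots))
    remaining = [cost for name, cost in software_cost.items() if name not in moved]
    return max(remaining, default=0)

# Hypothetical memory-access counts per filter.
costs = {"F1": 12, "F2": 7, "F3": 11, "F4": 5}
print(filters_to_offload(costs, 2), worst_case_after_offload(costs, 2))
```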

Page 53: Integrated Approach to Improving Web Performance

Summary

Approach | Description | Performance Gain
Trie compression algorithm | Effectively exploit redundancy in trie nodes | Reduce lookup time by a factor of 2 – 5, save storage by a factor of 2.8 – 8.7
Selective push | “Push down” the filters with large backtracking time | Reduce lookup time by 10 – 25% with only marginal increase in storage
Moving filters from software to hardware | Heuristics to move a small number of filters from software to hardware | Moving 10 – 20 rules to hardware cuts storage by 33% – 50%, or lookup time by 10% – 20%

Page 54: Integrated Approach to Improving Web Performance

Contributions
  Application layer
    Study Web workload of busy Web servers
    Properly provision content distribution networks
  Transport layer
    Optimize TCP startup performance for short Web transfers
  Network layer
    Speed up packet classification

Page 55: Integrated Approach to Improving Web Performance

Other Work
  Available at http://www.cs.cornell.edu/lqiu/papers/papers.html
  Integrating Packet FEC into Adaptive Voice Playout Buffer Algorithms on the Internet. Proceedings of IEEE INFOCOM'2000, Tel-Aviv, Israel, March 2000.
  On Individual and Aggregate TCP Performance. 7th International Conference on Network Protocols (ICNP'99), Toronto, Canada, October 1999.
  Understanding the End-to-End Performance Impact of RED in a Heterogeneous Environment. July 2000. Submitted to INFOCOM'2001.

Page 56: Integrated Approach to Improving Web Performance

Integrating Packet FEC into Adaptive Voice Playout Buffer Algorithms
  Internet telephony is subject to
    Variable loss rate
    Variable delay
  Previous work has addressed the two problems separately
    Use FEC for loss recovery
    Use playout buffer adaptation for delay jitter compensation

Page 57: Integrated Approach to Improving Web Performance

Integrating Packet FEC into Adaptive Voice Playout Buffer Algorithms (Cont.)
  Our work
    Demonstrate the interaction between playout algorithm and FEC
      Playout algorithm should depend on FEC as well as network loss conditions and network jitter
    Propose several playout algorithms that provide this coupling
    Demonstrate the effectiveness of the algorithms through simulations

Page 58: Integrated Approach to Improving Web Performance

On Individual and Aggregate TCP Performance
  Motivation
    TCP behavior under many competing TCP connections has not been sufficiently explored
  Our work
    Use extensive simulations to investigate the individual and aggregate TCP performance for many concurrent connections

Page 59: Integrated Approach to Improving Web Performance

On Individual and Aggregate TCP Performance (Cont.)
  Major findings
    All connections have the same RTT
      Wc > 3*Conn: global synchronization
      Conn < Wc < 3*Conn: local synchronization
      Wc < Conn: shut off connections
    Adding random processing time makes synchronization and consistent discrimination less pronounced
    Derive the general characterization of overall throughput, goodput, and loss probability
    Quantify the roundtrip bias for connections with different RTT

Page 60: Integrated Approach to Improving Web Performance

Understanding the End-to-End Performance Impact of RED in a Heterogeneous Environment
  Motivation
    IETF recommends widespread deployment of RED in routers
    Most previous work studies RED in relatively homogeneous environments
  Our work
    Investigate the interaction of RED with five types of heterogeneity

Page 61: Integrated Approach to Improving Web Performance

Understanding the End-to-End Performance Impact of RED in a Heterogeneous Environment (Cont.)
  Major findings
    Mix of short and long TCP connections
      Short TCP connections get higher goodput with RED than with Drop Tail
    Mix of TCP and UDP
      Bursty UDP tends to get lower loss rate with RED than with Drop Tail
    Mix of ECN and non-ECN capable traffic
      ECN-capable TCP connections get higher goodput than non-ECN-capable TCP connections
    Effect of different RTT
      RED reduces the bias against long-RTT bulk transfers
    Effect of two-way traffic
      When ACK path is congested, TCP gets higher goodput with RED than with Drop Tail

Page 62: Integrated Approach to Improving Web Performance

Effects of Imperfect Knowledge about Input Data

Page 63: Integrated Approach to Improving Web Performance

Effects of Imperfect Knowledge about Input Data (Cont.)
  The effect of imperfect topology information
    Randomly remove from 0 up to 50% of the edges in the AS topology derived from the BGP routing tables
    The greedy algorithm is insensitive to edge removal
      Performs within a factor of 2.6 of optimal when the edge removal is 50%