Squirrel: A peer-to-peer web cache Sitaram Iyer (Rice University) Joint work with Ant Rowstron (MSR Cambridge) Peter Druschel (Rice University) PODC 2002 / Sitaram Iyer / Tuesday July 23 / Monterey, CA


Page 1: Squirrel: A peer-to-peer web cache

Squirrel: A peer-to-peer web cache

Sitaram Iyer (Rice University)

Joint work with Ant Rowstron (MSR Cambridge) and Peter Druschel (Rice University)

PODC 2002 / Sitaram Iyer / Tuesday July 23 / Monterey, CA

Page 2: Squirrel: A peer-to-peer web cache

Web Caching

Web caching reduces: 1. latency, 2. external traffic, 3. load on web servers and routers.

Deployed at: Corporate network boundaries, ISPs, Web Servers, etc.

Page 3: Squirrel: A peer-to-peer web cache

Centralized Web Cache

[Figure: centralized web cache. Labels: clients on the corporate LAN, each running a browser with its own cache; a web cache; a web server on the Internet.]

Page 4: Squirrel: A peer-to-peer web cache

Cooperative Web Cache

[Figure: cooperative web cache. Labels: clients on the corporate LAN, each running a browser with its own cache; several web caches; a web server on the Internet.]

Page 5: Squirrel: A peer-to-peer web cache

Decentralized Web Cache

[Figure: decentralized web cache. Labels: clients on the corporate LAN, each running a browser with its own cache; Squirrel; a web server on the Internet.]

Page 6: Squirrel: A peer-to-peer web cache

Distributed Hash Table

Peer-to-peer location service: Pastry

• Completely decentralized and self-organizing
• Fault-tolerant, scalable, efficient

Operations:

• Insert(k,v)
• Lookup(k)

[Figure: <key,value> pairs (k1,v1) through (k6,v6) stored on the nodes of the peer-to-peer routing and location substrate.]
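The Insert(k,v) and Lookup(k) operations above can be sketched in a few lines of Python. This is a toy, single-process illustration and not Pastry itself: it skips routing entirely and picks the node whose identifier is numerically closest to the hashed key. The names `ToyDHT`, `key_id`, and `home_node`, and the 32-bit identifier space, are assumptions made for this example.

```python
import hashlib


def key_id(key: str, bits: int = 32) -> int:
    """Hash a string into a `bits`-wide circular identifier space."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") >> (160 - bits)


class ToyDHT:
    """Toy DHT: each key lives on the node with the numerically closest id."""

    def __init__(self, node_names, bits=32):
        self.bits = bits
        self.space = 1 << bits
        # (node id, node name) pairs; the ids are derived from the names.
        self.ring = sorted((key_id(name, bits), name) for name in node_names)
        self.store = {name: {} for name in node_names}

    def home_node(self, key):
        k = key_id(key, self.bits)

        def distance(node_id):
            d = abs(node_id - k)
            return min(d, self.space - d)  # the id space wraps around

        return min(self.ring, key=lambda entry: (distance(entry[0]), entry[0]))[1]

    def insert(self, key, value):
        self.store[self.home_node(key)][key] = value

    def lookup(self, key):
        return self.store[self.home_node(key)].get(key)


dht = ToyDHT([f"node{i}" for i in range(8)])
dht.insert("k1", "v1")
print(dht.home_node("k1"), dht.lookup("k1"))
```

A real substrate reaches the same node by forwarding the message hop by hop between peers; the global view of all nodes used here exists only to keep the sketch short.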

Page 7: Squirrel: A peer-to-peer web cache

Why peer-to-peer?

1. Cost of dedicated web cache: no additional hardware.

2. Administrative effort: self-organizing network.

3. Scaling implies upgrading: resources grow with clients.

Page 8: Squirrel: A peer-to-peer web cache

Setting

• Corporate LAN

• 100 - 100,000 desktop machines

• Located in a single building or campus

• Each node runs an instance of Squirrel

• Sets it as the browser’s proxy

Page 9: Squirrel: A peer-to-peer web cache

Mapping Squirrel onto Pastry

Two approaches:

• Home-store

• Directory

Page 10: Squirrel: A peer-to-peer web cache

Home-store model

[Figure: home-store model. Labels: client, home, LAN, Internet, URL hash.]

Page 11: Squirrel: A peer-to-peer web cache

Home-store model

[Figure: home-store model, continued. Labels: client, home.]

…that’s how it works!
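A minimal sketch of the home-store request path, under stated assumptions: the home node is chosen by hashing the URL modulo the node count (a stand-in for Pastry routing), caches are unbounded, and freshness checks are ignored. `HomeStoreLAN`, `home_of`, and the `origin` callable are hypothetical names for this illustration.

```python
import hashlib


class HomeStoreLAN:
    """Toy home-store model: the home node of a URL stores the object."""

    def __init__(self, node_names, origin):
        self.nodes = sorted(node_names)
        self.origin = origin  # callable url -> body; stands in for the Internet
        self.local = {n: {} for n in self.nodes}  # each client's own cache
        self.home = {n: {} for n in self.nodes}   # objects held as home node
        self.external_fetches = 0

    def home_of(self, url):
        # Assumption: modulo placement instead of Pastry's key-based routing.
        h = int.from_bytes(hashlib.sha1(url.encode("utf-8")).digest(), "big")
        return self.nodes[h % len(self.nodes)]

    def get(self, client, url):
        if url in self.local[client]:          # hit in the client's cache
            return self.local[client][url]
        home = self.home_of(url)
        if url not in self.home[home]:         # miss at home: go to origin
            self.home[home][url] = self.origin(url)
            self.external_fetches += 1
        body = self.home[home][url]
        self.local[client][url] = body         # clients always cache locally
        return body


lan = HomeStoreLAN(["n1", "n2", "n3", "n4"], origin=lambda url: "body of " + url)
lan.get("n1", "http://example.com/a")
lan.get("n2", "http://example.com/a")
print(lan.external_fetches)  # the second request is served inside the LAN
```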

Page 12: Squirrel: A peer-to-peer web cache

Directory model

Client nodes always cache objects locally.

Home-store: home node also stores objects.

Directory: the home node only stores pointers to recent clients, and forwards requests.

Page 13: Squirrel: A peer-to-peer web cache

Directory model

[Figure: directory model. Labels: client, home, Internet, LAN.]

Page 14: Squirrel: A peer-to-peer web cache

Directory model

[Figure: directory model, continued. Labels: client, home.]

Randomly choose entry from table.
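The directory model can be sketched the same way. In this toy version the home node keeps only a short list of clients that recently fetched the object and forwards each request to a randomly chosen entry. The class name `DirectoryLAN`, the `max_pointers` limit, the modulo placement, and the unbounded client caches are all assumptions for illustration.

```python
import hashlib
import random


class DirectoryLAN:
    """Toy directory model: the home node stores pointers, not objects."""

    def __init__(self, node_names, origin, max_pointers=4, seed=0):
        self.nodes = sorted(node_names)
        self.origin = origin  # callable url -> body; stands in for the Internet
        self.max_pointers = max_pointers  # hypothetical directory size limit
        self.rng = random.Random(seed)
        self.local = {n: {} for n in self.nodes}      # client caches
        self.directory = {n: {} for n in self.nodes}  # home -> url -> clients
        self.served_by = {n: 0 for n in self.nodes}   # objects served to peers
        self.external_fetches = 0

    def home_of(self, url):
        h = int.from_bytes(hashlib.sha1(url.encode("utf-8")).digest(), "big")
        return self.nodes[h % len(self.nodes)]

    def get(self, client, url):
        if url in self.local[client]:
            return self.local[client][url]
        pointers = self.directory[self.home_of(url)].setdefault(url, [])
        if pointers:
            delegate = self.rng.choice(pointers)  # randomly choose an entry
            body = self.local[delegate][url]
            self.served_by[delegate] += 1
        else:
            body = self.origin(url)               # no directory: go to origin
            self.external_fetches += 1
        self.local[client][url] = body
        pointers.append(client)                   # remember the recent client
        if len(pointers) > self.max_pointers:
            pointers.pop(0)
        return body


lan = DirectoryLAN(["a", "b", "c", "d"], origin=lambda url: "body of " + url)
for client in ["a", "b", "c"]:
    lan.get(client, "http://example.com/x")
print(lan.external_fetches, lan.served_by)
```

Because client caches never evict in this sketch, a pointer always leads to a copy; a real system has to handle stale pointers as well.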

Page 15: Squirrel: A peer-to-peer web cache

Directory: Advantages

Avoids storing unnecessary copies of objects.

Rapidly changing directory for popular objects seems to improve load balancing.

Home-store scheme can incur hotspots.

Page 16: Squirrel: A peer-to-peer web cache

Directory: Disadvantages

Cache insertion only happens at clients, so:

• active clients store all the popular objects,

• inactive clients waste most of their storage.

Implications:

1. Reduced cache size.

2. Load imbalance.

Page 17: Squirrel: A peer-to-peer web cache

Directory: Load spike example

• Web page with many embedded images, or

• Periods of heavy browsing.

Many home nodes point to such clients!

Evaluate …

Page 18: Squirrel: A peer-to-peer web cache

Trace characteristics

Microsoft in:                  Redmond          Cambridge
Total duration                 1 day            31 days
Number of clients              36,782           105
Number of HTTP requests        16.41 million    0.971 million
Peak request rate              606 req/sec      186 req/sec
Number of objects              5.13 million     0.469 million
Number of cacheable objects    2.56 million     0.226 million
Mean cacheable object reuse    5.4 times        3.22 times
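A few derived figures follow directly from the trace numbers above. The short calculation below is only a convenience sketch: it assumes "1 day" means exactly 24 hours and "31 days" means 31 full days, and the dictionary keys are names invented for this example.

```python
SECONDS_PER_DAY = 86_400

# Values copied from the trace characteristics table.
traces = {
    "Redmond": {"days": 1, "requests": 16.41e6, "peak_rps": 606,
                "objects": 5.13e6, "cacheable": 2.56e6},
    "Cambridge": {"days": 31, "requests": 0.971e6, "peak_rps": 186,
                  "objects": 0.469e6, "cacheable": 0.226e6},
}


def derived(trace):
    """Mean request rate, peak-to-mean ratio, and cacheable object share."""
    mean_rps = trace["requests"] / (trace["days"] * SECONDS_PER_DAY)
    return {
        "mean_rps": mean_rps,
        "peak_to_mean": trace["peak_rps"] / mean_rps,
        "cacheable_fraction": trace["cacheable"] / trace["objects"],
    }


for site, trace in traces.items():
    d = derived(trace)
    print(f"{site}: mean {d['mean_rps']:.2f} req/sec, "
          f"peak/mean {d['peak_to_mean']:.1f}, "
          f"cacheable {d['cacheable_fraction']:.1%}")
```

Under these assumptions, roughly half of the objects in each trace are cacheable, and the Cambridge trace is far burstier relative to its mean rate than the Redmond trace.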

Page 19: Squirrel: A peer-to-peer web cache

Total external traffic

[Figure, Redmond: total external traffic in GB (lower is better; axis from 85 to 105) versus per-node cache size in MB (log scale, 0.001 to 100). Curves: Directory, Home-store, No web cache, Centralized cache.]

Page 20: Squirrel: A peer-to-peer web cache

Total external traffic

[Figure, Cambridge: total external traffic in GB (lower is better; axis from 5.5 to 6.1) versus per-node cache size in MB (log scale, 0.001 to 100). Curves: Directory, Home-store, No web cache, Centralized cache.]

Page 21: Squirrel: A peer-to-peer web cache

LAN Hops

[Figure, Redmond: % of cacheable requests (0% to 100%) versus total hops within the LAN (0 to 6). Series: Centralized, Home-store, Directory.]

Page 22: Squirrel: A peer-to-peer web cache

LAN Hops

[Figure, Cambridge: % of cacheable requests (0% to 100%) versus total hops within the LAN (0 to 5). Series: Centralized, Home-store, Directory.]

Page 23: Squirrel: A peer-to-peer web cache

Load in requests per sec

[Figure, Redmond: number of times observed (log scale, 1 to 100000) versus max objects served per-node / second (0 to 50). Series: Home-store, Directory.]

Page 24: Squirrel: A peer-to-peer web cache

Load in requests per sec

[Figure, Cambridge: number of times observed (log scale, 1 to 1e+07) versus max objects served per-node / second (0 to 50). Series: Home-store, Directory.]

Page 25: Squirrel: A peer-to-peer web cache

Load in requests per min

[Figure, Redmond: number of times observed (log scale, 1 to 100) versus max objects served per-node / minute (0 to 350). Series: Home-store, Directory.]

Page 26: Squirrel: A peer-to-peer web cache

Load in requests per min

[Figure, Cambridge: number of times observed (log scale, 1 to 10000) versus max objects served per-node / minute (0 to 120). Series: Home-store, Directory.]

Page 27: Squirrel: A peer-to-peer web cache

Fault tolerance

Sudden node failures result in partial loss of cached content.

Home-store: Proportional to failed nodes.

Directory: More vulnerable.
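The "proportional to failed nodes" behaviour of home-store can be illustrated with a small simulation. It is a sketch under simplifying assumptions rather than the evaluation used in the talk: objects are equal-sized, each object is stored only at its home node, homes are picked by hashing modulo the node count, and the function name `lost_fraction` and its parameters are invented for this example.

```python
import hashlib
import random


def lost_fraction(num_nodes, num_objects, fail_fraction, seed=0):
    """Fraction of home-stored objects lost when some nodes crash at once."""
    rng = random.Random(seed)

    def home_of(obj_id):
        digest = hashlib.sha1(f"object-{obj_id}".encode("utf-8")).digest()
        return int.from_bytes(digest, "big") % num_nodes

    homes = [home_of(i) for i in range(num_objects)]
    num_failed = max(1, round(num_nodes * fail_fraction))
    failed = set(rng.sample(range(num_nodes), num_failed))
    lost = sum(1 for h in homes if h in failed)
    return lost / num_objects


# 1% of 1000 nodes crash; the expected loss is close to 1% of the objects.
print(f"{lost_fraction(1000, 100_000, 0.01):.2%}")
```

The directory model is not simulated here; its loss depends on how many objects and pointers each failed client happens to hold, which is why the slide calls it more vulnerable.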

Page 28: Squirrel: A peer-to-peer web cache

Fault tolerance

If 1% of Squirrel nodes abruptly crash, the fraction of lost cached content is:

             Home-store             Directory
Redmond      Mean 1%, Max 1.77%     Mean 1.71%, Max 19.3%
Cambridge    Mean 1%, Max 3.52%     Mean 1.65%, Max 9.8%

Page 29: Squirrel: A peer-to-peer web cache

Conclusions

• It is possible to decentralize web caching.

• Performance is comparable to a centralized web cache.

• The decentralized cache is better in terms of cost, scalability, and administration effort.

• Under our assumptions, the home-store scheme is superior to the directory scheme.

Page 30: Squirrel: A peer-to-peer web cache

Other aspects of Squirrel

• Adaptive replication
  – Hotspot avoidance
  – Improved robustness

• Route caching
  – Fewer LAN hops

Page 31: Squirrel: A peer-to-peer web cache

Thanks.

Page 32: Squirrel: A peer-to-peer web cache

(backup) Storage utilization

Redmond          Home-store    Directory
Total            97641 MB      61652 MB
Mean per-node    2.6 MB        1.6 MB
Max per-node     1664 MB       1664 MB

Page 33: Squirrel: A peer-to-peer web cache

(backup) Fault tolerance

             Home-store                   Directory
Equations    Mean H/O, Max Hmax/O         Mean (H+S)/O, Max max(Hmax, Smax)/O
Redmond      Mean 0.0027%, Max 0.0048%    Mean 0.198%, Max 1.5%
Cambridge    Mean 0.95%, Max 3.34%        Mean 1.68%, Max 12.4%

Page 34: Squirrel: A peer-to-peer web cache

(backup) Full home-store protocol

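[Figure: full home-store protocol. Nodes: client, home, other nodes, and the origin server; the LAN side and the WAN side are marked. Message labels: req (routed from the client through other nodes to the home); a: object or notmod from home; b: req to the origin, then object or notmod from origin. Steps are numbered 1 to 3 in the original figure.]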

Page 35: Squirrel: A peer-to-peer web cache

(backup) Full directory protocol

[Figure: full directory protocol. Nodes: client, home (dir), delegate, other nodes, and the origin server. Message labels recoverable from the figure: req; a (also d): no dir, go to origin; b: not-modified; c, e: req and object; e: cGET req; object or not-modified. The step numbering did not survive extraction.]

Page 36: Squirrel: A peer-to-peer web cache

(backup) Peer-to-peer Computing

Decentralize a distributed protocol:

– Scalable

– Self-organizing

– Fault tolerant

– Load balanced

Not automatic!!

Page 37: Squirrel: A peer-to-peer web cache

Decentralized Web Cache

[Figure: decentralized web cache. Labels: browsers with their caches on the LAN; a web server on the Internet.]

Page 38: Squirrel: A peer-to-peer web cache

Challenge

Decentralized web caching algorithm:

Need to achieve those benefits in practice!

Need to keep overhead unnoticeably low.

Node failures should not become significant.

Page 39: Squirrel: A peer-to-peer web cache

Peer-to-peer routing, e.g., Pastry

Peer-to-peer object location and routing substrate = Distributed Hash Table.

Reliably maps an object key to a live node.

Routes in log_16(N) steps

(e.g. 3-4 steps for 100,000 nodes)
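The step count quoted above is a base-16 logarithm, which is easy to evaluate. The helper below is only a sketch of that arithmetic; the function name `pastry_hop_estimate` and the parameter `b` (bits per identifier digit, so that b = 4 gives base 16) are assumptions for this example.

```python
import math


def pastry_hop_estimate(num_nodes: int, b: int = 4) -> float:
    """Return log base 2**b of the node count (base 16 when b = 4)."""
    return math.log(num_nodes, 2 ** b)


for n in (100, 1_000, 10_000, 100_000):
    print(f"{n:>7} nodes: log_16(N) = {pastry_hop_estimate(n):.2f}")
```

For 100,000 nodes the logarithm comes out a little above 4, in line with the rough figure on the slide.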

Page 40: Squirrel: A peer-to-peer web cache

Home-store is better!

Simpler home-store scheme achieves load balancing by hash function randomization.

Directory scheme implicitly relies on access patterns for load distribution.
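Load balancing by hash randomization can be checked with a quick experiment: hash many URLs onto a fixed set of nodes and compare the busiest and idlest node with the mean. This is an illustrative sketch; the modulo placement, the synthetic URLs, and the name `home_load` are assumptions, and it counts stored objects rather than request rates.

```python
import hashlib
from collections import Counter


def home_load(num_nodes, urls):
    """Count how many URLs hash to each node index."""
    load = Counter()
    for url in urls:
        digest = hashlib.sha1(url.encode("utf-8")).digest()
        load[int.from_bytes(digest, "big") % num_nodes] += 1
    return load


urls = [f"http://example.com/page{i}.html" for i in range(10_000)]
load = home_load(100, urls)
mean = len(urls) / 100
print(f"max/mean = {max(load.values()) / mean:.2f}, "
      f"min/mean = {min(load.values()) / mean:.2f}")
```

Which URLs are popular does not affect where they are placed, so the spread stays close to uniform regardless of who browses what.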

Page 41: Squirrel: A peer-to-peer web cache

Directory scheme seems better…

Avoids storing unnecessary copies of objects.

Rapidly changing directory for popular objects results in load balancing.

Page 42: Squirrel: A peer-to-peer web cache

Interesting difference

Consider:

– Web page with many images, or
– Heavily browsing node

Directory: many pointers to some node.

Home-store: natural load balancing.

Evaluate …
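The difference can be made concrete with a toy scenario: one client has just loaded a page with many embedded images, and a second client then requests the same page. Under the directory scheme every image's directory points at the first client; under home-store the images are spread over their home nodes. The sketch below counts how many objects the busiest node must serve in each case. The function names, node count, image count, and modulo placement are assumptions for illustration.

```python
import hashlib
from collections import Counter


def home_of(url, num_nodes):
    digest = hashlib.sha1(url.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_nodes


def spike_load(num_nodes=100, num_images=50):
    """Return (directory max load, home-store max load) for one page view."""
    urls = [f"http://example.com/img{i}.png" for i in range(num_images)]
    # Directory: only client 0 holds the images, so it serves all of them.
    directory_load = Counter({0: num_images})
    # Home-store: each image is served by whichever node is its home.
    home_store_load = Counter(home_of(u, num_nodes) for u in urls)
    return max(directory_load.values()), max(home_store_load.values())


directory_max, home_store_max = spike_load()
print(directory_max, home_store_max)
```

The sketch freezes the directory at the moment of the second request; as more clients fetch the page, the directory gains entries and the load spreads out again.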

Page 43: Squirrel: A peer-to-peer web cache

Fault tolerance

When a single Squirrel node crashes, the fraction of lost cached content is:

             Home-store                   Directory
Redmond      Mean 0.0027%, Max 0.0048%    Mean 0.2%, Max 1.5%
Cambridge    Mean 0.95%, Max 3.34%        Mean 1.7%, Max 12.4%