
Squirrel: A peer-to-peer web cache

Sitaram Iyer (Rice University)

Joint work with Ant Rowstron (MSR Cambridge) and Peter Druschel (Rice University)

PODC 2002 / Sitaram Iyer / Tuesday July 23 / Monterey, CA

Web Caching

Reduces:
1. Latency,
2. External traffic,
3. Load on web servers and routers.

Deployed at: Corporate network boundaries, ISPs, Web Servers, etc.

Centralized Web Cache

[Diagram: browser clients with local caches on the corporate LAN send requests through a single centralized web cache to the web server across the Internet.]

Cooperative Web Cache

[Diagram: browser clients with local caches on the corporate LAN are served by several cooperating web caches, which talk to the web server across the Internet.]

Decentralized Web Cache

[Diagram: each client on the corporate LAN runs a browser with a local cache; together the client caches take the place of the dedicated web cache in front of the web server.]

Squirrel

Distributed Hash Table

Peer-to-peer location service: Pastry

• Completely decentralized and self-organizing
• Fault-tolerant, scalable, efficient

Operations: Insert(k,v), Lookup(k)

[Diagram: <key,value> pairs (k1,v1) … (k6,v6) spread across the nodes of the peer-to-peer routing and location substrate.]
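To make the Insert/Lookup interface concrete, here is a minimal single-process sketch of a DHT. It is an illustration under our own assumptions, not the Pastry API: real Pastry routes each key to the live node whose nodeId is numerically closest, in a few hops.

import hashlib

class ToyDHT:
    """Toy distributed hash table exposing Insert(k,v) / Lookup(k)."""

    def __init__(self, node_ids):
        # One local store per participating node.
        self.stores = {nid: {} for nid in node_ids}

    def _responsible_node(self, key):
        # Hash the key into the id space and pick a node deterministically.
        h = int(hashlib.sha1(key.encode()).hexdigest(), 16)
        ids = sorted(self.stores)
        return ids[h % len(ids)]

    def insert(self, key, value):
        self.stores[self._responsible_node(key)][key] = value

    def lookup(self, key):
        return self.stores[self._responsible_node(key)].get(key)

Squirrel's two schemes differ only in what they store under a URL's key: the object itself (home-store) or pointers to clients that have it (directory).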

Why peer-to-peer?

1. Cost of dedicated web cache → no additional hardware.

2. Administrative effort → self-organizing network.

3. Scaling implies upgrading → resources grow with clients.

Setting

• Corporate LAN
• 100 – 100,000 desktop machines
• Located in a single building or campus
• Each node runs an instance of Squirrel
• Sets it as the browser's proxy

Mapping Squirrel onto Pastry

Two approaches:

• Home-store

• Directory

Home-store model

[Diagram, two slides: the client hashes the URL and the request is routed over the LAN to the resulting home node, which goes out to the Internet when needed; the second slide replays the path and concludes: …that's how it works!]
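A minimal sketch of the home-store request path, written with our own simplifications: the hash, the node-selection rule, and the cache structures are assumptions, and freshness checks are omitted.

import hashlib

def home_node(url, node_ids):
    # Hash the URL into the nodeId space; the "home" is the closest nodeId
    # (simplified: no wraparound, unlike Pastry's circular id space).
    key = int(hashlib.sha1(url.encode()).hexdigest(), 16) % (2 ** 32)
    return min(node_ids, key=lambda nid: abs(nid - key))

def get_via_home_store(url, node_ids, caches, fetch_from_origin):
    """caches maps nodeId -> {url: object}; fetch_from_origin goes to the Internet."""
    home = home_node(url, node_ids)
    if url not in caches[home]:             # miss at the home node
        caches[home][url] = fetch_from_origin(url)
    return caches[home][url]                # served from the home node's cache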

Directory model

Client nodes always cache objects locally.

Home-store: home node also stores objects.

Directory: the home node only stores pointers to recent clients, and forwards requests.

Directory model

[Diagram, two slides: the client's request is routed over the LAN to the home node, which holds a small table of recent clients for the object; the home node randomly chooses an entry from the table and forwards the request there, going out to the Internet only when necessary.]
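For comparison, a sketch of the directory model under the same illustrative assumptions; the directory-size bound and the eviction rule are our own.

import random

def get_via_directory(url, client, directory, client_caches, fetch_from_origin,
                      max_entries=4):
    """directory maps url -> list of clients that recently fetched it;
    client_caches maps client -> {url: object}."""
    entries = directory.setdefault(url, [])
    if entries:
        delegate = random.choice(entries)     # home forwards to a random pointer
        obj = client_caches[delegate][url]    # delegate serves from its cache
    else:
        obj = fetch_from_origin(url)          # no pointers yet: go to the origin
    client_caches[client][url] = obj          # requesting client caches locally
    if client not in entries:
        entries.append(client)                # home records a new pointer
        del entries[:-max_entries]            # keep only the most recent pointers
    return obj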

Directory: Advantages

Avoids storing unnecessary copies of objects.

Rapidly changing directory for popular objects seems to improve load balancing.

Home-store scheme can incur hotspots.

Directory: Disadvantages

Cache insertion only happens at clients, so:

• active clients store all the popular objects,

• inactive clients waste most of their storage.

Implications:
1. Reduced cache size.
2. Load imbalance.

Directory: Load spike example

• Web page with many embedded images, or

• Periods of heavy browsing.

Many home nodes point to such clients!

Evaluate …

Trace characteristics (Microsoft in Redmond and Cambridge):

                              Redmond         Cambridge
Total duration                1 day           31 days
Number of clients             36,782          105
Number of HTTP requests       16.41 million   0.971 million
Peak request rate             606 req/sec     186 req/sec
Number of objects             5.13 million    0.469 million
Number of cacheable objects   2.56 million    0.226 million
Mean cacheable object reuse   5.4 times       3.22 times

Total external traffic

[Plot: total external traffic (GB, lower is better) vs. per-node cache size (0.001–100 MB), Redmond trace. Curves: Directory, Home-store, No web cache, Centralized cache.]

Total external traffic

[Plot: total external traffic (GB, lower is better) vs. per-node cache size (0.001–100 MB), Cambridge trace. Curves: Directory, Home-store, No web cache, Centralized cache.]

LAN Hops

[Plot: percentage of cacheable requests vs. total hops within the LAN (0–6), Redmond trace. Curves: Centralized, Home-store, Directory.]

LAN Hops

[Plot: percentage of cacheable requests vs. total hops within the LAN (0–5), Cambridge trace. Curves: Centralized, Home-store, Directory.]

Load in requests per sec

[Plot: number of times observed vs. max objects served per node per second (0–50), Redmond trace. Curves: Home-store, Directory.]

Load in requests per sec

[Plot: number of times observed vs. max objects served per node per second (0–50), Cambridge trace. Curves: Home-store, Directory.]

Load in requests per min

[Plot: number of times observed vs. max objects served per node per minute (0–350), Redmond trace. Curves: Home-store, Directory.]

Load in requests per min

[Plot: number of times observed vs. max objects served per node per minute (0–120), Cambridge trace. Curves: Home-store, Directory.]

Fault tolerance

Sudden node failures result in partial loss of cached content.

Home-store: Proportional to failed nodes.

Directory: More vulnerable.

Fault tolerance

If 1% of Squirrel nodes abruptly crash, the fraction of lost cached content is:

            Home-store             Directory
Redmond     Mean 1%, Max 1.77%     Mean 1.71%, Max 19.3%
Cambridge   Mean 1%, Max 3.52%     Mean 1.65%, Max 9.8%

Conclusions

• It is possible to decentralize web caching.

• Performance is comparable to a centralized web cache,

• cost, scalability, and administrative effort are better, and

• under our assumptions, the home-store scheme is superior to the directory scheme.

Other aspects of Squirrel

• Adaptive replication
  – Hotspot avoidance
  – Improved robustness

• Route caching
  – Fewer LAN hops

Thanks.

(backup) Storage utilization

Redmond          Home-store   Directory
Total            97641 MB     61652 MB
Mean per-node    2.6 MB       1.6 MB
Max per-node     1664 MB      1664 MB

(backup) Fault tolerance

When a single Squirrel node crashes, the fraction of lost cached content is:

             Home-store                      Directory
Equations    Mean: H/O, Max: H_max/O         Mean: (H+S)/O, Max: max(H_max, S_max)/O
Redmond      Mean 0.0027%, Max 0.0048%       Mean 0.198%, Max 1.5%
Cambridge    Mean 0.95%, Max 3.34%           Mean 1.68%, Max 12.4%
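Rendered as formulas: the slide does not define the symbols, so reading H as the content homed at the crashed node, S as the content it stores as a client, the max subscripts as per-node maxima, and O as the total cached content is our assumption.

\[
\text{Home-store:}\quad \text{mean loss} = \frac{H}{O}, \qquad \text{max loss} = \frac{H_{\max}}{O}
\]
\[
\text{Directory:}\quad \text{mean loss} = \frac{H+S}{O}, \qquad \text{max loss} = \frac{\max(H_{\max},\, S_{\max})}{O}
\]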

(backup) Full home-store protocol

[Protocol diagram: (1) the client routes its request over the LAN, via other nodes, to the home node; (a) the home node answers with the object or a not-modified response from its own cache; (b) otherwise it sends the request (2) over the WAN to the origin server and (3) relays the object or not-modified response from the origin back to the client.]
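A pseudocode reading of the diagram above; this is a sketch in which the freshness test, the conditional-GET handling, and all names are our own assumptions.

def home_store_request(url, home_cache, fetch_from_origin):
    """Home-store protocol sketch:
    (1) the client routes its request over the LAN to the home node;
    (a) a fresh cached copy (or a not-modified answer) is returned by the home node;
    (b) otherwise the home node contacts the origin server over the WAN (2),
        receives the object or not-modified (3), and relays it to the client."""
    entry = home_cache.get(url)
    if entry is not None and entry["fresh"]:
        return entry["object"]                        # case (a)
    obj = fetch_from_origin(url)                      # case (b), steps (2) and (3)
    home_cache[url] = {"object": obj, "fresh": True}  # home node caches the result
    return obj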

(backup) Full directory protocol

[Protocol diagram: the client routes its request via other nodes to the home node. Labelled cases include: (a, d) no directory entry, so the request goes to the origin server; (b) a not-modified answer; (c, e) the home node delegates the request to a client chosen from the directory, which returns the object, with case (e) involving a cGET request toward the origin server.]

(backup) Peer-to-peer Computing

Decentralize a distributed protocol:
– Scalable

– Self-organizing

– Fault tolerant

– Load balanced

Not automatic!!

Decentralized Web Cache

[Diagram: browsers with local caches on the LAN; their combined caches serve requests before they cross the Internet to the web server.]

Challenge

Decentralized web caching algorithm:

Need to achieve those benefits in practice!

Need to keep overhead unnoticeably low.

Node failures should not become significant.

Peer-to-peer routing, e.g., Pastry

Peer-to-peer object location and routing substrate = Distributed Hash Table.

Reliably maps an object key to a live node.

Routes in log16(N) steps

(e.g. 3-4 steps for 100,000 nodes)
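As a quick sanity check of that figure (our own arithmetic, not from the slide):

import math

# Pastry's routing bound is about log_16(N) hops; for N = 100,000 nodes:
print(round(math.log(100_000, 16), 2))  # 4.15 -- measured routes are typically a
                                        # bit shorter, hence the 3-4 steps above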

Home-store is better!

Simpler home-store scheme achieves load balancing by hash function randomization.

Directory scheme implicitly relies on access patterns for load distribution.

Directory scheme seems better…

Avoids storing unnecessary copies of objects.

Rapidly changing directory for popular objects results in load balancing.

Interesting difference

Consider:
– Web page with many images, or
– Heavily browsing node

Directory: many pointers to some node.

Home-store: natural load balancing.

Evaluate …

Fault tolerance

When a single Squirrel node crashes, the fraction of lost cached content is:

             Home-store                      Directory
Redmond      Mean 0.0027%, Max 0.0048%       Mean 0.2%, Max 1.5%
Cambridge    Mean 0.95%, Max 3.34%           Mean 1.7%, Max 12.4%
