Squirrel: A peer-to-peer web cache Sitaram Iyer (Rice University) Joint work with Ant Rowstron (MSR Cambridge) Peter Druschel (Rice University) PODC 2002 / Sitaram Iyer / Tuesday July 23 / Monterey, CA


Page 2:

Web Caching

Reduces:

1. Latency

2. External traffic

3. Load on web servers and routers

Deployed at: Corporate network boundaries, ISPs, Web Servers, etc.

Page 3:

Centralized Web Cache

[Diagram: clients on the corporate LAN, each with a browser and browser cache, reach the web server on the Internet through a single web cache.]

Page 4:

Cooperative Web Cache

[Diagram: clients on the corporate LAN, each with a browser and browser cache, reach the web server on the Internet through several cooperating web caches.]

Page 5:

Decentralized Web Cache

[Diagram: clients on the corporate LAN, each with a browser and browser cache, and the web server on the Internet. The client caches are labelled Squirrel; no dedicated web cache is shown.]

Page 6:

Distributed Hash Table

Peer-to-peer location service: Pastry

• Completely decentralized and self-organizing

• Fault-tolerant, scalable, efficient

Operations:

• Insert(k,v)

• Lookup(k)

[Diagram: <key,value> pairs (k1,v1) through (k6,v6) spread across the nodes of the peer-to-peer routing and location substrate.]
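The Insert(k,v) and Lookup(k) operations can be illustrated with a minimal single-process sketch. It is not Pastry: there is no network routing, and the class name `ToyDHT`, the node names, and the use of SHA-1 are assumptions made for illustration. The only property borrowed from Pastry is that a key is handled by the node whose id is numerically closest to the key.

```python
import hashlib


class ToyDHT:
    """Toy stand-in for a DHT: every node lives in this one process (hypothetical)."""

    ID_BITS = 128  # size of the circular id space

    def __init__(self, node_names):
        self.space = 1 << self.ID_BITS
        # Map each node id to that node's private <key,value> store.
        self.nodes = {self._hash(name): {} for name in node_names}

    def _hash(self, text):
        # Hash a string into the id space (SHA-1 output reduced to ID_BITS).
        digest = hashlib.sha1(text.encode("utf-8")).digest()
        return int.from_bytes(digest, "big") % self.space

    def _responsible_node(self, key_id):
        # The node whose id is numerically closest to the key on the ring.
        def ring_distance(node_id):
            d = abs(node_id - key_id)
            return min(d, self.space - d)

        return min(self.nodes, key=ring_distance)

    def insert(self, k, v):
        key_id = self._hash(k)
        self.nodes[self._responsible_node(key_id)][key_id] = v

    def lookup(self, k):
        key_id = self._hash(k)
        return self.nodes[self._responsible_node(key_id)].get(key_id)


dht = ToyDHT(["node-a", "node-b", "node-c"])
dht.insert("k1", "v1")
print(dht.lookup("k1"))
```

Because both operations hash the key the same way, any participant can find the responsible node without a central index.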

Page 7:

Why peer-to-peer?

1. Cost of dedicated web cache: no additional hardware

2. Administrative effort: self-organizing network

3. Scaling implies upgrading: resources grow with clients

Page 8:

Setting

• Corporate LAN

• 100 - 100,000 desktop machines

• Located in a single building or campus

• Each node runs an instance of Squirrel

• Sets it as the browser’s proxy

Page 9:

Mapping Squirrel onto Pastry

Two approaches:

• Home-store

• Directory

Page 10:

Home-store model

[Diagram: a client and a home node on the LAN, with the Internet beyond; the URL hash determines the home node.]

Page 11:

Home-store model

[Diagram: client and home node.]

…that’s how it works!
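A minimal sketch of the home-store request path, under several assumptions: the home node is picked by a simple modulo over the URL hash (a stand-in for Pastry routing), freshness checks and cache eviction are left out, and the names `HomeStoreNode`, `home_node_for`, `home_store_request`, and `fetch_from_origin` are hypothetical.

```python
import hashlib


class HomeStoreNode:
    def __init__(self, name):
        self.name = name
        self.local_cache = {}  # objects this node requested as a client
        self.home_store = {}   # objects this node keeps in its role as home node


def home_node_for(url, nodes):
    # Stand-in for Pastry: map the URL hash onto one of the nodes.
    url_hash = int.from_bytes(hashlib.sha1(url.encode("utf-8")).digest(), "big")
    return nodes[url_hash % len(nodes)]


def home_store_request(client, url, nodes, fetch_from_origin):
    """Return (object, source), where source is 'local', 'home' or 'origin'."""
    if url in client.local_cache:
        return client.local_cache[url], "local"
    home = home_node_for(url, nodes)
    if url in home.home_store:
        obj, source = home.home_store[url], "home"
    else:
        # Miss at the home node: fetch from the origin server and keep a copy.
        obj, source = fetch_from_origin(url), "origin"
        home.home_store[url] = obj
    client.local_cache[url] = obj  # the client also caches the object locally
    return obj, source


nodes = [HomeStoreNode(f"n{i}") for i in range(8)]
url = "http://example.com/index.html"
print(home_store_request(nodes[0], url, nodes, lambda u: "body of " + u)[1])
print(home_store_request(nodes[1], url, nodes, lambda u: "body of " + u)[1])
```

In this sketch only the first request for a URL reaches the origin server; a later request from a different client is answered by the home node.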

Page 12:

Directory model

Client nodes always cache objects locally.

Home-store: home node also stores objects.

Directory: the home node only stores pointers to recent clients, and forwards requests.

Page 13:

Directory model

[Diagram: a client and a home node on the LAN, with the Internet beyond.]

Page 14:

Directory model

[Diagram: client and home node. The home node randomly chooses an entry from its table.]
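A minimal sketch of the directory request path, again with assumptions spelled out: the home node is chosen by a modulo over the URL hash rather than by Pastry, cached objects are never evicted or revalidated, and `DirectoryNode`, `directory_request`, `fetch_from_origin`, and the `max_pointers` limit are hypothetical names and parameters, not taken from the slides.

```python
import hashlib
import random
from collections import deque


class DirectoryNode:
    def __init__(self, name):
        self.name = name
        self.local_cache = {}  # objects this node requested as a client
        self.directory = {}    # url -> pointers to recent clients (home role)


def home_node_for(url, nodes):
    # Stand-in for Pastry: map the URL hash onto one of the nodes.
    url_hash = int.from_bytes(hashlib.sha1(url.encode("utf-8")).digest(), "big")
    return nodes[url_hash % len(nodes)]


def directory_request(client, url, nodes, fetch_from_origin,
                      rng=random, max_pointers=4):
    """Return (object, source); max_pointers is an assumed directory size."""
    if url in client.local_cache:
        return client.local_cache[url], "local"
    home = home_node_for(url, nodes)
    pointers = home.directory.setdefault(url, deque(maxlen=max_pointers))
    if pointers:
        # Forward to a randomly chosen recent client, which serves the object.
        delegate = rng.choice(list(pointers))
        obj, source = delegate.local_cache[url], "delegate:" + delegate.name
    else:
        # No directory entry yet: the object comes from the origin server.
        obj, source = fetch_from_origin(url), "origin"
    client.local_cache[url] = obj  # objects are stored at clients only
    pointers.append(client)        # the home node keeps just a pointer
    return obj, source


nodes = [DirectoryNode(f"n{i}") for i in range(8)]
url = "http://example.com/logo.png"
print(directory_request(nodes[0], url, nodes, lambda u: "body of " + u)[1])
print(directory_request(nodes[1], url, nodes, lambda u: "body of " + u)[1])
```

The home node never holds the object itself, which is why the load on a client grows with the number of directories that point to it.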

Page 15:

Directory: Advantages

Avoids storing unnecessary copies of objects.

Rapidly changing directory for popular objects seems to improve load balancing.

Home-store scheme can incur hotspots.

Page 16:

Directory: Disadvantages

Cache insertion only happens at clients, so:

• active clients store all the popular objects,

• inactive clients waste most of their storage.

Implications:

1. Reduced cache size.

2. Load imbalance.

Page 17:

Directory: Load spike example

• Web page with many embedded images, or

• Periods of heavy browsing.

Many home nodes point to such clients!

Evaluate …

Page 18:

Trace characteristics

Microsoft in:                  Redmond          Cambridge
Total duration                 1 day            31 days
Number of clients              36,782           105
Number of HTTP requests        16.41 million    0.971 million
Peak request rate              606 req/sec      186 req/sec
Number of objects              5.13 million     0.469 million
Number of cacheable objects    2.56 million     0.226 million
Mean cacheable object reuse    5.4 times        3.22 times

Page 19:

Total external traffic (Redmond)

[Plot: total external traffic in GB (lower is better) against per-node cache size in MB (0.001 to 100), with curves for Directory, Home-store, No web cache, and Centralized cache.]

Page 20:

Total external traffic (Cambridge)

[Plot: total external traffic in GB (lower is better) against per-node cache size in MB (0.001 to 100), with curves for Directory, Home-store, No web cache, and Centralized cache.]

Page 21:

LAN Hops (Redmond)

[Plot: % of cacheable requests against total hops within the LAN (0 to 6), for Centralized, Home-store, and Directory.]

Page 22:

LAN Hops (Cambridge)

[Plot: % of cacheable requests against total hops within the LAN (0 to 5), for Centralized, Home-store, and Directory.]

Page 23:

Load in requests per sec (Redmond)

[Plot: number of times observed against max objects served per-node / second (0 to 50), for Home-store and Directory.]

Page 24:

Load in requests per sec (Cambridge)

[Plot: number of times observed against max objects served per-node / second (0 to 50), for Home-store and Directory.]

Page 25:

Load in requests per min (Redmond)

[Plot: number of times observed against max objects served per-node / minute (0 to 350), for Home-store and Directory.]

Page 26:

Load in requests per min (Cambridge)

[Plot: number of times observed against max objects served per-node / minute (0 to 120), for Home-store and Directory.]

Page 27:

Fault tolerance

Sudden node failures result in partial loss of cached content.

Home-store: Proportional to failed nodes.

Directory: More vulnerable.
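The "proportional to failed nodes" claim can be illustrated with a small calculation. This is a sketch under a simplifying assumption: only the objects stored on the failed nodes count as lost, so it does not model the extra loss that makes the directory scheme more vulnerable. The function name `lost_fraction` and the per-node counts are made up for the example.

```python
def lost_fraction(objects_per_node, failed_nodes):
    """Fraction of all cached objects that sat on the failed nodes."""
    total = sum(objects_per_node.values())
    lost = sum(objects_per_node[name] for name in failed_nodes)
    return lost / total if total else 0.0


# Evenly spread storage: losing 1 of 100 nodes loses 1% of the content.
even = {f"n{i}": 10 for i in range(100)}
print(lost_fraction(even, ["n0"]))

# Skewed storage: the same single failure can lose far more.
skewed = dict(even, n0=1010)
print(lost_fraction(skewed, ["n0"]))
```

When storage is spread evenly, the loss tracks the share of failed nodes; the worst case depends on how much the most heavily loaded node holds.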

Page 28:

Fault tolerance

If 1% of Squirrel nodes abruptly crash, the fraction of lost cached content is:

             Home-store    Directory
Redmond      Mean 1%       Mean 1.71%
             Max 1.77%     Max 19.3%
Cambridge    Mean 1%       Mean 1.65%
             Max 3.52%     Max 9.8%

Page 29:

Conclusions

• Possible to decentralize web caching.

• Performance comparable to a centralized web cache,

• Is better in terms of cost, scalability, and administration effort, and

• Under our assumptions, the home-store scheme is superior to the directory scheme.

Page 30:

Other aspects of Squirrel

• Adaptive replication

  – Hotspot avoidance

  – Improved robustness

• Route caching

  – Fewer LAN hops

Page 31:

Thanks.

Page 32:

(backup) Storage utilization

Redmond          Home-store    Directory
Total            97641 MB      61652 MB
Mean per-node    2.6 MB        1.6 MB
Max per-node     1664 MB       1664 MB

Page 33:

(backup) Fault tolerance

             Home-store      Directory
Equations    Mean H/O        Mean (H+S)/O
             Max Hmax/O      Max max(Hmax,Smax)/O
Redmond      Mean 0.0027%    Mean 0.198%
             Max 0.0048%     Max 1.5%
Cambridge    Mean 0.95%      Mean 1.68%
             Max 3.34%       Max 12.4%

Page 34:

(backup) Full home-store protocol

[Diagram: the client's req travels over the LAN, via other nodes, to the home node. a: object or notmod from home. b: the home sends a req over the WAN to the origin server; object or notmod from origin.]

Page 35:

(backup) Full directory protocol

[Diagram: message flow among the client, other nodes, the home node (dir), a delegate, and the origin server. Recoverable labels: a: no dir, go to origin (also d); a, d: req; b: not-modified; c, e: req; c, e: object; e: cGET req; object or not-modified.]

Page 36:

(backup) Peer-to-peer Computing

Decentralize a distributed protocol:

– Scalable

– Self-organizing

– Fault tolerant

– Load balanced

Not automatic!!

Page 37:

Decentralized Web Cache

[Diagram: browsers and browser caches on the LAN, and the web server on the Internet.]

Page 38:

Challenge

Decentralized web caching algorithm:

Need to achieve those benefits in practice!

Need to keep overhead unnoticeably low.

Node failures should not become significant.

Page 39:

Peer-to-peer routing, e.g., Pastry

Peer-to-peer object location and routing substrate = Distributed Hash Table.

Reliably maps an object key to a live node.

Routes in log16(N) steps

(e.g. 3-4 steps for 100,000 nodes)
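As a rough check of the log16(N) figure, the logarithm can be evaluated for the deployment sizes mentioned earlier (100 and 100,000 machines). This takes the formula literally; the function name `log16_steps` is an assumption, and the actual average hop count depends on details of Pastry not covered in these slides.

```python
import math


def log16_steps(num_nodes, base=16):
    """Literal evaluation of log_base(N); not a measured hop count."""
    return math.log(num_nodes, base)


for n in (100, 100_000):
    print(f"N = {n}: log16(N) = {log16_steps(n):.2f}")
```

The values come out at roughly 1.66 for 100 nodes and 4.15 for 100,000 nodes, in the same range as the 3-4 steps quoted on the slide.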

Page 40:

Home-store is better!

Simpler home-store scheme achieves load balancing by hash function randomization.

Directory scheme implicitly relies on access patterns for load distribution.

Page 41:

Directory scheme seems better…

Avoids storing unnecessary copies of objects.

Rapidly changing directory for popular objects results in load balancing.

Page 42:

Interesting difference

Consider:

– Web page with many images, or

– Heavily browsing node

Directory: many pointers to some node.

Home-store: natural load balancing.

Evaluate …

Page 43:

Fault tolerance

When a single Squirrel node crashes, the fraction of lost cached content is:

             Home-store      Directory
Redmond      Mean 0.0027%    Mean 0.2%
             Max 0.0048%     Max 1.5%
Cambridge    Mean 0.95%      Mean 1.7%
             Max 3.34%       Max 12.4%