
A Survey of Web Caching Schemes for the Internet



Jia Wang

Agenda

  • The World Wide Web
  • Problem and solution (caching)
  • Proxy servers
  • Advantages of web caching
  • Disadvantages of web caching
  • Elements of a WWW caching system
  • Desirable properties of a WWW caching system
  • Problems in designing caching systems for the WWW
  • Caching architecture

The World Wide Web

  • The WWW can be viewed as a large distributed information system.
  • Its size is growing exponentially.
  • In May 1999 it included 600 million static web pages.
  • It grows by 15% per month.
  • It is very popular.

[Figure: size of distinct static web pages. Vertical axis from 0 to 600; data points at Jun-97, Nov-97, Mar-98, and May-99.]

The World Wide Web (cont.)

  • Usage is relatively inexpensive.
  • Accessing information is very fast.
  • Documents appeal to a wide range of interests.

But...

  • Network congestion
  • Server overloading

Problem

  • Internet backbone capacity increases by 60% per year.
  • Bandwidth is not growing fast enough to keep up with the Web.
  • Without a solution, the WWW will become too congested and will lose its entire appeal.
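
The two growth figures quoted on these slides can be compared with a short calculation. This is only a sketch: it assumes that the 15% monthly growth compounds, and it treats the number of pages as a rough stand-in for demand, which the slides do not state. Under those assumptions the Web grows by a factor of roughly 5.35 per year, against 1.6 for the backbone.

```python
# Growth figures quoted on the slides; everything else is an assumption.
WEB_GROWTH_PER_MONTH = 0.15       # "grows by 15% per month"
BACKBONE_GROWTH_PER_YEAR = 0.60   # "backbone capacity increases by 60% per year"


def yearly_factor_from_monthly(rate_per_month: float) -> float:
    """Compound a monthly growth rate over 12 months."""
    return (1.0 + rate_per_month) ** 12


web_factor = yearly_factor_from_monthly(WEB_GROWTH_PER_MONTH)
backbone_factor = 1.0 + BACKBONE_GROWTH_PER_YEAR

print(f"Web size grows by a factor of {web_factor:.2f} per year")
print(f"Backbone capacity grows by a factor of {backbone_factor:.2f} per year")
print(f"Gap after one year: {web_factor / backbone_factor:.2f}x")
```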

Solution

Caching: placing popular objects at locations close to the clients.

Proxy servers

  • HTTP servers that organizations run, typically for security reasons.
  • They sit at the bottleneck of the connection between the clients and the Internet.
  • They are shared by all clients inside the firewall.

[Figure: clients within the firewall connect to a proxy on the firewall machine, which connects to several HTTP servers.]

Proxy servers (cont.)

  • Clients that belong to the same organization share common interests, so they probably access the same set of documents.
  • Thus a document that was previously requested and cached on the proxy server is likely to produce future hits.

Caching the most popular web pages on the proxy server can:

  • Save network bandwidth
  • Lower access latency for the client

Advantages of web caching

  • Reduces bandwidth consumption
  • Decreases network traffic
  • Lessens network congestion
  • Reduces access latency:
      • frequently used documents are cached nearby
      • less traffic means shorter delays even for documents that are not cached
  • Reduces the workload of the remote server
  • Data can be accessed when the remote server is down (enhanced robustness)
  • Allows analysis of the organization's usage patterns
  • Cooperation between caches increases efficiency

Disadvantages of web caching

  • Cached data is not updated automatically, so clients may see stale copies.
  • A cache miss can increase latency because of the extra proxy processing.
  • Bottleneck effect: the number of clients per proxy must be limited.
  • A single proxy is a single point of failure.
  • Information providers cannot monitor the number of visits per site.
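
The trade-off between lower latency on hits and extra proxy processing on misses can be made concrete with an expected-latency formula. The sketch below is illustrative only: the function names and the millisecond values are hypothetical, and the model assumes that a miss costs the full origin fetch plus a fixed proxy overhead.

```python
def expected_latency(hit_ratio: float, t_hit: float,
                     t_origin: float, t_overhead: float) -> float:
    """Mean latency through a caching proxy.

    A hit is served by the proxy in t_hit; a miss pays the origin
    fetch (t_origin) plus the extra proxy processing (t_overhead).
    """
    return hit_ratio * t_hit + (1.0 - hit_ratio) * (t_origin + t_overhead)


def break_even_hit_ratio(t_hit: float, t_origin: float,
                         t_overhead: float) -> float:
    """Hit ratio at which the proxy matches direct access (t_origin).

    Solves h*t_hit + (1-h)*(t_origin + t_overhead) = t_origin for h.
    """
    return t_overhead / (t_origin + t_overhead - t_hit)


# Hypothetical timings in milliseconds, chosen only for illustration.
T_HIT, T_ORIGIN, T_OVERHEAD = 20.0, 200.0, 10.0

print(expected_latency(0.4, T_HIT, T_ORIGIN, T_OVERHEAD))
print(break_even_hit_ratio(T_HIT, T_ORIGIN, T_OVERHEAD))
```

Below the break-even hit ratio the proxy makes the average request slower than direct access; above it, the proxy lowers the average latency.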

Elements of a WWW caching system

Documents can be cached at the clients, the proxies, and the servers.

[Figure: groups of clients connect to proxy servers, the proxy servers cooperate with one another, and the proxies connect to the web servers.]

Request flow:

  1. The client sends a request.
  2. Does the proxy have the requested page? If yes, the proxy returns it.
  3. If no, do the cooperating proxies have the page? If yes, it is fetched from one of them.
  4. If no, the page is fetched from the web server.
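
The lookup order on this slide (own cache, then cooperating proxies, then the web server) can be sketched as follows. The `Proxy` class and its method names are hypothetical, and keeping a local copy of a page obtained from a peer or from the server is an assumption of this sketch rather than something the slide specifies.

```python
from typing import Callable, Dict, List, Tuple


class Proxy:
    """Hypothetical caching proxy that can ask cooperating proxies for a page."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.cache: Dict[str, bytes] = {}
        self.peers: List["Proxy"] = []

    def fetch(self, url: str,
              fetch_from_server: Callable[[str], bytes]) -> Tuple[bytes, str]:
        """Return the page body and a label saying where it was found."""
        # Step 1: does this proxy have the requested page?
        body = self.cache.get(url)
        if body is not None:
            return body, "local"

        # Step 2: does a cooperating proxy have it?
        for peer in self.peers:
            body = peer.cache.get(url)
            if body is not None:
                self.cache[url] = body  # assumption: keep a local copy
                return body, f"peer:{peer.name}"

        # Step 3: fall back to the web server.
        body = fetch_from_server(url)
        self.cache[url] = body
        return body, "server"
```

A real deployment would also have to handle timeouts, uncacheable responses, and stale copies, none of which are modeled here.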

Desirable properties of a WWW caching system

  • Fast access
  • Robustness
  • Transparency
  • Scalability
  • Efficiency
  • Adaptivity
  • Stability
  • Load balancing
  • Ability to deal with heterogeneity
  • Simplicity

Fast access

  • Reduce web access latency to a minimum, especially compared with servers that do not use caching techniques.

Robustness

  • Robustness means availability to the user.
  • Eliminate single points of failure.
  • In case of failure, degrade gracefully.
  • Make recovery from failure easy.

Transparency

  • The caching system should be transparent to the user.
  • The user should notice only a faster response and higher availability.

Scalability

  • Scale well with the increasing size and density of the network.
  • All protocols should be as lightweight as possible.

Efficiency

  • Impose minimal additional burden on the network (in control and data packets).
  • Do not adopt any scheme that leads to under-utilization of the network.

Adaptivity

  • Adapt to dynamic changes in user demand and in the network environment.
  • Achieve optimal performance.

Stability

  • Do not introduce instabilities into the network.

Load balancing

  • Distribute the load evenly through the entire network.
  • Avoid bottlenecks and hot spots.

Ability to deal with heterogeneity

  • Adapt to a range of network architectures (hardware and software).

Simplicity

  • The mechanism should be simple to deploy.
  • Simpler schemes are easier to implement and more likely to be accepted as international standards.

What problems do we face in designing caching systems for the WWW?

Problems in designing caching systems for the WWW

  • Caching system architecture: how cache proxies are organized (hierarchical, distributed, or hybrid).
  • Proxy placement: where to place a cache proxy in order to optimize performance.
  • Caching contents: what can be cached in the caching system.
  • Proxy cooperation: how proxies cooperate with each other.
  • Data sharing: what kind of data or information can be shared among cooperating proxies.
  • Cache resolution/routing: how a proxy decides where to fetch a page requested by a client.
  • Prefetching: how a proxy decides what and when to prefetch from web servers or other proxies to reduce access latency.
  • Cache placement/replacement: how the proxy decides which pages to store in its cache and which pages to remove from it.
  • Cache coherency: how a proxy maintains data consistency.
  • Control information distribution: how control information (e.g., URLs) is distributed among proxies.
  • Dynamic data caching: how to deal with data that is not cacheable.
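
To make the cache placement/replacement problem concrete, here is a minimal sketch of one well-known replacement policy, least recently used (LRU). The slides do not name a specific policy, so LRU is chosen here purely as an illustration. The `LRUCache` class is hypothetical, and its capacity is counted in pages, whereas a real proxy would more likely bound bytes.

```python
from collections import OrderedDict
from typing import Optional


class LRUCache:
    """Illustrative page cache that evicts the least recently used page."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._pages: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, url: str) -> Optional[bytes]:
        """Return the cached page, or None on a miss."""
        if url not in self._pages:
            return None
        self._pages.move_to_end(url)  # mark as most recently used
        return self._pages[url]

    def put(self, url: str, body: bytes) -> None:
        """Store a page, evicting the least recently used one if full."""
        self._pages[url] = body
        self._pages.move_to_end(url)
        if len(self._pages) > self.capacity:
            self._pages.popitem(last=False)  # drop the oldest entry

    def __contains__(self, url: str) -> bool:
        return url in self._pages
```

Other policies weigh document size, fetch cost, or access frequency instead of recency alone; the structure of `get` and `put` stays the same while the eviction rule changes.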

Caching architecture

Hierarchical architecture

  • Caches are placed at multiple levels of the network: national, regional, institutional, and bottom.
  • The bottom level consists of the client/browser caches.

[Figure: a request that misses travels up the hierarchy (bottom, institutional, regional, national), with "web page not found" at each step.]

[Figure: after the web page is found, it travels back down the hierarchy (national, regional, institutional, bottom); each level forwards the page and leaves a copy.]
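
The two diagrams (a miss climbs the hierarchy, and the page leaves a copy at each level on the way back) can be sketched with a recursive lookup. The `CacheLevel` class is hypothetical, and treating the top of the hierarchy as the only level that contacts the origin server is an assumption of this sketch.

```python
from typing import Callable, Dict, Optional


class CacheLevel:
    """One level of a hypothetical cache hierarchy (e.g. institutional)."""

    def __init__(self, name: str, parent: Optional["CacheLevel"] = None) -> None:
        self.name = name
        self.parent = parent
        self.store: Dict[str, str] = {}

    def get(self, url: str, origin_fetch: Callable[[str], str]) -> str:
        # Hit at this level: answer immediately.
        if url in self.store:
            return self.store[url]
        # Miss: ask the next level up, or the origin server at the top.
        if self.parent is not None:
            body = self.parent.get(url, origin_fetch)
        else:
            body = origin_fetch(url)
        # Forward the page down and leave a copy at this level.
        self.store[url] = body
        return body
```

After one request from an institutional cache, the regional and national caches also hold the page, so a second institution under the same regional cache is served without contacting the origin server again.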

Hierarchical architecture (cont.)

Advantages:

  • Bandwidth efficient, especially when cache servers are slow.
  • Diffuses popular web pages efficiently toward the demand.

Disadvantages:

  • Cache servers need to be placed at key access points of the network, which requires coordination among caches.
  • Each level adds a delay.
  • High levels are bottlenecks.
  • Multiple copies of the same document are stored at different cache levels.

Distributed architecture

  • Caches exist at the bottom level only; there are no other intermediate caching levels.
  • Each cache server holds metadata about the data stored on other servers.
  • A hierarchy is used only to distribute information about the location of copies.
  • The actual documents are not copied.

Advantages:

  • Traffic flows through the low network levels, which are less congested.
  • No additional disk space is required at intermediate network levels.
  • Better load sharing.
  • More fault tolerant.

Disadvantages:

  • High connection times
  • Higher bandwidth usage
  • Administrative issues

Distributed architecture (cont.)

Examples:

  • ICP (Internet Cache Protocol, Harvest group): retrieves data from neighboring caches and parent caches.
  • CARP (Cache Array Routing Protocol): the URL space is divided among an array of caches, and each cache stores only the documents whose URLs hash to it.
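
The idea of dividing the URL space among an array of caches by hashing can be sketched as below. This is written in the spirit of CARP but is not the protocol itself: the scoring function (SHA-256 over the cache name and URL) and the names `pick_cache` and `_score` are assumptions made for illustration, not the hash functions that CARP specifies.

```python
import hashlib
from typing import Sequence


def _score(cache_name: str, url: str) -> int:
    """Combine a cache name and a URL into one deterministic score."""
    digest = hashlib.sha256(f"{cache_name}|{url}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def pick_cache(url: str, caches: Sequence[str]) -> str:
    """Return the cache responsible for this URL (highest score wins)."""
    if not caches:
        raise ValueError("at least one cache is required")
    return max(caches, key=lambda name: _score(name, url))


# Hypothetical array of caches.
ARRAY = ["cache-a", "cache-b", "cache-c"]
print(pick_cache("http://example.org/index.html", ARRAY))
```

Because each URL maps to exactly one cache, no document is stored twice in the array. With highest-score selection, removing a cache only reassigns the URLs that cache owned.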

Hybrid architecture

  • Caches may cooperate with other caches at the same level or at a higher level using distributed caching.
  • ICP is an example: the document is fetched from the parent or neighbor cache that has the lowest RTT.
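
The selection rule on this slide (fetch from the parent or neighbor with the lowest RTT) can be sketched as a choice over query replies. The `Reply` record and the `choose_source` function are hypothetical and do not reproduce ICP message formats; the sketch assumes that each queried cache reports whether it holds the document and that its round-trip time has been measured.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Reply:
    """Hypothetical reply from a parent or neighbor cache."""
    cache: str      # name of the replying cache
    hit: bool       # True if that cache holds the document
    rtt_ms: float   # measured round-trip time in milliseconds


def choose_source(replies: Iterable[Reply]) -> Optional[str]:
    """Pick the cache with the lowest RTT among those reporting a hit."""
    hits = [r for r in replies if r.hit]
    if not hits:
        return None  # no cooperating cache has it; go to the web server
    return min(hits, key=lambda r: r.rtt_ms).cache
```

Returning `None` when nobody reports a hit leaves the fallback decision to the caller.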

Performance of architectures

  • Hierarchical caching has shorter connection times than distributed caching.
  • Additional copies at intermediate levels reduce retrieval latency for small documents.
  • Distributed caching has shorter transmission times and higher bandwidth usage.
  • A “well configured” hybrid scheme can reduce both connection time and transmission time.