Design, Implementation and Evaluation of Differentiated Caching Services

Ying Lu, Tarek F. Abdelzaher and Avneesh Saxena
Department of Computer Science
University of Virginia
{ying, zaher, avneesh}@cs.virginia.edu

Abstract

With the dramatic explosion of online information, the Internet is undergoing a transition from a data communication infrastructure to a global information utility. PDAs, wireless phones, web-enabled vehicles, modem PCs and high-end workstations can be viewed as appliances that “plug in” to this utility for information. The increasing diversity of such appliances calls for an architecture for performance differentiation of information access. The key performance accelerator on the Internet is the caching and content distribution infrastructure. While many research efforts addressed performance differentiation in the network and on web servers, providing multiple levels of service in the caching system has received much less attention.

This paper has two main contributions. First, we describe, implement, and evaluate an architecture for differentiated content caching services as a key element of the Internet content distribution architecture. Second, we describe a control-theoretical approach that lays well-understood theoretical foundations for resource management to achieve performance differentiation in proxy caches. An experimental study using the Squid proxy cache shows that differentiated caching services provide significantly better performance to the premium content classes.

Keywords: Web Caching, Control Theory, Content Distribution, Differentiated Services, QoS.

The work reported in this paper was supported in part by NSF grants CCR-0093144, ANI-0105873, and CCR-0208769.
1 Introduction

The phenomenal growth of the Internet as an information source makes web content distribution and retrieval one of its most important applications today. Internet clients are becoming increasingly heterogeneous, ranging from high-end workstations to low-end PDAs. A corresponding heterogeneity is observed in Internet content. In the near future, a much greater diversification of clients and content is envisioned as traffic sensors, smart buildings, and various home appliances become web-enabled, representing new data sources and sinks of the information backbone. This trend toward heterogeneity calls for customizable content delivery architectures with a capability for performance differentiation.

In this paper, we design and implement a resource management architecture for web proxy caches that allows controlled hit rate differentiation among content classes. The desired relation between the hit rates of different content classes is enforced via per-class feedback control loops. The architecture separates policy from mechanism. While the policy describes how the hit rates of different content classes are related, the performance differentiation mechanism enforces that relation. Of particular interest, in this context, is the proportional hit rate differentiation model. Applying this model to caching, “tuning knobs” are provided to adjust the quality spacing between classes, independently of the class loads. The two unique features of the proportional differentiated service model [14, 13] are its guarantees of both predictable and controllable relative differentiation. It is predictable in the sense that the differentiation is consistent (i.e., higher classes are better, or at least no worse) regardless of variations in the class loads. It is controllable in that network operators are able to adjust the quality spacing between classes based on their selected criteria. While we illustrate the use of our performance differentiation architecture in the context of hit rate control, it is straightforward to extend it to directly control other performance metrics that depend on cache hit rate, such as average client-perceived page access latency.

A significantly novel aspect of this paper is that we use a control-theoretical approach for resource allocation to achieve the desired performance differentiation. Digital feedback control theory offers techniques for developing controllers that use feedback from measurements to adjust the controlled performance variable such that it reaches a given set point. This theory offers analytic guarantees on the convergence time of the resulting feedback control loop. It is the authors' belief that feedback control theory bears a significant promise for predictable performance control of computing systems operating in uncertain, unpredictable environments. By casting cache resource allocation as a controller design problem, we are able to leverage control theory to arrive at an allocation algorithm that converges to the desired performance differentiation in the shortest time in the presence of a very bursty, self-similar cache load.
The rest of this paper is organized as follows. Section 2 presents the case for differentiated caching services. Section 3 describes the architecture of a cache that supports service differentiation; a control-theoretical approach is proposed to achieve the desired distance between the performance levels of different classes. In Section 4, the implementation of this architecture on Squid, a very popular proxy cache in today's web infrastructure, is presented. Section 5 gives experimental evaluation results of our architecture, obtained from performance measurements on our modified Squid prototype. Section 6 discusses related work and Section 7 concludes the paper.
2 The Case for Differentiated Caching Services

While a significant amount of research went into implementing differentiated services at the network layer, the proliferation of application-layer components that affect client-perceived network performance, such as proxy caches and content distribution networks (CDNs), motivates investigating application-layer QoS. In this section, we present a case for differentiated caching services as a fundamental building block of an architecture for web performance differentiation. Our argument is based on three main premises. First, we show that storage in proxy caches is a scarce resource that requires careful allocation. Second, we argue for the importance of web proxy caches in providing performance improvements beyond those achievable by push-based content distribution networks. Third, we explain why there are inherently different returns for providing a given caching benefit to different content types. Thus, improved storage resource management calls for performance differentiation in proxy caches.

Let us first illustrate the scarcity of network storage relative to web workloads. As reported by AOL, the daily traffic on their proxy caches is in excess of 8 Terabytes of data. With a hit rate of 60%, common to AOL caches, the cache has to fetch Terabytes of new content a day. Similarly, the advent of content distribution networks that distribute documents on behalf of heavily accessed sites may require large storage sizes because they have a large actively accessed working set. It is therefore important to allocate storage resources appropriately such that the maximum perceived benefit is achieved.
Second, consider the argument for employing proxy caches in our storage resource allocation framework. Web proxy caching and CDNs are the key performance acceleration mechanisms in the web infrastructure. While demand-side (i.e., pull-based) proxy caches wait for surfers to request information, supply-side (i.e., push-based) proxies in CDNs let delivery organizations or content providers proactively push information closer to the users. Research [30, 23] indicates that the combination of the two mechanisms leads to better performance than either of them alone. Gadde et al. [17] used the Zipf-based caching model from Wolman et al. [36] to investigate the effectiveness of content distribution networks. They find that although supply-side caches in CDNs may yield good local hit rates, they contribute little to the overall effectiveness of the caching system as a whole when the populations served by the demand-side caches are reasonably large. These results are consistent with what Koletsou and Voelker [23] conclude in their paper. In [23], Koletsou and Voelker compare the speedups achieved by the NLANR proxy caches to those achieved by the Akamai content distribution servers. They found that the NLANR (pull-based) cache hierarchy served 63% of the HTTP requests in their workload at least as fast as the origin servers, resulting in a decrease of average latency of 15%. In contrast, while Akamai edge servers were able to serve HTTP requests an average of 5.7 times as fast as the origin servers, using Akamai reduced overall mean latency by only 2%, because requests to Akamai edge servers were only 6% of the total workload. The aforementioned research results indicate that demand-side web proxy caching remains the main contributor to reducing overall mean latency, while the supply-side proxies optimize only a small portion of the total content space. Hence, in this paper we present an architecture geared for pull-based proxy caches. Providing QoS control for push-based proxies in CDNs will be investigated in future work, where storage reallocation is triggered actively by the content providers' requirements instead of passively by the content consumers' requests. The proactive nature of storage reallocation introduces additional degrees of freedom in QoS management that are not explored in traditional proxy caching.
Finally, consider the argument for performance differentiation as a way to increase the global utility of the caching service. First, let us illustrate the end-users' perspective. It is easy to see that caching is more important to faster clients. If the performance bottleneck is in the backbone, caching has an important effect on reducing average user service time as the hit rate increases. Conversely, if the performance bottleneck is on the side of the client, user-perceived performance is not affected significantly by saving backbone trips when the cache hit rate is increased. An example where user-speed-motivated differentiation may be implementable is the preferential treatment of regular web content over wireless content. Web appliances such as PDAs and web-enabled wireless phones require new content types, for which a new language, the Wireless Markup Language (WML), was designed. Proxy caching will have a lower impact on the user-perceived performance of wireless clients, because saving backbone round-trips does not avoid the wireless bottleneck. Proxies that support performance differentiation can get away with a lower hit rate on WML traffic to give more resources to faster clients (which are more susceptible to network delays), thus optimizing aggregate resource usage. This is especially true of caches higher in the caching hierarchy, where multiple content types are likely to be intermixed.1

Another argument for differentiation is a content-centric one. In particular, it may be possible to improve client-perceived performance by caching the most “noticeable” content more often. It has been observed that different classes of web content contribute differently to the user's perception of network performance. For example, user-perceived performance depends more on the download latency of HTML pages than on the download latency of their dependent objects (such as images). This is because while a user has to wait explicitly for an HTML page to download, its embedded objects can be downloaded in the background, incurring less disruption to the user's session. Treating HTML text as a premium class in a cache would improve the experience of the clients for the same network load conditions and overall cache hit rate. Later, in the evaluation section, we show (by replaying real proxy cache traces) that a differentiated caching service can substantially decrease the average client wait time on HTML files at the expense of only a moderate increase in wait times for embedded objects.

1 Currently, ISP caches closest to the client are usually dedicated to one type of clients, e.g., “all wireless” or “all wired”.
Finally, a web proxy cache may choose to classify content by the identity of the requested URL. For instance, an ISP (such as AOL) can have agreements with preferred content providers or CDN service providers to give their sites better service for a negotiated price. Our architecture would enable such differentiation to take place, although there are better ways to achieve provider-centric differentiation, such as using a push-based approach. We conclude that there are important practical applications for differentiated caching services in pull-based proxy caches. This paper addresses this need by presenting a resource management framework and theoretical foundations for such differentiation.
3 A Differentiated Caching Services Architecture

In this section, we present our architecture for service differentiation among multiple classes of content cached in a proxy cache. Intuitively, if we assign more storage space to a class, its hit rate will increase, and the average response time of client accesses to this type of content will decrease.2 If we knew future access patterns, we could tell ahead of time the amount of disk space that needs to be allocated to each class to achieve its performance objectives. In the absence of such knowledge, we need a feedback mechanism to adjust space allocation based on the difference between actual system performance and desired performance. This feedback mechanism is depicted in Figure 1, which illustrates a feedback loop that controls the performance of a single class. One such loop is needed for each class.
In the figure, the reference (i.e., the desired performance level) for the class is determined by a service differentiation policy. Assume there are N content classes. To provide proportional hit rate differentiation, the policy should specify that the hit rates H_i of the N classes be related by the expression:

H_1 / c_1 = H_2 / c_2 = ... = H_N / c_N    (1)

2 This presumes that the request traffic on the cache is not enough to overload its CPU and I/O bandwidth.
Figure 1. The hit rate control loop.
where c_i is a constant weighting factor, representing the QoS specification for class i. To satisfy the above constraints, it is enough that the relative hit ratio of each class i, defined as R_i = H_i / sum_j H_j, be equal to the relative hit ratio computed from the specification (i.e., R_i^ref = c_i / sum_j c_j). Thus, the relative hit ratios are used as the performance metrics of the feedback control loop. In Figure 1, the actual system performance measured by the output sensor is the relative hit ratio R_i, which is compared with the reference R_i^ref; their difference, called the error E_i = R_i^ref - R_i, is used by the cache space controller to decide the space allocation adjustment online.

An appealing property of this model is that the aggregate performance error of the system is always zero, because:

sum_{i=1..N} E_i = sum_{i=1..N} (R_i^ref - R_i) = sum_{i=1..N} c_i / sum_{j=1..N} c_j - sum_{i=1..N} H_i / sum_{j=1..N} H_j = 1 - 1 = 0    (2)

As we show in the next section, this property allows us to develop resource allocation algorithms in which the resources of each class are heuristically adjusted independently of the adjustments of other classes, yet the total amount of allocated resources remains constant, equal to the total size of the cache.
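The zero-sum property of Equation (2) can be checked numerically. The following is an illustrative Python sketch, not part of the paper's implementation; the weights and hit rates are made-up values:

```python
def relative_ratios(values):
    """Normalize a list of non-negative values so they sum to 1."""
    total = sum(values)
    return [v / total for v in values]

c = [4.0, 2.0, 1.0]        # hypothetical QoS weights c_i (premium to basic)
H = [0.50, 0.35, 0.30]     # hypothetical measured per-class hit rates H_i

R_ref = relative_ratios(c)  # reference relative hit ratios c_i / sum_j c_j
R = relative_ratios(H)      # measured relative hit ratios H_i / sum_j H_j
E = [r_ref - r for r_ref, r in zip(R_ref, R)]  # per-class errors E_i

# Both ratio vectors are normalized to sum to 1, so the errors cancel.
print(E, sum(E))
```

Because both ratio vectors sum to one, the per-class errors cancel regardless of the actual hit rates, which is what lets each class be adjusted independently.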
Note that the success of the feedback loop in achieving its QoS goal is contingent on the feasibility of the specification. That is, the constraints stated by Equation (1) should be achievable. Assume the average hit rate of the unmodified cache is H. In general, when space is divided equally among the classes, the maximum multiplicative increase in space that any one class can get is upper-bounded by the number of classes N. It is well known that hit rate increases logarithmically with cache size [35, 4, 18, 7]. Thus, in a cache of total size S, the maximum increase in hit rate for the highest-priority class is upper-bounded by a term logarithmic in N. Consequently, if the relative hit ratio between the top and bottom classes is fixed by the specification, the hit rate achievable by the bottom class is upper-bounded as well. This gives some orientation for choosing the specification c_1, ..., c_N.
3.1 The Performance Differentiation Problem

We cast the proportional hit rate differentiation into a closed-loop control problem. Each content class i is assigned a certain amount of cache storage s_i, such that sum_{i=1..N} s_i = S, the total size of the cache. The objective of the system is to achieve the desired relative hit ratio. This objective is achieved using a resource allocation heuristic, which refers to the policy that adjusts the cache storage space allocation among the classes such that the desired relative hit ratio is reached. We need to show that (i) our resource allocation heuristic makes the system converge to the relative hit ratio specification, and that (ii) the convergence time is bounded by a finite constant that is a design parameter. To provide these guarantees, we rely on feedback control theory in designing the resource allocation heuristic. The heuristic is invoked at fixed time intervals, at which it corrects resource allocation based on the measured performance error. Let the measured performance error at the kth invocation of the heuristic be E_i(k). To compute the correction Delta s_i(k) in resource allocation, we choose a linear function f(E_i) such that f(0) = 0 (no correction unless there is an error). At the kth invocation, the heuristic computes:

Delta s_i(k) = f(E_i(k))    (3)
s_i(k) = s_i(k-1) + Delta s_i(k)    (4)

If the computed correction Delta s_i(k) is positive, the space allocated to class i is increased by |Delta s_i(k)|. Otherwise, it is decreased by that amount. Since the function f is linear, sum_i f(E_i(k)) = f(sum_i E_i(k)). From Equation (2), sum_i E_i(k) = 0. Thus, sum_i f(E_i(k)) = f(0) = 0. It follows that the sum of corrections across all classes is zero. This property is desirable since it ensures that while the resource adjustment can be computed independently for each class based on its own error E_i, the aggregate amount of allocated resources does not change after the adjustment and is always equal to the total size of the cache. Next, we show how to design the function f in a way that guarantees convergence of the cache to the specified performance differentiation within a single sampling period.
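The conservation argument above can be illustrated with a small sketch. Assuming a simple linear correction f(E) = K * E with an arbitrary gain K (all numbers below are hypothetical), the total allocation is unchanged whenever the errors sum to zero:

```python
K = 100.0                      # hypothetical linear gain: f(E) = K * E
s = [300.0, 300.0, 300.0]      # per-class space s_i (MB), summing to S = 900
E = [0.05, -0.02, -0.03]       # per-class errors; they sum to zero (Eq. 2)

delta = [K * e for e in E]     # Delta s_i(k) = f(E_i(k)), Eq. (3)
s = [si + d for si, d in zip(s, delta)]  # s_i(k) = s_i(k-1) + Delta s_i(k), Eq. (4)

# Because f is linear and the errors sum to zero, the corrections also
# sum to zero, so the total allocated space is still the cache size S.
print(s, sum(s))
```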
3.2 Control Loop Design

To design the function f, a mathematical model of the control loop is needed. The cache system is essentially nonlinear. We approximate it by a linear model in order to simplify the design of the control mechanism. Such linearization is a well-known technique in control theory that facilitates the analysis of non-linear problems. The relevant observation is that non-linear systems are well approximated by their linear counterparts in the neighborhood of linearization (e.g., the slope of a non-linear curve does not deviate too far from the curve itself in the neighborhood of the point at which the slope was taken). Observe that the feasibility of a control loop design based on a linear approximation of the system does not imply that cache behavior is linear. It merely signifies that the designed controller is robust enough to deal gracefully with any modeling errors introduced by this approximation. Such robustness, common to many control schemes, is one reason for the great popularity of linear control theory despite the predominantly non-linear nature of most realistic control loops.
Approximating the non-linear cache behavior, a change Delta s_i(k) in space allocation is assumed to result in a proportional change in the probability of a hit, Delta P_i(k) = G Delta s_i(k), where G is a constant gain. While we cannot measure the probability P_i(k) directly, we can infer it from the measured hit rate. The expected hit rate at the end of a sampling interval (where expectation is used in a mathematical sense) is determined by the space allocation and the resulting hit probability that took place at the beginning of the interval. Hence:

E[Delta H_i(k)] = G Delta s_i(k-1)    (5)

(and E[H_i(k)] = H_i(k-1) + E[Delta H_i(k)]).

Remember that the relative hit ratio (the controlled performance variable) is defined as R_i = H_i / sum_j H_j. Unfortunately, the measured H_i(k) might have a large standard deviation around the expected value unless the sampling period is sufficiently large. Thus, using H_i(k) for feedback to the controller would introduce a significant random noise component into the feedback loop. Instead, the measured H_i(k) is smoothed first using a low-pass filter. Let the smoothed hit rate be called W_i(k). It is computed as a moving average as follows:

W_i(k) = a W_i(k-1) + (1 - a) H_i(k)    (6)

In this computation, older values of the hit rate are exponentially attenuated with a factor a, where 0 <= a < 1. Values of a closer to 1 will increase the horizon over which H_i is averaged, and vice versa. The corresponding smoothed relative hit ratio is R_i(k) = W_i(k) / sum_j W_j(k). This value is compared to the set point R_i^ref for the class, and the error is used for space allocation adjustment in the next sampling interval, thereby closing the loop.
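The filter of Equation (6) is a standard exponentially weighted moving average. A minimal sketch (the sample values and the choice a = 0.7 are illustrative, not the paper's settings):

```python
def smooth(samples, a, w0=0.0):
    """Exponentially weighted moving average: w(k) = a*w(k-1) + (1-a)*h(k)."""
    w = w0
    out = []
    for h in samples:
        w = a * w + (1 - a) * h
        out.append(w)
    return out

noisy = [0.4, 0.8, 0.3, 0.7, 0.5]   # illustrative noisy per-interval hit rates
print(smooth(noisy, a=0.7))          # a closer to 1 -> longer averaging horizon
```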
Next, we take the z-transform of Equations (3), (4), (5), and (6) and draw a block diagram that describes the flow of signals in the hit rate control loop. The z-transform is a widely used technique in the digital control literature that transforms difference equations into equivalent algebraic equations that are easier to manipulate. Figure 2 depicts the control loop, showing the flow of signals and their mathematical relationships in the z-transform domain. The z-transform of the heuristic resource reallocation function f is denoted by F(z).

Figure 2. z-Transform of the control loop.

We can now derive the relation between R_i and R_i^ref. From Figure 2, E(z) = R^ref(z) - R(z), and R(z) = F(z) P(z) E(z), where P(z), the transfer function from the space correction Delta s_i to the smoothed relative hit ratio R_i, follows from the z-transforms of Equations (4), (5), and (6):

P(z) = G (1 - a) z / [ (z - 1)(z - a) sum_j W_j ]    (7)

Substituting for E(z), we get R(z) = F(z) P(z) (R^ref(z) - R(z)). Using simple algebraic manipulation:

R(z) = [ F(z) P(z) / (1 + F(z) P(z)) ] R^ref(z)    (8)

To design the allocation heuristic F(z), we specify the desired behavior of the closed loop, namely that R_i follows R_i^ref within one sampling time, or R_i(k) = R_i^ref(k-1). In the z-transform, this requirement translates to:

R(z) = z^{-1} R^ref(z)    (9)

Hence, from Equation (8) and Equation (9), we get the design equation F(z) P(z) / (1 + F(z) P(z)) = z^{-1}. Solving for F(z) and substituting for P(z) from Equation (7), we arrive at the z-transform of the desired heuristic function, namely:

F(z) = (z - a) sum_j W_j / [ G (1 - a) z ]    (10)

The corresponding time-domain heuristic is:

Delta s_i(k) = [ sum_j W_j(k) / (G (1 - a)) ] (E_i(k) - a E_i(k-1))    (11)

The above equation gives the adjustment of the disk space allocated to class i given the performance error E_i of that class and the aggregate sum_j W_j of the smoothed hit rates. The resulting closed loop is stable, because the closed-loop transfer function, z^{-1}, is stable and the open-loop transfer function does not contain unstable poles or zeros [33].
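To see how Equations (5), (6), and (11) fit together, the following sketch simulates the closed loop under the idealized linear plant model. All constants (the gain G, the filter factor a, the weights, and the initial hit rates) are illustrative assumptions; in the real system the plant is the cache itself, not this linear model:

```python
G = 0.001            # assumed hit-probability gain per unit of space
a = 0.5              # filter attenuation factor
c = [2.0, 1.0]       # QoS weights: class 0 should get twice the hit rate
H = [0.30, 0.30]     # current per-class hit rates (linear plant state)
W = list(H)          # smoothed hit rates, seeded with the current rates
R_ref = [ci / sum(c) for ci in c]   # set points c_i / sum_j c_j
E_prev = [0.0, 0.0]

for k in range(30):
    W = [a * wi + (1 - a) * hi for wi, hi in zip(W, H)]          # Eq. (6)
    R = [wi / sum(W) for wi in W]                                # relative ratios
    E = [rr - ri for rr, ri in zip(R_ref, R)]                    # errors
    gain = sum(W) / (G * (1 - a))
    delta_s = [gain * (e - a * ep) for e, ep in zip(E, E_prev)]  # Eq. (11)
    H = [hi + G * ds for hi, ds in zip(H, delta_s)]              # Eq. (5)
    E_prev = E

print([round(wi / sum(W), 3) for wi in W])   # smoothed relative hit ratios
```

Under this idealized model the loop settles at the reference ratios within a couple of sampling periods, consistent with the one-step convergence goal of Equation (9).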
4 Implementation of the Differentiation Heuristic in Squid

We modified Squid, a widely used real-world proxy cache, to validate and evaluate our QoS-based resource allocation architecture. Squid is an open-source, high-performance Internet proxy cache [12] that services HTTP requests on behalf of clients (browsers and other caches). It acts as an intermediary, accepting requests from clients and contacting web servers to service those requests. Squid maintains a cache of the documents that are requested, to avoid refetching them from the web server if another client makes the same request. The efficiency of the cache is measured by its hit rate: the rate at which valid requests can be satisfied without contacting the web server. The least-recently-used (LRU) replacement policy is used in Squid to evict from the cache objects that have not been used for the longest time. Squid maintains entries for all objects that currently reside in its cache. The entries are linked in the order of their last access times using a doubly-linked list. On getting a request for an object residing in the cache, the corresponding entry is moved to the top of the list; if the request is for an object not in the cache, a new entry is created for it. The LRU policy is implemented by manipulating the entries in this list.
To provide service differentiation, one possible implementation of our control algorithm in Squid would have been to create a linked list for each class, containing entries for all objects belonging to it. Freeing up disk space would have involved scanning the lists of all the classes that were over-using resources and releasing entries in their lists. The multiple-list implementation provides the most efficient solution; however, to minimize changes to Squid, we chose a single-list implementation that achieves the same goal. The single linked-list implementation simulates the multiple-list implementation: it links all the entries in a single list, and each entry contains a record of all the classes it belongs to. In our special case, where the different classes of content are non-overlapping, each entry belongs to a single class only. This allows us to use Squid's original implementation of the linked list without modifications.

In LRU, an entry is moved to the top of the list when accessed. With separate lists, the entry would be moved to the top of the list for the class it belongs to. In the single-list implementation, the entry is moved to the top of every other entry, which implies that it is moved to the top of all the other entries of its class. Hence, by moving an entry we preserve the order, i.e., the effect is the same as if we had separate lists for each class.
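The single-list bookkeeping can be sketched as follows (illustrative Python; Squid's actual implementation is in C and uses its own doubly-linked list):

```python
from collections import OrderedDict

class ClassTaggedLRU:
    """One global recency list whose entries carry a class tag. Moving an
    accessed entry to the top preserves per-class LRU order, so the list
    behaves as if each class had its own list."""

    def __init__(self):
        self.entries = OrderedDict()   # url -> class id; most recent last

    def touch(self, url, cls):
        """Record an access, moving (or inserting) the entry at the top."""
        if url in self.entries:
            self.entries.move_to_end(url)
        else:
            self.entries[url] = cls

    def lru_of_class(self, cls):
        """Least recently used entry of one class (scan from the bottom)."""
        for url, c in self.entries.items():
            if c == cls:
                return url
        return None

cache = ClassTaggedLRU()
cache.touch("/a.html", "html"); cache.touch("/b.gif", "img")
cache.touch("/c.html", "html"); cache.touch("/a.html", "html")
print(cache.lru_of_class("html"))  # "/c.html": /a.html was accessed again
```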
In Squid, a number of factors determine whether or not any given object can be removed. If the time since last access is less than the LRU threshold, the object will not be removed. In our implementation, we removed this threshold checking for two reasons. First, we want to implement LRU in a stricter sense and replace the object that is least recently used regardless of how old it is. Second, this helps us reduce the time required for running our experiments, as we do not have to wait for objects to expire. To have a fair evaluation, in Section 5 we compare our QoS Squid with the original Squid with no threshold checking.
Our implementation closely corresponds to the control loop design; we implemented five modules in the QoS cache: timer, output sensor, cache space controller, classifier and actuator. The timer sends signals to the output sensor and the cache space controller to let them update their outputs periodically. The classifier is responsible for request classification, and the actuator is in charge of cache space deallocation and allocation. Since Section 3 has detailed the functions of the cache space controller and the output sensor, we focus here on the other three modules.
Timer: In order to make the control loops work at a fixed time interval, we added a module to Squid that regulates the control loop execution frequency. Using the module, we can configure a parameter to let the loops execute periodically, for example, once every 30 seconds. That is, every sampling period, the output sensor measures the smoothed relative hit ratio and the cache space controller calculates the space allocations, which are then used to adjust the cache space assignments for the classes.

Classifier: This module is used to identify the requests of the various classes. On getting a request, this module is invoked and obtains the class of the request. The classification policy is application specific and should be easily configurable. We classify based on the requested site or the content type. In general, classification policies based on different criteria, such as the service provider or IP address, are possible. For example, an ISP might have separate IP blocks allocated to low-bandwidth wireless clients requesting WML documents and high-bandwidth ADSL clients requesting regular HTML content.
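A classifier of this kind might look as follows. This is a hypothetical sketch, not Squid code; the rule set, class names, and the preferred-site example are invented for illustration:

```python
def classify(url, site_rules=None, default="other"):
    """Map a request URL to a content class by site, then by URL suffix."""
    site_rules = site_rules or {}
    host = url.split("//", 1)[-1].split("/", 1)[0]
    if host in site_rules:                 # site-based rule takes precedence
        return site_rules[host]
    if url.endswith((".html", ".htm")):    # content-type-based rules
        return "html"
    if url.endswith((".gif", ".jpg", ".png")):
        return "image"
    if url.endswith(".wml"):
        return "wml"
    return default

# Hypothetical agreement with a preferred content provider (cf. Section 2).
rules = {"preferred.example.com": "premium"}
print(classify("http://preferred.example.com/index.html", rules))  # premium
print(classify("http://other.example.com/logo.gif", rules))        # image
```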
Actuator: As described in Section 3, at each sampling time the
cache space controller per-
forms the computation "!#%$ ,!- ( 9 $ 5 & "!#%$ and outputs the
new value of desired space "!#%$ for each class. In Squid, the
cache space deallocation and allocation are two separate
processes. The actuator uses the output of the controller to guide
the two processes. Let
$&# . ; # be a running counter of the actual amount of cache
space used by ! . In
the space deallocation process, the cache scans the entries from
the bottom of the LRU list.
Whenever an entry is scanned, the cache first determines which class it belongs to and then, according to the cache space assigned to the class at the time, decides whether to remove the entry. If the space currently occupied by the class is less than the desired cache space for the class (C_i < D_i), the entry is not removed; otherwise it is. Similarly, in the space allocation process, whenever a page is fetched from a web server, the cache decides whether to save it on disk based on which class requests the page and the current cache space of that class. If the occupied space exceeds the desired cache space for the class (C_i > D_i), the page is not saved; that is, we change the status of the page to non-cachable. In our current implementation, only when a page is saved on disk is the disk space it occupies counted as part of the cache space of the corresponding class. Ideally, as a result of the above enforcement of the desired cache allocation, the cache space C_i(k) occupied by each class i by the end of the kth sampling period should be exactly the desired value D_i(k) set at the beginning of the interval. In reality, a discrepancy may arise for at least two reasons. First, we may want to give one class more cache space while, during the sampling period, the class does not send enough requests to fill that much space with requested pages. Second, some pages that are waiting to be saved to disk have not yet been counted toward the cache space of their class. To remedy this problem, we include the difference D_i(k) − C_i(k), measured at the end of the sampling interval, in our computation of the desired cache space for the (k+1)th sampling period. That is, D_i(k+1) = D_i(k) + Δu_i(k+1) = C_i(k) + (D_i(k) − C_i(k)) + Δu_i(k+1), where Δu_i(k+1) is the space adjustment computed by the controller. From this formula, the difference between the old real space and the new desired space for the class is (D_i(k) − C_i(k)) + Δu_i(k+1). If this difference is positive, we want to give the class more cache space; otherwise, we want to release space from the class. We use a single actuator to realize both cache space deallocation and allocation for all the classes.
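To make the enforcement concrete, the following Python sketch illustrates the eviction decision, the admission decision, and the end-of-interval update of the desired space. The class and function names are our assumptions for illustration, not the paper's actual Squid code; D_i denotes the desired space and C_i the space currently counted for class i.

```python
# Illustrative sketch of the space actuator: eviction/admission checks
# against the desired share, plus the end-of-interval target update.

class ClassState:
    def __init__(self, desired):
        self.desired = desired   # D_i(k), in bytes
        self.occupied = 0        # C_i(k), in bytes

def may_evict(cls):
    # Replacement scan: remove the entry only if its class already holds
    # at least its desired share (C_i >= D_i).
    return cls.occupied >= cls.desired

def may_cache(cls):
    # Page fetch: save to disk only if the class is still below its
    # desired share (C_i < D_i); otherwise mark the page non-cachable.
    return cls.occupied < cls.desired

def update_desired(cls, delta_u):
    # End of sampling interval: D_i(k+1) = D_i(k) + delta_u, which folds
    # the residual D_i(k) - C_i(k) into the next period's target.
    cls.desired += delta_u
    return cls.desired
```

Keeping the update relative to D_i(k) rather than C_i(k) is what carries the unfilled (or not-yet-counted) residual space forward into the next period.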
5 Evaluation
We tested the performance of the feedback control architecture
using both synthetic and empirical
traces. We used a synthetic workload in Section 5.1 to show that our design makes the cache converge efficiently to the specified performance differentiation under representative cache load
conditions. In Section 5.2, we evaluated the practical impact of
our architecture from the user’s
perspective. To do so, we used one of the applications described in
Section 2; namely, improving
the hit rate on HTML content at the expense of embedded objects to
reduce user waiting times.
We chose this application for two reasons. First, HTML content is
known to be less cachable than
other content types. This is due to the inherent uniqueness of text
pages compared, for example,
with generic gif icons (which tend to appear on several web pages
simultaneously thus generating
a higher hit rate if cached). Consequently, improving the cache hit
rate of HTML is a more difficult
goal than improving the hit rate of other content mixes. The second
reason for choosing this appli-
cation is that content type (such as HTML, GIF, JPG) is explicitly
indicated in proxy cache traces.
Hence it is easy to assess the performance improvement due to
differentiated caching services by
inspecting existing cache traces. In Section 5.2, we re-ran those
traces against our instrumented
Squid prototype and experimentally measured the performance
improvement. We showed that the
average client wait time for HTML content can be substantially
reduced at the expense of only a
moderate increase in wait times for embedded objects.
5.1 Synthetic Trace Experiments
The first part of our experiments is concerned with testing the
efficacy of our control-theoretical
resource allocation heuristic. We verified that the heuristic
indeed achieved the desired relative
differentiation for a realistic load. The experiments were
conducted on a testbed of seven AMD-
based Linux PCs connected with 100Mbps Ethernet. The QoS web cache
and three Apache [16]
web servers were started on four of the machines. To emulate a
large number of real clients access-
ing the three web servers, we used three copies of Surge (Scalable
URL Reference Generator) [6]
running on different machines that sent URL requests to the cache.
The main advantage of Surge
is that it generates web references matching empirical measurements
of 1) server file size dis-
tribution; 2) request size distribution; 3) relative file
popularity; 4) embedded file references; 5)
temporal locality of reference; and 6) idle periods of individual
users. In this part of the experiments, the traffic was divided into three client classes based on their requested sites. We configured Surge such that the total traffic volumes of the three classes were identical and high enough to load the cache heavily. To test the performance of the cache under saturation, we configured the file population to be a multiple of the cache size. As mentioned in Section 3, in order to apply the proportional differentiation model in practice, we have to make feasible QoS specifications. Hence, we set the reference ratio accordingly in all the synthetic trace experiments. By analyzing the average hit rates of the undifferentiated cache, we concluded that the specification is feasible and should lead to better performance for the high-priority class with only a small sacrifice of the low-priority class performance.
To develop a base reference point against which our control-theoretical heuristic (Equation 11) could be compared, we first used a simple linear controller f(e_i) = K e_i in the control loop (Figure 3) to determine the best cache performance over all values of K. In this case, the system reacts to performance errors simply by adjusting space allocation by an amount proportional to the error, where K is the proportionality constant. Second, we implemented the control function (Equation 11) designed using the theoretical analysis in Section 3. By comparison, we found that the theoretically designed function produced better performance than the linear function with the best empirically found K, thus achieving the best convergence of the cache. In this context, by performance we mean the efficiency of convergence of the relative hit ratio to the desired differentiation. This convergence is expressed as the aggregate of the squared errors between the desired and actual relative hit ratios achieved for each class over the duration of the experiment. The smaller the aggregate error, the better the convergence.
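This convergence metric can be written as the sum of (W_i − R_i(k))² over classes i and sampling periods k, where W_i is the desired relative hit ratio and R_i(k) the measured one. A minimal sketch (the variable names are ours, not the paper's):

```python
# Aggregate squared error between the desired relative hit ratio W_i and
# the measured relative hit ratio R_i(k), over all classes and periods.

def aggregate_error(desired, measured):
    """desired: {cls: W_i}; measured: {cls: [R_i(k) per sampling period]}."""
    return sum((w - r) ** 2
               for cls, w in desired.items()
               for r in measured[cls])
```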
Figure 4 depicts the aggregate error for the proportional controller and our theoretically designed controller under the specified performance differentiation. The horizontal axis indicates the base-10 logarithm of the gain K of the proportional controller. The vertical axis is the sum of squared errors (W_i − R_i)^2, where R_i is the relative hit ratio, accumulated over all classes in 20 sampling periods (each sampling period is 30 seconds long). The smaller the sum, the better the convergence of the cache. We can see from the aggregate error plot in the figure that different values of K for the proportional control function f(e_i) = K e_i result in different convergence performance. In particular, small values of K are too sluggish
in adjusting space allocation, resulting in slow convergence and a large aggregate error. Similarly, large values of K tend to over-compensate the space adjustment, causing space allocation (and the resulting relative hit ratio) to oscillate persistently, which also increases the aggregate error. In between the two extremes there is a value of K that results in a global minimum of the aggregate error. This corresponds to the best convergence we can achieve using the proportional controller. We compare this best performance of the simple heuristic f(e_i) = K e_i with that of our heuristic function (Equation 11) designed using digital feedback control theory. The aggregate error computed for the latter heuristic is depicted by the straight line at the bottom of Figure 4. It can be seen that the aggregate error using the designed function is even smaller than the smallest error achieved using the simple linear heuristic above, which means that the designed function produces very good performance and makes the cache converge successfully.
Figure 4. The aggregate error versus controller gain K.
To appreciate the quality of convergence for different controller settings, Figure 5-a shows plots of the relative hit ratio of different classes versus time in representative experiments with the proportional controller f(e_i) = K e_i. Every point in those plots shows the data collected in one sampling period. In the figure, curve W_i is the desired performance of class i and curve R_i is the corresponding relative hit ratio. Since the difference W_i − R_i reflects the performance error e_i of class i, we can tell how well the control loop performs by comparing the two curves R_i and W_i. The closer the two curves, the better the control loop performs and the better the convergence of the cache.
a1) The relative hit ratio for K=2000 b1) Space allocation for
K=2000
a2) The relative hit ratio for K=8000 b2) Space allocation for
K=8000
a3) The relative hit ratio for K=100000 b3) Space allocation for
K=100000
Figure 5. Performance of the proportional controller with different
gain.
Figure 5-a1 depicts the relative hit ratio using a small value of K for the controller. From the figure, we can see that curve R_i approaches curve W_i. However, the convergence is too slow: the controller is too conservative in reacting to the performance error. Figure 5-a2 plots the relative hit ratio for the best possible K. The figure shows that the cache converges quickly to the specified performance differentiation. Figure 5-a3 depicts the relative hit ratio for a large value of K. It shows that with a large K, the cache space adaptation is so aggressive that the relative hit ratio overshoots the desired value. This over-compensation causes the relative hit ratio to keep changing in an oscillatory fashion, making the system unstable.
Figure 5-b plots the allocated space for each class versus time. We observe that when K is small, space allocation converges very slowly. Similarly, when K is large, space allocation oscillates persistently due to over-compensation. Space oscillation is undesirable because it means that documents are repeatedly evicted and then re-fetched into the cache. Such cyclic eviction and re-fetching increases the backbone traffic generated by the cache, which is an undesirable effect. The optimal value of K results in a more stable space allocation that successfully maintains the specified relative performance differentiation.
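The sluggish and oscillatory regimes can be reproduced with a toy closed-loop model in which the relative hit ratio responds proportionally to the allocated space. This plant model is a deliberate simplification for illustration, not the paper's cache dynamics:

```python
# Toy simulation of the gain effect in a proportional control loop.
# Plant assumption: relative hit ratio r = a * s for allocated space s.

def simulate(K, a=0.001, target=0.3, s0=0.0, steps=30):
    """s(k+1) = s(k) + K * (target - a*s(k)); returns the hit-ratio trace."""
    s, trace = s0, []
    for _ in range(steps):
        r = a * s                 # modeled relative hit ratio
        s = s + K * (target - r)  # proportional space adjustment
        trace.append(r)
    return trace

# For this model the loop is stable iff |1 - K*a| < 1, i.e. 0 < K < 2/a:
# small K converges slowly, K near 1/a converges in one step, and
# K > 2/a over-compensates and oscillates with growing amplitude.
```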
a) Relative hit ratio b) Space allocation
Figure 6. Performance of the analytically designed controller (for
a synthetic log).
The above experiments show that the controller tuning has a
dramatic effect on the convergence
rate and subsequently on the success of performance
differentiation. One of the main contributions
of the paper lies in deriving a technique for controller tuning
that avoids the need for an ad hoc trial-and-error design of the cache resource allocation heuristic to effect proper service differentiation. We have demonstrated the effect of changing a single parameter on the resulting performance.
In reality the controller design space is much larger than that of
tuning a single parameter. For
example, the controller function may have two constants, in which case two variables must be tuned.
Figure 7. Absolute hit rates (for a synthetic log): a) original Squid without differentiation; b) QoS Squid.
We presented a design technique in Section 3 that computed
the structure and parameters
of the best heuristic function. The convergence of the cache when
this function is used with the
analytically computed parameters is depicted in Figure 6, which
shows that the performance is
favorably comparable to the best performance we can achieve by
experimental tuning (Figure 5-
a2 and Figure 5-b2). In Figure 7, we present the absolute hit rates
of the three classes for Squid
with and without service differentiation. We can see that the QoS
Squid increases the hit rate
of the high priority class at the expense of the hit rate of the
low priority class. The hit rate
differentiation among different classes implies that less popular
content of a “favored” class may
displace more popular content of less favored classes. Such
displacement is suboptimal from the
perspective of maximizing hit rate. This consequence is acceptable,
however, since the favored
classes are presumably more important. Observe that in real-life
situations the number of high-
paying “first-class” customers is typically smaller than the number
of “economy” customers; a
situation present in many application domains from airline seating
to gas pumps. Hence, a small
resource reallocation from economy to first-class customers is
likely to cause a larger relative
benefit to the latter at the expense of a less noticeable
performance change to the former. Thus, it is
possible to improve the hit rate of premium clients without
significantly impacting other customers.
5.2 Empirical Trace Experiments
In the second part of our experiments, we developed a URL reference
generator which read
URL references from a proxy trace, generated the corresponding
requests, and sent them to our proxy cache. The trace we used was one of the NLANR (National Laboratory for Applied Network Research) sanitized access logs, uc.sanitized-access.20030922.gz, available at the time from ftp://ircache.nlanr.net/Traces/. To serve requests generated from the trace, the proxy cache contacts the real web servers on the Internet.
For example, if a request for “www.cnn.com” is found in the data
set, our proxy cache actually
contacts the CNN web server. The reason we set up our testbed this way is that we wanted to measure the backbone latency (the time needed to download a file from the origin server to the proxy cache) in real life. Only by using such “real” data can we demonstrate that our architecture works well on the Internet. The requests are divided into two classes based on whether or not they are HTML file requests. (Of the references we generated from the trace file, only a small fraction were requests for HTML files.) HTML is one of the least cachable content types, so we challenge our QoS architecture with this difficult case. We then re-ran the experiment using an unmodified proxy cache. The results of the two cases are compared to isolate the effect of performance differentiation. From the experiments, we determined that the ratio of hit rates between the non-HTML and HTML classes is normally roughly 1.4 : 1 (i.e., non-HTML content has 1.4 times the hit rate of HTML in the absence of differentiation, which confirms that HTML is less cachable). By specifying a relative hit ratio for the two classes that favors HTML, our differentiation policy improves the HTML class while still serving the non-HTML class well.
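As a sketch of this two-class split, one can classify requests by URL suffix; this heuristic is our assumption for illustration, and the prototype could equally key off the content type recorded in the trace:

```python
# Hypothetical two-way classifier: HTML file requests vs. everything else.

def request_class(url):
    """Return 'html' for HTML file requests, 'other' otherwise."""
    path = url.split('?', 1)[0].lower()   # strip any query string
    if path.endswith(('.html', '.htm')) or path.endswith('/'):
        return 'html'
    return 'other'
```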
One concern about the accuracy of our experiments lies in how the
generator actually replays
the trace. To expedite the experiments, we replay the log faster
than real-time. This means that
documents on the servers have less opportunity to become stale
during the expedited experiment,
which leads to potentially inflated hit rates. This effect,
however, is equally true of both the QoS
cache experiment and the ordinary Squid experiment. Hence, the
relative performance improve-
ment we measure is still meaningful. Moreover, we claim that the
impact of expedited replay in
our experiments is limited to begin with. To validate this claim we
measured the rate of change of
content during the original duration of the log (12 hours) and
demonstrated that such change was
minimal. More specifically, we played the trace with a very large
cache that could save all the files
requested in the trace. After 12 hours (the length of the log), we played the trace again against the same cache. In the second run, the vast majority of the Squid result codes were either TCP_HIT (i.e., a valid copy of the requested object was in the cache), TCP_MEM_HIT (i.e., a valid copy of the requested object was in the cache memory), or TCP_REFRESH_HIT (i.e., the requested object was cached but stale and the IMS query for the object resulted in “304 Not Modified”). An additional fraction of the Squid result codes were TCP_MISS (i.e., the requested object was not in the cache) with the DIRECT hierarchy code, meaning the request had to go straight to the source. Only a small fraction of the Squid result codes were TCP_REFRESH_MISS (i.e., the requested object was cached but stale and the IMS query returned new content). Therefore, we conclude that the maximum difference between our measured hit rates and those we would obtain by playing the trace for its original duration (12 hours) is correspondingly small.
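This staleness check can be sketched as a tally over Squid access-log result codes. The sample log lines used below are hypothetical, but the code/status field position follows Squid's native access.log layout:

```python
# Tally Squid result codes; the hit-rate error bound in the argument
# above is simply the TCP_REFRESH_MISS fraction of this tally.

from collections import Counter

def result_code_fractions(log_lines):
    """Fraction of each Squid result code (4th field, 'CODE/status')."""
    codes = Counter(line.split()[3].split('/')[0] for line in log_lines)
    total = sum(codes.values())
    return {code: n / total for code, n in codes.items()}
```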
We carried out two experiments with the regular Squid and the QoS
Squid respectively. The
performance metrics we considered were hit rate and backbone
latency reduction. To reflect the
backbone latency reduction, we used both raw latency reduction and
relative latency reduction.
By raw latency reduction, we mean the average backbone latency (in seconds) saved per request.
By relative latency reduction, we mean the percentage of the sum of
downloading latencies of the
pages that hit in the cache over the sum of all downloading
latencies. Here, downloading is from
the original server to the proxy cache.
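A minimal sketch of the two metrics, assuming each request record carries its origin-to-proxy download latency and whether it hit in the cache (field names are illustrative):

```python
# Raw and relative backbone latency reduction over a set of requests.

def latency_metrics(requests):
    """requests: list of (latency_seconds, was_hit) tuples."""
    saved = sum(lat for lat, hit in requests if hit)   # latency avoided by hits
    total = sum(lat for lat, _ in requests)            # latency of all downloads
    raw = saved / len(requests)      # avg latency saved per request (seconds)
    relative = saved / total         # share of total download latency saved
    return raw, relative
```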
Figure 8-a depicts the relative hit ratio in the two experiments.
Figure 8-a1 shows the relative hit
ratio achieved for each content type in the case of regular Squid.
Figure 8-a2 plots the case of the
QoS Squid. The non HTML and HTML curves represent the relative hit
ratio for the two classes
respectively. The differentiation policy specifies a target relative hit ratio for each of the non-HTML and HTML classes. For comparison, we plot both targets in both graphs, although Figure 8-a1 depicts the data for the regular Squid, which does not use the differentiation policy.
Comparing Figure 8-a1 with Figure 8-a2, we can see that our QoS
proxy cache pulls the relative
hit ratio of the two classes to the goals after some time spent reaching the steady state. That
initial transient occurs when the cache has not run long enough for
unpopular content to percolate
a1) Relative hit ratio for the original Squid b1) Space allocation
for the original Squid
a2) Relative hit ratio for the QoS Squid b2) Space allocation for
the QoS Squid
Figure 8. Performance of the analytically designed controller (for
a real log).
to the bottom of the LRU queue and get replaced.
The large gap in request volume between the non-HTML and HTML classes makes the choice of sampling interval harder. The huge number of non-HTML requests calls for a small interval to make the cache more responsive, while the small number of HTML requests calls for a large interval, because too small an interval introduces noise into the system. Our proxy cache balances the two cases and chooses a reasonably small sampling interval (30 seconds). The smoothed hit rate is calculated with a sufficiently large averaging window in Equation 6; a large window increases the horizon over which the hit rate is averaged and decreases the influence of noise. As seen from Figure 8-a2, our QoS cache works well and makes the relative hit ratios converge to their goals within a reasonable time.
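To illustrate the smoothing, the sketch below averages the hit rate over a sliding window of the last W sampling periods; Equation 6's exact form is not reproduced here, so the windowed-average semantics are an assumption for illustration:

```python
# Hit-rate smoothing over a sliding window of W sampling periods.
# A larger W lengthens the averaging horizon and damps per-period noise.

from collections import deque

class SmoothedHitRate:
    def __init__(self, window):
        self.samples = deque(maxlen=window)  # (hits, requests) per period

    def record(self, hits, requests):
        self.samples.append((hits, requests))

    def value(self):
        hits = sum(h for h, _ in self.samples)
        reqs = sum(r for _, r in self.samples)
        return hits / reqs if reqs else 0.0
```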
Figure 8-b plots the allocated space for each class, which shows
how the QoS cache changes the
space allocation in order to achieve the desired differentiation.
Absolute hit rates are presented in Figure 9, indicating a large performance improvement for the HTML class with only a slight decrease in non-HTML class performance.
Figure 9. Absolute hit rates (for a real log): a) original Squid without differentiation; b) QoS Squid.
Figure 10 and 11 depict the backbone latency reduction due to
caching both in the case of a
regular Squid and the case of a QoS Squid. From Figure 10, we
observe that the average latency reduced per request for the HTML class is substantially higher for the QoS Squid than for the regular Squid, while the numbers for the non-HTML class are similar in both cases. Because transient network conditions at the time of the experiments could have a large effect on the measured latency, the raw latencies measured in the two experiments may not be directly comparable.
Therefore, we also use relative latency reduction (i.e., the
percentage of the sum of downloading
latencies of the pages that hit in the cache over the sum of all
downloading latencies of the cache)
as our metric to evaluate and compare the performance of the two
systems. The average relative
latency reduction per request is presented in Figure 11. Consistent
with the results shown by the
raw latency reduction, the data further proves that our QoS
architecture can significantly improve
the latency reduction for the HTML class, while incurring only a
moderate cost for the non HTML
class. On one hand, the uniqueness of HTML content makes it less cachable than other content types, so improving its cache hit rate is a difficult task. On the other hand, the small request volume and small file sizes of HTML content favor caching it, as HTML files do not consume much cache space. To generalize our results: if we assign high priority to only a small portion of Internet content, significant performance improvement can be expected for that content at only a very small cost in the performance of the low-priority classes. Thus, we consider our differentiated caching services a serious candidate for the future heterogeneous, QoS-aware web infrastructure.
6 Related Work
Service differentiation and QoS control at the network layer have
been studied extensively in the IETF [20] community. In order not to negate the network’s efforts,
it is important to extend QoS
support to endpoint systems. Recent research efforts on QoS control in web servers include [3, 10, 5, 34, 1, 9, 8, 26]. By CPU scheduling and accept
queue scheduling respectively,
Almeida et al. [3] and Abdelzaher et al. [1] successfully provide
differentiated levels of service to
web server clients. Demonstrating the need to manage different
resources in the system depending
on the workload characteristics, Pradhan et al. [26] develop an
adaptation technique for controlling
multiple resources dynamically. Like [26], Banga et al. [5] and
Voigt et al. [34] provide web server
QoS support at the OS kernel level. In [10, 9, 8], session-based QoS mechanisms are proposed, which exploit the session-based relationships among HTTP requests.
The above efforts addressed performance differentiation at the
origin server. Differentiation
techniques developed for origin servers may not be applicable to
proxy caches because the cache
introduces a crucial additional degree of complexity to performance
differentiation. Namely, it introduces the ability to import and offload selected items depending on their popularity and resource requirements.
Figure 11. Relative backbone latency reduction.
On an origin server all requests are served locally
(i.e., a 100% hit rate is achieved
on valid requests). Thus, the perceived performance of the server
depends primarily on the order
in which clients are served. A priority queue, for example, will
give high priority clients shorter
response times. The performance speedup due to a cache, on the
other hand, depends primarily
on whether or not the requested content is cached. Thus, hit rate,
rather than service order, is a
significant performance factor. Performance differentiation in
caches, therefore, requires a new
approach.
Web caching research has traditionally focused on replacement
policies. In [7], the authors
introduced the GreedyDualSize algorithm, which incorporates
locality with cost and size concerns
in the replacement policy. [29] proposed LRV, which selects for
replacement the document with
the Lowest Relative Value among those in cache. In [24], a number
of techniques were surveyed
for better exploiting the bits in HTTP caches. While these schemes aim at optimally allocating disk storage and giving clients better performance, they do not provide QoS guarantees.
In [22], a weighted replacement policy was proposed which provides
differential quality-of-
service. However, their differentiation model doesn’t provide a
“tuning knob” to control the per-
formance distance between different classes. Fixed weights are
given to each server or URL, but
higher weights alone don’t guarantee user-perceived service
improvement. For instance, the hit
rate for the high weight URL may be very low because the proxy
cache is over-occupied by many
popular low weight URLs. Although the scheme is good in the sense
that it saves backbone traffic
by caching popular files, there is no predictability and
controllability in the differentiated service.
In contrast, proportional differentiated caching services described
in this paper provide application-
layer QoS “tuning knobs” that are useful in a practical setting for
network operators to adjust the
quality spacing between classes depending on pricing and policy
objectives.
Like caching, content distribution networks (CDNs), such as Akamai
[2], Digital Island [19] or
Speedera [31] are targeted for speeding up the delivery of web
content. As another form of content
distribution, peer-to-peer (P2P) networks are mainly used to share
individual files among users,
which also improve content availability and response time. Examples
of peer-to-peer networks
include Napster [25], Gnutella [27] and Freenet [11], which are intended for the large-scale sharing of music; Chord [32], CAN [28] and Tapestry [37] provide solutions for efficient data retrieval and routing; while PAST [15] focuses on data availability and load balancing in P2P environments. In this paper, we address the performance differentiation
problem in demand-side web
proxy caching. The complementary problem of providing QoS control mechanisms in content distribution networks will be considered in a forthcoming paper. Research on Internet storage
management [21, 28] also focused on resolving the conflict between
increasing storage requirements and finite storage capacity at every node of the storage
system. Instead of finding an optimal
storage management scheme for all the content, in this paper, we
address the QoS-aware storage
allocation problem in which the resources are allocated
preferentially to content classes that are
more important or more sensitive to delays.
7 Conclusions
In this paper, we argued for differentiated caching services in
future caches in order to cope with
the increasing heterogeneity in Internet clients and content
classes. We proposed a relative differ-
entiated caching services model that achieves differentiation of
cache hit rates between different
classes. The specified differentiation is carried out via a
feedback-based cache resource allocation
heuristic that adjusts the amount of cache space allocated to each
class based on the difference
between its specified performance and actual performance. We
described a control theoretical ap-
proach for designing the resource allocation heuristic. It
addresses the problem as one of controller
design and leverages principles of digital control theory to
achieve an efficient solution. We implemented our results in a real-life cache and conducted performance tests. Evaluation suggests
that the control theoretical approach results in a very good
controller design. Compared to manual
parameter tuning approaches, the resulting space controller has
superior convergence properties
and is successful in maintaining the desired performance
differentiation for a realistic cache load.
References
[1] T. F. Abdelzaher, K. G. Shin, and N. Bhatti. Performance
guarantees for Web server end-systems: A
control-theoretical approach. IEEE Transactions on Parallel and
Distributed Systems, 13(1):80–96,
2002.
[2] Akamai. http://www.akamai.com.
[3] J. Almeida, M. Dabu, A. Manikutty, and P. Cao. Providing
differentiated levels of service in web
content hosting. In First Workshop on Internet Server Performance,
Madison, Wisconsin, June 1998.
[4] V. Almeida, A. Bestavros, M. Crovella, and A. de Oliveira.
Characterizing reference locality in the
WWW. In Proceedings of the IEEE Conference on Parallel and
Distributed Information Systems
(PDIS), Miami Beach, FL, 1996.
[5] G. Banga, P. Druschel, and J. C. Mogul. Resource containers: A
new facility for resource management
in server systems. In Operating Systems Design and Implementation,
pages 45–58, 1999.
[6] P. Barford and M. E. Crovella. Generating representative web
workloads for network and server
performance evaluation. In Proceedings of Performance ’98/ACM
SIGMETRICS ’98, pages 151–160,
Madison, WI, 1998.
[7] P. Cao and S. Irani. Cost-aware www proxy caching algorithms.
In Proceedings of the 1997 USENIX
Symposium on Internet Technology and Systems, pages 193–206,
December 1997.
[8] J. Carlstrom and R. Rom. Application-aware admission control
and scheduling in web servers. In
IEEE Infocom, NEW YORK, NY, June 2002.
[9] H. Chen and P. Mohapatra. Session-based overload control in
qos-aware web servers. In IEEE Info-
com, NEW YORK, NY, June 2002.
[10] L. Cherkasova and P. Phaal. Session based admission control: a
mechanism for improving the perfor-
mance of an overloaded web server, 1998.
[11] I. Clarke, O. Sandberg, B. Wiley, and T. W. Hong. Freenet: A
distributed anonymous information
storage and retrieval system. In Workshop on Design Issues in
Anonymity and Unobservability, pages
311–320, July 2001.
[12] J. Dilley, M. Arlitt, and S. Perret. Enhancement and
validation of the squid cache replacement policy.
In 4th International Web Caching Workshop, San Diego, CA, March
1999.
[13] C. Dovrolis and P. Ramanathan. Proportional differentiated
services, part ii: Loss rate differentiation
and packet dropping. In International Workshop on Quality of
Service, Pittsburgh, PA, June 2000.
[14] C. Dovrolis, D. Stiliadis, and P. Ramanathan. Proportional
differentiated services: Delay differentia-
tion and packet scheduling. In SIGCOMM, pages 109–120, 1999.
[15] P. Druschel and A. Rowstron. Past: A large-scale, persistent
peer-to-peer storage utility. In HotOS
VIII, May 2001.
[16] R. T. Fielding and G. Kaiser. The apache http server project.
IEEE-Internet-Computing, 1(4):88–90,
July 1997.
[17] S. Gadde, J. S. Chase, and M. Rabinovich. Web caching and
content distribution: a view from the
interior. Computer Communications, 24(2):222–231, 2001.
[18] S. Glassman. A caching relay for the World Wide Web. Computer
Networks and ISDN Systems,
27(2):165–173, 1994.
[19] Digital Island, Inc. http://www.sandpiper.net, 2003.
[20] Internet Engineering Task Force. http://www.ietf.org.
[21] J. Kangasharju, J. Roberts, and K. Ross. Object replication
strategies in content distribution networks.
In Web Caching and Content Distribution Workshop, June 2001.
[22] T. P. Kelly, Y. M. Chan, S. Jamin, and J. K. MacKie-Mason.
Biased replacement policies for web
caches: Differential quality-of-service and aggregate user value.
In 4th International Web Caching
Workshop, San Diego, CA, March 1999.
[23] M. Koletsou and G. Voelker. The medusa proxy: A tool for
exploring user-perceived web performance.
In 6th International Web Caching Workshop and Content Delivery
Workshop, Boston, MA, June 2001.
[24] J. Mogul. Squeezing more bits out of http caches. IEEE
Network, pages 6–14, May/June 2000.
[25] Napster. http://www.napster.com.
[26] P. Pradhan, R. Tewari, S. Sahu, A. Chandra, and P. Shenoy. An
observation-based approach towards
self-managing web servers. In International Workshop on Quality of
Service, Miami, FL, May 2002.
[27] The Gnutella protocol specification. http://dss.clip2.com/gnutellaprotocol04.pdf, 2000.
[28] S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Shenker.
A scalable content-addressable
network. In ACM SIGCOMM’01, August 2001.
[29] L. Rizzo and L. Vicisano. Replacement policies for a proxy
cache. IEEE/ACM Transactions on
Networking, 8(2):158–170, 2000.
[30] A. I. T. Rowstron and P. Druschel. Storage management and
caching in PAST, a large-scale, persistent
peer-to-peer storage utility. In Symposium on Operating Systems
Principles, pages 188–201, 2001.
[31] Speedera. http://www.speedera.com.
[32] I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, and H.
Balakrishnan. Chord: A scalable peer-to-peer
lookup service for internet applications. In ACM SIGCOMM’01, August
2001.
[33] S. G. Tzafestas. Applied Digital Control. North-Holland
Systems and Control Series, 1986.
[34] T. Voigt, R. Tewari, D. Freimuth, and A. Mehra. Kernel
mechanisms for service differentiation in
overloaded web servers, 2001.
[35] S. Williams, M. Abrams, C. R. Standridge, G. Abdulla, and E.
A. Fox. Removal policies in network
caches for World-Wide Web documents. In Proceedings of the ACM
SIGCOMM ’96 Conference,
Stanford University, CA, 1996.
[36] A. Wolman, G. M. Voelker, N. Sharma, N. Cardwell, A. R.
Karlin, and H. M. Levy. On the scale
and performance of cooperative web proxy caching. In Symposium on
Operating Systems Principles,
pages 16–31, 1999.
[37] B. Y. Zhao, J. D. Kubiatowicz, and A. D. Joseph. Tapestry: An
infrastructure for fault-resilient wide-area location and routing. Technical Report UCB//CSD-01-1141, April 2001.