
Design and Implementation of a Web Workload Generator for the

SSFNet Simulator

Technische Universität München
Fakultät für Informatik

Bachelor's Thesis

Christian Vollmert

Supervisor: Univ.-Prof. Anja Feldmann, Ph.D.
Advisors: Nils Kammenhuber, Jörg Wallerich

Submission date: 15 March 2004


I hereby declare that I have written this bachelor's thesis independently and have used only the cited sources and aids.

München, 15 March 2004

Christian Vollmert


Contents

1 Introduction
   Structure of the Work

2 Background
   2.1 The World Wide Web
   2.2 The SSFNet-Simulator
   2.3 Web Workload Generator Models
       2.3.1 SSF.OS.WWW
       2.3.2 SURGE
       2.3.3 NSWeb

3 Design and Implementation
   3.1 Web Content Creation
       3.1.1 Basics
       3.1.2 Implementation
   3.2 Web Object Selection
       3.2.1 Basics
       3.2.2 Manual Popularity
       3.2.3 Statistical Popularity
   3.3 Sessions
       3.3.1 Basic Model
       3.3.2 Implementation
   3.4 Putting It All Together
       3.4.1 Global Structure
       3.4.2 The Manager
       3.4.3 The Server
       3.4.4 The Client
       3.4.5 Network Access
   3.5 Summary

4 Evaluation
   4.1 Topologies
       4.1.1 Singlebell Topology
       4.1.2 Flexbell Topology
       4.1.3 Unibell Topology
   4.2 Experiments
       4.2.1 Manual Page Generation – Manual Popularity
       4.2.2 Statistical Page Generation – Statistical Popularity
       4.2.3 Connection Types
       4.2.4 Session Model
   4.3 Summary

5 Conclusion

6 Outlook and Open Problems

A Design Overview

B Sample DML-Files
   B.1 Web Content Configuration
   B.2 Server Configuration
   B.3 Client Configuration

C Log Files


Chapter 1

Introduction

Over the last few years, terms like Internet and World Wide Web have become part of everyday language. Many people are now familiar with the Web and use it on a regular basis. What most people are not aware of is the underlying networking infrastructure. With the enormous growth of the Internet, new questions constantly appear, new solutions are suggested and have to be evaluated, new aspects of the networking infrastructure's complexity are discovered, and many more things are subject to current and future research. For many years, the answer to growing network load has mostly been to increase bandwidth and server capacity. This is not only an expensive solution but might also not scale arbitrarily.

In order to develop new ways of dealing with networking problems, an in-depth understanding of the characteristics of network traffic and their reaction to changes of, e. g., network topologies or routing is important. Probably one of the best ways to gain knowledge such as the impact of new protocols and technologies is simulation. It is almost impossible to set up a laboratory environment for medium-size and large networks to generate realistic network traffic and analyze the effects of changes in parameters such as routing or user behavior. Simulation, on the other hand, provides a convenient way of examining large networks. By using a simulation instead of a real set-up, one is typically provided an easy way to change parameters within the analyzed network. Additionally, simulations usually provide easier access to measurements than real networks.

Of course, simulation only makes sense when there is traffic on the simulated network. This traffic is usually created by workload generators. The main goal of this work is to develop a Web workload generator, i. e. a generator that creates traffic typical for the Web. We design a Web workload generator because the Web still generates a considerable share of the traffic volume on the Internet. This thesis covers the design and implementation of such a workload generator for the SSFNet simulation environment [SSFa]. The goal is to make the generator highly parameterizable as well as to incorporate typical features of the Web, such as persistent and pipelined connections.


Structure of the Work

Chapter 2 contains an overview of the World Wide Web and its properties. A short introduction to the network simulator SSFNet is given, and finally the three Web workload models this work is based on are presented.

Chapter 3 describes basic models for web content creation and access as well as a model for user behavior. Furthermore, the actual implementation of the simulation environment is covered, from the overall structure, through the actual implementation of the models, to network access.

Chapter 4 contains a detailed evaluation of whether our implementation of a web workload generator works as specified.

Chapter 5 is a summary of this thesis.

Chapter 6 gives an outlook on possible improvements and possible further work based on this thesis.

Appendix A shows a detailed class diagram of the developed workload generator.

Appendix B contains information on how to configure the workload generator.

Appendix C is an overview of the log file format created as the output of a simulation run.


Chapter 2

Background

In order to be able to start with the design and implementation of our intended Web Workload Generator, certain background knowledge is necessary. In this chapter we start by giving a short overview of the World Wide Web and SSFNet, the network simulator used for this work. We then describe three workload generators that were used as a foundation for this work.

2.1 The World Wide Web

The terms World Wide Web, short Web or WWW, and Internet are nowadays known to almost everybody and are often used interchangeably. However, this interchangeable use is not really correct: the Web rather relies on the infrastructure provided by the Internet. In fact, the Internet has been around much longer than the Web. Until the 1990s the Internet was used primarily by researchers and universities. With the emergence of the Web, an Internet application developed by Tim Berners-Lee [Bern94], in the early 1990s, the Internet became interesting for the general public.

The Web can be considered a hypermedia information system. Hypermedia simply refers to the fact that content is linked to other content by so-called hyperlinks. These links allow users to navigate through the Web. Web content includes various kinds of media, from simple text files to images, audio, video, applets and many more.

At this point it might be useful to explain some further terminology used in this work. We will refer to an atomic piece of Web content as a Web file. The term file is used because content is usually stored as files on a Web server. Web files fall into one of two categories. A Web Page is usually a base text file (normally an HTML or XML file) that may contain references and links to other Web files. The difference between references and links is that referenced files are usually loaded automatically and displayed embedded into the page, whereas links primarily serve navigation purposes.


An Embedded Object is an object that is referenced by a Web Page and is usually requested and displayed whenever the referencing page is displayed. Note that the term Web Page is also often used to describe the compound of what we call page and all embedded objects. However, we will use Web Object to describe the compound of Page and Embedded Objects.

We assume that using the Web is familiar enough to the reader to understand the following example; if not, please refer to [KR03]. When you open a document in your Web browser, the base HTML file is the Web Page in our terms; embedded images, animations etc. are Embedded Objects. The base file together with all the images, animations etc. would be called a Web Object.

We already mentioned that Web files are stored on a Web server and that users display content via a browser. Thus we have already implicitly mentioned that the Web uses a client–server architecture. Still we have to mention another essential part of the Web, the addressing scheme. Web files are referenced by a URL, or Uniform Resource Locator, which exactly describes where a file is located. Hyperlinks are realized via the URL addressing scheme. So far we have described what can be requested and the way it can be located, but not yet how it can be requested. Although multiple protocols might be used for transferring Web files (e. g. FTP), strictly speaking this is not considered the actual Web. The Web rather relies on its own transfer protocol:

HTTP—The HyperText Transfer Protocol

The Hypertext Transfer Protocol (HTTP) is a simple text-based application layer protocol. HTTP uses the transport layer protocol TCP [RFC 793] to transport data between client and server. We will not cover all aspects of HTTP; for a deeper coverage see e. g. [KR03]. Instead we will focus especially on the connection types offered by HTTP, an aspect essential for web workload generation.

In the early versions HTTP/0.9 and HTTP/1.0 [RFC 1945], the client opens a new connection for every Web file it wants to load. The server shuts down the connection after it has finished sending a reply. This shutting down of the connection is used by the client to determine the end of the file. Along with this approach comes a significant overhead for each request, which can significantly increase response times: TCP has to perform a SYN handshake to open a connection and a FIN shutdown for every single file. Moreover, due to TCP's congestion control mechanism, the connection will not leave slow start for small and medium-sized files. A first approach to this problem was the introduction of parallel connections. This can lower user response times but does not solve the problem of overhead caused by TCP connection establishment. In addition, this approach places a considerable burden on the Web server since it has to handle a large number of connections.

A better solution was introduced with HTTP/1.1 [RFC 2616]. A new header line was introduced containing the size of the transferred Web file. Thus closing the connection to signal the end of the file to the client is no longer necessary.


Figure 2.1: Simple, Persistent and Pipelined Connections (not to scale)

This allows the deployment of two new connection types, persistent and pipelined connections, which can significantly reduce overhead. With a persistent connection, the TCP connection is kept alive after a response has been sent by the server. The client can reuse the same connection to send another request to the server after receiving the previous response. With pipelined connections a client does not even have to wait until it has received a response, as with persistent connections, but can send an arbitrary number of requests to the server while waiting for a response. The server will send the responses in the order of the requests. Figure 2.1 shows the different kinds of connections. Note how the use of persistent connections reduces the time necessary to download four Web files compared to simple connections. Pipelining reduces the time until all files are downloaded even further, since we do not have to wait for a response to arrive before we can send the next request. Modern Web browsers try to reduce the response time even more by using several persistent or pipelined connections in parallel to simultaneously transfer multiple files.

2.2 The SSFNet-Simulator

This work is based on the SSFNet simulation environment. SSFNet is a network simulator developed by the SSFNet project [SSFa]. The simulator actually consists of three major parts: SSF, DML, and SSFNet.


The Scalable Simulation Framework (SSF) is a public-domain standard for discrete-event simulation of large, complex systems in Java and C++. The channel- and event-based SSF represents the foundation for SSFNet. DML is the public-domain standard used to write configuration files for SSFNet. Even though DML and SSF are public-domain standards, the implementations may be closed-source products.

SSFNet itself is an open-source collection of Java SSF-based components for modeling and simulation of Internet protocols and networks. This framework hides the details of the discrete-event simulator SSF by providing an implementation model that is very similar to the real-life protocol stack found in operating systems. In this work we will primarily concentrate on implementing an application layer protocol. SSFNet thus provides us with all the remaining networking infrastructure. In addition, we are provided with random number generators for certain probability distributions.

We will mention further details about SSFNet where applicable. Detailed information can also be found in [SSFb]. This work is based on SSFNet release 1.5 together with Raceway SSF and Raceway DML. Raceway is a commercial implementation of the Java SSF API from Renesys Corp. but may be used royalty-free for university research.

2.3 Web Workload Generator Models

As SSFNet is the base providing us with the necessary networking infrastructure, we will now introduce the basis of our workload model, SURGE and NSWeb. For completeness we will also mention the workload generator shipped with SSFNet release 1.5.

2.3.1 SSF.OS.WWW

Along with SSFNet comes an HTTP client/server pair [SSFc] written by Andy Ogielski. This client/server pair represents a very simple workload model. The user can configure for each client individually from which servers it can download pages. The client starts a session and randomly chooses from the available servers and how many web objects to download. The client then follows an On/Off pattern, sleeping for a certain time between single requests. Upon receiving a request, the server generates a page according to given distributions for file size and number of embedded objects. We will later see that this is a very limited approach and, as we believe, is not sufficient to generate realistic web traffic. However, since this workload generator is open-source, it provides us with valuable information about implementing our model for SSFNet.


2.3.2 SURGE

SURGE is a web workload model developed by Paul Barford and Mark Crovella to generate realistic WWW workload. A basic concept of SURGE [BC98] is the User Equivalent. A User Equivalent (UE) is described as a process in an infinite loop, alternating between issuing requests and being idle. Probability distributions for access times as well as page popularity, size and number of embedded objects are used to mimic the behavior of real web users. The distributions are chosen empirically to generate traffic with characteristics as close as possible to those of traffic found in real networks.

2.3.3 NSWeb

Directly derived from the model used in SURGE is the workload model of NSWeb [Wal01]. NSWeb is a web workload generator for the NS-2 network simulator developed as a diploma thesis by Jörg Wallerich. The concepts used are those of SURGE, adjusted to the limitations of the NS-2 simulator. Together with [BC98], [Wal01] will provide us with the basic concepts for the models used in this work.


Chapter 3

Design and Implementation

Bearing in mind the background knowledge introduced in the last chapter, we are now able to describe the design and implementation of our workload generator. For the remainder of this work, we will call the web workload generator developed here SSFweb.

The problem of generating realistic workload can be broken down into three major categories:

• How to create web files (Web Content Creation)

• Which files to access (Page Selection)

• When to access files (Sessions)

One might want to add a fourth problem, namely the actual creation of traffic by accessing the network. However, we think this fourth point does not belong with the others since it is a pure implementation problem and does not involve any model to be described.

In this chapter we will first describe the models used for each of the three major problems mentioned above and then explain the actual implementation. In the subsequent section everything will be put together and we will also describe network access in SSFNet. The final section mentions the limitations of the current implementation. It is worth mentioning already at this point that SSFweb uses a client/server architecture like the real Web.

3.1 Web Content Creation

The problem which we will address first is how to actually create the web files that we later want to request.


3.1.1 Basics

In order to generate realistic web workload, a first step has to be matching the web objects in our simulation to those of the real Web. As described in the last chapter, a web object consists of a web page and several embedded objects. Embedded objects are in the majority of cases images, audio and video files, whereas web pages usually are HTML files. Sizes of embedded objects thus usually differ significantly from those of web pages. In reality, embedded objects are often shared between multiple web pages. This is a quite common technique, e. g. for logos and advertisements, and causes these embedded objects to be accessed more frequently. This feature is currently not available in SSFweb but could be part of further extensions. In SSFweb each embedded object is thus embedded into exactly one page.

Generating web objects properly depends on three main parameters: page size, embedded object size and the number of embedded objects per page. Modeling those properties accurately has an important influence on the resulting traffic. Research [CB96] has shown that, e. g., the file sizes of pages and embedded objects are one of the main reasons for certain characteristics of Web traffic such as its self-similar appearance. [CB96] also notes that the empirically measured file sizes can be approximately described by a Pareto distribution with α ≈ 1.06. This distribution is also used in [Wal01], whereas [BC98] uses a hybrid distribution to describe page sizes. Since SSFNet does not provide us with hybrid distributions, we will not go deeper into the SURGE distribution. The next section gives a detailed explanation of the solution used in SSFweb.

3.1.2 Implementation

SSFweb provides two different possibilities to configure web content for a simulation. Like all other parameters of a simulation, web content is configured via a DML file.

Manual Generation

With this approach, all responsibility for reaching the desired distributions for size, location and number of embedded objects lies with the user. All details have to be specified explicitly. For each page the user wants to create, he has to provide the size of the web page, the servers that will be hosting this page and the objects embedded into this page. For each embedded object, the size as well as the location has to be provided. This approach is thus very cumbersome, especially for large scenarios with thousands or millions of pages. However, it allows the user the largest degree of freedom possible.


Figure 3.1: Generating Web Content Using PageSets

Statistical Generation

With this second approach the generation of large numbers of web pages is much more convenient but sacrifices a certain amount of freedom. Using this approach, only a few parameters have to be configured. The user specifies the number of pages he wants to be created. These pages are then generated by the simulator according to the three given distributions for page size, object size and number of embedded objects. These statistically generated pages can either be created at the start of the simulation and then remain in memory, or a single page can be created on demand whenever a page request is issued.

At this point we have not yet addressed one more important question concerning the automatic generation of web content: which server(s) will host our web files? The answer can be given in two parts. The user specifies on how many servers each file will be stored. Then the actual servers are chosen from either all available servers or from a configured subset according to a uniform distribution. (In further releases it might be possible to choose the pages-to-server distribution arbitrarily.)
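
To make the statistical generation process more concrete, the following is a minimal sketch in Java of how such a generator might look. It is not the actual SSFweb code: the class and method names are assumptions, a plain java.util.Random replaces SSFNet's random number generators, and the Pareto parameters are the defaults of Table 3.1. Sizes and the number of embedded objects are drawn via inverse-transform sampling, and each file is placed on a uniformly chosen server.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Random;

    // Illustrative sketch only: statistical generation of one web object.
    public class StatisticalPageGenerator {
        private final Random rng = new Random();

        // Inverse-transform sampling for a Pareto distribution with shape alpha and scale k.
        private double pareto(double alpha, double k) {
            double u = rng.nextDouble();                 // uniform value in [0;1[
            return k / Math.pow(1.0 - u, 1.0 / alpha);
        }

        // Generates one page plus embedded objects; each entry is {size, serverIndex}.
        public List<long[]> generatePage(int numServers) {
            List<long[]> files = new ArrayList<>();
            long pageSize = (long) pareto(1.2, 13300);         // page size (Table 3.1 defaults)
            files.add(new long[] { pageSize, rng.nextInt(numServers) });
            int embedded = (int) pareto(2.43, 1);              // number of embedded objects
            for (int i = 0; i < embedded; i++) {
                long objSize = (long) pareto(1.1, 133000);     // embedded object size
                files.add(new long[] { objSize, rng.nextInt(numServers) });
            }
            return files;
        }
    }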

Combining Manual and Automated Generation

Being now able to generate web objects in any of three ways (manually, statistically, and statistically on request), we still lack an important feature: we are not yet able to combine these approaches. For this reason we introduce a new feature: PageSets. A PageSet is basically a group of pages generated in one of the aforementioned ways. The individual PageSets are independent of each other. As Figure 3.1 shows, different ways of generating web content can be combined easily (note that Figure 3.1 is a simplified picture showing only the distributions for web pages).

Appendix B.1 gives a commented example of web content generation using manual as well as statistical generation. Table 3.1 shows the default values of SSFweb for automated generation of web content, used if the user does not give any other specifications.


Parameter              Default

Page Size              Pareto distribution with α = 1.2, k = 13300
Embedded Object Size   Pareto distribution with α = 1.1, k = 133000
Embeddeds per Page     Pareto distribution with α = 2.43, k = 1
Servers per Page       1
Servers per Object     1

Table 3.1: Default Web Content Generation Parameters

3.2 Web Object Selection

The next important problem that we have to address is that of selection. Here the crucial aspect of Web traffic is the popularity of web objects. Popularity is a measure of the access probability or the relative access frequency of single web objects. High popularity of a web object will result in this web object being requested more often than web objects with lower popularity. Thus web objects with high popularity can lead to hotspots in the underlying networking infrastructure. It is therefore obvious that the page selection problem is highly significant for generating realistic workload. Below, the selection mechanism of SSFweb is described.

3.2.1 Basics

When trying to generate a sequence of requests in which every web object is accessed with a certain probability, we have to assign an access probability to each web object. There are two possible solutions to this problem. A simple approach would be a static, SURGE-like one. In SURGE the popularity of a web object is determined by finding a permutation of access probabilities such that applying this permutation leads to the desired distributions [BC98]. Since popularity is a static parameter in SURGE, this approach has certain limitations. One goal of SSFweb is to provide the possibility of changing the popularity of web objects with simulated time. Another thing we want to provide in SSFweb is manually assigning a popularity to each web object that does not have to follow a probability distribution described by a closed mathematical formula.

Both targets are hard if not impossible to reach with the SURGE approach. We thus decided to use a solution like that of NSWeb [Wal01]. The first thing we do is create a vector of all web objects. This way every web object can be accessed using its index in the vector. You can see an example of such a vector in Figure 3.2.

Now we generate random values according to the desired probability distribution and use these values as an index into the web object vector. In Figure 3.2 the web object vector is shown with an assigned probability distribution. Web objects with a smaller index are accessed more often than those with a higher index.


Figure 3.2: Web Object Vector with Access Probabilities

We are left, however, with certain problems that still need to be solved. SSFweb provides two significantly different kinds of probability distributions, a manually generated distribution and several statistical/mathematical distributions. Each kind poses specific challenges which will be addressed in the subsequent sections.

3.2.2 Manual Popularity

The basic idea behind Manual Popularity is the same as behind manually generating web files: the highest degree of freedom possible. SSFweb thus allows the user to configure an individual popularity for each web object. This way all responsibility for generating a realistic access distribution is left to the user. The big advantage of this approach is the possibility to create very special popularity scenarios. This might be used to model a scenario with one very popular web object while all other web objects are accessed only very infrequently.

Realisation

In order to realize this idea we introduce a second vector in addition to the web object vector. As you can see in Figure 3.3, this new vector (the popularity vector) has the same length as the web object vector. The popularity vector is a vector of double values, each specified by the user in the DML configuration file. Each double value represents the popularity of the web object with the same index as the value.


Figure 3.3: Manual Popularity Model in SSFweb

The next step is to turn the popularity vector into a cumulative popularity vector by summing up the single values. Having done this, how do we actually determine the indices of the web objects that will be accessed? We solve this by using a random number generator for uniformly distributed values between 0 and 1. As shown in Figure 3.3, we then multiply the random number by the highest value in the cumulative popularity distribution and perform a binary search on the cumulative distribution vector to determine which interval the generated value belongs to. This way we get an index from the cumulative popularity distribution vector and can use this index also as an index into the web object vector. What we get is a pseudo-random number generator that will generate numbers between 0 and the length of the web object vector minus 1. The distribution of those values matches the popularities of the web objects configured by the user.
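
The selection step just described can be summarized in a few lines. The sketch below is illustrative only (class names are assumptions and a plain java.util.Random stands in for SSFNet's generators): it builds the cumulative popularity vector and maps a uniform random number to a web object index via binary search.

    import java.util.Random;

    // Sketch of index selection based on a user-configured popularity vector.
    public class ManualPopularitySelector {
        private final double[] cumulative;   // cumulative popularity vector
        private final Random rng = new Random();

        public ManualPopularitySelector(double[] popularity) {
            cumulative = new double[popularity.length];
            double sum = 0.0;
            for (int i = 0; i < popularity.length; i++) {
                sum += popularity[i];         // sum up the single values
                cumulative[i] = sum;
            }
        }

        // Returns an index between 0 and length-1, distributed according to the popularities.
        public int nextIndex() {
            double r = rng.nextDouble() * cumulative[cumulative.length - 1];
            int lo = 0, hi = cumulative.length - 1;
            while (lo < hi) {                 // binary search for the interval containing r
                int mid = (lo + hi) / 2;
                if (cumulative[mid] < r) lo = mid + 1; else hi = mid;
            }
            return lo;
        }
    }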

However, we recommend using this approach mainly for manually generated web objects, since in the current implementation all web objects statistically generated by a certain PageSet can only be assigned the same popularity. Generating a random popularity for each automatically generated page is not yet implemented and will have to be done with special care so as not to cause undesired results; we recommend using range-limited number generators when this feature is implemented. In addition to this limitation, manual popularity is quite cumbersome to configure, especially for large scenarios. It is unfortunately not possible to scale manual popularity as well as a probability distribution provided by SSFNet. This leads us to the next section.

3.2.3 Statistical Popularity

In contrast to manual popularity, when using the solution introduced in this section the user does not have to configure an individual probability for each web object; instead, an overall probability distribution for all web objects is given. This leaves us with the question of how to find the index of the web object that is to be requested from a given probability distribution.

As a first step we can use a random number generator which generates random values that follow our given probability distribution. If P(x) is the cumulative distribution function of the desired distribution, a random number generator can generate values according to this distribution by using the inverse P⁻¹(u): the resulting values follow the desired probability distribution if P⁻¹ is applied to uniformly distributed values u ∈ [0; 1[. SSFNet provides us with a variety of such random number generators, see [SSFc].

The randomly distributed values generated by such random number generators can be arbitrarily large. We could now simply round the generated values, discard those that are larger than the highest index in the web object vector, and use the remaining values directly as indices into the web object vector. This approach certainly works, however there is one big concern coming along with it: whenever only a certain range of a distribution is used, the distributional properties are changed. E. g., cutting off too many values from a Pareto distribution causes the distribution to no longer show a heavy tail but instead to show properties similar to the Exponential distribution. For this reason we need to carefully limit the range of the used random number generators. We need to calculate a cutoff value c in such a way that a sufficient fraction of all generated values is smaller than c, to avoid a significant change in distributional properties.

Below we give the same example for calculating c as given in [Wal01], namely the calculation of the cutoff value for the standard Exponential distribution.

$p_{\exp}(x) = e^{-x}, \; x > 0 \qquad \text{and} \qquad P_{\exp}(x) = 1 - e^{-x}$

To avoid a significant change of the distribution properties, the cutoff is chosen at a point where the cumulative distribution function P(x) of the random number generator reaches a certain limit l with l + ε = 1, ε → 0. The cutoff value c can thus be found as $c = P^{-1}(l)$. If l is chosen as 0.9999, then 99.99% of all values are smaller than c.

Thus

$P_{\exp}^{-1}(x) = -\log(1 - x) \qquad \text{and} \qquad P_{\exp}^{-1}(0.9999) = -\log(1 - 0.9999) = 9.2103$

For the standard Exponential distribution, 99.99% of all generated values are thus smaller than 9.2103.

This way our random number generators provide us with random values between 0 and c. In order to select an item from the web object vector of fixed length N, the index i can be determined by scaling the possible random values [0; c] to the length of the vector:

$i = \mathrm{rand} \cdot \frac{N}{c}$

So far, cutoff values for SSFweb have to be calculated manually by the user and specified along with the configuration. Generating these cutoff values automatically according to the distribution parameters is a possible improvement for the future.
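
As an illustration of the complete selection step, the following sketch combines inverse-transform sampling, the cutoff, and the scaling to the vector length, using the standard Exponential distribution from the example above. Class names and the use of java.util.Random are assumptions, not the actual SSFweb implementation; values above the cutoff are simply redrawn, which is equivalent to discarding them.

    import java.util.Random;

    // Sketch: map cutoff-limited random values to web object indices.
    public class StatisticalPopularitySelector {
        private final Random rng = new Random();
        private final double cutoff;     // c, e.g. 9.2103 for the standard Exponential distribution
        private final int vectorLength;  // N, length of the web object vector

        public StatisticalPopularitySelector(double cutoff, int vectorLength) {
            this.cutoff = cutoff;
            this.vectorLength = vectorLength;
        }

        // Draw from the standard Exponential distribution via the inverse CDF,
        // redraw values above the cutoff, and scale the result to [0; N-1].
        public int nextIndex() {
            double value;
            do {
                value = -Math.log(1.0 - rng.nextDouble());   // P^-1(u) = -log(1-u)
            } while (value > cutoff);                         // keep distributional properties intact
            int index = (int) (value * vectorLength / cutoff);
            return Math.min(index, vectorLength - 1);         // guard against value == cutoff
        }
    }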

Sorted Vector

We are now able to generate web objects and to access those web objects according to a given probability distribution. So far, web object sizes and access probabilities are totally independent of each other. Unfortunately this is not true for real-life traffic. Observations have shown that in reality access probability closely follows Zipf's Law [Zip49], and that smaller web objects generally have a higher access probability than larger ones. [Wal01] has shown that this dilemma can be solved by using a sorted vector. By assigning smaller indices to smaller web objects and using an access probability distribution that causes higher access probability for smaller indices, we can simulate real life. More precisely, we can e. g. use an ascendingly sorted vector and a Pareto distribution as in [Wal01]. A user can specify in the DML file whether or not he wants the vector sorted (see Appendix B.1).

3.3 Sessions

Having examined which web objects are accessed and the access rate for each web object, we are still left with the question of when (at which point in time) they should be accessed.


3.3.1 Basic Model

Try to imagine the typical behavior of a web user: the user wants to view some web content and will thus open his/her browser and request a web object. This will cause the browser to start downloading the base HTML file (the web page), parse the file and search for embedded objects. Whenever the browser finds references to embedded objects, it will request those objects from the server. The web user will have a look at the displayed page with the embedded objects and perhaps click on a link, which will prompt the browser to request the next web page.

Figure 3.4: ON/OFF Model used in SURGE

Figure 3.4 shows how this simple scenario is modeled in SURGE [BC98]. As we can see, there are two types of idle times, active off and inactive off. Applying these terms to our example above, inactive off refers to the period of time after the browser has finished loading an entire web object and before the user issues the next request. Simply speaking, inactive off is the time during which the user views the downloaded content. In contrast, active off refers to the time between the single requests for web files. This might, e. g., be the time a browser needs to parse an HTML file.

As you might already have recognized, this model only applies to non-pipelined connections. This causes problems in generating realistic workload: modern web browsers typically use persistent and pipelined connections. In addition, it is also quite common to use more than one connection to a server in parallel to reduce the time for loading web objects. These features, however, have a big influence on network traffic [NG97]. Using just simple (i. e. non-persistent) connections can cause a significant overhead for setting up a new connection and tearing it down again for each single requested web file, especially if the requested files are quite small. We refer the reader to Section 2.1 and Figure 2.1.

In order to be able to simulate pipelined connections, the SURGE model has been slightly modified for NSWeb [Wal01]. As you can see in Figure 3.5, active off times have been omitted and a new type of idle time has been introduced, the so-called inter-session time.


Figure 3.5: The Session Model of NSWeb (Simple Connections)

The inter-session phase is basically a long inactive off time. This way two different types of idle times are still present and the overall model is preserved. [BC98] and [Wal01] have shown that both models are sufficient to generate realistic web traffic, generating long idle periods followed by periods of high activity.

Figure 3.6: The Session Model of SSFweb (Simple Connections)

For SSFweb we have chosen a combination of both models. In order to be able to simulate pipelined traffic we cannot use the SURGE model exclusively. To provide the highest possible flexibility to the user of SSFweb, we do not want to give up active off times completely either. Figures 3.6 and 3.7 show our model for simple and persistent connections.


Figure 3.7: The Session Model of SSFweb (Persistent Connections)

As you can see, the only difference between Figure 3.6, representing SSFweb, and Figure 3.5, representing NSWeb, are the additional active off times. However, we had to make a compromise for pipelined connections: in this case the active off time is only used between loading the web page and loading the embedded objects. It is not used between the requests for the embedded objects.

3.3.2 Implementation

Implementing the model described in the last section is mainly a matter of scheduling requests. As users alternate between requesting web objects and doing nothing, we realize this behavior by a client application in an infinite loop, alternating between processing a request and being idle. The behavior of our loop depends on three basic parameters: active off time, inactive off time and inter-session time. Each of these parameters can be described by a probability distribution. SURGE, for example, uses a Weibull distribution to determine the length of active off times and a Pareto distribution for inactive off times.

In SSFweb it is possible to simulate the models of both SURGE and NSWeb. This is due to the fact that SSFweb, e. g., offers the possibility to set the active off time explicitly to 0.

Parameter            Default

Inter-Session-Time   Pareto distribution with α = 1.4 and k = 20
Inactive-Off-Time    Pareto distribution with α = 1.4 and k = 1
Active-Off-Time      Weibull distribution with a = 1.46 and b = 0.328
Pages-Per-Session    Pareto distribution with α = 3.0 and k = 10

Table 3.2: Default Session Parameters


Besides the distributions for the different idle times, we must not forget that we actually need a fourth parameter, namely the number of pages per session, to properly implement the session model. This parameter is described by a probability distribution, too. Table 3.2 shows the default values used in SSFweb. Of course the user is free to specify different distributions. Let us now go back to the actual implementation. SSFNet fortunately provides enough features to make the implementation of sessions quite straightforward. For scheduling we use so-called Timers. A Timer can be set to a time span of arbitrary length. After this simulated time span has expired, the timer will execute its callback() method.

When the simulation is started we have to be careful to avoid synchronization. For this purpose we use an incremental delay to keep all clients from starting at the same time. For each client, a first timer is set to the start time of the previously initiated client plus a random time span determined with the help of a uniform random number generator. When this timer expires, a first session is started. Each time a session is started, the number of web objects that will be loaded within this session is drawn from the pages-per-session distribution (Table 3.2). Let us assume we use a single non-pipelined connection to make this explanation a bit easier. Having determined how many web objects to load in the session, the request for the first page is issued. After the reply has been received, a new timer is set using the active-off distribution. On expiration of this timer, the first of the embedded objects is loaded and a new active-off timer is set. Once all embedded objects are loaded, a new timer is set with the help of the inactive-off distribution and the same procedure to load a web object starts again. Once all objects for this session are loaded, no inactive-off timer is set but a timer that schedules the next session.
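
The control flow just described can be sketched as follows. This is not the actual SSFweb code: the Timer and Distribution types below are simplified stand-ins (SSFNet's real timer facility and random number generators would be used instead), and the handling of active-off times between the embedded-object requests is only hinted at in a comment.

    // Sketch of the timer-driven session loop for one client (single, non-pipelined connection).
    // Timer is a simplified stand-in: after schedule(delay), the simulation kernel
    // eventually invokes callback() once the simulated delay has expired.
    public class SessionSketch {

        interface Distribution { double next(); }        // e.g. Pareto or Weibull generators

        abstract static class Timer {
            abstract void callback();
            void schedule(double simulatedDelay) { /* handed to the simulation kernel */ }
        }

        private final Distribution pagesPerSession, inactiveOff, interSession;
        private int objectsLeftInSession;

        SessionSketch(Distribution pagesPerSession, Distribution inactiveOff, Distribution interSession) {
            this.pagesPerSession = pagesPerSession;
            this.inactiveOff = inactiveOff;
            this.interSession = interSession;
        }

        void startSession() {
            objectsLeftInSession = (int) pagesPerSession.next();   // web objects for this session
            requestNextWebObject();
        }

        void requestNextWebObject() {
            // Ask the Manager for the next web object, request the base page, and schedule
            // active-off timers between the requests for the embedded objects (omitted here).
        }

        // Called once all files of the current web object have been received.
        void webObjectFinished() {
            objectsLeftInSession--;
            if (objectsLeftInSession > 0) {
                new Timer() { void callback() { requestNextWebObject(); } }
                        .schedule(inactiveOff.next());             // viewing time between web objects
            } else {
                new Timer() { void callback() { startSession(); } }
                        .schedule(interSession.next());            // long pause between sessions
            }
        }
    }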

3.4 Putting It All Together

The goal of this work is to develop a Web Workload Generator for the SSFNet Simulator. In the previous three sections we had a look at three separate aspects that are important when designing and implementing a Web workload generator. In order to actually get a working implementation, we need to put those three pieces together and fill in the gaps between them. This will be done in this section. We will first give a general overview of SSFweb's structure and then take a top-down approach on the networking layers used.

3.4.1 Global Structure

The basic design for our workload generator is partly predetermined by the real Web. Recall from Section 2.1 that the World Wide Web uses a client–server architecture. This gives us the first two parts of our design: client and server. Distributing the tasks identified earlier in this chapter, create web objects, manage access and request web files, to the individual parts of our design is straightforward in the case of the third task, request web files. Requesting web files is obviously a task for our client application. The other two tasks leave us with a problem. One might consider the creation of web objects to be a task for the servers; however, managing access is neither really a task for the individual clients nor for the individual servers, since accessing web objects as described in Section 3.2 requires operating on all web objects. The probability of a web object is given relative to all other web objects, not, e. g., only relative to those residing on the same server. For this reason we introduce a third part of SSFweb, a central instance called Manager that creates the web objects and manages access.

Below, the responsibilities of the individual parts are summarized again:

1. Manager

• Generate web objects as described in 3.1

  • Manage web object access as described in 3.2 (i. e. based on popularities)

2. Servers

• Manage connections to multiple clients

• Reply to client requests

3. Clients

• Request web files from servers

• Implement session model described in 3.3

• Process loaded web pages and determine embedded objects to load

• Manage connections (simple, persistent or pipelined)

Before we have a deeper look into the individual parts, it is worth considering first how they work together. Figure 3.8 shows the interaction graphically. The client checks with the Manager which web object it should request next and then requests all the individual files of this web object from the servers they are stored on.

We have chosen an approach for this interaction that might look a bit strange at first glance. When the client asks the Manager which web object it should request next, it does not get any location information but actually gets information about the web object itself from the Manager. This web object consists of a web page and embedded objects. Information on which servers it is stored on is encapsulated in each web file. The client then passes the base web page to the server and the server passes it back to the client.


Figure 3.8: Interaction of Client, Server and Manager

The client then subsequently requests all the embedded objects by passing them to the server and waiting to get them back.

In order to understand why this approach does not necessarily influence network traffic, we have to introduce one more of SSFNet's concepts. In SSFNet we distinguish between two types of data, real and virtual data. Virtual data refers to bytes transferred via the network, whereas real data denotes actual information (e. g. size and location of a web object). The simulated network traffic is thus caused by virtual data. Real data is passed in zero simulated time. To be able to distinguish these two kinds of data, we use send or transfer when talking about virtual data and passing data when we refer to real data. The passing of web object information described above thus refers to real data. The actual network traffic will be caused by sending virtual data according to request and response size.

This approach has one big advantage: less complexity. Neither client nor server needs to know anything about the available web files or where these files are stored. Especially the implementation of our server becomes much easier this way, since everything that has to do with web object creation and web object management is left to a single part, namely the Manager. A detailed diagram of the structure of SSFweb can be found in Appendix A.


Protocol Design and Implementation in SSFNet

SSFNet provides a programming model similar to that of a modern operating system: the programming model for network access is a protocol stack of different protocol layers with different responsibilities. For SSFweb we will implement an application layer protocol. In reality we would be implementing an application that uses an application layer protocol (e. g. HTTP). In SSFNet, however, applications also have to be implemented as an SSFNet protocol. We will have a closer look at the interaction of an application layer protocol with the remaining protocol stack in Section 3.4.5. As you can see in Figure 3.8, only client and server need network access, thus the Manager does not need to be implemented as a protocol. We can now start looking into the individual components of SSFweb.

3.4.2 The Manager

From the server's and the client's point of view, the Manager is a black box that provides one single service: passing the next web object to request to a client. In order to provide this service, our Manager needs to encapsulate certain properties. It needs to implement the selection mechanism described in Section 3.2. This is done in a single method that operates on a web object vector, as shown in Figure 3.9.

Figure 3.9 also shows the data structure used to store information about the available web objects. We define two structures for this purpose. A web file encapsulates information about its size and a set of servers this file can be requested from. A web page needs additional information about the set of embedded objects this page references.

As mentioned above, the Manager is also responsible for creating web objects; this is done as described in Section 3.1. Since all clients have to be able to access the Manager, the Manager is implemented in Java as a class with only static methods. For an explanation of the meaning of static, we refer the reader to [Fla00].
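
The data structures and the Manager's single service can be sketched as follows. This is an illustrative reconstruction based on the description above, not the actual SSFweb source; field and method names are assumptions, and the selection method is only stubbed out.

    import java.util.List;

    // Sketch of the Manager's data model and interface (names are illustrative).
    class WebFile {
        long size;                        // size in bytes; determines the virtual data sent later
        List<Integer> serverIds;          // servers this file can be requested from
    }

    class WebPage extends WebFile {
        List<WebFile> embeddedObjects;    // embedded objects referenced by this page
    }

    final class Manager {
        private static WebPage[] webObjectVector;   // built from the DML configuration at start-up

        static void init(WebPage[] vector) { webObjectVector = vector; }

        // The single service offered to clients: select the next web object to request,
        // using the popularity-based selection mechanism of Section 3.2.
        static WebPage nextWebObject() {
            return webObjectVector[selectIndex()];
        }

        private static int selectIndex() {
            // manual or statistical popularity selection as described in Section 3.2
            return 0;                     // placeholder
        }
    }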

3.4.3 The Server

The next component we will look at is the server. Among the three components of SSFweb, the server is the least complicated. The server listens for client requests, reads the simulated request from its socket (i. e. a user-configured number of bytes) and at the same time is passed the web file the client wants to get from the server. The server then performs a simple error check on whether the web file's location information actually matches the server itself. If not, the server sends a user-configured number of bytes (a simulated response) back to the client and passes an error message.


Figure 3.9: The Manager Component of SSFweb

Otherwise, the web file is passed back to the client and a number of bytes according to the size of the requested web file is written to the socket.

The real challenge in writing this server is not the actual request processing but the fact that modern Web servers are able to answer more than one request in parallel. For this reason, modern web servers use a separate thread or process for each client connection to be able to serve more than one client at the same time. However, it is not possible to explicitly use threads or processes in SSFNet. Parallelism in SSFNet can be achieved with the help of a programming feature called Continuation. A Continuation is an interface consisting of only two methods, success() and failure().

Again, you have to remember that we are not working in reality but in a simulation environment; thus all calculations and method calls are done in zero simulated time, except for those operations that explicitly consume simulated time. When our simulated server receives a connection request from a client, it will create a new socket for this connection, just like in reality. It then calls read() on this socket, and a Continuation object is passed to this method. The read() method returns after zero simulated time, as shown in Figure 3.10. But reading from a socket clearly involves transmission of data over the network and thus should take simulated time. This conflict is solved by the fact that the simulation environment will call one of the methods (success() or failure()) of the passed Continuation object when the reading from the socket has finished, after the simulated time has passed. This way we get one or more additional, simulated threads.
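
The following sketch shows how this pattern looks from the server's point of view. The Continuation interface with its two methods is taken from the description above; the SimSocket interface and its read()/write() signatures are simplified assumptions and do not reproduce SSFNet's actual socket API.

    // Sketch of simulated parallelism via Continuations (socket API simplified).
    public class ContinuationSketch {

        interface Continuation {           // as described above: two callbacks
            void success();
            void failure();
        }

        interface SimSocket {              // simplified stand-in for an SSFNet socket
            void read(int nbytes, Continuation caller);    // returns in zero simulated time
            void write(long nbytes, Continuation caller);
        }

        // Serve one client connection: read the simulated request, then write the reply.
        void serve(final SimSocket socket, final int requestSize, final long replySize) {
            socket.read(requestSize, new Continuation() {
                public void success() {
                    // called by the simulation kernel once the simulated read has finished
                    socket.write(replySize, new Continuation() {
                        public void success() { /* response sent, wait for the next request */ }
                        public void failure() { /* connection aborted */ }
                    });
                }
                public void failure() { /* read failed, e.g. connection closed by the client */ }
            });
            // serve() itself returns immediately; simulated time passes inside the kernel.
        }
    }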


Figure 3.10: Simulating Parallelism in SSFNet

3.4.4 The Client

The last component we have not yet looked at is the client, which happens to be the most complex one. The client is responsible for handling connections as well as for processing the received web pages. Moreover, it has to implement the session model described in Section 3.3.

Let us start by looking at connections. In order to simulate a realistic Web client, our client has to be able to handle all three connection types described in Section 2.1: simple, persistent and pipelined connections. Additionally, it has to be able to accommodate the possible scenario that the client is configured to use persistent connections but the server supports only simple connections. The last thing we have to consider when designing the connection management is that modern Web clients are able to use several connections in parallel. For this we have to use the same technique to simulate parallelism as mentioned in Section 3.4.3.

To solve these challenges, SSFweb uses Connection Handlers. You might think of a Connection Handler as a virtual thread. In the case of persistent and pipelined connections, a Handler is created for each connection opened. This is not entirely true for simple connections: with simple connections, a Handler will host more than one connection. In this case the number of Handlers created matches the number of connections used in parallel to a specific server. To give a better understanding of how a client operates, we give an example below.


Figure 3.11: Example Request Processing in a SSFweb Client

Example Request Processing

For this example we assume the use of persistent connections and a maximum of two parallel connections to each server. Figure 3.11 shows this scenario.

The client is passed the information about which web object to load next by the Manager. It then starts by creating a Page Handler for downloading the base web page.


It then starts processing the web page. In the scenario shown in Figure 3.11 it determines that there are five objects embedded into the web page. Those embedded objects are then split up into groups according to their location. In the example, four of the embedded objects are stored on the same server as the web page, one is not.

The next step is to assign a connection to each web file. In our example we use a maximum of two connections to each server. The embedded objects stored on a specific server are thus assigned to one of the two connections to this server. In Figure 3.11 this means for server 1 that there are two connections with two objects to be transferred on each connection. For server 2 we only have one embedded object and thus only need one connection. All in all we need three connections, so three Handlers for persistent connections are created. Remember from above that we already have one open connection to server 1. This connection is passed to Handler 1 in Figure 3.11. Only Handlers 2 and 3 will thus open new connections. Each Handler requests and loads the embedded objects it is responsible for. Note that only the number of parallel connections to each individual server is limited, not the total number of connections a client can use to different servers.
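The grouping and assignment step just described can be pictured with the following sketch. This is only an illustration of the idea (group embedded objects by hosting server, then spread each group over at most maxParallel handlers); the class and method names are made up and it is not the actual SSFweb code.

import java.util.*;

// Sketch: group embedded object ids by hosting server and distribute each
// group round-robin over at most maxParallel connection handlers per server.
class HandlerAssignment {
    static Map<String, List<List<String>>> assign(Map<String, String> objectToServer,
                                                  int maxParallel) {
        // 1. group object ids by the server that hosts them
        Map<String, List<String>> byServer = new HashMap<>();
        for (Map.Entry<String, String> e : objectToServer.entrySet()) {
            byServer.computeIfAbsent(e.getValue(), s -> new ArrayList<>()).add(e.getKey());
        }
        // 2. spread each server's objects over up to maxParallel handlers
        Map<String, List<List<String>>> handlersPerServer = new HashMap<>();
        for (Map.Entry<String, List<String>> e : byServer.entrySet()) {
            List<String> objs = e.getValue();
            int handlers = Math.min(maxParallel, objs.size());
            List<List<String>> slots = new ArrayList<>();
            for (int i = 0; i < handlers; i++) slots.add(new ArrayList<>());
            for (int i = 0; i < objs.size(); i++) slots.get(i % handlers).add(objs.get(i));
            handlersPerServer.put(e.getKey(), slots);
        }
        return handlersPerServer;
    }
}

In the scenario of Figure 3.11, the four objects hosted on server 1 with maxParallel = 2 would yield two handlers with two objects each, and the single object on server 2 would yield one handler with one object, i.e. three handlers in total.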

The session model is implemented as described in Section 3.3 by using Timers to realize the sleep times between individual requests; a rough sketch of this technique is given below.
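The key point of the timer technique is that the client never blocks; it schedules its next action after the drawn off time. SimTimer and the callback wiring below are hypothetical and only mirror the behavior described in Section 3.3.

// Hypothetical sketch: scheduling the next request after an Active-Off time.
interface SimTimer {
    // run 'action' after 'delay' units of simulated time have passed
    void schedule(double delay, Runnable action);
}

class OffTimeExample {
    // Instead of sleeping (which would block the simulation), the next request
    // is simply scheduled 'activeOffTime' simulated seconds into the future.
    static void scheduleNextRequest(SimTimer timer, double activeOffTime,
                                    Runnable issueRequest) {
        timer.schedule(activeOffTime, issueRequest);
    }
}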

3.4.5 Network Access

In the last two sections we have mentioned that client and server read from and write to sockets. It is worth saying a few words about the transport layer and network access. SSFNet uses a protocol stack similar to that of modern operating systems. When writing an application, the layer it interacts with is the transport layer. The interface between the actual transport protocol and the application is a so-called Socket, the "door" of the application.

A goal when designing SSFweb was to remain as independent of the underlying transport protocol (such as TCP or UDP) as possible. In SSFNet the differences between transport protocols are abstracted away by the SocketAPI. Speaking in terms of Java, the SocketAPI is an Interface that is implemented by the transport-protocol-dependent sockets. However, this abstraction is not perfect, since different transport protocols implement the individual SocketAPI methods in slightly different ways. It is possible to change the transport protocol via the configuration of the client and the server, although slight changes to the code of SSFweb might be necessary. Note, however, that SSFNet's existing web workload module is designed solely for TCP as underlying transport protocol; the changes needed to adapt it to a different transport protocol would thus be more complicated than with SSFweb. TCP has been tested far more extensively with SSFweb than UDP, though, and all evaluation in Chapter 4 has been done using TCP.


At this point we also have to mention a limitation of SSFNet's TCP implementation. An original goal of SSFweb was to allow the user to configure whether the server or the client closes the TCP connection. In SSFNet, however, in order to close a TCP connection properly, both sockets, the server-side as well as the client-side socket, have to be closed.

3.5 Summary

In this chapter we have introduced the basic models for the design of our workload generator SSFweb. Table 3.3 summarizes the default values used for creating web objects, managing access and the session model. We have explained the three components that make up the implementation. Web content creation and access management are left to a global instance, the Manager. Client and server simulate their equivalents in the real Web. The session model is embedded into the client.

Parameter               Default

Page Size               Pareto Distribution with α = 1.2, k = 13300
Embedded Object Size    Pareto Distribution with α = 1.1, k = 133000
Embeddeds per Page      Pareto Distribution with α = 2.43, k = 1
Servers per Page        1
Servers per Object      1
Inter-Session-Time      Pareto Distribution with α = 1.4 and k = 20
Inactive-Off-Time       Pareto Distribution with α = 1.4 and k = 1
Active-Off-Time         Weibull Distribution with a = 1.46 and b = 0.328
Pages-Per-Session       Pareto Distribution with α = 3.0 and k = 10

Table 3.3: Default Values for SSFweb
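For reference, the Pareto distribution used for these defaults (scale/cutoff parameter k and shape parameter α, as in the DML comments of Appendix B) has the complementary distribution function F̄(x) = P(X > x) = (k/x)^α for x ≥ k and, for α > 1, the mean E[X] = αk/(α − 1). With the page-size defaults this gives an average page size of 1.2 · 13300 / 0.2 = 79800 bytes.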


Chapter 4

Evaluation

After covering design and implementation, we will now describe several tests that we conduct to evaluate whether the components of our workload generator work as desired. Each of our experiments focuses on a certain property that was important for our design. We will thus take the following steps:

Our first two tests concentrate on page generation and page selection. We will start by evaluating Manual Page Generation and Manual Popularity and try to show that they work as specified. We will then go on with Statistical Page Generation and Statistical Popularity. The last two experiments are designed to test if the various connection types and the Session Model work properly.

4.1 Topologies

In this section we describe several network topologies that were used for our evaluation. The topologies are almost identical to those found in [Wal01] and [FGHW99]. We will use three different topologies:

4.1.1 Singlebell Topology

Figure 4.1: Singlebell Topology


The first topology is called singlebell and is shown in Figure 4.1. The basic trait of this topology is a single server that is accessed by a large number of clients. The server is connected via a 100 Mbit link with a delay of 1 ms. Link B has the same bandwidth but a delay of 5 ms; it connects the server link to Link A. Link A has a bandwidth of only 1.5 Mbit and serves as a bottleneck link. On the right side of the picture we have 420 client nodes. These nodes have links with bandwidths ranging between 40 kbit and 100 kbit, simulating dial-up links. The delay for the client links is 1 ms.

4.1.2 Flexbell Topology

Figure 4.2: Flexbell Topology

The next topology, called flexbell, is shown in Figure 4.2. The right side of the topology is exactly the same as in the singlebell topology. The difference is on the left side: instead of a single server, we now assemble four clusters of ten servers each. The cluster links differ in both bandwidth and delay and represent the bottleneck links of this scenario. Between the client side and the server side there are two links with a bandwidth of 100 Mbit and a delay of 5 ms each.

4.1.3 Unibell Topology

The unibell topology is a variant of the flexbell topology and shows the same structure. The difference between the unibell and the flexbell topology lies in the uplinks (i.e. the links to the individual server clusters, namely links B, C, D and E) and the client links. As pointed out in the last section, the flexbell topology uses different bandwidth and delay values for each uplink.


In contrast to that, the unibell topology uses equal parameters for each link (45 Mbit bandwidth and 2 ms delay). For the client links, equal parameters (50 kbit bandwidth, 1 ms delay) are used as well. The clients are thus provided with forty servers that can be reached over links with equal parameters.

4.2 Experiments

We are now ready to explain the actual evaluation. Section 4.2.1 focuses on Manual Page Generation and Manual Popularity. Section 4.2.2 then goes on to high-level page creation and statistical page popularity. In Section 4.2.3 we evaluate the correct functioning of the different connection types (simple, persistent and pipelined). The final section is dedicated to the session model.

4.2.1 Manual Page Generation – Manual Popularity

Page     Page Size   Number of          Total Size   Popularity
         [Bytes]     Embedded Objects   [Bytes]

Page 1   4000        1                  64000        10
Page 2   3000        3                  166000       8
Page 3   500         1                  40500        6
Page 4   13300       2                  580300       4
Page 5   400000      1                  500000       2

Table 4.1: Experimental Setup

The goal of this section is to show that Manual Page Creation and Manual Popularity work as desired. For this experiment we use the singlebell topology and only simple connections in order to set the exclusive focus on Manual Page Creation. The server is populated with a set of five pages that are configured as shown in Table 4.1. The 420 clients issue requests for those five pages.

The simulation was run for 72 hours of simulated time. During this time the clients issued 85965 requests for web objects. The measured results are summarized in Table 4.2. As you can see, the values for total size and number of embedded objects in Table 4.2 match those of Table 4.1; the pages were thus created correctly. Figure 4.3 shows a plot of the relative access rate (red) against the relative popularity (blue). The two lines are almost identical and show that the Manual Popularity is working as expected.


Page     Number of          Total Size   Number of
         Embedded Objects   [Bytes]      Accesses

Page 1   1                  64000        28564
Page 2   3                  166000       22913
Page 3   1                  40500        17294
Page 4   2                  580300       11414
Page 5   1                  500000       5766

Table 4.2: Experimental Results

Figure 4.3: Page Accesses (Red – relative access rate; Blue – relative popularity)
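As a quick numerical cross-check of Tables 4.1 and 4.2: the configured popularities sum to 30, so Page 1 should receive about 10/30 ≈ 33.3% of all requests; it actually received 28564 of the 85965 requests, i.e. about 33.2%. Likewise Page 5 should receive 2/30 ≈ 6.7% and actually received 5766/85965 ≈ 6.7%.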

4.2.2 Statistical Page Generation – Statistical Popularity

After we have shown that the low-level methods for page creation and page access work properly, we will go on with the corresponding high-level methods. Again, for simplification purposes, we will not use any advanced features like pipelining, persistent or parallel connections. We use two different test setups for the evaluation. In the first one, all web files are hosted by a single server; in the second one the web files are distributed over several servers. Both scenarios are configured with the default parameters shown in Table 3.3. The popularity distribution being used is a Pareto distribution with α = 1.5 and k = 1. The web object vector is sorted in ascending order.
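To illustrate how a statistical popularity can map random draws onto the sorted page vector, the following sketch uses inverse-transform sampling from a Pareto distribution truncated at a cutoff. It is only an illustration of the principle, not the SSFweb code; the class and the mapping of draws to ranks are made up.

import java.util.Random;

// Illustration: draw a page index from a Pareto(alpha, k) popularity
// distribution, truncated at 'cutoff' and interpreted as a rank into a
// sorted page vector of length numPages (rank 1 = most popular page).
class ParetoPageSelector {
    private final double alpha, k, cutoff;
    private final int numPages;
    private final Random rng = new Random();

    ParetoPageSelector(double alpha, double k, double cutoff, int numPages) {
        this.alpha = alpha; this.k = k; this.cutoff = cutoff; this.numPages = numPages;
    }

    int nextPageIndex() {
        // inverse-transform sampling: X = k / U^(1/alpha) follows a Pareto(alpha, k) law
        double x;
        do {
            x = k / Math.pow(rng.nextDouble(), 1.0 / alpha);
        } while (x > cutoff);                  // discard draws beyond the cutoff
        int rank = Math.max(1, (int) Math.floor(x));
        return Math.min(rank, numPages) - 1;   // clamp to the vector, convert to an index
    }
}

With α = 1.5 and k = 1, small ranks are drawn far more often than large ones, which is the kind of heavy-tailed access pattern the evaluation below expects.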


Figure 4.4: Single Server Distributions


Single Server

For the experiment with all pages located on a single server, the topology used is of course the singlebell topology. As in Section 4.2.1 there are 420 clients issuing requests. The server is populated with a set of 200 pages.

During the simulated time of 72 hours the clients issued a total of 104866 requests. Figure 4.4 shows the resulting distributions for this scenario. CCDF plots are given for the sizes of single web files (a), for the sizes of whole web objects (b) and additionally for the sizes of just the embedded objects. All three plots show the typical properties of a Pareto distribution. This is consistent with the distributions used to generate the sizes of the web files and shows that the mechanism for generating web files works as desired. Figure 4.4 (c) shows a density plot for the number of embedded objects. By far most web objects feature one to three embedded objects and only very few have a larger number of embedded objects. This is consistent with the Pareto distribution used.
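(Recall that a CCDF plots F̄(x) = P(X > x) against x; for a Pareto distribution F̄(x) = (k/x)^α, which shows up as a straight line with slope −α when both axes are logarithmic.)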

The lower row of Figure 4.4 shows CCDF plots of the transfer sizes (e) and the number of page accesses (f). Transfer sizes refers to the sizes of the transferred web files. Plot (f) shows that most pages have been accessed very rarely and very few pages have been accessed very often. In fact, of the 104866 requests, 89384 requests were issued for a single page. This is due to the popularity distribution used and thus as expected. The transfer sizes shown in plot (e) result from a combination of the sorted page vector and the access distribution: small web objects are accessed far more often than larger ones. Again this is as desired.

Multiple Servers

The next scenario is similar to that described in the last section. We now use the flexbell topology and distribute a set of pages over its servers. While we used only a single server and 200 pages in the last scenario, we now populate forty servers with 8000 pages. The simulation is again run for 72 hours of simulated time. During this time the clients issued a total of 270693 requests.

Figure 4.5 summarizes the measurements gained from this scenario. The density plot of the web file sizes (a) shows two distinct peaks. This is consistent with the configuration of our scenario: the two peaks are caused by the two different distributions used for web page sizes (left peak) and embedded object sizes (right peak). As expected, the density plot of transfer sizes (b) shows the same two peaks as the web file size plot. Plot (c), a CCDF plot of page accesses, is close to the corresponding one in the previous section, indicating that page access works for multiple servers as well as for a single server. The plot most interesting for this scenario is the density plot of the number of web objects per server (d). The server ids range from 420 to 459 and are plotted on the x-axis.


Figure 4.5: Multiple Server Distributions

As desired, this plot shows the properties of a uniform distribution, meaning that all servers host roughly the same number of web objects.

4.2.3 Connection Types

Having shown that page creation and access work as desired, we now focus on the various available connection types between client and server. We evaluate whether the different connection types (simple, persistent, pipelined and parallel) work properly in two scenarios.

Both scenarios use the unibell topology as described in Section 4.1.3. We chose the unibell topology to have several servers available and to provide equal network conditions for all page requests. All servers are configured to support simple as well as persistent connections. In SSFweb, pipelined connections are handled by the server the same way as persistent connections. On the other end of the network we use three different types of clients. The first type supports only simple connections, the second one uses persistent connections and the third type supports pipelined connections.


The only difference between the two scenarios is the number of connections a client can open in parallel to one server. In the first scenario the clients are only allowed to open a single connection to each server, while in the second scenario clients may open up to four connections.

                                  1 Conn./Server                4 Conns./Server
Conn. Type                        simple   persist.  pipel.     simple   persist.  pipel.

# Connections                     84540    28480     28780      84765    44021     44316
avg. transfers per connection     1.0      2.6       2.6        1.0      1.7       1.7
max. transfers per connection     1        37        37         1        10        10
avg. lifetime [sec]               148.78   445.45    437.86     175.51   330.54    329.48
active time [sec]                 140.30   436.98    429.39     165.47   320.49    319.43
avg. data [kBytes]                480.0    1303.0    1278.6     478.7    840.0     837.2
avg. transfer rate [bytes/sec]    3503.3   3053.4    3049.2     2962.4   2683.9    2683.8

Table 4.3: Statistics for Persistent/Pipelined Connection Test

Table 4.3 shows the connection statistics for both scenarios. For scenario one (single connection) the numbers of persistent and pipelined connections are almost identical, whereas the number of simple connections is about three times as large. This matches the observed number of transfers per connection: where simple connections transfer only one web file per connection, persistent and pipelined connections transfer an average of 2.6. When we look at the results for our second scenario (up to four connections in parallel per server), we notice that using several parallel connections has little influence on the total number of simple connections, if any at all. The number of persistent and pipelined connections, in contrast, almost doubles, while the number of transfers per connection drops to 1.7 accordingly.
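A rough consistency check on these numbers: in the single-connection scenario the 84540 simple connections carry one transfer each, while the 28480 persistent connections at 2.6 transfers each carry roughly 28480 · 2.6 ≈ 74000 transfers; the factor of about three between the connection counts (84540 / 28480 ≈ 3.0) is thus indeed what the transfers-per-connection values lead one to expect.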

The same behavior can be observed when looking at the average lifetime and active time of a connection and the average number of bytes transferred. In contrast to the lifetime, the active time refers to the time during which there is actual traffic on the connection. The values obtained for simple connections remain almost unchanged for both scenarios, with a slight increase in the lifetime. The lifetime of persistent and pipelined connections does not drop as much as the number of bytes transferred per connection. This can be explained by congestion on the network caused by the larger number of TCP connections.

To further analyze the effects of the different connection types on loading a web object, we look at the time needed to load a web page and all of its embedded objects. The time for loading a web object is determined by calculating the time from when the first request for the web page is issued until all embedded objects have been received by the client.


                                  1 Conn./Server                4 Conns./Server
Conn. Type                        simple   persist.  pipel.     simple   persist.  pipel.

247.7 kBytes, 1 Embedded          104.7    103.7     102.3      115.8    108.3     112.1
2840.2 kBytes, 4 Embeddeds        1014.8   886.5     767.2      795.3    765.7     724.1
4962.3 kBytes, 12 Embeddeds       2006.3   1323.6    891.2      2008.2   1165.8    744.3

Table 4.4: Statistics for Persistent/Pipelined Connection Test

The average response times for three sample web objects of our two scenarios are given as an example in Table 4.4. The first page is chosen as a web page with a single embedded object; as we have seen in the last sections, this represents the majority of the web objects created. The second web object is chosen to have as many embedded objects as the number of connections the clients are allowed to use in parallel to a single server. The third one represents a web object with far more embedded objects than the maximum number of parallel connections.

For the page with only one embedded object the difference between the individual connection types is very small. This is due to the fact that persistent and pipelined connections are reused only once; the advantage over simple connections is therefore rather small. Multiple connections are meaningless for web objects with just a single embedded object. This is due to the model incorporated in SSFweb: the web page is always loaded completely before the loading of embedded objects starts. Multiple connections thus only have an effect on embedded objects.

When the loaded page has a number of embedded objects that approximately matches the number of connections a client may use in parallel to a single server, persistent and pipelined connections offer a better response time than simple connections. When several connections are allowed in parallel, SSFweb always uses the maximum number of connections allowed. In this case, where the maximum number of connections matches that of the embedded objects, each object is downloaded via its own connection. This is the reason why the difference in response times between simple, persistent and pipelined connections is much less significant for multiple connections, although the response time decreases for all connection types.

We get a slightly different picture when we look at the response times for a web object with a significantly larger number of embedded objects than the maximum number of connections. In this scenario the advantage of persistent and pipelined connections is clearly shown: the differences between the response times for the individual connection types increase distinctly.


Multiple connections should show a significant impact on response times, since every connection now transfers more than one file. However, the decrease in response time is not as big as expected. For multiple simple connections, the response time even slightly increases when compared to a single simple connection. This is due to network congestion caused by the large number of TCP connections.

The observed values show that persistent and pipelined connections can decrease the response time compared to simple connections, and so can the use of multiple connections in parallel. However, the experiments have also shown that too many parallel connections can nullify the desired effects. The obtained results thus match the expected ones and imply that the different connection types work correctly.

4.2.4 Session Model

As a last feature we will have a short look at the session model. We use the results obtained from the experiment in the previous section (unibell topology – simple, persistent and pipelined connections – single connection per server).

Figure 4.6 (a) shows a density plot of the Inactive Off Times, (b) a density plot of the Inter-Session Times. Both show the desired shape of the Pareto distributions used, and Inter-Session Times generally tend to be larger than Inactive Off Times, just as desired, indicating that the off times work correctly. Figure 4.6 (c) is a density plot of the number of pages per session. The curve shows the properties of the desired distribution (Table 3.3) as well, indicating that the implementation of the session model (Section 3.3) works as designed.

4.3 Summary

The experiments of this chapter were designed to show that SSFweb implements the design features described in Chapter 3 correctly. The results of the experiments showed that both page creation mechanisms, the low-level as well as the high-level one, work as desired. We have moreover shown that our implementation of Manual Popularity works properly. We have then shown that the selection mechanism with statistical popularities works for a set of web pages situated on a single server as well as for a set of web pages distributed over several servers. The experiments also show that the configuration parameters have to be chosen with great care to obtain the desired results.

Moreover, the various types of connections (simple, persistent and pipelined) have been shown to work as expected. The final result of our tests is that the clients alternate between activity and sleep times as desired. Tests on the scaling behavior of SSFweb and an evaluation of how closely the generated traffic matches real-world traffic would be desirable as well.


Figure 4.6: Session Model Distributions

However, such additional analysis is beyond the scope of this work. We will thus leave this matter to our successors working on SSFweb.


Chapter 5

Conclusion

The goal of this thesis was to develop and implement a Web workload generator for the SSFNet simulation environment. This generator was to be highly parameterizable and to provide typical networking-related features incorporated in modern Web browsers and Web servers, such as persistent and pipelined connections.

We started our work by looking at the properties of the World Wide Web. We then introduced three different, already existing web workload generators: the web workload implementation of SSFNet (release 1.5), a tool set for workload generation called SURGE, and a Web workload generator for the NS-2 network simulator called NSWeb. From those three sources we derived the models and principles for our work.

We then focused on web content creation. Our basic model for web content is web pages that reference a number of embedded objects. A web page and its embedded objects are statically combined into web objects. We have introduced two different ways of creating web content. The first one is a low-level method and leaves all freedom and responsibility to the user. This method can be cumbersome to use for larger scenarios; we have thus also introduced a high-level method, with which the web objects are created according to freely configurable distributions for file sizes and the number of embedded objects. With these two methods we presented a highly parameterizable way of creating web content.

The next step was to specify how often to access which pages. In order to keep page selection as flexible as possible, we have adopted and extended the mechanism of NSWeb. SSFweb provides two significantly different ways of assigning popularities to web objects. The first offers the user the possibility to specify an overall popularity distribution according to which the web objects are accessed. This mechanism is easy to use but lacks a certain amount of freedom. For this reason SSFweb also provides a second possibility: this alternative mechanism makes it possible to assign specific popularities to individual web objects.


After having modeled which files are accessed and how often they are requested, we continued by modeling when they are accessed. To this end we have combined the models of SURGE and NSWeb. A user is modeled as a process in an infinite loop that alternates between requesting files and being idle. This way the burstiness of Web traffic is introduced.

As a last step of the design and implementation of our web workload generator we had a look at the overall structure. We explained that there are three main components in our workload generator. The central part is the Manager, which is responsible for creating the web content and for managing the access to the web files. We then described how the server handles client requests and, along with this, introduced a method of creating parallelism in SSFNet. We then described how the client manages connections and requests and gave an example of how a request is scheduled. Finally, we explained how our client and server access the simulated network via sockets.

The third and last part of this thesis covers a simple evaluation of the individual properties of our workload generator. Each of the desired properties is evaluated using a specific simulation focused on that single property. We have demonstrated that web content creation in our implementation works as desired. The different connection types were shown to work correctly in further tests. The final experiments showed that the clients alternate between activity and being idle as desired.

The result of this work is thus a highly parameterizable simulation environment that generates web workload.


Chapter 6

Outlook and Open Problems

In this thesis we have developed a simulation environment called SSFweb. Simulation can only ever be a tool that helps analyze the influence of different factors like user behavior or network topology. It can, however, never be the sole answer to all questions that might appear in networking. A simulation environment can only be as realistic as the concepts it is based on and the parameters it is configured with. These parameters have to be found by analyzing real-life traffic.

With SSFweb we have laid the foundation for a highly parameterizable simulation environment for SSFNet, but there are still several things left for improvement of the individual components:

A first possibility is given by the server and the way connections are handled. In the current implementation, pipelined connections are treated the same way as persistent connections. This should be changed in further releases. In reality a web server needs a certain time to process a request, so some time passes between receiving a request and sending a response back to the client. This characteristic is currently not available in SSFweb: our server sends the response the moment the request is received. A solution for this problem might be using an additional Timer to delay the response. So far no adequate distribution for this delay time has been determined.

The client might be improved as well. A modern web browser starts processing a web page while it is still receiving the web page file. As soon as the browser finds a reference to an embedded object it starts loading this object, no matter whether it has already finished loading the web page. This is different from the SSFweb client: our client starts loading the embedded objects only after it has finished loading the entire web page. It is subject to further research to find an accurate model for this behavior. Another aspect of real-world networks the simulation environment does not cover are client and proxy caches. Caching, however, has an important effect on the resulting network traffic and is thus a top candidate for possible extensions to SSFweb.

A last point for improvement is Page Access. A typical web user opens a web page by providing the URL to the web browser. The user then reads the content and probably clicks on a link, whereupon the browser starts loading the referenced web object.


Those referenced web objects are often hosted by the same server as the referencing web page. Currently the selection of a web object for download is completely independent of the last page the client has loaded. In order to solve this problem we suggest two different solutions; the decision which method is the better one is subject to further research.

• The first solution proposed is to directly adopt the real-world model by linking the pages among each other. This would imply developing a model for the number of links that reference web objects on the same server or on a different server, respectively.

• A second possible solution is adding a second access method to the Manager component. The client specifies the server it wants to download the next page from, and the Manager chooses a page according to this preference.

A cumbersome point of the configuration of the Manager is the statistical page access distribution. The user has to specify a cutoff manually; the cutoff should rather be calculated automatically from the distribution parameters. [Wal01] provides a solution for this using so-called Access Generators. By using a page vector to assign popularities to web objects according to their rank in the vector, we have introduced a flexible way of accessing web objects. This flexibility might be taken further by creating the possibility to dynamically change the popularities of web objects during a simulation run by changing their position in the vector.

We hope that we have laid a good foundation for our successors working on SSFweb and that the open problems that were beyond the scope of this work can be solved.


Appendix A

Design Overview

Figure A.1 shows a UML-like class diagram of SSFweb.


Figure A.1: SSFweb Structure


Appendix B

Sample DML-Files

This section provides commented sample DML files for configuring web content, server and client. These files can be directly used and embedded into your DML files.

B.1 Web Content Configuration

# webcontent-sample.dml
#
# A basic example of using the Web-Workload-Generator.
# This file is supposed to clarify the configuration
# of the Manager class.
#
# The entrance point is webcontent. webcontent must
# be specified on the top level of your DML file,
# i.e. NOT inside e.g. Net [...].

webcontent [

  # 'popularity' is compulsory. You must specify
  # either "manual" or a random distribution.
  # In this case we will use "manual".

  popularity manual

  # As mentioned above, the other option available
  # would be e.g.
  # popularity [
  #   distribution [
  #     name "Pareto"   # distribution class name
  #     k 2000.0        # scale (cutoff) parameter


  #     alpha 1.2       # shape (exponent) parameter
  #   ]
  #   cutoff 9.2
  #   sort true
  # ]
  #
  # where the value of cutoff is optional and,
  # if specified, is used to limit the
  # range of the given popularity distribution.

  # The actual web content will be grouped as pagesets.
  # There are 2 options available for pagesets:
  # - manual, i.e. you have to specify all data necessary
  #   for a web page yourself
  # - statistical, i.e. web pages will be created through
  #   statistical methods according to your specification
  # You can specify an arbitrary number of pagesets.
  #
  # NOTE: If you mix manual and statistical pagesets,
  # all pages specified manually will be at the
  # beginning of the list containing all pages.
  # This has to be considered if you specify a
  # popularity distribution.
  #
  # We will give an example for each type of pageset.

  pageset [
    # 'type' is compulsory and can be either
    # 'manual' or 'statistical'

    type manual

    # If 'type' has been set to manual,
    # you can now specify an arbitrary number
    # of pages.

    page [
      # page's size (compulsory)

      size 400

      # page's name (optional)

      name test_page


      # page's popularity (compulsory if
      # .webcontent.popularity has been set to
      # 'manual', unnecessary otherwise)
      # The actual size of the numbers used for the
      # popularity does not matter; the only thing
      # that matters is the ratio of the pages' popularities.

      popularity 40

      # servers that will be hosting this page (compulsory)
      # You can specify an arbitrary number of servers.

      servers [nhi 1(0) port 1600]
      servers [port 10 nhi_range [from 1:2(0) to 1:5(0)]]

      # objects embedded into this page (optional)
      # You can specify an arbitrary number of objects.

      object [
        # object size (compulsory)

        size 600

        # object location (optional)
        # default is the same location as the page

        servers [nhi 1(0) port 1600]
        servers [port 10 nhi_range [from 1:2(0) to 1:5(0)]]
      ]
    ]
  ]

  pageset [
    type statistical

    # If on_request is set to true, pages
    # won't be kept in memory but a new page
    # will be created on each request.

    on_request false


    # If 'type' is set to 'statistical' you have
    # to specify how many pages should be created.

    num_of_pages 400

    # Except for 'type' and 'num_of_pages',
    # all other attributes demonstrated here
    # are optional. If not specified, defaults will be used.

    # popularity is compulsory iff
    # .webcontent.popularity has been set to manual.
    # All pages created with this configuration will
    # be assigned the same popularity.

    popularity 1

    # servers_per_page specifies on how many servers
    # each page will be hosted.
    # default is 1.

    servers_per_page 2

    # page_size can either be an Integer or a distribution
    # for default see webcontent-defaults.dml
    #
    # page_size 400 or

    page_size [
      distribution [
        name "Pareto"   # distribution class name
        k 2000.0        # scale parameter
        alpha 1.2       # shape parameter
      ]
    ]

    # You can specify a list of servers hosting these pages.
    # If nothing is specified, pages will be equally
    # distributed to all servers available.

    servers [nhi 1(0) port 1600]
    servers [port 10 nhi_range [from 1:2(0) to 1:5(0)]]

    # objects_per_page specifies how many objects are
    # embedded into each page;
    # can be an Integer or a distribution


    # for default see webcontent-defaults.dml
    #
    # objects_per_page 5 or

    objects_per_page [
      distribution [
        name "Pareto"   # distribution class name
        k 0.6667        # scale parameter
        alpha 1.2       # shape parameter
      ]
    ]

    # servers_per_object specifies on how many
    # servers each object will be hosted.
    # default is 1.

    servers_per_object 2

    # object_size can either be an Integer or a distribution
    # for default see webcontent-defaults.dml
    #
    # object_size 400 or

    object_size [
      distribution [
        name "Pareto"   # distribution class name
        k 2000.0        # scale parameter
        alpha 1.2       # shape parameter
      ]
    ]

    # you can specify a list of servers hosting these objects
    # default: objects will reside on the same servers as the page

    object_servers [nhi 1(0) port 1600]
    object_servers [port 10 nhi_range [from 1:2(0) to 1:5(0)]]
  ]
]


B.2 Server Configuration

# Configure this HttpServer. Example of a valid configuration:

ProtocolSession [

  name server use de.tum.ssf.os.www.HttpServer

  # server's well-known port (optional)
  # if omitted, default is 80

  port 80

  # use persistent connections (optional)
  # if omitted, default is true

  persistent_connection true

  # transport protocol used by this server (optional)
  # if omitted, default is tcp

  transport_protocol tcp

  # max number of simultaneously established
  # client connections (optional)
  # if omitted, default is unlimited.
  # If exceeded, the connection request is ignored
  # and left to time out.

  client_limit 10

  # maximum size of the pending connection request queue (optional)
  # if omitted, default is 5.
  # Determines the total number of pending
  # connection requests (see SSF.OS.TCP.tcpSocket).
  # If exceeded, the listening socket sends RESET
  # to the client and drops the request.

  queue_limit 5

  # nominal HTTP header size (virtual bytes) read from a socket
  # before reading data (if indicated by the HTTP header) (optional)
  # if omitted, default is 1000

  http_hdr_size 1000


  # timeout for the connection, if the server doesn't receive a client
  # request within the specified time (optional)
  # if omitted, default is 1200

  timeout 1200

  # print verbose output (optional)
  # if omitted, default is false

  debug false

  # print logan output (optional)
  # if omitted, default is false

  logan false
]

B.3 Client Configuration

# Configure the client. Example of a valid configuration:

ProtocolSession [

  name client use de.tum.ssf.os.www.HttpClient

  # use persistent connections (optional)
  # if omitted, default is true

  persistent_connection true

  # use pipelining (optional)
  # if omitted, default is false

  pipelining false

  # number of parallel connections opened to one server (optional)
  # if omitted, default is 1

  parallel_connections 2

  # transport protocol used by this client (optional)
  # if omitted, default is tcp

  transport_protocol tcp


  # nominal HTTP header size (virtual bytes) read from a socket
  # before reading data (if indicated by the HTTP header) (optional)
  # if omitted, default is 1000

  http_hdr_size 1000

  # timeout for the connection, if the client doesn't receive the server
  # response within the specified time (optional)
  # if omitted, default is 1200 sec

  timeout 1200

  # print verbose output (optional)
  # if omitted, default is false

  debug false

  # print a report for each session (optional)
  # if omitted, default is false

  show_session_report false

  # print logan output (optional)
  # if omitted, default is false

  logan false

  # distribution for sleep time between sessions (optional)

  inter_session_time [
    distribution [
      name "Pareto"
      k 1
      alpha 1.5
    ]
  ]

  # distribution for number of pages per session (optional)

  pages_per_session [
    distribution [
      name "Exponential"
      lambda 0.2
    ]
  ]


  # distribution for sleep time between page requests (optional)

  inter_page_time [
    distribution [
      name "Pareto"
      k 1
      alpha 1.5
    ]
  ]

  # distribution for sleep time between object requests (optional)

  active_off_time [
    distribution [
      name "Weibull"
      scale 1.46
      shape 0.382
    ]
  ]

]


Appendix C

Log Files

In order to obtain results from a simulation, we need to know what happens during a simulation run. If we could not get any information about what is happening during the simulation, any kind of simulation would be useless. In addition to the basic logging abilities of SSFNet, we have added a further logging facility for SSFweb.

In order to be able to detect and log the results of protocol actions that depend on the interplay of multiple packets, we use the same output format as NSWeb. The output generated this way can be analyzed using a tool called LOGAN, which was developed in conjunction with NSWeb. The details on using LOGAN can be found in Appendix B of [Wal01]. However, since the writing of [Wal01] there have been several new versions of LOGAN and the logging format has slightly changed. Below we describe the log file format used in SSFweb.

Figure C.1: Event Log Line Example

When running SSFweb with the logan output enabled, a file named logan_out.log is generated.


This log file is a simple line-oriented ASCII file. Each line describes a single event during the simulation. Figure C.1 shows a sample log line. The timestamp specifies the time within the simulation at which the event occurred. The id of the logging application and the flow id are part of the output format but are currently not used for the analysis by LOGAN. Client and server id specify which entities of the simulation are communicating. The most important part is the event that occurred. We distinguish between two different types of events: Connection Related Events and Request Transaction Events.
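As a small illustration of how such a log file might be consumed, the sketch below splits each line into the fields just listed. The assumption that the fields are whitespace-separated and appear in exactly this order (timestamp, application id, flow id, client id, server id, event and its arguments) is ours; consult Figure C.1 and the LOGAN documentation for the authoritative layout.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical reader for a logan-style event log; field order and
// separator are assumptions, see the note above.
class LoganReader {
    public static void main(String[] args) throws IOException {
        try (BufferedReader in = new BufferedReader(new FileReader("logan_out.log"))) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] f = line.trim().split("\\s+");
                if (f.length < 6) continue;                  // skip malformed lines
                double timestamp = Double.parseDouble(f[0]);
                String clientId  = f[3];
                String serverId  = f[4];
                // the event name plus its arguments make up the rest of the line
                String event = String.join(" ", Arrays.copyOfRange(f, 5, f.length));
                System.out.printf("%.3f %s -> %s : %s%n", timestamp, clientId, serverId, event);
            }
        }
    }
}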

Connection Related Events

Connection Related Events are those events related to establishing and terminating TCP connections. Below is a list of all connection related events.

CONNECT <SIMPLE|PERSISTENT|PIPELINED> A client begins to establish a new logical connection to a server. The connection is of type SIMPLE, PERSISTENT or PIPELINED. A TCP-SYN is sent from the client socket to the server socket.

TARGET CONNECTION ESTABLISHED A TCP-SYN packet has been received by a listening server socket.

SOURCE CONNECTION ESTABLISHED A TCP-SYN/ACK packet has been received by the connection's client socket.

TARGET CONNECTION ESTABLISHED A TCP-ACK packet has been received by the connection's server socket as response to a TCP-SYN/ACK packet.

CONNECTION IDLE TIMEOUT <SERVER ID> The persistent or pipelined connection to the given server was idle longer than allowed by the maximum idle time value. The connection is actively closed by the server application.

TARGET ACTIVE CLOSE A TCP-FIN packet is sent by the server socket to tear down the connection.

TARGET DISCONNECTED A TCP-FIN/ACK packet has been received by the server socket as response to a TCP-FIN packet.

SOURCE DISCONNECTED A TCP-ACK packet has been sent by the client socket as response to a TCP-FIN/ACK packet.

Request Transaction Related Events

With the help of Request Transaction Events we can log all states that occur while a request is processed and answered. Below is a list of all request related events.


GET PAGE <OID> <REQ SIZE> A request for a page of size REQ SIZE is initiated.

GET EMB <OID> <REQ SIZE> A request for an embedded object of size REQ SIZE is initiated. This normally happens after the page into which this object is embedded has been requested.

SND REQ <OID> <SIZE> <PAGE|EMB> A request header for a page or embedded object of size SIZE is sent to the server.

RECVD REQ <OID> <REQ SIZE> A request header of size REQ SIZE has been received by the server.

SND RESP <OID> <RESP SIZE + OBJ SIZE> A server is sending a response of size RESP SIZE + OBJ SIZE to the client.

RECVD RESP HEADER <OID> A client has received the header of a response from the server.

RECVD RESP <OID> <RESP SIZE + OBJ SIZE> <PAGE|EMB> A client has received the entire response of size RESP SIZE + OBJ SIZE and given type for a single Web file from the server.

REQUEST DONE <OID> <#OBJS> <BYTES> A client has finished loading an entire web object, that is, the page with its #OBJS embedded objects. Request headers, response headers and file sizes sum up to BYTES.


Bibliography

[BC98] Paul Barford and Mark Crovella, Generating Representative Web Workloads for Network and Performance Evaluation, in Proceedings of the 1998 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, pp. 151–160, 1998

[Bern94] Tim Berners-Lee, R. Cailliau, A. Luotonen, H. Frystyk Nielsen, and A. Secret, The World Wide Web, Communications of the ACM, Vol. 37, No. 8 (Aug. 1994), pp. 76–82

[CB96] Mark E. Crovella and Azer Bestavros, Self-Similarity in World Wide Web Traffic: Evidence and Possible Causes, in Proceedings of the ACM Sigmetrics Conference on Measurement and Modeling of Computer Systems (SIGMETRICS '96), pp. 160–169, 1996

[Fla00] David Flanagan, Java in a Nutshell – Deutsche Ausgabe für Java 1.2 und 1.3, O'Reilly, 3. Auflage, 2000

[FGHW99] Anja Feldmann, Anna C. Gilbert, Polly Huang and Walter Willinger, Dynamics of IP traffic: A study of the role of variability and the impact of control, in Proceedings of ACM SIGCOMM '99

[Heu00] Volker Heun, Grundlegende Algorithmen – Einführung in den Entwurf und die Analyse effizienter Algorithmen, Vieweg, Braunschweig/Wiesbaden, 2000

[Kop02] Helmut Kopka, LaTeX – Band 1: Einführung, 3., überarbeitete Auflage, Pearson Studium, München, 2002

[KR03] James F. Kurose and Keith W. Ross, Computer Networking – A Top-Down Approach Featuring the Internet – International Edition, Addison-Wesley, second edition, 2003

[NG97] H. F. Nielsen, J. Gettys, A. Baird-Smith, E. Prud'hommeaux, H. W. Lie, C. Lilley, Network Performance Effects of HTTP/1.1, CSS1, and PNG, http://www.acm.org/sigcomm/sigcomm97/papers/p102.html, Association for Computing Machinery Inc. (ACM), 1997


[R03] W. N. Venables, D. M. Smith and the R Development Core Team, An Introduction to R: Notes on R: A Programming Environment for Data Analysis and Graphics, http://www.r-project.org, 2003

[RFC 793] Information Sciences Institute, University of Southern California (USC), Transmission Control Protocol – DARPA Internet Protocol Specification, RFC 793/STD 7

[RFC 1945] T. Berners-Lee, R. Fielding, and H. Frystyk, Hypertext Transfer Protocol – HTTP/1.0, Network Working Group, RFC 1945

[RFC 2616] R. Fielding, J. Gettys, H. Frystyk, L. Masinter, P. Leach, and T. Berners-Lee, Hypertext Transfer Protocol – HTTP/1.1, Network Working Group, RFC 2616

[Rum02] Dr. Bernhard Rumpe, Unterlagen zur Vorlesung Softwaretechnik – Wintersemester 2002/2003, Technische Universität München, 2002/03

[S01] G. Sawitzki, Statistical Computing: Einführung in S, http://www.statlab.uni-heidelberg.de, 2001

[SSFa] Scalable Simulation Framework, http://www.ssfnet.org/, SSF Research Network, 1999–2002

[SSFb] SSFNet tutorials, http://www.ssfnet.org/internetPage.html, SSF Research Network, 1999–2002

[SSFc] Overview (Scalable Simulation Framework), http://www.ssfnet.org/javadoc/

[SSt01] Thomas Schickinger, Angelika Steger, Diskrete Strukturen, Band 2 – Wahrscheinlichkeitstheorie und Statistik, Springer, 2001

[Sun04a] Java Technology Homepage, http://java.sun.com/, Sun Microsystems, 1994–2004

[Sun04b] Java 2 Platform SE v1.4.2 API documentation, http://java.sun.com/j2se/1.3/docs/api/index.html, Sun Microsystems, 2003

[Wal01] Jörg Wallerich, Design and Implementation of a WWW Workload Generator for the NS-2 Network Simulator, Diplomarbeit, Universität des Saarlandes, Saarbrücken, 2001

[Zip49] George K. Zipf, Human Behavior and the Principle of Least Effort, Addison-Wesley, Cambridge, Mass., 1949