EtE MonitorH1
EtE: Passive End-to-End Internet Service Performance Monitoring
Yun Fu, Lucy Cherkasova, Wenting Tang, and Amin Vahdat
HPLabs and Duke University
HP.com???
A lot of research has been done on optimizing web server performance in order to improve the client experience
BUT:
• Do we know what the client experience is?
• What are the critical latency components in the end-to-end response time?
• Do we know whether improvements on the web server side actually improve the end-user experience?
• Do we know who the clients are and where they are located on the Internet?
Service provider problems...
End-to-End Web Service Measurement: Why Is It Important?
Two main factors impact the response time perceived by clients: network latency and server-side processing time
Many web sites use a complex multi-tiered architecture
A set of new technologies, such as servlets and JavaServer Pages, extends web servers to generate information-rich dynamic web pages and to leverage existing business systems
The combination of these technologies can lead to increased server-side processing time, especially in a distributed environment
New ad hoc business metric: a web service is considered “unavailable” if its response time exceeds 6 seconds
Service providers need a quantitative analysis of the major latency components contributing to the response time in order to meet given business and QoS objectives: invest in more powerful site infrastructure, or choose a CDN service?
Why Is It Difficult?
Web pages are complex objects with multiple embedded images
The HTTP protocol is stateless: different images are requested by the client browser independently:
• Some of them are issued concurrently
• Some of them use persistent connections
• Some of them are obtained from proxies
• Some of them are obtained from user browser caches
The response time of a web page observed by the client is the result of downloading all page-related images
What Are Currently Available Solutions?
Active periodic probing of a particular web page from a fixed number of clients across the Internet
Keynote service
– Keynote “clients” are not the real web site clients
– Allows monitoring of a particular web page
– Always pulls the entire page (with all embedded images) from the server
Page instrumentation technique based on JavaScript or a Java applet downloaded to the client web browser
HP OpenView “Web Transaction Observer”
– The measurement starts after the download of the main HTML page (a significant portion of the response time is missing)
– Does not provide a latency breakdown unless the web server is also instrumented
eBusiness Assurance (eBA, from Candle Corp)
Quality of Service (QoS) Monitor (IBM, Tivoli)
Research paper by Rajamony and Elnozahy from IBM (Austin) uses JavaScript to instrument the links to particular pages. Somewhat more limited: cannot measure directly accessed pages, e.g. “index.html”…
What Do We Propose?
EtE monitor
• Passive monitoring tool for end-to-end response time measurement
• Non-intrusive: does not require any changes or modifications to site content, server-side infrastructure, or client browsers
• Can be used for sites with static or dynamically generated content
What does it provide?
• End-to-end response time measurement for all the pages and all the clients accessing the site
• Analysis of response time components:
– Server processing time portion
– Network transfer time portion
• Reports the % of data delivered from the server vs. the % of data cached on the client side
• Reports the % of aborted page accesses and the related performance reasons
• Analysis of the most frequently accessed documents and their response time
• Client clustering by ASes (Autonomous Systems)
– Requests (bytes) clustering by ASes and the corresponding response time
• And more …..
EtE Monitor Architecture
1. The Network Packet Collector module collects network packets using tcpdump and records them in a Network Trace, enabling offline analysis.
2. In the Request-Response Reconstruction module, EtE monitor reconstructs all TCP connections from the Network Trace and extracts HTTP transactions (a request with the corresponding response) from the payload. EtE monitor stores the HTTP header lines and other related information in the Transaction Log.
3. The Web Page Reconstruction module groups the request-response pairs into logical web page accesses and stores them in the Web Page Session Log.
4. The Performance Analysis and Statistics module summarizes a variety of performance characteristics integrated across all client accesses.
Request-Response Reconstruction Module
The TCP connections are rebuilt from the Network Trace using:
• The client IP address
• The client port number
• The request (response) TCP sequence number
Within the payload of the rebuilt TCP connections, HTTP transactions are delimited as defined by HTTP protocol
After reconstructing the HTTP transactions, the monitor records the HTTP header lines and other information of interest in the Transaction Log and discards the transaction body
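The reconstruction step above can be sketched roughly as follows. This is a minimal illustration, not the EtE implementation: the packet tuple layout and the function names `rebuild_connections` and `header_lines` are assumptions, only one direction of the connection and one transaction per connection are handled, and sequence-number wrap-around is ignored.

```python
from collections import defaultdict


def rebuild_connections(packets):
    """Rebuild byte streams from (client_ip, client_port, seq, payload) tuples.

    Packets are grouped by (client IP, client port) and ordered by TCP
    sequence number. The tuple layout is an assumption for illustration.
    """
    by_conn = defaultdict(dict)
    for client_ip, client_port, seq, payload in packets:
        # A resent segment repeats a sequence number; keep the first copy.
        by_conn[(client_ip, client_port)].setdefault(seq, payload)
    return {
        conn: b"".join(segments[seq] for seq in sorted(segments))
        for conn, segments in by_conn.items()
    }


def header_lines(stream):
    """Return the HTTP header lines of a stream and discard the body."""
    head, _, _body = stream.partition(b"\r\n\r\n")
    return head.decode("iso-8859-1").split("\r\n")


packets = [
    ("10.0.0.1", 4000, 17, b"Host: example.test\r\n\r\n"),  # arrives out of order
    ("10.0.0.1", 4000, 1, b"GET / HTTP/1.1\r\n"),
    ("10.0.0.1", 4000, 1, b"GET / HTTP/1.1\r\n"),            # resent packet
    ("10.0.0.2", 4001, 1, b"GET /a.gif HTTP/1.1\r\n\r\n"),
]
streams = rebuild_connections(packets)
print(header_lines(streams[("10.0.0.1", 4000)]))
```

Keying on the client address and port is enough here because all traffic in the trace is assumed to target the one monitored server.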
Request-Response Reconstruction Module (continuation)
Each entry in the Transaction Log includes:
• The client IP address
• A unique flow ID for the TCP connection
• The requested URL
• The content type
• The payload size
• The referer field
• The via field
• Whether the request was aborted
• The number of packets resent in the response
• The corresponding timestamps
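One way to picture such an entry is as a small record type. The sketch below is only an illustration: the field names, the types, and the timestamp keys (`request_start`, `response_end`) are assumptions, since the slides list the fields but not their representation.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class TransactionLogEntry:
    """Hypothetical record mirroring the Transaction Log fields listed above."""
    client_ip: str
    flow_id: int                      # unique per TCP connection
    url: str
    content_type: str
    payload_size: int                 # bytes
    referer: Optional[str] = None     # absent when the request has no referer field
    via: Optional[str] = None         # set when the request came through a proxy
    aborted: bool = False
    resent_packets: int = 0
    # Which timestamps are recorded is not stated on the slide;
    # the keys used below are placeholders.
    timestamps: Dict[str, float] = field(default_factory=dict)


entry = TransactionLogEntry(
    client_ip="10.0.0.1",
    flow_id=1,
    url="/index.html",
    content_type="text/html",
    payload_size=2048,
    timestamps={"request_start": 0.00, "response_end": 0.35},
)
```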
Page Reconstruction Module
To measure the client-perceived end-to-end response time for retrieving a web page, we need to group the objects in a web page access
We use a two-pass heuristic method and a statistical filtering mechanism to reconstruct the different client page accesses
First pass:
• EtE monitor uses the HTTP requests with a referer field to build a Knowledge Base of web pages and their embedded objects
Second pass:
• EtE monitor reconstructs the page accesses without a referer field using the Knowledge Base of web pages and some additional heuristics
• EtE monitor uses statistical analysis to identify valid access patterns and filter out the accesses grouped incorrectly
Example
Example of an initial HTML file request and the following embedded object request with the corresponding referer field:
First Pass: Client Access Table
EtE monitor stores web page access information in a hash table keyed by client IP address:
• If the content type is text/html, a new web page entry is created in the Web Page Table
• For other types, the request URL is inserted according to its referer field
Building a Knowledge Base of Web Pages
From the Client Access Table, EtE monitor determines the content template of any given web page as a combined set of all objects that appear in all access patterns for this page
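The first pass and the Knowledge Base construction might look roughly like this. It is a simplified sketch: transactions are plain dictionaries, the referer is assumed to be comparable directly with the page URL, and the function names `first_pass` and `build_knowledge_base` are hypothetical.

```python
from collections import defaultdict


def first_pass(transactions):
    """Build a Client Access Table: client IP -> list of (page URL, objects)."""
    table = defaultdict(list)
    for t in transactions:
        pages = table[t["client_ip"]]
        if t["content_type"] == "text/html":
            # An HTML response opens a new web page entry for this client.
            pages.append((t["url"], set()))
        elif t.get("referer"):
            # Attach the object to the most recent access of the referring page.
            for page_url, objects in reversed(pages):
                if page_url == t["referer"]:
                    objects.add(t["url"])
                    break
        # Objects without a referer field are left for the second pass.
    return table


def build_knowledge_base(table):
    """Content template of a page = union of objects over all its accesses."""
    kb = defaultdict(set)
    for pages in table.values():
        for page_url, objects in pages:
            kb[page_url] |= objects
    return dict(kb)


transactions = [
    {"client_ip": "10.0.0.1", "url": "/index.html", "content_type": "text/html", "referer": None},
    {"client_ip": "10.0.0.1", "url": "/a.gif", "content_type": "image/gif", "referer": "/index.html"},
    {"client_ip": "10.0.0.2", "url": "/index.html", "content_type": "text/html", "referer": None},
    {"client_ip": "10.0.0.2", "url": "/b.gif", "content_type": "image/gif", "referer": "/index.html"},
    {"client_ip": "10.0.0.2", "url": "/c.gif", "content_type": "image/gif", "referer": None},
]
table = first_pass(transactions)
knowledge_base = build_knowledge_base(table)
print(knowledge_base)
```

In this toy input, each client fetched a different subset of the page, so the template for `/index.html` combines both subsets, while `/c.gif` stays unassigned because it carries no referer.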
Second Pass: Reconstruction of Web Page Accesses
With the help of the Knowledge Base, EtE monitor processes the entire Transaction Log again and creates a new Client Access Table
This time it also processes the objects without a referer field:
• EtE monitor consults the Knowledge Base while checking all the page entries in the Web Page Table to find the page an object might be embedded in, and appends it at the end of that page
• If none of the web page entries in the Web Page Table contains the object according to the Knowledge Base, then:
– EtE monitor searches for the page accessed with the same flow ID
– Otherwise it appends the object to the latest accessed page (additionally, it uses a configurable think time threshold to delimit web pages)
– If the think time threshold is exceeded, the object is dropped
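The fallback order described above can be sketched as a single function. This is one reading of the slide, not the actual implementation: the page and object dictionaries, the newest-first search order, the 4-second default for `think_time`, and applying the threshold only to the last fallback are all assumptions.

```python
def _append(page, obj):
    """Append the object to the page entry and advance its last-activity time."""
    page["objects"].append(obj["url"])
    page["last_time"] = max(page["last_time"], obj["time"])
    return page


def assign_object(obj, client_pages, knowledge_base, think_time=4.0):
    """Place an object that has no referer field; return its page or None.

    client_pages is one client's Web Page Table, ordered oldest to newest.
    """
    # 1. Knowledge Base: a page whose content template contains the object.
    for page in reversed(client_pages):
        if obj["url"] in knowledge_base.get(page["url"], ()):
            return _append(page, obj)
    # 2. A page that was accessed over the same TCP connection (flow ID).
    for page in reversed(client_pages):
        if page["flow_id"] == obj["flow_id"]:
            return _append(page, obj)
    # 3. The latest accessed page, unless the think time threshold is exceeded.
    if client_pages:
        latest = client_pages[-1]
        if obj["time"] - latest["last_time"] <= think_time:
            return _append(latest, obj)
    return None  # the object is dropped


pages = [
    {"url": "/index.html", "flow_id": 1, "last_time": 10.0, "objects": []},
    {"url": "/news.html", "flow_id": 2, "last_time": 12.0, "objects": []},
]
kb = {"/index.html": {"/logo.gif"}}

r1 = assign_object({"url": "/logo.gif", "flow_id": 9, "time": 12.5}, pages, kb)
r2 = assign_object({"url": "/x.gif", "flow_id": 1, "time": 13.0}, pages, kb)
r3 = assign_object({"url": "/y.gif", "flow_id": 7, "time": 14.0}, pages, kb)
r4 = assign_object({"url": "/z.gif", "flow_id": 8, "time": 100.0}, pages, kb)
```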
Identifying Valid Accesses Using Statistical Analysis of Access Patterns
Although the above two-pass process is very efficient, there could still be some accesses grouped incorrectly
We use a statistical analysis to better approximate the actual content of web pages and filter out the incorrectly constructed accesses
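The slides do not say which statistic is used. As a purely illustrative assumption, the sketch below treats an object as part of a page only if it appears in at least a minimum share of that page's reconstructed accesses, and treats an access as valid only if all its objects belong to that filtered set. The function name `filter_accesses` and the `min_share` parameter are hypothetical.

```python
from collections import Counter


def filter_accesses(accesses, min_share=0.05):
    """Filter reconstructed accesses to one page by object frequency.

    accesses: list of frozensets, each holding the object URLs grouped
    into one access. Returns (filtered template, valid accesses).
    """
    n = len(accesses)
    counts = Counter(obj for access in accesses for obj in access)
    # Keep only objects that occur in a large enough share of accesses.
    template = {obj for obj, c in counts.items() if c / n >= min_share}
    # An access is valid when it contains no object outside the template.
    valid = [access for access in accesses if access <= template]
    return template, valid


accesses = [frozenset({"/a.gif", "/b.gif"})] * 19 + [frozenset({"/a.gif", "/stray.gif"})]
template, valid = filter_accesses(accesses, min_share=0.1)
```

Here `/stray.gif` appears in only 1 of 20 accesses, so it falls below the 10% share and the access containing it is filtered out.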
Metrics to Measure Web Service Performance
Response time metrics
• End-to-end response time observed by the client for a web page download
• Latency breakdown: server-related and network-related portions
• Connection set-up time
Metrics evaluating web service caching efficiency
• Server file hit ratio
• Server byte hit ratio
Aborted pages and QoS
• Why are the accesses aborted?
– Bad performance?
– Client browsing patterns?
Metrics Evaluating Web Service Caching Efficiency
Original web page url1 (page template): 10 objects, 100 Kbytes
Access to url1, Acc1: 5 objects, 70 Kbytes
Access to url1, Acc2: 7 objects, 80 Kbytes
FileHitRatio(Acc1) = 5/10 = 50%
ByteHitRatio(Acc1) = 70/100 = 70%
FileHitRatio(Acc2) = 7/10 = 70%
ByteHitRatio(Acc2) = 80/100 = 80%
ServerFileHitRatio(url1) = (5/10 + 7/10) / 2 = 60%
ServerByteHitRatio(url1) = (70/100 + 80/100) / 2 = 75%
The smaller, the better!
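The arithmetic of this example can be reproduced with a few lines of code. The function name `server_hit_ratios` and the tuple layout are assumptions made for illustration; the ratios are averaged per access, exactly as in the worked example.

```python
def server_hit_ratios(template, accesses):
    """Average per-access file and byte hit ratios for one page.

    template: (number of objects, Kbytes) of the full page.
    accesses: list of (objects served, Kbytes served) per access.
    """
    n_objects, n_kbytes = template
    file_ratios = [objs / n_objects for objs, _ in accesses]
    byte_ratios = [kb / n_kbytes for _, kb in accesses]
    return (
        sum(file_ratios) / len(file_ratios),
        sum(byte_ratios) / len(byte_ratios),
    )


file_ratio, byte_ratio = server_hit_ratios((10, 100), [(5, 70), (7, 80)])
print(f"{file_ratio:.0%} {byte_ratio:.0%}")
```

A smaller ratio means that a larger part of the page was served from browser or network caches rather than from the server.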
Case Studies
HPL external site (HPL)
• From July 12, 2001 to August 11, 2001
• The site has mostly static content
OpenView Support site (Support)
• From October 11, 2001 to October 25, 2001
• The site uses JavaServer Pages technology for dynamic page generation
HPLabs Site Case Study
• The figure shows the EtE time to index.html on an hourly scale during a month
• In spite of overall good performance, hourly averages reflect significant variation in the response time observed by the clients
• Periods of increased latency correspond to weekends! What is the problem?
HPL site during a month (accesses to index.html page)
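Hourly averages of this kind could be derived from per-access measurements along the following lines. The input layout (epoch seconds paired with a response time) and the function name `hourly_averages` are assumptions for illustration.

```python
from collections import defaultdict


def hourly_averages(samples):
    """Average response time per hour.

    samples: iterable of (epoch_seconds, response_time_sec) for one page.
    Returns {start of hour in epoch seconds: mean response time}.
    """
    buckets = defaultdict(list)
    for ts, response_time in samples:
        hour_start = int(ts // 3600) * 3600
        buckets[hour_start].append(response_time)
    return {hour: sum(v) / len(v) for hour, v in sorted(buckets.items())}


averages = hourly_averages([(0, 1.0), (1800, 3.0), (3600, 5.0)])
```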
Understanding the Client Population
• Resent packets typically reflect network congestion or network-related bottlenecks
• Periods of increased resent packets correspond to weekends
• The explanation: the client population significantly “changes” during weekends
• Most of the clients access the web site from home via low-bandwidth connections
It is extremely important to understand the client population! The active probing approach, which uses artificial clients (typically with a “good” connection to the Internet), lacks this information
Performance Analysis of Accesses to itanium.html
First figure:
• Number of accesses to the itanium.html page
• From being the most popular page at the beginning of the study, it drops to 7th place after 10 days
Second figure:
• Percentage of accesses to the itanium.html page above 6 sec
• Question: why is the latency observed by the clients getting higher?
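The quantity plotted in the second figure, the share of accesses slower than the 6-second threshold, is simple to compute once per-access response times are available. The function name `percent_above` is hypothetical, and counting only accesses strictly above the threshold is an assumption.

```python
def percent_above(response_times, threshold_sec=6.0):
    """Percentage of page accesses whose response time exceeds the threshold."""
    if not response_times:
        return 0.0
    slow = sum(1 for t in response_times if t > threshold_sec)
    return 100.0 * slow / len(response_times)


share = percent_above([1.0, 7.5, 6.0, 12.0])
```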
Caching Efficiency of the Page
When the page becomes less popular (“colder”), the number of objects and bytes retrieved from the original server increases significantly, i.e., fewer network caches store the page-related objects
This translates into increased response time observed by the client
The active probing technique cannot reflect the caching efficiency of the site
Tools based on the instrumentation technique cannot provide insight into this problem either
Clients Clustering by ASes
• Clients grouped by ASes show a heavy-tailed distribution
• These figures allow us to see large client clusters and their corresponding end-to-end response time
• The ability of EtE monitor to measure performance metrics for a certain group of clients is particularly attractive for service providers that need to validate required SLAs
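Grouping clients by AS requires a mapping from IP prefixes to AS numbers, which the slides do not describe. The sketch below assumes such a table is available (for example, derived from routing data) and uses a longest-prefix match; the prefixes are documentation ranges and the AS numbers are from the private-use range, so all values are placeholders.

```python
import ipaddress
from collections import defaultdict


def cluster_by_as(accesses, prefix_to_as):
    """Group accesses by AS and report (access count, mean response time).

    accesses: iterable of (client_ip, response_time_sec).
    prefix_to_as: {CIDR prefix: AS number}, a hypothetical lookup table.
    Clients that match no prefix are grouped under None.
    """
    # Try the most specific (longest) prefixes first.
    networks = sorted(
        ((ipaddress.ip_network(p), asn) for p, asn in prefix_to_as.items()),
        key=lambda item: item[0].prefixlen,
        reverse=True,
    )
    times = defaultdict(list)
    for client_ip, response_time in accesses:
        addr = ipaddress.ip_address(client_ip)
        asn = next((a for net, a in networks if addr in net), None)
        times[asn].append(response_time)
    return {asn: (len(v), sum(v) / len(v)) for asn, v in times.items()}


prefix_to_as = {
    "192.0.2.0/24": 64512,
    "192.0.2.128/25": 64513,
    "198.51.100.0/24": 64514,
}
accesses = [
    ("192.0.2.10", 1.0),
    ("192.0.2.200", 3.0),
    ("192.0.2.20", 2.0),
    ("203.0.113.5", 9.0),
]
clusters = cluster_by_as(accesses, prefix_to_as)
```

A linear scan over the prefixes is fine for a sketch; a real table with many prefixes would call for a trie or similar structure.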
Validation Experiments
We performed two groups of experiments:
• To validate the accuracy of EtE measurements
• To evaluate the page access reconstruction power of EtE
– How dependent are the reconstruction results on the existence of referer field information?
The results are encouraging:
• EtE provides a very close approximation of the response time
• EtE monitor does a good job of page reconstruction even when the requests do not have any referer field!
However, the two-pass heuristic method and statistical filtering mechanism we use to reconstruct page accesses increase the number of reconstructed pages by about 20-30%
Limitations
EtE monitor is not appropriate for sites that encrypt much of their data (e.g., via SSL)
EtE monitor is not appropriate for sites that “outsource” most of their content to CDNs
A similar limitation applies to pages with “mixed” content, where a portion of the page is served from other remote sites. In this case, EtE measures the response time only for the local site content
For clients behind a proxy, EtE monitor measures the response time as observed from the proxy
Since the tool is based on heuristics and statistics to reconstruct the page content, the best results are obtained when the sample size is large enough
Dynamically generated content creates additional challenges for EtE monitor (typical for other analysis tools too): a configuration file provided by a site administrator is needed
Conclusion and Future Work
Understanding the performance characteristics of Internet services is critical to evolving and engineering web services to match:
• Changing demand levels
• Client populations
• Global network characteristics
EtE monitor, based on a novel technique, offers a number of benefits unavailable from other tools and by other means.
EtE monitor can be extended to work in “almost real-time” to provide timely information about web services and their performance.
Extended analysis on client clustering will provide an opportunity to use the information from EtE monitor for intelligent decision making on service placement and service optimization
Acknowledgements
The tool and the study would not have been possible without the generous help of our HP colleagues:
HPLabs team:
• Mike Rodriquez, Annabelle Eseo, and Peter Haddad
HPO, Managed Web Services:
• Guy Mathews
OpenView team:
• Steve Yonkaitis, Bob Husted, Norm Follett, and Don Reab
US support team:
• Claude Villermain, Vincent Rabiller, Pierre-Emmanuel Delforge
Their help is highly appreciated!