Upload
vuongliem
View
214
Download
1
Embed Size (px)
Citation preview
R. Hashemian1, D. Krishnamurthy1, M. Arlitt2, N. Carlsson3
The 4th ACM/SPEC International Conference on
Performance Engineering ICPE 2013
1. University of Calgary
2. HP Labs
3. Linköping University
by: Raoufehsadat Hashemian
• Introduction
• Scalability Evaluation
• Scalability Enhancement Approach
• Validation
• Conclusion
Improving the Scalability of a Multi-core Web Server ICPE13 2
OUTLINE
• Enterprise applications
• Performance: Improving QoS
• e.g. Lower response times
• Cost: Less money spent on hardware
• e.g. Improving effective utilization
• Goal: Higher utilization and acceptable response time
• How to achieve this “Goal” for Web servers running on Multi-
core hardware?
INTRODUCTION
PROBLEM DESCRIPTION
Improving the Scalability of a Multi-core Web Server ICPE13 3
0
10
20
30
40
0 50 100 R
esp
on
se
Tim
e
CPU Utilization (%)
• Web servers before multi-core
• Mature topic, wide-ranging discussions
• Multi-core architecture
• Most research on batch (non-interactive) workload
• Web servers running on Multi-core
• BUS problem in UMA system (Veal et al.`07)
• Multiple Web server instances: 1 instance per
processor (Scogland et al.`09, Boyd et.al,10 Gaud et. al,11)
INTRODUCTION
BACKGROUND
Improving the Scalability of a Multi-core Web Server ICPE13 5
SCALABILITY EVALUATION
SCALABILITY MEASUREMENT
• Measure Web server scalability for two workloads
• Evaluate the effectiveness of multiple Web server
approach in the server’s scalability
• Scalability
• Maximum Achievable Throughput (MAT)
Improving the Scalability of a Multi-core Web Server ICPE13 4
• 2 x 4 core Intel Xeon E5620 processors
NUMA Architecture
• OS: Linux, kernel 3, Ubuntu
• Webserver: Lighttpd
• Application Server: php (FastCGI module)
SCALABILITY EVALUATION
EXPERIMENTAL SETUP
Nehalem Microarch.
2.4 GHz Frequency
32K IC - 32K DC L1 Cache
256K L2 Cache
12M (Inclusive) L3 Cache
QPI -5.86 GT/s Inter-conn.
16GB - DDR3-1333 Memory
Processor 0
L3
C
0
C
2
C
4
C
6
L2
L1 L1 L1 L1
L2 L2 L2
Processor 1
L3
C
0
C
2
C
4
C
6
L2
L1 L1 L1 L1
L2 L2 L2
Memory
Bank 0
Memory
Bank 1
Improving the Scalability of a Multi-core Web Server ICPE13 6
• TCP/IP Intensive workload
• High TCP connection rate
• Processing: low user level & high kernel level
• 1 KB static file, up to 155,000 requests/second
• SPECweb Support workload
• Both static requests and php requests
• Wider range of request types
• Processing: high user level & moderate kernel level
SCALABILITY EVALUATION
WORKLOADS
Improving the Scalability of a Multi-core Web Server ICPE13 7
• Change default lighttpd recommendation (1 Lighttpd worker
process per core)
• Disable default Linux scheduling (use affinity)
• Distribute interrupt handling load
SCALABILITY EVALUATION
CONFIGURATION TUNING
• Improved MAT up to 69%
• Balanced utilization levels for the eight cores
• Fully utilized the server
Improving the Scalability of a Multi-core Web Server ICPE13 8
• TCP/IP Intensive workload
• Sub-linear Maximum Achievable Throughput
146,000 req/sec
• SPECweb Support workload
• Almost linear Maximum Achievable Throughput
23,000 req/sec
Scala
bility
S
cala
bility
SCALABILITY EVALUATION
RESULTS
Improving the Scalability of a Multi-core Web Server ICPE13 9
Number of Cores
10-1
100
101
102
103
0
0.2
0.4
0.6
0.8
1
x = Response time (msec)
P [
X <
= x
]
1 Core 2 Core 4 Core 8 Core
Response time vs. Core Count
• “Low response time” requests
• Static requests
• Performance degrades
• “High response time” requests
• Dynamic requests
• Performance improves
Knowing this behavior, how can we
improve the scalability?
SCALABILITY EVALUATION
RESPONSE DISTRIBUTION ANALYSIS
Improving the Scalability of a Multi-core Web Server ICPE13 10
CDF of Response times
80% CPU Utilization
SPECweb Support Workload
100
102
0.9
0.92
0.94
0.96
0.98
1
SCALABILITY ENHANCEMENT
MULTIPLE WEBSITE REPLICAS
• Approach: Use 1 Web server instance per processor
• Goal: Reduce inter-processor data migration
Processor 0 Processor 1
Single Replica Process NIC1 Queue NIC 2 Queue
NIC 2
NIC 1
Replica 1 Process NIC 1 Queue Replica 2 Process NIC 2 Queue
Processor 0 Processor 1
NIC 2
NIC 1
Original Configuration
with one replica
Alternative Configuration
with two replicas
Improving the Scalability of a Multi-core Web Server ICPE13 11
Request rate (req/sec)
Response tim
e (
ms)
TCP/IP Intensive Workload
Scalability Improvement
MAT increment: 12.3%
SPECweb Support Workload
Scalability Degradation
MAT decrement: 10%
SCALABILITY ENHANCEMENT
EVALUATING NEW CONFIGURATION
Improving the Scalability of a Multi-core Web Server ICPE13 12
Request rate (req/sec)
Response tim
e (
ms)
• Hypothesis:
• Cache contention with 2-
replicas due to the larger
working set size of dynamic
requests
• The response time inflation for Dynamic requests dominates the improvement
achieved for Static requests
• Mean and 99.9th percentile response times increase with 2-replicas
CDF of Response times
80% CPU Utilization
22,000 req/sec
SPECweb Support workload
P [X
<=
x]
Response Time (ms)
100
102
104
0.9
0.95
1
SCALABILITY ENHANCEMENT
EVALUATING NEW CONFIGURATION
Improving the Scalability of a Multi-core Web Server ICPE13 13
Request Rate (req/sec)
Inte
r-co
nn
ect T
raffic
(Byte
s/s
ec)
Request Rate (req/sec)
VALIDATION
INTER-CONNECT TRAFFIC
Inte
r-co
nn
ect T
raffic
(Byte
s/s
ec)
• Inter-connect traffic decreased
significantly
• Improved performance
Improving the Scalability of a Multi-core Web Server ICPE13 14
TCP/IP Intensive Workload SPECweb Support Workload
• No significant decrement
• Improved performance for Static
requests
L3 C
ache H
IT R
atio
Request Rate (req/sec)
Improving the Scalability of a Multi-core Web Server ICPE13 15
VALIDATION
LAST LEVEL CACHE
SPECweb Support Workload
• Last Level cache (LLC) HIT ratio degrades with 2-replica configuration
Confirms the cache contention hypothesis
CONCLUSIONS
• Multi-core Web server: scalable after tuning
• 80% utilization with acceptable response time
• Multiple Website Replicas
• The effect on the scalability is workload dependent
• Dynamic requests trigger LLC contention
• Contention may be architecture and application dependent
• Future plan:
• Design and develop an automatic, workload adaptive technique
which decides about best configuration
Improving the Scalability of a Multi-core Web Server ICPE13 16
This work is financially supported by:
Raoufeh Hashemian
University of Calgary, Canada
Improving the Scalability of a Multi-core Web Server ICPE13 17
REFERENCES
• Cherkasova et al.`00:Characterizing Temporal Locality and its Impact on Web Server Performance,
International Conference on Computer Communications and Networks’00, Cherkasova; Ciardo; HP Labs
• Elnozahy et al.`03: Energy Conservation Policies for Web Servers, USITS '03, Elnozahy; Kistler; Ramakrishnan; IBM
• Majo et al.`12: Matching Memory Access Patterns and Data Placement for NUMA Systems, GC’12, Majo;
Gross; ETH
• Blagodurov et al.`11: A case for NUMA-aware contention management on multicore systems, USENIX
ATC'11, Blagodurov; Zhuravlev; Dashti; Fedorova; SFU
• Veal et al.`07: Performance scalability of a multi-core web server. ACM/IEEE ANCS’07,Veal; Foong; Intel
• Scogland et al.`09: Asymmetric interactions in symmetric multi-core systems: Analysis, enhancements and
evaluation. ACM/IEEE SC’08, Scogland; Balaji; Feng; Narayanaswamy,
• Boyd et.al`10: An analysis of linux scalability to many cores, USENIX OSDI’10, Boyd-Wickizer; Clements;
Mao; Pesterev; Kaashoek; Morris; Zeldovich, MIT
• Gaud et. al`11: Application-level optimizations on numa multicore architectures: the apache case study, RR-
LIG-011, Gaud; Lachaize; Lepers; Muller; Quema.
Improving the Scalability of a Multi-core Web Server ICPE13 -2
• Network interrupt handling
• 4 RSS queue per NIC port
• Each queue bind to one core
SCALABILITY EVALUATION
CONFIGURATION TUNING
Improving the Scalability of a Multi-core Web Server ICPE13 -3
0.0
0.5
1.0
1.5
2.0
0 50,000 100,000 150,000 200,000
Re
spo
nse
tim
e (
mse
c)
Rate (req/sec)
Before Distributing Int. Load
After Distributing Int. Load
• OS scheduling
• Binding each lighttpd process to 1 core
SCALABILITY EVALUATION
CONFIGURATION TUNING
Improving the Scalability of a Multi-core Web Server ICPE13 -4
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8
2
0 50000 100000 150000 200000
Re
spo
nse
tim
e (
mse
c)
Rate (req/sec)
No Affinity
With affinity
• Static: Requests with lower response time
• Processed only in Web tier (lighttpd)
• Dynamic: Requests with higher response time
• Processed only in Web and application tiers (lighttpd and php)
Re
sp
on
se
tim
e (
ms)
SCALABILITY EVALUATION
WEB TIER VS. APPLICATION TIER
Improving the Scalability of a Multi-core Web Server ICPE13 -5
File size (Byte)
SCALABILITY EVALUATION
EXPERIMENTAL SETUP
Improving the Scalability of a Multi-core Web Server ICPE13 -6