2
Topics
● Conceptualizing the OpenStack Performance Foundation
● Keystone Performance
● OpenStack Nova
  ● KVM Performance
  ● Resource Over-commit
  ● Nova Instance Storage – Boot Times and Snapshots
  ● Nova Scheduler
● OpenStack Cinder
  ● Direct storage integration with QEMU
  ● Glusterfs Performance Enhancements in Havana
● Swift Performance
  ● Swift and Blocking IO
● What's Next
3
Background
Talk reflects work-in-progress. Includes results from:
● RHEL-OSP 4 (Havana)
● RHOS 3 (Grizzly)
● RHOS 2 (Folsom)
Items not included in the presentation:
● Neutron
● Heat and most provisioning use-cases
5
High Level Architecture
● Modular architecture
● Designed to easily scale out
● Based on a (growing) set of core services
6
Control Plane vs Data Plane
Data Plane
● Workloads in steady-state operation
● Performance dictated by components managed by OpenStack
Control Plane
● Create / Delete / Start / Stop / Attach / Detach
● Performance dictated by OpenStack itself
[Diagram: Nova, Cinder, Neutron, Glance, Swift, and Heat perform lifecycle ops (control plane, provisioning) on VMs and instance storage, volumes, networks, images, and objects & containers (data plane, steady state)]
7
Foundational Elements
● Each service has associated databases
● Extensive use of messaging for integration
● Keystone as common identity service
8
Control Plane and Data Plane Technologies
[Diagram: the control plane / data plane architecture annotated with technologies.
Control plane: OpenStack Havana on Python 2.7.
Foundational services: Keystone (backed by LDAP), Ceilometer, database (MariaDB, MySQL, Galera), messaging (QPID, RabbitMQ).
Data plane: hypervisors (KVM, Xen, ESXi) on Linux (RHEL); instance storage (XFS, RoC ...); volumes (Glusterfs, Ceph, NetApp, EMC, SolidFire ...); networking (LinuxBridge, OVS, GRE, VXLAN, Nicira, BigSwitch ...; Cisco Nexus, other switches)]
9
Control Plane and Data Plane Technologies
Technologies Used in Red Hat's Offering (RHEL-OSP 4)
[Diagram: control plane: OpenStack Havana on Python 2.7; foundational services: Keystone (Red Hat Identity Management), Ceilometer, database (MariaDB, Galera), messaging (QPID).
Data plane: RHEL 6.5 with KVM; instance storage (XFS, RoC ...); volumes (Glusterfs, Ceph, NetApp, EMC, SolidFire ...); networking (LinuxBridge, OVS, GRE, VXLAN, Nicira, BigSwitch; Cisco Nexus, other switches)]
Note: italics are examples of partner technologies
10
Factors influencing control plane performance demands
[Chart: provisioning operations per hour (y-axis) vs size of cell in VMs (x-axis). Control plane demand grows with the dynamic nature of the workload and the provisioning frequency; example workloads plotted: static production workloads, continuous integration, dev / test, autoscaling production workloads]
12
Keystone Performance
Keystone findings – UUID tokens in Folsom
Chatty consumers: multiple calls to Keystone for new tokens per operation:
● Horizon login: 3 tokens
● Horizon image page: 2 tokens
● CLI (nova image-list): 2 tokens
Database grows with no cleanup
● As tokens expire they should eventually get removed
  ● Should also help with indexing
● For every 250K rows, response times go up by ~0.1 secs
● Can be addressed via a cron job running keystone-manage token_flush
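To put the growth figure in perspective, a back-of-the-envelope model (a sketch of the number quoted above, not a measured formula):

```python
# Back-of-the-envelope model of the effect quoted above: Keystone response
# time grows by roughly 0.1 s for every 250K leftover token rows.
def estimated_response_secs(base_secs, token_rows, secs_per_250k_rows=0.1):
    """Estimate Keystone response time given leftover expired-token rows."""
    return base_secs + secs_per_250k_rows * (token_rows / 250_000)

# Chatty consumers leave rows behind quickly; at 1M stale rows,
# a 0.2 s call stretches to roughly 0.6 s.
print(round(estimated_response_secs(0.2, 1_000_000), 3))  # → 0.6
```

Numbers fed into the model (the 0.2 s baseline, the row count) are illustrative; only the 0.1 s per 250K rows slope comes from the measurements above.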
13
Keystone
Inefficiencies in CLI due to Python Libraries
Inefficiencies in the CLI vs direct curl calls:
● nova image-show executes in 2.639 s
● The equivalent curl -H " " call executes in 0.555 s
Tracing of the CLI shows that Python is reading the data one byte at a time
● Known httplib issue in the Python standard library
Next steps for testing: move to Havana and PKI tokens
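The cost of one-byte reads is easy to reproduce outside OpenStack. The sketch below (plain Python on an in-memory buffer, an illustration of the pattern rather than httplib itself) compares byte-at-a-time reads against chunked reads:

```python
import io
import time

PAYLOAD = b"x" * (4 * 1024 * 1024)  # stand-in for a large API response body

def read_bytewise(stream):
    # Mimics the pathological pattern: one read() call per byte.
    out = bytearray()
    while True:
        b = stream.read(1)
        if not b:
            break
        out += b
    return bytes(out)

def read_chunked(stream, chunk_size=64 * 1024):
    # Normal buffered pattern: one read() call per chunk.
    out = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        out += chunk
    return bytes(out)

t0 = time.perf_counter()
read_bytewise(io.BytesIO(PAYLOAD))
t1 = time.perf_counter()
read_chunked(io.BytesIO(PAYLOAD))
t2 = time.perf_counter()
print("bytewise: %.3fs  chunked: %.3fs" % (t1 - t0, t2 - t1))
```

Even with no network in the picture, the per-call overhead of millions of read(1) calls dominates, which is consistent with the CLI-vs-curl gap seen above.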
15
Understanding Nova Compute Performance
SPECvirt2010: RHEL 6 KVM Posts Industry Leading Results
http://www.spec.org/virt_sc2010/results/
Steady-state performance of Nova compute nodes is strongly determined by the hypervisor
● For OpenStack this is typically KVM
● Good news: RHEL / KVM has industry-leading performance numbers
[Diagram: SPECvirt test setup — client hardware driving a System Under Test (SUT); virtualization layer and hardware; blue = disk I/O, green = network I/O]
16
Understanding Nova Compute Performance
SPECvirt2010: Red Hat Owns Industry Leading Results
Best SPECvirt_sc2010 scores by CPU cores (as of May 30, 2013):
● VMware ESX 4.1, HP DL380 G7 (12 cores, 78 VMs): 1,221
● RHEL 6 (KVM), IBM HS22V (12 cores, 84 VMs): 1,367
● VMware ESXi 5.0, HP DL385 G7 (16 cores, 102 VMs): 1,570
● RHEV 3.1, HP DL380p Gen8 (16 cores, 150 VMs): 2,442
● VMware ESXi 4.1, HP BL620c G7 (20 cores, 120 VMs): 1,878
● RHEL 6 (KVM), IBM HX5 w/ MAX5 (20 cores, 132 VMs): 2,144
● VMware ESXi 4.1, HP DL380 G7 (12 cores, 168 VMs): 2,742
● VMware ESXi 4.1, IBM x3850 X5 (40 cores, 234 VMs): 3,824
● RHEL 6 (KVM), HP DL580 G7 (40 cores, 288 VMs): 4,682
● RHEL 6 (KVM), IBM x3850 X5 (64 cores, 336 VMs): 5,467
● RHEL 6 (KVM), HP DL980 G7 (80 cores, 552 VMs): 8,956
[Chart groups systems as 2-socket (12, 16, 20 cores), 4-socket (40 cores), and 8-socket (64/80 cores); annotations mark the 2-socket range as the OpenStack sweet spot and the larger systems as the oVirt / RHEV sweet spot]
Comparison based on best performing Red Hat and VMware solutions by CPU core count published at www.spec.org as of May 17, 2013. SPEC® and the benchmark name SPECvirt_sc® are registered trademarks of the Standard Performance Evaluation Corporation. For more information about SPECvirt_sc2010, see www.spec.org/virt_sc2010/.
17
Understanding Nova Compute Performance
Guest Performance
Expect similar out-of-the-box performance to standalone RHEL / KVM
● Added the tuned virtual-host profile to the Nova compute node configuration
● RHOS generates its own XML file to describe the guest
  ● About the same as virt-manager's
● Of course, smart storage layout is critical
Tuning for performance
● Common to OpenStack and standalone KVM
  ● Huge pages, NUMA, tuned profiles
● Optimizations for standalone KVM not currently integrated into OpenStack
  ● SR-IOV, process-level node bindings
18
Nova Out of the Box Performance
Comparison with Standalone KVM Results
[Chart: RHOS vs untuned libvirt/KVM, Java workload throughput (bops) at 1, 4, 8, and 12 VMs — the two configurations track closely]
19
Nova Compute
Resource Over-Commit
Nova's default configuration has some aggressive over-commit ratios
CPU has an over-commit ratio of 16
● Tuning suggestions for the scheduler will depend on the instance workload
Memory over-commit is a much lower 1.5
● Again, depends on the workload
● Anything memory-sensitive falls off a cliff if the host needs to swap
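The defaults can be sketched numerically (cpu_allocation_ratio and ram_allocation_ratio are the Nova config knobs behind these ratios; the helper itself is illustrative):

```python
# Sketch: how Nova's default over-commit ratios translate into schedulable
# capacity on one compute node (cpu_allocation_ratio=16.0,
# ram_allocation_ratio=1.5 are the defaults discussed above).
def schedulable_capacity(physical_cpus, physical_ram_mb,
                         cpu_ratio=16.0, ram_ratio=1.5):
    """Return (vCPUs, RAM MB) the scheduler will hand out before refusing."""
    return int(physical_cpus * cpu_ratio), int(physical_ram_mb * ram_ratio)

# A 24-core, 48 GB node advertises 384 vCPUs but only 72 GB of RAM,
# so memory is usually the binding constraint.
print(schedulable_capacity(24, 48 * 1024))  # → (384, 73728)
```

The asymmetry explains the guidance above: CPU over-commit degrades gracefully under load, while exhausting the 1.5x memory allowance pushes guests into swap.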
20
Nova Compute
Understanding CPU Over-Commit
[Chart: RHOS Java workload VM scaling — aggregate throughput (bops) at 1 to 48 VMs. Aggregate throughput climbs until CPU saturation, then over-committed VMs hold it roughly flat]
However, beware of the cliff caused by excessive paging
21
Nova Compute
Ephemeral Storage
Goal: examine ephemeral storage configuration and develop guidelines for balancing ephemeral storage performance vs cost / configuration
● Trade-off between footprint (number of drives) and performance
  ● Initial cost / configuration, rack space (1U vs 2U), power / cooling
● How does network-based storage perform?
  ● Need to ensure proper network bandwidth
Configuration for tests: each instance uses a different image. Hardware configs:
● Single system disk
● Seven-disk internal array
● Fibre Channel SSD drives
22
Nova Boot Times
[Chart: average boot time (secs, y-axis 0–70) for 2 and 16 concurrent VM boots from multiple images, Folsom, virt-host tuned profile — comparing system disk, local RAID, and Cinder (FC SSD) backends]
23
Impact of Storage Config on Nova Boot Times
Local RAID and Cinder
[Chart: average boot times (secs, y-axis 0–80) for 2–20 concurrent VM boots from multiple images, no tuned profile — comparing system disk, local RAID, and Cinder (FC SSD array)]
24
Nova Compute
Ephemeral Storage Snapshot Performance
[Chart: average snapshot time (secs, y-axis 0–200) for 1, 2, 4, and 6 concurrent snapshots (qemu-img convert only) — system disk vs local RAID]
How a snapshot is taken: backing image plus qcow overlay are merged into a new image
● Via the qemu-img convert mechanism
● Written to a temporary snapshot directory
● This destination is a tunable
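The convert step amounts to flattening the qcow2 overlay and its backing file into one standalone image. A minimal sketch of building that invocation (the paths below are illustrative placeholders, not Nova's actual defaults):

```python
# Sketch: building the qemu-img convert invocation used to flatten a qcow2
# overlay (plus its backing image) into a standalone snapshot image.
# Paths are illustrative placeholders, not Nova's actual defaults.
def snapshot_convert_cmd(overlay_path, dest_path, out_format="qcow2"):
    """Return the argv that flattens overlay + backing file into dest_path."""
    return [
        "qemu-img", "convert",
        "-O", out_format,   # output format of the flattened image
        overlay_path,       # qcow2 overlay; its backing file is read implicitly
        dest_path,          # lands in the (tunable) temp snapshot directory
    ]

cmd = snapshot_convert_cmd("/var/lib/nova/instances/uuid/disk",
                           "/var/lib/nova/snap_tmp/snap.qcow2")
print(" ".join(cmd))
```

Because the convert reads the whole backing chain and writes a full-size image, the IO capability of both the instance store and the snapshot destination shows up directly in the timings above.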
25
Impact of Storage Configuration on Snapshots
[Chart: RHOS snapshot timings (qemu-img convert only), Grizzly, on a 7-disk LUN — average snapshot time of roughly 11 secs for 1 concurrent snapshot and 15 secs for 2]
26
Nova Compute Single-node Scalability
[Chart: impact of adding guests on total YCSB throughput relative to a single-guest baseline, by flavor (m1.medium = 21 guests, m1.large = 10, m1.xlarge = 5):
● m1.medium: 12.8x baseline aggregate throughput (61% per-guest) at 21 VMs
● m1.large: 5x baseline (50% per-guest) at 10 VMs
● m1.xlarge: 3.3x baseline (50% per-guest) at 5 VMs]
27
Per-Guest Performance Degradation Due to Over-Subscription
[Chart: impact of adding guests on average per-guest YCSB throughput (ops/sec, y-axis 0–120,000) vs the single-guest baseline, by flavor (m1.medium = 21 guests, m1.large = 10, m1.xlarge = 5)]
28
Per-Guest Performance Degradation Due to Over-Subscription
[Chart: impact of adding guests on average YCSB application latency (usec, y-axis 0–500) vs the single-guest baseline, by flavor (m1.medium = 21 guests, m1.large = 10, m1.xlarge = 5)]
29
Nova Scheduler – Heterogeneous Memory Configs
Out-of-Box Behavior
The default Nova scheduler assigns instances based on free memory
● Little concern for other system resources (CPU, IO, etc.)
● You can change / tune this
● Be aware if you have machines with different configurations
[Chart: instances launched per host with the default scheduler across four 24-CPU hosts — the two 48 GB hosts receive 1 instance each while the two 96 GB hosts receive 49 each]
30
Nova Scheduler – Heterogeneous Memory Config
Example of Results After Scheduler Tuning
[Chart: instance placement across the four hosts at 20-VM launch increments (1-20, 21-40, 41-60, 61-80, 81-100) with scheduler_host_subset_size=4 — launches now spread across all four hosts instead of packing the two 96 GB hosts]
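The behavior on the two slides above can be reproduced with a toy model (not Nova's actual code): the default ranking always picks the host with the most free RAM, while scheduler_host_subset_size=N picks randomly among the top N candidates:

```python
import random

# Toy model (not Nova's code) of memory-weighted scheduling across
# heterogeneous hosts: two 48 GB and two 96 GB compute nodes.
def place_instances(count, ram_mb_per_instance, free_ram, subset_size=1,
                    rng=None):
    """Place `count` instances; subset_size=1 mimics the default behavior,
    subset_size>1 mimics scheduler_host_subset_size spreading the load."""
    rng = rng or random.Random(0)
    placements = {host: 0 for host in free_ram}
    for _ in range(count):
        # Rank hosts by free RAM, keep those that still fit the instance.
        candidates = [h for h in sorted(free_ram, key=free_ram.get,
                                        reverse=True)
                      if free_ram[h] >= ram_mb_per_instance]
        if not candidates:
            break
        host = rng.choice(candidates[:subset_size])
        free_ram[host] -= ram_mb_per_instance
        placements[host] += 1
    return placements

hosts = {"a-48g": 48 * 1024, "b-48g": 48 * 1024,
         "c-96g": 96 * 1024, "d-96g": 96 * 1024}
# Default: the 96 GB hosts soak up nearly all of 100 small instances,
# matching the 1 / 1 / 49 / 49 split on the out-of-box slide.
print(place_instances(100, 1024, dict(hosts), subset_size=1))
# → {'a-48g': 1, 'b-48g': 1, 'c-96g': 49, 'd-96g': 49}
# With a subset of 4, launches spread across all four hosts.
print(place_instances(100, 1024, dict(hosts), subset_size=4))
```

The greedy variant only moves off the 96 GB hosts once their free RAM drops to the 48 GB hosts' level, which is exactly the packing the default chart shows.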
32
Cinder QEMU Storage Integration
RHEL 6.5 and RHEL-OSP 4 (Havana) now include tight QEMU integration with the Glusterfs and Ceph clients
Benefits:
● Direct QEMU integration avoids unnecessary context switching
● A one-client-per-guest configuration may help alleviate client-side performance bottlenecks
Integration model:
● QEMU includes hooks to dynamically link with a storage-vendor-supplied client
● Delivered in a manner that preserves separation of concerns for software distribution, updates, and support
  ● OpenStack and Linux, including QEMU, provided by Red Hat
  ● Storage client libraries provided and supported by the respective storage vendors
    ● libgfapi for Glusterfs (RHS) supported by Red Hat
    ● librados for Ceph supported by Inktank
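As an illustration of the libgfapi path (the hostname and volume/image names below are hypothetical), the guest's disk appears in the libvirt domain XML as a network disk, so QEMU talks to the Gluster volume directly instead of going through a FUSE mount:

```xml
<disk type='network' device='disk'>
  <driver name='qemu' type='raw' cache='none'/>
  <source protocol='gluster' name='cinder-volumes/volume-1234.img'>
    <host name='gluster-server.example.com' port='24007'/>
  </source>
  <target dev='vda' bus='virtio'/>
</disk>
```

Each guest gets its own client instance inside its QEMU process, which is what allows the per-guest configuration mentioned above.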
35
Swift Performance – Glusterfs as Pluggable Backend
Tuning Worker Count, Max Clients, Chunk Size
[Chart: seconds per object (y-axis 0–3) for 150 sequential 30 MB object transfers over 10 GbE, concurrent with four other clients (3K, 30K, 300K, 3M objects) — 30 MB default vs 30 MB tuned; per-object times drop as the 3K, 30K, and 300K clients finish]
Issue: blocking IO for filesystem operations with the Python eventlet package used by OpenStack
Recommended tuning:
● Workers: 10 (was 2)
● Max Clients: 1 (was 1024)
● Chunk Size: 2 MB (was 64 KB)
Reducing Max Clients also helps vanilla Swift
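In config terms, the tuning above looks roughly like the fragment below (option names follow the standard Swift server config format; the exact section layout and file paths vary per deployment):

```ini
# Sketch of the tuned settings (values from the talk, layout illustrative).
# With eventlet, one blocking filesystem call stalls every green thread in a
# worker, so many workers each serving a single client isolates blocking IO.
[DEFAULT]
workers = 10          ; was 2
max_clients = 1       ; was 1024

[app:object-server]
; Larger chunks cut per-read overhead on large objects.
disk_chunk_size = 2097152      ; 2 MB, was 64 KB
network_chunk_size = 2097152
```

The counter-intuitive max_clients = 1 works because concurrency moves from green threads (which share one blocking-prone process) to the larger pool of worker processes.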
37
Wrap Up
OpenStack is a rapidly evolving platform
Out-of-the-box performance is already pretty good
● Need to focus on the infrastructure's out-of-the-box configuration and performance
Still just scratching the surface on the testing:
● Control plane performance via emulated VMs
● Neutron performance (GRE, VXLAN)
● Ceilometer
● Performance impact of Active-Active foundational components (DB, messaging)