CPU Optimizations in the CERN Cloud - February 2016


CPU optimizations in the CERN Cloud
OpenStack Ops Midcycle - High Performance Computing with OpenStack - Manchester, 2016

Belmiro Moreira

[email protected] @belmiromoreira

Arne Wiebalck Tim Bell

Sean Crosby (Univ. of Melbourne) Ulrich Schwickerath

Page 3: CPU Optimizations in the CERN Cloud - February 2016

What is CERN?


CERN Cloud – LHC and Experiments

CMS detector

https://www.google.com/maps/streetview/#cern

CERN Cloud – AMS

OpenStack at CERN by numbers

•  ~5500 Compute Nodes (~140k cores)
   -  ~5300 KVM
   -  ~200 Hyper-V
•  ~2800 Images (~44 TB in use)
•  ~2000 Volumes (~800 TB allocated)
•  ~2200 Users
•  ~2500 Projects
•  >17000 VMs running

Number of VMs created (green) and VMs deleted (red) every 30 minutes

The “20% overhead” problem
•  When running the batch system on top of the Cloud Infrastructure, we reach the limit on the total number of hosts in LSF
•  On our full-node batch VMs, we noticed that the HS06 rating was ~20% lower than on the underlying host
•  Smaller VMs behaved much better: ~8% overhead (sum of simultaneous HS06 runs on 4x 8-core VMs on a 32-core host)

HS06 on virtual batch workers

8

HWDB HS06: 357±16
Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz

VM Size (cores)   Per VM HS06   Total HS06   Overhead
4x 8              82.3±11       329          7.8%
2x 16             150±5         300          16%
1x 32             284±11        284          20.4%
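The overhead column follows directly from the ratio of the aggregate in-VM HS06 score to the bare-metal (HWDB) rating of the host; a minimal sketch of the arithmetic:

```python
def overhead_pct(bare_metal_hs06, total_vm_hs06):
    """Percentage of HS06 throughput lost to virtualization."""
    return round(100 * (1 - total_vm_hs06 / bare_metal_hs06), 1)

HWDB = 357  # bare-metal HS06 rating of the host

print(overhead_pct(HWDB, 329))  # 4x 8-core VMs  -> 7.8
print(overhead_pct(HWDB, 300))  # 2x 16-core VMs -> 16.0
print(overhead_pct(HWDB, 284))  # 1x 32-core VM  -> 20.4
```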

Testing Optimizations – KSM off

•  ATLAS T0 batch VMs show an IOwait of 20-30%
•  Compute nodes started to swap even when leaving 2 GB for the OS

Optimization by numbers – EPT off

Before (EPT on):

HWDB HS06: 357±16
VM Size (cores)   Per VM HS06   Total HS06   Overhead
4x 8              82.3±11       329          7.8%
2x 16             150±5         300          16%
1x 32             284±11        284          20.4%

After (EPT off):

HWDB HS06: 357±16
VM Size (cores)   Per VM HS06   Total HS06   Overhead   Overhead Reduction
4x 8              87±11         348          2.5%       68%
2x 16             163.5±1       327          8.4%       52%
1x 32             311±1         311          12.9%      37%
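The overhead reduction is the relative drop in overhead between the two runs; a sketch of the arithmetic for the 4x 8 and 1x 32 rows:

```python
def reduction_pct(overhead_before, overhead_after):
    """Relative reduction of the virtualization overhead, in percent."""
    return round(100 * (overhead_before - overhead_after) / overhead_before)

print(reduction_pct(7.8, 2.5))    # 4x 8-core VMs -> 68
print(reduction_pct(20.4, 12.9))  # 1x 32-core VM -> 37
```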

General virtualization issue?
•  Crosscheck with SLC6 VMs on Hyper-V
   -  0.8% HS06 loss on 4x 8-core
   -  3.3% HS06 loss on 1x 32-core SLC6 VM
•  No general virtualization overhead issue!
   -  Rather a feature or configuration issue
•  What’s the difference between the VMs on Hyper-V and KVM?

NUMA
•  Hyper-V VMs have their vCPUs pinned
   -  Pinned to sets that correspond to physical NUMA nodes
•  Wider OpenStack support for this is available since Kilo
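A sketch of the idea behind such pinning (a hypothetical helper, not CERN's actual code): each VM's vCPUs are assigned to physical cores within a single NUMA node, e.g. 4x 8-core VMs on a 32-core host with 2 NUMA nodes:

```python
def numa_pinning(num_vms, vcpus_per_vm, numa_nodes, cores_per_node):
    """Map each VM's vCPUs onto physical cores of a single NUMA node."""
    # Physical core IDs grouped by NUMA node (simplified linear layout).
    node_cores = [list(range(n * cores_per_node, (n + 1) * cores_per_node))
                  for n in range(numa_nodes)]
    pinning, cursor = {}, [0] * numa_nodes
    for vm in range(num_vms):
        node = vm % numa_nodes  # spread VMs round-robin over the nodes
        start = cursor[node]
        if start + vcpus_per_vm > cores_per_node:
            raise ValueError("NUMA node %d has no room for VM %d" % (node, vm))
        pinning[vm] = node_cores[node][start:start + vcpus_per_vm]
        cursor[node] = start + vcpus_per_vm
    return pinning

# 4x 8-core VMs, 2 nodes x 16 cores:
# VM0 -> cores 0-7 (node 0), VM1 -> cores 16-23 (node 1),
# VM2 -> cores 8-15 (node 0), VM3 -> cores 24-31 (node 1)
print(numa_pinning(4, 8, 2, 16))
```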

NUMA - in the lab

… reduced the overhead to ~3% of the bare-metal rating

Deploying in production
•  EPT off; KSM on; NUMA-aware
•  System services add ~1-2% overhead
•  We got a total overhead of ~5%
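For reference, turning EPT off is a kvm_intel module parameter; a minimal sketch of persisting it on a compute node, assuming the standard Linux paths rather than any CERN-specific tooling:

```python
from pathlib import Path

# Persist the setting so it survives reboots; it takes effect the next
# time the kvm_intel module is loaded (requires root).
Path("/etc/modprobe.d/kvm_intel.conf").write_text("options kvm_intel ept=0\n")

# The current state can be read back at runtime:
#   Path("/sys/module/kvm_intel/parameters/ept").read_text()  # "Y" or "N"
```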

and then… Extremely slow nodes
•  Small fraction of jobs 10x slower
   -  VMs look OK, actually pretty good
   -  Hosts: 30-50% system load, >100k IRQ/s (mostly TLB shoot-downs)
•  Load attributed to qemu-kvm
   -  ‘perf top’: 90% in _raw_spin_lock
   -  ‘systemtap’: paging64_page_fault and kvm_mmu_pte* …

[Plots: VM CPU utilization vs. Compute Node CPU utilization]
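TLB shoot-down counts like these can be read from /proc/interrupts on the host; a small parsing sketch (the sample string mimics that file's format):

```python
def tlb_shootdown_total(interrupts_text):
    """Sum the per-CPU TLB shoot-down counters from /proc/interrupts text."""
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "TLB:":
            # numeric per-CPU counters follow the "TLB:" label;
            # the trailing description ("TLB shootdowns") is skipped
            return sum(int(f) for f in fields[1:] if f.isdigit())
    return 0

sample = (
    "           CPU0       CPU1\n"
    " TLB:     120345      98761   TLB shootdowns\n"
)
print(tlb_shootdown_total(sample))  # -> 219106
```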

Back to the drawing board
•  Needed to combine optimizations with EPT on
•  Huge pages a way out?
   -  Idea: reduce the number of pages to be handled, increase hit ratio
•  1GB huge pages
   -  Best HS06 results (with EPT on)
•  2MB huge pages
   -  Also one of the default sizes
   -  Performance loss around 5% compared to bare metal on batch VMs
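Backing VM memory with huge pages means reserving them on the host up front; a sketch of the sizing arithmetic (the numbers are illustrative, not CERN's actual values):

```python
def hugepages_needed(vm_mem_mb, num_vms, page_size_kb=2048):
    """How many huge pages (vm.nr_hugepages) to reserve to back the VMs' RAM."""
    total_kb = vm_mem_mb * 1024 * num_vms
    return -(-total_kb // page_size_kb)  # ceiling division

# e.g. 4 VMs with 8 GB each, backed by 2 MB pages:
print(hugepages_needed(8192, 4))  # -> 16384
```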

Optimization by numbers

•  NUMA + pinning
•  2MB huge pages
•  EPT on
•  KSM on

VM sizes (cores)   Overhead before   Overhead after
4x 8               7.8%              3.3%
2x 16              16%               4.6%
1x 32              20.4%             3-6%

Deploy in production
•  A small fraction can cause a lot of trouble…

Summary
•  Reduced the virtualization HS06 overhead to a few percent compared to bare metal
   -  On full node VMs!
   -  NUMA + pinning + huge pages + EPT on + KSM on
•  Pre-deployment testing very difficult
   -  EPT off side-effects initially undetected

[email protected] @belmiromoreira

http://openstack-in-production.blogspot.com