Upload
others
View
2
Download
0
Embed Size (px)
Citation preview
Migration Techniques in HPC Environments
Simon Pickartz1, Ramy Gad2, Stefan Lankes1, Lars Nagel2,Tim Süß2, André Brinkmann2, and Stephan Krempel3
1RWTH Aachen University 2Johannes Gutenberg Universität3ParTec Cluster Competence Center GmbH
In Conjunction with the FAST Project
08/26/2014
Agenda
The FAST Project
Migration in HPC EnvironmentsProcess-levelVirtualizationContainer-based
EvaluationOverheadMigration Time
Conclusion
08/26/2014 | ACS Automation of Complex Power Systems | 2Simon Pickartz
Exascale Systems
Characteristics/AssumptionsIncreasing amount of cores per nodeIncrease of CPU performance will not be matched by I/O performance
→ Most applications cannot exploit this parallelismConsequences
Exclusive node assignment has to be revokedDynamic scheduling during runtime will be necessary
→ Migration of processes between nodes indispensable
08/26/2014 | ACS Automation of Complex Power Systems | 4Simon Pickartz
Find a Suitable Topology forExascale Applications
A twofold scheduling approach1. Initial placement of job by a global
scheduler according to KPIsLoad of the individual nodesPower consumptionApplications’ resource requirements
2. Dynamic runtime adjustmentsMigration of processesTight coupling with the applicationsFeedback to the global scheduler
Cent
ralS
ched
uler
...
Agen
tAg
ent
Agen
t
App 1
App 2
App 1
App 1
App 2
App 2
08/26/2014 | ACS Automation of Complex Power Systems | 5Simon Pickartz
Migration in HPC Environments
08/26/2014 | ACS Automation of Complex Power Systems | 6Simon Pickartz
Process-level Migration
Process
OS
Host A
Process
OS
Host B
ProcessProcessProcess
IB Fabric
08/26/2014 | ACS Automation of Complex Power Systems | 7Simon Pickartz
Process-level Migration
Move a process from one node to anotherIncluding its execution context(i. e., register state and physical memory)
A special kind of Checkpoint/Restart (C/R) operationSeveral frameworks available
Condor’s checkpoint librarylibckptBerkley Lab Checkpoint/Restart (BLCR)
08/26/2014 | ACS Automation of Complex Power Systems | 8Simon Pickartz
Berkley Lab Checkpoint/Restart
Specifically designed for HPC applicationsTwo components
Kernel module for performing the C/R operationShared-library enabling user-space access
Cooperation with the C/R procedure via callback interfaceClose/open file descriptorsTear down/reestablish communication channels. . .
→ Residual dependencies that have to be resolved!
08/26/2014 | ACS Automation of Complex Power Systems | 9Simon Pickartz
Virtual Machine Migration
Process
VM
Hypervisor
Host A
Process
VM
Hypervisor
Host B
Process
VMProcess
VM
Process
VM
IB Fabric
08/26/2014 | ACS Automation of Complex Power Systems | 10Simon Pickartz
Virtual Machine Migration
Migration of a virtualized execution environmentReduction of the residual dependencies
File descriptors still valid after the migrationCommunication channels are automatically restored (e. g. TCP)
Performance degradation of I/O devices can be compensated byPass-through (e. g. Intel VT-d)Single Root I/O virtualization (SR-IOV)
Various hypervisors availableXenKernel Based Virtual Machine (KVM). . .
08/26/2014 | ACS Automation of Complex Power Systems | 11Simon Pickartz
Kernel Based Virtual Machine
A Linux kernel module that benefits from existing resourcesSchedulerMemory management. . .
Implements full-virtualization requiring hardware supportVMs are scheduled by the host like any other processAlready provides a migration framework
Live-migrationCold-migration
08/26/2014 | ACS Automation of Complex Power Systems | 12Simon Pickartz
Software-based Sharing
Hardware Physical NIC
Hypervisor Software Virtual Switch
eth0VMeth0VM
08/26/2014 | ACS Automation of Complex Power Systems | 13Simon Pickartz
Pass-Through
Hardware Physical NIC
Hypervisor
eth0VMeth0VM
08/26/2014 | ACS Automation of Complex Power Systems | 14Simon Pickartz
Single Root I/O Virtualization
Hardware VFVF PF
Hypervisor
eth0VMeth0VM
08/26/2014 | ACS Automation of Complex Power Systems | 15Simon Pickartz
Container-based Virtualization/
Hardware
Host Kernel
Hypervisor
GuestKernel
GuestKernel
App App App
(a) Virtual Machines
Hardware
Shared Kernel
Virtualization Layer
App AppApp
(b) Containers
08/26/2014 | ACS Automation of Complex Power Systems | 16Simon Pickartz
Container-based Virtualization/
Hardware
Host Kernel
Hypervisor
GuestKernel
GuestKernel
App App App
(a) Virtual Machines
Hardware
Shared Kernel
Virtualization Layer
App AppApp
(b) Containers
08/26/2014 | ACS Automation of Complex Power Systems | 16Simon Pickartz
Container-based Migration
Full-virtualization results in multiple kernels on one nodeIdea of container-based virtualization
→ Reuse the existing host kernel for the management of multipleuser-space instances
Host and guest have to use the same operating systemCommon representatives
OpenVZLinuX Containers (LXC)
08/26/2014 | ACS Automation of Complex Power Systems | 17Simon Pickartz
Test Environment
4-node Cluster2 Sandy-bridge Systems2 Ivy-bridge Systems
InfiniBand FDR Mellanox FabricUp to 56 GiB/sSupport for SR-IOV
OpenMPI 1.7 with BLCR support (except for the LXC results)
08/26/2014 | ACS Automation of Complex Power Systems | 19Simon Pickartz
Throughput
128 1 Ki 8 Ki 65 Ki 512 Ki 4 Mi0
2,000
4,000
6,000
Size in Byte
Thr
ough
put
inM
iB/s
ib_write_bw (Ivy-Bridge)
NativePass-ThroughSR-IOV
08/26/2014 | ACS Automation of Complex Power Systems | 20Simon Pickartz
Throughput
128 1 Ki 8 Ki 65 Ki 512 Ki 4 Mi 32 Mi0
2,000
4,000
6,000
Size in Byte
Thr
ough
put
inM
iB/s
OpenMPI (GCC 4.4.7, Ivy-Bridge)
NativePass-ThroughSR-IOVBLCR
08/26/2014 | ACS Automation of Complex Power Systems | 21Simon Pickartz
Latency
Native Pass-Through SR-IOV BLCR
InfiniBand Open MPI0
0.5
1
1.5
Late
ncy
inµ
s(R
TT
/2)
08/26/2014 | ACS Automation of Complex Power Systems | 22Simon Pickartz
NAS Parallel Benchmarks(Open MPI)
Native BLCR KVM
FT BT LU0
10
20
30
Spee
din
GFl
ops/
s
08/26/2014 | ACS Automation of Complex Power Systems | 23Simon Pickartz
NAS Parallel Benchmarks(Parastation)
Native LXC KVM
FT BT LU0
10
20
30
Spee
din
GFl
ops/
s
08/26/2014 | ACS Automation of Complex Power Systems | 24Simon Pickartz
mpiBLAST(Open MPI)
Native BLCR KVM
8 Procs. 16 Procs. 32 Procs.0
5
10
15
20
25Ru
ntim
ein
Min
utes
08/26/2014 | ACS Automation of Complex Power Systems | 25Simon Pickartz
mpiBLAST(Parastation)
Native LXC KVM
8 Procs. 16 Procs. 32 Procs.0
5
10
15
20
25Ru
ntim
ein
Min
utes
08/26/2014 | ACS Automation of Complex Power Systems | 26Simon Pickartz
Migration Time
BLCR KVM
Overall Downtime0
0.5
1
1.5
2
2.5
Tim
ein
s
08/26/2014 | ACS Automation of Complex Power Systems | 27Simon Pickartz
Conclusion
Inconclusive microbenchmarks analysisOverhead is highly application dependentMigration
Significant overhead of KVM concerning overall migration timeQualified by the downtime testKVM already supports live-migrationBLCR requires all processes to be stopped during migration
FlexibilityProcess-level migration generates residual dependencies
→ A non-transparent approach would be requiredVM/Container-based migration reduces these dependencies
08/26/2014 | ACS Automation of Complex Power Systems | 29Simon Pickartz
Outlook
We will focus on VM migration within FASTContainers may provide even better performance
Not as flexible as VMs (e. g., same OS)More isolation than process-level migration
Investigation of the application dependency observed during ourstudiesEnable VM migration with attached pass-through devices
→ More about FAST on http://www.en.fast-project.de
08/26/2014 | ACS Automation of Complex Power Systems | 30Simon Pickartz
Thank you for your kind attention!
Simon Pickartz1, Ramy Gad2, Stefan Lankes1, Lars Nagel2,Tim Süß2, André Brinkmann2, and Stephan Krempel3
1RWTH Aachen University 2Johannes Gutenberg Universität3ParTec Cluster Competence Center GmbH – [email protected]
Institute for Automation of Complex Power SystemsE.ON Energy Research Center, RWTH Aachen UniversityMathieustraße 1052074 Aachen
www.eonerc.rwth-aachen.de
Throughput(Fedora 20)
128 1 Ki 8 Ki 65 Ki 512 Ki 4 Mi0
2,000
4,000
6,000
Size in Byte
Thr
ough
put
inM
iB/s
ib_write_bw (Ivy-Bridge)
NativePass-ThroughSR-IOVLXC
08/26/2014 | ACS Automation of Complex Power Systems | 32Simon Pickartz
Throughput(Fedora 20)
128 1 Ki 8 Ki 65 Ki 512 Ki 4 Mi 32 Mi0
2,000
4,000
6,000
Size in Byte
Thr
ough
put
inM
iB/s
Parastation (GCC 4.4.7, Ivy-Bridge)
NativePass-ThroughSR-IOVLXC
08/26/2014 | ACS Automation of Complex Power Systems | 33Simon Pickartz