System-level Virtualization for High Performance Computing

Geoffroy Vallee, Thomas Naughton, Christian Engelmann, Hong Ong, Stephen L. Scott

PDP – Toulouse, France – Feb 2008




Introduction to Virtualization

• System-level virtualization has been studied since the 70's (Goldberg, Popek)
• Key concepts:
  − Virtual Machine (VM), guest OS: complete operating system running in a virtual environment
  − Host OS: operating system running on top of the hardware, interface between the user and the VMM and VMs
  − Virtual Machine Monitor (VMM), Hypervisor: manages VMs (scheduling, hardware access)

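[Figure: two layer diagrams. Type-I Virtualization: the VMM runs directly on the hardware, with the Host OS and the VMs on top of the VMM. Type-II Virtualization: the Host OS runs on the hardware, the VMM runs on top of the Host OS, and the VMs run on top of the VMM.]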


The HPC Context

• The system should not interfere with running application(s)
  − minimize the OS footprint/noise: Catamount, CNK
  − priority to the application for resource access
• Support large-scale systems
  − thousands of distributed components
  − fault tolerance
  − system partitioning: compute nodes vs. service nodes
• Support HPC applications, e.g., MPI applications


Hypervisor for HPC

• What does it mean?
  − VMM for HPC
    • minimize the VMM memory footprint
    • VMM/HostOS system footprint
      − only the HostOS can execute privileged instructions on behalf of VMs
      − possible domain context switches
  − Provide a suitable execution environment for HPC applications
  − Fault tolerance
  − System management of VMs/VMMs/HostOSes in large-scale distributed systems
  − I/O & Storage
    • efficient access to resources
    • resource sharing between VMs running on different nodes
• Current virtualization solutions are not suitable for HPC


Hypervisor for High-Performance Computing


VMM for High-Performance Computing

• Minimize the system footprint
  − reduce the default VMM/HostOS memory usage
  − use hardware optimizations (AMD nested pages, hardware IOMMU)
• Minimize the context switches between domains
  − pin VMs/HostOSes/VMM to a core/processor
• Avoid going through HostOSes; access the bare hardware directly
  − isolation vs. resource sharing
  − “VMM-bypass”
• Guarantee experiment reproducibility
  − have the same behavior between different application runs
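The pinning idea can be illustrated at the process level. The sketch below is only an analogy, assuming a Linux host: it pins an ordinary process to one core with Python's standard `os.sched_setaffinity`, not a VM or a VMM through any hypervisor interface. The helper name `pin_to_core` is hypothetical.

```python
import os


def pin_to_core(core: int, pid: int = 0) -> set:
    """Restrict process `pid` (0 = the calling process) to a single core.

    Returns the resulting affinity set, or an empty set on platforms
    where Python does not expose CPU affinity (anything but Linux).
    """
    if not hasattr(os, "sched_setaffinity"):
        return set()
    allowed = os.sched_getaffinity(pid)
    if core not in allowed:
        raise ValueError(f"core {core} is not available to process {pid}")
    # From now on the scheduler keeps the process on this core only,
    # which avoids migrations between cores.
    os.sched_setaffinity(pid, {core})
    return os.sched_getaffinity(pid)


if __name__ == "__main__":
    print(pin_to_core(0))
```

A hypervisor applies the same principle one level down, by binding virtual CPUs to physical cores so that a domain is not moved or preempted by another one.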


Virtual System Environment


Virtual System Environment (VSE)

• Objective: “Adapt the operating system to the application, instead of adapting the application to the OS”
  − The science resides in the applications, not in the operating systems or run-times
  − Goal: application developers should not have to “port” their application every time they want to use a new execution platform
• System-level virtualization does offer interesting features
  − Application isolation within virtual machines (users can do whatever they want)
  − All standard UNIX tools can be used


Challenges for the Usage of VSE in Distributed Systems

• System management
  − How can we deploy & manage both the HostOS and the VMs?
  − How can users specify the system within a VM?
  − How can sysadmins specify their constraints regarding execution environments?
  − How can we switch between different virtualization solutions?
  − How can we switch between a virtual environment and a normal environment (e.g., disk-less, disk-full)?
• Important lack of tools
• Selected approach: the definition of Virtual System Environments


OSCAR-V: OSCAR Extension for Virtual Systems

• Implement VSE support
  − without rewriting everything from scratch
  − potential support of all RPM- and Debian-based Linux distributions
  − abstracting existing system-level virtualization solutions
• Provide an integrated solution for:
  − the deployment & management of both HostOSes and VMs
  − the specification of VSEs from both the user point of view and the sysadmin point of view
  − a single interface for the management of VMs: the concept of profile
  − a possible switch between disk-less, Beowulf and partitioned systems (ongoing work)


VSE Management - Overview

[Figure: flow diagram with the following elements. Inputs: application developers' needs & constraints, and system administration needs & constraints. Steps: OS & RTE definition, VSE definition, image generation, VSE golden image, image on the management node. Targets for the deployed VSEs: Beowulf cluster, diskless cluster, virtual machines.]


Abstraction of the System-Level Virtualization Solutions

• Users do not care about the technical details
  − hide all the configuration details: Virtual Machine Management - V2M
  − provide a simple API for the definition of VMs: profiles
• Profile example

    <?xml version="1.0"?>
    <!DOCTYPE profile PUBLIC "" "v3m_profile.dtd">
    <profile>
      <name>test</name>
      <type>Xen</type>
      <image size="50">/home/gvallee/temp/v2m/test_xen.img</image>
      <nic1>
        <type>TUN/TAP</type>
        <mac>00:02:03:04:05:06</mac>
      </nic1>
    </profile>
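As an illustration of how little information a profile carries, the sketch below reads the example profile with Python's standard `xml.etree.ElementTree`. It is not part of V2M; the function `load_profile` and the keys of the returned dictionary are assumptions made for this example, and the unit of the image `size` attribute is not stated on the slide.

```python
import xml.etree.ElementTree as ET

# The profile shown on the slide, reproduced verbatim.
PROFILE_XML = """<?xml version="1.0"?>
<!DOCTYPE profile PUBLIC "" "v3m_profile.dtd">
<profile>
  <name>test</name>
  <type>Xen</type>
  <image size="50">/home/gvallee/temp/v2m/test_xen.img</image>
  <nic1>
    <type>TUN/TAP</type>
    <mac>00:02:03:04:05:06</mac>
  </nic1>
</profile>"""


def load_profile(text: str) -> dict:
    """Turn a profile document into a plain dictionary (hypothetical layout)."""
    root = ET.fromstring(text)
    image = root.find("image")
    # Network interfaces are the child elements named nic1, nic2, ...
    nics = [
        {"type": nic.findtext("type"), "mac": nic.findtext("mac")}
        for nic in root
        if nic.tag.startswith("nic")
    ]
    return {
        "name": root.findtext("name"),
        "type": root.findtext("type"),  # virtualization solution to use
        "image_path": image.text,
        "image_size": int(image.get("size")),  # unit not given on the slide
        "nics": nics,
    }


if __name__ == "__main__":
    print(load_profile(PROFILE_XML))
```

The `type` element is the only place where the virtualization solution appears, which is what lets the same profile format cover several back-ends.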


V2M - Architecture

[Figure: layered architecture.
 Top layer: V2M (Virtual Machine Management command-line interface), KVMs (GUI for Linux - KDE/Qt), and applications based on libv3m.
 V3M front-end: high-level interface (vm_create, create_image_from_cdrom, create_image_with_oscar, vm_migrate, vm_pause, vm_unpause) and virtualization abstraction.
 V3M back-ends: Qemu, Xen, VMWare, ...]
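The front-end/back-end split can be sketched as a small dispatch layer. This is not the libv3m API: only the operation names `vm_create`, `vm_pause` and `vm_unpause` come from the slide, while the classes `Backend`, `RecordingBackend` and `FrontEnd` are hypothetical, and the back-end here records state instead of driving a real hypervisor.

```python
from abc import ABC, abstractmethod


class Backend(ABC):
    """Operations every virtualization back-end must provide."""

    @abstractmethod
    def vm_create(self, profile: dict) -> str: ...

    @abstractmethod
    def vm_pause(self, name: str) -> str: ...

    @abstractmethod
    def vm_unpause(self, name: str) -> str: ...


class RecordingBackend(Backend):
    """Stand-in back-end: tracks VM state instead of calling a hypervisor."""

    def __init__(self, label: str):
        self.label = label
        self.state = {}

    def vm_create(self, profile: dict) -> str:
        self.state[profile["name"]] = "running"
        return f"{self.label}: created {profile['name']}"

    def vm_pause(self, name: str) -> str:
        self.state[name] = "paused"
        return f"{self.label}: paused {name}"

    def vm_unpause(self, name: str) -> str:
        self.state[name] = "running"
        return f"{self.label}: unpaused {name}"


class FrontEnd:
    """Single interface; the profile's type field selects the back-end."""

    def __init__(self):
        self._backends = {}

    def register(self, vm_type: str, backend: Backend) -> None:
        self._backends[vm_type] = backend

    def _backend_for(self, profile: dict) -> Backend:
        try:
            return self._backends[profile["type"]]
        except KeyError:
            raise ValueError(f"no back-end for {profile['type']!r}") from None

    def vm_create(self, profile: dict) -> str:
        return self._backend_for(profile).vm_create(profile)

    def vm_pause(self, profile: dict) -> str:
        return self._backend_for(profile).vm_pause(profile["name"])

    def vm_unpause(self, profile: dict) -> str:
        return self._backend_for(profile).vm_unpause(profile["name"])


if __name__ == "__main__":
    front = FrontEnd()
    front.register("Xen", RecordingBackend("xen"))
    print(front.vm_create({"name": "test", "type": "Xen"}))
```

Supporting another virtualization solution then means registering one more back-end; callers of the front-end do not change.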


System-level Virtualization and High Availability


VM Checkpoint / Restart

• Already possible to checkpoint the memory (memory dump in a file)
• File system checkpoint/restart
• Collaboration with LATech (Box Leangsuksun)

[Figure: the last file system checkpoint (read only) and the current file system (read/write) are merged at checkpoint time to produce the new file system checkpoint (read only).]
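The layering in the figure can be mimicked with two dictionaries. The sketch below is a toy model under that assumption, not the mechanism developed with LATech: the class `OverlayFS` and its methods are hypothetical, files are plain strings, and a deletion is recorded as `None` in the read/write layer.

```python
class OverlayFS:
    """Read-only checkpoint layer plus a read/write layer on top of it."""

    def __init__(self, checkpoint=None):
        self.checkpoint = dict(checkpoint or {})  # last checkpoint, never edited
        self.current = {}  # changes since the checkpoint; None marks a deletion

    def write(self, path, data):
        self.current[path] = data

    def delete(self, path):
        self.current[path] = None

    def read(self, path):
        # The read/write layer shadows the checkpoint layer.
        if path in self.current:
            data = self.current[path]
        else:
            data = self.checkpoint.get(path)
        if data is None:
            raise FileNotFoundError(path)
        return data

    def take_checkpoint(self):
        """Merge both layers into a new read-only checkpoint."""
        merged = dict(self.checkpoint)
        for path, data in self.current.items():
            if data is None:
                merged.pop(path, None)
            else:
                merged[path] = data
        self.checkpoint = merged
        self.current = {}
        return dict(merged)

    def restart(self):
        """Drop every change made since the last checkpoint."""
        self.current = {}


if __name__ == "__main__":
    fs = OverlayFS({"/etc/hosts": "v1"})
    fs.write("/tmp/out", "partial")
    fs.restart()
    print(fs.checkpoint)
```

Because the checkpoint layer is never modified between checkpoints, a restart only has to discard the read/write layer.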


Pro-active Fault Tolerance

[Figure: a headnode with a fault-tolerant job scheduler and a fault tolerance policy (with failover mechanism) is connected over the network to nodes i, ..., k. Each node runs fault prediction based on hardware monitoring. Step 1: alarm from node i (disk errors, so migrate). Step 2: live migration of the Xen VM from node i to node k; the scheduler avoids jobs on node i.]
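The two steps of the figure (alarm, then migration) can be written as a small policy loop. This is a sketch under simplifying assumptions, not the scheduler shown on the slide: the `Node` class, the disk-error threshold and the function names are hypothetical, and a "migration" only moves a VM name from one list to another.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    disk_errors: int = 0  # value reported by hardware monitoring
    vms: list = field(default_factory=list)
    avoid: bool = False  # set once the scheduler should skip this node


def predict_failure(node: Node, max_disk_errors: int = 3) -> bool:
    """Toy fault predictor: too many disk errors means a failure is likely."""
    return node.disk_errors > max_disk_errors


def proactive_step(nodes: list) -> list:
    """Move VMs away from nodes predicted to fail; return the migrations."""
    migrations = []
    for node in nodes:
        if not predict_failure(node):
            continue
        node.avoid = True  # step 1: alarm, no new jobs on this node
        targets = [
            n for n in nodes
            if n is not node and not n.avoid and not predict_failure(n)
        ]
        if not targets:
            continue  # nowhere to go; a reactive mechanism has to take over
        while node.vms:  # step 2: migrate every VM to the least loaded target
            target = min(targets, key=lambda n: len(n.vms))
            vm = node.vms.pop()
            target.vms.append(vm)
            migrations.append((vm, node.name, target.name))
    return migrations


if __name__ == "__main__":
    node_i = Node("i", disk_errors=5, vms=["vm0"])
    node_k = Node("k")
    print(proactive_step([node_i, node_k]))
```

The application keeps running throughout; only its placement changes, which is why this approach depends on live migration.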


Reactive Fault Tolerance

[Figure: a headnode and a failover headnode, each with a fault-tolerant job scheduler, are connected over the network to node i, ..., and spare node k. Each node runs BLCR in a Xen VM and fault prediction based on hardware monitoring. When a failure is detected on node i, the process is restarted.]
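The reactive path, by contrast, rolls back to the last checkpoint. The sketch below is a toy model and does not use BLCR: the class `CheckpointedJob`, the function `run_with_restart` and the node labels are hypothetical, the "failure" is injected by step number, and the restart is assumed to happen on the spare node.

```python
import copy


class CheckpointedJob:
    """A job whose whole state can be saved and restored."""

    def __init__(self, total_steps: int):
        self.total_steps = total_steps
        self.state = {"step": 0, "acc": 0}
        self._saved = copy.deepcopy(self.state)

    def checkpoint(self):
        self._saved = copy.deepcopy(self.state)

    def restart(self):
        self.state = copy.deepcopy(self._saved)

    def step(self):
        self.state["step"] += 1
        self.state["acc"] += self.state["step"]

    def done(self) -> bool:
        return self.state["step"] >= self.total_steps


def run_with_restart(job, fail_at, checkpoint_every=3):
    """Run the job; on an injected failure, restart from the last checkpoint."""
    node = "node_i"
    failures = set(fail_at)
    while not job.done():
        next_step = job.state["step"] + 1
        if next_step in failures:
            failures.discard(next_step)  # each failure happens once
            node = "spare_node_k"  # assumption: restart on the spare node
            job.restart()  # work done since the last checkpoint is lost
            continue
        job.step()
        if job.state["step"] % checkpoint_every == 0:
            job.checkpoint()
    return node, job.state["acc"]


if __name__ == "__main__":
    print(run_with_restart(CheckpointedJob(10), fail_at={8}))
```

The work done between the last checkpoint and the failure is recomputed, so the checkpoint interval trades checkpoint overhead against lost work.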


System Management & Administration


Resource Management

• Two main challenges
  − System partitioning (becomes critical with multi-cores)
  − Application deployment
• Virtualization benefits
  − easily enable system partitioning
  − simplify the resource exposure
• Application/VMs deployment
  − on demand
  − before application deployment
• Example: Distributed Virtual Clustering (ASU / Cluster Resources Inc.)
  − Extension of MOAB for virtualization support



I/O & Storage

• Critical for most HPC applications (communications, access to storage on service nodes)
• Access to the bare hardware
  − currently through the Host OS (isolation)
  − implies an overhead
• Possible solutions
  − VMM-bypass (direct mapping of resources into VMs)
  − Remote Direct Memory Access (RDMA)


Impact of Virtualization for HPC

• Programming paradigm
  − Challenges: How to move data to/from the application? How to parallelize applications to hundreds or even thousands of nodes? How to checkpoint/restart applications in order to guarantee resiliency?
  − Opportunities: implicit communications (move the application to the data); change the resource exposure
• Application development
  − emulation vs. virtualization; OS adaptation; VSE
  − application developers can focus on the science
• System administration
  − separate system administrators' and users' constraints (VSE)
• Foster research and education
  − ease research in architecture & operating systems


Conclusion

• Virtualization for HPC implies several challenges
  − a hypervisor suitable for HPC (i.e., with a small system footprint)
  − the support of virtual system environments
  − the support of high availability and fault tolerance capabilities
  − the support of advanced resource management capabilities
  − the use of system-level virtualization for resource management
  − the support of efficient I/O mechanisms and storage solutions for virtualized environments
• No current commercial solution provides such capabilities
• Virtualization solutions are still immature for HPC (even though virtualization has been studied since the 70's); a lot of research remains to be done


Acknowledgement

Ideas presented in this document are based on discussions with those attending the September 20-21, 2006, Nashville (Tennessee, USA) meeting on the role of virtualization in high performance computing. Attendees included: Stephen L. Scott – meeting chair (Oak Ridge National Laboratory), Barney Maccabe (University of New Mexico), Ron Brightwell (Sandia National Laboratory), Peter A. Dinda (Northwestern University), D.K. Panda (Ohio State University), Christian Engelmann (Oak Ridge National Laboratory), Ada Gavrilovska (Georgia Tech), Geoffroy Vallee (Oak Ridge National Laboratory), Greg Bronevetsky (Lawrence Livermore National Laboratory), Frank Mueller (North Carolina State University), Dan Stanzione (Arizona State University), Hong Ong (Oak Ridge National Laboratory), Seetharami R. Seelam (University of Texas at El Paso), Chokchai (Box) Leangsuksun (Louisiana Tech University), Sudharshan Vazhkudai (Oak Ridge National Laboratory), David Jackson (Cluster Resources Inc.), and Thomas Naughton (Oak Ridge National Laboratory).

We thank them for their time and suggestions for this document.


Contacts

Stephen L. Scott
Computer Science Research Group
Computer Science and Mathematics [email protected]

Geoffroy Vallee
Computer Science Research Group
Computer Science and Mathematics [email protected]

Join us at ACM HPCVirt'08, in conjunction with EuroSys 2008
March 31, Glasgow, Scotland
http://www.csm.ornl.gov/srt/hpcvirt08/


Questions?