
Jaeyoung Yoon
Computer Sciences Department
University of Wisconsin-Madison

jyoon@cs.wisc.edu
http://www.cs.wisc.edu/condor

Virtual Machine Universe in Condor


What is the VM universe?

› A job user can submit a virtual machine to Condor.

› Condor runs the virtual machine and sends back the resulting virtual machine.

› Supports VMware Server and Xen.


Big picture

[Figure: submit machine running the Schedd and the Shadow; execute machine running the Startd, the Starter, and the VM GAHP; the virtual machine.]


Benefits of VM universe

› platform independence

› independence from the host machine's environment

› checkpoint

› networking in a virtual machine

› snapshot disk

› input CDROM image


Snapshot disk

› All modified data is stored in snapshot disks; the original VM disk files are never changed.

› VM disk files in a shared file system can therefore be safely shared among multiple jobs.

› Can reduce the disk space needed for results and checkpoints.
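Conceptually, a snapshot disk behaves like a copy-on-write overlay: every job reads the shared base disk, but its writes land only in its own snapshot. The toy model below illustrates that behavior with in-memory blocks. It is a sketch under our own assumptions; the classes `BaseDisk` and `SnapshotDisk` are hypothetical and do not reflect how VMware or Condor actually lay out snapshot files.

```python
from __future__ import annotations

from types import MappingProxyType


class BaseDisk:
    """Hypothetical read-only base image, e.g. VM disk files on a shared file system."""

    def __init__(self, blocks: dict[int, bytes]) -> None:
        # Expose the blocks through a read-only view so jobs cannot modify them.
        self.blocks = MappingProxyType(dict(blocks))


class SnapshotDisk:
    """Hypothetical per-job copy-on-write overlay on top of a shared BaseDisk."""

    def __init__(self, base: BaseDisk) -> None:
        self.base = base
        self.changed: dict[int, bytes] = {}  # only the blocks this job modified

    def read(self, block: int) -> bytes:
        # Prefer this job's modified copy; otherwise fall back to the base image.
        if block in self.changed:
            return self.changed[block]
        return self.base.blocks.get(block, b"\x00")

    def write(self, block: int, data: bytes) -> None:
        # Writes go to the overlay only; the base image is never touched.
        self.changed[block] = data

    def size_in_blocks(self) -> int:
        # Results and checkpoints only need the modified blocks.
        return len(self.changed)


if __name__ == "__main__":
    base = BaseDisk({0: b"boot", 1: b"os", 2: b"data"})
    job1, job2 = SnapshotDisk(base), SnapshotDisk(base)
    job1.write(2, b"job1-result")
    job2.write(2, b"job2-result")
    # Each job sees its own data while the shared base stays unchanged.
    print(job1.read(2), job2.read(2), base.blocks[2])
    print(job1.size_in_blocks(), "of", len(base.blocks), "blocks changed")
```

Because each job only stores what it changed, two jobs can share one base image and the per-job output stays small, which mirrors the two benefits listed on the slide.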


Submit description file with shared file system

universe = vm
executable = WindowsXP
vm_type = vmware
vm_memory = 256
vm_checkpoint = TRUE
vm_networking = TRUE
vm_networking_type = dhcp
vmware_dir = /shared/windows_vm
vmware_should_transfer_files = FALSE
vmware_snapshot_disk = TRUE
initialdir = /result1
Queue
initialdir = /result2
Queue


Snapshot disk with shared file system

[Figure: submit machine, execute machine 1, and execute machine 2, with /windows_vm on a shared file system; Job 1 and Job 2 each have their own snapshot disk; results go to /result1 (Job 1) and /result2 (Job 2).]


Submit description file without shared file system

universe = vm
executable = WindowsXP
vm_type = vmware
vm_memory = 256
vm_checkpoint = TRUE
vm_networking = TRUE
vm_networking_type = dhcp
vmware_dir = /windows_vm
vmware_should_transfer_files = TRUE
initialdir = /result1
vmware_snapshot_disk = TRUE
Queue
initialdir = /result2
vmware_snapshot_disk = FALSE
Queue


Snapshot disk without shared file system

[Figure: submit machine with the submit descriptions for Job 1 (vmware_snapshot_disk = TRUE, initialdir = /result1) and Job 2 (vmware_snapshot_disk = FALSE, initialdir = /result2); /windows_vm; execute machine 1 (Job 1) and execute machine 2 (Job 2); a snapshot disk.]


Snapshot disk without shared file system

[Figure: submit machine with /windows_vm, /result1 (Job 1), and /result2 (Job 2); execute machine 1 (Job 1) and execute machine 2 (Job 2); a snapshot disk.]


Input CDROM image

› Unlike other universes, the VM universe cannot use the input or arguments parameters in a job submit description file.

› With input CDROM images, a job user can run the same VM several times on different input data sets.


Submit description file with input CDROM image

universe = vm
executable = WindowsXP
vm_type = vmware
vm_memory = 256
vm_checkpoint = TRUE
vm_networking = TRUE
vm_networking_type = dhcp
vmware_dir = /windows_vm
vmware_should_transfer_files = FALSE
vmware_snapshot_disk = TRUE
initialdir = /result1
vmware_cdrom_files = a.iso
Queue
initialdir = /result2
vmware_cdrom_files = a.txt, b.txt
Queue
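Since the queued jobs differ only in their result directory and CDROM contents, a submit description like the one above can be generated from a list of input data sets. The Python sketch below does that. The helper name `make_submit_description` and the dictionary-based job list are hypothetical; the fixed settings are simply copied from the example above.

```python
from __future__ import annotations


def make_submit_description(common: dict[str, str],
                            jobs: list[dict[str, str]]) -> str:
    """Render a VM universe submit description: the shared settings first,
    then each job's overrides followed by a Queue statement."""
    lines = [f"{key} = {value}" for key, value in common.items()]
    for job in jobs:
        lines.extend(f"{key} = {value}" for key, value in job.items())
        lines.append("Queue")
    return "\n".join(lines) + "\n"


# Settings shared by every job (copied from the example submit description).
COMMON = {
    "universe": "vm",
    "executable": "WindowsXP",
    "vm_type": "vmware",
    "vm_memory": "256",
    "vm_checkpoint": "TRUE",
    "vm_networking": "TRUE",
    "vm_networking_type": "dhcp",
    "vmware_dir": "/windows_vm",
    "vmware_should_transfer_files": "FALSE",
    "vmware_snapshot_disk": "TRUE",
}

# One entry per input data set: where results go and what goes on the CDROM.
JOBS = [
    {"initialdir": "/result1", "vmware_cdrom_files": "a.iso"},
    {"initialdir": "/result2", "vmware_cdrom_files": "a.txt, b.txt"},
]

if __name__ == "__main__":
    print(make_submit_description(COMMON, JOBS), end="")
```

With these inputs the script prints the same sixteen lines as the example above; adding a data set is just one more entry in the job list, and the output can be saved to a file and passed to condor_submit.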


Input CDROM image

[Figure: submit machine with the submit descriptions for Job 1 (vmware_cdrom_files = a.iso) and Job 2 (vmware_cdrom_files = a.txt, b.txt); the VM on execute machine 1 gets a.iso, and the VM on execute machine 2 gets a.txt and b.txt.]


VMware VM universe

› Snapshot disk

› Input CDROM image

› Can be used on either a Linux or a Windows host


Xen VM universe

› No support for snapshot disks, so a VM disk file in a shared file system cannot be shared among multiple jobs unless it is read-only.

› Input CDROM image

› Can be used only on a Linux host


Checkpoint

› Periodic checkpoint and vacate checkpoint

› All modified VM disk files and a file holding the VM's memory are transferred back to the submit machine.

› When snapshot disks are used, only the snapshot disk files and the VM memory file are transferred.
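The rule for what a checkpoint sends back to the submit machine can be expressed as a small function. This is a sketch only: the `VMFiles` structure and the file names used with it are hypothetical placeholders, not the actual file layout used by Condor or VMware.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class VMFiles:
    """Hypothetical view of a VM's files on the execute machine."""
    memory_file: str                 # file holding the VM's memory
    disk_files: dict[str, bool]      # disk file name -> modified since start?
    snapshot_files: list[str] = field(default_factory=list)


def checkpoint_transfer_set(vm: VMFiles, use_snapshot_disk: bool) -> list[str]:
    """Return the files sent back to the submit machine at checkpoint time."""
    if use_snapshot_disk:
        # Base disks are never modified, so only the snapshot disks are needed.
        files = list(vm.snapshot_files)
    else:
        # Without snapshot disks, every modified VM disk file must be sent.
        files = [name for name, modified in vm.disk_files.items() if modified]
    # The memory file is always part of the checkpoint.
    return files + [vm.memory_file]


if __name__ == "__main__":
    vm = VMFiles("memory.img",
                 {"disk1.img": True, "disk2.img": False},
                 ["disk1-snapshot.img"])
    print(checkpoint_transfer_set(vm, use_snapshot_disk=False))
    print(checkpoint_transfer_set(vm, use_snapshot_disk=True))
```

The snapshot branch never has to inspect the base disks at all, which is why snapshot disks can shrink checkpoints.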


Suspend

› Hard suspend: the memory used by the VM is saved to a file and then released.

› Soft suspend: the memory used by the VM is not released; the VM is simply paused, much like a process that receives SIGSTOP.
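The difference between the two suspend modes can be summarized as a small state model. The sketch below is illustrative only: the `VirtualMachine` class, its fields, and the file name `vm_memory.bin` are hypothetical, and the "memory" is just a byte string standing in for the VM's RAM.

```python
from __future__ import annotations

import tempfile
from enum import Enum
from pathlib import Path


class State(Enum):
    RUNNING = "running"
    SOFT_SUSPENDED = "soft-suspended"
    HARD_SUSPENDED = "hard-suspended"


class VirtualMachine:
    """Hypothetical VM model; memory is None while it is released."""

    def __init__(self, memory: bytes) -> None:
        self.memory: bytes | None = memory
        self.state = State.RUNNING
        self.saved: Path | None = None

    def soft_suspend(self) -> None:
        # Pause only, like SIGSTOP: the memory stays allocated.
        self.state = State.SOFT_SUSPENDED

    def hard_suspend(self, directory: Path) -> None:
        if self.memory is None:
            raise RuntimeError("memory already released")
        # Save the memory to a file first, then release it.
        self.saved = directory / "vm_memory.bin"
        self.saved.write_bytes(self.memory)
        self.memory = None
        self.state = State.HARD_SUSPENDED

    def resume(self) -> None:
        if self.state is State.HARD_SUSPENDED and self.saved is not None:
            # Reload the memory image written at suspend time.
            self.memory = self.saved.read_bytes()
        self.state = State.RUNNING


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        vm = VirtualMachine(b"guest RAM")
        vm.soft_suspend()
        print(vm.state, "memory held:", vm.memory is not None)
        vm.resume()
        vm.hard_suspend(Path(tmp))
        print(vm.state, "memory held:", vm.memory is not None)
        vm.resume()
        print(vm.state, vm.memory)
```

Soft suspend is cheap to undo but keeps the host's memory occupied; hard suspend frees that memory at the cost of writing and later rereading the memory file.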


Networking issues when restarting from checkpoint

› The VM's MAC and IP addresses are preserved in the checkpoint.

› When the checkpointed VM is restarted, its MAC and IP addresses do not change.

› If NAT is used for VM networking, the MAC and IP addresses of the NAT gateway may differ from one execute machine to another, so the network settings preserved in the checkpoint may no longer be valid.

› In VMware, if VMware Tools is installed inside the VM, it automatically performs a DHCP renew when the VM is restarted.


Future work

› Support snapshot disks in Xen VM universe

› For results, retrieve only the output files from the VM instead of all VM files.

› Support other virtual machine programs (e.g., QEMU)


Summary

› We are currently testing the VM universe.

› We hope to include the VM universe in Condor 6.9.x.

Questions?
