Virtual Machine Universe in Condor
Jaeyoung Yoon
Computer Sciences Department
University of Wisconsin-Madison
[email protected]
http://www.cs.wisc.edu/condor
What is VM universe?
› A user can submit a virtual machine to Condor as a job
› Condor runs the virtual machine and sends back the resulting virtual machine
› Supports VMware Server and Xen
Big picture
[Figure: components shown are the submit machine (Schedd, Shadow), the execute machine (Startd, Starter, VM GAHP), and the virtual machine.]
Benefits of VM universe
› platform independence
› environment independent of the host machine
› checkpoint
› networking in a virtual machine
› snapshot disk
› input CDROM image
Snapshot disk
› All modified data is stored in snapshot disks, leaving the original VM disk files unchanged.
› VM disk files in a shared file system can therefore be shared safely among multiple jobs.
› Reduces the disk space needed for results and checkpoints.
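The snapshot idea above can be pictured as a copy-on-write overlay. The following is only a toy sketch in Python, not how VMware or Condor implement snapshot disks; the class name `SnapshotDisk` and the block contents are hypothetical and chosen for illustration.

```python
class SnapshotDisk:
    """Toy copy-on-write view over a shared base disk (illustrative only)."""

    def __init__(self, base_blocks):
        self._base = base_blocks  # shared original disk; this class never writes to it
        self._delta = {}          # blocks modified by this job only

    def read(self, block_no):
        # Prefer the job's own modified copy, else fall back to the base disk.
        if block_no in self._delta:
            return self._delta[block_no]
        return self._base[block_no]

    def write(self, block_no, data):
        # Writes land in the snapshot; the base disk is left untouched.
        self._delta[block_no] = data

    def modified_blocks(self):
        # Only these blocks would need to be kept for a result or a checkpoint.
        return dict(self._delta)


# Two jobs share one base disk, as with a VM directory on a shared file system.
base = {0: b"boot", 1: b"os", 2: b"data"}
job1 = SnapshotDisk(base)
job2 = SnapshotDisk(base)
job1.write(2, b"result-1")
job2.write(2, b"result-2")
```

Each job sees its own writes, the base disk stays as it was, and only the modified blocks need to be stored per job, which is where the disk-space saving comes from.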
Submit description file with shared file system
    universe = vm
    executable = WindowsXP
    vm_type = vmware
    vm_memory = 256
    vm_checkpoint = TRUE
    vm_networking = TRUE
    vm_networking_type = dhcp
    vmware_dir = /shared/windows_vm
    vmware_should_transfer_files = FALSE
    vmware_snapshot_disk = TRUE
    initialdir = /result1
    Queue
    initialdir = /result2
    Queue
Snapshot disk with shared file system
[Figure: a submit machine, a shared file system holding /windows_vm, and execute machines 1 and 2. Job 1 and Job 2 each have their own snapshot disk, with results in /result1 and /result2.]
Submit description file without shared file system
    universe = vm
    executable = WindowsXP
    vm_type = vmware
    vm_memory = 256
    vm_checkpoint = TRUE
    vm_networking = TRUE
    vm_networking_type = dhcp
    vmware_dir = /windows_vm
    vmware_should_transfer_files = TRUE
    initialdir = /result1
    vmware_snapshot_disk = TRUE
    Queue
    initialdir = /result2
    vmware_snapshot_disk = FALSE
    Queue
Snapshot disk without shared file system
[Figure, two steps: the submit machine holds /windows_vm and the submit descriptions for Job 1 (vmware_snapshot_disk = TRUE, initialdir = /result1) and Job 2 (vmware_snapshot_disk = FALSE, initialdir = /result2). Execute machine 1 runs Job 1 with a snapshot disk and Execute machine 2 runs Job 2. The second step shows /result1 for Job 1 and /result2 for Job 2 on the submit machine.]
Input CDROM image
› Unlike other universes, the VM universe cannot use the input or argument parameters of a job submit description file
› With input CDROM images, a user can run the same VM several times on different input data sets
Submit description file with input CDROM image
    universe = vm
    executable = WindowsXP
    vm_type = vmware
    vm_memory = 256
    vm_checkpoint = TRUE
    vm_networking = TRUE
    vm_networking_type = dhcp
    vmware_dir = /windows_vm
    vmware_should_transfer_files = FALSE
    vmware_snapshot_disk = TRUE
    initialdir = /result1
    vmware_cdrom_files = a.iso
    Queue
    initialdir = /result2
    vmware_cdrom_files = a.txt, b.txt
    Queue
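When the same VM is run over many input data sets, the per-job part of such a file can be generated rather than written by hand. Below is a minimal Python sketch under that assumption; the helper `render_vm_submit` is hypothetical, and it only emits the submit commands that appear on these slides, without checking them against any particular Condor release.

```python
def render_vm_submit(common, per_job):
    """Render one submit description with a Queue statement per job."""
    lines = [f"{key} = {value}" for key, value in common.items()]
    for job in per_job:
        # Per-job settings override nothing here; they are simply emitted
        # before each Queue, as in the slide's example.
        lines.extend(f"{key} = {value}" for key, value in job.items())
        lines.append("Queue")
    return "\n".join(lines) + "\n"


# Settings shared by every job (names taken from the slides).
common = {
    "universe": "vm",
    "executable": "WindowsXP",
    "vm_type": "vmware",
    "vm_memory": 256,
    "vmware_dir": "/windows_vm",
    "vmware_should_transfer_files": "FALSE",
    "vmware_snapshot_disk": "TRUE",
}

# One (result directory, CDROM input files) pair per data set.
datasets = [("/result1", ["a.iso"]), ("/result2", ["a.txt", "b.txt"])]
per_job = [
    {"initialdir": result_dir, "vmware_cdrom_files": ", ".join(files)}
    for result_dir, files in datasets
]

submit_text = render_vm_submit(common, per_job)
```

Adding another data set then means adding one more entry to `datasets`, while the VM itself stays the same.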
Input CDROM image
[Figure: the submit machine holds the submit descriptions for Job 1 (vmware_cdrom_files = a.iso) and Job 2 (vmware_cdrom_files = a.txt, b.txt). The VM on Execute machine 1 is shown with a.iso, and the VM on Execute machine 2 with a.txt and b.txt.]
VMware VM universe
› Snapshot disk
› Input CDROM image
› Can be used on either a Linux host or a Windows host
Xen VM universe
› No support for snapshot disks: a VM disk file in a shared file system cannot be shared among multiple jobs unless it is read-only.
› Input CDROM image
› Can be used only on a Linux host
Checkpoint
› Periodic checkpoint and vacate checkpoint
› All modified VM disk files and a file holding the VM memory are transferred back to the submit machine
› When snapshot disks are used, only the snapshot disk files and the file holding the VM memory are transferred.
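The rule for which files go back at checkpoint time can be written as a small selection function. This is a sketch in Python only; the function name and all file names are hypothetical, and it does not reflect Condor's actual file-transfer code.

```python
def checkpoint_transfer_set(disk_files, modified, snapshot_files,
                            memory_file, use_snapshot):
    """Return the files to send back to the submit machine at a checkpoint."""
    if use_snapshot:
        # With snapshot disks, the original disk files are unchanged,
        # so only the snapshot files carry the job's modifications.
        files = list(snapshot_files)
    else:
        # Without snapshot disks, every modified VM disk file is sent.
        files = [name for name in disk_files if name in modified]
    # The file holding the VM memory is transferred in both cases.
    return files + [memory_file]


# Hypothetical file names, for illustration only.
disks = ["disk-1.vmdk", "disk-2.vmdk"]
with_snapshot = checkpoint_transfer_set(
    disks, {"disk-1.vmdk"}, ["snapshot-1.vmdk"], "vm.mem", use_snapshot=True)
without_snapshot = checkpoint_transfer_set(
    disks, {"disk-1.vmdk"}, [], "vm.mem", use_snapshot=False)
```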
Suspend
› Hard suspend: the memory used by the VM is saved to a file and then released
› Soft suspend: the memory used by the VM is not released; the VM is simply paused, as a process is by SIGSTOP
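The SIGSTOP comparison for soft suspend can be tried on an ordinary process. The Python sketch below is an analogy only and assumes a POSIX host; it pauses and resumes a sleeping child process, not a virtual machine, and the function name is hypothetical.

```python
import os
import signal
import subprocess
import sys


def soft_suspend_demo():
    """Pause and resume a child process; return True if it was seen stopped."""
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"])
    try:
        # Pause the child. Its memory stays allocated, as in a soft suspend.
        os.kill(child.pid, signal.SIGSTOP)
        # WUNTRACED makes waitpid report a stopped (not exited) child.
        _, status = os.waitpid(child.pid, os.WUNTRACED)
        was_stopped = os.WIFSTOPPED(status)
        # Resume the child exactly where it left off.
        os.kill(child.pid, signal.SIGCONT)
        return was_stopped
    finally:
        # Clean up; SIGKILL also works on a process that is still stopped.
        child.kill()
        child.wait()
```

A hard suspend has no equivalent here: it would additionally write the memory image to a file and free the memory on the host.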
Networking issues when restarting from checkpoint
› The MAC and IP addresses of a VM are preserved when it is checkpointed
› When the checkpointed VM is restarted, its MAC and IP addresses do not change.
› If NAT is used for VM networking, the NAT gateway may have a different MAC and IP address on each execute machine.
› In VMware, if VMware Tools is installed inside the VM, it automatically performs a DHCP renew when the VM is restarted.
Future work
› Support snapshot disks in Xen VM universe
› For results, retrieve only the output files from the VM instead of all VM files.
› Support other virtual machine programs (e.g. QEMU)
Summary
› We are testing the VM universe.
› We hope the VM universe will be included in Condor 6.9.x.
Questions?