Becky Gietzel
Computer Sciences Department
University of Wisconsin-Madison
Using the Parallel Universe beyond MPI
www.cs.wisc.edu/~bgietzel
Parallel Universe applications using Metronome
Metronome’s support for running parallel jobs builds on Condor’s Parallel Universe
Coordinated Metronome jobs can run on multiple machines at the same time, with communication available between them
Provides advanced testing opportunities
Some examples: client/server, cross-platform, compatibility, stress/scalability
Service testing challenges
Starting multiple services on the same machine does not allow for testing across a network or different platforms
Deciding when to start the services and when to start tests requires human intervention
Setup of the services is usually a manual process, so the testing is often skipped altogether
The same goes for tearing down the services to return the machines to their original state
Benefits of using Metronome
Condor manages dynamic claiming of resources, communication between job nodes and cleaning up after the jobs run
Metronome publishes basic information about each task to the job ad where it’s accessible by any node, acting as a “scratch space” for the job
The hostnames of all job nodes, the start time, return code, and end time for each task on each node are published to this shared job ad
This information is useful for communication between nodes and synchronization in the user’s glue scripts.
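A glue script that depends on another node typically has to wait until the value it needs shows up in the shared job ad. The following is a minimal sketch of that polling pattern, not Metronome's own mechanism: `read_attr` is a hypothetical callable standing in for whatever the glue script uses to read the job ad, and the attribute name in the usage note is likewise made up.

```python
import time


def wait_for_attr(read_attr, name, timeout=60.0, interval=1.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll a shared key/value store until `name` has a value.

    read_attr is a hypothetical callable (an assumption, not a Metronome
    API) that returns the attribute's value, or None if it is not yet set.
    """
    deadline = clock() + timeout
    while True:
        value = read_attr(name)
        if value is not None:
            return value
        if clock() >= deadline:
            raise TimeoutError(f"{name} was not published within {timeout}s")
        sleep(interval)
```

A client-side glue script might call `wait_for_attr(read_attr, "server_port")` before starting its queries, so that no human has to decide when the server is ready.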
Client/server test example
[Diagram: the Submit Node launches a Parallel Job on Execute Node 0 and Execute Node 1; one execute node runs the server and the other runs the client]
SERVER
Start server
Send port to client
Handle client requests
Poll for ALLDONE from client
Exit
CLIENT
Discover server hostname and port
Start client
Run queries against server
Send ALLDONE message to server
Exit
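The server and client steps above can be sketched in a single process. This is an illustration under stated assumptions, not the glue scripts from the talk: both sides run as threads on localhost, a `queue.Queue` stands in for the shared job ad that carries the port, and the `echo:` reply format is invented for the example.

```python
import queue
import socket
import threading


def run_server(port_box, handled):
    """Start a server, publish its port, answer requests until ALLDONE."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind(("127.0.0.1", 0))           # port 0: the OS picks a free port
        srv.listen(1)
        port_box.put(srv.getsockname()[1])   # "send port to client"
        conn, _ = srv.accept()
        with conn, conn.makefile("r") as reader:
            for line in reader:              # handle client requests
                message = line.strip()
                if message == "ALLDONE":     # client says it is finished
                    break
                handled.append(message)
                conn.sendall(f"echo:{message}\n".encode())


def run_client(port_box, queries):
    """Discover the port, run queries against the server, send ALLDONE."""
    port = port_box.get(timeout=10)          # "discover server port"
    replies = []
    with socket.create_connection(("127.0.0.1", port), timeout=10) as sock:
        with sock.makefile("r") as reader:
            for query in queries:
                sock.sendall(f"{query}\n".encode())
                replies.append(reader.readline().strip())
            sock.sendall(b"ALLDONE\n")
    return replies


def demo():
    """Run one server and one client and return what each side saw."""
    port_box = queue.Queue()
    handled = []
    server = threading.Thread(target=run_server, args=(port_box, handled),
                              daemon=True)
    server.start()
    replies = run_client(port_box, ["q1", "q2"])
    server.join(timeout=10)
    return handled, replies
```

In a real parallel job the two functions would live in separate glue scripts on separate execute nodes, and the client would look up the server's hostname as well as its port.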
How to submit a parallel job in Metronome
Several minor modifications to the Metronome submit file are necessary for parallel jobs
List of platforms is comma separated with parentheses around the outside
Platforms = (x86_rhas_3, x86_rhas_4)
Parallel job submit files continued
Add a glue script for each task/node combination to be executed remotely.
› platform_pre_0 = client/platform_pre
› platform_pre_1 = server/platform_pre
› remote_declare_0 = client/remote_declare
› remote_declare_1 = server/remote_declare
› remote_task_0 = client/remote_task
› remote_task_1 = server/remote_task
› remote_task_args_0 = 9000
› remote_task_args_1 = 9001
… and so forth for all glue scripts.
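Because the entries follow a regular `<task>_<node> = <directory>/<task>` pattern, they can be generated rather than typed by hand. The helper below is a hypothetical convenience, not part of Metronome; it only reproduces the naming pattern shown above and assumes each node's glue scripts live in a directory named after its role.

```python
def glue_script_lines(node_dirs, tasks):
    """Return one 'task_N = dir/task' line per task/node combination.

    node_dirs: directory for each node, indexed by node number.
    tasks: glue script names, e.g. "platform_pre" or "remote_task".
    """
    lines = []
    for task in tasks:
        for node, directory in enumerate(node_dirs):
            lines.append(f"{task}_{node} = {directory}/{task}")
    return lines


if __name__ == "__main__":
    for line in glue_script_lines(
        ["client", "server"],
        ["platform_pre", "remote_declare", "remote_task"],
    ):
        print(line)
```

Per-node arguments such as `remote_task_args_0 = 9000` do not follow the directory pattern and would still be written out separately.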
Other parallel job use cases
Cross platform testing (Linux to Solaris)
Scalability/stress testing (1 server, many clients)
Compatibility testing (cross version, stable vs. development series)
For more information
Documentation is available on the NMI site
See http://nmi.cs.wisc.edu/node/1001 for information on running parallel jobs using Metronome
http://nmi.cs.wisc.edu/node/282 describes how to set up your own Metronome installation for running parallel jobs