Faucets: Scheduling on Clusters and Across the Grid

LACSI 2003

Presenter: Sameer Kumar
Team: Sanjay Kalé, Sameer Kumar, Sindhura Bandhakavi, Justin Meyer
Parallel Programming Laboratory
Department of Computer Science
University of Illinois at Urbana-Champaign
http://charm.cs.uiuc.edu/


Outline

  • High-level description
      • Motivation
      • Faucets, cluster bartering
      • Adaptive jobs, adaptive queuing system (AQS)
      • Demo
  • Usage and installation
      1. How to write an adaptive program
      2. Installing and using the AQS
      3. Adding your cluster to an existing Faucets server
      4. Installing a Faucets server

Motivation

  1. Demand for high-end compute power, but the resources are:
      • Dispersed
          • Which machine would give me back my results quickest?
      • Hard to use
          • Use ssh to log in, ftp files, decide on a queue, create a script, submit
          • Because of the hassle, users just submit the same script to the same machine even if a better alternative exists
          • Monitor a running job
  2. Low operational efficiency of existing computing systems

Solution 1: Faucets

  • Motivation #1: dispersed, hard to use
  • Central source of compute power
      • Users
      • Providers of compute resources
      • User account not needed on every resource
  • Match users and providers
      • Market economy?
      • Cluster bartering
      • QoS requirements, contracts and bidding systems
  • GUI or web-based interface
      • Submission
      • Monitoring

Motivation #2: Inefficient Utilization

[Figure: 16-processor system. Job A (10 processors) is allocated; Job B (8 processors) conflicts and is queued.]

  • Current job schedulers can have low system utilization!

Solution: Adaptive Jobs

  • Jobs that can shrink or expand the number of processors they are running on at runtime
  • Improve system utilization and response time
  • Properties
      • Min_pe, related to the memory requirements of the job
      • Max_pe, related to speedup
  • Scheduler can take advantage of this adaptivity

Two Adaptive Jobs

[Figure: 16-processor system. Job A (Min_pe = 1, Max_pe = 10) is allocated. Job B (Min_pe = 8, Max_pe = 16) arrives: A is shrunk and B is allocated. When B finishes, A expands.]

Adaptive Job Scheduler

  • Maximize system utilization and minimize response time
  • Scheduling decisions
      • Shrink existing jobs when a new job arrives
      • Expand jobs to use all processors when a job finishes
  • Processor map sent to the job
      • Bit vector specifying which processors a job is allowed to use
          • 00011100 (use 3, 4 and 5!)
  • Handles regular (non-adaptive) jobs
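The processor map can be illustrated with a short Python sketch. It assumes the leftmost bit stands for processor 0, which is consistent with the 00011100 example (processors 3, 4 and 5). The function names are hypothetical and are not taken from the Faucets or Charm++ code.

```python
def decode_processor_map(bits):
    """Return the indices of the processors a job may use.

    Assumption: the leftmost character of the bit vector is processor 0.
    """
    return [i for i, bit in enumerate(bits) if bit == "1"]


def encode_processor_map(allowed, total_pe):
    """Build the bit-vector string for a collection of allowed processors."""
    allowed = set(allowed)
    return "".join("1" if i in allowed else "0" for i in range(total_pe))


if __name__ == "__main__":
    print(decode_processor_map("00011100"))    # [3, 4, 5]
    print(encode_processor_map([3, 4, 5], 8))  # 00011100
```

Shrinking a job then amounts to sending it a map with fewer bits set, and expanding it to sending a map with more.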



System Overview

[Figure: system components: Faucets server; GUI client (or web browser); cluster daemon; cluster adaptive queuing system; processors (PE); cluster.]

GUI Client

[Figure: system components: Faucets server; GUI client (or web browser, or command-line client); cluster daemon; cluster adaptive queuing system; processors (PE); cluster.]

Secure Communication

  • SSL communication
  • Certificate for Faucets server
      • Public key distributed on web page, in code
  • One certificate for each cluster daemon (CD)
  • Future: Globus

GUI Client

  • One JAR file
  • Runs on Win32 platform
  • Faucets server certificate included in code
  • GUI client gets CD certificates from the central server (CS)

Perf Monitor

Adaptive Jobs

[Figure: system components: Faucets server; GUI client (or web browser); cluster daemon; cluster local scheduler; processors (PE); cluster.]

Adaptive Job Framework

  • Applications written in AMPI or Charm++
  • Scheduler controls the processor map for each job
  • Processor map is used by the job’s load balancer

[Figure: the scheduler passes the processor map (Proc. Map) to the adaptive application; the application stack is AMPI, Charm++ with its load balancer, and Converse.]

Charm++

  • Charm++: object-based virtualization
  • Program written as a large number of objects which can migrate
  • Number of objects typically much larger than the number of processors
  • Load balancer can remap objects
      • Measurement-based load balancing

Adaptive Charm++ Programs

  • A Charm++ program is adaptive automatically if an adaptive load-balancing strategy is used
  • Currently CommLB and RandcentLB are adaptive
  • Compile with +balancer CommLB

MPI Jobs

  • How do we make MPI jobs adaptive?
  • AMPI
      • AMPI maps the MPI processes to user-level threads which can migrate
      • Each thread is embedded in a Charm++ object, thus allowing load balancing and shrink-expand

Writing Adaptive AMPI Programs

  • Build AMPI with an adaptive load-balancing strategy
  • Call MPI_MIGRATE() at regular intervals in each MPI process; otherwise the job will not respond to the processor map
  • Use specific load balancers

Shrink Expand Overhead

  • Performance for MD program with 10 MB migrated data per processor on NCSA Platinum

  Processors   Shrink Time (s)   Expand Time (s)
  16 / 8       0.56              0.49
  32 / 16      0.59              0.46
  64 / 32      0.66              0.54
  128 / 64     0.61              0.50

Adaptive Queuing System

[Figure: system components: Faucets server; GUI client (or web browser); cluster daemon; cluster adaptive queuing system; processors (PE); cluster.]

AQS Features

  • Multithreaded
  • Reliable and robust
  • Tested on Linux clusters at UIUC
  • Supports most features of standard queuing systems
  • Has the ability to manage adaptive jobs, currently implemented in Charm++ and MPI
  • For more details check out http://charm.cs.uiuc.edu/research/faucets/faucets.html

Components

  • Database
  • Job scheduler
  • Compute cluster

Installing Database

  • Download the latest version of MySQL: http://www.mysql.com/
  • Install, then:

      mysql> create database <dbname>;
      mysql> use <dbname>;
      mysql> create table jobInfo (id mediumint primary key NOT NULL DEFAULT '0' auto_increment, …..)
      mysql> grant all on *.* to <user> identified by <passwd>;

Installing Scheduler

  • cd charm/net-linux/pgms/scheduler; make scheduler; make client;
  • Edit Makefile, put correct path to MySQL
  • Running scheduler as root

      su
      chown root scheduler; chmod +s scheduler

  • ./startScheduler

Installing Scheduler, contd.

  • Edit the startScheduler file:
      • Edit Database to match the <dbname> used earlier
      • Edit PORT to point to the port of the scheduler
      • Edit DATABASE_HOST, DATABASE_USER and DATABASE_PASSWD to point to the database host, user and password
      • NODELIST points to the nodelist for the scheduler

Configuring the Cluster

  • Users must have access to the cluster only through the queuing system
  • Each node runs an rsh daemon
  • Access to rsh through a restrictive group
      • Job switches to the rsh group before running the job
      • Only the head node can rsh to the other nodes
      • rsh disabled on the compute nodes
  • All connections through Unix sockets

Using the AQS Locally

  • frun runs a job interactively
  • fsub submits a batch job
  • fkill kills the job
  • fjobs lists the running and queued jobs

Scheduling Events

  • When:
      • Job arrival
      • Job completion
      • Job requests change of number of processors
      • Job suspension
  • Scheduling strategy
      • A pluggable component that makes decisions on which jobs to schedule

Scheduling Strategy Studied

  • Similar to equipartitioning [N Islam et al]
  • On job arrival and job completion
      • All running jobs and the new one are allocated their minimum number of processors
      • Leftover processors are shared equally subject to each job's maximum processor usage
      • If it is not possible to allocate the new job its minimum number of processors, it is queued
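The strategy described on this slide can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the AQS implementation: the `Job` class, the `equipartition` function, and the one-processor-at-a-time round-robin used to share leftovers "equally" are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    min_pe: int  # minimum processors the job needs
    max_pe: int  # maximum processors the job can use


def equipartition(running, new_job, total_pe):
    """Return (allocation, queued_job) for one scheduling event.

    allocation maps job name -> number of processors.
    queued_job is new_job if its minimum could not be met, else None.
    Pass new_job=None to model a job-completion event.
    """
    jobs = list(running)
    queued = None
    if new_job is not None:
        needed = sum(j.min_pe for j in jobs) + new_job.min_pe
        if needed <= total_pe:
            jobs.append(new_job)
        else:
            queued = new_job  # cannot satisfy its minimum: queue it

    # Step 1: every admitted job gets its minimum.
    alloc = {j.name: j.min_pe for j in jobs}
    leftover = total_pe - sum(alloc.values())

    # Step 2: share leftovers equally, one processor per job per round,
    # never exceeding a job's maximum (assumed tie-break: list order).
    while leftover > 0:
        growable = [j for j in jobs if alloc[j.name] < j.max_pe]
        if not growable:
            break
        for j in growable:
            if leftover == 0:
                break
            alloc[j.name] += 1
            leftover -= 1
    return alloc, queued


if __name__ == "__main__":
    a = Job("A", min_pe=1, max_pe=10)
    b = Job("B", min_pe=8, max_pe=16)
    print(equipartition([a], b, 16))     # B arrives: A shrinks, both run
    print(equipartition([a], None, 16))  # B finished: A expands to its maximum
```

With the two jobs from the earlier 16-processor example, this sketch gives A 5 processors and B 11 while both run, and lets A grow back to 10 once B finishes. The exact split in the real scheduler may differ, since the tie-breaking rule here is an assumption.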


Scheduler Performance

  • λ = arrival rate, MRT = mean response time
  • Utilization = processor utilization, load factor (lf) = execution time × λ
  • Simulation results on 64 processors with mean job execution time of 64.5 sec

                      Traditional Jobs             Adaptive Jobs
  1/λ (s)   lf      Utilization (%)   MRT (s)    Utilization (%)   MRT (s)
  60        1.08    76                488        92                164
  64.5      1.0     71                396        88                143
  100       0.65    46                233        60                96
  200       0.32    23                185        31                76
  500       0.13    9                 165        13                68
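The load factor definition (lf = execution time × λ) can be checked against the simulation's inter-arrival times with a few lines of Python. This is only a worked example of the formula; the function name is hypothetical.

```python
def load_factor(mean_exec_time_s, mean_interarrival_s):
    """lf = execution time * lambda, where lambda = 1 / mean inter-arrival time."""
    arrival_rate = 1.0 / mean_interarrival_s
    return mean_exec_time_s * arrival_rate


if __name__ == "__main__":
    # Mean execution time of 64.5 s and the 1/lambda values from the simulation.
    for gap in (60, 64.5, 100, 200, 500):
        print(f"1/lambda = {gap:>5} s  ->  lf = {load_factor(64.5, gap):.3f}")
```

For example, with 1/λ = 60 s the formula gives 64.5 / 60 = 1.075, which the slide reports as 1.08. A load factor above 1 means work arrives faster than the mean execution time.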


Experimental Results

  • Experiments on a Linux cluster on 64 processors, with mean job execution time of 60 sec

                      Traditional Jobs             Adaptive Jobs
  1/λ (s)   lf      Utilization (%)   MRT (s)    Utilization (%)   MRT (s)
  60        1.0     74                303        99                211
  100       0.6     49                116        68                76
  200       0.3     23                108        29                70
  500       0.12    9                 109        17                89

Adding a Cluster to Faucets

[Figure: system components: Faucets server; GUI client (or web browser); cluster daemon; cluster local scheduler; processors (PE); cluster.]

Adding a New Cluster

  • Prerequisites
      • Install Charm++
      • Install Adaptive Queuing System
  • Then
      • Download the Faucets software: http://charm.cs.uiuc.edu/
      • Compile the cluster daemon (CD)

          cd faucets/cd; make

      • Run the cluster daemon (CD)

          cd ..
          java cd.ClusterDaemon <central server> <central server port> -p <ClusterDaemon port> <working dir>

Installing a Faucets Server

[Figure: system components: Faucets server; GUI client (or web browser); cluster daemon; cluster local scheduler; processors (PE); cluster.]

Installing a Faucets Server

  • Install MySQL
      • Create tables
      • Grant permissions
  • Download JDBC driver: http://mmmysql.sourceforge.net/
  • Install CS
      • Download the Faucets code and unpack
      • cd faucets/cs; make
      • Edit faucets/cs/db.properties
      • cd faucets
      • java -cp .:/path/to/mm.mysql-2.0.8-bin.jar TheServer

Installing Appspector

  • Installation is a little involved
  • Each application needs a display module written in Java
  • Contact us if you want to install

Summary and Future Work

  • Showed you how to use and install the Charm++/AMPI adaptive job system
  • Download at http://charm.cs.uiuc.edu/research/faucets
  • Future
      • Extend the system to other parallel machines
      • Eliminate residual processes
      • Integrate the scheduler with Globus
      • More comprehensive QoS contracts being developed
      • Sophisticated bidding schemes for the Faucets framework