Resource Management and Balancing

ECI, July 2005

RMS – Overview

Resource management
Job management
Monitoring
Resource balancing
Information dissemination

Job Management

The need
  An operating system offers job and resource management services for a single computer
  Batch job control on multi-user mainframes was performed outside the operating system
Main advantages are:
  Structured resource utilization planning and control
  Abstraction, easy-to-{understand,use} for the user
  A vendor-independent user interface

Manager vs. Scheduler

Resource manager
  Locating and allocating resources
  Authentication
  Process creation and migration
Resource scheduler
  Queuing applications
  Driving the resource manager (enforcing policy)

Job Management - Requirements

A typical job management system offers
  Heterogeneous support
  Batch support
  Parallel support
  Interactive support
  Checkpointing and process migration
  Load balancing
  Job run-time limits
  GUI

RMS Architecture

Prerequisites
  Multi-user & multitasking capabilities
  A homogeneous OS is not a requirement
In practice
  “Similar” operating systems run on all machines
  UNIX (in all its variants) is the customary platform for running an RMS

Resource Description

Requirements
  Easy enough to generate simple descriptions
  Powerful enough to generate complex descriptions
  Portable representation
  Attributed components
RDL: a language to specify resources
  Administrator: describes what’s available
  User: describes what’s required
  Hierarchical

RDL Example

A 1024-node transputer with a UNIX front end

DECLARATION
BEGIN PROC Transputer
  DYNAMIC; EXCLUSIVE;
  DECLARATION
    BEGIN PROC Backend
      DECLARATION
        { PROC; CPU=T8; MEMORY=4; SPEED=30; REPEAT=1024; }
        { PORT; REPEAT=4; }
    END PROC
    BEGIN PROC Frontend
      DECLARATION
        { PROC; OS=Unix; Repeat = 4; }
    END PROC
  CONNECTION
    FOR I = 0 to 3 DO
      Backend LINK i Frontend LINK i
    OD
END PROC

RMS Components

User interface
  At minimum, a command-line user interface
  A GUI is becoming indispensable
  Typical commands
    Job submission, to register a job for execution
    Status display, to monitor progress or failure of a job
    Job deletion, to cancel jobs that are no longer needed

RMS Components (contd)

Administrative environment
  Specify node characteristics
  Define feasible job classes and map them to hosts
  Define user access permissions
  Specify resource limitations for users and jobs
  Specify policies for the assignment of jobs
  Control and ensure proper operation of the RMS
  Analyze accounting data to tune the system

RMS Entities

Queues
  Queues bound to hosts, jobs assigned to queues
Hosts
  Compute hosts, control hosts
Users
  Capabilities, permissions, priorities
Jobs
Resources
Policies

RMS Entities – Jobs

Job: a collection of computational tasks
  A single program, or several interacting programs
In the context of RMS
  Batch jobs: require no manual interaction once started
  Interactive jobs: require input during runtime
  Parallel jobs: subtasks spread across several hosts in a cluster
  Checkpointing jobs: periodically save their status to the file system and can be aborted at any time
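
A checkpointing job can be illustrated with a toy computation. This is a minimal sketch under assumptions, not how any particular RMS implements checkpointing: the function name, the JSON state file, and the `stop_after` parameter (which simulates an abort) are all hypothetical.

```python
import json
import os


def run_with_checkpoints(total_steps, path, every=10, stop_after=None):
    """Sum 0..total_steps-1, saving progress to `path` every `every` steps.

    If a checkpoint file exists, the job resumes from it. `stop_after`
    (hypothetical) aborts the run after that many steps to simulate a kill.
    """
    state = {"step": 0, "acc": 0}
    if os.path.exists(path):
        with open(path) as fh:
            state = json.load(fh)          # restart from the saved status

    executed = 0
    while state["step"] < total_steps:
        if stop_after is not None and executed >= stop_after:
            return None                    # simulated abort
        state["acc"] += state["step"]
        state["step"] += 1
        executed += 1
        if state["step"] % every == 0:
            tmp = path + ".tmp"
            with open(tmp, "w") as fh:
                json.dump(state, fh)
            os.replace(tmp, path)          # atomic swap: never a half-written file
    return state["acc"]
```

Work done after the last checkpoint is lost on abort and simply repeated on restart, which is why the checkpoint interval trades I/O cost against recomputation.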

RMS Entities – Jobs

Batch jobs
  Dispatch jobs according to policy and availability
  Suspend/Resume & checkpoint/restart
Interactive jobs
  Need to maintain a terminal connection
  “Watchdog” monitor withdraw from pool
Parallel jobs
  Need to integrate with parallel environment
  Scheduling policy is more complex

RMS Entities - Resources

Available memory, CPU time, network bandwidth, peripheral devices, and licenses
Jobs declare resource requirements
RMS enforces resource consumption
  ensures quality of service
  prevents over-subscription
  detects over-usage
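
The enforcement steps above can be sketched as two small checks. This is an illustration under assumptions: the resource names (`memory_mb`, `licenses`, `cpu_s`) and both function names are hypothetical and do not come from any specific RMS.

```python
def admit(job_request, capacity, allocated):
    """Return True if every requested resource still fits on the host.

    Rejecting a job that does not fit is what prevents over-subscription.
    """
    return all(
        allocated.get(res, 0) + amount <= capacity.get(res, 0)
        for res, amount in job_request.items()
    )


def over_usage(declared, measured):
    """Return the resources whose measured use exceeds the job's declaration."""
    return sorted(
        res for res, used in measured.items() if used > declared.get(res, 0)
    )


capacity = {"memory_mb": 2048, "licenses": 2}
allocated = {"memory_mb": 1024, "licenses": 2}

fits = admit({"memory_mb": 512}, capacity, allocated)
no_license = admit({"memory_mb": 512, "licenses": 1}, capacity, allocated)
violations = over_usage({"memory_mb": 512, "cpu_s": 60},
                        {"memory_mb": 700, "cpu_s": 30})
```

What the RMS does on a detected violation (suspend, kill, or just account for it) is a policy decision and is left out here.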

RMS Entities - Policies

Abstract mechanisms to automate control
  imbalanced load is common in clusters
  important/urgent work starved
  unauthorized users may take advantage
  users may exceed desired resource usage over time
Resource Utilization Policies
  Monitor resource consumption
  Dispatch of new jobs
Scheduling Policies
  Dispatch of new jobs
  Relocation of jobs

Resource Utilization Policies

Share based
  Resource “credit” is assigned to users, depts…
  Hierarchical share tree defines sharing
  Establish entitlements within time frame
  Fair distribution of resources
Functional
  Assignment by functional importance (priority)
  Past usage is not taken into account
Deadline
  Time-critical applications
Manual override
  Administrators like power…
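
For the share-based policy, a hierarchical share tree can be turned into per-user entitlements by normalizing sibling shares at each level. The sketch below is only an illustration: the tree layout, the department and user names, and the "most underserved first" rule are assumptions, not the algorithm of a specific product.

```python
def entitlements(tree, fraction=1.0):
    """Return {leaf_name: fraction of the cluster} for a nested share tree.

    `tree` maps a name to (shares, subtree_or_None). Sibling shares are
    normalized against each other, so only their ratios matter.
    """
    total = sum(shares for shares, _ in tree.values())
    result = {}
    for name, (shares, subtree) in tree.items():
        part = fraction * shares / total
        if subtree:
            result.update(entitlements(subtree, part))
        else:
            result[name] = part
    return result


def most_underserved(ent, usage):
    """Pick the leaf whose past usage lags furthest behind its entitlement."""
    total = sum(usage.values()) or 1.0
    return max(ent, key=lambda name: ent[name] - usage.get(name, 0.0) / total)


# Hypothetical tree: two departments, one of them split between two users.
share_tree = {
    "physics": (60, {"alice": (2, None), "bob": (1, None)}),
    "biology": (40, None),
}
ent = entitlements(share_tree)
next_up = most_underserved(ent, {"alice": 50, "bob": 10, "biology": 40})
```

A functional policy would skip `most_underserved` entirely and rank by fixed priority, since it ignores past usage.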

Scheduling Policies

Dispatch time – who, where
  First-Come-First-Served
  Select-Least-Loaded
  Select-Fixed-Sequence
  Combinations of the above
Relocation – who, when, where
  Dynamic resource balancing
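
One possible combination of the dispatch rules above is First-Come-First-Served for "who" and Select-Least-Loaded for "where". The sketch assumes a single load number per host and a fixed load cost per job; both are simplifications, and the names are hypothetical.

```python
from collections import deque


def dispatch(jobs, host_load, job_cost=1.0):
    """Place jobs in arrival order, each on the currently least-loaded host.

    Returns a list of (job, host) pairs. `host_load` is not modified.
    """
    queue = deque(jobs)
    load = dict(host_load)
    placement = []
    while queue:
        job = queue.popleft()               # who: the oldest waiting job
        host = min(load, key=load.get)      # where: the least-loaded host
        load[host] += job_cost              # assumed cost of one job
        placement.append((job, host))
    return placement


plan = dispatch(["j1", "j2", "j3"], {"a": 0.0, "b": 1.5})
```

Select-Fixed-Sequence would replace the `min` call with a walk over a preconfigured host order.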

Scheduling of Parallel Processes

Gang scheduling
  Requires tight-coupling (MPP’s)
Co-scheduling
  Demand-based
    False priority
    Concurrent applications
  Implicit
    Busy wait to not relinquish CPU

RMS Challenges

Open Interfaces
  Export load balancing/distributed capabilities
  Export status info (load, job status, queues)
  Control/assistance from application
  Integration with other environments (MPI)
  Extend functionality for special cases
  API must be: simple, usable, abstract, robust

RMS Example: CODINE

CODINE/GRD
  cod_qmaster: master daemon
  cod_schedd: scheduler daemon
  cod_execd: execution daemon
Continuously match utilization with policies
  GRD monitors and adjusts resource usage correlated to all processes of a job
  Feedback to adjust shares towards changing requirements

Static Scheduling Scheme

Dynamic Scheduling Scheme

RMS Example: PBS

Portable Batch System
  Scheduler – job to node mapping, queues
  Server – communications, logs
  Control daemon (per node) – executive agent
Scope – single node
Job arrays
Task Management interface

RMS Example: Condor

Condor: a distributed job scheduler
  Harvest idle workstations
  Job scheduling and migration
  Advertising mechanism
    Both job and W/S advertise presence
    Jobs advertise requirements (job description file)
    W/S advertise their capabilities
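
The advertising mechanism can be pictured as matching job advertisements against workstation advertisements. The sketch below is plain Python and does not use Condor's ClassAd language or any Condor API; the attribute names are borrowed from the job description example in this deck, while the machine names and values are made up for illustration.

```python
def match(job, machines):
    """Return the name of the best-ranked machine that satisfies the job.

    `job["requirements"]` is a predicate over a machine advertisement and
    `job["rank"]` scores acceptable machines (higher is preferred).
    Returns None when no machine qualifies.
    """
    candidates = [m for m in machines if job["requirements"](m)]
    if not candidates:
        return None
    return max(candidates, key=job["rank"])["Name"]


# Hypothetical workstation advertisements.
machines = [
    {"Name": "ws1", "Arch": "INTEL", "OpSys": "LINUX", "Memory": 512, "KFlops": 90000},
    {"Name": "ws2", "Arch": "INTEL", "OpSys": "LINUX", "Memory": 1024, "KFlops": 50000},
    {"Name": "ws3", "Arch": "SPARC", "OpSys": "SOLARIS", "Memory": 4096, "KFlops": 80000},
]

# Hypothetical job advertisement.
job = {
    "requirements": lambda m: m["Arch"] == "INTEL" and m["OpSys"] == "LINUX",
    "rank": lambda m: m["Memory"] * 10000 + m["KFlops"],
}

chosen = match(job, machines)
```

This sketch is one-sided: a fuller model would also let each workstation state requirements on the jobs it accepts, for example that its owner is idle.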

Condor: Example JDF

universe     = vanilla                            # select runtime environment
executable   = some_job
requirements = (Arch=="INTEL" && OpSys=="LINUX")
rank         = (Memory * 10000) + KFlops          # target
arguments    = -verbose
input        = in.dat                             # redirect to stdin
output       = out.dat                            # redirect to stdout
log          = log.txt
Queue                                             # add job to queue

RMS: Condor (contd)

Universe
  Vanilla: sequential apps (shared FS)
  MPI, PVM: integrated with parallel environment
  Globus: grid computing environment
  Standard: enables process migration
Process migration
  Reschedule higher priority job
  User reclaims her W/S
  Must be linked with a special library

RMS: Condor (contd)

Access to data
  Shared file system
  Condor file transfer mechanism
    Automatically prefetch, postfetch
  Remote I/O calls (in standard universe)
Architecture
  Central manager
  Server on each node

Known Condor Pools

Monitoring

Design choices
  Centralized vs. decentralized
  Periodic vs. request driven
  Flat vs. hierarchical
  Resolution of information
  Focused view

Monitoring Example: Parmon

Features:
  Online creation of Node and Group database
  Component, Node, Group, or entire Cluster level
  Monitoring of CPU, memory, disk and network, processes, log files, etc.
  Facility to define events & automatic notification
  Misc: message broadcast, remote admin, GUI

Load Balancing

Application vs. system
Static vs. dynamic vs. adaptive
Centralized vs. decentralized
Receiver initiated vs. sender initiated
Parallel applications
On-line nature

LB: Application Level

Application level
  Round robin
  Randomized
  Recursive bisection
  Other optimization
Hard to estimate execution times
  Indeterminate no. of steps
  Unpredictable load
  Communication delays

LB: System Level

System level
  Round robin
  Randomized
Estimate run time
  Specified by job description
  Estimate from past experience
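
The two sources of a run-time estimate can be combined in a few lines. The slide does not say how past experience is turned into an estimate; the exponentially weighted average used here, the `alpha` parameter, and the function name are all assumptions chosen for illustration.

```python
def estimate_runtime(history, declared=None, alpha=0.5):
    """Estimate a job's run time in seconds.

    A run time declared in the job description takes precedence. Otherwise
    past run times (oldest first) are blended with an exponentially
    weighted average so recent runs count more. Returns None if neither
    source is available.
    """
    if declared is not None:
        return float(declared)
    if not history:
        return None
    estimate = float(history[0])
    for observed in history[1:]:
        estimate = alpha * observed + (1 - alpha) * estimate
    return estimate


from_history = estimate_runtime([100, 200, 100])
from_description = estimate_runtime([100, 200, 100], declared=60)
```

A larger `alpha` tracks recent behavior more closely at the cost of a noisier estimate.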

LB Example: MOSIX

Decentralized
Symmetric
Deterministic
Responsive
Stable
Competitive
Resources: CPU, memory, I/O

Load Balancing Over Network

Distribute workload or network traffic load across the cluster
  Nodes may be interconnected among themselves
  Must be connected to the balancing device
Processing nodes provide status information
  current processor load
  the application system load
  number of active users
  the availability of network protocol buffers
  other specific resources

Load Balancing Over Network

Balancing device
  monitors the status of all nodes
  dictates where to direct the next job
  a single unit or a group in a tree hierarchy
  uses one or more algorithms or methods
  static or dynamic setting
Decides which node gets the next incoming connection request

Factors in Network Balancing

Wire-speed processing
Node operating system limitation
  Packet processing, no. of connections, interrupts
Balancing device limitations
  Tables, memory
Session based traffic, non-session UDP
Application dependencies (affinity)

Simple Balancing Methods

Weighting
  Assign weights to nodes of different capacities
Randomization
  Works well when all nodes are identical
Round-Robin
  Commonly used by itself in DNS (address caching)
  Effective where all the nodes in the cluster are identical in capacity and performance
Hashing
  Packets from the same source address always get assigned to the same server
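
The methods above are small enough to sketch directly. This is an illustration only: the node names are hypothetical, weighting is shown as weighted random selection (one of several ways to apply weights), and CRC32 stands in for whatever hash a real balancing device uses.

```python
import itertools
import random
import zlib

NODES = ["node-a", "node-b", "node-c"]   # hypothetical cluster


def weighted_choice(weights, rng=random):
    """Weighting: pick a node with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def round_robin(nodes):
    """Round-robin: an endless iterator that cycles through the nodes in order."""
    return itertools.cycle(nodes)


def hash_choice(source_addr, nodes):
    """Hashing: the same source address always maps to the same node."""
    return nodes[zlib.crc32(source_addr.encode()) % len(nodes)]


rr = round_robin(NODES)
first_four = [next(rr) for _ in range(4)]
sticky = hash_choice("10.0.0.7", NODES)
```

Plain randomization is the special case of `weighted_choice` with equal weights. Note that the simple modulo hash remaps most sources when the node list changes size.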

Simple Balancing Methods

Least Connections
  Assigns to the node that currently has the fewest connections ( ≠ least load )
Minimum Misses
  Assigns to the node that has processed the fewest incoming requests in its history
Fastest Response
  Assigns to the node with the fastest response
  Requires active monitoring of the individual nodes
    Sending ICMP packets with the ‘ping’ command
    Proprietary mechanism based upon UDP packets
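
Given per-node measurements, the selections above reduce to taking a minimum. The sketch assumes the balancing device already holds the counters and probe results; how they are collected (ping, proprietary probes) is outside its scope, and the function names are hypothetical.

```python
def least_connections(active):
    """Pick the node with the fewest currently open connections."""
    return min(active, key=active.get)


def minimum_misses(handled_total):
    """Pick the node that has processed the fewest requests in its history."""
    return min(handled_total, key=handled_total.get)


def fastest_response(rtt_ms):
    """Pick the node with the lowest probe round-trip time.

    A value of None marks a node that did not answer the last probe;
    such nodes are skipped. Returns None if no node answered.
    """
    reachable = {node: rtt for node, rtt in rtt_ms.items() if rtt is not None}
    if not reachable:
        return None
    return min(reachable, key=reachable.get)


by_conn = least_connections({"a": 12, "b": 3, "c": 7})
by_history = minimum_misses({"a": 900, "b": 1500, "c": 1200})
by_rtt = fastest_response({"a": 4.2, "b": None, "c": 1.9})
```

The example shows why fewest connections is not the same as least load: node "b" wins on connection count even though it has handled the most requests overall.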

Advanced Balancing Methods

Primary optimization vectors
  Node traffic – predict volume of traffic
  Network traffic – monitor node state
  Node-load based balancing – (which load?)
  DNS load balancing – simple
  Topology-based – reduce latency
  Application-specific performance
  Policy based optimization
    Application, bandwidth, admin, security

Common Errors

There are four common errors
  Overflow
  Underflow
  Routing errors
  Induced network errors
May destabilize efficient network clustering

Information Dissemination

Central vs. decentralized
Load incurred on system
  Processing load
  Network load
Partial knowledge – gossip algorithms
  Example: finding average load
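
The average-load example can be sketched with pairwise averaging, a common gossip scheme: two random nodes exchange their current values and both keep the mean. The slide does not name a specific algorithm, so this choice, the round count, and the fixed seed are assumptions.

```python
import random


def gossip_average(loads, rounds=200, seed=0):
    """Simulate pairwise-averaging gossip over a list of node loads.

    Each round, two distinct random nodes replace their values with the
    mean of the two. The sum over all nodes is preserved, so every value
    drifts toward the global average without any node seeing all loads.
    """
    rng = random.Random(seed)
    values = [float(v) for v in loads]
    n = len(values)
    for _ in range(rounds):
        i, j = rng.sample(range(n), 2)
        mean = (values[i] + values[j]) / 2.0
        values[i] = values[j] = mean
    return values


estimates = gossip_average([0.0, 4.0, 8.0, 12.0])
```

Each node ends up with a local estimate close to the true average of 6.0, at the cost of one small message exchange per round instead of a central collection step.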
