SProj 3
Libra: An Economy-Driven Cluster Scheduler
Jahanzeb Sherwani, Nosheen Ali, Nausheen Lotia, Zahra Hayat
Project Advisor/Client: Rajkumar Buyya
Faculty Advisor: Dr. Arif Zaman
Problem Statement
Implementing a computational economy-based, user-centric scheduler for clusters
What is a cluster?
  A collection of workstations interconnected via a network technology, in order to take advantage of combined computational power and resources
  An integrated collection of resources that can provide a single system image spanning all its nodes: a virtual supercomputer
  Used for computation-intensive applications such as AI expert systems, nuclear simulations, and scientific calculations
Why clusters?
  Cost-effectiveness: low cost-performance ratio compared to a specialized supercomputer
  Increase in workstation performance
  Increase in network bandwidth
  Decrease in network latency
  Scalability higher than that of a specialized supercomputer
  Easier to integrate into an existing network than specialized supercomputers
Computational Economy
Traditional system-centric performance metrics
  CPU Throughput
  Mean Response Time
  Shortest Job First
Computational economy is the inclusion of user-specified quality of service parameters with jobs so that resource management is user-centric rather than system-centric
Computational Economy (cont'd)
Project focus: to implement a scheduler that aims to maximize user utility
Job parameters most relevant to user-centric scheduling
  Budget allocated to job by user
  Deadline specified by user
Computational Economy for Grids
What is a grid?
  An infrastructure that couples resources such as computers (workstations or clusters), software (for special-purpose applications) and devices (printers, scanners) across the Internet and presents them as a unified, integrated single resource that can be widely used
How a grid differs from a cluster
  Wide geographical area
  Non-dedicated resources
  No centralized resource management
Computational Economy for Grids (cont'd)
Managing resources and scheduling computations in a grid environment is complex because the resources
  are geographically distributed
  are heterogeneous in nature
  are owned by different individuals or organizations
  have different access and cost models
Resource discovery is required, and security issues must be handled
Computational economy has been implemented for grids: the Nimrod/G resource broker is a global resource management and scheduling system that supports deadline and economy-based computations in grid-computing environments
Computational Economy for Clusters
Market-based Proportional Resource Sharing for Clusters: Brent Chun and David E. Culler, University of California at Berkeley, Computer Science Division
  A market-based approach based on the notion of a computational economy which optimizes for user value. It describes an architecture for market-based cluster resource management based on the idea of proportional resource sharing of basic computing resources. Cluster nodes act as independent sellers of computing resources while user applications act as buyers who purchase resources. Users are allocated credits/tickets: the more tickets they have, the greater their CPU share. Ticket allocation is on the basis of the amount the user is willing to pay: his valuation of the job
  Deadline not incorporated
Cluster Architecture
Cluster Management Software
Cluster Management Software is designed to administer and manage application jobs submitted to workstation clusters.
Creates a Single System Image
  When a collection of interconnected computers appears to be a unified resource, we say it possesses a Single System Image
  The benefit of a Single System Image is that the exact location of the execution of a process is entirely concealed from the user. The user is offered the illusion of a single powerful computer
Maintains centralized information about cluster status and resources
Cluster Management Software
Commercial and Open-source Cluster Management Software
Open-source Cluster Management Software
  DQS (Distributed Queuing System)
  CONDOR
  GNQS (Generalized Network Queuing System)
  MOSIX
  REXEC (Remote Execution)
  SGE (Sun Grid Engine)
  PBS (Portable Batch System)
Cluster Management Software
Why SGE was rejected
  lack of online support
  lack of stability
Final choice of CMS: PBS (Portable Batch System)
Pricing the Cluster Resources
Cost = a (Job Execution Time) + b (Job Execution Time / Deadline)
Cost of using the cluster depends on job length and job deadline: the longer the user is prepared to wait for the results, the lower his cost
The cost formula forces the user to reveal his true deadline
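The pricing formula can be tried out with a few lines of Python. This is only a sketch: the slides do not give values for the coefficients a and b, so the numbers used here are hypothetical, as is the function name job_cost. It shows that, for the same execution time, a more relaxed deadline lowers the cost.

```python
def job_cost(execution_time: float, deadline: float,
             a: float = 1.0, b: float = 1.0) -> float:
    """Cost = a * E + b * (E / D), following the formula on the slide.

    a and b are pricing coefficients; their values are NOT given in the
    slides, so the defaults here are placeholders.
    """
    if execution_time <= 0 or deadline <= 0:
        raise ValueError("execution_time and deadline must be positive")
    return a * execution_time + b * execution_time / deadline


# Same 100-second job, two different deadlines (hypothetical a=1, b=100).
tight = job_cost(100.0, 100.0, a=1.0, b=100.0)    # deadline equals run time
relaxed = job_cost(100.0, 400.0, a=1.0, b=100.0)  # user is willing to wait
print(tight, relaxed)
```

With these assumed coefficients the relaxed job is cheaper, which is the incentive the slide describes: declaring a tighter deadline than needed only raises the price.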
Scheduling Algorithm
How to meet budget and deadline constraints?
Ensuring low run-time for the algorithm
  Greedy Algorithm
  Complex solutions infeasible
Test run of algorithm:
  5 jobs, arriving at time t=0, 5, 7, 9, 9, on a 3-node cluster
LIBRA with PBS
Portable Batch System (PBS) as the Cluster Management Software (CMS)
  Robust, portable, effective, extensible batch job queuing and resource management system
  Supports different schedulers
  Job accounting
  Technical Support
Setting up the PBS Cluster
  Installation of Linux with Windows
  Installation of SGE as well as PBS
  Setting up a Network File System
  Configuring GridSim in Java
  Configuring PBSWeb
  Setting up the Apache WebServer
  PHP scripting for Apache
  Setting up PostgreSQL
  Setting up SSH
PBS Overview
Main components of PBS
  Job Server: pbs_server
  Job Scheduler: pbs_sched
  Job Executor & Resource Monitor: pbs_mom
The server accepts commands and communicates with the daemons
  qsub - submit a job
  qstat - view queue and job status
  qalter - change job's attributes
  qdel - delete a job
  xpbs - GUI for PBS
Job Scheduling in PBS
The Libra Scheduler
Default FIFO Scheduler in PBS
  FIFO - sort jobs by job queuing time, running the earliest job first
  Fair share - sort & schedule jobs based on past usage of the machine by the job owners
  Round-robin - pick a job from each queue
  By key - sort jobs by a set of keys: shortest_job_first, smallest_memory_first
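The difference between FIFO ordering and "by key" ordering can be illustrated with a small sort. This is a sketch of the idea only, not PBS code: the Job fields (queued_at, walltime, memory_mb) are hypothetical names chosen for the example and are not PBS job attributes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Job:
    name: str
    queued_at: float   # time the job entered the queue (hypothetical field)
    walltime: float    # requested run time in seconds (hypothetical field)
    memory_mb: int     # requested memory (hypothetical field)


def fifo_order(jobs: List[Job]) -> List[Job]:
    """FIFO: the job that was queued earliest runs first."""
    return sorted(jobs, key=lambda j: j.queued_at)


def by_key_order(jobs: List[Job]) -> List[Job]:
    """By key: shortest job first, ties broken by smallest memory first."""
    return sorted(jobs, key=lambda j: (j.walltime, j.memory_mb))


queue = [
    Job("A", queued_at=0, walltime=300, memory_mb=512),
    Job("B", queued_at=5, walltime=100, memory_mb=2048),
    Job("C", queued_at=7, walltime=100, memory_mb=256),
]
print([j.name for j in fifo_order(queue)])
print([j.name for j in by_key_order(queue)])
```

Neither ordering looks at a deadline or a budget, which is the gap the Libra scheduler is meant to fill.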
The Libra Scheduler
Job Input Controller
  Adding parameters at job submission time
    deadline
    budget
    executionTime
  Defining new attributes of job
Job Acceptance and Assignment Controller
  Budget checked through cost function
  Admission control through deadline scheduling
  Execution host with the minimum load and ability to finish job on time selected
  Equal Share instead of Minimum Share
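The acceptance and assignment steps listed above might look roughly like the following. This is a minimal sketch under stated assumptions, not the project's PBS code: it assumes a job needs a CPU share of executionTime / deadline to finish on time, that a node can take a job only while its total allocated share stays at or below 1, and that cost uses the pricing formula with hypothetical coefficients a and b. It uses the minimum-share feasibility test for simplicity, not the equal-share variant the slide mentions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Job:
    execution_time: float  # estimated run time on a dedicated node (s)
    deadline: float        # time the user is prepared to wait (s)
    budget: float          # maximum the user will pay


@dataclass
class Node:
    name: str
    shares: List[float] = field(default_factory=list)  # shares already allocated

    @property
    def load(self) -> float:
        return sum(self.shares)


def cost(job: Job, a: float = 1.0, b: float = 1.0) -> float:
    # a and b are hypothetical coefficients; the slides give no values.
    return a * job.execution_time + b * job.execution_time / job.deadline


def admit(job: Job, nodes: List[Node], a: float = 1.0, b: float = 1.0) -> Optional[Node]:
    """Return the node the job is assigned to, or None if it is rejected."""
    # Step 1: budget check through the cost function.
    if cost(job, a, b) > job.budget:
        return None
    # Step 2: admission control. Assumed rule: the job needs this CPU share
    # to finish by its deadline.
    share = job.execution_time / job.deadline
    feasible = [n for n in nodes if n.load + share <= 1.0]
    if not feasible:
        return None
    # Step 3: among nodes that can finish the job on time, pick the least loaded.
    best = min(feasible, key=lambda n: n.load)
    best.shares.append(share)
    return best


cluster = [Node("n1"), Node("n2")]
first = admit(Job(50, 100, budget=1000), cluster)       # needs share 0.5
second = admit(Job(50, 100, budget=1000), cluster)      # goes to the idle node
too_tight = admit(Job(80, 100, budget=1000), cluster)   # share 0.8 fits nowhere
too_poor = admit(Job(10, 100, budget=5), cluster)       # cost exceeds budget
```

The greedy choice keeps the scheduler's own run time low: each decision is one pass over the nodes, with no re-planning of jobs already accepted.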
The Libra Scheduler
Job Execution Controller
  Job run on the best node according to algorithm
  Cluster and node status updated
    runTime
    cpuLoad
Job Querying Controller
  Server, Scheduler, Exec Host, and Accounting Logs
PBS-Libra Web: Front-end for the Libra Engine
[Eight further slides titled "PBS-Libra Web" with no extracted text]
Simulations
Goal: Measure the performance of Libra Scheduler
Performance = ?
  Maximize user satisfaction
Simulations
Simulation Software
  Alter GridSim (grid resource management simulation)
GridSim Class Diagram
Simulations
Methodology
Workload
  120 jobs with deadlines and budgets
  Job lengths: 1000 to 10000
Resources
  10-node, single-processor (MIPS rating: 100) homogeneous cluster
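To get a feel for the workload above, the job lengths can be converted into run times on one node. The slides do not state the unit of job length; this sketch assumes millions of instructions (MI), which is the convention GridSim uses, so the results should be read as an estimate under that assumption.

```python
MIPS_RATING = 100  # per-node rating from the methodology slide


def dedicated_runtime(length_mi: float, mips: float = MIPS_RATING) -> float:
    """Seconds a job needs on a node it does not share with other jobs.

    Assumes job length is expressed in millions of instructions (MI).
    """
    return length_mi / mips


shortest = dedicated_runtime(1000)    # smallest job in the workload
longest = dedicated_runtime(10000)    # largest job in the workload
print(shortest, longest)
```

Under this assumption a job runs for between 10 and 100 seconds when it has a node to itself, and proportionally longer when it only receives a share of the CPU.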
Simulations
Assumptions
  Strict deadlines
  Ignores processing overhead due to scheduler and clock interrupt
Scheduler simulated as a function
  Input: job size, deadline, budget
  Output: accept/reject, node #, share allocated
Simulations
Compared:
  Proportional Share
  FIFO
Experiments:
  120 jobs, 10 nodes
  Increasing workload to 150 and 200
  Increasing cluster size to 20
Simulation Results
120 jobs, of which 20 did not meet budget
100 Jobs, 10 Nodes. FIFO: 23 rejected; Proportional Share: 14 rejected
[Two plots, not reproduced. x-axis: experiments (1 to 100); y-axis: time (0 to 1200)]
Simulation Results
Increase workload to 200 jobs on the same 10-node cluster
200 Jobs, 10 Nodes. FIFO: 105 rejected; Proportional Share: 93 rejected
[Two plots, not reproduced. x-axis: experiments (1 to 199); y-axis: time (0 to 1200)]
Simulation Results
Scale the cluster up to 20 nodes
200 Jobs, 20 Nodes. FIFO: 35 rejected; Proportional Share: 23 rejected
[Two plots, not reproduced. x-axis: experiments (1 to 199); y-axis: time (0 to 1200)]
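The rejection counts reported on the three results slides can be put on a common scale by converting them to percentages. The snippet below only restates the figures from the slides; the function name and the dictionary layout are illustrative.

```python
# (total jobs, rejected under FIFO, rejected under Proportional Share),
# copied from the three results slides.
results = {
    "100 jobs, 10 nodes": (100, 23, 14),
    "200 jobs, 10 nodes": (200, 105, 93),
    "200 jobs, 20 nodes": (200, 35, 23),
}


def rejection_rate(rejected: int, total: int) -> float:
    """Percentage of submitted jobs that were rejected."""
    return 100 * rejected / total


for label, (total, fifo, prop_share) in results.items():
    print(f"{label}: FIFO {rejection_rate(fifo, total):.1f}%, "
          f"Proportional Share {rejection_rate(prop_share, total):.1f}%")
```

In all three configurations Proportional Share rejects fewer jobs than FIFO, by 9 percentage points in the first case and 6 in the other two.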
Simulation Results
[Plot, not reproduced: Load on Node 4. x-axis: time in s (11.09 to 97.5); y-axis: CPU Utilization (0 to 120)]
Simulation Results
[Plot, not reproduced: Load on Node 7. x-axis: time in s (10.08 to 170.8); y-axis: CPU Utilization (0 to 120)]
Simulation Results
[Plot, not reproduced: Load on Node 9. x-axis: time in s (11.1 to 82.3); y-axis: CPU Utilization (0 to 150)]
Simulation Results
[Plot, not reproduced: Load on Node 6. x-axis: time in s (6.06 to 154.75); y-axis: CPU Utilization (0 to 120)]
Conclusion & Future Work
Successfully implemented a Linux-based cluster that schedules jobs using PBS with our economy-driven Libra scheduler, and PBS-Libra Web as the front end.
Successfully tested our scheduling policy
Proportional Share delivers more value to users
Exploring other pricing mechanisms
Expanding the cluster with more nodes and with support for parallel jobs