56
CUG 2006 This Presentation May Contain Preliminary Information That Is Subject To Change ALPS Application Level Placement Scheduler Michael Karo [email protected]

ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager

CUG 2006

This Presentation May Contain Preliminary Information That Is Subject To Change

ALPSApplication Level Placement Scheduler

Michael [email protected]

Page 2: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 2

ALPS Design Goals (1)• Scalability• Thousands of OS instances and applications

• Efficiency• Maximize resource utilization• Minimize overhead

• Predictibility• Consistent performance of applications• Guaranteed resource availability

• Adaptability• Mask architecture specific details• Exploit architecture specific capabilities

Page 3: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 3

ALPS Design Goals (2)• Extensibility• Adaptable to future architectures• Simplified integration with workload management

systems• Maintainability• Reduce complexity• Separate policy and mechanism

• Availability• Recover quickly with minimal impact• Minimize single points of failure

Page 4: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 4

ALPS Operating Environment• Hardware• Multiple node types• Multiple processor types• Processor and memory variations• Distributed shared memory

• Software• Multiple parallel programming paradigms• Multiple OS instances• Supported on Compute Node Linux only• Multiple workload managers• Administration and configuration tools• Resource and event monitoring

Page 5: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 5

ALPS Core Services• Launch and cleanup applications• Binary executable distribution• Monitor and report application status• Application ID assignment• Resource reservation management• Signal propagation• Standard input, output, and error management• Resource availability monitoring• Provide external access to application processes

for debugging and performance analysis

Page 6: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 6

ALPS Features: Gang Scheduling• ALPS manages context switching• Consistent across entire application• Configurable interval

• Allows short and long running jobs to coexist• Supports configurable CPU oversubscription factor• No support for memory oversubscription

Page 7: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 7

ALPS Features: Reservations• Maintain resource availability for batch jobs• Support interactive users• Reservation states:• FILED - Request registered• CONFIRMED - Resources locked• CLAIMED - Resources in use

Page 8: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 8

ALPS Features: BASIL• Batch & Application Scheduler Interface Layer• Extensible XML-RPC implementation• Open interface specification• No proprietary APIs or libraries• Third party vendors manage integration• Three primary functions:• Inventory• Reservation creation• Reservation cancellation

• BASIL programmer’s guide

Page 9: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 9

ALPS Features: Fanout Tree

1082401699054681341315338254369585851541057273732173

331795321111113216842

Tree Radix

Tree

Dep

th

• Provides scalability• Supports parallel operation• Simulated broadcast on unicast network• Configurable radix:

Page 10: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 10

ALPS Components• Clients• aprun – Application submission• apstat – Application status• apkill – Signal delivery• apbasil – Workload manager interface

• Servers• apsys – Client interaction on login nodes• apinit – Process management on compute nodes• apsched – Reservations and placement• apbridge – System data collection• apwatch – Event monitoring

Page 11: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 11

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

Page 12: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 12

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

Page 13: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 13

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

Page 14: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 14

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

Page 15: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 15

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

Page 16: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 16

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

Page 17: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 17

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

Page 18: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 18

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

Page 19: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 19

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

Page 20: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 20

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

Page 21: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 21

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

Page 22: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 22

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

Page 23: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 23

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

Page 24: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 24

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

Page 25: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 25

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

Page 26: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 26

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

Page 27: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 27

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

Page 28: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 28

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

Page 29: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 29

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

Page 30: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 30

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

Page 31: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 31

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

Page 32: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 32

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

Page 33: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 33

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

Page 34: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 34

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

Page 35: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 35

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

Page 36: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 36

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

Page 37: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 37

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

Page 38: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 38

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

Page 39: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 39

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

Page 40: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 40

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

Page 41: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 41

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

Page 42: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 42

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

Page 43: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 43

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

Page 44: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 44

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

Page 45: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 45

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

Page 46: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 46

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

Page 47: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 47

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

Page 48: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 48

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

Page 49: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 49

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

Page 50: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 50

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

Page 51: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 51

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

Page 52: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 52

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

Page 53: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 53

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

Page 54: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 54

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

Page 55: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 55

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

Page 56: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 56

Q & A• Questions?• Thanks to the ALPS team:• Richard Lagerstrom - development• Marlys Kohnke - development• Carl Albing – development• Bob Gross - testing• Jan Gustafson - our current manager• Wayne Margotto - our former manager

• Thank You!