87
1 © 2016 Adaptive Computing Enterprises, Inc. Broadening Moab/TORQUE for Expanding User Needs Gary D. Brown HPC Product Manager CUG 2016

Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

1 © 2016 Adaptive Computing Enterprises, Inc.

Broadening Moab/TORQUE for Expanding User Needs

Gary D. Brown

HPC Product Manager

CUG 2016

Page 2: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

2 © 2016 Adaptive Computing Enterprises, Inc.

Agenda

▪ DataWarp

▪ Intel MIC “KNL”

▪ Viewpoint Web Portal

▪ User Portal

▪ Administrator Portal

▪ Remote Visualization

▪ Nitro HTC

Page 3: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

3 © 2016 Adaptive Computing Enterprises, Inc.

Agenda

▪ DataWarp (fast I/O)

▪ Intel MIC “KNL”

▪ Viewpoint Web Portal

▪ User Portal

▪ Administrator Portal

▪ Remote Visualization

▪ Nitro HTC

Page 4: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

4 © 2016 Adaptive Computing Enterprises, Inc.

Use Cases

▪ Stage job data with fast SSD-based storage

▪ Input data before job starts

▪ Output data after job ends

▪ Fast persistent storage for job workflows

▪ Fast job checkpoint/restore

Page 5: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

5 © 2016 Adaptive Computing Enterprises, Inc.

Cray DataWarp

▪ SSD-based storage system independent of parallel file system (PFS)

▪ Hosted on DataWarp (DW) service nodes ▪ Contain actual SSDs

▪ Nodes connect to Aries high-speed network

▪ DataWarp Service (DWS) ▪ Organizes DW storage for jobs and persistent storage

▪ Manages SSD storage across DW service nodes

Page 6: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

6 © 2016 Adaptive Computing Enterprises, Inc.

Cray DataWarp Physical Architecture

▪ DW storage

▪ 2-hop roundtrip ▪ Aries HSN

▪ Electronic access times

▪ PFS storage

▪ 4-hop roundtrip ▪ 2-hop Aries HSN

▪ 2-hop InfiniBand

▪ Electro-mechanical access times

▪ DWS not shown

Page 7: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

7 © 2016 Adaptive Computing Enterprises, Inc.

Cray DataWarp Support

▪ DataWarp Storage Types

▪ Job-based

▪ Persistent

▪ Script-based integration with Moab/TORQUE

▪ DW-enabled jobs processed by Moab “job submit filter”

▪ Moab automatically creates DW job workflow

Page 8: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

8 © 2016 Adaptive Computing Enterprises, Inc.

DataWarp Integration Project

▪ All DW storage scheduled by Moab

▪ Job DW storage by job scheduling

▪ Persistent DW storage by reservation scheduling

▪ Out-of-the-box functionality

▪ Some site-specific information configuration required

Page 9: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

9 © 2016 Adaptive Computing Enterprises, Inc.

Moab 3-Job DataWarp Model

▪ Follows Moab Data-staging Model

▪ Uses Moab “system” jobs to create/destroy DW storage and stage data in/out

▪ Automatically creates 3-job workflow

▪ DW creation/input data-staging system job

▪ User job

▪ Output data-staging/DW destruction system job

Page 10: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

10 © 2016 Adaptive Computing Enterprises, Inc.

DataWarp-enabled Job Workflow

▪ User submits user job

▪ Job Submit Filter detects #DW directives

▪ Job Submit Filter creates 3-job workflow

1. DW Creation / Input Data-staging System job

2. User Job

3. Output Data-staging / DW Destruction System Job

Page 11: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

11 © 2016 Adaptive Computing Enterprises, Inc.

Automatic Job/DW Workflow Generation

▪ What if user submits multiple-job workflow

▪ And some user jobs in job workflow use DataWarp storage?

▪ Job Submit Filter creates proper DW job dependencies, but not for entire user job workflow

▪ Example

▪ Data validation User Job A

▪ Compute (uses DataWarp) User Job B (DW)

▪ Visualization User Job C

Page 12: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

12 © 2016 Adaptive Computing Enterprises, Inc.

Automatic Job/DW Workflow Generation

▪ User Job Dependencies ▪ Data validation User Job A

▪ Compute User Job B (DataWarp-enabled job)

▪ Visualization User Job C

▪ Job A (ok) Job B (ok) Job C

▪ Job workflow generation script must work with Job Submit Filter to create proper job dependencies (solid lines below)

Page 13: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

13 © 2016 Adaptive Computing Enterprises, Inc.

Multi-Job Workflow Generation Script

# Submit data validation job A

jobAid=`msub -l nodes=1,walltime=10:00 myCheck.sh`

# Submit DataWarp-enabled compute job B

# (3 jobids returned (dwin, user, dwout); put into array)

jobBids=(`msub -l nodes=16,walltime=01:30:00,\

depend=afterok:${jobAid} ...\

mySim.sh`)

# Submit visualization job C

jobCid=`msub -l nodes=1,walltime=30:00,\

depend=afterok:${jobBids[1]}:${jobBids[2]} \

myViz.sh`

Page 14: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

14 © 2016 Adaptive Computing Enterprises, Inc.

Persistent DataWarp Storage

▪ Moab schedules and tracks all DW storage capacity

▪ Moab persistent DW storage uses DW storage reservation

▪ Persistent DataWarp storage management ▪ Created by reservation start trigger

▪ Destroyed by reservation end trigger

▪ Jobs get mount point

Page 15: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

15 © 2016 Adaptive Computing Enterprises, Inc.

Benefits

▪ Moab schedules and manages both job and persistent DataWarp storage

▪ Users have good user experience

▪ No race conditions cause cancelled jobs!

▪ Jobs have fast I/O

▪ Jobs execute in less time

Page 16: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

16 © 2016 Adaptive Computing Enterprises, Inc.

Agenda

▪ DataWarp

▪ Intel MIC “KNL” (re-provisioning)

▪ Viewpoint Web Portal

▪ User Portal

▪ Administrator Portal

▪ Remote Visualization

▪ Nitro HTC

Page 17: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

17 © 2016 Adaptive Computing Enterprises, Inc.

Use Case

▪ Intel introduces 2nd-gen Xeon Phi processor

▪ KNL supports

▪ 3 different NUMA configurations with 5 modes

▪ 3 different MCDRAM configurations with 4 modes

▪ 20 possible NUMA/MCDRAM configurations

▪ Some applications execute fastest with specific NUMA/MCDRAM configuration

Page 18: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

18 © 2016 Adaptive Computing Enterprises, Inc.

Intel MIC “Knights Landing” (KNL)

▪ 5 NUMA configurations for cores and DDR4 and MCDRAM memories

▪ No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

▪ a2a, hemi, quad, snc2, snc4

▪ 4 MCDRAM “mode” configurations

▪ Cache, flat, 50/50 hybrid, 25/75 hybrid

▪ cache, flat, equal, split

Page 19: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

19 © 2016 Adaptive Computing Enterprises, Inc.

Moab Cray KNL Support

▪ 20 pre-defined combinations

▪ Select combination as “OS” at job submission

msub nodes=20,os=CLE_snc4_flat myJobScript

▪ Moab re-provisions Cray compute nodes when necessary

▪ Uses Moab’s built-in re-provisioning capability

▪ Quickly re-provisions Cray KNL compute nodes for jobs requesting specific NUMA/MCDRAM modes

Page 20: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

20 © 2016 Adaptive Computing Enterprises, Inc.

Cray KNL Re-provisioning Process

▪ User submits job requiring specific KNL NUMA / MCDRAM

configuration

msub –l nodes=20,os=CLE_snc4_flat myJob

▪ Moab schedules KNL job to nodes

▪ Moab runs NodeModifyURL script to modify compute node

configurations and reboot them

▪ Script executes Cray capmc to set KNL NUMA node and

MCDRAM configurations and to reboot nodes

▪ Cray’s capmc modifies nodes’ BIOS settings and reboots nodes

▪ Cray reports new NUMA node / MCDRAM configurations

▪ pbs_reporter maps NUMA node/MCDRAM configuration to OS

name and reports OS name to pbs_server

▪ TORQUE pbs_server passes new OS name to Moab, which starts

the KNL job

Moab Scheduler

Node Modify Script

Modify Node URL

capmc

Modify NUMA / MCDRAM

Modify BIOS config and reboot

TORQUE pbs_server

TORQUE pbs_reporter

Node Update Report

ALPS

System Query

Report

NUMA / MCDRAM

config

New OS Name

Page 21: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

21 © 2016 Adaptive Computing Enterprises, Inc.

Benefits

▪ Users can run applications with KNL NUMA and MCDRAM configuration for optimal job execution.

▪ Easy to request specific KNL configuration with job submission

Page 22: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

22 © 2016 Adaptive Computing Enterprises, Inc.

Agenda

▪ DataWarp

▪ Intel MIC “KNL”

▪ Viewpoint Web Portal (non-technical users)

▪ User Portal

▪ Administrator Portal

▪ Remote Visualization

▪ Nitro HTC

Page 23: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

23 © 2016 Adaptive Computing Enterprises, Inc.

Use Case

▪ Many HPC sites asked to expand user base

▪ Non-technical users

▪ “What is a command line?”

▪ Non-traditional users

▪ Non-research Engineers and Scientists

▪ Humanities

▪ Local or regional industries with no HPC experience

Page 24: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

24 © 2016 Adaptive Computing Enterprises, Inc.

Agenda

▪ DataWarp

▪ Intel MIC “KNL”

▪ Viewpoint Web Portal (non-technical users)

▪ User Portal (job submission)

▪ Administrator Portal

▪ Remote Visualization

▪ Nitro HTC

Page 25: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

25 © 2016 Adaptive Computing Enterprises, Inc.

Use Case

▪ Someone said I need to use “xyz” program

▪ I have data

▪ Someone gave me my data

▪ Someone mentored/trained me and I created my own data

▪ I taught myself and created my own data

▪ How do I use “xyz” with my data?

Page 26: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

26 © 2016 Adaptive Computing Enterprises, Inc.

Log in as user to Viewpoint Web Portal

Page 27: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

27 © 2016 Adaptive Computing Enterprises, Inc.

Click Templates on your User Dashboard

Page 28: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

28 © 2016 Adaptive Computing Enterprises, Inc.

Use “xyz” application template

Page 29: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

29 © 2016 Adaptive Computing Enterprises, Inc.

Fill in application template

Page 30: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

30 © 2016 Adaptive Computing Enterprises, Inc.

Choose a time limit

Page 31: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

31 © 2016 Adaptive Computing Enterprises, Inc.

Tell it where your data file is

Page 32: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

32 © 2016 Adaptive Computing Enterprises, Inc.

Submit your “xyz” job

Page 33: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

33 © 2016 Adaptive Computing Enterprises, Inc.

Click Workload to check on your “xyz” job

Page 34: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

34 © 2016 Adaptive Computing Enterprises, Inc.

Check out your “xyz” job’s details

Page 35: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

35 © 2016 Adaptive Computing Enterprises, Inc.

Look at your job’s statistics

Page 36: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

36 © 2016 Adaptive Computing Enterprises, Inc.

Download your “xyz” job’s results

Page 37: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

37 © 2016 Adaptive Computing Enterprises, Inc.

Benefits

▪ Allows non-technical users to submit jobs using pre-defined application templates

▪ Application templates contain ▪ Job script for running application

▪ Web form design with pre-defined and/or custom widgets and their variables

▪ Job submission replaces variables within job script with form’s widget values

Page 38: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

38 © 2016 Adaptive Computing Enterprises, Inc.

Agenda

▪ DataWarp

▪ Intel MIC “KNL”

▪ Viewpoint Web Portal (non-technical users)

▪ User Portal

▪ Administrator Portal (application templates)

▪ Remote Visualization

▪ Nitro HTC

Page 39: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

39 © 2016 Adaptive Computing Enterprises, Inc.

Use Case

▪ Administrator needs to support many non-technical users and their applications

▪ Define application templates using editor

▪ Define and customize application template form using pre-defined “widget” types

▪ Define widgets’ variable name, default value, and other characteristics to build form

▪ Create job script

Page 40: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

40 © 2016 Adaptive Computing Enterprises, Inc.

Log in as admin to Viewpoint Web Portal

Page 41: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

41 © 2016 Adaptive Computing Enterprises, Inc.

Click on Templates in Admin Dashboard

Page 42: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

42 © 2016 Adaptive Computing Enterprises, Inc.

Choose an application template

Page 43: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

43 © 2016 Adaptive Computing Enterprises, Inc.

View application template history

Page 44: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

44 © 2016 Adaptive Computing Enterprises, Inc.

Edit and design application template

Page 45: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

45 © 2016 Adaptive Computing Enterprises, Inc.

Edit application template (lower part)

Page 46: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

46 © 2016 Adaptive Computing Enterprises, Inc.

Choose Basic Job Settings widgets

Page 47: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

47 © 2016 Adaptive Computing Enterprises, Inc.

Choose Resources widgets

Page 48: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

48 © 2016 Adaptive Computing Enterprises, Inc.

Choose Node Policies Settings widgets

Page 49: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

49 © 2016 Adaptive Computing Enterprises, Inc.

Design custom “xyz” Settings widgets

Page 50: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

50 © 2016 Adaptive Computing Enterprises, Inc.

Choose custom widget from widget type list

Page 51: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

51 © 2016 Adaptive Computing Enterprises, Inc.

Create/import job script with Script Builder

Page 52: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

52 © 2016 Adaptive Computing Enterprises, Inc.

Add widget variables to job script

Page 53: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

53 © 2016 Adaptive Computing Enterprises, Inc.

Benefits

▪ Application templates let an administrator create application templates for non-technical users.

▪ Non-technical users have a simple way to submit jobs for specific applications.

Page 54: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

54 © 2016 Adaptive Computing Enterprises, Inc.

Agenda

▪ DataWarp

▪ Intel MIC “KNL”

▪ Viewpoint Web Portal (non-technical users)

▪ User Portal

▪ Administrator Portal

▪ Remote Visualization (graphical interfaces)

▪ Nitro HTC

Page 55: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

55 © 2016 Adaptive Computing Enterprises, Inc.

Use Cases

▪ Users need to visualize their data to gain insights from it.

▪ Desktops or workstations with decent visualization hardware can be expensive.

▪ Data can be too large to move to another location.

▪ Non-technical users need a way to easily set up a remote visualization session.

Page 56: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

56 © 2016 Adaptive Computing Enterprises, Inc.

Choose Remote Visualization Templates

Page 57: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

57 © 2016 Adaptive Computing Enterprises, Inc.

Choose Remote Desktop App Template

Page 58: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

58 © 2016 Adaptive Computing Enterprises, Inc.

Choose Graphical Desktop Type

Page 59: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

59 © 2016 Adaptive Computing Enterprises, Inc.

Submit Remote Desktop Job

Page 60: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

60 © 2016 Adaptive Computing Enterprises, Inc.

Remote Desktop Job’s Compute Node

Page 61: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

61 © 2016 Adaptive Computing Enterprises, Inc.

Remote Desktop Session Screen

Page 62: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

62 © 2016 Adaptive Computing Enterprises, Inc.

Remote Visualization Session Controls

Page 63: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

63 © 2016 Adaptive Computing Enterprises, Inc.

Resized Remote Viz Session Desktop

Page 64: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

64 © 2016 Adaptive Computing Enterprises, Inc.

Job Detail with Remote Viz Session

Page 65: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

65 © 2016 Adaptive Computing Enterprises, Inc.

Canceling the Remote Desktop Session

Page 66: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

66 © 2016 Adaptive Computing Enterprises, Inc.

Choose ParaView Application Template

Page 67: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

67 © 2016 Adaptive Computing Enterprises, Inc.

ParaView Input Data file selection

Page 68: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

68 © 2016 Adaptive Computing Enterprises, Inc.

Submitting ParaView Remote Viz Job

Page 69: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

69 © 2016 Adaptive Computing Enterprises, Inc.

ParaView Remote Viz Session

Page 70: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

70 © 2016 Adaptive Computing Enterprises, Inc.

Adjusting Session Image Quality

Page 71: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

71 © 2016 Adaptive Computing Enterprises, Inc.

Remote Desktop / ParaView Session

Page 72: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

72 © 2016 Adaptive Computing Enterprises, Inc.

Benefits

▪ Technical and non-technical users can easily create remote visualization sessions with their applications and data.

Page 73: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

73 © 2016 Adaptive Computing Enterprises, Inc.

Agenda

▪ DataWarp

▪ Intel MIC “KNL”

▪ Viewpoint Web Portal (non-technical users)

▪ User Portal

▪ Administrator Portal

▪ Remote Visualization

▪ Nitro (HTC job scheduling)

Page 74: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

74 © 2016 Adaptive Computing Enterprises, Inc.

Use Cases

▪ A user can have thousands or millions of single-core to single-node jobs. ▪ Parameter sweeps

▪ Monte Carlo simulations

▪ Regression tests

▪ Other

▪ Large job queues

▪ Long scheduling cycles

Page 75: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

75 © 2016 Adaptive Computing Enterprises, Inc.

Nitro HTC Scheduler Application

▪ Scheduler-agnostic HTC scheduler application

▪ TORQUE

▪ Slurm

▪ Platform LSF

▪ Cray ALPS

▪ Submitted as HPC job to scheduler

Page 76: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

76 © 2016 Adaptive Computing Enterprises, Inc.

Nitro Job

▪ User converts HTC jobs to Nitro “tasks” ▪ Tasks defined in Nitro Task file

▪ Submit 1 Nitro job with Task file ▪ Schedule 1 parallel HPC job

▪ Execute millions of HTC tasks

▪ What about HTC for non-technical users?

Page 77: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

77 © 2016 Adaptive Computing Enterprises, Inc.

Choose Nitro Application Template

Page 78: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

78 © 2016 Adaptive Computing Enterprises, Inc.

Nitro Task File selection

Page 79: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

79 © 2016 Adaptive Computing Enterprises, Inc.

Nitro Task file upload and job submission

Page 80: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

80 © 2016 Adaptive Computing Enterprises, Inc.

Nitro job progress and statistics

Page 81: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

81 © 2016 Adaptive Computing Enterprises, Inc.

Nitro job progress

Page 82: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

82 © 2016 Adaptive Computing Enterprises, Inc.

Nitro Task Log file display during job

Page 83: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

83 © 2016 Adaptive Computing Enterprises, Inc.

Benefits

▪ Remove thousands-to-millions of HTC jobs from HPC scheduler queue(s)

▪ Execute HTC jobs faster

▪ Non-technical users can

▪ Submit Nitro jobs

▪ Monitor Nitro job progress

▪ Examine Nitro Task results

Page 84: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

© 2016 Adaptive Computing Enterprises, Inc. | CONFIDENTIAL AND PROPRIETARY 84

Questions?

Page 85: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

© 2016 Adaptive Computing Enterprises, Inc. | CONFIDENTIAL AND PROPRIETARY 85

PMIx Support

▪ Process Management Interface for eXascale ▪ Open-source Project on github ▪ Intel, Mellanox, IBM, Adaptive Computing, SchedMD

▪ Benefits ▪ “Instant On” very fast job start (10 minutes 2 seconds)

▪ Asynchronous, non-blocking operations

▪ Blob-based data transfers

▪ Fast broadcasts/collectives

▪ Interfaces with RMs (e.g., TORQUE)

▪ Will work with malleable/evolving jobs in future

Page 86: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

© 2016 Adaptive Computing Enterprises, Inc. | CONFIDENTIAL AND PROPRIETARY 86

Malleable and Evolving Jobs

▪ Next-generation application/RTE support ▪ Malleable jobs (scheduler application/RTE) ▪ Evolvable jobs (application/RTE scheduler) ▪ Need standard API to negotiate resources

▪ Like MPI for parallel communications

▪ Gathering use cases for PMIx API definition

▪ Benefits ▪ Standard application API for scheduler/app dialog

▪ Scheduler and Malleable/Evolving Application Dialog (SMEAD) API definition

▪ Allows applications to ask for and return resources ▪ Allows scheduler to give and take resources to and from a job

Page 87: Broadening Moab/TORQUE for Expanding User Needs...5 NUMA configurations for cores and DDR4 and MCDRAM memories No NUMA (all2all, hemisphere, quadrant) and sub-NUMA cluster 2 and 4

© 2016 Adaptive Computing Enterprises, Inc. | CONFIDENTIAL AND PROPRIETARY 87

Backup Slides