LENOVO System Management Solutions 2015 Lenovo All rights reserved. Luigi Brochard, Lenovo HPC Distinguished Engineer HPC Advisory Council 2016, Lugano April 21-23.


LENOVO System Management Solutions

2015 Lenovo All rights reserved.

Luigi Brochard, Lenovo HPC Distinguished Engineer

HPC Advisory Council 2016, Lugano April 21-23.

HPC Software Solutions through Partnerships

• Building partnerships to provide the “Best In-Class” HPC cluster solutions for our customers

• Collaborating with software vendors to provide features that optimize customer workloads

• Leveraging “Open Source” components that are production ready

• Contributing to “Open Source” projects (e.g. xCAT, Confluent, OpenStack) to enhance our platforms

• Providing “Services” to help customers deploy and optimize their clusters

[Diagram: solution stack]

Customer Applications

Compute, Storage, Network: Lenovo System x; OFED; UFM; OmniPath

OS, VM: Virtual, Physical, Desktop, Server

Systems Management: IBM PCM; xCAT (Extreme Cloud Admin. Toolkit)

Parallel File Systems: IBM GPFS, Lustre, NFS

Workload & Resources: IBM LSF, HPC & Symphony; Adaptive Moab; Maui/Torque; Slurm

Parallel Runtime: Intel MPI, Open MPI, MVAPICH, IBM PMPI

Compilers & Tools: Intel Parallel Studio, MKL; Open Source Tools: FFTW, PAPI, TAU, ..

Debuggers & Monitoring: Eclipse PTP + debugger, gdb, ..; ICINGA; Ganglia

Enterprise Solution Services: Installation and custom services, may not include service support for third party software

xCAT

Open Source

Collaboration with IBM

Server Hardware Management

OS Deployment

IP and network service management

Virtualization Management

CLI

Holistic solution management

Weak GUI

Complex to learn

Lacking structure

Poor enablement for web development

Good for large clusters, difficult for smaller solutions/enterprise networks


WEB ORCHESTRATION: Initial Goals

Provide easy cluster access to new HPC customers using Open Source HPC infrastructure

Low cost entry into HPC

Visual summary views to help understand cluster usage

Admin Console – User management, Cluster Monitoring

User Console – Job submission, Job/Cluster Monitoring

Initial target and Proof of Concept trials – China Market

Focus on China market first – A lot of customers are just coming into HPC workloads

Collaborating with customers to understand their usage models and future requirements

Very positive feedback and market acceptance

LiCO – Lenovo Intelligent Computing Orchestration was released to the China market

WW Market – Create English version and work with collaborators to release the English version as an “Open Source” project: OSMWC

Oxford University collaboration


Lenovo Intelligent Cluster Orchestrator (LiCO)

What is Web Console:

A Unified GUI

• User Portal (dashboard, submit job, monitor job)

• Admin Portal (dashboard, user/account management)

Future Work Items:

• SLURM integration

• ICINGA integration

• Intel OPA integration

• LDAP integration

[Diagram: software stack. Legend: Lenovo components; Open Source/3rd party; Lenovo Hardware]

WEB CONSOLE GUI

xCAT/Confluent; Torque/MAUI; GOLD/Ganglia

Installation guide / scripts

Admin guide / scripts

OpenMPI, MVAPICH, MPICH, Intel Parallel Studio

CentOS/RHEL; Lustre; OFED

Server; Storage; Network

Main HPC components below the GUI would be part of OpenHPC project


Open System Management Web Console (OSMWC)

What is Web Console:

A Unified GUI

• User Portal (dashboard, submit job, monitor job)

• Admin Portal (dashboard, user/account management)

Future Work Items:

• SLURM integration

• ICINGA integration

• Intel OPA integration

• LDAP integration

[Diagram: software stack. Legend: Lenovo components; Open Source/3rd party; Lenovo Hardware]

WEB CONSOLE GUI

xCAT/Confluent; Torque/MAUI; Ganglia

Installation guide / scripts

Admin guide / scripts

OpenMPI, MVAPICH, MPICH, Intel Parallel Studio

CentOS/RHEL; Lustre; OFED

Server; Storage; Network

Main HPC components below the GUI would be part of OpenHPC project


END USER PORTAL – TRANSLATED VIEW


ADMIN PORTAL – TRANSLATED VIEW


Confluent


Confluent Goals


Lenovo-led project to improve upon xCAT heritage

Carries on strong CLI and other facets of xCAT

More structured interface

Easier to learn

Web development enabled – RESTful APIs – good GUI possible

Faster performance/lower memory usage/higher scalability for large solutions

Better equipped to work in smaller configurations without full network control

Enhanced security model

Reuse effort across HPC, OpenStack, xClarity efforts

Reuse development effort across multiple projects (Lenovo/external ecosystem)

More contributions from third parties


Confluent updates

• xCAT style noderanges

• Client connections persist across server restart (e.g. consoles)

• xCAT style commands:

– nodehealth (new)

– nodesensors (like rvitals)

– nodepower (like rpower)

– nodeeventlog (like reventlog)

– nodeconsole (like rcons)

– nodesetboot (like rsetboot)

– nodeidentify (like rbeacon)

– nodelist (like nodels)

• Inventory in API (nodeinventory to come, similar to rinv)

• Dynamic nodegroups (groups with a ‘noderange’ attribute get expanded)

• Enriched debugging facilities

• Rotating log support (defaults to daily)
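
The noderange and dynamic nodegroup items above can be illustrated with a small sketch. This is a simplified Python illustration, not Confluent's or xCAT's actual parser: it assumes only comma-separated names, `prefix[lo-hi]` numeric brackets, and `-name` exclusions. The function names and the layout of the `groups` dictionary are hypothetical, chosen only for this example.

```python
import re

# Matches one bracketed numeric range such as "n[01-04]" (simplified syntax).
_BRACKET = re.compile(
    r"^(?P<prefix>[^\[\]]*)\[(?P<lo>\d+)-(?P<hi>\d+)\](?P<suffix>[^\[\]]*)$"
)


def expand_element(element, groups):
    """Expand a single noderange element into a list of node names."""
    if element in groups:
        group = groups[element]
        # Hypothetical layout: a dynamic group carries a 'noderange'
        # attribute that is expanded on use; a static group lists 'nodes'.
        if "noderange" in group:
            return expand_noderange(group["noderange"], groups)
        return list(group["nodes"])
    match = _BRACKET.match(element)
    if match:
        lo, hi = match["lo"], match["hi"]
        width = len(lo)  # preserve zero padding, e.g. n01 rather than n1
        return [
            f"{match['prefix']}{i:0{width}d}{match['suffix']}"
            for i in range(int(lo), int(hi) + 1)
        ]
    return [element]  # a plain node name


def expand_noderange(noderange, groups=None):
    """Expand a comma-separated noderange, applying '-' exclusions in order."""
    groups = groups or {}
    selected = []
    for element in noderange.split(","):
        element = element.strip()
        if not element:
            continue
        if element.startswith("-"):
            excluded = set(expand_element(element[1:], groups))
            selected = [name for name in selected if name not in excluded]
        else:
            for name in expand_element(element, groups):
                if name not in selected:
                    selected.append(name)
    return selected


if __name__ == "__main__":
    groups = {
        "compute": {"noderange": "n[1-3]"},  # dynamic group
        "login": {"nodes": ["login1"]},      # static group
    }
    print(expand_noderange("n[01-04],-n03"))
    print(expand_noderange("compute,login", groups))
```

The sketch does not guard against a dynamic group whose `noderange` refers back to itself, and it ignores the richer operators that real noderange syntax offers.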


Confluent Web UI (consoles without plugin or java)


Confluent CLI – through confetti (RESTful API)


nodesensors (csv, and time series data)


Confluent performance


Future High Performance Computing Open Solutions


• Partnering as founding member of OpenHPC initiative to establish a common Open HPC Framework

• Collaborating with Oxford University to create an Open System Management framework for small to medium clusters

• Leading Open Source system management projects: Confluent and soon to be formed OSMWC

• Contributing to xCAT Open Source project to enhance our platforms

• Providing “Services” to help customers deploy and optimize their clusters

[Diagram: solution stack]

Customer Applications

Parallel File Systems: Lenovo GSS, Intel Lustre, NFS

Enterprise Solution Services: Installation and custom services, may not include service support for third party software

Systems Management: Open System Management WEB Console (OSMWC); Confluent; xCAT (Extreme Cloud Admin. Toolkit)

OS, VM: Virtual, Physical, Desktop, Server

Compute, Storage, Network: Lenovo System x; OFED; UFM; OmniPath

Future High Performance Computing Solutions

• Adding new features

• Power & Energy awareness

• Lightweight virtual HPC

• Big Data / Spark workload

• Managing more than the servers

[Diagram: solution stack]

Customer Applications

Parallel File Systems: Lenovo GSS, Intel Lustre, NFS

Enterprise Professional Services: Installation and custom services, may not include service support for third party software

Open System Management WEB Console (OSMWC)

Integration with

OS, VM: Virtual, Physical, Desktop, Server

Compute, Storage, Network: Lenovo System x; OFED; UFM; OmniPath

xCAT (Extreme Cloud Admin Toolkit), Confluent

Future HPC Software Solutions through Partnerships

• Building partnerships to provide the “Best In-Class” HPC cluster solutions for our customers

• Collaborating with software vendors to provide features that optimize customer workloads

• Bright Computing

• Altair

• …

[Diagram: solution stack]

Customer Applications

Compute, Storage, Network: Lenovo System x; OFED; UFM; OmniPath

OS, VM: Virtual, Physical, Desktop, Server

Systems Management: IBM PCM; xCAT (Extreme Cloud Admin. Toolkit)

Parallel File Systems: IBM GPFS, Lustre, NFS

Workload & Resources: IBM LSF, HPC & Symphony; Adaptive Moab; Maui/Torque/Slurm/PBSPro

Parallel Runtime: Intel MPI, Open MPI, MVAPICH, IBM PMPI

Compilers & Tools: Intel Parallel Studio, MKL; Open Source Tools: FFTW, PAPI, TAU, ..

Debuggers & Monitoring: Eclipse PTP + debugger, gdb, ..; ICINGA; Ganglia

Enterprise Solution Services: Installation and custom services, may not include service support for third party software

BC CM


ADMIN PORTAL – TRANSLATED VIEW


User Job Submission views


User Job Submission – provide Scheduler job file


Admin / Operator views


ADMIN PORTAL – TRANSLATED VIEW


ADMIN PORTAL – TRANSLATED VIEW


ADMIN PORTAL – TRANSLATED VIEW


nodehealth
