LENOVO System Management Solutions 2015 Lenovo All rights reserved. Luigi Brochard, Lenovo HPC Distinguished Engineer HPC Advisory Council 2016, Lugano April 21-23.

Page 1: Lenovo system management solutions

LENOVO System Management Solutions

2015 Lenovo All rights reserved.

Luigi Brochard, Lenovo HPC Distinguished Engineer

HPC Advisory Council 2016, Lugano April 21-23.

Page 2: Lenovo system management solutions

HPC Software Solutions through Partnerships

•  Building Partnerships to provide the “Best In-Class” HPC Cluster Solutions for our customers

•  Collaborating with software vendors to provide features that optimize customer workloads

•  Leveraging “Open Source” components that are production ready

•  Contributing to “Open Source” (e.g. xCAT, Confluent, OpenStack) to enhance our platforms

•  Providing “Services” to help customers deploy and optimize their clusters

Customer Applications

Compute, Storage, Network

OFED, UFM

Lenovo System x: Virtual, Physical, Desktop, Server

OS, VM

Systems Management: IBM PCM; xCAT (Extreme Cloud Administration Toolkit)

Parallel File Systems: IBM GPFS, Lustre, NFS

Workload & Resources: IBM LSF HPC & Symphony; Adaptive Moab; Maui/Torque; Slurm

Parallel Runtime: Intel MPI, Open MPI, MVAPICH, IBM PMPI

Compilers & Tools: Intel Parallel Studio, MKL; Open Source Tools: FFTW, PAPI, TAU, ...

Debuggers & Monitoring: Eclipse PTP + debugger, gdb, ...; ICINGA; Ganglia

Enterprise Solution Services: Installation and custom services, may not include service support for third party software

OmniPath

Page 3: Lenovo system management solutions

xCAT

§  Open Source
§  Collaboration with IBM
§  Server Hardware Management
§  OS Deployment
§  IP and network service management
§  Virtualization Management
§  CLI
§  Holistic solution management

§  Weak GUI
§  Complex to learn
§  Lacking structure
§  Poor enablement for web development
§  Good for large clusters, difficult for smaller solutions/enterprise networks

Page 4: Lenovo system management solutions

WEB ORCHESTRATION Initial Goals

§  Provide easy cluster access to new HPC customers using Open Source HPC Infrastructure
§  Low cost entry into HPC
§  Visual summary views to help understand cluster usage
§  Admin Console – User management, Cluster Monitoring
§  User Console – Job submission, Job/Cluster Monitoring
§  Initial target and Proof of Concept trials – China Market
§  Focus on China Market first – A lot of customers are just coming into HPC workloads
§  Collaborating with customers to understand their usage models and future requirements
§  Very positive feedback and market acceptance
§  LiCO – Lenovo Intelligent Computing Orchestration was released to China market
§  WW Market – Create English version and work with collaborators to release the English version as “Open Source” project: OSMWC
§  Oxford University collaboration

Page 5: Lenovo system management solutions

Lenovo Intelligent Cluster Orchestrator (LiCO)

What is Web Console: A Unified GUI

•  User Portal (dashboard, submit job, monitor job)
•  Admin Portal (dashboard, user/account management)

Future Work Items:

•  SLURM integration
•  ICINGA integration
•  Intel OPA integration
•  LDAP integration

Lenovo components, Open Source/3rd party, Lenovo Hardware

WEB CONSOLE GUI; Installation guide / scripts; Admin guide / scripts

xCAT/Confluent; Torque/MAUI; GOLD/Ganglia

OpenMPI, MVAPICH, MPICH, Intel Parallel Studio

CentOS/RHEL; Lustre; OFED

Server, Storage, Network

Main HPC components below the GUI would be part of the OpenHPC project

Page 6: Lenovo system management solutions

Open System Management Web Console (OSMWC)

What is Web Console: A Unified GUI

•  User Portal (dashboard, submit job, monitor job)
•  Admin Portal (dashboard, user/account management)

Future Work Items:

•  SLURM integration
•  ICINGA integration
•  Intel OPA integration
•  LDAP integration

Lenovo components, Open Source/3rd party, Lenovo Hardware

WEB CONSOLE GUI; Installation guide / scripts; Admin guide / scripts

xCAT/Confluent; Torque/MAUI; Ganglia

OpenMPI, MVAPICH, MPICH, Intel Parallel Studio

CentOS/RHEL; Lustre; OFED

Server, Storage, Network

Main HPC components below the GUI would be part of the OpenHPC project

Page 7: Lenovo system management solutions

END USER PORTAL – TRANSLATED VIEW

Page 8: Lenovo system management solutions

ADMIN PORTAL – TRANSLATED VIEW

Page 9: Lenovo system management solutions

Confluent

Page 10: Lenovo system management solutions

Confluent Goals

§  Lenovo-led project to improve upon xCAT heritage
§  Carries on strong CLI and other facets of xCAT
§  More structured interface
§  Easier to learn
§  Web development enabled – RESTful APIs – good GUI possible
§  Faster performance/lower memory usage/higher scalability for large solutions
§  Better equipped to work in smaller configurations without full network control
§  Enhanced security model
§  Reuse effort across HPC, OpenStack, xClarity efforts
§  Reuse development effort across multiple projects (Lenovo/external ecosystem)
§  More contributions from third parties

Page 11: Lenovo system management solutions

Confluent updates

•  xCAT style noderanges
•  Client connections persist across server restart (e.g. consoles)
•  xCAT style commands:
   –  nodehealth (new)
   –  nodesensors (like rvitals)
   –  nodepower (like rpower)
   –  nodeeventlog (like reventlog)
   –  nodeconsole (like rcons)
   –  nodesetboot (like rsetboot)
   –  nodeidentify (like rbeacon)
   –  nodelist (like nodels)
•  Inventory in API (nodeinventory to come, similar to rinv)
•  Dynamic nodegroups (groups with a ‘noderange’ attribute get expanded)
•  Enriched debugging facilities
•  Rotating log support (defaults to daily)
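The command correspondence and the idea of a noderange can be sketched in a few lines of Python. The lookup table is taken directly from the slide. The `expand_noderange` and `translate` helpers are hypothetical illustrations, not part of xCAT or Confluent: the expansion handles only comma-separated names and a single `prefix[a-b]` numeric range per element, which is a small subset of the real noderange syntax, and `translate` assumes arguments carry over unchanged, which may not hold for every command.

```python
import re

# xCAT command -> Confluent equivalent, as listed on the slide.
XCAT_TO_CONFLUENT = {
    "rvitals": "nodesensors",
    "rpower": "nodepower",
    "reventlog": "nodeeventlog",
    "rcons": "nodeconsole",
    "rsetboot": "nodesetboot",
    "rbeacon": "nodeidentify",
    "nodels": "nodelist",
}

# Matches one bracketed numeric range, e.g. "n[01-03]" (simplified syntax).
_RANGE = re.compile(r"^(?P<prefix>.*)\[(?P<start>\d+)-(?P<end>\d+)\](?P<suffix>.*)$")


def expand_noderange(noderange):
    """Expand a simplified noderange such as 'n[1-3],login1' into node names.

    Hypothetical helper: only comma-separated names and one bracketed
    numeric range per element are supported.
    """
    nodes = []
    for element in noderange.split(","):
        element = element.strip()
        if not element:
            continue
        match = _RANGE.match(element)
        if match is None:
            nodes.append(element)  # plain node name, keep as-is
            continue
        start, end = match.group("start"), match.group("end")
        width = len(start)  # preserve zero padding, e.g. n[01-03]
        for i in range(int(start), int(end) + 1):
            nodes.append(f"{match.group('prefix')}{i:0{width}d}{match.group('suffix')}")
    return nodes


def translate(xcat_command_line):
    """Swap a leading xCAT command for its Confluent name, if one is listed."""
    command, _, rest = xcat_command_line.partition(" ")
    new_command = XCAT_TO_CONFLUENT.get(command, command)
    return f"{new_command} {rest}".strip()


print(expand_noderange("n[1-3],login1"))
print(translate("rcons n1"))
```

A dynamic nodegroup, in this simplified picture, would store the noderange string as an attribute and call something like `expand_noderange` each time the group is resolved, so membership follows the expression rather than a fixed list.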

Page 12: Lenovo system management solutions

Confluent Web UI (consoles without plugin or java)

Page 13: Lenovo system management solutions

Confluent CLI – through confetti (RESTful API)
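Because the CLI sits on top of a RESTful API, the same resources can in principle be reached from any HTTP client. The sketch below only builds a request and does not send it. The base URL, the `nodes/<name>/power/state` resource path, and the helper names are assumptions made for illustration and should be checked against the Confluent documentation before use.

```python
from urllib.parse import quote
from urllib.request import Request

# Assumption: placeholder base URL for a Confluent HTTP endpoint.
BASE_URL = "https://mgmt.example.com/confluent-api"


def node_resource_url(node, *segments):
    """Build a URL for a resource under a node, e.g. ('power', 'state')."""
    parts = ["nodes", node, *segments]
    return BASE_URL + "/" + "/".join(quote(p, safe="") for p in parts)


def build_get(node, *segments):
    """Prepare (but do not send) a GET request asking for a JSON reply."""
    return Request(
        node_resource_url(node, *segments),
        headers={"Accept": "application/json"},
        method="GET",
    )


request = build_get("n1", "power", "state")
print(request.get_method(), request.full_url)
```

Treating each node attribute as a URL path is what makes a web GUI straightforward to build on the same interface the CLI uses.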

Page 14: Lenovo system management solutions

nodesensors (csv, and time series data)
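CSV output lends itself to simple post-processing as a time series. The following is a minimal sketch under stated assumptions: the column layout (`timestamp,node,sensor,value`) and the sample rows are invented for illustration, and the real nodesensors output format may differ.

```python
import csv
import io
from collections import defaultdict

# Made-up sample in an assumed layout; not actual nodesensors output.
SAMPLE = """timestamp,node,sensor,value
2016-04-21T10:00:00,n1,Ambient Temp,21.0
2016-04-21T10:00:00,n2,Ambient Temp,22.5
2016-04-21T10:01:00,n1,Ambient Temp,21.5
2016-04-21T10:01:00,n2,Ambient Temp,23.0
"""


def load_series(text):
    """Group rows into {(node, sensor): [(timestamp, value), ...]}."""
    series = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        key = (row["node"], row["sensor"])
        series[key].append((row["timestamp"], float(row["value"])))
    return dict(series)


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


series = load_series(SAMPLE)
for (node, sensor), points in sorted(series.items()):
    print(node, sensor, mean([value for _, value in points]))
```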

Page 15: Lenovo system management solutions

Confluent performance

Page 16: Lenovo system management solutions

Future High Performance Computing Open Solutions

•  Partnering as founding member of OpenHPC initiative to establish a common Open HPC Framework

•  Collaborating with Oxford University to create an Open System Management framework for small to medium clusters

•  Leading Open Source system management projects: Confluent and soon to be formed OSMWC

•  Contributing to xCAT Open Source project to enhance our platforms

•  Providing “Services” to help customers deploy and optimize their clusters

Customer Applications

Parallel File Systems: Lenovo GSS, Intel Lustre, NFS

Enterprise Solution Services: Installation and custom services, may not include service support for third party software

Systems Management: Open System Management WEB Console (OSMWC); Confluent; xCAT (Extreme Cloud Administration Toolkit)

OS, VM, OFED

Compute, Storage, Network, UFM

Lenovo System x: Virtual, Physical, Desktop, Server

OmniPath

Page 17: Lenovo system management solutions

Future High Performance Computing Solutions

•  Adding new features
•  Power & Energy awareness
•  Light weight virtual HPC
•  Big Data / Spark workload
•  Managing more than the servers

Customer Applications

Parallel File Systems: Lenovo GSS, Intel Lustre, NFS

Enterprise Professional Services: Installation and custom services, may not include service support for third party software

Open System Management WEB Console (OSMWC): Integration with xCAT (Extreme Cloud Administration Toolkit), Confluent

OS, VM, OFED

Compute, Storage, Network, UFM

Lenovo System x: Virtual, Physical, Desktop, Server

OmniPath

Page 18: Lenovo system management solutions

Future HPC Software Solutions through Partnerships

•  Building Partnerships to provide the “Best In-Class” HPC Cluster Solutions for our customers

•  Collaborating with software vendors to provide features that optimize customer workloads

•  Bright Computing
•  Altair
•  …

Customer Applications

Compute, Storage, Network

OFED, UFM

Lenovo System x: Virtual, Physical, Desktop, Server

OS, VM

Systems Management: IBM PCM; xCAT (Extreme Cloud Administration Toolkit)

Parallel File Systems: IBM GPFS, Lustre, NFS

Workload & Resources: IBM LSF HPC & Symphony; Adaptive Moab; Maui/Torque/Slurm/PBSPro

Parallel Runtime: Intel MPI, Open MPI, MVAPICH, IBM PMPI

Compilers & Tools: Intel Parallel Studio, MKL; Open Source Tools: FFTW, PAPI, TAU, ...

Debuggers & Monitoring: Eclipse PTP + debugger, gdb, ...; ICINGA; Ganglia

Enterprise Solution Services: Installation and custom services, may not include service support for third party software

OmniPath

BC CM

Page 19: Lenovo system management solutions
Page 20: Lenovo system management solutions

ADMIN PORTAL – TRANSLATED VIEW

Page 21: Lenovo system management solutions

User Job Submission views

Page 22: Lenovo system management solutions

User Job Submission – provide Scheduler job file

Page 23: Lenovo system management solutions

Admin / Operator views

Page 24: Lenovo system management solutions

ADMIN PORTAL – TRANSLATED VIEW

Page 25: Lenovo system management solutions

ADMIN PORTAL – TRANSLATED VIEW

Page 26: Lenovo system management solutions

ADMIN PORTAL – TRANSLATED VIEW

Page 27: Lenovo system management solutions

nodehealth