Tools and systems for HPC and AI at LRZ

Tools and systems for HPC and AI at LRZ 22.7.2019 | Luigi Iapichino

About the presenter

• Team leader of the Application Lab for Astrophysics (LRZ AstroLab)

• Lead of Quantum Computing @ LRZ

• Expert in computational astrophysics and simulations

• Member of the PRACE High-Level Support Team

Email: [email protected]

Thanks to Nicolay Hammer and David Brayford (LRZ) for providing many of the following slides.

Dr. Luigi Iapichino

• Astrophysics and Quantum Computing Application Specialist

• High Performance Systems Division, LRZ

Leibniz Supercomputing Centre, Garching, Germany

Bavarian Academy of Sciences and Humanities
Leibniz Supercomputing Centre

We are the computing backbone for advanced research science in Bavaria

• Computer Centre for all Munich Universities

• Regional Computer Centre for all Bavarian Universities

• National Supercomputing Centre (GCS)

• European Supercomputing Centre (PRACE)

• Approx. 250 employees

• 56 years of IT support

Operating Cutting-Edge IT Infrastructure
LRZ as an IT Center of Excellence

Storage, Network, Cloud Computing, Cluster, HPC, Training, Consultancy, Email

• High Speed Networking: Munich Scientific Network

• High Performance Computing: SuperMUC-NG, Linux cluster

• Big Data: Bavarian State Library Digital Archive

• Virtual Reality and Visualisation: V2C (CAVE, Powerwall)

For German Research
LRZ as IT Service Provider

Gauss Centre for Supercomputing (GCS)

Alliance for Germany’s Tier-0 high performance computing centers

• LRZ | Munich | SuperMUC-NG

• HLRS | Stuttgart | Hazel Hen

• JSC | Jülich | JUWELS

Founded 13 April 2007

For European Research
LRZ as IT Service Provider

Partnership for Advanced Computing in Europe (PRACE)

Federated, pan-European Tier-0 supercomputing infrastructure

25 Countries

Hosting Members:

• GCS (LRZ, HLRS, JSC)

• BSC (Spain)

• CSCS (Switzerland)

• CINECA (Italy)

• GENCI (France)

PRACE 2: 2017 – 2020

LRZ Systems and Access

National and International

• PRACE

• GCS

• SuperMUC-NG

Local and Regional

• Munich TUM and LMU: ~30% of SuperMUC usage

• Students: training future experts

• Bavarian projects: <1 million CPU hours

Cluster

• CoolMUC-2

• CoolMUC-3

• Teramem

• DGX-1

• VMware High Availability Cloud

• Compute Cloud: OpenStack, OpenNebula

High Performance Data Computing at LRZ

Data Intensive Computing, Data Analytics & AI

• Emerging Communities

• HPC User Communities

• Increasing computing demands

• Increasing analytics demands

A New World is Emerging: High Performance AI (HPAI)

• Big Data and AI

• Ability and Expertise to Target Large Scale Problems

• New User Communities with New Workflows

• HPC

THE COMPUTING INNOVATION CYCLE

Advanced computing and huge volumes of data create new opportunities for information insight.

HPC, AI & Machine Learning, Big Data

• Data ▻ Algorithms ▻ Computing ▻ Pattern Recognition

• Modeling & Simulation (M&S): Natural World ▻ Hypothesis ▻ Equations ▻ Algorithms ▻ Computing ▻ Data ▻ Analysis

High Performance AI (HPAI) in a Container

Transition AI algorithms from the laptop to the supercomputer with minimal effort

“It just works”

HPAI = M&S + Analytics

M&S

• Equation based on model

• Computing driven

• Numerically intensive

• Creates simulations

• Monte Carlo

• Larger problems

• Iterative methods

• PDE

Analytics

• Finds patterns

• Correlations in data

• Logic driven

• Creates inferences

• Knowledge discovery

• Graphs

• Data-driven science

• Predictions

• CNN

• RNN

M&S + Analytics (common ground)

• Linear algebra

• Matrix operations

• Iterative methods

• Compute intensive

• Data transfer

• Predictive

• Probabilities

• Stencil codes

• Calculus

• Pattern recognition

• Graphs
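
Stencil codes, iterative methods and CNNs all appear in the lists above. One way to see why they sit together is that a stencil update and a convolution layer perform the same sliding-window arithmetic. The following is a minimal sketch under assumptions that are not from the slides: a 5-point Jacobi stencil for the Laplace equation, a hypothetical 3x3 kernel `JACOBI_KERNEL`, and plain Python lists instead of an optimized numerical library.

```python
def correlate2d_valid(grid, kernel):
    """Slide a kernel over a 2D grid without padding, as a CNN layer would."""
    kh, kw = len(kernel), len(kernel[0])
    rows, cols = len(grid) - kh + 1, len(grid[0]) - kw + 1
    return [
        [
            sum(kernel[a][b] * grid[i + a][j + b]
                for a in range(kh) for b in range(kw))
            for j in range(cols)
        ]
        for i in range(rows)
    ]


def jacobi_step(grid):
    """One Jacobi sweep: each interior point becomes the mean of its 4 neighbours."""
    new = [row[:] for row in grid]  # boundary values stay fixed
    for i in range(1, len(grid) - 1):
        for j in range(1, len(grid[0]) - 1):
            new[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                + grid[i][j - 1] + grid[i][j + 1])
    return new


# Hypothetical kernel encoding the same 5-point stencil (illustration only).
JACOBI_KERNEL = [
    [0.0, 0.25, 0.0],
    [0.25, 0.0, 0.25],
    [0.0, 0.25, 0.0],
]

# Small test grid with exactly representable values.
grid = [[float(i * i + j) for j in range(5)] for i in range(5)]

# Interior of the stencil result versus the "valid" correlation output.
stencil_interior = [row[1:-1] for row in jacobi_step(grid)[1:-1]]
conv_interior = correlate2d_valid(grid, JACOBI_KERNEL)
```

Both results agree on the interior points, which suggests why hardware and libraries tuned for one workload tend to help the other.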

Differences Between HPC & AI
HPAI @ LRZ

AI

• Large number of small files

• Large memory nodes (+1TB)

• Single node

• Single GPU/accelerator node

• Local node storage

• Data transfer within a single node (PCI bus)

• Matrices are typically small

• Root privileges

HPC

• Small number of large files

• Memory per node (32/64GB)

• Multiple nodes

• Distribute compute over many nodes

• Typically diskless systems (no local storage)

• Data transfer between multiple nodes

• Medium to large matrices

• User privileges
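
AI workloads tend to bring a large number of small files, while HPC systems are built around a small number of large files. A common way to bridge that gap is to bundle the small files into a single archive and stream the samples back out of it. The following is a minimal sketch using only the Python standard library; the helper names `pack_small_files` and `read_samples`, the file names and the sample contents are all hypothetical, and nothing here is a documented LRZ workflow.

```python
import tarfile
import tempfile
from pathlib import Path


def pack_small_files(src_dir: Path, archive: Path) -> int:
    """Bundle every file under src_dir into one uncompressed tar; return the file count."""
    count = 0
    with tarfile.open(archive, "w") as tar:
        for path in sorted(src_dir.rglob("*")):
            if path.is_file():
                tar.add(path, arcname=str(path.relative_to(src_dir)))
                count += 1
    return count


def read_samples(archive: Path):
    """Stream (name, bytes) pairs back out of the single archive."""
    with tarfile.open(archive, "r") as tar:
        for member in tar:
            if member.isfile():
                yield member.name, tar.extractfile(member).read()


with tempfile.TemporaryDirectory() as tmp:
    # Create 100 tiny files standing in for a training data set.
    data_dir = Path(tmp) / "samples"
    data_dir.mkdir()
    for i in range(100):
        (data_dir / f"sample_{i:03d}.txt").write_bytes(f"label={i % 10}\n".encode())

    # One large file replaces 100 small ones on the shared file system.
    archive_path = Path(tmp) / "samples.tar"
    packed = pack_small_files(data_dir, archive_path)
    samples = dict(read_samples(archive_path))
```

The archive is written outside the source directory so that it is not packed into itself.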

Requirements for AI on HPC
HPAI @ LRZ

• Compute intensive hardware

• Optimized AI frameworks (TensorFlow, Caffe)

• Optimized software (numerical libraries, Python)

• HPC specific software (distributed computing, workload manager)

• Method of deploying the AI software in a simple, straightforward and flexible way

Need to get to: “It just works”

Compute System Diversity with fully integrated Central Data Silos

• SuperMUC-NG: HPC System (Tier-1)

• LRZ Linux Cluster: HPC System (Tier-2)

• LRZ Compute Cloud

• Special Systems: DGX-1, Teramem

• LRZ Data Science Storage (DSS) Systems

• Sharing data with the outside world

Our training system: the LRZ Compute Cloud

A new service for LRZ users

It lets you upload and use your own virtual machines

Hardware overview:

• 82 nodes: Intel® Xeon® Gold 6148 (40 cores) @ 2.40 GHz, 192 GB memory

• 32 nodes: Intel® Xeon® Gold 6148 (40 cores) @ 2.40 GHz, 768 GB memory, each with 2x Nvidia Tesla V100 (16 GB RAM)

• 1 huge memory node: Intel® Xeon® Platinum 8160 (192 cores) @ 2.10 GHz, 6000 GB memory
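
To get a feel for the overall size of the Compute Cloud, the figures in the hardware overview can be added up. The following is a rough sketch that assumes the quoted core and memory numbers are per node and that the 16 GB applies to each V100 card; the `NodeType` class and its field names are hypothetical, and the totals are simple arithmetic rather than official LRZ figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeType:
    name: str
    count: int
    cores: int              # assumed to be per node
    memory_gb: int          # assumed to be per node
    gpus: int = 0           # GPUs per node
    gpu_memory_gb: int = 0  # assumed to be per GPU


# Values copied from the hardware overview above.
NODE_TYPES = [
    NodeType("cpu", count=82, cores=40, memory_gb=192),
    NodeType("gpu", count=32, cores=40, memory_gb=768, gpus=2, gpu_memory_gb=16),
    NodeType("huge-memory", count=1, cores=192, memory_gb=6000),
]

total_cores = sum(n.count * n.cores for n in NODE_TYPES)
total_memory_gb = sum(n.count * n.memory_gb for n in NODE_TYPES)
total_gpus = sum(n.count * n.gpus for n in NODE_TYPES)
total_gpu_memory_gb = sum(n.count * n.gpus * n.gpu_memory_gb for n in NODE_TYPES)

print(f"cores: {total_cores}, memory: {total_memory_gb} GB, "
      f"GPUs: {total_gpus}, GPU memory: {total_gpu_memory_gb} GB")
```

Under these assumptions the system comes to 4752 cores, 46320 GB of main memory and 64 GPUs with 1024 GB of GPU memory in total.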
