IBM HPC/HPDA/AI Solutions Albert Valls Badia IBM Client Technical Architect IBM Systems Hardware [email protected] June 15 th , 2017

IBM HPC/HPDA/AI Solutions - HPC Knowledge Portal · 2020. 5. 22. · (Oracle,ERP, HPC Cluster) Spectrum Scale Clients /file_systemA Spectrum Scale Protocol Nodes NFS, SMB, OpenStack


Page 1:

IBM HPC/HPDA/AI

Solutions

Albert Valls Badia IBM Client Technical Architect

IBM Systems Hardware

[email protected]

June 15th , 2017

Page 2:

2

New Drivers and Directions – Datacentric

• Data Volumes are Exploding – Especially Unstructured Data

• Data Needs to be Collected, Managed, and Digested

• Deriving Insight and Information from the Data requires:

• A variety of processing steps in a Workflow

• A variety of processing optimizations

• Many Analytics Steps can make use of Large In-Memory Solvers

• Energy Efficiency requires:

• Processing Elements that are Optimized to the task

• Energy and Data aware Workflow Management

• The OpenPOWER Foundation provides innovation opportunities to a variety of Partners

• Making innovations like Accelerators Consumable is critical

[Figure: Price/Performance vs. time, 2000–2020 — full system stack innovation required: Technology and Processors, Firmware/OS, Accelerators, Software, Storage, Network, Workflow Dependency Graph]

Page 3:

OpenPOWER and Innovation (strategy started in 2014)

IBM Stack

Research

And

Innovation

IBM

Google

NVIDIA

TYAN

Mellanox OpenPower

Open Innovation

OpenPOWER: Bringing Partner Innovation to Power Systems

5 initial members

200+ members

24 countries

Page 4:

OpenPOWER Innovation Pervasive in System Design (21 TFlops/node)

4

NVIDIA:

Tesla P100 GPU with NVLink

NVLink Interface

Ubuntu by Canonical:

Launch OS supporting NVLink and Page

Migration Engine

Wistron: Platform co-design

Mellanox: InfiniBand/Ethernet

Connectivity in and out of server

Samsung:

2.5” SSDs

HGST: Optional NVMe Adapters

Hynix, Samsung, Micron: DDR4

IBM: POWER8 CPU

Page 5:

POWER8: Leadership performance - designed for Memory Intensive Workloads

5

Memory Buffer

DRAM Chips

POWER8

12 cores 96 threads 4 cache levels

Up to 1/2 TB per socket Up to 230 GB/s sustained

Consistent speed

Faster cores

8 Threads per Core

Bigger cache

Accelerator direct

links

3x higher memory

bandwidth,

1 TB/Socket

Page 6:

Differentiated Acceleration - CAPI and NVLink

New Ecosystems with CAPI

Partners innovate, add value,

gain revenue together w/IBM

Technical and programming

ease: virtual addressing, cache

coherence

Accelerator is hardware peer FPGA or ASIC

NVIDIA Tesla GPU with NVLink

POWER8

with NVLink

80 GB/s

Peak*

Graphics Memory Graphics Memory

System Memory

40+40 GB/s

Coherence Bus

POWER8

CAPP

CAPI-attached Accelerators

Future, Innovative Systems with NVLink

Faster GPU-GPU communication

Breaks down barriers between CPU-GPU

New system architectures

PSL

6

Page 7:

IBM Power Accelerated Computing Roadmap

2015 2016 2017

POWER8 POWER8 with NVLink

POWER9

CAPI

Interface

NVLink

Enhanced

CAPI &

NVLink

ConnectX-4 EDR Infiniband

PCIe Gen3

ConnectX-4 EDR Infiniband

CAPI over PCIe Gen3

HDR Infiniband Enhanced CAPI over PCIe Gen4

Mellanox Interconnect Technology

IBM CPUs

NVIDIA GPUs Kepler

PCIe Gen3 Volta

Enhanced NVLink Pascal NVLink

S822LC – Firestone Server

S822LC for HPC – Minsky

POWER10

2020+

Witherspoon

TBD

TBD

System Name TBD

Page 8:

7/3/2017 8

FLOPS are not the only KPI in HPC: example workflow in seismic analysis.

• Read from storage

• Memory load

• Preprocessing

• Realtime algorithm execution

• Visualization and Insight

• Simulation and modeling

Every step in the workflow takes advantage of different hardware capabilities; hence the need for a balanced system design.
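The point about balanced design can be made concrete: the workflow above is a dependency graph, and each step can start only once its inputs are ready. A minimal sketch (the exact step names and dependencies are illustrative, not taken from the slide):

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical dependencies between the workflow steps listed above
deps = {
    "memory_load":   {"read_storage"},
    "preprocess":    {"memory_load"},
    "algorithm":     {"preprocess"},
    "simulation":    {"preprocess"},
    "visualization": {"algorithm", "simulation"},
}

# A valid execution order: every step runs after all of its inputs
order = list(TopologicalSorter(deps).static_order())
print(order)
```

Each node in such a graph can be scheduled onto whichever hardware suits it (I/O nodes, large-memory nodes, GPUs), which is the argument for a balanced system.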

Page 9:

IBM Data Centric Computing Strategy: HPC->HPDA

Page 10:

Introducing IBM Spectrum Scale

• Remove data-related bottlenecks with a parallel, scale-out solution

• Enable global collaboration with unified storage and global namespace

• Optimize cost and performance with automated data placement

• Ensure data availability, integrity and security with erasure coding, replication, snapshots, and encryption

Highly scalable high-performance unified storage

for files and objects with integrated analytics

Unified Scale-out Data Lake

• File In/Out, Object In/Out; Analytics on demand.

• High-performance native protocols

• Single Management Plane

• Cluster replication & global namespace

• Enterprise storage features across file, object & HDFS

Spectrum Scale

SSD Disk

Fast Disk

Slow Disk

Tape

SSNR

Compression

NFS SMB POSIX Swift/S3 HDFS

Encryption

SSD Disk

Fast Disk

Slow Disk

Page 11:

| 11

Page 12:

IBM Spectrum Scale: Parallel Architecture

| 12

No Hot Spots

• All NSD servers export to all clients in active-active mode

• Spectrum Scale stripes files across NSD servers and NSDs in units of file-system block-size

• File-system load spread evenly

• Easy to scale file-system capacity and performance while keeping the architecture balanced

NSD Client does real-time parallel I/O

to all the NSD servers and storage volumes/NSDs

NSD Client

NSD Servers

Storage Storage
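The no-hot-spots claim follows from striping: consecutive file-system blocks land on different NSD servers, so any large file spreads its load evenly. A toy model (the block size, server names, and simple round-robin placement are simplifying assumptions — real GPFS placement is more sophisticated):

```python
# Minimal sketch of block-level striping across NSD servers (illustrative only)
BLOCK_SIZE = 4 * 1024 * 1024  # hypothetical 4 MiB file-system block size
NSD_SERVERS = ["nsd1", "nsd2", "nsd3", "nsd4"]

def block_to_server(offset: int) -> str:
    """Round-robin mapping: block i of a file goes to server i mod N."""
    return NSD_SERVERS[(offset // BLOCK_SIZE) % len(NSD_SERVERS)]

# A 1 GiB file touches every server equally -> no single-server bottleneck
file_size = 1024 * 1024 * 1024
blocks = [block_to_server(off) for off in range(0, file_size, BLOCK_SIZE)]
counts = {s: blocks.count(s) for s in NSD_SERVERS}
print(counts)  # 256 blocks, 64 per server
```

Because every client computes the same mapping, all clients read and write to all servers in parallel, which is the "active-active" behavior described above.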

Page 13:

Ethernet Network (TCP/IP) or Low Latency Network (Infiniband)

Heterogeneous Block

Storage

Block Storage

JBODs

JBODs

JBODs

Block Storage

IBM Elastic Storage

Solution

Spectrum Scale Native RAID Controllers Spectrum Scale File

Servers

Commodity Servers

(x86_64 or Power)

Application Nodes

(Oracle,ERP, HPC

Cluster)

Spectrum Scale Clients

/file_systemA

Spectrum Scale Protocol

Nodes

NFS, SMB, OpenStack

Swift

NFS exports

SMB Shares

HTTP GET/PUT (Swift)

Spectrum Scale NSD Protocol

NFS Clients

SMB Clients

OpenStack Swift Clients

Clustered

Failover

Up to 16 (SMB) or

32 (NFS) servers

Servers use Disk

Volumes/LUNs

File-system load spread

evenly across all the

servers. No Hot Spots

Data is striped across servers in block-size units

No single-server

bottleneck

Can share access to

data with NFS, SMB and

Swift S3

Easy to scale while

keeping the architecture

balanced

Can add capacity and

performance

/file_systemA

/file_systemA

Spectrum Scale Cluster Overview

Page 14:

Spectrum Scale Architecture Highlights: Scalability

Data scalability

Capacity: Large number of disks/LUNs in a single file system

Throughput: wide striping, large block size

Capacity efficient (data in i-node, fragments)

Multiple nodes write in parallel (even within single file)

Metadata scalability

Wide striping of all metadata (inodes, indirect blocks, directories, allocation maps...)

Scalable data structures: Segmented allocation map,

Extensible hashing for directories

Highly scalable, distributed lock manager:

After obtaining a lock token, each node can cache metadata, update locally, write back directly

Fine-grain locking, when necessary: shared inode write locks, byte-range locks

lock directory entries by name (hash)

Dynamically elected metanode collects inode, indirect block & directory updates

Page 15:

Speed and simplicity: Graphical user interface

• Reduce administration overhead • Graphical User Interface for common tasks

• Performance monitoring

• Problem determination

• Easy to adopt • Common IBM Storage UI Framework

• Integrated into Spectrum Control • Storage portfolio visibility

• Consolidated management

• Multiple clusters

Page 16:

Spectrum Scale Built-in Tiering (ILM) Challenge

• Data growth is outpacing budget

• Low-cost archive is another storage silo

• Flash is underutilized because it isn't shared

• Locally attached disk can't be used with centralized storage

• Migration overhead is preventing storage upgrades

• Automated data placement

• Span entire storage portfolio, including DAS, with a single namespace

• Policy driven data placement & data migration

• Share storage, even low-latency flash

• Automatic failover and seamless file-system recovery

• Lower TCO

• Powerful policy engine

• Information Lifecycle Management

• Fast metadata scanning and data movement

• Automated data migration based on thresholds

• Users not affected by data migration

• Example: when online storage reaches 90% full, move all files of 1GB or larger that are 60 days old to offline storage to free up space

Small files last accessed > 30 days

last accessed > 60days

Silver pool is >60% full Drain it to 20%

accessed today and file size is <1G

System pool

(Flash)

Gold pool

(SSD)

Silver pool

( NL SAS)

Automation
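The migration example above can be written as a rule for the Spectrum Scale policy engine. The sketch below is illustrative only — the pool names and thresholds are invented, and the SQL-like syntax should be checked against the mmapplypolicy documentation for your release:

```sql
/* Illustrative rule: when the pool passes 90% full, drain it to 80% by
   migrating files >= 1 GiB that have not been accessed for 60 days */
RULE 'to_offline' MIGRATE FROM POOL 'silver'
     THRESHOLD(90,80)
     WEIGHT(KB_ALLOCATED)
     TO POOL 'nearline'
     WHERE FILE_SIZE >= 1073741824
       AND (CURRENT_TIMESTAMP - ACCESS_TIME) > INTERVAL '60' DAYS
```

Because the policy engine does the metadata scan and data movement in the background, users see the same namespace before and after migration.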

Page 17:

Spectrum Scale HDFS Transparency

Challenge • Separate storage systems for ingest, analysis, results

• HDFS requires locality aware storage (namenode)

• Data transfer slows time to results • Different frameworks & analytics tools use data

differently

• HDFS Transparency

• Map/Reduce on shared, or shared nothing storage

• No waiting for data transfer between storage systems

• Immediately share results
• Single Data Lake for all applications
• Enterprise data management
• Archive and Analysis in-place

A A A

Existing System

Analytics

System Data

ingest

Export

result

Traditional Analytics

Solution

A A A

Existing System

Spectrum Scale File System File Object

Analytics

System

HDFS

Transparency

In-place Analytics Solution

Page 18:

Spectrum Scale Compression

• Transparent compression for HDFS transparency, Object, NFS, SMB and POSIX interface.

• Improved storage efficiency

• Typically 2x improvement in storage efficiency

• Improved I/O bandwidth

• Read/write compressed data reduces load on storage

• Improved client side caching

• Caching compressed data increases apparent cache size

• Per file compression

• Use policies

• Compress cold data

– Data not being used/accessed

18
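Per-file compression of cold data can likewise be driven by the policy engine. A hedged sketch — the rule name and pool are invented, and the COMPRESS keyword and its argument vary by Spectrum Scale release, so verify against the documentation:

```sql
/* Illustrative rule: compress files not accessed for 30 days */
RULE 'compress_cold' MIGRATE FROM POOL 'system' COMPRESS('z')
     WHERE (CURRENT_TIMESTAMP - ACCESS_TIME) > INTERVAL '30' DAYS
```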

Page 19:

Spectrum Scale Encryption

• Native Encryption of data at rest

• Files are encrypted before they are stored on disk

• Keys are never written to disk

• No data leakage in case disks are stolen or improperly decommissioned

• Secure deletion

• Ability to destroy arbitrarily large subsets of a file system

• No ‘digital shredding’, no overwriting: Security deletion is a cryptographic operation

• Use Spectrum Scale Policy to encrypt (or exclude) files in a fileset or file system

• Generally < 5% performance impact

Page 20:

Benefits

• Expands local node file cache (Pagepool)

• Leverages fast local storage

• Can reduce load on central storage

• Transparent to applications

• Can use inexpensive local devices

Where to use it

• Protocol Node

• Virtual Machine storage

• Large Memory Analytics

Easy to enable

NSD Type localCache

Define only this node as NSD server

LROC LROC

Application Nodes

Performance Feature Spectrum Scale Local Read-Only Cache (LROC)
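The LROC behavior described above — a node-local, read-only cache in front of shared storage — can be sketched as follows. The class, eviction policy, and callback are invented for illustration; the real feature caches pagepool data on local flash devices:

```python
# Toy model of a local read-only cache (LROC-like behavior, not the real code)
class LocalReadOnlyCache:
    """Writes bypass this cache entirely: it only serves repeated reads."""

    def __init__(self, backend_read, capacity_blocks=2):
        self.backend_read = backend_read   # fetches a block from the NSD servers
        self.cache = {}                    # block id -> data (local SSD in reality)
        self.capacity = capacity_blocks
        self.hits = self.misses = 0

    def read(self, block_id):
        if block_id in self.cache:
            self.hits += 1                 # served locally, no load on central storage
            return self.cache[block_id]
        self.misses += 1
        data = self.backend_read(block_id)
        if len(self.cache) >= self.capacity:
            self.cache.pop(next(iter(self.cache)))  # crude eviction for the sketch
        self.cache[block_id] = data
        return data

lroc = LocalReadOnlyCache(lambda b: f"data-{b}")
for b in [1, 2, 1, 1, 2]:
    lroc.read(b)
print(lroc.hits, lroc.misses)  # repeated reads hit locally
```

Only the first access to each block touches the NSD servers, which is why the feature "can reduce load on central storage" while staying transparent to applications.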

Page 21:

Benefits

• Speeds-up small writes

• Used by IBM Elastic Storage Server

Where to use it

• Logs handle small writes

• Any storage architecture

• Shared Disk

• Shared Nothing (Use replication)

• IO Sizes up to 64KiB

Easy to enable

Create a system.log pool

Enable write-cache on the file system

Application Nodes

Performance Feature Spectrum Scale Highly Available Write Cache (HAWC)

Flash

Local Storage

Shared Storage

Page 22:

Spectrum Scale Multicluster: cross-cluster sharing

22

• Cross-mounting file systems between Spectrum Scale clusters

• Separate clusters = separate administration domains

• When connection is established, all nodes are interconnected

– All nodes in both clusters must be within same IP network segment / VLAN

– Channel can be encrypted (openssl)

Page 23:

Synchronous Replication & Stretched Cluster

• Performed synchronously by the node that writes to disk

• Synchronous replication happens within a Spectrum Scale cluster

• I/O does not return to the application until both copies are written

• Active/Active data access

• Read from fastest source

• DR with automatic failover and seamless file-system recovery

• If replication between sites -> Spectrum Scale Stretched Cluster

Synchronous

replication

Application

Whichever

is fastest

23
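The key property of synchronous replication — the application sees an acknowledgement only after every replica is written — fits in a few lines. This is a deliberately simplified model; the real implementation writes the copies in parallel to different failure groups:

```python
# Sketch: a synchronously replicated write acknowledges only after all copies land
def replicated_write(block, copies):
    """copies: one write callable per replica site/failure group."""
    for write in copies:          # issued in parallel in the real system
        write(block)
    return "ack"                  # the application unblocks only here

site_a, site_b = [], []           # stand-ins for the two sites' storage
ack = replicated_write("blk0", [site_a.append, site_b.append])
print(ack, site_a, site_b)
```

Reads, by contrast, can be served from whichever copy is fastest, giving the active/active access described above.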

Page 24:

Spectrum Scale Active File Management (AFM) • An asynchronous, cross-cluster, data-sharing utility

• Functions well over unreliable and high latency networks

• Extends global name space between multiple WAN dispersed locations to share and exchange data asynchronously

• Caches local copies of data distributed to one or more clusters to improve local read and write performance

• As data is written or modified at one location, all other locations see that same data

24

Page 25:

Spectrum Scale AFM Main Concepts

• Home - Where the information lives. Owner of the data in a cache relationship

• Cache - Fileset in a remote cluster that points to home

• The relationship between a Cache and Home is one to one

• Cache knows about its Home. Home does not know a cache exists

• Data is copied to the cache when requested or data written at the cache is copied back to home as fast as possible

25
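The one-way cache/home relationship can be modeled in a few lines: the cache pulls data on first read and pushes local writes back asynchronously, while home stays unaware of the cache. This is a toy model — the classes and queueing are invented for intuition only:

```python
# Toy model of the AFM cache/home relationship
class Home:
    """Home owns the data and knows nothing about any cache."""
    def __init__(self):
        self.files = {}

class CacheFileset:
    def __init__(self, home):
        self.home = home                   # cache -> home is one-to-one
        self.local = {}
        self.writeback_queue = []

    def read(self, name):
        if name not in self.local:         # fetch from home on first access
            self.local[name] = self.home.files[name]
        return self.local[name]

    def write(self, name, data):
        self.local[name] = data            # local write completes immediately...
        self.writeback_queue.append(name)  # ...and is pushed home asynchronously

    def flush(self):                       # "as fast as possible", per the slide
        while self.writeback_queue:
            n = self.writeback_queue.pop(0)
            self.home.files[n] = self.local[n]

home = Home(); home.files["a"] = "v1"
cache = CacheFileset(home)
assert cache.read("a") == "v1"             # pulled from home on demand
cache.write("a", "v2")
cache.flush()
print(home.files["a"])                     # home now sees the cached write
```

Because the transfer is asynchronous, the scheme tolerates the unreliable, high-latency WAN links mentioned above.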

Page 26:

Spectrum Scale Server 1

Spectrum Scale Server 2

Clients

FDR IB

10/40 GbE

IBM Elastic Storage Server (ESS) is a Software Defined Solution

Migrate RAID

and disk

management

to commodity

file servers !

Custom dedicated

Disk Controllers

JBOD Disk

enclosures

Spectrum Scale Server 1

Spectrum Scale Server 2

Clients

Spectrum Scale RAID

Commodity file

servers

FDR IB

10/40 GbE

JBOD Disk

enclosures

Spectrum Scale RAID

Commodity file

servers with

RAID and disk

management

Spectrum Scale Native RAID is a software implementation of storage RAID technologies within

Spectrum Scale.

It requires special Licensing

It is only approved for pre-certified architectures such as Lenovo-GSS, IBM-ESS (Elastic Storage

Server)

Page 27:

Advantages of Spectrum Scale RAID

• Use of standard and inexpensive disk drives
• Erasure Code software implemented in Spectrum Scale

• Data is declustered and distributed to all disk drives with selected RAID protection

• 3-way, 4-way, RAID6 8+2P, RAID6 8+3P

• Faster rebuild times • As data is declustered, more disks are involved during rebuild

• Approx. 3.5 times faster than RAID-5

• Minimal impact of rebuild on system performance • Rebuild is done by many disks

• Rebuilds can be deferred with sufficient protection

• Better fault tolerance • End to end checksum

• Much higher mean-time-to-data-loss (MTTDL)

JBODs

Spectrum Scale RAID

Page 28:

RAID algorithm • Two types of RAID:

• 3 or 4 way replication

• 8 + 2 or 3 way parity

• 2-fault and 3-fault tolerant codes: RAID-D2, RAID-D3

[Diagram: 3-way Replication (1+2) and 4-way Replication (1+3) — 1 strip (GPFS block) plus 2 or 3 replicated strips; 8+2p and 8+3p Reed-Solomon — 8 strips (GPFS block) plus 2 or 3 redundancy strips; 2-fault and 3-fault tolerant codes]
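The trade-off between the listed codes is easy to quantify: each scheme survives as many failures as it has redundancy strips, and its usable fraction of raw capacity is data strips over total strips:

```python
# Storage efficiency and fault tolerance of the protection schemes listed above
schemes = {
    "3-way replication (1+2)": (1, 2),   # (data strips, redundancy strips)
    "4-way replication (1+3)": (1, 3),
    "8+2p Reed-Solomon":       (8, 2),
    "8+3p Reed-Solomon":       (8, 3),
}

for name, (d, r) in schemes.items():
    efficiency = d / (d + r)             # usable fraction of raw capacity
    print(f"{name}: survives {r} failures, {efficiency:.0%} efficient")
```

8+2p keeps 80% of raw capacity while tolerating two failures, versus 33% for 3-way replication with the same tolerance, which is why wide Reed-Solomon stripes are attractive at scale.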

Page 29:

Rebuild overhead reduction example

| 31

Page 30:

Declustered RAID6 example

Page 31:

Critical Rebuild Performance on GL6 8+2p

JBODs

Spectrum Scale RAID

During the critical rebuild the impact on workload was high, but as soon as the array was back to single-parity protection the impact on the customer's workload was <2%.

The Data Integrity Manager prioritizes tasks: Rebuild, Rebalance, Data scrubbing and proactive correction.

6 minutes for a critical rebuild

Page 32:

End-to-end checksum

• True end-to-end checksum from disk surface to the client's Spectrum Scale interface

• Repairs soft/latent read errors

• Repairs lost/missing writes.

• Checksums are maintained on disk and in memory and are transmitted to/from client.

• Checksum is stored in a 64-byte trailer of 32-KiB buffers
• 8-byte checksum and 56 bytes of ID and version info

• Sequence number used to detect lost/missing writes.

8 data strips 3 parity strips

32-KiB buffer

64B trailer

¼ to 2-KiB

terminus
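The trailer layout can be mimicked to show how such a checksum catches a corrupted buffer. The sizes (32-KiB buffer, 8-byte checksum, 56 bytes of ID/version/sequence) follow the slide; the checksum function and field layout are stand-ins, not GPFS's actual on-disk format:

```python
import hashlib
import struct

BUF = 32 * 1024          # 32-KiB data buffer, as on the slide
TRAILER = 64             # 8-byte checksum + 56 bytes of ID/version/sequence info

def add_trailer(data: bytes, seq: int) -> bytes:
    """Append a 64-byte trailer (field layout invented; only sizes match)."""
    assert len(data) == BUF
    csum = hashlib.blake2b(data, digest_size=8).digest()  # stand-in checksum
    ident = struct.pack("<Q", seq).ljust(56, b"\0")       # sequence + padding
    return data + csum + ident

def verify(buf: bytes) -> bool:
    data, csum = buf[:BUF], buf[BUF:BUF + 8]
    return hashlib.blake2b(data, digest_size=8).digest() == csum

good = add_trailer(b"\x11" * BUF, seq=1)
bad = bytes([good[0] ^ 1]) + good[1:]   # flip one bit on the "disk"
print(verify(good), verify(bad))
```

A sequence number in the ID area is what lets the real implementation also detect lost or missing writes, since a stale-but-valid block carries the wrong sequence.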

Page 33:

IBM Elastic Storage Server family

GS models use 24-drive JBOD or SSD enclosures. Supported drives: 1.2TB, 1.8TB SAS; 400GB, 800GB, 1.6TB SSD.

GL models use high-density JBOD enclosures. Supported drives: 4TB, 6TB, 8TB NL-SAS HDDs.

Supported NICs: 10GbE, 40GbE Ethernet and FDR or EDR InfiniBand

[Figure: stacks of FC 5887 24-slot disk enclosures]

Net Capacity

4TB = 327TB

6TB = 491TB

8TB = 655TB

Net Capacity

4TB = 673TB

6TB = 1PB

8TB = 1.3PB

Net Capacity

4TB = 1PB

6TB = 1.5PB

8TB = 2PB

Model GL4

Analytics and Cloud 4 Enclosures, 20U

232 NL-SAS, 2 SSD

10 to 16 GB/Sec

Model GL6

PetaScale Storage 6 Enclosures, 28U

348 NL-SAS, 2 SSD

10 to 25 GB/sec

Model GL2

Analytics Focused 2 Enclosures, 12U

116 NL-SAS, 2 SSD

5 - 8 GB/Sec

Model GS1 24 SSD

6 GB/Sec

Model GS2 46 SAS + 2 SSD or

48 SSD Drives

2 GB/Sec SAS

12 GB/Sec SSD

Model GS4 94 SAS + 2 SSD or

96 SSD Drives

5 GB/Sec SAS

16 GB/Sec SSD

Model GS6 142 SAS + 2 SSD

7 GB/Sec

Net Capacity

1.2TB = 121TB

1.6TB = 182TB

Net Capacity

400GB = 28TB

800GB = 57TB

1.6TB = 115TB

1.2TB = 78TB

1.6TB = 117TB

Net Capacity

400GB = 13TB

800GB = 26TB

1.6TB = 53TB

1.2TB = 35TB

1.6TB = 53TB

Net Capacity

400GB = 6TB

800GB = 13TB

1.6TB = 26TB
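As a sanity check, the GL net-capacity figures are roughly raw capacity times 8/11, consistent with 8+3p protection — an assumption, since the slide does not state which code or spare policy is used:

```python
# Back-of-envelope check of the GL6 net-capacity figure, assuming 8+3p
drives, size_tb = 348, 4               # GL6: 348 NL-SAS drives, 4TB each
raw = drives * size_tb                 # raw capacity in TB
net = raw * 8 / 11                     # 8 data strips out of every 11
print(f"{raw} TB raw -> {net:.0f} TB net")  # close to the ~1 PB listed
```

The small gap between this estimate and the published figures is plausibly spare capacity and formatting overhead.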

Page 34:

ESS New Models Performance and Capacity

Spectrum

Scale

ESS

New! Model GL2S: 2 Enclosures, 14U 166 NL-SAS, 2 SSD

New! Model GL4S: 4 Enclosures, 24U 334 NL-SAS, 2 SSD

New! Model GL6S: 6 Enclosures, 34U 502 NL-SAS, 2 SSD

[Figure: ESS 5U84 storage enclosures]

Max: 0.9PB raw · Max: 1.6PB raw · Max: 1.8PB raw · Max: 3.3PB raw · Max: 2.8PB raw · Max: 5PB raw

Model GL2: 2 Enclosures, 12U 116 NL-SAS, 2 SSD


Model GL6: 6 Enclosures, 28U 348 NL-SAS, 2 SSD

Model GL4: 4 Enclosures, 20U 232 NL-SAS, 2 SSD


34 GB/s

25 GB/s

17 GB/s

11 GB/s

8 GB/s

23 GB/s

Net Capacity

4TB = 1.5PB

8TB = 3.1PB

10TB = 3.9PB

Net Capacity

4TB = 1PB

8TB = 2PB

10TB = 2.5PB

Net Capacity

4TB = 508TB

8TB = 1PB

10TB = 1.27PB

Page 35:

Sequential throughput vs. Capacity

Page 36:

Software Defined Compute: IBM Platform Computing Delivering a highly utilized shared services environment optimized for time to results

Application Examples

• Simulation

• Analysis

• Design

• Big data

IT constrained

• Long wait times

• Low utilization

• IT Sprawl

IBM Platform Computing

Big Data /

Hadoop

Simulation

& Modeling Analytics

Traditional Software Defined

Benefits

• High utilization

• Throughput

• Performance

• Prioritization

• Reduced cost Repeated for many apps and groups

• Clusters

• Grid

• Cloud

Faster results

Fewer resources

Long Running

Services Make lots of computers look like “one”

Prioritized matching of supply with demand

Application

Page 37:

Overall Artificial Intelligence (AI) Space

39

Machine Learning

Deep Learning: IT Systems break tasks into Artificial Neural Networks

New Data

Sources:

NoSQL,

Hadoop &

Analytics

New class of applications

Machine Learning & Training

Pattern matching

Image

Real-time decision support

Complex workflows

Data Lakes

Extend Enterprise applications

Finance: Fraud detection /

prevention

Retail: shopping advisors

Healthcare: Diagnostics and

treatment

Supply chain and logistics

Extend Predictive Analytics to Advanced Analytics with AI

Human Intelligence Exhibited by Machines

Cognitive / ML/DL

“Human Trained” using large amounts of data & ability to learn how to perform the

task

Growing across Compute, Middleware, and Storage

Page 38:

PowerAI Platform

40

Caffe NVCaffe Torch IBMCaffe

DL4J TensorFlow

OpenBLAS

Theano

Deep Learning

Frameworks

Accelerated

Servers and

Infrastructure

for Scaling

Spectrum Scale:

High-Speed

Parallel File System

Scale to

Cloud

Cluster of NVLink

Servers

Coming Soon

Bazel DIGITS NCCL Distributed

Frameworks

Supporting

Libraries

Page 39:

Where to start?

• 20 x POWER8 cores with NVLink
• Up to 1TB DDR4 memory
• Up to 4 Tesla P100 GPUs with NVLink

+

Parallel Computing

E.g. Universidad Carlos III

Barcelona Supercomputing Center

GPU development

And optimisation

E.g. molecular dynamics.

Centro de Biología Molecular

Machine Learning

Deep Learning

20 Core POWER8 + 256GB + 1

GPU Nvidia Volta

Starting at €27,500 + VAT

IBM Power System S822LC

The Deep Learning Server

Page 40:

Questions?

42