Use of UNICORE in Data Centres (Einsatz von UNICORE in Rechenzentren)

Björn Hagemeier, 2017-03-16

Member of the Helmholtz Association


Part: About Us


Forschungszentrum Jülich and JSC


JUQUEEN

IBM Blue Gene/Q

28 racks, 458,752 cores

PowerPC A2 @ 1.6 GHz

16 cores per node

5.8 Petaflop/s peak

460 TByte main memory

5D network
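As a rough cross-check, the core count and peak performance listed above can be reproduced from the rack count. The sketch below assumes 1,024 nodes per Blue Gene/Q rack, a 1.6 GHz clock and 8 floating-point operations per core and cycle; these three values are assumptions added here, not statements from the slide.

```python
# Cross-check of the JUQUEEN figures.
RACKS = 28              # from the slide
CORES_PER_NODE = 16     # from the slide
NODES_PER_RACK = 1024   # assumption: standard Blue Gene/Q rack
CLOCK_HZ = 1.6e9        # assumption: 1.6 GHz PowerPC A2
FLOPS_PER_CYCLE = 8     # assumption: 4-wide fused multiply-add per core

total_cores = RACKS * NODES_PER_RACK * CORES_PER_NODE
peak_pflops = total_cores * CLOCK_HZ * FLOPS_PER_CYCLE / 1e15

print(f"{total_cores:,} cores, {peak_pflops:.2f} PFlop/s peak")
```

Under these assumptions the result agrees with the 458,752 cores and roughly 5.8 Petaflop/s quoted on the slide.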


JURECA

1872 compute nodes

Intel Haswell with 2x12 cores @ 2.5 GHz

75 compute nodes equipped with 2 NVIDIA K80 GPUs

DDR4 memory (2133 MHz)

1605 nodes with 128 GiB memory

128 nodes with 256 GiB memory

64 nodes with 512 GiB memory

12 visualization nodes

2 NVIDIA K40 per node

10 nodes with 512 GiB memory

2 nodes with 1024 GiB memory

Total of 45,216 cores

100 GiB/s storage connection
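The node and core totals can be tallied as a quick consistency check. The sketch assumes that the 75 GPU nodes are counted separately from the three memory classes and that the visualization nodes use the same 2x12-core configuration; this is one reading of the slide under which the numbers add up, not a confirmed breakdown.

```python
# Tally of JURECA nodes and cores from the figures on the slide.
CORES_PER_NODE = 2 * 12

compute_nodes = {
    "128 GiB": 1605,
    "256 GiB": 128,
    "512 GiB": 64,
    "GPU (2x K80)": 75,   # assumption: counted separately from the rows above
}
VISUALIZATION_NODES = 12  # assumption: same 2x12-core CPUs as compute nodes

total_compute_nodes = sum(compute_nodes.values())
total_cores = (total_compute_nodes + VISUALIZATION_NODES) * CORES_PER_NODE

print(total_compute_nodes, "compute nodes,", total_cores, "cores")
```

With this reading the tally gives 1872 compute nodes and 45,216 cores, matching the slide.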


JUST: Jülich Storage Cluster

IBM GPFS

20.3 PB online storage

220 GB/s

File server for

HPC systems: JUQUEEN, JURECA

DEEP (Dynamical Exascale Entry Platform)


Tape Libraries

Actual capacity: ∼99 PB

Theoretical capacity: 141 PB (16600x8.5TB)

Tape drives: 48

Libraries: 2
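The theoretical capacity follows directly from the cartridge count and size. The short sketch below redoes that multiplication and relates it to the actual capacity; it assumes decimal units (1 PB = 1000 TB), which the slide does not state.

```python
# Theoretical tape capacity and the share that is actually available.
CARTRIDGES = 16_600        # from the slide
TB_PER_CARTRIDGE = 8.5     # from the slide
ACTUAL_PB = 99             # approximate value from the slide

theoretical_pb = CARTRIDGES * TB_PER_CARTRIDGE / 1000  # assumption: 1 PB = 1000 TB
usable_fraction = ACTUAL_PB / theoretical_pb

print(f"{theoretical_pb:.1f} PB theoretical, {usable_fraction:.0%} available")
```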


Part: UNICORE


UNICORE As We See It Today

A federation software suite

Secure and seamless access to compute and data resources

Focus on scientific applications and workflows

Complies with typical HPC centre policies

Complete solutions: APIs, clients, services, ...

Java/Python based, supports UNIX, MacOS, Windows and many resource management systems (Torque, Slurm, SGE, ...)

Long development history (since 1997)

Open source, BSD licensed, visit http://www.unicore.eu


Concepts

UNICORE ≡ UNIform Access to COmputing REsources

Site: a resource such as an HPC system including storage

Job: submitted through JSDL including data staging, resource requirements, executable definition and parameters

Hadoop (Yarn) jobs possible, too, in conjunction with HDFS

Resources: features of sites in terms of capacity and capability

Storages: a view into file systems at a certain base directory (mount point). Can be storage external to the site, e.g. Swift, S3, HDFS, CDMI, XtreemFS

Applications: abstractions of applications hiding site-local specificities, e.g. installation paths or module activations

Workflows: a series of job executions guided by control structures, i.e. visual programming


Architecture


UNICORE Main Services

Compute

TargetSystemFactory

TargetSystem

JobManagement

Reservations

Storage and Data

StorageFactory

StorageManagement

FileTransfer

Metadata

Workflow

Workflow enactment

Task Execution

ResourceBroker

Registry


Default Setup

Access to resource manager and file system via TargetSystemInterface (TSI) daemon installed on the cluster login node(s)


Job Execution


Storage Access

The UNICORE Storage Management Service (“SMS”) provides a filesystem-like view of data

Typical functions

mkdir, delete, ls, chmod etc.

Start file transfers

Import/export of data from/to the user’s local machine

Send/receive of data from other servers

Various supported file transfer protocols
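To illustrate what a filesystem-like view rooted at a base directory means in practice, here is a minimal local sketch. The class `StorageView` and its methods are hypothetical names chosen for this example; they are not the SMS API, and a real storage service would add authentication, remote access and file transfers.

```python
import tempfile
from pathlib import Path


class StorageView:
    """Hypothetical filesystem-like view confined to one base directory."""

    def __init__(self, base: str) -> None:
        self.base = Path(base).resolve()

    def _resolve(self, relative: str) -> Path:
        # Reject any path that would escape the base directory.
        target = (self.base / relative.lstrip("/")).resolve()
        if target != self.base and self.base not in target.parents:
            raise PermissionError(f"{relative!r} is outside the storage")
        return target

    def mkdir(self, relative: str) -> None:
        self._resolve(relative).mkdir(parents=True, exist_ok=True)

    def ls(self, relative: str = "") -> list:
        return sorted(p.name for p in self._resolve(relative).iterdir())

    def chmod(self, relative: str, mode: int) -> None:
        self._resolve(relative).chmod(mode)

    def delete(self, relative: str) -> None:
        target = self._resolve(relative)
        if target.is_dir():
            target.rmdir()  # empty directories only, to keep the sketch small
        else:
            target.unlink()


with tempfile.TemporaryDirectory() as base:
    storage = StorageView(base)
    storage.mkdir("results/run1")
    listing = storage.ls("results")
    try:
        storage.ls("../")
        escaped = True
    except PermissionError:
        escaped = False
```

All paths are interpreted relative to the base directory, so a client never sees, and cannot reach, anything above the mount point.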


File Transfers


Metadata Management Service (MMS)

Automatic extraction

Manual editing of metadata

Searching


Applications

General

Identified by name and version

Site specifics

Pre and post commands for environment setup and tear down

Acquire and return licenses

MPI

Support for application metadata
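The idea of an application abstraction can be sketched as a lookup by name and version that wraps the site-local executable in its pre and post commands. Everything in this sketch is hypothetical: the `ApplicationEntry` type, the `simulate` application, its path and the module commands are invented for illustration and do not reflect the format UNICORE actually uses.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ApplicationEntry:
    """Hypothetical site-local definition of one application version."""
    executable: str
    pre_commands: Tuple[str, ...] = ()
    post_commands: Tuple[str, ...] = ()


# Hypothetical site catalogue, keyed by (name, version).
SITE_APPLICATIONS = {
    ("simulate", "1.0"): ApplicationEntry(
        executable="/opt/simulate-1.0/bin/simulate",
        pre_commands=("module load simulate/1.0",),
        post_commands=("module purge",),
    ),
}


def build_script(name: str, version: str, arguments: list) -> str:
    """Turn an abstract application request into a site-specific script."""
    entry = SITE_APPLICATIONS[(name, version)]
    command = " ".join([entry.executable, *arguments])
    return "\n".join([*entry.pre_commands, command, *entry.post_commands])


script = build_script("simulate", "1.0", ["--input", "data.in"])
print(script)
```

The user only names the application and version; installation paths and environment setup stay on the site's side of the abstraction.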


Generic Applications: Automated generation of GUIs

UNICORE Rich Client and Portal support application metadata

Example

<jsdl:Argument Description="Check input file"
               Type="boolean"
               Default="..."
               ValidValues="true false"
               DependsOn="..."
               Excludes="..."
               IsEnabled="false"
               IsMandatory="false">+v$CHECK?</jsdl:Argument>

Possible types: string, boolean, int, double, filename, choice.

Used to be defined by site administrators.
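How a client might turn such argument metadata into a form field can be sketched with the standard library XML parser. The widget mapping and the `describe_field` function are hypothetical, the namespace binding for the `jsdl` prefix is an assumption, and the attributes shown as "..." in the example above are simply left out rather than guessed.

```python
import xml.etree.ElementTree as ET

# Assumption: the jsdl prefix is bound to the JSDL 1.0 namespace.
JSDL_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl"

SNIPPET = f"""
<jsdl:Argument xmlns:jsdl="{JSDL_NS}"
    Description="Check input file"
    Type="boolean"
    ValidValues="true false"
    IsEnabled="false"
    IsMandatory="false">+v$CHECK?</jsdl:Argument>
"""

# Hypothetical mapping from metadata type to a GUI widget kind.
WIDGETS = {
    "string": "text field",
    "boolean": "checkbox",
    "int": "number field",
    "double": "number field",
    "filename": "file chooser",
    "choice": "drop-down list",
}


def describe_field(xml_text: str) -> dict:
    """Extract what a GUI generator would need from one argument."""
    element = ET.fromstring(xml_text.strip())
    attrs = element.attrib
    return {
        "label": attrs.get("Description", ""),
        "widget": WIDGETS[attrs.get("Type", "string")],
        "choices": attrs.get("ValidValues", "").split(),
        "mandatory": attrs.get("IsMandatory", "false") == "true",
        "enabled": attrs.get("IsEnabled", "true") == "true",
        "template": element.text,
    }


form_field = describe_field(SNIPPET)
print(form_field)
```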


User-Defined Applications

Allow mixing system and user-defined applications

Encourage users to play with and develop their own application definitions

Repository of common application definitions

Realized by merging system and user-specific IDB contributions

Users cannot change a site’s resources and thus cannot go beyond administrator limits
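The merge of system and user contributions can be pictured as follows. This is a sketch under stated assumptions: the dictionary layout and the `merge_idb` function are hypothetical, and the rule that a system definition wins over a user definition with the same name and version is an assumption made for this example, not documented behaviour.

```python
def merge_idb(system_idb: dict, user_idb: dict) -> dict:
    """Merge application definitions; resources stay under admin control."""
    merged = {
        "applications": dict(system_idb["applications"]),
        # Resources are copied from the system part only.
        "resources": dict(system_idb["resources"]),
    }
    for key, definition in user_idb.get("applications", {}).items():
        # Assumption: an existing system definition is not overridden.
        merged["applications"].setdefault(key, definition)
    return merged


system_idb = {
    "applications": {("simulate", "1.0"): "/opt/simulate-1.0/bin/simulate"},
    "resources": {"max_nodes": 64},
}
user_idb = {
    "applications": {
        ("simulate", "1.0"): "/home/user/bin/simulate",
        ("mytool", "0.1"): "/home/user/bin/mytool",
    },
    "resources": {"max_nodes": 100000},  # ignored by the merge
}

merged = merge_idb(system_idb, user_idb)
print(merged)
```

Because the user part never contributes to the resources section, a user-defined application cannot raise the limits set by the administrator.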


Workflow Features

Simple graphs (DAGs)

Workflow variables

Loops and control constructs

while, for-each, if-else

Conditions

Exit code, file existence, filesize, workflow variables

Clients

UNICORE Rich Client

Command-line client
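A DAG with an exit-code condition can be sketched in a few lines using the standard library's `graphlib` (Python 3.9 or later). The task names and the `run_workflow` function are hypothetical, and the rule "run a task only if all predecessors exited with 0" is just one simple condition chosen for illustration; it is not how the UNICORE workflow engine is implemented.

```python
from graphlib import TopologicalSorter


def run_workflow(dependencies: dict, actions: dict) -> dict:
    """Run tasks in dependency order; skip a task if a predecessor
    failed or was itself skipped."""
    exit_codes = {}
    for task in TopologicalSorter(dependencies).static_order():
        predecessors = dependencies.get(task, ())
        if all(exit_codes.get(p) == 0 for p in predecessors):
            exit_codes[task] = actions[task]()
        else:
            exit_codes[task] = None  # skipped
    return exit_codes


# Each task maps to the set of tasks it depends on.
dependencies = {
    "simulate": {"prepare"},
    "analyse": {"simulate"},
    "archive": {"prepare"},
}
# Stand-ins for real jobs: each callable returns an exit code.
actions = {
    "prepare": lambda: 0,
    "simulate": lambda: 1,  # pretend the simulation fails
    "analyse": lambda: 0,
    "archive": lambda: 0,
}

result = run_workflow(dependencies, actions)
print(result)
```

Here `analyse` is skipped because `simulate` returned a non-zero exit code, while `archive` still runs since it depends only on `prepare`.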


Authentication and Authorization (AAI, for short)

In addition to its own home-grown user management solution, known as XUUDB, UNICORE supports SAML-based authentication.

PULL and PUSH modes are possible

Typically only a few attributes are needed

role (user, server, admin), xlogins, groups
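How these few attributes could drive an authorization decision is sketched below. The data layout, the example DN and the `select_xlogin` function are hypothetical, and treating the first xlogin as the default is an assumption made for this example; the sketch does not reproduce the XUUDB or SAML interfaces.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class UserAttributes:
    role: str                  # "user", "server" or "admin"
    xlogins: Tuple[str, ...]   # local accounts the user may map to
    groups: Tuple[str, ...]


# Hypothetical attribute source keyed by certificate subject (DN).
ATTRIBUTES = {
    "CN=Jane Doe,O=Example": UserAttributes(
        role="user", xlogins=("jdoe1", "jdoe2"), groups=("hpc",)
    ),
}


def select_xlogin(dn: str, requested: Optional[str] = None) -> str:
    """Pick the local account for a request, or refuse it."""
    attrs = ATTRIBUTES.get(dn)
    if attrs is None or attrs.role not in {"user", "admin"}:
        raise PermissionError(f"no usable mapping for {dn!r}")
    if requested is None:
        return attrs.xlogins[0]  # assumption: first entry is the default
    if requested not in attrs.xlogins:
        raise PermissionError(f"{requested!r} is not allowed for {dn!r}")
    return requested


default_login = select_xlogin("CN=Jane Doe,O=Example")
chosen_login = select_xlogin("CN=Jane Doe,O=Example", "jdoe2")
```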


UNITY IdM: Identity Relationship Management

Complete solution for identity, federation and inter-federation management

Can serve as SP and IdP at the same time.

Use SAML 2, OAuth 2, OIDC, LDAP as upstream IdPs

Serve as IdP for SAML 2 (Web SSO, SOAP, PAOS bindings), SAML 2 Web & SOAP UNICORE Profile, OIDC, OAuth 2


UNITY IdM: Infrastructures

UNICORE Portal @ JSC: https://unicore-portal.fz-juelich.de:8443/

DFN AAI for authentication

Still need proper account at JSC



Part: Installation


Installation: General

Latest releases of most important components linked on main website http://www.unicore.eu/

Detailed download section at http://www.unicore.eu/download/ contains all components

Packages are hosted on SourceForge


Installation: Basic

Core Server Bundle

https://sourceforge.net/projects/unicore/files/Servers/Core/

Content

Gateway

UNICORE/X

Registry

TSI

XUUDB

Requirements

OpenJDK 8 or Oracle Java 8

Python 2.7 or 3.x for the TSI


Installation: Workflow

https://sourceforge.net/projects/unicore/files/Servers/Workflow/

Content

Workflow Engine

Resource broker, aka “Service Orchestrator”


Installation: Federation

Common Registry

All services need to publish their availability to a common registry

Can publish to multiple registries

Clients support multiple registries

Authentication

Individual registrations (certificate)

Identity federation, e. g. via UNITY
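The registry pattern described above can be sketched with a small in-memory model: services publish their endpoints to one or more registries, and a client merges the answers from all registries it knows. The `Registry` class, the `discover` function and the example.org URLs are hypothetical and only illustrate the pattern, not the UNICORE Registry interface.

```python
class Registry:
    """Hypothetical in-memory registry of service endpoints."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._entries = {}

    def publish(self, service_type: str, url: str) -> None:
        self._entries.setdefault(service_type, set()).add(url)

    def lookup(self, service_type: str) -> set:
        return set(self._entries.get(service_type, ()))


def discover(registries: list, service_type: str) -> list:
    """Client side: merge results from several registries, without duplicates."""
    found = set()
    for registry in registries:
        found |= registry.lookup(service_type)
    return sorted(found)


local = Registry("site-local")
shared = Registry("federation")

# A site publishes the same endpoint to both registries.
site_a = "https://site-a.example.org/TargetSystemFactory"
site_b = "https://site-b.example.org/TargetSystemFactory"
for registry in (local, shared):
    registry.publish("TargetSystemFactory", site_a)
shared.publish("TargetSystemFactory", site_b)

endpoints = discover([local, shared], "TargetSystemFactory")
print(endpoints)
```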


Acknowledgements

Most slides shamelessly copied from my colleague Bernd Schuller.

Other team members

Valentina Huber, André Giesler, Maria Petrova-El Sayed, Jędrzej Rybicki, Rajveer Saini and many others at JSC

Krzysztof Benedyczak, Marcelina Borcz, Rafał Kluszczyński, Piotr Bała and others at ICM / Warsaw University

Richard Grunzke and others at Technical University Dresden

Students: Burak Bengi, Maciej Golik, Konstantine Muradov

... many others who reported bugs, suggested features, contributed code and provided patches
