Scalability / Data / Tasks
Meeting Scalability Requirements with Large Data and Complex Tasks: Adapting Existing Technologies and Best Practices in Slovenia

Jan Jona Javoršek
Jožef Stefan Institute
[email protected]
SLING – Slovenian Initiative for National Grid
http://www.ijs.si/ http://www.sling.si/


3/29

Historical

CDC Cyber 74

CONVEX C3860

Zuse Z 23


4/29

SLING

Connected centres
● Arctur* – 1024°
● Arnes – 4400°
● Atos* – 3000
● CIPKeBiP – 990
● SiGNET – 4200
● UNG – 120
● R4* – 1800°
● NSC – 1800°

Candidates
● Meteo – 2200°
● CI – 2000°
● ME – 1050°

8 sites
> 18,000 cores
(> 11,000 ARC-active)
> 1 PB disk
> 4 million jobs / year
HPC, GPGPU, chroot
> 80% SLO capacity


5/29

SLING users
● Arnes NREN users
● Cluster owners*
● Projects*
● Individual researchers
● University professors
● Student groups

*not always ARC


6/29

Use Cases
● Particle Physics:
– ATLAS
– Pierre Auger
● Theoretical Physics
● Meteo/Geo Modelling
● Fluid Dynamics
● Reactor Physics Simulations

Pierre Auger Observatory


7/29

Use Cases
● Life Sciences, mostly computational (bio-)chemistry and genomics
– IJS users (biology, chemistry, knowledge technologies)
– Collaboration with EMBL
– Diagnostic genomics
– ELIXIR


8/29

Use Cases
● Knowledge technologies
– Modelling for different fields
– Genetic algorithms
– Big/Web data analysis
– Advanced computational linguistic models
– CLARIN.si


9/29

Steam explosion moment


10/29

Power distribution for Krško NPP reactor

Parallel Monte Carlo simulation of neutron transport, F-8 department
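
As a rough illustration of why Monte Carlo transport work maps well onto a grid, here is a toy sketch, not the F-8 department's code. It estimates neutron transmission through a purely absorbing 1D slab by splitting the histories into independent, separately seeded batches, each of which could run as its own job. The cross-section, the slab thickness, and all function names are assumptions chosen for illustration.

```python
import math
import random


def run_batch(seed, histories, sigma_t, thickness):
    """Count neutrons that cross a purely absorbing slab in one independent batch."""
    rng = random.Random(seed)  # per-batch seed keeps batches independent and reproducible
    transmitted = 0
    for _ in range(histories):
        # Distance to the first collision follows an exponential distribution.
        free_path = rng.expovariate(sigma_t)
        if free_path > thickness:
            transmitted += 1
    return transmitted


def estimate_transmission(batches, histories_per_batch, sigma_t, thickness):
    """Combine the batch tallies; here the batches simply run one after another."""
    total = sum(
        run_batch(seed, histories_per_batch, sigma_t, thickness)
        for seed in range(batches)
    )
    return total / (batches * histories_per_batch)


if __name__ == "__main__":
    sigma_t = 1.0    # assumed total macroscopic cross-section, 1/cm
    thickness = 2.0  # assumed slab thickness, cm
    estimate = estimate_transmission(8, 5000, sigma_t, thickness)
    analytic = math.exp(-sigma_t * thickness)
    print(f"Monte Carlo: {estimate:.4f}  analytic: {analytic:.4f}")
```

Because the batches share no state, only the final tallies need to be collected, which is what makes this class of problem easy to spread over many cores or sites.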


11/29

Innovation?
● batch system
● virtualisation
● network?


12/29

ARC and LRMS (batch system)


13/29

ARC Computing Element


14/29

ARC user accounts


15/29

Mix'n'match...

CERN Agile model, CVMFS, gLite, NorduGrid ARC, SLURM, Torque, OpenStack, KeyStone, VOMS, dCache, Puppet, OpenMP, Globus, science portals, oVirt, OpenNebula, PKI, VRC, Cinder, gFTP, Glance, Salt, Ceph, OpenCL, CUDA


16/29

Software Deployment and Virtualization
● Admin install
● Compile job
● Install job
● Shared disk
● Shared image

● Environment Modules
● Run Time Environments
● CHROOTs
● Containers
● Docker
● Shifter


17/29

Storage
● Basic support
● Short-term / local storage
● Medium-term storage
● Long-term storage


18/29

User-Facing Issues
● Batch / ARC interface / PKI / VOMS
● Software installations and use
● Submission delays, error reporting and debugging
● MPI scalability difficulties
● Understanding of job and cluster topology
● GPGPU use


19/29

Groups and Projects
● Job and task management scalability
● Data management → task managers
● Storage and throughput → hardware and cluster setup
● Opportunistic resource use
● Resource optimization → innovative job models


20/29

ATLAS as an example
● ~100 distributed sites
● 250k cores used all the time
● 200 PB of storage space
● 1M jobs/day
● 2 PB of data is transferred per day between computing sites
● Sites include: WLCG GRID sites, HPCs, Clouds, Volunteer computing
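
To put these figures in perspective, the short calculation below derives a few averages from them. It is only back-of-the-envelope arithmetic on the rounded numbers from the slide, assuming decimal units (1 PB = 10^15 bytes) and a load spread evenly over 24 hours; the names are illustrative.

```python
CORES = 250_000                      # cores in use all the time
JOBS_PER_DAY = 1_000_000             # jobs per day
TRANSFER_BYTES_PER_DAY = 2 * 10**15  # 2 PB moved between sites per day
SECONDS_PER_DAY = 24 * 60 * 60


def atlas_averages():
    """Derive average per-job and per-second figures from the slide's totals."""
    return {
        # Core-hours available per day divided by jobs per day.
        "core_hours_per_job": CORES * 24 / JOBS_PER_DAY,
        # Inter-site traffic averaged over all jobs, in gigabytes.
        "transfer_gb_per_job": TRANSFER_BYTES_PER_DAY / JOBS_PER_DAY / 10**9,
        # Sustained aggregate network rate, in gigabits per second.
        "transfer_gbit_per_s": TRANSFER_BYTES_PER_DAY * 8 / SECONDS_PER_DAY / 10**9,
        # Job throughput the workload management layer has to sustain.
        "jobs_per_second": JOBS_PER_DAY / SECONDS_PER_DAY,
    }


if __name__ == "__main__":
    for name, value in atlas_averages().items():
        print(f"{name}: {value:.1f}")
```

Under these assumptions an average job uses about 6 core-hours and accounts for about 2 GB of inter-site traffic, while the system as a whole sustains roughly 185 Gbit/s and close to 12 job starts per second.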


21/29

aCT: ARC Control Tower
Components:
● Submitter
● Status checker
● Fetcher
● (app verification)
● Cleaner

[Diagram labels: aCT; ARC engine, ARC config, ARC table; App engine, App config, App table; DB (Oracle/MySQL); External job provider; Site 1, Site 2, Site 3 (each with ARC CE and Cluster)]
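
The component list reads like a pipeline of small workers that each move jobs from one state to the next through a shared table. The sketch below illustrates that pattern only; it is not aCT code. The state names, the `jobs` table schema, and the `FakeCluster` stand-in for an ARC CE are all assumptions, and SQLite replaces the Oracle/MySQL database so the example is self-contained.

```python
import sqlite3


class FakeCluster:
    """Hypothetical stand-in for an ARC CE: a job reports FINISHED on its second poll."""

    def __init__(self):
        self.next_id = 0
        self.polls = {}

    def submit(self, description):
        remote_id = f"remote-{self.next_id}"
        self.next_id += 1
        self.polls[remote_id] = 0
        return remote_id

    def status(self, remote_id):
        self.polls[remote_id] += 1
        return "FINISHED" if self.polls[remote_id] > 1 else "RUNNING"

    def fetch(self, remote_id):
        return f"output of {remote_id}"

    def clean(self, remote_id):
        del self.polls[remote_id]


def make_db():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, description TEXT,"
               " state TEXT, remote_id TEXT, output TEXT)")
    return db


def add_job(db, description):
    db.execute("INSERT INTO jobs (description, state) VALUES (?, 'tosubmit')",
               (description,))


def jobs_in(db, state):
    """Return (id, description, remote_id) for every job currently in `state`."""
    return db.execute("SELECT id, description, remote_id FROM jobs WHERE state = ?",
                      (state,)).fetchall()


def submitter(db, cluster):
    for job_id, description, _ in jobs_in(db, "tosubmit"):
        db.execute("UPDATE jobs SET state = 'submitted', remote_id = ? WHERE id = ?",
                   (cluster.submit(description), job_id))


def status_checker(db, cluster):
    for job_id, _, remote_id in jobs_in(db, "submitted"):
        if cluster.status(remote_id) == "FINISHED":
            db.execute("UPDATE jobs SET state = 'finished' WHERE id = ?", (job_id,))


def fetcher(db, cluster):
    for job_id, _, remote_id in jobs_in(db, "finished"):
        db.execute("UPDATE jobs SET state = 'fetched', output = ? WHERE id = ?",
                   (cluster.fetch(remote_id), job_id))


def cleaner(db, cluster):
    for job_id, _, remote_id in jobs_in(db, "fetched"):
        cluster.clean(remote_id)
        db.execute("UPDATE jobs SET state = 'done' WHERE id = ?", (job_id,))


def run_cycle(db, cluster):
    """One pass of all workers; a real service would loop and run them concurrently."""
    for worker in (submitter, status_checker, fetcher, cleaner):
        worker(db, cluster)
    db.commit()


def states(db):
    return [row[0] for row in db.execute("SELECT state FROM jobs ORDER BY id")]
```

Keeping all job state in the table means each worker can be stopped, restarted, or scaled independently, since none of them holds state of its own between passes.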


22/29

Opportunistic Resource Use
● Grid clusters
● HPC clusters
● Private computers
● Public (commercial) cloud
● Microjobs


23/29

ATLAS scaling

2010
Planned data distribution
Jobs go to data
Multi-hop data flows
Poor T2 networking across regions
~20 AOD copies distributed worldwide


24/29

ATLAS scaling

2010
Planned data distribution
Jobs go to data
Multi-hop data flows
Poor T2 networking across regions
~20 AOD copies distributed worldwide

2013
Planned & dynamic data distribution
Jobs go to data & data to free sites
Direct data flows for most T2s
Many T2s connected to 10 Gb/s links
4 AOD copies distributed worldwide
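
The shift from "jobs go to data" to "jobs go to data & data to free sites" can be pictured as a change in the brokering rule. The following is a minimal sketch of that idea, not the actual ATLAS brokering logic: the `Site` structure, the site and dataset names, and the choice of the site with the most free slots are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    name: str
    free_slots: int
    datasets: set = field(default_factory=set)


def broker_2010(sites, dataset):
    """Jobs go to data: only sites that already hold the dataset are eligible."""
    holders = [s for s in sites if dataset in s.datasets and s.free_slots > 0]
    if not holders:
        return None  # the job waits, even if other sites are idle
    return max(holders, key=lambda s: s.free_slots)


def broker_2013(sites, dataset):
    """Jobs go to data, and data goes to a free site when all holders are busy."""
    site = broker_2010(sites, dataset)
    if site is not None:
        return site
    free = [s for s in sites if s.free_slots > 0]
    if not free:
        return None
    target = max(free, key=lambda s: s.free_slots)
    target.datasets.add(dataset)  # models a direct transfer creating a dynamic replica
    return target


if __name__ == "__main__":
    sites = [Site("T2-A", 0, {"AOD.1"}), Site("T2-B", 50), Site("T2-C", 10)]
    print(broker_2010(sites, "AOD.1"))       # no eligible site under the 2010 rule
    print(broker_2013(sites, "AOD.1").name)  # data is replicated to a free site
```

With replicas created on demand like this, fewer copies need to be pre-placed, which fits the drop from ~20 to 4 AOD copies; it relies on the better T2 connectivity listed for 2013.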


25/29

Social Component
● Accessibility beyond large projects
● Long-term funding
● Perception of public clouds
● Not invented here syndrome
● Users with no Unix experience
● Sustainability pressure


26/29

People Involved

Andrej Filipčič, JSI
Barbara Krašovec, Arnes, JSI
Dejan Lesjak, JSI
Janez Srakar, JSI
Jan Jona Javoršek, JSI
+ 4 site administrators

National Initiative: http://www.sling.si/


27/29

Thanks!

Questions?


28/29

New Computing Centre
● 200 m² slightly dislocated
● New network installation
● Water cooling
● Not enough power on-site yet
● Housing Pikolit, NSC, parts of others
● Interesting issues on cost sharing ...


29/29

New Cluster
● Grid + HPC
● GPGPU: 16 x K80
● NorduGrid ARC + SLURM
● Considering EGI
● Users:
– IJS departments
– related research
– supported EU infrastructures

NSC Cluster in Numbers

● ~1800 cores

● ~35 TB scratch

● ~35 TB storage

● ~8 TB RAM
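
For sizing jobs it can help to translate these totals into per-core figures. The few lines below do that arithmetic, assuming the approximate numbers above, decimal units (1 TB = 1000 GB), and an even split across cores, which a real scheduler does not guarantee.

```python
NSC_CORES = 1800       # ~1800 cores
NSC_RAM_TB = 8         # ~8 TB RAM
NSC_SCRATCH_TB = 35    # ~35 TB scratch


def per_core_gb(total_tb, cores):
    """Convert a cluster-wide total in TB into an average share per core in GB."""
    return total_tb * 1000 / cores


if __name__ == "__main__":
    print(f"RAM per core:     {per_core_gb(NSC_RAM_TB, NSC_CORES):.1f} GB")
    print(f"Scratch per core: {per_core_gb(NSC_SCRATCH_TB, NSC_CORES):.1f} GB")
```

This works out to roughly 4.4 GB of RAM and 19.4 GB of scratch space per core.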