Initial Experiences with Deploying Singularity on a Cray XC Supercomputer

Andrew J. Younge, Kevin Pedretti
{ajyoung,ktpedre}@sandia.gov

Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525.

Page 2

Outline

§ Overview of Containers
§ Containers in HPC
§ Why Singularity?
§ HPC containers @ Sandia
  § Trilinos & ATDM apps
  § HPCG
§ Dev-ops Mechanisms
§ Initial Benchmarking
§ Conclusion

Page 3

What are Containers?

§ “An object that can be used to hold or transport something.”
§ A way to package the necessary components of running applications
  § Libraries, software, files, environment settings, etc.
§ OS-level virtualization
  § Relies on 1 OS kernel, aka “chroot on steroids”
  § cgroups for resource isolation, namespaces for process isolation, chroot for filesystem isolation
§ Different than host virtualization
  § Single OS kernel that does all the hard work

Page 4

Containers in Industry

§ Containers are used to create large-scale, loosely coupled services
  § Each container runs 1 user process: “micro-services”
  § 3 httpd containers, 2 DBs, 1 logger, etc.
§ Scaling achieved through load balancers and provisioning
§ Jam many containers onto hosts for increased system utilization
§ Helps with dev-ops issues
  § Same software environment for developing and deploying
  § Only image changes are pushed to production, not a whole new image (CoW)
  § Develop on laptop, push to production servers
  § Interact with GitHub similar to developer code bases
  § Upload images to a “hub” or “repository” from which they can just be pulled and provisioned

Page 5

Container features wanted in HPC

§ Developers prescribe the running software environment
  § “Bring-your-own-environment”
  § Not bound by vendor software delivery
  § Not bound by sysadmin support for additional libraries
  § Developers know best how to run; let users just specify it
§ Easy definition of application compilation & runtime setup
§ Integration with GitHub or other dev environments
§ Could enable better portability between architectures

Page 6

Container features not wanted in HPC

§ Overhead: cannot slow down advanced architecture supercomputers beyond reason
  § Posit: <5% may be OK, any more is a big problem
§ Micro-services support and on-node resource partitioning
  § Don’t need cgroups to slice up individual compute nodes
  § Not running services, but real applications
§ Running as root!
§ Networking aspects can be left out
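
The overhead budget posited on this slide can be turned into a simple check. The following is only an illustrative sketch: the helper names `overhead_pct` and `within_budget` and the sample timings are hypothetical and do not come from the talk. It computes the relative slowdown of a containerized run against a native run and compares it with a budget that defaults to 5%.

```shell
#!/bin/sh
# overhead_pct NATIVE_SECONDS CONTAINER_SECONDS
# Prints the slowdown of the containerized run relative to native, in percent.
overhead_pct() {
    awk -v native="$1" -v container="$2" \
        'BEGIN { printf "%.1f\n", (container - native) * 100 / native }'
}

# within_budget NATIVE_SECONDS CONTAINER_SECONDS [BUDGET_PCT]
# Succeeds (exit status 0) when the overhead is below the budget (default 5).
within_budget() {
    awk -v native="$1" -v container="$2" -v budget="${3:-5}" \
        'BEGIN { exit !((container - native) * 100 / native < budget) }'
}

# Made-up timings for illustration: 100 s native, 104 s in the container.
if within_budget 100 104; then
    echo "overhead $(overhead_pct 100 104)% is within budget"
else
    echo "overhead $(overhead_pct 100 104)% exceeds budget"
fi
```

The same two helpers can be pointed at any pair of wall-clock times or, with the arguments swapped in meaning, at any metric where larger is slower.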

Page 7

Container Vision @ Sandia

§ Support software dev and testing on laptops which create working builds that can run on HPC machines
  § May also leverage VM/binary translation
§ Let developers specify how to build the environment AND the application
  § Users just import the container and run on the target platform
  § Many containers, but can have different code “branches” for arch, compilers, etc.
  § Not bound to vendor and sysadmin versions & release cycles
§ Want all the performance
§ Want to manage permutations of architectures and compilers
  § x86 & KNL, ARM, POWER9, etc.
  § Intel, GCC, LLVM

Page 8

Why Singularity?

§ Singularity is a simple container solution created by LBNL
§ Based on singleton container images
  § Not layered AUFS images or dev-mapper insanity
  § Image sharing & management made easy
§ Provides user namespaces
  § User ajyounge on HPC system maps to ajyounge in container
  § Running as root on HPC resources not allowed!
§ Site filesystems can also be mounted
  § Bring in MPI libs or tuned libraries, etc.
§ Integration with existing scheduling systems
  § Make binaries available on compute nodes
§ No vendor lock-in. Want portable HPC container solution
  § Supported in OpenHPC via 1.3.1 release last week

Page 9

Singularity on Cray XC-series

§ Crays are special machines
  § Cray CNL is a read-only image with tmpfs mounts
  § Lustre or NFS over Cray DVS filesystem
  § Specialized Linux kernel without standard feature sets
§ Had to modify CNL to build in necessary kernel features
  § XC30 runs 3.0.101 kernel (old)
  § Rebuild Cray image with built-in features
    § Loopback device support and EXT3
§ Provision new CNL to interactive nodes and compute nodes
§ Similar to KVM on Cray effort (Related Work)
  § “Enabling Diverse Software Stacks on Supercomputers using High Performance Virtual Clusters”
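
One way to check whether a running kernel exposes the two features named on this slide (loopback devices and EXT3) is to look at /proc. This is a sketch and not the procedure used in the talk: the helper names `has_fs` and `has_loop` are hypothetical, and /proc only lists drivers that are built in or already loaded, so a missing entry may simply mean an unloaded module.

```shell
#!/bin/sh
# has_fs NAME [FILE]
# Succeeds if filesystem NAME is listed in FILE (default: /proc/filesystems).
# Each line there is either "<name>" or "nodev <name>", so compare the last field.
has_fs() {
    awk -v fs="$1" '$NF == fs { found = 1 } END { exit !found }' \
        "${2:-/proc/filesystems}"
}

# has_loop [FILE]
# Succeeds if a driver named "loop" is registered in FILE (default: /proc/devices).
# Entries there look like "  7 loop", so compare the second field.
has_loop() {
    awk '$2 == "loop" { found = 1 } END { exit !found }' \
        "${1:-/proc/devices}"
}

# Report on the current node.
if has_fs ext3; then echo "ext3: available"; else echo "ext3: not listed"; fi
if has_loop; then echo "loop: available"; else echo "loop: not listed"; fi
```

Both helpers accept an alternate file as the last argument, which makes it possible to run them against a copy of the /proc files saved from a compute node.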

Page 10

Container Build #1: Trilinos MueLu

§ Trilinos provides math library packages for many applications of interest @ Sandia
§ Trilinos itself depends on numerous 3rd-party libraries
§ Can condense complex compilation steps down to just a simple Dockerfile
§ Predictable & stable environment across deployments
§ Enables testing across multiple architectures and validation of TPL changes

FROM ajyounge/dev-tpl

WORKDIR /opt/trilinos
# Copy files to image
COPY do-configure /opt/trilinos/
# Download Trilinos source tarball
RUN wget -nv https://trilinos.org/oldsite/download/files/trilinos-12.8.1-Source.tar.gz -O /opt/trilinos/trilinos.tar.gz
# Extract Trilinos source file & load mpi library
RUN tar xf /opt/trilinos/trilinos.tar.gz -C /opt/trilinos/
RUN rm -f /opt/trilinos/trilinos.tar.gz
RUN mv /opt/trilinos/trilinos-12.8.1-Source /opt/trilinos/trilinos
RUN mkdir /opt/trilinos/trilinos-build
RUN module load mpi

# Compile Trilinos
RUN /opt/trilinos/do-configure
RUN cd /opt/trilinos/trilinos-build && make -j 3
# Link in a tutorial directory, and then set the workdir
RUN ln -s /opt/trilinos/trilinos-build/packages/muelu/doc/Tutorial/src /opt/muelu-tutorial
WORKDIR /opt/muelu-tutorial
CMD ["/bin/bash"]

Page 11

Container Build #2: HPCG

§ Straightforward container build for HPCG
§ Use CentOS 7 image, install basic software
§ Install Intel Parallel Studio 2017
  § Silent configuration
  § Pull in site license or use trial
  § Clean up install files (>8 GB)
§ Extract and build HPCG 3.0 with CXX = mpiicpc

FROM centos:7.2.1511
ARG intel_file=parallel_studio_xe_2017_update2
# Dependencies and MPICH
RUN yum update -y && yum groupinstall -y "Development Tools"
RUN yum install -y mpich-3.2 mpich-3.2-devel redhat-lsb

# Intel compiler install
COPY $intel_file.tgz /
RUN tar xvfz /$intel_file.tgz
RUN mkdir -p /opt/intel/licenses
COPY USE_SERVER.lic /opt/intel/licenses/
# Silent configuration installation
COPY silent.cfg /$intel_file/silent.cfg
RUN /$intel_file/install.sh --silent /$intel_file/silent.cfg
RUN echo "source /opt/intel/bin/compilervars.sh intel64" >> /etc/bashrc
RUN rm -rf /$intel_file && rm /$intel_file.tgz

# Build HPCG
COPY hpcg-3.0.tar.gz /opt/
RUN tar xvfz /opt/hpcg-3.0.tar.gz -C /opt/
COPY Make.Linux_intel_mpich /opt/hpcg-3.0/setup/
RUN mkdir -p /opt/hpcg-3.0/Linux_intel_mpich/
WORKDIR /opt/hpcg-3.0/Linux_intel_mpich
RUN ../configure Linux_intel_mpich
RUN /bin/bash -c "source /opt/intel/bin/compilervars.sh intel64 && make"
CMD ["/bin/bash"]


Page 15

Dev-ops Pathway

[Diagram: GitLab Container Registry, Singularity Server, Cray Login Server, Cray CNL, Lustre/NFS]

lap$ docker login gitlab.sandia.gov
lap$ docker build .
lap$ docker tag 0e5574283393 ajyounge/hpcg-container
lap$ docker push ajyounge/hpcg-container:latest

ss$ sudo singularity create -s 12G hpcg-container.img
ss$ sudo singularity import hpcg-container.img docker://gitlab.sandia.gov/ajyounge/hpcg-container:latest

cray$ scp ss:~/hpcg-container.img .
cray$ aprun -n 24 -L 62,63 singularity exec hpcg-container.img ./xhpcg

Page 16

Dev-ops Pathway (new)

[Diagram: GitLab Container Registry, Cray Login Server, Cray CNL, Lustre/NFS]

lap$ docker login gitlab.sandia.gov
lap$ docker build .
lap$ docker tag 0e5574283393 ajyounge/hpcg-container
lap$ docker push ajyounge/hpcg-container:latest

cray$ singularity pull --name hpcg-container.img docker://gitlab.sandia.gov/ajyounge/hpcg-container:latest
cray$ aprun -n 24 -L 62,63 singularity exec hpcg-container.img ./xhpcg

Page 17

Singularity + Cray Interconnect

§ Using Volta testbed: Cray XC30, Ivy Bridge
§ Container using TCP/IP: no changes necessary
  § Use Cray’s IP-over-Aries Ethernet device (ipogif0)
  § Better than 10 Gb Ethernet performance (~32 Gb/s)
§ Intel MPI not optimized for Aries network
§ Bring Cray’s MPI implementation into the container
  § Mount /opt/cray
  § Mount /var/opt/cray
  § Set LD_LIBRARY_PATH accordingly in container

cray$ aprun -n 24 -L63 singularity exec hpcg-container.img /bin/bash -c "export LD_LIBRARY_PATH=/opt/cray/ugni/6.0-1.0502.10863.8.29.ari/lib64:/opt/cray/xpmem/0.1-2.0502.64982.5.3.ari/lib64:/opt/cray/pmi/5.0.11/lib64:/opt/cray/udreg/2.3.2-1.0502.10518.2.17.ari/lib64:/opt/cray/mpt/7.5.1/gni/mpich-intel-abi/16.0/lib:/opt/cray/alps/5.2.4-2.0502.9822.32.1.ari/lib64:/opt/cray/wlm_detect/1.0-1.0502.64649.2.1.ari/lib64:/opt/intel/lib/intel64:$LD_LIBRARY_PATH && /opt/hpcg-3.0/Linux_intel_mpich/bin/xhpcg"
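
The long LD_LIBRARY_PATH in the command above is easier to maintain when it is assembled from a list of directories. The sketch below shows one possible way to do that. It assumes the same library directories that appear in the command (they are version- and site-specific), and `join_paths` and `launch_cmd` are hypothetical helper names. It prints the aprun command instead of running it.

```shell
#!/bin/sh
# join_paths DIR...
# Joins its arguments with ':'. The body runs in a subshell so the
# modified IFS does not leak into the caller.
join_paths() (
    IFS=':'
    printf '%s\n' "$*"
)

# launch_cmd IMAGE LIBPATH BINARY
# Prints (does not run) an aprun command in the style of the slide.
# $LD_LIBRARY_PATH is kept literal in the printed text.
launch_cmd() {
    printf 'aprun -n 24 -L63 singularity exec %s /bin/bash -c "export LD_LIBRARY_PATH=%s:$LD_LIBRARY_PATH && %s"\n' \
        "$1" "$2" "$3"
}

# Library directories copied from the command above.
set -- \
    /opt/cray/ugni/6.0-1.0502.10863.8.29.ari/lib64 \
    /opt/cray/xpmem/0.1-2.0502.64982.5.3.ari/lib64 \
    /opt/cray/pmi/5.0.11/lib64 \
    /opt/cray/udreg/2.3.2-1.0502.10518.2.17.ari/lib64 \
    /opt/cray/mpt/7.5.1/gni/mpich-intel-abi/16.0/lib \
    /opt/cray/alps/5.2.4-2.0502.9822.32.1.ari/lib64 \
    /opt/cray/wlm_detect/1.0-1.0502.64649.2.1.ari/lib64 \
    /opt/intel/lib/intel64

cray_ld_path="$(join_paths "$@")"
launch_cmd hpcg-container.img "$cray_ld_path" /opt/hpcg-3.0/Linux_intel_mpich/bin/xhpcg
```

As in the original command, the double quotes around the `bash -c` argument mean that `$LD_LIBRARY_PATH` is expanded by the shell that launches aprun, not by the shell inside the container.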

Page 18

HPCG Efficiency

Page 19

HPCG Performance Summary

§ Singularity presents near-native runtime performance
  § KVM also good, but has a little more overhead (likely due to Intel MPI)
§ Scaling results TBD, but expect the same
  § KVM scales to 90% of native @ 786 cores; Singularity is expected to do better
  § Using Cray MPI & Aries interconnect is a key feature for getting near-native performance
  § Staying ABI compatible for MPI is mandatory
§ Image deployment is the most likely source of overhead
  § Scalability of mounting loopback images on Lustre/NFS?
  § Read-only helps, but may not solve all problems

Page 20

Conclusion

§ Singularity works on Cray XC-series supercomputers
  § Modifications to CNL necessary
  § Performance is near-native
§ Additional features needed for clean deployment
  § Site-specific ENV variables
  § OverlayFS
§ Performance near-native with HPCG
  § Not surprising
  § Using Cray MPI and ABI compatibility
§ Singularity is ideal for HPC interoperability

Page 21

Future Considerations

§ Container storage at scale
§ How to use other tuned libraries and site-specific software
  § ABI compatibility?
§ Can the HPC community agree on container interoperability?
  § Image formats, manifests, etc.
§ Multi-architecture support
§ Vendor support for laptop development?

Page 22

Thanks!

Page 23

Backup Slides


Page 26

Intra-node communication: Cray MPI vs Intel MPI (KVM)

Page 27

XC30 HPCG KVM Scaling