Making Scientific Software Installation Reproducible On ... ?· Making Scientific Software Installation…

Embed Size (px)

Text of Making Scientific Software Installation Reproducible On ... ?· Making Scientific Software...

  • Making Scientific Software Installation ReproducibleOn Cray Systems Using EasyBuild

    Petar ForaiResearch Institute ofMolecular Pathology

    Dr Bohrgasse 7A-1030 Vienna, Austria

    Kenneth HosteGhent University

    Krijgslaan 281, S9B-9000 Ghent, Belgium

    Guilherme Peretti-PezziSwiss National

    Supercomputing CentreVia Trevano, 131

    6900 Lugano,

    Brett BodeNational Center for

    Supercomputing ApplicationsUniversity of Illinois1205 W. Clark St.Urbana, IL 61801

    ABSTRACTCray provides a tuned and supported OS and programmingenvironment (PE), including compilers and libraries inte-grated with the modules system. While the Cray PE isupdated frequently, tools and libraries not in it quickly be-come outdated. In addition, the amount of tools, librariesand scientific applications that HPC user support teams areexpected to provide support for is increasing significantly.The uniformity of the software environment across Cray sitesmakes it an attractive target for to share this ubiquitousburden, and to collaborate on a common solution.

    EasyBuild is an open-source, community-driven frameworkto automatically and reproducibly install (scientific) soft-ware on HPC systems. This paper presents how EasyBuildhas been integrated with the Cray PE, in order to leveragethe provided optimized components and support an easyway to deploy tools, libraries and scientific applications onCray systems.

    We will discuss the changes that needed to be made to Easy-Build to achieve this, and outline the use case of providingthe Python distribution and accompanying Python packageson top of the Cray PE, to obtain a fully featured batter-ies included optimized Python installation that integrateswith the Cray-provided software stack. In addition, we willoutline how EasyBuild was deployed at the Swiss NationalSupercomputing Centre CSCS and how it is leveraged toobtain a consistent software stack and installation workflowacross multiple Cray (XC, CS-Storm) and non-Cray HPCsystems.

    1. INTRODUCTIONThe job of High-Performance Computing (HPC) supportteams is to enable the productive and efficient use of HPCresources. This ranges from helping out users by resolvingissues like login problems or unexpected software crashes,providing detailed answers to both simple and deeply tech-nical questions, installing requested software libraries, toolsand applications, to providing services such as performanceanalysis and optimization of scientific software being devel-oped.

    One particularly time-consuming task for HPC support teamsis installing (scientific) software. Due to the advanced na-ture of supercomputers (i.e., multiple multi-core processors,high performance network interconnect, need for most re-cent compilers and libraries, etc.), compiling software fromsource on the actual operating system and for the systemarchitecture that it is going to be used on is typically highlypreferred, if not required, as opposed to using readily avail-able binary packages that were built in a generic way.

    Support teams at HPC sites worldwide typically invest largeamounts of time and manpower tackling this tedious andtime consuming task of installing the (scientific) softwaretools, libraries, and applications that researchers are re-questing, while trying to maintain a coherent software stack.

    EasyBuild is a recent software build and installation frame-work that intends to relieve HPC support teams from theubiquitous burden of installing scientific software on HPCsystems. It supports fully automating the (often complex)installation procedure of scientific software, and includes fea-tures specifically targeted towards HPC systems while pro-viding a flexible yet powerful interface. As such, it hasquickly grown to become a platform for collaboration be-tween HPC sites worldwide. We discuss EasyBuild in moredetail in Section 2.

    The time-consuming problem of getting scientific softwareinstalled also presents itself on Cray systems, despite theextensive programming environment that Cray usually pro-vides. Both HPC support teams and end users of Craysystems struggle on a daily basis with getting the requiredtools and applications up and running (let alone doing so

  • in a somewhat optimal way). While a policy for providingsoftware installations in a consistent way to end users is acommon goal, the reality more often than not is that short-cuts are being taken to speed up the process, resulting in asoftware stack that is organised in a suboptimal (and some-times outright confusing) manner, with little consistency inthe software stacks provided on different systems beyondwhat is provided by Cray. Hence, it is clear that EasyBuildcould also be useful on Cray systems.

    Originally, the main target of EasyBuild was the standardGNU/Linux 64-bit x86 system architecture that is common-place on todays HPC systems. On these systems there typ-ically is an abundant lack of a decent (recent) basic softwarestack, i.e., compilers and libraries for MPI, linear algebra,etc., to build scientific applications on top of. EasyBuild wasdesigned to deal with this, by supporting the installationof compilers and accompanying libraries, and through thetoolchain mechanism it employs (see Section 2.2.4). As such,it was not well suited to Cray systems, where a well equippedprogramming environment is provided that is highly recom-mended to be used.

    In this paper, we present the changes that had to made toEasyBuild to provide stable integration with the availableprogramming environment on Cray systems (see Section 3).In addition, we outline the use case of installing Python andseveral common Python libraries using EasyBuild on Craysystems (Section 4), and discuss how EasyBuild deployedat the Swiss National Supercomputing Centre CSCS (Sec-tion 5).

    2. EASYBUILDEasyBuild [1719] is a tool for building and installing scien-tific software on HPC systems, implemented in Python. Itwas originally created by the HPC support team at GhentUniversity (Belgium) in 2009, and was publicly released in2012 under an open-source software license (GPLv2). Soonafter, it was adopted by various HPC sites and an activecommunity formed around it. Today, it is used by institu-tions worldwide that provide HPC services to the scientificresearch community (see Section 2.4).

    2.1 GoalsThe main goal of EasyBuild is to fully automate the taskof building and installing (scientific) software on HPC sys-tems, including taking care of the required dependencies andgenerating environment module files that match the instal-lations. It aims to be an expert system with which all as-pects of software installation procedures can be performedautonomously, adhering to the provided specifications.

    In addition, EasyBuild serves as a platform for collaborationwhere different HPC support teams, who likely apply differ-ent site policies concerning scientific software installations,can efficiently work together to implement best practices andbenefit from each others expertise.

    It aims to support achieving reproducibility in science, byenabling to easilyreproduce software installations that wereperformed previously. It allows sharing of installation recipesin order to enable others to perform particular software in-stallations. Several features are available in EasyBuild tofacilitate that; see also Section 2.3.5.

    EasyBuild intends to be flexible where needed so that HPCsupport teams can implement their own site policies, rangingfrom straightforward aspects like the installation prefix forsoftware and module files, to the module naming scheme

    and the particular compiler and libraries being employed;see also Section 2.3.6.

    2.2 TerminologyBefore going into more detail, we introduce some terminol-ogy specific to EasyBuild that will be used throughout thispaper.

    2.2.1 EasyBuild frameworkThe EasyBuild framework is the core of the tool that pro-vides the functionality that is commonly needed for buildingand installing scientific software on HPC systems. It consistsof a collection of Python modules that implement:

    the eb command line interface

    an abstract software installation procedure, split upinto different steps including configuring, building, test-ing, installing, etc. (see Section 2.3.1)

    functions to perform common tasks like downloadingand unpacking source tarballs, applying patch files, au-tonomously running (interactive) shell commands andcapturing output & exit codes, generating module files,etc.

    an interface to interact with the modules tool, to checkwhich modules are available, to load modules, etc.

    a mechanism to define the environment in which theinstallation will be performed, based on the compiler,libraries and dependencies being used (see also Sec-tion 2.2.4)

    2.2.2 EasyblocksThe implementation of a particular software installation pro-cedure is done in an easyblock, a Python module that defines,extends and/or replaces one or more of the steps of the ab-stract procedure defined by the EasyBuild framework. Easy-blocks leverage the supporting functionality provided by theEasyBuild framework, and can be viewed as plugins.

    A distinction is made between software-specific and genericeasyblocks. Software-specific easyblocks implement a proce-dure that is entirely custom to one particular software pack-age (e.g., OpenFOAM), while generic easyblocks implementa procedure using standard tools (e.g., CMake, make).

    Each easyblock must define the configuration, build and in-stall steps in one way or another; the EasyBuild frameworkleaves these steps purposely unimplemented since their im-plementation heavily depends on the tools being used in theinstallation procedure. Since easyblocks are implemented inan object-oriented scheme, the step methods im