VC3: Virtual Clusters for Community Computation


  • VC3: Virtual Clusters for Community Computation

    Douglas Thain, University of Notre Dame
    Rob Gardner, University of Chicago

    John Hover, Brookhaven National Lab

  • You have developed a large-scale workload that runs successfully on a university cluster.

    Now, you want to migrate and expand that application to national-scale infrastructure. (And allow others to easily access and run similar workloads.)

    Target platforms: a traditional HPC facility, a distributed HTC facility, or a commercial cloud.

  • IceCube Simulation DAG

    Diagram: a Signal Generator and a Background Generator feed four Photon Propagator tasks; their outputs pass through five Detector tasks, then five Filter tasks, and finally a single Cleanup step. The generators and the downstream Detector/Filter/Cleanup stages run on CPUs, while the Photon Propagators run on GPUs.
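
    The DAG's shape can be sketched in a few lines of Python; the task names and fan-out mirror the diagram above, while the edges and the dispatch loop are purely illustrative and not IceCube's production tooling.

    # Illustrative only: the DAG above as a dependency table plus a level-by-level
    # dispatch loop. Task names mirror the slide; actual commands are omitted.
    dag = {"signal": [], "background": []}                       # generators (CPU)
    for p in range(4):
        dag["photon_%d" % p] = ["signal", "background"]          # propagators (GPU)
    for d in range(5):
        dag["detector_%d" % d] = ["photon_%d" % p for p in range(4)]
        dag["filter_%d" % d] = ["detector_%d" % d]
    dag["cleanup"] = ["filter_%d" % d for d in range(5)]         # final CPU step

    def ready(done):
        """Tasks whose dependencies have all completed."""
        return [t for t, deps in dag.items()
                if t not in done and all(d in done for d in deps)]

    done = set()
    while len(done) < len(dag):
        level = ready(done)
        print("dispatch:", sorted(level))   # each level maps to CPU or GPU workers
        done.update(level)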

  • CMS Data Analysis w/Lobster

    Anna Woodard, Matthias Wolf, et al., Scaling Data Intensive Physics Applications to 10k Cores on Non-Dedicated Clusters with Lobster, IEEE Conference on Cluster Computing, September, 2015.

    Diagram: the Lobster master application sits on top of the Work Queue master library, driving a submit/wait loop over local files and programs. Tasks flow from the master through a layer of Foremen to pools of 16-core Workers drawn from non-dedicated resources.

    http://ccl.cse.nd.edu/research/papers/lobster-cluster-2015.pdf
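
    The submit/wait pattern at the heart of this picture comes from the Work Queue master library. A minimal sketch using the CCTools work_queue Python bindings follows; the command, file names, and port are placeholders rather than Lobster's actual configuration.

    # Minimal Work Queue submit/wait loop (requires the CCTools Python bindings).
    # The program, input/output names, and port are placeholders.
    import work_queue as wq

    q = wq.WorkQueue(port=9123)              # workers or foremen connect here
    print("work queue listening on port", q.port)

    for i in range(100):
        t = wq.Task("./analyze input.%d > output.%d" % (i, i))
        t.specify_input_file("analyze")                  # ship the local program
        t.specify_input_file("input.%d" % i)
        t.specify_output_file("output.%d" % i)
        t.specify_cores(1)
        q.submit(t)

    while not q.empty():
        t = q.wait(5)                        # returns a completed task or None
        if t:
            print("task", t.id, "exited with status", t.return_status)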

  • The Perils of Workload Migration

    • Dynamic resource configuration and scaling:
      – # nodes, cores/node, RAM/core, disk, GPUs
    • OS expectations:
      – Ubuntu, Cray, Red Hat, Debian, etc.
    • Software dependencies:
      – Script languages, installed libraries, supporting tools…
    • Online service dependencies:
      – Batch systems, databases, web proxies, …
    • Network accessibility:
      – Addressability, incoming/outgoing, port ranges, protocols…
    • Storage configuration:
      – Local, global, temporary, permanent, home/project/tmp…

  • Can we make HPC more like cloud?

    • User cluster specification:
      – 50-200 nodes of 24 cores and 64GB RAM/node
      – 150GB local disk per node
      – 100TB shared storage space
      – 10Gb outgoing public internet access for data
      – CMS software 8.1.3 and python 2.7
      – Running Condor or Spark or Makeflow . . .

    • Of course, we cannot unilaterally change other computing sites!
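
    Written down concretely, such a request might look like the sketch below; the field names are invented for illustration and are not a fixed VC3 schema.

    # Illustrative cluster specification; field names are hypothetical.
    cluster_spec = {
        "nodes": {"min": 50, "max": 200},
        "cores_per_node": 24,
        "ram_per_node_gb": 64,
        "local_disk_per_node_gb": 150,
        "shared_storage_tb": 100,
        "outgoing_bandwidth_gbps": 10,
        "software": ["cms-8.1.3", "python-2.7"],
        "middleware": "condor",              # or "spark", or "makeflow"
    }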

  • So, that means containers and VMs? Not necessarily.

    VMs and containers are great, and we will use them where needed, but:

    1) Not all sites deploy them.
    2) We want to use native hardware (and software) whenever possible.

  • Concept: Virtual Cluster

    • 200 nodes of 24 cores and 64GB RAM/node
    • 150GB local disk per node
    • 100TB shared storage space
    • 10Gb outgoing public internet access for data
    • CMS software 8.1.3 and python 2.7

    Diagram: a Virtual Cluster Service takes this specification and directs Virtual Cluster Factories to deploy services on a traditional HPC facility, a distributed HTC facility, and a commercial cloud, assembling the pieces into a single virtual cluster.

  • Project Status and Structure

    • Just getting started; funding began June 2016.
    • First milestone for the PI meeting today:
      – A VC across three sites at UC/ND runs IceCube.

    Team:
      – Notre Dame: CSE (Douglas Thain, Ben Tovar), CMS (Kevin Lannon, Michael Hildreth, Kenyi Hurtado), CRC (Paul Brenner)
      – Chicago: Robert Gardner, Lincoln Bryant, Benedikt Riedel
      – Brookhaven: John Hover, Jose Caballero

  • VC3 Architecture

    Diagram: the user portal submits a cluster spec to a VC3 Service Instance, which consults a software catalog and a site catalog ("Create a virtual cluster!"). The service drives a VC3 Pilot Factory that submits pilots to the batch systems of the resource providers. The pilots join a middleware scheduler as middleware (MW) nodes, and the end user accesses the resulting virtual cluster through its head node.

  • Teardown is Critical!

    Diagram: when the user asks the portal to destroy the virtual cluster, the VC3 Service Instance directs the pilot factory to retire its pilots from each provider's batch system and shuts down the middleware scheduler and its MW nodes.

  • Teardown is Critical! (continued)

    Diagram: the same architecture after teardown completes: the pilots and MW nodes are gone, leaving only the service instance, its catalogs, the pilot factory, and the providers' own batch systems.
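
    One way to make teardown hard to forget is to scope the cluster's lifetime, as in the sketch below; the virtual_cluster helper and the service/vc method names are hypothetical, not the VC3 API.

    # Hypothetical sketch: tie the cluster lifetime to a context manager so that
    # pilots, middleware, and allocations are always released, even on failure.
    from contextlib import contextmanager

    @contextmanager
    def virtual_cluster(service, spec):
        vc = service.create(spec)            # portal request -> service instance
        try:
            vc.deploy_pilots()               # factory submits pilots to each provider
            vc.start_middleware()            # e.g., bring up a Condor pool on the pilots
            yield vc
        finally:
            vc.stop_middleware()             # drain the MW nodes
            vc.retire_pilots()               # remove pilots from provider batch systems
            service.destroy(vc)              # release the allocation: teardown is critical

    # Usage sketch:
    # with virtual_cluster(vc3_service, cluster_spec) as vc:
    #     vc.run(workload)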

  • Inherent Challenges

    • Portal -> Service Instance:
      – Reliability, specification, collaboration, discoverability, lifecycle management.
    • Cluster Factory:
      – Configuration, impedance matching, response to outages, right-sizing to the workload, authentication, cost management.
    • Environment Construction:
      – Specification complexity and portability, detection of existing environments, environment sharing, resource consumption.
    • Performance Management:
      – Want small easy, big possible. Matching HW capability to middleware deployment. Environment compatible with manycore, GPU, FPGA.
    • Site Management:
      – Work with the site owners, not against them. Collect relevant configuration data. Make VC deployment transparent to sites.

  • Changing Technology Landscape

    • Resource Management Systems:
      – Condor, PBS, SLURM, Cobalt, UGE, Mesos, ???
    • User Interests in Middleware:
      – Workflows, GlideInWMS, PanDA, Hadoop, Spark, ????
    • Software Deployment Technologies:
      – VMs -> LXC -> Docker -> Singularity -> ???
      – CVMFS, Tarballs, NixOS, Spack, ???
    • Access to Resources:
      – Old way: SSH+Keys. New way: Two-Factor Auth.
    • Our approach:
      – Pick a place to stand, but keep specific technologies at arm's length and be prepared to change.

  • Prototype Implementation

    • Portal -> Service Instance:
      – (under construction)
    • Pilot Job Factory:
      – AutoPyFactory (APF) from BNL
      – SSH/BOSCO to connect to resource providers
    • Pilot Job and Environment Deployment:
      – Local software install via tarballs + PATH. (Groundwerk)
      – Access CVMFS via FUSE or Parrot, whichever is available.
    • User Visible Middleware:
      – Condor batch system (user-level “glide-in”)
    • Application:
      – IceCube data analysis
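
    A rough sketch of the pilot-side bootstrap described above: install software from a tarball, extend PATH, and prefer a FUSE-mounted /cvmfs with Parrot as the fallback. The tarball URL, repository path, and application command are placeholders, not the actual VC3 pilot.

    # Sketch of a pilot-side environment bootstrap; URLs and commands are placeholders.
    import os, subprocess, tarfile, urllib.request

    sandbox = os.path.abspath("vc3-sandbox")
    os.makedirs(sandbox, exist_ok=True)

    # 1. Local software install via tarball + PATH.
    tarball = os.path.join(sandbox, "env.tar.gz")
    urllib.request.urlretrieve("https://example.org/vc3/env.tar.gz", tarball)
    with tarfile.open(tarball) as tf:
        tf.extractall(sandbox)
    os.environ["PATH"] = os.path.join(sandbox, "bin") + os.pathsep + os.environ["PATH"]

    # 2. Use CVMFS via FUSE if the site mounts it; otherwise intercept with Parrot.
    job = ["run_icecube_task"]                              # placeholder command
    if os.path.isdir("/cvmfs/icecube.opensciencegrid.org"): # FUSE mount present
        cmd = job
    else:
        cmd = ["parrot_run"] + job                          # user-level interception

    subprocess.run(cmd, check=True)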

  • Key Idea:

    Specify requirements in the abstract. Deliver requirements by matching or creating, or both.*

    * (Only works if you can characterize requirements very accurately.)
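
    A toy version of "deliver by matching or creating": compare the abstract requirements against a site-catalog entry and report what the site satisfies natively versus what the pilot must create. The field names and values are invented for illustration.

    # Toy "match or create" planner; catalog fields are illustrative only.
    requirements = {"cores_per_node": 24, "ram_per_node_gb": 64,
                    "local_disk_gb": 150, "software": {"cms-8.1.3", "python-2.7"}}

    site = {"cores_per_node": 28, "ram_per_node_gb": 128,
            "local_disk_gb": 100, "software": {"python-2.7"}}

    def plan(req, site):
        matched, create = [], []
        for key, want in req.items():
            have = site.get(key)
            if isinstance(want, set):                   # software: install whatever is missing
                missing = want - (have or set())
                (create if missing else matched).append((key, missing or have))
            elif have is not None and have >= want:     # capacities: site must meet or exceed
                matched.append((key, have))
            else:
                create.append((key, want))
        return matched, create

    matched, create = plan(requirements, site)
    print("matched by site:", matched)
    print("must be created:", create)

    As the footnote warns, such a plan is only as good as the requirement characterization: anything missing from the requirements is silently assumed to match.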

  • Tour of First Milestone Prototype:

    • Application: IceCube Simulation
    • Environment Creation: VC3-Pilot
    • Cluster Factory: AutoPyFactory

  • IceCube Software and Jobs

    • Experiment-specific software stack:
      – Dependencies not typical for a particle physics experiment: Boost, HDF5, SuiteSparse, cfitsio, etc.
      – Distributed mostly through the CVMFS global filesystem now; tarballs are still used in edge cases, and containers are an issue.
      – Moving to shipping a C++11-compliant environment (own compiler, etc.)
    • Heavily invested in GPU accelerators.
    • Average job: 2-4 GB RAM, 10 GB disk, 2 hour wall time.
    • Tail-end job: 6+ GB RAM, 100 GB disk, 10s to 100s of hours.
    • Need to record all details about a job, forever: job configuration, where did it run, resource usage, efficiency, etc.
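
    The last requirement, recording every job's details permanently, can be sketched as an append-only log of JSON records; the fields follow the list above, but the format itself is illustrative.

    # Illustrative append-only job record: configuration, where it ran,
    # resource usage, and a derived CPU efficiency.
    import json, time

    def record_job(logfile, job_id, site, config, usage):
        entry = {
            "job_id": job_id,
            "timestamp": time.time(),
            "site": site,
            "config": config,
            "usage": usage,
            "cpu_efficiency": usage["cpu_time_s"] / (usage["cores"] * usage["wall_time_s"]),
        }
        with open(logfile, "a") as f:        # append-only: records are never rewritten
            f.write(json.dumps(entry) + "\n")

    record_job("jobs.log", "sim-00042", "nd-crc",
               config={"dataset": "example", "gpu": False},
               usage={"cores": 1, "ram_gb": 3.2, "disk_gb": 9.5,
                      "wall_time_s": 7200, "cpu_time_s": 6800})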

  • CVMFS Global Filesystem

    Diagram: software releases (e.g., the CMS software stack: 967 GB, 31M files) are built into content-addressable storage (CAS) published by a www server. A HEP task accesses the repository through the CVMFS driver, mounted via FUSE or intercepted by Parrot; metadata and data objects are fetched with HTTP GET through a hierarchy of squid proxies and kept in a local CAS cache.

    http://cernvm.cern.ch/portal/filesystem
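
    The content-addressable access pattern can be sketched as follows: each object is named by its hash, fetched with an HTTP GET through a site proxy, and kept in a local cache so repeated reads stay off the network. The repository URL, proxy address, and object layout below are placeholders, not the actual CVMFS wire format.

    # Sketch of content-addressable retrieval in the spirit of CVMFS; the URL
    # layout, proxy, and hash choice are placeholders.
    import os, hashlib, urllib.request

    REPO  = "http://cvmfs.example.org/data"      # placeholder repository URL
    PROXY = {"http": "http://squid.local:3128"}  # assumed site-local squid proxy
    CACHE = "cas-cache"

    opener = urllib.request.build_opener(urllib.request.ProxyHandler(PROXY))

    def fetch(object_hash):
        """Return the bytes of a content-addressed object, using the local cache."""
        path = os.path.join(CACHE, object_hash)
        if os.path.exists(path):                 # cache hit: no network traffic
            with open(path, "rb") as f:
                return f.read()
        url = "%s/%s/%s" % (REPO, object_hash[:2], object_hash[2:])
        data = opener.open(url).read()           # HTTP GET through the proxy
        assert hashlib.sha1(data).hexdigest() == object_hash, "content does not match its name"
        os.makedirs(CACHE, exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)
        return data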

  • CVMFS + HPC Challenges

    • Need disk local to node (ideal) or site (ok) for local cache management. (Project: RAM $$$)
    • Need FUSE (ideal) to mount FS, otherwise use Parrot (ok) for user level interception.
    • Must have a local HTTP proxy, otherwise CVMFS becomes a denial of service attack.
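
    These constraints can be checked up front on each node before a pilot relies on CVMFS; the sketch below uses illustrative thresholds, paths, and environment variables.

    # Quick node-level checks for the constraints above: local cache space,
    # a FUSE-mounted /cvmfs (or Parrot as fallback), and a site HTTP proxy.
    import os, shutil

    def check_node(cache_dir="/tmp", min_cache_gb=20):
        free_gb = shutil.disk_usage(cache_dir).free / 1e9
        return {
            "cache":  free_gb >= min_cache_gb,                  # room for a local CVMFS cache
            "fuse":   os.path.isdir("/cvmfs"),                  # site already mounts CVMFS via FUSE
            "parrot": shutil.which("parrot_run") is not None,   # user-level fallback available
            "proxy":  bool(os.environ.get("http_proxy")),       # needed to avoid hammering the server
        }

    print(check_node())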