VC3: Virtual Clusters for Community Computation
Douglas Thain, University of Notre Dame
Rob Gardner, University of Chicago
John Hover, Brookhaven National Lab


Page 1:

VC3: Virtual Clusters for Community Computation

Douglas Thain, University of Notre Dame
Rob Gardner, University of Chicago

John Hover, Brookhaven National Lab

Page 2:

You have developed a large-scale workload which runs successfully on a university cluster.

Now you want to migrate and expand that application to national-scale infrastructure, and allow others to easily access and run similar workloads:

traditional HPC facilities, distributed HTC facilities, and commercial clouds.

Page 3:

IceCube Simulation DAG

[Figure: the simulation DAG begins with a Signal Generator and a Background Generator (CPU), fans out to Photon Propagator tasks (GPU), runs Detector and Filter stages for each branch (CPU), and ends with a Cleanup step (CPU).]

Page 4:

CMS Data Analysis w/Lobster

Anna Woodard, Matthias Wolf, et al., "Scaling Data Intensive Physics Applications to 10k Cores on Non-Dedicated Clusters with Lobster," IEEE Conference on Cluster Computing, September 2015.

[Figure: the Lobster master application, built on the Work Queue master library, submits tasks and waits for results; a layer of foremen distributes work to pools of 16-core workers at purchased ($$$) resources, staging local files and programs out to the workers.]

Page 5:

The Perils of Workload Migration
• Dynamic resource configuration and scaling: # nodes, cores/node, RAM/core, disk, GPUs.
• OS expectations: Ubuntu, Cray, Red Hat, Debian, etc.
• Software dependencies: scripting languages, installed libraries, supporting tools...
• Online service dependencies: batch systems, databases, web proxies, ...
• Network accessibility: addressability, incoming/outgoing, port ranges, protocols...
• Storage configuration: local, global, temporary, permanent, home/project/tmp...

Page 6:

Can we make HPC more like the cloud?
• User cluster specification (a sketch of one possible encoding follows below):
  – 50-200 nodes of 24 cores and 64 GB RAM/node
  – 150 GB local disk per node
  – 100 TB shared storage space
  – 10 Gb outgoing public internet access for data
  – CMS software 8.1.3 and Python 2.7
  – Running Condor or Spark or Makeflow...
• Of course, we cannot unilaterally change other computing sites!
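As an illustration only, such a specification could be captured in a small declarative structure that a service validates before provisioning. The field names below are hypothetical, not the actual VC3 schema:

# Hypothetical cluster specification; field names are illustrative, not the real VC3 schema.
cluster_spec = {
    "nodes": {"min": 50, "max": 200},
    "cores_per_node": 24,
    "ram_per_node_gb": 64,
    "local_disk_gb": 150,
    "shared_storage_tb": 100,
    "outgoing_bandwidth_gbps": 10,
    "software": ["cmssw-8.1.3", "python-2.7"],
    "middleware": "condor",   # or "spark", "makeflow", ...
}

def validate(spec):
    # Basic sanity checks before handing the spec to a provisioning service.
    assert 0 < spec["nodes"]["min"] <= spec["nodes"]["max"]
    assert spec["cores_per_node"] > 0 and spec["ram_per_node_gb"] > 0
    return spec

validate(cluster_spec)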

Page 7:

So, does that mean containers and VMs? Not necessarily.

VMs and containers are great, and we will use them where needed, but:

1) Not all sites deploy them.
2) We want to use native hardware (and software) whenever possible.

Page 8:

Concept: Virtual Cluster
• 200 nodes of 24 cores and 64 GB RAM/node
• 150 GB local disk per node
• 100 TB shared storage space
• 10 Gb outgoing public internet access for data
• CMS software 8.1.3 and Python 2.7

[Figure: a Virtual Cluster Service deploys a Virtual Cluster Factory at each target (a traditional HPC facility, a distributed HTC facility, and a commercial cloud); each factory deploys the services that together form the virtual cluster.]

Page 9:

Page 10:

Project Status and Structure
• Just getting started; funding began June 2016.
• First milestone for the PI meeting today: a VC across three sites at UC/ND runs IceCube.

CSE: Douglas Thain, Ben Tovar
CMS: Kevin Lannon, Michael Hildreth, Kenyi Hurtado
CRC: Paul Brenner
University of Chicago: Robert Gardner, Lincoln Bryant, Benedikt Riedel
Brookhaven National Lab: John Hover, Jose Caballero

Page 11:

VC3 Architecture

[Figure: "Create a virtual cluster!" The user submits a cluster spec through the User Portal to a VC3 Service Instance, which consults a Software Catalog and a Site Catalog. The VC3 Pilot Factory submits pilots to the batch systems of several resource providers; the pilots join the middleware scheduler as middleware (MW) nodes, and the end user accesses the VC head node.]
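A sketch of the create/destroy lifecycle the figure implies, with stand-in Provider and Pilot classes; the class and method names are hypothetical, since these slides do not show VC3's actual service API:

class Pilot:
    """Stand-in for a pilot job submitted to a provider's batch system."""
    def __init__(self, provider, spec):
        self.provider, self.spec, self.alive = provider, spec, True
    def cancel(self):
        self.alive = False

class Provider:
    """Stand-in for a resource provider (submission via SSH/BOSCO in the prototype)."""
    def __init__(self, name):
        self.name = name
    def submit_pilot(self, spec):
        return Pilot(self, spec)

class VirtualCluster:
    def __init__(self, spec, providers):
        self.spec = spec            # the user's cluster specification
        self.providers = providers  # chosen from the site catalog
        self.pilots = []

    def create(self):
        # The pilot factory submits pilots to each provider; each pilot starts a
        # middleware node that joins the cluster's scheduler.
        per_site = self.spec["nodes"]["min"] // len(self.providers)
        for provider in self.providers:
            self.pilots += [provider.submit_pilot(self.spec) for _ in range(per_site)]

    def destroy(self):
        # Teardown is critical: withdraw every pilot so nothing keeps running at the sites.
        for pilot in self.pilots:
            pilot.cancel()
        self.pilots = []

vc = VirtualCluster({"nodes": {"min": 6, "max": 12}},
                    [Provider("hpc"), Provider("htc"), Provider("cloud")])
vc.create()
vc.destroy()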

Page 12:

Teardown is Critical!

[Figure: "Destroy my virtual cluster!" The same architecture processes a teardown request from the User Portal: the VC3 Service Instance directs the pilot factory to withdraw the pilots and middleware nodes from each resource provider.]

Page 13:

Teardown is Critical!

[Figure: after teardown, the pilots and middleware nodes are gone; only the User Portal, the VC3 Service Instance with its cluster spec, middleware scheduler, and catalogs, and the untouched resource providers remain.]

Page 14:

Inherent Challenges
• Portal -> Service Instance: reliability, specification, collaboration, discoverability, lifecycle management.
• Cluster Factory: configuration, impedance matching, response to outages, right-sizing to the workload, authentication, cost management.
• Environment Construction: specification complexity and portability, detection of existing environments, environment sharing, resource consumption.
• Performance Management: want small easy, big possible. Matching HW capability to middleware deployment. Environments compatible with manycore, GPU, FPGA.
• Site Management: work with the site owners, not against them. Collect relevant configuration data. Make VC deployment transparent to sites.

Page 15:

Changing Technology Landscape
• Resource Management Systems: Condor, PBS, SLURM, Cobalt, UGE, Mesos, ???
• User Interests in Middleware: workflows, GlideinWMS, PanDA, Hadoop, Spark, ???
• Software Deployment Technologies: VMs -> LXC -> Docker -> Singularity -> ???; CVMFS, tarballs, NixOS, Spack, ???
• Access to Resources: old way: SSH + keys; new way: two-factor auth.
• Our approach: pick a place to stand, but keep specific technologies at arm's length and be prepared to change.

Page 16:

Prototype Implementation
• Portal -> Service Instance: (under construction)
• Pilot Job Factory: AutoPyFactory (APF) from BNL; SSH/BOSCO to connect to resource providers.
• Pilot Job and Environment Deployment: local software installs via tarballs + PATH (Groundwerk); access CVMFS via FUSE or Parrot, whichever is available.
• User-Visible Middleware: Condor batch system (user-level "glide-in").
• Application: IceCube data analysis.

Page 17:

Key Idea:

Specify requirements in the abstract. Deliver requirements by matching or creating, or both.*

* (this only works if you can characterize requirements very accurately)

Page 18:

Tour of First Milestone Prototype:

• Application (IceCube simulation)
• Environment Creation (VC3-Pilot)
• Cluster Factory (AutoPyFactory)

Page 19:

Page 20:

IceCube Software and Jobs
• Experiment-specific software stack:
  – Dependencies unusual for a particle physics experiment: Boost, HDF5, SuiteSparse, cfitsio, etc.
  – Distributed mostly through the CVMFS global filesystem now; tarballs are still used in edge cases; containers are an issue.
  – Moving to shipping a C++11-compliant environment (own compiler, etc.).
• Heavily invested in GPU accelerators.
• Average job: 2-4 GB RAM, 10 GB disk, 2 hours of wall time.
• Tail-end job: 6+ GB RAM, 100 GB disk, 10s to 100s of hours.
• Need to record all details about a job, forever: job configuration, where it ran, resource usage, efficiency, etc.

Page 21:

CVMFS Global Filesystem

[Figure: software is built into content-addressable storage (CAS) and published through a www server; an HEP task accesses /cvmfs through the CVMFS driver under Parrot or FUSE, which fetches metadata and data with HTTP GETs through a hierarchy of squid proxies into a local CAS cache. Example scale: the CMS software repository is 967 GB in 31M files.]

http://cernvm.cern.ch/portal/filesystem

Page 22:

CVMFS + HPC Challenges
• Need disk local to the node (ideal) or site (ok) for local cache management. (Project: RAM $$$)
• Need FUSE (ideal) to mount the filesystem; otherwise use Parrot (ok) for user-level interception.
• Must have a local HTTP proxy; otherwise CVMFS becomes a denial-of-service attack.
• Site operators dislike blocking CPU for data.
• CVMFS itself has dependencies to install!

Jakob Blomer, Predrag Buncic, Rene Meusel, Gerardo Ganis, Igor Sfiligoi, and Douglas Thain, "The Evolution of Global Scale Filesystems for Scientific Software Distribution," IEEE/AIP Computing in Science and Engineering, 17(6), pages 61-71, December 2015.

Page 23:

Delivering Dependencies with VC3-Pilot

vc3-pilot --require python 2.7.12 icecube-sim input.dat

• Query the current environment.
• Install missing pieces (recursively) in /home.
• Run the program with a modified PATH.

[Figure: three resource providers natively offer Python 2.6, 2.7, and 3.0; where Python 2.7 is missing, the pilot installs it before running the task, and where it is already present the task uses the native installation.]
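The third step, running the program with a modified PATH, is simple enough to sketch directly; the prefix location and helper name below are assumptions, not the pilot's actual code:

import os
import subprocess
import sys

def run_with_prefix(prefix, command):
    # Put the user-level install area ahead of the system paths, then run the command.
    env = dict(os.environ)
    env["PATH"] = os.path.join(prefix, "bin") + ":" + env.get("PATH", "")
    env["LD_LIBRARY_PATH"] = os.path.join(prefix, "lib") + ":" + env.get("LD_LIBRARY_PATH", "")
    return subprocess.call(command, env=env)

if __name__ == "__main__":
    # e.g.  python run_with_prefix.py icecube-sim input.dat
    sys.exit(run_with_prefix(os.path.expanduser("~/.vc3"), sys.argv[1:]))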

Page 24:

"python":[ { "version":"v2.7.12", "versioncmd":"python --version", "versionreg":"Python ([0-9.]*).*", "sources":[ { "type":"tarball", "files":[ "Python-2.7.12.tgz" ], "recipe":[ "./configure --prefix=${VC3_PREFIX} --libdir=${VC3_PREFIX}/lib", "make", "make install", "ln -s ${VC3_PREFIX}/bin/pydoc{,2}" ] } ], "environment_variables":[ { "name":"PATH", "value":"bin" },

Recipes Define Environments: each recipe captures the data dependencies, setup instructions, environment setup, and app definition.
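One plausible way a pilot could act on a recipe entry like the one above; this is a sketch under assumed semantics for versioncmd, versionreg, recipe, and environment_variables, not the actual vc3-pilot implementation, and it assumes the source tarball has already been fetched and unpacked:

import os
import re
import subprocess

def already_satisfied(spec):
    # Run the recipe's versioncmd and compare its output against versionreg.
    try:
        out = subprocess.check_output(spec["versioncmd"].split(),
                                      stderr=subprocess.STDOUT).decode()
    except (OSError, subprocess.CalledProcessError):
        return False
    m = re.search(spec["versionreg"], out)
    return bool(m) and m.group(1) == spec["version"].lstrip("v")

def build_from_recipe(spec, prefix):
    # Execute each recipe step with VC3_PREFIX pointing at the user-level install area.
    env = dict(os.environ, VC3_PREFIX=prefix)
    for step in spec["sources"][0]["recipe"]:
        subprocess.check_call(step, shell=True, env=env)

def ensure(spec, prefix=os.path.expanduser("~/.vc3")):
    if not already_satisfied(spec):
        build_from_recipe(spec, prefix)
    # Apply the recipe's environment_variables relative to the install prefix.
    for var in spec.get("environment_variables", []):
        entry = os.path.join(prefix, var["value"])
        os.environ[var["name"]] = entry + ":" + os.environ.get(var["name"], "")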

Page 25:

CVMFS Deployment via VC3-Pilot

vc3-pilot --require cvmfs icecube-sim input.dat

• Search for existing services.
• Download dependent software.
• Deploy using Parrot (user-level VM) if necessary.

[Figure: at providers with a FUSE-mounted /cvmfs the task uses it directly; at a provider without FUSE the pilot runs the task under Parrot to provide /cvmfs at user level.]
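A minimal sketch of that decision, assuming the pilot has already installed parrot_run from CCTools; real site-specific Parrot and proxy options are omitted:

import os
import subprocess

def cvmfs_mounted(repo="cms.cern.ch"):
    # If the site already provides CVMFS via FUSE, the repository is visible directly.
    return os.path.isdir(os.path.join("/cvmfs", repo))

def run_task(command):
    # Use the native mount when present, otherwise intercept filesystem calls with Parrot.
    if cvmfs_mounted():
        return subprocess.call(command)
    return subprocess.call(["parrot_run"] + command)

# Example: run_task(["stat", "/cvmfs/cms.cern.ch"])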

Page 26:

CVMFS Deployment via VC3-Pilot
Set up the software environment required for scientific applications:

% stat /cvmfs/cms.cern.ch
stat: cannot stat '/cvmfs/cms.cern.ch': No such file or directory
% ./vc3-pilot --require cvmfs -- stat /cvmfs/cms.cern.ch
  File: '/cvmfs/cms.cern.ch'
  Size: 4096   Blocks: 9   IO Block: 65336 ...

Page 27:

IceCube demo dependencies according to the pilot (host already has CVMFS).

Page 28:

IceCube demo dependencies according to the pilot (host does not have CVMFS).

Page 29:

The MAKER Genomics Pipeline
http://www.yandell-lab.org/software/maker.html

vc3-pilot --require maker maker -BIO

A custom Docker container on Jetstream took weeks to install the pieces by hand. Converted to vc3-pilot, it was successfully ported to Stampede in a single automated install.

Page 30:

AutoPyFactory from BNL
• Primary concern is intelligently, efficiently, and deterministically scaling overlay submission to the WMS workload, based on policy: how many pilots to submit, combining info from multiple sources?
• Chainable scheduler logic plugins allow "algorithms via config file".
• Single-process, multi-threaded, no-database, object-oriented Python daemon, resulting in high reliability/stability.
• "Everything is a plugin" architecture makes new usage easy/safe.
• Leverages the developer effort, infrastructure, scalability, resource targets, authorization mechanisms, and common interface (everything is a job) of the HTCondor project, all of which would otherwise need to be custom-coded.
  – The Condor-G interface submits any executable, with job resource requirements (memory, disk, core count, walltime, etc.) if specified by the WMS queue.
  – Scalability and speed allow rapid submission.

Page 31:

Page 32:

Page 33:

Page 34:

Submission Policies
The current APF supports a demand-driven policy: the logic is driven by how much idle work is waiting. A separate APF queue handles each demand level, e.g.:

[low-demand]
sched.ready.offset = 0
sched.maxtorun.maximum = 1000

[medium-demand]
sched.ready.offset = 1000
sched.maxtorun.maximum = 500
sched.scale.factor = 0.10

[high-demand]
sched.ready.offset = 6000
sched.maxtorun.maximum = 100
sched.scale.factor = 0.01
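For intuition only, here is a schematic reading of how those settings might combine; this is not APF's actual plugin code, just an illustration of chained offset, scale, and maximum logic:

def pilots_to_submit(idle, running, offset, scale=1.0, max_to_run=None):
    # Ready: count idle jobs beyond this queue's demand offset.
    n = max(idle - offset, 0)
    # Scale: submit only a fraction of that demand.
    n = int(n * scale)
    # MaxToRun: keep running + new submissions under the configured ceiling.
    if max_to_run is not None:
        n = min(n, max(max_to_run - running, 0))
    return n

# Example with the medium-demand settings above: 3000 idle jobs, 200 pilots running.
print(pilots_to_submit(idle=3000, running=200, offset=1000, scale=0.10, max_to_run=500))  # -> 200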

Page 35:

Page 36:

Putting it All Together
• VC created by AutoPyFactory on:
  – UC ATLAS Tier-3 running Condor + FUSE
  – UC OSG Testbed running PBS without FUSE
  – ND CRC cluster running SGE without FUSE
• Payload:
  – VC3-Pilot deploys dependencies and mounts CVMFS.
  – IceCube data analysis task.
• Truth in advertising:
  – GPU detection/configuration
  – Web proxy discovery

Page 37:

Demo Time

Deploy CVMFS on the fly: vc3-pilot --require cvmfs
https://asciinema.org/a/40j5dnd6m67yog3y4qa4tw957

Deploy MAKER on the fly: vc3-pilot --require maker-ecoli-example-01
https://asciinema.org/a/4qzmcrpmrzssxen6s1knkgw86

VC jobs running at UC via "glide in":
http://asciinema.org/a/7a9ku2k4z3ujtnr1v4cjo6mq3

VC jobs running at ND via "hobble in":
https://asciinema.org/a/84798

Page 38:

Much More to Do...
• User Portal and Dynamic Service Instances
• Scale and Dynamic Behavior
  – New problems with each order of magnitude.
  – Manage 10K+ cores on demand!
• Fitting into the Ecosystem
  – Work with sysadmins to reconcile user flexibility with respect for local configuration and policy.
• Deployment
  – Per-site configuration, rather than per-job.
  – Better exploit existing packages/tools?
  – How to discover/deploy new services?
• Applications
  – Starting with LHC: CMS, ATLAS; HEP: IceCube.
  – Show generality with bio: MAKER, AWE, LifeMapper.

Page 39:

http://virtualclusters.org