OpenNebula TechDay Boston 2015

HA HPC with OpenNebula
Eliot Eshelman - Microway

2015-06-29

HPC - what is it good for?

Lattice Quantum ChromoDynamics (QCD)
RBC/UKQCD collaboration; Research Team: Dirk Broemmel, Thomas Rae, Ben Samways; Investigators: Jonathan Flynn

Physics

HPC - what is it good for?

Tech-X VORPAL for the DOE and NNSA

Physics

HPC - what is it good for?

Astrophysics
Simulation of a supernova
Courtesy of Oak Ridge National Laboratory, U.S. Dept. of Energy

HPC - what is it good for?

Planetary Science
WRF 0.5km simulation of Hurricane Sandy
NCAR CISL VAPOR visualizations

HPC - what is it good for?

Life Science
NAMD & GROMACS; visualized with VMD

HPC - what is it good for?

https://www.nersc.gov/assets/Trinity--NERSC-8-RFP/Documents/NERSCWorkloadAnalysisFeb2013.pdf

All Science!

HPC - what is it good for?

Engineering:
● FEA
● CFD
● Multi-Physics

ALTAIR AcuSolve

HPC - what is it good for?

Machine Learning
NVIDIA DIGITS with Caffe from UC Berkeley

HPC - what is it good for?

Big Data

First: a discussion of scale

What types of HPC systems do we design?
● up to ~512 nodes
● budgets of $50K to $3M
Most leadership-class HPC sites use similar designs, but source from the big vendors.

10,000-foot view of an HPC cluster

HPC clusters ready to ship

Microway's Test Drive cluster

● Owned and maintained by Microway
● Used by customers for benchmarking
● Used by employees for testing, replicating customer issues & software development
● Not actually mission-critical, but designed to emulate those that are...

The Hardware

● (3) OpenNebula hosts
● (4) Parallel storage servers
● (6) Bare-metal CPU + GPU compute nodes

● Gigabit Ethernet
● 56Gbps FDR InfiniBand
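For context, registering hypervisor hosts like these with an OpenNebula front-end is a handful of XML-RPC calls (the onehost create CLI wraps the same API). A minimal Python sketch, assuming the OpenNebula 4.x-era one.host.allocate signature; the hostnames, endpoint, and credentials are placeholders, not the actual Test Drive configuration:

    import xmlrpc.client

    # Assumed front-end endpoint and credentials -- placeholders, not
    # the real Test Drive values.
    server = xmlrpc.client.ServerProxy("http://frontend:2633/RPC2")
    session = "oneadmin:opennebula"

    # Register each hypervisor with the KVM drivers. The MAD names
    # ("kvm", "kvm", "dummy") and the default cluster ID (-1) follow
    # a stock OpenNebula 4.x oned.conf.
    for hostname in ["nebula01", "nebula02", "nebula03"]:
        rc = server.one.host.allocate(session, hostname,
                                      "kvm", "kvm", "dummy", -1)
        if rc[0]:
            print("%s registered as host id %d" % (hostname, rc[1]))
        else:
            print("%s failed: %s" % (hostname, rc[1]))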

Physical Network Topology

...

Logical Infrastructure

HPC Cluster Services

Compute Nodes

● Remaining bare metal for now
○ Virtualizing GPUs has caveats

● Virtualizing the nodes does give a lot more flexibility to the admins and the users
○ HPC users have very specific software needs
○ VMs can enable reproducibility (see the template sketch below)
○ Some sites are trying out containers (Docker)
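To illustrate the reproducibility point: a compute node's entire definition can live in an OpenNebula template backed by a golden image, so every instance boots the same pinned software stack. A minimal sketch using one.template.allocate; the image and network names are hypothetical:

    import xmlrpc.client

    server = xmlrpc.client.ServerProxy("http://frontend:2633/RPC2")
    session = "oneadmin:opennebula"

    # A minimal compute-node definition. Image and network names are
    # hypothetical; the golden image is what pins the software stack.
    template = """
    NAME   = "hpc-compute-node"
    VCPU   = 16
    CPU    = 16
    MEMORY = 65536
    DISK   = [ IMAGE = "centos7-hpc-golden" ]
    NIC    = [ NETWORK = "cluster-net" ]
    """

    rc = server.one.template.allocate(session, template)
    if rc[0]:
        print("template registered with id %d" % rc[1])
    else:
        print("error: %s" % rc[1])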

End Goal

● Each employee/customer can be assigned their own private HPC cluster

● Multiple cluster instances (instantiated as sketched below) for:
○ Development
○ QA
○ Production
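Spinning up those per-environment instances then becomes a loop over one.template.instantiate. A sketch, assuming the template registered above got a hypothetical ID of 42; the VM names are made up:

    import xmlrpc.client

    server = xmlrpc.client.ServerProxy("http://frontend:2633/RPC2")
    session = "oneadmin:opennebula"
    TEMPLATE_ID = 42   # hypothetical: the template's ID from above

    # One instance per environment; a full private cluster would also
    # instantiate head-node and storage VMs, but the pattern is the same.
    for env in ["dev", "qa", "prod"]:
        rc = server.one.template.instantiate(session, TEMPLATE_ID,
                                             "testdrive-%s" % env, False, "")
        if rc[0]:
            print("%s: vm id %d" % (env, rc[1]))
        else:
            print("%s: error: %s" % (env, rc[1]))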

What we gain

Flexibility:
● Easy backups (snapshot sketch below)
● Easy restores
● Easy upgrades
● Easy rollbacks
● Faster software development

Customer sees:
● Better uptime
● Quicker upgrades
● Fewer bugs
● Better performance
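The "easy backups" and "easy rollbacks" items map onto OpenNebula's snapshot calls. A sketch using one.vm.snapshotcreate, with a hypothetical VM ID:

    import xmlrpc.client

    server = xmlrpc.client.ServerProxy("http://frontend:2633/RPC2")
    session = "oneadmin:opennebula"
    VM_ID = 7   # hypothetical: the VM to protect before an upgrade

    # Take a system snapshot now; if the upgrade goes badly, roll back
    # with one.vm.snapshotrevert(session, VM_ID, snapshot_id).
    rc = server.one.vm.snapshotcreate(session, VM_ID, "pre-upgrade")
    if rc[0]:
        print("snapshot created with id %d" % rc[1])
    else:
        print("error: %s" % rc[1])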

What we lose

Not much!
● a little bit of performance (~1% on CPU; up to 10% on I/O)
● no more direct access to InfiniBand (HPC folks like having access to bare metal)

Other tools to investigate...

What's next?

● Got a project in mind?● Inspired to speak at our next meetup?

Get in touch!

[email protected]