Gateways to Discovery: Cyberinfrastructure for the Long Tail of Science

Page 1: Gateways to Discovery: Cyberinfrastructure for the Long Tail of Science

SAN DIEGO SUPERCOMPUTER CENTER
at the UNIVERSITY OF CALIFORNIA, SAN DIEGO

Gateways to Discovery: Cyberinfrastructure for the Long Tail of Science

XSEDE’14 (16 July 2014)

R. L. Moore, C. Baru, D. Baxter, G. Fox (Indiana U), A. Majumdar, P. Papadopoulos, W. Pfeiffer, R. S. Sinkovits, S. Strande (NCAR), M. Tatineni, R. P. Wagner, N. Wilkins-Diehr, M. L. Norman
UCSD/SDSC (except as noted)

Page 2: Gateways to Discovery: Cyberinfrastructure for the Long Tail of Science


HPC for the 99%

Page 3: Gateways to Discovery: Cyberinfrastructure for the Long Tail of Science


Comet is a response to NSF’s solicitation (13-528) to
• “… expand the use of high end resources to a much larger and more diverse community
• … support the entire spectrum of NSF communities
• … promote a more comprehensive and balanced portfolio
• … include research communities that are not users of traditional HPC systems.”

The long tail of science needs HPC

Page 4: Gateways to Discovery: Cyberinfrastructure for the Long Tail of Science


Jobs and SUs at various scales across NSF resources

[Chart: Percentage of Jobs (left axis) and Millions of XD SUs Charged (right axis) vs. Job Size (Cores), from 1 to 16K cores, for all jobs charged in 2012; the one-node boundary is marked.]

• 99% of jobs run on NSF’s HPC resources in 2012 used <2048 cores

• And consumed ~50% of the total core-hours across NSF resources

[Inset: Cumulative Usage vs. Job Size (Cores).]
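The cumulative analysis behind this chart is easy to reproduce. The following minimal Python sketch computes, for a hypothetical job log of (cores, SUs charged) records, what fraction of jobs and what share of SUs fall at or below a given job size; the records and the share_at_or_below() helper are illustrative inventions, not XSEDE accounting data.

```python
# Minimal sketch (illustrative data, not actual XSEDE accounting records):
# for each job we have (cores, su_charged); compute what fraction of jobs
# and what share of total SUs fall at or below a given core count.

jobs = [
    (16, 50.0), (32, 120.0), (64, 300.0), (1024, 4000.0),
    (2048, 9000.0), (8192, 30000.0), (16, 40.0), (128, 800.0),
]  # hypothetical (cores, SUs) pairs

def share_at_or_below(jobs, max_cores):
    """Fraction of jobs and fraction of SUs for jobs using <= max_cores."""
    n_small = sum(1 for cores, _ in jobs if cores <= max_cores)
    su_small = sum(su for cores, su in jobs if cores <= max_cores)
    su_total = sum(su for _, su in jobs)
    return n_small / len(jobs), su_small / su_total

job_frac, su_frac = share_at_or_below(jobs, 2048)
print(f"jobs <= 2048 cores: {job_frac:.0%} of jobs, {su_frac:.0%} of SUs")
```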

Page 5: Gateways to Discovery: Cyberinfrastructure for the Long Tail of Science


Comet Will Serve the 99%

Page 6: Gateways to Discovery: Cyberinfrastructure for the Long Tail of Science


Comet: System Characteristics

• Available January 2015
• Total flops ~1.8-2.0 PF
• Dell primary integrator
• Intel next-gen processors (former codename Haswell) with AVX2
• Aeon storage vendor
• Mellanox FDR InfiniBand
• Standard compute nodes
  • Dual Haswell processors
  • 128 GB DDR4 DRAM (64 GB/socket!)
  • 320 GB SSD (local scratch)
• GPU nodes
  • Four NVIDIA GPUs/node
• Large-memory nodes (Mar 2015)
  • 1.5 TB DRAM
  • Four Haswell processors/node
• Hybrid fat-tree topology
  • FDR (56 Gbps) InfiniBand
  • Rack-level (72 nodes) full bisection bandwidth
  • 4:1 oversubscription cross-rack
• Performance Storage
  • 7 PB, 200 GB/s
  • Scratch & Persistent Storage
• Durable Storage (reliability)
  • 6 PB, 100 GB/s
• Gateway hosting nodes and VM image repository
• 100 Gbps external connectivity to Internet2 & ESnet
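As a sanity check on the quoted peak performance, here is a back-of-envelope estimate in Python. The node count, cores per socket, and clock rate are assumptions for illustration (the slide only states dual Haswell processors and ~1.8-2.0 PF total); 16 double-precision flops per cycle reflects AVX2 with FMA.

```python
# Back-of-envelope check of the ~2 PF peak figure. The node count, cores per
# socket, and clock below are illustrative assumptions -- the slide only gives
# "dual Haswell processors" and "~1.8-2.0 PF" -- so treat this as a sketch.

nodes = 1944                 # assumed number of standard compute nodes
cores_per_node = 2 * 12      # assumed: dual 12-core Haswell sockets
clock_hz = 2.5e9             # assumed nominal clock
flops_per_cycle = 16         # double precision with AVX2 + FMA (2 x 256-bit FMA units)

peak_flops = nodes * cores_per_node * clock_hz * flops_per_cycle
print(f"Estimated peak: {peak_flops / 1e15:.2f} PF")   # ~1.87 PF
```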

Page 7: Gateways to Discovery: Cyberinfrastructure for the Long Tail of Science


Comet Architecture

[Architecture diagram: N racks of 72 Haswell nodes (320 GB SSD each), N GPU nodes, and 4 large-memory nodes sit on an FDR InfiniBand fabric (36-port FDR leaf switches, 18 mid-tier switches, dual IB core); 4 bridge nodes and dual Arista 40GbE switches connect the fabric to Performance Storage (7 PB, 200 GB/s), Durable Storage (6 PB, 100 GB/s), 4 data movers, and a Juniper 100 Gbps router providing R&E network access (Internet2); node-local storage attaches to each compute node; link types shown: FDR, 40GbE, 10GbE.]

7x 36-port FDR switches in each rack, wired as a full fat-tree; 4:1 oversubscription between racks.

Additional support components (not shown for clarity): NFS servers, virtual image repository, gateway/portal hosting nodes, login nodes, Ethernet management network, Rocks management nodes.
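The rack-level wiring arithmetic can be made explicit with a short sketch. Only the 72 nodes per rack, the 7x 36-port FDR switches, and the 4:1 figure come from the slides; the split into leaf and mid-tier switches and the uplink counts are assumptions consistent with them.

```python
# Quick arithmetic behind "rack-level full bisection, 4:1 cross-rack".
# The switch split (4 leaves + 3 mid-tier per rack) is an assumption consistent
# with "7x 36-port FDR per rack"; only the 72 nodes/rack and 4:1 figures come
# directly from the slides.

nodes_per_rack = 72
leaf_switches = 4            # assumed: 18 node ports + 18 uplinks each (36-port FDR)
uplinks_per_leaf = 18
intra_rack_uplinks = leaf_switches * uplinks_per_leaf   # 72 -> full bisection in-rack

cross_rack_uplinks = 18      # assumed uplinks leaving each rack toward the core
oversubscription = nodes_per_rack / cross_rack_uplinks
print(f"In-rack: {intra_rack_uplinks} uplinks for {nodes_per_rack} nodes (1:1)")
print(f"Cross-rack oversubscription: {oversubscription:.0f}:1")   # 4:1
```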

Page 8: Gateways to Discovery: Cyberinfrastructure for the Long Tail of Science


SSDs – building on Gordon success

Based on our experience with Gordon, a number of applications will benefit from continued access to flash:
• Applications that generate large numbers of temp files
  • Computational finance – analysis of multiple markets (NASDAQ, etc.)
  • Text analytics – word correlations in Google Ngram data
• Computational chemistry codes that write one- and two-electron integral files to scratch
• Structural mechanics codes (e.g., Abaqus), which generate stiffness matrices that don’t fit into memory
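These applications share a simple usage pattern: steer temporary files onto the node-local SSD rather than the shared file system. A minimal sketch follows; the LOCAL_SCRATCH environment variable and the file name are assumptions for illustration, since site-specific scratch paths vary.

```python
# Minimal sketch of steering application temp files onto node-local SSD
# scratch. The LOCAL_SCRATCH variable is an assumption (scratch paths and
# variable names are site-specific); the pattern is simply "create temp files
# under the fast local file system instead of the shared one."

import os
import tempfile

# Assumed location of the node-local SSD; fall back to the system default.
local_scratch = os.environ.get("LOCAL_SCRATCH", tempfile.gettempdir())

# Any temp files created through this directory land on the local SSD.
with tempfile.TemporaryDirectory(dir=local_scratch) as workdir:
    scratch_file = os.path.join(workdir, "two_electron_integrals.tmp")
    with open(scratch_file, "wb") as f:
        f.write(b"\0" * 1024 * 1024)   # stand-in for integral data
    print("scratch file written to", scratch_file)
```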

Page 9: Gateways to Discovery: Cyberinfrastructure for the Long Tail of Science


Large memory nodes

While most user applications will run well on the standard compute nodes, a few domains will benefit from the large-memory (1.5 TB) nodes:
• De novo genome assembly: ALLPATHS-LG, SOAPdenovo, Velvet
• Finite-element calculations: Abaqus
• Visualization of large data sets

Page 10: Gateways to Discovery: Cyberinfrastructure for the Long Tail of Science


GPU nodes

Comet’s GPU nodes will serve a number of domains:
• Molecular dynamics applications have been one of the biggest GPU success stories; packages include Amber, CHARMM, Gromacs, and NAMD
• Applications that depend heavily on linear algebra
• Image and signal processing
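For multi-process jobs on a four-GPU node, a common pattern is to pin each local process to its own device before any CUDA-using library loads. A minimal sketch follows; the LOCAL_RANK variable is an assumed stand-in for whatever local-rank value the MPI launcher provides.

```python
# Minimal sketch: pin each worker process on a 4-GPU node to its own GPU by
# setting CUDA_VISIBLE_DEVICES before the CUDA-using library is imported.
# The rank environment variable name varies by launcher; the generic fallback
# here is an assumption for illustration.

import os

gpus_per_node = 4
rank = int(os.environ.get("LOCAL_RANK", "0"))   # assumed launcher-provided local rank
os.environ["CUDA_VISIBLE_DEVICES"] = str(rank % gpus_per_node)

print(f"rank {rank} -> GPU {os.environ['CUDA_VISIBLE_DEVICES']}")
# ...import the GPU-enabled application or library after this point...
```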

Page 11: Gateways to Discovery: Cyberinfrastructure for the Long Tail of Science


Key Comet Strategies

• Target modest-scale users and new users/communities: goal of 10,000 users/year!
• Support capacity computing, with a system optimized for small/modest-scale jobs and quicker resource response using allocation/scheduling policies
• Build upon and expand efforts with Science Gateways, encouraging gateway usage and hosting via software and operating policies
• Provide a virtualized environment to support development of customized software stacks, virtual environments, and project control of workspaces

Page 12: Gateways to Discovery: Cyberinfrastructure for the Long Tail of Science


Comet will serve a large number of users, including new communities/disciplines

• Allocations/scheduling policies to optimize for high throughput of many modest-scale jobs (leveraging Trestles experience)
  • Optimized for rack-level jobs, but cross-rack jobs feasible
  • Optimized for throughput (à la Trestles)
  • Per-project allocation caps to ensure large numbers of users
  • Rapid access for start-ups, with one-day account generation
  • Limits on job sizes, with possibility of exceptions
• Gateway-friendly environment: science gateways reach large communities with easy user access
  • e.g., the CIPRES gateway alone currently accounts for ~25% of all users of NSF resources, with 3,000 new users/year and ~5,000 users/year
• Virtualization provides low barriers to entry (see later charts)
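As a toy illustration of the throughput-oriented policies listed above (per-project allocation caps, job-size limits with exceptions), here is a hypothetical admission check; the cap values and the check_request() helper are invented, not Comet’s actual policy engine.

```python
# Toy sketch of the throughput-oriented admission checks described above:
# a per-project allocation cap and a job-size limit. All numbers and the
# check_request() helper are hypothetical, for illustration only.

PROJECT_SU_CAP = 1_000_000        # assumed per-project allocation cap (SUs)
MAX_CORES_PER_JOB = 1_728         # assumed job-size limit (exceptions possible)

def check_request(project_su_used, requested_cores, requested_hours):
    """Return (allowed, reason) for a hypothetical job request."""
    requested_su = requested_cores * requested_hours
    if requested_cores > MAX_CORES_PER_JOB:
        return False, "exceeds job-size limit (request an exception)"
    if project_su_used + requested_su > PROJECT_SU_CAP:
        return False, "would exceed per-project allocation cap"
    return True, "ok"

print(check_request(project_su_used=950_000, requested_cores=1024, requested_hours=12))
```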

Page 13: Gateways to Discovery: Cyberinfrastructure for the Long Tail of Science


Changing the face of XSEDE HPC users

• System design and policies
  • Allocations, scheduling and security policies which favor gateways
  • Support gateway middleware and gateway hosting machines
  • Customized environments with high-performance virtualization
  • Flexible allocations for bursty usage patterns
  • Shared-node runs for small jobs, user-settable reservations
  • Third-party apps
• Leverage and augment investments elsewhere
  • FutureGrid experience, image packaging, training, on-ramp
  • XSEDE (ECSS NIP & Gateways, TEOS, Campus Champions)
  • Build off established successes supporting new communities
  • Example-based documentation in Comet focus areas
  • Unique HPC University contributions to enable community growth

Page 14: Gateways to Discovery: Cyberinfrastructure for the Long Tail of Science


Virtualization Environment

• Leveraging expertise of the Indiana U/FutureGrid team
• VM jobs scheduled just like batch jobs (not a conventional cloud environment with immediate elastic access)
• VMs will be an easy on-ramp for new users/communities, including low porting time
• Flexible software environments for new communities and apps
• VM repository/library
• Virtual HPC cluster (multi-node) with near-native IB latency and minimal overhead (SR-IOV)

Page 15: Gateways to Discovery: Cyberinfrastructure for the Long Tail of Science


Single Root I/O Virtualization in HPC

• Problem: complex workflows demand increasing flexibility from HPC platforms
  • Pro: virtualization provides flexibility
  • Con: virtualization costs I/O performance (e.g., excessive DMA interrupts)
• Solution: SR-IOV and Mellanox ConnectX-3 InfiniBand HCAs
  • One physical function (PF) → multiple virtual functions (VFs), each with its own DMA streams, memory space, and interrupts
  • Allows DMA to bypass the hypervisor and reach the VMs directly
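On a Linux host, SR-IOV capability and the configured virtual-function count are visible through standard sysfs attributes. A minimal sketch, assuming a Linux system with SR-IOV-capable devices such as ConnectX-3 HCAs:

```python
# Minimal sketch: list PCI devices that expose SR-IOV capability by reading
# the standard Linux sysfs attributes sriov_totalvfs / sriov_numvfs. This is
# generic Linux inspection, not a Comet-specific tool; it prints nothing on
# hosts without SR-IOV-capable devices.

from pathlib import Path

pci = Path("/sys/bus/pci/devices")
if not pci.exists():
    raise SystemExit("no PCI sysfs tree here (Linux only)")

for dev in sorted(pci.iterdir()):
    total = dev / "sriov_totalvfs"
    if total.exists():
        configured = (dev / "sriov_numvfs").read_text().strip()
        print(f"{dev.name}: {configured} of {total.read_text().strip()} "
              "virtual functions configured")
```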

Page 16: Gateways to Discovery: Cyberinfrastructure for the Long Tail of Science


High-Performance Virtualization on Comet

• Mellanox FDR InfiniBand HCAs with SR-IOV
• Rocks and OpenStack Nova to manage VMs
• Flexibility to support complex science gateways and web-based workflow engines
• Custom compute appliances and virtual clusters developed with FutureGrid and their existing expertise
• Backed by virtualized Lustre running over virtualized InfiniBand

Page 17: Gateways to Discovery: Cyberinfrastructure for the Long Tail of Science


Benchmark comparison of SR-IOV cluster vs. AWS (early 2013): Hardware/Software Configuration

Platform
  • Native, SR-IOV: Rocks 6.1 (EL6); virtualization via KVM
  • Amazon EC2: Amazon Linux 2013.03 (EL6); cc2.8xlarge instances

CPUs
  • Native, SR-IOV: 2x Xeon E5-2660 (2.2 GHz), 16 cores per node
  • Amazon EC2: 2x Xeon E5-2670 (2.6 GHz), 16 cores per node

RAM
  • Native, SR-IOV: 64 GB DDR3 DRAM
  • Amazon EC2: 60.5 GB DDR3 DRAM

Interconnect
  • Native, SR-IOV: QDR 4X InfiniBand; Mellanox ConnectX-3 (MT27500); Intel VT-d, SR-IOV enabled in firmware, kernel, drivers; mlx4_core 1.1; Mellanox OFED 2.0; HCA firmware 2.11.1192
  • Amazon EC2: 10 GbE; common placement group

Page 18: Gateways to Discovery: Cyberinfrastructure for the Long Tail of Science


50x less latency than Amazon EC2


• SR-IOV
  • < 30% overhead for messages < 128 bytes
  • < 10% overhead for eager send/recv
  • 0% overhead in the bandwidth-limited regime
• Amazon EC2
  • > 5000% worse latency
  • Time dependent (noisy)

OSU Microbenchmarks (3.9, osu_latency)
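For readers unfamiliar with how these percentages are derived, a short sketch of the arithmetic follows; the latency values are made-up placeholders rather than the measured OSU numbers, and only the ratios matter.

```python
# Illustrative arithmetic for the overhead figures quoted above. The latencies
# here are hypothetical placeholders (not the measured OSU results); the point
# is just how "% overhead" and "% worse" are computed from a pair of values.

def pct_overhead(baseline_us, measured_us):
    """Percent overhead of `measured` relative to a native `baseline`."""
    return 100.0 * (measured_us - baseline_us) / baseline_us

native = 1.2        # hypothetical native small-message latency (microseconds)
sriov = 1.5         # hypothetical SR-IOV latency
ec2 = 65.0          # hypothetical EC2 (10 GbE) latency

print(f"SR-IOV overhead: {pct_overhead(native, sriov):.0f}%")   # ~25%
print(f"EC2 vs native:   {pct_overhead(native, ec2):.0f}%")     # several thousand %
```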

Page 19: Gateways to Discovery: Cyberinfrastructure for the Long Tail of Science


10x more bandwidth than Amazon EC2


• SR-IOV
  • < 2% bandwidth loss over the entire range
  • > 95% of peak bandwidth
• Amazon EC2
  • < 35% of peak bandwidth
  • 900% to 2500% worse bandwidth than virtualized InfiniBand

OSU Microbenchmarks (3.9, osu_bw)

Page 20: Gateways to Discovery: Cyberinfrastructure for the Long Tail of Science


Weather Modeling – 15% Overhead

• 96-core (6-node) calculation
• Nearest-neighbor communication
• Scalable algorithms
• SR-IOV incurs a modest (15%) performance hit
• ...but is still 20% faster*** than Amazon

WRF 3.4.1 – 3-hour forecast
*** 20% faster despite the SR-IOV cluster having 20% slower CPUs
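The relative numbers combine as shown in the short sketch below, using a made-up native runtime (only the ratios matter) and the 2.2 GHz vs. 2.6 GHz clocks from the configuration table.

```python
# Illustrative arithmetic for the WRF comparison above: a 15% SR-IOV overhead
# relative to native, yet ~20% faster than EC2 even though the SR-IOV cluster's
# CPUs are slower (2.2 GHz vs 2.6 GHz). The native runtime is a made-up
# placeholder; only the ratios matter.

native_time = 100.0                      # hypothetical native runtime (arbitrary units)
sriov_time = native_time * 1.15          # 15% virtualization overhead
ec2_time = sriov_time * 1.20             # EC2 ~20% slower than the SR-IOV run

clock_ratio = 2.6 / 2.2                  # EC2 clock advantage from the config table
print(f"SR-IOV / native runtime: {sriov_time / native_time:.2f}x")
print(f"EC2 / SR-IOV runtime:    {ec2_time / sriov_time:.2f}x "
      f"(despite a {100 * (clock_ratio - 1):.0f}% CPU clock advantage for EC2)")
```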

Page 21: Gateways to Discovery: Cyberinfrastructure for the Long Tail of Science


Quantum ESPRESSO: 5x Faster than EC2

• 48-core (3-node) calculation
• CG matrix inversion (irregular communication)
• 3D FFT matrix transposes (all-to-all communication)
• 28% slower with SR-IOV
• SR-IOV still > 500% faster*** than EC2

Quantum ESPRESSO 5.0.2 – DEISA AUSURF112 benchmark
*** Faster despite the SR-IOV cluster having 20% slower CPUs

Page 22: Gateways to Discovery: Cyberinfrastructure for the Long Tail of Science


SR-IOV is a huge step forward in high-performance virtualization

• Shows substantial improvement in latency over Amazon EC2, and it provides nearly zero bandwidth overhead

• Benchmark application performance confirms significant improvement over EC2

• SR-IOV lowers the performance barrier to virtualizing the interconnect and makes fully virtualized HPC clusters viable

• Comet will deliver virtualized HPC to new/non-traditional communities that need flexibility without major loss of performance

Page 23: Gateways to Discovery: Cyberinfrastructure for the Long Tail of Science


Page 24: Gateways to Discovery: Cyberinfrastructure for the Long Tail of Science


BACKUP

Page 25: Gateways to Discovery: Cyberinfrastructure for the Long Tail of Science


NSF 13-528: Competitive proposals should address:

• “Complement existing XD capabilities with new types of computational resources attuned to less traditional computational science communities;
• Incorporate innovative and reliable services within the HPC environment to deal with complex and dynamic workflows that contribute significantly to the advancement of science and are difficult to achieve within XD;
• Facilitate transition from local to national environments via the use of virtual machines;
• Introduce highly useable and cost efficient cloud computing capabilities into XD to meet national scale requirements for new modes of computationally intensive scientific research;
• Expand the range of data intensive and/or computationally-challenging science and engineering applications that can be tackled with current XD resources;
• Provide reliable approaches to scientific communities needing a high-throughput capability.”

Page 26: Gateways to Discovery: Cyberinfrastructure for the Long Tail of Science

Page 27: Gateways to Discovery: Cyberinfrastructure for the Long Tail of Science

Page 28: Gateways to Discovery: Cyberinfrastructure for the Long Tail of Science

Page 29: Gateways to Discovery: Cyberinfrastructure for the Long Tail of Science

Page 30: Gateways to Discovery: Cyberinfrastructure for the Long Tail of Science

Page 31: Gateways to Discovery: Cyberinfrastructure for the Long Tail of Science

VCs on Comet: Operational Details – one VM per physical node –

[Diagram: four virtual clusters (VC0-VC3), each with a virtual cluster head node (HN); physical nodes run the XSEDE stack, virtual machines run the user stack, with one VM per physical node.]

Page 32: Gateways to Discovery: Cyberinfrastructure for the Long Tail of Science


VCs on Comet: Operational Details – Head Node remains active after VC shutdown –

[Diagram: the same four virtual clusters (VC0-VC3); each VC’s head node (HN) remains active after the VC is shut down.]

Page 33: Gateways to Discovery: Cyberinfrastructure for the Long Tail of Science


VCs on Comet: Spinup/shutdown
– Each VC has its own ZFS file system for storing VMIs –
– latency-hiding tricks on startup –

[Diagram: as above, with each virtual cluster backed by its own ZFS pool holding the virtual machine disk images.]
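To summarize the operational model in these last three slides, here is a toy Python data model: one user-stack VM per physical node, a head node that persists after VC shutdown, and a per-VC ZFS file system holding the VM disk images. All class, attribute, and path names are invented for illustration; this is not Comet’s management software.

```python
# Toy data model of the virtual-cluster (VC) setup described above: one VM per
# physical node, a head node that persists across VC shutdowns, and a per-VC
# ZFS file system for VM disk images. Names and paths are invented.

from dataclasses import dataclass, field

@dataclass
class VirtualCluster:
    name: str
    physical_nodes: list[str]                 # one VM per physical node
    zfs_pool: str = ""                        # per-VC file system for VM images
    head_node_active: bool = True             # head node persists after shutdown
    running_vms: dict[str, str] = field(default_factory=dict)

    def __post_init__(self):
        self.zfs_pool = self.zfs_pool or f"vcpool/{self.name}"

    def spin_up(self):
        # Boot one user-stack VM per physical node from images in the ZFS pool.
        self.running_vms = {n: f"{self.zfs_pool}/{n}.img" for n in self.physical_nodes}

    def shut_down(self):
        # Release the compute VMs; the head node stays active.
        self.running_vms.clear()

vc = VirtualCluster("vc0", physical_nodes=["node01", "node02", "node03"])
vc.spin_up(); print(len(vc.running_vms), "VMs running")
vc.shut_down(); print("head node still active:", vc.head_node_active)
```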