
Page 1

Converging HPC & Big Data

Nick Nystrom · Sr. Director of Research & Bridges PI · [email protected]
Best Practices in Data Infrastructure Workshop · PSC · May 18, 2016

© 2016 Pittsburgh Supercomputing Center

Page 2

Bridges, a new kind of supercomputer at PSC, is designed to empower new research communities, converge HPC and Big Data, bring desktop convenience to high-end computing, expand campus access, and help researchers to work more intuitively.
• Funded by a $9.65M NSF award, Bridges emphasizes usability, flexibility, and interactivity
• Available at no charge for open research and course support
• Popular programming languages and applications: R, MATLAB, Python, Hadoop, Spark, …
• 846 compute nodes containing 128GB, 3TB, or 12TB of RAM each
• Dedicated nodes for persistent databases and web services
• The first deployment of the Intel Omni-Path Architecture fabric

Page 3

The $9.65M Bridges acquisition is made possible by National Science Foundation (NSF) award #ACI-1445606:

Bridges: From Communities and Data to Workflows and Insight


Disclaimer: The following presentation conveys the plan for Bridges. Certain details are subject to change.

Page 4

Data Infrastructure for the National Advanced Cyberinfrastructure Ecosystem

Bridges is a new resource on XSEDE and can interoperate with other XSEDE resources, Advanced Cyberinfrastructure (ACI) projects, campuses, and instruments nationwide.

• Data Infrastructure Building Blocks (DIBBs)
  ‒ Data Exacell (DXC)
  ‒ Integrating Geospatial Capabilities into HUBzero
  ‒ Building a Scalable Infrastructure for Data-Driven Discovery & Innovation in Education
  ‒ Other DIBBs projects
• Other ACI projects

Examples:
• High-throughput genome sequencers
• Reconstructing brain circuits from high-resolution electron microscopy
• Temple University’s new Science, Education, and Research Center
• Carnegie Mellon University’s Gates Center for Computer Science
• Social networks and the Internet

Page 5

Motivating Use Cases (examples)

• Data-intensive applications & workflows
• Gateways – the power of HPC without the programming
• Shared data collections & related analysis tools
• Cross-domain analytics
• Graph analytics, machine learning, genome sequence assembly, and other large-memory applications
• Scaling research questions beyond the laptop
• Scaling research from individuals to teams and collaborations
• Very large in-memory databases
• Optimization & parameter sweeps
• Distributed & service-oriented architectures
• Data assimilation from large instruments and Internet data
• Leveraging an extensive collection of interoperating software
• Research areas that haven’t used HPC
• Nontraditional HPC approaches to fields such as the physical sciences
• Coupling applications in novel ways
• Leveraging large memory and high bandwidth

Page 6

Objectives and Approach

• Bring HPC to nontraditional users and research communities.
• Allow high-performance computing to be applied effectively to big data.
• Bridge to campuses to streamline access and provide cloud-like burst capability.
• Leveraging PSC’s expertise with shared memory, Bridges will feature 3 tiers of large, coherent shared-memory nodes: 12TB, 3TB, and 128GB.
• Bridges implements a uniquely flexible environment featuring interactivity, gateways, databases, distributed (web) services, high-productivity programming languages and frameworks, virtualization, and campus bridging.

Figure captions: EMBO Mol Med (2013), DOI: 10.1002/emmm.201202388, proliferation of cancer-causing mutations throughout life; Alex Hauptmann et al., efficient large-scale content-based multimedia event detection.

Page 7

Interactivity

• Interactivity is the feature most frequently requested by nontraditional HPC communities.

• Interactivity provides immediate feedback for exploratory data analytics and testing hypotheses.

• Bridges offers interactivity through a combination of shared and dedicated resources to maximize availability while accommodating different needs.

Page 8

Gateways and Tools for Building Them

Gateways provide easy-to-use access to Bridges’ HPC and data resources, allowing users to launch jobs, orchestrate complex workflows, and manage data from their browsers.
– Provide “HPC Software-as-a-Service”
– Extensive use of VMs, databases, and distributed services

Examples: interactive pipeline creation in GenePattern (Broad Institute); Galaxy, https://galaxyproject.org/; the Col*Fusion portal for the systematic accumulation, integration, and utilization of historical data, http://colfusion.exp.sis.pitt.edu/colfusion/.

Page 9

Virtualization and Containers

• Virtual machines (VMs) enable flexibility, security, customization, reproducibility, ease of use, and interoperability with other services.

• Users ask for custom database and web server installations for developing data-intensive, distributed applications, and for containers that make custom software stacks portable.

• Bridges leverages OpenStack to provision resources among interactive, batch, Hadoop, and VM uses.
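To make the provisioning idea concrete, here is a minimal sketch using the openstacksdk Python library. The cloud, image, flavor, and network names are hypothetical placeholders, not Bridges-specific values, and this is not the project's actual provisioning tooling; end users would normally request database/web VMs through PSC rather than call the API directly.

    # Minimal openstacksdk sketch of launching a VM programmatically, to show the
    # kind of provisioning OpenStack enables. Cloud, image, flavor, and network
    # names are hypothetical placeholders, not Bridges values.
    import openstack

    conn = openstack.connect(cloud="example-cloud")     # entry in clouds.yaml (assumed)

    image = conn.compute.find_image("centos-7")         # hypothetical image
    flavor = conn.compute.find_flavor("m1.large")       # hypothetical flavor
    network = conn.network.find_network("private")      # hypothetical network

    # Launch a VM, e.g. to host a persistent database or a web service.
    server = conn.compute.create_server(
        name="db-service-vm",
        image_id=image.id,
        flavor_id=flavor.id,
        networks=[{"uuid": network.id}],
    )
    server = conn.compute.wait_for_server(server)       # block until the VM is ACTIVE
    print(server.status)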

Page 10

High-Productivity Programming

Supporting the languages that communities already use is vital for applying HPC to their research questions.

Page 11

Spark, Hadoop & Related Approaches

• Bridges’ large memory is great for Spark!
• Bridges enables workflows that integrate Spark/Hadoop, HPC, GPU, and/or large shared-memory components.
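As a concrete illustration (not taken from the slides), the sketch below shows a PySpark job sized for a large-memory node; the memory settings and the /pylon1 and /pylon2 paths are assumptions for illustration only, not Bridges defaults.

    # Hypothetical PySpark sketch: a word count over a large text corpus, with
    # driver/executor memory sized for a large-memory node.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("large-memory-wordcount")
        .config("spark.driver.memory", "512g")     # assumes a 3TB LSM node
        .config("spark.executor.memory", "512g")
        .getOrCreate()
    )

    # Read the corpus from the shared filesystem (hypothetical path).
    lines = spark.read.text("/pylon2/example/corpus/*.txt").rdd.map(lambda r: r[0])

    # Classic word count expressed as RDD transformations.
    counts = (
        lines.flatMap(lambda line: line.split())
             .map(lambda word: (word, 1))
             .reduceByKey(lambda a, b: a + b)
    )

    # Write results back to scratch space (hypothetical path).
    counts.saveAsTextFile("/pylon1/example/wordcounts")
    spark.stop()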

Page 12

Campus Bridging

• Through a pilot project with Temple University, the Bridges project will explore new ways to transition data and computing seamlessly between campus and XSEDE resources.

• Federated identity management will let users apply their local credentials for single sign-on to remote resources, facilitating data transfers between Bridges and Temple’s local storage systems.

• Burst offload will enable cloud-like offloading of jobs from Temple to Bridges, and vice versa, during periods of unusually heavy load.

Page 13

Purpose-built Intel® Omni-Path topology for data-intensive HPC

• 800 HPE Apollo 2000 (128GB) compute nodes
• 16 RSM nodes with NVIDIA K80 GPUs
• 32 RSM nodes with NVIDIA P100 GPUs
• 42 HPE ProLiant DL580 (3TB) compute nodes
• 4 HPE Integrity Superdome X (12TB) compute nodes, each with 2 OPA↔IB gateway nodes
• 12 HPE ProLiant DL380 database nodes
• 6 HPE ProLiant DL360 web server nodes
• 4 MDS nodes, 2 front-end nodes, 2 boot nodes, 8 management nodes
• 20 Storage Building Blocks, implementing the parallel Pylon filesystem (~10PB) using PSC’s SLASH2 filesystem
• 20 “leaf” Intel® OPA edge switches
• 6 “core” Intel® OPA edge switches: fully interconnected, 2 links per switch
• Intel® OPA cables

http://psc.edu/bridges
Bridges Virtual Tour: http://staff.psc.edu/nystrom/bvt

Page 14

Node Types

Type     | RAM(a)   | Ph | n   | CPU / GPU / other                                        | Server
---------|----------|----|-----|----------------------------------------------------------|---------------------------
ESM      | 12TB     | 1  | 2   | 16 × Intel Xeon E7-8880 v3 (18c, 2.3/3.1 GHz, 45MB LLC)  | HPE Integrity Superdome X
         |          | 2  | 2   | 16 × Intel Xeon E7-8880 v4                               |
LSM      | 3TB      | 1  | 8   | 4 × Intel Xeon E7-8860 v3 (16c, 2.2/3.2 GHz, 40MB LLC)   | HPE ProLiant DL580
         |          | 2  | 34  | 4 × Intel Xeon E7-8880 v4                                |
RSM      | 128GB    | 1  | 752 | 2 × Intel Xeon E5-2695 v3 (14c, 2.3/3.3 GHz, 35MB LLC)   | HPE Apollo 2000
RSM-GPU  | 128GB    | 1  | 16  | 2 × Intel Xeon E5-2695 v3 + 2 × NVIDIA K80               | HPE Apollo 2000
         |          | 2  | 32  | 2 × Intel Xeon E5-2695 v3 + 2 × NVIDIA P100              |
DB-s     | 128GB    | 1  | 6   | 2 × Intel Xeon E5-2695 v3 + SSD                          | HPE ProLiant DL360
DB-h     | 128GB    | 1  | 6   | 2 × Intel Xeon E5-2695 v3 + HDDs                         | HPE ProLiant DL380
Web      | 128GB    | 1  | 6   | 2 × Intel Xeon E5-2695 v3                                | HPE ProLiant DL360
Other(b) | 128GB    | 1  | 16  | 2 × Intel Xeon E5-2695 v3                                | HPE ProLiant DL360, DL380
Gateway  | 128GB    | 1  | 4   | 2 × Intel Xeon E5-2695 v3                                | HPE ProLiant DL380
         |          | 2  | 4   |                                                          |
Storage  | 128GB    | 1  | 20  | 2 × Intel Xeon E5-2680 v3 (12c, 2.5/3.3 GHz, 30MB LLC)   | Supermicro X10DRi
Total    | 281.75TB |    | 908 |                                                          |

a. All RAM in these nodes is DDR4-2133.
b. Other nodes = front end (2) + management/log (8) + boot (4) + MDS (4).

Page 15

Database and Web Server Nodes

• Dedicated database nodes will power persistent relational and NoSQL databases
  – Support data management and data-driven workflows
  – SSDs for high IOPS; HDDs for high capacity

• Dedicated web server nodes
  – Enable distributed, service-oriented architectures
  – High-bandwidth connections to XSEDE and the Internet

Page 16

Data Management

• Pylon: a large, central, high-performance storage system
  – 10 PB usable
  – Visible to all compute nodes
  – Currently implemented as two complementary file systems:
    • /pylon1: Lustre, targeted for $SCRATCH use; non-wiped directories available by request
    • /pylon2: SLASH2, targeted for large datasets, community repositories, and distributed clients

• Distributed (node-local) storage
  – Enhances application portability
  – Improves overall system performance
  – Improves performance consistency by reducing load on the shared filesystem
  – Aggregate 7.2 PB (6 PB on non-GPU RSM nodes, for Hadoop and Spark)
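A minimal sketch of the node-local staging pattern, in Python: copy inputs from the shared filesystem to node-local storage, compute there, and copy results back. The SCRATCH and LOCAL environment variables, the paths, and the analyze program are assumptions for illustration, not documented Bridges settings.

    # Hedged sketch of using node-local storage within a job. Environment
    # variable names, paths, and the "analyze" program are hypothetical.
    import os
    import shutil
    import subprocess

    scratch = os.environ.get("SCRATCH", "/pylon1/example_group/example_user")  # shared Lustre space (assumed)
    local = os.environ.get("LOCAL", "/tmp")                                    # node-local disk (assumed)

    # 1. Stage the input from the shared filesystem to fast node-local disk.
    local_input = os.path.join(local, "input.dat")
    shutil.copy(os.path.join(scratch, "input.dat"), local_input)

    # 2. Run the analysis against the local copy (hypothetical executable).
    local_output = os.path.join(local, "output.dat")
    subprocess.run(["./analyze", local_input, "-o", local_output], check=True)

    # 3. Copy results back so they persist after the job releases the node.
    shutil.copy(local_output, os.path.join(scratch, "output.dat"))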

Page 17

Intel® Omni-Path Architecture (OPA)

• Bridges is the first production deployment of Omni-Path.

• Omni-Path connects all nodes and the shared filesystem, providing Bridges and its users with:
  – 100 Gbps line speed per port; 25 GB/s bidirectional bandwidth per port
  – Measured 0.93 μs latency and 12.36 GB/s per direction
  – 160M MPI messages per second
  – 48-port edge switches that reduce interconnect complexity and cost
  – HPC performance, reliability, and QoS
  – Support for OFA-compliant applications without modification
  – Early access to this new, important, forward-looking technology

• Bridges deploys OPA in a two-tier island (leaf-spine) topology developed by PSC for cost-effective, data-intensive HPC.
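For context, latency and bandwidth figures like those above are typically obtained with a two-rank ping-pong microbenchmark. The mpi4py sketch below illustrates the idea; it is not the benchmark used for the quoted numbers.

    # Minimal two-rank ping-pong sketch (mpi4py): measures round-trip time for a
    # small message (latency) and a 1 MiB message (bandwidth). Run with exactly
    # two ranks, e.g.: mpirun -np 2 python pingpong.py
    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    iters = 1000

    for size in (8, 1 << 20):                 # 8 B and 1 MiB payloads
        buf = np.zeros(size, dtype=np.uint8)
        comm.Barrier()
        t0 = MPI.Wtime()
        for _ in range(iters):
            if rank == 0:
                comm.Send(buf, dest=1)
                comm.Recv(buf, source=1)
            elif rank == 1:
                comm.Recv(buf, source=0)
                comm.Send(buf, dest=0)
        t1 = MPI.Wtime()
        if rank == 0:
            rtt = (t1 - t0) / iters           # average round-trip time
            print(f"{size} B: {rtt / 2 * 1e6:.2f} us one-way, "
                  f"{2 * size / rtt / 1e9:.2f} GB/s")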

Page 18

Example: Causal Discovery Portal
Center for Causal Discovery, an NIH Big Data to Knowledge Center of Excellence

Architecture (from the slide’s diagram): a browser-based UI reaches Bridges over the Internet. A web node runs Apache Tomcat in a VM and handles authentication, data, and provenance; messaging connects it to a database node (a VM alongside other databases). Causal discovery algorithms (FGS and others, building on TETRAD) execute on LSM (3TB) and ESM (12TB) nodes against memory-resident datasets such as TCGA and fMRI data, with Omni-Path linking the nodes to the Pylon filesystem. Users prepare and upload data, run causal discovery algorithms, and visualize results.

Page 19

Gaining Access to Bridges

Bridges is allocated through XSEDE: https://www.xsede.org/allocations

– Startup Allocations
  • Can be requested at any time
  • Up to 50,000 core-hours on RSM and GPU (128GB) nodes, 1,000 TB-hours on LSM (3TB) and ESM (12TB) nodes, and a user-specified amount (in GB) of storage on Pylon
  • Can request XSEDE ECSS (Extended Collaborative Support Service)

– Research Allocations (XRAC)
  • Appropriate for larger requests; can request ECSS
  • Community allocations can support gateways
  • Can range up to millions or tens of millions of SUs (XSEDE “Service Units”)
  • Quarterly submission windows: March 15–April 15, June 15–July 15, September 15–October 15, December 15–January 15

– Educational Allocations
  • Support the use of Bridges for relevant courses

Up to 10% of Bridges’ SUs are available on a discretionary basis to industrial affiliates, Pennsylvania-based researchers, and others to foster discovery and innovation and broaden participation in data-intensive computing.

Page 20

Thank you.

Questions?

Page 21

Supplementary Material

Page 22

System Capacities

By node group:

                 | Phase 1                                   | Phase 2 Technical Upgrade*         |
                 | RSM   | RSM-GPU | LSM  | ESM  | Ph. 1 Total | RSM-GPU | LSM  | ESM  | Ph. 2 Total | Total
Peak fp64 (Tf/s) | 774.9 | 80.5    | 18.0 | 21.2 | 895         | 372.2   | 95.7 | 22.5 | 502         | 1,397
RAM (TB)         | 94    | 2       | 24   | 24   | 144         | 4       | 102  | 24   | 130         | 274
HDD (TB)         | 6016  | 128     | 128  | 128  | 6400        | 256     | 544  | 128  | 928         | 7,328

Pylon storage:

            | Phase 1 | Phase 2 | Total
PB, raw     | 3.5     | 10.5    | 14
PB, usable  | 2.5     | 7.5     | 10

Page 23

For Additional Information

Website: www.psc.edu/bridges

Bridges PI: Nick Nystrom, [email protected]

Co-PIs: Michael J. Levine, Ralph Roskies, J Ray Scott

Project Manager: Robin Scibek