
SAN DIEGO SUPERCOMPUTER CENTER at the UNIVERSITY OF CALIFORNIA, SAN DIEGO

Gordon: Design, Performance, & Experiences Deploying & Supporting a Data-Intensive Supercomputer

Shawn Strande
Gordon Project Manager, San Diego Supercomputer Center

XSEDE '12
July 16-19, 2012
Chicago, IL


Allan Snavely, 1962 - 2012

"Though I am clearly a 'rouleur', a cyclist who goes faster on the flats, as opposed to a 'grimpeur', a cyclist who goes faster uphill, for some reason I actually prefer climbing."


Gordon – An Innovative Data-Intensive Supercomputer

• Designed to accelerate access to massive amounts of data in areas of genomics, earth science, engineering, medicine, and others.
• Emphasizes memory and I/O over FLOPS.
• Appro-integrated 1,024-node Sandy Bridge cluster.
• 300 TB of high-performance Intel flash.
• Large-memory supernodes via vSMP Foundation from ScaleMP.
• 3D torus interconnect from Mellanox.
• In production operation since February 2012.
• Funded by the NSF and available through the Extreme Science and Engineering Discovery Environment (XSEDE) program.


Gordon Design: Two Driving Ideas

• Observation #1: Data keeps getting further away from processor cores ("red shift").
  • Do we need a new level in the memory hierarchy?
• Observation #2: Many data-intensive applications are serial and difficult to parallelize.
  • Would a large, shared-memory machine be better from the standpoint of researcher productivity for some of these?
  • Rapid prototyping of new approaches to data analysis


The Memory Hierarchy of a Typical Supercomputer

[Diagram: shared-memory programming within a single node and message-passing programming across nodes sit close to the processors; a latency gap separates them from disk I/O, where the big data lives.]


The Memory Hierarchy of Gordon

[Diagram: the same hierarchy, but shared-memory programming now spans nodes via vSMP, and flash sits between memory and disk I/O, narrowing the latency gap to the disk-resident big data.]


Gordon Design Highlights

• 3D torus interconnect, dual-rail QDR InfiniBand
• 64 dual-socket Westmere I/O nodes
  • 12 cores, 48 GB/node
  • 4 LSI controllers, 16 SSDs
  • Dual 10 GbE
  • SuperMicro motherboard, PCIe Gen2
• 300 GB Intel 710 eMLC SSDs, 300 TB aggregate
• 1,024 dual-socket Xeon E5 (Sandy Bridge) compute nodes
  • 16 cores, 64 GB/node
  • Intel Jefferson Pass motherboard, PCIe Gen3
• Large-memory vSMP supernodes
  • 2 TB DRAM
  • 10 TB flash
• "Data Oasis" Lustre parallel file system: 100 GB/sec, 4 PB


(Some) SSDs are a good fit for data-intensive computing:

Metric                           Flash drive (e.g., SLC, eMLC)   Typical HDD     Good for data-intensive apps?
Latency                          < 0.1 ms                        10 ms           ✔
Bandwidth (read/write)           270 / 210 MB/s                  100-150 MB/s    ✔
IOPS (read/write)                38,500 / 2,000                  100             ✔
Power consumption (during r/w)   2-5 W                           6-10 W          ✔
Price/GB                         $3/GB                           $0.50/GB        -
Endurance                        2-10 PB                         N/A             ✔
Total cost of ownership          The jury is still out.
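The IOPS row is what matters most for random-access workloads. A minimal Python sketch of how one might estimate random 4 KiB read IOPS against a test file (illustrative only: the file path is hypothetical, and a serious measurement would bypass the page cache with O_DIRECT or use a dedicated tool such as fio):

```python
# Estimate random 4 KiB read IOPS on a large test file (sketch only).
import os, random, time

PATH = "/scratch/ssd/testfile"   # hypothetical multi-GB test file
BLOCK = 4096
N_READS = 20000

fd = os.open(PATH, os.O_RDONLY)
blocks = os.fstat(fd).st_size // BLOCK

start = time.time()
for _ in range(N_READS):
    offset = random.randrange(blocks) * BLOCK   # random block-aligned offset
    os.pread(fd, BLOCK, offset)                 # one small random read
elapsed = time.time() - start
os.close(fd)

print(f"~{N_READS / elapsed:,.0f} IOPS (page cache not bypassed)")
```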


Gordon 32-way Supernode

[Diagram: 32 dual-socket Sandy Bridge compute nodes aggregated into a single system image by the vSMP aggregation software, plus two I/O nodes, each with dual Westmere I/O processors and 4.8 TB of flash SSD.]


Gordon 3D Torus Interconnect Fabric: 4x4x4 3D Torus Topology

[Diagram: each torus junction uses two 36-port fabric switches (one per rail), each with 18 4X IB network connections; 16 compute nodes and 2 I/O nodes make a single connection to each network. The dual-rail network provides increased bandwidth and redundancy, and the ends of the 4x4x4 mesh are folded in all three dimensions to form a 3D torus.]

Why a 3D torus interconnect?
• Lower cost: 40% as many switches and 25% to 50% fewer cables compared to a fat tree
• Works well for localized communication
• Linearly expandable
• Simple wiring pattern
• Short cables; fiber optic cables generally not required
• Fault tolerant within the mesh, with 2 QoS alternate routing
• Fault tolerant with dual rails for all routing algorithms
• Based on the OFED IB stack
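To make the folded-mesh idea concrete, here is a minimal Python sketch of 3D torus addressing (illustrative only; the actual routing is handled by the subnet manager on the OFED stack):

```python
# Each switch sits at coordinates (x, y, z) in a 4x4x4 mesh; folding the ends
# means neighbor coordinates simply wrap modulo the dimension.
DIM = 4  # 4x4x4 torus

def torus_neighbors(x, y, z, dim=DIM):
    """Return the six nearest neighbors of (x, y, z) with wraparound."""
    return [
        ((x + 1) % dim, y, z), ((x - 1) % dim, y, z),
        (x, (y + 1) % dim, z), (x, (y - 1) % dim, z),
        (x, y, (z + 1) % dim), (x, y, (z - 1) % dim),
    ]

# A corner switch still has six neighbors because the mesh ends are folded:
print(torus_neighbors(0, 0, 0))
# [(1, 0, 0), (3, 0, 0), (0, 1, 0), (0, 3, 0), (0, 0, 1), (0, 0, 3)]
```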


Full System

• 16 Compute Node Racks (all racks 48U)

• 4 I/O Node Racks

• 1 Service Node Rack

• Hot aisle containment

• 500kW

• Earthquake isobases


Gordon Network Architecture

[Diagram: 1,024 compute nodes and 64 I/O nodes connect to both rails of the QDR 40 Gb/s 3D torus; the I/O nodes use dual 10 GbE links to reach the 4 PB Data Oasis Lustre PFS; login nodes (4x), management nodes (2x), NFS servers (4x), and data movers (4x) attach to the management and public edge/core Ethernet, which connect to the SDSC network and the XSEDE & R&E networks.]

• Dual-rail IB
• Dual 10 GbE storage
• GbE management
• GbE public
• Round-robin login
• Mirrored NFS
• Redundant front end


Data Oasis Heterogeneous Architecture: Lustre-Based Parallel File System

[Diagram: 64 OSSes (object storage servers, 72 TB each) provide 100 GB/s of performance and >4 PB of raw capacity; JBODs (just a bunch of disks, 90 TB each) provide capacity scale-out to an additional 5.8 PB; metadata servers (MDS) and redundant Arista 7508 10G switches provide reliability and performance. Three distinct network architectures connect the clients: the GORDON IB cluster through 64 Lustre LNET routers (100 GB/s), the TRESTLES IB cluster through a Mellanox 5020 bridge (12 GB/s), and the TRITON Myrinet cluster through a Myrinet 10G switch (25 GB/s).]


Innovation carries risk, and Gordon had equal amounts of both

• Sandy Bridge processor wasn’t available; delivery schedule was uncertain

• SSD market in the midst of a revolution

• vSMP new to large, multi-user HPC environment

• Dual-rail 3D torus had never been deployed

• Data intensive user community not well defined


Risk Reduction

• Deployed Dash prototype
• vSMP 16-way testing
• Dash available to users
• vSMP 32-way testing
• Deployed 16 Gordon I/O nodes with Postville SSDs
• Early delivery of all I/O nodes
• Full system delivery
• 3D torus prototype demonstration


Testing, testing, and more testing


Challenge: Intel SSD Roadmap Changes Necessitated a Revisit of SSD Options

• Rigorous acceptance criteria required high IOPS, endurance, and capacity, and a low UBER
• Tested numerous drives
• Performed paper studies of many more
• Cost was an issue for the vendor
• Dash prototype was crucial

The final choice was the new Intel 710 eMLC 300 GB SSD, launched at IDF 2011. There are 1,024 of these in Gordon.


Challenge: Exporting & Preserving Flash Performance

• There are several layers of overhead that reduce performance (SATA, Linux, network)
• I/O models need to be driven by the applications
• No one had really done this before
• iSCSI over RDMA was the best protocol
• XFS performs well
• Early work with OCFS is promising for a shared file system


Challenge: vSMP had not been used in a large-scale, multi-user HPC system

• Dash prototype was used for engineering scale-up work (16- and 32-way)
• SDSC did significant systems and application testing
• Users had early access to Dash
• The first Gordon Sandy Bridge nodes were shipped to ScaleMP for certification
• ScaleMP has been a partner throughout the project

vSMP is in production on Gordon. Most users need 16-way (1 TB), but larger nodes can be provisioned.


Challenge: User Outreach to Identify Good Applications for Gordon

• Many traditional HPC users are not “data-intensive”

• Mined the existing NSF allocations database to identify potential users

• Conducted data intensive summer institutes

• Reached out to new communities in linguistics, political science, and others

• Revised the allocations models for Gordon to encourage new users to apply for time

• We’re still not quite there


Applications


Computational Style Code: Answering the Question, "Why Gordon?"

Applications are characterized by which of these attributes they exercise:
V: Uses vSMP
C: Computationally intensive, leverages the Sandy Bridge architecture
M: Uses large memory/core on Gordon (4 GB/core)
T: Threaded
F: Uses flash
L: Lustre I/O intensive


Breadth-First Search Comparison Using SSD and HDD

Graphs are mathematical and computational representations of relationships among objects in a network. Such networks occur in many natural and man-made scenarios, including communication, biological, and social contexts. Understanding the structure of these graphs is important for uncovering important relationships among the members.

• Implementation of the breadth-first search (BFS) graph algorithm developed by Munagala and Ranade
• 134 million nodes
• Flash drives reduced I/O time by a factor of 6.5x
• Problem converted from I/O bound to compute bound

Source: Sandeep Gupta, San Diego Supercomputer Center. Used by permission. 2011.
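For reference, a minimal Python sketch of level-synchronous BFS; the Munagala-Ranade external-memory variant used here additionally streams the adjacency lists and frontiers from disk or flash in sorted passes, which is why storage latency dominates the run time:

```python
# Simple in-memory, level-synchronous BFS (illustrative only).
from collections import deque

def bfs_levels(adj, source):
    """adj: dict mapping node -> list of neighbors. Returns node -> BFS level."""
    level = {source: 0}
    frontier = deque([source])
    while frontier:
        u = frontier.popleft()
        for v in adj.get(u, []):
            if v not in level:            # first time we reach v
                level[v] = level[u] + 1
                frontier.append(v)
    return level

# Tiny example (the actual runs used ~134 million nodes)
adj = {0: [1, 2], 1: [3], 2: [3], 3: [4]}
print(bfs_levels(adj, 0))   # {0: 0, 1: 1, 2: 1, 3: 2, 4: 3}
```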


Postgres pgbench Result for a Gordon I/O Node

pgbench is the standard Postgres benchmark for testing performance using a real-world banking scenario. Tests are performed for a range of database sizes and client connections, for both a query/update/insert (read/write) workload and a random-select (read-only) workload. Benchmark scale = number of bank branches, with 10 tellers and 100,000 accounts per branch; each client executes 100,000 transactions.

Gordon I/O node: 2 x 6-core Westmere, 48 GB DRAM, 4.4 TB of high-performance flash.

The node achieves high TPS (transactions per second) at large scale (150 GB) and high client count.

Source: Kai Lin, San Diego Supercomputer Center. 2012.
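A sketch of the TPC-B-style read/write transaction that pgbench drives, expressed here in Python with psycopg2 purely for illustration (pgbench itself issues these statements from its own C client against a schema created with pgbench -i; the database name and scale below are hypothetical):

```python
# One pgbench-style transaction: update an account, read it back, update the
# teller and branch balances, and log the transfer in the history table.
import random
import psycopg2

conn = psycopg2.connect("dbname=pgbench_test")   # hypothetical database
cur = conn.cursor()

def one_transaction(nbranches, ntellers=10, naccounts=100_000):
    aid = random.randint(1, nbranches * naccounts)
    tid = random.randint(1, nbranches * ntellers)
    bid = random.randint(1, nbranches)
    delta = random.randint(-5000, 5000)
    cur.execute("UPDATE pgbench_accounts SET abalance = abalance + %s WHERE aid = %s", (delta, aid))
    cur.execute("SELECT abalance FROM pgbench_accounts WHERE aid = %s", (aid,))
    cur.execute("UPDATE pgbench_tellers SET tbalance = tbalance + %s WHERE tid = %s", (delta, tid))
    cur.execute("UPDATE pgbench_branches SET bbalance = bbalance + %s WHERE bid = %s", (delta, bid))
    cur.execute("INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) "
                "VALUES (%s, %s, %s, %s, CURRENT_TIMESTAMP)", (tid, bid, aid, delta))
    conn.commit()

one_transaction(nbranches=100)   # scale (number of branches) chosen arbitrarily here
```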


PDB Query Comparisons with a DB2 Database on Two Gordon I/O Nodes: One with HDDs, One with SSDs

The Protein Data Bank (PDB) is the single worldwide repository of information about the 3D structures of large biological molecules. These are the molecules of life that are found in all organisms. Understanding the shape of a molecule helps to understand how it works.

• For single queries, HDDs and SSDs perform about the same.
• For concurrent queries, SSDs achieve a big speedup.
• Q5B is > 10x, and performance varies by type of query.

Source: Vishwinath Nandigam, San Diego Supercomputer Center. 2011.
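A minimal sketch of the kind of concurrency test behind this comparison: issue the same batch of queries from an increasing number of worker threads and time each batch. run_query() and QUERIES are hypothetical stand-ins; the actual study used DB2 with PDB-specific SQL:

```python
# Time a fixed query batch at different client concurrencies (sketch only).
import time
from concurrent.futures import ThreadPoolExecutor

QUERIES = ["Q1", "Q2", "Q5B"] * 8          # hypothetical workload of 24 queries

def run_query(q):
    # placeholder for "connect to the database and execute query q"
    time.sleep(0.01)
    return q

for workers in (1, 4, 16):
    start = time.time()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(run_query, QUERIES))
    print(f"{workers:2d} concurrent clients: {time.time() - start:.2f} s")
```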


Daphnia Genome Assembly Using Velvet and vSMP

Daphnia (a.k.a. the water flea) is a model species used for understanding mechanisms of inheritance and evolution, and as a surrogate species for studying human health responses to environmental changes.

De novo assembly of short DNA reads using the de Bruijn graph algorithm. The code is parallelized using OpenMP directives. Benchmark problem: Daphnia genome assembly from 44-bp and 75-bp reads using 35-mers.

Source: Wayne Pfeiffer, San Diego Supercomputer Center. Used by permission.
Photo: Dr. Jan Michels, Christian-Albrechts-University, Kiel.
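A minimal Python sketch of de Bruijn graph construction from short reads, the core idea behind Velvet: every k-mer becomes an edge between its (k-1)-mer prefix and suffix, and assembly walks paths in this graph (illustrative only; k=5 here, whereas the benchmark used 35-mers):

```python
# Build a de Bruijn graph from reads (sketch only).
from collections import defaultdict

def de_bruijn(reads, k):
    graph = defaultdict(list)                 # (k-1)-mer -> list of successor (k-1)-mers
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return graph

reads = ["ACGTACGA", "GTACGATT"]
for node, succs in de_bruijn(reads, k=5).items():
    print(node, "->", succs)
```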


Foxglove Calculation Using Gaussian 09 with vSMP: MP2 Energy Gradient Calculation

The foxglove plant (Digitalis) is studied for its medicinal uses. Digoxin, an extract of the foxglove, is used to treat a variety of conditions including diseases of the heart. There is some recent research that suggests it may also be a beneficial cancer treatment.

• Time to solution: 43,000 s
• Processor footprint: 4 nodes, 64 threads
• Memory footprint: 10 nodes, 700 GB
• 1 compute node = 16 cores and 64 GB

Source: Jerry Greenberg, San Diego Supercomputer Center. January 2012.


Axial Compression of a Caudal Rat Vertebra Using Abaqus and vSMP

The goal of the simulations is to analyze how small variances in boundary conditions affect high-strain regions in the model. The research goal is to understand the response of trabecular bone to mechanical stimuli. This has relevance for paleontologists seeking to infer habitual locomotion of ancient people and animals, and for treatment strategies for populations with fragile bones such as the elderly.

• 5 million quadratic, 8-noded elements
• Model created with a custom MATLAB application that converts 253 micro-CT images into voxel-based finite element models

Source: Matthew Goff, Chris Hernandez. Cornell University. Used by permission. 2012.
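A minimal sketch of the voxel-meshing idea: threshold the CT image stack and emit one hexahedral element per bone voxel. The actual model was built with a custom MATLAB tool; numpy is used here purely for illustration, and the threshold value is hypothetical:

```python
# Convert a stack of micro-CT slices into voxel "elements" (sketch only).
import numpy as np

def voxels_to_elements(ct_stack, threshold=0.5):
    """ct_stack: 3D array (slices x rows x cols) of normalized CT intensities.
    Returns a list of (z, y, x) indices, one per voxel element above threshold."""
    return list(zip(*np.nonzero(ct_stack > threshold)))

# Tiny synthetic "scan": 4 slices of 8x8 voxels with a dense core
ct = np.zeros((4, 8, 8))
ct[1:3, 2:6, 2:6] = 1.0
elements = voxels_to_elements(ct)
print(f"{len(elements)} voxel elements")   # 2 * 4 * 4 = 32 elements
```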


Cosmology Simulation: Matter Power Spectrum Measurement Using vSMP

The goal is to measure the effect of the light from the first stars on the evolution of the universe. To quantitatively compare the matter distribution of each simulation, we use radially binned 3D power spectra of the individual simulations and of their difference.

• 2 simulations
• 3200^3 uniform 3D grids
• 15k+ files each
• Existing OpenMP code
• ~256 GB of memory used
• ~5.5 hours per field
• Zero development effort

Source: Rick Wagner, Michael L. Norman. SDSC.
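A minimal numpy sketch of a radially binned 3D power spectrum, the measurement described above (toy-scale grid and binning; the real fields were 3200^3 and the production code uses OpenMP):

```python
# Radially binned 3D power spectrum of a density field (sketch only).
import numpy as np

def power_spectrum(delta, nbins=16):
    """delta: 3D overdensity field. Returns mean power per radial |k| bin."""
    n = delta.shape[0]
    power = np.abs(np.fft.fftn(delta)) ** 2 / delta.size   # per-mode power
    k = np.fft.fftfreq(n)                                   # frequencies along one axis
    kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
    kmag = np.sqrt(kx**2 + ky**2 + kz**2).ravel()
    bins = np.linspace(0, kmag.max(), nbins + 1)
    which = np.digitize(kmag, bins)
    p = power.ravel()
    return np.array([p[which == i].mean() for i in range(1, nbins + 1)])

field = np.random.standard_normal((64, 64, 64))   # toy random field
print(power_spectrum(field)[:4])
```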


Impact of High-Frequency Trading on Financial Markets

To determine the impact of high-frequency trading activity on financial markets, it is necessary to construct nanosecond-resolution limit order books: records of all unexecuted orders to buy or sell stock at a specified price. The analysis provides evidence of quote stuffing, a manipulative practice that involves submitting a large number of orders with immediate cancellation to generate congestion.

Time to construct the limit order books is now under 15 minutes for a threaded application using 16 cores on a single Gordon compute node.

Source: Mao Ye, Dept. of Finance, U. Illinois. Used by permission. 6/1/2012.
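A minimal sketch of building a limit order book from an add/cancel message stream. The message format here is hypothetical; real market data feeds also carry executions, replaces, and nanosecond timestamps:

```python
# Rebuild outstanding buy/sell interest per price level from order messages (sketch only).
from collections import defaultdict

def build_book(messages):
    """messages: iterable of (action, order_id, side, price, shares).
    Returns {'B': {price: open shares}, 'S': {price: open shares}}."""
    book = {"B": defaultdict(int), "S": defaultdict(int)}
    orders = {}                                   # order_id -> (side, price, shares)
    for action, oid, side, price, shares in messages:
        if action == "add":
            orders[oid] = (side, price, shares)
            book[side][price] += shares
        elif action == "cancel" and oid in orders:
            s, p, sh = orders.pop(oid)
            book[s][p] -= sh
    return book

msgs = [("add", 1, "B", 100.00, 500),
        ("add", 2, "S", 100.05, 300),
        ("cancel", 1, None, None, None),          # quote-stuffing style immediate cancel
        ("add", 3, "B", 99.95, 200)]
print(build_book(msgs))
```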


Massive Data Analysis of a Large-Eddy Simulation of Deep Convection in the Atmosphere (Clouds) Using vSMP

The Center for Multi-scale Modeling of Atmospheric Processes (CMMAP) is an NSF Science and Technology Center focused on improving the representation of cloud processes in climate models.

Simulation details:
• GigaLES model run dataset (partial)
• 40 time steps (24-hour simulation)
• 256 vertical layers
• 204.8 x 204.8 kilometers
• 100 m horizontal resolution

R analysis:
• 160 GB dataset (40 netCDF files @ 4 GB each)
• 340 GB memory footprint
• ~3.5 hours for data input and analysis

• System for Atmospheric Modeling: M. Khairoutdinov, SUNY Stony Brook
• Visualization: J. Helly, A. Chourasia
• Analysis: J. Helly, S. Strande


MrBayes Running on Gordon through the CIPRES Gateway

MrBayes 3.1.2 is used extensively via the CIPRES Science Gateway to infer phylogenetic trees. The hybrid parallel version running at SDSC uses both MPI and OpenMP.

• CIPRES has allowed over 4,000 biologists worldwide to run parallel tree inference codes via a simple-to-use web interface.
• Applications can be targeted to appropriate architectures.
• Gordon provides a significant speedup for unpartitioned data sets over the SDSC Trestles system.
• A model for future data-intensive projects.

Source: Wayne Pfeiffer, San Diego Supercomputer Center.


Application-Aware Dynamic Voltage and Frequency Scaling Saves an Average of 12% Energy on HPC Workloads

A series of HPC applications was run on 1,024 cores, comparing the Intel baseline power-saving settings with application-aware settings. The average performance penalty is 7.9%. LAMMPS realizes a power savings of 31.7% with a performance penalty of 3.9%.

Source: Laura Carrington, PMaC Lab, San Diego Supercomputer Center. May 2012.
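A back-of-the-envelope check of the trade-off, assuming the quoted figures refer to average power draw and wall-clock time: since energy = power x time, a 31.7% power reduction with a 3.9% slowdown still nets a large energy saving:

```python
# Energy = power x time, so relative energy is the product of the two ratios.
power_ratio = 1 - 0.317      # application-aware power relative to baseline (assumed)
time_ratio = 1 + 0.039       # runtime penalty
energy_ratio = power_ratio * time_ratio
print(f"LAMMPS energy relative to baseline: {energy_ratio:.2f} "
      f"(~{(1 - energy_ratio) * 100:.0f}% energy saved)")   # ~0.71, i.e. roughly 29% saved
```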


Gordon Impact as a Resource Provider


Conclusions

• The nature of computational research is becoming more data-intensive, requiring new kinds of high-performance computer architectures.
• Gordon is an innovative system that addresses a range of challenges associated with data-intensive computing.
• A prototype system and significant testing mitigated the challenges of deploying Gordon.
• Outreach to new user communities takes concerted and ongoing effort.
• Gordon supports a wide range of applications: large-memory, MPI, and dedicated I/O node applications.
• Productive data-intensive computing is being done.


Thank you very much!

[email protected]

And thank you to the co-authors: Pietro Cicotti, Bob Sinkovits, Bill Young, Rick Wagner, Mahidhar Tatineni, Eva Hocks, Allan Snavely, Mike Norman