
Achieving Higher Productivity on Abaqus Simulations for Small and Mid-size Enterprises with HPC and Clustering Technologies

Pak Lui

Application Performance Manager

SIMULIA COMMUNITY CONFERENCE, MAY 18-21, 2015 | BERLIN, GERMANY


The HPC Advisory Council

• World-wide HPC non-profit organization (400+ members)

• Bridges the gap between HPC usage and its potential

• Provides best practices and a support/development center

• Explores future technologies and developments

• Leading edge solutions and technology demonstrations


HPC Advisory Council Members


Special Interest Subgroups

• HPC|Scale Subgroup

– Explores the use of commodity HPC as a replacement for multi-million-dollar mainframes and proprietary supercomputers

• HPC|Cloud Subgroup

– Explores the use of HPC components in building external/public/internal/private cloud computing environments

• HPC|Works Subgroup

– Provides best practices for building balanced and scalable HPC systems, performance tuning, and application guidelines

• HPC|Storage Subgroup

– Demonstrates how to build high-performance storage solutions and their effect on application performance and productivity

• HPC|GPU Subgroup

– Explores usage models of GPU components in next-generation compute environments and potential optimizations for GPU-based computing

• HPC|FSI Subgroup

– Explores high-performance computing solutions for low-latency trading, productive simulations, and overall more efficient financial services

• HPC|Music Subgroup

– Enables HPC in music production and develops HPC cluster solutions that further enable the future of music production


HPC Advisory Council HPC Center

• InfiniBand-based storage (Lustre filesystem)

• Clusters: Juniper, Heimdall, Plutus, Janus, Athena, Vesta, Thor, and Mala, ranging from 80 to 896 cores per system, plus 16 GPUs


130 Applications Best Practices Published

• Abaqus, ABySS, AcuSolve, Amber, AMG, AMR, ANSYS CFX, ANSYS FLUENT, ANSYS Mechanics, BQCD, CCSM, CESM, COSMO, CP2K, CPMD, Dacapo, Desmond, DL-POLY, Eclipse, FLOW-3D, GADGET-2, GROMACS, Himeno, HOOMD-blue, HYCOM, ICON, LAMMPS, Lattice QCD, LS-DYNA, MILC, miniFE, MM5, MPQC, MR Bayes, MSC Nastran, NAMD, Nekbone, NEMO, NWChem, Octopus, OpenAtom, OpenFOAM, OpenMX, PARATEC, PFA, PFLOTRAN, Quantum ESPRESSO, RADIOSS, SPECFEM3D, WRF

For more information, visit: http://hpcadvisorycouncil.com/best_practices.php


University Award Program

• University award program

– Universities are encouraged to submit proposals for advanced research

– Once or twice a year, the HPC Advisory Council selects a few proposals

• Selected proposals are provided with:

– Exclusive computation time on the HPC Advisory Council's Compute Center

– An invitation to present at one of the HPC Advisory Council's worldwide workshops

– Publication of the research results on the HPC Advisory Council website

• 2010 award winner: Dr. Xiangqian Hu, Duke University

– Topic: "Massively Parallel Quantum Mechanical Simulations for Liquid Water"

• 2011 award winner: Dr. Marco Aldinucci, University of Torino

– Topic: "Effective Streaming on Multi-core by Way of the FastFlow Framework"

• 2012 award winner: Jacob Nelson, University of Washington

– Topic: "Runtime Support for Sparse Graph Applications"

• 2013 award winner: Antonis Karalis

– Topic: "Music Production using HPC"

• To submit a proposal – please check the HPC Advisory Council web site


2015 ISC Student Cluster Competition

• University-based teams compete and demonstrate the capabilities of state-of-the-art HPC systems and applications on the ISC'15 show floor

• The Student Cluster Challenge is designed to introduce the next generation of students to the high-performance computing world and community


ISC’15 – Student Cluster Competition Teams


HPCAC – 2015 ISC High Performance – Student Cluster Competition Teams


HPC Advisory Council

• HPC Advisory Council (HPCAC)

– 400+ members

– http://www.hpcadvisorycouncil.com/

– Application best practices and case studies (over 150)

– Benchmarking center with remote access for users

– Worldwide workshops

– Value-add for your customers to stay up to date and in tune with the HPC market

• 2015 Workshops

– USA (Stanford University) – February 2015

– Switzerland – March 2015

– Brazil – August 2015

– Spain – September 2015

– China (HPC China) – Oct 2015

• For more information

– www.hpcadvisorycouncil.com

[email protected]


HPCAC Conferences


Note

• The following research was performed under the HPC Advisory Council activities

– Special thanks to HP and Mellanox

• For more information on the supporting vendors' solutions, please refer to:

– www.mellanox.com, http://www.hp.com/go/hpc

• For more information on the application:

– http://www.simulia.com


Abaqus by SIMULIA

• The Abaqus Unified FEA product suite offers powerful and complete solutions for both routine and sophisticated engineering problems covering a vast spectrum of industrial applications

• The Abaqus analysis products listed below focus on:

– Nonlinear finite element analysis (FEA)

– Advanced linear and dynamics application problems

• Abaqus/Standard

– General-purpose FEA that includes a broad range of analysis capabilities

• Abaqus/Explicit

– Nonlinear, transient, dynamic analysis of solids and structures using explicit time integration


Objectives

• The presented research was done to provide best practices for:

– Abaqus performance benchmarking

– Interconnect performance comparisons

– CPU performance comparisons

– Understanding Abaqus communication patterns

• The presented results will demonstrate

– The scalability of the compute environment to provide nearly linear application scalability
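"Nearly linear scalability" can be quantified as parallel speedup and efficiency relative to a single-node run. A minimal sketch of those two metrics (the runtimes below are illustrative placeholders, not measured Abaqus results):

```python
def speedup(t_baseline: float, t_parallel: float) -> float:
    """Speedup relative to the baseline (e.g., single-node) runtime."""
    return t_baseline / t_parallel

def efficiency(t_baseline: float, t_parallel: float, n_nodes: int) -> float:
    """Parallel efficiency: 1.0 means perfectly linear scaling."""
    return speedup(t_baseline, t_parallel) / n_nodes

# Hypothetical total runtimes (seconds) at 1, 2, 4, and 8 nodes
runtimes = {1: 1000.0, 2: 520.0, 4: 270.0, 8: 145.0}
for n, t in runtimes.items():
    print(f"{n} nodes: speedup {speedup(runtimes[1], t):.2f}x, "
          f"efficiency {efficiency(runtimes[1], t, n):.0%}")
```

Efficiency close to 100% across node counts is what "nearly linear" means in the benchmark slides that follow.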


Test Cluster Configuration

• HP Apollo 6000 "Heimdall" cluster

– HP ProLiant XL230a Gen9 10-node "Heimdall" cluster

– Processors: Dual-socket 14-core Intel Xeon E5-2697 v3 @ 2.6 GHz (Turbo Mode on; Home Snoop as default)

– Memory: 64 GB per node, 2133 MHz DDR4 DIMMs

– OS: RHEL 6 Update 6, OFED 2.4-1.0.0 InfiniBand SW stack

• Mellanox Connect-IB 56Gb/s FDR InfiniBand adapters

• Mellanox ConnectX-3 Pro 10GbE Ethernet adapters

• Mellanox SwitchX SX6036 56Gb/s FDR InfiniBand and Ethernet VPI switch

• MPI: Platform MPI 9.1.2 and Intel MPI 4.1.3

• Application: Abaqus 6.14-2 (unless otherwise stated)

• Benchmark workloads:

– Abaqus/Standard: S2 (Flywheel with centrifugal load), S4 (Cylinder head bolt-up)

– Abaqus/Explicit: E6 (Concentric spheres)


HP ProLiant XL230a Gen9 Server

Processor: Two Intel® Xeon® E5-2600 v3 series, 6/8/10/12/14/16 cores

Chipset: Intel Xeon E5-2600 v3 series

Memory: Up to 512 GB (16 x 32 GB); 16 DIMM slots; DDR4 R-DIMM/LR-DIMM; 2,133 MHz

Internal Storage: HP Dynamic Smart Array B140i SATA controller; HP H240 Host Bus Adapter

Networking: Network module supporting various FlexibleLOMs: 1GbE, 10GbE, and/or InfiniBand

Expansion Slots: 1 internal PCIe: 1 PCIe x16 Gen3, half-height

Ports: Front: (1) Management, (2) 1GbE, (1) Serial, (1) S.U.V port, (2) PCIe, and internal MicroSD card & Active Health

Power Supplies: HP 2,400 or 2,650 W Platinum hot-plug power supplies delivered by HP Apollo 6000 Power Shelf

Integrated Management: HP iLO (Firmware: HP iLO 4); option: HP Advanced Power Manager

Additional Features: Shared power & cooling, up to 8 nodes per 4U chassis, single GPU support, Fusion I/O support

Form Factor: 10 servers in 5U chassis


Abaqus/Explicit Performance – CPU Generation

• Intel E5-2697 v3 (Haswell) cluster outperforms prior system generations

– Performs 113% higher vs the Westmere cluster, 36% vs the Sandy Bridge cluster, and 23% vs the Ivy Bridge cluster at 4 nodes

• System components used:

– Heimdall (HSW): 2-socket 14-core Intel E5-2697 v3 @ 2.6GHz, 2133MHz DIMMs, FDR IB, 1 HDD

– Athena (IVB): 2-socket 10-core Intel E5-2680 v2 @ 2.7GHz, 1600MHz DIMMs, FDR IB, 1 HDD

– Athena (SNB): 2-socket 8-core Intel E5-2680 @ 2.7GHz, 1600MHz DIMMs, FDR IB, 1 HDD

– Plutus (WSM): 2-socket 6-core Intel X5670 @ 2.93GHz, 1333MHz DIMMs, QDR IB, 1 HDD

All measurements based on v6.12.2; performance rating based on total runtime.


Abaqus/Explicit Performance vs Power

• Modest dependency on CPU frequency for the Explicit case

– Approximately 23% gain between 2000MHz and 2800MHz

– Tradeoff: about 51% more power is needed to go from 2GHz to 2.8GHz CPUs

• Enabling Turbo Mode makes little performance difference for Explicit

– Turbo Mode provides ~4% performance gain (for CPU cores running at 2800 MHz)

– It also results in 18% higher power usage

Higher is better; measured on the Ivy Bridge cluster.


Abaqus/Standard Performance vs Power

• Similar behavior regarding CPU frequency and Turbo Mode is seen with Standard

– Up to 25% improvement at 51% higher power utilization

– Small performance gain (5%) when using Turbo Mode, at 18% higher power usage

• The "msr-tools" kernel tools can be used to adjust Turbo Mode dynamically

– Allows turning Turbo Mode on/off dynamically at the OS level

– Turbo Mode boosts core frequency, and consequently results in higher power consumption

Higher is better; measured on the Ivy Bridge cluster.
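The msr-tools approach above works by flipping a bit in a model-specific register. The sketch below only computes the register values involved, under the assumption (valid for many Intel CPUs, but verify against the datasheet) that IA32_MISC_ENABLE is MSR 0x1a0 and bit 38 is the Turbo Mode disable flag; on a live node the read/write would be done with `rdmsr`/`wrmsr` from msr-tools, run as root:

```python
# Assumed layout: IA32_MISC_ENABLE is MSR 0x1a0; bit 38 disables Turbo.
IA32_MISC_ENABLE = 0x1A0
TURBO_DISABLE_BIT = 38

def disable_turbo(msr_value: int) -> int:
    """Return the MSR value with Turbo Mode disabled (bit 38 set)."""
    return msr_value | (1 << TURBO_DISABLE_BIT)

def enable_turbo(msr_value: int) -> int:
    """Return the MSR value with Turbo Mode enabled (bit 38 cleared)."""
    return msr_value & ~(1 << TURBO_DISABLE_BIT)

# On a real node the equivalent msr-tools commands would be:
#   rdmsr -p 0 0x1a0            (read the current value on core 0)
#   wrmsr -p 0 0x1a0 <value>    (write the modified value back)
current = 0x850089  # example value from a hypothetical rdmsr read
print(hex(disable_turbo(current)))  # → 0x4000850089
```

Because the change is per-core and takes effect immediately, this is how the study could toggle Turbo between runs without rebooting into BIOS.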


Abaqus/Explicit Performance - Interconnect

• FDR InfiniBand is the most efficient network interconnect for Abaqus/Explicit

– FDR IB outperforms 1GbE by 5.2x and 10GbE by 87% at 8 nodes

– InfiniBand reduces communication time, providing more time for computation

– With RDMA, InfiniBand offloads communications from the CPU, so the CPU can focus on computation

Higher is better


Abaqus/Explicit Profiling – MPI Time Ratio

• FDR InfiniBand reduces the MPI communication time

– InfiniBand FDR consumes about 37% of total runtime at 4 nodes

– Ethernet solutions consume from 53% to 72% at 4 nodes


Abaqus/Explicit Profiling – Time Spent in MPI

• Abaqus/Explicit: most MPI time is spent in collective operations:

– InfiniBand FDR: MPI_Gather (54%), MPI_Allreduce (9%), MPI_Scatterv (9%)

– 1GbE: MPI_Gather (43%), MPI_Irecv (13%), MPI_Scatterv (14%)


Abaqus Performance – Software Versions

• Abaqus/Standard performs faster than the previous version

– 6.14-2 outperforms 6.13-2 by 8% at 8 nodes / 224 cores

– MPI library versions have also changed

• The gain on Abaqus/Explicit for the newer version is not as dramatic

– Only a slight gain can be seen on the dataset tested

Higher is better


Abaqus Performance – Snoop Configuration

• Changing the QPI snoop configuration may improve performance for certain workloads

– Home Snoop provides high memory bandwidth in an average NUMA environment (default)

– Early Snoop may decrease memory latency but lowers bandwidth

– Cluster on Die may provide increased memory bandwidth in highly optimized NUMA workloads

– Only a slight improvement is seen using Early Snoop for Abaqus/Standard, and no difference in Explicit

Higher is better


Abaqus Performance – SRQ

• Abaqus/Explicit performs marginally faster with SRQ (Shared Receive Queue) enabled

– Enabling SRQ performs 3% faster at 8 nodes / 224 cores

Higher is better


Abaqus Performance – Turbo Mode

• Some improvement is observed by enabling Turbo Mode in Abaqus

– About 5% improvement is seen for both Abaqus/Standard and Abaqus/Explicit

– The CPU cores run at a higher clock frequency when Turbo is enabled

– Additional power use can occur when Turbo Mode is on

Higher is better


Abaqus Performance – MPI Libraries

• Abaqus supports Intel MPI starting in version 6.14

– Platform MPI delivers slightly higher performance than Intel MPI, by about 3% at 8 nodes

Higher is better


Abaqus Profiling – MPI Calls

• Abaqus/Standard uses a wide range of MPI APIs

– MPI_Test dominates the MPI function calls (over 97%)

• Abaqus/Explicit shows heavy use of tests for non-blocking messages

– MPI_Iprobe (95%), MPI_Test (3%)
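The MPI_Iprobe/MPI_Test dominance above reflects a progress-polling pattern: each rank repeatedly checks, without blocking, whether a message has arrived so it can keep computing in the meantime. A rough single-process analogy in plain Python (no MPI involved; a queue.Queue stands in for the interconnect, and the worker thread is purely illustrative):

```python
import queue
import threading
import time

# The queue stands in for the network: a worker thread "sends" results,
# and the main loop "probes" for them without blocking, mirroring how
# Abaqus ranks spin on MPI_Iprobe/MPI_Test between compute steps.
inbox = queue.Queue()

def worker() -> None:
    for i in range(3):
        time.sleep(0.01)   # simulate computation happening elsewhere
        inbox.put(i * i)   # analogous to a non-blocking send completing

threading.Thread(target=worker).start()

received = []
while len(received) < 3:
    try:
        msg = inbox.get_nowait()   # non-blocking check, like MPI_Iprobe
        received.append(msg)
    except queue.Empty:
        pass                       # nothing yet: do useful work or spin

print(received)  # → [0, 1, 4]
```

Each failed `get_nowait` is one "MPI_Iprobe returned false" event, which is why a profiler counts so many of these calls relative to actual data transfers.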


Abaqus Profiling – Time Spent in MPI

• Abaqus/Standard shows high usage for non-blocking messages

• MPI_Gather dominates the MPI communication time in Abaqus/Explicit


Abaqus Profiling – Message Sizes

• Abaqus/Standard uses small and medium MPI message sizes

– Most message sizes are between 0B and 64B, or 65B and 256B

– Some medium-size concentration in the 64KB to 256KB range

• Abaqus/Explicit shows a wide distribution of small message sizes

– Small messages peak in the range from 65B to 256B
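Distributions like these come from bucketing each MPI message's byte count into size ranges. A minimal sketch of that bucketing (the sample sizes are made up; in practice an MPI profiler collects them per call):

```python
from collections import Counter

# Bucket boundaries in bytes, matching the ranges discussed above.
BUCKETS = [(0, 64), (65, 256), (257, 65_535), (65_536, 262_144)]

def bucket_label(size: int) -> str:
    """Map a message size in bytes to its histogram bucket."""
    for lo, hi in BUCKETS:
        if lo <= size <= hi:
            return f"{lo}B-{hi}B"
    return f">{BUCKETS[-1][1]}B"

# Hypothetical per-message sizes recorded during a run
sizes = [8, 48, 100, 200, 200, 70_000, 128_000, 500_000]
histogram = Counter(bucket_label(s) for s in sizes)
print(dict(histogram))
```

The resulting counts, plotted per bucket, give exactly the kind of message-size chart the profiling slides show.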


Abaqus Performance – GPU

• GPU support for Abaqus/Standard since 2011

• Currently supported version: Abaqus 6.14

– Direct sparse solvers: symmetric & unsymmetric

– Multi-GPU per node; multi-node DMP clusters

– Flexibility to run jobs on specific GPUs

• Customer adoption increasing across industry segments

• Abaqus GPU licensing based on tokens

– Same token scheme for CPU cores & GPUs

• Performance gains vary: 2-3x on average is common
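An overall 2-3x gain is consistent with Amdahl's law when only the direct solver runs on the GPU. A sketch of that estimate (the 80% solver fraction and 5x solver speedup below are illustrative assumptions, not figures from the slides):

```python
def amdahl_speedup(accelerated_fraction: float, local_speedup: float) -> float:
    """Overall speedup when only a fraction of the runtime is accelerated."""
    serial = 1.0 - accelerated_fraction
    return 1.0 / (serial + accelerated_fraction / local_speedup)

# If ~80% of a job's runtime is in the GPU-accelerated direct solver
# and the GPU runs that portion 5x faster, the whole job speeds up by:
print(round(amdahl_speedup(0.80, 5.0), 2))  # → 2.78
```

This also explains why gains vary by model: jobs that spend less time in the solver see a smaller fraction accelerated and thus a lower overall speedup.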


Abaqus Summary

• HP ProLiant Gen9 servers deliver superior compute power over their predecessors

– Intel E5-2600 v3 series (Haswell) processors and FDR InfiniBand outperform prior generations

– Performs 113% higher vs Westmere, 36% vs Sandy Bridge, and 23% vs Ivy Bridge

• FDR InfiniBand is the most efficient cluster interconnect for Abaqus

– FDR IB outperforms 1GbE by 5.2 times and 10GbE by 87% at 8 nodes

• Abaqus/Standard performs faster than the previous version

– Version 6.14-2 outperforms version 6.13-2 by 8% at 4 nodes / 80 cores

– Both Platform MPI and Intel MPI are included in version 6.14-2

– Enabling SRQ may improve performance at scale


Thank You

HPC Advisory Council

All trademarks are property of their respective owners. All information is provided "As-Is" without any kind of warranty. The HPC Advisory Council makes no representation as to the accuracy and completeness of the information contained herein. The HPC Advisory Council undertakes no duty and assumes no obligation to update or correct any information presented herein.