
© 2012 ANSYS, Inc. April 26, 2012 1

High-Performance Cluster Configurations for ANSYS Fluid Dynamics

ANSYS IT Solutions Webcast Series


2012 webcast series from ANSYS and our partners

Our goal is to provide ANSYS customers with

•Recommendations on HW and system specification

•Best practice configuration, setup, management

•Roadmap and vision for planning

Upcoming Topics

•Storage and Data Management

•Remote Access / Centralized HPC

•Workstation Refresh ROI

•Cloud, mobile platforms, …

IT Solutions for ANSYS - Webcast Series

http://www.ansys.com/Support/Platform+Support/IT+Solutions+for+ANSYS+Webcast+Series+2012


ANSYS Focus on IT Solutions

IT is the enabler for more effective use of engineering simulation.

High Fidelity Results

Simulation allows engineering to know, not guess – but only if IT can deliver infrastructure suited for ever larger “mega”-simulations

Design Exploration / Optimization

Product integrity requirements drive automated execution of 100’s of simulations, with important implications for the IT infrastructure

Average HPC infrastructure for ANSYS in our largest customers is well over 10,000 cores – and growing very fast


High-Performance Cluster Configurations for ANSYS Fluid Dynamics

Today’s Agenda and Speakers

• Best Practice Cluster System Specification
  – Hari Reddy, Senior Technical Consultant, IBM
• IBM Solution Recommendations
  – Gaurav Chaudhry, X-Server WW Marketing, IBM
  – William Lu, Product Management, Platform Computing, an IBM Company
• Question / Answer


Best Practice Cluster System Specification

Hari Reddy, Senior Technical Consultant, IBM

[email protected]


IBM IT Selection Guide for ANSYS Fluent

Decision Points
• Cluster
  – Collection of nodes
  – Blade, Rack, High Density
• Network/Interconnect
  – Interconnect type
• Processors/Socket/CPU
  – Cores per socket
• CPU Cores
  – Clock, TURBO, Hyper-Threading
• Memory (DIMMs, Channels)
  – Size and distribution
• Storage
  – Types of storage
• Resource Management

Recommended Configurations
  – Small, Medium, Large


[Figure: Impact of CPU Speed on ANSYS Fluent Performance. Processor: Xeon X5600 series, 12 cores per node; Hyper-Threading: OFF; TURBO: ON; model: truck_14m. Y axis: clock improvement relative to a 2.66 GHz CPU clock (0.80 to 1.35). X axis: number of nodes allocated to a single job (1, 2, 4). Series: clock ratio, 2.66 GHz, 2.93 GHz, 3.47 GHz.]

CPU Clock

Decision point – What CPU clock to use?

• Xeon X5600 clock range – 2.67 GHz to 3.47 GHz (3.6 GHz)
• Always some improvement – 5% to 10% – well short of the 30% or more increase in clock ratio
  – ANSYS Fluent specific
• Improves utilization of a fixed number of licenses

Recommendation
  – Use Xeon X5675 (3.06 GHz, 95 watt)
  – Consider other parts of the cluster for further investment before choosing the fastest clock
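The gap between clock ratio and measured gain can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the 5% and 10% gains are the figures quoted on the slide, not new measurements.

```python
def clock_ratio(base_ghz: float, target_ghz: float) -> float:
    """Ratio of two CPU clock speeds."""
    return target_ghz / base_ghz


def realized_fraction(measured_gain: float, ratio: float) -> float:
    """Fraction of the extra clock that shows up as measured speedup.

    measured_gain is a fractional improvement (0.10 means 10%).
    """
    return measured_gain / (ratio - 1.0)


ratio = clock_ratio(2.66, 3.47)
print(f"clock ratio 3.47/2.66 GHz: {ratio:.2f}")
for gain in (0.05, 0.10):  # range quoted on the slide
    share = realized_fraction(gain, ratio)
    print(f"a {gain:.0%} gain realizes {share:.0%} of the clock increase")
```

Under these numbers, only about a sixth to a third of the extra clock shows up as Fluent speedup, which is consistent with the advice to invest elsewhere in the cluster first.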

© 2012 IBM Corporation


Turbo Boost

[Figure: Evaluation of TURBO Boost on ANSYS FLUENT 14.0 Performance. dx360 M3 (Intel Xeon x5670, 6 core, 2.93 GHz); Hyper-Threading: OFF; memory speed: 1333 MHz. Y axis: TURBO Boost improvement relative to TURBO OFF (1.00 to 1.06). X axis: number of cores used in a node (8 cores with quad-core processors, 12 cores with 6-core processors). Series: eddy_417K, turbo_500K, aircraft_2M, sedan_4M, truck_14M.]

[Figure: Evaluation of TURBO Boost on ANSYS FLUENT 14.0 Performance. dx360 M3 (Intel Xeon x5670, 2.93 GHz), 4X QDR IB; Hyper-Threading: OFF; memory speed: 1333 MHz; model: Truck_14M. Y axis: TURBO Boost improvement relative to TURBO Boost OFF (100% to 106%). X axis: number of nodes allocated to a single job (1, 2, 4, 8, 16, 32). Series: 8 cores (quad-core processors), 12 cores (6-core processors).]

Decision point – Turn TURBO Boost ON or OFF?

• Intel® Turbo Boost Technology 2.0 automatically allows processor cores to run faster than the base operating frequency if the processor is operating below power, current, and temperature specification limits
• More benefit when
  – fewer cores are active
  – the workloads are moderate
• TURBO gives quad-core processors slightly more boost than 6-core processors
• In larger clusters, as the workload is spread over more nodes, there are more opportunities for 6-core processors to achieve a boost in performance similar to quad-core

Recommendation
  – Turbo Boost should always be turned on to get more performance


Hyper-threading

[Figure: Evaluation of Hyperthreading on ANSYS Fluent 14.0 Performance. iDataplex M3 (Intel Xeon x5670, 2.93 GHz); network: 4X QDR Infiniband; TURBO Boost: ON; memory speed: 1333 MHz; model: Truck_14M. Y axis: improvement due to Hyperthreading (80% to 120%). X axis: number of nodes allocated to a single job (1, 2, 4, 8, 16, 32). Series: HT OFF (12 threads on 12 physical cores), HT ON (24 threads on 12 physical cores).]

Decision point – Use Hyper-Threading or not?

• Intel® Hyper-Threading (HT) Technology uses processor resources more efficiently, enabling two threads to run on each core
  – Applications see twice the cores
• HT improves ANSYS Fluent performance by a small percentage
• Requires double the number of licenses

Recommendation
  – If the license configuration permits
    • HT should be used to improve HW and SW utilization
  – If the number of licenses is limited
    • HT is not an efficient utilization of the licenses and is not recommended (HT may be turned on but is not used by ANSYS Fluent)
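The license trade-off can be made concrete with a rough calculation. The sketch below assumes one license per Fluent process, as the slide implies ("double the number of licenses"), and uses a hypothetical 5% gain to stand in for "a small %"; neither the function name nor that value comes from the webcast.

```python
def throughput_per_license(relative_performance: float, licenses: int) -> float:
    """Job throughput delivered per license consumed."""
    return relative_performance / licenses


PHYSICAL_CORES = 12     # one two-socket, 6-core node
ASSUMED_HT_GAIN = 0.05  # hypothetical stand-in for "a small %"

ht_off = throughput_per_license(1.0, PHYSICAL_CORES)
ht_on = throughput_per_license(1.0 + ASSUMED_HT_GAIN, 2 * PHYSICAL_CORES)

print(f"per-license throughput with HT on, relative to HT off: {ht_on / ht_off:.3f}")
```

With these assumptions each license delivers roughly half the throughput when HT is on, which is why HT only pays off when licenses are not the limiting resource.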


[Figure: ANSYS/FLUENT 14.0 Performance on 2-Socket and 4-Socket Systems. 2-socket system: dx360 M3 (Intel Xeon x5670, 6C, 2.93 GHz); 4-socket system: System x3850 (Xeon E7-8837, 8C, 2.23 GHz); Hyperthreading: OFF; TURBO: ON; model: Truck_14m. Y axis: Fluent Rating, higher is better (0 to 200). X axis: number of sockets. Bars: 2-socket (X5600) node and 4-socket (E7-8837) node, with data labels 88 and 188 in that order.]

Number of Sockets

Decision Point – Use 2-socket or 4-socket systems?

• 2-socket systems use the X5600 series
  – up to a maximum of 12 cores per system
• 4-socket systems use the E7-8800 series
  – up to a maximum of 40 cores per system; good scalability
  – 32 cores (one 4-socket system) give performance equivalent to 24 cores (two 2-socket systems)

Recommendation
  – If the application can run within a single 4-socket system
    • Use a 4-socket based cluster
    • No high-speed network is needed
  – Otherwise use clusters made out of 2-socket based systems
    • High-speed network needed

Fluent Rating = number of benchmark jobs that can be run in 24 hours (higher values are better)
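Because the Fluent Rating is defined as benchmark jobs per 24 hours, it converts directly to and from wall-clock time per job. A minimal sketch of that conversion follows; the function names are illustrative, and the ratings 88 and 188 are the data labels that appear on the chart, assumed here to belong to the 2-socket and 4-socket nodes respectively.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def fluent_rating(seconds_per_job: float) -> float:
    """Benchmark jobs that fit in 24 hours at the given time per job."""
    return SECONDS_PER_DAY / seconds_per_job


def seconds_per_job(rating: float) -> float:
    """Wall-clock seconds per benchmark job implied by a rating."""
    return SECONDS_PER_DAY / rating


# Ratings as labeled on the chart (assignment to node types is assumed).
for label, rating in (("2-socket node", 88), ("4-socket node", 188)):
    print(f"{label}: rating {rating} is about {seconds_per_job(rating):.0f} s per job")
```

A rating that doubles therefore means the time per job halves, so two 2-socket nodes at 88 each would need near-perfect scaling to match one 4-socket node at 188.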


[Figure: ANSYS/FLUENT 14.0 Performance, Quad-core vs. 6-core Processors. IBM dx360 M3 (Intel Xeon x5670, 6C, 2.93 GHz), 4X QDR IB; Hyperthreading: OFF; TURBO: ON; cores per job: 96; nodes are equally loaded; model: Sedan_4m. Y axis: Fluent Rating, higher is better (4000 to 9000). X axis: processor density. Bars: 6-core processor (8 nodes), data label 6736; quad-core processor (12 nodes), data label 8533.]

Processor Core Density Selection

Decision Point – Use quad-core or 6-core processors?

• Processors come in quad-core and 6-core (hex-core) versions
• Quad-core processors have higher per-core cache and memory bandwidth
  – ANSYS Fluent runs 20% to 30% faster on quad-core based systems
    • Improves productivity of a fixed number of licenses
  – Costs 50% more than 6-core based systems for the same total core count

Recommendation
  – If the primary goal is to improve productivity of licenses
    • Use quad-core based systems
  – If the total cluster cost is a primary consideration
    • Use 6-core based systems

Fluent Rating = number of benchmark jobs that can be run in 24 hours (higher values are better)
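The figures on this slide are mutually consistent, which a short calculation can confirm. This is a sketch using the chart's data labels (6736 and 8533) and the stated 96 cores per job; the helper name is hypothetical, and the link between node count and cost is a simplification rather than a claim from the webcast.

```python
import math


def nodes_needed(total_cores: int, cores_per_socket: int, sockets_per_node: int = 2) -> int:
    """Two-socket nodes required to supply the requested core count."""
    return math.ceil(total_cores / (cores_per_socket * sockets_per_node))


six_core_nodes = nodes_needed(96, 6)
quad_core_nodes = nodes_needed(96, 4)
speedup = 8533 / 6736  # quad-core rating over 6-core rating, from the chart

print(f"6-core nodes: {six_core_nodes}, quad-core nodes: {quad_core_nodes}")
print(f"node count ratio: {quad_core_nodes / six_core_nodes:.2f}")
print(f"quad-core speedup at 96 cores: {speedup:.2f}")
```

The quad-core cluster needs 1.5 times as many nodes for the same 96 cores, in line with the "50% more" cost figure, and the measured speedup of about 27% falls inside the quoted 20% to 30% range.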


Memory Configuration

[Figure: Impact of DIMM speed on ANSYS Fluent 14.0 Performance (Intel Xeon x5670, 6C, 2.93 GHz). Hyper-Threading: OFF; TURBO: ON; each job uses all 12 cores. Y axis: impact of memory speed (80% to 120%). X axis: ANSYS/FLUENT model (eddy_417K, turbo_500K, aircraft_2M, sedan_4M, truck_14M). Series: 1066 MHz, 1333 MHz.]

Decision Point – What is the most efficient memory configuration?

• Faster memory improves performance of ANSYS Fluent – about 10%
• Xeon 5600 series processors/memory can run at the maximum speed of 1333 MHz
• Memory should be configured properly
  – Otherwise the memory may run at slower speeds of 1066 MHz or even 800 MHz

Recommendation
  – To operate at maximum speed, all memory channels in both processors should be populated with equal amounts of memory
    • Memory guidelines specify that total memory is in discrete amounts of 24 GB, 36 GB, 48 GB … per node
  – 24 GB memory per node is sufficient
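The discrete sizes of 24 GB, 36 GB, and 48 GB follow from spreading memory evenly over every channel. The sketch below assumes three memory channels per socket (the usual Xeon 5600 arrangement, but not stated on the slide) and a hypothetical helper name. Even population is only one condition for running at 1333 MHz; the DIMM type and the number of DIMMs per channel also matter and are not modeled here.

```python
SOCKETS = 2
CHANNELS_PER_SOCKET = 3  # assumption: triple-channel memory per socket


def per_channel_gb(total_gb: int) -> int:
    """Memory per channel if total_gb is spread evenly over all channels.

    Raises ValueError when the total cannot be divided evenly, which means
    some channels would hold more memory than others.
    """
    channels = SOCKETS * CHANNELS_PER_SOCKET
    if total_gb % channels:
        raise ValueError(f"{total_gb} GB cannot be split evenly over {channels} channels")
    return total_gb // channels


for total in (24, 36, 48):
    print(f"{total} GB per node -> {per_channel_gb(total)} GB on each of 6 channels")
```

A total such as 32 GB fails this check, which is the kind of unbalanced configuration the recommendation warns against.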


Networks

Network             Protocol       Latency (micro sec)   Bandwidth (Mbytes/sec)
Gigabit             TCP/IP         26                    110
10-Gigabit          iWARP          8.4                   1150
4X QDR Infiniband   VERBS or PSM   1.6                   3200

[Figure: ANSYS/FLUENT 14.0 Performance. dx360 M3 (Intel Xeon x5670, 6C, 2.93 GHz); network: Gigabit, 10-Gigabit, 4X QDR Infiniband; Hyperthreading: OFF; TURBO: ON; model: truck_14M. Y axis: FLUENT Rating (0 to 5000). X axis: number of nodes allocated to a single job (1, 2, 4, 8, 16, 32, 64). Series: 4X QDR IB, 10-Gigabit, Gigabit.]

Decision Point – What network to use?

• Choices are 10-Gigabit and Infiniband
• Measures to evaluate
  – Latency and bandwidth
  – ANSYS Fluent is more sensitive to latency than bandwidth
  – Messages smaller than 1K can be as high as 80% of the total on large networks
• ANSYS Fluent performance is comparable on both 10-Gigabit and Infiniband for small clusters
  – Use direct access (e.g., iWARP) protocols for 10-Gigabit networks
• Infiniband will optimize application performance, resulting in better scalability for larger clusters

Recommendation
  – Use 10G for small clusters
  – Infiniband for larger clusters

Fluent Rating = number of benchmark jobs that can be run in 24 hours (higher values are better)
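Why small messages make latency dominate can be seen with the simple cost model time = latency + size / bandwidth. The sketch below applies that textbook model to the latency and bandwidth values in the table on this slide; the model itself and the function name are assumptions for illustration, and 1 Mbyte is taken as 10^6 bytes.

```python
# (latency in microseconds, bandwidth in Mbytes/sec), from the slide's table
NETWORKS = {
    "Gigabit (TCP/IP)": (26.0, 110.0),
    "10-Gigabit (iWARP)": (8.4, 1150.0),
    "4X QDR Infiniband": (1.6, 3200.0),
}


def transfer_time_us(message_bytes: int, latency_us: float, bandwidth_mb_s: float) -> float:
    """Estimated one-way message time in microseconds.

    With 1 Mbyte = 10**6 bytes, 1 Mbyte/sec equals 1 byte per microsecond,
    so bytes / (Mbytes/sec) is already in microseconds.
    """
    return latency_us + message_bytes / bandwidth_mb_s


for name, (latency, bandwidth) in NETWORKS.items():
    total = transfer_time_us(1024, latency, bandwidth)
    print(f"{name}: 1 KB message takes {total:.2f} us, {latency / total:.0%} of it latency")
```

For a 1 KB message, latency accounts for most of the transfer time on all three networks, which matches the slide's point that Fluent is more sensitive to latency than to bandwidth.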


Infiniband Network

[Figure: two-level (Level 1, Level 2) Infiniband networks built from 12-port switches, each connecting 24 compute servers: a 2:1 blocking network and a 1:1 non-blocking network. Legend: link connecting switch elements; link connecting server to switch.]

Model Fluent Rating (Higher Values are better)

Non-blocking 4:1 Blocking 8:1 Blocking

Truck_14m

329 327

Eddy_417k 16255 16225

Decision Point – What type of Infiniband network?

• Several types of IB networks
  – DDR, QDR, FDR so far, EDR next
  – Bandwidth is the main differentiator
  – Some improvement in latency
  – ANSYS Fluent is more sensitive to latency than bandwidth
  – Performance of FDR may not be commensurate with its cost
• ANSYS Fluent does not seem to be sensitive to blocking in Infiniband

Recommendation
  – Use 4X QDR Infiniband
  – A blocking factor of 2:1 or even 4:1
  – PSM for QLogic/Intel
  – VERBS for Voltaire/Mellanox
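A blocking factor describes how a leaf switch divides its ports between servers and uplinks. The sketch below works through the 12-port, 24-server case drawn on the slide; the helper names are hypothetical, and the even split of ports is the usual fat-tree convention rather than something the webcast spells out.

```python
import math


def leaf_port_split(ports: int, blocking: int):
    """Split a leaf switch's ports into (server links, uplinks) for blocking:1."""
    if ports % (blocking + 1):
        raise ValueError(f"{ports} ports cannot be split exactly at {blocking}:1")
    uplinks = ports // (blocking + 1)
    return ports - uplinks, uplinks


def leaf_switches_needed(servers: int, ports: int, blocking: int) -> int:
    """Leaf switches required to attach all servers at the given blocking factor."""
    server_links, _ = leaf_port_split(ports, blocking)
    return math.ceil(servers / server_links)


for blocking in (1, 2):
    down, up = leaf_port_split(12, blocking)
    leaves = leaf_switches_needed(24, 12, blocking)
    print(f"{blocking}:1 -> {down} server links, {up} uplinks per leaf, {leaves} leaf switches")
```

Moving from 1:1 to 2:1 cuts the leaf switches for 24 servers from four to three and frees uplink ports, which is where the cost saving comes from if Fluent performance is unaffected.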

Source: Intel/qLogic

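[Figure: ANSYS FLUENT Price Performance Analysis. X axis: number of nodes allocated to a single job (1, 2, 4, 8, 16, 32, 64, 128). Left Y axis: time relative to a 1-node run (0% to 100%). Right Y axis: total cost relative to 1 node (0 to 128). Series: time relative to 1-node run; HW cost relative to 1 node. Time data labels: 100%, 46%, 24%, 12%, 6% for 1 to 16 nodes, then labels of 3% and 2% at the larger node counts (their exact assignment is not recoverable). Cost data labels: 1, 2, 4, 8, 16, 32, 64, 128.]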

Cluster Size (nodes)

Feasible Performance-Price Spectrum

Decision Point – How to determine the number of nodes in the cluster?

• Depends on a number of factors:
  – Business value, budget
  – Environment (space, power, etc.)
• Cluster size can be highly scalable but not feasible due to total cost and other factors

Recommendation
  – Determine the Feasible Performance-Price Spectrum
    • Eliminate some obviously infeasible choices
      – e.g., too big, too small, does not fit in the existing data center
  – Conduct ROI analysis on the promising candidates
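One way to start mapping the feasible spectrum is to turn the chart's relative times into speedup and a cost-time product. The sketch below uses only the five time labels that are clearly readable (1 to 16 nodes) and assumes hardware cost scales linearly with node count, as the chart's cost series suggests; the labels are rounded, so the results are approximate, and the function names are illustrative.

```python
# Time relative to a 1-node run, as read from the chart's data labels.
REL_TIME = {1: 1.00, 2: 0.46, 4: 0.24, 8: 0.12, 16: 0.06}


def speedup(rel_time: float) -> float:
    """Speedup over the 1-node run."""
    return 1.0 / rel_time


def cost_time_product(nodes: int, rel_time: float) -> float:
    """Relative hardware cost multiplied by relative run time.

    Assumes cost grows linearly with node count. A value near 1.0 means
    each added node buys a proportional reduction in run time.
    """
    return nodes * rel_time


for nodes, rel in REL_TIME.items():
    print(f"{nodes:>2} nodes: speedup {speedup(rel):5.2f}, "
          f"cost x time {cost_time_product(nodes, rel):.2f}")
```

Across these sizes the cost-time product stays close to 1, so the choice among them is driven by budget, space, and turnaround requirements rather than by scaling losses.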


Storage Options

[Figure: ANSYS/FLUENT I/O Performance on different storage solutions. Cluster: dx360 M3 (Intel Xeon x5670, 2.93 GHz); multiple simultaneous jobs, each job using 96 cores. Y axis: average total I/O time in seconds (20 to 1620). X axis: number of simultaneous jobs (1, 2, 4). Series: NFS (1 Gbit), local file system, GPFS (Infiniband).]

• The options are numerous
  – Ranging from simple local disks to enterprise-level storage
  – Simple local file system to NFS to parallel file system (GPFS)
• One solution does not fit all
• NFS has some limitations but is simple
• GPFS is scalable but more complex
• A full webcast is planned to discuss the choices and recommendations
  – Coming up in June


Westmere/Sandy Bridge Comparison

                               Westmere       Sandy Bridge
Processor                      X5600 series   E5-2600 series
Cores/processor                6              8
Cores/system                   12             16
Specfp Rate (single system)    1.0            1.9
Specint Rate (single system)   1.0            1.85
Fluent (single system)         1.0            1.6

Source: Intel.com


• Pack more compute power (performance)
  – 32 nodes (12 cores each) of Westmere – 384 cores
  – 32 nodes (16 cores each) of Sandy Bridge – 512 cores
  – ~40% to 50% faster (estimate)
  – More licenses
• Reduce the size of the cluster (cost)
  – 32 nodes (12 cores each) of Westmere – 384 cores
  – 24 nodes (16 cores each) of Sandy Bridge – 384 cores
  – ~10% to 20% faster (estimate)
  – Same number of licenses
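The two upgrade paths reduce to simple core-count arithmetic, sketched below. The function names are illustrative, and the link between core count and license count assumes one license per core in use, which is how the slide treats it.

```python
import math


def total_cores(nodes: int, cores_per_node: int) -> int:
    """Cores available in a cluster of identical nodes."""
    return nodes * cores_per_node


def nodes_for_cores(cores: int, cores_per_node: int) -> int:
    """Smallest number of nodes that supplies at least the requested cores."""
    return math.ceil(cores / cores_per_node)


westmere = total_cores(32, 12)              # existing cluster
same_nodes = total_cores(32, 16)            # option 1: keep 32 nodes
same_cores = nodes_for_cores(westmere, 16)  # option 2: keep 384 cores

print(f"Westmere, 32 nodes: {westmere} cores")
print(f"Sandy Bridge, 32 nodes: {same_nodes} cores ({same_nodes / westmere:.2f}x the licenses)")
print(f"Sandy Bridge nodes for {westmere} cores: {same_cores}")
```

Keeping the node count raises the core (and license) requirement by a third, while keeping the core count removes a quarter of the nodes.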


Cluster Packaging


• Decision Point – Rack, Blade, High Density server?

• ANSYS Fluent performance is not affected by what packaging you select

• Total freedom to choose the right packaging based on other factors
  – Data center environment
  – Budget

A Recommendation
• For low entry point (one or two servers)
  – Use rack server
• Small and Medium
  – Use blade configuration
• Large
  – Use high density configuration

Capability                              Rack                          Blade         High Density   ANSYS Fluent
Density (# servers/sq ft floor space)   Low                           Medium        High           Not a factor
Node capabilities                       Lots of I/O options, memory   Much less     Much less      Not a factor
Processor speeds                        Top speed slightly more       Little less   Little less    Not a factor
Integration                             Less                          Moderate      High           Not a factor


Recap


Cluster Selection
• Concepts
• Decision points
  – Best Practices
    • Precise where possible
    • Guidelines otherwise
    • ANSYS Fluent specific
  – Take other coexisting apps into consideration
• Technology Transition
  – Intel Westmere
  – Intel Sandy Bridge

Next Steps
• Translate into real products
• Manage the cluster resources


IBM System Recommendations

Gaurav Chaudhry, System-X Worldwide Marketing, IBM

[email protected]

William Lu, Product Management, Platform Computing, an IBM Company

[email protected]


Recommended Configurations

[Figure: BladeCenter S with a head node/file server/compute node 1 and compute nodes 2 to 6 on a Gigabit network, with shared (NFS) BladeCenter S storage attached through SAS.]

SMALL – 2-Socket based
• Typical Users
  – Several simultaneous single-node jobs
    • High-speed interconnect is not required
  – Size of each job under a few million cells
• BladeCenter S (BC-S) with up to 6 HS22 blades
  – HS22 – two Xeon X5675 3.06 GHz 6C – 48 GB
• File System/Storage is through up to 12 SAS drives in BC-S, NFS
• OS Support: Red Hat, SUSE, and Windows
• Access: through ANSYS Remote Solve Manager, Platform LSF

[Figure: BladeCenter S with a head node/file server/compute node 1 and compute nodes 2 and 3 on a Gigabit network, with shared (NFS) BladeCenter S storage attached through SAS.]

SMALL – 4-Socket based
• Typical Users
  – Several simultaneous larger single-node jobs
    • High-speed interconnect is not required
  – Size of each job up to ~10 million cells
• BladeCenter S (BC-S) with up to 3 HX5 blades
  – HX5 – four Xeon E7-8837 3.06 GHz 8C – 128 GB
• File System/Storage is through up to 12 SAS drives in BC-S, NFS
• OS Support: Red Hat, SUSE, and Windows
• Access: through ANSYS Remote Solve Manager, Platform LSF


Recommended Configurations

[Figure: BladeCenter H with a head node/file server/compute node 1 and compute nodes 2 to 14 on Gigabit and 10-Gigabit networks, with (NFS) DS3500 storage attached through SAS.]

Medium – 2-Socket based
• Typical Users
  – Several simultaneous multi-node jobs
  – Size of each job up to ~10 million cells
• BladeCenter H (BC-H) with up to 14 HS22 blades (up to 168 cores)
  – HS22 – two Xeon X5675 3.06 GHz 6C – 48 GB
• Network: Gigabit, 10-Gigabit
• File System/Storage
  – External DS3500 disk system with SAS connectivity
  – NFS
• OS Support
  – Red Hat, SUSE, and Windows
• Access: through ANSYS Remote Solve Manager, Platform LSF


Recommended Configurations

[Figure: iDataplex with a head node/NFS file server, compute nodes 1 to 69, and GPFS servers 1 and 2, connected by Gigabit and 4X QDR Infiniband (2:1 blocking), with (NFS and GPFS) DCS3700 storage attached through SAS.]

Large – 2-Socket based
• Typical Users
  – A large number of simultaneous multi-node ANSYS Fluent Solver Phase jobs
  – One extreme-scale Solver Phase job using all nodes (using up to 828 cores)
  – The size of each job can range from a few million to hundreds of millions of cells
• iDataplex with up to 72 dx360 M3 nodes
  – dx360 M3 – two Xeon X5675 3.06 GHz 6C – 48 GB
• Network: Gigabit, Infiniband
• File System/Storage
  – External disk DCS3700 with SAS connectivity
  – GPFS
• OS Support: Red Hat, SUSE, and Windows HPC
• Access: through ANSYS Remote Solve Manager, Platform LSF
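The core counts quoted for the recommended configurations can be reproduced from the node, socket, and core figures on the slides. The sketch below is a cross-check only; the dictionary layout and function name are illustrative, and the Large entry counts the 69 compute nodes shown in the diagram, on the assumption that the remaining 3 of the 72 nodes are the head node and the two GPFS servers.

```python
# name: (compute nodes, sockets per node, cores per socket)
CONFIGS = {
    "Small, 2-socket (BladeCenter S, HS22)": (6, 2, 6),
    "Small, 4-socket (BladeCenter S, HX5)": (3, 4, 8),
    "Medium (BladeCenter H, HS22)": (14, 2, 6),
    "Large (iDataplex, dx360 M3)": (69, 2, 6),
}


def compute_cores(nodes: int, sockets: int, cores_per_socket: int) -> int:
    """Total compute cores in a configuration of identical nodes."""
    return nodes * sockets * cores_per_socket


for name, spec in CONFIGS.items():
    print(f"{name}: {compute_cores(*spec)} cores")
```

The Medium and Large results match the 168 and 828 cores stated on the slides.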


IBM BladeCenter HS23

Do more now in the datacenter you own today

Key Features

• Integrated 10GbE Virtual Fabric for high speed networking

• Includes latest compute, network, and storage technology

• IBM FastSetup wizard tool for day 0 deployments

• Up to 18 networking ports of Ethernet, FCoE, and iSCSI

• Support for existing chassis infrastructure

Benefits

• Lower cost: Up to 30% savings for the solution³

• Enhanced performance⁴: 62% more compute power, 20% more VMs

• Simple setup: Accelerate time to value for deployment

• Breakthrough networking flexibility: Built-in support for multiple technologies

• Complete investment protection: Add capability to existing investments

³ Using integrated Virtual Fabric solution vs. separate components.
⁴ Source: Intel Corp.


IBM System x3650 M4

Expandable 2U for business critical workloads

Benefits

• Performance: Leadership in its class

• Business flexibility: Optimize for performance and cost

• Advanced networking: Optimize for performance and connectivity

• Reliable IT: Highest level quality to safely run your business

Key Features

• Up to 80% more performance⁵ for customers’ critical applications and top scores for SAP, OLTP and virtualization

• 2.6x memory with up to 768 GB memory, full set of processors, 2.5” or 3.5” drives

• Twice the 1GbE ports and slotless 10GbE Virtual Fabric upgrade from multiple vendors and protocols

• #1 customer satisfaction from TBR⁶ due to features like Predictive Failure Analysis and Light Path Diagnostics

⁵ Source: Intel Corp.
⁶ IBM has been ranked #1 in x86 server satisfaction from Technology Business Research for 10 straight quarters.


IBM System x3550 M4

Dense 1U for business critical workloads

Key Features

• Up to 80% more performance for customers’ critical applications for up to 24:1 consolidation of 3 year old servers⁷

• 2.6x memory with up to 768 GB memory, full set of processors, 2.5” or 3.5” drives

• Twice the 1GbE ports and slotless 10GbE Virtual Fabric upgrade from multiple vendors and protocols

• #1 customer satisfaction from TBR⁶ due to features like Predictive Failure Analysis and Light Path Diagnostics

Benefits

• High Performance: Run more applications than ever before

• Business flexibility: Optimize for performance and cost

• Advanced networking: Optimize for performance and connectivity

• Reliable IT: Leadership quality for distributed environments running critical workloads

⁷ Source: Intel Corp. and the IBM Systems Consolidation Evaluation Tool. http://ibm.com/systems/x/resources/tools/sconevalltool_intel/index.html


IBM System x3500 M4

Ideal for distributed locations

Key Features

• Up to 80% more performance for customers’ critical applications for up to 24:1 consolidation of 3 year old servers⁸

• 4x memory with up to 768 GB memory, full set of processors, 2.5” or 3.5” drives

• Twice the 1GbE ports and 10GbE Virtual Fabric upgrade from multiple vendors and protocols

• Up to 32 2.5” hard drives with flexible RAID in a tower or 5U rack design

Benefits

• High Performance: Run more applications than ever before

• Business flexibility: Optimize for performance and cost

• Advanced networking: Optimize for performance and connectivity

• Room to grow: Massive internal storage and I/O for distributed environments

⁸ Source: Intel Corp. and the IBM Systems Consolidation Evaluation Tool. http://ibm.com/systems/x/resources/tools/sconevalltool_intel/index.html


IBM System x iDataPlex dx360 M4

High performance for power and cooling constrained environments

Key Features

• Up to 120% more performance for HPC applications¹⁰

• Up to 4 GPUs per chassis, 1.5x memory capacity, slotless InfiniBand networking

• Industry leading node level optional direct warm water cooling technology

• Available slotless InfiniBand or 10Gb Ethernet upgrade with a variety of vendors and protocols

Benefits

• High Performance: Run more applications than ever before

• Designed for HPC: Outstanding performance for computationally intensive workloads

• High efficiency: Up to 40% efficiency advantage over air cooled systems⁹

• Business flexibility: Outstanding flexibility and performance for support of high performance solutions

⁹ Compared with iDataPlex dx360 M4 systems without warm-water cooling.
¹⁰ Source: Intel Corp.


IBM Intelligent Cluster

[Figure: Integrated Solution stack with the layers Servers, Interconnect, Operating Systems, Management Software, Storage, and IBM Services. Other labels: GPFS; 1/10/40Gb Eth; QDR/FDR IB.]

Intelligent Cluster combines all hardware, software, services and support into a single integrated product offering, providing clients the benefit of a single point of contact for the entire cluster that is easily deployed and managed.

• System x servers
• OEM IO Interconnects
• Storage
• Cabling
• Integration, configuration and testing
• Delivery, on-site setup and support


Management Software for System x Servers

Fully integrated cluster management software for ease of deployment, ease of use, and ease of operation


Integration with ANSYS applications

ANSYS Remote Solve Manager
• Submit and manage ANSYS FLUENT jobs through ANSYS RSM with the integration of Platform LSF and Platform MPI

Remote Visualization Integration
• Schedule and launch remote visualization sessions using Platform LSF and Platform Application Center


Wrap Up / Next Steps

Specifying a cluster for ANSYS simulation can be challenging; Let ANSYS and IBM help you succeed!

Contact us: • [email protected]

[email protected]

[email protected]

[email protected]

And look for our invitation to download the “IBM IT Guide for ANSYS Fluent Customers”


Register Now:

www.ansys.com/Confidence

Confidence by Design Workshops

Newark April 19

Orlando May 2

Minneapolis May 8

Seattle May 15

Chicago June 14

Phoenix April 26

Salt Lake City May 4

San Diego May 8

Boston May 17

Houston June 20

Denver May 2

Baltimore May 7

San Jose May 10

Detroit June 5


ANSYS Training

Interested in Live Training Sessions with ANSYS Experts?

Training classes are offered both locally and online. View the entire schedule at: http://www.ansys.com/Support/Training+Center

Topics include:
• Introduction to ANSYS Mechanical
• Introduction to ANSYS DesignModeler
• Introduction to ANSYS FLUENT
• Introduction to ANSYS Icepak
• Introduction to ANSYS HFSS
• Introduction to ANSYS Explicit Dynamics
• Introduction to ANSYS ICEM CFD
• And Many More…


To Ask a Question:

Click on the Q&A tab in the WebEx Toolbar

Webinar Recording: Available in one week’s time in the ANSYS Resource Library at www.ansys.com/resource+library


Trademarks and disclaimers Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries./ Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both. Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both. IT Infrastructure Library is a registered trademark of the Central Computer and Telecommunications Agency which is now part of the Office of Government Commerce. ITIL is a registered trademark, and a registered community trademark of the Office of Government Commerce, and is registered in the U.S. Patent and Trademark Office. UNIX is a registered trademark of The Open Group in the United States and other countries. Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates. Other company, product, or service names may be trademarks or service marks of others. Information is provided "AS IS" without warranty of any kind. The customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer. Information concerning non-IBM products was obtained from a supplier of these products, published announcement material, or other publicly available sources and does not constitute an endorsement of such products by IBM. Sources for non-IBM list prices and performance numbers are taken from publicly available information, including vendor announcements and vendor worldwide homepages. IBM has not tested these products and cannot confirm the accuracy of performance, capability, or any other claims related to non-IBM products. 
Questions on the capability of non-IBM products should be addressed to the supplier of those products. All statements regarding IBM future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only. Some information addresses anticipated future capabilities. Such information is not intended as a definitive statement of a commitment to specific levels of performance, function or delivery schedules with respect to any future products. Such commitments are only made in IBM product announcements. The information is presented here to communicate IBM's current investment and development activities as a good faith effort to help with our customers' future planning. Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput or performance improvements equivalent to the ratios stated here. Prices are suggested U.S. list prices and are subject to change without notice. Starting price may not include a hard drive, operating system or other features. Contact your IBM representative or Business Partner for the most current pricing in your geography. Photographs shown may be engineering prototypes. Changes may be incorporated in production models. © IBM Corporation 20112 All rights reserved. References in this document to IBM products or services do not imply that IBM intends to make them available in every country. Trademarks of International Business Machines Corporation in the United States, other countries, or both can be found on the World Wide Web at http://www.ibm.com/legal/copytrade.shtml.