Intel tools to optimize HPC systems May 2014


Intel Software Conference 2014, Werner Krotz-Vogel


Page 1: Intel tools to optimize HPC systems

Intel tools to optimize HPC systems May 2014

Page 2: Intel tools to optimize HPC systems

Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.

Agenda

Intel® Developer Products Overview

Intel® Parallel Studio XE and Cluster Studio XE 2013 Overview

What’s new with XE 2015 Beta?

Where to get it?

Intel Confidential

Page 3: Intel tools to optimize HPC systems


Intel® Developer Products

• Web App: Deploy apps on multiple platforms using one codebase
• Performance Client: Native cross-platform C++ development for multimedia apps and more
• System: Create fast, efficient embedded & mobile devices/systems in less time
• Technical Computing: Improve application performance, scalability and reliability

Products shown: Intel® XDK, Intel® Quark, Intel® INDE

Page 4: Intel tools to optimize HPC systems

• Industry-leading performance from advanced compilers

• Comprehensive libraries

• Parallel programming models

• Insightful analysis tools

More Cores. Wider Vectors. Performance Delivered.
Intel® Parallel Studio XE and Intel® Cluster Studio XE

[Figure: Scaling Performance Efficiently. Stages: Serial Performance, Task & Data Parallel Performance, Distributed Performance. More Cores: Multicore, Many-core (50+ cores). Wider Vectors: 128 Bits, 256 Bits, 512 Bits.]

Page 5: Intel tools to optimize HPC systems

Intel® Parallel Studio XE 2013 and Intel® Cluster Studio XE 2013 Service Pack 1

Efficiently Produce Fast, Scalable and Reliable Applications

Phase | Product | Feature | Benefit
Build | Intel® Composer XE | Compilers, Performance and Threading Libraries | Out of the box performance
Build | Intel® MPI Library† | High Performance Message Passing (MPI) Library | Interconnect independence
Build | Intel® Advisor XE | Threading Prototyping Tool (Studio XE products only) | Simplifies parallel application design
Verify & Tune | Intel® VTune™ Amplifier XE | Performance Profiler | Find performance bottlenecks
Verify & Tune | Intel® Inspector XE | Memory & Threading Dynamic and Static Analysis | Code quality, improved security
Verify & Tune | Intel® Trace Analyzer & Collector† | MPI Performance Profiler | Find performance bottlenecks in cluster-based applications

Page 6: Intel tools to optimize HPC systems

Support for Latest Intel Processors and Coprocessors

Product | Intel® Haswell microarchitecture | Intel® Broadwell microarchitecture | Intel® Xeon Phi™ coprocessor
Intel® C++ and Fortran Compiler | ✔ | ✔ | ✔
Intel® TBB library | ✔ | ✔ | ✔
Intel® MKL library | ✔ | ✔ | ✔
Intel® MPI library | ✔ | ✔ | ✔
Intel® VTune™ Amplifier XE† | ✔ | ✔ | ✔
Intel® Inspector XE†† | ✔ | ✔ | ✔

Now with Windows* support

† Hardware events for new processors added as new processors ship.
†† Analysis runs on multicore processors, provides analysis for multicore and many-core processors.

New Product Announcements Embargoed until September 4, 8am Pacific Time

Page 7: Intel tools to optimize HPC systems

Intel® C++ and Fortran Compiler

Page 8: Intel tools to optimize HPC systems


Intel® Composer XE: Intel® C++, Intel® Fortran, with Performance Libraries

Industry-leading application performance, serial and parallel

• Intel compilers: Intel Fortran and Intel C++ with Intel® Cilk Plus
• Intel Performance Libraries: Intel® Threading Building Blocks, Intel® Math Kernel Library, Intel® Integrated Performance Primitives
• Architecture support: IA-32, Intel 64, Intel® Xeon Phi™ product family, Intel compatible processors
• Compatibility, Windows: Visual* C++ and Visual Studio* 2008, 2010, 2012
• Compatibility, Linux and Mac OS X (including Mountain Lion): gcc and, for C++, Eclipse & Xcode for Mac

Performance, Compatibility, Support

Page 9: Intel tools to optimize HPC systems


Leadership Application Performance

More Performance for your C++ applications
• Just recompile
• Uses Intel® AVX and Intel® AVX2 instructions
• Intel® Xeon Phi™ product family support (Linux)
• Intel® Cilk™ Plus: Tasking and vectorization

More Performance for your Fortran applications
• Just recompile
• Intel® Xeon Phi™ product family: Linux compiler, debugger support
• Access to Intel® AVX and Intel® AVX2 instructions (-xa or /Qxa)
• Auto-parallelizer & directives to access SIMD instructions
• Coarrays & synchronization constructs support parallel programming
• Loop optimization directives: VECTOR, PARALLEL, SIMD
• More control over array data alignment (align arrayNbytes)
• New in 2013 XE SP1 release: more Fortran 2008 support

Page 10: Intel tools to optimize HPC systems

Up to 4x Faster Performance with Intel® Advanced Vector Extensions 512 (Intel® AVX-512) Support

Enables higher performance for the most demanding computational tasks

Intel® Compilers and Intel® Math Kernel Library will be updated in Q4 with AVX-512 support
- Significant leap to 512-bit SIMD support
- Increased compatibility with AVX
- One byte longer EVEX prefix, enabling additional functionality
- First implemented in the future Intel® Xeon Phi™ coprocessor, code named Knights Landing

[Chart: Peak single precision floating point performance. AVX-512 is up to 4x faster than SSE / SSE2 and up to 2x faster than AVX / AVX2.]
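The "up to 4x" and "up to 2x" figures match the ratio of register widths. The following is a minimal sketch of that arithmetic only, assuming 32-bit single-precision values; the function names `float_lanes` and `lane_ratio` are hypothetical and are not part of any Intel API. It says nothing about measured performance.

```cpp
#include <cassert>

// Hypothetical helper: number of 32-bit single-precision values that fit
// in one SIMD register of the given width.
int float_lanes(int register_bits) {
    assert(register_bits > 0 && register_bits % 32 == 0);
    return register_bits / 32;
}

// Hypothetical helper: how many times more single-precision values a wide
// register holds compared with a narrow one.
int lane_ratio(int wide_bits, int narrow_bits) {
    return float_lanes(wide_bits) / float_lanes(narrow_bits);
}
```

Under these assumptions, `lane_ratio(512, 128)` gives the 4x ratio against SSE-width registers and `lane_ratio(512, 256)` gives the 2x ratio against AVX-width registers.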

Page 11: Intel tools to optimize HPC systems

What’s New in Intel Composer XE 2015 Beta

• Support for offload to Intel® Graphics Technology
• Redesign of optimization reports including vectorization report
• New icl/icl++ compilers on OS X*
• Full C++11 language support: virtual overrides, inheriting constructors, deprecation of exception specifications, user defined literals, thread_local
• Full Fortran 2003 support (Parameterized Derived Types added)
• Fortran 2008 Blocks support
• Almost all OpenMP* 4.0 (only missing user-defined reductions)
• Keyword versions of SIMD pragmas added: _Simd, _Safelen, _Reduction
• Use arithmetical and logical operators with SIMD data types (like __m128)
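As an illustration of four of the C++11 features named above (virtual overrides, inheriting constructors, user defined literals, thread_local), here is a small sketch in standard C++11. All type and function names (`Shape`, `Square`, `_km`, `count_call`, `cxx11_demo`) are hypothetical and chosen only for this example; nothing here is specific to the Intel compiler.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical base class for illustration.
struct Shape {
    explicit Shape(std::string n) : name(std::move(n)) {}
    virtual ~Shape() = default;
    virtual double area() const { return 0.0; }
    std::string name;
};

struct Square : Shape {
    using Shape::Shape;  // inheriting constructor: reuses Shape(std::string)
    double side = 1.0;
    // 'override' makes the compiler check that a base virtual is overridden.
    double area() const override { return side * side; }
};

// User-defined literal: converts kilometres to metres.
constexpr long double operator""_km(long double v) { return v * 1000.0L; }

// thread_local: each thread gets its own copy of this counter.
thread_local int calls_on_this_thread = 0;
int count_call() { return ++calls_on_this_thread; }

// Small usage demonstration.
void cxx11_demo() {
    Square s("unit");
    const Shape& base = s;
    assert(base.area() == 1.0);      // dispatches to Square::area
    assert(2.5_km == 2500.0L);       // literal operator applied
    const int before = count_call();
    assert(count_call() == before + 1);
}
```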

Page 12: Intel tools to optimize HPC systems

New Features Overview

• -ansi-alias enabled by default at -O2 and above on Linux C++
• -fast/-Ofast enables -fp-model fast=2
• gcc-compatible function multiversioning
• aligned_new header
• Fortran option -init=snan to initialize all uninitialized SAVEd scalar and array variables of type REAL and COMPLEX to signaling NaNs
• __intel_simd_lane() intrinsic to represent simd lane number in a SIMD vector function
• Compiler option -no-opt-dynamic-align to disable generation of multiple code paths depending on alignment of data
• Improved lambda function debugging
• Permit non-contiguous data transfers on #pragma offload
• gdb* debugger supports Fortran (Intel® Debugger removed)
• Ability to create custom install packages from online install

Page 13: Intel tools to optimize HPC systems

Supported Platforms in 2015 Beta

Released Linux* operating systems supported:
• Fedora* 20
• Red Hat Enterprise Linux* 6
• SUSE LINUX Enterprise Server* 11
• Ubuntu* 12.04 LTS (64-bit only), 13.10
• Debian* 6.0, 7.0

Also intending to support these operating systems†:
• Fedora 21
• Red Hat Enterprise Linux 7
• SUSE LINUX Enterprise Server 12
• Ubuntu* 14.04 LTS

The following are no longer supported in this release:
• Red Hat Enterprise Linux 5
• SUSE LINUX Enterprise Server 10

† These operating systems are not released as of the date of this presentation. Intel Composer XE does not support operating systems until after they are officially released. Refer to product release notes for support details.

Page 14: Intel tools to optimize HPC systems

Redesigned Optimization Reports

• All reports (-opt-report, -vec-report, -par-report, -openmp-report) are now consolidated under a single -opt-report interface
• Other report options still work, but the information generated, and the way it is generated, maps to the new model
• Output now defaults to one report file per generated object file. This can be changed using -opt-report-file=<filename|stderr|stdout>
• Report information is designed to be more readable and actionable
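A loop such as the following is the kind of code a vectorization report describes. This is a minimal sketch, assuming an OpenMP 4.0 capable compiler; the function name `dot` is hypothetical. Building it with the -opt-report option named above would be one way to check whether the loop was vectorized; the report text itself is not reproduced here because it depends on the compiler version.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example function: dot product of two equally sized vectors.
// The OpenMP 4.0 'simd' pragma requests SIMD code for the loop; a compiler
// built without OpenMP support ignores the pragma and the result is the same.
float dot(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    float sum = 0.0f;
    const std::size_t n = a.size();
    #pragma omp simd reduction(+:sum)
    for (std::size_t i = 0; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

The `reduction(+:sum)` clause tells the compiler that partial sums may be kept per SIMD lane and combined at the end, which is what allows the accumulation to be vectorized.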

Page 15: Intel tools to optimize HPC systems

Intel® Math Kernel Library

Page 16: Intel tools to optimize HPC systems


Intel® Math Kernel Library (Intel® MKL)

• Vectorized and threaded for highest performance on all Intel and compatible processors
• De facto standard APIs for simple code integration
• Compatible with all C, C++ and Fortran compilers
• Royalty-free, per developer licensing for low cost deployment

#1 used math library in the world (Source: Evans Data 2011-2013 WW Developer Surveys)

Just Link to the Next Intel® MKL Version to Realize New Processor Performance

Page 17: Intel tools to optimize HPC systems


What’s New in Intel MKL 11.2 Beta

Cluster PARDISO: Intel® Direct Sparse Solver for Cluster (Intel® CPardiso) is a powerful tool set for solving systems of linear equations with sparse matrices of millions of rows/columns. Intel® CPardiso provides an advanced implementation of modern algorithms and can be considered an extension of Intel MKL Pardiso to cluster computations.

Atom optimizations (for Airmont): for the BLAS and FFT domains

S/C/Z/DGEMM improvements on small matrix sizes
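For readers unfamiliar with the GEMM family mentioned above, the operation computes C = alpha * A * B + beta * C. The following is an unoptimized reference sketch of the double-precision case for row-major matrices. It is not Intel MKL code and does not call MKL; the function name `gemm_reference` and its parameter order are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference (unoptimized) row-major GEMM: C = alpha * A * B + beta * C.
// A is m x k, B is k x n, C is m x n. Hypothetical helper, not an MKL routine.
void gemm_reference(std::size_t m, std::size_t n, std::size_t k,
                    double alpha,
                    const std::vector<double>& A,
                    const std::vector<double>& B,
                    double beta,
                    std::vector<double>& C) {
    assert(A.size() == m * k);
    assert(B.size() == k * n);
    assert(C.size() == m * n);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double acc = 0.0;
            for (std::size_t p = 0; p < k; ++p) {
                acc += A[i * k + p] * B[p * n + j];
            }
            C[i * n + j] = alpha * acc + beta * C[i * n + j];
        }
    }
}
```

For small matrices, loop and call overhead is a large share of the total work, which is presumably why small sizes are singled out for improvement in an optimized library.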

Page 18: Intel tools to optimize HPC systems

What’s New in Intel MKL 11.2 Beta

Intel MKL Cookbook recipes: a new document with recipes for assembling Intel MKL routines to solve complex problems

Verbose mode for BLAS and LAPACK: MKL Verbose mode provides information about the MKL routines an application calls (set environment variable MKL_VERBOSE=1)

Page 19: Intel tools to optimize HPC systems

Intel® Inspector XE

Page 20: Intel tools to optimize HPC systems


Intel® Inspector XE - Deliver More Reliable Applications
Intel® Inspector XE and Intel® Parallel Studio XE family of suites

Intel® Inspector XE alone:
• Dynamic Analysis (Memory Errors, Threading Errors): Intel Inspector XE dynamically instruments & runs the application and watches for errors. Use any build, any compiler (debug build is best).

Added bonus features in Intel® Parallel Studio XE and Intel® Cluster Studio XE suites:
• Static Analysis (Code & Security Errors): Intel compiler inspects source. Use any compiler for production.
• Pointer Checker (Pointer Errors): Intel compiler run time checks. Use any compiler for production.

Static Analysis & Pointer Checker are only available in the Parallel Studio XE family of suites. Not sold separately.

Page 21: Intel tools to optimize HPC systems


What’s New in Intel Inspector XE 2015 Beta

Improved On-Demand Leak Reporting and Memory Growth Control!

New Memory usage graph – Get real-time information about memory in use on your system!

Thread Checking performance improved by 3X – with a reduction in memory footprint as well!

Page 22: Intel tools to optimize HPC systems

Intel® Advisor XE

Page 23: Intel tools to optimize HPC systems


Design Then Implement
Intel® Advisor XE - Threading Prototyping Tool

Less Effort, Less Risk, More Impact

1) Analyze it.
2) Design it. (Compiler ignores these annotations.)
3) Tune it.
4) Check it.
5) Do it!

Design Parallelism
• No disruption to regular development
• All test cases continue to work
• Tune and debug the design before you implement it

Implement Parallelism
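The "Design it" step works by marking candidate parallel regions in serial code with annotations. The sketch below shows roughly what that can look like. The macro names follow Intel Advisor's `advisor-annotate.h` header as far as I know; the no-op stand-in definitions are an assumption added so the example builds without that header, and the function `scale_in_place` and the site and task names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// The real macros come from Intel Advisor's "advisor-annotate.h".
// These no-op stand-ins are an assumption so the sketch compiles without it;
// with them the program behaves exactly like the unannotated serial code.
#ifndef ANNOTATE_SITE_BEGIN
#define ANNOTATE_SITE_BEGIN(name)
#define ANNOTATE_ITERATION_TASK(name)
#define ANNOTATE_SITE_END()
#endif

// Hypothetical serial kernel: multiply every element by a factor.
void scale_in_place(double* data, std::size_t count, double factor) {
    assert(data != nullptr || count == 0);
    ANNOTATE_SITE_BEGIN(scale_site);          // candidate parallel region starts
    for (std::size_t i = 0; i < count; ++i) {
        ANNOTATE_ITERATION_TASK(scale_task);  // each iteration is a candidate task
        data[i] *= factor;
    }
    ANNOTATE_SITE_END();                      // candidate parallel region ends
}
```

Because the annotations do not change what the compiled program does, existing tests keep passing while the design is modeled, which is the point made in the bullets above.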

Page 24: Intel tools to optimize HPC systems


What’s New in Intel Advisor XE 2015 Beta

Improved Viewing and Advanced Modeling of Suitability information!

New Target Platforms option – See modeling based on Xeon or Xeon Phi!

New Iteration Space Modeling section – Run a smaller sample and see what happens when you scale up!

New Task details option - Information about differences between iterations moved to its own view for additional clarity!

Page 25: Intel tools to optimize HPC systems

Page 26: Intel tools to optimize HPC systems

Where to get it?

Page 27: Intel tools to optimize HPC systems


Intel® Parallel Studio XE Suites
Leading development suites for application performance

Create fast, reliable code

Group | Component | Intel® Cluster Studio XE | Intel® Parallel Studio XE
Analysis | Intel® VTune™ Amplifier XE - Performance Profiler | ● | ●
Analysis | Intel® Inspector XE - Memory & Thread Analyzer | ● | ●
Analysis | Static Analysis & Pointer Checker - Find Coding & Security Errors | ● | ●
Analysis | Intel® Advisor XE - Threading Prototyping Tool | ● | ●
Analysis | Intel® Trace Analyzer & Collector - MPI Optimizing Tool | ● |
Compilers & Libraries | Intel® Compiler - Optimizing Compiler for C, C++ and Fortran | ● | ●
Compilers & Libraries | Intel® Integrated Performance Primitives† - Media and Data Optimizations | ● | ●
Compilers & Libraries | Intel® Threading Building Blocks† - Parallelize Applications for Performance | ● | ●
Compilers & Libraries | Intel® Math Kernel Library - High Performance Math | ● | ●
Compilers & Libraries | Intel® MPI Library - Flexible, Efficient and Scalable Messaging | ● |

† Available for C, C++ only

C, C++ only and Fortran only versions of Parallel Studio XE are also available.

Page 28: Intel tools to optimize HPC systems


Pricing and Availability

Suite | Intel® C++ Composer XE | Intel® Fortran Composer XE | Intel® Inspector XE | Intel® VTune™ Amplifier XE | Intel® MPI Library | Intel® Trace Analyzer and Collector | Price
Intel® Parallel Studio XE | • | • | • | • | | | $2,299
Intel® Cluster Studio XE | • | • | • | • | • | • | $2,949

Additional configurations, including floating and academic, are available at:

http://intel.com/software/products

Page 29: Intel tools to optimize HPC systems


Page 30: Intel tools to optimize HPC systems


Legal Disclaimer & Optimization Notice

INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.

Copyright © , Intel Corporation. All rights reserved. Intel, the Intel logo, Xeon, Xeon Phi, Core, VTune, and Cilk are trademarks of Intel Corporation in the U.S. and other countries.

Optimization Notice

Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804
