Product Technology Evaluation Report: IBM Power System S824L


Product Technology Evaluation Report:

IBM Power System S824L


Table of Contents

Executive Summary

CFMS Product Technology Evaluation Programme

IBM Power System S824L Product Technology Evaluation Objectives

IBM Power System S824L System

Experimental Testing Plan

Benchmarking & Testing Outcome - Technical Testing Results

About CFMS


Executive Summary

There is an expectation that an affordable route to exascale computing will involve some disruption of the existing CPU+RAM+InfiniBand architecture. Although accelerators like NVIDIA® Tesla® and Intel Xeon Phi have good positions in the Top500 List, actual industrial uptake is limited.

The upward trajectory of architecting everything in the same way and scaling in size to achieve extra performance doesn’t fully address all the challenges, and is limiting as a realistic solution for the future. Take power, for example, one of the most critical and debated points in the industry. The largest supercomputer in the world is a tenth of the way to exascale with over a million cores, yet it is too expensive to run because of its power usage. Building a machine 10 times bigger and absorbing the power bill does not address the heart of the problem. There may be some advantages in looking at the next generation of x86 technology, and in five years’ time we might manage to double the performance within the same power budget, but that’s not the step change we’re looking for. It has to come from a unified re-architecting of the software and hardware interaction.

It will be interesting to see the impact of IBM Power systems as a catalyst for disruption, especially with the introduction of NVLink and a matured CAPI as a basis for new technologies in the future. With CAPI currently available, and NVLink in the launch pipeline, both potentially offer a step change. The product testing undertaken with the IBM Power System S824L is early-level testing. Getting an application running on the platform and achieving a reasonable level of performance is a prerequisite for understanding NVLink and CAPI, and for contemplating the introduction of new and different technologies. Comparing and benchmarking Linux on x86 against Little Endian Linux on IBM POWER8 will provide valuable insight. When NVLink is commercially available, the groundwork will already have been completed, reducing the time and cost of getting up and running.

The results from the technology evaluation of the IBM Power System S824L against the defined experimental plan were positive, interesting and insightful. To summarise, the S824L:

• Meets system performance levels for technical computing

• Is appealing in terms of flexibility and simplicity

• Is on the path to maturity, and gaps identified are not insurmountable

As part of the Product Technology Evaluation function, the Centre for Modelling & Simulation (CFMS) also invites third parties to collaborate on testing. In addition to CFMS, a leading engine manufacturer in the aerospace industry was involved in testing the S824L. Their outcome also reflects our conclusions from the programme. The S824L tested was 4U with a significant number of PCI slots for I/O expansion, which results in a relatively low compute density. We look forward to a more HPC-focused offering. Overall, testing the S824L demonstrates the potential of the platform, and is the start of an interesting journey which will be welcomed within industry.


CFMS Product Technology Evaluation Programme

Supporting the development and delivery of high value design methods and tools, CFMS provides access to and advises on best in class technologies that accelerate advanced modelling and simulation and HPC.

Providing insight from an actual user perspective, Product Technology Evaluation ranges from evaluating and validating product performance, to establish suitability for customer projects, through to working with manufacturers and inputting into product design and development to confirm the proposed approach. Through our trusted, independent Technology Lab, we can replicate and set up technical environments and run scenarios for testing and evaluation. We offer a choice of project outputs including written reports, presentations or collaborative arrangements, working with all parties involved.

Our work ranges from reducing risk in the development process and helping vendors understand how their technology will be used, through to taking hardware, software or an end-to-end solution and validating its suitability for customer and research projects. Supported by our team of Modelling and Simulation, HPC and IT systems specialists, we work with everyone from industry end users to manufacturers, providing analysis of product performance and greater insight and feedback into product development and technology research projects.

IBM Power System S824L Product Technology Evaluation Objectives

The IBM POWER8 System is the first generation of chip architecture developed for the OpenPOWER Foundation and, together with the NVIDIA® Tesla® GPU, forms the basis of the next generation DOE HPC infrastructure. This is also the first Power processor from IBM offered with support for the Little Endian byte ordering used by x86 processors from Intel and AMD, which reduces the code changes required to migrate existing x86 code.

Technical computing applications typically stretch development tool chains due to the complexity of the software, and exercise the operating system because they make use of many features, such as affinity, to maximise application performance. These applications are an ideal test for system readiness whilst also providing performance results on real engineering workloads that can be directly compared with other systems.

The combination of the IBM POWER8 and the NVIDIA® Kepler GPU makes it a challenge to find suitable, industrially relevant application software that can exercise all components in the system. zCFD from Zenotech was chosen as it is able to exercise both types of processor and it implements industry standard algorithms.

A range of tests were undertaken to assess the following system properties:

• System performance for technical computing and engineering workloads

• Quality of the tool chain (compilers, environment, etc.)

• Integrated energy efficiency and scale

• Total Cost of Ownership (TCO) and management

• Assessment of system for readiness



IBM Power System S824L

The IBM Power System S824L server is a Linux-based two-socket server supporting 20 or 24 POWER8 cores in a dense, 4U rack-optimised form factor.

IBM Power System S824L product highlights:

• The first server to leverage OpenPOWER Foundation technology to dramatically accelerate Java, big data and technical computing applications. Running multiple concurrent queries that take advantage of industry-leading memory and I/O bandwidths, this leads to highly supported utilisation rates

• Delivers faster query acceleration for Java applications with NVIDIA GPUs

• Boosts workload performance by offloading highly parallel operations to GPU accelerator(s)

• Has twice the bandwidth of prior servers and lower hardware and power requirements, allowing superior scale-out efficiencies with Open technologies like Linux and OpenStack that economically enable these capabilities

• Will enable future integrated hardware solutions that dramatically accelerate compute- and data-intensive tasks due to its open standards based platform

The summary specification of the POWER8 server used for testing is shown below:

Microprocessors: Two 10-core 3.42 gigahertz (GHz) POWER8 processor cards

Level 2 (L2) cache: 512 kilobyte (KB) L2 cache per core

Level 3 (L3) cache: 8 megabyte (MB) L3 cache per core

Level 4 (L4) cache: 16 MB per dual inline memory module (DIMM)

Memory Min/Max: 512 gigabyte (GB) RAM

Processor-to-memory bandwidth: 192 gigabytes per second (GBps) per socket


Experimental Testing Plan

A number of tests were selected to exercise different elements of the system: High Performance Linpack (HPL, currently the industry standard), zCFD, Solar and OpenFOAM. CFMS has access to the source code of all these applications, which (with the exception of HPL) are used for solving industrial scale problems.

Between them, these tests assess the performance of single-threaded, pure-MPI and hybrid MPI/OpenMP workloads.

The plan is summarised below by metric, with the associated tests and requirements/risks.

Metric: Installation

Requirements/Risks: Access, weight, power

Metric: System performance for technical computing

Test: Single thread; MPI; MPI/OpenMP hybrid. For each test, compare runtime with single node Intel Ivy Bridge performance. Compile and run HPL. Compile and run zCFD.

Requirements/Risks: Compiler support; third party library support

Metric: NVIDIA GPU performance

Test: CUDA software stack. Compile and run zCFD. Compare with K20 on Intel Ivy Bridge.

Requirements/Risks: Compiler support; third party library support

Metric: Integrated Energy Efficiency/Scale

Test: Measure power draw under load

Install and Configuration

Storage:

• RAID1 for persistent data (2 disks)

• RAID0 for application scratch (6 disks)

Network:

• Single GbE connection to site network

OS/Software:

• Ubuntu 14.10

• IBM XL C/C++ and Fortran compilers

• IBM Engineering and Scientific Subroutine Library (ESSL)

• GNU 4.9 C/C++ and Fortran compilers

• CUDA Toolkit 7.0-rc

• OpenMPI 6.5

Where possible, software binaries were installed from the official Canonical repositories via APT.


Product Evaluation Outcome - Technical Results

Flexibility

Most hardware multithreading solutions are enabled/disabled in UEFI/BIOS, and require a reboot to change. This leads most HPC systems to leave multithreading either on or off, depending on the performance impact that it will have on the applications in use.

By contrast, SMT8 can be reconfigured while the system is running. This allows the SMT configuration to be optimised for each simulation run, or even at each simulation job step.
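As an illustration of per-job SMT selection, the sketch below builds the command a job prologue might issue before each run. It assumes the `ppc64_cpu` utility from the powerpc-utils package is the mechanism used (the report does not name the tool), that the command is run with root privileges, and that the machine has the 20 cores of the tested configuration. The function names are hypothetical.

```python
import shutil
import subprocess

CORES = 20                      # two 10-core POWER8 processors, as tested
VALID_SMT_MODES = (1, 2, 4, 8)  # hardware threads per core on POWER8


def smt_command(threads_per_core):
    """Build the ppc64_cpu command line that selects an SMT mode."""
    if threads_per_core not in VALID_SMT_MODES:
        raise ValueError(f"unsupported SMT mode: {threads_per_core}")
    return ["ppc64_cpu", f"--smt={threads_per_core}"]


def logical_cpus(threads_per_core, cores=CORES):
    """Number of logical CPUs the OS sees in the given SMT mode."""
    return cores * threads_per_core


def set_smt(threads_per_core, dry_run=True):
    """Return the command; execute it only when dry_run is False."""
    cmd = smt_command(threads_per_core)
    if not dry_run:
        if shutil.which("ppc64_cpu") is None:
            raise RuntimeError("ppc64_cpu not found (assumed: powerpc-utils)")
        subprocess.run(cmd, check=True)  # no reboot is needed
    return cmd


if __name__ == "__main__":
    for mode in VALID_SMT_MODES:
        print(" ".join(set_smt(mode)), "->", logical_cpus(mode), "logical CPUs")
```

A batch scheduler could call something like `set_smt` in a job prologue, so that each job step runs with the SMT mode that suits it.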

POWER8 provides greater Non-Uniform Memory Access (NUMA) control, allowing the optimisation of the process layout on the hardware topology.

Simplicity

As HPC application workloads become more specific and optimised, we have observed an ongoing trend within the HPC landscape for heterogeneous clusters, rather than attempting a ‘best-fit’ homogeneous configuration. These heterogeneous clusters may combine standard CPU-only compute nodes with either ‘high-memory’ nodes, or with compute nodes equipped with GPGPU or other accelerators.

Tools like xCAT (eXtreme Cloud Administration Toolkit) and IBM Spectrum Scale (formerly IBM GPFS) can be used to deploy and run mixed x86 and Power systems (equipped with NVIDIA GPGPUs) to accelerate specific workloads, while providing a consistent experience for end users.

Porting existing CUDA software to run on the S824L’s K40 GPUs was trivial, primarily due to the common interfaces provided by the CUDA toolkits on x86 and POWER8.

Performance

System performance assessment focused on compute rather than I/O, as typical HPC workloads would exercise processors with storage being implemented as a shared parallel file system. The performance was measured by running zCFD on a standard aerospace test case from NASA. The latest CUDA 7.0rc Toolkit from NVIDIA was utilised, together with gcc 4.9 providing OpenMP thread-based parallelism and OpenMPI 6.5 providing MPMD parallelism. The IBM XL C/C++ for Linux Compiler was also used, but the version supporting OpenMP was not available during the test period so its performance was extrapolated.

Getting the best performance from the POWER8 processor requires a combination of thread-level and process-level parallelism. The benchmark runs were undertaken with one MPI process per NUMA node (i.e. 2 MPI processes per POWER8 processor socket) with processor and memory affinity set, and the number of OpenMP threads was varied according to the SMT setting. The NVIDIA K40 benchmark was run with one MPI process per GPU.
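The process and thread layout described above can be sketched as a small calculation. This is a minimal illustration rather than the benchmark's actual launch script: it assumes that each MPI process fills every hardware thread of its NUMA node with OpenMP threads (the report only says the thread count was varied with the SMT setting), and the names `HybridLayout` and `hybrid_layout` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HybridLayout:
    mpi_ranks: int             # one MPI process per NUMA node
    omp_threads_per_rank: int  # OpenMP threads inside each MPI process


def hybrid_layout(smt, sockets=2, cores_per_socket=10, numa_nodes_per_socket=2):
    """Layout for the tested system: 2 sockets, 10 cores each, 2 NUMA nodes per socket.

    Assumption: OpenMP threads occupy all hardware threads of a NUMA node.
    """
    if smt not in (1, 2, 4, 8):
        raise ValueError(f"unsupported SMT mode: {smt}")
    if cores_per_socket % numa_nodes_per_socket != 0:
        raise ValueError("cores must divide evenly across NUMA nodes")
    cores_per_numa_node = cores_per_socket // numa_nodes_per_socket
    return HybridLayout(
        mpi_ranks=sockets * numa_nodes_per_socket,
        omp_threads_per_rank=cores_per_numa_node * smt,
    )


if __name__ == "__main__":
    for smt in (1, 2, 4, 8):
        layout = hybrid_layout(smt)
        print(f"SMT{smt}: {layout.mpi_ranks} MPI ranks x "
              f"{layout.omp_threads_per_rank} OpenMP threads")
```

Under these assumptions the number of MPI ranks stays at four for every SMT mode, and only the OpenMP thread count per rank changes, which is why SMT can be tuned per job without altering the MPI decomposition.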

The results were compared to a dual socket Intel(R) Xeon(R) CPU E5-2648L v2 @ 1.9GHz system from IBM with hyperthreading switched off, and the software compiled using Intel 15.0 compilers and OpenMPI 6.5.

The dual-socket POWER8 system is 1.2x faster than the Intel Ivy Bridge system when using the gcc 4.9 compiler. The extrapolated results from the limited runs using the IBM XL C/C++ for Linux Compiler show a potential speed-up of over 2x, but this needs to be validated when the compilers are released by IBM.


Maturity

One of the challenges with porting test workloads to the S824L was the discovery of problems with packages that were available in the official Ubuntu repositories. When we filed bug reports, the response from some application maintainers was that, although POWER8 support was included in the latest releases, opportunities for full testing had been minimal. We raised this point with IBM, whose intention is to make more POWER8-based test and development servers available to the software development community, which should mitigate these issues in future.

Although testing achieved good performance from the POWER8 processors, the system delivered fewer FLOPS/watt than the equivalent x86 system, and therefore a higher energy cost to solution, partly due to its higher clock frequency.
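The relationship between speed-up and energy to solution can be made concrete with a short calculation. The 1.2x speed-up is the figure reported for gcc 4.9; the power ratio of 1.5 is a purely hypothetical placeholder, since the report does not publish measured power draw, and the function names are illustrative.

```python
def energy_to_solution_kwh(power_watts, runtime_seconds):
    """Energy consumed by one run: power x time, converted from joules to kWh."""
    return power_watts * runtime_seconds / 3.6e6


def relative_energy(speedup, power_ratio):
    """Energy of system A relative to system B for the same workload.

    speedup     = runtime_B / runtime_A (A is `speedup` times faster)
    power_ratio = power_A / power_B
    For a fixed workload, relative FLOPS/watt is the reciprocal of this value.
    """
    if speedup <= 0 or power_ratio <= 0:
        raise ValueError("speedup and power_ratio must be positive")
    return power_ratio / speedup


if __name__ == "__main__":
    speedup = 1.2      # reported POWER8 vs Ivy Bridge result with gcc 4.9
    power_ratio = 1.5  # HYPOTHETICAL: not a measured value from the report
    print(f"relative energy to solution: {relative_energy(speedup, power_ratio):.2f}")
```

The point of the sketch is that a faster system still costs more energy per solution whenever its power draw exceeds the comparison system's by more than its speed-up.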

It has been widely reported that the POWER8 processor card costs less than the equivalent Intel x86 processor. However, the remainder of the server still carries a price premium, and the price performance for the S824L has yet to mature. This is to be expected with a new product to market and may be achievable with greater availability via OpenPOWER.


About CFMS

The Centre for Modelling & Simulation (CFMS) is a not-for-profit specialist in high-value design. As a trusted and neutral provider, our vision is to be the recognised, independent, digital test bed for the design of high value engineering products and processes.

Facilitating a greater understanding of how a product will perform throughout its lifecycle, our digital test bed forms the foundation for Through-Life Engineering, creating a virtual replica of systems and processes used for investigation of options and opportunities, in advance of physical development.

Through our four core product lines and activities, we enable our customers to accelerate design and manufacturing productivity and competitiveness for their products, processes and services:

• Accelerated insight into the performance of their products or processes in service through the use of advanced modelling and simulation

• Integrated system engineering architectures that can deliver improved performance

• Realisation of the potential of simulated or real-world data for informed decision making

• Cost effective access to state-of-the-art computational infrastructure

CFMS Bristol and Bath Science Park // Dirac Crescent // Emersons Green // Bristol // BS16 7FR
