
Research University founded 1825

HIGH PERFORMANCE SCIENTIFIC COMPUTING

Karlsruhe Institute of Technology


Contents

Scientific Supercomputing at Karlsruhe
Consultation and Support
HP XC Systems at Karlsruhe
Architecture of HP XC4000
Configuration of HP XC4000
Architecture of HP XC6000
Configuration of HP XC6000
The Parallel File System HP SFS
Performance of HP SFS
Applications on HP XC Systems
Program Development Environment
High Performance Technical Computing Competence Centre (HPTC3)

Universität Karlsruhe (TH)
Rechenzentrum
Zirkel 2
D-76128 Karlsruhe

Phone: +49 721 608 8011
Fax: +49 721 …
E-mail: [email protected]
Web: http://www.rz.uni-karlsruhe.de/ssck

Editor: Prof. Dr. Wilfried Juling

February 2007


Scientific Supercomputing at Karlsruhe

As a service unit of the University’s Computing Centre, the Scientific Supercomputing Group Karlsruhe (SSCK) looks after supercomputer customers. The SSCK provides support for experts as well as for novices in parallel computing. Our goal is to help our customers with all problems related to scientific supercomputing. Our services are not confined to advice on how to use supercomputers efficiently; you will also get qualified help if you are looking for appropriate mathematical methods or simply have problems logging in.

Since the installation of the first supercomputer in Karlsruhe in 1983, the Computing Centre of the University has been continuously engaged in supercomputing. Besides operating the machines, Karlsruhe has always established expert groups for scientific supercomputing. In the 1980s and 1990s, experts of the Computing Centre in Karlsruhe tuned numerical libraries for vector computers as well as for microprocessor-based parallel supercomputers. Since then, close cooperation with other institutes of the university and with industrial companies has been established.

At the Computing Centre, optimised solvers for systems of nonlinear partial differential equations, adaptive finite element techniques, visualisation of numerical data and iterative linear solvers have been developed for many years. Thus the experts of the SSCK know the needs and problems of their customers from their own experience. The University’s Computing Centre has allied with the HLRS of Stuttgart University and the IWR of Heidelberg University: the High Performance Computing Competence Centre Baden-Württemberg (hkz-bw) combines HPC resources in the State of Baden-Württemberg in order to provide leading edge HPC service, support and research.

In 2006 the Universität Karlsruhe (TH) and the Forschungszentrum Karlsruhe founded the “Karlsruhe Institute of Technology”.

With the creation of KIT, the “Steinbuch Centre for Computing” (SCC) has also been established. This new scientific institution results from the merger of the Computing Centre of the Universität Karlsruhe (TH) and the Institute for Scientific Computing of the Forschungszentrum Karlsruhe. Both institutions have ranked for decades among the most powerful scientific computing centres in Germany. Since 1996 they have already cooperated in the “Virtual Computing Centre”, whose resources scientists of both institutions can use.

Within the SCC, the cooperation in the fields of “High Performance Computing” and “Grid Computing” is to be intensified and newly organised.


Consultation and Support

According to the mission of the SSCK, you will get support on different levels:

• Hotline
The SSCK has set up a helpline where you rapidly get support from Monday to Friday, 9:00 to 17:00. Simply call +49 721 608-8011. If your problem cannot be solved immediately, it will be pursued by our experts. You can also contact us by e-mail: [email protected].

• Basic support
For parallel computers there are rules that should be obeyed in order to get efficient codes that match the hardware design. We will provide you with all our knowledge on how to use massively parallel and vector computers efficiently.

• Standard solutions
The SSCK provides experts who know both the application software packages and the necessary libraries you need to solve your problem. This means that you obtain assistance from the beginning in selecting the appropriate solution and later in using it.

• Individual solutions
If you want to solve your particular problem for the first time with the aid of a high performance computer, you will even find experts for modelling and state-of-the-art solvers. This unique feature is highly appreciated by our customers.

The SSCK offers several ways to communicate with its customers using the HP XC systems:

• Mailing list for communication between SSCK and users: [email protected]

• E-mail hotline for all inquiries, comments etc.: [email protected]

• SSCK website with lots of information on the HP XC systems, their hardware and software environment, and the project submission interface: http://www.rz.uni-karlsruhe.de/ssck


HP XC Systems at Karlsruhe

The largest system that is currently operated by the SSCK is an HP XC4000 system. This supercomputer is provided for projects from the universities of the State of Baden-Württemberg requiring computing power which cannot be satisfied by local resources. The system is embedded into the infrastructure of the High Performance Computing Competence Centre Baden-Württemberg (hkz-bw). Through the private-public partnership “hww” the system is also available to industrial customers.

The HP XC6000 system, the predecessor of the HP XC4000 system, is now at the disposal of the members of Karlsruhe University.

The installation of HP XC systems began in April 2004 with a small 16-node test system named xc0. In December 2004 the first production system (xc1) with 108 HP Integrity rx2600 servers followed. After a short installation and internal test phase, the first customers were able to access the system by the end of 2004. Full production started on March 1, 2005. In May 2005 another six 16-way rx8620 servers were installed and integrated into the xc1 system.

In June 2006 the test system for the HP XC4000 (xc3) was delivered, followed by the HP XC4000 production system (xc2) in September/October 2006. In January 2007 the production system was put into operation.

The four steps of the HP XC systems installation at Karlsruhe are:

Development and test system xc0 (April 2004)
• 12 two-way rx2600 nodes with 4 GB main memory
• 4 file server nodes
• Single rail QsNet II interconnect
• 2 TB shared storage

Production system xc1 (December 2004 / May 2005)
• 108 two-way rx2600 nodes with 12 GB main memory
• 6 sixteen-way nodes with 128 GB main memory (configured as 12 eight-way nodes)
• 8 file server nodes
• Single rail QsNet II interconnect
• 11 TB global disk space

Test system xc3 (June 2006)
• 11 four-way nodes and 1 eight-way node with 8 GB main memory
• Two or four sockets with dual-core AMD Opteron processors
• 4 file server nodes
• InfiniBand 4X DDR interconnect
• 6 TB global disk space

Production system xc2 (January 2007)
• 758 four-way nodes and two 8-way nodes
• Two or four sockets with dual-core AMD Opteron processors
• 10 file server nodes
• InfiniBand 4X DDR interconnect
• 56 TB global disk space


Architecture of HP XC4000

The HP XC4000 system is a distributed memory parallel computer where each node has two or more AMD Opteron sockets, local memory, disks and network adapters. Each socket comprises two cores. Special file server nodes are added to HP XC4000 to provide a fast and scalable parallel file system. All nodes are connected by an InfiniBand 4X DDR interconnect.

The basic operating system on each node is HP XC Linux for HPC, a Linux implementation which is compatible with Red Hat AS 4.0. On top of this operating system, a set of open source as well as proprietary software components constitutes the XC software environment. This architecture implements a production class environment for scalable clusters based on open standards, enabling the application programmer to easily port applications to this system.

The nodes of the HP XC4000 system may have different roles. According to the services supplied by the nodes, they are separated into disjoint groups. From an end user’s point of view the different groups of nodes are login nodes, compute nodes, file server nodes and other server nodes.

• Login Nodes
The login nodes are the only nodes that are directly accessible by end users. These nodes are used for interactive login, file management, program development and interactive pre- and post-processing. Several nodes are dedicated to this service, but they are all accessible via one address, and the Linux Virtual Server (LVS) distributes the login sessions to the different login nodes.

• Compute Nodes
The majority of nodes are compute nodes, which are managed by the batch system. Users submit their jobs to the batch system SLURM, and a job is executed, depending on its priority, when the required resources become available.

• File Server Nodes
Special dedicated nodes are used as servers for the HP StorageWorks Scalable File Share (HP SFS). HP SFS is a parallel and scalable file system product from HP based on the Lustre file system. In addition to the shared file space there is also local storage on the disks of each node.

• Administrative Server Nodes
Some other nodes deliver additional services within HP XC4000 such as resource management, external network connection, administration etc. These nodes can only be accessed by system administrators.

All management tasks are served by dedicated management nodes. So there is a clear separation between resources that are used for management or administration and resources used for computing, i.e. the compute nodes are mostly freed from administrative tasks.
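To make the workflow concrete: a user builds a program on a login node, and the batch system starts it on the compute nodes. The following minimal MPI program simply reports the node each task runs on. It is a generic sketch, not code from the SSCK documentation; the compiler wrapper and SLURM options shown in the comments are illustrative assumptions, not site-specific settings.

/* hello_nodes.c: minimal MPI program that reports the node each task runs on.
 * Typically compiled on a login node with an MPI compiler wrapper, e.g.
 *   mpicc -O2 hello_nodes.c -o hello_nodes
 * and launched on the compute nodes through the batch system, e.g.
 *   srun -n 16 ./hello_nodes
 * (wrapper name and launcher options are illustrative, not site-specific). */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, namelen;
    char nodename[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(nodename, &namelen);

    printf("task %d of %d running on node %s\n", rank, size, nodename);

    MPI_Finalize();
    return 0;
}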


Configuration of HP XC4000

The HP XC4000 system at Karlsruhe consists of

• 2 four-way HP ProLiant DL585 login nodes
Each of these nodes contains four AMD Opteron DC sockets with two cores each, running at a clock speed of 2.6 GHz, and is equipped with 32 GB of main memory.

• 750 two-way HP ProLiant DL145 G2 nodes
Each of these nodes contains two AMD Opteron sockets running at a clock speed of 2.6 GHz and is equipped with 16 GB of main memory. Each socket has two cores with 1 MB of level 2 cache each. Thus each node provides a peak performance of 20.8 GFlops. Two 72 GB local disks and an adapter to the InfiniBand interconnect are additional features of each node.

• 10 two-way HP ProLiant DL380 G4 file server nodes
8 nodes have the Object Storage Server (OSS) role and the remaining nodes the Meta Data Server (MDS) role. Each of the file server nodes is dual connected to multiple storage systems, allowing automatic failover in case of a server failure. The storage systems comprise 36 HP SFS20 enclosures which provide a usable capacity of 56 TB of globally shared storage. It is subdivided into a part used for home directories and a larger part for non-permanent or scratch files.

• 8 two-way HP ProLiant DL385 (or alternatively DL585) service nodes
Each of these nodes contains two AMD Opteron DC sockets with two cores each, running at a clock speed of 2.6 GHz, and is equipped with 8 GB of main memory.

Figure 1: General layout of HP XC4000 at Karlsruhe (application, login and service nodes connected by the InfiniBand 4X DDR interconnect to the meta data server (MDS) and object storage server (OSS) nodes with their SFS20 storage enclosures)


The heart of the system is the fast InfiniBand 4X DDR interconnect. All nodes are attached to this interconnect, which is characterised by its very low latency of just 3 microseconds and a point-to-point bandwidth between two nodes of about 1500 MB/s for single tasks. Both values, latency as well as bandwidth, have been measured at MPI level. The very short latency in particular makes the HP XC4000 system ideal for communication-intensive applications doing a lot of MPI communication.

Figure 2: Performance of the InfiniBand interconnect: MPI latency between nodes (in µs) and MPI bandwidth between nodes (in MB/s) as functions of the message length
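Latency and bandwidth figures of this kind are typically obtained with a simple MPI ping-pong test between two tasks placed on different nodes. The following sketch shows the principle of such a measurement; it is a generic benchmark and not the code that produced the values quoted above.

/* pingpong.c: estimate MPI latency and bandwidth between two tasks.
 * Run with exactly two tasks placed on two different nodes. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    const int reps = 1000;
    int rank, len, i;
    double t0, t;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* message lengths 0, 4, 8, ... up to 256 kB, as in the plots above */
    for (len = 0; len <= 256 * 1024; len = len ? 2 * len : 4) {
        char *buf = malloc(len > 0 ? len : 1);

        MPI_Barrier(MPI_COMM_WORLD);
        t0 = MPI_Wtime();
        for (i = 0; i < reps; i++) {
            if (rank == 0) {
                MPI_Send(buf, len, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, len, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, len, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Send(buf, len, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        /* one-way time per message: half the round trip, averaged over reps */
        t = (MPI_Wtime() - t0) / (2.0 * reps);
        if (rank == 0)
            printf("%8d bytes: %8.2f us  %8.1f MB/s\n",
                   len, t * 1e6, len > 0 ? len / t / 1e6 : 0.0);
        free(buf);
    }

    MPI_Finalize();
    return 0;
}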

The key figures of HP XC4000 are:

Number of nodes                  750
CPUs per node                    4
Peak performance per CPU         5.2 GFlops
Memory per node                  16 GB
Local disk space per node        144 GB
Peak network bandwidth           2.0 GB/s
Sustained network bandwidth      1.5 GB/s
Total peak performance           15.6 TFlops
Total main memory                12 TB
Total local disk space           108 TB
Shared disk space                56 TB
Bisection bandwidth              1500 GB/s
Sustained bisection bandwidth    1100 GB/s


Architecture of HP XC6000

The HP XC6000 system is a distributed memory parallel computer where each node has two or more Intel Itanium2 processors, local memory, disks and network adapters. Special file server nodes are added to HP XC6000 to provide a fast and scalable parallel file system. All nodes are connected by a Quadrics QsNet II interconnect.

The HP XC software environment does not differ from that of HP XC4000, and the nodes have the same roles as on HP XC4000.

Figure 3: General layout of HP XC6000 at Karlsruhe (application, login and service nodes connected by the Quadrics QsNet II interconnect; the file servers are attached to the storage via FC networks)

Configuration of HP XC6000

The HP XC6000 system at Karlsruhe consists of

• 108 two-way HP Integrity rx2600 nodes
Each of these nodes contains two Intel Itanium2 processors which run at a clock speed of 1.5 GHz and have 6 MB of level 3 cache on the processor chip. Each node has 12 GB of main memory, 146 GB of local disk space and an adapter to connect to the Quadrics QsNet II interconnect.

• 6 sixteen-way HP Integrity rx8620 nodes
Up to now each of these nodes is partitioned into two units with 8 processors, 64 GB of main memory and 500 GB of local disk space. The Intel Itanium2 processors of the rx8620 nodes run at a clock speed of 1.6 GHz and have 6 MB of level 3 cache. Each eight-CPU partition has its own connection to the QsNet II interconnect.

• 8 HP ProLiant DL360 file server nodes
Through a storage area network these nodes are connected to seven HP EVA5000 disk systems. This globally shared storage has a capacity of 11 TB. It is subdivided into a part used for home directories and a larger part for non-permanent files.


Two eight-way partitions of the rx8620 nodes are used as login nodes while the other eight-way nodes as well as the majority of the two-way nodes are compute nodes running parallel user jobs.

The heart of the system is the fast Quadrics QsNet II interconnect. All nodes are attached to this interconnect, which is characterised by its very low latency of less than 3 microseconds and a point-to-point bandwidth of more than 800 MB/s between any pair of nodes. Both values, latency as well as bandwidth, have been measured at MPI level. The sustained bisection bandwidth of the 128-node network of HP XC6000 is more than 51 GB/s. These excellent performance figures of the cluster interconnect make this parallel computer ideal for communication-intensive applications.

Figure 4: Performance of the Quadrics interconnect: MPI latency between nodes (in µs) and MPI bandwidth between nodes (in MB/s) as functions of the message length

The key figures of HP XC6000 are:

                              2-way rx2600 nodes    16-way rx8620 nodes
Number of nodes               108                   6
CPUs per node                 2                     16
Peak performance per CPU      6 GFlops              6.4 GFlops
Memory per node               12 GB                 128 GB
Local disk space per node     146 GB                1060 GB
Peak network bandwidth        1.3 GB/s              2.6 GB/s
Sustained network bandwidth   800 MB/s              1600 MB/s

Total peak performance           1.9 TFlops
Total main memory                2 TB
Total local disk space           22 TB
Shared disk space                11 TB
Bisection bandwidth              83 GB/s
Sustained bisection bandwidth    51 GB/s


The Parallel File System HP SFS

In modern compute clusters the CPU performance, the number of nodes and the available memory are steadily increasing, and the applications’ I/O requirements are often increasing in a similar way. In order to avoid losing lots of compute cycles in applications waiting for I/O, compute clusters have to offer an efficient parallel file system. The parallel file system product on HP XC clusters is HP StorageWorks Scalable File Share (HP SFS). It is based on Lustre technology from Cluster File Systems, Inc. (see www.lustre.org).

Architecture of HP SFS

Compared with the standard Lustre product, HP SFS includes additional software for failover and management and is restricted to certain hardware components.

Figure 5: HP SFS system on HP XC4000 (compute clients access the $HOME and $WORK file systems through the InfiniBand 4X DDR interconnect via the meta data server (MDS) and object storage server (OSS) nodes)

Figure 5 shows the structure of the HP SFS system at the HP XC4000 system. The Meta Data Server (MDS) and Object Storage Server (OSS) nodes are configured in failover pairs, so that for each server node there is another one that can act as its backup. If the MDS node of one file system fails, the MDS node of the other file system takes over the corresponding services.

At the HP XC4000 system 10 HP SFS servers and two file systems are configured. One file system, called data, is used for home directories and software, has 8 TB of storage capacity and uses two designated OSS. The other file system, called work, is used for scratch data, e.g. intermediate files between different job runs. It has 48 TB of capacity and uses six OSS. The underlying storage systems are SFS20 storage arrays. They use SATA disks, and RAID 6 is configured as the redundancy level.

The HP SFS system on HP XC6000 is quite similar. It has eight servers, four of which are used for the work file system. The capacity is 3.8 TB for the data file system and 7.6 TB for the work file system. The underlying storage arrays are EVA5000 systems with FC disks and RAID 5 as the redundancy level.


Performance of HP SFS

The most important requirement for parallel file systems is performance. Parallel file systems typically offer completely parallel data paths from clients to disks, even if the files are stored in the same subdirectory. In HP SFS data is transferred from clients via the fast cluster interconnect, using efficient Remote Direct Memory Access (RDMA) mechanisms, to the servers and from there to the attached storage. The data is striped over multiple servers and multiple storage arrays.

Figure 6: Sequential write and read performance of the file system work on HP XC4000

Figure 6 shows measurements of the sequential write and read performance of the file system work on HP XC4000. A single client can reach 600 MB/s for writes and 400 MB/s for reads. On the server side the bottleneck for writes is the storage subsystem for both HP XC systems. For reads the FC adapter on the servers is the restricting hardware component on HP XC6000; on HP XC4000 the read bottleneck is again the storage subsystem. Since the file system data on HP XC6000 uses half the number of OSS and storage arrays, its throughput is half as high as for the file system work. On HP XC4000 the data file system uses eight mirrored SFS20 storage arrays and the file system work has 24 attached storage arrays; hence the overall write performance is about six times higher for the file system work. A total write performance of more than 2 GB/s can be reached if many clients are writing to different files, which are automatically striped over different storage arrays. In case many clients use a single huge file, the default stripe count of 4 has to be increased to 24 in order to achieve the best possible performance.
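How the stripe count is increased depends on the tools in use: interactively this is typically done with the Lustre lfs setstripe command, while from within an MPI program a striping hint can be passed when the file is created. The sketch below uses the common ROMIO hint striping_factor; whether HP MPI’s MPI-IO layer honours this hint on HP SFS is an assumption made here for illustration, and the file path is hypothetical.

/* stripe_write.c: create a shared file with a larger stripe count via an MPI-IO hint. */
#include <mpi.h>

#define NVALS (1 << 20)                 /* 2^20 doubles = 8 MB per task */

static double block[NVALS];

int main(int argc, char **argv)
{
    int rank, i;
    MPI_File fh;
    MPI_Info info;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (i = 0; i < NVALS; i++)
        block[i] = (double)rank;

    MPI_Info_create(&info);
    /* ROMIO-style hint: stripe the new file over 24 storage targets instead of
     * the default 4; support by the installed MPI-IO layer must be checked locally */
    MPI_Info_set(info, "striping_factor", "24");

    /* hypothetical path below the scratch file system */
    MPI_File_open(MPI_COMM_WORLD, "/work/example/huge_file.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);

    /* every task writes its own contiguous 8 MB block of the shared file */
    MPI_File_write_at(fh, (MPI_Offset)rank * NVALS * sizeof(double),
                      block, NVALS, MPI_DOUBLE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Info_free(&info);
    MPI_Finalize();
    return 0;
}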

Besides throughput, the metadata performance is important. In fact, many parallel file systems are limited to only a few hundred file creations per second. On the HP XC systems at Karlsruhe more than 5000 file creations per second have been measured. In large systems the scalability of the file system is also required: the parallel file system must work even if all clients are doing heavy I/O. Performance measurements on both XC systems have shown that the performance stays at the same high level even when hundreds of clients are doing I/O.


Applications on HP XC Systems

With their different types of nodes, the HP XC systems at Karlsruhe can meet the requirements of a broad range of applications:

• applications that are parallelised by the message passing paradigm and use high numbers of processors will run on a subset of two-way or four-way nodes and exchange messages over the fast interconnect.

• applications that are parallelised using shared memory either by OpenMP or by explicit multithreading with Pthreads can run on four-way or eight-way nodes.

Many applications should benefit from hybrid parallelisation, i.e. parallelisation by OpenMP or Pthreads within nodes and message passing with MPI between nodes.
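The following minimal sketch shows the typical structure of such a hybrid code, e.g. one MPI task per node with OpenMP threads across the cores of that node. Thread placement and the required compiler options depend on the compiler and MPI library actually used and are not shown.

/* hybrid.c: structure of a hybrid MPI + OpenMP program
 * (e.g. one MPI task per node, one OpenMP thread per core). */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank;
    double local = 0.0, global = 0.0;

    /* request thread support; MPI calls are made outside parallel regions only */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    #pragma omp parallel reduction(+:local)
    {
        /* each thread works on its share of the node-local data */
        local += omp_get_thread_num() + 1;    /* placeholder for real work */
    }

    /* combine the per-node results across all MPI tasks */
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("global result: %f\n", global);

    MPI_Finalize();
    return 0;
}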

As all nodes have at least 4 GB (HP XC4000) or 6 GB (HP XC6000) of main memory for each CPU, the system is especially suited for applications with high memory requirements.

Both the AMD Opteron and the Intel Itanium2 processor are characterised in particular by a high floating point performance and by a large cache on the Opteron and a very large cache on the Itanium2. The caches of both processors are located on the processor chip, so they can be accessed with very short latency and at extremely high bandwidth. Applications that are optimised for cache reuse can therefore be processed with a high degree of efficiency.

User projects have been initiated in research fields including:

• Numerical Analysis
• Chemistry
• Life Sciences
• Computational Fluid Dynamics
• Structural Mechanics
• Computer Science
• Electrical Engineering
• Solid State Physics
• Quantum Physics

Instead of developing and porting their own programs, users may also run CAE software from independent software vendors on the HP XC systems. The same CAE packages are available on both systems: ANSYS-CFX, FLUENT, STAR-CD, MSC.Nastran, ABAQUS, PERMAS and PowerFlow.


Program Development Environment

Since the operating system of both systems is Linux and many parts of the XC system software are based on open source products, a rich set of program development tools is available or can be installed within the XC environment.

Compilers
On HP XC4000 the latest C/C++ and Fortran compilers from Intel, PGI and PathScale can be used. Additionally, two versions of the GNU compiler suite are installed. On HP XC6000 different versions of the Intel C/C++ and Fortran compilers, again two versions of the GNU compiler suite, as well as the NAG Fortran 95 compiler can be used. All compilers can compile programs that are parallelised with the message passing paradigm. The Intel, PGI, PathScale and NAG compilers also support the OpenMP standard.

Parallelisation Environments
For distributed memory parallelism HP MPI is available on both systems. It supports the full MPI-2 specification. MPI-IO is supported for the parallel file system HP SFS and shows high parallel performance. Specific interface libraries allow object code compatibility with MPICH.

Debuggers
Besides the serial debugger idb from Intel and the GNU debugger gdb, the Distributed Debugging Tool (DDT), a parallel debugger from Allinea Ltd., is available on both systems. DDT provides a graphical user interface and is able to debug serial, MPI-parallelised and thread-parallelised programs.

Performance Analysis Tools
On HP XC4000 the open source package CodeAnalyst from AMD is available. On HP XC6000 the distribution of communication and computation can be visualised and analysed with Intel’s Trace Collector and Trace Analyzer packages. Some other performance analysis features are offered by the runtime environment of HP MPI. For a detailed analysis of resource usage, the profiling tool PAPI, which provides interfaces to the performance counters of both the Opteron and the Itanium2 processors, is installed on both systems.
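As an illustration of how PAPI is typically used, the sketch below reads two hardware counters around a small compute kernel with the classic PAPI high-level calls. The available counter presets and the exact API depend on the installed PAPI version, so this is a generic example rather than a description of the Karlsruhe installation.

/* papi_count.c: read hardware counters around a kernel with the classic
 * PAPI high-level interface (as provided by PAPI versions of that time). */
#include <papi.h>
#include <stdio.h>

#define N 500

int main(void)
{
    static double a[N][N], b[N][N], c[N][N];
    int events[2] = { PAPI_FP_OPS, PAPI_TOT_CYC };   /* floating point ops, total cycles */
    long long counts[2];
    int i, j, k;

    if (PAPI_start_counters(events, 2) != PAPI_OK)
        return 1;

    /* simple triple loop as the kernel to be measured */
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            for (k = 0; k < N; k++)
                c[i][j] += a[i][k] * b[k][j];

    if (PAPI_stop_counters(counts, 2) != PAPI_OK)
        return 1;

    printf("FP operations: %lld, cycles: %lld\n", counts[0], counts[1]);
    return 0;
}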

Mathematical Libraries
Numerical subprogram libraries that have been installed on HP XC4000 include the AMD Core Math Library (ACML) as well as the ScaLAPACK and BLACS libraries, which are not specifically tuned for AMD processors. Numerical subprogram libraries installed on HP XC6000 include Intel’s Math Kernel Library (MKL) and the NAG C and Fortran libraries. Tuned implementations of well established open source libraries like BLAS, LAPACK, ScaLAPACK or Metis are part of MKL. The linear solver package LINSOL from the SSCK is installed on both systems.
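Independent of which tuned library is linked, the BLAS routines themselves are called in the same way; the small example below multiplies two matrices with dgemm through the CBLAS interface. MKL ships this interface directly, while ACML uses a slightly different C calling convention, so the header name and link flags are assumptions that have to be adapted to the library actually used.

/* gemm_example.c: C = alpha*A*B + beta*C with the BLAS routine dgemm,
 * called through a CBLAS interface (header name and link flags depend
 * on the BLAS library that is linked). */
#include <cblas.h>
#include <stdio.h>

#define N 512

int main(void)
{
    static double A[N * N], B[N * N], C[N * N];
    int i;

    /* fill A and B with simple data; C starts as zero */
    for (i = 0; i < N * N; i++) {
        A[i] = 1.0;
        B[i] = 2.0;
        C[i] = 0.0;
    }

    /* C = 1.0 * A * B + 0.0 * C, row-major storage, leading dimension N */
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                N, N, N, 1.0, A, N, B, N, 0.0, C, N);

    printf("C[0][0] = %f\n", C[0]);   /* expect 1024.0 for this input */
    return 0;
}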

More detailed information on the software environment can be found on the web (see http://www.rz.uni-karlsruhe.de/ssck/software).


High Performance Technical Computing Competence Centre (HPTC3)

Together with Hewlett-Packard and Intel, the SSCK has established the High Performance Technical Computing Competence Centre (HPTC3). The main goal of this partnership is the further development of the HP XC systems to make them even more usable and reliable for the customers of the SSCK. This includes development, testing and early deployment of advanced tools supporting the application development process. In close cooperation with end users and independent software vendors, the HPTC3 will extend the portfolio of application packages that is available on HP XC6000. The tuning and optimisation of applications to reach the highest possible performance is another challenging goal of HPTC3.

Projects that have been started in the framework of HPTC3 include:
• Integration of LDAP into the XC software environment
• Improvements of high availability of critical resources on HP XC systems
• Tuning of application codes
• Early test of new tools in a production environment
• Organisation of workshops
