Page 1

CINECA: the Italian HPC infrastructure and its evolution in the European scenario

Giovanni Erbacci, Supercomputing, Applications and Innovation Department, CINECA, Italy

[email protected]

Page 2

Agenda

• CINECA: the Italian HPC Infrastructure
• CINECA and the European HPC Infrastructure
• Evolution: Parallel Programming Trends in Extremely Scalable Architectures

Page 3

Agenda

• CINECA: the Italian HPC Infrastructure
• CINECA and the European HPC Infrastructure
• Evolution: Parallel Programming Trends in Extremely Scalable Architectures

Page 4

CINECA

CINECA is a non-profit consortium, made up of 50 Italian universities*, the National Institute of Oceanography and Experimental Geophysics (OGS), the CNR (National Research Council), and the Ministry of Education, University and Research (MIUR).

CINECA is the largest Italian computing centre and one of the most important worldwide. The HPC department:
- manages the HPC infrastructure,
- provides support and HPC resources to Italian and European researchers,
- promotes technology transfer initiatives for industry.

Page 5

The Story

1969: CDC 6600, 1st system for scientific computing
1975: CDC 7600, 1st supercomputer
1985: Cray X-MP/4 8, 1st vector supercomputer
1989: Cray Y-MP/4 64
1993: Cray C-90/2 128
1994: Cray T3D 64, 1st parallel supercomputer
1995: Cray T3D 128
1998: Cray T3E 256, 1st MPP supercomputer
2002: IBM SP4 512, 1 Teraflops
2005: IBM SP5 512
2006: IBM BCX, 10 Teraflops
2009: IBM SP6, 100 Teraflops
2012: IBM BG/Q, > 1 Petaflops

Page 6

CINECA and Top 500

Page 7

Development trend (chart: peak performance of CINECA systems over time, on a scale from 1 GF to 10 PF; the newest system exceeds 2 PF)

Page 8

HPC Infrastructure for Scientific Computing

Logical Name      | SP6 (Sep 2009)           | BGP (Jan 2010)        | PLX (2011)
Model             | IBM P575                 | IBM BG/P              | IBM iDataPlex
Architecture      | SMP                      | MPP                   | Linux Cluster
Processor         | IBM Power6 4.7 GHz       | IBM PowerPC 0.85 GHz  | Intel Westmere 6c 2.4 GHz
# of cores        | 5376                     | 4096                  | 3288 + 548 GPGPU Nvidia Fermi M2070
# of nodes        | 168                      | 32                    | 274
# of racks        | 12                       | 1                     | 10
Total RAM         | 20 TB                    | 2 TB                  | ~13 TB
Interconnection   | Qlogic Infiniband DDR 4x | IBM 3D Torus          | Qlogic QDR 4x
Operating System  | AIX                      | Suse                  | RedHat
Total Power       | ~800 kW                  | ~80 kW                | ~200 kW
Peak Performance  | > 101 TFlops             | ~14 TFlops            | ~300 TFlops

Page 9

Visualisation system

Visualisation and computer graphics Virtual Theater:
- 6 BARCO SIM5 video projectors
- surround audio system
- cylindrical screen 9.4 x 2.7 m, 120° angle
- workstations + Nvidia cards
- RVN nodes on the PLX system

Page 10

Storage Infrastructure

System       | Available bandwidth (GB/s) | Space (TB) | Connection Technology | Disk Technology
2 x S2A9500  | 3.2                        | 140        | FCP 4 Gb/s            | FC
4 x S2A9500  | 3.2                        | 140        | FCP 4 Gb/s            | FC
6 x DCS9900  | 5.0                        | 540        | FCP 8 Gb/s            | SATA
4 x DCS9900  | 5.0                        | 720        | FCP 4 Gb/s            | SATA
3 x DCS9900  | 5.0                        | 1500       | FCP 4 Gb/s            | SATA
Hitachi DS   | 3.2                        | 360        | FCP 4 Gb/s            | SATA
3 x SFA1000  | 10.0                       | 2200       | QDR                   | SATA
1 x IBM5100  | 3.2                        | 66         | FCP 8 Gb/s            | FC

Total space: > 5.6 PB

Page 11

SP Power6 @ CINECA

- 168 compute nodes IBM p575 Power6 (4.7 GHz)
- 5376 compute cores (32 cores/node)
- 128 GB RAM/node (21 TB RAM in total)
- IB x4 DDR (double data rate) interconnect
- Peak performance: 101 TFlops
- Rmax: 76.41 TFlop/s
- Efficiency (workload): 75.83%
- No. 116 in the Top500 (June 2011)

- 2 login nodes IBM p560
- 21 I/O + service nodes IBM p520
- 1.2 PB of raw storage:
  - 500 TB high-performance working area
  - 700 TB data repository

Page 12

BGP @ CINECA

Model: IBM BlueGene/P
Architecture: MPP
Processor type: IBM PowerPC 0.85 GHz
Compute nodes: 1024 (quad core, 4096 cores in total)
RAM: 4 GB/compute node (4096 GB in total)
Internal network: IBM 3D Torus
OS: Linux (login nodes), CNK (compute nodes)

Peak Performance: 14.0 TFlop/s

Page 13

PLX @ CINECA

IBM dx360 M3 server – compute node:
- 2 x Intel Westmere 6-core X5645 2.40 GHz processors, 12 MB cache, DDR3 1333 MHz, 80 W
- 48 GB RAM on 12 x 4 GB DDR3 1333 MHz DIMMs
- 1 x 250 GB SATA HDD
- 1 x QDR Infiniband card, 40 Gb/s
- 2 x NVIDIA M2070 (M2070Q on 10 nodes)

Peak performance: 32 TFlops (3288 cores at 2.40 GHz)
Peak performance: 565 TFlops single precision or 283 TFlops double precision (548 Nvidia M2070)

No. 54 in the Top500 (June 2011)

Page 14

Science @ CINECA

Scientific areas: Chemistry, Physics, Life Science, Engineering, Astronomy, Geophysics, Climate, Cultural Heritage

National institutions: INFM-CNR, SISSA, INAF, INSTM, OGS, INGV, ICTP; academic institutions

Main activities: Molecular Dynamics, Material Science Simulations, Cosmology Simulations, Genomic Analysis, Geophysics Simulations, Fluid Dynamics Simulations, Engineering Applications

Support: application code development/parallelization/optimization; help desk and advanced user support; consultancy for scientific software; consultancy and research activities support; scientific visualization support

Page 15

The HPC Model at CINECA: from agreements with national institutions to a National HPC Agency in a European context

- Big Science – complex problems
- Support for advanced computational science projects
- HPC support for computational sciences at the National and European level
- CINECA calls for advanced National Computational Projects

ISCRA Italian SuperComputing Resource Allocation

http://iscra.cineca.it
Objective: support large-scale, computationally intensive projects that would not be possible or productive without terascale, and in the future petascale, computing.

Class A: Large Projects (> 300,000 CPU hours per project): two calls per year
Class B: Standard Projects: two calls per year
Class C: Test and Development Projects (< 40,000 CPU hours per project): continuous submission scheme; proposals reviewed 4 times per year

Page 16

ISCRA: Italian SuperComputing Resource Allocation

iscra.cineca.it

SP6, 80 TFlops (5376 cores) – No. 116 in the Top500, June 2011

BGP, 17 TFlops (4096 cores)
PLX, 142 TFlops (3288 cores + 548 nVidia M2070) – No. 54 in the Top500, June 2011

- National scientific committee
- Blind national peer review system
- Allocation procedure

Page 17

CINECA and Industry

CINECA provides HPC services to industry:
– ENI (geophysics)
– BMW-Oracle (America's Cup, CFD and structures)
– ARPA (weather forecast, meteo-climatology)
– Dompé (pharmaceutical)

CINECA hosts the ENI HPC system: HP ProLiant SL390s G7, Xeon 6C X5650, Infiniband; HP Linux cluster, 15360 cores; No. 60 in the Top500 (June 2011), 163.43 TFlop/s peak, 131.2 TFlop/s Linpack.

Page 18

CINECA Summer schools

Page 19

Agenda

• CINECA: the Italian HPC Infrastructure
• CINECA and the European HPC Infrastructure
• Evolution: Parallel Programming Trends in Extremely Scalable Architectures

Page 20

PRACE

PRACE Research Infrastructure (www.prace-ri.eu): the top level of the European HPC ecosystem

CINECA:
- represents Italy in PRACE
- is a hosting member in PRACE
- Tier-1 system: > 5% of PLX + SP6
- Tier-0 system in 2012: BG/Q, 2 Pflop/s
- involved in PRACE 1IP and 2IP
- PRACE 2IP prototype EoI

The European HPC Ecosystem

(Pyramid: Tier 0 – European (PRACE); Tier 1 – National; Tier 2 – Local.)

Creation of a European HPC ecosystem involving all stakeholders:
- HPC service providers on all tiers
- scientific and industrial user communities
- the European HPC hardware and software industry

Page 21

HPC-Europa 2: Providing access to HPC resources

HPC-Europa 2:
- a consortium of seven European HPC infrastructures
- integrated provision of advanced computational services to the European research community
- provision of transnational access to some of the most powerful HPC facilities in Europe
- opportunities to collaborate with scientists working in related fields at a relevant local research institute

http://www.hpc-europa.eu/

HPC-Europa 2: 2009 – 2012

(FP7-INFRASTRUCTURES-2008-1)

Page 22

European projects: DEISA, PRACE, EUDAT, EMI, HPC-Europa, VERCE, MMM@HPC, Plug-IT, Europlanet, VPH-OP, EESI, HPCworld, V-MUST, DEEP, Mont-Blanc

Page 23

Agenda

• CINECA: the Italian HPC Infrastructure
• CINECA and the European HPC Infrastructure
• Evolution: Parallel Programming Trends in Extremely Scalable Architectures

Page 24

BG/Q in CINECA

The Power A2 core has a 64-bit instruction set, like the other commercial Power-based processors sold by IBM since 1995, but unlike the 32-bit PowerPC chips used in the prior BlueGene/L and BlueGene/P supercomputers. The A2 core has four hardware threads and uses in-order dispatch, execution, and completion instead of the out-of-order execution common in many RISC processor designs. It has 16 KB of L1 data cache and another 16 KB of L1 instruction cache. Each core also includes a quad-pumped double-precision floating-point unit: each FPU has four pipelines, which can be used to execute scalar floating-point instructions, four-wide SIMD instructions, or two-wide complex-arithmetic SIMD instructions.

16-core chip @ 1.6 GHz

- a crossbar switch links the cores and the L2 cache memory together
- 5D torus interconnect
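As a rough illustration (plain C, not IBM's QPX intrinsics and not taken from the slides), the kind of loop that the four-wide double-precision FPU described above is built to accelerate is a fused multiply-add sweep over contiguous doubles, which a vectorizing compiler can map onto 4-wide SIMD instructions:

/* axpy.c - illustrative only: a loop shape that maps well onto a
 * 4-wide double-precision SIMD unit such as the one on the A2 core.
 * Compile with optimization (e.g. -O3) so the compiler can vectorize. */
#include <stddef.h>

void daxpy(size_t n, double a, const double *restrict x, double *restrict y)
{
    /* one multiply-add per element; with 4-wide SIMD the compiler
     * can process four elements per instruction */
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}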

Page 25

HPC Evolution

Moore's law is holding, in the number of transistors:
– transistors on an ASIC are still doubling every 18 months at constant cost
– 15 years of exponential clock-rate growth has ended

Moore's law reinterpreted:
– performance improvements are now coming from the increase in the number of cores on a processor (ASIC)
– #cores per chip doubles every 18 months instead of the clock rate
– 64–512 threads per node will become visible soon

From Herb Sutter <[email protected]>

Page 26

Top500

(Chart: number of cores of the No. 1 system in the Top500, June 1993 – June 2011; y-axis: number of cores, 0 to 600,000.)

Paradigm Change in HPC

What about applications?

The next HPC systems will have on the order of 500,000 cores.

Page 27

The Real HPC Crisis is with Software

Supercomputer applications and software are usually much longer-lived than the hardware:
- hardware life is typically four to five years at most
- Fortran and C are still the main programming models

Programming is stuck:
- arguably it hasn't changed much since the 70s

Software is a major cost component of modern technologies:
- the tradition in HPC system procurement is to assume that the software is free

It's time for a change:
- complexity is rising dramatically
- challenges for the applications on Petaflop systems
- improvement of existing codes will become complex and partly impossible
- the use of O(100K) cores implies a dramatic optimization effort
- new paradigms, such as support for a hundred threads in one node, imply new parallelization strategies
- implementing new parallel programming methods in existing large applications does not always have a promising perspective: there is a need for new community codes

Page 28

Roadmap to Exascale (architectural trends)

Page 29

What about parallel applications?

In a massively parallel context, an upper limit for the scalability of parallel applications is determined by the fraction of the overall execution time spent in non-scalable operations (Amdahl's law).

With P the parallel fraction of the code, the maximum speedup tends to 1 / (1 − P).

To exploit 1,000,000 cores, this requires P = 0.999999, i.e. a serial fraction of only 0.000001.
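A minimal sketch (plain C, not from the slides) that evaluates Amdahl's law for the numbers above: with P = 0.999999 the asymptotic limit is a speedup of 1,000,000, while on exactly 1,000,000 cores the achievable speedup is about 500,000, i.e. roughly 50% parallel efficiency.

/* amdahl.c - evaluate Amdahl's law S(N) = 1 / ((1 - P) + P / N)
 * for the figures quoted on the slide. Illustrative sketch only. */
#include <stdio.h>

static double amdahl_speedup(double p, double n)
{
    return 1.0 / ((1.0 - p) + p / n);
}

int main(void)
{
    const double p = 0.999999;   /* parallel fraction */
    const double n = 1.0e6;      /* number of cores   */

    printf("asymptotic limit 1/(1-P) = %.0f\n", 1.0 / (1.0 - p));          /* 1000000  */
    printf("speedup on %.0f cores = %.0f\n", n, amdahl_speedup(p, n));     /* ~500000  */
    printf("parallel efficiency = %.2f\n", amdahl_speedup(p, n) / n);      /* ~0.50    */
    return 0;
}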

Page 30

Programming Models

• Message Passing (MPI)
• Shared Memory (OpenMP)
• Partitioned Global Address Space Programming (PGAS): UPC, Coarray Fortran, Titanium
• Next-generation programming languages and models: Chapel, X10, Fortress
• Languages and paradigms for hardware accelerators: CUDA, OpenCL

• Hybrid: MPI + OpenMP + CUDA/OpenCL

Page 31

Trends

Vector
Distributed memory
Shared memory
Hybrid codes

MPP systems, message passing: MPI
Multi-core nodes: OpenMP
Accelerators (GPGPU, FPGA): CUDA, OpenCL
Scalar applications

Page 32

Message Passing: domain decomposition

(Diagram: several nodes, each with its own CPU and memory, connected by an internal high-performance network; the problem domain is partitioned across the nodes.)

Page 33

Ghost Cells - Data exchange

(Diagram: a 2D grid split into sub-domains on Processor 1 and Processor 2; a stencil update of point (i,j) needs its neighbours (i−1,j), (i+1,j), (i,j−1), (i,j+1), so a layer of ghost cells along the sub-domain boundaries is exchanged between processors at every update.)
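A minimal sketch (plain C with MPI, not taken from the slides) of the ghost-cell idea for a 1D decomposition: each rank owns a slab of the domain plus one ghost cell on each side, and exchanges those boundary values with its neighbours before every update. The array size and message tag are arbitrary choices for the illustration.

/* halo.c - 1D domain decomposition with one-cell ghost layers.
 * Each rank exchanges its boundary values with left/right neighbours
 * using MPI_Sendrecv, then would update its interior points. */
#include <mpi.h>
#include <stdio.h>

#define NLOC 128   /* local points owned by each rank (illustrative) */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* local slab: index 0 and NLOC+1 are ghost cells */
    double u[NLOC + 2];
    for (int i = 1; i <= NLOC; ++i) u[i] = (double)rank;

    int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

    /* send my first real cell to the left, receive the right neighbour's
       boundary into my right ghost cell, and vice versa */
    MPI_Sendrecv(&u[1],        1, MPI_DOUBLE, left,  0,
                 &u[NLOC + 1], 1, MPI_DOUBLE, right, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Sendrecv(&u[NLOC],     1, MPI_DOUBLE, right, 0,
                 &u[0],        1, MPI_DOUBLE, left,  0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    /* ... stencil update of u[1..NLOC] would go here ... */

    MPI_Finalize();
    return 0;
}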

Page 34

Message Passing: MPI

Main characteristics:
• Library
• Coarse grain
• Inter-node parallelization (few real alternatives)
• Domain partition
• Distributed memory
• Almost all HPC parallel applications

Open issues:
• Latency
• OS jitter
• Scalability

Page 35

Shared memory

(Diagram: a single node with shared memory and four CPUs; threads 0–3 operate on shared data x and y.)

Page 36

Shared Memory: OpenMP

Main characteristics:
• Compiler directives
• Medium grain
• Intra-node parallelization (pthreads)
• Loop or iteration partition
• Shared memory
• Many HPC applications

Open issues:
• Thread creation overhead
• Memory/core affinity
• Interface with MPI
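The deck's own Fortran-style pseudocode follows on the next slide; as a complementary, compilable illustration (my own minimal example, not from the slides), the same directive style in C is a parallel loop whose iterations are split across the threads of one node, with a reduction to combine partial results:

/* omp_sum.c - minimal OpenMP example: parallelize a loop across the
 * threads of one node and combine partial results with a reduction.
 * Build with e.g.: gcc -fopenmp omp_sum.c */
#include <omp.h>
#include <stdio.h>

int main(void)
{
    const int n = 1000000;
    double sum = 0.0;

    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; ++i)
        sum += 1.0 / ((double)i + 1.0);   /* harmonic series, just as a workload */

    printf("max threads: %d, sum = %f\n", omp_get_max_threads(), sum);
    return 0;
}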

Page 37

OpenMP (pseudocode):

!$omp parallel do
do i = 1, nsl
   call 1DFFT along z ( f [ offset( threadid ) ] )
end do
!$omp end parallel do

call fw_scatter ( . . . )

!$omp parallel
do i = 1, nzl
   !$omp parallel do
   do j = 1, Nx
      call 1DFFT along y ( f [ offset( threadid ) ] )
   end do
   !$omp parallel do
   do j = 1, Ny
      call 1DFFT along x ( f [ offset( threadid ) ] )
   end do
end do
!$omp end parallel

Page 38

Accelerator/GPGPU

(Diagram: element-wise sum of two 1D arrays, one addition per GPU thread.)

Page 39

CUDA sample

void CPUCode( int* input1, int* input2, int* output, int length ) {
    for ( int i = 0; i < length; ++i ) {
        output[ i ] = input1[ i ] + input2[ i ];
    }
}

__global__
void GPUCode( int* input1, int* input2, int* output, int length ) {
    int idx = blockDim.x * blockIdx.x + threadIdx.x;
    if ( idx < length ) {
        output[ idx ] = input1[ idx ] + input2[ idx ];
    }
}

Each thread executes one loop iteration.

Page 40

CUDA / OpenCL

Main characteristics:
• Ad-hoc compiler
• Fine grain
• Offload parallelization (GPU)
• Single-iteration parallelization
• Ad-hoc memory
• Few HPC applications

Open issues:
• Memory copy
• Standards
• Tools
• Integration with other languages
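Since "memory copy" is listed as an open issue, a host-side sketch (my own illustration, not from the slides) shows the explicit allocations and host-device transfers needed around the GPUCode kernel of the earlier CUDA sample; the block and thread counts are arbitrary choices:

/* launch.cu - host code around the GPUCode kernel shown earlier:
 * allocate device buffers, copy inputs in, launch, copy the result back. */
#include <cuda_runtime.h>

__global__ void GPUCode(int *input1, int *input2, int *output, int length);

void run(int *h_in1, int *h_in2, int *h_out, int length)
{
    int *d_in1, *d_in2, *d_out;
    size_t bytes = length * sizeof(int);

    cudaMalloc((void **)&d_in1, bytes);
    cudaMalloc((void **)&d_in2, bytes);
    cudaMalloc((void **)&d_out, bytes);

    /* host -> device copies: this traffic is the "memory copy" cost */
    cudaMemcpy(d_in1, h_in1, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_in2, h_in2, bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks  = (length + threads - 1) / threads;
    GPUCode<<<blocks, threads>>>(d_in1, d_in2, d_out, length);

    /* device -> host copy of the result */
    cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_in1);
    cudaFree(d_in2);
    cudaFree(d_out);
}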

Page 41

Hybrid (MPI + OpenMP + CUDA + … + Python)

- Takes the best of all models
- Exploits the memory hierarchy
- Many HPC applications are adopting this model
- Mainly due to developer inertia: it is hard to rewrite millions of lines of source code

Page 42

Hybrid parallel programming

MPI: Domain partition

OpenMP: External loop partition

CUDA: assign inner-loop iterations to GPU threads

Example: Quantum ESPRESSO, http://www.qe-forge.org/

Python: Ensemble simulations
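A minimal hybrid sketch (my own illustration, not Quantum ESPRESSO code) of the layering listed above: MPI partitions the domain across nodes, while OpenMP splits the external loop over the cores of each node; the innermost work per point is where a CUDA kernel could be offloaded.

/* hybrid.c - MPI across nodes + OpenMP within a node (sketch).
 * Build with e.g.: mpicc -fopenmp hybrid.c */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

#define NGLOBAL 1024   /* illustrative global domain size */

int main(int argc, char **argv)
{
    int provided;
    /* ask for an MPI threading level compatible with OpenMP threads */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* MPI: domain partition - each rank owns a contiguous chunk */
    int chunk = NGLOBAL / size;
    int begin = rank * chunk;
    int end   = (rank == size - 1) ? NGLOBAL : begin + chunk;

    double local = 0.0;

    /* OpenMP: external loop partition across the cores of the node */
    #pragma omp parallel for reduction(+:local)
    for (int i = begin; i < end; ++i) {
        /* inner work per point; this is the part that could be
           offloaded to a GPU kernel (CUDA) on GPU-equipped nodes */
        local += (double)i * (double)i;
    }

    double global = 0.0;
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) printf("global sum = %e\n", global);

    MPI_Finalize();
    return 0;
}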

Page 43

Storage I/O

• The I/O subsystem is not keeping pace with the CPUs
• Checkpointing will not be possible
• Reduce I/O
• On-the-fly analysis and statistics
• Disks only for archiving
• Scratch on non-volatile memory ("close to RAM")

Page 44

Conclusion

• Exploit millions of ALUs
• Hybrid hardware
• Hybrid codes
• Memory hierarchy
• Flops/Watt (more than Flops/sec)
• I/O subsystem
• Non-volatile memory
• Fault tolerance!

Parallel programming trends in extremely scalable architectures