
Program Systems Institute of the Russian Academy of Sciences

Supercomputer Projects SKIF and SKIF-GRID of Russia and Belorussia

Sergei Abramov

Workshop introducing the AURORA project

4 June 2009, Conference Room FBK, via Sommarive, 18 - Povo, Trento, Italy

SKIF-AURORA Project

Program Systems Institute of RAS — Supercomputer Project «SKIF-GRID»

Outline

Pereslavl-Zalessky and Program Systems Institute of the RAS: Short introduction

Supercomputer Projects SKIF and SKIF-GRID of Russia and Belorussia

2008–2010: Series 4 of SKIF supercomputers == SKIF-AURORA

Selected Topics

• Management Subsystem
• 3D-torus Interconnect
• Combining standard CPUs and FPGA-accelerators

Conclusion

April 19, 2023 © PSI RAS, SKIF-GRID, 2009 All rights reserved

Pereslavl-Zalessky and Program Systems Institute of the RAS: Short introduction

Pereslavl-Zalessky

Beautiful ancient Russian town, 860 years old

A city at the center of the Russian Golden Ring

Hometown of the Grand Dukes of Russia

The first building site of Peter the Great's navy

Ancient capital of the Russian Orthodox Church

[Map: Moscow to Pereslavl-Zalessky, 120 km]

PSI RAS, Pereslavl-Zalessky

Foundation of the Institute

The Program Systems Institute was founded in 1984 by a decree of the USSR government, with the aim of developing computer science in the country. The first director of the Institute (1984–2003) was Prof. A. Ailamazyan.

2009: Organization of the Institute

Artificial Intelligence Research Center

Medical Informatics Research Center

Research Center for Multiprocessor Systems

System Analysis Research Center

Control Processes Research Center

Scientific and Educational Center – International Children’s Computer Center

Ailamazyan University of Pereslavl

Supercomputer Projects SKIF and SKIF-GRID of Russia and Belorussia

SKIF and SKIF-GRID Supercomputing Projects

Joint supercomputing projects of the Russian Federation and the Republic of Belarus

R&D in all directions and at all levels of supercomputer and grid technologies: hardware, operating systems, parallel programming systems, applications, etc.

SKIF: 2000–2004, 10 + 10 = 20 organizations

SKIF-GRID: 2007–2010, 12 + 23 = 35 organizations

PSI RAS is the lead organization from the Russian Federation

SKIF-GRID Project organization

Project directions:
1. Grid technology
2. Supercomputers
• SW
• HW
3. Security
4. Pilot projects: applications of HPC and grid technology

Series 1, 2, and 3 of the SKIF supercomputers (peak/Linpack, Tflops)

Series 1 (2000–2003)
2000: SKIF Firstborn 0.02/0.011
2001: SKIF VM-5100 0.048/0.026
2003: SKIF ES1710.03 0.04/0.023

Series 2 (2003–2007)
2003: SKIF Forge-32 0.1/0.074
2003: SKIF K-500 0.717/0.417
2004: SKIF K-1000 2.53/2.03

Series 3 (2007–2008)
2007: SKIF Cyberia 12/9.01
2008: SKIF Ural 15.94/12.2
2008: SKIF MSU 60/47.17

Flagship of SKIF supercomputers: SKIF MSU (March 2008)

June 2008: #36 in Top500
Peak performance: 60 Tflops, Linpack: 47 Tflops
Original blade design
CPU model: 4-core Intel Xeon E5472, 3.0 GHz
Nodes (dual CPU): 625
CPU cores total: 5,000
Interconnect: Infiniband DDR, Fat Tree
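The peak figure on this slide can be reproduced from the node count and clock rate. The sketch below assumes 4 double-precision floating-point operations per core per cycle; that value is not stated on the slide and is only the figure usually quoted for this Xeon generation. The Linpack value of 47.17 Tflops is taken from the Series 3 list.

```python
# Back-of-the-envelope check of the SKIF MSU peak performance.
# All values except FLOPS_PER_CYCLE come from the slides;
# FLOPS_PER_CYCLE is an assumption made for this illustration.
NODES = 625
CPUS_PER_NODE = 2
CORES_PER_CPU = 4
CLOCK_GHZ = 3.0
FLOPS_PER_CYCLE = 4  # assumed, not stated on the slide
LINPACK_TFLOPS = 47.17


def peak_tflops(cores: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak in Tflops: cores * GHz * flops/cycle gives Gflops."""
    return cores * clock_ghz * flops_per_cycle / 1000.0


total_cores = NODES * CPUS_PER_NODE * CORES_PER_CPU
peak = peak_tflops(total_cores, CLOCK_GHZ, FLOPS_PER_CYCLE)
efficiency = LINPACK_TFLOPS / peak  # fraction of peak reached on Linpack

print(f"cores: {total_cores}, peak: {peak:.1f} Tflops, "
      f"Linpack efficiency: {efficiency:.1%}")
```

Under that assumption the core count and the peak match the slide (5,000 cores, 60 Tflops), and the Linpack run reaches a little under 80% of peak.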


[Chart: Linpack performance (Gflops, log scale) of Top500 entries (Top 1, Top 10, Top 100, Top 200, Top 300, Top 400, Top 500) and of supercomputers made in Russia]

Supercomputers made in Russia in the Top500 (Linpack/peak, Tflops):
1. June 2002: MVS 1000M, 0.734/1.024
2. November 2003: SKIF K-500, 0.423/0.717
3. November 2004: SKIF K-1000, 2.032/2.534
4. February 2007: SKIF Cyberia, 9.013/12.002
5. May 2008: SKIF Ural, 12.2/15.9
6. May 2008: SKIF MSU, 47.1/60

Only six supercomputers developed in Russia have been ranked in the Top500. Five of them are SKIFs.

Series 1, 2, 3 and 4 of SKIF supercomputers

[Chart: Linpack performance (Gflops, log scale) of Top500 entries (Top 1, Top 10, Top 100, Top 200, Top 300, Top 400, Top 500) compared with the top SKIF system over time]

Series 1: Firstborn, 11 Gflops; VM5100, 26 Gflops; Firstborn-M, 57 Gflops
Series 2: SKIF K-500, 472 Gflops; SKIF K-1000, 2032 Gflops
Series 3: SKIF Cyberia, 9 Tflops; SKIF Ural, 12.2 Tflops; SKIF MSU, 47.17 Tflops
Series 4 (planned): SKIF P-0.5, 3Q 2009; SKIF P-1.0, 3Q 2010; SKIF P~5.0, 1Q 2012

Completed: Series 1–3. Nearest plan: Series 4.

2008–2010: Series 4 of SKIF supercomputers

SKIF Series 4: Aims of R&D

• Highest density of performance (largest possible number of CPUs per 1U)
• Lower latency
• Fewer cables and connectors, hence better reliability
• Increased heat emission per 1U
  • We need a new cooling technology… How?
• Improved interconnect: we need better scalability, bandwidth and latency than the best available solutions provide (e.g. Infiniband QDR)
• New approach to monitoring and management of the supercomputer
• Combining standard CPUs and accelerators in the computational nodes of the supercomputer

Spring 2008, SKIF Series 4: How To?

How to increase the number of CPUs per 1U?

How to cool the supercomputer nodes?

How to develop an improved interconnect?

How to combine standard CPUs and accelerators?

How to develop the management subsystem?

SKIF Series 4 is an extremely complex project. We need strong partners!

Summer 2008, SKIF Series 4: Know How!

Italian-Russian cooperation

«SKIF Series 4» == «SKIF-AURORA Project»

Designed by an alliance of Eurotech, PSI RAS and RSC SKIF, with support from Intel

SKIF-AURORA: Designed by the alliance of Eurotech, PSI RAS and RSC SKIF

PCBs, schematics, mechanics, power supply, cooling, levels 1 and 2 of the management system

Level 3 of the management system, interconnect (3D-torus: firmware, routing, drivers, MPI-2…), FPGA as accelerator

SKIF-AURORA: State of the Project

Node Card

PSU Card

Root Card

Chassis

ISC’09, Hamburg, June 23–25, 2009

Rack

System: Project SKIF-Aurora 500 Tflops

SKIF-AURORA: Management Subsystem

Subjects of Management Subsystem

1 Pflops = 42 racks = 10,752 nodes + 672 DC/DC trays + 672 root nodes

For scalability we need a robust and redundant management subsystem

Comprehensive monitoring and control in all situations
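The scale that the management subsystem has to cover follows directly from the figures above. The short calculation below assumes that nodes, DC/DC trays and root nodes are spread evenly across racks; the slide gives only the totals, so the per-rack and per-root-node ratios are derived values, not quoted ones.

```python
# Derive per-rack figures from the 1 Pflops configuration on the slide.
# Even distribution across racks is an assumption of this sketch.
RACKS = 42
NODES = 10_752
DCDC_TRAYS = 672
ROOT_NODES = 672

nodes_per_rack = NODES // RACKS            # compute nodes in one rack
root_nodes_per_rack = ROOT_NODES // RACKS  # root nodes in one rack
nodes_per_root_node = NODES // ROOT_NODES  # compute nodes per root node
managed_devices = NODES + DCDC_TRAYS + ROOT_NODES  # everything to monitor

# Implied per-node performance for a 1 Pflops (1,000,000 Gflops) system.
gflops_per_node = 1_000_000 / NODES

print(f"{nodes_per_rack} nodes/rack, {root_nodes_per_rack} root nodes/rack, "
      f"{nodes_per_root_node} nodes per root node")
print(f"{managed_devices} managed devices, "
      f"about {gflops_per_node:.0f} Gflops per node")
```

All three divisions come out exact, which is consistent with a regular layout in which each root node is paired with one DC/DC tray and serves the same number of compute nodes.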

1st Level of Management Subsystem

Standard solution: IPMI over TCP/IP (Infiniband)

Available when nodes, root card, and IB-network are powered on and work properly

Root cards and DC/DC trays are not covered by monitoring and control

2nd Level of Management Subsystem

Catalyst module on the root card implements node power control and serial console for the nodes

Available when root card and IB-network are powered on and work properly

Root cards and DC/DC trays are not covered by monitoring and control

3rd Level of Management Subsystem

SKIF Servnet: independent sensor network

Available always, uses dedicated power network, power consumption: 3W per chassis

Accessible over dedicated network: Ethernet + CANbus + I2C

Monitors temperature, humidity, and supply voltages on node cards, root card, and DC/DC tray. Transfers this information to the 2nd level (to Catalyst)

Can turn off DC/DC PSU in case of emergency

Turn-off decision is made locally by an ARM microcontroller located on the root card
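The local turn-off decision described above can be pictured as a simple threshold check over the sensor readings of one chassis. The sketch below is purely illustrative: the slides do not describe the firmware logic, and every name and limit here (`Limits`, `Reading`, `must_power_off`, the numeric thresholds) is hypothetical. The real controller is an ARM microcontroller, not a Python program.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Limits:
    """Hypothetical safe operating limits (values invented for the sketch)."""
    max_temp_c: float = 70.0
    max_humidity_pct: float = 80.0
    min_voltage_v: float = 11.0
    max_voltage_v: float = 13.0


@dataclass(frozen=True)
class Reading:
    """One sample from a sensor on a node card, root card or DC/DC tray."""
    source: str
    temp_c: float
    humidity_pct: float
    voltage_v: float


def violations(reading: Reading, limits: Limits) -> List[str]:
    """Return a description of every limit this reading exceeds."""
    problems = []
    if reading.temp_c > limits.max_temp_c:
        problems.append(f"{reading.source}: temperature {reading.temp_c} C")
    if reading.humidity_pct > limits.max_humidity_pct:
        problems.append(f"{reading.source}: humidity {reading.humidity_pct} %")
    if not limits.min_voltage_v <= reading.voltage_v <= limits.max_voltage_v:
        problems.append(f"{reading.source}: voltage {reading.voltage_v} V")
    return problems


def must_power_off(readings: Iterable[Reading], limits: Limits) -> bool:
    """Decide locally: turn off the DC/DC PSU if any reading is out of range."""
    return any(violations(r, limits) for r in readings)


normal = [Reading("node-card-0", 45.0, 30.0, 12.0)]
overheated = normal + [Reading("dcdc-tray", 85.0, 30.0, 12.1)]
print(must_power_off(normal, Limits()), must_power_off(overheated, Limits()))
```

Keeping the decision on the root card means the emergency path does not depend on the nodes, the Infiniband network, or the upper management levels being alive.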

SKIF-AURORA Management Subsystem: Total monitoring and control

3-way redundant

Designed for “dark datacenter”

Robust management subsystem

3D-torus Interconnect

Only the QCD-specific functionality is implemented by the Italian team

Russian teams are to upgrade the network to a general-purpose interconnect (MPI 2.0)

Due to appear in fall 2009

Support and improvements in 2010–2012

3D-torus Interconnect. Current status

Simple routing implemented on a prototype (SKIFino)

Routing on single-FPGA prototype is working

MPI is based on MPICH2 codebase — prototyped

MPICH2 self-test implemented
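For readers unfamiliar with the topology, the sketch below shows how node addressing and a simple routing scheme work on a 3D torus. The slides do not say which routing algorithm the prototype uses; dimension-order routing is chosen here only because it is a common simple scheme, and all function names and the 4x4x4 size are assumptions made for the illustration.

```python
from typing import List, Tuple

Coord = Tuple[int, int, int]


def neighbors(node: Coord, dims: Coord) -> List[Coord]:
    """Six direct neighbors of a node; links wrap around in each dimension."""
    result = []
    for axis in range(3):
        for step in (-1, 1):
            c = list(node)
            c[axis] = (c[axis] + step) % dims[axis]
            result.append(tuple(c))
    return result


def ring_step(src: int, dst: int, size: int) -> int:
    """Direction (+1, -1 or 0) of the shorter way around a ring of given size."""
    if src == dst:
        return 0
    forward = (dst - src) % size
    return 1 if forward <= size - forward else -1


def route(src: Coord, dst: Coord, dims: Coord) -> List[Coord]:
    """Dimension-order routing: correct X first, then Y, then Z."""
    path = [src]
    cur = list(src)
    for axis in range(3):
        while cur[axis] != dst[axis]:
            step = ring_step(cur[axis], dst[axis], dims[axis])
            cur[axis] = (cur[axis] + step) % dims[axis]
            path.append(tuple(cur))
    return path


dims = (4, 4, 4)  # hypothetical torus size
print(neighbors((0, 0, 0), dims))
print(route((0, 0, 0), (3, 1, 0), dims))
```

In the example route the X coordinate goes from 0 to 3 in a single hop through the wrap-around link, which is the property that keeps path lengths short as a torus grows.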

R&D Directions Using FPGA Resources

• Collective MPI operations using FPGA
• FPGA to facilitate support of PGAS languages (UPC, Titanium, etc.)
• FPGA+CPU hybrid computing

[Diagram labels: System Interconnect (3D-torus); Subsidiary Interconnect (Infiniband); row of FPGAs (non-standard part); row of CPUs (standard part)]

Conclusions

The SKIF-AURORA project:
• Is based on collaboration between international teams
• Harnesses shared expertise and results
• Aims to develop a family of top-level supercomputers with innovative techniques:
  • Higher density of CPUs (flops per volume)
  • Efficient water cooling system
  • Efficient power supply system
  • Scalable, powerful 3D-torus interconnect
  • The most modern standard CPUs for computation and FPGAs for acceleration
  • Redundant, robust management subsystem
  • Etc.

The collaboration between the Italian and Russian teams:
• Makes it possible to obtain world-class supercomputer technologies
• Provides leading positions in the supercomputer industry (at least in the near future) for all participants of the collaboration
• Makes all results available in reasonable time and with reasonable effort and resources

Thank you for your attention!