Program Systems Institute Russian Academy of Sciences
Recent Advances in Parallel Computing Technologies at PSI RAS RCMS
Program Systems Institute RAS, Pereslavl-Zalessky
Alexander Moskovsky, Sergei Abramov
06/09/05

Supercomputing Project “SKIF”

Open TS: an advanced tool for parallel and distributed computing.

SKIF Supercomputing Project
Joint project of the Russian Federation and the Republic of Belarus
2000-2004
10 + 10 organizations
PSI RAS is the lead organization from the Russian Federation
Hardware and software

T-System History
Mid-1980s: basic ideas of T-System
1990s: first implementation of T-System
2001-2002, “SKIF”: GRACE (Graph Reduction Applied to Cluster Environment)
2003-current, “SKIF”: Open TS (Open T-system)

Comparison: T-System and MPI
             Sequential   Parallel
High-level   C/Fortran    T-System (a few keywords)
Low-level    Assembler    MPI (hundreds of primitives)

Related work
Parallel Programming Using C++ (Scientific and Engineering Computation), edited by Gregory V. Wilson and Paul Lu
ABC++, Amelia, CC++, CHAOS++, COOL, C++//, ICC++, Mentat, MPC++, MPI++, pC++, POOMA, TAU, UC++

T-System in Comparison
Related work               Open TS differentiator
Charm++                    FP-based approach
UPC, mpC++                 Implicit parallelism
Glasgow Parallel Haskell   Allows C/C++ based low-level optimization
OMPC++                     Provides both a language and a C++ templates library
Cilk                       Supports SMP, MPI, PVM, and GRID platforms

Open TS: an Outline
High-performance computing
“Automatic dynamic parallelization”
Combining functional and imperative approaches, high-level parallel programming
T++ language: a “parallel dialect” of C++, an approach popular in the 1990s

T-Approach
“Pure” function (tfunction) invocations produce grains of parallelism
T-Program is
  Functional on the higher level
  Imperative on the low level (optimization)
C-compatible execution model
Non-ready variables, multiple assignment
“Seamless” C extension (or Fortran extension)

T++ Keywords
tfun: T-function
tval: T-variable
tptr: T-pointer
tout: output parameter (like &)
tdrop: make ready
twait: wait for readiness
tct: T-context

Sample Program
#include <stdio.h>
#include <stdlib.h>  /* atoi */

/* tfun marks fib as a T-function: each call may become a grain of parallelism */
tfun int fib (int n) {
  return n < 2 ? n : fib(n-1) + fib(n-2);
}

tfun int main (int argc, char **argv) {
  if (argc != 2) { printf("Usage: fib <n>\n"); return 1; }
  int n = atoi(argv[1]);
  printf("fib(%d) = %d\n", n, (int)fib(n));
  return 0;
}
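
For readers without an Open TS toolchain, the idea behind the sample program can be approximated in standard C++. The sketch below is not T++ and says nothing about how the Open TS runtime schedules work: it uses `std::async`, and the names `fib_seq`, `fib_par` and the `spawn_depth` cutoff are assumptions made for illustration. The `std::future` plays a role loosely similar to a non-ready value: it exists immediately and is waited on only when the result is needed.

```cpp
#include <cassert>
#include <future>

// Plain sequential Fibonacci, used once the spawn budget is exhausted.
long fib_seq(int n) {
    return n < 2 ? n : fib_seq(n - 1) + fib_seq(n - 2);
}

// Hypothetical analogue of `tfun int fib(int n)`: the first `spawn_depth`
// levels of recursion run one branch on another thread. The cutoff is an
// assumption of this sketch, not an Open TS parameter.
long fib_par(int n, int spawn_depth) {
    assert(n >= 0);
    if (n < 2) return n;
    if (spawn_depth <= 0) return fib_seq(n);
    // The future is created immediately and becomes ready when the
    // spawned call finishes.
    std::future<long> left =
        std::async(std::launch::async, fib_par, n - 1, spawn_depth - 1);
    long right = fib_par(n - 2, spawn_depth - 1);
    return left.get() + right;  // get() blocks until the value is ready
}
```

A call such as `fib_par(30, 3)` starts at most seven extra threads. In T++ no such cutoff appears in the source: the `tfun` keyword is the only annotation.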

Open TS: Environment
Supports 1,000,000 threads per CPU

NPB, Test EP Rewritten @OpenTS
EP: Embarrassingly Parallel
NASA Parallel Benchmarks suite
Speedup = 96% of theoretical maximum (on 10 nodes)
[Figure: time, % of sequential; efficiency, % of theoretical]

Open TS vs MPI case study

Applications
Popular and widely used
Developed by independent teams (MPI experts)
PovRay: Persistence of Vision Ray-tracer, enabled for parallel run by a patch
ALCMD/MP_lite: molecular dynamics package (Ames Lab)

T-PovRay vs MPI PovRay: code complexity
Program                                         Source code volume
MPI modules for PovRay 3.10g                    1,500 lines
MPI patch for PovRay 3.50c                      3,000 lines
T++ modules (for both versions 3.10g & 3.50c)   200 lines

T-PovRay vs MPI PovRay: performance
[Figure: Time MPI / Time OpenTS (y-axis, 90% to 210%) vs. number of processors (x-axis, 1 to 16)]
16 dual Athlon 1800, AMD Athlon MP 1800+, RAM 1 GB, Fast Ethernet, LAM 7.0.6

T-PovRay vs MPI PovRay: performance
[Figure: Time MPI / Time OpenTS (y-axis, 90% to 210%) vs. number of processors (x-axis, 1 to 16)]
2 CPUs AMD Opteron 248 2.2 GHz, RAM 4 GB, GigE, LAM 7.1.1

ALCMD/MPI vs ALCMD/OpenTS
MP_Lite component of ALCMD rewritten in T++
Fortran code is left intact
[Diagram: two software stacks. MPI version: MPI, MP_Lite, ALCMD. Open TS version: OpenTS, MP_Lite, ALCMD]

ALCMD/MPI vs ALCMD/OpenTS: code complexity
Program                          Source code volume
MP_Lite total/MPI                ~20,000 lines
MP_Lite, ALCMD-related/MPI       ~3,500 lines
MP_Lite, ALCMD-related/OpenTS    500 lines

ALCMD/MPI vs ALCMD/OpenTS: performance
[Figure: Time MPI / Time OpenTS (y-axis, 80% to 110%) vs. number of processors (x-axis, 1 to 16)]
16 dual Athlon 1800, AMD Athlon MP 1800+, RAM 1 GB, Fast Ethernet, LAM 7.0.6, Lennard-Jones MD, 512,000 atoms

ALCMD/MPI vs ALCMD/OpenTS: performance
[Figure: Time MPI / Time OpenTS (y-axis, 80% to 110%) vs. number of processors (x-axis, 1 to 16)]
2 CPUs AMD Opteron 248 2.2 GHz, RAM 4 GB, GigE, LAM 7.1.1, Lennard-Jones MD, 512,000 atoms

ALCMD/MPI vs ALCMD/OpenTS: performance
[Figure: Time MPI / Time OpenTS (y-axis, 80% to 110%) vs. number of processors (x-axis, 1 to 16)]
2 CPUs AMD Opteron 248 2.2 GHz, RAM 4 GB, InfiniBand, MVAPICH 0.9.4, Lennard-Jones MD, 512,000 atoms

T-Applications
MultiGen: biological activity estimation
Remote sensing applications
Plasma modeling
Protein simulation
Aeromechanics
Query engine for XML
AI applications
etc.

MultiGen
Chelyabinsk State University
[Diagram: multi-conformation model. Level 0: K0. Level 1: K11, K12. Level 2: K21, K22]

MultiGen: Speedup
Substance        Atoms   Rotations   Conformers   Execution time (min:s)
                                                  1 node    4 nodes   16 nodes
NCI-609067       28      4           13           9:33      3:21      1:22
TOSLAB A2-0261   82      18          49           115:27    39:23     16:09
NCI-641295       126     25          74           266:19    95:57     34:48
NCI-609067: National Cancer Institute USA, Reg. No. NCI-609067 (AIDS drug lead)
TOSLAB A2-0261: TOSLAB company (Russia-Belgium), Reg. No. TOSLAB A2-0261 (antiphlogistic drug lead)
NCI-641295: National Cancer Institute USA, Reg. No. NCI-641295 (AIDS drug lead)
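
The slide's title is "Speedup", but the table lists raw execution times. A minimal sketch of the conversion, assuming the usual definitions (speedup = one-node time / n-node time, efficiency = speedup / n); the helper names `to_seconds`, `speedup` and `efficiency` are made up for this example and are not part of MultiGen or Open TS:

```cpp
#include <cassert>
#include <string>

// Convert a "min:s" string from the table (e.g. "115:27") to seconds.
int to_seconds(const std::string& mmss) {
    std::size_t colon = mmss.find(':');
    assert(colon != std::string::npos);
    return std::stoi(mmss.substr(0, colon)) * 60 +
           std::stoi(mmss.substr(colon + 1));
}

// Speedup: how many times faster the n-node run is than the 1-node run.
double speedup(const std::string& t1, const std::string& tn) {
    return static_cast<double>(to_seconds(t1)) / to_seconds(tn);
}

// Parallel efficiency: speedup divided by the number of nodes (1.0 is ideal).
double efficiency(const std::string& t1, const std::string& tn, int nodes) {
    assert(nodes > 0);
    return speedup(t1, tn) / nodes;
}
```

For the first row, 9:33 and 1:22 are 573 s and 82 s, so `speedup("9:33", "1:22")` is roughly 7.0 and `efficiency("9:33", "1:22", 16)` roughly 0.44.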

Aeromechanics
Institute of Mechanics, MSU

Creating space-borne radar image from hologram
[Figure: two plots, y-axis 0 to 45, x-axis 1 to 28; axis labels not recoverable]

Simulating broadband radar signal
Graphical User Interface
Non-PSI RAS development team (Space research institute of Khrunichev corp.)
[Figure: two plots, y-axis 0 to 300, x-axis 1 to 28; axis labels not recoverable]

Landsat Image Classification
Computational “web-service”

Future Work
Multi-core CPU support
Distributed computing
Schedulers
Transport
Interface to web-services
Fault-tolerance
Optimizing for modern CPUs
Algorithmic skeletons, patterns and high-level parallel libraries

Out of Presentation Scope
Other T-languages: T-Refal, T-Fortran
Memoization
Automatically choosing between call-style and fork-style of function invocation
Checkpointing
Heartbeat mechanism
Flavours of data references: “normal”, “glue” and “magnetic”, i.e. lazy, eager and ultra-eager (speculative) data transfer

Other Software Efforts

Roshydromet: Losev’s weather forecast model

GRID TESTBED
Network of virtual machines (classes, users, etc.)
Total peak performance: 79 GFlops
Linux “crippled” distribution, auto-update, monitoring

CHEMICAL DOCKING with T-GRID
“Customer”: Faculty of Bioinformatics, MSU
Looking for a drug candidate among a large set of substances
[Diagram: a target with candidate substances 1), 2), 3) ...]

ACKNOWLEDGEMENTS
“SKIF” supercomputing project
Russian Academy of Science grants
  Program “High-performance computing systems on new principles of computational process organization”
  Program of Presidium of Russian Academy of Science “Development of basics for implementation of distributed scientific informational-computational environment on GRID technologies”
Russian Foundation for Basic Research “05-07-08005-офи_а”
Microsoft: contract for “Open TS vs MPI” case study

THANKS
ANY QUESTIONS?

Tests: NASA CG, NASA EP, FIB
[Figure: efficiency (y-axis, 0% to 100%) vs. number of CPUs (x-axis, 0 to 35); series: CG class A, EP, fib(45)]

EP @ OpenTS benchmark
Embarrassingly parallel
Recursive implementation
Two parameters
  size: number of operations in task ~ 2^size
  depth: number of grains (t-function calls) = 2^depth
Number of operations per grain ~ 2^(size-depth)
Allows stressing the runtime
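
The relation between size, depth and grain size can be checked with a small sequential sketch. It is not the benchmark source: `EpStats` and `ep_split` are hypothetical names, and the recursive calls that T++ would declare as t-functions are ordinary calls here. Each level of recursion halves the work and doubles the number of grains.

```cpp
#include <cassert>
#include <cstdint>

struct EpStats {
    std::uint64_t grains = 0;      // leaf calls, i.e. units of parallel work
    std::uint64_t operations = 0;  // operations summed over all grains
};

// Split a task of 2^size operations into 2^depth grains by binary recursion.
// In T++ each recursive call would be a t-function; here it is sequential.
void ep_split(int size, int depth, EpStats& stats) {
    assert(depth >= 0 && size >= depth && size < 63);
    if (depth == 0) {
        stats.grains += 1;
        // Stand-in for the real kernel: just count its 2^size operations.
        stats.operations += std::uint64_t{1} << size;
        return;
    }
    ep_split(size - 1, depth - 1, stats);
    ep_split(size - 1, depth - 1, stats);
}
```

Calling `ep_split(20, 6, stats)` on a fresh `EpStats` yields 64 grains and 2^20 operations in total, that is 2^14 operations per grain. Raising depth at fixed size makes grains smaller and more numerous, which is how the benchmark puts pressure on the runtime.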

Additional EPs
The same T++ source code linked with different RTL extensions
EP: standard, with dynamic load balance
EP_ASYNC: “asynchronous”, data exchange interrupts calculation
EP_GS: “grid scheduler”, minimize load deviation when assigning a task
EP_GS_ASYNC: “grid scheduler” with “asynchronous” data exchange

EP metrics
M is calculated as
1. 2^size / time / number of CPUs
2. Take % of the best over all experiments
Good metric: approximately the same on a single CPU with depths between 6 and 12
Cluster: 16 dual Athlon 1800MP+, Fast Ethernet
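
The two-step definition of M can be written out as follows. This is a sketch under the assumption that "best" means the largest raw value across all runs; the `Run` struct and both function names are illustrative and do not come from the benchmark.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct Run {
    int size;        // task size parameter: about 2^size operations
    double seconds;  // measured wall-clock time
    int cpus;        // number of CPUs used
};

// Step 1: raw throughput per CPU, 2^size / time / number of CPUs.
double raw_metric(const Run& r) {
    assert(r.seconds > 0.0 && r.cpus > 0);
    return std::ldexp(1.0, r.size) / r.seconds / r.cpus;
}

// Step 2: express every run as a percentage of the best raw value.
std::vector<double> metric_percent(const std::vector<Run>& runs) {
    std::vector<double> raw;
    for (const Run& r : runs) raw.push_back(raw_metric(r));
    std::vector<double> percent;
    if (raw.empty()) return percent;
    const double best = *std::max_element(raw.begin(), raw.end());
    for (double v : raw) percent.push_back(100.0 * v / best);
    return percent;
}
```

Because step 1 divides by the number of CPUs, a run that scales perfectly keeps the same raw value as the single-CPU run, so M reads directly as parallel efficiency relative to the best observed configuration.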

EP results
For all size in [28,32], depth in [6,12]: M = 99.9% if Ncpu = 1
M drops below 90% if Ncpu > 8 for size=6
On 32 CPUs EP_GS_ASYNC is the best with M = 88.2%, depth=12, size=32