Recent Advances in Parallel Computing Technologies at PSI RAS RCMS
Program Systems Institute, Russian Academy of Sciences (PSI RAS)
Alexander Moskovsky, Sergei Abramov
06/09/05, Pereslavl-Zalessky



Supercomputing Project “SKIF”

Open TS: an advanced tool for parallel and distributed computing.

SKIF Supercomputing Project
  • Joint project of the Russian Federation and the Republic of Belarus
  • 2000-2004
  • 10 + 10 organizations
  • PSI RAS is the lead organization from the Russian Federation
  • Hardware and software

Open TS Overview

T-System History
  • Mid-1980s: basic ideas of T-System
  • 1990s: first implementation of T-System
  • 2001-2002, “SKIF”: GRACE (Graph Reduction Applied to Cluster Environment)
  • 2003-current, “SKIF”: Open TS (Open T-system)

Comparison: T-System and MPI

              Sequential    Parallel
  High-level  C/Fortran     T-System (a few keywords)
  Low-level   Assembler     MPI (hundreds of primitives)

Related work
  • Parallel Programming Using C++ (Scientific and Engineering Computation), Gregory V. Wilson (Editor), Paul Lu (Editor)
  • ABC++, Amelia, CC++, CHAOS++, COOL, C++//, ICC++, Mentat, MPC++, MPI++, pC++, POOMA, TAU, UC++

T-System in Comparison

  Related work               Open TS differentiator
  Charm++                    FP-based approach
  UPC, mpC++                 Implicit parallelism
  Glasgow Parallel Haskell   Allows C/C++ based low-level optimization
  OMPC++                     Provides both a language and a C++ template library
  Cilk                       Supports SMP, MPI, PVM, and GRID platforms

Open TS: an Outline
  • High-performance computing
  • “Automatic dynamic parallelization”
  • Combining functional and imperative approaches, high-level parallel programming
  • T++ language: a “parallel dialect” of C++, an approach popular in the 1990s

T-Approach
  • “Pure” function (tfunction) invocations produce grains of parallelism
  • A T-program is
      - functional on the higher level
      - imperative on the low level (optimization)
  • C-compatible execution model
  • Non-ready variables, multiple assignment
  • “Seamless” C-extension (or Fortran-extension)
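The notion of a non-ready variable can be loosely illustrated in standard C++ with std::promise and std::future. This is only an analogy under stated assumptions, not Open TS code: the helper names `is_ready` and `non_ready_demo` are hypothetical, and the Open TS runtime handles readiness implicitly rather than through explicit promise objects.

```cpp
#include <chrono>
#include <future>

// Returns true if the future already holds a value (the variable is "ready").
bool is_ready(const std::future<int>& f) {
    return f.wait_for(std::chrono::seconds(0)) == std::future_status::ready;
}

// Walks through the life cycle of a non-ready value: it is created empty,
// made ready by the producer, and then read by the consumer.
// Returns the value on success and -1 if the readiness states were unexpected.
int non_ready_demo(int value) {
    std::promise<int> producer;                         // producer side
    std::future<int> consumer = producer.get_future();  // consumer side, not ready yet
    bool ready_before = is_ready(consumer);             // false: nothing assigned yet
    producer.set_value(value);                          // the value becomes ready
    bool ready_after = is_ready(consumer);              // true
    // get() would block while the value is not ready; here it returns at once.
    int result = consumer.get();
    return (!ready_before && ready_after) ? result : -1;
}
```

In this sketch the reader of the value blocks only at the point where the value is actually needed, which is the property the slide attributes to non-ready variables.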

T++ Keywords
  • tfun: T-function
  • tval: T-variable
  • tptr: T-pointer
  • tout: output parameter (like &)
  • tdrop: make ready
  • twait: wait for readiness
  • tct: T-context

Sample Program

    #include <stdio.h>
    #include <stdlib.h>   /* atoi */

    /* Each call to a tfun is a potential grain of parallelism. */
    tfun int fib (int n) {
      return n < 2 ? n : fib(n-1) + fib(n-2);
    }

    tfun int main (int argc, char **argv) {
      if (argc != 2) { printf("Usage: fib <n>\n"); return 1; }
      int n = atoi(argv[1]);
      printf("fib(%d) = %d\n", n, (int)fib(n));
      return 0;
    }
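For comparison, the same recursion can be sketched in standard C++ with std::async. This is not how Open TS works internally. It is a hand-written approximation in which the programmer, not the runtime, decides where to spawn tasks. The names `fib_seq` and `fib_par` and the `depth` cutoff parameter are assumptions introduced for illustration.

```cpp
#include <future>

// Sequential version, used below the cutoff to keep grains coarse.
int fib_seq(int n) { return n < 2 ? n : fib_seq(n - 1) + fib_seq(n - 2); }

// Parallel version: one branch runs as an asynchronous task while the current
// thread computes the other. `depth` limits how many recursion levels spawn
// tasks (a hypothetical tuning parameter, not part of the T++ sample).
int fib_par(int n, int depth) {
    if (n < 2) return n;
    if (depth <= 0) return fib_seq(n);
    std::future<int> left =
        std::async(std::launch::async, fib_par, n - 1, depth - 1);
    int right = fib_par(n - 2, depth - 1);
    return left.get() + right;  // blocks until the spawned branch is ready
}
```

A call such as `fib_par(30, 4)` spawns at most 15 tasks. The explicit cutoff and the explicit `get()` are the kind of bookkeeping that the `tfun` keyword is meant to hide.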

Open TS: Environment
  • Supports 1,000,000 threads per CPU

NPB, Test EP Rewritten @OpenTS
  • EP: Embarrassingly Parallel
  • NASA Parallel Benchmarks suite
  • Speedup = 96% of theoretical maximum (on 10 nodes)
[Chart: time, % of sequential; efficiency, % of theoretical]

Open TS vs MPI case study

Applications
  • Popular and widely used
  • Developed by independent teams (MPI experts)
  • PovRay: Persistence of Vision Ray-tracer, enabled for parallel run by a patch
  • ALCMD/MP_lite: molecular dynamics package (Ames Lab)

T-PovRay vs MPI PovRay: code complexity

  Program                                          Source code volume
  MPI modules for PovRay 3.10g                     1,500 lines
  MPI patch for PovRay 3.50c                       3,000 lines
  T++ modules (for both versions 3.10g & 3.50c)    200 lines

T-PovRay vs MPI PovRay: performance
[Chart: Time MPI / Time OpenTS (scale 90% to 210%) against number of processors (1 to 16)]
Cluster: 16 dual Athlon 1800 (AMD Athlon MP 1800+), RAM 1GB, FastEthernet, LAM 7.0.6

T-PovRay vs MPI PovRay: performance
[Chart: Time MPI / Time OpenTS (scale 90% to 210%) against number of processors (1 to 16)]
Cluster: 2 CPUs AMD Opteron 248 2.2 GHz, RAM 4GB, GigE, LAM 7.1.1

ALCMD/MPI vs ALCMD/OpenTS
  • MP_Lite component of ALCMD rewritten in T++
  • Fortran code is left intact
[Diagram: two software stacks. MPI version: ALCMD on MP_Lite on MPI. OpenTS version: ALCMD on MP_Lite on OpenTS.]

ALCMD/MPI vs ALCMD/OpenTS: code complexity

  Program                          Source code volume
  MP_Lite total/MPI                ~20,000 lines
  MP_Lite, ALCMD-related/MPI       ~3,500 lines
  MP_Lite, ALCMD-related/OpenTS    500 lines

ALCMD/MPI vs ALCMD/OpenTS: performance
[Chart: Time MPI / Time OpenTS (scale 80% to 110%) against number of processors (1 to 16)]
Cluster: 16 dual Athlon 1800 (AMD Athlon MP 1800+), RAM 1GB, FastEthernet, LAM 7.0.6; Lennard-Jones MD, 512000 atoms

ALCMD/MPI vs ALCMD/OpenTS: performance
[Chart: Time MPI / Time OpenTS (scale 80% to 110%) against number of processors (1 to 16)]
Cluster: 2 CPUs AMD Opteron 248 2.2 GHz, RAM 4GB, GigE, LAM 7.1.1; Lennard-Jones MD, 512000 atoms

ALCMD/MPI vs ALCMD/OpenTS: performance
[Chart: Time MPI / Time OpenTS (scale 80% to 110%) against number of processors (1 to 16)]
Cluster: 2 CPUs AMD Opteron 248 2.2 GHz, RAM 4GB, InfiniBand, MVAMPICH 0.9.4; Lennard-Jones MD, 512000 atoms

Open TS applications

T-Applications
  • MultiGen: biological activity estimation
  • Remote sensing applications
  • Plasma modeling
  • Protein simulation
  • Aeromechanics
  • Query engine for XML
  • AI applications
  • etc.

MultiGen (Chelyabinsk State University)
[Diagram: multi-conformation model. Level 0: K0. Level 1: K11, K12. Level 2: K21, K22.]

MultiGen: Speedup

  Substance        Atoms  Rotations  Conformers  Execution time (min:s)
                                                 1 node   4 nodes  16 nodes
  NCI-609067       28     4          13          9:33     3:21     1:22
  TOSLAB A2-0261   82     18         49          115:27   39:23    16:09
  NCI-641295       126    25         74          266:19   95:57    34:48

  • NCI-609067: National Cancer Institute (USA) registration number, AIDS drug lead
  • TOSLAB A2-0261: TOSLAB company (Russia-Belgium) registration number, antiphlogistic drug lead
  • NCI-641295: National Cancer Institute (USA) registration number, AIDS drug lead
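The speedup implied by the execution times can be worked out with a few lines of C++. The sketch below assumes the usual definitions, speedup = T(1 node) / T(N nodes) and efficiency = speedup / N. The function names are hypothetical and are not part of MultiGen or Open TS.

```cpp
#include <string>

// Converts a "min:sec" string such as "9:33" into seconds.
int to_seconds(const std::string& mmss) {
    std::string::size_type colon = mmss.find(':');
    int minutes = std::stoi(mmss.substr(0, colon));
    int seconds = std::stoi(mmss.substr(colon + 1));
    return minutes * 60 + seconds;
}

// Speedup = T(1 node) / T(N nodes).
double speedup(const std::string& t1, const std::string& tn) {
    return static_cast<double>(to_seconds(t1)) / to_seconds(tn);
}

// Efficiency = speedup / N.
double efficiency(const std::string& t1, const std::string& tn, int nodes) {
    return speedup(t1, tn) / nodes;
}
```

For NCI-609067, `speedup("9:33", "3:21")` is 573 s / 201 s, about 2.85 on 4 nodes, and `speedup("9:33", "1:22")` is 573 s / 82 s, about 6.99 on 16 nodes.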

Aeromechanics (Institute of Mechanics, MSU)

Creating space-borne radar image from hologram
[Chart: vertical axis 0 to 45, horizontal axis 1 to 28]

Simulating broadband radar signal
  • Graphical User Interface
  • Non-PSI RAS development team (Space research institute of Khrunichev corp.)
[Chart: vertical axis 0 to 300, horizontal axis 1 to 28]

Landsat Image Classification
  • Computational “web-service”

Future Work
  • Multi-core CPU support
  • Distributed computing
      - Schedulers
      - Transport
      - Interface to web-services
  • Fault-tolerance
  • Optimizing for modern CPUs
  • Algorithmic skeletons, patterns and high-level parallel libraries

Out of Presentation Scope
  • Other T-languages: T-Refal, T-Fortran
  • Memoization
  • Automatically choosing between call-style and fork-style of function invocation
  • Checkpointing
  • Heartbeat mechanism
  • Flavours of data references: “normal”, “glue” and “magnetic”, that is lazy, eager and ultra-eager (speculative) data transfer

Other Software Efforts

Roshydromet: Losev’s weather forecast model

GRID TESTBED
  • Network of virtual machines (classes, users, etc.)
  • Total peak performance: 79 GFlops
  • Linux “crippled” distribution, auto-update, monitoring

CHEMICAL DOCKING w T-GRID
  • “Customer”: Faculty of Bioinformatics, MSU
  • Looking for a drug candidate among a large set of substances
[Diagram: a target and candidate substances 1), 2), 3) ...]

ACKNOWLEDGEMENTS
  • “SKIF” supercomputing project
  • Russian Academy of Sciences grants
      - Program “High-performance computing systems on new principles of computational process organization”
      - Program of the Presidium of the Russian Academy of Sciences “Development of basics for implementation of distributed scientific informational-computational environment on GRID technologies”
  • Russian Foundation for Basic Research “05-07-08005-офи_а”
  • Microsoft: contract for “Open TS vs MPI” case study

THANKS
ANY QUESTIONS?

Open TS benchmarks

Tests: NASA CG, NASA EP, FIB
[Chart: efficiency (0% to 100%) against number of CPUs (0 to 35); series: CG class A, EP, fib(45)]

EP @ OpenTS benchmark
  • Embarrassingly parallel
  • Recursive implementation
  • Two parameters
      - size: number of operations in the task ~ 2^size
      - depth: number of grains (t-function calls) = 2^depth
  • Number of operations per grain ~ 2^(size-depth)
  • Allows stressing the runtime
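The relation between size, depth and grain count can be checked with a small sequential C++ sketch. It models only the recursion shape described on the slide, not the actual benchmark kernel. The `EpStats` struct and the `ep` function are hypothetical, and in T++ each recursive call would be a T-function invocation rather than an ordinary call.

```cpp
#include <cstdint>

// Result of a run: how many grains (leaf calls) were created and how many
// elementary operations they account for in total.
struct EpStats {
    std::uint64_t grains;
    std::uint64_t operations;
};

// Recursively halves the task `depth` times. Each leaf is one grain that
// accounts for 2^(size - depth) operations.
// Assumes depth <= size and size < 64.
EpStats ep(unsigned size, unsigned depth, unsigned level = 0) {
    if (level == depth) {
        return {1, std::uint64_t{1} << (size - depth)};
    }
    EpStats a = ep(size, depth, level + 1);
    EpStats b = ep(size, depth, level + 1);
    return {a.grains + b.grains, a.operations + b.operations};
}
```

With `ep(20, 6)` the sketch yields 64 grains and 2^20 operations in total, so raising depth at a fixed size produces more, smaller grains and puts more load on the runtime.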


Additional EPs
The same T++ source code linked with different RTL extensions:
  • EP: standard, with dynamic load balance
  • EP_ASYNC: “asynchronous”, data exchange interrupts calculation
  • EP_GS: “grid scheduler”, minimizes load deviation when assigning a task
  • EP_GS_ASYNC: “grid scheduler” with “asynchronous” data exchange

EP metrics
  • M is calculated as
      1. 2^size / time / number of CPUs
      2. Take % of the best over all experiments
  • Good metric: approximately the same on a single CPU with depths between 6 and 12
  • Cluster: 16 dual Athlon 1800MP+, Fast Ethernet
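The two-step definition of M can be written down as a short C++ sketch. The `Run` struct and the function names are assumptions made for illustration, and no measured data from the presentation is embedded in the code.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// One measurement: problem size exponent, wall-clock time in seconds, CPUs used.
struct Run {
    int size;
    double seconds;
    int cpus;
};

// Step 1: raw throughput per CPU, 2^size / time / number of CPUs.
double raw_metric(const Run& r) {
    return std::ldexp(1.0, r.size) / r.seconds / r.cpus;
}

// Step 2: M for each run, as a percentage of the best raw value over all runs.
std::vector<double> metric_percent(const std::vector<Run>& runs) {
    std::vector<double> m;
    for (const Run& r : runs) m.push_back(raw_metric(r));
    if (m.empty()) return m;
    double best = *std::max_element(m.begin(), m.end());
    for (double& value : m) value = 100.0 * value / best;
    return m;
}
```

As a made-up example, a run with size 30 that takes 100 s on 1 CPU and 12.5 s on 10 CPUs gives M = 100% and M = 80% respectively. Because the raw value is normalized per CPU, ideal scaling keeps M constant as CPUs are added.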

EP results
  • For all size in [28,32] and depth in [6,12]: M = 99.9% if Ncpu = 1
  • M drops below 90% if Ncpu > 8 for size=6
  • On 32 CPUs EP_GS_ASYNC is the best, with M = 88.2% at depth=12, size=32