JLS - Abr00 1
João Luís Sobral
Departamento de Informática
Universidade do Minho, Braga - Portugal

Scalable Object Oriented Parallel Programming (SCOOPP)
Object Oriented Concurrent Programming

• Concurrent and Parallel Programming
  – Concerned with efficiency
  – Abstractions
    • Processes and/or threads
    • Inter-process communication (IPC) and/or synchronisation
• Object Oriented Programming
  – Concerned with development and maintenance costs
  – Abstractions
    • Objects and classes
    • Method calls
Object Oriented Concurrent Programming

• Parallel Programming + Object Oriented Programming
  – Efficiency + low development and maintenance costs
  – Abstractions?
    • Processes and/or threads and inter-process communication?
    • Objects and method calls
• Design approaches
  – Centred on processes
  – Centred on objects
    • Object based
    • Method based
Object Oriented Concurrent Programming

• Process centred approach
  – Abstractions
    • Processes + IPC + objects + method calls
  – Drawbacks
    • Processes and IPC lack OO features
    • Static binding of objects to processes

[Figure: process centred model — objects statically bound to processes/threads on processing nodes, linked by inter-process communication]
Object Oriented Concurrent Programming

• Object centred approach - Method based
  – Abstractions
    • Objects + method calls + asynchronous method calls
  – Based on intra-object concurrency
  – Drawbacks
    • Low efficiency on distributed memory systems
• Object centred approach - Object based
  – Abstractions
    • Active objects + method calls + asynchronous method calls
  – Based on inter-object concurrency
  – Drawbacks
    • High implementation costs of active objects
SCOOPP overview (SCalable Object Oriented Parallel Programming)

• Main Goals
  – Support scalable, portable and efficient // applications
  – Support already developed sequential OO code
• Main Features
  – Support both explicit and implicit parallelism
  – Parallelism extraction
  – Excess parallelism removal at run-time (e.g., dynamic granularity control)
  – Hybrid compile-time and run-time system
SCOOPP overview

• Motivation
  – Parallelism grain-size has a strong impact on performance:
    • A larger number of parallel tasks may help the application scale and improve load distribution
    • If parallel tasks are too fine, performance may degrade due to parallelism overheads
  – Example (prime numbers sieve):
[Figure: execution time (seconds) vs. computation grain-size (filters), one curve per message size (10, 50, 100, 500 and 1000 values per message) — a) 4 x 350 MHz Pentium II (in Cluster); b) 56 x 30 MHz T805 (in MultiCluster)]
SCOOPP programming model

• Based on parallel and sequential objects
• Parallel objects are specified at class level
• Sequential objects are placed in a parallel object context
• Inter-parallel-object communication based on asynchronous or synchronous method invocation
• First-class references to parallel objects

[Figure: programming model — sequential objects, parallel objects, object references, parallel object contexts and parallel tasks]
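The distinction between synchronous and asynchronous method invocation on a parallel object can be mimicked in plain C++, without the ParC++ runtime. This is only an illustrative sketch: the Counter class and the use of std::async are our assumptions, and we assume a single outstanding call at a time (a real runtime would serialise concurrent calls on the object).

```cpp
#include <future>

// Hypothetical "parallel object": each method may be invoked
// synchronously (the caller waits for the result) or
// asynchronously (the caller continues and may collect the
// result later through a future).
class Counter {
    int value = 0;
public:
    // Synchronous invocation: behaves like a normal C++ call.
    int addSync(int n) { return value += n; }

    // Asynchronous invocation: returns a future the caller may
    // ignore (fire-and-forget) or wait on later with get().
    // Assumes one outstanding call at a time (no locking here).
    std::future<int> addAsync(int n) {
        return std::async(std::launch::async,
                          [this, n] { return addSync(n); });
    }
};
```

A caller holding only a reference to Counter cannot tell where the object lives, which is the point of the model: placement is the runtime's concern, not the programmer's.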
SCOOPP parallelism extraction

• Transforms sequential objects into parallel objects
• Adds further parallelism to parallel code
• Performs inter-object dependency analysis to preserve the determinism of the sequential code
• Example:

[Figure: parallelism extraction — sequential objects in a parallel object context transformed into parallel objects, preserving object references and parallel tasks]
SCOOPP dynamic grain-size adaptation

• Packing methodologies
  – How to adapt the grain-size?
  – Which tasks to pack?
• Packing policies
  – When to adapt the grain-size?
  – How many items to pack?
SCOOPP dynamic grain-size adaptation

• How to adapt the grain-size?
  – Pack // objects:
    • Associate a process with several // objects
    • Serialise intra-process operations:
      – Intra-grain method calls performed directly
      – Object creation/deletion performed directly
  – Pack method calls:
    • Include several method calls in a single message
  – Example:

[Figure: packing — parallel objects 1, 2 and 3 merged into an enlarged grain, served by a single parallel task]
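The "pack method calls" idea above can be sketched as a proxy that buffers outgoing calls and ships them as one message once enough have accumulated. The PackingProxy class and the sendMessage callback are illustrative assumptions, not the SCOOPP API; a real RTS would also marshal heterogeneous calls, not just integer arguments.

```cpp
#include <vector>
#include <functional>
#include <cstddef>
#include <utility>

// Sketch of method-call packing: each logical call is appended to a
// buffer; a physical message is generated only every packSize calls.
class PackingProxy {
    std::vector<int> buffer;     // pending call arguments
    std::size_t packSize;        // calls per message (the grain-size knob)
    std::function<void(const std::vector<int>&)> sendMessage; // stand-in IPC
public:
    PackingProxy(std::size_t n,
                 std::function<void(const std::vector<int>&)> send)
        : packSize(n), sendMessage(std::move(send)) {}

    // One logical method call; may or may not trigger a message.
    void call(int arg) {
        buffer.push_back(arg);
        if (buffer.size() >= packSize) flush();
    }

    // Ship a partial pack on demand (e.g., at a synchronisation point).
    void flush() {
        if (!buffer.empty()) { sendMessage(buffer); buffer.clear(); }
    }
};
```

With packSize = 1 this degenerates into one message per call (fine grain); raising packSize trades latency for fewer, larger messages, which is exactly the lever the RTS tunes at run-time.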
SCOOPP dynamic grain-size adaptation

• Which tasks to pack/unpack?
  – On // object creations
    • Pack a newly created // object into the source grain, or create a new remote grain
  – On method calls
    • Generate a new message for the method call, or pack it together with other method calls
  – Decisions based on run-time granularity information
SCOOPP dynamic grain-size adaptation

• Granularity information
  – Application independent
    • Latency of a remote "null-method" call
    • Inter-node communication bandwidth
  – Application dependent
    • Average overhead of method parameter passing
    • Average method execution time
    • Average method fan-out
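The application-dependent quantities above are running averages the RTS maintains as methods execute. One cheap way to keep such averages is an exponential moving average; the MethodStats class and the smoothing factor are our choices for illustration, not something the slides prescribe.

```cpp
// Sketch of per-method granularity bookkeeping: running averages of
// execution time and fan-out, updated after each method completion.
class MethodStats {
    double avgExecTime = 0.0;   // average method execution time
    double avgFanOut   = 0.0;   // average calls issued per method
    bool first = true;
public:
    void record(double execTime, double fanOut) {
        if (first) {            // seed the averages with the first sample
            avgExecTime = execTime;
            avgFanOut   = fanOut;
            first = false;
        } else {                // exponential moving average, factor 0.25
            const double a = 0.25;
            avgExecTime = a * execTime + (1 - a) * avgExecTime;
            avgFanOut   = a * fanOut   + (1 - a) * avgFanOut;
        }
    }
    double execTime() const { return avgExecTime; }
    double fanOut()   const { return avgFanOut; }
};
```

An EMA keeps O(1) state per method and adapts when the application's behaviour shifts, which matters because the packing decisions are taken continuously at run-time.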
SCOOPP dynamic grain-size adaptation

• When to adapt the grain-size?
  – When the overhead of a remote method call is larger than the average method execution time
  – When the system is "highly loaded"
• How many items to pack?
  – // objects: the remote call overhead should be less than the "remote work"
  – Method calls: the message-passing overhead should be less than the "remote work"
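The "how many items to pack" rule can be turned into a small calculation: choose the smallest pack size whose remote work exceeds the per-message overhead. This cost model (a fixed message overhead amortised over n calls of average execution time) is a simplification for illustration, not the exact SCOOPP formula.

```cpp
// Sketch: number of method calls to pack into one message so that
// msgOverhead < n * avgExecTime, i.e. the message-passing overhead
// stays below the "remote work" it triggers. Both arguments are in
// the same time unit (e.g. microseconds).
int callsPerMessage(double msgOverhead, double avgExecTime) {
    if (avgExecTime <= 0.0) return 1;   // nothing to amortise against
    int n = static_cast<int>(msgOverhead / avgExecTime) + 1;
    return n < 1 ? 1 : n;               // always send at least one call
}
```

For instance, with a 100 µs message overhead and 1 µs average method execution time, the sketch packs 101 calls per message; when methods are coarser than the overhead it falls back to one call per message.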
Current prototype (ParC++)

• Extension to C++
• Runs on Parix and PVM environments
• RTS:

[Figure: RTS structure across two processors (Processor 0 and Processor 1) — proxies, implementation objects, a server object and per-node object managers; a) inter-grain method call through IPC, b) intra-grain C++ method call, c) object creation through the RTS, d) direct C++ object creation]
• Example: prime number sieve
parallel class SieveFilter {
  SieveFilter *seg;               // next sieve in the pipeline
  int myVal;                      // my prime number
public:
  SieveFilter(int prime) { myVal = prime; seg = 0; }
  void Process(int num) {
    if ((num % myVal) == 0) ;             // nothing: myVal divides num
    else if (seg != 0) seg->Process(num); // may be a prime number
    else seg = new SieveFilter(num);      // num is a prime
  }
};

void openOn(void) {
  SieveFilter *firstSieve = new SieveFilter(3);  // create first sieve
  for (int i = 5; i < max; i += 2)
    firstSieve->Process(i);                      // send values
}
Current prototype (ParC++)

[Figure: resulting sieve pipeline — one parallel task (sieve filter) per prime found (2, 3, 5, ...), with the message flow (method calls) carrying candidate values from filter to filter]
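The sieve logic above can be checked with a plain sequential C++ rendering, dropping the ParC++ `parallel` keyword. The collectPrimes helper and the explicit max bound are additions for testing, not part of the original slide code.

```cpp
#include <vector>

// Sequential version of the slide's sieve: a chain of filters, one per
// prime, each forwarding values it cannot divide to the next filter.
class SieveFilter {
    SieveFilter *seg;   // next sieve in the pipeline
    int myVal;          // my prime number
public:
    SieveFilter(int prime) : seg(0), myVal(prime) {}
    ~SieveFilter() { delete seg; }          // tear down the chain
    void Process(int num) {
        if ((num % myVal) == 0) { /* nothing: myVal divides num */ }
        else if (seg != 0) seg->Process(num);   // may still be prime
        else seg = new SieveFilter(num);        // num is a new prime
    }
    void collect(std::vector<int> &out) const { // walk the filter chain
        out.push_back(myVal);
        if (seg != 0) seg->collect(out);
    }
};

// Odd primes below max, mirroring openOn() from the slide.
std::vector<int> collectPrimes(int max) {
    SieveFilter first(3);
    for (int i = 5; i < max; i += 2) first.Process(i);
    std::vector<int> out;
    first.collect(out);
    return out;
}
```

In ParC++ each filter becomes a parallel task, so this chain is precisely the pipeline whose grain-size (filters per grain, values per message) the RTS adapts.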
Evaluation results (on ParC++)

• Low level evaluation
  – Object management overhead costs (Myrinet cluster of 4 x 350 MHz Pentium II, times in µs)

                             Parallel OO                SCOOPP               Seq. OO
  Operation                  Active       Active        Passive  Passive    C++
                             (adj. node)  (same node)            (opt)
  Object creation            1670         118           1.70     1.70       0.88
  Synchronous method call    1548         103           0.69     0.18       0.06
  Asynchronous method call    128          72           -        -          -
  Object destruction          170          74           1.72     1.21       0.97
Evaluation results (on ParC++)

• High level evaluation - prime number sieve

[Figure: execution time (seconds) vs. computation grain-size (filters), one curve per message size (10, 50, 100, 500 and 1000 values per message), with the SCOOPP operating point marked on each panel — a) 4 x 350 MHz Pentium II (in Cluster); b) 7 x 350 MHz Pentium II (in Cluster); c) 4 x 66 MHz PPC 601 (in PowerXplorer); d) 16 x 66 MHz PPC 601 (in PowerXplorer); e) 14 x 30 MHz T805 (in MultiCluster); f) 56 x 30 MHz T805 (in MultiCluster)]
Concluding remarks

• SCOOPP main benefits:
  – Allows the expression of the full potential parallelism, following an OO approach, in a platform-independent way
  – Provides dynamic and efficient scalability across several target platforms, without any code modification
• Current and future work:
  – Packing policies for other types of applications
  – More case studies