Analyse Globalisée des Données d'Imagerie Radiologique (Global Analysis of Radiological Imaging Data)
Deploying medical image processing workflows on a computing grid
Tristan Glatard - Johan Montagnat - Xavier Pennec
CNRS I3S / INRIA Sophia-Antipolis
AGIR – Paristic'06 – LORIA – 23 November 2006 – www.aci-agir.org
Application
• Application: statistical comparison of medical imaging
algorithms
• Constraints/needs:
– Sharing algorithms from different institutes
– Sharing the data between the algorithms
– Computing power
• Solutions:
– Grid execution to share data and answer computing needs
– Service Oriented Architecture to share algorithms
– Workflow of services to describe the application
Service Oriented Architectures (SOA)
• 3 basic roles:
[Figure: the three roles and the messages they exchange]
• A service satisfies 3 properties:
(1) The interface of the service is platform-independent
(2) The service can be dynamically located and invoked
(3) A service does not call another service (loose coupling)
Service-based workflow
• Graph description:
– Input/output of the application
– Data dependencies between service inputs and outputs
– Iteration strategy over the inputs of each service
– Data synchronization barriers
• Instantiation on data at execution time
[Figure: one-to-one and all-to-all iteration strategies between Set 0 (I0, I1, I2) and Set 1 (J0, J1, J2)]
[Figure: example workflow with Input 1, Input 2, Services 1 to 4, and Output 1]
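The two iteration strategies can be sketched in a few lines of Python. This is only an illustration of the semantics shown on the slide, not the workflow manager's code; the function names `one_to_one` and `all_to_all` are chosen here for the example.

```python
from itertools import product

def one_to_one(set0, set1):
    """Pair items by rank: (I0, J0), (I1, J1), ..."""
    if len(set0) != len(set1):
        raise ValueError("one-to-one composition needs sets of equal size")
    return list(zip(set0, set1))

def all_to_all(set0, set1):
    """Pair every item of set0 with every item of set1."""
    return list(product(set0, set1))

set0 = ["I0", "I1", "I2"]
set1 = ["J0", "J1", "J2"]
pairs_one_to_one = one_to_one(set0, set1)   # 3 pairs
pairs_all_to_all = all_to_all(set0, set1)   # 3 x 3 = 9 pairs
```

One-to-one depends on the order of both sets, which is why ordering matters once results come back from parallel jobs; all-to-all does not.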
Parallelism in service workflows
• 3 kinds of parallelism can be exploited: workflow parallelism, data parallelism, service parallelism
[Figure: three panels (workflow parallelism, data parallelism, service parallelism), each showing services S1, S2, S3 processing data D0, D1]
• Data and service parallelism are intrinsic in task graphs:
[Figure: task graph with one node per (data, service) pair: (D0,S1), (D0,S2), (D0,S3), (D1,S1), (D1,S2), (D1,S3)]
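As a rough illustration of why data and service parallelism come for free in a task graph, the sketch below expands a service graph into one task per (data, service) pair and lists which tasks are ready to run. The graph shape (S1 feeding both S2 and S3) is an assumption made for the example, since the figure's edges did not survive extraction; `expand_to_task_graph` and `ready_tasks` are hypothetical helper names.

```python
def expand_to_task_graph(service_parents, data_items):
    """Return {(data, service): set of (data, service) tasks it depends on}."""
    tasks = {}
    for d in data_items:
        for service, parents in service_parents.items():
            # A task only depends on the same data item in the parent services.
            tasks[(d, service)] = {(d, p) for p in parents}
    return tasks

def ready_tasks(tasks, done):
    """Tasks not yet run whose dependencies are all completed."""
    return sorted(t for t, deps in tasks.items()
                  if t not in done and deps <= done)

# Assumed service graph: S1 feeds S2 and S3.
service_parents = {"S1": [], "S2": ["S1"], "S3": ["S1"]}
tasks = expand_to_task_graph(service_parents, ["D0", "D1"])

# Data parallelism: S1 can process D0 and D1 at the same time.
first_wave = ready_tasks(tasks, done=set())
# Service parallelism: once (D0, S1) is done, S2 and S3 can start on D0
# while S1 is still busy with D1.
after_d0_s1 = ready_tasks(tasks, done={("D0", "S1")})
```

In a service workflow the same concurrency has to be obtained by the workflow manager at run time, because the graph is written once over services and only instantiated on data during execution.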
Iteration strategies in a parallel workflow
• One-to-one operators assume ordered data sets
• No problem if:
– Data parallelism is not present (order is preserved)
– Service parallelism is not present
• One-to-one operator in a data + service parallel execution:
– Keep track of the data graph
– Two data segments are composed if and only if they have a common ancestor
– Groups have to be defined between the workflow inputs
[Figure: one-to-one composition of Set 0 (I0, I1, I2) with Set 1 (J0, J1, J2)]
Data handling
• Data segments have to be stored within a tree:
[Figure: services representation and data representation]
• This data representation makes it possible to:
– Retrieve the provenance of results
– Handle one-to-one iteration strategies even when data segments arrive out of order
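A minimal sketch of such a data tree follows, assuming each segment records the segments it was computed from and each workflow input carries a group identifier. The `Segment` class, the `group` field and the helper names are hypothetical; they only illustrate the idea of matching segments through their ancestry instead of their arrival order.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Segment:
    name: str
    parents: Tuple["Segment", ...] = ()
    group: Optional[int] = None  # set only on workflow inputs

def provenance(seg):
    """All ancestors of a segment, nearest first."""
    found = []
    for p in seg.parents:
        found.append(p)
        found.extend(provenance(p))
    return found

def groups(seg):
    """Groups of the workflow inputs this segment derives from."""
    if not seg.parents:
        return {seg.group} if seg.group is not None else set()
    out = set()
    for p in seg.parents:
        out |= groups(p)
    return out

def one_to_one(left, right):
    """Pair segments that share an input group, whatever their order."""
    return [(l.name, r.name) for l in left for r in right
            if groups(l) & groups(r)]

# Inputs I_k and J_k are declared in the same group k.
i0, i1 = Segment("I0", group=0), Segment("I1", group=1)
j0, j1 = Segment("J0", group=0), Segment("J1", group=1)
a0, a1 = Segment("A(I0)", (i0,)), Segment("A(I1)", (i1,))
b0, b1 = Segment("B(J0)", (j0,)), Segment("B(J1)", (j1,))

# Results of service A arrive out of order; the pairing is still correct.
pairs = one_to_one([a1, a0], [b0, b1])
```

The quadratic matching loop is kept for readability; an index from group to segment would do the same job for larger data sets.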
Grid execution
[Figure: the workflow manager, Service A (Input 0, Input 1, Output 0) and Service B (Input 0, Output 0), data items Data 0 to Data 2 with image references Img Ref 0 to Img Ref 2, and a grid interface in front of the grid resources]
• The workflow manager is isolated from the grid:
• Prototyping on Grid'5000
• Production on EGEE
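One way to picture this isolation is a thin submission interface that the workflow manager talks to, with one implementation per infrastructure. The sketch below is an assumption-laden illustration, not the actual design: `GridInterface`, `LocalThreads` and `WorkflowManager` are hypothetical names, and a local thread pool stands in for Grid'5000 or EGEE.

```python
from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor

class GridInterface(ABC):
    """Hypothetical boundary between the workflow manager and a grid."""

    @abstractmethod
    def submit(self, func, *args):
        """Submit one job and return a handle exposing .result()."""

class LocalThreads(GridInterface):
    """Stand-in backend: runs jobs in local threads instead of a grid."""

    def __init__(self, workers=4):
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def submit(self, func, *args):
        return self._pool.submit(func, *args)

class WorkflowManager:
    """Knows only the interface, never the infrastructure behind it."""

    def __init__(self, backend):
        self.backend = backend

    def run_service(self, service, data_items):
        # One job per data item (data parallelism); results keep input order.
        handles = [self.backend.submit(service, d) for d in data_items]
        return [h.result() for h in handles]

manager = WorkflowManager(LocalThreads())
results = manager.run_service(lambda x: x * 2, [1, 2, 3])
```

Swapping the prototyping backend for the production one then only means passing a different `GridInterface` implementation to the manager.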
Performance analysis on EGEE
• Performance results are worse than expected:
– High latency: one measurement comparing EGEE to Grid'5000 (G5K) clusters:
    Infrastructure                    Latency (s)
    Grid5000-Grenoble (20 nodes)      0.48
    Grid5000-Sophia (105 nodes)       8.25
    EGEE-biomed VO (3000 nodes)       351.4
– Variable latencies among the jobs
• Model of the makespan of the application:
– The latency is modeled by a random variable (R)
[Formula not recovered; its annotations read "on the services of the critical path" and "on the data segments"]
Impact of the variability of the latency
• The variability of the latency leads to a factor 2
performance drop
Job Grouping Experiments
• Medical imaging application:
– 6 services
– 2 grouped pairs
– 4 job submissions per input data set
• Sub-workflow:
– 4 services
– 3 grouped pairs
– 1 job submission per input data set
• Tested on 12, 66 and 126 input data sets
The grouping rule
• Let A be a service of the workflow and {B_0, ..., B_n} its children
• For grouping A and B_i0: no parallelism loss <=> (1) & (2)
– (1) B_i0 is an ancestor of every B_j (j ≠ i0)
– (2) Every ancestor of B_i0 is an ancestor of A (or A itself)
• No parallelism loss => (1) & (2)
– ¬(1) => parallelism between B_j and B_i0 is broken
– ¬(2) => parallelism between A and C is broken, where C is an ancestor of B_i0 that is neither A nor an ancestor of A
• (1) & (2) => no parallelism loss
• This rule is recursively applied on the workflow graph
[Figure: two illustrations, one with A, B_j and B_i0, the other with A, C and B_i0]
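Conditions (1) and (2) translate directly into a graph check. The sketch below assumes the workflow is given as a dictionary mapping each service to the set of its parents; `ancestors`, `children` and `can_group` are names introduced for this example, and only a single application of the rule is shown, not the recursive pass over the whole graph.

```python
def ancestors(parents, node):
    """All services reachable by following parent links from node."""
    seen = set()
    stack = list(parents.get(node, ()))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents.get(p, ()))
    return seen

def children(parents, node):
    return {n for n, ps in parents.items() if node in ps}

def can_group(parents, a, b):
    """True if grouping a with its child b loses no parallelism."""
    if b not in children(parents, a):
        return False
    siblings = children(parents, a) - {b}
    # (1) b is an ancestor of every other child of a
    cond1 = all(b in ancestors(parents, s) for s in siblings)
    # (2) every ancestor of b is a itself or an ancestor of a
    cond2 = ancestors(parents, b) <= (ancestors(parents, a) | {a})
    return cond1 and cond2

chain = {"A": set(), "B": {"A"}}
fork = {"A": set(), "B1": {"A"}, "B2": {"A"}}   # violates (1)
join = {"A": set(), "C": set(), "B": {"A", "C"}}  # violates (2)
```

The fork case is rejected because grouping A with B1 would delay B2, and the join case because B would have to wait for C inside the same job as A.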
Performance results
• Speed-ups given by job grouping w.r.t classical
wrapping:
Grouping jobs of the same service
• Optimization of the task granularity
• Trade-off between parallelism and the probability of facing high latencies
• Model and notations:
– Total CPU time of the task to execute: w
– Split into n jobs
– Random latency: R
– Makespan: H(n) = max_{i=1..n}(R_i) + w/n
  (the first term, max R_i, increases w.r.t. n; the second term, w/n, decreases w.r.t. n)
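The trade-off can be explored numerically with a small Monte Carlo estimate of the expected makespan, max of n latencies plus w/n. This is a sketch under the assumption that the n latencies are independent draws from the same distribution; the function name, the exponential distribution and the 350 s mean in the example are illustrative choices, not measured values.

```python
import random

def simulate_makespan(w, n, sample_latency, runs=1000, seed=0):
    """Average of max_i(R_i) + w/n over `runs` random draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        worst_latency = max(sample_latency(rng) for _ in range(n))
        total += worst_latency + w / n
    return total / runs

# Sanity check with a constant latency: the makespan is exactly 10 + 100/4.
constant = simulate_makespan(100, 4, lambda rng: 10.0, runs=10)

# Illustrative run: a 2000 s task split into 10 jobs, with exponentially
# distributed latencies (the distribution and its mean are assumptions).
estimate = simulate_makespan(2000, 10, lambda rng: rng.expovariate(1 / 350.0))
```

Sweeping n with a sampler fitted to observed latencies would show where the gain from a smaller w/n stops paying for the growing worst-case latency.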
Uniform distribution of R on [a, b]
• Two behaviors of the expectation of the makespan are observed:
– w > (b-a): low variability of the latency
– w < (b-a): high variability of the latency
[Figure: two plots of the makespan (s) against the number of submitted jobs n, one per case; one plot also shows the total execution time (s)]
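For uniform latencies the expectation has a closed form, assuming the n latencies are independent: the expected maximum of n draws from U(a, b) is a + (b-a)·n/(n+1), so the expected makespan is a + (b-a)·n/(n+1) + w/n. Setting its derivative to zero gives n/(n+1) = sqrt(w/(b-a)), which has a solution only when w < (b-a); otherwise the expectation keeps decreasing with n. The sketch below uses hypothetical function names and example values chosen to show both cases.

```python
def expected_makespan_uniform(n, w, a, b):
    """E[max of n independent U(a, b) latencies] + w/n."""
    return a + (b - a) * n / (n + 1) + w / n

def best_n(w, a, b, n_max):
    """Number of jobs in 1..n_max minimizing the expected makespan."""
    return min(range(1, n_max + 1),
               key=lambda n: expected_makespan_uniform(n, w, a, b))

# w < (b - a): an interior optimum exists (here n = 3).
n_high_variability = best_n(w=900, a=0, b=1600, n_max=30)

# w > (b - a): splitting as much as allowed is best (here n = 30).
n_low_variability = best_n(w=2000, a=0, b=1000, n_max=30)
```

In the high-variability case, submitting more jobs than the optimum increases the expected makespan, which is the motivation for limiting the number of submitted jobs.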
Impact on performance
• Experiment: submission of a 2000 s task on EGEE with two partitioning strategies:
– Brute-force strategy: n is maximal (n=30)
– Improved strategy: ň is the value of n that minimizes the expected makespan E_H(n)
• 30% drop in the number of submitted jobs with the improved strategy
• Difference between the improved and brute-force strategies (seconds):
[Figure not reproduced; residual label: n=0..30]
Conclusions
• Current work on the variability of the latency:
– Timeout optimization
– Dynamic estimation of the distribution of the latency on EGEE
• Implementation of MOTEUR available at:
http://www.i3s.unice.fr/~glatard