Upload
others
View
10
Download
0
Embed Size (px)
Citation preview
Farms,Pipes,StreamsandReforestationType-DirectedParallelisation
DavidCastro,KevinHammond andSusmit SarkarUniversityofStAndrews,UK
IFIPWorkingGroup2.11Meeting, Bloomington, Indiana,23/8/16
RePhrase Project:RefactoringParallelHeterogeneousSoftware– aSoftwareEngineeringApproach(ICT-644235),2015-2018,€3.6Mbudget
8Partners,6EuropeancountriesUK,Spain, Italy,Austria,Hungary,Israel
CoordinatedbyKevinHammond, StAndrews
0
ParaFormance Project: ParallelPatterns forHeterogeneousMulticoreSystems(ICT-288570),2015-2018,£537Kbudget
Goal:FormationofHigh-Growth Company ofScaleby2023
Coordinated byKevinHammond, StAndrews
TheProblem
• Weneedtochoosethebestparallelabstractions• Algorithmicskeletons [Cole1989]implementpatterns
• Weneedaformalwaytoreasonaboutparallelstructure§ Correctness oftransformations§ Reasoningaboutperformance
4
f1f1
f1
f2f2
f2
f1f1
f1
f2
ExampleSkeleton:ParallelTaskFarms
§ TaskFarmsuseafixednumberofworkers§ Eachworkerappliesthesameoperation(f)§ f isappliedtoeachoftheinputsinastream.
5
f
f…
f
…,x10,x9,x8
x6
x7
x5
f x3
f x2
f x4
fx1,fx0,…
ExampleSkeleton:ParallelPipeline
§ Parallelpipelinescomposetwooperations(f andg)§ overtheelementsofaninputstream§ f and g areruninparallel
6
f g…x9x8
g(fx1)g(fx0)
fx5fx4
g(fx2)fx6
fx3x7
7
Example
ImageMerge
8
Imagemergingcomposestwooperations,merge andmark
Possibleimplementationsinclude:
ChoosinganImplementation
9
DecoratethefunctiontypewithIM(n,m)
where
Nowthetypesystemautomaticallyselects
Wecanguarantee thatthisisfunctionallyequivalent toimgMerge
Inferringparallelstructures
10
Wecanleaveholesinthetypes,e.g.
IM(n,m)=_||FARMm_
replaces_withthesimplestpossiblestructures
IM(n,m)=mincost(_||FARMm_)
replaces_withtheleastcoststructures.
Wecanchoosetheprovablyleastcostskeleton
11
BasicSemantics
SyntaxofSkeletons
12
funT liftsanatomicfunctiontoacollectiontypeTdc representsdivideandconquerovercollectionTfb introducesfeedback
SkeletonDenotational Semantics
13
Basesemantics,S(⍴ isaglobalenvironmentoffunctiondefns)
Liftedtoastreamingform,PovercollectiontypeT
Morphisms forDivide-and-Conquer
14
Catamorphism (fold)
Anamorphism (unfold)
Morphisms forStreams
15
Givenabifunctor,G,mapsovercollectionTare
Iteration
16
Easytodefineusingthefix-pointcombinator,Yf=f(Yf)
17
GeneralisingRecursionPatterns
Hylomorphisms
18
map,cata andana arejustspecialcasesofhylomorphisms
Hylomorphisms aregeneralrecursionpatterns
ForhyloF fg μF recursivecalltreeg howinputsaresplitf howresultsarecombined
Example:Quicksort
19
or
AlltheWorld’saHylomorphism!
20
21
StructureinTypes
IntroducingParallelPatterns
§ Thetypesystemusesastructure-equivalencerelationthatdescribeswhentwoprogramsareextensionallyequivalent.
§ Thetype-checkingalgorithmneedstodecidethesestructure-equivalences.
§ Thetype-checkingalgorithmalsoneedstounifystructures,modulothisstructure-equivalencerelation.
ProgramStructure Targetparallelstructure
Sequentialnormalisedstructure
Sequentialnormalisedstructure
?
SyntaxofStructuredTypes
23
Structure-AnnotatedTypeRules
24
Convertibility
25
Plussomeotherrulesderivedfromthehylomorphism laws.Weusethistoproduceaconfluentrewritingsystem.
ParallelismErasure
26
Rewriterulesderivedfromconvertibility
Repeatedtoproduceaconfluentrewritingsystem
Normalisation
27
Therewriterulesarederivedfrombasiclaws
Usedtodefineanormalisation procedure
Wecannowproveequivalenceoftwoparalleltermsby:i) erasingparallelismusingerase,ii) normalising usingnorm,andiii) testingforequivalence
…
28
Example
QuickSort Revisited
29
Startwithasequentialversion
Tocreateaparalleldivide-and-conquerversion,weneedtodecide
Thisiseasilydoneusingasimpleparallelismerasure
Inferrring MoreComplexParallelStructure
30
Nowconsideramorecomplexstructure
ParallelismerasureontheRHSgives
Normalisation oftheLHS(usingHYLO-SPLITetc)gives
Normalisation oftheRHSgives
Inferrring MoreComplexParallelStructure(2)
31
Weneedtounify thenormalised forms
and
Inferrring MoreComplexParallelStructure(3)
32
Substitutingbackgivesusthedesiredparallelform
Wecanthenuseequivalencetogivetheactualprogram
Costs
33
For1000listsof30,000,000elements
Predictedv.ActualSpeedup
ImageConvolutionof500imagesontitanic,a 2.4GHz24-core,AMDOpteron6176architecture,runningCentosLinux2.6.18-274.e15.Dashedlinesarepredictions.
34
7.3 Image convolution
Image convolution is widely used in image processing applications. The convo-lution algorithm is the composition of two functions read and process. Thefunction read reads an image from a file, and process processes the image.
tread, tproc : TimingIC, BIC : SkelTyIC = Pipe (Farm (Func {ti=tread})) (Farm (Func {ti=tproc}))BIC = bestInst titanic IC
imageConv1 : Par BIC FilePath ImgimageConv1 = skel [readI, processI]
imageConv2 : Auto titanic FilePath ImgimageConv2 = bestSkel [readI, processI] [tread, tproc]
1 2 4 6 8 10 12 14 16 18 20 22
12
4
6
8
10
12
14
16
18
20
22
n2
Workers
Speedup
Pipe (Farm 6 Func) (Farm n2
Func)
Farm n2
(Seq Func Func)
Farm n2
(Pipe Func Func)
(a) titanic
1 4 8 16 24 32 38 46 54 62
14
8
12
16
20
24
28
32
36
40
44
n2
Workers
Farm n2
(Seq Func Func)
Pipe (Farm 12 Func) (Farm n2
Func)
Farm n2
(Pipe Func Func)
(b) lovelace
Fig. 6. Di↵erent Parallel Structures for Image Convolution, 500 Images 1024 * 1024
8 Related Work
Work has also gone into boiling down skeletons into a small set (included in theskeletons we consider) which can express a variety of patterns [17].
Scaife et al [33] present the design and implementation of a parallelisingcompiler that automatically extracts parallelism for Standard ML. They exploitparallelism in the familiarmap and fold HOFs by using nested parallel skeletons.
Refactoring. Roughly, rewriting systems consist of a set of objects with somerelations on how to transform those objects. Rewriting rules for transformingdi↵erent parallel skeletons into other kinds of parallelism have been used forthe implementation of di↵erent refactoring techniques [1, 2, 11]. In this work weprovide a formalisation of rewriting rules for parallel skeletons that allow us toensure that no incorrect parallel structure is introduced.
1 2 4 6 8 10 12 14 16 18 20 22
12
4
6
8
10
12
14
16
18
n Workers
Speedup
Farm n Func
N = 1024N = 2048
(a) Matrix Multiplication
1 2 3 4 5 6 7 8 9 10 11 12
2
3
4
5
6
n2
Workers
Speedup
Farm n (Pipe Func Func)
(b) Image Merge
1 2 4 6 8 10 12 14 16 18 20 22
12
4
6
8
10
12
14
16
18
20
n2
Workers
Speedup
Pipe (Farm n
1
Func) (Farm n
2
Func)
n1
= 1n1
= 2n1
= 4n1
= 6n1
= 8
(c) Image Conv
Fig. 5. Speedups vs predictions (titanic).
35
Conclusion
Conclusions
• First-ever treatment ofparallelismthatreflects parallelstructure intypes
• Severaladvantages toexposingparallelstructure intypes• clearseparation between thestructureandthefunctionality• documentation ofhowaprogramwasparallelized• easytochangetheparallelstructure ofaprogramwithoutmodifyingthe
functionalbehaviour
• Reasoningaboutcostsofdifferent parallelstructure isverypowerful§ Automatically findsuitableparallelstructures§ Compile-time informationabouttherun-time behaviour§ Automatically rewriteprogramstominimizecosts
36
FutureWork
• Otherpatterns,e.g.stencilandbulksynchronousparallelism
• Moredetailedcostmodels(seee.g.Hammondetal,2016)
• DynamicAnalysisisalsopossible
• Allow(certainkindsof)sideeffectsintheworkers
• Implementback-ends.Runourstructuredprograms!
• Largercasestudies
37
PaperAvailableonRequest
38
ToappearinICFP2016
THANKYOU!
http://rephrase-ict.eu
@rephrase_eu
http://paraphrase-ict.eu
39