Enhancing the Role of Inlining in Effective Interprocedural Parallelization
Jichi Guo, Mike Stiles, Qing Yi, Kleanthis Psarris
Problem
• Inter-procedural parallelization
  o Parallelize after inlining
    • Gain more parallelizable loops
    • Loss of parallelized loops
  o Inlining messes up caller / callee
    • Missed parallel opportunities
  o Inlining increases code complexity
Goal
• Keep the gained parallelizable loops
• Prevent the lost parallelism
• Discover the missed opportunities
Solution
• Summarize the code using annotations
  o Express the underlying information
• Inline the annotations before parallelization
  o Pass the summarized information to the compiler
• Reverse-inline after parallelization
  o Revert inlining side effects
  o Maintain equivalence
Outline
• Innovations
• Problems of parallel + inline strategy
• Annotation language
• Annotation-based inlining technique
• Experiments
• Summary
Problems of parallel + inlining
• Parallel + inlining
  o Conventional inlining with heuristics and pre-transformations
    • Heuristics: code size
    • Transformations: linearization, forward substitution
  o Intra-procedural loop parallelization
    • Fortran do-all loops
• Goal
  o Gain loops in caller
• Problems
  o Lost loops in caller / callee
  o Missed loops in caller
Problems of parallel + inlining
• Loss of parallelizable loops in caller / callee
  o Transformations that cause the loss
    • Forward substitution
    • Linearization
• Forward substitution of non-linear subscripts
  o Creates indirect array references
• Linearization of array dimensions
  o Messes up array shapes
Problems of parallel + inlining
• Forward substitution of non-linear subscripts
  o Creates indirect array references
    X2(I) ⇒ T(IX(7) + I)
    Y2(I) ⇒ T(IX(8) + I)
    Z2(I) ⇒ T(IX(9) + I)
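To illustrate why the substituted form is harder to analyze, the following Python sketch compares the elements of T touched by T(IX(7) + I) and T(IX(8) + I). The offset values stored in `ix`, the trip count `n`, and the helper names are hypothetical and chosen only for illustration. Before substitution, X2 and Y2 are separate arrays; afterwards, whether the two references overlap depends on values of IX that are known only at runtime.

```python
def touched(base, n):
    """Indices of T touched by T(base + I) for I = 1..n."""
    return {base + i for i in range(1, n + 1)}


def ranges_overlap(ix, slot_a, slot_b, n):
    """True if T(IX(slot_a) + I) and T(IX(slot_b) + I) share an element."""
    return bool(touched(ix[slot_a], n) & touched(ix[slot_b], n))


# Hypothetical offsets; a compiler sees IX(7), IX(8), IX(9) as unknowns.
disjoint_ix = {7: 0, 8: 100, 9: 200}
aliasing_ix = {7: 0, 8: 50, 9: 200}
n = 100

print(ranges_overlap(disjoint_ix, 7, 8, n))  # False: 1..100 vs 101..200
print(ranges_overlap(aliasing_ix, 7, 8, n))  # True: 1..100 vs 51..150
```

Both outcomes are consistent with the same source text, so a static dependence test has to assume the overlapping case.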
Problems of parallel + inlining
• Linearization of array dimensions
  o Messes up array shapes
    PP(i, j, k) ⇒ PP(i + j*4 + k*16)
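The mapping above can be sketched in Python as follows. The extents of 4 for the first two dimensions are inferred from the coefficients 4 and 16; zero-based indexing and the function names are assumptions made for this illustration. The point is that the flattened subscript is one-to-one only while i and j stay within their original bounds, which is the shape information the compiler loses after linearization.

```python
def linearize(i, j, k, dim1=4, dim2=4):
    """Offset of PP(i, j, k) in the flattened array (zero-based, assumed)."""
    return i + j * dim1 + k * dim1 * dim2


def delinearize(offset, dim1=4, dim2=4):
    """Recover (i, j, k) from a flattened offset, assuming in-bounds indices."""
    k, rest = divmod(offset, dim1 * dim2)
    j, i = divmod(rest, dim1)
    return i, j, k


# In-bounds indices round-trip through the flattened form.
print(delinearize(linearize(3, 2, 5)))

# Without the bound i < 4, two different subscripts hit the same element.
print(linearize(4, 0, 0) == linearize(0, 1, 0))
```

With the original three-dimensional shape, a dependence test can compare each dimension separately; with the flattened form it must first prove the bounds on i and j.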
Problems of parallel + inlining
• Missed parallelizable loops in caller
  o Coding styles that cause the missed loops
• Opaque compositional subroutines
  o A calls B, B calls C, C calls D, …
• Array access
  o When it is difficult to determine which part is killed
• Debugging and error checking
  o Statement that breaks the dependency is never executed
• I/O statements
• Indirect array references
  o ID = IDX(I), X = A(ID)
Problems of parallel + inlining
• Array access
  o Difficult to determine which part is killed
    • CTR computed at runtime
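One common reading of the "killed" problem: a work array can be treated as private to each iteration only if everything the iteration reads was first overwritten (killed) in that same iteration. The Python sketch below illustrates that check. The names `reads_are_killed`, `ctr`, and `read_hi`, and the 1-based contiguous ranges, are assumptions for illustration and are not taken from the benchmark code.

```python
def reads_are_killed(ctr, read_hi):
    """True if elements 1..read_hi are all overwritten by writes to 1..ctr."""
    written = set(range(1, ctr + 1))    # section written first in the iteration
    read = set(range(1, read_hi + 1))   # section read later in the same iteration
    return read <= written


print(reads_are_killed(ctr=10, read_hi=8))  # True: every read is covered
print(reads_are_killed(ctr=5, read_hi=8))   # False: elements 6..8 are not killed
```

When CTR is computed at runtime, this comparison cannot be evaluated at compile time, so the loop is conservatively left sequential.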
Problems of parallel + inlining
• Indirect array references
    IN ⇒ NODE
    NODE ⇒ IREL
    IREL ⇒ RHSB
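A minimal Python sketch of why indirect references block parallelization: a loop that writes A(IDX(I)) is a do-all loop only if IDX never repeats a value, and that is a property of runtime data. The helper name and the sample index arrays are hypothetical.

```python
def has_write_conflict(idx):
    """True if two iterations of `A[idx[i]] = ...` write the same element."""
    return len(set(idx)) != len(idx)


permutation = [3, 0, 2, 1]   # one-to-one: iterations touch distinct elements
repeated = [3, 0, 3, 1]      # iterations 0 and 2 both write element 3

print(has_write_conflict(permutation))  # False
print(has_write_conflict(repeated))     # True
```

A compiler that cannot see how the index array was built has to assume the second case.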
The annotation language
• Goal
  o Summarize information
  o Avoid ambiguity
The annotation language
• Restricted grammar
• Special operators
• Writing annotations
The annotation language
• Restricted grammar
  o Do-all loops only
  o No goto
The annotation language
• Special operators
    y = operator(x1, x2, …, xn)
    Purpose: abstract relation
  o Unknown operator
    • Relation is unknown
  o Generic functions
  o Unique operator
    • Relation is one-to-one, from X to Y
The annotation language
• Writing annotations
  o Eliminating adverse side effects
    • Preserve caller and callee if inlining breaks the dependency
  o Summarize opaque subroutines
    • Eliminate nested function calls
  o Array access
    • Specify the exact range that gets read/modified
  o Debugging and error handling
    • Aggressive strategy: ignore checking statements
  o Indirect array references
    • Discover unique relations
Annotation-based inlining
• Goal
  o Pass annotated information to the compiler
  o Eliminate inlining side effects
• Flow
  o Inline before parallelization
  o Reverse-inline after parallelization
  o Verify and evaluate at the end
• Implementation
  o POLARIS compiler for parallelization
  o ROSE compiler for parsing
  o POET transformer
  o PERFECT benchmark
Annotation-based inlining
• Workflow
  o Annotation inlining ⇒ Parallelization ⇒ Reverse-inlining
Annotation-based inlining
• Inlining annotation
  o Steps
    • Annotation ⇒ source language
      o Translating special operators
    • Inlining generated source language
      o Avoiding linearization
  o Translating special operators
    • Unknown: using uninitialized global arrays
    • Unique: using linear expression
  o Avoiding linearization
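The translation of the unique operator can be illustrated with a short Python sketch. It assumes that y = unique(i) is replaced by some linear expression stride*i + offset with a nonzero stride; the slides do not give the exact expression the tool emits, so the function names and default values here are hypothetical. Any such linear form is one-to-one, which is the property a dependence test needs to conclude that distinct iterations touch distinct elements.

```python
def unique_as_linear(i, stride=1, offset=0):
    """Stand-in for y = unique(i): a linear form with nonzero stride."""
    if stride == 0:
        raise ValueError("stride must be nonzero to stay one-to-one")
    return stride * i + offset


def is_one_to_one(f, iterations):
    """True if f maps every iteration to a distinct value."""
    values = [f(i) for i in iterations]
    return len(set(values)) == len(values)


print(is_one_to_one(unique_as_linear, range(1, 101)))   # True
print(is_one_to_one(lambda i: i % 10, range(1, 101)))   # False
```

The unknown operator is the opposite case: reading from an uninitialized global array tells the compiler nothing about the relation, so no independence can be inferred from it.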
Annotation-based inlining
• Inlining annotation
Annotation-based inlining
• Parallelize do-all loops
Annotation-based inlining
• Reverse inlining
Annotation-based inlining
• Reverse inlining is indispensable
  o Inlined code is restored to a function call
    • Avoids loss of parallelism in caller / callee
    • Enables abstraction operators (unknown, unique)
Annotation-based inlining
• Verification and evaluation
  o Correctness, efficiency, and generality
Experiment
• Purpose
  o What does conventional inlining bring to parallelization?
    • Gain?
    • Loss?
    • Missed?
  o How well does annotation-based inlining avoid the above issues?
• Design
  o PERFECT benchmarks (except SPEC77)
  o Two machines
    • 8-core Intel Mac
    • 4-core AMD Opteron
  o End compiler
    • GFortran 4.2.1
    • IFort 11.1
• Result
  o Count of loops
  o Performance
Experiment
• Result: Loops
  o Conventional inlining
    • Has losses
  o Annotation-based inlining
    • No loss, more gain
Experiment
• Result: Performance
  o Average speedup: limited
  o Annotation-based inlining: always better
Summary
• Inter-procedural parallelization
• Summarize effects of conventional inlining
  o Gain
  o Loss
  o Missed
• Propose annotation-based inlining
  o Annotation summary
  o Enhanced inlining strategy
  o Reverse inlining
Thanks!
Questions?