Enhancing the Role of Inlining in Effective Interprocedural Parallelization


Jichi Guo, Mike Stiles, Qing Yi, Kleanthis Psarris

Problem

• Inter-procedural parallelization
  o Parallelize after inlining
    • Gain more parallelizable loops
    • Loss of parallelized loops
  o Inlining messes up caller / callee
    • Missed parallel opportunities
  o Inlining increases code complexity

Goal

• Keep the gained parallelizable loops
• Prevent the loss of parallelism
• Discover the missed opportunities

Solution

• Summarize the code using annotation
  o Express the underlying information
• Inline the annotation before parallelization
  o Pass the summarized information to the compiler
• Reverse-inline after parallelization
  o Revert inlining side effects
  o Maintain equivalence

Outline

• Innovations
• Problems of parallel + inline strategy
• Annotation language
• Annotation-based inlining technique
• Experiments
• Summary


Problems of parallel + inlining

• Parallel + inlining
  o Conventional inlining with heuristics and pre-transformations
    • Heuristics: code size
    • Transformations: linearization, forward substitution
  o Intra-procedural loop parallelization
    • Fortran do-all loop
• Goal
  o Gain loops in caller
• Problems
  o Lost loops in caller / callee
  o Missed loops in caller


• Loss of parallelizable loops in caller / callee
  o Transformations that cause the loss
    • Forward substitution
    • Linearization
• Forward substitution of non-linear subscripts
  o Create indirect array references
• Linearization of array dimensions
  o Mess up array shapes


• Forward substitution of non-linear subscripts
  o Create indirect array references
    X2(I) ⇒ T(IX(7) + I)
    Y2(I) ⇒ T(IX(8) + I)
    Z2(I) ⇒ T(IX(9) + I)
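The effect of this substitution on dependence testing can be sketched in Python. This is a minimal illustration under a simplified model, not the test used by any particular compiler: `may_overlap` is a hypothetical helper, and a base of `None` stands for an offset such as IX(7) whose value is not known at compile time.

```python
from typing import Optional

def may_overlap(base_a: Optional[int], base_b: Optional[int], n: int) -> bool:
    """Conservative test: can T(base_a + I) and T(base_b + I), I = 1..n,
    touch the same element of T?

    A base of None models an offset read from IX at run time.
    """
    if base_a is None or base_b is None:
        return True  # unknown offset: a conflict must be assumed
    # Two windows of length n overlap exactly when their bases are closer than n.
    return abs(base_a - base_b) < n

# Before substitution X2, Y2, Z2 are distinct arrays, so no test is needed.
# After substitution they are all windows into the same array T.
known = may_overlap(0, 100, 100)          # constant, well-separated offsets
unknown = may_overlap(None, None, 100)    # offsets hidden behind IX(7), IX(8)
```

With constant offsets the windows are provably disjoint; with offsets read from IX the same loop has to be treated as carrying a possible dependence.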


• Linearization of array dimensions
  o Mess up array shapes
    PP(i, j, k) ⇒ PP(i + j*4 + k*16)
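The mapping above can be reproduced in a few lines of Python. This is a sketch that assumes zero-based subscripts and an extent of 4 in every dimension (the third extent is not stated on the slide); `linearize` is a hypothetical helper name.

```python
from itertools import product

def linearize(i: int, j: int, k: int, d1: int = 4, d2: int = 4) -> int:
    """Flatten a 3-D subscript into one offset: i + j*4 + k*16 for 4x4 leading dims."""
    return i + j * d1 + k * d1 * d2

# Within the declared bounds every (i, j, k) maps to a distinct offset.
in_bounds = {linearize(i, j, k) for i, j, k in product(range(4), repeat=3)}

# Once the shape is gone, nothing in the flat subscript itself rules out
# i = 4, so two different subscripts can name the same element.
collision = linearize(4, 0, 0) == linearize(0, 1, 0)
```

The flat form is only one-to-one when the loop bounds are known to respect the original extents, which is the shape information that linearization discards.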


• Missed parallelizable loops in caller
  o Coding styles that cause the misses
• Opaque compositional subroutines
  o A calls B, B calls C, C calls D, …
• Array access
  o When it is difficult to determine which part is killed
• Debugging and error checking
  o Statement that breaks the dependency is never executed
• I/O statements
• Indirect array references
  o ID = IDX(I), X = A(ID)


• Opaque compositional subroutines
  o A calls B, B calls C, C calls D, …


• Array access
  o Difficult to determine which part is killed
    • CTR computed at runtime


• Debugging and error checking
  o Statement that breaks the dependency is never executed
• I/O statements


• Indirect array references
  o IN => NODE
  o NODE => IREL
  o IREL => RHSB
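Why such a chain blocks parallelization can be sketched in Python. The array contents below are invented for illustration, and `writes_are_disjoint` is a hypothetical helper: a loop that writes through the chain is a do-all loop only if every iteration reaches a different element, a property that depends on run-time data.

```python
def writes_are_disjoint(idx: list) -> bool:
    """True when no two iterations write the same element, i.e. idx is one-to-one."""
    return len(set(idx)) == len(idx)

# Hypothetical index arrays; the real contents are only known at run time.
node = [2, 0, 1, 3]
irel = [5, 7, 6, 4]

# Element of RHSB touched by each iteration after following the chain.
composite = [irel[n] for n in node]

safe = writes_are_disjoint(composite)
unsafe = writes_are_disjoint([0, 1, 1, 2])  # two iterations hit element 1
```

A static compiler cannot run this check, so without extra information it has to assume the unsafe case.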


The annotation language

• Goal
  o Summarize information
  o Avoid ambiguity


• Restricted grammar
• Special operators
• Writing annotations


• Restricted grammar
  o Do-all loop only
  o No goto


• Special operators
  y = operator(x1, x2, …, xn)
  Purpose: abstract relation
  o Unknown operator
    • Relation is unknown
  o Generic functions
  o Unique operator
    • Relation is one-to-one, from X to Y
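What the two relations tell a dependence test can be sketched in Python. This is an illustrative model only, not the annotation language's actual semantics or syntax: the `Relation` enum and `loop_carried_conflict_possible` are hypothetical names.

```python
from enum import Enum

class Relation(Enum):
    UNKNOWN = "unknown"  # nothing is known about how y depends on x
    UNIQUE = "unique"    # one-to-one: distinct x always give distinct y

def loop_carried_conflict_possible(subscript_relation: Relation) -> bool:
    """Can A(op(i)) written in iteration i collide with A(op(j)) for some j != i?"""
    # Only a one-to-one subscript guarantees that different iterations
    # touch different elements.
    return subscript_relation is not Relation.UNIQUE

via_unknown = loop_carried_conflict_possible(Relation.UNKNOWN)
via_unique = loop_carried_conflict_possible(Relation.UNIQUE)
```

Declaring a subscript as unique supplies exactly the fact that a compiler cannot derive from an indirect reference on its own.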


• Writing annotations
  o Eliminating adverse side effects
    • Preserve caller and callee if inlining breaks the dependency
  o Summarize opaque subroutines
    • Eliminate nested function calls
  o Array access
    • Specify the exact range that gets read/modified
  o Debugging and error handling
    • Aggressive strategy: ignore checking statements
  o Indirect array references
    • Discover unique relation


• Summarize opaque subroutines
  o Eliminate nested function calls


• Array access
  o Specify the exact range that gets read/modified


• Debugging and error handling
  o Aggressive strategy: ignore checking statements


• Indirect array references
  o Discover unique relation


Annotation-based inlining

• Goal
  o Pass annotated information to the compiler
  o Eliminate inlining side effects
• Flow
  o Inline before parallelization
  o Reverse-inline after parallelization
  o Verify and evaluate at the end
• Implementation
  o POLARIS compiler for parallelization
  o ROSE compiler for parsing
  o POET transformer
  o PERFECT benchmark


• Workflow
  o Annotation inlining ⇒ Parallelization ⇒ Reverse-inlining
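The three stages can be sketched as a toy Python pipeline. This is an assumption-laden illustration rather than the actual POLARIS/ROSE/POET tool chain: each string stands for one loop body, the annotation text is made-up syntax, and all function names are hypothetical.

```python
def inline_annotation(bodies, annotations):
    """Stage 1: replace each call by its annotation summary, remembering the call."""
    return [
        {"summary": annotations.get(b, b),
         "original_call": b if b in annotations else None}
        for b in bodies
    ]

def parallelize(loops, is_parallel):
    """Stage 2: mark loops whose (summarized) body passes the parallelism test."""
    return [dict(loop, parallel=is_parallel(loop["summary"])) for loop in loops]

def reverse_inline(loops):
    """Stage 3: put the original call back, keeping the parallel marking."""
    return [(loop["original_call"] or loop["summary"], loop["parallel"])
            for loop in loops]

program = ["call B(x)", "call IO()"]
annotations = {"call B(x)": "modify x(i)"}  # hypothetical summary of B

# Toy test: a body is parallelizable when it contains no opaque call.
restored = reverse_inline(
    parallelize(inline_annotation(program, annotations),
                lambda body: "call" not in body))
```

The loop calling B ends up marked parallel while still containing the original call, and the loop with no annotation is left untouched.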


• Inlining annotation
  o Steps
    • Annotation ⇒ source language
      o Translating special operators
    • Inlining generated source language
      o Avoiding linearization
  o Translating special operators
    • Unknown: using uninitialized global arrays
    • Unique: using linear expression
  o Avoiding linearization


• Inlining annotation


• Parallelize do-all loops


• Reverse inlining


• Reverse inlining is indispensable
  o Inlining is restored to function call
    • Avoid loss of parallelism in caller / callee
    • Enable abstraction operators (unknown, unique)


• Verification and evaluation
  o Correctness, Efficiency, and Generality


Experiment

• Purpose
  o What does conventional inlining bring to parallelization?
    • Gain?
    • Loss?
    • Missed?
  o How well does annotation-based inlining avoid the above issues?
• Design
  o PERFECT benchmarks (except SPEC77)
  o Two machines
    • 8-core Intel Mac
    • 4-core AMD Opteron
  o End compiler
    • GFortran 4.2.1
    • IFort 11.1
• Result
  o Count of loops
  o Performance

Experiment

• Result: Loops
  o Conventional inlining
    • Having loss
  o Annotation-based inlining
    • No loss, more gain

Experiment

• Result: Performance
  o Average speedup is limited
  o Annotation-based inlining is always better

Summary

• Inter-procedural parallelization
• Summarize effects of conventional inlining
  o Gain
  o Loss
  o Missed
• Propose annotation-based inlining
  o Annotation summary
  o Enhanced inlining strategy
  o Reverse inlining

Thanks! Questions?
