
Enhancing the Role of Inlining in Effective Interprocedural Parallelization


Jichi Guo, Mike Stiles, Qing Yi, Kleanthis Psarris


Problem
  • Inter-procedural parallelization
    • Parallelize after inlining
  • Gain: more parallelizable loops
  • Lost: previously parallelized loops
    • Inlining messes up caller / callee
  • Missed: parallel opportunities
    • Inlining increases code complexity

Goal
  • Keep the gained parallelizable loops
  • Prevent the lost parallelism
  • Discover the missed opportunities

Solution
  • Summarize the code using annotation
    • Express the underlying information
  • Inline the annotation before parallelization
    • Pass the summarized information to the compiler
  • Reverse-inline after parallelization
    • Revert inlining side effects
    • Maintain equivalence
  • Keywords: annotation + inline + reverse

Outline
  • Innovations
  • Problems of parallel + inline strategy
  • Annotation language
  • Annotation-based inlining technique
  • Experiments
  • Summary

Problems of parallel + inlining
  • Parallel + inlining
    • Conventional inlining with heuristics and pre-transformations
      • Heuristics: code size
      • Transformations: linearization, forward substitution
    • Intra-procedural loop parallelization
      • Fortran do-all loop
  • Goal
    • Gain loops in caller
  • Problems
    • Lost loops in caller / callee
    • Missed loops in caller

Problems of parallel + inlining
  • Loss of parallelizable loops in caller / callee
  • Transformations that cause the loss
    • Forward substitution
    • Linearization
  • Forward substitution of non-linear subscripts
    • Creates indirect array references
  • Linearization of array dimensions
    • Messes up array shapes

Problems of parallel + inlining
  • Forward substitution of non-linear subscripts
    • Creates indirect array references
      • X2(I) => T(IX(7) + I)
      • Y2(I) => T(IX(8) + I)
      • Z2(I) => T(IX(9) + I)

Problems of parallel + inlining
  • Linearization of array dimensions
    • Messes up array shapes
      • PP(i, j, k) => PP(i + j*4 + k*16)
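The linearization on this slide can be illustrated with a minimal sketch. It assumes 0-based subscripts and extents of 4 in the first two dimensions, which is what the strides 1, 4, and 16 imply; the function names and the sample subscripts are hypothetical and not taken from the deck.

```python
# Assumed extents: the strides (1, 4, 16) imply a shape of 4 x 4 x n.
DIM_I, DIM_J = 4, 4

def linearize(i, j, k):
    """Map a 0-based (i, j, k) subscript to the flat offset i + j*4 + k*16."""
    return i + j * DIM_I + k * DIM_I * DIM_J

def delinearize(offset):
    """Recover (i, j, k) from a flat offset; only valid while i and j are in range."""
    k, rest = divmod(offset, DIM_I * DIM_J)
    j, i = divmod(rest, DIM_I)
    return i, j, k

# In-range subscripts round-trip without loss.
round_trip = delinearize(linearize(3, 2, 5))

# Once the per-dimension bounds are gone, two different subscripts can map
# to the same flat offset, so a dependence test must assume they may alias.
alias_a = linearize(4, 0, 0)  # i is outside the assumed extent
alias_b = linearize(0, 1, 0)
```

After linearization the compiler sees a single subscript expression and no longer knows that `i` and `j` stay below 4, which is one way to read "messes up array shapes".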

Problems of parallel + inlining
  • Missed parallelizable loops in caller
  • Coding styles that cause missed loops
    • Opaque compositional subroutines
      • A calls B, B calls C, C calls D, ...
    • Array access
      • Difficult to determine which part is killed
    • Debugging and error checking
      • Statement that breaks the dependency is never executed
      • I/O statements
    • Indirect array references
      • ID = IDX(I), X = A(ID)

Problems of parallel + inlining
  • Opaque compositional subroutines
    • A calls B, B calls C, C calls D, ...

Problems of parallel + inlining
  • Array access
    • Difficult to determine which part is killed
    • CTR computed at runtime

Problems of parallel + inlining
  • Debugging and error checking
    • Statement that breaks the dependency is never executed
    • I/O statements

Problems of parallel + inlining
  • Indirect array references
    • IN => NODE
    • NODE => IREL
    • IREL => RHSB

The annotation language
  • Goal
    • Summarize information
    • Avoid ambiguity

The annotation language
  • Restricted grammar
  • Special operators
  • Writing annotations

The annotation language
  • Restricted grammar
    • Do-all loop only
    • No goto

The annotation language
  • Special operators
    • y = operator(x1, x2, ..., xn)
    • Purpose: abstract a relation
    • Unknown operator
      • Relation is unknown
      • Generic functions
    • Unique operator
      • Relation is one-to-one, from X to Y

The annotation language
  • Writing annotations
    • Eliminating adverse side effects
      • Preserve caller and callee if inlining breaks the dependency
    • Summarize opaque subroutines
      • Eliminate nested function calls
    • Array access
      • Specify the exact range that gets read / modified
    • Debugging and error handling
      • Aggressive strategy: ignore checking statements
    • Indirect array references
      • Discover unique relation

The annotation language
  • Summarize opaque subroutines
    • Eliminate nested function calls

The annotation language
  • Array access
    • Specify the exact range that gets read / modified
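To make the value of exact read / modified ranges concrete, here is a minimal sketch of how such a summary could be used to decide whether a loop is do-all. The summary format (half-open ranges per iteration) and all names are assumptions made for illustration; the deck does not show the annotation syntax or the compiler's actual test.

```python
def ranges_overlap(a, b):
    """True when two half-open index ranges (lo, hi) share an element."""
    return a[0] < b[1] and b[0] < a[1]

def loop_is_do_all(summary, trip_count):
    """summary(i) returns (read_range, write_range) for iteration i.

    The loop is treated as do-all when no iteration writes an element
    that another iteration reads or writes.
    """
    s = [summary(i) for i in range(trip_count)]
    for p in range(trip_count):
        for q in range(p + 1, trip_count):
            read_p, write_p = s[p]
            read_q, write_q = s[q]
            if (ranges_overlap(write_p, write_q)
                    or ranges_overlap(write_p, read_q)
                    or ranges_overlap(read_p, write_q)):
                return False
    return True

# Hypothetical callee that touches only its own block of 8 elements.
blocked = lambda i: ((8 * i, 8 * i + 8), (8 * i, 8 * i + 8))

# Hypothetical callee that also reads one element of the next block.
leaky = lambda i: ((8 * i, 8 * i + 9), (8 * i, 8 * i + 8))

blocked_ok = loop_is_do_all(blocked, 4)
leaky_ok = loop_is_do_all(leaky, 4)
```

Without the summary, a compiler facing an opaque callee has to assume the whole array may be read and written, which is the conservative case the slide is trying to avoid.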

The annotation language
  • Debugging and error handling
    • Aggressive strategy: ignore checking statements

The annotation language
  • Indirect array references
    • Discover unique relation
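Why a unique (one-to-one) relation matters for indirect references can be shown with a small sketch. It models a loop of the form `A(IDX(I)) = V(I)` and compares two iteration orders; the helper names and sample data are hypothetical, and the check is a plain runtime test, not the annotation mechanism itself.

```python
def is_one_to_one(index_map):
    """True when no two iterations map to the same array element."""
    return len(set(index_map)) == len(index_map)

def scatter(index_map, values, size, order):
    """Run `out[index_map[i]] = values[i]` for i in the given iteration order."""
    out = [0] * size
    for i in order:
        out[index_map[i]] = values[i]
    return out

values = [10, 20, 30, 40]

# One-to-one index array: every iteration writes a different element,
# so the result does not depend on the iteration order.
unique_idx = [2, 0, 3, 1]
forward_unique = scatter(unique_idx, values, 4, range(4))
backward_unique = scatter(unique_idx, values, 4, reversed(range(4)))

# Two iterations hit element 1: the last writer wins, so order matters.
clashing_idx = [1, 1, 0, 2]
forward_clash = scatter(clashing_idx, values, 4, range(4))
backward_clash = scatter(clashing_idx, values, 4, reversed(range(4)))
```

A static compiler usually cannot prove that `IDX` is one-to-one, which is presumably why the deck lets the annotation author state it with the unique operator.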

Annotation-based inlining
  • Goal
    • Pass annotated information to the compiler
    • Eliminate inlining side effects
  • Flow
    • Inline before parallelization
    • Reverse-inline after parallelization
    • Verify and evaluate last
  • Implementation
    • POLARIS compiler for parallelization
    • ROSE compiler for parsing
    • POET transformer
    • PERFECT benchmark

Annotation-based inlining
  • Workflow
    • Annotation inlining => Parallelization => Reverse-inlining

Annotation-based inlining
  • Inlining annotation
    • Steps
      • Annotation source language
      • Translating special operators
      • Inlining generated source language
      • Avoiding linearization
    • Translating special operators
      • Unknown: using uninitialized global arrays
      • Unique: using linear expression
    • Avoiding linearization

Annotation-based inlining
  • Inlining annotation

Annotation-based inlining
  • Parallelize do-all loops

Annotation-based inlining
  • Reverse inlining

Annotation-based inlining
  • Reverse inlining is indispensable
    • Inlined code is restored to a function call
    • Avoids loss of parallelism in caller / callee
    • Enables abstraction operators (unknown, unique)

Annotation-based inlining
  • Verification and evaluation
    • Correctness, efficiency, and generality
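The three-stage workflow (annotation inlining, parallelization, reverse inlining) can be sketched as a toy pass over a list of statement strings. Everything here is an assumption for illustration: the call site, the annotation text, the `!begin` / `!end` markers, and the `!parallel` directive are hypothetical, and the parallelization step is a stand-in for the decision a real parallelizing compiler would make.

```python
CALL = "call update(a, i)"               # hypothetical call site
ANNOTATION = ["a(i) = unknown(a(i))"]    # hypothetical summary of `update`
BEGIN, END = "!begin " + CALL, "!end " + CALL

def inline_annotation(body):
    """Replace each call with its annotation, bracketed by markers."""
    out = []
    for stmt in body:
        if stmt == CALL:
            out += [BEGIN, *ANNOTATION, END]
        else:
            out.append(stmt)
    return out

def parallelize(body):
    """Stand-in for the parallelizer: tag every loop header as parallel."""
    out = []
    for stmt in body:
        if stmt.startswith("do "):
            out.append("!parallel")
        out.append(stmt)
    return out

def reverse_inline(body):
    """Restore the original call in place of each marked annotation region."""
    out, skipping = [], False
    for stmt in body:
        if stmt == BEGIN:
            out.append(CALL)
            skipping = True
        elif stmt == END:
            skipping = False
        elif not skipping:
            out.append(stmt)
    return out

caller = ["do i = 1, n", CALL, "end do"]
inlined = inline_annotation(caller)
result = reverse_inline(parallelize(inlined))
```

The point of the markers is that the annotation body, including abstraction operators such as `unknown`, never reaches the final program: only the parallel directive survives, and the caller ends up with its original call.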

Experiment
  • Purpose
    • What does conventional inlining bring to parallelization?
      • Gain?
      • Lost?
      • Missed?
    • How well does annotation-based inlining avoid the above issues?
  • Design
    • PERFECT benchmarks (except SPEC77)
    • Two machines
      • 8-core Intel Mac
      • 4-core AMD Opteron
    • End compiler
      • GFortran 4.2.1
      • IFort 11.1
  • Result
    • Count of loops
    • Performance

Experiment
  • Result: loops
    • Conventional inlining
      • Loops are lost
    • Annotation-based inlining
      • No loss, more gain

Experiment
  • Result: performance
    • Average speedup: limited
    • Annotation-based inlining: always better

Summary
  • Inter-procedural parallelization
  • Summarize effects of conventional inlining
    • Gain
    • Lost
    • Missed
  • Propose annotation-based inlining
    • Annotation summary
    • Enhanced inlining strategy
    • Reverse inlining

Thanks! Questions?
