Outline
•IA-64™ Features
•Organization and infrastructure
•Components and technology
•Where we are going
•Opportunities for cooperation
IA-64 Features
•It is all about parallelism
 – at the process/thread level for the programmer
 – at the instruction level for the compiler
•Explicit parallel instruction semantics
•Predication and Control/Data Speculation
•Massive Resources (registers, memory)
•Register stack and its engine
•Software pipelining support
•Memory hierarchy management support
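Predication can be sketched at the source level. In the C fragment below (purely illustrative; the integer flag `p` stands in for an IA-64 predicate register), both arms of a conditional are computed and the predicate selects which result commits, so there is no branch to mispredict:

```c
#include <assert.h>

/* Branchy form: the hardware must predict the comparison. */
int max_branch(int a, int b) {
    if (a > b) return a;
    return b;
}

/* Predicated form: both arms are evaluated; the predicate p
 * (0 or 1, standing in for an IA-64 predicate register)
 * selects the result without a branch. */
int max_predicated(int a, int b) {
    int p = (a > b);               /* compare sets the predicate */
    return p * a + (1 - p) * b;    /* predicate selects the arm  */
}
```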
Intermediate Representation
IR is called WHIRL
•Common interface between components
•Multiple languages and multiple targets
•Same IR, 5 levels of representation
•Continuous lowering as compilation progresses
•Optimization strategy tied to level
Components
•Front ends
•Interprocedural analysis and optimization
•Loop nest optimization and parallelization
•Global optimization
•Code generation
IPA
Two-stage implementation
 – Local: gather local information at the end of the front-end process
 – Main: analysis and optimization
IPA Main Stage
Two phases in the main stage
Analysis:
•PIC symbol analysis
•Constant global identification
•Scalar mod/ref
•Array section
•Code layout for locality
Optimization:
•Inlining
•Intrinsic function library inlining
•Cloning for constants, locality
•Dead function and variable elimination
•Constant propagation
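Cloning for constants can be illustrated with a hedged C sketch (function names are made up for the example): when IPA proves that every call site in the program passes the same constant argument, it can clone the callee with that constant bound, exposing further simplification to later phases:

```c
#include <assert.h>

/* Before IPA: scale is a runtime parameter, even though every
 * call site in the program happens to pass the constant 4. */
int scale_gen(int x, int scale) { return x * scale; }

/* After IPA constant propagation and cloning: a specialized clone
 * with the constant folded in; the multiply by 4 is now a candidate
 * for strength reduction to a shift. (Illustrative, not compiler
 * output.) */
int scale_by4(int x) { return x * 4; }
```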
IPA Engineering
•User transparent
 – Additional command-line option (-ipa)
 – Object files (*.o) contain WHIRL
 – IPA in ld invokes the backend
•Integrated into the compiler
 – Provides information to the loop nest optimizer, global optimizer, and code generator
 – Not disabled by normal .o or DSO objects
 – Can analyze DSO objects
Loop Nest Optimizer/Parallelizer
•All languages
•Loop level dependence analysis
•Uniprocessor loop level transformations
•OpenMP
•Automatic parallelization
Loop Level Transformations
Based on unified cost model
Heuristics integrated with software pipelining
•Fission
•Fusion
•Unroll and jam
•Loop interchange
•Peeling
•Tiling
•Vector data prefetching
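Loop interchange, one of the transformations listed above, can be shown directly in C. The sketch assumes a row-major `N`×`N` array; both functions compute the same sum, but the interchanged form walks memory with unit stride in the inner loop, which is what the cost model is weighing:

```c
#include <stddef.h>

#define N 64

/* Before: the inner loop walks a column of a row-major array,
 * touching memory with stride N elements. */
long sum_cols_first(int a[N][N]) {
    long s = 0;
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < N; i++)
            s += a[i][j];
    return s;
}

/* After loop interchange: the inner loop is unit-stride, so each
 * cache line is fully used before it is evicted. */
long sum_rows_first(int a[N][N]) {
    long s = 0;
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            s += a[i][j];
    return s;
}
```

The transformation is legal here because the loop body carries no dependence between iterations, which is exactly what the loop-level dependence analysis must establish.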
Parallelization
•Automatic
 – Array privatization
 – Doacross parallelization
 – Array section analysis
•Directive based
 – OpenMP
 – Integrated with automatic methods
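A minimal directive-based example, assuming OpenMP support in the compiler (the function name is illustrative). The `reduction` clause asserts that each iteration's contribution is independent, the same fact automatic parallelization would otherwise have to prove:

```c
/* Directive-based parallelization: the programmer states the
 * reduction; without OpenMP the pragma is ignored and the loop
 * runs serially with the same result. */
double dot(const double *x, const double *y, int n) {
    double s = 0.0;
    #pragma omp parallel for reduction(+:s)
    for (int i = 0; i < n; i++)
        s += x[i] * y[i];
    return s;
}
```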
Global optimization
•Static Single Assignment is the unifying technology
•Industrial-strength SSA
•All traditional optimizations implemented
 – SSA-preserving transformations
 – Deals with aliasing and calls
 – Uniformly handles indirect loads/stores
•Benefits over bit-vector techniques
 – More efficient setup and use
 – More natural algorithms => robustness
 – Allows selective transformation
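A sketch of what SSA construction does, at the source level. The numbered names and the phi in the comments are the SSA form, not C; the explicit use-def edge from `x3` back to its two definitions is what makes sparse propagation possible:

```c
#include <assert.h>

/* Source form: x is redefined on one path, then used at the join.
 * SSA form (in comments): each definition gets a unique name and
 * the join point gets a phi that merges the reaching definitions. */
int ssa_use(int a, int p) {
    int x = a + 1;      /* x1 = a + 1                      */
    if (p)
        x = x * 2;      /* x2 = x1 * 2                     */
    return x + 3;       /* x3 = phi(x2, x1); return x3 + 3 */
}
```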
Code Generation
•Inner loops
 – IF conversion
 – Software pipelining
 – Recurrence breaking
 – Predication and rotating registers
•Elsewhere
 – Hyperblock formation
 – Frequency-based block reordering
 – Global code motion
 – Peephole optimization
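Recurrence breaking can also be shown at the source level. A hedged sketch (assumes `n` is even, for brevity): the single accumulator in a naive sum chains every add into one dependence cycle; splitting it into two partial sums halves the critical path and gives the software pipeliner independent operations to overlap:

```c
#include <assert.h>

/* Before: one accumulator, so each add waits on the previous one.
 * After recurrence breaking: two partial sums proceed in parallel;
 * a final add merges them. Assumes n is even. */
long sum_two_way(const int *a, int n) {
    long s0 = 0, s1 = 0;
    for (int i = 0; i < n; i += 2) {
        s0 += a[i];        /* even-index chain */
        s1 += a[i + 1];    /* odd-index chain  */
    }
    return s0 + s1;
}
```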
Technology
•Target description tables (targ_info)
•Feedback
•Parallelization
•Static Single Assignment
•Software pipelining
•Global code motion
Target description tables
Isolate machine attributes from compiler code
•Resources: functional units, busses
•Literals: sizes, ranges, excluded bits
•Registers: classes, supported types
•Instructions: opcodes, operands, attributes, scheduling, assembly, object code
•Scheduling: resources, latencies
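A hypothetical flavor of such a table in C. The field names and entries are illustrative assumptions, not the actual targ_info layout; the point is that machine attributes live in data the compiler consults, not in compiler code:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative per-instruction description: opcode name, result
 * latency, and a functional-unit class. */
typedef struct {
    const char *opcode;
    int         latency;   /* result latency in cycles */
    int         unit;      /* functional-unit class    */
} insn_desc;

static const insn_desc table[] = {
    { "add", 1, 0 },
    { "ld8", 2, 1 },
    { "fma", 5, 2 },
};

/* The scheduler asks the table, never hard-codes the machine. */
int latency_of(const char *op) {
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(table[i].opcode, op) == 0)
            return table[i].latency;
    return -1;             /* unknown opcode */
}
```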
Feedback
Used throughout the compiler
•Instrumentation can be added at any stage
•Explicit instrumentation data is incorporated where inserted
•Instrumentation data is maintained and checked for consistency through program transformations
SSA Advantages
•Built-in use-def edges
•Sparse representation of data flow information
•Sparse data flow propagation based on the SSA graph
•Linear or near-linear algorithms
•Every optimization is global
•Transform one construct at a time, customized to its context
•Handles second-order effects
SSA as IR for optimizer
•SSA is constructed only once, at set-up time
•Use-def info is inherently part of SSA
•Use only optimization algorithms that preserve SSA form:
 – Transformations do not invalidate SSA info
 – Full set of SSA-preserving algorithms
•No SSA construction overhead between phases:
 – Can arbitrarily repeat a phase for newly exposed optimization opportunities
•Extended to uniformly handle indirect memory references
Software Pipelining
•Technology evolved from Cydra compilers
•Powerful preliminary loop processing
•Effective minimization of loop overhead code
•Highly efficient backtracking for scheduling
•Integrated register allocation, interface with CG
•Integrated with LNO loop nest transformations
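The prologue/kernel/epilogue structure of a software-pipelined loop can be mimicked at the source level. A sketch only (assumes `n >= 2`; real pipelining is done on machine code, with rotating registers renaming the stages): each kernel iteration overlaps the store of iteration i-2, the multiply of iteration i-1, and the load of iteration i:

```c
#include <assert.h>

/* dst[i] = 2 * src[i], software-pipelined by hand into three
 * overlapped stages: load, multiply, store. Assumes n >= 2. */
void scale_x2(int *dst, const int *src, int n) {
    int loaded = src[0];          /* prologue: fill the pipe      */
    int scaled = loaded * 2;
    loaded = src[1];
    for (int i = 2; i < n; i++) { /* kernel: three stages overlap */
        dst[i - 2] = scaled;      /* store for iteration i-2      */
        scaled = loaded * 2;      /* multiply for iteration i-1   */
        loaded = src[i];          /* load for iteration i         */
    }
    dst[n - 2] = scaled;          /* epilogue: drain the pipe     */
    dst[n - 1] = loaded * 2;
}
```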
Global Code Motion
•Moves instructions between basic blocks
•Purpose: balance resources, improve critical paths
•Uses program structure to guide motion
•Uses feedback or estimated frequency to prioritize motion
•No artificial barriers, no exclusively-optimized paths
Where are we going?
•Open source compiler suite
•Target description for IA-64
•Available via the usual Linux distributions and www.oss.sgi.com
•Beta version in June
•MR version when Intel ships systems
•OpenMP for C/C++ (later)
•OpenMP extensions for NUMA (later)
Areas for collaboration
•Target descriptions for other ISAs
 – real or prototype
•Additional optimizations
•Generate information for performance analysis tools
•Extensions to OpenMP
•Surprise me