Telescoping Languages
A Framework for Generating High-Performance Problem-Solving
Systems
Ken Kennedy
Center for High Performance Software
Rice University
http://www.cs.rice.edu/~ken/Presentations/Telescope.pdf
Center for High Performance Software Research
Collaborators
Bradley Broom
Arun Chauhan
Keith Cooper
Jack Dongarra
Rob Fowler
Lennart Johnsson
Chuck Koelbel
Cheryl McCosh
John Mellor-Crummey
Linda Torczon
Philosophy
• Compiler Technology = Off-Line Processing
  — Goals: improved performance and language usability
    – Making it practical to use the full power of the language
  — Trade-off: preprocessing time versus execution time
  — Rule: performance of both compiler and application must be acceptable to the end user
• Examples
  — Macro expansion
    – PL/I interpretive macro facility
    – Fixed macros can be compiled
      10x improvement with compilation
  — Transmeta “Code Morphing”
    – Dynamic compilation of machine code
Making Languages Usable
It was our belief that if FORTRAN, during its first months, were to translate any reasonable “scientific” source program into an object program only half as fast as its hand-coded counterpart, then acceptance of our system would be in serious danger... I believe that had we failed to produce efficient programs, the widespread use of languages like FORTRAN would have been seriously delayed.
— John Backus
A Java Experiment
• Scientific Programming In Java
  — Goal: make it possible to use the full object-oriented power for scientific applications
    – Many scientific implementations mimic Fortran style
• OwlPack Benchmark Suite
  — Three versions of LinPACK in Java
    – Fortran style
    – Lite object-oriented style
    – Full polymorphism
      No differences for type
• Experiment
  — Compare running times for different styles on same Java VM
  — Evaluate potential for compiler optimization
Performance Results
[Bar chart: run time in seconds (axis 0 to 35) for dgefa, dgesl, and dgedi. Series: Fortran Style, Lite OO Style, OO Style, Optimized OO, Native F90.]
Results Using JDK 1.2 JIT on SUN Ultra 5
Programming Productivity
• Challenges
  — programming is hard
  — professional programmers are in short supply
  — high performance will continue to be important
• One Strategy: Make the End User a Programmer
  — professional programmers develop components
  — users integrate components using:
    – problem-solving environments (PSEs) based on scripting languages (possibly graphical)
      examples: Visual Basic, Tcl/Tk, AVS, Khoros
• Compilation for High Performance
  — translate scripts and components to common intermediate language
  — optimize the resulting program using interprocedural methods
Script-Based Programming
[Diagram, built up over several slides. Boxes: Component Library, User Library, Script, Translator, Intermediate Code, Global Optimizer, Code Generator.]
Problem: long compilation times, even for short scripts!
Problem: expert knowledge on specialization lost
Telescoping Languages
[Diagram, built up over several slides. Boxes: L1 Class Library, Compiler Generator, L1 Compiler, Script, Script Translator, Vendor Compiler, Optimized Application. Annotations: "Could run for hours" and "understands library calls as primitives".]
Telescoping Languages: Advantages
• Compile times can be reasonable
  — More compilation time can be spent on libraries
  — Script compilations can be fast
    – Components reused from scripts may be included in libraries
• High-level optimizations can be included
  — Based on specifications of the library designer
    – Properties often cannot be determined by compilers
    – Properties may be hidden after low-level code generation
• User retains substantive control over language performance
  — Mature code can be built into a library and incorporated into language
• Reliability can be improved
  — Specialization by compilation framework, not user
Applications
• Matlab Compiler
  — Automatically generated from LAPACK or ScaLAPACK
    – With help via annotations from the designer
• Generator for ARPACK
  — Library developer maintains code in Matlab
  — Currently recodes in Fortran by hand; could be automated
• Flexible Data Distributions
  — Failing of HPF: inflexible distributions
  — Data distribution == collection of interfaces that meet specs
  — Compiler applies standard transformations
• Generator for Grid Computations
  — GrADS: automatic generation of NetSolve
Application: Matlab for Signal Processing
• Automatically generated from LAPACK or ScaLAPACK
  — With help via annotations from the designer
• Special project: Signal Processing Applications written in Matlab
  — Users want simplicity and performance
  — Matlab currently gives them the first but not the second
    – Codes rewritten in C for communications devices
  — Run signal processing procedures through the generator
    – Many code modules reused
Application: POOMA
• Procedure library for computational hydrodynamics
  — Distributed data structures
    – vectors, arrays, tensors
  — Coded in C++
  — Context optimizations coded into template expansion mechanism
    – 20-line program compiles for over an hour on 32 processors
  — Enhanced reliability
• Telescoping languages
  — Generate POOMA from simpler libraries for Fortran and Java
Requirements of Script Compilation
• Scripts must generate efficient programs
  — Comparable to those generated from standard interprocedural methods
  — Avoid need to recode in standard language
• Script compile times should be proportional to length of script
  — Not a function of the complexity of the library
  — Principle of “least astonishment”
Telescoping Languages
[Diagram, script-compilation path. Boxes: Script, Script Translator, L1 Compiler, Vendor Compiler, Optimized Application. Annotation: "understands library calls as primitives".]
Script Compilation Algorithm
• Propagate variable property information throughout the program
  — Use jump functions to propagate through calls to library
• Apply high-level transformations
  — Driven by information about properties
  — Ensure that process applies to expanded code
• Select and substitute specialized variants for library calls
  — At each call site, determine the best approximation to parameter properties that is reflected by a specialized fragment in the code database
    – Use a method similar to “unification”
  — Substitute fragment from database for call
    – This could contain a call to a lower-level library routine.
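The selection step can be sketched in Python. This is only an illustration under stated assumptions: the property names, the contents of `CODE_DATABASE`, and the scoring rule (pick the applicable variant that assumes the most properties) are hypothetical stand-ins for the unification-like matching mentioned on the slide.

```python
# Hypothetical code database: each library routine maps to specialized
# variants, keyed by the parameter properties the variant assumes.
CODE_DATABASE = {
    "solve": [
        ({"shape": "triangular", "storage": "dense"}, "solve_tri_dense"),
        ({"shape": "triangular"}, "solve_tri"),
        ({}, "solve_general"),  # fallback: assumes nothing
    ],
}


def select_variant(routine, known_properties):
    """Return the most specific variant whose assumptions all hold."""
    best_name, best_score = None, -1
    for required, name in CODE_DATABASE[routine]:
        # A variant applies only if every property it assumes is known
        # to hold at this call site.
        applies = all(known_properties.get(k) == v for k, v in required.items())
        if applies and len(required) > best_score:
            best_name, best_score = name, len(required)
    return best_name


def compile_script(calls):
    """Replace each (routine, known_properties) call with a chosen variant."""
    return [select_variant(routine, props) for routine, props in calls]
```

Because the work per call site depends only on the number of stored variants, a scheme of this kind keeps script compile time tied to script length rather than to the complexity of the library.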
Telescoping Languages
[Diagram, library-preparation path. Boxes: L1 Class Library, Compiler Generator, L1 Compiler. Annotation: "Could run for hours".]
Library Analysis and Preparation
• Discovery of Critical Properties and Propagator Construction
• Analysis of Transformation Specifications
  — Construction of a specification-driven translator for use in compiling scripts
• Code Specialization for Different Sets of Parameter Properties
Library Analysis and Preparation
• Discovery of Critical Properties and Propagator Construction
  — Which properties of parameters affect optimization
    – Examples: value, type, rank and size of matrix
Discovery of Critical Properties
• From specifications by the library designer
  — If the matrix is triangular, then…
• From examining the code itself
  — Look at a promising optimization point
  — Determine conditions under which we can make significant optimizations
  — See if any of these conditions can be mapped back to parameter properties
• From sample calling programs provided by the designer
    call average(shift(A,-1), shift(A,+1))
    – Can save on memory accesses
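The saving in the sample call can be illustrated with a small Python sketch. It assumes that `shift` is a circular shift and that `average` is an elementwise mean; neither definition is given on the slide, and the function names below are illustrative only.

```python
def shift(a, k):
    """Circular shift (an assumed semantics): result[i] = a[(i + k) % n]."""
    n = len(a)
    return [a[(i + k) % n] for i in range(n)]


def average(x, y):
    """Elementwise mean of two equal-length lists."""
    return [(p + q) / 2 for p, q in zip(x, y)]


def average_of_shifts(a):
    """Fused form: same values, but no temporary shifted copies of a."""
    n = len(a)
    return [(a[(i - 1) % n] + a[(i + 1) % n]) / 2 for i in range(n)]
```

The composed call builds two shifted temporaries and then reads them back, while the fused form reads `a` directly. A library compiler that sees this calling pattern in the designer's sample programs could prepare the fused variant ahead of time.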
Examining the Code
• Example from LAPACK
    subroutine VMP(C, A, B, m, n, s)
      integer m, n, s
      real A(n), B(n), C(m)
      i = 1
      do j = 1, n
        C(i) = C(i) + A(j)*B(j)
        i = i + s
      enddo
    end subroutine VMP
Vectorizable if s != 0
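The condition on s can be checked by listing which elements of C the loop writes. When s is 0, every iteration updates C(1), so the loop is a sum reduction; for any nonzero s, each iteration writes a different element. The following Python sketch makes that visible. The helper names are illustrative and not part of any library.

```python
def vmp_write_indices(n, s):
    """1-based indices of C written by the VMP loop, in iteration order."""
    indices, i = [], 1
    for _ in range(n):
        indices.append(i)
        i += s
    return indices


def iterations_are_independent(n, s):
    """True when no element of C is updated by more than one iteration."""
    indices = vmp_write_indices(n, s)
    return len(set(indices)) == len(indices)
```

This is why the stride s qualifies as a critical property: its value alone decides which form of the loop is legal.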
Library Analysis and Preparation
• Discovery of Critical Properties and Propagator Construction
  — Which properties of parameters affect optimization
    – Examples: value, type, rank and size of matrix
  — Construction of jump functions for the library calls
    – With respect to critical properties
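A jump function summarizes how a library call maps the properties of its arguments to the properties of its result, so the script compiler can propagate properties without analyzing the library body. The Python sketch below is a hypothetical illustration: the property names (`triangle`, `rows`, `cols`), the `transpose` routine, and the statement format are all assumptions made for the example.

```python
def jf_transpose(arg):
    """Jump function for a hypothetical transpose routine."""
    flipped = {"upper": "lower", "lower": "upper"}
    out = dict(arg)
    if "triangle" in arg:
        out["triangle"] = flipped[arg["triangle"]]
    if "rows" in arg and "cols" in arg:
        out["rows"], out["cols"] = arg["cols"], arg["rows"]
    return out


# One jump function per library routine, built when the library is analyzed.
JUMP_FUNCTIONS = {"transpose": jf_transpose}


def propagate(script, env):
    """Propagate properties through (target, routine, source) statements."""
    for target, routine, source in script:
        env[target] = JUMP_FUNCTIONS[routine](env[source])
    return env
```

For example, starting from a matrix `A` known to be upper triangular, propagating through `B = transpose(A)` records that `B` is lower triangular, which a later call on `B` can exploit.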
Library Analysis and Preparation
• Discovery of Critical Properties and Propagator Construction
  — Which properties of parameters affect optimization
    – Examples: value, type, rank and size of matrix
  — Construction of jump functions for the library calls
    – With respect to critical properties
• Analysis of Transformation Specifications
  — Construction of a specification-driven translator for use in compiling scripts
High-level Identities
• Often library developer knows high-level identities
  — Difficult for the compiler to discern
  — Optimization should be performed on sequences of calls rather than code remaining after expansion
• Example: Push and Pop
  — Designer specifies: Push(x) followed by y = Pop() becomes y = x
    – Ignore possibility of overflow in Push
• Example: Trigonometric Functions
  — Sin and Cos used in same loop, both computed using expensive calls to the trig library
  — Recognize that cos(x) and sin(x) can be computed by a single call to sincos(x,s,c) in a little more than the time required for sin(x).
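The Push and Pop identity can be applied as a rewrite over the sequence of library calls, before any call is expanded. The Python sketch below assumes a hypothetical tuple encoding of calls, `("push", x)` and `("pop", y)`, and makes a single left-to-right pass; it is an illustration, not the framework's actual translator.

```python
def apply_push_pop_identity(calls):
    """Rewrite ('push', x) directly followed by ('pop', y) as ('assign', y, x).

    As on the slide, the possibility of overflow in push is ignored.
    """
    out, i = [], 0
    while i < len(calls):
        current = calls[i]
        following = calls[i + 1] if i + 1 < len(calls) else None
        if current[0] == "push" and following is not None and following[0] == "pop":
            out.append(("assign", following[1], current[1]))
            i += 2  # both calls are consumed by the rewrite
        else:
            out.append(current)
            i += 1
    return out
```

Working at the level of calls is what makes the pattern easy to spot: once Push and Pop are expanded into array stores and index updates, the same identity is much harder for a compiler to recover.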
Contextual Expansions
• Out of Core Arrays
  — Operations Get(I,J) and GetRow(I,Lo,N)
• Get in a loop
    Do I
      Do J
        … Get(I,J)
      Enddo
    Enddo
• When can we vectorize?
  — Turn into GetRow
  — Answer: if Get is not involved in a recurrence.
    – How can we know?
Vector versions of library routines can often be constructed
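The effect of turning Get into GetRow can be sketched in Python with a toy array that counts access requests. The class, its 0-based indexing, and the row-sum computation are assumptions made for illustration; a real out-of-core array would issue I/O where this one increments a counter.

```python
class OutOfCoreArray:
    """Toy stand-in for an out-of-core array that counts access requests."""

    def __init__(self, rows):
        self._rows = [list(r) for r in rows]
        self.requests = 0

    def get(self, i, j):
        self.requests += 1
        return self._rows[i][j]

    def get_row(self, i, lo, n):
        self.requests += 1
        return self._rows[i][lo:lo + n]


def row_sums_scalar(arr, n_rows, n_cols):
    """Original loop nest: one get per element."""
    return [sum(arr.get(i, j) for j in range(n_cols)) for i in range(n_rows)]


def row_sums_vectorized(arr, n_rows, n_cols):
    """Inner loop replaced by a single get_row per row."""
    return [sum(arr.get_row(i, 0, n_cols)) for i in range(n_rows)]
```

The rewrite is safe here because no fetched value depends on a value fetched in an earlier iteration, which is the "not involved in a recurrence" condition from the slide.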
Library Analysis and Preparation
• Discovery of Critical Properties and Propagator Construction
  — Which properties of parameters affect optimization
    – Examples: value, type, rank and size of matrix
  — Construction of jump functions for the library calls
    – With respect to critical properties
• Analysis of Transformation Specifications
  — Construction of a specification-driven translator for use in compiling scripts
• Code Specialization for Different Sets of Parameter Properties
  — For each set, assume and optimize to produce specialized code
Code Selection Example
• Library compiler develops inlining tables
    subroutine VMP(C, A, B, m, n, s)
      integer m, n, s
      real A(n), B(n), C(m)
      i = 1
      do j = 1, n
        C(i) = C(i) + A(j)*B(j)
        i = i + s
      enddo
    end subroutine VMP
Inlining Table:
    case on s:
      ==0:     C(1) = C(1) + sum(A(1:n)*B(1:n))
      !=0:     C(1:1+(n-1)*s:s) = C(1:1+(n-1)*s:s) + A(1:n)*B(1:n)
      default: call VMP(C,A,B,m,n,s)
(callout on the table: vector)
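A Python sketch of such an inlining table follows. It assumes 0-based indexing, a positive stride in the nonzero case, and hypothetical property labels `"zero"` and `"nonzero"` for what the script compiler knows about s at a call site; when nothing is known, the table falls back to the general routine.

```python
def vmp_reference(C, A, B, n, s):
    """Direct transcription of the VMP loop (0-based indexing)."""
    i = 0
    for j in range(n):
        C[i] += A[j] * B[j]
        i += s
    return C


def vmp_zero_stride(C, A, B, n, s):
    """Fragment for s == 0: a single sum reduction into C[0]."""
    C[0] += sum(A[j] * B[j] for j in range(n))
    return C


def vmp_nonzero_stride(C, A, B, n, s):
    """Fragment for s != 0: independent updates of a strided section of C."""
    products = [A[j] * B[j] for j in range(n)]  # vectorizable elementwise step
    for j, p in enumerate(products):
        C[j * s] += p
    return C


def select_fragment(s_property):
    """Inlining table keyed on what is known about s at the call site."""
    table = {"zero": vmp_zero_stride, "nonzero": vmp_nonzero_stride}
    # Default case: nothing is known, so call the general library routine.
    return table.get(s_property, vmp_reference)
```

Both specialized fragments compute the same result as the reference loop for the strides they cover, so the substitution changes only the performance of the call, not its meaning.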
Application: Matlab for Signal Processing
• Signal processing users want simplicity, programming power, and performance
  — Currently over 500,000 Matlab licenses
• Matlab gives them simplicity and power but not performance
  — Codes prototyped in Matlab
  — Codes rewritten in C for communications devices
    – Users would rather not do this
• Telescoping Languages:
  — Many signal processing code modules reused over and over
  — Run these procedures through the language generator
    – Produce Matlab SP, a high-level domain-specific environment
Matlab SP: Preliminary Findings
• Optimizations That Pay Off
  — Vectorization
    – Wins because of hand-coded vector/matrix primitives
  — Elimination of common array subexpressions
  — Optimization of array allocation and reshape operations
• New Optimizations
  — Procedure vectorization
    – Interchange call and loop after distribution
  — Procedure strength reduction
    – Subdivide procedure into variant and invariant components
    – Use invariant component only once
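Procedure vectorization can be sketched in Python as follows. The `scale` procedure and its vector counterpart `scale_vec` are hypothetical; the point is only the shape of the transformation: distribute the loop so the call has a loop to itself, then move that loop inside the procedure.

```python
def scale(x, gain):
    """Scalar library procedure (hypothetical)."""
    return gain * x


def scale_vec(xs, gain):
    """Vector version: the loop now lives inside the procedure."""
    return [gain * x for x in xs]


def original(xs, gain):
    out, total = [], 0
    for x in xs:
        y = scale(x, gain)  # call inside the loop
        out.append(y)
        total += y          # other work in the same loop
    return out, total


def transformed(xs, gain):
    # After loop distribution the call has a loop to itself, so the call
    # and the loop can be interchanged: one call to the vector version.
    out = scale_vec(xs, gain)
    total = 0
    for y in out:
        total += y
    return out, total
```

In a Matlab setting the vector version would typically map onto the hand-coded vector and matrix primitives, which is where the slide says vectorization wins.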
Procedure Strength Reduction
• Procedure called in loop
    for i = 1:N
      x = f(c1,c2,i,c3)
    end
• Becomes
    f(c1,c2,c3)      % invariant component of f, run once
    for i = 1:N
      x = f(i)       % variant component of f
    end
• Further improvements possible
  — Use code differentiation to compute differences
    – ADIFOR
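The split into invariant and variant components can be sketched in Python. The body of `f` is invented for illustration, and the sketch passes the precomputed state explicitly as `setup`, whereas the slide's notation leaves that hand-off implicit.

```python
import math


def f(c1, c2, i, c3):
    """Original procedure: recomputes the loop-invariant part on every call."""
    setup = math.hypot(c1, c2) * c3  # depends only on c1, c2, c3
    return setup + i


def f_invariant(c1, c2, c3):
    """Invariant component, run once before the loop."""
    return math.hypot(c1, c2) * c3


def f_variant(setup, i):
    """Variant component, run on every iteration."""
    return setup + i


def loop_original(n, c1, c2, c3):
    return [f(c1, c2, i, c3) for i in range(1, n + 1)]


def loop_reduced(n, c1, c2, c3):
    setup = f_invariant(c1, c2, c3)  # hoisted out of the loop
    return [f_variant(setup, i) for i in range(1, n + 1)]
```

The transformation resembles loop-invariant code motion applied across a procedure boundary: the expensive part runs once, and each iteration pays only for the work that depends on `i`.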
Procedure Strength Reduction Performance
[Bar chart: Original versus Optimized, vertical axis from 0 to 1. Benchmark labels as extracted: jmp1newcdcdsdhdctss olbf]
Summary
• Optimization enables language power
  — Principle: encourage rather than discourage use of powerful features
    – Good programming practice should be rewarded
• Programming support is challenging
  — Particularly with application and platform complexity on the rise
    – Compounded by the shortage of IT professionals
• Strategy: make end users into application developers
  — Telescoping languages: Framework for generating high-level problem-solving systems
  — Must produce high-quality code
    – Avoid the need to recode by hand
Summary
• PITAC: Focus on long-term, high-risk research
• The scalable infrastructure should be a scalable problem-solver
  — Access to information is not enough
  — Linked computation is not enough
• Programming support is still relatively primitive
  — Application and platform complexity increasing
  — Compounded by the shortage of IT professionals
• Strategy: make end users into application developers
  — Professional programmers focus on components
  — End users build applications in scripting systems
• Telescoping languages:
  — Framework for generation of high-level problem-solving systems
Software Support for High-Performance Problem
Solving
(With Application to Grid Programming)
Ken Kennedy
Center for High Performance Software
Rice University
http://www.cs.rice.edu/~ken/Presentations/GridTelescope.pdf
Collaborators
Bradley Broom
Arun Chauhan
Keith Cooper
Jack Dongarra
Rob Fowler
Dennis Gannon
Lennart Johnsson
John Mellor-Crummey
John Reynders
Linda Torczon
Lessons from PITAC
• Findings
  — Research funding increasingly focused on short term
  — Universities weakened
    – Impact on workforce
  — Industry cannot fill the gap
    – Return on investment: 24 percent versus 66 percent
• Refocus Research on Long-Term, High-Risk Problems
  — Requires an expansion of the base
• Invest in Key Areas
  — Software
  — Scalable Information Infrastructure
  — High Performance Computing
  — Social, Economic, and Workforce Issues (Education)
Two IT Grand Challenges
• The Internet as Problem-Solving Engine
  — Challenge: How do we develop applications and manage their execution?
    – Reliable performance under varying load
    – Accessibility to ordinary scientists and engineers
  — GrADS Project
• Software Productivity
  — Challenge: How do we increase the nation’s productivity in software development?
    – Too much software to be written, too few developers
    – Application and platform complexity increasing
  — Idea: make it possible for end users to be application developers
Grids are “Hot”
[Figure. Grid types: Computational, Data, Information, Access, Knowledge. Named grids: DISCOM, SinRG, APGrid, TeraGrid.]
National Distributed Problem Solving
[Diagram, built up over several slides, with nodes labeled Supercomputer and Database.]
Today: Globus
• Developed by Ian Foster and Carl Kesselman
  — Grew from the I-Way (SC-95)
• Basic Services for distributed computing
  — Resource discovery and information services
  — User authentication and access control
  — Job initiation
  — Communication services (Nexus and MPI)
• Applications are programmed by hand
  — Many applications
  — User responsible for resource mapping and all communication
    – Existing users acknowledge how hard this is
GrADSoft Architecture
• Goal: reliable performance on dynamically changing resources
[Diagram, repeated over several slides with different regions highlighted: Execution Environment, Program Preparation System, Problem-Solving Environments. Boxes: Source Application, Whole-Program Compiler, Libraries, Binder, Software Components, Configurable Object Program, Scheduler, Resource Negotiator, Grid Runtime System, Real-time Performance Monitor, Performance Problem. Edge labels: Performance Feedback, Negotiation.]
Library Analysis and Preparation
• Discovery of Critical Properties and Propagator Construction
  — Which properties of parameters affect optimization
    – Examples: value, type, rank and size of matrix
  — Construction of jump functions for the library calls
    – With respect to critical properties
• Analysis of Transformation Specifications
  — Construction of a specification-driven translator for use in compiling scripts
• Code Specialization for Different Sets of Parameter Properties
  — For each set, assume and optimize to produce specialized code