IBM Research: Programming Technologies
© 2007 IBM Corporation
Panel 1: Hot topics and future directions in programming languages (PL) research
Vivek Sarkar, IBM Research
May 9, 2007
My Background

Education
B.Tech., IIT Kanpur, 1981 (Advisor: Keshav Nori)
M.S., U Wisconsin-Madison, 1982
Ph.D., Stanford University, 1987 (Advisor: John Hennessy)
Career at IBM
1987 - 1990, PTRAN (Manager: Fran Allen)
1991 - 1993, ASTI optimizer
1994 - 1996, Application Development Technology Institute
1997, MIT sabbatical
1998 - 2001, Jalapeno / Jikes RVM
2002 - present, PERCS (includes X10, Parallel Tools, Productivity)
Family
Married with two daughters, 18 and 15
Interests: Hiking, Theater, Horseback riding, Violin
PL Research Opportunities: Examples
Programming Models and Programming Language Design
X10 (contact: Vijay Saraswat)
Development Tools
SAFARI (contact: Robert Fuhrer)
Parallel Tools Platform (contact: Evelyn Duesterwald)
Compilers, Managed Runtimes, Static & Dynamic Optimization
Metronome (contact: David Bacon)
X10 Vision: Portable Productive Parallel Programming
[Figure: X10 data structures map onto X10 places, and X10 places map onto physical PEs. The X10 language defines the mapping from X10 objects and activities to X10 places; an X10 deployment defines the mapping from virtual X10 places to physical processing elements. Target platforms shown: homogeneous multi-core (SMP nodes with PEs, L1 caches, L2 caches, and memory), heterogeneous accelerators (Cell BE block diagrams: 64-bit Power Architecture PPE with VMX, SPEs with local stores, EIB at up to 96B/cycle, MIC, BIC, dual XDR memory, FlexIO), and clusters of SMP nodes joined by an interconnect.]
Overview of X10 (x10.sf.net)
• Dynamic parallelism with a Partitioned Global Address Space
• Places encapsulate binding of activities and globally addressable data
• async (P) S --- run statement S asynchronously at place P
• finish S --- execute statement S, and wait for descendant async’s to terminate
• atomic S --- execute statement S atomically
  – No place-remote accesses permitted in an atomic section

Storage classes:
• Activity-local
• Place-local
• Partitioned global
• Immutable
Deadlock safety: any X10 program written with async, atomic, and finish can never deadlock
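The Monte Carlo example on the next slides shows these constructs in real X10 code. As a rough analogy only (this is not X10), here is a minimal C/pthreads sketch of the same control-flow shape: pthread_create plays the role of async, joining every spawned thread plays the role of finish, and a mutex-protected update plays the role of atomic. All identifiers below are illustrative.

/* compile: cc demo.c -lpthread */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long sum = 0;                      /* shared, globally addressable data */

/* ~ async S: run the statement body in a separate activity */
static void *child(void *arg) {
    long v = (long) arg;
    pthread_mutex_lock(&lock);            /* ~ atomic S: mutually exclusive update */
    sum += v;
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void) {
    pthread_t t[4];
    for (long i = 0; i < 4; i++)          /* spawn four "activities" */
        pthread_create(&t[i], NULL, child, (void *) (i + 1));
    for (int i = 0; i < 4; i++)           /* ~ finish: wait for all descendants */
        pthread_join(t[i], NULL);
    printf("sum = %ld\n", sum);           /* prints sum = 10 */
    return 0;
}

Unlike this sketch, X10's async is place-aware (async (P) S runs S at place P), and the async/atomic/finish subset carries the deadlock-safety guarantee above; raw threads and locks do not.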
Single-Threaded Java

initTasks() { tasks = new ToTask[nRunsMC]; … }

public void runSerial() {
    results = new Vector(nRunsMC);
    // Now do the computation.
    PriceStock ps;
    for (int iRun = 0; iRun < nRunsMC; iRun++) {
        ps = new PriceStock();
        ps.setInitAllTasks(initAllTasks);
        ps.setTask(tasks[iRun]);
        ps.run();
        results.addElement(ps.getResult());
    }
}
Java Grande Forum Example (Monte Carlo)
Source: http://www.epcc.ed.ac.uk/javagrande/javag.html - The Java Grande Forum Benchmark Suite
Multi-Threaded Java

public void runThread() {
    results = new Vector(nRunsMC);
    Runnable thobjects[] = new Runnable[JGFMonteCarloBench.nthreads];
    Thread th[] = new Thread[JGFMonteCarloBench.nthreads];
    // Create (nthreads-1) threads to share work
    for (int i = 1; i < JGFMonteCarloBench.nthreads; i++) {
        thobjects[i] = new AppDemoThread(i, nRunsMC);
        th[i] = new Thread(thobjects[i]);
        th[i].start();
    }
    // Parent thread acts as thread 0
    thobjects[0] = new AppDemoThread(0, nRunsMC);
    thobjects[0].run();
    // Wait for child threads
    for (int i = 1; i < JGFMonteCarloBench.nthreads; i++) {
        try { th[i].join(); } catch (InterruptedException e) {}
    }
}

class AppDemoThread implements Runnable {
    ... // initialization code
    public void run() {
        PriceStock ps;
        int ilow, iupper, slice;
        slice = (nRunsMC + JGFMonteCarloBench.nthreads - 1) / JGFMonteCarloBench.nthreads;
        ilow = id * slice;
        iupper = Math.min((id + 1) * slice, nRunsMC);
        for (int iRun = ilow; iRun < iupper; iRun++) {
            ps = new PriceStock();
            ps.setInitAllTasks(AppDemo.initAllTasks);
            ps.setTask(AppDemo.tasks[iRun]);
            ps.run();
            AppDemo.results.addElement(ps.getResult());
        }
    } // run()
}

Distributed Multi-Threaded X10
initTasks() { tasks = new ToTask[dist.block([0:nRunsMC-1])]; … }

public void runDistributed() {
    results = new x10Vector(nRunsMC);
    // Now do the computation
    finish ateach (point[iRun] : tasks.distribution) {
        PriceStock ps = new PriceStock();
        ps.setInitAllTasks((ToInitAllTasks) initAllTasks);
        ps.setTask(tasks[iRun]);
        ps.run();
        final ToResult r = ps.getResult();  // ToResult is a value type
        async (results) atomic results.v.addElement(r);
    }
}
SAFARI Vision: Meta-Tooling for Language-Specific IDEs
Problem: lack of tool support can be a significant barrier to the adoption of new languages.

SAFARI solution: meta-tools and framework
• Language generation tools (scanner/parser generator, high-quality automatic ASTs)
• Generation of Eclipse toolkit components
  – Encapsulate Eclipse API knowledge
  – Encapsulate common language structure, semantics, and processing idioms
• Leverage language inheritance in the structure/semantics implementation
People: P. Charles, J. Dolby, R. Fuhrer, S. Sutton, M. Vaziri
Lead: Robert Fuhrer
SAFARI Target IDE Functionality
New Project/Type/… creation wizards
launch & debug: launch configs, breakpoints, backtraces, values, evaluation
syntax highlighting, compiler annotations, hover help, source folding, formatting, …
structural views
navigation (hyperlinks, “Open Type”, …)
content assist, quick fixes
compiler w/ incremental build, automatic dependency tracking
analysis & refactoring
Example of SAFARI Challenges: Error Handling

Errors are the norm! They must not cripple the IDE.

SAFARI/LPG: systematic, semi-automatic error recovery for parsing, creating “prosthetic” AST nodes

Polyglot: ideas for finer-grained dependencies and better robustness; make data dependencies more explicit
void A() {
    int x = 5;
    foo blah;
    for (int i = 0; i < a.length; i++) {
        int y = a[i] * a[j];
        x += y;
    }
}
[Figure: AST recovered from the erroneous code above. The body of A() contains "int x = 5;", a BadStmt "prosthetic" node standing in for the mangled statement "foo blah;", and the for loop with header "int i = 0; i < a.length; i++". The reference a[j] is left as a dangling reference.]
Parallel Tools Platform Vision: Integrated Workbench for High-Productivity Parallel Programming
[Figure: the Open HPC Workbench (runs on Windows, Linux, Mac OS, …) is built on the Parallel Tools Platform (PTP) and connects through a remote interface from the Eclipse Workbench to the HPC system.]
PERCS workbench enhancements: MPI tools, OpenMP tools, Remote System Exploration, Performance Exploration, Runtime Error Detection, Team Platform, Productivity measurements
Eclipse PTP is the integration hub for all PERCS tools.
[Figure: PERCS HPC software architecture, from the application down to the network. User space: APPLICATION, ESSL, PESSL, IBM's MPI, LAPI, SHMEM, UPC, X10, sockets (TCP, UDP, IP), IF_LS, LL, CSM, RSCT, HPC Toolkit, CPO, COE, DB, compilers, SMT exploitation, static analysis tools, cache injection. Kernel space: operating system, VSD/NSD, GPFS, HAL, DD. Below: hypervisor (HYP), HMC, ILM Meiosys, network adapter, network. Highlighting marks new additions through PERCS to the HPC SW architecture.]
PTP Example: MPI Barrier Verification Tool
Contact: Evelyn Duesterwald, Yuan Zhang

• Verify barrier synchronization in C/MPI programs
• Synchronization errors lead to deadlocks and stalls; programmers may have to spend hours trying to find the source of a deadlock
• Static verification tools help to eliminate errors before the program is executed

[Screenshot: the IDE action to run the Barrier Verifier.]
MPI Barrier Verification Tool (contd.)

• MPI does not place any constraints on the placement of barriers; the programmer has to ensure that the number of barriers along concurrent paths is the same
• Synchronization errors in MPI are a common and difficult-to-find problem
• MPI Barrier Verification: verify that the number of barriers along concurrent paths is the same
  – Match barriers that synchronize
  – For unmatched barriers, report a synchronization error with a counterexample that illustrates the error
rank > 2
P(k)
…i = F(0)
i = rank
i > 0
MPI_Comm_rank(com, &rank)
MPI_Barrier(com)
potential deadlock
not a deadlock
… MPI_Barrier(com)
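A hypothetical C/MPI fragment with the kind of mismatch the verifier reports (the function name is illustrative, not from the tool): ranks greater than 2 execute one more MPI_Barrier than the other ranks, so the first barrier of the rank > 2 group matches against the only barrier of the others, and the group's second barrier can never complete.

#include <mpi.h>

void compute_step(MPI_Comm com)   /* hypothetical example */
{
    int rank;
    MPI_Comm_rank(com, &rank);
    if (rank > 2) {
        MPI_Barrier(com);         /* unmatched on other paths: potential deadlock */
    }
    MPI_Barrier(com);             /* reached by every rank: not a deadlock */
}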
Metronome Vision: Transparent Real-time Java
Application / Runtime                                   Memory management   Timing
C++ Application / C++ Runtime System                    Manual, Unsafe      Predictable
Java Application / Java Runtime System (JVM)
  with Garbage Collection                               Automatic, Safe     Unpredictable
Java Application / Metronome Java Runtime System        Automatic, Safe     Predictable
www.research.ibm.com/metronome
Garbage Collection Pause Times (customer application)
Worst case: 1.7 ms; average: 260 µs

• Garbage collection is fundamental to Java’s value proposition: safety, reliability, programmer productivity
• But it also causes the most non-determinism (100 ms – 10 s latencies)
• The RTSJ standard does not support the use of garbage collection for real-time

• Metronome is our hard real-time garbage collector: worst-case 2 ms latencies; high throughput and utilization
  – Research under way to further reduce the real-time guarantee from milliseconds to microseconds
• 100x better than competitors’ best garbage collection technology
[Figure: real-time garbage collection. Space vs. time for the application and the collector, with a = allocation rate and c = collection rate; the resulting schedule interleaves application and collector quanta on top of the base application memory.]
[Figure: Metronome scheduler model, space vs. time. The scheduler interleaves the application (mutator) and the collector. Parameters: a·(ΔGC) = per-GC allocation rate; m = live data; RT = trace rate (50 MB/s); RS = sweep rate (300 MB/s); u = utilization; s = used space; Δt = time resolution. Example values shown: 30 MB, 50 MB/s, 5 ms, 45 MB, 50%, 100 MB, 75%.]
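A back-of-the-envelope sketch of how these parameters interact, assuming (our reading of the figure values, not stated in the text) that Δt = 5 ms is the scheduling window, u = 50% is the guaranteed mutator utilization per window, and m = 30 MB is the live data:

\[
u(\Delta t) = \frac{\text{mutator time in window}}{\Delta t}
\;\Rightarrow\;
\text{collector time per window} \le (1-u)\,\Delta t = 0.5 \times 5\,\text{ms} = 2.5\,\text{ms}
\]
\[
\text{trace work} \approx \frac{m}{R_T} = \frac{30\,\text{MB}}{50\,\text{MB/s}} = 600\,\text{ms}
\;\Rightarrow\;
\frac{600\,\text{ms}}{2.5\,\text{ms/window}} = 240\ \text{windows} \approx 1.2\,\text{s elapsed per trace}
\]

Spreading the collection across many small quanta in this way is what turns one multi-hundred-millisecond pause into a bounded per-window cost.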
PL Research Opportunities
Programming Models and Programming Language Design
Drivers: Concurrency, Accelerators, Data Access, Web Services, DSLs, …
Development Tools
Drivers: Program Analysis for Software Quality, Debugging Tools, Performance Tools, Refactorings, Language-Sensitive IDEs, …
Compilers, Managed Runtimes, Static & Dynamic Optimization
Drivers: Hardware roadmap, PL trends, Virtualization, Embedded systems, Real-time systems, …
Additional Information
X10, http://x10.sf.net
SAFARI, http://domino.research.ibm.com/comm/research_projects.nsf/pages/safari.index.html
Parallel Tools platform, http://eclipse.org/ptp
Metronome, http://www.research.ibm.com/metronome/
IBM Research
“Innovating at IBM” video, http://www.research.ibm.com/about/career.shtml
“Valuing diversity: an ongoing commitment”, http://www.ibm.com/employment/us/diverse