PPoPP’06: ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, New York, March 29-31, 2006
Conference Review Presented by: Utku Aydonat


Page 1

PPoPP’06: ACM SIGPLAN Symposium on

Principles and Practice of Parallel Programming

New York, March 29-31, 2006

Conference Review

Presented by: Utku Aydonat

Page 2

May 17, 2006, CARG

Outline

Conference overview

Brief summaries of sessions

Keynote speeches & Panel

Best paper

Page 3

Conference Overview

History: 90, 91, 93, 95, 97, 99, 01, 03, 05, 06

Primary focus: anything related to parallel programming

• Algorithms

• Communication

• Languages

8 sessions, 26 papers

Dominating topics: multicores, parallelization techniques

Page 4

Conference Overview

PPoPP: Paper Acceptance Statistics

Year Submitted Accepted Rate

2006 91 25 27%

2005 87 27 31%

2003 45 20 44%

1999 79 17 22%

1997 86 26 30%

Page 5

Overview of Sessions

1. Communication

2. Languages

3. Performance Characterization

4. Shared Memory Parallelism

5. Atomicity Issues

6. Multicore Software

7. Transactional Memory

8. Potpourri

Page 6

Session 1: Communication

“Collective Communication on Architectures that Support Simultaneous Communication over Multiple Links”, E. Chan, R. van de Geijn (UTexas), W. Gropp, R. Thakur (Argonne National Lab.)

• Adapt MPI collective communication algorithms to supercomputer architectures that support simultaneous communication with multiple nodes.

• Theoretically, latency can be reduced; in practice this is not achieved, due to algorithmic constraints and overheads.

“Performance Evaluation of Adaptive MPI”, Chao Huang, Gengbin Zheng (UIUC), Sameer Kumar (IBM T. J. Watson), Laxmikant Kale (UIUC)

• Design and evaluation of AMPI (Adaptive MPI), which supports processor virtualization

• Benefits: load balancing, adaptive overlapping, independence from the available number of processors, etc.

Page 7

Session 1: Communication

“Mobile MPI Programs on Computational Grids”, Rohit Fernandes, Keshav Pingali, Paul Stodghill (Cornell)

• Checkpointing system for C programs using MPI

• Able to take checkpoints on an Alpha cluster and restart them on Windows

“RDMA Read Based Rendezvous Protocol for MPI over InfiniBand: Design Alternatives and Benefits”, Sayantan Sur, Hyun-Wook Jin, Lei Chai, Dhabaleswar K Panda (Ohio State)

• A rendezvous protocol in MPI using RDMA read.

• Increases communication / computation overlap.

Page 8

Session 2: Languages

“Global-View Abstractions for User-Defined Reductions and Scans”, Steve J. Deitz, David Callahan, Bradford L. Chamberlain (Cray), Lawrence Snyder (U. of Washington)

• Chapel programming language developed by Cray Inc. as a part of DARPA High-Productivity Computing Systems program

• Global view abstractions for user-defined reductions and scans

“Programming for Parallelism and Locality with Hierarchically Tiled Arrays”, Ganesh Bikshandi, Jia Guo, Daniel Hoeflinger (UIUC), Gheorghe Almasi (IBM T. J. Watson), Basilio B Fraguela (Universidade da Coruña), Maria Jesus Garzaran, David Padua (UIUC), Christoph von Praun (IBM T. J. Watson)

• Hierarchically Tiled Arrays (HTAs) that define tiling structure for arrays

• Reduction, map, scan, transpose, and shift operations are defined.
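
The tiling idea can be illustrated with a small sketch (a hypothetical Python toy, not the paper's actual HTA API): tiles form the top level, and operations either preserve the tiling (map) or combine within and then across tiles (reduce).

```python
# Hypothetical sketch of a two-level hierarchically tiled array (HTA).
# Names and operations are illustrative, not the paper's actual API.

class HTA:
    def __init__(self, data, tile_size):
        # Partition a flat list into fixed-size tiles (the top level);
        # each tile is itself a plain list (the leaf level).
        self.tiles = [data[i:i + tile_size]
                      for i in range(0, len(data), tile_size)]

    def map(self, f):
        # Apply f element-wise, preserving the tiling structure.
        return [[f(x) for x in tile] for tile in self.tiles]

    def reduce(self, f, init):
        # Reduce within each tile first, then across tiles.
        partial = []
        for tile in self.tiles:
            acc = init
            for x in tile:
                acc = f(acc, x)
            partial.append(acc)
        total = init
        for p in partial:
            total = f(total, p)
        return total

h = HTA(list(range(1, 9)), tile_size=4)   # tiles: [1..4], [5..8]
print(h.reduce(lambda a, b: a + b, 0))    # → 36
```

The per-tile reduction step is the part a runtime could execute in parallel, one tile per processor.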

Page 9

MinK in Chapel

var minimums: [1..10] integer;
minimums = mink(integer, 10) reduce A;  // mink is called for each element of A

Page 10

HTA

Page 11

Session 3: Performance Characterization

“Performance characterization of bio-molecular simulations using molecular dynamics”, Sadaf Alam, Pratul Agarwal, Al Geist, Jeffrey Vetter (ORNL)

• Investigated performance bottlenecks in MD applications on supercomputers

• Found that the implementations of the algorithms are not scalable

“On-line Automated Performance Diagnosis on Thousands of Processors”, Philip C. Roth (ORNL), Barton P. Miller (U. of Wisconsin, Madison)

• Distributed and scalable performance analysis tool

• Can analyze large applications with 1024 processes and present the results in a folded graph.

Page 12

Session 3: Performance Characterization

“A Case Study in Top-Down Performance Estimation for a Large-Scale Parallel Application”, Ilya Sharapov, Robert Kroeger, Guy Delamarter (Sun Microsystems)

• Performance estimation of HPC workloads on future architectures

• Based on low-level analysis and scalability predictions.

• Predicts the performance of Gyrokinetic Toroidal Code executed on Sun’s future architectures

Page 13

Session 4: Shared Memory Parallelism

“Hardware Profile-guided Automatic Page Placement for ccNUMA Systems”, Jaydeep Marathe, Frank Mueller (North Carolina State U.)

• Profiles memory accesses and places pages accordingly.

• 20% performance improvement and 2.7% overhead.

“Adaptive Scheduling with Parallelism Feedback”, Kunal Agrawal, Yuxiong He, Wen Jing Hsu, Charles Leiserson (Mass. Inst. of Tech.)

• Allocates processors to jobs based on the past parallelism of the job.

• Uses an R-trimmed mean for the feedback.
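
An R-trimmed mean is a standard robust estimator: discard the R smallest and R largest samples, then average the rest (this definition is assumed here; the paper's exact formulation may differ). A minimal sketch:

```python
def r_trimmed_mean(samples, r):
    """Mean after discarding the r smallest and r largest samples.

    Sketch of the robust estimator used for the scheduler's
    parallelism feedback (definition assumed, not from the paper).
    """
    if 2 * r >= len(samples):
        raise ValueError("r too large for sample count")
    s = sorted(samples)
    trimmed = s[r:len(s) - r]
    return sum(trimmed) / len(trimmed)

# A transient spike (100) no longer dominates the parallelism estimate:
print(r_trimmed_mean([4, 5, 6, 5, 100], r=1))  # → 5.333...
```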

Page 14

Session 4: Shared Memory Parallelism

“Predicting Bounds on Queuing Delay for Batch-scheduled Parallel Machines”, John Brevik, Daniel Nurmi, Rich Wolski (UCSB)

• Binomial Method Batch Predictor (BMBP), which bases its predictions on past wait times.

• Uses the 95th percentile; its predictions are close to the actual wait times experienced.
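
As a rough illustration (not BMBP itself, which additionally wraps the quantile in a binomial confidence bound), a quantile-based wait-time bound looks like:

```python
def percentile_bound(wait_times, q=0.95):
    # Empirical q-th percentile of observed queue waits -- a simplified
    # stand-in for BMBP, which derives a statistical confidence bound
    # on this quantile rather than using the raw sample order statistic.
    s = sorted(wait_times)
    idx = min(len(s) - 1, int(q * len(s)))
    return s[idx]

history = [3, 7, 12, 5, 40, 9, 6, 11, 8, 100]  # past waits, in minutes
print(percentile_bound(history))               # → 100
```

With few samples the empirical 95th percentile is simply the maximum observed wait, which is why the real method needs the confidence-bound machinery.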

“Optimizing Irregular Shared-Memory Applications for Distributed-Memory Systems”, Ayon Basumallik, Rudolf Eigenmann (Purdue)

• Converts OpenMP applications to MPI-based applications

• Uses an inspection loop to find non-local accesses and reorders loops.

Page 15

OpenMP-to-MPI

Page 16

Session 5: Atomicity Issues

“Proving Correctness of Highly-Concurrent Linearizable Objects”, Viktor Vafeiadis (U. of Cambridge), Maurice Herlihy (Brown U.), Tony Hoare (Microsoft Research Cambridge), Marc Shapiro (INRIA Rocquencourt & LIP6)

• Proves the safety of concurrent objects using Rely-Guarantee method

• Each thread’s rely condition must be satisfied, and each thread’s guarantee condition must imply the other threads’ rely conditions, for every operation.

“Accurate and Efficient Runtime Detection of Atomicity Errors in Concurrent Programs”, Liqiang Wang, Scott D. Stoller (SUNY at Stony Brook)

• Instruments the program to obtain profiles of memory accesses

• Builds a tree of the conflicting accesses and applies algorithms to check conflict- and view-equivalence.

Page 17

Session 5: Atomicity Issues

“Scalable Synchronous Queues”, William N. Scherer III (U. of Rochester), Doug Lea (SUNY Oswego), Michael L. Scott (U. of Rochester)

• Best Paper

• Details are coming up.
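
For context on what a synchronous queue is (a sketch of the handoff semantics only; the paper's contribution is scalable nonblocking implementations, not this lock-based toy):

```python
import threading

class SynchronousQueue:
    # Lock-based sketch of a synchronous (handoff) queue: put() blocks
    # until a taker has actually received the item. Illustrative only.
    def __init__(self):
        self._cond = threading.Condition()
        self._item = None
        self._full = False

    def put(self, item):
        with self._cond:
            while self._full:            # wait for any previous handoff
                self._cond.wait()
            self._item, self._full = item, True
            self._cond.notify_all()
            while self._full:            # block until a taker consumed it
                self._cond.wait()

    def take(self):
        with self._cond:
            while not self._full:
                self._cond.wait()
            item, self._item, self._full = self._item, None, False
            self._cond.notify_all()
            return item

q = SynchronousQueue()
results = []
t = threading.Thread(target=lambda: results.append(q.take()))
t.start()
q.put(42)     # returns only after the taker has the item
t.join()
print(results)  # → [42]
```

The single lock here is exactly the scalability bottleneck the paper removes.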

Page 18

Session 6: Multicore Software

“POSH: A TLS Compiler that Exploits Program Structure”, Wei Liu, James Tuck, Luis Ceze, Wonsun Ahn (UIUC), Karin Strauss, Jose Renau (UCSC), Josep Torrellas (UIUC)

• TLS (thread-level speculation) compiler that divides the program into tasks and prunes the inefficient ones

• Uses profiling to detect tasks that are likely to suffer frequent violations.

“High-performance IPv6 Forwarding Algorithm for Multi-core and Multithreaded Network Processors”, Hu Xianghui (U. of Sci. and Tech. of China), Xinan Tang (Intel), Bei Hua (U. of Sci. and Tech. of China)

• New IPv6 forwarding algorithm optimized for Intel NPU features

• Achieves 10Gbps speed for large routing tables with up to 400K entries.

Page 19

Session 6: Multicore Software

“MAMA! A Memory Allocator for Multithreaded Architectures”, Simon Kahan, Petr Konecny (Cray Inc.)

• A memory allocator that aggregates requests to reduce fragmentation

• Transforms contention to collaboration

• Experiments with micro-benchmarks show that it works

Page 20

Session 7: Transactional Memory

“A High Performance Software Transactional Memory System For A Multi-Core Runtime”, Bratin Saha, Ali-Reza Adl-Tabatabai, Richard L. Hudson (Intel), Chi Cao Minh, Ben Hertzberg (Stanford)

• Maps each memory location to a unique lock and acquires all the relevant locks before committing a transaction

• Undo-logging, write-locking/read versioning, cache-line conflict detection
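
A toy sketch of the per-location locking with undo logging described above (hypothetical Python, not Intel's actual runtime; read versioning and conflict detection are omitted):

```python
import threading

# Toy write-locking, undo-logging STM sketch (illustrative only).
# Each memory "location" (dict key) maps to its own lock; a real STM
# also versions reads and detects conflicts at cache-line granularity.

class ToySTM:
    def __init__(self):
        self.mem = {}
        self.locks = {}                      # one lock per location

    def _lock_for(self, key):
        return self.locks.setdefault(key, threading.Lock())

    def transaction(self, body):
        undo, held = {}, []
        try:
            def write(key, value):
                lock = self._lock_for(key)
                if lock not in held:
                    lock.acquire()           # lock on first write
                    held.append(lock)
                    undo[key] = self.mem.get(key)   # log the old value
                self.mem[key] = value
            body(write)                      # run the transaction body
        except Exception:
            for key, old in undo.items():    # abort: roll back via undo log
                if old is None:
                    self.mem.pop(key, None)
                else:
                    self.mem[key] = old
            raise
        finally:
            for lock in held:                # commit or abort: release
                lock.release()

stm = ToySTM()
stm.transaction(lambda write: write("x", 1))
print(stm.mem["x"])  # → 1
```

An aborted transaction restores every location it wrote before releasing its locks, which is the undo-logging discipline in miniature.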

“Exploiting Distributed Version Concurrency in a Transactional Memory Cluster”, Kaloian Manassiev, Madalin Mihailescu, Cristiana Amza (UofT)

• Transactional Memory system on commodity clusters for generic C++ and SQL applications

• Diffs are applied by readers on demand and may violate writers.

Page 21

Session 7: Transactional Memory

“Hybrid Transactional Memory”, Sanjeev Kumar (Intel), Michael Chu (U. of Mich.), Christopher Hughes, Partha Kundu, Anthony Nguyen (Intel)

• Hardware and Software TM together

• Extends DSTM

• Conflict detection is based on loading and storing the state field of the object wrapper and the locator field.

Page 22

Session 8: Potpourri

“Fast and Transparent Recovery for Continuous Availability of Cluster-based Servers”, Rosalia Christodoulopoulou, Kaloian Manassiev (UofT), Angelos Bilas (U. of Crete), Cristiana Amza (UofT)

• Recovery from failure on virtual shared memory systems

• Based on page replication on backup nodes

• Failure-free overhead of 38%; recovery cost is below 600 ms.

“Minimizing Execution Time in MPI Programs on an Energy-Constrained, Power-Scalable Cluster”, Rob Springer, David K. Lowenthal, Barry Rountree (The U. of Georgia), Vincent W. Freeh (North Carolina State U.)

• Finds the best combination of processor count and gear (CPU frequency/voltage setting) that minimizes execution time under an energy constraint.

• Found the optimal schedule for 50% of the programs while exploring only 7% of the search space.
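
The search space can be pictured with a brute-force sketch (the time and power models below are made-up stand-ins; the paper's algorithm avoids exhaustive search, visiting only a fraction of this space):

```python
# Exhaustive search over (processor count, gear) minimizing execution
# time under an energy budget. The time/power models are toy stand-ins:
# perfect speedup, and power cubic in frequency.

def best_schedule(max_procs, gears, energy_budget):
    best = None
    for p in range(1, max_procs + 1):
        for g in gears:                      # g = relative CPU frequency
            time = 100.0 / (p * g)           # toy: work scales perfectly
            power = p * g ** 3               # toy: per-node power ~ f^3
            energy = time * power
            if energy <= energy_budget and (best is None or time < best[0]):
                best = (time, p, g)
    return best

print(best_schedule(max_procs=8, gears=[0.6, 0.8, 1.0],
                    energy_budget=500))      # → (12.5, 8, 1.0)
```

Even this toy model shows the trade-off the paper exploits: lowering the gear cuts energy roughly cubically while stretching time only linearly.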

Page 23

Session 8: Potpourri

“Teaching parallel computing to science faculty: best practices and common pitfalls”, David Joiner (Kean U.), Paul Gray (U. of Northern Iowa), Thomas Murphy (Contra Costa College), Charles Peck (Earlham College)

• Experience in teaching parallel programming in a community college

Page 24

Keynote Speeches & Panel

“Parallel Programming and Code Selection in Fortress”, Guy L. Steele Jr., Sun Fellow, Sun Microsystems Laboratories

“Parallel Programming in Modern Web Search Engines”, Raymie Stata, Chief Architect for Search & Marketplace, Yahoo!, Inc.

“Software Issues for Multicore Systems”, Moderator: James Larus, (Microsoft Research), Panelists: Saman Amarasinghe (MIT), Richard Brunner (AMD), Luddy Harrison (UIUC), David Kuck (Intel), Michael Scott (U. Rochester), Burton Smith (Microsoft), Kevin Stoodley (IBM)

Page 25

Guy L. Steele: “Parallel Programming and Code Selection in Fortress”

To do for Fortran what Java did for C

• Dynamic compilation

• Platform independence

• Security model including type checking

Research funded in part by DARPA through its High Productivity Computing Systems program

Don't build the language—grow it

Make programming notation closer to math

Ease use of parallelism

Can a feature be provided by a library rather than by the compiler?

Programmers (especially library writers) need not fear subroutines, functions, methods, and interfaces for performance reasons

Page 26

Guy L. Steele: “Parallel Programming and Code Selection in Fortress”

Type System: Objects and Traits

• Traits: like interfaces, but may contain code

Primitive types are first-class

• Booleans, integers, floats, characters are all objects

Transactional access to shared variables

Fortress “loops” are parallel by default

Programming language notation can become closer to mathematical notation

Page 27

Guy L. Steele: “Parallel Programming and Code Selection in Fortress”

Page 28

Panel: Software Issues for Multicore Systems

Performance Conscious Languages

• Languages that increase programmer productivity while making it easier to optimize

New Compiler Opportunities

• New languages that take performance seriously

• Possible compiler support for using multicores for purposes other than parallelism

• Security Enforcement

• Program Introspection

Meanwhile, the vast majority of application programmers have no idea about parallelism

More dual-core chips in mid-2006, quad-core in 2007 (AMD)

Software architecture challenges (debugging, profiling, making multi-threading easier, etc.)

Page 29

Panel: Software Issues for Multicore Systems

Some Successes in Using Multi-Core (OS support, transactional memory, virtualization, efficient JVMs)

Parallel software systems must be much simpler, architecturally, than sequential ones if they have a chance of holding together

We will struggle before finally accepting that the cache abstraction does not scale

• Efficient point-to-point communication is required

• Most success will be achieved on nonstandard multicore platforms like graphics processors, network processors, signal processors, where there is less investment in caches.

We need new apps to drive the interest towards multicores

Where will the parallelism come from? (dataflow, reduce/map/scan, speculative parallelization, etc.)

Page 30

Panel: Software Issues for Multicore Systems

The explicit sacrifice of single-thread performance in favor of parallel performance

Most vulnerable communities

• Those who have not previously been exposed to, or had a need for, parallel systems, for example:

• Typical client software, mobile devices

• Server transactions with significant internal complexity

• Those who chronically need to drive the maximum performance from their computer systems, for example:

• High performance computing

• Gamers

Above eight cores, we do not know whether multicores will be useful

Page 31

Readings For Future CARG

“Optimizing Irregular Shared-Memory Applications for Distributed-Memory Systems”, Ayon Basumallik, Rudolf Eigenmann (Purdue)

“POSH: A TLS Compiler that Exploits Program Structure”, Wei Liu, James Tuck, Luis Ceze, Wonsun Ahn (UIUC), Karin Strauss, Jose Renau (UCSC), Josep Torrellas (UIUC)

“MAMA! A Memory Allocator for Multithreaded Architectures”, Simon Kahan, Petr Konecny (Cray Inc.)

“Hybrid Transactional Memory”, Sanjeev Kumar (Intel), Michael Chu (U. of Mich.), Christopher Hughes, Partha Kundu, Anthony Nguyen (Intel)