CONCURRENCY AND COMPUTATION: PRACTICE AND EXPERIENCE
Concurrency Computat.: Pract. Exper. 2012; 24:443–444
Published online 18 October 2011 in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/cpe.1852

Special Issue: Compilers for Parallel Computing (CPC 2010)

This special issue of Concurrency and Computation: Practice and Experience contains selected papers from the 15th International Workshop on Compilers for Parallel Computing. Compilers for Parallel Computing (CPC) 2010 was held on 7–9 July 2010, at Vienna University of Technology, Austria.

Compilers for Parallel Computing is a workshop held every 18 months as an opportunity for researchers in the area of parallel compilation to meet, present, and discuss their latest results. CPC welcomes presentation of work that is still in progress as well as new and emerging topics: the workshop covers all areas of parallelism, from explicitly parallel instruction sets to multicores, heterogeneous multi-processor systems, and large clusters. Any aspect of programming and optimization for these systems is of interest, including parallel programming models, languages, and runtimes; user-directed and automatic parallelization of programs; static and dynamic optimization; backend code generation; performance modeling, analysis, and tuning; and architectural models and architectural support for parallelization.

Since 1989, CPC workshops have been held in Oxford, Paris, Vienna, Delft, Málaga, Aachen, Linköping, Aussois, Edinburgh, Amsterdam, Chiemsee, A Coruña, Lisbon, and Zürich.

At CPC 2010, a total of 28 papers were presented. Extended versions of five of these papers, as well as an article covering the invited talk given at the workshop, were selected for publication in this special issue. The selected papers cover a wide range of topics related to parallel programming: design of parallel applications, adaptive optimization in various system configurations, and language and library support to ease parallel programming on VLIW and multicore architectures.

• Parallel application design characterization with quantitative metrics by Alexander van Amesfoort, Ana Varbanescu, and Henk Sips [1] advocates a systematic, quantitative approach to the construction of parallel software. For a high-level decomposition of a problem into tasks, platform-independent metrics are computed: computational metrics describe local concurrency, arithmetic intensity, and memory footprint; communication metrics describe the size, count, and direction of data transfers; synchronization metrics measure the number of local and global synchronizations and conflicts. These metrics allow a quantitative comparison of problem decompositions already at the design stage, so that developers can choose a design appropriate for their performance needs.
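
As a rough illustration of this kind of design-stage bookkeeping, the sketch below tabulates a few per-task metrics in C and derives arithmetic intensity from them. The struct fields, example numbers, and formula are assumptions made here for illustration; they are not the metric definitions used in [1].

    /* Illustrative only: per-task design metrics and a derived arithmetic
       intensity, loosely following the categories described above. */
    #include <stdio.h>

    typedef struct {
        const char *name;     /* task in the problem decomposition   */
        double flops;         /* arithmetic operations performed     */
        double bytes_touched; /* memory footprint in bytes           */
        int    transfers;     /* number of inter-task data transfers */
        int    global_syncs;  /* global synchronization points       */
    } task_metrics;

    /* Arithmetic intensity: operations per byte of memory touched. */
    static double arithmetic_intensity(const task_metrics *t)
    {
        return t->bytes_touched > 0.0 ? t->flops / t->bytes_touched : 0.0;
    }

    int main(void)
    {
        /* Two hypothetical decompositions of the same problem. */
        task_metrics coarse = { "coarse-grain", 1e9, 4e8,   8,  2 };
        task_metrics fine   = { "fine-grain",   1e9, 4e8, 512, 64 };

        printf("%-12s intensity=%.2f transfers=%d syncs=%d\n", coarse.name,
               arithmetic_intensity(&coarse), coarse.transfers, coarse.global_syncs);
        printf("%-12s intensity=%.2f transfers=%d syncs=%d\n", fine.name,
               arithmetic_intensity(&fine), fine.transfers, fine.global_syncs);
        return 0;
    }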

• Compiler and runtime support for enabling reduction computations on heterogeneous systems by Vignesh Ravi, Wenjing Ma, David Chiu, and Gagan Agrawal [2] describes a single framework for automatically mapping generalized reduction computations onto a heterogeneous system consisting of a multi-core CPU and a GPU. Their system allows computational kernels to be expressed as sequential C functions with some annotations, without further regard for parallelism or the underlying architecture. Using program analysis, their compiler generates code targeting a runtime system for heterogeneous CPU–GPU systems with dynamic work distribution. For this important class of computational problems, the reported performance improvements validate the approach while placing very little burden on the programmer.
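
To give a flavour of this programming style, the sketch below shows a generalized reduction written as a plain sequential C function. The annotation form (ordinary comments here) and all names are invented for illustration and are not the actual syntax accepted by the framework in [2].

    /* Illustrative only: a generalized reduction written sequentially.
       A compiler/runtime like the one described could split the input
       across CPU cores and the GPU and merge per-worker accumulators. */
    #include <stdio.h>

    /* @reduction_object (hypothetical annotation) */
    typedef struct { double sum; long count; } accum_t;

    /* @reduction_step (hypothetical annotation): folds one element in. */
    static void reduce_element(accum_t *acc, double x)
    {
        acc->sum   += x;
        acc->count += 1;
    }

    /* The programmer writes only this sequential loop; work distribution
       between CPU and GPU would be handled by generated code. */
    static void reduce_all(accum_t *acc, const double *data, long n)
    {
        for (long i = 0; i < n; ++i)
            reduce_element(acc, data[i]);
    }

    int main(void)
    {
        double data[] = { 1.0, 2.0, 3.0, 4.0 };
        accum_t acc = { 0.0, 0 };
        reduce_all(&acc, data, 4);
        printf("mean = %f\n", acc.sum / (double)acc.count);
        return 0;
    }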

• Optimized composition of performance-aware parallel components by Christoph Kessler and Welf Löwe [3] presents a framework in which program components are associated with code that estimates a component's performance for a given input size and the available resources. Several implementations of the same operation are deployed together, and a runtime dispatcher can dynamically select the variant that is expected to perform best on a given problem. Dispatching is implemented using a static table computed at component deployment time and compressed using various techniques. The experiments show that this approach to adaptive optimization produces good performance results without explicit manual parallelization.
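
The dispatch-table idea can be pictured with the toy C sketch below, which selects a sort variant by problem size and core count. The variants, size thresholds, and table layout are invented here; in [3] the table is derived from the components' own performance-estimation code at deployment time and then compressed.

    /* Illustrative only: table-driven selection among deployed variants. */
    #include <stddef.h>
    #include <stdio.h>
    #include <stdlib.h>

    typedef void (*sort_variant)(int *data, size_t n);

    static int cmp_int(const void *a, const void *b)
    {
        int x = *(const int *)a, y = *(const int *)b;
        return (x > y) - (x < y);
    }

    /* Hypothetical variants; real components would differ in strategy
       (sequential, multithreaded, accelerator-offloaded, ...). */
    static void sort_sequential(int *d, size_t n) { qsort(d, n, sizeof *d, cmp_int); }
    static void sort_parallel  (int *d, size_t n) { qsort(d, n, sizeof *d, cmp_int); }
    static void sort_offload   (int *d, size_t n) { qsort(d, n, sizeof *d, cmp_int); }

    /* Toy dispatch table indexed by (size class, resource class). */
    static const sort_variant dispatch[3][2] = {
        { sort_sequential, sort_sequential },  /* small problems  */
        { sort_sequential, sort_parallel   },  /* medium problems */
        { sort_parallel,   sort_offload    },  /* large problems  */
    };

    static sort_variant select_variant(size_t n, int cores)
    {
        int size_class = n < 1024 ? 0 : (n < 1048576 ? 1 : 2);
        int res_class  = cores < 8 ? 0 : 1;
        return dispatch[size_class][res_class];
    }

    int main(void)
    {
        int data[] = { 5, 3, 9, 1 };
        size_t n = sizeof data / sizeof *data;
        select_variant(n, 4)(data, n);   /* call the expected-best variant */
        for (size_t i = 0; i < n; ++i)
            printf("%d ", data[i]);
        printf("\n");
        return 0;
    }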

• Asynchronous adaptive optimisation for generic data-parallel array programming by Clemens Grelck, Tom van Deurzen, Stephan Herhut, and Sven-Bodo Scholz [4] describes an aggressive adaptive optimizer for Single Assignment C (SAC), a data-parallel array programming language. The SAC programming model encourages highly generic array programming, leaving the sizes and, in many cases, even the ranks (number of dimensions) of arrays unknown at compile time. This genericity results in many expensive dynamic checks at runtime, which their new system handles using adaptive optimization. Frequently executed program parts are specialized at runtime, using a full-fledged optimizing compiler running on a dedicated core. Experimental results show good speedups with runtime specialization at minimal overhead.

• Compiler supports for VLIW DSP processors with SIMD intrinsics by Chi-Bang Kuan and Jenq-Kuen Lee [5] considers the difficult problem of programming a processor with a distributed register file. They propose a set of SIMD intrinsics and programming guidelines, along with a novel register allocation scheme, that allow efficient code to be generated without requiring hand-written assembly code. Using these intrinsics, the authors parallelized two DSP benchmark sets with impressive results. Additional speedups were obtained using their register allocator's automatic data replication.

• An object-oriented BSP library for multicore programming by Albert-Jan Yzelman and Rob Bisseling [6] covers the contents of Bisseling's invited talk. It presents a library implementing Bulk-Synchronous Programming (BSP) for shared-memory multicore systems. Good parallel speedups on several computational kernels demonstrate that the easy-to-use BSP model is applicable not only to the distributed-memory systems for which it was developed but also to smaller multicore machines.

ACKNOWLEDGEMENTS

We would like to take the opportunity to thank the authors of all papers and all other attendees for making the workshop a success, as well as the anonymous referees and John Wiley & Sons Ltd. for making the publication of this special issue possible.

REFERENCES

1. van Amesfoort A, Varbanescu A, Sips H. Parallel application design characterization with quantitative metrics. Concurrency and Computation: Practice and Experience. [DOI: 10.1002/cpe.1882]

2. Ravi V, Ma W, Chiu D, Agrawal G. Compiler and runtime support for enabling reduction computations on heterogeneous systems. Concurrency and Computation: Practice and Experience. [DOI: 10.1002/cpe.1848]

3. Kessler C, Löwe W. Optimized composition of performance-aware parallel components. Concurrency and Computation: Practice and Experience. [DOI: 10.1002/cpe.1844]

4. Grelck C, van Deurzen T, Herhut S, Scholz S-B. Asynchronous adaptive optimisation for generic data-parallel array programming. Concurrency and Computation: Practice and Experience. [DOI: 10.1002/cpe.1842]

5. Kuan C-B, Lee JK. Compiler supports for VLIW DSP processors with SIMD intrinsics. Concurrency and Computation: Practice and Experience. [DOI: 10.1002/cpe.1845]

6. Yzelman A-J, Bisseling R. An object-oriented BSP library for multicore programming. Concurrency and Computation: Practice and Experience. [DOI: 10.1002/cpe.1843]

ANDREAS KRALL

Institute of Computer Languages
Vienna University of Technology

Wien, Austria

GERGÖ BARANY

Institute of Computer Languages
Vienna University of Technology

Wien, Austria

Copyright © 2011 John Wiley & Sons, Ltd.
Concurrency Computat.: Pract. Exper. 2012; 24:443–444
DOI: 10.1002/cpe