OpenMP to CUDA

Mapping OpenMP to the Stream Programming Model

Hu Ming, Zhang Fangzhou, Yue Kun

Objective

1. Study how the parallel mechanisms in OpenMP map to the stream programming model (CUDA).

2. Point out which parts are suitable for translation.

3. Analyze typical scientific applications.

Outline

OpenMP vs CUDA: Execution Model

OpenMP vs CUDA: Semantics

OpenMP vs CUDA: Performance

Analysis of Benchmarks

OpenMP vs CUDA Execution Model

OpenMP vs CUDA Semantics

Parallel Construct: parallel

Worksharing Constructs: loop, sections, single

Master and Synchronization Constructs: critical, barrier, taskwait, atomic, flush, ordered

Data Environment: shared, private, firstprivate, lastprivate, reduction, copyin, copyprivate
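To make the data environment clauses above concrete, here is a minimal sketch in C. The function name `sum_with_clauses` and its parameters are hypothetical and do not come from the slides. It only illustrates what `firstprivate`, `lastprivate`, and `reduction` do on a simple loop. If the compiler is run without OpenMP support, the pragma is ignored and the loop runs serially with the same result.

```c
/* Sums the values 1..n and reports the square computed in the last iteration.
   Hypothetical helper for illustration only. Assumes n >= 1. */
long sum_with_clauses(int n, long *last_square)
{
    long sum = 0;    /* reduction(+:sum): per-thread partial sums are combined at the end */
    long sq = 0;     /* lastprivate(sq): the value from the sequentially last iteration survives */
    int offset = 1;  /* firstprivate(offset): each thread gets a private copy initialized to 1 */
    int i;

    #pragma omp parallel for firstprivate(offset) lastprivate(sq) reduction(+:sum)
    for (i = 0; i < n; i++) {
        long v = i + offset;  /* v is private because it is declared inside the loop body */
        sq = v * v;
        sum += v;
    }

    *last_square = sq;
    return sum;
}
```

Calling `sum_with_clauses(10, &sq)` adds the values 1 through 10 and leaves the square from the final iteration in `sq`.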

#include <omp.h>

int main()
{
    int x;
    x = 0;

    #pragma omp parallel shared(x)
    {
        /* only one thread at a time may execute the update */
        #pragma omp critical
        x = x + 1;
    } /* end of parallel section */

    return 0;
}
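The protected update in the critical example is a single scalar increment, so it could also be written with `atomic` or `reduction`, both of which appear in the construct lists above. The sketch below shows the two variants. The function names are hypothetical, and the slides do not say which form a translator would choose. CUDA does provide atomic operations such as `atomicAdd`, which is one reason an atomic update is a plausible candidate for mapping.

```c
/* Variant 1: each iteration performs one atomic increment on the shared counter. */
int count_with_atomic(int n)
{
    int x = 0;
    int i;

    #pragma omp parallel for shared(x)
    for (i = 0; i < n; i++) {
        #pragma omp atomic
        x += 1;
    }
    return x;
}

/* Variant 2: each thread accumulates privately and the partial counts are summed. */
int count_with_reduction(int n)
{
    int x = 0;
    int i;

    #pragma omp parallel for reduction(+:x)
    for (i = 0; i < n; i++) {
        x += 1;
    }
    return x;
}
```

Both functions return `n`. The reduction form avoids contention on a single memory location, which tends to matter more as the thread count grows.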

#pragma omp for ordered [clauses...]
    (loop region)
#pragma omp ordered
    structured_block
    (end of loop region)
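As a concrete instance of the `ordered` syntax above, the following sketch computes squares in parallel but appends them to an output array in sequential iteration order. The function `ordered_squares` and its parameters are hypothetical names chosen for illustration.

```c
/* Writes i*i for i = 0..n-1 into out, in iteration order.
   Returns the number of elements written. Hypothetical example. */
int ordered_squares(int n, int *out)
{
    int next = 0;  /* shared write position, touched only inside the ordered block */
    int i;

    #pragma omp parallel for ordered schedule(static, 1)
    for (i = 0; i < n; i++) {
        int sq = i * i;  /* this part may run concurrently across threads */

        #pragma omp ordered
        {
            out[next] = sq;  /* this block runs in sequential iteration order */
            next++;
        }
    }
    return next;
}
```

The ordered block serializes part of every iteration, so the achievable parallelism depends on how much work sits outside it.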

Most of the directives and clauses can be mapped onto stream programs.

OpenMP vs CUDA Performance

CUDA: lightweight hardware threads; data-centric processing model; simple control logic; inefficient at handling branches.

OpenMP: OS-level threads; thread-centric parallel processing model; individual threads can be complicated.

Map those constructs that have large parallelism and uniform processing among threads.
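A loop in which every iteration does the same work on a different element is the kind of construct described here. The sketch below shows such a loop in OpenMP form, followed by a plain-C emulation of how a stream-style launch might index the same work. This is an illustration under stated assumptions, not the translation scheme from the slides: the names `saxpy_omp`, `saxpy_emulated_grid`, `block_dim`, `block_idx`, and `thread_idx` are hypothetical, and the two nested loops merely stand in for a grid of blocks and the threads in each block.

```c
/* OpenMP form: large parallelism, identical work per iteration. */
void saxpy_omp(int n, float a, const float *x, float *y)
{
    int i;

    #pragma omp parallel for
    for (i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}

/* Plain-C emulation of a block/thread decomposition of the same loop.
   The innermost body is what a single lightweight thread would execute. */
void saxpy_emulated_grid(int n, float a, const float *x, float *y, int block_dim)
{
    int grid_dim = (n + block_dim - 1) / block_dim;  /* ceil(n / block_dim) blocks */
    int block_idx, thread_idx;

    for (block_idx = 0; block_idx < grid_dim; block_idx++) {
        for (thread_idx = 0; thread_idx < block_dim; thread_idx++) {
            int i = block_idx * block_dim + thread_idx;  /* global element index */
            if (i < n) {                                 /* guard for a partial last block */
                y[i] = a * x[i] + y[i];
            }
        }
    }
}
```

Both functions produce the same `y` for the same inputs. The only branch in the per-thread body is the bounds guard, which is taken uniformly by all threads except those in the last block.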

OpenMP vs CUDA Performance

Not suitable:

single, sections: small parallelism and different processing among threads

master: parallelism is 1

barrier, taskwait: require all threads to be grouped into one block

lastprivate: processing is not uniform among threads
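To illustrate why `sections` offers small parallelism and non-uniform processing, here is a minimal sketch with two sections that each run a different loop. The function `min_and_max` is a hypothetical example and is not taken from the slides. However many threads are available, at most two of them have work to do, and they execute different code.

```c
/* Finds the minimum and maximum of v[0..n-1] using two sections.
   Hypothetical example. Assumes n >= 1. */
void min_and_max(const int *v, int n, int *min_out, int *max_out)
{
    int lo = v[0];
    int hi = v[0];

    #pragma omp parallel sections
    {
        #pragma omp section
        {
            int i;
            for (i = 1; i < n; i++) {  /* one thread scans for the minimum */
                if (v[i] < lo) lo = v[i];
            }
        }
        #pragma omp section
        {
            int i;
            for (i = 1; i < n; i++) {  /* another thread scans for the maximum */
                if (v[i] > hi) hi = v[i];
            }
        }
    }

    *min_out = lo;
    *max_out = hi;
}
```

Each section writes a different variable, so the two scans do not race with each other.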

OpenMP vs CUDA

To understand whether it is reasonable to translate an OpenMP program into a CUDA program, we should analyze the application’s pattern.

Conclusion

1. A majority of scientific applications are suitable to be mapped to the stream programming model.

2. Heterogeneous architectures using both CPU and GPU will become more common.

Comments:

1. This paper’s work is mainly analysis.

2. We think more real applications should be considered, not just benchmarks.

3. Automatically translating OpenMP programs into CUDA programs may be possible.
