GLAF: A Visual Programming and Auto- Tuning Framework for...

Preview:

Citation preview

synergy.cs.vt.edu

GLAF: A Visual Programming and Auto-

Tuning Framework for Parallel Computing

Student: Konstantinos Krommydas

Collaborator: Dr. Ruchira Sasanka (Intel)

Advisor: Dr. Wu-chun Feng

synergy.cs.vt.edu

2

High-performance computing is crucial

in a broad range of scientific domains:

• Engineering, math, physics, biology, …

Parallel

programming

revolution has made

high-performance

computing

accessible to the

broader masses

Motivation

synergy.cs.vt.edu

Many different programming languages

Challenges

Many different computing platforms

Many different architectures

Many different optimization strategies

Domain experts should not (need to) know all these details

• but rather focus on their science

synergy.cs.vt.edu

Challenges

Domain experts need to collaborate with computer scientists

4

pseudocode

unoptimized/naïve

serial code

highly optimized

parallel code

Innovation

slow-down

Limited access to

parallel computing

• Communication overhead & errors

• Need to exchange domain-

specific/programming knowledge

synergy.cs.vt.edu

Contributions

• Realize a programming abstraction & development

framework for domain experts to provide a balance

between performance and programmability

– i.e., obtain fast performance from algorithms that have been

programmed easily

5

*GL

AF

auto-parallelizable, optimizable, tunable

intuitive, familiar, minimalistic syntax

data-visual and interactive

able to integrate with existing legacy code

Desired features

*Grid-based Language and Auto-tuning Framework

synergy.cs.vt.edu

Programming Using GLAF: a Simple Example

6

synergy.cs.vt.edu

7

Graphical User InterfaceComments

(in step,

statements,

grids, …)

Module

name

Function

name

Step number

(within function)

Click-based

interface

synergy.cs.vt.edu

8

Grid-Based Data Structures

• GLAF variables are based on the concept of grids:

• familiar abstraction (e.g., images, matrices,

spreadsheets)

• regular format that facilitates code generation,

optimizations, and parallelism detection

• Grid-based programming puts the focus on the

relation, rather than the implementation details

• Example:

• Scalar variable: 0D grid

• 1D array: one-dimensional grid

synergy.cs.vt.edu

9

Grid-Based Data Structures

Data typeDimension title

Grid name

synergy.cs.vt.edu

10

Grid-Based Data Structures

Different data

types across

a dimension

synergy.cs.vt.edu

11

Grid-Based Data Structures

synergy.cs.vt.edu

12

GLAF Infrastructure

Browser Web/Cloud

• GLAF Programming GUI

• Data visualization

• Fortran/C/JS/OpenCL code

generation

• Auto-parallelization

• Compilation/auto-tuning

script generation

• Data storage• Compilation

• Auto-tuning

• Execution

synergy.cs.vt.edu

13

Code Generation

synergy.cs.vt.edu

14

Code Generation

synergy.cs.vt.edu

15

Auto-TuningGenerates

platform-specific

binaries &

optimizationsSelects the

languages in

which to auto-

generate code Selects one or

more code “starting

points” for each

languageSelects

optimizations for

each combination

of language and

code “starting point”

synergy.cs.vt.edu

16

Auto-Tuning

synergy.cs.vt.edu

Multi-level

Auto-Tuning

Approach

17

Fortran C OpenCL

Serial GLAF-Parallel Compiler Parallel

GLAF

program

Data-layout

transformsLoop collapse

transforms

Loop interchange

transforms…

synergy.cs.vt.edu

18

Visualization

• Data visualization facilitates:

– understanding the algorithm being developed

– revealing bugs at an early stage

“Show Data”

synergy.cs.vt.edu

19

Visualization

• Data visualization facilitates:

– understanding the algorithm being developed

– revealing bugs at an early stage

“Colorize”

synergy.cs.vt.edu

20

Visualization

• Data visualization facilitates:

– understanding the algorithm being developed

– revealing bugs at an early stage

“Image Map”

synergy.cs.vt.edu

21

0

0.5

1

1.5

2

2.5

3

3.5

4

4.5

5

serial compiler-parallelized GLAF-parallelized(col(3)) GLAF-parallelized(col(1))

Speed-upoverserialfortran_C

PU

Implementa on

fortran_CPU

C_CPU_norestr

C_CPU_restr

fortran_XP

C_XP_norestr

C_XP_restr

Results: 3D finite difference algorithm

synergy.cs.vt.edu

22

0

5

10

15

20

25

serialSoA serialAoS compiler-parallelizedSoAcompiler-parallelizedAoS GLAF-parallelizedSoA GLAF-parallelizedAoS

Speed-upoverserialSoAfortran_CPU

Implementa on

fortran_CPU fortran_CPU_tmp C_CPU_norestr C_CPU_restr_tmp C_CPU_restr_tmp_powf

fortran_XP fortran_XP_tmp C_XP_norestr C_XP_restr_tmp C_XP_restr_tmp_powf

0

0.2

0.4

0.6

0.8

1

1.2

1.4

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

Results: N-body algorithm

synergy.cs.vt.edu

Related Work

23

Implicit Explicit

Parallelization

Compilers:

gcc, Intel,

Cray, PGI,

Extensions/

libraries:

Pthreads, OpenMP,

OpenACC, Chapel

+ Includes loop-level parallelism & data-level

parallelism (vectorization), and potential

optimizations

- Requires certain programming knowledge

- (Often) conservative nature

Problem

Solving

Environments

Domain-specific:

computational biology,

physics,

dense linear algebra,

Fast-fourier transforms,

+ High-performance auto-tuning

- Restrictive in nature

- Restrictive in terms of target

language and/or platform

Domain-

Specific

Languages

synergy.cs.vt.edu

Future Work

• Improve tool’s robustness

• Enabling more languages/extensions:

– OpenCL, OpenACC

– Support of distributed programming (MPI)

• Dynamic feedback/advice on parallelism issues

• Extend auto-tuning, auto-parallelization/auto-

vectorization capabilities

• Implement more dwarfs and provide back-end support

for common programming pitfalls in code generated

for supported languages

synergy.cs.vt.edu

Conclusion

GLAF targets domain experts and provides a fine balance between

performance and programmability

• auto-parallelization, optimization and auto-tuning

• helps avoid common programming pitfalls

GLAF allows systematic generation

of multiple starting points for

different

languages/platforms/optimizations

Different seeds in a state-

space search algorithm

Analogous to:In summary:

“GLAF: A Visual Programming and Auto-Tuning

Framework for Parallel Computing”Krommydas, Sasanka, Feng

Leads to overall better performance Global vs. local minimum

Recommended