Julia: A Fresh Approach to GPU Computing
What is Julia?
Technical computing language
High-level like Python
Performance of C
function mandel(z)
    c = z
    maxiter = 80
    for n = 1:maxiter
        if abs(z) > 2
            return n-1
        end
        z = z^2 + c
    end
    return maxiter
end
Julia for GPU programming
What is GPU programming?
High-level
Low-level
TensorFlow, Keras
ArrayFire, Thrust
cuBLAS, cuDNN
CUB, MGPU
CUDA C
Flux.jl, Knet.jl
GPUArrays.jl
CuArrays.jl
CUDAnative.jl
Programming with libraries
cuBLAS.jl
cuFFT.jl
cuRAND.jl
cuSPARSE.jl
cuDNN.jl
cuSOLVER.jl
CuArrays.jl
a = CuArray(Float32,2)
b = curand(Float32, 2)
a*a
fft(a)
qrfact(a)
softmax(a)
Programming with kernels
Much harder!
Designed for performance
Multiple dispatch
function foo(x)
    if isa(x, Int64)
        …
    elseif isa(x, Float64)
        …
    end
end
foo(x::Int64) = …
foo(x::Float64) = …
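A minimal sketch of dispatch in action (the method bodies and return strings here are illustrative, not from the slides): Julia selects the method from the run-time type of the argument.

```julia
# Two methods of one generic function; Julia dispatches on the argument type.
foo(x::Int64)   = "got an Int64"
foo(x::Float64) = "got a Float64"

foo(3)    # selects the Int64 method  → "got an Int64"
foo(3.0)  # selects the Float64 method → "got a Float64"
```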
Designed for performance
Type inference
function sigmoid(x)
    temp = exp(-x)
    return (1 / (1+temp))
end
Designed for performance
Type inference
function sigmoid(x::Int)
    temp = exp(-x)::Float64
    return (1 / (1+temp))::Float64
end
Designed for performance
Type inference
function sigmoid(x::Float32)
    temp = exp(-x)::Float32
    return (1 / (1+temp))::Float32
end
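The inferred types annotated above can be checked directly; a quick sketch using the standard introspection helper `Base.return_types` on the generic definition:

```julia
# Generic sigmoid; Julia infers a concrete return type for each argument type.
sigmoid(x) = 1 / (1 + exp(-x))

# The same source specializes per input type:
Base.return_types(sigmoid, (Float32,))  # → [Float32]
Base.return_types(sigmoid, (Int,))      # → [Float64]
```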
Machine-native types
Designed for performance
Multiple dispatch
Type inference
Machine-native types
Specializing JIT compiler
High-quality stand-alone machine code
Extensible language
Source → AST → Julia IR → LLVM IR → Machine code
Inspect & Inject
Configure
Source
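Each stage of that pipeline can be inspected interactively with Julia's standard reflection macros (a minimal sketch):

```julia
f(x) = x + 1

# Julia IR (lowered form) of the specialization f(::Int):
ci = @code_lowered f(1)

# The later stages are available the same way:
# @code_llvm f(1)     # LLVM IR
# @code_native f(1)   # machine code
```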
How does it look?
function vadd(a, b, c)
    i = threadIdx().x
    c[i] = a[i] + b[i]
    return
end
a = CuArray(randn(2,2))
b = CuArray(randn(2,2))
c = similar(a)
@cuda threads=4 vadd(a,b,c)
No DSL, no subset, just Julia
CUDA abstraction level
Performance parity
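Because the kernel body is plain Julia, the same indexing logic can be exercised on the CPU. A sketch in which the loop index stands in for `threadIdx().x` (this is a simulation, not the CUDA intrinsic):

```julia
# CPU sketch: run the kernel body once per simulated thread index.
function vadd_cpu!(a, b, c)
    for i in eachindex(c)   # stands in for threadIdx().x across the 4 threads
        c[i] = a[i] + b[i]
    end
    return c
end

a = randn(2, 2); b = randn(2, 2); c = similar(a)
vadd_cpu!(a, b, c)
c == a .+ b  # true
```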
How does it run?
How does it work?
function vadd(a, b, c)
    i = threadIdx().x
    c[i] = a[i] + b[i]
    return
end
a = CuArray(randn(2,2))
b = CuArray(randn(2,2))
c = similar(a)
@cuda threads=4 vadd(a,b,c)
vadd → vadd CuArray{Float64,2} → LLVM IR → PTX
How does it work?
vadd → LLVM IR → PTX
Run time JIT compiler
Fully transparent
No overhead!
vadd CuArray{Float64,2} → LLVM IR → PTX
vadd CuArray{Int32,3} → LLVM IR → PTX
High-level GPU programming
( (a .+ b) ./ d ) .- e
Great performance
Clean & concise
Generic code
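On CPU arrays the same dotted expression works unchanged, with all three elementwise operations fusing into a single loop. A small worked example (the values are chosen for illustration):

```julia
a = [1.0, 2.0]
b = [3.0, 4.0]
d = [2.0, 2.0]
e = [0.5, 0.5]

# .+ , ./ and .- fuse into one elementwise pass over the arrays.
r = ((a .+ b) ./ d) .- e   # → [1.5, 2.5]
```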
function vadd(a, b, c)
    i = threadIdx().x
    c[i] = a[i] + b[i]
    return
end
W = randn(2, 10)
b = randn(2)
f(x) = softmax(W * x .+ b)
model = Chain(
    Dense(10, 5, σ),
    Dense(5, 2),
    softmax)
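For completeness, a self-contained sketch of the `f(x) = softmax(W * x .+ b)` layer, with a plain-Julia `softmax` standing in for the Flux/NNlib one:

```julia
# Plain-Julia softmax (stand-in for NNlib.softmax), shifted for numerical stability.
softmax(x) = (y = exp.(x .- maximum(x)); y ./ sum(y))

W = randn(2, 10)
b = randn(2)
f(x) = softmax(W * x .+ b)

p = f(randn(10))
sum(p)  # ≈ 1.0: a probability vector over the 2 outputs
```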
From GPU kernels, to differentiable algorithms, to high-level layer stacking. All on one platform.
Pkg.add("Flux")
Differential Equations
Machine Learning
CUDA
Automatic Differentiation
Everything You Build
The Julia Magic
Everything just works with everything else!
All the HPC Tooling
Generic programming is extremely powerful.
StructOfArrays.jl, DistributedArrays.jl, JuliaDB.jl & DataFrames.jl
Deep Learning
Differential Equations
Operations Research
function model(tree)
    if isleaf(tree)
        tree.value
    else
        model(tree.left) + model(tree.right)
    end
end
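A runnable sketch of this recursive model with a hypothetical leaf/node representation (the slide leaves `isleaf` and the tree type unspecified):

```julia
# Hypothetical tree types; the slide does not fix a representation.
struct Leaf;  value::Float64;  end
struct Node;  left; right;     end

isleaf(t) = t isa Leaf

function model(tree)
    if isleaf(tree)
        tree.value
    else
        model(tree.left) + model(tree.right)
    end
end

model(Node(Leaf(1.0), Node(Leaf(2.0), Leaf(3.0))))  # → 6.0
```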
Case Studies
JuliaCon – 300 Attendees, 150 Talks
https://github.com/JuliaGPU/
NVIDIA Parallel Forall blog
https://github.com/FluxML/
Julia