Julia: A Fresh Approach to GPU Computing
What is Julia?
Technical computing language
High-level like Python
Performance of C
function mandel(z)
    c = z
    maxiter = 80
    for n = 1:maxiter
        if abs(z) > 2
            return n-1
        end
        z = z^2 + c
    end
    return maxiter
end
Julia for GPU programming
What is GPU programming?
High-level
Low-level
TensorFlow, Keras
ArrayFire, Thrust
cuBLAS, cuDNN
CUB, MGPU
CUDA C
Flux.jl, Knet.jl
GPUArrays.jl
CuArrays.jl
CUDAnative.jl
Programming with libraries
cuBLAS.jl
cuFFT.jl
cuRAND.jl
cuSPARSE.jl
cuDNN.jl
cuSOLVER.jl
CuArrays.jl
a = CuArray(Float32,2)
b = curand(Float32, 2)
a*a
fft(a)
qrfact(a)
softmax(a)
Programming with kernels
Much harder!
Designed for performance
Multiple dispatch
function foo(x)
    if isa(x, Int64)
        …
    elseif isa(x, Float64)
        …
    end
end
foo(x::Int64) = …
foo(x::Float64) = …
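A minimal sketch of dispatch in action (the method bodies and return strings here are illustrative, not from the slides): Julia selects the method from the run-time type of the argument.

```julia
# Two methods of one generic function; Julia dispatches on the argument type.
foo(x::Int64)   = "got an Int64"
foo(x::Float64) = "got a Float64"

foo(3)    # selects the Int64 method  → "got an Int64"
foo(3.0)  # selects the Float64 method → "got a Float64"
```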
Designed for performance
Type inference
function sigmoid(x)
    temp = exp(-x)
    return (1 / (1+temp))
end
Designed for performance
Type inference
function sigmoid(x::Int)
    temp = exp(-x)::Float64
    return (1 / (1+temp))::Float64
end
Designed for performance
Type inference
function sigmoid(x::Float32)
    temp = exp(-x)::Float32
    return (1 / (1+temp))::Float32
end
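The inferred types annotated above can be checked directly; a quick sketch using the standard introspection helper `Base.return_types` on the generic definition:

```julia
# Generic sigmoid; Julia infers a concrete return type for each argument type.
sigmoid(x) = 1 / (1 + exp(-x))

# The same source specializes per input type:
Base.return_types(sigmoid, (Float32,))  # → [Float32]
Base.return_types(sigmoid, (Int,))      # → [Float64]
```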
Machine-native types
Designed for performance
Multiple dispatch
Type inference
Machine-native types
Specializing JIT compiler
High-quality stand-alone machine code
Extensible language
Source → AST → Julia IR → LLVM IR → Machine code
Inspect & Inject
Configure
Source
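Each stage of that pipeline can be inspected interactively with Julia's standard reflection macros (a minimal sketch):

```julia
f(x) = x + 1

# Julia IR (lowered form) of the specialization f(::Int):
ci = @code_lowered f(1)

# The later stages are available the same way:
# @code_llvm f(1)     # LLVM IR
# @code_native f(1)   # machine code
```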
How does it look?
function vadd(a, b, c)
    i = threadIdx().x
    c[i] = a[i] + b[i]
    return
end
a = CuArray(randn(2,2))
b = CuArray(randn(2,2))
c = similar(a)
@cuda threads=4 vadd(a,b,c)
No DSL, no subset, just Julia
CUDA abstraction level
Performance parity
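Because the kernel body is plain Julia, the same indexing logic can be exercised on the CPU. A sketch in which the loop index stands in for `threadIdx().x` (this is a simulation, not the CUDA intrinsic):

```julia
# CPU sketch: run the kernel body once per simulated thread index.
function vadd_cpu!(a, b, c)
    for i in eachindex(c)   # stands in for threadIdx().x across the 4 threads
        c[i] = a[i] + b[i]
    end
    return c
end

a = randn(2, 2); b = randn(2, 2); c = similar(a)
vadd_cpu!(a, b, c)
c == a .+ b  # true
```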
How does it run?
How does it work?
function vadd(a, b, c)
    i = threadIdx().x
    c[i] = a[i] + b[i]
    return
end
a = CuArray(randn(2,2))
b = CuArray(randn(2,2))
c = similar(a)
@cuda threads=4 vadd(a,b,c)
vadd → vadd CuArray{Float64,2} → LLVM IR → PTX
How does it work?
vadd → LLVM IR → PTX
Run time JIT compiler
Fully transparent
No overhead!
vadd CuArray{Float64,2} → LLVM IR → PTX
vadd CuArray{Int32,3} → LLVM IR → PTX
High-level GPU programming
( (a .+ b) ./ d ) .- e
Great performance
Clean & concise
Generic code
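On CPU arrays the same dotted expression works unchanged, with all three elementwise operations fusing into a single loop. A small worked example (the values are chosen for illustration):

```julia
a = [1.0, 2.0]
b = [3.0, 4.0]
d = [2.0, 2.0]
e = [0.5, 0.5]

# .+ , ./ and .- fuse into one elementwise pass over the arrays.
r = ((a .+ b) ./ d) .- e   # → [1.5, 2.5]
```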
function vadd(a, b, c)
    i = threadIdx().x
    c[i] = a[i] + b[i]
    return
end
W = randn(2, 10)
b = randn(2)
f(x) = softmax(W * x .+ b)
model = Chain(
    Dense(10, 5, σ),
    Dense(5, 2),
    softmax)
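For completeness, a self-contained sketch of the `f(x) = softmax(W * x .+ b)` layer, with a plain-Julia `softmax` standing in for the Flux/NNlib one:

```julia
# Plain-Julia softmax (stand-in for NNlib.softmax), shifted for numerical stability.
softmax(x) = (y = exp.(x .- maximum(x)); y ./ sum(y))

W = randn(2, 10)
b = randn(2)
f(x) = softmax(W * x .+ b)

p = f(randn(10))
sum(p)  # ≈ 1.0: a probability vector over the 2 outputs
```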
From GPU kernels, to differentiable algorithms, to high-level layer stacking. All on one platform.
Pkg.add("Flux")
Differential Equations
Machine Learning
CUDA
Automatic Differentiation
Everything You Build
The Julia Magic
Everything just works with everything else!
All the HPC Tooling
Generic programming is extremely powerful.
StructOfArrays.jl, DistributedArrays.jl, JuliaDB.jl & DataFrames.jl
Deep Learning
Differential Equations
Operations Research
function model(tree)
    if isleaf(tree)
        tree.value
    else
        model(tree.left) + model(tree.right)
    end
end
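A runnable sketch of this recursive model with a hypothetical leaf/node representation (the slide leaves `isleaf` and the tree type unspecified):

```julia
# Hypothetical tree types; the slide does not fix a representation.
struct Leaf;  value::Float64;  end
struct Node;  left; right;     end

isleaf(t) = t isa Leaf

function model(tree)
    if isleaf(tree)
        tree.value
    else
        model(tree.left) + model(tree.right)
    end
end

model(Node(Leaf(1.0), Node(Leaf(2.0), Leaf(3.0))))  # → 6.0
```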
Case Studies
JuliaCon – 300 Attendees, 150 Talks
https://github.com/JuliaGPU/
NVIDIA Parallel Forall blog
https://github.com/FluxML/
Julia