Operated by Los Alamos National Security, LLC for DOE/NNSA
DC Reviewed by Kei Davis
SKA – Static Kernel Analysis using LLVM IR
Kartik Ramkrishnan and Ben Bergen
Applied Computer Science (CCS-7)
Los Alamos National Laboratory
Operated by Los Alamos National Security, LLC for the U.S. Department of Energy's NNSA
SKA – Static Kernel Analyzer
SKA is a tool for improving the development process.
It performs static, architecture-aware analysis of kernels.
It outputs code metrics during development.
It visualizes code execution on the specified pipeline.
What is SKA
Slide 2
SKA-Enhanced Development Cycle
Slide 3
define i32 @main(i32 %argc, i8** nocapture %argv) nounwind uwtable readnone {
entry:
  %a1 = alloca [32 x float], align 4
  %b2 = alloca [32 x float], align 4
  %c3 = alloca [32 x float], align 4
  br label %"3"

"3":                                              ; preds = %"3", %entry
  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %"3" ]
  %0 = getelementptr [32 x float]* %a1, i64 0, i64 %indvars.iv
  %1 = load float* %0, align 4
  %2 = getelementptr [32 x float]* %b2, i64 0, i64 %indvars.iv
  %3 = load float* %2, align 4
  %4 = getelementptr [32 x float]* %c3, i64 0, i64 %indvars.iv
  %5 = load float* %4, align 4
  %6 = fmul float %3, %5
  %7 = fadd float %1, %6
  store float %7, float* %4, align 4
  %indvars.iv.next = add i64 %indvars.iv, 1
  %lftr.wideiv = trunc i64 %indvars.iv.next to i32
  %exitcond = icmp eq i32 %lftr.wideiv, 32
  br i1 %exitcond, label %"5", label %"3"

"5":                                              ; preds = %"3"
  ret i32 0
}
Example kernel – saxpy.ll
Slide 4
LLVM IR is in SSA (static single assignment) form, which assumes an unlimited number of virtual registers.
ISAs (instruction set architectures) have a limited number of registers.
We improve SKA’s fidelity by allocating registers to the IR based on the target ISA.
Register allocation support for SKA
Slide 5
Simple register allocation algorithm.
Register Allocation algorithm
Slide 6
Build Liveness Tables
Slide 7
SKA takes an LLVM IR module as input and builds a liveness table.
Build Liveness Tables
Slide 8
Partial liveness table for saxpy.ll
Build Liveness Tables
Slide 9
Top level loop
Single BB liveness calculation
Populate liveness table
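The slide's code is not preserved in this transcript. As a rough illustration of the single-basic-block step, the sketch below scans the saxpy.ll loop body backwards and records which values are live after each instruction. The tuple form (defined value, used values), the function name block_liveness, and the choice of live-out set are assumptions made for this example, not SKA's actual data structures.

```python
# Hypothetical instruction form: (defined_value_or_None, [used_values]).
# This is the loop body of saxpy.ll, without the exit test and branch.
LOOP_BODY = [
    ("%0", ["%a1", "%indvars.iv"]),            # getelementptr
    ("%1", ["%0"]),                            # load
    ("%2", ["%b2", "%indvars.iv"]),            # getelementptr
    ("%3", ["%2"]),                            # load
    ("%4", ["%c3", "%indvars.iv"]),            # getelementptr
    ("%5", ["%4"]),                            # load
    ("%6", ["%3", "%5"]),                      # fmul
    ("%7", ["%1", "%6"]),                      # fadd
    (None, ["%7", "%4"]),                      # store
    ("%indvars.iv.next", ["%indvars.iv"]),     # add
]


def block_liveness(instrs, live_out):
    """Backward scan over one basic block.

    Returns (table, live_in), where table[i] is the set of values live
    immediately after instruction i and live_in is the set live on entry.
    """
    live = set(live_out)
    table = [None] * len(instrs)
    for i in range(len(instrs) - 1, -1, -1):
        dest, uses = instrs[i]
        table[i] = set(live)   # live after instruction i
        live.discard(dest)     # a definition ends the live range going backwards
        live.update(uses)      # every operand is live before the instruction
    return table, live


# Assumed live-out set: the arrays and the next induction value feed the
# following iteration.
table, live_in = block_liveness(
    LOOP_BODY, {"%a1", "%b2", "%c3", "%indvars.iv.next"}
)
```

A full analysis would iterate this step over all basic blocks until the live-in and live-out sets stop changing; the sketch covers only the inner step.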
Build Interference Graph
Slide 10
Traverse the liveness table to create the interference graph.
Build Interference Graph
Slide 11
Partial igraph for saxpy.ll
Build Interference Graph
Slide 12
Top level loop
Populate igraph
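One common way to turn a liveness table into an interference graph is to connect every pair of values that are live at the same point. The sketch below assumes the table is a list of sets and stores the graph as a dictionary of adjacency sets; both representations, and the name build_interference_graph, are illustrative choices rather than SKA's own.

```python
from itertools import combinations


def build_interference_graph(liveness_table):
    """Connect every pair of values that appear in the same live set."""
    graph = {}
    for live in liveness_table:
        for value in live:
            graph.setdefault(value, set())   # isolated values still get a node
        for a, b in combinations(sorted(live), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


# Small hand-written table (not SKA output): three program points.
example_table = [
    {"%0", "%indvars.iv"},
    {"%1", "%indvars.iv"},
    {"%1", "%2", "%indvars.iv"},
]
igraph = build_interference_graph(example_table)
```

Here %indvars.iv interferes with %0, %1 and %2, while %0 and %1 never overlap and so could share a register.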
Simplify Interference Graph
Slide 13
Populate a stack which records whether a register (node) is simple or not.
Simplify Interference Graph
Slide 14
Partial node stack for saxpy.ll
Simplify Interference Graph
Slide 15
Populate simple node stack
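The simplify step can be sketched in the style of a Chaitin graph-coloring allocator, which is an assumption here since the slide does not name the algorithm. A node is treated as simple when it has fewer than k neighbors, where k is the number of available registers; when no simple node remains, the highest-degree node is pushed and marked as not simple. The function name and the tie-breaking rule are hypothetical.

```python
def simplify(graph, k):
    """Return a stack of (node, is_simple) pairs, bottom of stack first."""
    work = {node: set(adj) for node, adj in graph.items()}   # private copy
    stack = []
    while work:
        simple = sorted(n for n in work if len(work[n]) < k)
        if simple:
            node, is_simple = simple[0], True
        else:
            # No node has degree < k: pick a potential spill (highest degree).
            node = max(sorted(work), key=lambda n: len(work[n]))
            is_simple = False
        stack.append((node, is_simple))
        for neighbor in work.pop(node):
            work[neighbor].discard(node)   # removing a node lowers its neighbors' degree
    return stack


# Toy graph: a triangle a-b-c plus d attached to a.
toy_graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
stack_k2 = simplify(toy_graph, 2)
stack_k3 = simplify(toy_graph, 3)
```

With two registers the triangle cannot be simplified, so one of its nodes is pushed as a potential spill; with three registers every node is simple.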
Assign ISA Registers to IR
Slide 16
Assign ISA Registers to IR
Slide 17
Assign ISA registers to IR values when there is no true spill. For each value we choose among int, float and vector registers.
Partial register allocation for saxpy.ll
Assign ISA registers to IR
Slide 18
Assign register if no true spill
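A minimal sketch of the assignment step, assuming a graph-coloring scheme: nodes are popped from the stack and each receives the first register of its class that no already-assigned neighbor holds. A node with no free register is a true spill. The register names, the reg_files and reg_class maps, and the function name are all hypothetical.

```python
def assign_registers(graph, stack, reg_files, reg_class):
    """Pop nodes and assign registers; return (assignment, true_spills)."""
    assignment = {}
    true_spills = []
    for node, _is_simple in reversed(stack):      # pop from the top of the stack
        taken = {assignment[n] for n in graph[node] if n in assignment}
        free = [r for r in reg_files[reg_class[node]] if r not in taken]
        if free:
            assignment[node] = free[0]
        else:
            true_spills.append(node)              # no register left: true spill
    return assignment, true_spills


# Toy input: a triangle a-b-c plus d attached to a, two integer registers.
toy_graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
toy_stack = [("d", True), ("a", False), ("b", True), ("c", True)]
toy_reg_files = {"int": ["r0", "r1"], "float": ["f0", "f1"]}
toy_reg_class = {"a": "int", "b": "int", "c": "int", "d": "int"}

assignment, true_spills = assign_registers(
    toy_graph, toy_stack, toy_reg_files, toy_reg_class
)
```

In this toy case a is a true spill because its neighbors b and c already hold both integer registers. A node marked not simple is only a potential spill; it still gets a register if one happens to be free when it is popped.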
Rewrite IR
Slide 19
The live range of %a1 is shown in red. It shrinks after the IR is rewritten.
Rewrite IR
Slide 20
Rewrite IR
Slide 21
Store instruction into stack
Load, use and store
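The two labels above suggest the usual spill rewrite: store the spilled value to a stack slot right after it is defined, and reload it into a short-lived temporary just before each use. The sketch below shows that pattern on a hypothetical (dest, opcode, operands) instruction form; the slot and temporary naming is invented for the example and is not SKA's.

```python
def rewrite_spill(instrs, spilled):
    """Insert a store after the definition and a reload before each use."""
    slot = f"slot({spilled})"          # hypothetical stack-slot name
    rewritten = []
    reloads = 0
    for dest, op, uses in instrs:
        new_uses = list(uses)
        if spilled in uses:
            tmp = f"{spilled}.reload{reloads}"
            reloads += 1
            rewritten.append((tmp, "load", [slot]))          # reload before the use
            new_uses = [tmp if u == spilled else u for u in uses]
        rewritten.append((dest, op, new_uses))
        if dest == spilled:
            rewritten.append((None, "store", [spilled, slot]))   # store after the def
    return rewritten


before = [
    ("%x", "add", ["%a", "%b"]),
    ("%y", "mul", ["%c", "%d"]),
    ("%z", "add", ["%x", "%y"]),
]
after = rewrite_spill(before, "%x")
```

After the rewrite, %x is live only between its definition and the store, and each reload temporary is live only up to the instruction that uses it, which is why the live range shrinks.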
Register allocation done!
Slide 22
The virtual architecture is specified in an XML file.
The file specifies logical units, the instructions they process, latencies, issue width …
Virtual architecture specification
Slide 23
Partial architecture example
Pipeline simulation
Slide 24
Pipeline simulation of saxpy.ll
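The simulator output itself is not preserved here. To give a sense of what a static pipeline model computes, the sketch below walks a dependent instruction chain through a simple in-order issue model and counts cycles and stalls. The latency table, the instruction tuples, and the model are assumptions made for illustration; SKA's actual simulator and latencies may differ.

```python
# Hypothetical latencies in cycles; a real run would take these from the
# architecture specification.
LATENCY = {"gep": 1, "load": 3, "fadd": 4, "store": 1}


def simulate_in_order(instrs, latency, issue_width=1):
    """In-order issue: an instruction waits until all its operands are ready."""
    ready = {}    # value name -> cycle at which its result is available
    cycle = 0     # cycle in which the next instruction may issue
    issued = 0    # instructions already issued in `cycle`
    stalls = 0    # cycles in which nothing issued
    finish = 0    # cycle at which the last result becomes available
    for dest, op, uses in instrs:
        operands_ready = max((ready.get(u, 0) for u in uses), default=0)
        # Move to the next cycle if the issue slots are used up, or if this
        # instruction cannot join the ones already issued in this cycle.
        if issued >= issue_width or (issued > 0 and operands_ready > cycle):
            cycle += 1
            issued = 0
        if operands_ready > cycle:           # wait for a producer to finish
            stalls += operands_ready - cycle
            cycle = operands_ready
        issued += 1
        done = cycle + latency[op]
        finish = max(finish, done)
        if dest is not None:
            ready[dest] = done
    return {"instructions": len(instrs), "cycles": finish, "stalls": stalls}


chain = [
    ("%0", "gep", ["%a1", "%indvars.iv"]),
    ("%1", "load", ["%0"]),
    ("%2", "fadd", ["%1", "%1"]),
    (None, "store", ["%2", "%0"]),
]
chain_stats = simulate_in_order(chain, LATENCY)

independent = [("%p", "gep", ["%a1"]), ("%q", "gep", ["%b2"])]
narrow = simulate_in_order(independent, LATENCY, issue_width=1)
wide = simulate_in_order(independent, LATENCY, issue_width=2)
```

For the dependent chain the model reports 9 cycles for 4 instructions, 5 of them stalls. The two independent instructions take 2 cycles at issue width 1 and 1 cycle at issue width 2.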
Skaview
Slide 25
Graphical visualization of saxpy.ll
SKA outputs useful metrics about the code.
Primitive statistics include basic performance counters, such as instructions, cycles and stalls.
Derived statistics are obtained from primitive statistics.
Code metrics
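As a small illustration of how derived statistics follow from primitive ones, the sketch below computes CPI (the metric reported in the results slides) from instruction and cycle counts. IPC and stall fraction are added as plausible examples; whether SKA reports them is an assumption, as are the dictionary keys.

```python
def derived_stats(primitive):
    """Compute derived metrics from primitive counters."""
    instructions = primitive["instructions"]
    cycles = primitive["cycles"]
    stalls = primitive["stalls"]
    return {
        "cpi": cycles / instructions,        # cycles per instruction
        "ipc": instructions / cycles,        # instructions per cycle
        "stall_fraction": stalls / cycles,   # share of cycles spent stalled
    }


# Made-up counters, not SKA results.
stats = derived_stats({"instructions": 8, "cycles": 10, "stalls": 2})
```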
Slide 26
CPI prediction is better after register allocation.
Results for residual.ll
Slide 27
No change in CPI prediction. Why?
Results for ef_operator.ll
Slide 28
SKA predicts CPI > 1.0 on KNC for single-threaded workloads.
Results for KNC (Knights Corner)
Slide 29
SKA now supports register allocation.
Register allocation improves SKA’s fidelity by 5-10% across three architectures for a compute-intensive benchmark.
Dynamic scheduling and cache models can further improve SKA fidelity.
Conclusion
Slide 30
Questions?
Thank You!
Slide 31