Operated by Los Alamos National Security, LLC for DOE/NNSA

DC Reviewed by Kei Davis

SKA – Static Kernel Analysis using LLVM IR

Kartik Ramkrishnan and Ben Bergen

Applied Computer Science (CCS-7)

Los Alamos National Laboratory

Operated by Los Alamos National Security, LLC for the U.S. Department of Energy's NNSA

SKA – Static Kernel Analyzer

SKA is a tool for improving the development process.

It performs static, architecture-aware analysis of kernels.

It outputs code metrics during the development process.

It visualizes the code execution on the specified pipeline.

What is SKA

SKA-Enhanced Development Cycle

define i32 @main(i32 %argc, i8** nocapture %argv) nounwind uwtable readnone {
entry:
  %a1 = alloca [32 x float], align 4
  %b2 = alloca [32 x float], align 4
  %c3 = alloca [32 x float], align 4
  br label %"3"

"3":                                              ; preds = %"3", %entry
  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %"3" ]
  %0 = getelementptr [32 x float]* %a1, i64 0, i64 %indvars.iv
  %1 = load float* %0, align 4
  %2 = getelementptr [32 x float]* %b2, i64 0, i64 %indvars.iv
  %3 = load float* %2, align 4
  %4 = getelementptr [32 x float]* %c3, i64 0, i64 %indvars.iv
  %5 = load float* %4, align 4
  %6 = fmul float %3, %5
  %7 = fadd float %1, %6
  store float %7, float* %4, align 4
  %indvars.iv.next = add i64 %indvars.iv, 1
  %lftr.wideiv = trunc i64 %indvars.iv.next to i32
  %exitcond = icmp eq i32 %lftr.wideiv, 32
  br i1 %exitcond, label %"5", label %"3"

"5":                                              ; preds = %"3"
  ret i32 0
}

Example kernel – saxpy.ll

LLVM IR is in SSA (static single assignment) form, which assumes an unlimited number of virtual registers.

ISAs (instruction set architectures) have a limited number of registers.

We improve SKA’s fidelity by allocating registers to the IR based on the target ISA.

Register allocation support for SKA

Simple register allocation algorithm.

Register Allocation algorithm

Build Liveness Tables

SKA takes an LLVM IR module as input and builds a liveness table.

Build Liveness Tables

Partial liveness table for saxpy.ll
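
As a rough illustration of what building a liveness table involves, the sketch below runs a backward pass over the loop body of saxpy.ll, treated as a single basic block. The tuple encoding of instructions and the name `block_liveness` are assumptions made for this example, not SKA's data structures, and the sketch ignores the loop back edge and the phi node.

```python
# Hypothetical encoding: (defined value or None, [used values]).
# The entries mirror part of the loop body "3" in saxpy.ll.
LOOP_BODY = [
    ("%0", ["%a1", "%indvars.iv"]),   # getelementptr
    ("%1", ["%0"]),                   # load
    ("%2", ["%b2", "%indvars.iv"]),   # getelementptr
    ("%3", ["%2"]),                   # load
    ("%4", ["%c3", "%indvars.iv"]),   # getelementptr
    ("%5", ["%4"]),                   # load
    ("%6", ["%3", "%5"]),             # fmul
    ("%7", ["%1", "%6"]),             # fadd
    (None, ["%7", "%4"]),             # store
]


def block_liveness(instructions, live_out=frozenset()):
    """Return, for each instruction, the set of values live just before it."""
    live = set(live_out)
    table = [None] * len(instructions)
    # Walk backwards: a definition ends a live range, a use starts one.
    for i in range(len(instructions) - 1, -1, -1):
        defined, used = instructions[i]
        live.discard(defined)
        live.update(used)
        table[i] = frozenset(live)
    return table


if __name__ == "__main__":
    for (defined, _), live_in in zip(LOOP_BODY, block_liveness(LOOP_BODY)):
        print(f"{defined or 'store':>6}: {sorted(live_in)}")
```

Each row of the printed table lists the values that need a register at that point, which is the information the later steps consume.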

Build Liveness Tables

[Figure labels: top-level loop; single basic block (BB) liveness calculation; populate liveness table]

Build Interference Graph

Traverse the liveness table to create the interference graph.

Build Interference Graph

Partial igraph for saxpy.ll
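
One way to picture the traversal: any two values that are live at the same program point cannot share a register, so they get an edge. The sketch below assumes the liveness table is a list of live sets; `LIVE_SETS` and `build_interference_graph` are illustrative names, and the four sets are hand-picked points from the saxpy.ll loop body rather than SKA output.

```python
from itertools import combinations

# Hypothetical liveness table: one set of simultaneously live values per
# program point (before the last load, the fmul, the fadd and the store).
LIVE_SETS = [
    {"%1", "%3", "%4"},
    {"%1", "%3", "%4", "%5"},
    {"%1", "%4", "%6"},
    {"%4", "%7"},
]


def build_interference_graph(live_sets):
    """Connect every pair of values that are live at the same point."""
    graph = {}
    for live in live_sets:
        for value in live:
            graph.setdefault(value, set())
        for a, b in combinations(sorted(live), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


if __name__ == "__main__":
    for node, neighbours in sorted(build_interference_graph(LIVE_SETS).items()):
        print(node, "interferes with", sorted(neighbours))
```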

Build Interference Graph

[Figure labels: top-level loop; populate igraph]

Simplify Interference Graph

Populate a stack that records whether each register (node) is simple or not.

Simplify Interference Graph

Partial node stack for saxpy.ll
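
The slides do not define "simple". A common reading, borrowed from Chaitin-style graph colouring and assumed here, is that a node is simple when it has fewer neighbours than there are registers, so it can always be coloured later. The sketch below follows that reading; `simplify`, `GRAPH` and the highest-degree spill heuristic are assumptions for illustration, not SKA's implementation.

```python
# Hypothetical interference graph for a few values of the saxpy.ll loop body.
GRAPH = {
    "%1": {"%3", "%4", "%5", "%6"},
    "%3": {"%1", "%4", "%5"},
    "%4": {"%1", "%3", "%5", "%6", "%7"},
    "%5": {"%1", "%3", "%4"},
    "%6": {"%1", "%4"},
    "%7": {"%4"},
}


def simplify(graph, num_registers):
    """Return a stack of (node, is_simple) pairs, last pushed on top."""
    # Work on a copy so the caller's graph is left untouched.
    remaining = {node: set(neigh) for node, neigh in graph.items()}
    stack = []
    while remaining:
        # A node counts as simple if it has fewer neighbours left than registers.
        simple = [n for n in sorted(remaining)
                  if len(remaining[n]) < num_registers]
        if simple:
            node, is_simple = simple[0], True
        else:
            # No simple node: push a spill candidate (highest degree here).
            node = max(sorted(remaining), key=lambda n: len(remaining[n]))
            is_simple = False
        for neighbour in remaining.pop(node):
            remaining[neighbour].discard(node)
        stack.append((node, is_simple))
    return stack


if __name__ == "__main__":
    for node, is_simple in simplify(GRAPH, num_registers=3):
        print(node, "simple" if is_simple else "spill candidate")
```

With three registers the four mutually interfering values cannot all be simple, so one of them is pushed as a spill candidate; with four registers every node is simple.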

Simplify Interference Graph

[Figure labels: populate simple node stack]

Assign ISA Registers to IR

Assign ISA Registers to IR

Assign ISA registers to IR values if there is no true spill. For each value we choose among integer, floating-point and vector registers.

Partial register allocation for saxpy.ll
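
The assignment step can be sketched as popping nodes off the stack and picking a register that no already-assigned neighbour holds; when none is free, the value is a true spill. Everything named below (`REGISTER_FILES`, `VALUE_CLASS`, `assign_registers`, the register names and counts) is hypothetical and only meant to show the idea, including the choice among int, float and vector register files.

```python
# Hypothetical register files; real names and counts depend on the target ISA.
REGISTER_FILES = {
    "int": ["r0", "r1"],
    "float": ["f0", "f1", "f2"],
    "vector": ["v0", "v1"],
}

# Hypothetical inputs: interference graph, node stack (top is last) and the
# register class of each value (%4 is an address, the rest are floats).
GRAPH = {
    "%1": {"%3", "%4", "%5", "%6"},
    "%3": {"%1", "%4", "%5"},
    "%4": {"%1", "%3", "%5", "%6", "%7"},
    "%5": {"%1", "%3", "%4"},
    "%6": {"%1", "%4"},
    "%7": {"%4"},
}
STACK = ["%6", "%7", "%1", "%3", "%4", "%5"]
VALUE_CLASS = {"%1": "float", "%3": "float", "%4": "int",
               "%5": "float", "%6": "float", "%7": "float"}


def assign_registers(stack, graph, value_class, register_files=REGISTER_FILES):
    """Pop nodes off the stack and give each a register of its class.

    Returns (assignment, spills); a value lands in spills only when every
    register of its class is already held by a neighbour (a true spill).
    """
    assignment, spills = {}, []
    for node in reversed(stack):          # top of the stack first
        taken = {assignment[n] for n in graph[node] if n in assignment}
        free = [r for r in register_files[value_class[node]] if r not in taken]
        if free:
            assignment[node] = free[0]
        else:
            spills.append(node)
    return assignment, spills


if __name__ == "__main__":
    print(assign_registers(STACK, GRAPH, VALUE_CLASS))
```

Shrinking the float file to two registers in this example makes `%1` a true spill, which is the case the IR rewriting step has to handle.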

Assign ISA Registers to IR

[Figure labels: assign register if no true spill]

Rewrite IR

The live range of %a1 is shown in red. It shrinks after the IR is rewritten.

Rewrite IR
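
A minimal sketch of why rewriting shortens a live range, assuming the usual spill-code scheme: the spilled value is stored to a stack slot right after its definition and reloaded into a fresh temporary just before each use. The instruction tuples, the `.slot` and `.reload` names and `rewrite_for_spill` are invented for this example.

```python
# Hypothetical encoding: (destination or None, opcode, [operands]).
CODE = [
    ("%1", "load", ["%0"]),
    ("%3", "load", ["%2"]),
    ("%5", "load", ["%4"]),
    ("%6", "fmul", ["%3", "%5"]),
    ("%7", "fadd", ["%1", "%6"]),
]


def rewrite_for_spill(instructions, spilled):
    """Insert spill code for one value in a list of instruction tuples."""
    slot = f"{spilled}.slot"              # hypothetical stack slot name
    rewritten, reloads = [], 0
    for dest, opcode, operands in instructions:
        if spilled in operands:
            # Reload into a fresh temporary just before the use.
            reloads += 1
            temp = f"{spilled}.reload{reloads}"
            rewritten.append((temp, "load", [slot]))
            operands = [temp if op == spilled else op for op in operands]
        rewritten.append((dest, opcode, operands))
        if dest == spilled:
            # Save the value to its stack slot right after it is defined.
            rewritten.append((None, "store", [spilled, slot]))
    return rewritten


if __name__ == "__main__":
    for instruction in rewrite_for_spill(CODE, "%1"):
        print(instruction)
```

After the rewrite, `%1` is live only from its definition to the store that follows it, and the reloaded temporary is live only across the `fadd`.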

Rewrite IR

[Figure labels: store instruction into stack; load, use and store]

Register allocation done!

The virtual architecture is specified in an XML file.

The file specifies logical units, the instructions they process, latencies, issue width …

Virtual architecture specification

Partial architecture example
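
The actual SKA schema is not reproduced in these slides, so the following is only a guess at what such a specification could look like and how it might be read. The element and attribute names (`architecture`, `unit`, `instruction`, `issue_width`, `latency`) and all the numbers are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema, invented for this sketch; SKA's real format may differ.
ARCH_XML = """
<architecture name="example" issue_width="2">
  <unit name="alu">
    <instruction opcode="add" latency="1"/>
    <instruction opcode="icmp" latency="1"/>
  </unit>
  <unit name="fpu">
    <instruction opcode="fadd" latency="3"/>
    <instruction opcode="fmul" latency="5"/>
  </unit>
  <unit name="lsu">
    <instruction opcode="load" latency="4"/>
    <instruction opcode="store" latency="1"/>
  </unit>
</architecture>
"""


def load_architecture(xml_text):
    """Return (issue_width, {opcode: (unit name, latency)})."""
    root = ET.fromstring(xml_text)
    table = {}
    for unit in root.findall("unit"):
        for instr in unit.findall("instruction"):
            table[instr.get("opcode")] = (unit.get("name"),
                                          int(instr.get("latency")))
    return int(root.get("issue_width")), table


if __name__ == "__main__":
    width, table = load_architecture(ARCH_XML)
    print("issue width:", width)
    for opcode, (unit, latency) in sorted(table.items()):
        print(f"{opcode:>6} -> {unit} ({latency} cycles)")
```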

Pipeline simulation

Pipeline simulation of saxpy.ll

Skaview

Graphical visualization of saxpy.ll

SKA outputs useful metrics about the code.

Primitive statistics include basic performance counters, such as instructions, cycles and stalls.

Derived statistics are obtained from primitive statistics.

Code metrics
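
To make the primitive/derived distinction concrete, the sketch below computes a few derived statistics from the three primitive counters named above. Which derived statistics SKA actually reports is not stated here; CPI appears in the results slides, while IPC and the stall fraction are added as plausible examples, and the stall counter is assumed to be measured in cycles.

```python
def derived_statistics(instructions, cycles, stalls):
    """Compute a few derived metrics from primitive counters."""
    if instructions <= 0 or cycles <= 0:
        raise ValueError("instructions and cycles must be positive")
    return {
        "cpi": cycles / instructions,        # cycles per instruction
        "ipc": instructions / cycles,        # instructions per cycle
        "stall_fraction": stalls / cycles,   # assumes stalls are in cycles
    }


if __name__ == "__main__":
    # Made-up counter values, for illustration only.
    print(derived_statistics(instructions=100, cycles=150, stalls=30))
```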

CPI prediction is more accurate after register allocation.

Results for residual.ll

No change in CPI prediction. Why?

Results for ef_operator.ll

SKA predicts CPI > 1.0 on KNC for single-threaded workloads.

Results for KNC (Knights Corner)

SKA now supports register allocation.

Register allocation improves SKA’s fidelity by 5-10% across three architectures for a compute-intensive benchmark.

Dynamic scheduling and cache models can further improve SKA fidelity.

Conclusion

Questions?

Thank you!