18
Neural Network Assisted Tile Size Selection Mohammed Rahman, Louis-Noël Pouchet and P. Sadayappan Dept. of Computer Science and Engineering Ohio State University June 22, 2010 iWAPT 2010 Workshop Berkeley, USA

Neural Network Assisted Tile Size Selection › ~pouchet › doc › iwapt-slides.10.pdfAutomatic parametric tiling [ICS’09,CGO’10]: I Produce code where the tile dimensions are

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

  • Neural Network Assisted Tile Size Selection

    Mohammed Rahman, Louis-Noël Pouchet and P. Sadayappan

    Dept. of Computer Science and EngineeringOhio State University

    June 22, 2010

    iWAPT 2010 WorkshopBerkeley, USA

  • Introduction: iWAPT’10

    Overview

    Situation:I New advances in parametric tiling → more user code to be tunedI The problem of tile size selection is complex and unsolved!

    Our approach:I Use machine learning to create a performance predictor of tile size

    performance, for a specific programI Rely on the distribution shape to extract promising subspaces for

    empirical searchI Outcome: < 2% of the space traversed → 90+% of maximal speedup

    achieved

    Ohio State 2

  • Problem Statement: iWAPT’10

    Tiling

    I Tiling partition the computation into blocksI Note we consider only rectangular tiling hereI For tiling to be legal, such a partitioning must be legal

    Ohio State 3

  • Problem Statement: iWAPT’10

    Parametric Tiling

    Automatic parametric tiling [ICS’09,CGO’10]:I Produce code where the tile dimensions are parametersI Seamlessly find/apply all required transformation to make the code

    tilableI Actual tile sizes are given at run-timeI very useful for tile size selection (no need to recompile)I recent progresses have generalized the approach:

    I Operates on arbitrary affine-control loops (imperfectly nested)I Produce good quality codeI Even expose pipeline-parallelism if neededI Software (from OSU): Pluto, PrimeTile/DynTile/PTile

    Ohio State 4

  • Problem Statement: iWAPT’10

    Tile Size Selection

    Problem: how to select the tile size to have the best performance?

    I data reuse within the execution of a tile;I data reuse between tiles;I the layout in memory of the data used in a tile;I the relative penalty of misses at each level of the hierarchy, which is

    machine-dependent.I the cache replacement policy;I the interaction with other units, such at prefetching;I the interaction with vectorization, to enable a profitable steady-state for

    the vectorized loop(s);I ...

    Ohio State 5

  • Problem Statement: iWAPT’10

    Performance DistributionPerformance distribution of fdtd-2d and syr2k

    0

    0.1

    0.2

    0.3

    0.4

    0.5

    0.6

    1:1

    :1

    4:2

    :40

    8:4

    :500

    12:8

    :30

    16:1

    0:3

    00

    20:1

    6:1

    2

    25:3

    0:2

    00

    30:4

    0:8

    35:4

    8:1

    28

    42:1

    00

    :4

    48:1

    28

    :64

    Ex

    ec

    uti

    on

    Tim

    e i

    n S

    ec

    on

    ds

    Tile Sizes ( Ti:Tj:Tk)

    fdtd-2d: Performance distribution with Tile Size configurations

    0

    0.1

    0.2

    0.3

    0.4

    0.5

    0.6

    0.7

    1:1

    :1

    4:2

    :40

    8:4

    :500

    12:8

    :30

    30:1

    0:3

    00

    40

    :16

    :12

    64:3

    0:2

    00

    12

    8:4

    0:8

    200

    :48:1

    28

    300

    :100:4

    500

    :128:6

    4

    Ex

    ec

    uti

    on

    tim

    e i

    n S

    ec

    on

    ds

    Tile sizes- Ti:Tj:Tk

    dsyr2k: Performance Distribution with Tile Size Configurations

    I Search space: 10648 possible tile sizesI {1,2,4,6,8,10,12,16, 30,32,40,48,64,100,128,

    150,200,256,300,400,500,600}I Machine: Core i7 (1 thread)I 2 "standard" distribution shapes

    Ohio State 6

  • Performance Prediction: iWAPT’10

    Ojectives

    Correlate execution time with tile sizes

    I (Static) performance models do exist...I ... but fail to capture the interplay between all hardware componentsI Usually better suited for well-known problems (eg, uniform reuse +

    square tiles)I Another view: pruning the space of poor-performing tile sizes

    Our approach:I Build a neural network to model the performance distributionI Focus directly on the execution timeI ANN dedicated to a specific program + dataset size

    Ohio State 7

  • Performance Prediction: iWAPT’10

    Neural Network

    Layout:I Fully connected, multi-layer perceptron (MLP)I Input layer: the tile sizes (Ti, Tj, Tk)I Output layer: predicted execution timeI One hidden layer consisting of 30 hidden neuronsI Use Stuttgart Neural Network Simulator library

    Training:I Select 5% (530 tuples) from the search space of 10648I Run the program on the machine using the tile size specified by the

    tuplesI Train with resilient back-propagation (rprop), using the actual execution

    time for a tupleI Standard 10% cross-validation procedure

    Ohio State 8

  • Performance Prediction: iWAPT’10

    Performance Prediction [1/2]

    0

    0.1

    0.2

    0.3

    0.4

    0.5

    0.6

    0.7

    10

    :12

    :8

    16

    :2:8

    12

    :1:4

    8

    45

    :12

    8:6

    20

    :2:1

    6

    12

    :40

    0:8

    32

    :4:4

    30

    :64

    :15

    0

    10

    :1:2

    56

    16

    :40

    0:4

    00

    40

    :60

    0:1

    2

    Execu

    tio

    n T

    ime in

    Seco

    nd

    s

    Tile Sizes (Ti:Tj:Tk)

    fdtd-2d: Predicted versus Actual Performance

    ExTime (Actual )

    ExTime (Predicted)

    0

    0.5

    1

    1.5

    2

    2.5

    3

    3.5

    4

    4.5

    5

    8:4

    :64

    60

    0:1

    28

    :32

    64

    :4:1

    6

    10

    :40

    0:5

    00

    12

    8:2

    :30

    0

    25

    6:2

    00

    :25

    6

    100:4

    0:3

    00

    30

    :30

    0:3

    00

    40

    :10

    :4

    10

    0:3

    00

    :12

    6:1

    2:1

    Exe

    cu

    tio

    n T

    ime

    in

    se

    co

    nd

    s

    Tile sizes - Ti:Tj:Tk

    dsyr2k : Predicted versus Actual Performance

    ExTime(Actual)

    ExTime(Predicted)

    Ohio State 9

  • Performance Prediction: iWAPT’10

    Performance Prediction [2/2]

    0

    0.1

    0.2

    0.3

    0.4

    0.5

    0.6

    0.7

    0.8

    12

    :12

    :16

    32

    :2:1

    28

    64

    :40

    :16

    2:1

    0:1

    1:3

    2:2

    56

    25

    6:6

    4:4

    10

    :25

    6:1

    2

    4:5

    00

    :10

    30

    :64

    :40

    0

    6:2

    00

    :50

    0

    25

    6:4

    00

    :16

    Ex

    ec

    uti

    on

    Tim

    e i

    n S

    ec

    on

    ds

    Tile sizes ( Ti:Tj:Tk)

    lu: Predicted versus Actual Performance

    ExTime (Actual)

    ExTime (Predicted)

    0

    0.5

    1

    1.5

    2

    2.5

    3

    3.5

    1:1

    :1

    4:2

    :40

    8:4

    :50

    0

    12:8

    :30

    30

    :10

    :30

    0

    40

    :16

    :12

    64

    :30

    :20

    0

    12

    8:4

    0:8

    20

    0:4

    8:1

    28

    30

    0:1

    00

    :4

    50

    0:1

    28

    :64

    Exec

    uti

    on

    Tim

    e in

    Se

    co

    nd

    s

    Tile Sizes (Ti:Tj:Tk)

    dgemm: Predicted versus Actual Performance

    ExTime (Actual)

    ExTime (Predicted)

    Ohio State 10

  • Performance Prediction: iWAPT’10

    Discussions

    I for trmm, lu, 2d-jacobi, syr2k and doitgen, predict more than 90% of oursearch space with less than 10% deviation for the actual execution time

    I In total, can predict 80% and more with less than 10% deviationI Usually smaller deviation for the best tile sizes

    → These ANN are able to model the performance distribution

    Openings:I Program classifier w.r.t. performance distributionI Training: do not "fit" that much the training points?

    Ohio State 11

  • Tile Size Selection: iWAPT’10

    Selecting the Best Tile Size

    The performance distribution can drive the empirical search to focus onpromising subspaces

    Tile size selection:I Random approach has a huge variability on some distribution shapesI Exhaustive search is likely not neededI Need for an intermediate solution

    I Low number of empirical runsI Good convergence, good variabilityI General enough to work on arbitrary user codes

    Ohio State 12

  • Tile Size Selection: iWAPT’10

    Overview of the Algorithm

    1 Generate a parametrically tiled code

    2 Randomly select x% of the tile size space, and run them on the machine

    3 Train an ANN using this data

    4 Use the ANN to predict performance of the entire space

    5 Collect y tile sizes that are predicted best and not already ran

    6 Run the y tile sizes on the machine, output the best found

    Ohio State 13

  • Tile Size Selection: iWAPT’10

    Experimental Setup

    I Studied various kernels (perfectly/imperfectly nested, BLAS & stencils)I Only focused on single-threaded execution, on an Intel Core i7

    I Comparison: simple random search (R), ANN search (ANN)I Repeat each experiment 100 times, for various sampling rate

    Ohio State 14

  • Tile Size Selection: iWAPT’10

    Experimental Results (y = 50)doitgen gemm syr2k lu 2d-jacobi fdtd-2d

    1%

    R-best 100% 99.86% 98.15% 99.89% 99.91% 97.75%R-average 98.71% 96.29% 94.80% 92.19% 94.10% 84.15%R-worst 95.35% 69.64% 89.81% 40.63% 17.69% 31.02%ANN-best 100% 99.86% 100% 100% 99.91% 100%ANN-average 98.89% 96.35% 96.01% 92.62% 98.51% 84.50%ANN-worst 97.26% 82.93% 89.79% 79.68% 94.23% 66.53%

    2%

    R-best 99.97% 99.86% 98.71% 99.89% 100% 100%R-average 98.71% 96.42% 94.80% 92.87% 97.60% 84.10%R-worst 86.49% 67.89% 88.20% 45.29% 55.98% 27.30%ANN-best 100% 99.86% 100% 100% 100% 100%ANN-average 98.89% 96.76% 96.69% 95.34% 98.55% 88.61%ANN-worst 97.26% 89.83% 89.65% 85.80% 94.17% 60.65%

    3%

    R-best 99.97% 99.86% 98.71% 99.89% 100% 100%R-average 98.77% 96.47% 94.80% 94.27% 98.39% 85.47%R-worst 94.89% 63.58% 87.99% 61.24% 84.54% 47.99%ANN-best 99.97% 99.86% 100% 100% 100% 100%ANN-average 98.93% 97.14% 97.17% 95.34% 98.74% 91.45%ANN-worst 97.64% 91.01% 92.27% 85.80% 94.50% 63.34%

    4%

    R-best 99.97% 99.86% 98.71% 99.89% 100% 100%R-average 98.80% 96.65% 94.93% 92.19% 98.41% 85.55%R-worst 96.86% 69.73% 88.57% 52.03% 82.47% 43.74%ANN-best 100% 99.86% 100% 100% 100% 100%ANN-average 98.99% 97.67% 97.20% 95.79% 98.90% 93.55%ANN-worst 98.28% 93.65% 92.66% 85.80% 94.50% 79.26%

    Ohio State 15

  • Tile Size Selection: iWAPT’10

    Some Related Work

    Epshteyn et al. [LCPC’05]:I Search-oriented contributionI Uses regression curves to approximate the performance distributionI Uses active learning to select good candidates for empirical evaluationI Good results for BLAS kernels

    Yuki et al. [CGO’10]:I Aims at selecting/combining between different static modelsI Uses program features to characterize accesses, train ANNI Results demonstrated for matrix-like kernels

    Ohio State 16

  • Tile Size Selection: iWAPT’10

    Conclusions and Future Work

    ANN is a candidate approach to connect tile sizes with performanceI Good prediction qualityI Deviation usually smaller for the good pointsI Combined search heuristic proposed:

    I Strong variability improvement over naive random approachI 90+% efficiency using < 2% of the space, likely can be improved further

    Future work:I Generalization!

    I Categorize benchmarks reg. the performance distribution shapeI Dataset size

    I Do not try to fit the random samples during trainingI Reduce the training timeI problem: ANN configuration

    Ohio State 17

  • Acknowledgements: iWAPT’10

    Acknowledgements

    This work was funded in part by the U.S. National Science Foundationthrough award 0926688 and the Defense Advanced Research ProjectsAgency through AFRL Contract FA8650-09-C-7915. The opinions andfindings in this document do not necessarily reflect the views of eitherthe United States Government or the Ohio State University.

    Ohio State 18

    IntroductionProblem StatementPerformance PredictionTile Size SelectionAcknowledgements