Riptide: Techniques for Fast End-to-End Binarized Networks
Josh Fromm · 2020. 12. 12.


  • Binary Networks

    [Figure: bitserial dot product. Original data (uint32) 2 0 1 3 is split into bitplanes (uint1): bit 0 → 0 0 1 1, bit 1 → 1 0 0 1, then bitpacked (uint4) as 3 and 9. Dotting with the 1-bit vector 0 0 1 1 (packed as 3) gives 1 × popcount(3 & 3) + 2 × popcount(3 & 9) = 2 + 2 = 4.]


    • Replaces 32-bit values with 1 or 2 bits
    • Up to 32X compression
    • Up to 48X (hypothetical) speedup
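The bitserial arithmetic on the slide can be reproduced in a few lines of Python (a minimal sketch; the function name is ours, and popcount is done via `bin(x).count("1")`):

```python
def bitserial_dot(packed_planes, packed_weights):
    """Bitserial dot product: AND each activation bitplane with the
    packed 1-bit weights, popcount, and weight by bit significance."""
    return sum(
        (1 << bit) * bin(plane & packed_weights).count("1")
        for bit, plane in enumerate(packed_planes)
    )

# Activations 2, 0, 1, 3 split into bitplanes and bitpacked:
# bit 0 -> 0 0 1 1 -> 3;  bit 1 -> 1 0 0 1 -> 9
planes = [0b0011, 0b1001]
weights = 0b0011  # 1-bit vector 0 0 1 1, packed as 3

print(bitserial_dot(planes, weights))  # 4, matching [2,0,1,3] . [0,0,1,1]
```

This is why binarization pays off: the whole multiply-accumulate over 4 elements collapses into two AND + popcount operations, one per activation bit.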

  • Fused Glue

    [Figure: cumulative number of operations, fully binarized (ours) vs. traditional. Legend: quantized layer, conversion layer, floating-point layer, input/output. Operations shown: bitpack, quantize, binary conv, dequantize, scale, pool, batch norm, activation, and the fused shift, scale, clip.]
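One way to read the figure: the traditional floating-point glue between binary layers (dequantize, scale, batch norm, re-quantize) is an affine map, so it can be collapsed into an integer multiply, shift, and clip. A hedged NumPy sketch under that assumption (the constants `m`, `s`, and the 2-bit output range are illustrative, not Riptide's actual values):

```python
import numpy as np

# Popcount accumulations from a binary conv (nonnegative integers).
acc = np.array([0, 4, 17, 100, 300], dtype=np.int32)

# Traditional glue: dequantize -> scale -> re-quantize in float.
scale = 3 / 32  # scale folded from BN/quantization parameters
float_glue = np.floor(acc * scale).astype(np.int32)

# Fused glue: the same affine map as an integer multiply and right
# shift (m / 2**s equals `scale` here), plus a clip to the output range.
m, s = 3, 5
fused = np.clip((acc * m) >> s, 0, 3)       # clip to the 2-bit range
reference = np.clip(float_glue, 0, 3)

print(fused.tolist())  # [0, 0, 1, 3, 3]
assert np.array_equal(fused, reference)
```

The integer version touches no floating-point units at all, which is what lets the quantized pipeline stay end-to-end binarized.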

  • Bitpack Fusion

    [Figure: dataflow of bitpack fusion. Int-N bit-packed activations feed a BinaryConv whose popcounts accumulate in Int-16 (2NHWC bytes); a fused shift/scale quantizes them into Int-16 pre-packed bits (again 2NHWC bytes, with the upper Int-16/Int-8 range unused); a final bit-pack step emits Int-N bit-packed outputs.]
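The packing step at the end of that dataflow can be sketched with NumPy for the 1-bit case (illustrative only; Riptide fuses this into the convolution epilogue rather than running it as a separate pass over memory):

```python
import numpy as np

# Quantized 1-bit outputs of a layer, one value per element.
q = np.array([1, 0, 0, 1, 1, 1, 0, 0], dtype=np.uint8)

# Bitpack: 8 one-bit values collapse into a single byte, shrinking
# N bytes of activations to N/8 bytes (first element is the MSB).
packed = np.packbits(q)
print(packed.tolist())  # [156], i.e. 0b10011100

# Unpacking recovers the original bits exactly.
assert np.array_equal(np.unpackbits(packed), q)
```

Fusing this with quantization avoids ever materializing the wide Int-16 intermediate in memory, which is where the memory savings in the next slide come from.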


  • SqueezeNet Speedups

    • Up to 10X speedup
    • Memory savings from the fused schedule yield a 1.3X speedup
    • All optimizations have a significant impact
    • Fused glue offers a 2X speedup

  • All Speedups

  • Accuracy / Runtime

    Available Open Source Soon!