The Embedded Learning Library - tinyML

Page 1: The Embedded Learning Library - tinyML

The Embedded Learning Library

Page 2: The Embedded Learning Library - tinyML

The Embedded Learning Library (ELL)

Cross-compiler for AI pipelines, specialized for resource constrained target platforms

https://github.com/Microsoft/ELL

[Diagram: AI Pipeline, ELL, Target Machine Code]

Page 3: The Embedded Learning Library - tinyML

• 3 years at Microsoft Research

• compiler toolchain, tutorials, model gallery

• focus: ARM CPUs, embedded GPUs; vision on ARM Cortex A53, keyword spotting on ARM Cortex M4f

The Embedded Learning Library

Page 4: The Embedded Learning Library - tinyML

Architecture

[Diagram components: Dataset, Pretrained Model, Importers, ELL Trainers, Target Profiles, Computation Graph Optimizer, ELL Platform Abstraction Layer, LLVM Emitter, OpenCL Emitter, LLVM, OpenCL, BLAS, Target]

Page 5: The Embedded Learning Library - tinyML

AI compiler vs. AI runtime

why AI compiler?

• model-specific optimization

• target-specific optimization

• small executable

why AI runtime?

• portability

• seamless migration from cloud to edge

best of both worlds: just-in-time AI compiler

Page 6: The Embedded Learning Library - tinyML

compression techniques:

• efficient architectures

• pruning

• low precision math and quantization

• low rank matrix approximation

Evaluation

small loss in accuracy, large gain in cost

Page 7: The Embedded Learning Library - tinyML

Architecture search

[Plot, January 2018: ILSVRC 2012 top-1 (y-axis, ticks 30 to 70) vs. ms/image on RPi3@700MHz (x-axis, ticks 0 to 1000). Labels: model, Pareto frontier]
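The Pareto frontier in this plot is the set of models that no other model beats on both latency and accuracy. A minimal sketch of how such a frontier could be extracted from (ms/image, top-1) pairs follows; the function name and the sample points are hypothetical and are not measurements from the talk.

```python
def pareto_frontier(models):
    """Return the non-dominated (ms_per_image, top1) pairs, sorted by latency.

    A model is kept only if it is more accurate than every faster model.
    """
    frontier = []
    best_top1 = float("-inf")
    # Sort by latency; for equal latency, visit the more accurate model first.
    for latency, top1 in sorted(models, key=lambda m: (m[0], -m[1])):
        if top1 > best_top1:
            frontier.append((latency, top1))
            best_top1 = top1
    return frontier


# Made-up sample points, for illustration only.
models = [(100, 40.0), (200, 38.0), (250, 52.0),
          (400, 50.0), (600, 61.0), (900, 60.0)]
print(pareto_frontier(models))  # [(100, 40.0), (250, 52.0), (600, 61.0)]
```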

Page 8: The Embedded Learning Library - tinyML

Architecture search

[Plot, January 2018: ILSVRC 2012 top-1 (y-axis, ticks 30 to 70) vs. ms/image on RPi3@700MHz (x-axis, ticks 0 to 1000)]

Page 9: The Embedded Learning Library - tinyML

Architecture search

[Plot, February 2018: ILSVRC 2012 top-1 (y-axis, ticks 30 to 70) vs. ms/image on RPi3@700MHz (x-axis, ticks 0 to 1000)]

Page 10: The Embedded Learning Library - tinyML

Architecture search

[Plot, March 2018: ILSVRC 2012 top-1 (y-axis, ticks 30 to 70) vs. ms/image on RPi3@700MHz (x-axis, ticks 0 to 1000)]

Page 11: The Embedded Learning Library - tinyML

Architecture search

[Plot, April 2018: ILSVRC 2012 top-1 (y-axis, ticks 30 to 70) vs. ms/image on RPi3@700MHz (x-axis, ticks 0 to 1000)]

Page 12: The Embedded Learning Library - tinyML

• variety of convolution kernels

• scheduling

• engineering

Lossless acceleration

Page 13: The Embedded Learning Library - tinyML

Lossless acceleration

[Plot, January 2019: ILSVRC 2012 top-1 (y-axis, ticks 30 to 70) vs. ms/image on RPi3@700MHz (x-axis, ticks 0 to 1000)]

Page 14: The Embedded Learning Library - tinyML

Lossless acceleration

[Plot, February 2019: ILSVRC 2012 top-1 (y-axis, ticks 30 to 70) vs. ms/image on RPi3@700MHz (x-axis, ticks 0 to 1000)]

Page 15: The Embedded Learning Library - tinyML

Lossless acceleration

[Plot, March 2019: ILSVRC 2012 top-1 (y-axis, ticks 30 to 70) vs. ms/image on RPi3@700MHz (x-axis, ticks 0 to 1000)]

Page 16: The Embedded Learning Library - tinyML

mix and match compression techniques

engineering/ML co-design

during training vs post processing

Lossy Acceleration

Page 17: The Embedded Learning Library - tinyML

Quantization semantics

Encodings shown: binary, ternary, linear, exponential, lookup/clustered, iterative sum

bit    value
0      0
1      1

bit    value
0      -1
1      1

bits   value
00     0
01     1
10     n/a
11     -1

bits   value
0…k    [0...2^k - 1]

bits   value
0…k    [-2^(b-1)-1...2^(b-1)-1]

bits   value
0…k    lookup

bits   value
0…k    a±b±c±.. ±n
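As a rough illustration of what "semantics" means here, the sketch below decodes the two binary tables and the ternary table into integer values. The function names are hypothetical; only the code-to-value mappings come from the slide.

```python
def decode_binary_unsigned(bit):
    """Binary semantics 0 -> 0, 1 -> 1."""
    return bit


def decode_binary_signed(bit):
    """Binary semantics 0 -> -1, 1 -> 1."""
    return 2 * bit - 1


# Ternary semantics from the slide: 00 -> 0, 01 -> 1, 11 -> -1; 10 is unused.
TERNARY_VALUES = {0b00: 0, 0b01: 1, 0b11: -1}


def decode_ternary(code):
    """Decode a 2-bit ternary code; raises ValueError for the unused code 10."""
    if code not in TERNARY_VALUES:
        raise ValueError(f"unused ternary code: {code:02b}")
    return TERNARY_VALUES[code]


print([decode_binary_signed(b) for b in (0, 1)])        # [-1, 1]
print([decode_ternary(c) for c in (0b00, 0b01, 0b11)])  # [0, 1, -1]
```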

Page 18: The Embedded Learning Library - tinyML

Quantization representation

bit packed:
b3 b2 b1 b0 a3 a2 a1 a0
d3 d2 d1 d0 c3 c2 c1 c0

bit planes:
d0 c0 b0 a0
d1 c1 b1 a1
d2 c2 b2 a2
d3 c3 b3 a3
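The two layouts can be sketched as follows, assuming four 4-bit values a, b, c, d with a in the lowest bit positions, as drawn on the slide. The helper names are hypothetical, and the packed form is returned as one integer rather than split into bytes.

```python
def to_bit_packed(values, bits):
    """Concatenate the values into one integer, values[0] in the lowest bits."""
    word = 0
    mask = (1 << bits) - 1
    for i, v in enumerate(values):
        word |= (v & mask) << (i * bits)
    return word


def to_bit_planes(values, bits):
    """Return one word per bit position; bit i of plane p is bit p of values[i]."""
    planes = []
    for p in range(bits):
        word = 0
        for i, v in enumerate(values):
            word |= ((v >> p) & 1) << i
        planes.append(word)
    return planes


values = [5, 3, 6, 1]  # a, b, c, d (illustrative values)
print(hex(to_bit_packed(values, 4)))                     # 0x1635
print([format(w, "04b") for w in to_bit_planes(values, 4)])
# ['1011', '0110', '0101', '0000']  (each string reads d c b a)
```

Bit planes put the same bit position of every value in one word, which is what lets a single AND plus popcount process many elements at once.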

Page 19: The Embedded Learning Library - tinyML

Quantization example

ternary weights, 3-bit unsigned linear activations (bitplane)

activations: 5 1 7 6 3 4 2 5

weights: 1 -1 0 -1 -1 -1 1 0

dot = 5*1 + 1*(-1) + 7*0 + 6*(-1) + 3*(-1) + 4*(-1) + 2*1 + 5*0 = -7

Page 20: The Embedded Learning Library - tinyML

Quantization example

activations: 5 1 7 6 3 4 2 5

weights: 1 -1 0 -1 -1 -1 1 0

activations, bit 0: 1 1 1 0 1 0 0 1
activations, bit 1: 0 0 1 1 1 0 1 0
activations, bit 2: 1 0 1 1 0 1 0 1
sign:               0 1 0 1 1 1 0 0
magnitude:          1 1 0 1 1 1 1 0

Page 22: The Embedded Learning Library - tinyML

Quantization example

activation bit plane 0 (place value 1): a = 11101001, magnitude m = 11011110, sign s = 01011100

absSum: o = a & m

absSum += popcount(o)

negSum: o = a & s

negSum += popcount(o)

o = 11101001 & 11011110 = 11001000

absSum += popcount(o) = 3

o = 11001000 & 01011100 = 01001000 (same result as a & s, because a sign bit is set only where the magnitude bit is set)

negSum += popcount(o) = 2

Page 23: The Embedded Learning Library - tinyML

Quantization example

activation bit plane 1 (place value 2): a = 00111010, magnitude m = 11011110, sign s = 01011100

absSum: o = a & m

absSum += popcount(o) << 1

negSum: o = a & s

negSum += popcount(o) << 1

o = 00111010 & 11011110 = 00011010

absSum = 3 + 2 * 3 = 9

o = 00011010 & 01011100 = 00011000

negSum = 2 + 2 * 2 = 6

Page 24: The Embedded Learning Library - tinyML

Quantization example

activation bit plane 2 (place value 4): a = 10110101, magnitude m = 11011110, sign s = 01011100

absSum: o = a & m

absSum += popcount(o) << 2

negSum: o = a & s

negSum += popcount(o) << 2

total = absSum - 2 * negSum

o = 10110101 & 11011110 = 10010100

absSum = 9 + 4 * 3 = 21

o = 10010100 & 01011100 = 00010100

negSum = 6 + 4 * 2 = 14

total = 21 - 2 * 14 = -7
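The three bit-plane steps of this worked example can be checked end to end with a short script. This is a sketch under the slide's conventions (first vector element in the most significant bit, sign bit set for negative weights), not ELL's actual implementation; the helper names are hypothetical, and `bin(x).count("1")` stands in for a hardware popcount.

```python
def pack_bits(flags):
    """Pack 0/1 flags into an int, first element in the most significant bit."""
    word = 0
    for f in flags:
        word = (word << 1) | f
    return word


def popcount(x):
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def bitplane_dot(activations, weights, act_bits):
    """Dot product of unsigned activations with ternary weights in {-1, 0, 1}."""
    sign = pack_bits([1 if w < 0 else 0 for w in weights])
    magnitude = pack_bits([1 if w != 0 else 0 for w in weights])
    abs_sum = 0
    neg_sum = 0
    for p in range(act_bits):
        plane = pack_bits([(a >> p) & 1 for a in activations])
        o = plane & magnitude                # positions with a nonzero weight
        abs_sum += popcount(o) << p          # weighted by the place value 2**p
        neg_sum += popcount(o & sign) << p   # the subset with weight -1
    # abs_sum counts negative positions as positive, so subtract them twice.
    return abs_sum - 2 * neg_sum


activations = [5, 1, 7, 6, 3, 4, 2, 5]
weights = [1, -1, 0, -1, -1, -1, 1, 0]
reference = sum(a * w for a, w in zip(activations, weights))
print(bitplane_dot(activations, weights, act_bits=3), reference)  # -7 -7
```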

Page 25: The Embedded Learning Library - tinyML

Quantization example

instruction_count = 8 instructions * 3 bits = 24 instructions

vector size = 8

instructions per element = 24 / 8 = 3

if word is 128-bit (NEON):

instruction_count = 8 instructions * 3 bits + 0.3 reduce ops = 24.3 instructions

vector size = 128

instructions per element = 24.3 / 128 = 0.19 (5x faster than float)
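The per-element cost above is simple arithmetic and can be reproduced as follows. The function and parameter names are hypothetical; the constants (8 instructions per bit plane, 3 bit planes, 0.3 reduce ops) are the ones quoted on the slide.

```python
def instructions_per_element(instr_per_plane, bit_planes, vector_size, reduce_ops=0.0):
    """Instruction count per vector element for the bit-plane dot product."""
    return (instr_per_plane * bit_planes + reduce_ops) / vector_size


print(instructions_per_element(8, 3, 8))                    # 3.0
print(round(instructions_per_element(8, 3, 128, 0.3), 2))   # 0.19
```

The instruction count is almost independent of the word width, so the cost per element falls roughly in proportion to the number of elements that fit in one word.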

Page 26: The Embedded Learning Library - tinyML

Quantization performance

[Bar chart: Speedup on ARM1176, quantized vs full precision (y-axis, ticks 0 to 25), for 1 bit, 2 bits, 3 bits, 8 bits]

Page 27: The Embedded Learning Library - tinyML

Quantized weight accuracy

[Plot: accuracy vs original model (y-axis, ticks 0 to 1) against proportion of zeros in ternary weights (x-axis, ticks 0 to 0.7). Labels: model with binary weights, models with trinarized weights]

Page 28: The Embedded Learning Library - tinyML

Quantized activation accuracy

[Plot: accuracy vs real activations (y-axis, ticks 0 to 1) against quantized activation bit count (x-axis, 1 to 8). Series: ternary weights, binary weights]

Page 29: The Embedded Learning Library - tinyML

• post-training lossy compression (pruning and quantization)

• engineering/ML training co-design

• infrastructure:

    beating BLAS on embedded platforms

    extending platform abstraction layer to embedded GPUs

    global optimizer

Current focus areas

Page 30: The Embedded Learning Library - tinyML

Questions?

• https://microsoft.github.io/ELL/

• Code: https://github.com/Microsoft/ELL

• Model Gallery: https://microsoft.github.io/ELL/gallery/

Page 35: The Embedded Learning Library - tinyML

Not every model is a winner
