Neuromorphic Integrated Bioelectronics

Gert Cauwenberghs [email protected] BENG 207 Neuromorphic Integrated Bioelectronics

BENG 207 Special Topics in Bioengineering

Neuromorphic Integrated Bioelectronics

Week 8: Energy Conservation

Gert Cauwenberghs

Department of Bioengineering UC San Diego

http://isn.ucsd.edu/courses/beng207


BENG 207 Neuromorphic Integrated Bioelectronics


Resonant Adiabatic Energy Recovery

reso

nanc

e

(capacitive load)

Resonant adiabatic energy recovery in computing: Kerneltron support vector machine for visual pattern recognition with resonant hot clock adiabatic energy recovery in charge-domain processing-in-memory computing at 1 fJ of energy per multiply-accumulate [Karakiewicz et al, 2017, 2012]. Energy recovery logic (ERL) CMOS adiabatic line drivers recover 98% of the CV2 electrostatic energy in the charge-mode array.


Resonant Adiabatic Energy Recovery

Resonant adiabatic energy recovery in communication: Cyclic On-Off Keying (COOK) modulation for wireless power and telemetry offers record bandwidth efficiency, allowing to transmit one bit of data every carrier cycle while simultaneously receiving RF power over the same high-Q inductive link [Ha et al, 2016].


SVM Pattern Recognition

ASP A/D Sensory Features

Digital Analog

Large-Margin Kernel Regression

Class Identification Kerneltron: massively parallel support vector “machine” in silicon (ESSCIRC’2002)


Trainable Modular Vision Systems: The SVM Approach

–  Strong mathematical foundations in Statistical Learning Theory (Vapnik, 1995)

–  The training process selects a small fraction of prototype support vectors from the data set, located at the margin on both sides of the classification boundary (e.g., barely faces vs. barely non-faces) SVM classification for

pedestrian and face object detection

Papageorgiou, Oren, Osuna and Poggio, 1998


Trainable Modular Vision Systems: The SVM Approach

–  The number of support vectors, in relation to the number of training samples and the vector dimension, determine the generalization performance

–  Both training and run-time performance are severely limited by the computational complexity of evaluating kernel functions

ROC curve for various image representations and dimensions

Papageorgiou, Oren, Osuna and Poggio, 1998


Kernels and Support Vector Machines

x X

)(⋅Φ

)()(

xXxX

Φ=

Φ= ii

),( ⋅⋅K ),()()( xxxx ii K=Φ⋅Φ

)),((sign bKyySi

iii += ∑∈

xxα

))()((sign byySi

iii +Φ⋅Φ= ∑∈

xxα

)()( xxXX Φ⋅Φ=⋅ ii

Mercer’s Condition

)(sign byySi

iii +⋅= ∑∈

XXα

Mercer, 1909; Aizerman et al., 1964 Boser, Guyon and Vapnik, 1992


)exp( 2σ

xx ⋅∝ i

Kernel Machines

–  Gaussian (Radial Basis Function Networks)

–  Sigmoid (Two-Layer Perceptron)

–  Polynomial (Splines etc.) ν)1(),( xxxx ⋅+= iiK

)exp(),( 2

2

2σ

xxxx −−= iiK

)tanh(),( xxxx ⋅+= ii LK only for certain L

k

k

x1x

2x

11yα

22yαsign y

)),((sign bKyySi

iii += ∑∈

xxα


Parallel SVM Architecture

–  Kernel inner-products are implemented by parallel matrix-vector multiplication (MVM).

–  Silicon area and power dissipation are proportional to number of support vectors, favoring sparse SVM solutions.

–  Sparsity in SVM training also guarantees proper generalization performance (Vapnik, 1995).

x i

x SIGN

α i

MVM SUPPORT VECTORS

INPUT y

KE

RN

EL K

(x ,x i ) )),((sign bKyy

Siiii += ∑

∈

xxα


Kerneltron III: Adiabatic Support Vector “Machine”

•  1.2 TMACS / mW –  adiabatic resonant clocking conserves

charge energy –  energy efficiency on par with human

brain (1015 SynOP/S at 15W)

Karakiewicz, Genov, and Cauwenberghs, VLSI’2006; CICC’2007

x i

x SIGN

α i MVM

SUPPORT VECTORS

INPUT y

KE

RN

EL K

(x ,x i )

)),((sign bKyySi

iii += ∑∈

xxα

Classification results on MIT CBCL face detection data


CMOS Logic vs. Adiabatic Computing

Energy recovery logic (ERL) (Y. Moon, JSSC’96)

•  ‘Hot clock’ recycles energy –  LC tank resonant clock

•  Reversible computation 2

. CVddEdiss =

CMOS logic

•  Dynamic energy dissipation

Vdd

C


Resonant Charge Energy Recovery

resonance

capacitive load

CID array

reso

nanc

e

(capacitive load)

Karakiewicz, Genov, and Cauwenberghs, IEEE JSSC, 2007


Incremental and Decremental SVM Learning

–  Support Vector Machine training requires solving a linearly constrained quadratic programming problem in a number of coefficients equal to the number of data points.

–  An incremental version, training one data point at at time, is obtained by solving the QP problem in recursive fashion, without the need for QP steps or inverting a matrix.

•  On-line learning is thus feasible, with no more than L2 state variables, where L is the number of margin (support) vectors.

•  Training time scales approximately linearly with data size for large, low-dimensional data sets.

–  Decremental learning (adiabatic reversal of incremental learning) allows to directly evaluate the exact leave-one-out generalization performance on the training data.

–  When the incremental inverse jacobian is (near) ill-conditioned, a direct L1-norm minimization of the α coefficients yields an optimally sparse solution.

Cauwenberghs and Poggio, 2001


Trajectory of coefficients a as a function of time during incremental learning, for 100 data points in the non-separable case, and using a Gaussian kernel.


SVM Learning Revisited

iCy

Q

iii

i

ii

ijij

ji

SVcM

b

∀≤≤≡

−=

∑

∑∑∑

,0 and 0:subject to

:min 21

2,

αα

αααεw

Soft-Margin SVM Classification (Cortes and Vapnik, 1995)

∑+=i

ibzgC )(:min 2

21

,w

wε

)( byz iii +⋅= Xw

Primal Formulation

Dual Formulation

KKT Conditions

margin lo

ss

potential coefficient

),(

)('

jijiij

ij

jiji

ii

KyyQ

byQzzCg

xx=

+=

−=

∑ α

α


Sequential On-Line SVM Learning Chakrabartty, Genov and Cauwenberghs, 2003

ccccc

cc

QzzzCgα

α

+≈

−=0

)('

zc0 = zc

Qcczc0>1



zc0

zc0<1

zc = 1

Qcc

ccccc

cc

QzzzCgα

α

+≈

−=0

)('

margin support vector

αc < C



zc0

zc0<1

zc < C

Qcc

ccccc

cc

QzzzCgα

α

+≈

−=0

)('

error support vector

αc = C


Effects of Sequential On-Line Learning and Finite Resolution

•  Matched Filter Response

•  On-Line Sequential Training

•  Kerneltron II

test train

-1

+1

•  Batch Training

•  Floating-

Point Resolution


SVM Sequence Estimation

ASP A/D Sensory Features

Digital Analog

MAP Forward Decoding

Sequence Identification

Sub-microwatt speaker verification and phoneme recognition (NIPS’2004)

GiniSVM

X[n] X[n-1] X[n+1]

j

i


X[1]

q[1]

X[2]

q[2]

X[N]

q[N]

Generative (MLE) HMM

Density models (such as mixtures of Gaussians) require vast amounts of training data to reliably estimate parameters.

Discriminative (MAP) FDKM

X[1]

q[1]

X[2]

q[2]

X[N]

q[N]

Transition-based speech recognition (H. Bourlard and N. Morgan, 1994)

MAP forward decoding

Transition probabilities generated by large margin probability regressor

1 2

P(1|1,x) P(2|2,x)

P(1|2,x)

P(2|1,x)

MLE vs. MAP Sequence Estimation


GiniSVM

Forward Decoding Kernel Machines (FDKM)

–  Forward decoding of posterior probabilities αi

–  Transition probabilities Pij generated by SVM conditioned on input data X

X[n] X[n-1]

∑ −=j

jiji nnPn ]1[][][ αα

])[(])[,|(][ nXfnXjiPnP ijij ∝=

X[n+1]

j

i

Chakrabartty and Cauwenberghs (NIPS’2002)


GiniSVM Sparse Probability Regression Chakrabartty and Cauwenberghs, JMLR 2007

aaaHy

HCQ

Giniiii

i

iCGini

ijij

ji

kGini

bi

)1(4)( with C,0 and 0:subject to

)(:min 21

2,

−=≤≤≡

−=

∑

∑∑∑

γαα

αα αεw

Gini Quadratic Entropy

Huber Loss Function


GiniSVM Probability Regression

Gini quadratic entropy in SVM training leads to sparse, large-margin regression of class probabilities (Chakrabartty et al. 2002)

x s

x NORMALIZATION

λ s i MVM MVM

SUPPORT VECTORS

INPUT f i (x)

P

KE

RN

EL K (x ,x s

) (i | x)

bKyfiPSs

siisi +=∝ ∑

∈

),()()|( xxxx λ

1)|(0

1)|(

≤≤

=∑x

x

iP

iPi

GiniSVM

iX


FDKM Architecture

x s

x NORMALIZATION

λ i1 s

14 24x24

30x24 30x24

1 2 24

α j[n-1] α i[n]

24

MVM MVM

SUPPORT VECTORS

INPUT f i1 (x)

24 FORWARD DECODING

P i1 P i2 4 K

ER

NE

L K(x ,x s )

24x24 GiniSVM

X[n] X[n-1] X[n+1]

j

i

bKf sSs

sijij +=∑

∈

),()( xxx λ

∑ −=j

jiji nnPn ]1[][][ αα


6 X 4 GiniSVMs

INPU

T ST

AG

E

30

14 24

PROGRAMMING REGISTERS

OUTPUT BUS

PRO

GR

AM

MIN

G R

EGIS

TER

S

GiniSVM/FDKM Processor Chakrabartty and Cauwenberghs (NIPS’2004)

Silicon support vector machine (SVM) and forward decoding kernel machine (FDKM)

x s

x NORMALIZATION

λ i1 s

14 24x24

30x24 30x24

1 2 24

α j[n-1] α i[n] 24

MVM MVM

SUPPORT VECTORS

INPUT f i1 (x)

24 FORWARD DECODING

P i1 P i2 4

KE

RN

EL K

(x,x s )

24x24

GiniSVM

X[n] X[n-1] X[n+1]

j

i

–  Sub-Microwatt Power –  Subthreshold translinear MOS circuits –  Programmable with floating-gate non-

volatile analog storage


FDKM Dynamic Sequence Detection (80 nW)

q1q2 q3 q4 q5 q6

q8 q9 q10 q11 q12 q13

q7x1x2 x3 x4 x5 x6

x6x5 x4 x3 x2 x1



FDKM Training Formulation

–  Large-margin training of state transition probabilities, using regularized cross-entropy on the posterior state probabilities:

–  Forward Decoding Kernel Machines (FDKM) decompose an upper bound of the regularized cross-entropy (by expressing concavity of the logarithm in forward recursion on the previous state):

which then reduces to S independent regressions of conditional probabilities, one for each outgoing state:

Chakrabartty and Cauwenberghs, 2002

∑∑∑∑−

=

−

=

−

=

−

=

−=1

0

1

0

221

1

0

1

0||][log][

S

j

S

iij

N

n

S

iii wnnyCH α

∑∑ ∑−

=

−

=

−

=

−=1

0

221

1

0

1

0

||][log][][S

iij

N

n

S

iijijj wnPnynCH

∑−

=

≥1

0

S

jjHH

]1[][ −= nCnC jj α


Recursive MAP Training of FDKM

Epoch 2

n n-1 n-2

Epoch 1

n-1 n

1

2

Epoch K

n n-1 n-k

time

1

2


Phonetic Experiments (TIMIT)

0102030405060708090100

V S F N SV Sil

FDKMStatic

Features: cepstral coefficients for Vowels, Stops, Fricatives, Semi-Vowels, and Silence

Kernel Map Recognition Rate

Chakrabartty and Cauwenberghs, 2002


Programmable Analog Filter Bank (Deng et al, 2004 )

FDKM Chip

Con

ditio

nal

Prob

abili

ties

Speech

Log

Spec

tral

Fe

atur

es

On-Chip TIMIT Phone Recognition

–  6 phones /t/n/r/ow/ah/eh/ from TIMIT corpus –  Thresholded Mel-cepstral features from log-compressed analog

filterbank



On-Chip Speaker Verification (840nW)

–  1 speaker and 10 imposters from YOHO dataset

–  92% recognition accuracy on 48 true and 432 imposter out-of-sample utterances

–  352 support vectors (47% FDKM chip capacity)

–  840 nW power at 25msec frame rate

Correct Speaker Imposter



BENG 207 Neuromorphic Integrated Bioelectronics

Documents

Neuromorphic Integrated Bioelectronics