12
Introduction to Dynamic Stochastic Computing Siting Liu Department of Electrical and Computer Engineering McGill University Montreal, QC H3A 0E9, Canada Email: [email protected] Warren J. Gross Department of Electrical and Computer Engineering McGill University Montreal, QC H3A 0E9, Canada Email: [email protected] Jie Han Department of Electrical and Computer Engineering University of Alberta Edmonton, AB T6G 1H9, Canada Email: [email protected] Abstract—Stochastic computing (SC) is an old but reviving computing paradigm for its simple data path that can perform various arithmetic operations. It allows for low power implemen- tation, which would otherwise be complex using the conventional positional binary coding. In SC, a number is encoded by a random bit stream of ‘0’s and ‘1’s with an equal weight for every bit. However, a long bit stream is usually required to achieve a high accuracy. This requirement inevitably incurs a long latency and high energy consumption in an SC system. In this article, we present a new type of stochastic computing that uses dynamically variable bit streams, which is, therefore, referred to as dynamic stochastic computing (DSC). In DSC, a random bit is used to encode a single value from a digital signal. A sequence of such random bits is referred to as a dynamic stochastic sequence. Using a stochastic integrator, DSC is well suited for implementing accumulation-based iterative algo- rithms such as numerical integration and gradient descent. The underlying mathematical models are formulated for functional analysis and error estimation. A DSC system features a higher energy efficiency than conventional computing using a fixed-point representation with a power consumption as low as conventional SC. It is potentially useful in a broad spectrum of applications including signal processing, numerical integration and machine learning. I. I NTRODUCTION As an unconventional computing paradigm, stochastic com- puting (SC) operates on numbers encoded by sequences of random ‘0’s and ‘1’s [1]. The probability of a ‘1’ in the stochastic sequence is then used to encode a number. The datapath of a typical SC system consists of three major components: stochastic number generators (SNGs), stochastic circuits and probability estimators (PEs), as shown in Fig. 1. Complex computation can be performed by the stochastic circuits in relatively simple bit-wise operations, while the SNG array and PEs are used to convert numbers between the conventional binary representation and stochastic encoding. SC has been studied for decades for its relatively simple logic, Stochastic number generator (SNG) array Binary inputs (from memory devices) Stochastic sequences Stochastic computing circuits Stochastic sequences Probability estimators (PEs) Binary outputs Fig. 1: Datapath of a stochastic computing system. ...010100101010110100... p = L N L N: number of ‘1’s in the sequence p: encoded value (a) Stochastic unipolar encoding. A = Δt n (t; tt) Δt n(·): number of spikes in the time window A: encoded value (b) Rate coding. Fig. 2: (a) Stochastic encoding vs. (b) rate coding. low power consumption, an inherent resilience to bit-flip errors and biological plausibility. In a stochastic sequence, a number, x, can be simply encoded as the probability of a ‘1’ or, approximately, as the ratio of the number of ‘1’s in the total number of bits. An inverter then computes 1 - x for an input sequence encoding x. For two independent input sequences encoding x and y, an AND gate outputs a ‘1’ when both the input bits are ‘1’s, so the probability of a ‘1’ in the output sequence is equal to the product of the probabilities of ‘1’s of the two input stochastic sequences, i.e., xy, whereas in conventional arithmetic, a multiplier requires a complex circuit. Since numbers are represented by redundant random bits, a bit-flip usually incurs a relatively small error in the final result, so SC is fault-tolerant and robust against moderate operational errors. However, if a bit flip occurs at a more significant bit in a conventional circuit, it can introduce a large error [2], [3]. In fact, a stochastic sequence works in a similar manner to the rate coding [4] in a spiking neuron, as shown in Fig. 2. With this coding, the frequency of spikes during a time interval carries the information in a neuron [5], similar to the stochastic encoding using the frequency of the occurrence of ‘1’ in SC. Inspired by the neuronal coding, SC has been used to implement various machine learning algorithms [6]–[17]. To achieve a high accuracy, however, a long sequence is often required in SC due to stochastic fluctuations. A long sequence in turn implies a high latency and a high energy consumption. In addition, the costly SNGs and PEs can account for a major part of the circuits in an SC system [18]. To address this issue, we noticed that a stochastic sequence can carry the information of a varying signal with each bit encoding a value from the signal. This new type of sequences is referred to as a dynamic stochastic sequence (DSS). In

Introduction to Dynamic Stochastic Computing

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Introduction to Dynamic Stochastic Computing

Introduction to Dynamic Stochastic ComputingSiting Liu

Department of Electrical andComputer Engineering

McGill UniversityMontreal QC H3A 0E9 CanadaEmail sitingliumailmcgillca

Warren J GrossDepartment of Electrical and

Computer EngineeringMcGill University

Montreal QC H3A 0E9 CanadaEmail warrengrossmcgillca

Jie HanDepartment of Electrical and

Computer EngineeringUniversity of Alberta

Edmonton AB T6G 1H9 CanadaEmail jhan8ualbertaca

AbstractmdashStochastic computing (SC) is an old but revivingcomputing paradigm for its simple data path that can performvarious arithmetic operations It allows for low power implemen-tation which would otherwise be complex using the conventionalpositional binary coding In SC a number is encoded by arandom bit stream of lsquo0rsquos and lsquo1rsquos with an equal weight forevery bit However a long bit stream is usually required toachieve a high accuracy This requirement inevitably incurs along latency and high energy consumption in an SC systemIn this article we present a new type of stochastic computingthat uses dynamically variable bit streams which is thereforereferred to as dynamic stochastic computing (DSC) In DSCa random bit is used to encode a single value from a digitalsignal A sequence of such random bits is referred to as adynamic stochastic sequence Using a stochastic integrator DSCis well suited for implementing accumulation-based iterative algo-rithms such as numerical integration and gradient descent Theunderlying mathematical models are formulated for functionalanalysis and error estimation A DSC system features a higherenergy efficiency than conventional computing using a fixed-pointrepresentation with a power consumption as low as conventionalSC It is potentially useful in a broad spectrum of applicationsincluding signal processing numerical integration and machinelearning

I INTRODUCTION

As an unconventional computing paradigm stochastic com-puting (SC) operates on numbers encoded by sequences ofrandom lsquo0rsquos and lsquo1rsquos [1] The probability of a lsquo1rsquo in thestochastic sequence is then used to encode a number Thedatapath of a typical SC system consists of three majorcomponents stochastic number generators (SNGs) stochasticcircuits and probability estimators (PEs) as shown in Fig 1Complex computation can be performed by the stochasticcircuits in relatively simple bit-wise operations while theSNG array and PEs are used to convert numbers between theconventional binary representation and stochastic encodingSC has been studied for decades for its relatively simple logic

MUX

01100110hellip

p1=05

10101010hellip

p2=05

00100010hellip

p=p1p2=025

01100101hellip

p1=05

11101110hellip

p2=075

sel

psel=05

00110110hellip

0

1 p=(1-psel)p1+pselp2=0625

01100111hellip (a)

(c)

10010000hellip

00101000hellip01000111hellip times2-1= -05

416

p1=

times2-1= -05 416

p2=

times2-1= 025 1016

p1p2=

(b) A bipolar stochastic multiplier

00101000hellip 11010111hellip

p1=025 p=1-p1=075

(b)

p1

p2

p=p1p2

1

1

p1

p2

(1 1)

AltB

AltB

RN1

RN2

Counter

(b) A bipolar stochastic multiplier

Stochastic number generator (SNG)

array

Binary inputs(from memory

devices)

Stochastic sequences

Stochastic computing

circuits

Stochastic sequences

Probability estimators

(PEs)

Binary outputs

+

-

DSNGDigital signal

Bipolar stochastic

sequence encodinglsquo0rsquo

Integration results

Output DSS

Fig 1 Datapath of a stochastic computing system

010100101010110100

A =Δt

n (t t+Δt)

p =p

N

Δt

L

N number of lsquo1rsquos in the sequence

n() number of spikes in the time window

encoded value L

A encoded value

(a) Stochastic unipolar encoding

(b) Rate coding

RNGDynamic stochastic sequence

encoding signal xn

111010001011hellip

Comparator

Y

XXltY

xn

010100101010110100

p =L

N

L

N number of lsquo1rsquos in the sequence

p encoded value

(a) Stochastic unipolar encoding

A =Δt

n (t t+Δt)

Δt

n() number of spikes in the time window

A encoded value

(b) Rate coding

Fig 2 (a) Stochastic encoding vs (b) rate coding

low power consumption an inherent resilience to bit-flip errorsand biological plausibility

In a stochastic sequence a number x can be simplyencoded as the probability of a lsquo1rsquo or approximately as theratio of the number of lsquo1rsquos in the total number of bits Aninverter then computes 1minus x for an input sequence encodingx For two independent input sequences encoding x and yan AND gate outputs a lsquo1rsquo when both the input bits arelsquo1rsquos so the probability of a lsquo1rsquo in the output sequence isequal to the product of the probabilities of lsquo1rsquos of the twoinput stochastic sequences ie xy whereas in conventionalarithmetic a multiplier requires a complex circuit

Since numbers are represented by redundant random bits abit-flip usually incurs a relatively small error in the final resultso SC is fault-tolerant and robust against moderate operationalerrors However if a bit flip occurs at a more significant bit ina conventional circuit it can introduce a large error [2] [3]

In fact a stochastic sequence works in a similar manner tothe rate coding [4] in a spiking neuron as shown in Fig 2With this coding the frequency of spikes during a time intervalcarries the information in a neuron [5] similar to the stochasticencoding using the frequency of the occurrence of lsquo1rsquo inSC Inspired by the neuronal coding SC has been used toimplement various machine learning algorithms [6]ndash[17]

To achieve a high accuracy however a long sequence isoften required in SC due to stochastic fluctuations A longsequence in turn implies a high latency and a high energyconsumption In addition the costly SNGs and PEs canaccount for a major part of the circuits in an SC system [18]

To address this issue we noticed that a stochastic sequencecan carry the information of a varying signal with each bitencoding a value from the signal This new type of sequencesis referred to as a dynamic stochastic sequence (DSS) In

a DSS the sequence length encoding each number is just1 bit so it can significantly improve the performance andenergy efficiency of a stochastic circuit In this article wepresent dynamic stochastic computing (DSC) using DSSrsquos anddiscuss the efficient implementation of iterative accumulation-based computations including filtering the Euler methodfor solving ordinary differential equations (ODEs) and thegradient descent (GD) algorithm in machine learning

II BACKGROUND

Originally proposed in the 1960s SC is intended to be alow-cost alternative to conventional computing In SC num-bers are encoded by random binary bit streams or stochasticsequences The probability of each bit being lsquo1rsquo is consideredto be the probability encoded by a stochastic sequence andused in different mapping schemes to encode numbers withindifferent ranges [1]

Assume that x is the number to be encoded and that p is theprobability of a stochastic sequence The simplest mapping isto let x = p ie the probability of the stochastic sequence isused to represent a real number within [01] This is referredto as the unipolar representation In order to expand the rangeto include negative values the bipolar representation takes alinear transformation of the unipolar representation by x =(pminus 05)times 2 so that the representation range is [minus1 1]

Recently a sign-magnitude representation was proposed toexpand the unipolar representation range by adding an extrasign bit to a stochastic sequence [19] The same range isobtained as the bipolar representation and the computed resultis more accurate However it is still considered a linearmapping since it is a modified unipolar representation A fewexamples of these representations are shown in Table I

TABLE I Examples of different SC representations

Sequence p Sign Value encoded

Unipolar 0110101110 06 ndash 06Bipolar 0100110100 04 ndash 2times 04minus 1 = minus02

Sign-mag 0110101110 06 1 (minus1)times 06 = minus06

A different mapping scheme may require different comput-ing elements For the rest of this article the unipolar and sign-magnitude representations are adopted for its simple circuitimplementation and high accuracy unless stated otherwise

A Stochastic sequence generation

An SNG consists of a random number generator (RNG)and a comparator as shown in Fig 3 An RNG is usuallyimplemented by a linear-feedback-shift register (LFSR) Amaximum-length n-bit LFSR traverses all the integer numberswithin [1 2nminus1] The numbers generated by an LFSR are con-sidered pseudorandom numbers because they are deterministicrather than truly random once the seed and the structure of theLFSR are determined However due to their statistical char-acteristics deliberately generated pseudorandom numbers canbe considered as uniformly distributed If the n-bit numberused to encode a fractional number within [0 1] p after the

normalization by 2n is larger than an n-bit pseudorandomnumber a lsquo1rsquo is generated otherwise lsquo0rsquo is the outputThen the probability of generating a lsquo1rsquo is approximately pThe resulting stochastic sequence encodes p in the unipolarrepresentation and x in the bipolar representation using theaforementioned linear mapping x = 2p minus 1 A sequencewith probability p is generated similarly to encode x in thisrepresentation A detailed description and the mathematicalmodel for an LFSR-based SNG can be found in [20]

Inputs Hidden layers

Outputs

(a)

hellip

x

w o=f(wx)

hellip

(b)

(c)

wl+1

δl+1o wl

hellip

δwl = f`wl+1δl+1

hellip

Random number generator (RNG)

p

Stochastic sequence with probability p

0101100hellip

Comparator

B

AAltB

N

N

(a)

CLK

Fig 3 An SNG

In [21] and [22] two types of low-discrepancy (LD) se-quences the Halton and Sobol sequences are introduced toSC They are originally proposed to accelerate and improve theaccuracy of Monte Carlo simulations by using a regular gener-ation of quasirandom numbers as shown in Fig 4 [23] Haltonsequences can be generated by a binary-coded prime number-based counter (eg (2120)3 is stored as (10)2(01)2(10)2(00)2in the counter) The order of the digits in the counter isthen reversed and the resulting number is converted to abinary number for generating a Halton sequence [21] Sobolsequences can be generated by the circuit shown in Fig 5As per the Sobol generation algorithm [24] the index ofthe direction vector is first obtained by the least significantzero (LSZ) detection and index generation component Thedirection vector array (DVA) is generated by using a primitivepolynomial At each clock cycle one direction vector (DV)Vk is fetched and XORed with the current Sobol numberRi to generate the next Sobol number Ri+1 In SC the useof the LD sequences improves the accuracy with a shorterlength However a fairly long sequence is still required for ahigh accuracy

B Stochastic circuits

Stochastic circuits are the core of an SC system Theoperation of a combinational circuit can be considered as

0 05 1(a)

0

02

04

06

08

1

0 05 1(b)

0

02

04

06

08

1

Fig 4 Examples of two-dimensional (a) Sobol sequencesand (b) LFSR-generated sequences The quasirandom numbersin (a) are more evenly distributed than the pseudorandomnumbers in (b)

Synapse

Dendrites

Axon

Soma(cell)

Membrane

VM (t+1) = VM (t) + Σεij - Σιij

if VM gt Vth reset and fire

else do not fire

εij Excitatory inputs of the neuron

ιij Inhibitory inputs of the neuron

VM Membrane potential

Vth Threshold voltage

+

-

Uij-1Ui-1jUi+1jUij+1

uij

Uij

+3

Usum

divide4

+

-

Uij-1

Ui-1j

Ui+1j

Uij+1 uij

Uij

+3

cN

CS

CE

CW

CN

3+

CS

CE

CW

CN

Counter (starting

from i=0)

Priority encoder

indexk

Direction vector array (DVA)

V3

V2

V1

V0

hellip

Ri

Vk

Ri+1D-flip flops

Least significant zero (LSZ) detection and index generation

Fig 5 A Sobol sequence generator [25] It can be used as theRNG in an SNG to improve the accuracy

a Monte Carlo simulation process [21] For example thefunction of a unipolar stochastic multiplier is equivalent toestimating the rectangular area determined by p1p2 where p1and p2 are the probabilities encoded by the input sequencesby randomly dropping points into a unit square The ANDgate is used to decide whether the randomly generated pointwith coordinates (r1 r2) is within the rectangle Only whenboth r1 lt p1 and r2 lt p2 is the random point located withinthe rectangle as shown in Fig 6 and the output of the ANDgate is lsquo1rsquo The counter is then used to count the total numberof points within the rectangle as an estimate of p1p2

MUX

01100110hellip

p1=05

10101010hellip

p2=05

00100010hellip

p=p1p2=025

01100101hellip

p1=05

11101110hellip

p2=075

sel

psel=05

00110110hellip

0

1 p=(1-psel)p1+pselp2=0625

01100111hellip (a)

(c)

10010000hellip

00101000hellip01000111hellip times2-1= -05

416

p1=

times2-1= -05 416

p2=

times2-1= 025 1016

p1p2=

(b) A bipolar stochastic multiplier

00101000hellip 11010111hellip

p1=025 p=1-p1=075

(b)

p1

p2

p=p1p2

1

1

p1

p2

(1 1)

AltB

AltB

r1

r2Counter

(b) A bipolar stochastic multiplier

Stochastic number generator (SNG)

array

Binary inputs(from memory

devices)

Stochastic sequences

Stochastic computing

circuits

Stochastic sequences

Probability estimators

(PEs)

Binary outputs

+

-

DSNGDigital signal

Bipolar stochastic

sequence encodinglsquo0rsquo

Integration results

Output DSS

MUX

+SNGarray

y

x

Stochastic sequences encoding coefficients

z0z1

zN

hellip

N copies of independent stochastic sequences that

each encodes x

SEL0

1hellip

N

hellip

A

B

+

-

+

-

A

B

AB

(a) (b)

B(A+B)

(a) (b)

hellip

k

+

-

p

p

(c)

^

Fig 6 (a) A unipolar stochastic multiplier and (b) its MonteCarlo model

Fig 7 shows several stochastic multipliers in differentrepresentations a bipolar multiplier is implemented by anXNOR gate and the sign-magnitude stochastic multiplier isimplemented by a unipolar multiplier and an XOR gate forcomputing the sign bit Generic methods have also beenproposed to synthesize combinational circuits for computingBernstein polynomials [2] or multi-linear functions [26]

Inputs Hidden layers

Outputs

(a)

hellip

x

w o=f(wx)

hellip

(b)

(c)

wl+1

δl+1o wl

hellip

δwl = f`wl+1δl+1

hellip

Random number generator (RNG)

p

Stochastic sequence with probability p

0101100hellip

Comparator

B

AAltB

N

N

(a)

CLK

01100110hellip

p1=05

10101010hellip

p2=05

00100010hellip

p=p1p2=025

(a) A stochastic multiplier for the unipolar representation

0100010010010000hellip

1000100010001000hellip0011001111100111hellip

times2-1= -05 4

16p1=

times2-1= -05 416

p2=

times2-1= 025 1016

p1p2=

(b) A stochastic multiplier for the bipolar representation

01100110hellip

|p1|=05

10101010hellip

|p2|=05

00100010hellip

|p|=|p1||p2|=025

(c) A stochastic multiplier for the sign-magnitude representation

|p1|=05

Sign(p1)=1(negative)

Sign(p2)=0(positve)

Sign(p)=1(negative)

Fig 7 Three stochastic multipliers for different representa-tions The input sequences should be uncorrelated for a higheraccuracy

There are primarily two types of sequential circuits theFSM-based and stochastic integrator-based circuits An FSM-based circuit can be modeled by a state transition diagram asshown in Fig 8 For an input bit of lsquo0rsquo the FSM moves tothe left of its current state otherwise it moves to the rightNote that when the FSM is at the leftmost (or rightmost)state it stays at the current state when receiving a lsquo0rsquo (or lsquo1rsquo)The FSM can be considered as a Markov chain if the inputsequence is not autocorrelated or every bit in the sequence isgenerated independently with the same probability to be lsquo1rsquo Asteady hyper-state can be reached after several runs where theprobability that the FSM staying at each state converges [1]By assigning a different output (lsquo0rsquo or lsquo1rsquo) to each stateexponential logarithmic and high-order polynomial functionscan be implemented by the FSM-based circuits A generalsynthesis method is discussed in [27]

S0 S1 Sk-1 Sk Sk+1 SN-2 SN-1helliphelliphelliphellip

X=0X=0X=0X=0X=0X=0

X=1 X=1 X=1 X=1 X=1 X=1 X=1

X=0

Fig 8 State transition diagram for an FSM-based stochasticcircuit

Stochastic integrators are sequential SC elements that accu-mulate the difference between two input sequences [1] Thestochastic integrator consists of an RNG an updown counterand a comparator as shown in Fig 9(a) Let the output ofthe comparator be Si (either lsquo0rsquo or lsquo1rsquo) at clock cycle i theinput bits be Ai and Bi and the n-bit integer stored in thecounter be Ci The RNG and the comparator work as an SNGand the probability of generating a lsquo1rsquo is ci = Ci2

n ie then-bit counter stores a number in a fixed-point manner and allthe bits belong to the fractional part The counter updates itsvalue Ci at each clock cycle by Ci+1 = Ci + Ai minus Bi orequivalently

ci+1 = ci + 12n(Ai minusBi) (1)

In k clock cycles the stochastic integrator implements

ck = c0 + 12nkminus1sumi=0

(Ai minusBi) (2)

where c0 is the initial (fractional) value stored in the counterA stochastic integrator works in a similar manner to a

spiking neuron as shown in Fig 9 The stochastic integratorldquofiresrdquo randomly depending on the value stored in the counterwhereas the spiking neuron fires when the accumulated mem-brane potential VM reaches a threshold Vth [5] The inputs tothe neuron can be excitatory (εij) or inhibitory (ιij) while theinput sequences A and B of the stochastic integrator increaseand decrease the value stored in the counter

The integrator has been used in a stochastic divider withthe output stochastic sequence as the feedback signal [1] asshown in Figs 10(a) and (b) The quotient is obtained whenthe value stored in the counter reaches an equilibrium stateie the updown counter has an equal probability to increase

Rate = average over pool of equivalent neurons(several neuron single runs)

j=1

2

3

hellip

A =1Δt M

n(t t+Δt)M

Δt

j=1

2

3

M

hellip

01010

11000

00010

01000

p =M

N

L

010100101010110100

A =Δt

n (t t+Δt)

p =L

N

Δt

L

N number of lsquo1rsquos in the sequence

n() number of spikes in the time window

2n-state Counter INC

DEC

RNG

gt S

A

B

comparator

C

ri

C i+1 = C i + Ai - Bi

if Ci gt ri S = 1

else S = 0

p encoded value

A encoded value

(a) A stochastic integrator

S+

-

CA

B

(b) A symbol

Synapse

Dendrites

Axon

Soma(cell)

Membrane

VM (t+1) = VM (t) + Σεij - Σιij

if VM gt Vth reset and fire

else do not fire

εij Excitatory inputs of the neuron

ιij Inhibitory inputs of the neuron

VM Membrane potential

Vth Threshold voltage

(c) An integrate-and-fire spiking neuron model

Fig 9 (a) A stochastic integrator (b) its symbol and (c) aspiking neuron [5]

or decrease However it may take a long time to converge tothe equilibrium state as discussed in [6] [25] A stochasticintegrator with a feedback loop has been used as an ADpativeDIgital Element (ADDIE) shown in Fig 10(c) The ADDIEcan function as a probability follower ie the probability ofthe output sequence tracks the change of the probability of theinput sequence [1] [28]

MUX

01100110hellip

p1=05

10101010hellip

p2=05

00100010hellip

p=p1p2=025

01100101hellip

p1=05

11101110hellip

p2=075

sel

psel=05

00110110hellip

0

1 p=(1-psel)p1+pselp2=0625

01100111hellip (a)

(c)

10010000hellip

00101000hellip01000111hellip times2-1= -05

416

p1=

times2-1= -05 416

p2=

times2-1= 025 1016

p1p2=

(b) A bipolar stochastic multiplier

00101000hellip 11010111hellip

p1=025 p=1-p1=075

(b)

p1

p2

p=p1p2

1

1

p1

p2

(1 1)

AltB

AltB

RN1

RN2

Counter

(b) A bipolar stochastic multiplier

Stochastic number generator (SNG)

array

Binary inputs(from memory

devices)

Stochastic sequences

Stochastic computing

circuits

Stochastic sequences

Probability estimators

(PEs)

Binary outputs

+

-

DSNGDigital signal

Bipolar stochastic

sequence encodinglsquo0rsquo

Integration results

Output DSS

MUX

+SNGarray

y

x

Stochastic sequences encoding coefficients

z0z1

zN

hellip

N copies of independent stochastic sequences that

each encodes x

SEL0

1hellip

N

hellip

A

B

+

-

+

-

A

B

AB

(a) (b)

B(A+B)

(a) (b)

hellip

k

+

-

p

p

(c)

^

Fig 10 Stochastic dividers compute (a) AB (b) B(A+B)(c) shows an ADDIE

C Probability estimatorsThe computed probability of the output sequence can be

found by using a counter It is estimated as the quotient

of the number of lsquo1rsquos divided by the sequence length TheADDIE in Fig 10(c) is an alternative component to estimatethe probability by using a stochastic integrator reaching theequilibrium state [1]

D Limitations

Although the SC circuits are much simpler than conven-tional arithmetic circuits a large sequence length is requiredto achieve a high accuracy This requirement leads to poorperformance and low energy efficiency Fig 11 shows the rootmean square error (RMSE) encoding a number in the unipolarrepresentation using different types of ldquorandomrdquo numbers Thepseudorandom numbers are generated by a 16-bit LFSR withrandomly generated seeds The RMSEs are obtained from1000 trials using different sequence lengths As shown inFig 11 to achieve an RMSE of 1times10minus2 a sequence length of212 bits is required for the LFSR generated sequences For thestochastic Sobol sequences the length is significantly reducedto 26 or 64 bits to achieve a similar accuracy although it isstill not as compact as an equivalent 5-bit fixed-point numberthat results in a similar RMSE Meanwhile low precision (egat an RMSE of 1times 10minus2 or larger) can be tolerated in manyapplications such as inference in a machine learning model[12]ndash[17] [29]ndash[33] digital signal processing [34]ndash[40] andimage processing [41]ndash[44] Hence progressive precision (PP)can be employed and the Sobol sequence length can be as lowas 8 bits in an application in a CNN inference engine [45]

0 2 4 6 8 10 12

Sequence length (2N)

10-4

10-3

10-2

10-1

100

RM

SE

Sobol sequencePseudorandom sequence

Fig 11 RMSE of the encoded numbers with different typesof stochastic sequences

Despite the fact that some of these implementations achievea lower energy consumption with PP compared with conven-tional circuits the application of conventional SC (CSC) isstill quite limited

III DYNAMIC STOCHASTIC COMPUTING

In a DSC system as shown in Fig 12 the stochasticsequences encode varying signals instead of static numbersConsider a discrete-time digital signal xi within [0 1] (i =0 1 2 ) A DSS encoding xi satisfies that E[Xi] = xiwhere Xi is the ith bit in the DSS Thus every bit in thesequence can have a different probability to be 1

Digital signals (from ADCs or

storage)

DSNGsSignal

reconstruction To be stored or converted back to

analog signals

Stochastic logicDynamic

stochastic sequences

Dynamic stochastic sequences

Fig 12 A DSC system

RNGDynamic stochastic sequence

encoding signal xi

111010001011hellipComparator

B

AAltB

xi

Fig 13 A DSNG xi is an input to the comparator at onesample per clock cycle

A DSS generation

A DSS can be generated by a dynamic SNG (DSNG) asshown in Fig 13 It is similar to a conventional SNG exceptthat the input is a varying signal instead of a static number TheRNG generates uniformly distributed random numbers within[0 1] Given a discrete signal xi as shown in Fig 13 itis compared with a random number one sample at a timeIf xi is larger than the random number a lsquo1rsquo is generatedotherwise the DSNG generates a lsquo0rsquo Generating every bit inthe DSS is a Bernoulli process (in the ideally random case)so the expectation getting a lsquo1rsquo is equal to the correspondingsample value from the discrete digital signal The discretesignal can either be sampled from an analog signal or readfrom a memory device

B Data representations in DSC

Similar to CSC all the aforementioned mapping schemescan be used to encode numbers within certain ranges in DSCwhich subsequently requires the use of different computationalelements Specifically a DSS can encode a signal within [0 1]in the unipolar representation or within [minus1 1] in the bipolaror sign-magnitude representation For the sign-magnitude rep-resentation two sequences are used to encode a signal Oneencodes the magnitude of the signal Since the signal can haveboth positive and negative values another sequence is used torepresent the signs of the numbers

C Stochastic circuits

The DSC circuits can be combinational or sequential1) Combinational circuits A combinational circuit using

DSSrsquos as its inputs implements the composite function of theoriginal stochastic function For example an AND gate or aunipolar stochastic multiplier computes zi = xiyi for eachpair of bits in the input DSSrsquos that independently encode xiand yi (i = 0 1 2 ) respectively [46] due to E[Zi] =E[XiYi] = E[Xi]E[Yi] = xiyi where Zi Xi and Yi are theith bits in the output and input DSSrsquos respectively Note thatfor a continuous signal oversampling is required to producethe discrete signals xi and yi for accurate multiplication [46]Fig 14(a) shows the DSC circuit for a signal multiplicationand (b) shows the results of multiplying two sinusoidal signals

DSNG Signal Xx(t)

DSNG

DSNG

Signal reconstruction

X

Y

x(t)

y(t)ziZ

(a)

0 02 04 06 08 1 12 14 160

05

1

x(t)

Original signalDSS

0 02 04 06 08 1 12 14 160

05

1

y(t)

0 05 1 15t (s)

0

05

1

z(t)

Double precisionDynamic stochastic

(b)

Fig 14 A frequency mixer by using a stochastic multiplierwith DSSrsquos as the inputs (a) circuit and (b) results

with frequencies of 15 Hz and 2 Hz both sampled at a rateof 210 Hz The sampled signals are converted to DSSrsquos by twoDSNGs with independent Sobol sequence generators [47] asthe RNGs The output DSS is reconstructed to a digital signalby a moving average [28] The DSS can also be reconstructedby an ADDIE [1]

2) Sequential circuits The stochastic integrator is an im-portant component used in DSC for iterative accumulation Itimplements integrations of the signal encoded by accumulatingthe stochastic bits in a DSS Fig 15 shows that a bipolarstochastic integrator can be used to perform numerical inte-gration In Fig 15(a) the input digital signal is cos(πt2)sampled at a sampling rate of 28 Hz and converted to a DSSby the DSNG The other input of the stochastic integrator isset to a bipolar stochastic sequence encoding 0 (a stochasticbit stream with half of the bits being lsquo1rsquo) By accumulatingthe input bits the stochastic integrator provides an unbiasedestimate to the Riemann sum [48] The integration is obtainedby recording the changing values of the counter Since theintegrated results are directly provided by the counter areconstruction unit is not required

As can be seen from Fig 15(b) the results produced bythe stochastic integrator is very close to the analytical resultsSince the output DSS is generated by comparing the changingsignal (as the integration results) with uniformly distributedrandom numbers the RNG and the comparator inside theintegrator works as a DSNG As a result the output sequenceencodes the discretized stochastic integration solution for theanalytical solution (2π) sin(πt2) The output DSS is alsoshown in Fig 15(b)

MUX

01100110hellip

p1=05

10101010hellip

p2=05

00100010hellip

p=p1p2=025

01100101hellip

p1=05

11101110hellip

p2=075

sel

psel=05

00110110hellip

0

1 p=(1-psel)p1+pselp2=0625

01100111hellip (a)

(c)

10010000hellip

00101000hellip01000111hellip times2-1= -05

416

p1=

times2-1= -05 416

p2=

times2-1= 025 1016

p1p2=

(b) A bipolar stochastic multiplier

00101000hellip 11010111hellip

p1=025 p=1-p1=075

(b)

p1

p2

p=p1p2

1

1

p1

p2

(1 1)

AltB

AltB

RN1

RN2

Counter

(b) A bipolar stochastic multiplier

Stochastic number generator (SNG)

array

hellip

Binary inputs(from memory

devices)

hellip

Stochastic sequences

Stochastic computing

circuits

Stochastic sequences

Probability estimators

(PEs)

hellip hellip

Binary outputs

+

-

DSNGDigital signal

Bipolar stochastic

sequence encodinglsquo0rsquo

Integration results

Output DSS

(a)

0 05 1 15 2 25 3 35 4t

-1

-05

0

05

1

Analytical resultsResults by DSCOriginal signalOutput DSS

(b)

Fig 15 A bipolar stochastic integrator for numerical integra-tion (a) circuit and (b) results The output DSS is decimatedfor a clear view

The stochastic integrator has been used in a stochastic low-density parity-check (LDPC) decoder to implement a trackingforecast memory in [49] It solves the latch-up problem in thedecoding process with a low hardware complexity

For the FSM-based sequential circuits however the DSScannot work well because the probability consistently changesand a steady hyper-state is hard to reach The DSS could alsoviolate the condition of the model being a Markov chain dueto the autocorrelation of the encoded signal

IV APPLICATION OF DSC

By using the stochastic integrators DSC can implement aseries of algorithms that involve iterative accumulations suchas an IIR filter the Euler method and a GD algorithm

A IIR filtering

In [50] it is shown that an ADDIE with different parameterscan be used as a low-pass IIR filter with different frequencyresponses as shown in Fig 16 Given the bit width of thestochastic integrator n the transfer function of such a filter inthe Z-domain is given by

H(Z) =1

2nZ + 1minus 2n (3)

for a stable first-order low-pass IIR filter [50] An oversampled∆ minus Σ modulated bit stream is then used as the input of thecircuit Since the generation of a ∆minusΣ modulated bit stream

2n-state Counter INC

DEC

RNG

gtSeqout

A

BSeqout

(a) (b)

comparator

A

B

+

-

Countout

Countout

Seqout

(b)

+

-

Countout

X

+

-

p

y(t)

Fig 16 ADDIE circuit produces an exponential function whenthe input is a sequence encoding a static number

0 01 02 03 04t (s)

(a) Original and filtered signals

-1

0

1

0 50 100 150 200 250f (Hz)

(b) Spectra of original and filtered signals

0

02

04Original signalFiltered signal

Fig 17 The low-pass filtering results in (a) the time domainand (b) the frequency domain

can also be considered as a Bernoulli process [51] the DSScan be used for IIR filtering as well

Fig 17 shows the filtering of mixed sinusoidal signals of 4Hz and 196 Hz The mixture is first sampled at a sampling rateof 656k Hz and the sampled digital signal is then convertedto a DSS The DSS is filtered by an 8-bit ADDIE As can beseen the high-frequency signals are almost filtered out whilethe low-frequency signals remain

B ODE solvers

The Euler method provides numerical solutions for ODEsby accumulating the derivative functions [52] For an ODEdydt = f(t y(t)) the estimated solution y(t) is given by

y(ti+1) = yi+1 = yn + hf(ti yi) (4)

with h as the step size and ti+1 = ti + h After k steps ofaccumulation

yk = y0 + h

kminus1sumi=0

f(ti yi) (5)

To use a stochastic integrator let the pair of input DSSrsquos(eg A and B in (2)) encode the discrete derivative functionf(ti yi) such that E[Ai minus Bi] = f(ti yi) By doing sothe stochastic integrator implements (2) and approximatelycomputes (5) with a step size h = 12n As a result the bitwidth of the counter does not only decide the precision of thecomputation but also the time resolution The integration in theEuler method is implemented by accumulating the stochasticbits in the DSSrsquos instead of real values

Using this method an exponential function can be generatedby a stochastic integrator The ADDIE circuit as shown inFig 16 solves

dy(t)

dt= pminus y(t) (6)

which is the differential equation for a leaky integratormodel [53] Thus the ADDIE can be used to implement

neuronal dynamics in a leaky integrate-and-fire model Theanalytical solution of (6) is given by

y(t) = pminus (pminus y0)eminust (7)

where y0 is provided in the initial condition ie y(0) = y0This model is commonly seen in natural processes such asneuronal dynamics [53] Fig 18 shows the results producedby the ADDIE when y0 = 1 and p = 0

0 2 4 6 8t

0

02

04

06

08

1

y(t)

DSCAnalytical

Fig 18 ADDIE used as an ODE solver The solution is anexponential function

For a set of ODEsdy1(t)

dt = y2(t)minus 05dy2(t)

dt = 05minus y1(t)(8)

with the initial values y1(0) = 0 and y2(0) = 05 a numericalsolution can be found by the circuit in Fig 19 The upperintegrator computes the numerical estimate of y1(t) by takingthe output DSS from the lower integrator and a conventionalstochastic sequence encoding 05 as inputs So the differenceof the two input sequences approximately encodes y2(t)minus05the derivative of y1(t) Simultaneously the output DSS of theupper stochastic integrator encodes the numerical solution ofy1(t) which is used to solve the second differential equationin (8) Thus the stochastic integrator can be viewed as anunbiased Euler solution estimator [54] The results producedby the dynamic stochastic circuit is shown in Fig 20 8-bitupdown counters are used in both stochastic integrators andthe RNG is implemented by a Sobol sequence generator [47]Since the results are provided by the counter in the stochasticintegrator an explicit reconstruction unit as shown in Fig 12is not required

A partial differential equation such as a Laplacersquos equationcan also be solved by an array of stochastic integrators witha finite-difference method

C Weight update for an adaptive filterDSC can be applied to update the weights in an adaptive

digital filter An adaptive filter system consists of a linear filterand a weight update component as shown in Fig 21 Theoutput of the linear filter at time i is given by

yi = F (wixi) = wixi =sumMminus1

j=0wj

iximinusM+j+1 (9)

+

-

y(t)

1

z(t)

times2

+

-

Y

+

-

+

-

y1(t)

y2(t)

p=05Y1

Y2

(a) (b)

p=05(times2)

bi

ai

+

-

y(t)

1

z(t)

times2

+

-

+

-

+

-

y1(t)

y2(t)

p=05

p=05(times2)

bi

ai

+

-

1

0+

-0+

-0

16

y(t)= t3

y(t)=t

y(t)= t212

Fig 19 A pair of stochastic integrators solving (8) [54]

0 2 4 6 8 10 120

05

1

y 1(t

)

DSCAnalytical solution

0 2 4 6 8 10 12t

0

05

1

y 2(t

)Fig 20 Simulation results of the ODE solver in Fig 19 Theyare compared with the analytical solutions

where M is the length of the filter or the number of weightswi is a vector of M weights wi = [w

(0)i w

(1)i w

(Mminus1)i ]

and xi is the input vector for the digital signal sampled atdifferent time points xi = [ximinusM+1 ximinusM+2 xi]

T

+

-

y(xi)

F(wi xi)

F(wi xi) wi

ti

SNGpart F(wi xi)

part wijSNG

SNG

ai

bi

EEEE

Linear filter with weights wi

LMS weight update unit

xi-M+1xi-M+2

xi

hellip

yi +- ti+

ei

xi

+ -

xi-M+j

wij

xixi-M+1

+ -

yi

wi1

ti

RNG

+ -

wiM

RNG

hellip hellip

hellip hellip

hellip hellip

Comparator

+ - + - + -

wijwi1 wiM

xi-M+j+1 xixi-M+1

yi

ti

RNG

RNGhellip hellip

hellip hellip

hellip hellip

Comparator

Linear filter with parameters wi

Δwi

LMS weight updating unit

hellip

yi +- ti+

ei

hellip

Δwi

(Ci)

wij[Pi] = [12nCi] =

Fig 21 An adaptive filter

In the least-mean-square (LMS) algorithm the cost functionis the mean squared error between the filter output and theguidance signal ti The weights are updated by

wi+1 = wi + ηxi(ti minus yi) (10)

where η is the step size The update is implemented by thecircuit in Fig 22 The two multipliers are used to computexiti and xiyi respectively The integrator is then used toaccumulate ηxi(ti minus yi) over i = 0 1

The dynamic stochastic circuit is used to perform systemidentification for a high pass 103-tap FIR filter 103 weightsare trained by using randomly generated samples with theLMS algorithm 14-bit stochastic integrators are used Afteraround 340k iterations of training the misalignment of the

Actual classification results

Trained parameters

-

+

1 6 8 0

Estimate classification results (distribution)

SNGarray

Images

10 classes

00000hellip

11010hellip

00010hellip

Trained parameters

-

+

1 6 8 0DSNGarray

10 classes

times

times

ti

yi

xi

Fig 22 A DSC circuit training a weight in an adaptive filterThe stochastic multipliers are denoted by rectangles with ldquotimesrdquoBipolar and sign-magnitude representations can be used for thestochastic multipliers and integrator

weights in the adaptive filter converges to about -45 dBcompared to the target system The frequency responses ofthe target and the trained filter are shown in Fig 23 As canbe seen the adaptive filter has a similar frequency responseto the target system especially in the passband indicating anaccurate identification result

0 05 1 15 2 25 3Normalized frequency

-80

-60

-40

-20

0

Mag

nitu

de in

dB

target systemadaptive filter

Fig 23 Frequency responses of the target and trained filterusing the sign-magnitude representation

D A gradient descent (GD) circuit

GD is a simple yet effective optimization algorithm It findsthe local minima by iteratively taking steps along the negativesof the gradients at current points One-step GD is given by

wi+1 = wi minus ηnablaF (wi) (11)

where F (wi) is the function to be minimized at the ith stepnablaF (wi) denotes the current gradient and η is the step sizeThe step size is an indicator of how fast the model learns Asmall step size leads to a slow convergence while a large onecan lead to divergence

Similar to the implementation of the Euler method let apair of differential DSSrsquos encode the gradient minusnablaF (wi)the stochastic integrator can be used to compute (11) with astep size η = 12n by using these DSSrsquos as inputs DSC canthen be used to perform the optimization of the weights in aneural network (NN) by minimizing the cost function

The training of an NN usually involves two phases Take afully connected NN as an example shown in Fig 24 In theforward propagation the sum of product of the input vector xand the weight vector w is obtained as v = wx The output

Inputs Hidden layers

Outputs

(a)

hellip

x

w o=f(wx)

hellip

(b)

(c)

wl+1

δl+1o wl

hellip

δwl = f`wl+1δl+1

hellip

Random number generator (RNG)

p

Stochastic sequence with probability p

0101100hellip

Comparator

B

AAltB

N

N

(a)

CLK

01100110hellip

p1=05

10101010hellip

p2=05

00100010hellip

p=p1p2=025

(a) A stochastic multiplier for the unipolar representation

0100010010010000hellip

1000100010001000hellip0011001111100111hellip

times2-1= -05 4

16p1=

times2-1= -05 416

p2=

times2-1= 025 1016

p1p2=

(b) A stochastic multiplier for the bipolar representation

01100110hellip

|p1|=05

10101010hellip

|p2|=05

00100010hellip

|p|=|p1||p2|=025

(c) A stochastic multiplier for the sign-magnitude representation

|p1|=05

Sign(p1)=1(negative)

Sign(p2)=0(positve)

Sign(p)=1(negative)

Fig 24 (a) A fully connected NN in which the neuronsare organized layer by layer (b) The computation duringthe forward propagation (c) The local gradient for wl iscomputed by using the backward propagation The output ofrom previous neuron is required to compute the gradient

of the neuron o is then computed by o = f(v) where f isthe activation function During this phase the weights do notchange In the backward propagation the difference betweenthe actual and the desired final outputs ie the classificationresults is calculated and the local gradients are computed foreach neuron based on the difference [55] Take the neuronin Fig 24(c) as an example the local gradient is obtainedby δl = f prime(v)wl+1δl+1 where f prime(middot) is the derivative of theactivation function wl+1 is the weight vector connecting layerl and l+1 and δl+1 is the local gradient vector of the neuronsin layer l + 1 Then the gradient of a weight at layer l iscomputed by multiplying the local gradient with the recordedoutput of the designated neuron ie nablaF (wl) = minusoδl

In [56] it is found that the local gradient can be decomposedinto two parts δ = δ+ minus δminus where δ+ is contributed bythe desired output of the NN and δminus by the actual outputof the NN So the gradient for weight w can be rewritten asnablaF (w) = minuso(δ+minusδminus) Similar to the implementation of (10)the DSC circuit in Fig 22 can be used to implement the GDalgorithm to perform the online training (one sample at a time)of the fully connected NN with the input signals δ+ o and δminusHowever a backpropagation ie δ = f prime(v)wl+1δl+1 is stillcomputed by using a conventional method (eg a fixed-pointarithmetic circuit) to obtain the local gradients Otherwisethe accuracy loss would be too much for the algorithm toconverge

Fig 25 shows the convergence of cross entropy (as a costfunction) and testing accuracy for the training of a 784-128-128-10 NN with the MNIST handwritten digit dataset As canbe seen the testing accuracy of the NN trained by the DSCcircuit using the sign-magnitude representation is similar to theone produced by the fixed-point implementation (around 97)However the accuracy of the DSC implementation using thebipolar representation is relatively low

The proposed circuitry can be useful for online learningwhere real-time interaction with the environment is requiredwith an energy constraint [57] It can also be used to train amachine learning model using private or security-critical dataon mobile devices if data are sensitive and cannot be uploaded

to a cloud computer

V ERROR REDUCTION AND ASSESSMENT

One major issue in SC is the loss of accuracy [58] Indepen-dent RNGs have been used to avoid correlation and improvethe accuracy of components such as stochastic multipliersThis method inevitably increases the hardware overheadHowever in a stochastic integrator the sharing of RNGs forgenerating the two input sequences reduces the variance thusreducing the error This is shown as follows

The error introduced by DSC at each step of computation isrelated to the variance of the difference of the two accumulatedstochastic bits AiminusBi When independent RNGs are used togenerate Ai and Bi the variance at a single step is given by

Var[Ai minusBi] = E[(Ai minusBi minus E(Ai minusBi))2]

= E[(Ai minusBi)2]minus 2E[Ai minusBi]

(PAi minus PBi) + (PAi minus PBi)2

(12)

where PAi and PBi are the probabilities of Ai and Bi beinglsquo1rsquo respectively ie E[Ai] = PAi and E[Bi] = PBi Since(AiminusBi)

2 is 0 when Ai and Bi are equal or 1 when they aredifferent E[(AiminusBi)

2] equals PAi(1minusPBi)+PBi(1minusPAi)which is the probability that Ai and Bi are different when Ai

and Bi are independently generated1 (12) is then rewritten as

Varind[Ai minusBi] = PAi(1minus PBi) + PBi(1minus PAi)

minus (PAi minus PBi)2

= PAi(1minus PAi) + PBi(1minus PBi)

(13)

On the other hand if the same RNG is used to generate Ai

and Bi the variance of Ai minusBi can be derived from (12) aswell However since only a shared random number is used togenerate Ai and Bi E[(AiminusBi)

2] equals |PAiminusPBi| which

1E[(AiminusBi)2] = Prob[(AiminusBi)

2 = 1]times1+Prob[(AiminusBi)2 = 0]times0

0 5 10 15 20Epoch

0

01

02

03

04

05

Cro

ss e

ntro

py

Signed DSCBipolar DSCFixed-point

(a)

0 5 10 15 20Epoch

88

90

92

94

96

98

Test

ing

accu

racy

(in

perc

enta

ge)

Signed DSCBipolar DSCFixed-point

(b)

Fig 25 (a) Convergence of cross entropy and (b) testingaccuracy of MNIST hand-written digit recognition datasetCross entropy is the cost function to be minimized during thetraining of an NN The weights of the NN are updated by theDSC circuits The DSC circuits using the sign-magnitude andbipolar representations and the fixed-point implementation areconsidered for comparison

gives the probability that Ai and Bi are different Thus (12)can be rewritten as

Varshare[Ai minusBi] = |PAi minus PBi| minus (PAi minus PBi)2

= |PAi minus PBi|(1minus |PAi minus PBi|)(14)

Since Varshare[Ai minus Bi] minus Varind[Ai minus Bi] = 2PBiPAi minus2min(PAi PBi) le 0 we obtain Varshare[Ai minus Bi] leVarind[AiminusBi] for any i = 0 1 2 [54] Therefore sharingthe use of RNGs to generate input stochastic bit streamsimproves the accuracy

For the circuit in Fig 22 though the inputs of the stochasticintegrator are the outputs of the stochastic multipliers insteadof being directly generated by a DSNG the RNG sharingscheme still works in the generated DSSrsquos encoding δ+ andδminus ie it produces a smaller variance in the computed result[56] However an uncorrelated RNG must be used to generatethe DSS encoding the recorded output of the neuron obecause independency is still required in general for accuratestochastic multiplication

The accuracy of the accumulated result is also related to thebit-width of the counter in the stochastic integrator The bitwidth does not only determine the resolution of the result butalso the step size especially in the ODE solver adaptive filterand the gradient descent circuit A larger bit width indicatesa smaller resolution and a smaller step size for fine tuning atthe cost of extra computation time It is shown in [56] that themulti-step variance bound exponentially decreases with the bitwidth of the counter However for the IIR filter the changeof bit-width affects the characteristic of the filter

For the ODE solver circuit it is also found that the use ofquasirandom numbers improves the accuracy compared to theuse of pseudorandom numbers [54] However when the DSS isused to encode signals from the input or images such as in theadaptive filter and the gradient descent algorithm quasirandomand pseudorandom numbers produce similar accuracy Thereason may lie in the fact that the values of the encodedsignals in these two applications are independently selectedfrom the training dataset However for the ODE solver thesignals encoded are samples from continuous functions andadjacent samples have similar values As a result withina short segment of the DSS in this case it approximatelyencodes the same value It then works similarly as CSCfor which the Sobol sequence helps improving the accuracyFig 26 compares the convergence of misalignment duringthe training of the adaptive filter using pseudorandom andquasirandom numbers 16-bit stochastic integrators are usedin this experiment The misalignment is given by

Misalignment =E[(w minus w)2]

E[w2] (15)

where w denotes the target value and w is the trained resultIt can be seen that the convergence speed of using these twotypes of random numbers is similar and their misalignmentsboth converge to around -52 dB

TABLE II Hardware evaluation of the gradient descent circuitarray training a 784-128-128-10 neural network

Metrics DSC Fixed-point Ratio

Step size 2minus10 2minus10 -Epochs 20 20 -Min time (ns) 16times 106 47times 106 13EPO (fJ) 12times 107 11times 108 192TPA (imagesmicrom2) 57times 101 15 381Aver test Accu 9704 9749 -

TABLE III Hardware performance comparison of the ODEsolvers for (8)

Metric DSC Fixed-point Ratio

Step size 2minus8 2minus8 -Iterations 3217 3217 -Min time (ns) 257359 855720 133EPO (fJ) 20121 46600 123TPA (wordmicrosmicrom2) 475 058 821RMSE 47times 10minus3 36times 10minus3 -

VI HARDWARE EFFICIENCY EVALUATION

Due to the extremely low sequence length used for encodinga value in DSC the power consumption of DSC circuits ismuch lower than conventional arithmetic circuits Moreoversince each bit in a DSS is used to encode a number DSCis more energy-efficient than CSC The gradient descent andODE solver circuits are considered as examples to show thehardware efficiency Implemented in VHDL the circuits aresynthesized by Synopsys Design Compiler with a 28-nm tech-nology to obtain the power consumption maximum frequencyand hardware measurements The minimum runtime energyper operation (EPO) throughput per area (TPA) and accuracyare reported in Table II for the gradient descent circuits andin Table III for ODE solvers

As can be seen with a slightly degraded accuracy theDSC circuits outperform conventional circuits using a fixed-point representation in speed energy efficiency and hardwareefficiency For the gradient descent circuit since the two RNGscan be shared among the 118282 stochastic integrators (for118282 weights) the improvement in hardware efficiency iseven more significant than the ODE solver for which one RNG

0 05 1 15 2 25 3Number of steps ( 64 input samples) 104

-60

-50

-40

-30

-20

-10

0

Mis

alig

nmen

t (dB

)

PseudorandomQuasirandom

Fig 26 Comparison of the convergence of misalignment usingpseudorandom and quasirandom numbers

is shared between two stochastic integrators However sincethe sharing of RNGs does not significantly affect the criticalpath the performance improvements are similar for these twocircuits

VII CONCLUSION

In this article a new type of stochastic computing dynamicstochastic computing (DSC) is introduced to encode a varyingsignal by using a dynamically variable binary bit streamreferred to as a dynamic stochastic sequence (DSS) A DSCcircuit can efficiently implement an iterative accumulation-based algorithm It is useful in implementing the Euler methodtraining a neural network and an adaptive filter and IIRfiltering In those applications integrations are implementedby stochastic integrators that accumulate the stochastic bits in aDSS Since the computation is performed on single stochasticbits instead of conventional binary values or long stochasticsequences a significant improvement in hardware efficiencyis achieved with a high accuracy

ACKNOWLEDGMENT

This work was supported by the Natural Sciences andEngineering Research Council of Canada (NSERC) underProjects RES0025211 and RES0048688

REFERENCES

[1] B R Gaines Stochastic Computing Systems Boston MA SpringerUS 1969 pp 37ndash172

[2] W Qian X Li M D Riedel K Bazargan and D J Lilja ldquoAnarchitecture for fault-tolerant computation with stochastic logicrdquo IEEETransactions on Computers vol 60 no 1 pp 93ndash105 2011

[3] P Li and D J Lilja ldquoUsing stochastic computing to implementdigital image processing algorithmsrdquo in 2011 IEEE 29th InternationalConference on Computer Design (ICCD) 2011 pp 154ndash161

[4] S C Smithson N Onizawa B H Meyer W J Gross and T HanyuldquoEfficient CMOS invertible logic using stochastic computingrdquo IEEETransactions on Circuits and Systems I Regular Papers vol 66 no 6pp 2263ndash2274 June 2019

[5] W Maass and C M Bishop Pulsed neural networks MIT press 2001[6] Y Liu S Liu Y Wang F Lombardi and J Han ldquoA stochastic

computational multi-layer perceptron with backward propagationrdquo IEEETransactions on Computers vol 67 no 9 pp 1273ndash1286 2018

[7] J Zhao J Shawe-Taylor and M van Daalen ldquoLearning in stochastic bitstream neural networksrdquo Neural Networks vol 9 no 6 pp 991ndash9981996

[8] B D Brown and H C Card ldquoStochastic neural computation Icomputational elementsrdquo IEEE Transactions on Computers vol 50no 9 pp 891ndash905 2001

[9] D Zhang and H Li ldquoA stochastic-based FPGA controller for aninduction motor drive with integrated neural network algorithmsrdquo IEEETransactions on Industrial Electronics vol 55 no 2 pp 551ndash561 2008

[10] Y Ji F Ran C Ma and D J Lilja ldquoA hardware implementationof a radial basis function neural network using stochastic logicrdquo inProceedings of the Design Automation amp Test in Europe Conference(DATE) 2015 pp 880ndash883

[11] V Canals A Morro A Oliver M L Alomar and J L RosselloldquoA new stochastic computing methodology for efficient neural networkimplementationrdquo IEEE Transactions on Neural Networks and LearningSystems vol 27 no 3 pp 551ndash564 2016

[12] A Ardakani F Leduc-Primeau N Onizawa T Hanyu and W J GrossldquoVLSI implementation of deep neural network using integral stochasticcomputingrdquo IEEE Transactions on Very Large Scale Integration (VLSI)Systems vol 25 no 10 pp 2688ndash2699 2017

[13] A Ren Z Li C Ding Q Qiu Y Wang J Li X Qian and B YuanldquoSC-DCNN Highly-scalable deep convolutional neural network us-ing stochastic computingrdquo in Proceedings of the 22nd InternationalConference on Architectural Support for Programming Languages andOperating Systems (ASPLOS) 2017 pp 405ndash418

[14] V T Lee A Alaghi J P Hayes V Sathe and L Ceze ldquoEnergy-efficienthybrid stochastic-binary neural networks for near-sensor computingrdquo inDesign Automation amp Test in Europe Conference amp Exhibition (DATE)2017 pp 13ndash18

[15] R Hojabr K Givaki S Tayaranian P Esfahanian A Khonsari D Rah-mati and M H Najafi ldquoSkippyNN An embedded stochastic-computingaccelerator for convolutional neural networksrdquo in Proceedings of the56th Annual Design Automation Conference (DAC) 2019 pp 1ndash6

[16] Y Liu L Liu F Lombardi and J Han ldquoAn energy-efficient and noise-tolerant recurrent neural network using stochastic computingrdquo IEEETransactions on Very Large Scale Integration (VLSI) Systems vol 27no 9 pp 2213ndash2221 Sep 2019

[17] R Cai A Ren O Chen N Liu C Ding X Qian J Han W LuoN Yoshikawa and Y Wang ldquoA stochastic-computing based deeplearning framework using adiabatic quantum-flux-parametron supercon-ducting technologyrdquo in Proceedings of the 46th International Symposiumon Computer Architecture ser ISCA rsquo19 New York NY USA ACM2019 pp 567ndash578

[18] A Alaghi and J P Hayes ldquoSurvey of stochastic computingrdquo ACMTrans Embed Comput Syst vol 12 no 2s pp 921ndash9219 May 2013

[19] A Zhakatayev S Lee H Sim and J Lee ldquoSign-magnitude SC getting10x accuracy for free in stochastic computing for deep neural networksrdquoin Proceedings of the 55th Annual Design Automation Conference(DAC) 2018 pp 1ndash6

[20] P K Gupta and R Kumaresan ldquoBinary multiplication with PN se-quencesrdquo IEEE Transactions on Acoustics Speech and Signal Process-ing vol 36 no 4 pp 603ndash606 1988

[21] A Alaghi and J P Hayes ldquoFast and accurate computation using stochas-tic circuitsrdquo in Proceedings of the conference on Design Automation ampTest in Europe (DATE) 2014 pp 1ndash4

[22] S Liu and J Han ldquoEnergy efficient stochastic computing with Sobolsequencesrdquo in Design Automation amp Test in Europe Conference ampExhibition (DATE) 2017 pp 650ndash653

[23] H Niederreiter Random Number Generation and quasi-Monte CarloMethods Philadelphia PA USA Society for Industrial and AppliedMathematics 1992

[24] P Bratley and B L Fox ldquoAlgorithm 659 Implementing Sobolrsquosquasirandom sequence generatorrdquo ACM Transactions on MathematicalSoftware (TOMS) vol 14 no 1 pp 88ndash100 1988

[25] S Liu and J Han ldquoToward energy-efficient stochastic circuits usingparallel sobol sequencesrdquo IEEE Transactions on Very Large ScaleIntegration (VLSI) Systems vol 26 no 7 pp 1326ndash1339 July 2018

[26] A Alaghi and J P Hayes ldquoSTRAUSS Spectral transform use instochastic circuit synthesisrdquo IEEE Transactions on Computer-AidedDesign of Integrated Circuits and Systems vol 34 no 11 pp 1770ndash1783 2015

[27] P Li D J Lilja W Qian K Bazargan and M Riedel ldquoThe synthesisof complex arithmetic computation on stochastic bit streams usingsequential logicrdquo in Proceedings of the International Conference onComputer-Aided Design (ICCAD) 2012 pp 480ndash487

[28] P Mars and W J Poppelbaum Stochastic and deterministic averagingprocessors Peter Peregrinus Press 1981 no 1

[29] Z Li A Ren J Li Q Qiu Y Wang and B Yuan ldquoDSCNN Hardware-oriented optimization for stochastic computing based deep convolutionalneural networksrdquo in IEEE 34th International Conference on ComputerDesign (ICCD) 2016 pp 678ndash681

[30] A Ren Z Li Y Wang Q Qiu and B Yuan ldquoDesigning reconfigurablelarge-scale deep learning systems using stochastic computingrdquo in IEEEInternational Conference on Rebooting Computing (ICRC) 2016 pp1ndash7

[31] K Kim J Kim J Yu J Seo J Lee and K Choi ldquoDynamic energy-accuracy trade-off using stochastic computing in deep neural networksrdquoin Proceedings of the 53rd Annual Design Automation Conference(DAC) 2016 pp 1241ndash1246

[32] H Sim and J Lee ldquoA new stochastic computing multiplier withapplication to deep convolutional neural networksrdquo in Proceedings ofthe 54th Annual Design Automation Conference (DAC) 2017 pp 1ndash6

[33] Y Liu Y Wang F Lombardi and J Han ldquoAn energy-efficient online-learning stochastic computational deep belief networkrdquo IEEE Journal

on Emerging and Selected Topics in Circuits and Systems vol 8 no 3pp 454ndash465 Sep 2018

[34] J F Keane and L E Atlas ldquoImpulses and stochastic arithmetic forsignal processingrdquo in IEEE International Conference on AcousticsSpeech and Signal Processing (ICASSP) vol 2 2001 pp 1257ndash1260

[35] R Kuehnel ldquoBinomial logic extending stochastic computing to high-bandwidth signalsrdquo in Conference Record of the Thirty-Sixth AsilomarConference on Signals Systems and Computers vol 2 2002 pp 1089ndash1093

[36] N Yamamoto H Fujisaka K Haeiwa and T Kamio ldquoNanoelectroniccircuits for stochastic computingrdquo in 6th IEEE Conference on Nanotech-nology (IEEE-NANO) vol 1 2006 pp 306ndash309

[37] K K Parhi and Y Liu ldquoArchitectures for IIR digital filters usingstochastic computingrdquo in IEEE International Symposium on Circuitsand Systems (ISCAS) 2014 pp 373ndash376

[38] B Yuan Y Wang and Z Wang ldquoArea-efficient scaling-free DFTFFTdesign using stochastic computingrdquo IEEE Transactions on Circuits andSystems II Express Briefs vol 63 no 12 pp 1131ndash1135 2016

[39] R Wang J Han B F Cockburn and D G Elliott ldquoStochastic circuitdesign and performance evaluation of vector quantization for differenterror measuresrdquo IEEE Transactions on Very Large Scale Integration(VLSI) Systems vol 24 no 10 pp 3169ndash3183 2016

[40] H Ichihara T Sugino S Ishii T Iwagaki and T Inoue ldquoCompactand accurate digital filters based on stochastic computingrdquo IEEE Trans-actions on Emerging Topics in Computing vol 7 no 1 pp 31ndash432016

[41] T Hammadou M Nilson A Bermak and P Ogunbona ldquoA 96spltimes64 intelligent digital pixel array with extended binary stochasticarithmeticrdquo in Proceedings of the International Symposium on Circuitsand Systems (ISCAS) 2003

[42] A Alaghi C Li and J P Hayes ldquoStochastic circuits for real-timeimage-processing applicationsrdquo in Proceedings of the 50th AnnualDesign Automation Conference (DAC) 2013 pp 1361ndash1366

[43] P Li D J Lilja W Qian K Bazargan and M D Riedel ldquoComputationon stochastic bit streams digital image processing case studiesrdquo IEEETransactions on Very Large Scale Integration (VLSI) Systems vol 22no 3 pp 449ndash462 2014

[44] M H Najafi S Jamali-Zavareh D J Lilja M D Riedel K Bazarganand R Harjani ldquoTime-encoded values for highly efficient stochasticcircuitsrdquo IEEE Transactions on Very Large Scale Integration (VLSI)Systems vol 25 no 5 pp 1644ndash1657 2017

[45] S R Faraji M Hassan Najafi B Li D J Lilja and K BazarganldquoEnergy-efficient convolutional neural networks with deterministic bit-stream processingrdquo in 2019 Design Automation Test in Europe Confer-ence Exhibition (DATE) 2019 pp 1757ndash1762

[46] S Liu and J Han ldquoDynamic stochastic computing for digital signalprocessing applicationsrdquo in Design Automation amp Test in EuropeConference amp Exhibition (DATE) 2020

[47] I L Dalal D Stefan and J Harwayne-Gidansky ldquoLow discrepancysequences for Monte Carlo simulations on reconfigurable platformsrdquo inInternational Conference on Application-Specific Systems Architecturesand Processors (ASAP) 2008 pp 108ndash113

[48] W G McCallum D Hughes-Hallett A M Gleason D O LomenD Lovelock J Tecosky-Feldman T W Tucker D Flath T ThrashK R Rhea et al Multivariable calculus Wiley 1997 vol 200 no 1

[49] S S Tehrani A Naderi G-A Kamendje S Hemati S Mannor andW J Gross ldquoMajority-based tracking forecast memories for stochasticLDPC decodingrdquo IEEE Transactions on Signal Processing vol 58no 9 pp 4883ndash4896 2010

[50] N Saraf K Bazargan D J Lilja and M D Riedel ldquoIIR filtersusing stochastic arithmeticrdquo in Design Automation amp Test in EuropeConference amp Exhibition (DATE) 2014 pp 1ndash6

[51] F Maloberti ldquoNon conventional signal processing by the use of sigmadelta technique a tutorial introductionrdquo in Proceedings of IEEE Inter-national Symposium on Circuits and Systems (ISCAS) vol 6 1992 pp2645ndash2648

[52] J C Butcher The Numerical Analysis of Ordinary Differential Equa-tions Runge-Kutta and General Linear Methods USA Wiley-Interscience 1987

[53] W Gerstner W M Kistler R Naud and L Paninski Neuronaldynamics From single neurons to networks and models of cognitionCambridge University Press 2014

[54] S Liu and J Han ldquoHardware ODE solvers using stochastic circuitsrdquo inProceedings of the 54th Annual Design Automation Conference (DAC)2017 pp 1ndash6

[55] S S Haykin Neural networks and learning machines Pearson UpperSaddle River NJ USA 2009 vol 3

[56] S Liu H Jiang L Liu and J Han ldquoGradient descent using stochasticcircuits for efficient training of learning machinesrdquo IEEE Transactionson Computer-Aided Design of Integrated Circuits and Systems vol 37no 11 pp 2530ndash2541 Nov 2018

[57] R Nishihara P Moritz S Wang A Tumanov W Paul J Schleier-Smith R Liaw M Niknami M I Jordan and I StoicaldquoReal-time machine learning The missing piecesrdquo in Proceedingsof the 16th Workshop on Hot Topics in Operating Systemsser HotOS rsquo17 ACM 2017 pp 106ndash110 [Online] Availablehttpdoiacmorg10114531029803102998

[58] J P Hayes ldquoIntroduction to stochastic computing and its challengesrdquo inProceedings of the 52nd Annual Design Automation Conference (DAC)2015 pp 591ndash593

Page 2: Introduction to Dynamic Stochastic Computing

a DSS the sequence length encoding each number is just1 bit so it can significantly improve the performance andenergy efficiency of a stochastic circuit In this article wepresent dynamic stochastic computing (DSC) using DSSrsquos anddiscuss the efficient implementation of iterative accumulation-based computations including filtering the Euler methodfor solving ordinary differential equations (ODEs) and thegradient descent (GD) algorithm in machine learning

II BACKGROUND

Originally proposed in the 1960s SC is intended to be alow-cost alternative to conventional computing In SC num-bers are encoded by random binary bit streams or stochasticsequences The probability of each bit being lsquo1rsquo is consideredto be the probability encoded by a stochastic sequence andused in different mapping schemes to encode numbers withindifferent ranges [1]

Assume that x is the number to be encoded and that p is theprobability of a stochastic sequence The simplest mapping isto let x = p ie the probability of the stochastic sequence isused to represent a real number within [01] This is referredto as the unipolar representation In order to expand the rangeto include negative values the bipolar representation takes alinear transformation of the unipolar representation by x =(pminus 05)times 2 so that the representation range is [minus1 1]

Recently a sign-magnitude representation was proposed toexpand the unipolar representation range by adding an extrasign bit to a stochastic sequence [19] The same range isobtained as the bipolar representation and the computed resultis more accurate However it is still considered a linearmapping since it is a modified unipolar representation A fewexamples of these representations are shown in Table I

TABLE I Examples of different SC representations

Sequence p Sign Value encoded

Unipolar 0110101110 06 ndash 06Bipolar 0100110100 04 ndash 2times 04minus 1 = minus02

Sign-mag 0110101110 06 1 (minus1)times 06 = minus06

A different mapping scheme may require different comput-ing elements For the rest of this article the unipolar and sign-magnitude representations are adopted for its simple circuitimplementation and high accuracy unless stated otherwise

A Stochastic sequence generation

An SNG consists of a random number generator (RNG)and a comparator as shown in Fig 3 An RNG is usuallyimplemented by a linear-feedback-shift register (LFSR) Amaximum-length n-bit LFSR traverses all the integer numberswithin [1 2nminus1] The numbers generated by an LFSR are con-sidered pseudorandom numbers because they are deterministicrather than truly random once the seed and the structure of theLFSR are determined However due to their statistical char-acteristics deliberately generated pseudorandom numbers canbe considered as uniformly distributed If the n-bit numberused to encode a fractional number within [0 1] p after the

normalization by 2n is larger than an n-bit pseudorandomnumber a lsquo1rsquo is generated otherwise lsquo0rsquo is the outputThen the probability of generating a lsquo1rsquo is approximately pThe resulting stochastic sequence encodes p in the unipolarrepresentation and x in the bipolar representation using theaforementioned linear mapping x = 2p minus 1 A sequencewith probability p is generated similarly to encode x in thisrepresentation A detailed description and the mathematicalmodel for an LFSR-based SNG can be found in [20]

Inputs Hidden layers

Outputs

(a)

hellip

x

w o=f(wx)

hellip

(b)

(c)

wl+1

δl+1o wl

hellip

δwl = f`wl+1δl+1

hellip

Random number generator (RNG)

p

Stochastic sequence with probability p

0101100hellip

Comparator

B

AAltB

N

N

(a)

CLK

Fig 3 An SNG

In [21] and [22] two types of low-discrepancy (LD) se-quences the Halton and Sobol sequences are introduced toSC They are originally proposed to accelerate and improve theaccuracy of Monte Carlo simulations by using a regular gener-ation of quasirandom numbers as shown in Fig 4 [23] Haltonsequences can be generated by a binary-coded prime number-based counter (eg (2120)3 is stored as (10)2(01)2(10)2(00)2in the counter) The order of the digits in the counter isthen reversed and the resulting number is converted to abinary number for generating a Halton sequence [21] Sobolsequences can be generated by the circuit shown in Fig 5As per the Sobol generation algorithm [24] the index ofthe direction vector is first obtained by the least significantzero (LSZ) detection and index generation component Thedirection vector array (DVA) is generated by using a primitivepolynomial At each clock cycle one direction vector (DV)Vk is fetched and XORed with the current Sobol numberRi to generate the next Sobol number Ri+1 In SC the useof the LD sequences improves the accuracy with a shorterlength However a fairly long sequence is still required for ahigh accuracy

B Stochastic circuits

Stochastic circuits are the core of an SC system Theoperation of a combinational circuit can be considered as

0 05 1(a)

0

02

04

06

08

1

0 05 1(b)

0

02

04

06

08

1

Fig 4 Examples of two-dimensional (a) Sobol sequencesand (b) LFSR-generated sequences The quasirandom numbersin (a) are more evenly distributed than the pseudorandomnumbers in (b)

Synapse

Dendrites

Axon

Soma(cell)

Membrane

VM (t+1) = VM (t) + Σεij - Σιij

if VM gt Vth reset and fire

else do not fire

εij Excitatory inputs of the neuron

ιij Inhibitory inputs of the neuron

VM Membrane potential

Vth Threshold voltage

+

-

Uij-1Ui-1jUi+1jUij+1

uij

Uij

+3

Usum

divide4

+

-

Uij-1

Ui-1j

Ui+1j

Uij+1 uij

Uij

+3

cN

CS

CE

CW

CN

3+

CS

CE

CW

CN

Counter (starting

from i=0)

Priority encoder

indexk

Direction vector array (DVA)

V3

V2

V1

V0

hellip

Ri

Vk

Ri+1D-flip flops

Least significant zero (LSZ) detection and index generation

Fig 5 A Sobol sequence generator [25] It can be used as theRNG in an SNG to improve the accuracy

a Monte Carlo simulation process [21] For example thefunction of a unipolar stochastic multiplier is equivalent toestimating the rectangular area determined by p1p2 where p1and p2 are the probabilities encoded by the input sequencesby randomly dropping points into a unit square The ANDgate is used to decide whether the randomly generated pointwith coordinates (r1 r2) is within the rectangle Only whenboth r1 lt p1 and r2 lt p2 is the random point located withinthe rectangle as shown in Fig 6 and the output of the ANDgate is lsquo1rsquo The counter is then used to count the total numberof points within the rectangle as an estimate of p1p2

MUX

01100110hellip

p1=05

10101010hellip

p2=05

00100010hellip

p=p1p2=025

01100101hellip

p1=05

11101110hellip

p2=075

sel

psel=05

00110110hellip

0

1 p=(1-psel)p1+pselp2=0625

01100111hellip (a)

(c)

10010000hellip

00101000hellip01000111hellip times2-1= -05

416

p1=

times2-1= -05 416

p2=

times2-1= 025 1016

p1p2=

(b) A bipolar stochastic multiplier

00101000hellip 11010111hellip

p1=025 p=1-p1=075

(b)

p1

p2

p=p1p2

1

1

p1

p2

(1 1)

AltB

AltB

r1

r2Counter

(b) A bipolar stochastic multiplier

Stochastic number generator (SNG)

array

Binary inputs(from memory

devices)

Stochastic sequences

Stochastic computing

circuits

Stochastic sequences

Probability estimators

(PEs)

Binary outputs

+

-

DSNGDigital signal

Bipolar stochastic

sequence encodinglsquo0rsquo

Integration results

Output DSS

MUX

+SNGarray

y

x

Stochastic sequences encoding coefficients

z0z1

zN

hellip

N copies of independent stochastic sequences that

each encodes x

SEL0

1hellip

N

hellip

A

B

+

-

+

-

A

B

AB

(a) (b)

B(A+B)

(a) (b)

hellip

k

+

-

p

p

(c)

^

Fig 6 (a) A unipolar stochastic multiplier and (b) its MonteCarlo model

Fig 7 shows several stochastic multipliers in differentrepresentations a bipolar multiplier is implemented by anXNOR gate and the sign-magnitude stochastic multiplier isimplemented by a unipolar multiplier and an XOR gate forcomputing the sign bit Generic methods have also beenproposed to synthesize combinational circuits for computingBernstein polynomials [2] or multi-linear functions [26]

Inputs Hidden layers

Outputs

(a)

hellip

x

w o=f(wx)

hellip

(b)

(c)

wl+1

δl+1o wl

hellip

δwl = f`wl+1δl+1

hellip

Random number generator (RNG)

p

Stochastic sequence with probability p

0101100hellip

Comparator

B

AAltB

N

N

(a)

CLK

01100110hellip

p1=05

10101010hellip

p2=05

00100010hellip

p=p1p2=025

(a) A stochastic multiplier for the unipolar representation

0100010010010000hellip

1000100010001000hellip0011001111100111hellip

times2-1= -05 4

16p1=

times2-1= -05 416

p2=

times2-1= 025 1016

p1p2=

(b) A stochastic multiplier for the bipolar representation

01100110hellip

|p1|=05

10101010hellip

|p2|=05

00100010hellip

|p|=|p1||p2|=025

(c) A stochastic multiplier for the sign-magnitude representation

|p1|=05

Sign(p1)=1(negative)

Sign(p2)=0(positve)

Sign(p)=1(negative)

Fig 7 Three stochastic multipliers for different representa-tions The input sequences should be uncorrelated for a higheraccuracy

There are primarily two types of sequential circuits theFSM-based and stochastic integrator-based circuits An FSM-based circuit can be modeled by a state transition diagram asshown in Fig 8 For an input bit of lsquo0rsquo the FSM moves tothe left of its current state otherwise it moves to the rightNote that when the FSM is at the leftmost (or rightmost)state it stays at the current state when receiving a lsquo0rsquo (or lsquo1rsquo)The FSM can be considered as a Markov chain if the inputsequence is not autocorrelated or every bit in the sequence isgenerated independently with the same probability to be lsquo1rsquo Asteady hyper-state can be reached after several runs where theprobability that the FSM staying at each state converges [1]By assigning a different output (lsquo0rsquo or lsquo1rsquo) to each stateexponential logarithmic and high-order polynomial functionscan be implemented by the FSM-based circuits A generalsynthesis method is discussed in [27]

S0 S1 Sk-1 Sk Sk+1 SN-2 SN-1helliphelliphelliphellip

X=0X=0X=0X=0X=0X=0

X=1 X=1 X=1 X=1 X=1 X=1 X=1

X=0

Fig 8 State transition diagram for an FSM-based stochasticcircuit

Stochastic integrators are sequential SC elements that accu-mulate the difference between two input sequences [1] Thestochastic integrator consists of an RNG an updown counterand a comparator as shown in Fig 9(a) Let the output ofthe comparator be Si (either lsquo0rsquo or lsquo1rsquo) at clock cycle i theinput bits be Ai and Bi and the n-bit integer stored in thecounter be Ci The RNG and the comparator work as an SNGand the probability of generating a lsquo1rsquo is ci = Ci2

n ie then-bit counter stores a number in a fixed-point manner and allthe bits belong to the fractional part The counter updates itsvalue Ci at each clock cycle by Ci+1 = Ci + Ai minus Bi orequivalently

ci+1 = ci + 12n(Ai minusBi) (1)

In k clock cycles the stochastic integrator implements

ck = c0 + 12nkminus1sumi=0

(Ai minusBi) (2)

where c0 is the initial (fractional) value stored in the counterA stochastic integrator works in a similar manner to a

spiking neuron as shown in Fig 9 The stochastic integratorldquofiresrdquo randomly depending on the value stored in the counterwhereas the spiking neuron fires when the accumulated mem-brane potential VM reaches a threshold Vth [5] The inputs tothe neuron can be excitatory (εij) or inhibitory (ιij) while theinput sequences A and B of the stochastic integrator increaseand decrease the value stored in the counter

The integrator has been used in a stochastic divider withthe output stochastic sequence as the feedback signal [1] asshown in Figs 10(a) and (b) The quotient is obtained whenthe value stored in the counter reaches an equilibrium stateie the updown counter has an equal probability to increase

Rate = average over pool of equivalent neurons(several neuron single runs)

j=1

2

3

hellip

A =1Δt M

n(t t+Δt)M

Δt

j=1

2

3

M

hellip

01010

11000

00010

01000

p =M

N

L

010100101010110100

A =Δt

n (t t+Δt)

p =L

N

Δt

L

N number of lsquo1rsquos in the sequence

n() number of spikes in the time window

2n-state Counter INC

DEC

RNG

gt S

A

B

comparator

C

ri

C i+1 = C i + Ai - Bi

if Ci gt ri S = 1

else S = 0

p encoded value

A encoded value

(a) A stochastic integrator

S+

-

CA

B

(b) A symbol

Synapse

Dendrites

Axon

Soma(cell)

Membrane

VM (t+1) = VM (t) + Σεij - Σιij

if VM gt Vth reset and fire

else do not fire

εij Excitatory inputs of the neuron

ιij Inhibitory inputs of the neuron

VM Membrane potential

Vth Threshold voltage

(c) An integrate-and-fire spiking neuron model

Fig 9 (a) A stochastic integrator (b) its symbol and (c) aspiking neuron [5]

or decrease However it may take a long time to converge tothe equilibrium state as discussed in [6] [25] A stochasticintegrator with a feedback loop has been used as an ADpativeDIgital Element (ADDIE) shown in Fig 10(c) The ADDIEcan function as a probability follower ie the probability ofthe output sequence tracks the change of the probability of theinput sequence [1] [28]

MUX

01100110hellip

p1=05

10101010hellip

p2=05

00100010hellip

p=p1p2=025

01100101hellip

p1=05

11101110hellip

p2=075

sel

psel=05

00110110hellip

0

1 p=(1-psel)p1+pselp2=0625

01100111hellip (a)

(c)

10010000hellip

00101000hellip01000111hellip times2-1= -05

416

p1=

times2-1= -05 416

p2=

times2-1= 025 1016

p1p2=

(b) A bipolar stochastic multiplier

00101000hellip 11010111hellip

p1=025 p=1-p1=075

(b)

p1

p2

p=p1p2

1

1

p1

p2

(1 1)

AltB

AltB

RN1

RN2

Counter

(b) A bipolar stochastic multiplier

Stochastic number generator (SNG)

array

Binary inputs(from memory

devices)

Stochastic sequences

Stochastic computing

circuits

Stochastic sequences

Probability estimators

(PEs)

Binary outputs

+

-

DSNGDigital signal

Bipolar stochastic

sequence encodinglsquo0rsquo

Integration results

Output DSS

MUX

+SNGarray

y

x

Stochastic sequences encoding coefficients

z0z1

zN

hellip

N copies of independent stochastic sequences that

each encodes x

SEL0

1hellip

N

hellip

A

B

+

-

+

-

A

B

AB

(a) (b)

B(A+B)

(a) (b)

hellip

k

+

-

p

p

(c)

^

Fig 10 Stochastic dividers compute (a) AB (b) B(A+B)(c) shows an ADDIE

C Probability estimatorsThe computed probability of the output sequence can be

found by using a counter It is estimated as the quotient

of the number of lsquo1rsquos divided by the sequence length TheADDIE in Fig 10(c) is an alternative component to estimatethe probability by using a stochastic integrator reaching theequilibrium state [1]

D Limitations

Although the SC circuits are much simpler than conven-tional arithmetic circuits a large sequence length is requiredto achieve a high accuracy This requirement leads to poorperformance and low energy efficiency Fig 11 shows the rootmean square error (RMSE) encoding a number in the unipolarrepresentation using different types of ldquorandomrdquo numbers Thepseudorandom numbers are generated by a 16-bit LFSR withrandomly generated seeds The RMSEs are obtained from1000 trials using different sequence lengths As shown inFig 11 to achieve an RMSE of 1times10minus2 a sequence length of212 bits is required for the LFSR generated sequences For thestochastic Sobol sequences the length is significantly reducedto 26 or 64 bits to achieve a similar accuracy although it isstill not as compact as an equivalent 5-bit fixed-point numberthat results in a similar RMSE Meanwhile low precision (egat an RMSE of 1times 10minus2 or larger) can be tolerated in manyapplications such as inference in a machine learning model[12]ndash[17] [29]ndash[33] digital signal processing [34]ndash[40] andimage processing [41]ndash[44] Hence progressive precision (PP)can be employed and the Sobol sequence length can be as lowas 8 bits in an application in a CNN inference engine [45]

0 2 4 6 8 10 12

Sequence length (2N)

10-4

10-3

10-2

10-1

100

RM

SE

Sobol sequencePseudorandom sequence

Fig 11 RMSE of the encoded numbers with different typesof stochastic sequences

Despite the fact that some of these implementations achievea lower energy consumption with PP compared with conven-tional circuits the application of conventional SC (CSC) isstill quite limited

III DYNAMIC STOCHASTIC COMPUTING

In a DSC system as shown in Fig 12 the stochasticsequences encode varying signals instead of static numbersConsider a discrete-time digital signal xi within [0 1] (i =0 1 2 ) A DSS encoding xi satisfies that E[Xi] = xiwhere Xi is the ith bit in the DSS Thus every bit in thesequence can have a different probability to be 1

Digital signals (from ADCs or

storage)

DSNGsSignal

reconstruction To be stored or converted back to

analog signals

Stochastic logicDynamic

stochastic sequences

Dynamic stochastic sequences

Fig 12 A DSC system

RNGDynamic stochastic sequence

encoding signal xi

111010001011hellipComparator

B

AAltB

xi

Fig 13 A DSNG xi is an input to the comparator at onesample per clock cycle

A DSS generation

A DSS can be generated by a dynamic SNG (DSNG) asshown in Fig 13 It is similar to a conventional SNG exceptthat the input is a varying signal instead of a static number TheRNG generates uniformly distributed random numbers within[0 1] Given a discrete signal xi as shown in Fig 13 itis compared with a random number one sample at a timeIf xi is larger than the random number a lsquo1rsquo is generatedotherwise the DSNG generates a lsquo0rsquo Generating every bit inthe DSS is a Bernoulli process (in the ideally random case)so the expectation getting a lsquo1rsquo is equal to the correspondingsample value from the discrete digital signal The discretesignal can either be sampled from an analog signal or readfrom a memory device

B Data representations in DSC

Similar to CSC all the aforementioned mapping schemescan be used to encode numbers within certain ranges in DSCwhich subsequently requires the use of different computationalelements Specifically a DSS can encode a signal within [0 1]in the unipolar representation or within [minus1 1] in the bipolaror sign-magnitude representation For the sign-magnitude rep-resentation two sequences are used to encode a signal Oneencodes the magnitude of the signal Since the signal can haveboth positive and negative values another sequence is used torepresent the signs of the numbers

C Stochastic circuits

The DSC circuits can be combinational or sequential1) Combinational circuits A combinational circuit using

DSSrsquos as its inputs implements the composite function of theoriginal stochastic function For example an AND gate or aunipolar stochastic multiplier computes zi = xiyi for eachpair of bits in the input DSSrsquos that independently encode xiand yi (i = 0 1 2 ) respectively [46] due to E[Zi] =E[XiYi] = E[Xi]E[Yi] = xiyi where Zi Xi and Yi are theith bits in the output and input DSSrsquos respectively Note thatfor a continuous signal oversampling is required to producethe discrete signals xi and yi for accurate multiplication [46]Fig 14(a) shows the DSC circuit for a signal multiplicationand (b) shows the results of multiplying two sinusoidal signals

DSNG Signal Xx(t)

DSNG

DSNG

Signal reconstruction

X

Y

x(t)

y(t)ziZ

(a)

0 02 04 06 08 1 12 14 160

05

1

x(t)

Original signalDSS

0 02 04 06 08 1 12 14 160

05

1

y(t)

0 05 1 15t (s)

0

05

1

z(t)

Double precisionDynamic stochastic

(b)

Fig 14 A frequency mixer by using a stochastic multiplierwith DSSrsquos as the inputs (a) circuit and (b) results

with frequencies of 15 Hz and 2 Hz both sampled at a rateof 210 Hz The sampled signals are converted to DSSrsquos by twoDSNGs with independent Sobol sequence generators [47] asthe RNGs The output DSS is reconstructed to a digital signalby a moving average [28] The DSS can also be reconstructedby an ADDIE [1]

2) Sequential circuits The stochastic integrator is an im-portant component used in DSC for iterative accumulation Itimplements integrations of the signal encoded by accumulatingthe stochastic bits in a DSS Fig 15 shows that a bipolarstochastic integrator can be used to perform numerical inte-gration In Fig 15(a) the input digital signal is cos(πt2)sampled at a sampling rate of 28 Hz and converted to a DSSby the DSNG The other input of the stochastic integrator isset to a bipolar stochastic sequence encoding 0 (a stochasticbit stream with half of the bits being lsquo1rsquo) By accumulatingthe input bits the stochastic integrator provides an unbiasedestimate to the Riemann sum [48] The integration is obtainedby recording the changing values of the counter Since theintegrated results are directly provided by the counter areconstruction unit is not required

As can be seen from Fig 15(b) the results produced bythe stochastic integrator is very close to the analytical resultsSince the output DSS is generated by comparing the changingsignal (as the integration results) with uniformly distributedrandom numbers the RNG and the comparator inside theintegrator works as a DSNG As a result the output sequenceencodes the discretized stochastic integration solution for theanalytical solution (2π) sin(πt2) The output DSS is alsoshown in Fig 15(b)

MUX

01100110hellip

p1=05

10101010hellip

p2=05

00100010hellip

p=p1p2=025

01100101hellip

p1=05

11101110hellip

p2=075

sel

psel=05

00110110hellip

0

1 p=(1-psel)p1+pselp2=0625

01100111hellip (a)

(c)

10010000hellip

00101000hellip01000111hellip times2-1= -05

416

p1=

times2-1= -05 416

p2=

times2-1= 025 1016

p1p2=

(b) A bipolar stochastic multiplier

00101000hellip 11010111hellip

p1=025 p=1-p1=075

(b)

p1

p2

p=p1p2

1

1

p1

p2

(1 1)

AltB

AltB

RN1

RN2

Counter

(b) A bipolar stochastic multiplier

Stochastic number generator (SNG)

array

hellip

Binary inputs(from memory

devices)

hellip

Stochastic sequences

Stochastic computing

circuits

Stochastic sequences

Probability estimators

(PEs)

hellip hellip

Binary outputs

+

-

DSNGDigital signal

Bipolar stochastic

sequence encodinglsquo0rsquo

Integration results

Output DSS

(a)

0 05 1 15 2 25 3 35 4t

-1

-05

0

05

1

Analytical resultsResults by DSCOriginal signalOutput DSS

(b)

Fig 15 A bipolar stochastic integrator for numerical integra-tion (a) circuit and (b) results The output DSS is decimatedfor a clear view

The stochastic integrator has been used in a stochastic low-density parity-check (LDPC) decoder to implement a trackingforecast memory in [49] It solves the latch-up problem in thedecoding process with a low hardware complexity

For the FSM-based sequential circuits however the DSScannot work well because the probability consistently changesand a steady hyper-state is hard to reach The DSS could alsoviolate the condition of the model being a Markov chain dueto the autocorrelation of the encoded signal

IV APPLICATION OF DSC

By using the stochastic integrators DSC can implement aseries of algorithms that involve iterative accumulations suchas an IIR filter the Euler method and a GD algorithm

A IIR filtering

In [50] it is shown that an ADDIE with different parameterscan be used as a low-pass IIR filter with different frequencyresponses as shown in Fig 16 Given the bit width of thestochastic integrator n the transfer function of such a filter inthe Z-domain is given by

H(Z) =1

2nZ + 1minus 2n (3)

for a stable first-order low-pass IIR filter [50] An oversampled∆ minus Σ modulated bit stream is then used as the input of thecircuit Since the generation of a ∆minusΣ modulated bit stream

2n-state Counter INC

DEC

RNG

gtSeqout

A

BSeqout

(a) (b)

comparator

A

B

+

-

Countout

Countout

Seqout

(b)

+

-

Countout

X

+

-

p

y(t)

Fig 16 ADDIE circuit produces an exponential function whenthe input is a sequence encoding a static number

0 01 02 03 04t (s)

(a) Original and filtered signals

-1

0

1

0 50 100 150 200 250f (Hz)

(b) Spectra of original and filtered signals

0

02

04Original signalFiltered signal

Fig 17 The low-pass filtering results in (a) the time domainand (b) the frequency domain

can also be considered as a Bernoulli process [51] the DSScan be used for IIR filtering as well

Fig 17 shows the filtering of mixed sinusoidal signals of 4Hz and 196 Hz The mixture is first sampled at a sampling rateof 656k Hz and the sampled digital signal is then convertedto a DSS The DSS is filtered by an 8-bit ADDIE As can beseen the high-frequency signals are almost filtered out whilethe low-frequency signals remain

B ODE solvers

The Euler method provides numerical solutions for ODEsby accumulating the derivative functions [52] For an ODEdydt = f(t y(t)) the estimated solution y(t) is given by

y(ti+1) = yi+1 = yn + hf(ti yi) (4)

with h as the step size and ti+1 = ti + h After k steps ofaccumulation

yk = y0 + h

kminus1sumi=0

f(ti yi) (5)

To use a stochastic integrator let the pair of input DSSrsquos(eg A and B in (2)) encode the discrete derivative functionf(ti yi) such that E[Ai minus Bi] = f(ti yi) By doing sothe stochastic integrator implements (2) and approximatelycomputes (5) with a step size h = 12n As a result the bitwidth of the counter does not only decide the precision of thecomputation but also the time resolution The integration in theEuler method is implemented by accumulating the stochasticbits in the DSSrsquos instead of real values

Using this method an exponential function can be generatedby a stochastic integrator The ADDIE circuit as shown inFig 16 solves

dy(t)

dt= pminus y(t) (6)

which is the differential equation for a leaky integratormodel [53] Thus the ADDIE can be used to implement

neuronal dynamics in a leaky integrate-and-fire model Theanalytical solution of (6) is given by

y(t) = pminus (pminus y0)eminust (7)

where y0 is provided in the initial condition ie y(0) = y0This model is commonly seen in natural processes such asneuronal dynamics [53] Fig 18 shows the results producedby the ADDIE when y0 = 1 and p = 0

0 2 4 6 8t

0

02

04

06

08

1

y(t)

DSCAnalytical

Fig 18 ADDIE used as an ODE solver The solution is anexponential function

For a set of ODEsdy1(t)

dt = y2(t)minus 05dy2(t)

dt = 05minus y1(t)(8)

with the initial values y1(0) = 0 and y2(0) = 05 a numericalsolution can be found by the circuit in Fig 19 The upperintegrator computes the numerical estimate of y1(t) by takingthe output DSS from the lower integrator and a conventionalstochastic sequence encoding 05 as inputs So the differenceof the two input sequences approximately encodes y2(t)minus05the derivative of y1(t) Simultaneously the output DSS of theupper stochastic integrator encodes the numerical solution ofy1(t) which is used to solve the second differential equationin (8) Thus the stochastic integrator can be viewed as anunbiased Euler solution estimator [54] The results producedby the dynamic stochastic circuit is shown in Fig 20 8-bitupdown counters are used in both stochastic integrators andthe RNG is implemented by a Sobol sequence generator [47]Since the results are provided by the counter in the stochasticintegrator an explicit reconstruction unit as shown in Fig 12is not required

A partial differential equation such as a Laplacersquos equationcan also be solved by an array of stochastic integrators witha finite-difference method

C Weight update for an adaptive filterDSC can be applied to update the weights in an adaptive

digital filter An adaptive filter system consists of a linear filterand a weight update component as shown in Fig 21 Theoutput of the linear filter at time i is given by

yi = F (wixi) = wixi =sumMminus1

j=0wj

iximinusM+j+1 (9)

+

-

y(t)

1

z(t)

times2

+

-

Y

+

-

+

-

y1(t)

y2(t)

p=05Y1

Y2

(a) (b)

p=05(times2)

bi

ai

+

-

y(t)

1

z(t)

times2

+

-

+

-

+

-

y1(t)

y2(t)

p=05

p=05(times2)

bi

ai

+

-

1

0+

-0+

-0

16

y(t)= t3

y(t)=t

y(t)= t212

Fig 19 A pair of stochastic integrators solving (8) [54]

0 2 4 6 8 10 120

05

1

y 1(t

)

DSCAnalytical solution

0 2 4 6 8 10 12t

0

05

1

y 2(t

)Fig 20 Simulation results of the ODE solver in Fig 19 Theyare compared with the analytical solutions

where M is the length of the filter or the number of weightswi is a vector of M weights wi = [w

(0)i w

(1)i w

(Mminus1)i ]

and xi is the input vector for the digital signal sampled atdifferent time points xi = [ximinusM+1 ximinusM+2 xi]

T

+

-

y(xi)

F(wi xi)

F(wi xi) wi

ti

SNGpart F(wi xi)

part wijSNG

SNG

ai

bi

EEEE

Linear filter with weights wi

LMS weight update unit

xi-M+1xi-M+2

xi

hellip

yi +- ti+

ei

xi

+ -

xi-M+j

wij

xixi-M+1

+ -

yi

wi1

ti

RNG

+ -

wiM

RNG

hellip hellip

hellip hellip

hellip hellip

Comparator

+ - + - + -

wijwi1 wiM

xi-M+j+1 xixi-M+1

yi

ti

RNG

RNGhellip hellip

hellip hellip

hellip hellip

Comparator

Linear filter with parameters wi

Δwi

LMS weight updating unit

hellip

yi +- ti+

ei

hellip

Δwi

(Ci)

wij[Pi] = [12nCi] =

Fig 21 An adaptive filter

In the least-mean-square (LMS) algorithm the cost functionis the mean squared error between the filter output and theguidance signal ti The weights are updated by

wi+1 = wi + ηxi(ti minus yi) (10)

where η is the step size The update is implemented by thecircuit in Fig 22 The two multipliers are used to computexiti and xiyi respectively The integrator is then used toaccumulate ηxi(ti minus yi) over i = 0 1

The dynamic stochastic circuit is used to perform systemidentification for a high pass 103-tap FIR filter 103 weightsare trained by using randomly generated samples with theLMS algorithm 14-bit stochastic integrators are used Afteraround 340k iterations of training the misalignment of the

Actual classification results

Trained parameters

-

+

1 6 8 0

Estimate classification results (distribution)

SNGarray

Images

10 classes

00000hellip

11010hellip

00010hellip

Trained parameters

-

+

1 6 8 0DSNGarray

10 classes

times

times

ti

yi

xi

Fig 22 A DSC circuit training a weight in an adaptive filterThe stochastic multipliers are denoted by rectangles with ldquotimesrdquoBipolar and sign-magnitude representations can be used for thestochastic multipliers and integrator

weights in the adaptive filter converges to about -45 dBcompared to the target system The frequency responses ofthe target and the trained filter are shown in Fig 23 As canbe seen the adaptive filter has a similar frequency responseto the target system especially in the passband indicating anaccurate identification result

0 05 1 15 2 25 3Normalized frequency

-80

-60

-40

-20

0

Mag

nitu

de in

dB

target systemadaptive filter

Fig 23 Frequency responses of the target and trained filterusing the sign-magnitude representation

D A gradient descent (GD) circuit

GD is a simple yet effective optimization algorithm It findsthe local minima by iteratively taking steps along the negativesof the gradients at current points One-step GD is given by

wi+1 = wi minus ηnablaF (wi) (11)

where F (wi) is the function to be minimized at the ith stepnablaF (wi) denotes the current gradient and η is the step sizeThe step size is an indicator of how fast the model learns Asmall step size leads to a slow convergence while a large onecan lead to divergence

Similar to the implementation of the Euler method let apair of differential DSSrsquos encode the gradient minusnablaF (wi)the stochastic integrator can be used to compute (11) with astep size η = 12n by using these DSSrsquos as inputs DSC canthen be used to perform the optimization of the weights in aneural network (NN) by minimizing the cost function

The training of an NN usually involves two phases Take afully connected NN as an example shown in Fig 24 In theforward propagation the sum of product of the input vector xand the weight vector w is obtained as v = wx The output

Inputs Hidden layers

Outputs

(a)

hellip

x

w o=f(wx)

hellip

(b)

(c)

wl+1

δl+1o wl

hellip

δwl = f`wl+1δl+1

hellip

Random number generator (RNG)

p

Stochastic sequence with probability p

0101100hellip

Comparator

B

AAltB

N

N

(a)

CLK

01100110hellip

p1=05

10101010hellip

p2=05

00100010hellip

p=p1p2=025

(a) A stochastic multiplier for the unipolar representation

0100010010010000hellip

1000100010001000hellip0011001111100111hellip

times2-1= -05 4

16p1=

times2-1= -05 416

p2=

times2-1= 025 1016

p1p2=

(b) A stochastic multiplier for the bipolar representation

01100110hellip

|p1|=05

10101010hellip

|p2|=05

00100010hellip

|p|=|p1||p2|=025

(c) A stochastic multiplier for the sign-magnitude representation

|p1|=05

Sign(p1)=1(negative)

Sign(p2)=0(positve)

Sign(p)=1(negative)

Fig 24 (a) A fully connected NN in which the neuronsare organized layer by layer (b) The computation duringthe forward propagation (c) The local gradient for wl iscomputed by using the backward propagation The output ofrom previous neuron is required to compute the gradient

of the neuron o is then computed by o = f(v) where f isthe activation function During this phase the weights do notchange In the backward propagation the difference betweenthe actual and the desired final outputs ie the classificationresults is calculated and the local gradients are computed foreach neuron based on the difference [55] Take the neuronin Fig 24(c) as an example the local gradient is obtainedby δl = f prime(v)wl+1δl+1 where f prime(middot) is the derivative of theactivation function wl+1 is the weight vector connecting layerl and l+1 and δl+1 is the local gradient vector of the neuronsin layer l + 1 Then the gradient of a weight at layer l iscomputed by multiplying the local gradient with the recordedoutput of the designated neuron ie nablaF (wl) = minusoδl

In [56] it is found that the local gradient can be decomposedinto two parts δ = δ+ minus δminus where δ+ is contributed bythe desired output of the NN and δminus by the actual outputof the NN So the gradient for weight w can be rewritten asnablaF (w) = minuso(δ+minusδminus) Similar to the implementation of (10)the DSC circuit in Fig 22 can be used to implement the GDalgorithm to perform the online training (one sample at a time)of the fully connected NN with the input signals δ+ o and δminusHowever a backpropagation ie δ = f prime(v)wl+1δl+1 is stillcomputed by using a conventional method (eg a fixed-pointarithmetic circuit) to obtain the local gradients Otherwisethe accuracy loss would be too much for the algorithm toconverge

Fig 25 shows the convergence of cross entropy (as a costfunction) and testing accuracy for the training of a 784-128-128-10 NN with the MNIST handwritten digit dataset As canbe seen the testing accuracy of the NN trained by the DSCcircuit using the sign-magnitude representation is similar to theone produced by the fixed-point implementation (around 97)However the accuracy of the DSC implementation using thebipolar representation is relatively low

The proposed circuitry can be useful for online learningwhere real-time interaction with the environment is requiredwith an energy constraint [57] It can also be used to train amachine learning model using private or security-critical dataon mobile devices if data are sensitive and cannot be uploaded

to a cloud computer

V ERROR REDUCTION AND ASSESSMENT

One major issue in SC is the loss of accuracy [58] Indepen-dent RNGs have been used to avoid correlation and improvethe accuracy of components such as stochastic multipliersThis method inevitably increases the hardware overheadHowever in a stochastic integrator the sharing of RNGs forgenerating the two input sequences reduces the variance thusreducing the error This is shown as follows

The error introduced by DSC at each step of computation isrelated to the variance of the difference of the two accumulatedstochastic bits AiminusBi When independent RNGs are used togenerate Ai and Bi the variance at a single step is given by

Var[Ai minusBi] = E[(Ai minusBi minus E(Ai minusBi))2]

= E[(Ai minusBi)2]minus 2E[Ai minusBi]

(PAi minus PBi) + (PAi minus PBi)2

(12)

where PAi and PBi are the probabilities of Ai and Bi beinglsquo1rsquo respectively ie E[Ai] = PAi and E[Bi] = PBi Since(AiminusBi)

2 is 0 when Ai and Bi are equal or 1 when they aredifferent E[(AiminusBi)

2] equals PAi(1minusPBi)+PBi(1minusPAi)which is the probability that Ai and Bi are different when Ai

and Bi are independently generated1 (12) is then rewritten as

Varind[Ai minusBi] = PAi(1minus PBi) + PBi(1minus PAi)

minus (PAi minus PBi)2

= PAi(1minus PAi) + PBi(1minus PBi)

(13)

On the other hand if the same RNG is used to generate Ai

and Bi the variance of Ai minusBi can be derived from (12) aswell However since only a shared random number is used togenerate Ai and Bi E[(AiminusBi)

2] equals |PAiminusPBi| which

1E[(AiminusBi)2] = Prob[(AiminusBi)

2 = 1]times1+Prob[(AiminusBi)2 = 0]times0

0 5 10 15 20Epoch

0

01

02

03

04

05

Cro

ss e

ntro

py

Signed DSCBipolar DSCFixed-point

(a)

0 5 10 15 20Epoch

88

90

92

94

96

98

Test

ing

accu

racy

(in

perc

enta

ge)

Signed DSCBipolar DSCFixed-point

(b)

Fig 25 (a) Convergence of cross entropy and (b) testingaccuracy of MNIST hand-written digit recognition datasetCross entropy is the cost function to be minimized during thetraining of an NN The weights of the NN are updated by theDSC circuits The DSC circuits using the sign-magnitude andbipolar representations and the fixed-point implementation areconsidered for comparison

gives the probability that Ai and Bi are different Thus (12)can be rewritten as

Varshare[Ai minusBi] = |PAi minus PBi| minus (PAi minus PBi)2

= |PAi minus PBi|(1minus |PAi minus PBi|)(14)

Since Varshare[Ai minus Bi] minus Varind[Ai minus Bi] = 2PBiPAi minus2min(PAi PBi) le 0 we obtain Varshare[Ai minus Bi] leVarind[AiminusBi] for any i = 0 1 2 [54] Therefore sharingthe use of RNGs to generate input stochastic bit streamsimproves the accuracy

For the circuit in Fig 22 though the inputs of the stochasticintegrator are the outputs of the stochastic multipliers insteadof being directly generated by a DSNG the RNG sharingscheme still works in the generated DSSrsquos encoding δ+ andδminus ie it produces a smaller variance in the computed result[56] However an uncorrelated RNG must be used to generatethe DSS encoding the recorded output of the neuron obecause independency is still required in general for accuratestochastic multiplication

The accuracy of the accumulated result is also related to thebit-width of the counter in the stochastic integrator The bitwidth does not only determine the resolution of the result butalso the step size especially in the ODE solver adaptive filterand the gradient descent circuit A larger bit width indicatesa smaller resolution and a smaller step size for fine tuning atthe cost of extra computation time It is shown in [56] that themulti-step variance bound exponentially decreases with the bitwidth of the counter However for the IIR filter the changeof bit-width affects the characteristic of the filter

For the ODE solver circuit it is also found that the use ofquasirandom numbers improves the accuracy compared to theuse of pseudorandom numbers [54] However when the DSS isused to encode signals from the input or images such as in theadaptive filter and the gradient descent algorithm quasirandomand pseudorandom numbers produce similar accuracy Thereason may lie in the fact that the values of the encodedsignals in these two applications are independently selectedfrom the training dataset However for the ODE solver thesignals encoded are samples from continuous functions andadjacent samples have similar values As a result withina short segment of the DSS in this case it approximatelyencodes the same value It then works similarly as CSCfor which the Sobol sequence helps improving the accuracyFig 26 compares the convergence of misalignment duringthe training of the adaptive filter using pseudorandom andquasirandom numbers 16-bit stochastic integrators are usedin this experiment The misalignment is given by

Misalignment =E[(w minus w)2]

E[w2] (15)

where w denotes the target value and w is the trained resultIt can be seen that the convergence speed of using these twotypes of random numbers is similar and their misalignmentsboth converge to around -52 dB

TABLE II Hardware evaluation of the gradient descent circuitarray training a 784-128-128-10 neural network

Metrics DSC Fixed-point Ratio

Step size 2minus10 2minus10 -Epochs 20 20 -Min time (ns) 16times 106 47times 106 13EPO (fJ) 12times 107 11times 108 192TPA (imagesmicrom2) 57times 101 15 381Aver test Accu 9704 9749 -

TABLE III Hardware performance comparison of the ODEsolvers for (8)

Metric DSC Fixed-point Ratio

Step size 2minus8 2minus8 -Iterations 3217 3217 -Min time (ns) 257359 855720 133EPO (fJ) 20121 46600 123TPA (wordmicrosmicrom2) 475 058 821RMSE 47times 10minus3 36times 10minus3 -

VI HARDWARE EFFICIENCY EVALUATION

Due to the extremely low sequence length used for encodinga value in DSC the power consumption of DSC circuits ismuch lower than conventional arithmetic circuits Moreoversince each bit in a DSS is used to encode a number DSCis more energy-efficient than CSC The gradient descent andODE solver circuits are considered as examples to show thehardware efficiency Implemented in VHDL the circuits aresynthesized by Synopsys Design Compiler with a 28-nm tech-nology to obtain the power consumption maximum frequencyand hardware measurements The minimum runtime energyper operation (EPO) throughput per area (TPA) and accuracyare reported in Table II for the gradient descent circuits andin Table III for ODE solvers

As can be seen with a slightly degraded accuracy theDSC circuits outperform conventional circuits using a fixed-point representation in speed energy efficiency and hardwareefficiency For the gradient descent circuit since the two RNGscan be shared among the 118282 stochastic integrators (for118282 weights) the improvement in hardware efficiency iseven more significant than the ODE solver for which one RNG

0 05 1 15 2 25 3Number of steps ( 64 input samples) 104

-60

-50

-40

-30

-20

-10

0

Mis

alig

nmen

t (dB

)

PseudorandomQuasirandom

Fig 26 Comparison of the convergence of misalignment usingpseudorandom and quasirandom numbers

is shared between two stochastic integrators However sincethe sharing of RNGs does not significantly affect the criticalpath the performance improvements are similar for these twocircuits

VII CONCLUSION

In this article a new type of stochastic computing dynamicstochastic computing (DSC) is introduced to encode a varyingsignal by using a dynamically variable binary bit streamreferred to as a dynamic stochastic sequence (DSS) A DSCcircuit can efficiently implement an iterative accumulation-based algorithm It is useful in implementing the Euler methodtraining a neural network and an adaptive filter and IIRfiltering In those applications integrations are implementedby stochastic integrators that accumulate the stochastic bits in aDSS Since the computation is performed on single stochasticbits instead of conventional binary values or long stochasticsequences a significant improvement in hardware efficiencyis achieved with a high accuracy

ACKNOWLEDGMENT

This work was supported by the Natural Sciences andEngineering Research Council of Canada (NSERC) underProjects RES0025211 and RES0048688

REFERENCES

[1] B R Gaines Stochastic Computing Systems Boston MA SpringerUS 1969 pp 37ndash172

[2] W Qian X Li M D Riedel K Bazargan and D J Lilja ldquoAnarchitecture for fault-tolerant computation with stochastic logicrdquo IEEETransactions on Computers vol 60 no 1 pp 93ndash105 2011

[3] P Li and D J Lilja ldquoUsing stochastic computing to implementdigital image processing algorithmsrdquo in 2011 IEEE 29th InternationalConference on Computer Design (ICCD) 2011 pp 154ndash161

[4] S C Smithson N Onizawa B H Meyer W J Gross and T HanyuldquoEfficient CMOS invertible logic using stochastic computingrdquo IEEETransactions on Circuits and Systems I Regular Papers vol 66 no 6pp 2263ndash2274 June 2019

[5] W Maass and C M Bishop Pulsed neural networks MIT press 2001[6] Y Liu S Liu Y Wang F Lombardi and J Han ldquoA stochastic

computational multi-layer perceptron with backward propagationrdquo IEEETransactions on Computers vol 67 no 9 pp 1273ndash1286 2018

[7] J Zhao J Shawe-Taylor and M van Daalen ldquoLearning in stochastic bitstream neural networksrdquo Neural Networks vol 9 no 6 pp 991ndash9981996

[8] B D Brown and H C Card ldquoStochastic neural computation Icomputational elementsrdquo IEEE Transactions on Computers vol 50no 9 pp 891ndash905 2001

[9] D Zhang and H Li ldquoA stochastic-based FPGA controller for aninduction motor drive with integrated neural network algorithmsrdquo IEEETransactions on Industrial Electronics vol 55 no 2 pp 551ndash561 2008

[10] Y Ji F Ran C Ma and D J Lilja ldquoA hardware implementationof a radial basis function neural network using stochastic logicrdquo inProceedings of the Design Automation amp Test in Europe Conference(DATE) 2015 pp 880ndash883

[11] V Canals A Morro A Oliver M L Alomar and J L RosselloldquoA new stochastic computing methodology for efficient neural networkimplementationrdquo IEEE Transactions on Neural Networks and LearningSystems vol 27 no 3 pp 551ndash564 2016

[12] A Ardakani F Leduc-Primeau N Onizawa T Hanyu and W J GrossldquoVLSI implementation of deep neural network using integral stochasticcomputingrdquo IEEE Transactions on Very Large Scale Integration (VLSI)Systems vol 25 no 10 pp 2688ndash2699 2017

[13] A Ren Z Li C Ding Q Qiu Y Wang J Li X Qian and B YuanldquoSC-DCNN Highly-scalable deep convolutional neural network us-ing stochastic computingrdquo in Proceedings of the 22nd InternationalConference on Architectural Support for Programming Languages andOperating Systems (ASPLOS) 2017 pp 405ndash418

[14] V T Lee A Alaghi J P Hayes V Sathe and L Ceze ldquoEnergy-efficienthybrid stochastic-binary neural networks for near-sensor computingrdquo inDesign Automation amp Test in Europe Conference amp Exhibition (DATE)2017 pp 13ndash18

[15] R Hojabr K Givaki S Tayaranian P Esfahanian A Khonsari D Rah-mati and M H Najafi ldquoSkippyNN An embedded stochastic-computingaccelerator for convolutional neural networksrdquo in Proceedings of the56th Annual Design Automation Conference (DAC) 2019 pp 1ndash6

[16] Y Liu L Liu F Lombardi and J Han ldquoAn energy-efficient and noise-tolerant recurrent neural network using stochastic computingrdquo IEEETransactions on Very Large Scale Integration (VLSI) Systems vol 27no 9 pp 2213ndash2221 Sep 2019

[17] R Cai A Ren O Chen N Liu C Ding X Qian J Han W LuoN Yoshikawa and Y Wang ldquoA stochastic-computing based deeplearning framework using adiabatic quantum-flux-parametron supercon-ducting technologyrdquo in Proceedings of the 46th International Symposiumon Computer Architecture ser ISCA rsquo19 New York NY USA ACM2019 pp 567ndash578

[18] A Alaghi and J P Hayes ldquoSurvey of stochastic computingrdquo ACMTrans Embed Comput Syst vol 12 no 2s pp 921ndash9219 May 2013

[19] A Zhakatayev S Lee H Sim and J Lee ldquoSign-magnitude SC getting10x accuracy for free in stochastic computing for deep neural networksrdquoin Proceedings of the 55th Annual Design Automation Conference(DAC) 2018 pp 1ndash6

[20] P K Gupta and R Kumaresan ldquoBinary multiplication with PN se-quencesrdquo IEEE Transactions on Acoustics Speech and Signal Process-ing vol 36 no 4 pp 603ndash606 1988

[21] A Alaghi and J P Hayes ldquoFast and accurate computation using stochas-tic circuitsrdquo in Proceedings of the conference on Design Automation ampTest in Europe (DATE) 2014 pp 1ndash4

[22] S Liu and J Han ldquoEnergy efficient stochastic computing with Sobolsequencesrdquo in Design Automation amp Test in Europe Conference ampExhibition (DATE) 2017 pp 650ndash653

[23] H Niederreiter Random Number Generation and quasi-Monte CarloMethods Philadelphia PA USA Society for Industrial and AppliedMathematics 1992

[24] P Bratley and B L Fox ldquoAlgorithm 659 Implementing Sobolrsquosquasirandom sequence generatorrdquo ACM Transactions on MathematicalSoftware (TOMS) vol 14 no 1 pp 88ndash100 1988

[25] S Liu and J Han ldquoToward energy-efficient stochastic circuits usingparallel sobol sequencesrdquo IEEE Transactions on Very Large ScaleIntegration (VLSI) Systems vol 26 no 7 pp 1326ndash1339 July 2018

[26] A Alaghi and J P Hayes ldquoSTRAUSS Spectral transform use instochastic circuit synthesisrdquo IEEE Transactions on Computer-AidedDesign of Integrated Circuits and Systems vol 34 no 11 pp 1770ndash1783 2015

[27] P Li D J Lilja W Qian K Bazargan and M Riedel ldquoThe synthesisof complex arithmetic computation on stochastic bit streams usingsequential logicrdquo in Proceedings of the International Conference onComputer-Aided Design (ICCAD) 2012 pp 480ndash487

[28] P Mars and W J Poppelbaum Stochastic and deterministic averagingprocessors Peter Peregrinus Press 1981 no 1

[29] Z Li A Ren J Li Q Qiu Y Wang and B Yuan ldquoDSCNN Hardware-oriented optimization for stochastic computing based deep convolutionalneural networksrdquo in IEEE 34th International Conference on ComputerDesign (ICCD) 2016 pp 678ndash681

[30] A Ren Z Li Y Wang Q Qiu and B Yuan ldquoDesigning reconfigurablelarge-scale deep learning systems using stochastic computingrdquo in IEEEInternational Conference on Rebooting Computing (ICRC) 2016 pp1ndash7

[31] K Kim J Kim J Yu J Seo J Lee and K Choi ldquoDynamic energy-accuracy trade-off using stochastic computing in deep neural networksrdquoin Proceedings of the 53rd Annual Design Automation Conference(DAC) 2016 pp 1241ndash1246

[32] H Sim and J Lee ldquoA new stochastic computing multiplier withapplication to deep convolutional neural networksrdquo in Proceedings ofthe 54th Annual Design Automation Conference (DAC) 2017 pp 1ndash6

[33] Y Liu Y Wang F Lombardi and J Han ldquoAn energy-efficient online-learning stochastic computational deep belief networkrdquo IEEE Journal

on Emerging and Selected Topics in Circuits and Systems vol 8 no 3pp 454ndash465 Sep 2018

[34] J F Keane and L E Atlas ldquoImpulses and stochastic arithmetic forsignal processingrdquo in IEEE International Conference on AcousticsSpeech and Signal Processing (ICASSP) vol 2 2001 pp 1257ndash1260

[35] R Kuehnel ldquoBinomial logic extending stochastic computing to high-bandwidth signalsrdquo in Conference Record of the Thirty-Sixth AsilomarConference on Signals Systems and Computers vol 2 2002 pp 1089ndash1093

[36] N Yamamoto H Fujisaka K Haeiwa and T Kamio ldquoNanoelectroniccircuits for stochastic computingrdquo in 6th IEEE Conference on Nanotech-nology (IEEE-NANO) vol 1 2006 pp 306ndash309

[37] K K Parhi and Y Liu ldquoArchitectures for IIR digital filters usingstochastic computingrdquo in IEEE International Symposium on Circuitsand Systems (ISCAS) 2014 pp 373ndash376

[38] B Yuan Y Wang and Z Wang ldquoArea-efficient scaling-free DFTFFTdesign using stochastic computingrdquo IEEE Transactions on Circuits andSystems II Express Briefs vol 63 no 12 pp 1131ndash1135 2016

[39] R Wang J Han B F Cockburn and D G Elliott ldquoStochastic circuitdesign and performance evaluation of vector quantization for differenterror measuresrdquo IEEE Transactions on Very Large Scale Integration(VLSI) Systems vol 24 no 10 pp 3169ndash3183 2016

[40] H Ichihara T Sugino S Ishii T Iwagaki and T Inoue ldquoCompactand accurate digital filters based on stochastic computingrdquo IEEE Trans-actions on Emerging Topics in Computing vol 7 no 1 pp 31ndash432016

[41] T Hammadou M Nilson A Bermak and P Ogunbona ldquoA 96spltimes64 intelligent digital pixel array with extended binary stochasticarithmeticrdquo in Proceedings of the International Symposium on Circuitsand Systems (ISCAS) 2003

[42] A Alaghi C Li and J P Hayes ldquoStochastic circuits for real-timeimage-processing applicationsrdquo in Proceedings of the 50th AnnualDesign Automation Conference (DAC) 2013 pp 1361ndash1366

[43] P Li D J Lilja W Qian K Bazargan and M D Riedel ldquoComputationon stochastic bit streams digital image processing case studiesrdquo IEEETransactions on Very Large Scale Integration (VLSI) Systems vol 22no 3 pp 449ndash462 2014

[44] M H Najafi S Jamali-Zavareh D J Lilja M D Riedel K Bazarganand R Harjani ldquoTime-encoded values for highly efficient stochasticcircuitsrdquo IEEE Transactions on Very Large Scale Integration (VLSI)Systems vol 25 no 5 pp 1644ndash1657 2017

[45] S R Faraji M Hassan Najafi B Li D J Lilja and K BazarganldquoEnergy-efficient convolutional neural networks with deterministic bit-stream processingrdquo in 2019 Design Automation Test in Europe Confer-ence Exhibition (DATE) 2019 pp 1757ndash1762

[46] S Liu and J Han ldquoDynamic stochastic computing for digital signalprocessing applicationsrdquo in Design Automation amp Test in EuropeConference amp Exhibition (DATE) 2020

[47] I L Dalal D Stefan and J Harwayne-Gidansky ldquoLow discrepancysequences for Monte Carlo simulations on reconfigurable platformsrdquo inInternational Conference on Application-Specific Systems Architecturesand Processors (ASAP) 2008 pp 108ndash113

[48] W G McCallum D Hughes-Hallett A M Gleason D O LomenD Lovelock J Tecosky-Feldman T W Tucker D Flath T ThrashK R Rhea et al Multivariable calculus Wiley 1997 vol 200 no 1

[49] S S Tehrani A Naderi G-A Kamendje S Hemati S Mannor andW J Gross ldquoMajority-based tracking forecast memories for stochasticLDPC decodingrdquo IEEE Transactions on Signal Processing vol 58no 9 pp 4883ndash4896 2010

[50] N Saraf K Bazargan D J Lilja and M D Riedel ldquoIIR filtersusing stochastic arithmeticrdquo in Design Automation amp Test in EuropeConference amp Exhibition (DATE) 2014 pp 1ndash6

[51] F Maloberti ldquoNon conventional signal processing by the use of sigmadelta technique a tutorial introductionrdquo in Proceedings of IEEE Inter-national Symposium on Circuits and Systems (ISCAS) vol 6 1992 pp2645ndash2648

[52] J C Butcher The Numerical Analysis of Ordinary Differential Equa-tions Runge-Kutta and General Linear Methods USA Wiley-Interscience 1987

[53] W Gerstner W M Kistler R Naud and L Paninski Neuronaldynamics From single neurons to networks and models of cognitionCambridge University Press 2014

[54] S Liu and J Han ldquoHardware ODE solvers using stochastic circuitsrdquo inProceedings of the 54th Annual Design Automation Conference (DAC)2017 pp 1ndash6

[55] S S Haykin Neural networks and learning machines Pearson UpperSaddle River NJ USA 2009 vol 3

[56] S Liu H Jiang L Liu and J Han ldquoGradient descent using stochasticcircuits for efficient training of learning machinesrdquo IEEE Transactionson Computer-Aided Design of Integrated Circuits and Systems vol 37no 11 pp 2530ndash2541 Nov 2018

[57] R Nishihara P Moritz S Wang A Tumanov W Paul J Schleier-Smith R Liaw M Niknami M I Jordan and I StoicaldquoReal-time machine learning The missing piecesrdquo in Proceedingsof the 16th Workshop on Hot Topics in Operating Systemsser HotOS rsquo17 ACM 2017 pp 106ndash110 [Online] Availablehttpdoiacmorg10114531029803102998

[58] J P Hayes ldquoIntroduction to stochastic computing and its challengesrdquo inProceedings of the 52nd Annual Design Automation Conference (DAC)2015 pp 591ndash593

Page 3: Introduction to Dynamic Stochastic Computing

Synapse

Dendrites

Axon

Soma(cell)

Membrane

VM (t+1) = VM (t) + Σεij - Σιij

if VM gt Vth reset and fire

else do not fire

εij Excitatory inputs of the neuron

ιij Inhibitory inputs of the neuron

VM Membrane potential

Vth Threshold voltage

+

-

Uij-1Ui-1jUi+1jUij+1

uij

Uij

+3

Usum

divide4

+

-

Uij-1

Ui-1j

Ui+1j

Uij+1 uij

Uij

+3

cN

CS

CE

CW

CN

3+

CS

CE

CW

CN

Counter (starting

from i=0)

Priority encoder

indexk

Direction vector array (DVA)

V3

V2

V1

V0

hellip

Ri

Vk

Ri+1D-flip flops

Least significant zero (LSZ) detection and index generation

Fig 5 A Sobol sequence generator [25] It can be used as theRNG in an SNG to improve the accuracy

a Monte Carlo simulation process [21] For example thefunction of a unipolar stochastic multiplier is equivalent toestimating the rectangular area determined by p1p2 where p1and p2 are the probabilities encoded by the input sequencesby randomly dropping points into a unit square The ANDgate is used to decide whether the randomly generated pointwith coordinates (r1 r2) is within the rectangle Only whenboth r1 lt p1 and r2 lt p2 is the random point located withinthe rectangle as shown in Fig 6 and the output of the ANDgate is lsquo1rsquo The counter is then used to count the total numberof points within the rectangle as an estimate of p1p2

MUX

01100110hellip

p1=05

10101010hellip

p2=05

00100010hellip

p=p1p2=025

01100101hellip

p1=05

11101110hellip

p2=075

sel

psel=05

00110110hellip

0

1 p=(1-psel)p1+pselp2=0625

01100111hellip (a)

(c)

10010000hellip

00101000hellip01000111hellip times2-1= -05

416

p1=

times2-1= -05 416

p2=

times2-1= 025 1016

p1p2=

(b) A bipolar stochastic multiplier

00101000hellip 11010111hellip

p1=025 p=1-p1=075

(b)

p1

p2

p=p1p2

1

1

p1

p2

(1 1)

AltB

AltB

r1

r2Counter

(b) A bipolar stochastic multiplier

Stochastic number generator (SNG)

array

Binary inputs(from memory

devices)

Stochastic sequences

Stochastic computing

circuits

Stochastic sequences

Probability estimators

(PEs)

Binary outputs

+

-

DSNGDigital signal

Bipolar stochastic

sequence encodinglsquo0rsquo

Integration results

Output DSS

MUX

+SNGarray

y

x

Stochastic sequences encoding coefficients

z0z1

zN

hellip

N copies of independent stochastic sequences that

each encodes x

SEL0

1hellip

N

hellip

A

B

+

-

+

-

A

B

AB

(a) (b)

B(A+B)

(a) (b)

hellip

k

+

-

p

p

(c)

^

Fig 6 (a) A unipolar stochastic multiplier and (b) its MonteCarlo model

Fig 7 shows several stochastic multipliers in differentrepresentations a bipolar multiplier is implemented by anXNOR gate and the sign-magnitude stochastic multiplier isimplemented by a unipolar multiplier and an XOR gate forcomputing the sign bit Generic methods have also beenproposed to synthesize combinational circuits for computingBernstein polynomials [2] or multi-linear functions [26]

Inputs Hidden layers

Outputs

(a)

hellip

x

w o=f(wx)

hellip

(b)

(c)

wl+1

δl+1o wl

hellip

δwl = f`wl+1δl+1

hellip

Random number generator (RNG)

p

Stochastic sequence with probability p

0101100hellip

Comparator

B

AAltB

N

N

(a)

CLK

01100110hellip

p1=05

10101010hellip

p2=05

00100010hellip

p=p1p2=025

(a) A stochastic multiplier for the unipolar representation

0100010010010000hellip

1000100010001000hellip0011001111100111hellip

times2-1= -05 4

16p1=

times2-1= -05 416

p2=

times2-1= 025 1016

p1p2=

(b) A stochastic multiplier for the bipolar representation

01100110hellip

|p1|=05

10101010hellip

|p2|=05

00100010hellip

|p|=|p1||p2|=025

(c) A stochastic multiplier for the sign-magnitude representation

|p1|=05

Sign(p1)=1(negative)

Sign(p2)=0(positve)

Sign(p)=1(negative)

Fig 7 Three stochastic multipliers for different representa-tions The input sequences should be uncorrelated for a higheraccuracy

There are primarily two types of sequential circuits theFSM-based and stochastic integrator-based circuits An FSM-based circuit can be modeled by a state transition diagram asshown in Fig 8 For an input bit of lsquo0rsquo the FSM moves tothe left of its current state otherwise it moves to the rightNote that when the FSM is at the leftmost (or rightmost)state it stays at the current state when receiving a lsquo0rsquo (or lsquo1rsquo)The FSM can be considered as a Markov chain if the inputsequence is not autocorrelated or every bit in the sequence isgenerated independently with the same probability to be lsquo1rsquo Asteady hyper-state can be reached after several runs where theprobability that the FSM staying at each state converges [1]By assigning a different output (lsquo0rsquo or lsquo1rsquo) to each stateexponential logarithmic and high-order polynomial functionscan be implemented by the FSM-based circuits A generalsynthesis method is discussed in [27]

S0 S1 Sk-1 Sk Sk+1 SN-2 SN-1helliphelliphelliphellip

X=0X=0X=0X=0X=0X=0

X=1 X=1 X=1 X=1 X=1 X=1 X=1

X=0

Fig 8 State transition diagram for an FSM-based stochasticcircuit

Stochastic integrators are sequential SC elements that accu-mulate the difference between two input sequences [1] Thestochastic integrator consists of an RNG an updown counterand a comparator as shown in Fig 9(a) Let the output ofthe comparator be Si (either lsquo0rsquo or lsquo1rsquo) at clock cycle i theinput bits be Ai and Bi and the n-bit integer stored in thecounter be Ci The RNG and the comparator work as an SNGand the probability of generating a lsquo1rsquo is ci = Ci2

n ie then-bit counter stores a number in a fixed-point manner and allthe bits belong to the fractional part The counter updates itsvalue Ci at each clock cycle by Ci+1 = Ci + Ai minus Bi orequivalently

ci+1 = ci + 12n(Ai minusBi) (1)

In k clock cycles the stochastic integrator implements

ck = c0 + 12nkminus1sumi=0

(Ai minusBi) (2)

where c0 is the initial (fractional) value stored in the counterA stochastic integrator works in a similar manner to a

spiking neuron as shown in Fig 9 The stochastic integratorldquofiresrdquo randomly depending on the value stored in the counterwhereas the spiking neuron fires when the accumulated mem-brane potential VM reaches a threshold Vth [5] The inputs tothe neuron can be excitatory (εij) or inhibitory (ιij) while theinput sequences A and B of the stochastic integrator increaseand decrease the value stored in the counter

The integrator has been used in a stochastic divider withthe output stochastic sequence as the feedback signal [1] asshown in Figs 10(a) and (b) The quotient is obtained whenthe value stored in the counter reaches an equilibrium stateie the updown counter has an equal probability to increase

Rate = average over pool of equivalent neurons(several neuron single runs)

j=1

2

3

hellip

A =1Δt M

n(t t+Δt)M

Δt

j=1

2

3

M

hellip

01010

11000

00010

01000

p =M

N

L

010100101010110100

A =Δt

n (t t+Δt)

p =L

N

Δt

L

N number of lsquo1rsquos in the sequence

n() number of spikes in the time window

2n-state Counter INC

DEC

RNG

gt S

A

B

comparator

C

ri

C i+1 = C i + Ai - Bi

if Ci gt ri S = 1

else S = 0

p encoded value

A encoded value

(a) A stochastic integrator

S+

-

CA

B

(b) A symbol

Synapse

Dendrites

Axon

Soma(cell)

Membrane

VM (t+1) = VM (t) + Σεij - Σιij

if VM gt Vth reset and fire

else do not fire

εij Excitatory inputs of the neuron

ιij Inhibitory inputs of the neuron

VM Membrane potential

Vth Threshold voltage

(c) An integrate-and-fire spiking neuron model

Fig 9 (a) A stochastic integrator (b) its symbol and (c) aspiking neuron [5]

or decrease However it may take a long time to converge tothe equilibrium state as discussed in [6] [25] A stochasticintegrator with a feedback loop has been used as an ADpativeDIgital Element (ADDIE) shown in Fig 10(c) The ADDIEcan function as a probability follower ie the probability ofthe output sequence tracks the change of the probability of theinput sequence [1] [28]

MUX

01100110hellip

p1=05

10101010hellip

p2=05

00100010hellip

p=p1p2=025

01100101hellip

p1=05

11101110hellip

p2=075

sel

psel=05

00110110hellip

0

1 p=(1-psel)p1+pselp2=0625

01100111hellip (a)

(c)

10010000hellip

00101000hellip01000111hellip times2-1= -05

416

p1=

times2-1= -05 416

p2=

times2-1= 025 1016

p1p2=

(b) A bipolar stochastic multiplier

00101000hellip 11010111hellip

p1=025 p=1-p1=075

(b)

p1

p2

p=p1p2

1

1

p1

p2

(1 1)

AltB

AltB

RN1

RN2

Counter

(b) A bipolar stochastic multiplier

Stochastic number generator (SNG)

array

Binary inputs(from memory

devices)

Stochastic sequences

Stochastic computing

circuits

Stochastic sequences

Probability estimators

(PEs)

Binary outputs

+

-

DSNGDigital signal

Bipolar stochastic

sequence encodinglsquo0rsquo

Integration results

Output DSS

MUX

+SNGarray

y

x

Stochastic sequences encoding coefficients

z0z1

zN

hellip

N copies of independent stochastic sequences that

each encodes x

SEL0

1hellip

N

hellip

A

B

+

-

+

-

A

B

AB

(a) (b)

B(A+B)

(a) (b)

hellip

k

+

-

p

p

(c)

^

Fig 10 Stochastic dividers compute (a) AB (b) B(A+B)(c) shows an ADDIE

C Probability estimatorsThe computed probability of the output sequence can be

found by using a counter It is estimated as the quotient

of the number of lsquo1rsquos divided by the sequence length TheADDIE in Fig 10(c) is an alternative component to estimatethe probability by using a stochastic integrator reaching theequilibrium state [1]

D Limitations

Although the SC circuits are much simpler than conven-tional arithmetic circuits a large sequence length is requiredto achieve a high accuracy This requirement leads to poorperformance and low energy efficiency Fig 11 shows the rootmean square error (RMSE) encoding a number in the unipolarrepresentation using different types of ldquorandomrdquo numbers Thepseudorandom numbers are generated by a 16-bit LFSR withrandomly generated seeds The RMSEs are obtained from1000 trials using different sequence lengths As shown inFig 11 to achieve an RMSE of 1times10minus2 a sequence length of212 bits is required for the LFSR generated sequences For thestochastic Sobol sequences the length is significantly reducedto 26 or 64 bits to achieve a similar accuracy although it isstill not as compact as an equivalent 5-bit fixed-point numberthat results in a similar RMSE Meanwhile low precision (egat an RMSE of 1times 10minus2 or larger) can be tolerated in manyapplications such as inference in a machine learning model[12]ndash[17] [29]ndash[33] digital signal processing [34]ndash[40] andimage processing [41]ndash[44] Hence progressive precision (PP)can be employed and the Sobol sequence length can be as lowas 8 bits in an application in a CNN inference engine [45]

0 2 4 6 8 10 12

Sequence length (2N)

10-4

10-3

10-2

10-1

100

RM

SE

Sobol sequencePseudorandom sequence

Fig 11 RMSE of the encoded numbers with different typesof stochastic sequences

Despite the fact that some of these implementations achievea lower energy consumption with PP compared with conven-tional circuits the application of conventional SC (CSC) isstill quite limited

III DYNAMIC STOCHASTIC COMPUTING

In a DSC system as shown in Fig 12 the stochasticsequences encode varying signals instead of static numbersConsider a discrete-time digital signal xi within [0 1] (i =0 1 2 ) A DSS encoding xi satisfies that E[Xi] = xiwhere Xi is the ith bit in the DSS Thus every bit in thesequence can have a different probability to be 1

Digital signals (from ADCs or

storage)

DSNGsSignal

reconstruction To be stored or converted back to

analog signals

Stochastic logicDynamic

stochastic sequences

Dynamic stochastic sequences

Fig 12 A DSC system

RNGDynamic stochastic sequence

encoding signal xi

111010001011hellipComparator

B

AAltB

xi

Fig 13 A DSNG xi is an input to the comparator at onesample per clock cycle

A DSS generation

A DSS can be generated by a dynamic SNG (DSNG) asshown in Fig 13 It is similar to a conventional SNG exceptthat the input is a varying signal instead of a static number TheRNG generates uniformly distributed random numbers within[0 1] Given a discrete signal xi as shown in Fig 13 itis compared with a random number one sample at a timeIf xi is larger than the random number a lsquo1rsquo is generatedotherwise the DSNG generates a lsquo0rsquo Generating every bit inthe DSS is a Bernoulli process (in the ideally random case)so the expectation getting a lsquo1rsquo is equal to the correspondingsample value from the discrete digital signal The discretesignal can either be sampled from an analog signal or readfrom a memory device

B Data representations in DSC

Similar to CSC all the aforementioned mapping schemescan be used to encode numbers within certain ranges in DSCwhich subsequently requires the use of different computationalelements Specifically a DSS can encode a signal within [0 1]in the unipolar representation or within [minus1 1] in the bipolaror sign-magnitude representation For the sign-magnitude rep-resentation two sequences are used to encode a signal Oneencodes the magnitude of the signal Since the signal can haveboth positive and negative values another sequence is used torepresent the signs of the numbers

C Stochastic circuits

The DSC circuits can be combinational or sequential1) Combinational circuits A combinational circuit using

DSSrsquos as its inputs implements the composite function of theoriginal stochastic function For example an AND gate or aunipolar stochastic multiplier computes zi = xiyi for eachpair of bits in the input DSSrsquos that independently encode xiand yi (i = 0 1 2 ) respectively [46] due to E[Zi] =E[XiYi] = E[Xi]E[Yi] = xiyi where Zi Xi and Yi are theith bits in the output and input DSSrsquos respectively Note thatfor a continuous signal oversampling is required to producethe discrete signals xi and yi for accurate multiplication [46]Fig 14(a) shows the DSC circuit for a signal multiplicationand (b) shows the results of multiplying two sinusoidal signals

DSNG Signal Xx(t)

DSNG

DSNG

Signal reconstruction

X

Y

x(t)

y(t)ziZ

(a)

0 02 04 06 08 1 12 14 160

05

1

x(t)

Original signalDSS

0 02 04 06 08 1 12 14 160

05

1

y(t)

0 05 1 15t (s)

0

05

1

z(t)

Double precisionDynamic stochastic

(b)

Fig 14 A frequency mixer by using a stochastic multiplierwith DSSrsquos as the inputs (a) circuit and (b) results

with frequencies of 15 Hz and 2 Hz both sampled at a rateof 210 Hz The sampled signals are converted to DSSrsquos by twoDSNGs with independent Sobol sequence generators [47] asthe RNGs The output DSS is reconstructed to a digital signalby a moving average [28] The DSS can also be reconstructedby an ADDIE [1]

2) Sequential circuits The stochastic integrator is an im-portant component used in DSC for iterative accumulation Itimplements integrations of the signal encoded by accumulatingthe stochastic bits in a DSS Fig 15 shows that a bipolarstochastic integrator can be used to perform numerical inte-gration In Fig 15(a) the input digital signal is cos(πt2)sampled at a sampling rate of 28 Hz and converted to a DSSby the DSNG The other input of the stochastic integrator isset to a bipolar stochastic sequence encoding 0 (a stochasticbit stream with half of the bits being lsquo1rsquo) By accumulatingthe input bits the stochastic integrator provides an unbiasedestimate to the Riemann sum [48] The integration is obtainedby recording the changing values of the counter Since theintegrated results are directly provided by the counter areconstruction unit is not required

As can be seen from Fig 15(b) the results produced bythe stochastic integrator is very close to the analytical resultsSince the output DSS is generated by comparing the changingsignal (as the integration results) with uniformly distributedrandom numbers the RNG and the comparator inside theintegrator works as a DSNG As a result the output sequenceencodes the discretized stochastic integration solution for theanalytical solution (2π) sin(πt2) The output DSS is alsoshown in Fig 15(b)

MUX

01100110hellip

p1=05

10101010hellip

p2=05

00100010hellip

p=p1p2=025

01100101hellip

p1=05

11101110hellip

p2=075

sel

psel=05

00110110hellip

0

1 p=(1-psel)p1+pselp2=0625

01100111hellip (a)

(c)

10010000hellip

00101000hellip01000111hellip times2-1= -05

416

p1=

times2-1= -05 416

p2=

times2-1= 025 1016

p1p2=

(b) A bipolar stochastic multiplier

00101000hellip 11010111hellip

p1=025 p=1-p1=075

(b)

p1

p2

p=p1p2

1

1

p1

p2

(1 1)

AltB

AltB

RN1

RN2

Counter

(b) A bipolar stochastic multiplier

Stochastic number generator (SNG)

array

hellip

Binary inputs(from memory

devices)

hellip

Stochastic sequences

Stochastic computing

circuits

Stochastic sequences

Probability estimators

(PEs)

hellip hellip

Binary outputs

+

-

DSNGDigital signal

Bipolar stochastic

sequence encodinglsquo0rsquo

Integration results

Output DSS

(a)

0 05 1 15 2 25 3 35 4t

-1

-05

0

05

1

Analytical resultsResults by DSCOriginal signalOutput DSS

(b)

Fig 15 A bipolar stochastic integrator for numerical integra-tion (a) circuit and (b) results The output DSS is decimatedfor a clear view

The stochastic integrator has been used in a stochastic low-density parity-check (LDPC) decoder to implement a trackingforecast memory in [49] It solves the latch-up problem in thedecoding process with a low hardware complexity

For the FSM-based sequential circuits however the DSScannot work well because the probability consistently changesand a steady hyper-state is hard to reach The DSS could alsoviolate the condition of the model being a Markov chain dueto the autocorrelation of the encoded signal

IV APPLICATION OF DSC

By using the stochastic integrators DSC can implement aseries of algorithms that involve iterative accumulations suchas an IIR filter the Euler method and a GD algorithm

A IIR filtering

In [50] it is shown that an ADDIE with different parameterscan be used as a low-pass IIR filter with different frequencyresponses as shown in Fig 16 Given the bit width of thestochastic integrator n the transfer function of such a filter inthe Z-domain is given by

H(Z) =1

2nZ + 1minus 2n (3)

for a stable first-order low-pass IIR filter [50] An oversampled∆ minus Σ modulated bit stream is then used as the input of thecircuit Since the generation of a ∆minusΣ modulated bit stream

2n-state Counter INC

DEC

RNG

gtSeqout

A

BSeqout

(a) (b)

comparator

A

B

+

-

Countout

Countout

Seqout

(b)

+

-

Countout

X

+

-

p

y(t)

Fig 16 ADDIE circuit produces an exponential function whenthe input is a sequence encoding a static number

0 01 02 03 04t (s)

(a) Original and filtered signals

-1

0

1

0 50 100 150 200 250f (Hz)

(b) Spectra of original and filtered signals

0

02

04Original signalFiltered signal

Fig 17 The low-pass filtering results in (a) the time domainand (b) the frequency domain

can also be considered as a Bernoulli process [51] the DSScan be used for IIR filtering as well

Fig 17 shows the filtering of mixed sinusoidal signals of 4Hz and 196 Hz The mixture is first sampled at a sampling rateof 656k Hz and the sampled digital signal is then convertedto a DSS The DSS is filtered by an 8-bit ADDIE As can beseen the high-frequency signals are almost filtered out whilethe low-frequency signals remain

B ODE solvers

The Euler method provides numerical solutions for ODEsby accumulating the derivative functions [52] For an ODEdydt = f(t y(t)) the estimated solution y(t) is given by

y(ti+1) = yi+1 = yn + hf(ti yi) (4)

with h as the step size and ti+1 = ti + h After k steps ofaccumulation

yk = y0 + h

kminus1sumi=0

f(ti yi) (5)

To use a stochastic integrator let the pair of input DSSrsquos(eg A and B in (2)) encode the discrete derivative functionf(ti yi) such that E[Ai minus Bi] = f(ti yi) By doing sothe stochastic integrator implements (2) and approximatelycomputes (5) with a step size h = 12n As a result the bitwidth of the counter does not only decide the precision of thecomputation but also the time resolution The integration in theEuler method is implemented by accumulating the stochasticbits in the DSSrsquos instead of real values

Using this method an exponential function can be generatedby a stochastic integrator The ADDIE circuit as shown inFig 16 solves

dy(t)

dt= pminus y(t) (6)

which is the differential equation for a leaky integratormodel [53] Thus the ADDIE can be used to implement

neuronal dynamics in a leaky integrate-and-fire model Theanalytical solution of (6) is given by

y(t) = pminus (pminus y0)eminust (7)

where y0 is provided in the initial condition ie y(0) = y0This model is commonly seen in natural processes such asneuronal dynamics [53] Fig 18 shows the results producedby the ADDIE when y0 = 1 and p = 0

0 2 4 6 8t

0

02

04

06

08

1

y(t)

DSCAnalytical

Fig 18 ADDIE used as an ODE solver The solution is anexponential function

For a set of ODEsdy1(t)

dt = y2(t)minus 05dy2(t)

dt = 05minus y1(t)(8)

with the initial values y1(0) = 0 and y2(0) = 05 a numericalsolution can be found by the circuit in Fig 19 The upperintegrator computes the numerical estimate of y1(t) by takingthe output DSS from the lower integrator and a conventionalstochastic sequence encoding 05 as inputs So the differenceof the two input sequences approximately encodes y2(t)minus05the derivative of y1(t) Simultaneously the output DSS of theupper stochastic integrator encodes the numerical solution ofy1(t) which is used to solve the second differential equationin (8) Thus the stochastic integrator can be viewed as anunbiased Euler solution estimator [54] The results producedby the dynamic stochastic circuit is shown in Fig 20 8-bitupdown counters are used in both stochastic integrators andthe RNG is implemented by a Sobol sequence generator [47]Since the results are provided by the counter in the stochasticintegrator an explicit reconstruction unit as shown in Fig 12is not required

A partial differential equation such as a Laplacersquos equationcan also be solved by an array of stochastic integrators witha finite-difference method

C Weight update for an adaptive filterDSC can be applied to update the weights in an adaptive

digital filter An adaptive filter system consists of a linear filterand a weight update component as shown in Fig 21 Theoutput of the linear filter at time i is given by

yi = F (wixi) = wixi =sumMminus1

j=0wj

iximinusM+j+1 (9)

+

-

y(t)

1

z(t)

times2

+

-

Y

+

-

+

-

y1(t)

y2(t)

p=05Y1

Y2

(a) (b)

p=05(times2)

bi

ai

+

-

y(t)

1

z(t)

times2

+

-

+

-

+

-

y1(t)

y2(t)

p=05

p=05(times2)

bi

ai

+

-

1

0+

-0+

-0

16

y(t)= t3

y(t)=t

y(t)= t212

Fig 19 A pair of stochastic integrators solving (8) [54]

0 2 4 6 8 10 120

05

1

y 1(t

)

DSCAnalytical solution

0 2 4 6 8 10 12t

0

05

1

y 2(t

)Fig 20 Simulation results of the ODE solver in Fig 19 Theyare compared with the analytical solutions

where M is the length of the filter or the number of weightswi is a vector of M weights wi = [w

(0)i w

(1)i w

(Mminus1)i ]

and xi is the input vector for the digital signal sampled atdifferent time points xi = [ximinusM+1 ximinusM+2 xi]

T

+

-

y(xi)

F(wi xi)

F(wi xi) wi

ti

SNGpart F(wi xi)

part wijSNG

SNG

ai

bi

EEEE

Linear filter with weights wi

LMS weight update unit

xi-M+1xi-M+2

xi

hellip

yi +- ti+

ei

xi

+ -

xi-M+j

wij

xixi-M+1

+ -

yi

wi1

ti

RNG

+ -

wiM

RNG

hellip hellip

hellip hellip

hellip hellip

Comparator

+ - + - + -

wijwi1 wiM

xi-M+j+1 xixi-M+1

yi

ti

RNG

RNGhellip hellip

hellip hellip

hellip hellip

Comparator

Linear filter with parameters wi

Δwi

LMS weight updating unit

hellip

yi +- ti+

ei

hellip

Δwi

(Ci)

wij[Pi] = [12nCi] =

Fig 21 An adaptive filter

In the least-mean-square (LMS) algorithm the cost functionis the mean squared error between the filter output and theguidance signal ti The weights are updated by

wi+1 = wi + ηxi(ti minus yi) (10)

where η is the step size The update is implemented by thecircuit in Fig 22 The two multipliers are used to computexiti and xiyi respectively The integrator is then used toaccumulate ηxi(ti minus yi) over i = 0 1

The dynamic stochastic circuit is used to perform systemidentification for a high pass 103-tap FIR filter 103 weightsare trained by using randomly generated samples with theLMS algorithm 14-bit stochastic integrators are used Afteraround 340k iterations of training the misalignment of the

Actual classification results

Trained parameters

-

+

1 6 8 0

Estimate classification results (distribution)

SNGarray

Images

10 classes

00000hellip

11010hellip

00010hellip

Trained parameters

-

+

1 6 8 0DSNGarray

10 classes

times

times

ti

yi

xi

Fig 22 A DSC circuit training a weight in an adaptive filterThe stochastic multipliers are denoted by rectangles with ldquotimesrdquoBipolar and sign-magnitude representations can be used for thestochastic multipliers and integrator

weights in the adaptive filter converges to about -45 dBcompared to the target system The frequency responses ofthe target and the trained filter are shown in Fig 23 As canbe seen the adaptive filter has a similar frequency responseto the target system especially in the passband indicating anaccurate identification result

0 05 1 15 2 25 3Normalized frequency

-80

-60

-40

-20

0

Mag

nitu

de in

dB

target systemadaptive filter

Fig 23 Frequency responses of the target and trained filterusing the sign-magnitude representation

D A gradient descent (GD) circuit

GD is a simple yet effective optimization algorithm It findsthe local minima by iteratively taking steps along the negativesof the gradients at current points One-step GD is given by

wi+1 = wi minus ηnablaF (wi) (11)

where F (wi) is the function to be minimized at the ith stepnablaF (wi) denotes the current gradient and η is the step sizeThe step size is an indicator of how fast the model learns Asmall step size leads to a slow convergence while a large onecan lead to divergence

Similar to the implementation of the Euler method let apair of differential DSSrsquos encode the gradient minusnablaF (wi)the stochastic integrator can be used to compute (11) with astep size η = 12n by using these DSSrsquos as inputs DSC canthen be used to perform the optimization of the weights in aneural network (NN) by minimizing the cost function

The training of an NN usually involves two phases Take afully connected NN as an example shown in Fig 24 In theforward propagation the sum of product of the input vector xand the weight vector w is obtained as v = wx The output

Inputs Hidden layers

Outputs

(a)

hellip

x

w o=f(wx)

hellip

(b)

(c)

wl+1

δl+1o wl

hellip

δwl = f`wl+1δl+1

hellip

Random number generator (RNG)

p

Stochastic sequence with probability p

0101100hellip

Comparator

B

AAltB

N

N

(a)

CLK

01100110hellip

p1=05

10101010hellip

p2=05

00100010hellip

p=p1p2=025

(a) A stochastic multiplier for the unipolar representation

0100010010010000hellip

1000100010001000hellip0011001111100111hellip

times2-1= -05 4

16p1=

times2-1= -05 416

p2=

times2-1= 025 1016

p1p2=

(b) A stochastic multiplier for the bipolar representation

01100110hellip

|p1|=05

10101010hellip

|p2|=05

00100010hellip

|p|=|p1||p2|=025

(c) A stochastic multiplier for the sign-magnitude representation

|p1|=05

Sign(p1)=1(negative)

Sign(p2)=0(positve)

Sign(p)=1(negative)

Fig 24 (a) A fully connected NN in which the neuronsare organized layer by layer (b) The computation duringthe forward propagation (c) The local gradient for wl iscomputed by using the backward propagation The output ofrom previous neuron is required to compute the gradient

of the neuron o is then computed by o = f(v) where f isthe activation function During this phase the weights do notchange In the backward propagation the difference betweenthe actual and the desired final outputs ie the classificationresults is calculated and the local gradients are computed foreach neuron based on the difference [55] Take the neuronin Fig 24(c) as an example the local gradient is obtainedby δl = f prime(v)wl+1δl+1 where f prime(middot) is the derivative of theactivation function wl+1 is the weight vector connecting layerl and l+1 and δl+1 is the local gradient vector of the neuronsin layer l + 1 Then the gradient of a weight at layer l iscomputed by multiplying the local gradient with the recordedoutput of the designated neuron ie nablaF (wl) = minusoδl

In [56] it is found that the local gradient can be decomposedinto two parts δ = δ+ minus δminus where δ+ is contributed bythe desired output of the NN and δminus by the actual outputof the NN So the gradient for weight w can be rewritten asnablaF (w) = minuso(δ+minusδminus) Similar to the implementation of (10)the DSC circuit in Fig 22 can be used to implement the GDalgorithm to perform the online training (one sample at a time)of the fully connected NN with the input signals δ+ o and δminusHowever a backpropagation ie δ = f prime(v)wl+1δl+1 is stillcomputed by using a conventional method (eg a fixed-pointarithmetic circuit) to obtain the local gradients Otherwisethe accuracy loss would be too much for the algorithm toconverge

Fig 25 shows the convergence of cross entropy (as a costfunction) and testing accuracy for the training of a 784-128-128-10 NN with the MNIST handwritten digit dataset As canbe seen the testing accuracy of the NN trained by the DSCcircuit using the sign-magnitude representation is similar to theone produced by the fixed-point implementation (around 97)However the accuracy of the DSC implementation using thebipolar representation is relatively low

The proposed circuitry can be useful for online learningwhere real-time interaction with the environment is requiredwith an energy constraint [57] It can also be used to train amachine learning model using private or security-critical dataon mobile devices if data are sensitive and cannot be uploaded

to a cloud computer

V ERROR REDUCTION AND ASSESSMENT

One major issue in SC is the loss of accuracy [58] Indepen-dent RNGs have been used to avoid correlation and improvethe accuracy of components such as stochastic multipliersThis method inevitably increases the hardware overheadHowever in a stochastic integrator the sharing of RNGs forgenerating the two input sequences reduces the variance thusreducing the error This is shown as follows

The error introduced by DSC at each step of computation isrelated to the variance of the difference of the two accumulatedstochastic bits AiminusBi When independent RNGs are used togenerate Ai and Bi the variance at a single step is given by

Var[Ai minusBi] = E[(Ai minusBi minus E(Ai minusBi))2]

= E[(Ai minusBi)2]minus 2E[Ai minusBi]

(PAi minus PBi) + (PAi minus PBi)2

(12)

where PAi and PBi are the probabilities of Ai and Bi beinglsquo1rsquo respectively ie E[Ai] = PAi and E[Bi] = PBi Since(AiminusBi)

2 is 0 when Ai and Bi are equal or 1 when they aredifferent E[(AiminusBi)

2] equals PAi(1minusPBi)+PBi(1minusPAi)which is the probability that Ai and Bi are different when Ai

and Bi are independently generated1 (12) is then rewritten as

Varind[Ai minusBi] = PAi(1minus PBi) + PBi(1minus PAi)

minus (PAi minus PBi)2

= PAi(1minus PAi) + PBi(1minus PBi)

(13)

On the other hand if the same RNG is used to generate Ai

and Bi the variance of Ai minusBi can be derived from (12) aswell However since only a shared random number is used togenerate Ai and Bi E[(AiminusBi)

2] equals |PAiminusPBi| which

1E[(AiminusBi)2] = Prob[(AiminusBi)

2 = 1]times1+Prob[(AiminusBi)2 = 0]times0

0 5 10 15 20Epoch

0

01

02

03

04

05

Cro

ss e

ntro

py

Signed DSCBipolar DSCFixed-point

(a)

0 5 10 15 20Epoch

88

90

92

94

96

98

Test

ing

accu

racy

(in

perc

enta

ge)

Signed DSCBipolar DSCFixed-point

(b)

Fig 25 (a) Convergence of cross entropy and (b) testingaccuracy of MNIST hand-written digit recognition datasetCross entropy is the cost function to be minimized during thetraining of an NN The weights of the NN are updated by theDSC circuits The DSC circuits using the sign-magnitude andbipolar representations and the fixed-point implementation areconsidered for comparison

gives the probability that Ai and Bi are different Thus (12)can be rewritten as

Varshare[Ai minusBi] = |PAi minus PBi| minus (PAi minus PBi)2

= |PAi minus PBi|(1minus |PAi minus PBi|)(14)

Since Varshare[Ai minus Bi] minus Varind[Ai minus Bi] = 2PBiPAi minus2min(PAi PBi) le 0 we obtain Varshare[Ai minus Bi] leVarind[AiminusBi] for any i = 0 1 2 [54] Therefore sharingthe use of RNGs to generate input stochastic bit streamsimproves the accuracy

For the circuit in Fig 22 though the inputs of the stochasticintegrator are the outputs of the stochastic multipliers insteadof being directly generated by a DSNG the RNG sharingscheme still works in the generated DSSrsquos encoding δ+ andδminus ie it produces a smaller variance in the computed result[56] However an uncorrelated RNG must be used to generatethe DSS encoding the recorded output of the neuron obecause independency is still required in general for accuratestochastic multiplication

The accuracy of the accumulated result is also related to thebit-width of the counter in the stochastic integrator The bitwidth does not only determine the resolution of the result butalso the step size especially in the ODE solver adaptive filterand the gradient descent circuit A larger bit width indicatesa smaller resolution and a smaller step size for fine tuning atthe cost of extra computation time It is shown in [56] that themulti-step variance bound exponentially decreases with the bitwidth of the counter However for the IIR filter the changeof bit-width affects the characteristic of the filter

For the ODE solver circuit it is also found that the use ofquasirandom numbers improves the accuracy compared to theuse of pseudorandom numbers [54] However when the DSS isused to encode signals from the input or images such as in theadaptive filter and the gradient descent algorithm quasirandomand pseudorandom numbers produce similar accuracy Thereason may lie in the fact that the values of the encodedsignals in these two applications are independently selectedfrom the training dataset However for the ODE solver thesignals encoded are samples from continuous functions andadjacent samples have similar values As a result withina short segment of the DSS in this case it approximatelyencodes the same value It then works similarly as CSCfor which the Sobol sequence helps improving the accuracyFig 26 compares the convergence of misalignment duringthe training of the adaptive filter using pseudorandom andquasirandom numbers 16-bit stochastic integrators are usedin this experiment The misalignment is given by

Misalignment =E[(w minus w)2]

E[w2] (15)

where w denotes the target value and w is the trained resultIt can be seen that the convergence speed of using these twotypes of random numbers is similar and their misalignmentsboth converge to around -52 dB

TABLE II Hardware evaluation of the gradient descent circuitarray training a 784-128-128-10 neural network

Metrics DSC Fixed-point Ratio

Step size 2minus10 2minus10 -Epochs 20 20 -Min time (ns) 16times 106 47times 106 13EPO (fJ) 12times 107 11times 108 192TPA (imagesmicrom2) 57times 101 15 381Aver test Accu 9704 9749 -

TABLE III Hardware performance comparison of the ODEsolvers for (8)

Metric DSC Fixed-point Ratio

Step size 2minus8 2minus8 -Iterations 3217 3217 -Min time (ns) 257359 855720 133EPO (fJ) 20121 46600 123TPA (wordmicrosmicrom2) 475 058 821RMSE 47times 10minus3 36times 10minus3 -

VI HARDWARE EFFICIENCY EVALUATION

Due to the extremely low sequence length used for encodinga value in DSC the power consumption of DSC circuits ismuch lower than conventional arithmetic circuits Moreoversince each bit in a DSS is used to encode a number DSCis more energy-efficient than CSC The gradient descent andODE solver circuits are considered as examples to show thehardware efficiency Implemented in VHDL the circuits aresynthesized by Synopsys Design Compiler with a 28-nm tech-nology to obtain the power consumption maximum frequencyand hardware measurements The minimum runtime energyper operation (EPO) throughput per area (TPA) and accuracyare reported in Table II for the gradient descent circuits andin Table III for ODE solvers

As can be seen with a slightly degraded accuracy theDSC circuits outperform conventional circuits using a fixed-point representation in speed energy efficiency and hardwareefficiency For the gradient descent circuit since the two RNGscan be shared among the 118282 stochastic integrators (for118282 weights) the improvement in hardware efficiency iseven more significant than the ODE solver for which one RNG

0 05 1 15 2 25 3Number of steps ( 64 input samples) 104

-60

-50

-40

-30

-20

-10

0

Mis

alig

nmen

t (dB

)

PseudorandomQuasirandom

Fig 26 Comparison of the convergence of misalignment usingpseudorandom and quasirandom numbers

is shared between two stochastic integrators However sincethe sharing of RNGs does not significantly affect the criticalpath the performance improvements are similar for these twocircuits

VII CONCLUSION

In this article a new type of stochastic computing dynamicstochastic computing (DSC) is introduced to encode a varyingsignal by using a dynamically variable binary bit streamreferred to as a dynamic stochastic sequence (DSS) A DSCcircuit can efficiently implement an iterative accumulation-based algorithm It is useful in implementing the Euler methodtraining a neural network and an adaptive filter and IIRfiltering In those applications integrations are implementedby stochastic integrators that accumulate the stochastic bits in aDSS Since the computation is performed on single stochasticbits instead of conventional binary values or long stochasticsequences a significant improvement in hardware efficiencyis achieved with a high accuracy

ACKNOWLEDGMENT

This work was supported by the Natural Sciences andEngineering Research Council of Canada (NSERC) underProjects RES0025211 and RES0048688

REFERENCES

[1] B R Gaines Stochastic Computing Systems Boston MA SpringerUS 1969 pp 37ndash172

[2] W Qian X Li M D Riedel K Bazargan and D J Lilja ldquoAnarchitecture for fault-tolerant computation with stochastic logicrdquo IEEETransactions on Computers vol 60 no 1 pp 93ndash105 2011

[3] P Li and D J Lilja ldquoUsing stochastic computing to implementdigital image processing algorithmsrdquo in 2011 IEEE 29th InternationalConference on Computer Design (ICCD) 2011 pp 154ndash161

[4] S C Smithson N Onizawa B H Meyer W J Gross and T HanyuldquoEfficient CMOS invertible logic using stochastic computingrdquo IEEETransactions on Circuits and Systems I Regular Papers vol 66 no 6pp 2263ndash2274 June 2019

[5] W Maass and C M Bishop Pulsed neural networks MIT press 2001[6] Y Liu S Liu Y Wang F Lombardi and J Han ldquoA stochastic

computational multi-layer perceptron with backward propagationrdquo IEEETransactions on Computers vol 67 no 9 pp 1273ndash1286 2018

[7] J Zhao J Shawe-Taylor and M van Daalen ldquoLearning in stochastic bitstream neural networksrdquo Neural Networks vol 9 no 6 pp 991ndash9981996

[8] B D Brown and H C Card ldquoStochastic neural computation Icomputational elementsrdquo IEEE Transactions on Computers vol 50no 9 pp 891ndash905 2001

[9] D Zhang and H Li ldquoA stochastic-based FPGA controller for aninduction motor drive with integrated neural network algorithmsrdquo IEEETransactions on Industrial Electronics vol 55 no 2 pp 551ndash561 2008

[10] Y Ji F Ran C Ma and D J Lilja ldquoA hardware implementationof a radial basis function neural network using stochastic logicrdquo inProceedings of the Design Automation amp Test in Europe Conference(DATE) 2015 pp 880ndash883

[11] V Canals A Morro A Oliver M L Alomar and J L RosselloldquoA new stochastic computing methodology for efficient neural networkimplementationrdquo IEEE Transactions on Neural Networks and LearningSystems vol 27 no 3 pp 551ndash564 2016

[12] A Ardakani F Leduc-Primeau N Onizawa T Hanyu and W J GrossldquoVLSI implementation of deep neural network using integral stochasticcomputingrdquo IEEE Transactions on Very Large Scale Integration (VLSI)Systems vol 25 no 10 pp 2688ndash2699 2017

[13] A Ren Z Li C Ding Q Qiu Y Wang J Li X Qian and B YuanldquoSC-DCNN Highly-scalable deep convolutional neural network us-ing stochastic computingrdquo in Proceedings of the 22nd InternationalConference on Architectural Support for Programming Languages andOperating Systems (ASPLOS) 2017 pp 405ndash418

[14] V T Lee A Alaghi J P Hayes V Sathe and L Ceze ldquoEnergy-efficienthybrid stochastic-binary neural networks for near-sensor computingrdquo inDesign Automation amp Test in Europe Conference amp Exhibition (DATE)2017 pp 13ndash18

[15] R Hojabr K Givaki S Tayaranian P Esfahanian A Khonsari D Rah-mati and M H Najafi ldquoSkippyNN An embedded stochastic-computingaccelerator for convolutional neural networksrdquo in Proceedings of the56th Annual Design Automation Conference (DAC) 2019 pp 1ndash6

[16] Y Liu L Liu F Lombardi and J Han ldquoAn energy-efficient and noise-tolerant recurrent neural network using stochastic computingrdquo IEEETransactions on Very Large Scale Integration (VLSI) Systems vol 27no 9 pp 2213ndash2221 Sep 2019

[17] R Cai A Ren O Chen N Liu C Ding X Qian J Han W LuoN Yoshikawa and Y Wang ldquoA stochastic-computing based deeplearning framework using adiabatic quantum-flux-parametron supercon-ducting technologyrdquo in Proceedings of the 46th International Symposiumon Computer Architecture ser ISCA rsquo19 New York NY USA ACM2019 pp 567ndash578

[18] A Alaghi and J P Hayes ldquoSurvey of stochastic computingrdquo ACMTrans Embed Comput Syst vol 12 no 2s pp 921ndash9219 May 2013

[19] A Zhakatayev S Lee H Sim and J Lee ldquoSign-magnitude SC getting10x accuracy for free in stochastic computing for deep neural networksrdquoin Proceedings of the 55th Annual Design Automation Conference(DAC) 2018 pp 1ndash6

[20] P K Gupta and R Kumaresan ldquoBinary multiplication with PN se-quencesrdquo IEEE Transactions on Acoustics Speech and Signal Process-ing vol 36 no 4 pp 603ndash606 1988

[21] A Alaghi and J P Hayes ldquoFast and accurate computation using stochas-tic circuitsrdquo in Proceedings of the conference on Design Automation ampTest in Europe (DATE) 2014 pp 1ndash4

[22] S Liu and J Han ldquoEnergy efficient stochastic computing with Sobolsequencesrdquo in Design Automation amp Test in Europe Conference ampExhibition (DATE) 2017 pp 650ndash653

[23] H Niederreiter Random Number Generation and quasi-Monte CarloMethods Philadelphia PA USA Society for Industrial and AppliedMathematics 1992

[24] P Bratley and B L Fox ldquoAlgorithm 659 Implementing Sobolrsquosquasirandom sequence generatorrdquo ACM Transactions on MathematicalSoftware (TOMS) vol 14 no 1 pp 88ndash100 1988

[25] S Liu and J Han ldquoToward energy-efficient stochastic circuits usingparallel sobol sequencesrdquo IEEE Transactions on Very Large ScaleIntegration (VLSI) Systems vol 26 no 7 pp 1326ndash1339 July 2018

[26] A Alaghi and J P Hayes ldquoSTRAUSS Spectral transform use instochastic circuit synthesisrdquo IEEE Transactions on Computer-AidedDesign of Integrated Circuits and Systems vol 34 no 11 pp 1770ndash1783 2015

[27] P Li D J Lilja W Qian K Bazargan and M Riedel ldquoThe synthesisof complex arithmetic computation on stochastic bit streams usingsequential logicrdquo in Proceedings of the International Conference onComputer-Aided Design (ICCAD) 2012 pp 480ndash487

[28] P Mars and W J Poppelbaum Stochastic and deterministic averagingprocessors Peter Peregrinus Press 1981 no 1

[29] Z Li A Ren J Li Q Qiu Y Wang and B Yuan ldquoDSCNN Hardware-oriented optimization for stochastic computing based deep convolutionalneural networksrdquo in IEEE 34th International Conference on ComputerDesign (ICCD) 2016 pp 678ndash681

[30] A Ren Z Li Y Wang Q Qiu and B Yuan ldquoDesigning reconfigurablelarge-scale deep learning systems using stochastic computingrdquo in IEEEInternational Conference on Rebooting Computing (ICRC) 2016 pp1ndash7

[31] K Kim J Kim J Yu J Seo J Lee and K Choi ldquoDynamic energy-accuracy trade-off using stochastic computing in deep neural networksrdquoin Proceedings of the 53rd Annual Design Automation Conference(DAC) 2016 pp 1241ndash1246

[32] H Sim and J Lee ldquoA new stochastic computing multiplier withapplication to deep convolutional neural networksrdquo in Proceedings ofthe 54th Annual Design Automation Conference (DAC) 2017 pp 1ndash6

[33] Y Liu Y Wang F Lombardi and J Han ldquoAn energy-efficient online-learning stochastic computational deep belief networkrdquo IEEE Journal

on Emerging and Selected Topics in Circuits and Systems vol 8 no 3pp 454ndash465 Sep 2018

[34] J F Keane and L E Atlas ldquoImpulses and stochastic arithmetic forsignal processingrdquo in IEEE International Conference on AcousticsSpeech and Signal Processing (ICASSP) vol 2 2001 pp 1257ndash1260

[35] R Kuehnel ldquoBinomial logic extending stochastic computing to high-bandwidth signalsrdquo in Conference Record of the Thirty-Sixth AsilomarConference on Signals Systems and Computers vol 2 2002 pp 1089ndash1093

[36] N Yamamoto H Fujisaka K Haeiwa and T Kamio ldquoNanoelectroniccircuits for stochastic computingrdquo in 6th IEEE Conference on Nanotech-nology (IEEE-NANO) vol 1 2006 pp 306ndash309

[37] K K Parhi and Y Liu ldquoArchitectures for IIR digital filters usingstochastic computingrdquo in IEEE International Symposium on Circuitsand Systems (ISCAS) 2014 pp 373ndash376

[38] B Yuan Y Wang and Z Wang ldquoArea-efficient scaling-free DFTFFTdesign using stochastic computingrdquo IEEE Transactions on Circuits andSystems II Express Briefs vol 63 no 12 pp 1131ndash1135 2016

[39] R Wang J Han B F Cockburn and D G Elliott ldquoStochastic circuitdesign and performance evaluation of vector quantization for differenterror measuresrdquo IEEE Transactions on Very Large Scale Integration(VLSI) Systems vol 24 no 10 pp 3169ndash3183 2016

[40] H Ichihara T Sugino S Ishii T Iwagaki and T Inoue ldquoCompactand accurate digital filters based on stochastic computingrdquo IEEE Trans-actions on Emerging Topics in Computing vol 7 no 1 pp 31ndash432016

[41] T Hammadou M Nilson A Bermak and P Ogunbona ldquoA 96spltimes64 intelligent digital pixel array with extended binary stochasticarithmeticrdquo in Proceedings of the International Symposium on Circuitsand Systems (ISCAS) 2003

[42] A Alaghi C Li and J P Hayes ldquoStochastic circuits for real-timeimage-processing applicationsrdquo in Proceedings of the 50th AnnualDesign Automation Conference (DAC) 2013 pp 1361ndash1366

[43] P Li D J Lilja W Qian K Bazargan and M D Riedel ldquoComputationon stochastic bit streams digital image processing case studiesrdquo IEEETransactions on Very Large Scale Integration (VLSI) Systems vol 22no 3 pp 449ndash462 2014

[44] M H Najafi S Jamali-Zavareh D J Lilja M D Riedel K Bazarganand R Harjani ldquoTime-encoded values for highly efficient stochasticcircuitsrdquo IEEE Transactions on Very Large Scale Integration (VLSI)Systems vol 25 no 5 pp 1644ndash1657 2017

[45] S R Faraji M Hassan Najafi B Li D J Lilja and K BazarganldquoEnergy-efficient convolutional neural networks with deterministic bit-stream processingrdquo in 2019 Design Automation Test in Europe Confer-ence Exhibition (DATE) 2019 pp 1757ndash1762

[46] S Liu and J Han ldquoDynamic stochastic computing for digital signalprocessing applicationsrdquo in Design Automation amp Test in EuropeConference amp Exhibition (DATE) 2020

[47] I L Dalal D Stefan and J Harwayne-Gidansky ldquoLow discrepancysequences for Monte Carlo simulations on reconfigurable platformsrdquo inInternational Conference on Application-Specific Systems Architecturesand Processors (ASAP) 2008 pp 108ndash113

[48] W G McCallum D Hughes-Hallett A M Gleason D O LomenD Lovelock J Tecosky-Feldman T W Tucker D Flath T ThrashK R Rhea et al Multivariable calculus Wiley 1997 vol 200 no 1

[49] S S Tehrani A Naderi G-A Kamendje S Hemati S Mannor andW J Gross ldquoMajority-based tracking forecast memories for stochasticLDPC decodingrdquo IEEE Transactions on Signal Processing vol 58no 9 pp 4883ndash4896 2010

[50] N Saraf K Bazargan D J Lilja and M D Riedel ldquoIIR filtersusing stochastic arithmeticrdquo in Design Automation amp Test in EuropeConference amp Exhibition (DATE) 2014 pp 1ndash6

[51] F Maloberti ldquoNon conventional signal processing by the use of sigmadelta technique a tutorial introductionrdquo in Proceedings of IEEE Inter-national Symposium on Circuits and Systems (ISCAS) vol 6 1992 pp2645ndash2648

[52] J C Butcher The Numerical Analysis of Ordinary Differential Equa-tions Runge-Kutta and General Linear Methods USA Wiley-Interscience 1987

[53] W Gerstner W M Kistler R Naud and L Paninski Neuronaldynamics From single neurons to networks and models of cognitionCambridge University Press 2014

[54] S Liu and J Han ldquoHardware ODE solvers using stochastic circuitsrdquo inProceedings of the 54th Annual Design Automation Conference (DAC)2017 pp 1ndash6

[55] S S Haykin Neural networks and learning machines Pearson UpperSaddle River NJ USA 2009 vol 3

[56] S Liu H Jiang L Liu and J Han ldquoGradient descent using stochasticcircuits for efficient training of learning machinesrdquo IEEE Transactionson Computer-Aided Design of Integrated Circuits and Systems vol 37no 11 pp 2530ndash2541 Nov 2018

[57] R Nishihara P Moritz S Wang A Tumanov W Paul J Schleier-Smith R Liaw M Niknami M I Jordan and I StoicaldquoReal-time machine learning The missing piecesrdquo in Proceedingsof the 16th Workshop on Hot Topics in Operating Systemsser HotOS rsquo17 ACM 2017 pp 106ndash110 [Online] Availablehttpdoiacmorg10114531029803102998

[58] J P Hayes ldquoIntroduction to stochastic computing and its challengesrdquo inProceedings of the 52nd Annual Design Automation Conference (DAC)2015 pp 591ndash593

Page 4: Introduction to Dynamic Stochastic Computing

Rate = average over pool of equivalent neurons(several neuron single runs)

j=1

2

3

hellip

A =1Δt M

n(t t+Δt)M

Δt

j=1

2

3

M

hellip

01010

11000

00010

01000

p =M

N

L

010100101010110100

A =Δt

n (t t+Δt)

p =L

N

Δt

L

N number of lsquo1rsquos in the sequence

n() number of spikes in the time window

2n-state Counter INC

DEC

RNG

gt S

A

B

comparator

C

ri

C i+1 = C i + Ai - Bi

if Ci gt ri S = 1

else S = 0

p encoded value

A encoded value

(a) A stochastic integrator

S+

-

CA

B

(b) A symbol

Synapse

Dendrites

Axon

Soma(cell)

Membrane

VM (t+1) = VM (t) + Σεij - Σιij

if VM gt Vth reset and fire

else do not fire

εij Excitatory inputs of the neuron

ιij Inhibitory inputs of the neuron

VM Membrane potential

Vth Threshold voltage

(c) An integrate-and-fire spiking neuron model

Fig 9 (a) A stochastic integrator (b) its symbol and (c) aspiking neuron [5]

or decrease However it may take a long time to converge tothe equilibrium state as discussed in [6] [25] A stochasticintegrator with a feedback loop has been used as an ADpativeDIgital Element (ADDIE) shown in Fig 10(c) The ADDIEcan function as a probability follower ie the probability ofthe output sequence tracks the change of the probability of theinput sequence [1] [28]

MUX

01100110hellip

p1=05

10101010hellip

p2=05

00100010hellip

p=p1p2=025

01100101hellip

p1=05

11101110hellip

p2=075

sel

psel=05

00110110hellip

0

1 p=(1-psel)p1+pselp2=0625

01100111hellip (a)

(c)

10010000hellip

00101000hellip01000111hellip times2-1= -05

416

p1=

times2-1= -05 416

p2=

times2-1= 025 1016

p1p2=

(b) A bipolar stochastic multiplier

00101000hellip 11010111hellip

p1=025 p=1-p1=075

(b)

p1

p2

p=p1p2

1

1

p1

p2

(1 1)

AltB

AltB

RN1

RN2

Counter

(b) A bipolar stochastic multiplier

Stochastic number generator (SNG)

array

Binary inputs(from memory

devices)

Stochastic sequences

Stochastic computing

circuits

Stochastic sequences

Probability estimators

(PEs)

Binary outputs

+

-

DSNGDigital signal

Bipolar stochastic

sequence encodinglsquo0rsquo

Integration results

Output DSS

MUX

+SNGarray

y

x

Stochastic sequences encoding coefficients

z0z1

zN

hellip

N copies of independent stochastic sequences that

each encodes x

SEL0

1hellip

N

hellip

A

B

+

-

+

-

A

B

AB

(a) (b)

B(A+B)

(a) (b)

hellip

k

+

-

p

p

(c)

^

Fig 10 Stochastic dividers compute (a) AB (b) B(A+B)(c) shows an ADDIE

C Probability estimatorsThe computed probability of the output sequence can be

found by using a counter It is estimated as the quotient

of the number of lsquo1rsquos divided by the sequence length TheADDIE in Fig 10(c) is an alternative component to estimatethe probability by using a stochastic integrator reaching theequilibrium state [1]

D Limitations

Although the SC circuits are much simpler than conven-tional arithmetic circuits a large sequence length is requiredto achieve a high accuracy This requirement leads to poorperformance and low energy efficiency Fig 11 shows the rootmean square error (RMSE) encoding a number in the unipolarrepresentation using different types of ldquorandomrdquo numbers Thepseudorandom numbers are generated by a 16-bit LFSR withrandomly generated seeds The RMSEs are obtained from1000 trials using different sequence lengths As shown inFig 11 to achieve an RMSE of 1times10minus2 a sequence length of212 bits is required for the LFSR generated sequences For thestochastic Sobol sequences the length is significantly reducedto 26 or 64 bits to achieve a similar accuracy although it isstill not as compact as an equivalent 5-bit fixed-point numberthat results in a similar RMSE Meanwhile low precision (egat an RMSE of 1times 10minus2 or larger) can be tolerated in manyapplications such as inference in a machine learning model[12]ndash[17] [29]ndash[33] digital signal processing [34]ndash[40] andimage processing [41]ndash[44] Hence progressive precision (PP)can be employed and the Sobol sequence length can be as lowas 8 bits in an application in a CNN inference engine [45]

0 2 4 6 8 10 12

Sequence length (2N)

10-4

10-3

10-2

10-1

100

RM

SE

Sobol sequencePseudorandom sequence

Fig 11 RMSE of the encoded numbers with different typesof stochastic sequences

Despite the fact that some of these implementations achievea lower energy consumption with PP compared with conven-tional circuits the application of conventional SC (CSC) isstill quite limited

III DYNAMIC STOCHASTIC COMPUTING

In a DSC system as shown in Fig 12 the stochasticsequences encode varying signals instead of static numbersConsider a discrete-time digital signal xi within [0 1] (i =0 1 2 ) A DSS encoding xi satisfies that E[Xi] = xiwhere Xi is the ith bit in the DSS Thus every bit in thesequence can have a different probability to be 1

Digital signals (from ADCs or

storage)

DSNGsSignal

reconstruction To be stored or converted back to

analog signals

Stochastic logicDynamic

stochastic sequences

Dynamic stochastic sequences

Fig 12 A DSC system

RNGDynamic stochastic sequence

encoding signal xi

111010001011hellipComparator

B

AAltB

xi

Fig 13 A DSNG xi is an input to the comparator at onesample per clock cycle

A DSS generation

A DSS can be generated by a dynamic SNG (DSNG) asshown in Fig 13 It is similar to a conventional SNG exceptthat the input is a varying signal instead of a static number TheRNG generates uniformly distributed random numbers within[0 1] Given a discrete signal xi as shown in Fig 13 itis compared with a random number one sample at a timeIf xi is larger than the random number a lsquo1rsquo is generatedotherwise the DSNG generates a lsquo0rsquo Generating every bit inthe DSS is a Bernoulli process (in the ideally random case)so the expectation getting a lsquo1rsquo is equal to the correspondingsample value from the discrete digital signal The discretesignal can either be sampled from an analog signal or readfrom a memory device

B Data representations in DSC

Similar to CSC all the aforementioned mapping schemescan be used to encode numbers within certain ranges in DSCwhich subsequently requires the use of different computationalelements Specifically a DSS can encode a signal within [0 1]in the unipolar representation or within [minus1 1] in the bipolaror sign-magnitude representation For the sign-magnitude rep-resentation two sequences are used to encode a signal Oneencodes the magnitude of the signal Since the signal can haveboth positive and negative values another sequence is used torepresent the signs of the numbers

C Stochastic circuits

The DSC circuits can be combinational or sequential1) Combinational circuits A combinational circuit using

DSSrsquos as its inputs implements the composite function of theoriginal stochastic function For example an AND gate or aunipolar stochastic multiplier computes zi = xiyi for eachpair of bits in the input DSSrsquos that independently encode xiand yi (i = 0 1 2 ) respectively [46] due to E[Zi] =E[XiYi] = E[Xi]E[Yi] = xiyi where Zi Xi and Yi are theith bits in the output and input DSSrsquos respectively Note thatfor a continuous signal oversampling is required to producethe discrete signals xi and yi for accurate multiplication [46]Fig 14(a) shows the DSC circuit for a signal multiplicationand (b) shows the results of multiplying two sinusoidal signals

DSNG Signal Xx(t)

DSNG

DSNG

Signal reconstruction

X

Y

x(t)

y(t)ziZ

(a)

0 02 04 06 08 1 12 14 160

05

1

x(t)

Original signalDSS

0 02 04 06 08 1 12 14 160

05

1

y(t)

0 05 1 15t (s)

0

05

1

z(t)

Double precisionDynamic stochastic

(b)

Fig 14 A frequency mixer by using a stochastic multiplierwith DSSrsquos as the inputs (a) circuit and (b) results

with frequencies of 15 Hz and 2 Hz both sampled at a rateof 210 Hz The sampled signals are converted to DSSrsquos by twoDSNGs with independent Sobol sequence generators [47] asthe RNGs The output DSS is reconstructed to a digital signalby a moving average [28] The DSS can also be reconstructedby an ADDIE [1]

2) Sequential circuits The stochastic integrator is an im-portant component used in DSC for iterative accumulation Itimplements integrations of the signal encoded by accumulatingthe stochastic bits in a DSS Fig 15 shows that a bipolarstochastic integrator can be used to perform numerical inte-gration In Fig 15(a) the input digital signal is cos(πt2)sampled at a sampling rate of 28 Hz and converted to a DSSby the DSNG The other input of the stochastic integrator isset to a bipolar stochastic sequence encoding 0 (a stochasticbit stream with half of the bits being lsquo1rsquo) By accumulatingthe input bits the stochastic integrator provides an unbiasedestimate to the Riemann sum [48] The integration is obtainedby recording the changing values of the counter Since theintegrated results are directly provided by the counter areconstruction unit is not required

As can be seen from Fig 15(b) the results produced bythe stochastic integrator is very close to the analytical resultsSince the output DSS is generated by comparing the changingsignal (as the integration results) with uniformly distributedrandom numbers the RNG and the comparator inside theintegrator works as a DSNG As a result the output sequenceencodes the discretized stochastic integration solution for theanalytical solution (2π) sin(πt2) The output DSS is alsoshown in Fig 15(b)

MUX

01100110hellip

p1=05

10101010hellip

p2=05

00100010hellip

p=p1p2=025

01100101hellip

p1=05

11101110hellip

p2=075

sel

psel=05

00110110hellip

0

1 p=(1-psel)p1+pselp2=0625

01100111hellip (a)

(c)

10010000hellip

00101000hellip01000111hellip times2-1= -05

416

p1=

times2-1= -05 416

p2=

times2-1= 025 1016

p1p2=

(b) A bipolar stochastic multiplier

00101000hellip 11010111hellip

p1=025 p=1-p1=075

(b)

p1

p2

p=p1p2

1

1

p1

p2

(1 1)

AltB

AltB

RN1

RN2

Counter

(b) A bipolar stochastic multiplier

Stochastic number generator (SNG)

array

hellip

Binary inputs(from memory

devices)

hellip

Stochastic sequences

Stochastic computing

circuits

Stochastic sequences

Probability estimators

(PEs)

hellip hellip

Binary outputs

+

-

DSNGDigital signal

Bipolar stochastic

sequence encodinglsquo0rsquo

Integration results

Output DSS

(a)

0 05 1 15 2 25 3 35 4t

-1

-05

0

05

1

Analytical resultsResults by DSCOriginal signalOutput DSS

(b)

Fig 15 A bipolar stochastic integrator for numerical integra-tion (a) circuit and (b) results The output DSS is decimatedfor a clear view

The stochastic integrator has been used in a stochastic low-density parity-check (LDPC) decoder to implement a trackingforecast memory in [49] It solves the latch-up problem in thedecoding process with a low hardware complexity

For the FSM-based sequential circuits however the DSScannot work well because the probability consistently changesand a steady hyper-state is hard to reach The DSS could alsoviolate the condition of the model being a Markov chain dueto the autocorrelation of the encoded signal

IV APPLICATION OF DSC

By using the stochastic integrators DSC can implement aseries of algorithms that involve iterative accumulations suchas an IIR filter the Euler method and a GD algorithm

A IIR filtering

In [50] it is shown that an ADDIE with different parameterscan be used as a low-pass IIR filter with different frequencyresponses as shown in Fig 16 Given the bit width of thestochastic integrator n the transfer function of such a filter inthe Z-domain is given by

H(Z) =1

2nZ + 1minus 2n (3)

for a stable first-order low-pass IIR filter [50] An oversampled∆ minus Σ modulated bit stream is then used as the input of thecircuit Since the generation of a ∆minusΣ modulated bit stream

2n-state Counter INC

DEC

RNG

gtSeqout

A

BSeqout

(a) (b)

comparator

A

B

+

-

Countout

Countout

Seqout

(b)

+

-

Countout

X

+

-

p

y(t)

Fig 16 ADDIE circuit produces an exponential function whenthe input is a sequence encoding a static number

0 01 02 03 04t (s)

(a) Original and filtered signals

-1

0

1

0 50 100 150 200 250f (Hz)

(b) Spectra of original and filtered signals

0

02

04Original signalFiltered signal

Fig 17 The low-pass filtering results in (a) the time domainand (b) the frequency domain

can also be considered as a Bernoulli process [51] the DSScan be used for IIR filtering as well

Fig 17 shows the filtering of mixed sinusoidal signals of 4Hz and 196 Hz The mixture is first sampled at a sampling rateof 656k Hz and the sampled digital signal is then convertedto a DSS The DSS is filtered by an 8-bit ADDIE As can beseen the high-frequency signals are almost filtered out whilethe low-frequency signals remain

B ODE solvers

The Euler method provides numerical solutions for ODEsby accumulating the derivative functions [52] For an ODEdydt = f(t y(t)) the estimated solution y(t) is given by

y(ti+1) = yi+1 = yn + hf(ti yi) (4)

with h as the step size and ti+1 = ti + h After k steps ofaccumulation

yk = y0 + h

kminus1sumi=0

f(ti yi) (5)

To use a stochastic integrator let the pair of input DSSrsquos(eg A and B in (2)) encode the discrete derivative functionf(ti yi) such that E[Ai minus Bi] = f(ti yi) By doing sothe stochastic integrator implements (2) and approximatelycomputes (5) with a step size h = 12n As a result the bitwidth of the counter does not only decide the precision of thecomputation but also the time resolution The integration in theEuler method is implemented by accumulating the stochasticbits in the DSSrsquos instead of real values

Using this method an exponential function can be generatedby a stochastic integrator The ADDIE circuit as shown inFig 16 solves

dy(t)

dt= pminus y(t) (6)

which is the differential equation for a leaky integratormodel [53] Thus the ADDIE can be used to implement

neuronal dynamics in a leaky integrate-and-fire model Theanalytical solution of (6) is given by

y(t) = pminus (pminus y0)eminust (7)

where y0 is provided in the initial condition ie y(0) = y0This model is commonly seen in natural processes such asneuronal dynamics [53] Fig 18 shows the results producedby the ADDIE when y0 = 1 and p = 0

0 2 4 6 8t

0

02

04

06

08

1

y(t)

DSCAnalytical

Fig 18 ADDIE used as an ODE solver The solution is anexponential function

For a set of ODEsdy1(t)

dt = y2(t)minus 05dy2(t)

dt = 05minus y1(t)(8)

with the initial values y1(0) = 0 and y2(0) = 05 a numericalsolution can be found by the circuit in Fig 19 The upperintegrator computes the numerical estimate of y1(t) by takingthe output DSS from the lower integrator and a conventionalstochastic sequence encoding 05 as inputs So the differenceof the two input sequences approximately encodes y2(t)minus05the derivative of y1(t) Simultaneously the output DSS of theupper stochastic integrator encodes the numerical solution ofy1(t) which is used to solve the second differential equationin (8) Thus the stochastic integrator can be viewed as anunbiased Euler solution estimator [54] The results producedby the dynamic stochastic circuit is shown in Fig 20 8-bitupdown counters are used in both stochastic integrators andthe RNG is implemented by a Sobol sequence generator [47]Since the results are provided by the counter in the stochasticintegrator an explicit reconstruction unit as shown in Fig 12is not required

A partial differential equation such as a Laplacersquos equationcan also be solved by an array of stochastic integrators witha finite-difference method

C Weight update for an adaptive filterDSC can be applied to update the weights in an adaptive

digital filter An adaptive filter system consists of a linear filterand a weight update component as shown in Fig 21 Theoutput of the linear filter at time i is given by

yi = F (wixi) = wixi =sumMminus1

j=0wj

iximinusM+j+1 (9)

+

-

y(t)

1

z(t)

times2

+

-

Y

+

-

+

-

y1(t)

y2(t)

p=05Y1

Y2

(a) (b)

p=05(times2)

bi

ai

+

-

y(t)

1

z(t)

times2

+

-

+

-

+

-

y1(t)

y2(t)

p=05

p=05(times2)

bi

ai

+

-

1

0+

-0+

-0

16

y(t)= t3

y(t)=t

y(t)= t212

Fig 19 A pair of stochastic integrators solving (8) [54]

0 2 4 6 8 10 120

05

1

y 1(t

)

DSCAnalytical solution

0 2 4 6 8 10 12t

0

05

1

y 2(t

)Fig 20 Simulation results of the ODE solver in Fig 19 Theyare compared with the analytical solutions

where M is the length of the filter or the number of weightswi is a vector of M weights wi = [w

(0)i w

(1)i w

(Mminus1)i ]

and xi is the input vector for the digital signal sampled atdifferent time points xi = [ximinusM+1 ximinusM+2 xi]

T

+

-

y(xi)

F(wi xi)

F(wi xi) wi

ti

SNGpart F(wi xi)

part wijSNG

SNG

ai

bi

EEEE

Linear filter with weights wi

LMS weight update unit

xi-M+1xi-M+2

xi

hellip

yi +- ti+

ei

xi

+ -

xi-M+j

wij

xixi-M+1

+ -

yi

wi1

ti

RNG

+ -

wiM

RNG

hellip hellip

hellip hellip

hellip hellip

Comparator

+ - + - + -

wijwi1 wiM

xi-M+j+1 xixi-M+1

yi

ti

RNG

RNGhellip hellip

hellip hellip

hellip hellip

Comparator

Linear filter with parameters wi

Δwi

LMS weight updating unit

hellip

yi +- ti+

ei

hellip

Δwi

(Ci)

wij[Pi] = [12nCi] =

Fig 21 An adaptive filter

In the least-mean-square (LMS) algorithm the cost functionis the mean squared error between the filter output and theguidance signal ti The weights are updated by

wi+1 = wi + ηxi(ti minus yi) (10)

where η is the step size The update is implemented by thecircuit in Fig 22 The two multipliers are used to computexiti and xiyi respectively The integrator is then used toaccumulate ηxi(ti minus yi) over i = 0 1

The dynamic stochastic circuit is used to perform systemidentification for a high pass 103-tap FIR filter 103 weightsare trained by using randomly generated samples with theLMS algorithm 14-bit stochastic integrators are used Afteraround 340k iterations of training the misalignment of the

Actual classification results

Trained parameters

-

+

1 6 8 0

Estimate classification results (distribution)

SNGarray

Images

10 classes

00000hellip

11010hellip

00010hellip

Trained parameters

-

+

1 6 8 0DSNGarray

10 classes

times

times

ti

yi

xi

Fig 22 A DSC circuit training a weight in an adaptive filterThe stochastic multipliers are denoted by rectangles with ldquotimesrdquoBipolar and sign-magnitude representations can be used for thestochastic multipliers and integrator

weights in the adaptive filter converges to about -45 dBcompared to the target system The frequency responses ofthe target and the trained filter are shown in Fig 23 As canbe seen the adaptive filter has a similar frequency responseto the target system especially in the passband indicating anaccurate identification result

0 05 1 15 2 25 3Normalized frequency

-80

-60

-40

-20

0

Mag

nitu

de in

dB

target systemadaptive filter

Fig 23 Frequency responses of the target and trained filterusing the sign-magnitude representation

D A gradient descent (GD) circuit

GD is a simple yet effective optimization algorithm It findsthe local minima by iteratively taking steps along the negativesof the gradients at current points One-step GD is given by

wi+1 = wi minus ηnablaF (wi) (11)

where F (wi) is the function to be minimized at the ith stepnablaF (wi) denotes the current gradient and η is the step sizeThe step size is an indicator of how fast the model learns Asmall step size leads to a slow convergence while a large onecan lead to divergence

Similar to the implementation of the Euler method let apair of differential DSSrsquos encode the gradient minusnablaF (wi)the stochastic integrator can be used to compute (11) with astep size η = 12n by using these DSSrsquos as inputs DSC canthen be used to perform the optimization of the weights in aneural network (NN) by minimizing the cost function

The training of an NN usually involves two phases Take afully connected NN as an example shown in Fig 24 In theforward propagation the sum of product of the input vector xand the weight vector w is obtained as v = wx The output

Inputs Hidden layers

Outputs

(a)

hellip

x

w o=f(wx)

hellip

(b)

(c)

wl+1

δl+1o wl

hellip

δwl = f`wl+1δl+1

hellip

Random number generator (RNG)

p

Stochastic sequence with probability p

0101100hellip

Comparator

B

AAltB

N

N

(a)

CLK

01100110hellip

p1=05

10101010hellip

p2=05

00100010hellip

p=p1p2=025

(a) A stochastic multiplier for the unipolar representation

0100010010010000hellip

1000100010001000hellip0011001111100111hellip

times2-1= -05 4

16p1=

times2-1= -05 416

p2=

times2-1= 025 1016

p1p2=

(b) A stochastic multiplier for the bipolar representation

01100110hellip

|p1|=05

10101010hellip

|p2|=05

00100010hellip

|p|=|p1||p2|=025

(c) A stochastic multiplier for the sign-magnitude representation

|p1|=05

Sign(p1)=1(negative)

Sign(p2)=0(positve)

Sign(p)=1(negative)

Fig 24 (a) A fully connected NN in which the neuronsare organized layer by layer (b) The computation duringthe forward propagation (c) The local gradient for wl iscomputed by using the backward propagation The output ofrom previous neuron is required to compute the gradient

of the neuron o is then computed by o = f(v) where f isthe activation function During this phase the weights do notchange In the backward propagation the difference betweenthe actual and the desired final outputs ie the classificationresults is calculated and the local gradients are computed foreach neuron based on the difference [55] Take the neuronin Fig 24(c) as an example the local gradient is obtainedby δl = f prime(v)wl+1δl+1 where f prime(middot) is the derivative of theactivation function wl+1 is the weight vector connecting layerl and l+1 and δl+1 is the local gradient vector of the neuronsin layer l + 1 Then the gradient of a weight at layer l iscomputed by multiplying the local gradient with the recordedoutput of the designated neuron ie nablaF (wl) = minusoδl

In [56] it is found that the local gradient can be decomposedinto two parts δ = δ+ minus δminus where δ+ is contributed bythe desired output of the NN and δminus by the actual outputof the NN So the gradient for weight w can be rewritten asnablaF (w) = minuso(δ+minusδminus) Similar to the implementation of (10)the DSC circuit in Fig 22 can be used to implement the GDalgorithm to perform the online training (one sample at a time)of the fully connected NN with the input signals δ+ o and δminusHowever a backpropagation ie δ = f prime(v)wl+1δl+1 is stillcomputed by using a conventional method (eg a fixed-pointarithmetic circuit) to obtain the local gradients Otherwisethe accuracy loss would be too much for the algorithm toconverge

Fig 25 shows the convergence of cross entropy (as a costfunction) and testing accuracy for the training of a 784-128-128-10 NN with the MNIST handwritten digit dataset As canbe seen the testing accuracy of the NN trained by the DSCcircuit using the sign-magnitude representation is similar to theone produced by the fixed-point implementation (around 97)However the accuracy of the DSC implementation using thebipolar representation is relatively low

The proposed circuitry can be useful for online learningwhere real-time interaction with the environment is requiredwith an energy constraint [57] It can also be used to train amachine learning model using private or security-critical dataon mobile devices if data are sensitive and cannot be uploaded

to a cloud computer

V ERROR REDUCTION AND ASSESSMENT

One major issue in SC is the loss of accuracy [58] Indepen-dent RNGs have been used to avoid correlation and improvethe accuracy of components such as stochastic multipliersThis method inevitably increases the hardware overheadHowever in a stochastic integrator the sharing of RNGs forgenerating the two input sequences reduces the variance thusreducing the error This is shown as follows

The error introduced by DSC at each step of computation isrelated to the variance of the difference of the two accumulatedstochastic bits AiminusBi When independent RNGs are used togenerate Ai and Bi the variance at a single step is given by

Var[Ai minusBi] = E[(Ai minusBi minus E(Ai minusBi))2]

= E[(Ai minusBi)2]minus 2E[Ai minusBi]

(PAi minus PBi) + (PAi minus PBi)2

(12)

where PAi and PBi are the probabilities of Ai and Bi beinglsquo1rsquo respectively ie E[Ai] = PAi and E[Bi] = PBi Since(AiminusBi)

2 is 0 when Ai and Bi are equal or 1 when they aredifferent E[(AiminusBi)

2] equals PAi(1minusPBi)+PBi(1minusPAi)which is the probability that Ai and Bi are different when Ai

and Bi are independently generated1 (12) is then rewritten as

Varind[Ai minusBi] = PAi(1minus PBi) + PBi(1minus PAi)

minus (PAi minus PBi)2

= PAi(1minus PAi) + PBi(1minus PBi)

(13)

On the other hand if the same RNG is used to generate Ai

and Bi the variance of Ai minusBi can be derived from (12) aswell However since only a shared random number is used togenerate Ai and Bi E[(AiminusBi)

2] equals |PAiminusPBi| which

1E[(AiminusBi)2] = Prob[(AiminusBi)

2 = 1]times1+Prob[(AiminusBi)2 = 0]times0

0 5 10 15 20Epoch

0

01

02

03

04

05

Cro

ss e

ntro

py

Signed DSCBipolar DSCFixed-point

(a)

0 5 10 15 20Epoch

88

90

92

94

96

98

Test

ing

accu

racy

(in

perc

enta

ge)

Signed DSCBipolar DSCFixed-point

(b)

Fig 25 (a) Convergence of cross entropy and (b) testingaccuracy of MNIST hand-written digit recognition datasetCross entropy is the cost function to be minimized during thetraining of an NN The weights of the NN are updated by theDSC circuits The DSC circuits using the sign-magnitude andbipolar representations and the fixed-point implementation areconsidered for comparison

gives the probability that Ai and Bi are different Thus (12)can be rewritten as

Varshare[Ai minusBi] = |PAi minus PBi| minus (PAi minus PBi)2

= |PAi minus PBi|(1minus |PAi minus PBi|)(14)

Since Varshare[Ai minus Bi] minus Varind[Ai minus Bi] = 2PBiPAi minus2min(PAi PBi) le 0 we obtain Varshare[Ai minus Bi] leVarind[AiminusBi] for any i = 0 1 2 [54] Therefore sharingthe use of RNGs to generate input stochastic bit streamsimproves the accuracy

For the circuit in Fig 22 though the inputs of the stochasticintegrator are the outputs of the stochastic multipliers insteadof being directly generated by a DSNG the RNG sharingscheme still works in the generated DSSrsquos encoding δ+ andδminus ie it produces a smaller variance in the computed result[56] However an uncorrelated RNG must be used to generatethe DSS encoding the recorded output of the neuron obecause independency is still required in general for accuratestochastic multiplication

The accuracy of the accumulated result is also related to thebit-width of the counter in the stochastic integrator The bitwidth does not only determine the resolution of the result butalso the step size especially in the ODE solver adaptive filterand the gradient descent circuit A larger bit width indicatesa smaller resolution and a smaller step size for fine tuning atthe cost of extra computation time It is shown in [56] that themulti-step variance bound exponentially decreases with the bitwidth of the counter However for the IIR filter the changeof bit-width affects the characteristic of the filter

For the ODE solver circuit it is also found that the use ofquasirandom numbers improves the accuracy compared to theuse of pseudorandom numbers [54] However when the DSS isused to encode signals from the input or images such as in theadaptive filter and the gradient descent algorithm quasirandomand pseudorandom numbers produce similar accuracy Thereason may lie in the fact that the values of the encodedsignals in these two applications are independently selectedfrom the training dataset However for the ODE solver thesignals encoded are samples from continuous functions andadjacent samples have similar values As a result withina short segment of the DSS in this case it approximatelyencodes the same value It then works similarly as CSCfor which the Sobol sequence helps improving the accuracyFig 26 compares the convergence of misalignment duringthe training of the adaptive filter using pseudorandom andquasirandom numbers 16-bit stochastic integrators are usedin this experiment The misalignment is given by

Misalignment =E[(w minus w)2]

E[w2] (15)

where w denotes the target value and w is the trained resultIt can be seen that the convergence speed of using these twotypes of random numbers is similar and their misalignmentsboth converge to around -52 dB

TABLE II Hardware evaluation of the gradient descent circuitarray training a 784-128-128-10 neural network

Metrics DSC Fixed-point Ratio

Step size 2minus10 2minus10 -Epochs 20 20 -Min time (ns) 16times 106 47times 106 13EPO (fJ) 12times 107 11times 108 192TPA (imagesmicrom2) 57times 101 15 381Aver test Accu 9704 9749 -

TABLE III Hardware performance comparison of the ODEsolvers for (8)

Metric DSC Fixed-point Ratio

Step size 2minus8 2minus8 -Iterations 3217 3217 -Min time (ns) 257359 855720 133EPO (fJ) 20121 46600 123TPA (wordmicrosmicrom2) 475 058 821RMSE 47times 10minus3 36times 10minus3 -

VI HARDWARE EFFICIENCY EVALUATION

Due to the extremely low sequence length used for encodinga value in DSC the power consumption of DSC circuits ismuch lower than conventional arithmetic circuits Moreoversince each bit in a DSS is used to encode a number DSCis more energy-efficient than CSC The gradient descent andODE solver circuits are considered as examples to show thehardware efficiency Implemented in VHDL the circuits aresynthesized by Synopsys Design Compiler with a 28-nm tech-nology to obtain the power consumption maximum frequencyand hardware measurements The minimum runtime energyper operation (EPO) throughput per area (TPA) and accuracyare reported in Table II for the gradient descent circuits andin Table III for ODE solvers

As can be seen with a slightly degraded accuracy theDSC circuits outperform conventional circuits using a fixed-point representation in speed energy efficiency and hardwareefficiency For the gradient descent circuit since the two RNGscan be shared among the 118282 stochastic integrators (for118282 weights) the improvement in hardware efficiency iseven more significant than the ODE solver for which one RNG

0 05 1 15 2 25 3Number of steps ( 64 input samples) 104

-60

-50

-40

-30

-20

-10

0

Mis

alig

nmen

t (dB

)

PseudorandomQuasirandom

Fig 26 Comparison of the convergence of misalignment usingpseudorandom and quasirandom numbers

is shared between two stochastic integrators However sincethe sharing of RNGs does not significantly affect the criticalpath the performance improvements are similar for these twocircuits

VII CONCLUSION

In this article a new type of stochastic computing dynamicstochastic computing (DSC) is introduced to encode a varyingsignal by using a dynamically variable binary bit streamreferred to as a dynamic stochastic sequence (DSS) A DSCcircuit can efficiently implement an iterative accumulation-based algorithm It is useful in implementing the Euler methodtraining a neural network and an adaptive filter and IIRfiltering In those applications integrations are implementedby stochastic integrators that accumulate the stochastic bits in aDSS Since the computation is performed on single stochasticbits instead of conventional binary values or long stochasticsequences a significant improvement in hardware efficiencyis achieved with a high accuracy

ACKNOWLEDGMENT

This work was supported by the Natural Sciences andEngineering Research Council of Canada (NSERC) underProjects RES0025211 and RES0048688

REFERENCES

[1] B R Gaines Stochastic Computing Systems Boston MA SpringerUS 1969 pp 37ndash172

[2] W Qian X Li M D Riedel K Bazargan and D J Lilja ldquoAnarchitecture for fault-tolerant computation with stochastic logicrdquo IEEETransactions on Computers vol 60 no 1 pp 93ndash105 2011

[3] P Li and D J Lilja ldquoUsing stochastic computing to implementdigital image processing algorithmsrdquo in 2011 IEEE 29th InternationalConference on Computer Design (ICCD) 2011 pp 154ndash161

[4] S C Smithson N Onizawa B H Meyer W J Gross and T HanyuldquoEfficient CMOS invertible logic using stochastic computingrdquo IEEETransactions on Circuits and Systems I Regular Papers vol 66 no 6pp 2263ndash2274 June 2019

[5] W Maass and C M Bishop Pulsed neural networks MIT press 2001[6] Y Liu S Liu Y Wang F Lombardi and J Han ldquoA stochastic

computational multi-layer perceptron with backward propagationrdquo IEEETransactions on Computers vol 67 no 9 pp 1273ndash1286 2018

[7] J Zhao J Shawe-Taylor and M van Daalen ldquoLearning in stochastic bitstream neural networksrdquo Neural Networks vol 9 no 6 pp 991ndash9981996

[8] B D Brown and H C Card ldquoStochastic neural computation Icomputational elementsrdquo IEEE Transactions on Computers vol 50no 9 pp 891ndash905 2001

[9] D Zhang and H Li ldquoA stochastic-based FPGA controller for aninduction motor drive with integrated neural network algorithmsrdquo IEEETransactions on Industrial Electronics vol 55 no 2 pp 551ndash561 2008

[10] Y Ji F Ran C Ma and D J Lilja ldquoA hardware implementationof a radial basis function neural network using stochastic logicrdquo inProceedings of the Design Automation amp Test in Europe Conference(DATE) 2015 pp 880ndash883

[11] V Canals A Morro A Oliver M L Alomar and J L RosselloldquoA new stochastic computing methodology for efficient neural networkimplementationrdquo IEEE Transactions on Neural Networks and LearningSystems vol 27 no 3 pp 551ndash564 2016

[12] A Ardakani F Leduc-Primeau N Onizawa T Hanyu and W J GrossldquoVLSI implementation of deep neural network using integral stochasticcomputingrdquo IEEE Transactions on Very Large Scale Integration (VLSI)Systems vol 25 no 10 pp 2688ndash2699 2017

[13] A Ren Z Li C Ding Q Qiu Y Wang J Li X Qian and B YuanldquoSC-DCNN Highly-scalable deep convolutional neural network us-ing stochastic computingrdquo in Proceedings of the 22nd InternationalConference on Architectural Support for Programming Languages andOperating Systems (ASPLOS) 2017 pp 405ndash418

[14] V T Lee A Alaghi J P Hayes V Sathe and L Ceze ldquoEnergy-efficienthybrid stochastic-binary neural networks for near-sensor computingrdquo inDesign Automation amp Test in Europe Conference amp Exhibition (DATE)2017 pp 13ndash18

[15] R Hojabr K Givaki S Tayaranian P Esfahanian A Khonsari D Rah-mati and M H Najafi ldquoSkippyNN An embedded stochastic-computingaccelerator for convolutional neural networksrdquo in Proceedings of the56th Annual Design Automation Conference (DAC) 2019 pp 1ndash6

[16] Y Liu L Liu F Lombardi and J Han ldquoAn energy-efficient and noise-tolerant recurrent neural network using stochastic computingrdquo IEEETransactions on Very Large Scale Integration (VLSI) Systems vol 27no 9 pp 2213ndash2221 Sep 2019

[17] R Cai A Ren O Chen N Liu C Ding X Qian J Han W LuoN Yoshikawa and Y Wang ldquoA stochastic-computing based deeplearning framework using adiabatic quantum-flux-parametron supercon-ducting technologyrdquo in Proceedings of the 46th International Symposiumon Computer Architecture ser ISCA rsquo19 New York NY USA ACM2019 pp 567ndash578

[18] A Alaghi and J P Hayes ldquoSurvey of stochastic computingrdquo ACMTrans Embed Comput Syst vol 12 no 2s pp 921ndash9219 May 2013

[19] A Zhakatayev S Lee H Sim and J Lee ldquoSign-magnitude SC getting10x accuracy for free in stochastic computing for deep neural networksrdquoin Proceedings of the 55th Annual Design Automation Conference(DAC) 2018 pp 1ndash6

[20] P K Gupta and R Kumaresan ldquoBinary multiplication with PN se-quencesrdquo IEEE Transactions on Acoustics Speech and Signal Process-ing vol 36 no 4 pp 603ndash606 1988

[21] A Alaghi and J P Hayes ldquoFast and accurate computation using stochas-tic circuitsrdquo in Proceedings of the conference on Design Automation ampTest in Europe (DATE) 2014 pp 1ndash4

[22] S Liu and J Han ldquoEnergy efficient stochastic computing with Sobolsequencesrdquo in Design Automation amp Test in Europe Conference ampExhibition (DATE) 2017 pp 650ndash653

[23] H Niederreiter Random Number Generation and quasi-Monte CarloMethods Philadelphia PA USA Society for Industrial and AppliedMathematics 1992

[24] P Bratley and B L Fox ldquoAlgorithm 659 Implementing Sobolrsquosquasirandom sequence generatorrdquo ACM Transactions on MathematicalSoftware (TOMS) vol 14 no 1 pp 88ndash100 1988

[25] S Liu and J Han ldquoToward energy-efficient stochastic circuits usingparallel sobol sequencesrdquo IEEE Transactions on Very Large ScaleIntegration (VLSI) Systems vol 26 no 7 pp 1326ndash1339 July 2018

[26] A Alaghi and J P Hayes ldquoSTRAUSS Spectral transform use instochastic circuit synthesisrdquo IEEE Transactions on Computer-AidedDesign of Integrated Circuits and Systems vol 34 no 11 pp 1770ndash1783 2015

[27] P Li D J Lilja W Qian K Bazargan and M Riedel ldquoThe synthesisof complex arithmetic computation on stochastic bit streams usingsequential logicrdquo in Proceedings of the International Conference onComputer-Aided Design (ICCAD) 2012 pp 480ndash487

[28] P Mars and W J Poppelbaum Stochastic and deterministic averagingprocessors Peter Peregrinus Press 1981 no 1

[29] Z Li A Ren J Li Q Qiu Y Wang and B Yuan ldquoDSCNN Hardware-oriented optimization for stochastic computing based deep convolutionalneural networksrdquo in IEEE 34th International Conference on ComputerDesign (ICCD) 2016 pp 678ndash681

[30] A Ren Z Li Y Wang Q Qiu and B Yuan ldquoDesigning reconfigurablelarge-scale deep learning systems using stochastic computingrdquo in IEEEInternational Conference on Rebooting Computing (ICRC) 2016 pp1ndash7

[31] K Kim J Kim J Yu J Seo J Lee and K Choi ldquoDynamic energy-accuracy trade-off using stochastic computing in deep neural networksrdquoin Proceedings of the 53rd Annual Design Automation Conference(DAC) 2016 pp 1241ndash1246

[32] H Sim and J Lee ldquoA new stochastic computing multiplier withapplication to deep convolutional neural networksrdquo in Proceedings ofthe 54th Annual Design Automation Conference (DAC) 2017 pp 1ndash6

[33] Y Liu Y Wang F Lombardi and J Han ldquoAn energy-efficient online-learning stochastic computational deep belief networkrdquo IEEE Journal

on Emerging and Selected Topics in Circuits and Systems vol 8 no 3pp 454ndash465 Sep 2018

[34] J F Keane and L E Atlas ldquoImpulses and stochastic arithmetic forsignal processingrdquo in IEEE International Conference on AcousticsSpeech and Signal Processing (ICASSP) vol 2 2001 pp 1257ndash1260

[35] R Kuehnel ldquoBinomial logic extending stochastic computing to high-bandwidth signalsrdquo in Conference Record of the Thirty-Sixth AsilomarConference on Signals Systems and Computers vol 2 2002 pp 1089ndash1093

[36] N Yamamoto H Fujisaka K Haeiwa and T Kamio ldquoNanoelectroniccircuits for stochastic computingrdquo in 6th IEEE Conference on Nanotech-nology (IEEE-NANO) vol 1 2006 pp 306ndash309

[37] K K Parhi and Y Liu ldquoArchitectures for IIR digital filters usingstochastic computingrdquo in IEEE International Symposium on Circuitsand Systems (ISCAS) 2014 pp 373ndash376

[38] B Yuan Y Wang and Z Wang ldquoArea-efficient scaling-free DFTFFTdesign using stochastic computingrdquo IEEE Transactions on Circuits andSystems II Express Briefs vol 63 no 12 pp 1131ndash1135 2016

[39] R Wang J Han B F Cockburn and D G Elliott ldquoStochastic circuitdesign and performance evaluation of vector quantization for differenterror measuresrdquo IEEE Transactions on Very Large Scale Integration(VLSI) Systems vol 24 no 10 pp 3169ndash3183 2016

[40] H Ichihara T Sugino S Ishii T Iwagaki and T Inoue ldquoCompactand accurate digital filters based on stochastic computingrdquo IEEE Trans-actions on Emerging Topics in Computing vol 7 no 1 pp 31ndash432016

[41] T Hammadou M Nilson A Bermak and P Ogunbona ldquoA 96spltimes64 intelligent digital pixel array with extended binary stochasticarithmeticrdquo in Proceedings of the International Symposium on Circuitsand Systems (ISCAS) 2003

[42] A Alaghi C Li and J P Hayes ldquoStochastic circuits for real-timeimage-processing applicationsrdquo in Proceedings of the 50th AnnualDesign Automation Conference (DAC) 2013 pp 1361ndash1366

[43] P Li D J Lilja W Qian K Bazargan and M D Riedel ldquoComputationon stochastic bit streams digital image processing case studiesrdquo IEEETransactions on Very Large Scale Integration (VLSI) Systems vol 22no 3 pp 449ndash462 2014

[44] M H Najafi S Jamali-Zavareh D J Lilja M D Riedel K Bazarganand R Harjani ldquoTime-encoded values for highly efficient stochasticcircuitsrdquo IEEE Transactions on Very Large Scale Integration (VLSI)Systems vol 25 no 5 pp 1644ndash1657 2017

[45] S R Faraji M Hassan Najafi B Li D J Lilja and K BazarganldquoEnergy-efficient convolutional neural networks with deterministic bit-stream processingrdquo in 2019 Design Automation Test in Europe Confer-ence Exhibition (DATE) 2019 pp 1757ndash1762

[46] S Liu and J Han ldquoDynamic stochastic computing for digital signalprocessing applicationsrdquo in Design Automation amp Test in EuropeConference amp Exhibition (DATE) 2020

[47] I L Dalal D Stefan and J Harwayne-Gidansky ldquoLow discrepancysequences for Monte Carlo simulations on reconfigurable platformsrdquo inInternational Conference on Application-Specific Systems Architecturesand Processors (ASAP) 2008 pp 108ndash113

[48] W G McCallum D Hughes-Hallett A M Gleason D O LomenD Lovelock J Tecosky-Feldman T W Tucker D Flath T ThrashK R Rhea et al Multivariable calculus Wiley 1997 vol 200 no 1

[49] S S Tehrani A Naderi G-A Kamendje S Hemati S Mannor andW J Gross ldquoMajority-based tracking forecast memories for stochasticLDPC decodingrdquo IEEE Transactions on Signal Processing vol 58no 9 pp 4883ndash4896 2010

[50] N Saraf K Bazargan D J Lilja and M D Riedel ldquoIIR filtersusing stochastic arithmeticrdquo in Design Automation amp Test in EuropeConference amp Exhibition (DATE) 2014 pp 1ndash6

[51] F Maloberti ldquoNon conventional signal processing by the use of sigmadelta technique a tutorial introductionrdquo in Proceedings of IEEE Inter-national Symposium on Circuits and Systems (ISCAS) vol 6 1992 pp2645ndash2648

[52] J C Butcher The Numerical Analysis of Ordinary Differential Equa-tions Runge-Kutta and General Linear Methods USA Wiley-Interscience 1987

[53] W Gerstner W M Kistler R Naud and L Paninski Neuronaldynamics From single neurons to networks and models of cognitionCambridge University Press 2014

[54] S Liu and J Han ldquoHardware ODE solvers using stochastic circuitsrdquo inProceedings of the 54th Annual Design Automation Conference (DAC)2017 pp 1ndash6

[55] S S Haykin Neural networks and learning machines Pearson UpperSaddle River NJ USA 2009 vol 3

[56] S Liu H Jiang L Liu and J Han ldquoGradient descent using stochasticcircuits for efficient training of learning machinesrdquo IEEE Transactionson Computer-Aided Design of Integrated Circuits and Systems vol 37no 11 pp 2530ndash2541 Nov 2018

[57] R Nishihara P Moritz S Wang A Tumanov W Paul J Schleier-Smith R Liaw M Niknami M I Jordan and I StoicaldquoReal-time machine learning The missing piecesrdquo in Proceedingsof the 16th Workshop on Hot Topics in Operating Systemsser HotOS rsquo17 ACM 2017 pp 106ndash110 [Online] Availablehttpdoiacmorg10114531029803102998

[58] J P Hayes ldquoIntroduction to stochastic computing and its challengesrdquo inProceedings of the 52nd Annual Design Automation Conference (DAC)2015 pp 591ndash593

Page 5: Introduction to Dynamic Stochastic Computing

Digital signals (from ADCs or

storage)

DSNGsSignal

reconstruction To be stored or converted back to

analog signals

Stochastic logicDynamic

stochastic sequences

Dynamic stochastic sequences

Fig 12 A DSC system

RNGDynamic stochastic sequence

encoding signal xi

111010001011hellipComparator

B

AAltB

xi

Fig 13 A DSNG xi is an input to the comparator at onesample per clock cycle

A DSS generation

A DSS can be generated by a dynamic SNG (DSNG) asshown in Fig 13 It is similar to a conventional SNG exceptthat the input is a varying signal instead of a static number TheRNG generates uniformly distributed random numbers within[0 1] Given a discrete signal xi as shown in Fig 13 itis compared with a random number one sample at a timeIf xi is larger than the random number a lsquo1rsquo is generatedotherwise the DSNG generates a lsquo0rsquo Generating every bit inthe DSS is a Bernoulli process (in the ideally random case)so the expectation getting a lsquo1rsquo is equal to the correspondingsample value from the discrete digital signal The discretesignal can either be sampled from an analog signal or readfrom a memory device

B Data representations in DSC

Similar to CSC all the aforementioned mapping schemescan be used to encode numbers within certain ranges in DSCwhich subsequently requires the use of different computationalelements Specifically a DSS can encode a signal within [0 1]in the unipolar representation or within [minus1 1] in the bipolaror sign-magnitude representation For the sign-magnitude rep-resentation two sequences are used to encode a signal Oneencodes the magnitude of the signal Since the signal can haveboth positive and negative values another sequence is used torepresent the signs of the numbers

C Stochastic circuits

The DSC circuits can be combinational or sequential1) Combinational circuits A combinational circuit using

DSSrsquos as its inputs implements the composite function of theoriginal stochastic function For example an AND gate or aunipolar stochastic multiplier computes zi = xiyi for eachpair of bits in the input DSSrsquos that independently encode xiand yi (i = 0 1 2 ) respectively [46] due to E[Zi] =E[XiYi] = E[Xi]E[Yi] = xiyi where Zi Xi and Yi are theith bits in the output and input DSSrsquos respectively Note thatfor a continuous signal oversampling is required to producethe discrete signals xi and yi for accurate multiplication [46]Fig 14(a) shows the DSC circuit for a signal multiplicationand (b) shows the results of multiplying two sinusoidal signals

DSNG Signal Xx(t)

DSNG

DSNG

Signal reconstruction

X

Y

x(t)

y(t)ziZ

(a)

0 02 04 06 08 1 12 14 160

05

1

x(t)

Original signalDSS

0 02 04 06 08 1 12 14 160

05

1

y(t)

0 05 1 15t (s)

0

05

1

z(t)

Double precisionDynamic stochastic

(b)

Fig 14 A frequency mixer by using a stochastic multiplierwith DSSrsquos as the inputs (a) circuit and (b) results

with frequencies of 15 Hz and 2 Hz both sampled at a rateof 210 Hz The sampled signals are converted to DSSrsquos by twoDSNGs with independent Sobol sequence generators [47] asthe RNGs The output DSS is reconstructed to a digital signalby a moving average [28] The DSS can also be reconstructedby an ADDIE [1]

2) Sequential circuits The stochastic integrator is an im-portant component used in DSC for iterative accumulation Itimplements integrations of the signal encoded by accumulatingthe stochastic bits in a DSS Fig 15 shows that a bipolarstochastic integrator can be used to perform numerical inte-gration In Fig 15(a) the input digital signal is cos(πt2)sampled at a sampling rate of 28 Hz and converted to a DSSby the DSNG The other input of the stochastic integrator isset to a bipolar stochastic sequence encoding 0 (a stochasticbit stream with half of the bits being lsquo1rsquo) By accumulatingthe input bits the stochastic integrator provides an unbiasedestimate to the Riemann sum [48] The integration is obtainedby recording the changing values of the counter Since theintegrated results are directly provided by the counter areconstruction unit is not required

As can be seen from Fig 15(b) the results produced bythe stochastic integrator is very close to the analytical resultsSince the output DSS is generated by comparing the changingsignal (as the integration results) with uniformly distributedrandom numbers the RNG and the comparator inside theintegrator works as a DSNG As a result the output sequenceencodes the discretized stochastic integration solution for theanalytical solution (2π) sin(πt2) The output DSS is alsoshown in Fig 15(b)

MUX

01100110hellip

p1=05

10101010hellip

p2=05

00100010hellip

p=p1p2=025

01100101hellip

p1=05

11101110hellip

p2=075

sel

psel=05

00110110hellip

0

1 p=(1-psel)p1+pselp2=0625

01100111hellip (a)

(c)

10010000hellip

00101000hellip01000111hellip times2-1= -05

416

p1=

times2-1= -05 416

p2=

times2-1= 025 1016

p1p2=

(b) A bipolar stochastic multiplier

00101000hellip 11010111hellip

p1=025 p=1-p1=075

(b)

p1

p2

p=p1p2

1

1

p1

p2

(1 1)

AltB

AltB

RN1

RN2

Counter

(b) A bipolar stochastic multiplier

Stochastic number generator (SNG)

array

hellip

Binary inputs(from memory

devices)

hellip

Stochastic sequences

Stochastic computing

circuits

Stochastic sequences

Probability estimators

(PEs)

hellip hellip

Binary outputs

+

-

DSNGDigital signal

Bipolar stochastic

sequence encodinglsquo0rsquo

Integration results

Output DSS

(a)

0 05 1 15 2 25 3 35 4t

-1

-05

0

05

1

Analytical resultsResults by DSCOriginal signalOutput DSS

(b)

Fig 15 A bipolar stochastic integrator for numerical integra-tion (a) circuit and (b) results The output DSS is decimatedfor a clear view

The stochastic integrator has been used in a stochastic low-density parity-check (LDPC) decoder to implement a trackingforecast memory in [49] It solves the latch-up problem in thedecoding process with a low hardware complexity

For the FSM-based sequential circuits however the DSScannot work well because the probability consistently changesand a steady hyper-state is hard to reach The DSS could alsoviolate the condition of the model being a Markov chain dueto the autocorrelation of the encoded signal

IV APPLICATION OF DSC

By using the stochastic integrators DSC can implement aseries of algorithms that involve iterative accumulations suchas an IIR filter the Euler method and a GD algorithm

A IIR filtering

In [50] it is shown that an ADDIE with different parameterscan be used as a low-pass IIR filter with different frequencyresponses as shown in Fig 16 Given the bit width of thestochastic integrator n the transfer function of such a filter inthe Z-domain is given by

H(Z) =1

2nZ + 1minus 2n (3)

for a stable first-order low-pass IIR filter [50] An oversampled∆ minus Σ modulated bit stream is then used as the input of thecircuit Since the generation of a ∆minusΣ modulated bit stream

2n-state Counter INC

DEC

RNG

gtSeqout

A

BSeqout

(a) (b)

comparator

A

B

+

-

Countout

Countout

Seqout

(b)

+

-

Countout

X

+

-

p

y(t)

Fig 16 ADDIE circuit produces an exponential function whenthe input is a sequence encoding a static number

0 01 02 03 04t (s)

(a) Original and filtered signals

-1

0

1

0 50 100 150 200 250f (Hz)

(b) Spectra of original and filtered signals

0

02

04Original signalFiltered signal

Fig 17 The low-pass filtering results in (a) the time domainand (b) the frequency domain

can also be considered as a Bernoulli process [51] the DSScan be used for IIR filtering as well

Fig 17 shows the filtering of mixed sinusoidal signals of 4Hz and 196 Hz The mixture is first sampled at a sampling rateof 656k Hz and the sampled digital signal is then convertedto a DSS The DSS is filtered by an 8-bit ADDIE As can beseen the high-frequency signals are almost filtered out whilethe low-frequency signals remain

B ODE solvers

The Euler method provides numerical solutions for ODEsby accumulating the derivative functions [52] For an ODEdydt = f(t y(t)) the estimated solution y(t) is given by

y(ti+1) = yi+1 = yn + hf(ti yi) (4)

with h as the step size and ti+1 = ti + h After k steps ofaccumulation

yk = y0 + h

kminus1sumi=0

f(ti yi) (5)

To use a stochastic integrator let the pair of input DSSrsquos(eg A and B in (2)) encode the discrete derivative functionf(ti yi) such that E[Ai minus Bi] = f(ti yi) By doing sothe stochastic integrator implements (2) and approximatelycomputes (5) with a step size h = 12n As a result the bitwidth of the counter does not only decide the precision of thecomputation but also the time resolution The integration in theEuler method is implemented by accumulating the stochasticbits in the DSSrsquos instead of real values

Using this method an exponential function can be generatedby a stochastic integrator The ADDIE circuit as shown inFig 16 solves

dy(t)

dt= pminus y(t) (6)

which is the differential equation for a leaky integratormodel [53] Thus the ADDIE can be used to implement

neuronal dynamics in a leaky integrate-and-fire model Theanalytical solution of (6) is given by

y(t) = pminus (pminus y0)eminust (7)

where y0 is provided in the initial condition ie y(0) = y0This model is commonly seen in natural processes such asneuronal dynamics [53] Fig 18 shows the results producedby the ADDIE when y0 = 1 and p = 0

0 2 4 6 8t

0

02

04

06

08

1

y(t)

DSCAnalytical

Fig 18 ADDIE used as an ODE solver The solution is anexponential function

For a set of ODEsdy1(t)

dt = y2(t)minus 05dy2(t)

dt = 05minus y1(t)(8)

with the initial values y1(0) = 0 and y2(0) = 05 a numericalsolution can be found by the circuit in Fig 19 The upperintegrator computes the numerical estimate of y1(t) by takingthe output DSS from the lower integrator and a conventionalstochastic sequence encoding 05 as inputs So the differenceof the two input sequences approximately encodes y2(t)minus05the derivative of y1(t) Simultaneously the output DSS of theupper stochastic integrator encodes the numerical solution ofy1(t) which is used to solve the second differential equationin (8) Thus the stochastic integrator can be viewed as anunbiased Euler solution estimator [54] The results producedby the dynamic stochastic circuit is shown in Fig 20 8-bitupdown counters are used in both stochastic integrators andthe RNG is implemented by a Sobol sequence generator [47]Since the results are provided by the counter in the stochasticintegrator an explicit reconstruction unit as shown in Fig 12is not required

A partial differential equation such as a Laplacersquos equationcan also be solved by an array of stochastic integrators witha finite-difference method

C Weight update for an adaptive filterDSC can be applied to update the weights in an adaptive

digital filter An adaptive filter system consists of a linear filterand a weight update component as shown in Fig 21 Theoutput of the linear filter at time i is given by

yi = F (wixi) = wixi =sumMminus1

j=0wj

iximinusM+j+1 (9)

+

-

y(t)

1

z(t)

times2

+

-

Y

+

-

+

-

y1(t)

y2(t)

p=05Y1

Y2

(a) (b)

p=05(times2)

bi

ai

+

-

y(t)

1

z(t)

times2

+

-

+

-

+

-

y1(t)

y2(t)

p=05

p=05(times2)

bi

ai

+

-

1

0+

-0+

-0

16

y(t)= t3

y(t)=t

y(t)= t212

Fig 19 A pair of stochastic integrators solving (8) [54]

0 2 4 6 8 10 120

05

1

y 1(t

)

DSCAnalytical solution

0 2 4 6 8 10 12t

0

05

1

y 2(t

)Fig 20 Simulation results of the ODE solver in Fig 19 Theyare compared with the analytical solutions

where M is the length of the filter or the number of weightswi is a vector of M weights wi = [w

(0)i w

(1)i w

(Mminus1)i ]

and xi is the input vector for the digital signal sampled atdifferent time points xi = [ximinusM+1 ximinusM+2 xi]

T

+

-

y(xi)

F(wi xi)

F(wi xi) wi

ti

SNGpart F(wi xi)

part wijSNG

SNG

ai

bi

EEEE

Linear filter with weights wi

LMS weight update unit

xi-M+1xi-M+2

xi

hellip

yi +- ti+

ei

xi

+ -

xi-M+j

wij

xixi-M+1

+ -

yi

wi1

ti

RNG

+ -

wiM

RNG

hellip hellip

hellip hellip

hellip hellip

Comparator

+ - + - + -

wijwi1 wiM

xi-M+j+1 xixi-M+1

yi

ti

RNG

RNGhellip hellip

hellip hellip

hellip hellip

Comparator

Linear filter with parameters wi

Δwi

LMS weight updating unit

hellip

yi +- ti+

ei

hellip

Δwi

(Ci)

wij[Pi] = [12nCi] =

Fig 21 An adaptive filter

In the least-mean-square (LMS) algorithm the cost functionis the mean squared error between the filter output and theguidance signal ti The weights are updated by

wi+1 = wi + ηxi(ti minus yi) (10)

where η is the step size The update is implemented by thecircuit in Fig 22 The two multipliers are used to computexiti and xiyi respectively The integrator is then used toaccumulate ηxi(ti minus yi) over i = 0 1

The dynamic stochastic circuit is used to perform systemidentification for a high pass 103-tap FIR filter 103 weightsare trained by using randomly generated samples with theLMS algorithm 14-bit stochastic integrators are used Afteraround 340k iterations of training the misalignment of the

Actual classification results

Trained parameters

-

+

1 6 8 0

Estimate classification results (distribution)

SNGarray

Images

10 classes

00000hellip

11010hellip

00010hellip

Trained parameters

-

+

1 6 8 0DSNGarray

10 classes

times

times

ti

yi

xi

Fig 22 A DSC circuit training a weight in an adaptive filterThe stochastic multipliers are denoted by rectangles with ldquotimesrdquoBipolar and sign-magnitude representations can be used for thestochastic multipliers and integrator

weights in the adaptive filter converges to about -45 dBcompared to the target system The frequency responses ofthe target and the trained filter are shown in Fig 23 As canbe seen the adaptive filter has a similar frequency responseto the target system especially in the passband indicating anaccurate identification result

0 05 1 15 2 25 3Normalized frequency

-80

-60

-40

-20

0

Mag

nitu

de in

dB

target systemadaptive filter

Fig 23 Frequency responses of the target and trained filterusing the sign-magnitude representation

D A gradient descent (GD) circuit

GD is a simple yet effective optimization algorithm It findsthe local minima by iteratively taking steps along the negativesof the gradients at current points One-step GD is given by

wi+1 = wi minus ηnablaF (wi) (11)

where F (wi) is the function to be minimized at the ith stepnablaF (wi) denotes the current gradient and η is the step sizeThe step size is an indicator of how fast the model learns Asmall step size leads to a slow convergence while a large onecan lead to divergence

Similar to the implementation of the Euler method let apair of differential DSSrsquos encode the gradient minusnablaF (wi)the stochastic integrator can be used to compute (11) with astep size η = 12n by using these DSSrsquos as inputs DSC canthen be used to perform the optimization of the weights in aneural network (NN) by minimizing the cost function

The training of an NN usually involves two phases Take afully connected NN as an example shown in Fig 24 In theforward propagation the sum of product of the input vector xand the weight vector w is obtained as v = wx The output

Inputs Hidden layers

Outputs

(a)

hellip

x

w o=f(wx)

hellip

(b)

(c)

wl+1

δl+1o wl

hellip

δwl = f`wl+1δl+1

hellip

Random number generator (RNG)

p

Stochastic sequence with probability p

0101100hellip

Comparator

B

AAltB

N

N

(a)

CLK

01100110hellip

p1=05

10101010hellip

p2=05

00100010hellip

p=p1p2=025

(a) A stochastic multiplier for the unipolar representation

0100010010010000hellip

1000100010001000hellip0011001111100111hellip

times2-1= -05 4

16p1=

times2-1= -05 416

p2=

times2-1= 025 1016

p1p2=

(b) A stochastic multiplier for the bipolar representation

01100110hellip

|p1|=05

10101010hellip

|p2|=05

00100010hellip

|p|=|p1||p2|=025

(c) A stochastic multiplier for the sign-magnitude representation

|p1|=05

Sign(p1)=1(negative)

Sign(p2)=0(positve)

Sign(p)=1(negative)

Fig 24 (a) A fully connected NN in which the neuronsare organized layer by layer (b) The computation duringthe forward propagation (c) The local gradient for wl iscomputed by using the backward propagation The output ofrom previous neuron is required to compute the gradient

of the neuron o is then computed by o = f(v) where f isthe activation function During this phase the weights do notchange In the backward propagation the difference betweenthe actual and the desired final outputs ie the classificationresults is calculated and the local gradients are computed foreach neuron based on the difference [55] Take the neuronin Fig 24(c) as an example the local gradient is obtainedby δl = f prime(v)wl+1δl+1 where f prime(middot) is the derivative of theactivation function wl+1 is the weight vector connecting layerl and l+1 and δl+1 is the local gradient vector of the neuronsin layer l + 1 Then the gradient of a weight at layer l iscomputed by multiplying the local gradient with the recordedoutput of the designated neuron ie nablaF (wl) = minusoδl

In [56] it is found that the local gradient can be decomposedinto two parts δ = δ+ minus δminus where δ+ is contributed bythe desired output of the NN and δminus by the actual outputof the NN So the gradient for weight w can be rewritten asnablaF (w) = minuso(δ+minusδminus) Similar to the implementation of (10)the DSC circuit in Fig 22 can be used to implement the GDalgorithm to perform the online training (one sample at a time)of the fully connected NN with the input signals δ+ o and δminusHowever a backpropagation ie δ = f prime(v)wl+1δl+1 is stillcomputed by using a conventional method (eg a fixed-pointarithmetic circuit) to obtain the local gradients Otherwisethe accuracy loss would be too much for the algorithm toconverge

Fig 25 shows the convergence of cross entropy (as a costfunction) and testing accuracy for the training of a 784-128-128-10 NN with the MNIST handwritten digit dataset As canbe seen the testing accuracy of the NN trained by the DSCcircuit using the sign-magnitude representation is similar to theone produced by the fixed-point implementation (around 97)However the accuracy of the DSC implementation using thebipolar representation is relatively low

The proposed circuitry can be useful for online learningwhere real-time interaction with the environment is requiredwith an energy constraint [57] It can also be used to train amachine learning model using private or security-critical dataon mobile devices if data are sensitive and cannot be uploaded

to a cloud computer

V ERROR REDUCTION AND ASSESSMENT

One major issue in SC is the loss of accuracy [58] Indepen-dent RNGs have been used to avoid correlation and improvethe accuracy of components such as stochastic multipliersThis method inevitably increases the hardware overheadHowever in a stochastic integrator the sharing of RNGs forgenerating the two input sequences reduces the variance thusreducing the error This is shown as follows

The error introduced by DSC at each step of computation isrelated to the variance of the difference of the two accumulatedstochastic bits AiminusBi When independent RNGs are used togenerate Ai and Bi the variance at a single step is given by

Var[Ai minusBi] = E[(Ai minusBi minus E(Ai minusBi))2]

= E[(Ai minusBi)2]minus 2E[Ai minusBi]

(PAi minus PBi) + (PAi minus PBi)2

(12)

where PAi and PBi are the probabilities of Ai and Bi beinglsquo1rsquo respectively ie E[Ai] = PAi and E[Bi] = PBi Since(AiminusBi)

2 is 0 when Ai and Bi are equal or 1 when they aredifferent E[(AiminusBi)

2] equals PAi(1minusPBi)+PBi(1minusPAi)which is the probability that Ai and Bi are different when Ai

and Bi are independently generated1 (12) is then rewritten as

Varind[Ai minusBi] = PAi(1minus PBi) + PBi(1minus PAi)

minus (PAi minus PBi)2

= PAi(1minus PAi) + PBi(1minus PBi)

(13)

On the other hand if the same RNG is used to generate Ai

and Bi the variance of Ai minusBi can be derived from (12) aswell However since only a shared random number is used togenerate Ai and Bi E[(AiminusBi)

2] equals |PAiminusPBi| which

1E[(AiminusBi)2] = Prob[(AiminusBi)

2 = 1]times1+Prob[(AiminusBi)2 = 0]times0

0 5 10 15 20Epoch

0

01

02

03

04

05

Cro

ss e

ntro

py

Signed DSCBipolar DSCFixed-point

(a)

0 5 10 15 20Epoch

88

90

92

94

96

98

Test

ing

accu

racy

(in

perc

enta

ge)

Signed DSCBipolar DSCFixed-point

(b)

Fig 25 (a) Convergence of cross entropy and (b) testingaccuracy of MNIST hand-written digit recognition datasetCross entropy is the cost function to be minimized during thetraining of an NN The weights of the NN are updated by theDSC circuits The DSC circuits using the sign-magnitude andbipolar representations and the fixed-point implementation areconsidered for comparison

gives the probability that Ai and Bi are different Thus (12)can be rewritten as

Varshare[Ai minusBi] = |PAi minus PBi| minus (PAi minus PBi)2

= |PAi minus PBi|(1minus |PAi minus PBi|)(14)

Since Varshare[Ai minus Bi] minus Varind[Ai minus Bi] = 2PBiPAi minus2min(PAi PBi) le 0 we obtain Varshare[Ai minus Bi] leVarind[AiminusBi] for any i = 0 1 2 [54] Therefore sharingthe use of RNGs to generate input stochastic bit streamsimproves the accuracy

For the circuit in Fig 22 though the inputs of the stochasticintegrator are the outputs of the stochastic multipliers insteadof being directly generated by a DSNG the RNG sharingscheme still works in the generated DSSrsquos encoding δ+ andδminus ie it produces a smaller variance in the computed result[56] However an uncorrelated RNG must be used to generatethe DSS encoding the recorded output of the neuron obecause independency is still required in general for accuratestochastic multiplication

The accuracy of the accumulated result is also related to thebit-width of the counter in the stochastic integrator The bitwidth does not only determine the resolution of the result butalso the step size especially in the ODE solver adaptive filterand the gradient descent circuit A larger bit width indicatesa smaller resolution and a smaller step size for fine tuning atthe cost of extra computation time It is shown in [56] that themulti-step variance bound exponentially decreases with the bitwidth of the counter However for the IIR filter the changeof bit-width affects the characteristic of the filter

For the ODE solver circuit it is also found that the use ofquasirandom numbers improves the accuracy compared to theuse of pseudorandom numbers [54] However when the DSS isused to encode signals from the input or images such as in theadaptive filter and the gradient descent algorithm quasirandomand pseudorandom numbers produce similar accuracy Thereason may lie in the fact that the values of the encodedsignals in these two applications are independently selectedfrom the training dataset However for the ODE solver thesignals encoded are samples from continuous functions andadjacent samples have similar values As a result withina short segment of the DSS in this case it approximatelyencodes the same value It then works similarly as CSCfor which the Sobol sequence helps improving the accuracyFig 26 compares the convergence of misalignment duringthe training of the adaptive filter using pseudorandom andquasirandom numbers 16-bit stochastic integrators are usedin this experiment The misalignment is given by

Misalignment =E[(w minus w)2]

E[w2] (15)

where w denotes the target value and w is the trained resultIt can be seen that the convergence speed of using these twotypes of random numbers is similar and their misalignmentsboth converge to around -52 dB

TABLE II Hardware evaluation of the gradient descent circuitarray training a 784-128-128-10 neural network

Metrics DSC Fixed-point Ratio

Step size 2minus10 2minus10 -Epochs 20 20 -Min time (ns) 16times 106 47times 106 13EPO (fJ) 12times 107 11times 108 192TPA (imagesmicrom2) 57times 101 15 381Aver test Accu 9704 9749 -

TABLE III Hardware performance comparison of the ODEsolvers for (8)

Metric DSC Fixed-point Ratio

Step size 2minus8 2minus8 -Iterations 3217 3217 -Min time (ns) 257359 855720 133EPO (fJ) 20121 46600 123TPA (wordmicrosmicrom2) 475 058 821RMSE 47times 10minus3 36times 10minus3 -

VI HARDWARE EFFICIENCY EVALUATION

Due to the extremely low sequence length used for encodinga value in DSC the power consumption of DSC circuits ismuch lower than conventional arithmetic circuits Moreoversince each bit in a DSS is used to encode a number DSCis more energy-efficient than CSC The gradient descent andODE solver circuits are considered as examples to show thehardware efficiency Implemented in VHDL the circuits aresynthesized by Synopsys Design Compiler with a 28-nm tech-nology to obtain the power consumption maximum frequencyand hardware measurements The minimum runtime energyper operation (EPO) throughput per area (TPA) and accuracyare reported in Table II for the gradient descent circuits andin Table III for ODE solvers

As can be seen with a slightly degraded accuracy theDSC circuits outperform conventional circuits using a fixed-point representation in speed energy efficiency and hardwareefficiency For the gradient descent circuit since the two RNGscan be shared among the 118282 stochastic integrators (for118282 weights) the improvement in hardware efficiency iseven more significant than the ODE solver for which one RNG

0 05 1 15 2 25 3Number of steps ( 64 input samples) 104

-60

-50

-40

-30

-20

-10

0

Mis

alig

nmen

t (dB

)

PseudorandomQuasirandom

Fig 26 Comparison of the convergence of misalignment usingpseudorandom and quasirandom numbers

is shared between two stochastic integrators However sincethe sharing of RNGs does not significantly affect the criticalpath the performance improvements are similar for these twocircuits

VII CONCLUSION

In this article a new type of stochastic computing dynamicstochastic computing (DSC) is introduced to encode a varyingsignal by using a dynamically variable binary bit streamreferred to as a dynamic stochastic sequence (DSS) A DSCcircuit can efficiently implement an iterative accumulation-based algorithm It is useful in implementing the Euler methodtraining a neural network and an adaptive filter and IIRfiltering In those applications integrations are implementedby stochastic integrators that accumulate the stochastic bits in aDSS Since the computation is performed on single stochasticbits instead of conventional binary values or long stochasticsequences a significant improvement in hardware efficiencyis achieved with a high accuracy

ACKNOWLEDGMENT

This work was supported by the Natural Sciences andEngineering Research Council of Canada (NSERC) underProjects RES0025211 and RES0048688

REFERENCES

[1] B R Gaines Stochastic Computing Systems Boston MA SpringerUS 1969 pp 37ndash172

[2] W Qian X Li M D Riedel K Bazargan and D J Lilja ldquoAnarchitecture for fault-tolerant computation with stochastic logicrdquo IEEETransactions on Computers vol 60 no 1 pp 93ndash105 2011

[3] P Li and D J Lilja ldquoUsing stochastic computing to implementdigital image processing algorithmsrdquo in 2011 IEEE 29th InternationalConference on Computer Design (ICCD) 2011 pp 154ndash161

[4] S C Smithson N Onizawa B H Meyer W J Gross and T HanyuldquoEfficient CMOS invertible logic using stochastic computingrdquo IEEETransactions on Circuits and Systems I Regular Papers vol 66 no 6pp 2263ndash2274 June 2019

[5] W Maass and C M Bishop Pulsed neural networks MIT press 2001[6] Y Liu S Liu Y Wang F Lombardi and J Han ldquoA stochastic

computational multi-layer perceptron with backward propagationrdquo IEEETransactions on Computers vol 67 no 9 pp 1273ndash1286 2018

[7] J Zhao J Shawe-Taylor and M van Daalen ldquoLearning in stochastic bitstream neural networksrdquo Neural Networks vol 9 no 6 pp 991ndash9981996

[8] B D Brown and H C Card ldquoStochastic neural computation Icomputational elementsrdquo IEEE Transactions on Computers vol 50no 9 pp 891ndash905 2001

[9] D Zhang and H Li ldquoA stochastic-based FPGA controller for aninduction motor drive with integrated neural network algorithmsrdquo IEEETransactions on Industrial Electronics vol 55 no 2 pp 551ndash561 2008

[10] Y Ji F Ran C Ma and D J Lilja ldquoA hardware implementationof a radial basis function neural network using stochastic logicrdquo inProceedings of the Design Automation amp Test in Europe Conference(DATE) 2015 pp 880ndash883

[11] V Canals A Morro A Oliver M L Alomar and J L RosselloldquoA new stochastic computing methodology for efficient neural networkimplementationrdquo IEEE Transactions on Neural Networks and LearningSystems vol 27 no 3 pp 551ndash564 2016

[12] A Ardakani F Leduc-Primeau N Onizawa T Hanyu and W J GrossldquoVLSI implementation of deep neural network using integral stochasticcomputingrdquo IEEE Transactions on Very Large Scale Integration (VLSI)Systems vol 25 no 10 pp 2688ndash2699 2017

[13] A Ren Z Li C Ding Q Qiu Y Wang J Li X Qian and B YuanldquoSC-DCNN Highly-scalable deep convolutional neural network us-ing stochastic computingrdquo in Proceedings of the 22nd InternationalConference on Architectural Support for Programming Languages andOperating Systems (ASPLOS) 2017 pp 405ndash418

[14] V T Lee A Alaghi J P Hayes V Sathe and L Ceze ldquoEnergy-efficienthybrid stochastic-binary neural networks for near-sensor computingrdquo inDesign Automation amp Test in Europe Conference amp Exhibition (DATE)2017 pp 13ndash18

[15] R Hojabr K Givaki S Tayaranian P Esfahanian A Khonsari D Rah-mati and M H Najafi ldquoSkippyNN An embedded stochastic-computingaccelerator for convolutional neural networksrdquo in Proceedings of the56th Annual Design Automation Conference (DAC) 2019 pp 1ndash6

[16] Y Liu L Liu F Lombardi and J Han ldquoAn energy-efficient and noise-tolerant recurrent neural network using stochastic computingrdquo IEEETransactions on Very Large Scale Integration (VLSI) Systems vol 27no 9 pp 2213ndash2221 Sep 2019

[17] R Cai A Ren O Chen N Liu C Ding X Qian J Han W LuoN Yoshikawa and Y Wang ldquoA stochastic-computing based deeplearning framework using adiabatic quantum-flux-parametron supercon-ducting technologyrdquo in Proceedings of the 46th International Symposiumon Computer Architecture ser ISCA rsquo19 New York NY USA ACM2019 pp 567ndash578

[18] A Alaghi and J P Hayes ldquoSurvey of stochastic computingrdquo ACMTrans Embed Comput Syst vol 12 no 2s pp 921ndash9219 May 2013

[19] A Zhakatayev S Lee H Sim and J Lee ldquoSign-magnitude SC getting10x accuracy for free in stochastic computing for deep neural networksrdquoin Proceedings of the 55th Annual Design Automation Conference(DAC) 2018 pp 1ndash6

[20] P K Gupta and R Kumaresan ldquoBinary multiplication with PN se-quencesrdquo IEEE Transactions on Acoustics Speech and Signal Process-ing vol 36 no 4 pp 603ndash606 1988

[21] A Alaghi and J P Hayes ldquoFast and accurate computation using stochas-tic circuitsrdquo in Proceedings of the conference on Design Automation ampTest in Europe (DATE) 2014 pp 1ndash4

[22] S Liu and J Han ldquoEnergy efficient stochastic computing with Sobolsequencesrdquo in Design Automation amp Test in Europe Conference ampExhibition (DATE) 2017 pp 650ndash653

[23] H Niederreiter Random Number Generation and quasi-Monte CarloMethods Philadelphia PA USA Society for Industrial and AppliedMathematics 1992

[24] P Bratley and B L Fox ldquoAlgorithm 659 Implementing Sobolrsquosquasirandom sequence generatorrdquo ACM Transactions on MathematicalSoftware (TOMS) vol 14 no 1 pp 88ndash100 1988

[25] S Liu and J Han ldquoToward energy-efficient stochastic circuits usingparallel sobol sequencesrdquo IEEE Transactions on Very Large ScaleIntegration (VLSI) Systems vol 26 no 7 pp 1326ndash1339 July 2018

[26] A Alaghi and J P Hayes ldquoSTRAUSS Spectral transform use instochastic circuit synthesisrdquo IEEE Transactions on Computer-AidedDesign of Integrated Circuits and Systems vol 34 no 11 pp 1770ndash1783 2015

[27] P Li D J Lilja W Qian K Bazargan and M Riedel ldquoThe synthesisof complex arithmetic computation on stochastic bit streams usingsequential logicrdquo in Proceedings of the International Conference onComputer-Aided Design (ICCAD) 2012 pp 480ndash487

[28] P Mars and W J Poppelbaum Stochastic and deterministic averagingprocessors Peter Peregrinus Press 1981 no 1

[29] Z Li A Ren J Li Q Qiu Y Wang and B Yuan ldquoDSCNN Hardware-oriented optimization for stochastic computing based deep convolutionalneural networksrdquo in IEEE 34th International Conference on ComputerDesign (ICCD) 2016 pp 678ndash681

[30] A Ren Z Li Y Wang Q Qiu and B Yuan ldquoDesigning reconfigurablelarge-scale deep learning systems using stochastic computingrdquo in IEEEInternational Conference on Rebooting Computing (ICRC) 2016 pp1ndash7

[31] K Kim J Kim J Yu J Seo J Lee and K Choi ldquoDynamic energy-accuracy trade-off using stochastic computing in deep neural networksrdquoin Proceedings of the 53rd Annual Design Automation Conference(DAC) 2016 pp 1241ndash1246

[32] H Sim and J Lee ldquoA new stochastic computing multiplier withapplication to deep convolutional neural networksrdquo in Proceedings ofthe 54th Annual Design Automation Conference (DAC) 2017 pp 1ndash6

[33] Y Liu Y Wang F Lombardi and J Han ldquoAn energy-efficient online-learning stochastic computational deep belief networkrdquo IEEE Journal

on Emerging and Selected Topics in Circuits and Systems vol 8 no 3pp 454ndash465 Sep 2018

[34] J F Keane and L E Atlas ldquoImpulses and stochastic arithmetic forsignal processingrdquo in IEEE International Conference on AcousticsSpeech and Signal Processing (ICASSP) vol 2 2001 pp 1257ndash1260

[35] R Kuehnel ldquoBinomial logic extending stochastic computing to high-bandwidth signalsrdquo in Conference Record of the Thirty-Sixth AsilomarConference on Signals Systems and Computers vol 2 2002 pp 1089ndash1093

[36] N Yamamoto H Fujisaka K Haeiwa and T Kamio ldquoNanoelectroniccircuits for stochastic computingrdquo in 6th IEEE Conference on Nanotech-nology (IEEE-NANO) vol 1 2006 pp 306ndash309

[37] K K Parhi and Y Liu ldquoArchitectures for IIR digital filters usingstochastic computingrdquo in IEEE International Symposium on Circuitsand Systems (ISCAS) 2014 pp 373ndash376

[38] B Yuan Y Wang and Z Wang ldquoArea-efficient scaling-free DFTFFTdesign using stochastic computingrdquo IEEE Transactions on Circuits andSystems II Express Briefs vol 63 no 12 pp 1131ndash1135 2016

[39] R Wang J Han B F Cockburn and D G Elliott ldquoStochastic circuitdesign and performance evaluation of vector quantization for differenterror measuresrdquo IEEE Transactions on Very Large Scale Integration(VLSI) Systems vol 24 no 10 pp 3169ndash3183 2016

[40] H Ichihara T Sugino S Ishii T Iwagaki and T Inoue ldquoCompactand accurate digital filters based on stochastic computingrdquo IEEE Trans-actions on Emerging Topics in Computing vol 7 no 1 pp 31ndash432016

[41] T Hammadou M Nilson A Bermak and P Ogunbona ldquoA 96spltimes64 intelligent digital pixel array with extended binary stochasticarithmeticrdquo in Proceedings of the International Symposium on Circuitsand Systems (ISCAS) 2003

[42] A Alaghi C Li and J P Hayes ldquoStochastic circuits for real-timeimage-processing applicationsrdquo in Proceedings of the 50th AnnualDesign Automation Conference (DAC) 2013 pp 1361ndash1366

[43] P Li D J Lilja W Qian K Bazargan and M D Riedel ldquoComputationon stochastic bit streams digital image processing case studiesrdquo IEEETransactions on Very Large Scale Integration (VLSI) Systems vol 22no 3 pp 449ndash462 2014

[44] M H Najafi S Jamali-Zavareh D J Lilja M D Riedel K Bazarganand R Harjani ldquoTime-encoded values for highly efficient stochasticcircuitsrdquo IEEE Transactions on Very Large Scale Integration (VLSI)Systems vol 25 no 5 pp 1644ndash1657 2017

[45] S R Faraji M Hassan Najafi B Li D J Lilja and K BazarganldquoEnergy-efficient convolutional neural networks with deterministic bit-stream processingrdquo in 2019 Design Automation Test in Europe Confer-ence Exhibition (DATE) 2019 pp 1757ndash1762

[46] S Liu and J Han ldquoDynamic stochastic computing for digital signalprocessing applicationsrdquo in Design Automation amp Test in EuropeConference amp Exhibition (DATE) 2020

[47] I L Dalal D Stefan and J Harwayne-Gidansky ldquoLow discrepancysequences for Monte Carlo simulations on reconfigurable platformsrdquo inInternational Conference on Application-Specific Systems Architecturesand Processors (ASAP) 2008 pp 108ndash113

[48] W G McCallum D Hughes-Hallett A M Gleason D O LomenD Lovelock J Tecosky-Feldman T W Tucker D Flath T ThrashK R Rhea et al Multivariable calculus Wiley 1997 vol 200 no 1

[49] S S Tehrani A Naderi G-A Kamendje S Hemati S Mannor andW J Gross ldquoMajority-based tracking forecast memories for stochasticLDPC decodingrdquo IEEE Transactions on Signal Processing vol 58no 9 pp 4883ndash4896 2010

[50] N Saraf K Bazargan D J Lilja and M D Riedel ldquoIIR filtersusing stochastic arithmeticrdquo in Design Automation amp Test in EuropeConference amp Exhibition (DATE) 2014 pp 1ndash6

[51] F Maloberti ldquoNon conventional signal processing by the use of sigmadelta technique a tutorial introductionrdquo in Proceedings of IEEE Inter-national Symposium on Circuits and Systems (ISCAS) vol 6 1992 pp2645ndash2648

[52] J C Butcher The Numerical Analysis of Ordinary Differential Equa-tions Runge-Kutta and General Linear Methods USA Wiley-Interscience 1987

[53] W Gerstner W M Kistler R Naud and L Paninski Neuronaldynamics From single neurons to networks and models of cognitionCambridge University Press 2014

[54] S Liu and J Han ldquoHardware ODE solvers using stochastic circuitsrdquo inProceedings of the 54th Annual Design Automation Conference (DAC)2017 pp 1ndash6

[55] S S Haykin Neural networks and learning machines Pearson UpperSaddle River NJ USA 2009 vol 3

[56] S Liu H Jiang L Liu and J Han ldquoGradient descent using stochasticcircuits for efficient training of learning machinesrdquo IEEE Transactionson Computer-Aided Design of Integrated Circuits and Systems vol 37no 11 pp 2530ndash2541 Nov 2018

[57] R Nishihara P Moritz S Wang A Tumanov W Paul J Schleier-Smith R Liaw M Niknami M I Jordan and I StoicaldquoReal-time machine learning The missing piecesrdquo in Proceedingsof the 16th Workshop on Hot Topics in Operating Systemsser HotOS rsquo17 ACM 2017 pp 106ndash110 [Online] Availablehttpdoiacmorg10114531029803102998

[58] J P Hayes ldquoIntroduction to stochastic computing and its challengesrdquo inProceedings of the 52nd Annual Design Automation Conference (DAC)2015 pp 591ndash593

Page 6: Introduction to Dynamic Stochastic Computing

MUX

01100110hellip

p1=05

10101010hellip

p2=05

00100010hellip

p=p1p2=025

01100101hellip

p1=05

11101110hellip

p2=075

sel

psel=05

00110110hellip

0

1 p=(1-psel)p1+pselp2=0625

01100111hellip (a)

(c)

10010000hellip

00101000hellip01000111hellip times2-1= -05

416

p1=

times2-1= -05 416

p2=

times2-1= 025 1016

p1p2=

(b) A bipolar stochastic multiplier

00101000hellip 11010111hellip

p1=025 p=1-p1=075

(b)

p1

p2

p=p1p2

1

1

p1

p2

(1 1)

AltB

AltB

RN1

RN2

Counter

(b) A bipolar stochastic multiplier

Stochastic number generator (SNG)

array

hellip

Binary inputs(from memory

devices)

hellip

Stochastic sequences

Stochastic computing

circuits

Stochastic sequences

Probability estimators

(PEs)

hellip hellip

Binary outputs

+

-

DSNGDigital signal

Bipolar stochastic

sequence encodinglsquo0rsquo

Integration results

Output DSS

(a)

0 05 1 15 2 25 3 35 4t

-1

-05

0

05

1

Analytical resultsResults by DSCOriginal signalOutput DSS

(b)

Fig 15 A bipolar stochastic integrator for numerical integra-tion (a) circuit and (b) results The output DSS is decimatedfor a clear view

The stochastic integrator has been used in a stochastic low-density parity-check (LDPC) decoder to implement a trackingforecast memory in [49] It solves the latch-up problem in thedecoding process with a low hardware complexity

For the FSM-based sequential circuits however the DSScannot work well because the probability consistently changesand a steady hyper-state is hard to reach The DSS could alsoviolate the condition of the model being a Markov chain dueto the autocorrelation of the encoded signal

IV APPLICATION OF DSC

By using the stochastic integrators DSC can implement aseries of algorithms that involve iterative accumulations suchas an IIR filter the Euler method and a GD algorithm

A IIR filtering

In [50] it is shown that an ADDIE with different parameterscan be used as a low-pass IIR filter with different frequencyresponses as shown in Fig 16 Given the bit width of thestochastic integrator n the transfer function of such a filter inthe Z-domain is given by

H(Z) =1

2nZ + 1minus 2n (3)

for a stable first-order low-pass IIR filter [50] An oversampled∆ minus Σ modulated bit stream is then used as the input of thecircuit Since the generation of a ∆minusΣ modulated bit stream

2n-state Counter INC

DEC

RNG

gtSeqout

A

BSeqout

(a) (b)

comparator

A

B

+

-

Countout

Countout

Seqout

(b)

+

-

Countout

X

+

-

p

y(t)

Fig 16 ADDIE circuit produces an exponential function whenthe input is a sequence encoding a static number

0 01 02 03 04t (s)

(a) Original and filtered signals

-1

0

1

0 50 100 150 200 250f (Hz)

(b) Spectra of original and filtered signals

0

02

04Original signalFiltered signal

Fig 17 The low-pass filtering results in (a) the time domainand (b) the frequency domain

can also be considered as a Bernoulli process [51] the DSScan be used for IIR filtering as well

Fig 17 shows the filtering of mixed sinusoidal signals of 4Hz and 196 Hz The mixture is first sampled at a sampling rateof 656k Hz and the sampled digital signal is then convertedto a DSS The DSS is filtered by an 8-bit ADDIE As can beseen the high-frequency signals are almost filtered out whilethe low-frequency signals remain

B ODE solvers

The Euler method provides numerical solutions for ODEsby accumulating the derivative functions [52] For an ODEdydt = f(t y(t)) the estimated solution y(t) is given by

y(ti+1) = yi+1 = yn + hf(ti yi) (4)

with h as the step size and ti+1 = ti + h After k steps ofaccumulation

yk = y0 + h

kminus1sumi=0

f(ti yi) (5)

To use a stochastic integrator let the pair of input DSSrsquos(eg A and B in (2)) encode the discrete derivative functionf(ti yi) such that E[Ai minus Bi] = f(ti yi) By doing sothe stochastic integrator implements (2) and approximatelycomputes (5) with a step size h = 12n As a result the bitwidth of the counter does not only decide the precision of thecomputation but also the time resolution The integration in theEuler method is implemented by accumulating the stochasticbits in the DSSrsquos instead of real values

Using this method an exponential function can be generatedby a stochastic integrator The ADDIE circuit as shown inFig 16 solves

dy(t)

dt= pminus y(t) (6)

which is the differential equation for a leaky integratormodel [53] Thus the ADDIE can be used to implement

neuronal dynamics in a leaky integrate-and-fire model Theanalytical solution of (6) is given by

y(t) = pminus (pminus y0)eminust (7)

where y0 is provided in the initial condition ie y(0) = y0This model is commonly seen in natural processes such asneuronal dynamics [53] Fig 18 shows the results producedby the ADDIE when y0 = 1 and p = 0

0 2 4 6 8t

0

02

04

06

08

1

y(t)

DSCAnalytical

Fig 18 ADDIE used as an ODE solver The solution is anexponential function

For a set of ODEsdy1(t)

dt = y2(t)minus 05dy2(t)

dt = 05minus y1(t)(8)

with the initial values y1(0) = 0 and y2(0) = 05 a numericalsolution can be found by the circuit in Fig 19 The upperintegrator computes the numerical estimate of y1(t) by takingthe output DSS from the lower integrator and a conventionalstochastic sequence encoding 05 as inputs So the differenceof the two input sequences approximately encodes y2(t)minus05the derivative of y1(t) Simultaneously the output DSS of theupper stochastic integrator encodes the numerical solution ofy1(t) which is used to solve the second differential equationin (8) Thus the stochastic integrator can be viewed as anunbiased Euler solution estimator [54] The results producedby the dynamic stochastic circuit is shown in Fig 20 8-bitupdown counters are used in both stochastic integrators andthe RNG is implemented by a Sobol sequence generator [47]Since the results are provided by the counter in the stochasticintegrator an explicit reconstruction unit as shown in Fig 12is not required

A partial differential equation such as a Laplacersquos equationcan also be solved by an array of stochastic integrators witha finite-difference method

C Weight update for an adaptive filterDSC can be applied to update the weights in an adaptive

digital filter An adaptive filter system consists of a linear filterand a weight update component as shown in Fig 21 Theoutput of the linear filter at time i is given by

yi = F (wixi) = wixi =sumMminus1

j=0wj

iximinusM+j+1 (9)

+

-

y(t)

1

z(t)

times2

+

-

Y

+

-

+

-

y1(t)

y2(t)

p=05Y1

Y2

(a) (b)

p=05(times2)

bi

ai

+

-

y(t)

1

z(t)

times2

+

-

+

-

+

-

y1(t)

y2(t)

p=05

p=05(times2)

bi

ai

+

-

1

0+

-0+

-0

16

y(t)= t3

y(t)=t

y(t)= t212

Fig 19 A pair of stochastic integrators solving (8) [54]

0 2 4 6 8 10 120

05

1

y 1(t

)

DSCAnalytical solution

0 2 4 6 8 10 12t

0

05

1

y 2(t

)Fig 20 Simulation results of the ODE solver in Fig 19 Theyare compared with the analytical solutions

where M is the length of the filter or the number of weightswi is a vector of M weights wi = [w

(0)i w

(1)i w

(Mminus1)i ]

and xi is the input vector for the digital signal sampled atdifferent time points xi = [ximinusM+1 ximinusM+2 xi]

T

+

-

y(xi)

F(wi xi)

F(wi xi) wi

ti

SNGpart F(wi xi)

part wijSNG

SNG

ai

bi

EEEE

Linear filter with weights wi

LMS weight update unit

xi-M+1xi-M+2

xi

hellip

yi +- ti+

ei

xi

+ -

xi-M+j

wij

xixi-M+1

+ -

yi

wi1

ti

RNG

+ -

wiM

RNG

hellip hellip

hellip hellip

hellip hellip

Comparator

+ - + - + -

wijwi1 wiM

xi-M+j+1 xixi-M+1

yi

ti

RNG

RNGhellip hellip

hellip hellip

hellip hellip

Comparator

Linear filter with parameters wi

Δwi

LMS weight updating unit

hellip

yi +- ti+

ei

hellip

Δwi

(Ci)

wij[Pi] = [12nCi] =

Fig 21 An adaptive filter

In the least-mean-square (LMS) algorithm the cost functionis the mean squared error between the filter output and theguidance signal ti The weights are updated by

wi+1 = wi + ηxi(ti minus yi) (10)

where η is the step size The update is implemented by thecircuit in Fig 22 The two multipliers are used to computexiti and xiyi respectively The integrator is then used toaccumulate ηxi(ti minus yi) over i = 0 1

The dynamic stochastic circuit is used to perform systemidentification for a high pass 103-tap FIR filter 103 weightsare trained by using randomly generated samples with theLMS algorithm 14-bit stochastic integrators are used Afteraround 340k iterations of training the misalignment of the

Actual classification results

Trained parameters

-

+

1 6 8 0

Estimate classification results (distribution)

SNGarray

Images

10 classes

00000hellip

11010hellip

00010hellip

Trained parameters

-

+

1 6 8 0DSNGarray

10 classes

times

times

ti

yi

xi

Fig 22 A DSC circuit training a weight in an adaptive filterThe stochastic multipliers are denoted by rectangles with ldquotimesrdquoBipolar and sign-magnitude representations can be used for thestochastic multipliers and integrator

weights in the adaptive filter converges to about -45 dBcompared to the target system The frequency responses ofthe target and the trained filter are shown in Fig 23 As canbe seen the adaptive filter has a similar frequency responseto the target system especially in the passband indicating anaccurate identification result

0 05 1 15 2 25 3Normalized frequency

-80

-60

-40

-20

0

Mag

nitu

de in

dB

target systemadaptive filter

Fig 23 Frequency responses of the target and trained filterusing the sign-magnitude representation

D A gradient descent (GD) circuit

GD is a simple yet effective optimization algorithm It findsthe local minima by iteratively taking steps along the negativesof the gradients at current points One-step GD is given by

wi+1 = wi minus ηnablaF (wi) (11)

where F (wi) is the function to be minimized at the ith stepnablaF (wi) denotes the current gradient and η is the step sizeThe step size is an indicator of how fast the model learns Asmall step size leads to a slow convergence while a large onecan lead to divergence

Similar to the implementation of the Euler method let apair of differential DSSrsquos encode the gradient minusnablaF (wi)the stochastic integrator can be used to compute (11) with astep size η = 12n by using these DSSrsquos as inputs DSC canthen be used to perform the optimization of the weights in aneural network (NN) by minimizing the cost function

The training of an NN usually involves two phases Take afully connected NN as an example shown in Fig 24 In theforward propagation the sum of product of the input vector xand the weight vector w is obtained as v = wx The output

Inputs Hidden layers

Outputs

(a)

hellip

x

w o=f(wx)

hellip

(b)

(c)

wl+1

δl+1o wl

hellip

δwl = f`wl+1δl+1

hellip

Random number generator (RNG)

p

Stochastic sequence with probability p

0101100hellip

Comparator

B

AAltB

N

N

(a)

CLK

01100110hellip

p1=05

10101010hellip

p2=05

00100010hellip

p=p1p2=025

(a) A stochastic multiplier for the unipolar representation

0100010010010000hellip

1000100010001000hellip0011001111100111hellip

times2-1= -05 4

16p1=

times2-1= -05 416

p2=

times2-1= 025 1016

p1p2=

(b) A stochastic multiplier for the bipolar representation

01100110hellip

|p1|=05

10101010hellip

|p2|=05

00100010hellip

|p|=|p1||p2|=025

(c) A stochastic multiplier for the sign-magnitude representation

|p1|=05

Sign(p1)=1(negative)

Sign(p2)=0(positve)

Sign(p)=1(negative)

Fig 24 (a) A fully connected NN in which the neuronsare organized layer by layer (b) The computation duringthe forward propagation (c) The local gradient for wl iscomputed by using the backward propagation The output ofrom previous neuron is required to compute the gradient

of the neuron o is then computed by o = f(v) where f isthe activation function During this phase the weights do notchange In the backward propagation the difference betweenthe actual and the desired final outputs ie the classificationresults is calculated and the local gradients are computed foreach neuron based on the difference [55] Take the neuronin Fig 24(c) as an example the local gradient is obtainedby δl = f prime(v)wl+1δl+1 where f prime(middot) is the derivative of theactivation function wl+1 is the weight vector connecting layerl and l+1 and δl+1 is the local gradient vector of the neuronsin layer l + 1 Then the gradient of a weight at layer l iscomputed by multiplying the local gradient with the recordedoutput of the designated neuron ie nablaF (wl) = minusoδl

In [56] it is found that the local gradient can be decomposedinto two parts δ = δ+ minus δminus where δ+ is contributed bythe desired output of the NN and δminus by the actual outputof the NN So the gradient for weight w can be rewritten asnablaF (w) = minuso(δ+minusδminus) Similar to the implementation of (10)the DSC circuit in Fig 22 can be used to implement the GDalgorithm to perform the online training (one sample at a time)of the fully connected NN with the input signals δ+ o and δminusHowever a backpropagation ie δ = f prime(v)wl+1δl+1 is stillcomputed by using a conventional method (eg a fixed-pointarithmetic circuit) to obtain the local gradients Otherwisethe accuracy loss would be too much for the algorithm toconverge

Fig 25 shows the convergence of cross entropy (as a costfunction) and testing accuracy for the training of a 784-128-128-10 NN with the MNIST handwritten digit dataset As canbe seen the testing accuracy of the NN trained by the DSCcircuit using the sign-magnitude representation is similar to theone produced by the fixed-point implementation (around 97)However the accuracy of the DSC implementation using thebipolar representation is relatively low

The proposed circuitry can be useful for online learningwhere real-time interaction with the environment is requiredwith an energy constraint [57] It can also be used to train amachine learning model using private or security-critical dataon mobile devices if data are sensitive and cannot be uploaded

to a cloud computer

V ERROR REDUCTION AND ASSESSMENT

One major issue in SC is the loss of accuracy [58] Indepen-dent RNGs have been used to avoid correlation and improvethe accuracy of components such as stochastic multipliersThis method inevitably increases the hardware overheadHowever in a stochastic integrator the sharing of RNGs forgenerating the two input sequences reduces the variance thusreducing the error This is shown as follows

The error introduced by DSC at each step of computation isrelated to the variance of the difference of the two accumulatedstochastic bits AiminusBi When independent RNGs are used togenerate Ai and Bi the variance at a single step is given by

Var[Ai minusBi] = E[(Ai minusBi minus E(Ai minusBi))2]

= E[(Ai minusBi)2]minus 2E[Ai minusBi]

(PAi minus PBi) + (PAi minus PBi)2

(12)

where PAi and PBi are the probabilities of Ai and Bi beinglsquo1rsquo respectively ie E[Ai] = PAi and E[Bi] = PBi Since(AiminusBi)

2 is 0 when Ai and Bi are equal or 1 when they aredifferent E[(AiminusBi)

2] equals PAi(1minusPBi)+PBi(1minusPAi)which is the probability that Ai and Bi are different when Ai

and Bi are independently generated1 (12) is then rewritten as

Varind[Ai minusBi] = PAi(1minus PBi) + PBi(1minus PAi)

minus (PAi minus PBi)2

= PAi(1minus PAi) + PBi(1minus PBi)

(13)

On the other hand if the same RNG is used to generate Ai

and Bi the variance of Ai minusBi can be derived from (12) aswell However since only a shared random number is used togenerate Ai and Bi E[(AiminusBi)

2] equals |PAiminusPBi| which

1E[(AiminusBi)2] = Prob[(AiminusBi)

2 = 1]times1+Prob[(AiminusBi)2 = 0]times0

0 5 10 15 20Epoch

0

01

02

03

04

05

Cro

ss e

ntro

py

Signed DSCBipolar DSCFixed-point

(a)

0 5 10 15 20Epoch

88

90

92

94

96

98

Test

ing

accu

racy

(in

perc

enta

ge)

Signed DSCBipolar DSCFixed-point

(b)

Fig 25 (a) Convergence of cross entropy and (b) testingaccuracy of MNIST hand-written digit recognition datasetCross entropy is the cost function to be minimized during thetraining of an NN The weights of the NN are updated by theDSC circuits The DSC circuits using the sign-magnitude andbipolar representations and the fixed-point implementation areconsidered for comparison

gives the probability that Ai and Bi are different Thus (12)can be rewritten as

Varshare[Ai minusBi] = |PAi minus PBi| minus (PAi minus PBi)2

= |PAi minus PBi|(1minus |PAi minus PBi|)(14)

Since Varshare[Ai minus Bi] minus Varind[Ai minus Bi] = 2PBiPAi minus2min(PAi PBi) le 0 we obtain Varshare[Ai minus Bi] leVarind[AiminusBi] for any i = 0 1 2 [54] Therefore sharingthe use of RNGs to generate input stochastic bit streamsimproves the accuracy

For the circuit in Fig 22 though the inputs of the stochasticintegrator are the outputs of the stochastic multipliers insteadof being directly generated by a DSNG the RNG sharingscheme still works in the generated DSSrsquos encoding δ+ andδminus ie it produces a smaller variance in the computed result[56] However an uncorrelated RNG must be used to generatethe DSS encoding the recorded output of the neuron obecause independency is still required in general for accuratestochastic multiplication

The accuracy of the accumulated result is also related to thebit-width of the counter in the stochastic integrator The bitwidth does not only determine the resolution of the result butalso the step size especially in the ODE solver adaptive filterand the gradient descent circuit A larger bit width indicatesa smaller resolution and a smaller step size for fine tuning atthe cost of extra computation time It is shown in [56] that themulti-step variance bound exponentially decreases with the bitwidth of the counter However for the IIR filter the changeof bit-width affects the characteristic of the filter

For the ODE solver circuit it is also found that the use ofquasirandom numbers improves the accuracy compared to theuse of pseudorandom numbers [54] However when the DSS isused to encode signals from the input or images such as in theadaptive filter and the gradient descent algorithm quasirandomand pseudorandom numbers produce similar accuracy Thereason may lie in the fact that the values of the encodedsignals in these two applications are independently selectedfrom the training dataset However for the ODE solver thesignals encoded are samples from continuous functions andadjacent samples have similar values As a result withina short segment of the DSS in this case it approximatelyencodes the same value It then works similarly as CSCfor which the Sobol sequence helps improving the accuracyFig 26 compares the convergence of misalignment duringthe training of the adaptive filter using pseudorandom andquasirandom numbers 16-bit stochastic integrators are usedin this experiment The misalignment is given by

Misalignment =E[(w minus w)2]

E[w2] (15)

where w denotes the target value and w is the trained resultIt can be seen that the convergence speed of using these twotypes of random numbers is similar and their misalignmentsboth converge to around -52 dB

TABLE II Hardware evaluation of the gradient descent circuitarray training a 784-128-128-10 neural network

Metrics DSC Fixed-point Ratio

Step size 2minus10 2minus10 -Epochs 20 20 -Min time (ns) 16times 106 47times 106 13EPO (fJ) 12times 107 11times 108 192TPA (imagesmicrom2) 57times 101 15 381Aver test Accu 9704 9749 -

TABLE III Hardware performance comparison of the ODEsolvers for (8)

Metric DSC Fixed-point Ratio

Step size 2minus8 2minus8 -Iterations 3217 3217 -Min time (ns) 257359 855720 133EPO (fJ) 20121 46600 123TPA (wordmicrosmicrom2) 475 058 821RMSE 47times 10minus3 36times 10minus3 -

VI HARDWARE EFFICIENCY EVALUATION

Due to the extremely low sequence length used for encodinga value in DSC the power consumption of DSC circuits ismuch lower than conventional arithmetic circuits Moreoversince each bit in a DSS is used to encode a number DSCis more energy-efficient than CSC The gradient descent andODE solver circuits are considered as examples to show thehardware efficiency Implemented in VHDL the circuits aresynthesized by Synopsys Design Compiler with a 28-nm tech-nology to obtain the power consumption maximum frequencyand hardware measurements The minimum runtime energyper operation (EPO) throughput per area (TPA) and accuracyare reported in Table II for the gradient descent circuits andin Table III for ODE solvers

As can be seen with a slightly degraded accuracy theDSC circuits outperform conventional circuits using a fixed-point representation in speed energy efficiency and hardwareefficiency For the gradient descent circuit since the two RNGscan be shared among the 118282 stochastic integrators (for118282 weights) the improvement in hardware efficiency iseven more significant than the ODE solver for which one RNG

0 05 1 15 2 25 3Number of steps ( 64 input samples) 104

-60

-50

-40

-30

-20

-10

0

Mis

alig

nmen

t (dB

)

PseudorandomQuasirandom

Fig 26 Comparison of the convergence of misalignment usingpseudorandom and quasirandom numbers

is shared between two stochastic integrators However sincethe sharing of RNGs does not significantly affect the criticalpath the performance improvements are similar for these twocircuits

VII CONCLUSION

In this article a new type of stochastic computing dynamicstochastic computing (DSC) is introduced to encode a varyingsignal by using a dynamically variable binary bit streamreferred to as a dynamic stochastic sequence (DSS) A DSCcircuit can efficiently implement an iterative accumulation-based algorithm It is useful in implementing the Euler methodtraining a neural network and an adaptive filter and IIRfiltering In those applications integrations are implementedby stochastic integrators that accumulate the stochastic bits in aDSS Since the computation is performed on single stochasticbits instead of conventional binary values or long stochasticsequences a significant improvement in hardware efficiencyis achieved with a high accuracy

ACKNOWLEDGMENT

This work was supported by the Natural Sciences andEngineering Research Council of Canada (NSERC) underProjects RES0025211 and RES0048688

REFERENCES

[1] B R Gaines Stochastic Computing Systems Boston MA SpringerUS 1969 pp 37ndash172

[2] W Qian X Li M D Riedel K Bazargan and D J Lilja ldquoAnarchitecture for fault-tolerant computation with stochastic logicrdquo IEEETransactions on Computers vol 60 no 1 pp 93ndash105 2011

[3] P Li and D J Lilja ldquoUsing stochastic computing to implementdigital image processing algorithmsrdquo in 2011 IEEE 29th InternationalConference on Computer Design (ICCD) 2011 pp 154ndash161

[4] S C Smithson N Onizawa B H Meyer W J Gross and T HanyuldquoEfficient CMOS invertible logic using stochastic computingrdquo IEEETransactions on Circuits and Systems I Regular Papers vol 66 no 6pp 2263ndash2274 June 2019

[5] W Maass and C M Bishop Pulsed neural networks MIT press 2001[6] Y Liu S Liu Y Wang F Lombardi and J Han ldquoA stochastic

computational multi-layer perceptron with backward propagationrdquo IEEETransactions on Computers vol 67 no 9 pp 1273ndash1286 2018

[7] J Zhao J Shawe-Taylor and M van Daalen ldquoLearning in stochastic bitstream neural networksrdquo Neural Networks vol 9 no 6 pp 991ndash9981996

[8] B D Brown and H C Card ldquoStochastic neural computation Icomputational elementsrdquo IEEE Transactions on Computers vol 50no 9 pp 891ndash905 2001

[9] D Zhang and H Li ldquoA stochastic-based FPGA controller for aninduction motor drive with integrated neural network algorithmsrdquo IEEETransactions on Industrial Electronics vol 55 no 2 pp 551ndash561 2008

[10] Y Ji F Ran C Ma and D J Lilja ldquoA hardware implementationof a radial basis function neural network using stochastic logicrdquo inProceedings of the Design Automation amp Test in Europe Conference(DATE) 2015 pp 880ndash883

[11] V Canals A Morro A Oliver M L Alomar and J L RosselloldquoA new stochastic computing methodology for efficient neural networkimplementationrdquo IEEE Transactions on Neural Networks and LearningSystems vol 27 no 3 pp 551ndash564 2016

[12] A Ardakani F Leduc-Primeau N Onizawa T Hanyu and W J GrossldquoVLSI implementation of deep neural network using integral stochasticcomputingrdquo IEEE Transactions on Very Large Scale Integration (VLSI)Systems vol 25 no 10 pp 2688ndash2699 2017

[13] A Ren Z Li C Ding Q Qiu Y Wang J Li X Qian and B YuanldquoSC-DCNN Highly-scalable deep convolutional neural network us-ing stochastic computingrdquo in Proceedings of the 22nd InternationalConference on Architectural Support for Programming Languages andOperating Systems (ASPLOS) 2017 pp 405ndash418

[14] V T Lee A Alaghi J P Hayes V Sathe and L Ceze ldquoEnergy-efficienthybrid stochastic-binary neural networks for near-sensor computingrdquo inDesign Automation amp Test in Europe Conference amp Exhibition (DATE)2017 pp 13ndash18

[15] R Hojabr K Givaki S Tayaranian P Esfahanian A Khonsari D Rah-mati and M H Najafi ldquoSkippyNN An embedded stochastic-computingaccelerator for convolutional neural networksrdquo in Proceedings of the56th Annual Design Automation Conference (DAC) 2019 pp 1ndash6

[16] Y Liu L Liu F Lombardi and J Han ldquoAn energy-efficient and noise-tolerant recurrent neural network using stochastic computingrdquo IEEETransactions on Very Large Scale Integration (VLSI) Systems vol 27no 9 pp 2213ndash2221 Sep 2019

[17] R Cai A Ren O Chen N Liu C Ding X Qian J Han W LuoN Yoshikawa and Y Wang ldquoA stochastic-computing based deeplearning framework using adiabatic quantum-flux-parametron supercon-ducting technologyrdquo in Proceedings of the 46th International Symposiumon Computer Architecture ser ISCA rsquo19 New York NY USA ACM2019 pp 567ndash578

[18] A Alaghi and J P Hayes ldquoSurvey of stochastic computingrdquo ACMTrans Embed Comput Syst vol 12 no 2s pp 921ndash9219 May 2013

[19] A Zhakatayev S Lee H Sim and J Lee ldquoSign-magnitude SC getting10x accuracy for free in stochastic computing for deep neural networksrdquoin Proceedings of the 55th Annual Design Automation Conference(DAC) 2018 pp 1ndash6

[20] P K Gupta and R Kumaresan ldquoBinary multiplication with PN se-quencesrdquo IEEE Transactions on Acoustics Speech and Signal Process-ing vol 36 no 4 pp 603ndash606 1988

[21] A Alaghi and J P Hayes ldquoFast and accurate computation using stochas-tic circuitsrdquo in Proceedings of the conference on Design Automation ampTest in Europe (DATE) 2014 pp 1ndash4

[22] S Liu and J Han ldquoEnergy efficient stochastic computing with Sobolsequencesrdquo in Design Automation amp Test in Europe Conference ampExhibition (DATE) 2017 pp 650ndash653

[23] H Niederreiter Random Number Generation and quasi-Monte CarloMethods Philadelphia PA USA Society for Industrial and AppliedMathematics 1992

[24] P Bratley and B L Fox ldquoAlgorithm 659 Implementing Sobolrsquosquasirandom sequence generatorrdquo ACM Transactions on MathematicalSoftware (TOMS) vol 14 no 1 pp 88ndash100 1988

[25] S Liu and J Han ldquoToward energy-efficient stochastic circuits usingparallel sobol sequencesrdquo IEEE Transactions on Very Large ScaleIntegration (VLSI) Systems vol 26 no 7 pp 1326ndash1339 July 2018

[26] A Alaghi and J P Hayes ldquoSTRAUSS Spectral transform use instochastic circuit synthesisrdquo IEEE Transactions on Computer-AidedDesign of Integrated Circuits and Systems vol 34 no 11 pp 1770ndash1783 2015

[27] P Li D J Lilja W Qian K Bazargan and M Riedel ldquoThe synthesisof complex arithmetic computation on stochastic bit streams usingsequential logicrdquo in Proceedings of the International Conference onComputer-Aided Design (ICCAD) 2012 pp 480ndash487

[28] P Mars and W J Poppelbaum Stochastic and deterministic averagingprocessors Peter Peregrinus Press 1981 no 1

[29] Z Li A Ren J Li Q Qiu Y Wang and B Yuan ldquoDSCNN Hardware-oriented optimization for stochastic computing based deep convolutionalneural networksrdquo in IEEE 34th International Conference on ComputerDesign (ICCD) 2016 pp 678ndash681

[30] A Ren Z Li Y Wang Q Qiu and B Yuan ldquoDesigning reconfigurablelarge-scale deep learning systems using stochastic computingrdquo in IEEEInternational Conference on Rebooting Computing (ICRC) 2016 pp1ndash7

[31] K Kim J Kim J Yu J Seo J Lee and K Choi ldquoDynamic energy-accuracy trade-off using stochastic computing in deep neural networksrdquoin Proceedings of the 53rd Annual Design Automation Conference(DAC) 2016 pp 1241ndash1246

[32] H Sim and J Lee ldquoA new stochastic computing multiplier withapplication to deep convolutional neural networksrdquo in Proceedings ofthe 54th Annual Design Automation Conference (DAC) 2017 pp 1ndash6

[33] Y Liu Y Wang F Lombardi and J Han ldquoAn energy-efficient online-learning stochastic computational deep belief networkrdquo IEEE Journal

on Emerging and Selected Topics in Circuits and Systems vol 8 no 3pp 454ndash465 Sep 2018

[34] J F Keane and L E Atlas ldquoImpulses and stochastic arithmetic forsignal processingrdquo in IEEE International Conference on AcousticsSpeech and Signal Processing (ICASSP) vol 2 2001 pp 1257ndash1260

[35] R Kuehnel ldquoBinomial logic extending stochastic computing to high-bandwidth signalsrdquo in Conference Record of the Thirty-Sixth AsilomarConference on Signals Systems and Computers vol 2 2002 pp 1089ndash1093

[36] N Yamamoto H Fujisaka K Haeiwa and T Kamio ldquoNanoelectroniccircuits for stochastic computingrdquo in 6th IEEE Conference on Nanotech-nology (IEEE-NANO) vol 1 2006 pp 306ndash309

[37] K K Parhi and Y Liu ldquoArchitectures for IIR digital filters usingstochastic computingrdquo in IEEE International Symposium on Circuitsand Systems (ISCAS) 2014 pp 373ndash376

[38] B Yuan Y Wang and Z Wang ldquoArea-efficient scaling-free DFTFFTdesign using stochastic computingrdquo IEEE Transactions on Circuits andSystems II Express Briefs vol 63 no 12 pp 1131ndash1135 2016

[39] R Wang J Han B F Cockburn and D G Elliott ldquoStochastic circuitdesign and performance evaluation of vector quantization for differenterror measuresrdquo IEEE Transactions on Very Large Scale Integration(VLSI) Systems vol 24 no 10 pp 3169ndash3183 2016

[40] H Ichihara T Sugino S Ishii T Iwagaki and T Inoue ldquoCompactand accurate digital filters based on stochastic computingrdquo IEEE Trans-actions on Emerging Topics in Computing vol 7 no 1 pp 31ndash432016

[41] T Hammadou M Nilson A Bermak and P Ogunbona ldquoA 96spltimes64 intelligent digital pixel array with extended binary stochasticarithmeticrdquo in Proceedings of the International Symposium on Circuitsand Systems (ISCAS) 2003

[42] A Alaghi C Li and J P Hayes ldquoStochastic circuits for real-timeimage-processing applicationsrdquo in Proceedings of the 50th AnnualDesign Automation Conference (DAC) 2013 pp 1361ndash1366

[43] P Li D J Lilja W Qian K Bazargan and M D Riedel ldquoComputationon stochastic bit streams digital image processing case studiesrdquo IEEETransactions on Very Large Scale Integration (VLSI) Systems vol 22no 3 pp 449ndash462 2014

[44] M H Najafi S Jamali-Zavareh D J Lilja M D Riedel K Bazarganand R Harjani ldquoTime-encoded values for highly efficient stochasticcircuitsrdquo IEEE Transactions on Very Large Scale Integration (VLSI)Systems vol 25 no 5 pp 1644ndash1657 2017

[45] S R Faraji M Hassan Najafi B Li D J Lilja and K BazarganldquoEnergy-efficient convolutional neural networks with deterministic bit-stream processingrdquo in 2019 Design Automation Test in Europe Confer-ence Exhibition (DATE) 2019 pp 1757ndash1762

[46] S Liu and J Han ldquoDynamic stochastic computing for digital signalprocessing applicationsrdquo in Design Automation amp Test in EuropeConference amp Exhibition (DATE) 2020

[47] I L Dalal D Stefan and J Harwayne-Gidansky ldquoLow discrepancysequences for Monte Carlo simulations on reconfigurable platformsrdquo inInternational Conference on Application-Specific Systems Architecturesand Processors (ASAP) 2008 pp 108ndash113

[48] W G McCallum D Hughes-Hallett A M Gleason D O LomenD Lovelock J Tecosky-Feldman T W Tucker D Flath T ThrashK R Rhea et al Multivariable calculus Wiley 1997 vol 200 no 1

[49] S S Tehrani A Naderi G-A Kamendje S Hemati S Mannor andW J Gross ldquoMajority-based tracking forecast memories for stochasticLDPC decodingrdquo IEEE Transactions on Signal Processing vol 58no 9 pp 4883ndash4896 2010

[50] N Saraf K Bazargan D J Lilja and M D Riedel ldquoIIR filtersusing stochastic arithmeticrdquo in Design Automation amp Test in EuropeConference amp Exhibition (DATE) 2014 pp 1ndash6

[51] F Maloberti ldquoNon conventional signal processing by the use of sigmadelta technique a tutorial introductionrdquo in Proceedings of IEEE Inter-national Symposium on Circuits and Systems (ISCAS) vol 6 1992 pp2645ndash2648

[52] J C Butcher The Numerical Analysis of Ordinary Differential Equa-tions Runge-Kutta and General Linear Methods USA Wiley-Interscience 1987

[53] W Gerstner W M Kistler R Naud and L Paninski Neuronaldynamics From single neurons to networks and models of cognitionCambridge University Press 2014

[54] S Liu and J Han ldquoHardware ODE solvers using stochastic circuitsrdquo inProceedings of the 54th Annual Design Automation Conference (DAC)2017 pp 1ndash6

[55] S S Haykin Neural networks and learning machines Pearson UpperSaddle River NJ USA 2009 vol 3

[56] S Liu H Jiang L Liu and J Han ldquoGradient descent using stochasticcircuits for efficient training of learning machinesrdquo IEEE Transactionson Computer-Aided Design of Integrated Circuits and Systems vol 37no 11 pp 2530ndash2541 Nov 2018

[57] R Nishihara P Moritz S Wang A Tumanov W Paul J Schleier-Smith R Liaw M Niknami M I Jordan and I StoicaldquoReal-time machine learning The missing piecesrdquo in Proceedingsof the 16th Workshop on Hot Topics in Operating Systemsser HotOS rsquo17 ACM 2017 pp 106ndash110 [Online] Availablehttpdoiacmorg10114531029803102998

[58] J P Hayes ldquoIntroduction to stochastic computing and its challengesrdquo inProceedings of the 52nd Annual Design Automation Conference (DAC)2015 pp 591ndash593

Page 7: Introduction to Dynamic Stochastic Computing

neuronal dynamics in a leaky integrate-and-fire model Theanalytical solution of (6) is given by

y(t) = pminus (pminus y0)eminust (7)

where y0 is provided in the initial condition ie y(0) = y0This model is commonly seen in natural processes such asneuronal dynamics [53] Fig 18 shows the results producedby the ADDIE when y0 = 1 and p = 0

0 2 4 6 8t

0

02

04

06

08

1

y(t)

DSCAnalytical

Fig 18 ADDIE used as an ODE solver The solution is anexponential function

For a set of ODEsdy1(t)

dt = y2(t)minus 05dy2(t)

dt = 05minus y1(t)(8)

with the initial values y1(0) = 0 and y2(0) = 05 a numericalsolution can be found by the circuit in Fig 19 The upperintegrator computes the numerical estimate of y1(t) by takingthe output DSS from the lower integrator and a conventionalstochastic sequence encoding 05 as inputs So the differenceof the two input sequences approximately encodes y2(t)minus05the derivative of y1(t) Simultaneously the output DSS of theupper stochastic integrator encodes the numerical solution ofy1(t) which is used to solve the second differential equationin (8) Thus the stochastic integrator can be viewed as anunbiased Euler solution estimator [54] The results producedby the dynamic stochastic circuit is shown in Fig 20 8-bitupdown counters are used in both stochastic integrators andthe RNG is implemented by a Sobol sequence generator [47]Since the results are provided by the counter in the stochasticintegrator an explicit reconstruction unit as shown in Fig 12is not required

A partial differential equation such as a Laplacersquos equationcan also be solved by an array of stochastic integrators witha finite-difference method

C Weight update for an adaptive filterDSC can be applied to update the weights in an adaptive

digital filter An adaptive filter system consists of a linear filterand a weight update component as shown in Fig 21 Theoutput of the linear filter at time i is given by

yi = F (wixi) = wixi =sumMminus1

j=0wj

iximinusM+j+1 (9)

+

-

y(t)

1

z(t)

times2

+

-

Y

+

-

+

-

y1(t)

y2(t)

p=05Y1

Y2

(a) (b)

p=05(times2)

bi

ai

+

-

y(t)

1

z(t)

times2

+

-

+

-

+

-

y1(t)

y2(t)

p=05

p=05(times2)

bi

ai

+

-

1

0+

-0+

-0

16

y(t)= t3

y(t)=t

y(t)= t212

Fig 19 A pair of stochastic integrators solving (8) [54]

0 2 4 6 8 10 120

05

1

y 1(t

)

DSCAnalytical solution

0 2 4 6 8 10 12t

0

05

1

y 2(t

)Fig 20 Simulation results of the ODE solver in Fig 19 Theyare compared with the analytical solutions

where M is the length of the filter or the number of weightswi is a vector of M weights wi = [w

(0)i w

(1)i w

(Mminus1)i ]

and xi is the input vector for the digital signal sampled atdifferent time points xi = [ximinusM+1 ximinusM+2 xi]

T

+

-

y(xi)

F(wi xi)

F(wi xi) wi

ti

SNGpart F(wi xi)

part wijSNG

SNG

ai

bi

EEEE

Linear filter with weights wi

LMS weight update unit

xi-M+1xi-M+2

xi

hellip

yi +- ti+

ei

xi

+ -

xi-M+j

wij

xixi-M+1

+ -

yi

wi1

ti

RNG

+ -

wiM

RNG

hellip hellip

hellip hellip

hellip hellip

Comparator

+ - + - + -

wijwi1 wiM

xi-M+j+1 xixi-M+1

yi

ti

RNG

RNGhellip hellip

hellip hellip

hellip hellip

Comparator

Linear filter with parameters wi

Δwi

LMS weight updating unit

hellip

yi +- ti+

ei

hellip

Δwi

(Ci)

wij[Pi] = [12nCi] =

Fig 21 An adaptive filter

In the least-mean-square (LMS) algorithm the cost functionis the mean squared error between the filter output and theguidance signal ti The weights are updated by

wi+1 = wi + ηxi(ti minus yi) (10)

where η is the step size The update is implemented by thecircuit in Fig 22 The two multipliers are used to computexiti and xiyi respectively The integrator is then used toaccumulate ηxi(ti minus yi) over i = 0 1

The dynamic stochastic circuit is used to perform systemidentification for a high pass 103-tap FIR filter 103 weightsare trained by using randomly generated samples with theLMS algorithm 14-bit stochastic integrators are used Afteraround 340k iterations of training the misalignment of the

Actual classification results

Trained parameters

-

+

1 6 8 0

Estimate classification results (distribution)

SNGarray

Images

10 classes

00000hellip

11010hellip

00010hellip

Trained parameters

-

+

1 6 8 0DSNGarray

10 classes

times

times

ti

yi

xi

Fig 22 A DSC circuit training a weight in an adaptive filterThe stochastic multipliers are denoted by rectangles with ldquotimesrdquoBipolar and sign-magnitude representations can be used for thestochastic multipliers and integrator

weights in the adaptive filter converges to about -45 dBcompared to the target system The frequency responses ofthe target and the trained filter are shown in Fig 23 As canbe seen the adaptive filter has a similar frequency responseto the target system especially in the passband indicating anaccurate identification result

0 05 1 15 2 25 3Normalized frequency

-80

-60

-40

-20

0

Mag

nitu

de in

dB

target systemadaptive filter

Fig 23 Frequency responses of the target and trained filterusing the sign-magnitude representation

D A gradient descent (GD) circuit

GD is a simple yet effective optimization algorithm It findsthe local minima by iteratively taking steps along the negativesof the gradients at current points One-step GD is given by

wi+1 = wi minus ηnablaF (wi) (11)

where F (wi) is the function to be minimized at the ith stepnablaF (wi) denotes the current gradient and η is the step sizeThe step size is an indicator of how fast the model learns Asmall step size leads to a slow convergence while a large onecan lead to divergence

Similar to the implementation of the Euler method let apair of differential DSSrsquos encode the gradient minusnablaF (wi)the stochastic integrator can be used to compute (11) with astep size η = 12n by using these DSSrsquos as inputs DSC canthen be used to perform the optimization of the weights in aneural network (NN) by minimizing the cost function

The training of an NN usually involves two phases Take afully connected NN as an example shown in Fig 24 In theforward propagation the sum of product of the input vector xand the weight vector w is obtained as v = wx The output

Inputs Hidden layers

Outputs

(a)

hellip

x

w o=f(wx)

hellip

(b)

(c)

wl+1

δl+1o wl

hellip

δwl = f`wl+1δl+1

hellip

Random number generator (RNG)

p

Stochastic sequence with probability p

0101100hellip

Comparator

B

AAltB

N

N

(a)

CLK

01100110hellip

p1=05

10101010hellip

p2=05

00100010hellip

p=p1p2=025

(a) A stochastic multiplier for the unipolar representation

0100010010010000hellip

1000100010001000hellip0011001111100111hellip

times2-1= -05 4

16p1=

times2-1= -05 416

p2=

times2-1= 025 1016

p1p2=

(b) A stochastic multiplier for the bipolar representation

01100110hellip

|p1|=05

10101010hellip

|p2|=05

00100010hellip

|p|=|p1||p2|=025

(c) A stochastic multiplier for the sign-magnitude representation

|p1|=05

Sign(p1)=1(negative)

Sign(p2)=0(positve)

Sign(p)=1(negative)

Fig 24 (a) A fully connected NN in which the neuronsare organized layer by layer (b) The computation duringthe forward propagation (c) The local gradient for wl iscomputed by using the backward propagation The output ofrom previous neuron is required to compute the gradient

of the neuron o is then computed by o = f(v) where f isthe activation function During this phase the weights do notchange In the backward propagation the difference betweenthe actual and the desired final outputs ie the classificationresults is calculated and the local gradients are computed foreach neuron based on the difference [55] Take the neuronin Fig 24(c) as an example the local gradient is obtainedby δl = f prime(v)wl+1δl+1 where f prime(middot) is the derivative of theactivation function wl+1 is the weight vector connecting layerl and l+1 and δl+1 is the local gradient vector of the neuronsin layer l + 1 Then the gradient of a weight at layer l iscomputed by multiplying the local gradient with the recordedoutput of the designated neuron ie nablaF (wl) = minusoδl

In [56] it is found that the local gradient can be decomposedinto two parts δ = δ+ minus δminus where δ+ is contributed bythe desired output of the NN and δminus by the actual outputof the NN So the gradient for weight w can be rewritten asnablaF (w) = minuso(δ+minusδminus) Similar to the implementation of (10)the DSC circuit in Fig 22 can be used to implement the GDalgorithm to perform the online training (one sample at a time)of the fully connected NN with the input signals δ+ o and δminusHowever a backpropagation ie δ = f prime(v)wl+1δl+1 is stillcomputed by using a conventional method (eg a fixed-pointarithmetic circuit) to obtain the local gradients Otherwisethe accuracy loss would be too much for the algorithm toconverge

Fig 25 shows the convergence of cross entropy (as a costfunction) and testing accuracy for the training of a 784-128-128-10 NN with the MNIST handwritten digit dataset As canbe seen the testing accuracy of the NN trained by the DSCcircuit using the sign-magnitude representation is similar to theone produced by the fixed-point implementation (around 97)However the accuracy of the DSC implementation using thebipolar representation is relatively low

The proposed circuitry can be useful for online learningwhere real-time interaction with the environment is requiredwith an energy constraint [57] It can also be used to train amachine learning model using private or security-critical dataon mobile devices if data are sensitive and cannot be uploaded

to a cloud computer

V ERROR REDUCTION AND ASSESSMENT

One major issue in SC is the loss of accuracy [58] Indepen-dent RNGs have been used to avoid correlation and improvethe accuracy of components such as stochastic multipliersThis method inevitably increases the hardware overheadHowever in a stochastic integrator the sharing of RNGs forgenerating the two input sequences reduces the variance thusreducing the error This is shown as follows

The error introduced by DSC at each step of computation isrelated to the variance of the difference of the two accumulatedstochastic bits AiminusBi When independent RNGs are used togenerate Ai and Bi the variance at a single step is given by

Var[Ai minusBi] = E[(Ai minusBi minus E(Ai minusBi))2]

= E[(Ai minusBi)2]minus 2E[Ai minusBi]

(PAi minus PBi) + (PAi minus PBi)2

(12)

where PAi and PBi are the probabilities of Ai and Bi beinglsquo1rsquo respectively ie E[Ai] = PAi and E[Bi] = PBi Since(AiminusBi)

2 is 0 when Ai and Bi are equal or 1 when they aredifferent E[(AiminusBi)

2] equals PAi(1minusPBi)+PBi(1minusPAi)which is the probability that Ai and Bi are different when Ai

and Bi are independently generated1 (12) is then rewritten as

Varind[Ai minusBi] = PAi(1minus PBi) + PBi(1minus PAi)

minus (PAi minus PBi)2

= PAi(1minus PAi) + PBi(1minus PBi)

(13)

On the other hand if the same RNG is used to generate Ai

and Bi the variance of Ai minusBi can be derived from (12) aswell However since only a shared random number is used togenerate Ai and Bi E[(AiminusBi)

2] equals |PAiminusPBi| which

1E[(AiminusBi)2] = Prob[(AiminusBi)

2 = 1]times1+Prob[(AiminusBi)2 = 0]times0

0 5 10 15 20Epoch

0

01

02

03

04

05

Cro

ss e

ntro

py

Signed DSCBipolar DSCFixed-point

(a)

0 5 10 15 20Epoch

88

90

92

94

96

98

Test

ing

accu

racy

(in

perc

enta

ge)

Signed DSCBipolar DSCFixed-point

(b)

Fig 25 (a) Convergence of cross entropy and (b) testingaccuracy of MNIST hand-written digit recognition datasetCross entropy is the cost function to be minimized during thetraining of an NN The weights of the NN are updated by theDSC circuits The DSC circuits using the sign-magnitude andbipolar representations and the fixed-point implementation areconsidered for comparison

gives the probability that Ai and Bi are different Thus (12)can be rewritten as

Varshare[Ai minusBi] = |PAi minus PBi| minus (PAi minus PBi)2

= |PAi minus PBi|(1minus |PAi minus PBi|)(14)

Since Varshare[Ai minus Bi] minus Varind[Ai minus Bi] = 2PBiPAi minus2min(PAi PBi) le 0 we obtain Varshare[Ai minus Bi] leVarind[AiminusBi] for any i = 0 1 2 [54] Therefore sharingthe use of RNGs to generate input stochastic bit streamsimproves the accuracy

For the circuit in Fig 22 though the inputs of the stochasticintegrator are the outputs of the stochastic multipliers insteadof being directly generated by a DSNG the RNG sharingscheme still works in the generated DSSrsquos encoding δ+ andδminus ie it produces a smaller variance in the computed result[56] However an uncorrelated RNG must be used to generatethe DSS encoding the recorded output of the neuron obecause independency is still required in general for accuratestochastic multiplication

The accuracy of the accumulated result is also related to thebit-width of the counter in the stochastic integrator The bitwidth does not only determine the resolution of the result butalso the step size especially in the ODE solver adaptive filterand the gradient descent circuit A larger bit width indicatesa smaller resolution and a smaller step size for fine tuning atthe cost of extra computation time It is shown in [56] that themulti-step variance bound exponentially decreases with the bitwidth of the counter However for the IIR filter the changeof bit-width affects the characteristic of the filter

For the ODE solver circuit it is also found that the use ofquasirandom numbers improves the accuracy compared to theuse of pseudorandom numbers [54] However when the DSS isused to encode signals from the input or images such as in theadaptive filter and the gradient descent algorithm quasirandomand pseudorandom numbers produce similar accuracy Thereason may lie in the fact that the values of the encodedsignals in these two applications are independently selectedfrom the training dataset However for the ODE solver thesignals encoded are samples from continuous functions andadjacent samples have similar values As a result withina short segment of the DSS in this case it approximatelyencodes the same value It then works similarly as CSCfor which the Sobol sequence helps improving the accuracyFig 26 compares the convergence of misalignment duringthe training of the adaptive filter using pseudorandom andquasirandom numbers 16-bit stochastic integrators are usedin this experiment The misalignment is given by

Misalignment =E[(w minus w)2]

E[w2] (15)

where w denotes the target value and w is the trained resultIt can be seen that the convergence speed of using these twotypes of random numbers is similar and their misalignmentsboth converge to around -52 dB

TABLE II Hardware evaluation of the gradient descent circuitarray training a 784-128-128-10 neural network

Metrics DSC Fixed-point Ratio

Step size 2minus10 2minus10 -Epochs 20 20 -Min time (ns) 16times 106 47times 106 13EPO (fJ) 12times 107 11times 108 192TPA (imagesmicrom2) 57times 101 15 381Aver test Accu 9704 9749 -

TABLE III Hardware performance comparison of the ODEsolvers for (8)

Metric DSC Fixed-point Ratio

Step size 2minus8 2minus8 -Iterations 3217 3217 -Min time (ns) 257359 855720 133EPO (fJ) 20121 46600 123TPA (wordmicrosmicrom2) 475 058 821RMSE 47times 10minus3 36times 10minus3 -

VI HARDWARE EFFICIENCY EVALUATION

Due to the extremely low sequence length used for encodinga value in DSC the power consumption of DSC circuits ismuch lower than conventional arithmetic circuits Moreoversince each bit in a DSS is used to encode a number DSCis more energy-efficient than CSC The gradient descent andODE solver circuits are considered as examples to show thehardware efficiency Implemented in VHDL the circuits aresynthesized by Synopsys Design Compiler with a 28-nm tech-nology to obtain the power consumption maximum frequencyand hardware measurements The minimum runtime energyper operation (EPO) throughput per area (TPA) and accuracyare reported in Table II for the gradient descent circuits andin Table III for ODE solvers

As can be seen with a slightly degraded accuracy theDSC circuits outperform conventional circuits using a fixed-point representation in speed energy efficiency and hardwareefficiency For the gradient descent circuit since the two RNGscan be shared among the 118282 stochastic integrators (for118282 weights) the improvement in hardware efficiency iseven more significant than the ODE solver for which one RNG

0 05 1 15 2 25 3Number of steps ( 64 input samples) 104

-60

-50

-40

-30

-20

-10

0

Mis

alig

nmen

t (dB

)

PseudorandomQuasirandom

Fig 26 Comparison of the convergence of misalignment usingpseudorandom and quasirandom numbers

is shared between two stochastic integrators However sincethe sharing of RNGs does not significantly affect the criticalpath the performance improvements are similar for these twocircuits

VII CONCLUSION

In this article a new type of stochastic computing dynamicstochastic computing (DSC) is introduced to encode a varyingsignal by using a dynamically variable binary bit streamreferred to as a dynamic stochastic sequence (DSS) A DSCcircuit can efficiently implement an iterative accumulation-based algorithm It is useful in implementing the Euler methodtraining a neural network and an adaptive filter and IIRfiltering In those applications integrations are implementedby stochastic integrators that accumulate the stochastic bits in aDSS Since the computation is performed on single stochasticbits instead of conventional binary values or long stochasticsequences a significant improvement in hardware efficiencyis achieved with a high accuracy

ACKNOWLEDGMENT

This work was supported by the Natural Sciences andEngineering Research Council of Canada (NSERC) underProjects RES0025211 and RES0048688

REFERENCES

[1] B R Gaines Stochastic Computing Systems Boston MA SpringerUS 1969 pp 37ndash172

[2] W Qian X Li M D Riedel K Bazargan and D J Lilja ldquoAnarchitecture for fault-tolerant computation with stochastic logicrdquo IEEETransactions on Computers vol 60 no 1 pp 93ndash105 2011

[3] P Li and D J Lilja ldquoUsing stochastic computing to implementdigital image processing algorithmsrdquo in 2011 IEEE 29th InternationalConference on Computer Design (ICCD) 2011 pp 154ndash161

[4] S C Smithson N Onizawa B H Meyer W J Gross and T HanyuldquoEfficient CMOS invertible logic using stochastic computingrdquo IEEETransactions on Circuits and Systems I Regular Papers vol 66 no 6pp 2263ndash2274 June 2019

[5] W Maass and C M Bishop Pulsed neural networks MIT press 2001[6] Y Liu S Liu Y Wang F Lombardi and J Han ldquoA stochastic

computational multi-layer perceptron with backward propagationrdquo IEEETransactions on Computers vol 67 no 9 pp 1273ndash1286 2018

[7] J Zhao J Shawe-Taylor and M van Daalen ldquoLearning in stochastic bitstream neural networksrdquo Neural Networks vol 9 no 6 pp 991ndash9981996

[8] B D Brown and H C Card ldquoStochastic neural computation Icomputational elementsrdquo IEEE Transactions on Computers vol 50no 9 pp 891ndash905 2001

[9] D Zhang and H Li ldquoA stochastic-based FPGA controller for aninduction motor drive with integrated neural network algorithmsrdquo IEEETransactions on Industrial Electronics vol 55 no 2 pp 551ndash561 2008

[10] Y Ji F Ran C Ma and D J Lilja ldquoA hardware implementationof a radial basis function neural network using stochastic logicrdquo inProceedings of the Design Automation amp Test in Europe Conference(DATE) 2015 pp 880ndash883

[11] V Canals A Morro A Oliver M L Alomar and J L RosselloldquoA new stochastic computing methodology for efficient neural networkimplementationrdquo IEEE Transactions on Neural Networks and LearningSystems vol 27 no 3 pp 551ndash564 2016

[12] A Ardakani F Leduc-Primeau N Onizawa T Hanyu and W J GrossldquoVLSI implementation of deep neural network using integral stochasticcomputingrdquo IEEE Transactions on Very Large Scale Integration (VLSI)Systems vol 25 no 10 pp 2688ndash2699 2017

[13] A Ren Z Li C Ding Q Qiu Y Wang J Li X Qian and B YuanldquoSC-DCNN Highly-scalable deep convolutional neural network us-ing stochastic computingrdquo in Proceedings of the 22nd InternationalConference on Architectural Support for Programming Languages andOperating Systems (ASPLOS) 2017 pp 405ndash418

[14] V T Lee A Alaghi J P Hayes V Sathe and L Ceze ldquoEnergy-efficienthybrid stochastic-binary neural networks for near-sensor computingrdquo inDesign Automation amp Test in Europe Conference amp Exhibition (DATE)2017 pp 13ndash18

[15] R Hojabr K Givaki S Tayaranian P Esfahanian A Khonsari D Rah-mati and M H Najafi ldquoSkippyNN An embedded stochastic-computingaccelerator for convolutional neural networksrdquo in Proceedings of the56th Annual Design Automation Conference (DAC) 2019 pp 1ndash6

[16] Y Liu L Liu F Lombardi and J Han ldquoAn energy-efficient and noise-tolerant recurrent neural network using stochastic computingrdquo IEEETransactions on Very Large Scale Integration (VLSI) Systems vol 27no 9 pp 2213ndash2221 Sep 2019

[17] R Cai A Ren O Chen N Liu C Ding X Qian J Han W LuoN Yoshikawa and Y Wang ldquoA stochastic-computing based deeplearning framework using adiabatic quantum-flux-parametron supercon-ducting technologyrdquo in Proceedings of the 46th International Symposiumon Computer Architecture ser ISCA rsquo19 New York NY USA ACM2019 pp 567ndash578

[18] A Alaghi and J P Hayes ldquoSurvey of stochastic computingrdquo ACMTrans Embed Comput Syst vol 12 no 2s pp 921ndash9219 May 2013

[19] A Zhakatayev S Lee H Sim and J Lee ldquoSign-magnitude SC getting10x accuracy for free in stochastic computing for deep neural networksrdquoin Proceedings of the 55th Annual Design Automation Conference(DAC) 2018 pp 1ndash6

[20] P K Gupta and R Kumaresan ldquoBinary multiplication with PN se-quencesrdquo IEEE Transactions on Acoustics Speech and Signal Process-ing vol 36 no 4 pp 603ndash606 1988

[21] A Alaghi and J P Hayes ldquoFast and accurate computation using stochas-tic circuitsrdquo in Proceedings of the conference on Design Automation ampTest in Europe (DATE) 2014 pp 1ndash4

[22] S Liu and J Han ldquoEnergy efficient stochastic computing with Sobolsequencesrdquo in Design Automation amp Test in Europe Conference ampExhibition (DATE) 2017 pp 650ndash653

[23] H Niederreiter Random Number Generation and quasi-Monte CarloMethods Philadelphia PA USA Society for Industrial and AppliedMathematics 1992

[24] P Bratley and B L Fox ldquoAlgorithm 659 Implementing Sobolrsquosquasirandom sequence generatorrdquo ACM Transactions on MathematicalSoftware (TOMS) vol 14 no 1 pp 88ndash100 1988

[25] S Liu and J Han ldquoToward energy-efficient stochastic circuits usingparallel sobol sequencesrdquo IEEE Transactions on Very Large ScaleIntegration (VLSI) Systems vol 26 no 7 pp 1326ndash1339 July 2018

[26] A Alaghi and J P Hayes ldquoSTRAUSS Spectral transform use instochastic circuit synthesisrdquo IEEE Transactions on Computer-AidedDesign of Integrated Circuits and Systems vol 34 no 11 pp 1770ndash1783 2015

[27] P Li D J Lilja W Qian K Bazargan and M Riedel ldquoThe synthesisof complex arithmetic computation on stochastic bit streams usingsequential logicrdquo in Proceedings of the International Conference onComputer-Aided Design (ICCAD) 2012 pp 480ndash487

[28] P Mars and W J Poppelbaum Stochastic and deterministic averagingprocessors Peter Peregrinus Press 1981 no 1

[29] Z Li A Ren J Li Q Qiu Y Wang and B Yuan ldquoDSCNN Hardware-oriented optimization for stochastic computing based deep convolutionalneural networksrdquo in IEEE 34th International Conference on ComputerDesign (ICCD) 2016 pp 678ndash681

[30] A Ren Z Li Y Wang Q Qiu and B Yuan ldquoDesigning reconfigurablelarge-scale deep learning systems using stochastic computingrdquo in IEEEInternational Conference on Rebooting Computing (ICRC) 2016 pp1ndash7

[31] K Kim J Kim J Yu J Seo J Lee and K Choi ldquoDynamic energy-accuracy trade-off using stochastic computing in deep neural networksrdquoin Proceedings of the 53rd Annual Design Automation Conference(DAC) 2016 pp 1241ndash1246

[32] H Sim and J Lee ldquoA new stochastic computing multiplier withapplication to deep convolutional neural networksrdquo in Proceedings ofthe 54th Annual Design Automation Conference (DAC) 2017 pp 1ndash6

[33] Y Liu Y Wang F Lombardi and J Han ldquoAn energy-efficient online-learning stochastic computational deep belief networkrdquo IEEE Journal

on Emerging and Selected Topics in Circuits and Systems vol 8 no 3pp 454ndash465 Sep 2018

[34] J F Keane and L E Atlas ldquoImpulses and stochastic arithmetic forsignal processingrdquo in IEEE International Conference on AcousticsSpeech and Signal Processing (ICASSP) vol 2 2001 pp 1257ndash1260

[35] R Kuehnel ldquoBinomial logic extending stochastic computing to high-bandwidth signalsrdquo in Conference Record of the Thirty-Sixth AsilomarConference on Signals Systems and Computers vol 2 2002 pp 1089ndash1093

[36] N Yamamoto H Fujisaka K Haeiwa and T Kamio ldquoNanoelectroniccircuits for stochastic computingrdquo in 6th IEEE Conference on Nanotech-nology (IEEE-NANO) vol 1 2006 pp 306ndash309

[37] K K Parhi and Y Liu ldquoArchitectures for IIR digital filters usingstochastic computingrdquo in IEEE International Symposium on Circuitsand Systems (ISCAS) 2014 pp 373ndash376

[38] B Yuan Y Wang and Z Wang ldquoArea-efficient scaling-free DFTFFTdesign using stochastic computingrdquo IEEE Transactions on Circuits andSystems II Express Briefs vol 63 no 12 pp 1131ndash1135 2016

[39] R Wang J Han B F Cockburn and D G Elliott ldquoStochastic circuitdesign and performance evaluation of vector quantization for differenterror measuresrdquo IEEE Transactions on Very Large Scale Integration(VLSI) Systems vol 24 no 10 pp 3169ndash3183 2016

[40] H Ichihara T Sugino S Ishii T Iwagaki and T Inoue ldquoCompactand accurate digital filters based on stochastic computingrdquo IEEE Trans-actions on Emerging Topics in Computing vol 7 no 1 pp 31ndash432016

[41] T Hammadou M Nilson A Bermak and P Ogunbona ldquoA 96spltimes64 intelligent digital pixel array with extended binary stochasticarithmeticrdquo in Proceedings of the International Symposium on Circuitsand Systems (ISCAS) 2003

[42] A Alaghi C Li and J P Hayes ldquoStochastic circuits for real-timeimage-processing applicationsrdquo in Proceedings of the 50th AnnualDesign Automation Conference (DAC) 2013 pp 1361ndash1366

[43] P Li D J Lilja W Qian K Bazargan and M D Riedel ldquoComputationon stochastic bit streams digital image processing case studiesrdquo IEEETransactions on Very Large Scale Integration (VLSI) Systems vol 22no 3 pp 449ndash462 2014

[44] M H Najafi S Jamali-Zavareh D J Lilja M D Riedel K Bazarganand R Harjani ldquoTime-encoded values for highly efficient stochasticcircuitsrdquo IEEE Transactions on Very Large Scale Integration (VLSI)Systems vol 25 no 5 pp 1644ndash1657 2017

[45] S R Faraji M Hassan Najafi B Li D J Lilja and K BazarganldquoEnergy-efficient convolutional neural networks with deterministic bit-stream processingrdquo in 2019 Design Automation Test in Europe Confer-ence Exhibition (DATE) 2019 pp 1757ndash1762

[46] S Liu and J Han ldquoDynamic stochastic computing for digital signalprocessing applicationsrdquo in Design Automation amp Test in EuropeConference amp Exhibition (DATE) 2020

[47] I L Dalal D Stefan and J Harwayne-Gidansky ldquoLow discrepancysequences for Monte Carlo simulations on reconfigurable platformsrdquo inInternational Conference on Application-Specific Systems Architecturesand Processors (ASAP) 2008 pp 108ndash113

[48] W G McCallum D Hughes-Hallett A M Gleason D O LomenD Lovelock J Tecosky-Feldman T W Tucker D Flath T ThrashK R Rhea et al Multivariable calculus Wiley 1997 vol 200 no 1

[49] S S Tehrani A Naderi G-A Kamendje S Hemati S Mannor andW J Gross ldquoMajority-based tracking forecast memories for stochasticLDPC decodingrdquo IEEE Transactions on Signal Processing vol 58no 9 pp 4883ndash4896 2010

[50] N Saraf K Bazargan D J Lilja and M D Riedel ldquoIIR filtersusing stochastic arithmeticrdquo in Design Automation amp Test in EuropeConference amp Exhibition (DATE) 2014 pp 1ndash6

[51] F Maloberti ldquoNon conventional signal processing by the use of sigmadelta technique a tutorial introductionrdquo in Proceedings of IEEE Inter-national Symposium on Circuits and Systems (ISCAS) vol 6 1992 pp2645ndash2648

[52] J C Butcher The Numerical Analysis of Ordinary Differential Equa-tions Runge-Kutta and General Linear Methods USA Wiley-Interscience 1987

[53] W Gerstner W M Kistler R Naud and L Paninski Neuronaldynamics From single neurons to networks and models of cognitionCambridge University Press 2014

[54] S Liu and J Han ldquoHardware ODE solvers using stochastic circuitsrdquo inProceedings of the 54th Annual Design Automation Conference (DAC)2017 pp 1ndash6

[55] S S Haykin Neural networks and learning machines Pearson UpperSaddle River NJ USA 2009 vol 3

[56] S Liu H Jiang L Liu and J Han ldquoGradient descent using stochasticcircuits for efficient training of learning machinesrdquo IEEE Transactionson Computer-Aided Design of Integrated Circuits and Systems vol 37no 11 pp 2530ndash2541 Nov 2018

[57] R Nishihara P Moritz S Wang A Tumanov W Paul J Schleier-Smith R Liaw M Niknami M I Jordan and I StoicaldquoReal-time machine learning The missing piecesrdquo in Proceedingsof the 16th Workshop on Hot Topics in Operating Systemsser HotOS rsquo17 ACM 2017 pp 106ndash110 [Online] Availablehttpdoiacmorg10114531029803102998

[58] J P Hayes ldquoIntroduction to stochastic computing and its challengesrdquo inProceedings of the 52nd Annual Design Automation Conference (DAC)2015 pp 591ndash593

Page 8: Introduction to Dynamic Stochastic Computing

Actual classification results

Trained parameters

-

+

1 6 8 0

Estimate classification results (distribution)

SNGarray

Images

10 classes

00000hellip

11010hellip

00010hellip

Trained parameters

-

+

1 6 8 0DSNGarray

10 classes

times

times

ti

yi

xi

Fig 22 A DSC circuit training a weight in an adaptive filterThe stochastic multipliers are denoted by rectangles with ldquotimesrdquoBipolar and sign-magnitude representations can be used for thestochastic multipliers and integrator

weights in the adaptive filter converges to about -45 dBcompared to the target system The frequency responses ofthe target and the trained filter are shown in Fig 23 As canbe seen the adaptive filter has a similar frequency responseto the target system especially in the passband indicating anaccurate identification result

0 05 1 15 2 25 3Normalized frequency

-80

-60

-40

-20

0

Mag

nitu

de in

dB

target systemadaptive filter

Fig 23 Frequency responses of the target and trained filterusing the sign-magnitude representation

D A gradient descent (GD) circuit

GD is a simple yet effective optimization algorithm It findsthe local minima by iteratively taking steps along the negativesof the gradients at current points One-step GD is given by

wi+1 = wi minus ηnablaF (wi) (11)

where F (wi) is the function to be minimized at the ith stepnablaF (wi) denotes the current gradient and η is the step sizeThe step size is an indicator of how fast the model learns Asmall step size leads to a slow convergence while a large onecan lead to divergence

Similar to the implementation of the Euler method let apair of differential DSSrsquos encode the gradient minusnablaF (wi)the stochastic integrator can be used to compute (11) with astep size η = 12n by using these DSSrsquos as inputs DSC canthen be used to perform the optimization of the weights in aneural network (NN) by minimizing the cost function

The training of an NN usually involves two phases Take afully connected NN as an example shown in Fig 24 In theforward propagation the sum of product of the input vector xand the weight vector w is obtained as v = wx The output

Inputs Hidden layers

Outputs

(a)

hellip

x

w o=f(wx)

hellip

(b)

(c)

wl+1

δl+1o wl

hellip

δwl = f`wl+1δl+1

hellip

Random number generator (RNG)

p

Stochastic sequence with probability p

0101100hellip

Comparator

B

AAltB

N

N

(a)

CLK

01100110hellip

p1=05

10101010hellip

p2=05

00100010hellip

p=p1p2=025

(a) A stochastic multiplier for the unipolar representation

0100010010010000hellip

1000100010001000hellip0011001111100111hellip

times2-1= -05 4

16p1=

times2-1= -05 416

p2=

times2-1= 025 1016

p1p2=

(b) A stochastic multiplier for the bipolar representation

01100110hellip

|p1|=05

10101010hellip

|p2|=05

00100010hellip

|p|=|p1||p2|=025

(c) A stochastic multiplier for the sign-magnitude representation

|p1|=05

Sign(p1)=1(negative)

Sign(p2)=0(positve)

Sign(p)=1(negative)

Fig 24 (a) A fully connected NN in which the neuronsare organized layer by layer (b) The computation duringthe forward propagation (c) The local gradient for wl iscomputed by using the backward propagation The output ofrom previous neuron is required to compute the gradient

of the neuron o is then computed by o = f(v) where f isthe activation function During this phase the weights do notchange In the backward propagation the difference betweenthe actual and the desired final outputs ie the classificationresults is calculated and the local gradients are computed foreach neuron based on the difference [55] Take the neuronin Fig 24(c) as an example the local gradient is obtainedby δl = f prime(v)wl+1δl+1 where f prime(middot) is the derivative of theactivation function wl+1 is the weight vector connecting layerl and l+1 and δl+1 is the local gradient vector of the neuronsin layer l + 1 Then the gradient of a weight at layer l iscomputed by multiplying the local gradient with the recordedoutput of the designated neuron ie nablaF (wl) = minusoδl

In [56] it is found that the local gradient can be decomposedinto two parts δ = δ+ minus δminus where δ+ is contributed bythe desired output of the NN and δminus by the actual outputof the NN So the gradient for weight w can be rewritten asnablaF (w) = minuso(δ+minusδminus) Similar to the implementation of (10)the DSC circuit in Fig 22 can be used to implement the GDalgorithm to perform the online training (one sample at a time)of the fully connected NN with the input signals δ+ o and δminusHowever a backpropagation ie δ = f prime(v)wl+1δl+1 is stillcomputed by using a conventional method (eg a fixed-pointarithmetic circuit) to obtain the local gradients Otherwisethe accuracy loss would be too much for the algorithm toconverge

Fig 25 shows the convergence of cross entropy (as a costfunction) and testing accuracy for the training of a 784-128-128-10 NN with the MNIST handwritten digit dataset As canbe seen the testing accuracy of the NN trained by the DSCcircuit using the sign-magnitude representation is similar to theone produced by the fixed-point implementation (around 97)However the accuracy of the DSC implementation using thebipolar representation is relatively low

The proposed circuitry can be useful for online learningwhere real-time interaction with the environment is requiredwith an energy constraint [57] It can also be used to train amachine learning model using private or security-critical dataon mobile devices if data are sensitive and cannot be uploaded

to a cloud computer

V ERROR REDUCTION AND ASSESSMENT

One major issue in SC is the loss of accuracy [58] Indepen-dent RNGs have been used to avoid correlation and improvethe accuracy of components such as stochastic multipliersThis method inevitably increases the hardware overheadHowever in a stochastic integrator the sharing of RNGs forgenerating the two input sequences reduces the variance thusreducing the error This is shown as follows

The error introduced by DSC at each step of computation isrelated to the variance of the difference of the two accumulatedstochastic bits AiminusBi When independent RNGs are used togenerate Ai and Bi the variance at a single step is given by

Var[Ai minusBi] = E[(Ai minusBi minus E(Ai minusBi))2]

= E[(Ai minusBi)2]minus 2E[Ai minusBi]

(PAi minus PBi) + (PAi minus PBi)2

(12)

where PAi and PBi are the probabilities of Ai and Bi beinglsquo1rsquo respectively ie E[Ai] = PAi and E[Bi] = PBi Since(AiminusBi)

2 is 0 when Ai and Bi are equal or 1 when they aredifferent E[(AiminusBi)

2] equals PAi(1minusPBi)+PBi(1minusPAi)which is the probability that Ai and Bi are different when Ai

and Bi are independently generated1 (12) is then rewritten as

Varind[Ai minusBi] = PAi(1minus PBi) + PBi(1minus PAi)

minus (PAi minus PBi)2

= PAi(1minus PAi) + PBi(1minus PBi)

(13)

On the other hand if the same RNG is used to generate Ai

and Bi the variance of Ai minusBi can be derived from (12) aswell However since only a shared random number is used togenerate Ai and Bi E[(AiminusBi)

2] equals |PAiminusPBi| which

1E[(AiminusBi)2] = Prob[(AiminusBi)

2 = 1]times1+Prob[(AiminusBi)2 = 0]times0

0 5 10 15 20Epoch

0

01

02

03

04

05

Cro

ss e

ntro

py

Signed DSCBipolar DSCFixed-point

(a)

0 5 10 15 20Epoch

88

90

92

94

96

98

Test

ing

accu

racy

(in

perc

enta

ge)

Signed DSCBipolar DSCFixed-point

(b)

Fig 25 (a) Convergence of cross entropy and (b) testingaccuracy of MNIST hand-written digit recognition datasetCross entropy is the cost function to be minimized during thetraining of an NN The weights of the NN are updated by theDSC circuits The DSC circuits using the sign-magnitude andbipolar representations and the fixed-point implementation areconsidered for comparison

gives the probability that Ai and Bi are different Thus (12)can be rewritten as

Varshare[Ai minusBi] = |PAi minus PBi| minus (PAi minus PBi)2

= |PAi minus PBi|(1minus |PAi minus PBi|)(14)

Since Varshare[Ai minus Bi] minus Varind[Ai minus Bi] = 2PBiPAi minus2min(PAi PBi) le 0 we obtain Varshare[Ai minus Bi] leVarind[AiminusBi] for any i = 0 1 2 [54] Therefore sharingthe use of RNGs to generate input stochastic bit streamsimproves the accuracy

For the circuit in Fig 22 though the inputs of the stochasticintegrator are the outputs of the stochastic multipliers insteadof being directly generated by a DSNG the RNG sharingscheme still works in the generated DSSrsquos encoding δ+ andδminus ie it produces a smaller variance in the computed result[56] However an uncorrelated RNG must be used to generatethe DSS encoding the recorded output of the neuron obecause independency is still required in general for accuratestochastic multiplication

The accuracy of the accumulated result is also related to thebit-width of the counter in the stochastic integrator The bitwidth does not only determine the resolution of the result butalso the step size especially in the ODE solver adaptive filterand the gradient descent circuit A larger bit width indicatesa smaller resolution and a smaller step size for fine tuning atthe cost of extra computation time It is shown in [56] that themulti-step variance bound exponentially decreases with the bitwidth of the counter However for the IIR filter the changeof bit-width affects the characteristic of the filter

For the ODE solver circuit it is also found that the use ofquasirandom numbers improves the accuracy compared to theuse of pseudorandom numbers [54] However when the DSS isused to encode signals from the input or images such as in theadaptive filter and the gradient descent algorithm quasirandomand pseudorandom numbers produce similar accuracy Thereason may lie in the fact that the values of the encodedsignals in these two applications are independently selectedfrom the training dataset However for the ODE solver thesignals encoded are samples from continuous functions andadjacent samples have similar values As a result withina short segment of the DSS in this case it approximatelyencodes the same value It then works similarly as CSCfor which the Sobol sequence helps improving the accuracyFig 26 compares the convergence of misalignment duringthe training of the adaptive filter using pseudorandom andquasirandom numbers 16-bit stochastic integrators are usedin this experiment The misalignment is given by

Misalignment =E[(w minus w)2]

E[w2] (15)

where w denotes the target value and w is the trained resultIt can be seen that the convergence speed of using these twotypes of random numbers is similar and their misalignmentsboth converge to around -52 dB

TABLE II Hardware evaluation of the gradient descent circuitarray training a 784-128-128-10 neural network

Metrics DSC Fixed-point Ratio

Step size 2minus10 2minus10 -Epochs 20 20 -Min time (ns) 16times 106 47times 106 13EPO (fJ) 12times 107 11times 108 192TPA (imagesmicrom2) 57times 101 15 381Aver test Accu 9704 9749 -

TABLE III Hardware performance comparison of the ODEsolvers for (8)

Metric DSC Fixed-point Ratio

Step size 2minus8 2minus8 -Iterations 3217 3217 -Min time (ns) 257359 855720 133EPO (fJ) 20121 46600 123TPA (wordmicrosmicrom2) 475 058 821RMSE 47times 10minus3 36times 10minus3 -

VI HARDWARE EFFICIENCY EVALUATION

Due to the extremely low sequence length used for encodinga value in DSC the power consumption of DSC circuits ismuch lower than conventional arithmetic circuits Moreoversince each bit in a DSS is used to encode a number DSCis more energy-efficient than CSC The gradient descent andODE solver circuits are considered as examples to show thehardware efficiency Implemented in VHDL the circuits aresynthesized by Synopsys Design Compiler with a 28-nm tech-nology to obtain the power consumption maximum frequencyand hardware measurements The minimum runtime energyper operation (EPO) throughput per area (TPA) and accuracyare reported in Table II for the gradient descent circuits andin Table III for ODE solvers

As can be seen with a slightly degraded accuracy theDSC circuits outperform conventional circuits using a fixed-point representation in speed energy efficiency and hardwareefficiency For the gradient descent circuit since the two RNGscan be shared among the 118282 stochastic integrators (for118282 weights) the improvement in hardware efficiency iseven more significant than the ODE solver for which one RNG

0 05 1 15 2 25 3Number of steps ( 64 input samples) 104

-60

-50

-40

-30

-20

-10

0

Mis

alig

nmen

t (dB

)

PseudorandomQuasirandom

Fig 26 Comparison of the convergence of misalignment usingpseudorandom and quasirandom numbers

is shared between two stochastic integrators However sincethe sharing of RNGs does not significantly affect the criticalpath the performance improvements are similar for these twocircuits

VII CONCLUSION

In this article a new type of stochastic computing dynamicstochastic computing (DSC) is introduced to encode a varyingsignal by using a dynamically variable binary bit streamreferred to as a dynamic stochastic sequence (DSS) A DSCcircuit can efficiently implement an iterative accumulation-based algorithm It is useful in implementing the Euler methodtraining a neural network and an adaptive filter and IIRfiltering In those applications integrations are implementedby stochastic integrators that accumulate the stochastic bits in aDSS Since the computation is performed on single stochasticbits instead of conventional binary values or long stochasticsequences a significant improvement in hardware efficiencyis achieved with a high accuracy

ACKNOWLEDGMENT

This work was supported by the Natural Sciences andEngineering Research Council of Canada (NSERC) underProjects RES0025211 and RES0048688

REFERENCES

[1] B R Gaines Stochastic Computing Systems Boston MA SpringerUS 1969 pp 37ndash172

[2] W Qian X Li M D Riedel K Bazargan and D J Lilja ldquoAnarchitecture for fault-tolerant computation with stochastic logicrdquo IEEETransactions on Computers vol 60 no 1 pp 93ndash105 2011

[3] P Li and D J Lilja ldquoUsing stochastic computing to implementdigital image processing algorithmsrdquo in 2011 IEEE 29th InternationalConference on Computer Design (ICCD) 2011 pp 154ndash161

[4] S C Smithson N Onizawa B H Meyer W J Gross and T HanyuldquoEfficient CMOS invertible logic using stochastic computingrdquo IEEETransactions on Circuits and Systems I Regular Papers vol 66 no 6pp 2263ndash2274 June 2019

[5] W Maass and C M Bishop Pulsed neural networks MIT press 2001[6] Y Liu S Liu Y Wang F Lombardi and J Han ldquoA stochastic

computational multi-layer perceptron with backward propagationrdquo IEEETransactions on Computers vol 67 no 9 pp 1273ndash1286 2018

[7] J Zhao J Shawe-Taylor and M van Daalen ldquoLearning in stochastic bitstream neural networksrdquo Neural Networks vol 9 no 6 pp 991ndash9981996

[8] B D Brown and H C Card ldquoStochastic neural computation Icomputational elementsrdquo IEEE Transactions on Computers vol 50no 9 pp 891ndash905 2001

[9] D Zhang and H Li ldquoA stochastic-based FPGA controller for aninduction motor drive with integrated neural network algorithmsrdquo IEEETransactions on Industrial Electronics vol 55 no 2 pp 551ndash561 2008

[10] Y Ji F Ran C Ma and D J Lilja ldquoA hardware implementationof a radial basis function neural network using stochastic logicrdquo inProceedings of the Design Automation amp Test in Europe Conference(DATE) 2015 pp 880ndash883

[11] V Canals A Morro A Oliver M L Alomar and J L RosselloldquoA new stochastic computing methodology for efficient neural networkimplementationrdquo IEEE Transactions on Neural Networks and LearningSystems vol 27 no 3 pp 551ndash564 2016

[12] A Ardakani F Leduc-Primeau N Onizawa T Hanyu and W J GrossldquoVLSI implementation of deep neural network using integral stochasticcomputingrdquo IEEE Transactions on Very Large Scale Integration (VLSI)Systems vol 25 no 10 pp 2688ndash2699 2017

[13] A Ren Z Li C Ding Q Qiu Y Wang J Li X Qian and B YuanldquoSC-DCNN Highly-scalable deep convolutional neural network us-ing stochastic computingrdquo in Proceedings of the 22nd InternationalConference on Architectural Support for Programming Languages andOperating Systems (ASPLOS) 2017 pp 405ndash418

[14] V T Lee A Alaghi J P Hayes V Sathe and L Ceze ldquoEnergy-efficienthybrid stochastic-binary neural networks for near-sensor computingrdquo inDesign Automation amp Test in Europe Conference amp Exhibition (DATE)2017 pp 13ndash18

[15] R Hojabr K Givaki S Tayaranian P Esfahanian A Khonsari D Rah-mati and M H Najafi ldquoSkippyNN An embedded stochastic-computingaccelerator for convolutional neural networksrdquo in Proceedings of the56th Annual Design Automation Conference (DAC) 2019 pp 1ndash6

[16] Y Liu L Liu F Lombardi and J Han ldquoAn energy-efficient and noise-tolerant recurrent neural network using stochastic computingrdquo IEEETransactions on Very Large Scale Integration (VLSI) Systems vol 27no 9 pp 2213ndash2221 Sep 2019

[17] R Cai A Ren O Chen N Liu C Ding X Qian J Han W LuoN Yoshikawa and Y Wang ldquoA stochastic-computing based deeplearning framework using adiabatic quantum-flux-parametron supercon-ducting technologyrdquo in Proceedings of the 46th International Symposiumon Computer Architecture ser ISCA rsquo19 New York NY USA ACM2019 pp 567ndash578

[18] A Alaghi and J P Hayes ldquoSurvey of stochastic computingrdquo ACMTrans Embed Comput Syst vol 12 no 2s pp 921ndash9219 May 2013

[19] A Zhakatayev S Lee H Sim and J Lee ldquoSign-magnitude SC getting10x accuracy for free in stochastic computing for deep neural networksrdquoin Proceedings of the 55th Annual Design Automation Conference(DAC) 2018 pp 1ndash6

[20] P K Gupta and R Kumaresan ldquoBinary multiplication with PN se-quencesrdquo IEEE Transactions on Acoustics Speech and Signal Process-ing vol 36 no 4 pp 603ndash606 1988

[21] A Alaghi and J P Hayes ldquoFast and accurate computation using stochas-tic circuitsrdquo in Proceedings of the conference on Design Automation ampTest in Europe (DATE) 2014 pp 1ndash4

[22] S Liu and J Han ldquoEnergy efficient stochastic computing with Sobolsequencesrdquo in Design Automation amp Test in Europe Conference ampExhibition (DATE) 2017 pp 650ndash653

[23] H Niederreiter Random Number Generation and quasi-Monte CarloMethods Philadelphia PA USA Society for Industrial and AppliedMathematics 1992

[24] P Bratley and B L Fox ldquoAlgorithm 659 Implementing Sobolrsquosquasirandom sequence generatorrdquo ACM Transactions on MathematicalSoftware (TOMS) vol 14 no 1 pp 88ndash100 1988

[25] S Liu and J Han ldquoToward energy-efficient stochastic circuits usingparallel sobol sequencesrdquo IEEE Transactions on Very Large ScaleIntegration (VLSI) Systems vol 26 no 7 pp 1326ndash1339 July 2018

[26] A Alaghi and J P Hayes ldquoSTRAUSS Spectral transform use instochastic circuit synthesisrdquo IEEE Transactions on Computer-AidedDesign of Integrated Circuits and Systems vol 34 no 11 pp 1770ndash1783 2015

[27] P Li D J Lilja W Qian K Bazargan and M Riedel ldquoThe synthesisof complex arithmetic computation on stochastic bit streams usingsequential logicrdquo in Proceedings of the International Conference onComputer-Aided Design (ICCAD) 2012 pp 480ndash487

[28] P Mars and W J Poppelbaum Stochastic and deterministic averagingprocessors Peter Peregrinus Press 1981 no 1

[29] Z Li A Ren J Li Q Qiu Y Wang and B Yuan ldquoDSCNN Hardware-oriented optimization for stochastic computing based deep convolutionalneural networksrdquo in IEEE 34th International Conference on ComputerDesign (ICCD) 2016 pp 678ndash681

[30] A Ren Z Li Y Wang Q Qiu and B Yuan ldquoDesigning reconfigurablelarge-scale deep learning systems using stochastic computingrdquo in IEEEInternational Conference on Rebooting Computing (ICRC) 2016 pp1ndash7

[31] K Kim J Kim J Yu J Seo J Lee and K Choi ldquoDynamic energy-accuracy trade-off using stochastic computing in deep neural networksrdquoin Proceedings of the 53rd Annual Design Automation Conference(DAC) 2016 pp 1241ndash1246

[32] H Sim and J Lee ldquoA new stochastic computing multiplier withapplication to deep convolutional neural networksrdquo in Proceedings ofthe 54th Annual Design Automation Conference (DAC) 2017 pp 1ndash6

[33] Y Liu Y Wang F Lombardi and J Han ldquoAn energy-efficient online-learning stochastic computational deep belief networkrdquo IEEE Journal

on Emerging and Selected Topics in Circuits and Systems vol 8 no 3pp 454ndash465 Sep 2018

[34] J F Keane and L E Atlas ldquoImpulses and stochastic arithmetic forsignal processingrdquo in IEEE International Conference on AcousticsSpeech and Signal Processing (ICASSP) vol 2 2001 pp 1257ndash1260

[35] R Kuehnel ldquoBinomial logic extending stochastic computing to high-bandwidth signalsrdquo in Conference Record of the Thirty-Sixth AsilomarConference on Signals Systems and Computers vol 2 2002 pp 1089ndash1093

[36] N Yamamoto H Fujisaka K Haeiwa and T Kamio ldquoNanoelectroniccircuits for stochastic computingrdquo in 6th IEEE Conference on Nanotech-nology (IEEE-NANO) vol 1 2006 pp 306ndash309

[37] K K Parhi and Y Liu ldquoArchitectures for IIR digital filters usingstochastic computingrdquo in IEEE International Symposium on Circuitsand Systems (ISCAS) 2014 pp 373ndash376

[38] B Yuan Y Wang and Z Wang ldquoArea-efficient scaling-free DFTFFTdesign using stochastic computingrdquo IEEE Transactions on Circuits andSystems II Express Briefs vol 63 no 12 pp 1131ndash1135 2016

[39] R Wang J Han B F Cockburn and D G Elliott ldquoStochastic circuitdesign and performance evaluation of vector quantization for differenterror measuresrdquo IEEE Transactions on Very Large Scale Integration(VLSI) Systems vol 24 no 10 pp 3169ndash3183 2016

[40] H Ichihara T Sugino S Ishii T Iwagaki and T Inoue ldquoCompactand accurate digital filters based on stochastic computingrdquo IEEE Trans-actions on Emerging Topics in Computing vol 7 no 1 pp 31ndash432016

[41] T Hammadou M Nilson A Bermak and P Ogunbona ldquoA 96spltimes64 intelligent digital pixel array with extended binary stochasticarithmeticrdquo in Proceedings of the International Symposium on Circuitsand Systems (ISCAS) 2003

[42] A Alaghi C Li and J P Hayes ldquoStochastic circuits for real-timeimage-processing applicationsrdquo in Proceedings of the 50th AnnualDesign Automation Conference (DAC) 2013 pp 1361ndash1366

[43] P Li D J Lilja W Qian K Bazargan and M D Riedel ldquoComputationon stochastic bit streams digital image processing case studiesrdquo IEEETransactions on Very Large Scale Integration (VLSI) Systems vol 22no 3 pp 449ndash462 2014

[44] M H Najafi S Jamali-Zavareh D J Lilja M D Riedel K Bazarganand R Harjani ldquoTime-encoded values for highly efficient stochasticcircuitsrdquo IEEE Transactions on Very Large Scale Integration (VLSI)Systems vol 25 no 5 pp 1644ndash1657 2017

[45] S R Faraji M Hassan Najafi B Li D J Lilja and K BazarganldquoEnergy-efficient convolutional neural networks with deterministic bit-stream processingrdquo in 2019 Design Automation Test in Europe Confer-ence Exhibition (DATE) 2019 pp 1757ndash1762

[46] S Liu and J Han ldquoDynamic stochastic computing for digital signalprocessing applicationsrdquo in Design Automation amp Test in EuropeConference amp Exhibition (DATE) 2020

[47] I L Dalal D Stefan and J Harwayne-Gidansky ldquoLow discrepancysequences for Monte Carlo simulations on reconfigurable platformsrdquo inInternational Conference on Application-Specific Systems Architecturesand Processors (ASAP) 2008 pp 108ndash113

[48] W G McCallum D Hughes-Hallett A M Gleason D O LomenD Lovelock J Tecosky-Feldman T W Tucker D Flath T ThrashK R Rhea et al Multivariable calculus Wiley 1997 vol 200 no 1

[49] S S Tehrani A Naderi G-A Kamendje S Hemati S Mannor andW J Gross ldquoMajority-based tracking forecast memories for stochasticLDPC decodingrdquo IEEE Transactions on Signal Processing vol 58no 9 pp 4883ndash4896 2010

[50] N Saraf K Bazargan D J Lilja and M D Riedel ldquoIIR filtersusing stochastic arithmeticrdquo in Design Automation amp Test in EuropeConference amp Exhibition (DATE) 2014 pp 1ndash6

[51] F Maloberti ldquoNon conventional signal processing by the use of sigmadelta technique a tutorial introductionrdquo in Proceedings of IEEE Inter-national Symposium on Circuits and Systems (ISCAS) vol 6 1992 pp2645ndash2648

[52] J C Butcher The Numerical Analysis of Ordinary Differential Equa-tions Runge-Kutta and General Linear Methods USA Wiley-Interscience 1987

[53] W Gerstner W M Kistler R Naud and L Paninski Neuronaldynamics From single neurons to networks and models of cognitionCambridge University Press 2014

[54] S Liu and J Han ldquoHardware ODE solvers using stochastic circuitsrdquo inProceedings of the 54th Annual Design Automation Conference (DAC)2017 pp 1ndash6

[55] S S Haykin Neural networks and learning machines Pearson UpperSaddle River NJ USA 2009 vol 3

[56] S Liu H Jiang L Liu and J Han ldquoGradient descent using stochasticcircuits for efficient training of learning machinesrdquo IEEE Transactionson Computer-Aided Design of Integrated Circuits and Systems vol 37no 11 pp 2530ndash2541 Nov 2018

[57] R Nishihara P Moritz S Wang A Tumanov W Paul J Schleier-Smith R Liaw M Niknami M I Jordan and I StoicaldquoReal-time machine learning The missing piecesrdquo in Proceedingsof the 16th Workshop on Hot Topics in Operating Systemsser HotOS rsquo17 ACM 2017 pp 106ndash110 [Online] Availablehttpdoiacmorg10114531029803102998

[58] J P Hayes ldquoIntroduction to stochastic computing and its challengesrdquo inProceedings of the 52nd Annual Design Automation Conference (DAC)2015 pp 591ndash593

Page 9: Introduction to Dynamic Stochastic Computing

to a cloud computer

V ERROR REDUCTION AND ASSESSMENT

One major issue in SC is the loss of accuracy [58] Indepen-dent RNGs have been used to avoid correlation and improvethe accuracy of components such as stochastic multipliersThis method inevitably increases the hardware overheadHowever in a stochastic integrator the sharing of RNGs forgenerating the two input sequences reduces the variance thusreducing the error This is shown as follows

The error introduced by DSC at each step of computation isrelated to the variance of the difference of the two accumulatedstochastic bits AiminusBi When independent RNGs are used togenerate Ai and Bi the variance at a single step is given by

Var[Ai minusBi] = E[(Ai minusBi minus E(Ai minusBi))2]

= E[(Ai minusBi)2]minus 2E[Ai minusBi]

(PAi minus PBi) + (PAi minus PBi)2

(12)

where PAi and PBi are the probabilities of Ai and Bi beinglsquo1rsquo respectively ie E[Ai] = PAi and E[Bi] = PBi Since(AiminusBi)

2 is 0 when Ai and Bi are equal or 1 when they aredifferent E[(AiminusBi)

2] equals PAi(1minusPBi)+PBi(1minusPAi)which is the probability that Ai and Bi are different when Ai

and Bi are independently generated1 (12) is then rewritten as

Varind[Ai minusBi] = PAi(1minus PBi) + PBi(1minus PAi)

minus (PAi minus PBi)2

= PAi(1minus PAi) + PBi(1minus PBi)

(13)

On the other hand if the same RNG is used to generate Ai

and Bi the variance of Ai minusBi can be derived from (12) aswell However since only a shared random number is used togenerate Ai and Bi E[(AiminusBi)

2] equals |PAiminusPBi| which

1E[(AiminusBi)2] = Prob[(AiminusBi)

2 = 1]times1+Prob[(AiminusBi)2 = 0]times0

0 5 10 15 20Epoch

0

01

02

03

04

05

Cro

ss e

ntro

py

Signed DSCBipolar DSCFixed-point

(a)

0 5 10 15 20Epoch

88

90

92

94

96

98

Test

ing

accu

racy

(in

perc

enta

ge)

Signed DSCBipolar DSCFixed-point

(b)

Fig 25 (a) Convergence of cross entropy and (b) testingaccuracy of MNIST hand-written digit recognition datasetCross entropy is the cost function to be minimized during thetraining of an NN The weights of the NN are updated by theDSC circuits The DSC circuits using the sign-magnitude andbipolar representations and the fixed-point implementation areconsidered for comparison

gives the probability that Ai and Bi are different Thus (12)can be rewritten as

Varshare[Ai minusBi] = |PAi minus PBi| minus (PAi minus PBi)2

= |PAi minus PBi|(1minus |PAi minus PBi|)(14)

Since Varshare[Ai minus Bi] minus Varind[Ai minus Bi] = 2PBiPAi minus2min(PAi PBi) le 0 we obtain Varshare[Ai minus Bi] leVarind[AiminusBi] for any i = 0 1 2 [54] Therefore sharingthe use of RNGs to generate input stochastic bit streamsimproves the accuracy

For the circuit in Fig 22 though the inputs of the stochasticintegrator are the outputs of the stochastic multipliers insteadof being directly generated by a DSNG the RNG sharingscheme still works in the generated DSSrsquos encoding δ+ andδminus ie it produces a smaller variance in the computed result[56] However an uncorrelated RNG must be used to generatethe DSS encoding the recorded output of the neuron obecause independency is still required in general for accuratestochastic multiplication

The accuracy of the accumulated result is also related to thebit-width of the counter in the stochastic integrator The bitwidth does not only determine the resolution of the result butalso the step size especially in the ODE solver adaptive filterand the gradient descent circuit A larger bit width indicatesa smaller resolution and a smaller step size for fine tuning atthe cost of extra computation time It is shown in [56] that themulti-step variance bound exponentially decreases with the bitwidth of the counter However for the IIR filter the changeof bit-width affects the characteristic of the filter

For the ODE solver circuit it is also found that the use ofquasirandom numbers improves the accuracy compared to theuse of pseudorandom numbers [54] However when the DSS isused to encode signals from the input or images such as in theadaptive filter and the gradient descent algorithm quasirandomand pseudorandom numbers produce similar accuracy Thereason may lie in the fact that the values of the encodedsignals in these two applications are independently selectedfrom the training dataset However for the ODE solver thesignals encoded are samples from continuous functions andadjacent samples have similar values As a result withina short segment of the DSS in this case it approximatelyencodes the same value It then works similarly as CSCfor which the Sobol sequence helps improving the accuracyFig 26 compares the convergence of misalignment duringthe training of the adaptive filter using pseudorandom andquasirandom numbers 16-bit stochastic integrators are usedin this experiment The misalignment is given by

Misalignment =E[(w minus w)2]

E[w2] (15)

where w denotes the target value and w is the trained resultIt can be seen that the convergence speed of using these twotypes of random numbers is similar and their misalignmentsboth converge to around -52 dB

TABLE II Hardware evaluation of the gradient descent circuitarray training a 784-128-128-10 neural network

Metrics DSC Fixed-point Ratio

Step size 2minus10 2minus10 -Epochs 20 20 -Min time (ns) 16times 106 47times 106 13EPO (fJ) 12times 107 11times 108 192TPA (imagesmicrom2) 57times 101 15 381Aver test Accu 9704 9749 -

TABLE III Hardware performance comparison of the ODEsolvers for (8)

Metric DSC Fixed-point Ratio

Step size 2minus8 2minus8 -Iterations 3217 3217 -Min time (ns) 257359 855720 133EPO (fJ) 20121 46600 123TPA (wordmicrosmicrom2) 475 058 821RMSE 47times 10minus3 36times 10minus3 -

VI HARDWARE EFFICIENCY EVALUATION

Due to the extremely low sequence length used for encodinga value in DSC the power consumption of DSC circuits ismuch lower than conventional arithmetic circuits Moreoversince each bit in a DSS is used to encode a number DSCis more energy-efficient than CSC The gradient descent andODE solver circuits are considered as examples to show thehardware efficiency Implemented in VHDL the circuits aresynthesized by Synopsys Design Compiler with a 28-nm tech-nology to obtain the power consumption maximum frequencyand hardware measurements The minimum runtime energyper operation (EPO) throughput per area (TPA) and accuracyare reported in Table II for the gradient descent circuits andin Table III for ODE solvers

As can be seen with a slightly degraded accuracy theDSC circuits outperform conventional circuits using a fixed-point representation in speed energy efficiency and hardwareefficiency For the gradient descent circuit since the two RNGscan be shared among the 118282 stochastic integrators (for118282 weights) the improvement in hardware efficiency iseven more significant than the ODE solver for which one RNG

0 05 1 15 2 25 3Number of steps ( 64 input samples) 104

-60

-50

-40

-30

-20

-10

0

Mis

alig

nmen

t (dB

)

PseudorandomQuasirandom

Fig 26 Comparison of the convergence of misalignment usingpseudorandom and quasirandom numbers

is shared between two stochastic integrators However sincethe sharing of RNGs does not significantly affect the criticalpath the performance improvements are similar for these twocircuits

VII CONCLUSION

In this article a new type of stochastic computing dynamicstochastic computing (DSC) is introduced to encode a varyingsignal by using a dynamically variable binary bit streamreferred to as a dynamic stochastic sequence (DSS) A DSCcircuit can efficiently implement an iterative accumulation-based algorithm It is useful in implementing the Euler methodtraining a neural network and an adaptive filter and IIRfiltering In those applications integrations are implementedby stochastic integrators that accumulate the stochastic bits in aDSS Since the computation is performed on single stochasticbits instead of conventional binary values or long stochasticsequences a significant improvement in hardware efficiencyis achieved with a high accuracy

ACKNOWLEDGMENT

This work was supported by the Natural Sciences andEngineering Research Council of Canada (NSERC) underProjects RES0025211 and RES0048688

REFERENCES

[1] B R Gaines Stochastic Computing Systems Boston MA SpringerUS 1969 pp 37ndash172

[2] W Qian X Li M D Riedel K Bazargan and D J Lilja ldquoAnarchitecture for fault-tolerant computation with stochastic logicrdquo IEEETransactions on Computers vol 60 no 1 pp 93ndash105 2011

[3] P Li and D J Lilja ldquoUsing stochastic computing to implementdigital image processing algorithmsrdquo in 2011 IEEE 29th InternationalConference on Computer Design (ICCD) 2011 pp 154ndash161

[4] S C Smithson N Onizawa B H Meyer W J Gross and T HanyuldquoEfficient CMOS invertible logic using stochastic computingrdquo IEEETransactions on Circuits and Systems I Regular Papers vol 66 no 6pp 2263ndash2274 June 2019

[5] W Maass and C M Bishop Pulsed neural networks MIT press 2001[6] Y Liu S Liu Y Wang F Lombardi and J Han ldquoA stochastic

computational multi-layer perceptron with backward propagationrdquo IEEETransactions on Computers vol 67 no 9 pp 1273ndash1286 2018

[7] J Zhao J Shawe-Taylor and M van Daalen ldquoLearning in stochastic bitstream neural networksrdquo Neural Networks vol 9 no 6 pp 991ndash9981996

[8] B D Brown and H C Card ldquoStochastic neural computation Icomputational elementsrdquo IEEE Transactions on Computers vol 50no 9 pp 891ndash905 2001

[9] D Zhang and H Li ldquoA stochastic-based FPGA controller for aninduction motor drive with integrated neural network algorithmsrdquo IEEETransactions on Industrial Electronics vol 55 no 2 pp 551ndash561 2008

[10] Y Ji F Ran C Ma and D J Lilja ldquoA hardware implementationof a radial basis function neural network using stochastic logicrdquo inProceedings of the Design Automation amp Test in Europe Conference(DATE) 2015 pp 880ndash883

[11] V Canals A Morro A Oliver M L Alomar and J L RosselloldquoA new stochastic computing methodology for efficient neural networkimplementationrdquo IEEE Transactions on Neural Networks and LearningSystems vol 27 no 3 pp 551ndash564 2016

[12] A Ardakani F Leduc-Primeau N Onizawa T Hanyu and W J GrossldquoVLSI implementation of deep neural network using integral stochasticcomputingrdquo IEEE Transactions on Very Large Scale Integration (VLSI)Systems vol 25 no 10 pp 2688ndash2699 2017

[13] A Ren Z Li C Ding Q Qiu Y Wang J Li X Qian and B YuanldquoSC-DCNN Highly-scalable deep convolutional neural network us-ing stochastic computingrdquo in Proceedings of the 22nd InternationalConference on Architectural Support for Programming Languages andOperating Systems (ASPLOS) 2017 pp 405ndash418

[14] V T Lee A Alaghi J P Hayes V Sathe and L Ceze ldquoEnergy-efficienthybrid stochastic-binary neural networks for near-sensor computingrdquo inDesign Automation amp Test in Europe Conference amp Exhibition (DATE)2017 pp 13ndash18

[15] R Hojabr K Givaki S Tayaranian P Esfahanian A Khonsari D Rah-mati and M H Najafi ldquoSkippyNN An embedded stochastic-computingaccelerator for convolutional neural networksrdquo in Proceedings of the56th Annual Design Automation Conference (DAC) 2019 pp 1ndash6

[16] Y Liu L Liu F Lombardi and J Han ldquoAn energy-efficient and noise-tolerant recurrent neural network using stochastic computingrdquo IEEETransactions on Very Large Scale Integration (VLSI) Systems vol 27no 9 pp 2213ndash2221 Sep 2019

[17] R Cai A Ren O Chen N Liu C Ding X Qian J Han W LuoN Yoshikawa and Y Wang ldquoA stochastic-computing based deeplearning framework using adiabatic quantum-flux-parametron supercon-ducting technologyrdquo in Proceedings of the 46th International Symposiumon Computer Architecture ser ISCA rsquo19 New York NY USA ACM2019 pp 567ndash578

[18] A Alaghi and J P Hayes ldquoSurvey of stochastic computingrdquo ACMTrans Embed Comput Syst vol 12 no 2s pp 921ndash9219 May 2013

[19] A Zhakatayev S Lee H Sim and J Lee ldquoSign-magnitude SC getting10x accuracy for free in stochastic computing for deep neural networksrdquoin Proceedings of the 55th Annual Design Automation Conference(DAC) 2018 pp 1ndash6

[20] P K Gupta and R Kumaresan ldquoBinary multiplication with PN se-quencesrdquo IEEE Transactions on Acoustics Speech and Signal Process-ing vol 36 no 4 pp 603ndash606 1988

[21] A Alaghi and J P Hayes ldquoFast and accurate computation using stochas-tic circuitsrdquo in Proceedings of the conference on Design Automation ampTest in Europe (DATE) 2014 pp 1ndash4

[22] S Liu and J Han ldquoEnergy efficient stochastic computing with Sobolsequencesrdquo in Design Automation amp Test in Europe Conference ampExhibition (DATE) 2017 pp 650ndash653

[23] H Niederreiter Random Number Generation and quasi-Monte CarloMethods Philadelphia PA USA Society for Industrial and AppliedMathematics 1992

[24] P Bratley and B L Fox ldquoAlgorithm 659 Implementing Sobolrsquosquasirandom sequence generatorrdquo ACM Transactions on MathematicalSoftware (TOMS) vol 14 no 1 pp 88ndash100 1988

[25] S Liu and J Han ldquoToward energy-efficient stochastic circuits usingparallel sobol sequencesrdquo IEEE Transactions on Very Large ScaleIntegration (VLSI) Systems vol 26 no 7 pp 1326ndash1339 July 2018

[26] A Alaghi and J P Hayes ldquoSTRAUSS Spectral transform use instochastic circuit synthesisrdquo IEEE Transactions on Computer-AidedDesign of Integrated Circuits and Systems vol 34 no 11 pp 1770ndash1783 2015

[27] P Li D J Lilja W Qian K Bazargan and M Riedel ldquoThe synthesisof complex arithmetic computation on stochastic bit streams usingsequential logicrdquo in Proceedings of the International Conference onComputer-Aided Design (ICCAD) 2012 pp 480ndash487

[28] P Mars and W J Poppelbaum Stochastic and deterministic averagingprocessors Peter Peregrinus Press 1981 no 1

[29] Z Li A Ren J Li Q Qiu Y Wang and B Yuan ldquoDSCNN Hardware-oriented optimization for stochastic computing based deep convolutionalneural networksrdquo in IEEE 34th International Conference on ComputerDesign (ICCD) 2016 pp 678ndash681

[30] A Ren Z Li Y Wang Q Qiu and B Yuan ldquoDesigning reconfigurablelarge-scale deep learning systems using stochastic computingrdquo in IEEEInternational Conference on Rebooting Computing (ICRC) 2016 pp1ndash7

[31] K Kim J Kim J Yu J Seo J Lee and K Choi ldquoDynamic energy-accuracy trade-off using stochastic computing in deep neural networksrdquoin Proceedings of the 53rd Annual Design Automation Conference(DAC) 2016 pp 1241ndash1246

[32] H Sim and J Lee ldquoA new stochastic computing multiplier withapplication to deep convolutional neural networksrdquo in Proceedings ofthe 54th Annual Design Automation Conference (DAC) 2017 pp 1ndash6

[33] Y Liu Y Wang F Lombardi and J Han ldquoAn energy-efficient online-learning stochastic computational deep belief networkrdquo IEEE Journal

on Emerging and Selected Topics in Circuits and Systems vol 8 no 3pp 454ndash465 Sep 2018

[34] J F Keane and L E Atlas ldquoImpulses and stochastic arithmetic forsignal processingrdquo in IEEE International Conference on AcousticsSpeech and Signal Processing (ICASSP) vol 2 2001 pp 1257ndash1260

[35] R Kuehnel ldquoBinomial logic extending stochastic computing to high-bandwidth signalsrdquo in Conference Record of the Thirty-Sixth AsilomarConference on Signals Systems and Computers vol 2 2002 pp 1089ndash1093

[36] N Yamamoto H Fujisaka K Haeiwa and T Kamio ldquoNanoelectroniccircuits for stochastic computingrdquo in 6th IEEE Conference on Nanotech-nology (IEEE-NANO) vol 1 2006 pp 306ndash309

[37] K K Parhi and Y Liu ldquoArchitectures for IIR digital filters usingstochastic computingrdquo in IEEE International Symposium on Circuitsand Systems (ISCAS) 2014 pp 373ndash376

[38] B Yuan Y Wang and Z Wang ldquoArea-efficient scaling-free DFTFFTdesign using stochastic computingrdquo IEEE Transactions on Circuits andSystems II Express Briefs vol 63 no 12 pp 1131ndash1135 2016

[39] R Wang J Han B F Cockburn and D G Elliott ldquoStochastic circuitdesign and performance evaluation of vector quantization for differenterror measuresrdquo IEEE Transactions on Very Large Scale Integration(VLSI) Systems vol 24 no 10 pp 3169ndash3183 2016

[40] H Ichihara T Sugino S Ishii T Iwagaki and T Inoue ldquoCompactand accurate digital filters based on stochastic computingrdquo IEEE Trans-actions on Emerging Topics in Computing vol 7 no 1 pp 31ndash432016

[41] T Hammadou M Nilson A Bermak and P Ogunbona ldquoA 96spltimes64 intelligent digital pixel array with extended binary stochasticarithmeticrdquo in Proceedings of the International Symposium on Circuitsand Systems (ISCAS) 2003

[42] A Alaghi C Li and J P Hayes ldquoStochastic circuits for real-timeimage-processing applicationsrdquo in Proceedings of the 50th AnnualDesign Automation Conference (DAC) 2013 pp 1361ndash1366

[43] P Li D J Lilja W Qian K Bazargan and M D Riedel ldquoComputationon stochastic bit streams digital image processing case studiesrdquo IEEETransactions on Very Large Scale Integration (VLSI) Systems vol 22no 3 pp 449ndash462 2014

[44] M H Najafi S Jamali-Zavareh D J Lilja M D Riedel K Bazarganand R Harjani ldquoTime-encoded values for highly efficient stochasticcircuitsrdquo IEEE Transactions on Very Large Scale Integration (VLSI)Systems vol 25 no 5 pp 1644ndash1657 2017

[45] S R Faraji M Hassan Najafi B Li D J Lilja and K BazarganldquoEnergy-efficient convolutional neural networks with deterministic bit-stream processingrdquo in 2019 Design Automation Test in Europe Confer-ence Exhibition (DATE) 2019 pp 1757ndash1762

[46] S Liu and J Han ldquoDynamic stochastic computing for digital signalprocessing applicationsrdquo in Design Automation amp Test in EuropeConference amp Exhibition (DATE) 2020

[47] I L Dalal D Stefan and J Harwayne-Gidansky ldquoLow discrepancysequences for Monte Carlo simulations on reconfigurable platformsrdquo inInternational Conference on Application-Specific Systems Architecturesand Processors (ASAP) 2008 pp 108ndash113

[48] W G McCallum D Hughes-Hallett A M Gleason D O LomenD Lovelock J Tecosky-Feldman T W Tucker D Flath T ThrashK R Rhea et al Multivariable calculus Wiley 1997 vol 200 no 1

[49] S S Tehrani A Naderi G-A Kamendje S Hemati S Mannor andW J Gross ldquoMajority-based tracking forecast memories for stochasticLDPC decodingrdquo IEEE Transactions on Signal Processing vol 58no 9 pp 4883ndash4896 2010

[50] N Saraf K Bazargan D J Lilja and M D Riedel ldquoIIR filtersusing stochastic arithmeticrdquo in Design Automation amp Test in EuropeConference amp Exhibition (DATE) 2014 pp 1ndash6

[51] F Maloberti ldquoNon conventional signal processing by the use of sigmadelta technique a tutorial introductionrdquo in Proceedings of IEEE Inter-national Symposium on Circuits and Systems (ISCAS) vol 6 1992 pp2645ndash2648

[52] J C Butcher The Numerical Analysis of Ordinary Differential Equa-tions Runge-Kutta and General Linear Methods USA Wiley-Interscience 1987

[53] W Gerstner W M Kistler R Naud and L Paninski Neuronaldynamics From single neurons to networks and models of cognitionCambridge University Press 2014

[54] S Liu and J Han ldquoHardware ODE solvers using stochastic circuitsrdquo inProceedings of the 54th Annual Design Automation Conference (DAC)2017 pp 1ndash6

[55] S S Haykin Neural networks and learning machines Pearson UpperSaddle River NJ USA 2009 vol 3

[56] S Liu H Jiang L Liu and J Han ldquoGradient descent using stochasticcircuits for efficient training of learning machinesrdquo IEEE Transactionson Computer-Aided Design of Integrated Circuits and Systems vol 37no 11 pp 2530ndash2541 Nov 2018

[57] R Nishihara P Moritz S Wang A Tumanov W Paul J Schleier-Smith R Liaw M Niknami M I Jordan and I StoicaldquoReal-time machine learning The missing piecesrdquo in Proceedingsof the 16th Workshop on Hot Topics in Operating Systemsser HotOS rsquo17 ACM 2017 pp 106ndash110 [Online] Availablehttpdoiacmorg10114531029803102998

[58] J P Hayes ldquoIntroduction to stochastic computing and its challengesrdquo inProceedings of the 52nd Annual Design Automation Conference (DAC)2015 pp 591ndash593

Page 10: Introduction to Dynamic Stochastic Computing

TABLE II Hardware evaluation of the gradient descent circuitarray training a 784-128-128-10 neural network

Metrics DSC Fixed-point Ratio

Step size 2minus10 2minus10 -Epochs 20 20 -Min time (ns) 16times 106 47times 106 13EPO (fJ) 12times 107 11times 108 192TPA (imagesmicrom2) 57times 101 15 381Aver test Accu 9704 9749 -

TABLE III Hardware performance comparison of the ODEsolvers for (8)

Metric DSC Fixed-point Ratio

Step size 2minus8 2minus8 -Iterations 3217 3217 -Min time (ns) 257359 855720 133EPO (fJ) 20121 46600 123TPA (wordmicrosmicrom2) 475 058 821RMSE 47times 10minus3 36times 10minus3 -

VI HARDWARE EFFICIENCY EVALUATION

Due to the extremely low sequence length used for encodinga value in DSC the power consumption of DSC circuits ismuch lower than conventional arithmetic circuits Moreoversince each bit in a DSS is used to encode a number DSCis more energy-efficient than CSC The gradient descent andODE solver circuits are considered as examples to show thehardware efficiency Implemented in VHDL the circuits aresynthesized by Synopsys Design Compiler with a 28-nm tech-nology to obtain the power consumption maximum frequencyand hardware measurements The minimum runtime energyper operation (EPO) throughput per area (TPA) and accuracyare reported in Table II for the gradient descent circuits andin Table III for ODE solvers

As can be seen with a slightly degraded accuracy theDSC circuits outperform conventional circuits using a fixed-point representation in speed energy efficiency and hardwareefficiency For the gradient descent circuit since the two RNGscan be shared among the 118282 stochastic integrators (for118282 weights) the improvement in hardware efficiency iseven more significant than the ODE solver for which one RNG

0 05 1 15 2 25 3Number of steps ( 64 input samples) 104

-60

-50

-40

-30

-20

-10

0

Mis

alig

nmen

t (dB

)

PseudorandomQuasirandom

Fig 26 Comparison of the convergence of misalignment usingpseudorandom and quasirandom numbers

is shared between two stochastic integrators However sincethe sharing of RNGs does not significantly affect the criticalpath the performance improvements are similar for these twocircuits

VII CONCLUSION

In this article a new type of stochastic computing dynamicstochastic computing (DSC) is introduced to encode a varyingsignal by using a dynamically variable binary bit streamreferred to as a dynamic stochastic sequence (DSS) A DSCcircuit can efficiently implement an iterative accumulation-based algorithm It is useful in implementing the Euler methodtraining a neural network and an adaptive filter and IIRfiltering In those applications integrations are implementedby stochastic integrators that accumulate the stochastic bits in aDSS Since the computation is performed on single stochasticbits instead of conventional binary values or long stochasticsequences a significant improvement in hardware efficiencyis achieved with a high accuracy

ACKNOWLEDGMENT

This work was supported by the Natural Sciences andEngineering Research Council of Canada (NSERC) underProjects RES0025211 and RES0048688

REFERENCES

[1] B R Gaines Stochastic Computing Systems Boston MA SpringerUS 1969 pp 37ndash172

[2] W Qian X Li M D Riedel K Bazargan and D J Lilja ldquoAnarchitecture for fault-tolerant computation with stochastic logicrdquo IEEETransactions on Computers vol 60 no 1 pp 93ndash105 2011

[3] P Li and D J Lilja ldquoUsing stochastic computing to implementdigital image processing algorithmsrdquo in 2011 IEEE 29th InternationalConference on Computer Design (ICCD) 2011 pp 154ndash161

[4] S C Smithson N Onizawa B H Meyer W J Gross and T HanyuldquoEfficient CMOS invertible logic using stochastic computingrdquo IEEETransactions on Circuits and Systems I Regular Papers vol 66 no 6pp 2263ndash2274 June 2019

[5] W Maass and C M Bishop Pulsed neural networks MIT press 2001[6] Y Liu S Liu Y Wang F Lombardi and J Han ldquoA stochastic

computational multi-layer perceptron with backward propagationrdquo IEEETransactions on Computers vol 67 no 9 pp 1273ndash1286 2018

[7] J Zhao J Shawe-Taylor and M van Daalen ldquoLearning in stochastic bitstream neural networksrdquo Neural Networks vol 9 no 6 pp 991ndash9981996

[8] B D Brown and H C Card ldquoStochastic neural computation Icomputational elementsrdquo IEEE Transactions on Computers vol 50no 9 pp 891ndash905 2001

[9] D Zhang and H Li ldquoA stochastic-based FPGA controller for aninduction motor drive with integrated neural network algorithmsrdquo IEEETransactions on Industrial Electronics vol 55 no 2 pp 551ndash561 2008

[10] Y Ji F Ran C Ma and D J Lilja ldquoA hardware implementationof a radial basis function neural network using stochastic logicrdquo inProceedings of the Design Automation amp Test in Europe Conference(DATE) 2015 pp 880ndash883

[11] V Canals A Morro A Oliver M L Alomar and J L RosselloldquoA new stochastic computing methodology for efficient neural networkimplementationrdquo IEEE Transactions on Neural Networks and LearningSystems vol 27 no 3 pp 551ndash564 2016

[12] A Ardakani F Leduc-Primeau N Onizawa T Hanyu and W J GrossldquoVLSI implementation of deep neural network using integral stochasticcomputingrdquo IEEE Transactions on Very Large Scale Integration (VLSI)Systems vol 25 no 10 pp 2688ndash2699 2017

[13] A Ren Z Li C Ding Q Qiu Y Wang J Li X Qian and B YuanldquoSC-DCNN Highly-scalable deep convolutional neural network us-ing stochastic computingrdquo in Proceedings of the 22nd InternationalConference on Architectural Support for Programming Languages andOperating Systems (ASPLOS) 2017 pp 405ndash418

[14] V T Lee A Alaghi J P Hayes V Sathe and L Ceze ldquoEnergy-efficienthybrid stochastic-binary neural networks for near-sensor computingrdquo inDesign Automation amp Test in Europe Conference amp Exhibition (DATE)2017 pp 13ndash18

[15] R Hojabr K Givaki S Tayaranian P Esfahanian A Khonsari D Rah-mati and M H Najafi ldquoSkippyNN An embedded stochastic-computingaccelerator for convolutional neural networksrdquo in Proceedings of the56th Annual Design Automation Conference (DAC) 2019 pp 1ndash6

[16] Y Liu L Liu F Lombardi and J Han ldquoAn energy-efficient and noise-tolerant recurrent neural network using stochastic computingrdquo IEEETransactions on Very Large Scale Integration (VLSI) Systems vol 27no 9 pp 2213ndash2221 Sep 2019

[17] R Cai A Ren O Chen N Liu C Ding X Qian J Han W LuoN Yoshikawa and Y Wang ldquoA stochastic-computing based deeplearning framework using adiabatic quantum-flux-parametron supercon-ducting technologyrdquo in Proceedings of the 46th International Symposiumon Computer Architecture ser ISCA rsquo19 New York NY USA ACM2019 pp 567ndash578

[18] A Alaghi and J P Hayes ldquoSurvey of stochastic computingrdquo ACMTrans Embed Comput Syst vol 12 no 2s pp 921ndash9219 May 2013

[19] A Zhakatayev S Lee H Sim and J Lee ldquoSign-magnitude SC getting10x accuracy for free in stochastic computing for deep neural networksrdquoin Proceedings of the 55th Annual Design Automation Conference(DAC) 2018 pp 1ndash6

[20] P K Gupta and R Kumaresan ldquoBinary multiplication with PN se-quencesrdquo IEEE Transactions on Acoustics Speech and Signal Process-ing vol 36 no 4 pp 603ndash606 1988

[21] A Alaghi and J P Hayes ldquoFast and accurate computation using stochas-tic circuitsrdquo in Proceedings of the conference on Design Automation ampTest in Europe (DATE) 2014 pp 1ndash4

[22] S Liu and J Han ldquoEnergy efficient stochastic computing with Sobolsequencesrdquo in Design Automation amp Test in Europe Conference ampExhibition (DATE) 2017 pp 650ndash653

[23] H Niederreiter Random Number Generation and quasi-Monte CarloMethods Philadelphia PA USA Society for Industrial and AppliedMathematics 1992

[24] P Bratley and B L Fox ldquoAlgorithm 659 Implementing Sobolrsquosquasirandom sequence generatorrdquo ACM Transactions on MathematicalSoftware (TOMS) vol 14 no 1 pp 88ndash100 1988

[25] S Liu and J Han ldquoToward energy-efficient stochastic circuits usingparallel sobol sequencesrdquo IEEE Transactions on Very Large ScaleIntegration (VLSI) Systems vol 26 no 7 pp 1326ndash1339 July 2018

[26] A Alaghi and J P Hayes ldquoSTRAUSS Spectral transform use instochastic circuit synthesisrdquo IEEE Transactions on Computer-AidedDesign of Integrated Circuits and Systems vol 34 no 11 pp 1770ndash1783 2015

[27] P Li D J Lilja W Qian K Bazargan and M Riedel ldquoThe synthesisof complex arithmetic computation on stochastic bit streams usingsequential logicrdquo in Proceedings of the International Conference onComputer-Aided Design (ICCAD) 2012 pp 480ndash487

[28] P Mars and W J Poppelbaum Stochastic and deterministic averagingprocessors Peter Peregrinus Press 1981 no 1

[29] Z Li A Ren J Li Q Qiu Y Wang and B Yuan ldquoDSCNN Hardware-oriented optimization for stochastic computing based deep convolutionalneural networksrdquo in IEEE 34th International Conference on ComputerDesign (ICCD) 2016 pp 678ndash681

[30] A Ren Z Li Y Wang Q Qiu and B Yuan ldquoDesigning reconfigurablelarge-scale deep learning systems using stochastic computingrdquo in IEEEInternational Conference on Rebooting Computing (ICRC) 2016 pp1ndash7

[31] K Kim J Kim J Yu J Seo J Lee and K Choi ldquoDynamic energy-accuracy trade-off using stochastic computing in deep neural networksrdquoin Proceedings of the 53rd Annual Design Automation Conference(DAC) 2016 pp 1241ndash1246

[32] H Sim and J Lee ldquoA new stochastic computing multiplier withapplication to deep convolutional neural networksrdquo in Proceedings ofthe 54th Annual Design Automation Conference (DAC) 2017 pp 1ndash6

[33] Y Liu Y Wang F Lombardi and J Han ldquoAn energy-efficient online-learning stochastic computational deep belief networkrdquo IEEE Journal

on Emerging and Selected Topics in Circuits and Systems vol 8 no 3pp 454ndash465 Sep 2018

[34] J F Keane and L E Atlas ldquoImpulses and stochastic arithmetic forsignal processingrdquo in IEEE International Conference on AcousticsSpeech and Signal Processing (ICASSP) vol 2 2001 pp 1257ndash1260

[35] R Kuehnel ldquoBinomial logic extending stochastic computing to high-bandwidth signalsrdquo in Conference Record of the Thirty-Sixth AsilomarConference on Signals Systems and Computers vol 2 2002 pp 1089ndash1093

[36] N Yamamoto H Fujisaka K Haeiwa and T Kamio ldquoNanoelectroniccircuits for stochastic computingrdquo in 6th IEEE Conference on Nanotech-nology (IEEE-NANO) vol 1 2006 pp 306ndash309

[37] K K Parhi and Y Liu ldquoArchitectures for IIR digital filters usingstochastic computingrdquo in IEEE International Symposium on Circuitsand Systems (ISCAS) 2014 pp 373ndash376

[38] B Yuan Y Wang and Z Wang ldquoArea-efficient scaling-free DFTFFTdesign using stochastic computingrdquo IEEE Transactions on Circuits andSystems II Express Briefs vol 63 no 12 pp 1131ndash1135 2016

[39] R Wang J Han B F Cockburn and D G Elliott ldquoStochastic circuitdesign and performance evaluation of vector quantization for differenterror measuresrdquo IEEE Transactions on Very Large Scale Integration(VLSI) Systems vol 24 no 10 pp 3169ndash3183 2016

[40] H Ichihara T Sugino S Ishii T Iwagaki and T Inoue ldquoCompactand accurate digital filters based on stochastic computingrdquo IEEE Trans-actions on Emerging Topics in Computing vol 7 no 1 pp 31ndash432016

[41] T Hammadou M Nilson A Bermak and P Ogunbona ldquoA 96spltimes64 intelligent digital pixel array with extended binary stochasticarithmeticrdquo in Proceedings of the International Symposium on Circuitsand Systems (ISCAS) 2003

[42] A Alaghi C Li and J P Hayes ldquoStochastic circuits for real-timeimage-processing applicationsrdquo in Proceedings of the 50th AnnualDesign Automation Conference (DAC) 2013 pp 1361ndash1366

[43] P Li D J Lilja W Qian K Bazargan and M D Riedel ldquoComputationon stochastic bit streams digital image processing case studiesrdquo IEEETransactions on Very Large Scale Integration (VLSI) Systems vol 22no 3 pp 449ndash462 2014

[44] M H Najafi S Jamali-Zavareh D J Lilja M D Riedel K Bazarganand R Harjani ldquoTime-encoded values for highly efficient stochasticcircuitsrdquo IEEE Transactions on Very Large Scale Integration (VLSI)Systems vol 25 no 5 pp 1644ndash1657 2017

[45] S R Faraji M Hassan Najafi B Li D J Lilja and K BazarganldquoEnergy-efficient convolutional neural networks with deterministic bit-stream processingrdquo in 2019 Design Automation Test in Europe Confer-ence Exhibition (DATE) 2019 pp 1757ndash1762

[46] S Liu and J Han ldquoDynamic stochastic computing for digital signalprocessing applicationsrdquo in Design Automation amp Test in EuropeConference amp Exhibition (DATE) 2020

[47] I L Dalal D Stefan and J Harwayne-Gidansky ldquoLow discrepancysequences for Monte Carlo simulations on reconfigurable platformsrdquo inInternational Conference on Application-Specific Systems Architecturesand Processors (ASAP) 2008 pp 108ndash113

[48] W G McCallum D Hughes-Hallett A M Gleason D O LomenD Lovelock J Tecosky-Feldman T W Tucker D Flath T ThrashK R Rhea et al Multivariable calculus Wiley 1997 vol 200 no 1

[49] S S Tehrani A Naderi G-A Kamendje S Hemati S Mannor andW J Gross ldquoMajority-based tracking forecast memories for stochasticLDPC decodingrdquo IEEE Transactions on Signal Processing vol 58no 9 pp 4883ndash4896 2010

[50] N Saraf K Bazargan D J Lilja and M D Riedel ldquoIIR filtersusing stochastic arithmeticrdquo in Design Automation amp Test in EuropeConference amp Exhibition (DATE) 2014 pp 1ndash6

[51] F Maloberti ldquoNon conventional signal processing by the use of sigmadelta technique a tutorial introductionrdquo in Proceedings of IEEE Inter-national Symposium on Circuits and Systems (ISCAS) vol 6 1992 pp2645ndash2648

[52] J C Butcher The Numerical Analysis of Ordinary Differential Equa-tions Runge-Kutta and General Linear Methods USA Wiley-Interscience 1987

[53] W Gerstner W M Kistler R Naud and L Paninski Neuronaldynamics From single neurons to networks and models of cognitionCambridge University Press 2014

[54] S Liu and J Han ldquoHardware ODE solvers using stochastic circuitsrdquo inProceedings of the 54th Annual Design Automation Conference (DAC)2017 pp 1ndash6

[55] S S Haykin Neural networks and learning machines Pearson UpperSaddle River NJ USA 2009 vol 3

[56] S Liu H Jiang L Liu and J Han ldquoGradient descent using stochasticcircuits for efficient training of learning machinesrdquo IEEE Transactionson Computer-Aided Design of Integrated Circuits and Systems vol 37no 11 pp 2530ndash2541 Nov 2018

[57] R Nishihara P Moritz S Wang A Tumanov W Paul J Schleier-Smith R Liaw M Niknami M I Jordan and I StoicaldquoReal-time machine learning The missing piecesrdquo in Proceedingsof the 16th Workshop on Hot Topics in Operating Systemsser HotOS rsquo17 ACM 2017 pp 106ndash110 [Online] Availablehttpdoiacmorg10114531029803102998

[58] J P Hayes ldquoIntroduction to stochastic computing and its challengesrdquo inProceedings of the 52nd Annual Design Automation Conference (DAC)2015 pp 591ndash593

Page 11: Introduction to Dynamic Stochastic Computing

[13] A Ren Z Li C Ding Q Qiu Y Wang J Li X Qian and B YuanldquoSC-DCNN Highly-scalable deep convolutional neural network us-ing stochastic computingrdquo in Proceedings of the 22nd InternationalConference on Architectural Support for Programming Languages andOperating Systems (ASPLOS) 2017 pp 405ndash418

[14] V T Lee A Alaghi J P Hayes V Sathe and L Ceze ldquoEnergy-efficienthybrid stochastic-binary neural networks for near-sensor computingrdquo inDesign Automation amp Test in Europe Conference amp Exhibition (DATE)2017 pp 13ndash18

[15] R Hojabr K Givaki S Tayaranian P Esfahanian A Khonsari D Rah-mati and M H Najafi ldquoSkippyNN An embedded stochastic-computingaccelerator for convolutional neural networksrdquo in Proceedings of the56th Annual Design Automation Conference (DAC) 2019 pp 1ndash6

[16] Y Liu L Liu F Lombardi and J Han ldquoAn energy-efficient and noise-tolerant recurrent neural network using stochastic computingrdquo IEEETransactions on Very Large Scale Integration (VLSI) Systems vol 27no 9 pp 2213ndash2221 Sep 2019

[17] R Cai A Ren O Chen N Liu C Ding X Qian J Han W LuoN Yoshikawa and Y Wang ldquoA stochastic-computing based deeplearning framework using adiabatic quantum-flux-parametron supercon-ducting technologyrdquo in Proceedings of the 46th International Symposiumon Computer Architecture ser ISCA rsquo19 New York NY USA ACM2019 pp 567ndash578

[18] A Alaghi and J P Hayes ldquoSurvey of stochastic computingrdquo ACMTrans Embed Comput Syst vol 12 no 2s pp 921ndash9219 May 2013

[19] A Zhakatayev S Lee H Sim and J Lee ldquoSign-magnitude SC getting10x accuracy for free in stochastic computing for deep neural networksrdquoin Proceedings of the 55th Annual Design Automation Conference(DAC) 2018 pp 1ndash6

[20] P K Gupta and R Kumaresan ldquoBinary multiplication with PN se-quencesrdquo IEEE Transactions on Acoustics Speech and Signal Process-ing vol 36 no 4 pp 603ndash606 1988

[21] A Alaghi and J P Hayes ldquoFast and accurate computation using stochas-tic circuitsrdquo in Proceedings of the conference on Design Automation ampTest in Europe (DATE) 2014 pp 1ndash4

[22] S Liu and J Han ldquoEnergy efficient stochastic computing with Sobolsequencesrdquo in Design Automation amp Test in Europe Conference ampExhibition (DATE) 2017 pp 650ndash653

[23] H Niederreiter Random Number Generation and quasi-Monte CarloMethods Philadelphia PA USA Society for Industrial and AppliedMathematics 1992

[24] P Bratley and B L Fox ldquoAlgorithm 659 Implementing Sobolrsquosquasirandom sequence generatorrdquo ACM Transactions on MathematicalSoftware (TOMS) vol 14 no 1 pp 88ndash100 1988

[25] S Liu and J Han ldquoToward energy-efficient stochastic circuits usingparallel sobol sequencesrdquo IEEE Transactions on Very Large ScaleIntegration (VLSI) Systems vol 26 no 7 pp 1326ndash1339 July 2018

[26] A Alaghi and J P Hayes ldquoSTRAUSS Spectral transform use instochastic circuit synthesisrdquo IEEE Transactions on Computer-AidedDesign of Integrated Circuits and Systems vol 34 no 11 pp 1770ndash1783 2015

[27] P Li D J Lilja W Qian K Bazargan and M Riedel ldquoThe synthesisof complex arithmetic computation on stochastic bit streams usingsequential logicrdquo in Proceedings of the International Conference onComputer-Aided Design (ICCAD) 2012 pp 480ndash487

[28] P Mars and W J Poppelbaum Stochastic and deterministic averagingprocessors Peter Peregrinus Press 1981 no 1

[29] Z Li A Ren J Li Q Qiu Y Wang and B Yuan ldquoDSCNN Hardware-oriented optimization for stochastic computing based deep convolutionalneural networksrdquo in IEEE 34th International Conference on ComputerDesign (ICCD) 2016 pp 678ndash681

[30] A Ren Z Li Y Wang Q Qiu and B Yuan ldquoDesigning reconfigurablelarge-scale deep learning systems using stochastic computingrdquo in IEEEInternational Conference on Rebooting Computing (ICRC) 2016 pp1ndash7

[31] K Kim J Kim J Yu J Seo J Lee and K Choi ldquoDynamic energy-accuracy trade-off using stochastic computing in deep neural networksrdquoin Proceedings of the 53rd Annual Design Automation Conference(DAC) 2016 pp 1241ndash1246

[32] H Sim and J Lee ldquoA new stochastic computing multiplier withapplication to deep convolutional neural networksrdquo in Proceedings ofthe 54th Annual Design Automation Conference (DAC) 2017 pp 1ndash6

[33] Y Liu Y Wang F Lombardi and J Han ldquoAn energy-efficient online-learning stochastic computational deep belief networkrdquo IEEE Journal

on Emerging and Selected Topics in Circuits and Systems vol 8 no 3pp 454ndash465 Sep 2018

[34] J F Keane and L E Atlas ldquoImpulses and stochastic arithmetic forsignal processingrdquo in IEEE International Conference on AcousticsSpeech and Signal Processing (ICASSP) vol 2 2001 pp 1257ndash1260

[35] R Kuehnel ldquoBinomial logic extending stochastic computing to high-bandwidth signalsrdquo in Conference Record of the Thirty-Sixth AsilomarConference on Signals Systems and Computers vol 2 2002 pp 1089ndash1093

[36] N Yamamoto H Fujisaka K Haeiwa and T Kamio ldquoNanoelectroniccircuits for stochastic computingrdquo in 6th IEEE Conference on Nanotech-nology (IEEE-NANO) vol 1 2006 pp 306ndash309

[37] K K Parhi and Y Liu ldquoArchitectures for IIR digital filters usingstochastic computingrdquo in IEEE International Symposium on Circuitsand Systems (ISCAS) 2014 pp 373ndash376

[38] B Yuan Y Wang and Z Wang ldquoArea-efficient scaling-free DFTFFTdesign using stochastic computingrdquo IEEE Transactions on Circuits andSystems II Express Briefs vol 63 no 12 pp 1131ndash1135 2016

[39] R Wang J Han B F Cockburn and D G Elliott ldquoStochastic circuitdesign and performance evaluation of vector quantization for differenterror measuresrdquo IEEE Transactions on Very Large Scale Integration(VLSI) Systems vol 24 no 10 pp 3169ndash3183 2016

[40] H Ichihara T Sugino S Ishii T Iwagaki and T Inoue ldquoCompactand accurate digital filters based on stochastic computingrdquo IEEE Trans-actions on Emerging Topics in Computing vol 7 no 1 pp 31ndash432016

[41] T Hammadou M Nilson A Bermak and P Ogunbona ldquoA 96spltimes64 intelligent digital pixel array with extended binary stochasticarithmeticrdquo in Proceedings of the International Symposium on Circuitsand Systems (ISCAS) 2003

[42] A Alaghi C Li and J P Hayes ldquoStochastic circuits for real-timeimage-processing applicationsrdquo in Proceedings of the 50th AnnualDesign Automation Conference (DAC) 2013 pp 1361ndash1366

[43] P Li D J Lilja W Qian K Bazargan and M D Riedel ldquoComputationon stochastic bit streams digital image processing case studiesrdquo IEEETransactions on Very Large Scale Integration (VLSI) Systems vol 22no 3 pp 449ndash462 2014

[44] M H Najafi S Jamali-Zavareh D J Lilja M D Riedel K Bazarganand R Harjani ldquoTime-encoded values for highly efficient stochasticcircuitsrdquo IEEE Transactions on Very Large Scale Integration (VLSI)Systems vol 25 no 5 pp 1644ndash1657 2017

[45] S R Faraji M Hassan Najafi B Li D J Lilja and K BazarganldquoEnergy-efficient convolutional neural networks with deterministic bit-stream processingrdquo in 2019 Design Automation Test in Europe Confer-ence Exhibition (DATE) 2019 pp 1757ndash1762

[46] S Liu and J Han ldquoDynamic stochastic computing for digital signalprocessing applicationsrdquo in Design Automation amp Test in EuropeConference amp Exhibition (DATE) 2020

[47] I L Dalal D Stefan and J Harwayne-Gidansky ldquoLow discrepancysequences for Monte Carlo simulations on reconfigurable platformsrdquo inInternational Conference on Application-Specific Systems Architecturesand Processors (ASAP) 2008 pp 108ndash113

[48] W G McCallum D Hughes-Hallett A M Gleason D O LomenD Lovelock J Tecosky-Feldman T W Tucker D Flath T ThrashK R Rhea et al Multivariable calculus Wiley 1997 vol 200 no 1

[49] S S Tehrani A Naderi G-A Kamendje S Hemati S Mannor andW J Gross ldquoMajority-based tracking forecast memories for stochasticLDPC decodingrdquo IEEE Transactions on Signal Processing vol 58no 9 pp 4883ndash4896 2010

[50] N Saraf K Bazargan D J Lilja and M D Riedel ldquoIIR filtersusing stochastic arithmeticrdquo in Design Automation amp Test in EuropeConference amp Exhibition (DATE) 2014 pp 1ndash6

[51] F Maloberti ldquoNon conventional signal processing by the use of sigmadelta technique a tutorial introductionrdquo in Proceedings of IEEE Inter-national Symposium on Circuits and Systems (ISCAS) vol 6 1992 pp2645ndash2648

[52] J C Butcher The Numerical Analysis of Ordinary Differential Equa-tions Runge-Kutta and General Linear Methods USA Wiley-Interscience 1987

[53] W Gerstner W M Kistler R Naud and L Paninski Neuronaldynamics From single neurons to networks and models of cognitionCambridge University Press 2014

[54] S Liu and J Han ldquoHardware ODE solvers using stochastic circuitsrdquo inProceedings of the 54th Annual Design Automation Conference (DAC)2017 pp 1ndash6

[55] S S Haykin Neural networks and learning machines Pearson UpperSaddle River NJ USA 2009 vol 3

[56] S Liu H Jiang L Liu and J Han ldquoGradient descent using stochasticcircuits for efficient training of learning machinesrdquo IEEE Transactionson Computer-Aided Design of Integrated Circuits and Systems vol 37no 11 pp 2530ndash2541 Nov 2018

[57] R Nishihara P Moritz S Wang A Tumanov W Paul J Schleier-Smith R Liaw M Niknami M I Jordan and I StoicaldquoReal-time machine learning The missing piecesrdquo in Proceedingsof the 16th Workshop on Hot Topics in Operating Systemsser HotOS rsquo17 ACM 2017 pp 106ndash110 [Online] Availablehttpdoiacmorg10114531029803102998

[58] J P Hayes ldquoIntroduction to stochastic computing and its challengesrdquo inProceedings of the 52nd Annual Design Automation Conference (DAC)2015 pp 591ndash593

Page 12: Introduction to Dynamic Stochastic Computing

[54] S Liu and J Han ldquoHardware ODE solvers using stochastic circuitsrdquo inProceedings of the 54th Annual Design Automation Conference (DAC)2017 pp 1ndash6

[55] S S Haykin Neural networks and learning machines Pearson UpperSaddle River NJ USA 2009 vol 3

[56] S Liu H Jiang L Liu and J Han ldquoGradient descent using stochasticcircuits for efficient training of learning machinesrdquo IEEE Transactionson Computer-Aided Design of Integrated Circuits and Systems vol 37no 11 pp 2530ndash2541 Nov 2018

[57] R Nishihara P Moritz S Wang A Tumanov W Paul J Schleier-Smith R Liaw M Niknami M I Jordan and I StoicaldquoReal-time machine learning The missing piecesrdquo in Proceedingsof the 16th Workshop on Hot Topics in Operating Systemsser HotOS rsquo17 ACM 2017 pp 106ndash110 [Online] Availablehttpdoiacmorg10114531029803102998

[58] J P Hayes ldquoIntroduction to stochastic computing and its challengesrdquo inProceedings of the 52nd Annual Design Automation Conference (DAC)2015 pp 591ndash593