ELEC692 VLSI Signal Processing Architecture
Lecture 10: Viterbi Decoder Architecture

Outline
• Convolutional Code Structure
  – Encoder structure
  – Finite state machine representation
  – Trellis diagram
• Decoding Algorithm
  – Viterbi decoder
• Viterbi Decoder VLSI Architecture

Convolutional Codes
• Coding – add redundancy to the original data bits for error checking or error correction
  – E.g. error checking – parity check code
  – Error correction code
• Codes are either block codes or convolutional codes. The classification depends on the presence or absence of memory.
• A block code has no memory.
  – Each output codeword of an (n,k) block code depends only on the current buffer.
  – k is the number of original data bits and n is the number of encoded bits.
  – The encoder adds n−k redundant bits to the buffered bits. The added bits are algebraically related to the buffered bits.
  – The encoded block contains n bits.
  – The ratio k/n is known as the code rate.

Convolutional Codes
• A convolutional coder may process one or more samples during an encoding cycle.
  – It is described by 3 integers: n, k, and K.
  – k/n = code rate (information bits / coded bits).
  – But n does not define a block or codeword length.
  – K is the constraint length and is a measure of the code redundancy.
  – The encoder acts on the serial bit stream as it enters the transmitter.
  – Convolutional codes have memory.
  – The n-tuple emitted by the encoder is not only a function of the current input k-tuple, but is also a function of the previous K−1 input k-tuples.

Encoder Structure
• Map k bits to n bits using the previous (K−1)k bits
• Rate k/n code with constraint length K
• n generators (polynomials), each a binary vector K bits long
• The following shows the case where k = 1 (easily extendable)
• Example: k = 1, n = 2, K = 3, g1 = [101] = 1 + z⁻², g2 = [111] = 1 + z⁻¹ + z⁻²
(Figure: shift register with two mod-2 adders; input (b1, b2, …), output (c1, c2, c3, c4, …))

Basic Channel Coding for Wideband CDMA
• Convolutional codes of rate 1/3 and rate 1/2 are used, all with constraint length 9.
(Figure: channel coding options – convolutional codes and concatenated codes)

Convolutional Encoding
• Let m = m1, m2, …, mi, … denote the input message bits and U = U1, U2, …, Ui, … denote the codeword sequence, with Ui = u1i, u2i, …, uni the ith codeword and uji the jth binary code symbol of Ui.
• Let Z = Z1, Z2, …, Zi, … denote the demodulated sequence, with Zi = z1i, z2i, …, zni.
• Let m̂ = m̂1, m̂2, …, m̂i, … denote the estimate of the input message bits.

Convolutional Encoding and Decoding
(Figure: Information source → Convolutional Encoder → Modulate → AWGN Channel → Demodulate → Convolutional Decoder → Information sink)
• Input sequence m = m1, m2, …, mi, …
• Codeword sequence U = G(m) = U1, U2, …, Ui, … where Ui = u1i, …, uji, …, uni, transmitted as si(t)
• Demodulated sequence Z = Z1, Z2, …, Zi, … where Zi = z1i, …, zji, …, zni and zji is the jth demodulator output symbol of branch word Zi
• Decoder output: the estimate m̂ = m̂1, m̂2, …, m̂i, …

Convolutional Encoding
• A general convolutional encoder with constraint length K and rate k/n consists of a kK-stage shift register and n mod-2 adders.
  – K = number of k-bit shifts over which a single information bit can influence the output.
• At each unit of time:
  – k bits are shifted into the first k stages of the register
  – All bits in the register are shifted k stages to the right
  – The outputs of the n adders are sequentially sampled to give the coded bits
• There are n coded bits for each input group of k information (message) bits. Hence R = k/n information bits per coded bit is the code rate (k < n).

Convolutional Encoder (with Constraint Length K and Rate k/n)
(Figure: the input sequence m = m1, m2, …, mi, … is shifted in k bits at a time into a kK-stage shift register feeding n modulo-2 adders; the sampled adder outputs form the codeword sequence U = U1, U2, …, Ui, … where Ui = u1i, …, uji, …, uni is the ith codeword branch and uji is the jth binary code symbol of branch word Ui)
• Typically binary codes with k = 1 are used. Hence we will mainly consider rate 1/n codes.

Convolutional Code Representations
• To describe a convolutional code, we must describe the encoding function G(m) that characterizes the relationship between the information sequence m and the output coded sequence U.
• There are 4 popular methods of representation:
  – Connection pictorial and connection polynomials
  – State diagram
  – Tree diagram
  – Trellis diagram

Connection Representation
• Specify n connection vectors, gi (i = 1, …, n), one for each of the n mod-2 adders.
• Each vector has K dimensions and describes the connection of the shift register to the corresponding mod-2 adder.
• A 1 in the ith position of the connection vector implies that shift register stage is connected to the adder.
• A 0 in the ith position of the connection vector implies no connection exists.

Convolutional Encoder (K = 3, Rate 1/2)
• g1 = 1 1 1, g2 = 1 0 1, or equivalently g1(X) = 1 + X + X², g2(X) = 1 + X²
• If the initial register contents are 0 0 0 and the input sequence is 0 0 1 (a single 1, shown entering the register in the figure), then the output (impulse response) sequence is 11 10 11.
(Figure: 3-stage shift register with input …001…; the upper mod-2 adder produces U1, the first code symbol, and the lower adder produces U2, the second code symbol)

Example (for the previous code)
Encoder input m = 1 0 1, output u:

  Time   Register contents   Output u1 u2
  t1     1 0 0               1 1
  t2     0 1 0               1 0
  t3     1 0 1               0 0
  t4     0 1 0               1 0
  t5     0 0 1               1 1
  t6     0 0 0               0 0

Output sequence: 11 10 00 10 11
Message bits are input at t1, t2, t3. (K−1) = 2 zeros are input at t4 and t5 to flush the register. Another 0 is input at t6 to get 00.
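The table above can be reproduced with a few lines of Python. This is a minimal sketch, not part of the lecture; the function name and bit conventions are mine.

    # Rate-1/2, K=3 convolutional encoder with generators g1 = 111 and g2 = 101.
    def conv_encode(bits, g1=(1, 1, 1), g2=(1, 0, 1)):
        state = [0, 0]                      # the two previous input bits (K-1 = 2)
        out = []
        for b in bits + [0, 0]:             # append K-1 = 2 zeros to flush the register
            reg = [b] + state               # register contents, newest bit first
            u1 = reg[0] & g1[0] ^ reg[1] & g1[1] ^ reg[2] & g1[2]
            u2 = reg[0] & g2[0] ^ reg[1] & g2[1] ^ reg[2] & g2[2]
            out.append((u1, u2))
            state = reg[:2]                 # shift: keep the two most recent bits
        return out

    print(conv_encode([1, 0, 1]))           # [(1,1), (1,0), (0,0), (1,0), (1,1)] = 11 10 00 10 11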

State Representation
• The state of a rate 1/n code = contents of the rightmost K−1 stages.
• Knowledge of the state and the next input is necessary and sufficient to determine the next output.
• Codes can be represented by a state diagram where the states represent the possible contents of the rightmost K−1 stages of the shift register.
• From each state there are only 2 transitions (to the next state), corresponding to the 2 possible input bits.
• The transitions are represented by paths on which we write the output word associated with the state transition.
  – A solid line path corresponds to an input bit 0.
  – A dashed line path corresponds to an input bit 1.

State Diagram for our Code (K = 3, Rate 1/2)
(Figure: state diagram with encoder states a = 00, b = 10, c = 01, d = 11; each transition is labeled with its output branch word (00, 11, 10, 01, …). Legend: solid line = input bit 0, dashed line = input bit 1.)
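The state-diagram transitions can be derived mechanically from the generator vectors. A minimal Python sketch (not from the lecture; the state ordering and helper name are mine):

    # Derive the state diagram of the K=3, rate-1/2 code (g1 = 111, g2 = 101).
    G1, G2 = (1, 1, 1), (1, 0, 1)

    def step(state, b):
        """Return (next_state, output branch word) for input bit b from a 2-bit state."""
        reg = (b,) + state                                   # newest bit enters on the left
        u1 = reg[0]*G1[0] ^ reg[1]*G1[1] ^ reg[2]*G1[2]
        u2 = reg[0]*G2[0] ^ reg[1]*G2[1] ^ reg[2]*G2[2]
        return reg[:2], (u1, u2)

    for state in [(0, 0), (1, 0), (0, 1), (1, 1)]:           # a=00, b=10, c=01, d=11
        for b in (0, 1):
            nxt, out = step(state, b)
            print(f"state {state} --input {b} / output {out}--> {nxt}")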

Example
Assume that m = 1 1 0 1 1 is the input, followed by K−1 = 2 zeros to flush the register. Also assume that the initial register contents are all zeros. Find the output sequence U.

  Input bit mi   Register contents   State at ti   State at ti+1   Branch word u1 u2
  –              0 0 0               00            00              –
  1              1 0 0               00            10              1 1
  1              1 1 0               10            11              0 1
  0              0 1 1               11            01              0 1
  1              1 0 1               01            10              0 0
  1              1 1 0               10            11              0 1
  0              0 1 1               11            01              0 1
  0              0 0 1               01            00              1 1

Output sequence: U = 11 01 01 00 01 01 11

Tree Diagram Representation
• The tree diagram is similar to the state diagram, except that it adds the dimension of time.
• The code is represented by a tree where each tree branch describes an output word.
  – If the input is 0, we move to the next rightmost branch in the upward direction.
  – If the input is 1, we move to the next rightmost branch in the downward direction.
• Using the tree diagram, one can dynamically describe the encoder as a function of a particular input sequence.

Tree Diagram for our Code
(Figure: code tree over t1 … t5 with codeword branch labels 00, 11, 10, 01 and node states a, b, c, d. The structure repeats itself after the 3rd branching, i.e. at t4. The heavy line represents m = 1 1 0 1 1, giving the output codeword U = 11 01 01 00 01.)

Trellis Diagram Representation
• In general, the tree structure repeats itself after K branchings (K = constraint length).
• Label each node in the tree by its corresponding state.
• Each transition from a node (state) produces 2 nodes (2 states).
• Any 2 nodes having the same state label at the same time can be merged, since all succeeding paths will be indistinguishable.
• The diagram we get by doing so is called the trellis diagram.

Trellis Diagram for our Code
(Figure: trellis for states a = 00, b = 10, c = 01, d = 11 over times t1 … t6; each branch is labeled with its codeword branch word (00, 11, 10, 01, …). Legend: solid line = input bit 0, dashed line = input bit 1. The trellis structure repeats itself after depth K = 3.)

Decoding of Convolutional Codes
• Maximum Likelihood Decoding
• Viterbi Algorithm

Maximum Likelihood Decoding
• Let U(m) denote one of the possible (say, the mth) transmitted sequences and Z the received sequence.
• The optimum decoder (which minimizes the probability of error) is the one that maximizes P(Z | U(m)). I.e., the optimum decoder chooses the sequence U(j) if

  P(Z | U(j)) = max over all U(m) of P(Z | U(m))

• This is known as the Maximum Likelihood Decoder.

Maximum Likelihood Metric
• Assume a memoryless channel, i.e., the noise components are independent. Then, for a rate 1/n code,

  P(Z | U(m)) = ∏i P(Zi | Ui(m)) = ∏i ∏j=1..n P(zji | uji(m))

  where Zi is the ith branch of Z. The problem is then to find a path (each path defines a codeword) through the trellis (or tree) such that

  ∏i ∏j=1..n P(zji | uji(m)) is maximized,

  or (by taking the log)  ∑i ∑j=1..n log P(zji | uji(m)) is maximized.

Maximum Likelihood Metric
• This function which we need to maximize is known as the log-likelihood function or the log-likelihood metric.
• To find the optimum path, we could compare all possible paths in the tree or trellis and find the path which maximizes the log-likelihood metric. This is known as the brute-force or exhaustive approach.
• The brute-force approach is not practical, as the number of paths grows exponentially as the path length increases.
• The optimum algorithm for solving this problem is the Viterbi Decoding Algorithm, or Viterbi Decoder.

Binary Symmetric Channel (BSC)
(Figure: BSC with input X and output Y; crossover transitions with probability p, correct transitions with probability 1−p)

  P(Y = 0 | X = 1) = P(Y = 1 | X = 0) = p
  P(Y = 1 | X = 1) = P(Y = 0 | X = 0) = 1 − p

where p is the crossover probability (channel symbol error probability, or channel BER).

Log-Likelihood Metric
• Assume that U(m) and Z are each L bits long and that they differ in dm positions, i.e., the Hamming distance between them is dm. Then

  P(Z | U(m)) = p^dm · (1 − p)^(L − dm)

  or  log P(Z | U(m)) = dm·log( p / (1 − p) ) + L·log(1 − p) = −A·dm + B

  where A and B are constants and A > 0 (since p < 0.5).

Log-Likelihood Metric
• Since A > 0 and B does not depend on the path, maximizing the log-likelihood metric is equivalent to minimizing the Hamming distance.
• Maximum Likelihood (ML) Decoder (Hard-Decision Decoding):
  – Choose, in the tree or trellis diagram, the path whose corresponding sequence is at the minimum Hamming distance from the received sequence Z.
  – I.e., choose the minimum distance metric.
• I.e., Hard-Decision Maximum Likelihood Decoder = Minimum Hamming Distance Decoder.
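A quick numerical sanity check of this equivalence in Python (a sketch, not from the lecture; the crossover probability and candidate codewords are made up for illustration):

    import math

    p = 0.1                                   # assumed BSC crossover probability
    Z = "1101010001"                          # received sequence (illustrative)
    candidates = ["1101011011", "0001010001", "1101110001"]

    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))

    for U in candidates:
        d = hamming(U, Z)
        log_lik = d * math.log(p) + (len(Z) - d) * math.log(1 - p)
        print(U, "Hamming distance:", d, "log P(Z|U):", round(log_lik, 3))
    # The candidate with the smallest Hamming distance also has the largest log-likelihood.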

Viterbi Decoding (R = 1/2, K = 3)
• The decoder tries to find the minimum distance path.
(Figure: trellis labeled with branch metrics, i.e. the Hamming distance between each received symbol and the corresponding branch word)

  Input data sequence m:     1    1    0    1    1   …
  Transmitted codeword U:    11   01   01   00   01  …
  Received sequence Z:       11   01   01   10   01  …

Viterbi Decoder
• Basic idea:
  – If any 2 paths in the trellis merge to a single state, one of them can always be eliminated in the search.
  – E.g., at time t5, 2 paths merge to (enter) state 00.
• The cumulative Hamming path metric of a given path at ti = sum of the branch Hamming distance metrics along that path up to time ti.
  – The upper path metric is 4 and the lower path metric is 1.
  – The upper path therefore cannot be part of the optimum path, since the lower path, which enters the same state, has a lower metric.
  – This is true because future output branches depend only on the current state and not on the previous states.

Path Metrics for 2 Merging Paths
(Figure: two paths through the trellis from t1 to t5 that merge at state a = 00; the upper path has path metric 4 and the lower path has path metric 1)

Viterbi Decoding
• At time ti, there are 2^(K−1) states in the trellis, where K is the constraint length. (NB: the number of states is an important complexity measure for Viterbi decoders.)
• Each state can be entered from 2 states.
• Viterbi decoding consists of computing the metrics for the 2 paths entering each state and eliminating one of them.
• This is done for each of the 2^(K−1) nodes at time ti.
• The decoder then moves to time ti+1 and repeats the process.

Viterbi Decoding Example
(Figure, panels (a)–(d): the trellis is processed stage by stage. After the first stage (t2) the path metrics are a = 2, b = 0. After the second stage (t3) they are a = 3, b = 3, c = 2, d = 0. After the third stage (t4), with one of the two paths entering each state eliminated, they are a = 3, b = 3, c = 0, d = 2.)

Viterbi Decoding Example (continued)
(Figure, panels (e)–(h): after the fourth stage (t5) the surviving path metrics are a = 1, b = 1, c = 3, d = 2; after the fifth stage (t6) they are a = 2, b = 2, c = 2, d = 1. At each stage the path with the larger metric entering each state is eliminated.)
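The path-metric evolution in this example can be reproduced with a short hard-decision Viterbi decoder. This is a minimal Python sketch, not from the lecture; the function and variable names are mine, and the trellis is the K = 3, rate-1/2 code (g1 = 111, g2 = 101) used throughout.

    G1, G2 = (1, 1, 1), (1, 0, 1)
    STATES = [(0, 0), (1, 0), (0, 1), (1, 1)]          # a=00, b=10, c=01, d=11

    def step(state, b):
        reg = (b,) + state
        out = (reg[0]*G1[0] ^ reg[1]*G1[1] ^ reg[2]*G1[2],
               reg[0]*G2[0] ^ reg[1]*G2[1] ^ reg[2]*G2[2])
        return reg[:2], out

    def viterbi_decode(Z):
        INF = float("inf")
        pm = {s: (0 if s == (0, 0) else INF) for s in STATES}   # start in state 00
        paths = {s: [] for s in STATES}
        for z in Z:                                              # z = received branch word
            new_pm, new_paths = {}, {}
            for s in STATES:
                best = None
                for prev in STATES:                              # check both predecessors
                    for b in (0, 1):
                        nxt, out = step(prev, b)
                        if nxt != s:
                            continue
                        bm = (out[0] != z[0]) + (out[1] != z[1]) # Hamming branch metric
                        cand = pm[prev] + bm
                        if best is None or cand < best[0]:
                            best = (cand, paths[prev] + [b])     # add-compare-select
                new_pm[s], new_paths[s] = best
            pm, paths = new_pm, new_paths
            print("path metrics:", {''.join(map(str, s)): m for s, m in pm.items()})
        best_state = min(pm, key=pm.get)
        return paths[best_state]

    Z = [(1, 1), (0, 1), (0, 1), (1, 0), (0, 1)]        # received sequence of the example
    print("decoded bits:", viterbi_decode(Z))           # [1, 1, 0, 1, 1]

The printed metrics match the figure panels above (a = 2, b = 0 after t2; a = 3, b = 3, c = 2, d = 0 after t3; and so on), and the decoded bits recover m = 1 1 0 1 1 despite the single channel error.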

Convolutional Code Distance Properties
• The minimum distance between all pairs of possible codewords is quite important and is related to the error-correcting capability of the code.
• To compute it we can simply consider distances from the all-zeros sequence (since the code is linear).
• Assume that the all-zeros path is the correct one.
  – An error event (or errors) would occur when there exists a path which starts and ends at the a = 00 state at time ti (but does not return to the 00 state in between) with a metric that is smaller than that of the all-zeros path at ti. In this case, we say the correct path does not survive.
• The minimum distance of such an error path can be found using an exhaustive search over all possible error events.

Trellis Labeled with Distances from the All-Zeros Path
(Figure: the trellis of the K = 3, rate-1/2 code with each branch labeled by its Hamming distance from the all-zeros path, i.e. the weight of its branch word: 0 for branch word 00, 1 for 01 or 10, and 2 for 11)

Minimum Distance
• In the previous example there are:
  – 1 path with distance 5 (it remerges at t4), corresponding to the input sequence 1 0 0.
  – 2 paths at distance 6 (one remerges at t5 and the other at t6), corresponding to the input sequences 1 1 0 0 and 1 0 1 0 0.
• df = Minimum Free Distance = minimum distance over all arbitrarily long paths that diverge from and remerge with the all-zeros path. df = 5 in this case, and the code can correct any t = 2 errors.
• A code can correct any t channel errors where t ≤ ⌊(df − 1)/2⌋ (this is an approximation).

Formalized Viterbi Algorithm
• Uses the maximum likelihood decoding procedure.
• Find the closest sequence of symbols in the given trellis, using either the Euclidean distance or the Hamming distance as the distance measure.
• The resulting sequence is called the global most-likely sequence.
• For a received sequence v containing L symbols, v = {v(0), v(1), …, v(L−1)}, where the first symbol v(0) is received at time instance 0 and the last one v(L−1) is received at time instance L−1, the Viterbi decoder (over an N-state trellis) iteratively computes the survivor path entering each state at time instances 1, …, L−1.
• The survivor path for a given state at time instance n is the sequence of symbols closest in distance to the received sequence up to time n.

Viterbi Algorithm
• Path metric xi(n) – a metric assigned to each state denoting the distance between the survivor path for state i and the received sequence up to time n.
• Branch metric – the difference between the current received symbol v(n) and the output symbol on the corresponding transition of the encoding trellis.
• From time instance n to n+1, the Viterbi algorithm updates the survivor paths and the path metric values xj(n+1) from the survivor path metrics at time instance n and the branch metrics aij(n) in the given trellis as follows:

  xj(n+1) = min over i of [ xi(n) + aij(n) ],   j = 1, 2, …, N

• The updating mechanism is based on an optimization method called dynamic programming.

Viterbi Algorithm
• Let PM(s0 = a, sn = b) be the maximum path metric (sum of accumulated branch metrics BM) from s0 = a to sn = b.
• Then we can calculate PM(s0 = a, s10 = b) easily if we know PM(s0 = a, s9 = s) for all possible s, particularly those that have a branch to state b in the trellis:

  PM(s0 = a, s10 = b) = max over s of [ PM(s0 = a, s9 = s) + BM(s9 = s, s10 = b) ]

• At this point, we can eliminate one of these two paths.

Example
• For this encoding trellis (g1(z) = 1 + z⁻², g2(z) = 1 + z⁻¹ + z⁻²), assume that at time instance n the path metrics for the 4 states are:
  – x1(n) = 2, x2(n) = 0, x3(n) = 1, x4(n) = 2
  – The received symbol is v(n) = 11
• Using the Hamming distance as the distance measure, we have the following branch metrics for all transitions in the trellis:

  a11(n) = weight(11 ⊕ 00) = 2,   a12(n) = weight(11 ⊕ 11) = 0
  a23(n) = weight(11 ⊕ 01) = 1,   a24(n) = weight(11 ⊕ 10) = 1
  a31(n) = weight(11 ⊕ 11) = 0,   a32(n) = weight(11 ⊕ 00) = 2
  a43(n) = weight(11 ⊕ 10) = 1,   a44(n) = weight(11 ⊕ 01) = 1

(Figure: one stage of the 4-state trellis from time n to n+1 with states S00, S01, S10, S11 and branch labels input/output: from S00: 0/00, 1/11; from S01: 0/11, 1/00; from S10: 0/01, 1/10; from S11: 0/10, 1/01)
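These Hamming branch metrics are easy to check in a couple of lines of Python (a sketch, not from the lecture; the dictionary of branch output words simply restates the values above):

    v = (1, 1)                                         # received symbol v(n)
    branches = {"a11": (0, 0), "a12": (1, 1), "a23": (0, 1), "a24": (1, 0),
                "a31": (1, 1), "a32": (0, 0), "a43": (1, 0), "a44": (0, 1)}

    for name, out in branches.items():
        bm = sum(vb != ob for vb, ob in zip(v, out))   # Hamming distance to the branch word
        print(name, "=", bm)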

Example
• The survivor path and its path metric for each state are updated from time n to n+1.
• There are 2 possible paths entering each state; the one with the larger metric is discarded.
• The update process is carried out iteratively from n = 1 to n = L.

Example
• The global most-likely sequence is the survivor path of the state with the minimum path metric at time L, i.e.

  i = ind⁻¹( min over j of { xj(L) } )

  where ind⁻¹ means "take the index of the corresponding state".
• Optimality is guaranteed because dynamic programming algorithms have the property that the optimum solution from the initial iteration to iteration n+m must consist of the optimum solution from the initial iteration to iteration n and from iteration n to iteration n+m.

Example
(Figure 1.11: trellis decoding example)

Computation in the Viterbi Algorithm
• Computing the branch metrics aij(n)
• Updating the path metrics
  – Requires an add, compare and select (ACS) for every state at each time instance
• Selecting the final state
• Tracing back its survivor path

Design and Implementation of a Viterbi Decoder
• A real Viterbi decoder needs to consider the following practical problems:
  – Arbitrarily long decoding delays cannot be tolerated. The decoder has to output decoded information bits before the entire encoded message has been received.
  – Incoming analog signals have to be quantized by an ADC.
  – The decoder may be brought on line in the middle of a transmission and will thus not know where one n-bit block ends and the next begins.
    • Need block synchronization

Block Diagram of a Practical Viterbi Decoder
(Figure: decoder block diagram)

Quantization
• There is a difference in performance between an un-quantized soft-decision decoder and a hard-decision decoder.
• B-bit quantization provides decoder performance in between.
• B = 3 (8 levels) quantization introduces only a slight reduction in performance (~0.25 dB).

Block Synchronizer
• Segments the received bit stream into n-bit blocks, each block corresponding to a stage in the trellis.
• If the received bits are not properly divided up, the results are disastrous.
• We can use this disastrous nature to help draw the block boundary:
  – If the boundary is correct, one or a few partial path metrics will be much lower than the others after a few constraint lengths of branch metric computations.
  – If the alignment is wrong, the metrics tend to be random: all paths have similar partial path metrics and there is no dominant path.
  – We can use this to detect "out-of-sync" and adjust the block boundary until it is fixed.
  – We can use a simple threshold for this detection.

Branch Metric (BM) Computer
• Typically based on a look-up table containing the various bit metrics.
• Looks up the n bit metrics associated with each branch and sums them to obtain the branch metric.
• For a symmetric channel, the BM calculation is simpler: the second row of the bit metric table is simply a reversed image of the first row.
• The same look-up function is performed n times per branch, for every branch in each stage of the trellis.
  – An extremely fast decoder may need a separate look-up circuit for each of these look-ups, while a simple decoder can reuse the same look-up table for all of them.
• The number of bits required for the BM can be reduced by simplification and approximation:

  M(r|y)    r = 0'   0   1   1'        M(r|y)    r = 0'   0   1   1'
  y = 0          5   4   2   0         y = 0          3   2   1   0
  y = 1          0   2   4   5         y = 1          0   1   2   3
  (needs 3 bits)                       (needs only 2 bits)
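A minimal Python sketch of the table-lookup branch metric computation (the bit-metric values are the 2-bit table above; the function name and symbol labels are mine):

    BIT_METRIC = {  # M(r | y): rows indexed by code bit y, columns by received level r
        0: {"0'": 3, "0": 2, "1": 1, "1'": 0},
        1: {"0'": 0, "0": 1, "1": 2, "1'": 3},   # symmetric channel: reversed first row
    }

    def branch_metric(received, branch_word):
        """Sum the n bit metrics of one branch (received = n soft levels, branch_word = n bits)."""
        return sum(BIT_METRIC[y][r] for r, y in zip(received, branch_word))

    print(branch_metric(("1'", "0"), (1, 0)))   # strong 1 and weak 0 against branch word 10 -> 5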

Path Metric Updating and Storage
• Basic trellis element (butterfly) of a rate 1/n convolutional code:
(Figure: states Sj,t and Sj+2^(M−1),t at time t both connect to states S2j,t+1 and S2j+1,t+1 at time t+1, with branch metrics Mj,2j(rt+1), Mj,2j+1(rt+1), Mj+2^(M−1),2j(rt+1) and Mj+2^(M−1),2j+1(rt+1))
• A common circuit, the add-compare-select (ACS) unit, computes this basic trellis element.
(Figure: two adders add the path metrics V(Sj,t) and V(Sj+2^(M−1),t) to the corresponding branch metrics; a comparator and a multiplexer/select stage produce the survivor metrics V(S2j,t+1) and V(S2j+1,t+1))
  – Parallel or single (serial) ACS units can be used depending on the throughput requirement.
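A minimal Python sketch of one ACS butterfly (not from the lecture; a distance metric is assumed, so the smaller candidate wins, and the state/branch names are illustrative):

    def acs_butterfly(pm_a, pm_b, bm):
        """pm_a, pm_b: path metrics of the two source states; bm[(src, dst)]: 4 branch metrics."""
        survivors = {}
        for dst in ("s0", "s1"):
            cand_a = pm_a + bm[("a", dst)]            # add
            cand_b = pm_b + bm[("b", dst)]
            if cand_a <= cand_b:                      # compare
                survivors[dst] = (cand_a, "a")        # select: new metric + decision
            else:
                survivors[dst] = (cand_b, "b")
        return survivors

    bm = {("a", "s0"): 2, ("a", "s1"): 0, ("b", "s0"): 0, ("b", "s1"): 2}
    print(acs_butterfly(3, 1, bm))                    # {'s0': (1, 'b'), 's1': (3, 'a')}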

Information Sequence Updating and Storage
• This unit is responsible for keeping track of the information bits associated with the surviving paths.
• Two basic design approaches:
  – Register exchange and trace back
  – Both associate a shift register with every trellis node throughout the decoding operations

Decoding Depth (or Survivor Path Length)
• The number of bits that a register must be capable of storing is a function of the decoding depth.
• At some point during decoding, the decoder can begin to output information bits.
• The information bits associated with a survivor branch at time t can be released when the decoder begins operating on branches that are one decoding depth later; the decoding depth is usually set to five to ten times the constraint length of the code.
• The meaning of the survivor path length is that, after tracing back to that point, all the shortest paths (survivor paths) from all possible starting states should have merged, and the input corresponding to the transition from the state at time t is decoded.
• The required register length equals the decoding depth.
• Once a register is full (the decoding depth has been reached), the oldest bits in the register are output as new bits are entered. The registers are thus FIFOs of fixed length.

Example of a Trellis
(Figure: a 4-state trellis with states S0, S1, S2, S3; input stream …x2, x1, x0 and rate-1/3 output streams …y2(0), y1(0), y0(0); …y2(1), y1(1), y0(1); …y2(2), y1(2), y0(2); branch labels input/output: 0/000, 1/111, 0/110, 0/111, 0/001, 1/001, 1/000, 1/110)

Register Exchange
• The register for a given node at a given time contains the information bits associated with the surviving partial path that terminates at that node.
• As the decoding operations proceed, the contents of the registers in the bank are updated and exchanged as dictated by the surviving branches.
• Hardware intensive – each register must be able to send and receive strings of bits to and from two other registers.
• Simple to implement.
(Figure: register banks for states S0–S3 at t = 0 … 5; e.g. at t = 1 the banks hold 0, 1, –, –; at t = 2: 00, 01, 10, 11; at t = 3: 000, 101, 110, 111; at t = 4: 1100, 1101, 1010, 1011; at t = 5: 10100, 11001, 11010, 10111)
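A minimal Python sketch of the register-exchange update (not from the lecture; the decision data structure and example values are hypothetical):

    # decisions[t][s] gives, for each state s at stage t, the surviving predecessor state
    # and the decoded input bit on that branch.
    def register_exchange(decisions, num_states):
        regs = {s: "" for s in range(num_states)}          # one register per state
        for stage in decisions:                            # one decoding cycle per stage
            new_regs = {}
            for s in range(num_states):
                pred, bit = stage[s]
                new_regs[s] = regs[pred] + str(bit)        # copy predecessor register, append bit
            regs = new_regs                                 # registers are "exchanged" each cycle
        return regs

    decisions = [                                          # illustrative 4-state decisions
        {0: (0, 0), 1: (0, 1), 2: (1, 0), 3: (1, 1)},
        {0: (0, 0), 1: (2, 1), 2: (1, 0), 3: (3, 1)},
    ]
    print(register_exchange(decisions, 4))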

Trace Back
• There is a register for each state, but the contents of the registers do not move back and forth.
• Each register contains the past history of the surviving branches entering that state.
• Information bits are obtained by "tracing" back through the trellis as dictated by this connection history.
• The states in the state diagram (or trellis) are associated with the encoder shift-register contents.
  – E.g. state S2 of a two-bit-state encoder corresponds to the encoder shift-register contents 01.
  – In general, a state Sxy can be preceded only by state Sy0 or Sy1.
  – A zero or one may thus be used to uniquely designate the surviving branch entering a given state.

Trace Back Register Contents
(Figure: trace-back register banks for states S0–S3 at t = 0 … 5; e.g. at t = 1 the banks hold 0, 0, –, –; at t = 2: 00, 00, 0, 0; at t = 3: 000, 001, 01, 01; at t = 4: 0001, 0011, 010, 010; at t = 5: 00011, 00110, 0100, 0101)
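A minimal Python trace-back sketch (not from the lecture; the decision data structure and example values are hypothetical). For a state labeled (x, y), a stored bit b designates the surviving predecessor (y, b), as described two slides above.

    def trace_back(decision_bits, final_state):
        state = final_state                       # 2-bit state label as a tuple, e.g. (1, 0)
        decoded = []
        for stage in reversed(decision_bits):
            b = stage[state]
            decoded.append(state[0])              # newest state bit = decoded input bit
            state = state[1:] + (b,)              # step back to predecessor state (y, b)
        return decoded[::-1]                      # bits come out in reverse order

    decision_bits = [                             # illustrative 4-state decision columns
        {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0},
        {(0, 0): 0, (0, 1): 1, (1, 0): 0, (1, 1): 1},
        {(0, 0): 1, (0, 1): 0, (1, 0): 1, (1, 1): 0},
    ]
    print(trace_back(decision_bits, (1, 0)))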

Low Power ACS Unit for IS-95
• E.g. IS-95: rate 1/2, K = 9 convolutional code, generator functions g0 = 753 (octal), g1 = 561 (octal)
(Figure: K = 9 encoder; the information bit (input) is shifted into the register, whose taps g0 and g1 produce the code symbols c0 and c1)

• The path metric coming into state j from state i at recursion n: PM(i,j)n = BM(i,j)n + PMi,n−1
• The branch metric BM(i,j) is the squared distance between the received noisy symbol yn and the ideal noiseless output symbol of that transition.

Branch Metric Calculation
• For IS-95 (K = 9, code rate 1/2), there are 2 competing paths arriving at each state at each cycle. The branch metric is the squared distance between the received symbol and the branch output, BMi,j,t = (yt − xi,j)², i.e. for an n-output branch

  BMt(m) = ∑ l=0..n−1 ( yl,t − xl,t(m) )²

• For IS-95, n = 2, so there are only 4 different BMs per stage.
• Carefully examining the rate 1/2 convolutional code, one finds that the branch metrics of complementary branches are directly related, so one can be obtained from the other; here m can be any one of the 512 possible branches in the trellis.
• Consequently, there is no need for additional additions in the BMU.

Path Metric Calculation
• Partial path metrics of the 2 competing paths (m1 and m2), from states s1 and s2 to state s at cycle i:
  – PMi(m1) = PMi−1(s1) + BMi(s1, s)
  – PMi(m2) = PMi−1(s2) + BMi(s2, s)
• After the new partial path metrics are calculated, the following comparison is carried out:
  – PMi(s) = min( PMi(m1), PMi(m2) )
• For IS-95 there are 256 states, so 512 add and 256 compare-and-select operations have to be done for every decoded bit.
• Compared with the BMU and SMU, the number of ACS operations is significant, and hence reducing its power consumption is essential.

Conventional ACS Unit
• One ACS operation requires reading two path metric values.
• Butterfly operation:
(Figure: butterfly between states Si1, Si2 at time t−1 and So1, So2 at time t)
• The number of read accesses can be reduced if the ACS operations that calculate the survivor paths at So1 and So2 are done together.

Bit Width Requirement for Path Metrics
• Re-normalization of path metric values is required to avoid overflow.
• This increases the number of unnecessary operations in the ACSU.
• Modular normalization [Shung, 1990]: if the path metric memory bit width is > 2·Dmax, where Dmax is the maximum possible difference between the path metrics, no explicit normalization is required.
• For IS-95 the maximum number of bits required for the path metrics is 9 if the bit precision of the received symbol is 4.

ACSU: Modulo Normalization
• All binary values are evenly distributed on a circle (the metrics are interpreted modulo the register range).
• The PMs run clockwise around the circle.
• To compare two path metrics, compute the (n−1)th bit of the result of a straightforward 2's complement subtraction of the two 9-bit numbers. E.g. for m1 = (m1,8, …, m1,0), m2 = (m2,8, …, m2,0) and d = (d8, …, d0) = m1 − m2:

  m1 ≥ m2 if d8 = 0, and m1 < m2 otherwise,

  where m1 and m2 are the two candidate metrics PMt−1(m) + BMt(m) and PMt−1(m') + BMt(m').
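A minimal Python sketch of this modulo comparison (assumptions: 9-bit metrics, and the bit-width condition of the previous slide holds so the true difference always fits in half the range):

    NBITS = 9
    MASK = (1 << NBITS) - 1

    def mod_less_than(m1, m2):
        """True if m1 < m2 in the modulo sense: MSB of the 2's-complement difference."""
        d = (m1 - m2) & MASK
        return bool((d >> (NBITS - 1)) & 1)

    # Even after wrap-around, the comparison still orders the metrics correctly:
    a = 510                  # a metric close to the top of the 9-bit range
    b = (a + 3) & MASK       # a + 3, wrapped around to 1
    print(mod_less_than(a, b), mod_less_than(b, a))   # True False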

Architecture of the Conventional ACSU
• For a butterfly operation, four additions of a 9-bit path metric and a 5-bit branch metric, and two 9-bit comparisons, are needed.
(Figure: conventional butterfly – two adders per output state add PMt−1(sa) and PMt−1(sb) to the branch metrics BMt(sa,S0), BMt(sb,S0), BMt(sa,S1), BMt(sb,S1); comparators then select the new metrics PMt(S0) and PMt(S1))

Re-arranging the ACS Calculation in the Butterfly
• For calculating PMt(S0), instead of finding min( PMt−1(sa) + BMt(sa,S0), PMt−1(sb) + BMt(sb,S0) ), we can compare the values PMt−1(sa) − PMt−1(sb) and BMt(sb,S0) − BMt(sa,S0) instead.
• Similarly, for calculating PMt(S1), we compare PMt−1(sa) − PMt−1(sb) against BMt(sb,S1) − BMt(sa,S1).
• Both computations share PMt−1(sa) − PMt−1(sb).
• One computation can be saved.
• For IS-95, the two values BMt(sb,S0) − BMt(sa,S0) and BMt(sb,S1) − BMt(sa,S1) can be precomputed and stored.
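A minimal Python sketch contrasting the two formulations (not from the lecture; variable names are mine). Both produce the same survivor metrics, but the re-arranged form shares one subtraction and compares against the precomputable branch-metric differences:

    def acs_conventional(pm_a, pm_b, bm_a0, bm_b0, bm_a1, bm_b1):
        pm_s0 = min(pm_a + bm_a0, pm_b + bm_b0)     # 2 adds + 1 compare
        pm_s1 = min(pm_a + bm_a1, pm_b + bm_b1)     # 2 adds + 1 compare
        return pm_s0, pm_s1

    def acs_rearranged(pm_a, pm_b, bm_a0, bm_b0, bm_a1, bm_b1):
        diff = pm_a - pm_b                          # shared by both comparisons
        # Path from sa wins iff pm_a + bm_aX <= pm_b + bm_bX, i.e. diff <= bm_bX - bm_aX.
        pm_s0 = pm_a + bm_a0 if diff <= bm_b0 - bm_a0 else pm_b + bm_b0
        pm_s1 = pm_a + bm_a1 if diff <= bm_b1 - bm_a1 else pm_b + bm_b1
        return pm_s0, pm_s1

    args = (7, 5, 2, 0, 0, 2)
    print(acs_conventional(*args), acs_rearranged(*args))   # identical results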

The Proposed ACSU Architecture
(Figure: new butterfly – a subtractor forms PMt−1(sa) − PMt−1(sb); two comparators compare it against the precomputed differences BMt(sb,S0) − BMt(sa,S0) and BMt(sb,S1) − BMt(sa,S1); two adders then form the selected sums PMt(S0) and PMt(S1))
• For a butterfly operation, one 9-bit subtraction, two additions of a 9-bit and a 5-bit operand, and two 9-bit to 5-bit comparisons are needed.

Comparison of the Two Architectures

Conventional ACS (operation type in parentheses):
  PM(m)t,0 = PM(sa)t−1 + BM(sa,S0)t   (9-bit + 5-bit addition)
  PM(m)t,1 = PM(sa)t−1 + BM(sa,S1)t   (9-bit + 5-bit addition)
  PM(m')t,0 = PM(sb)t−1 + BM(sb,S0)t  (9-bit + 5-bit addition)
  PM(m')t,1 = PM(sb)t−1 + BM(sb,S1)t  (9-bit + 5-bit addition)
  PM(S0)t = min( PM(m)t,0, PM(m')t,0 )  (9-bit comparison (subtraction) and select)
  PM(S1)t = min( PM(m)t,1, PM(m')t,1 )  (9-bit comparison (subtraction) and select)

Proposed ACS (operation type in parentheses):
  ΔPM = PM(sa)t−1 − PM(sb)t−1                 (9-bit subtraction)
  comp( ΔPM, BM(sb,S0)t − BM(sa,S0)t )        (9-bit to 6-bit comparison)
  comp( ΔPM, BM(sb,S1)t − BM(sa,S1)t )        (9-bit to 5-bit comparison)
  PM(S0)t = PM(s*)t−1 + BM(s*,S0)t            (9-bit + 5-bit addition)
  PM(S1)t = PM(s')t−1 + BM(s',S1)t            (9-bit + 5-bit addition)

Pre-computation Architecture
• Further reduces the number of comparisons required during the ACS operation by using a pre-computation concept.
• The comparison is between a 9-bit value and a 6-bit value. Instead of doing a full 9-bit comparison, the 4 MSBs of the 9-bit value and the sign bit of the 6-bit value are used to pre-determine whether the magnitude of the 9-bit value is larger; if not decided, a 5-bit comparator compares the magnitude of the 6-bit value against the 5 LSBs of the 9-bit value.

Pre-computation Architecture
(Figure: pre-computation ACSU datapath – a subtractor takes PMi−1(m) and PMi−1(m') and produces Ni[8:0]; pre-computation logic operating on Ni[8:5] and the sign bit Di[5] of the branch-metric difference (e.g. BMi(sb,S0) − BMi(sa,S0)) produces the select signal Sel_sa/Sel_sb; 5-bit and 1-bit pipeline registers hold Ni[4:0] and Di[4:0], and a 5-bit comparator evaluates Ni[4:0] ≥ Di[4:0] when required)

Pre-computation Architecture
• A two-stage pipeline is used to calculate the selection signal.
• At the first stage, Ni[8:5] and Di[5] are used to pre-compute the condition for selecting sa or sb.
• When the condition is detected, the clock signal going to the two 5-bit registers is gated to save the power of the 5-bit comparison.

Results
• Both the conventional and the proposed ACSU were synthesized with Synopsys using a MOSIS 0.8 µm technology library.
• Power consumption was estimated using a gate-level power simulator.
• Simulation vectors were generated in compliance with the IS-95 and IS-98 standards.

               Conventional ACSU   Proposed ACSU   % reduction
  Power        477.2               333.3           30.2
  Area (µm²)   777980              623530          19.9

Memory Organization for Path Metrics
• For an M-state Viterbi decoder, we need to store M path metrics. Since the path metrics at time i+1 are computed using the path metrics at time i, it seems necessary to double-buffer the path metric memory, i.e. use 2·M memory locations.
• One way to eliminate the double buffer is to use in-place computation.
  – We need only the metrics of the M present states (ji, ji−1, …, ji−k+2, x) – for the M choices of x – to compute the metrics for the M hypotheses, i.e. the next states (y, ji, ji−1, …, ji−k+2) – for the M choices of y.
  – If the M metrics needed are read from memory, then M memory locations become available to store the M newly computed metrics, and no double buffering is required for the metrics.

In-Place Computation
• It is natural to treat the contents of the shift register, (ak, ak−1, …, a1), as a k-digit M-ary number and use this number as the address into the memory of path metrics. Such an addressing scheme is inconsistent with writing new metrics over old metrics.
• Consider the example of M = 2, k = 3:
  – The decoder has eight hypotheses ending in 000 = 0, 001 = 1, 010 = 2, …, 111 = 7.
  – The natural order would store stage-i metrics in table locations 0 through 7.
  – But the two successors of, say, 000 and 001 are 000 and 100. This means we read metrics from locations 0 and 1 and write them (by definition of natural order) into locations 0 and 4. This is not in-place.

In-Place Computation
• Suppose the path metrics are originally placed in natural order. After one, two, and three stages of decoding we would see the memory organization of the path metric pointers evolve as shown on the next slide.
• Computing in place means we want to write the results of the current path metric update back into the path metric locations that were used to calculate them.
• To guarantee in-place computation, we need an addressing scheme which changes after each decoding cycle. E.g. at the first cycle, we read inputs 0 and 1, produce outputs 0 and 4, and put 0 and 4 into the locations of 0 and 1. At the second cycle, we again need to read inputs 0 and 1, but now 0 is stored in location 0 while 1 is stored in location 2. So we need to change the addressing of the inputs every cycle.

In-Place Computation
• From the previous figure, we can see that if the path metric of the hypothesis with shift register contents (a, b, c) (i.e. the state) at time i is in memory location 4a + 2b + c, then the path metric of the hypothesis with shift register contents (a, b, c) at time i+1 will be in location 4c + 2a + b.
• In general, the metrics accessed together are found by generating their natural address but rotating the bits of this address by i places before reading (or writing) the metrics from (or into) the memory.
• A cyclic shift of i places is identical to a cyclic shift of i modulo k places.

Example
(Figure: for M = 2, k = 3, the path metric memory addresses 000 … 111 over three decoding cycles.)
After one cycle (left rotate by 1 bit):
  000->000, 001->010, 010->100, 011->110, 100->001, 101->011, 110->101, 111->111
After two cycles (left rotate by 2 bits):
  000->000, 001->100, 010->001, 011->101, 100->010, 101->110, 110->011, 111->111
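The two mappings listed above can be generated by a cyclic address rotation. A minimal Python sketch (not from the lecture; the rotation direction follows the tables above, and the helper name is mine):

    # Rotating address map for in-place path metric storage, M = 2, k = 3:
    # the natural address is rotated by (cycle mod k) bit positions each decoding cycle.
    K = 3
    MASK = (1 << K) - 1

    def rotate_left(addr, places):
        r = places % K
        return ((addr << r) | (addr >> (K - r))) & MASK if r else addr

    for cycle in (1, 2):
        pairs = [f"{a:03b}->{rotate_left(a, cycle):03b}" for a in range(1 << K)]
        print(f"rotate by {cycle}:", ", ".join(pairs))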

Survivor Path Memory Organization
• To "prune" the survivor path, the hypothesis with the lowest path metric is identified and its oldest symbols are the decoder output. The oldest symbols may then be dropped from all survivor paths. A symbol may be pruned from the path memory once each decoding cycle, or p symbols may be pruned away after every p decoding cycles.

Survivor Path Memory Organization
• For minimum error rate, the length of the survivor path memory field should be made as large as possible. There is a rule of thumb that four or five constraint lengths is adequate.
• For M = 2, the constraint length is k + 1. A practical case has k = 6, so a survivor path memory field of 35 bits is implied. It is inconvenient to handle such a long field all at once, although the operations needed are quite simple.
• To store the survivor path, we can use a pointer mechanism to avoid handling the entire field.
• Since each pointer can only point to M ancestors, the pointer can be abbreviated to an M-ary symbol. This M-ary symbol is identical to the M-ary symbol which is appended to the path.
• Thus no extra storage is needed for the pointers, as we can interpret the path memory contents as pointers.

Survivor Path Memory Organization
• During decoding cycle i, an M-ary choice is recorded in the ith digit position of the survivor path field for each of the M^k surviving hypotheses.
  – E.g. for the hypothesis with shift register contents (ai, ai−1, …, ai−k+1), the symbol stored is x. To find its predecessor we look in digit position i−1 of the memory word whose address is (ai−1, …, ai−k+1, x).
  – If we read a y there, we look in digit position i−2 of the memory word whose address is (ai−2, …, ai−k+1, x, y).
  – The procedure carries on in this fashion.

Example of the Survivor Path Memory Organization
(Figure: example with M = 2, k = 3)

Survivor Path Memory Organization
• Whenever two survivor paths agree on k successive pointers, they must necessarily converge.
• We need to trace back such a path to prune and decode.
• If the path memory field is L M-ary symbols wide, we may decode after p decoding cycles, obtaining p decoded symbols, then overwrite new path symbols into the newly freed digit positions during the next p decoding cycles.
• A new symbol will be stored in digit position (i mod L) of the path during decoding cycle i.

Survivor Sequence Memory Management Supporting Simultaneous Updating and Reading of the Memory
• Here we discuss several survivor sequence memory management schemes that support simultaneously updating and reading the memory.
• The traceback memory is organized as a 2-dimensional structure, with rows and columns:
  – # of rows = # of states N = 2^v.
  – Each column stores the results of the N comparisons corresponding to one symbol interval, i.e. one stage.
• 3 types of operations take place inside a trace-back decoder:
  – Traceback Read (TB) – reading a bit and interpreting it, in conjunction with the present state number, as a pointer that indicates the previous state number. Pointer values are not output as decoded values.
    • Runs to a predetermined depth T before being used to initiate the decode read operation.

Survivor Sequence Memory Management Supporting Simultaneous Updating and Reading of the Memory
• 3 types of operations inside a trace-back decoder (continued):
  – Decode Read (DC) – the same operation as TB, but it operates on older data, with the state number of the first DC in a memory bank being determined by the previously completed traceback. Pointer values are the decoded values and are sent to the bit-order reversing circuit.
    • One traceback read operation of T columns enables decode reads over multiple columns.
  – Writing New Data (WR) – decisions made by the ACS are written into the locations corresponding to the states.
    • Data are written to locations just freed by the DC operations.
• For every set of column write operations (N bits wide), an average of one decode read must be performed.

* Ref: G. Feygin and P. G. Gulak, "Architectural Tradeoffs for Survivor Sequence Memory Management in Viterbi Decoders," IEEE Transactions on Communications, pp. 425-429, March 1993.

K-Pointer Even Algorithm
(Figure: K-pointer even traceback schedule, K = 3)

86

K-pointer Even Algorithm• The memory is divided into 2k2 memory banks, each of size (T/(k2-1))

columns.• Each read pointer is used to perform the traceback operation in k2-1

memory banks, and the decode read in one memory bank.• Every T stages, a new traceback front is started from the fixed state that

has the best path metric.• Since the traceback depth T must be achieved before decoding can be

performed, so k2-1 memory banks must be greater than equal to T.• Total # of memory required: 2k2* (T/ (k2-1))• The decoded bit are generated in a reverse order, this a scheme is

required for reversing the ordering of the decoded bits– A simple two-stack LIFO is used to perform the bit order reversal

• Each stack is T/(k2-1) in depth• During decoding, decoded bits are pushed on one stack while the bits stored on

the other stack are popped.• Upon completion of the decoding of a given memory bank, stacks switch from

pushing to popping and vice versa.

K-Pointer Odd Algorithm
(Figure: K-pointer odd traceback schedule)

88

K-pointer Odd Algorithm

• There are 2k2-1 memory banks, each of length (T/(k2-1))

• Total length = (2k2-2)T/(k2-1)

• A 2-stack LIFO structure is also required to perform bit order reversal.

• The decode pointer and the write pointer always point to the same column in the memory, although the decode pointer will be used to read only one memory location, while the write pointer will be used to sequentially update memory locations corresponding to all states in a given trellis state.

• It is necessary to perform decoding before new data can be written, otherwise the memory being used may be overwritten.

One-Pointer Algorithm
• Different from the k-pointer algorithms, which use k read pointers to perform the required k reads for every column write operation, a single read pointer with accelerated read operations is used.
• Every time the write counter advances by one column, k column reads occur.
• The acceleration is based on the fact that, among the writing of new data, the traceback read and the decode read operations, writing new data is the most time consuming: 2^v bits are written every stage, compared with only k bits being read at every stage.
• There are k1+1 memory banks, each T/(k1−1) columns long.
• A single read pointer produces the decoded bits in bursts.
  – During the decode read operation in the k1-th memory bank, decoded bits are generated at a rate of k1 per stage.
  – A 2-stack structure can perform both the bit-order reversal and burst elimination at the same time.

One-Pointer Algorithm
(Figure: one-pointer traceback schedule)

91

Hybrid algorithm

• Combine some features of the k-pointer algorithm and a one-pointer algorithm.

• K column reads per stage are performed using k2 read pointers, each advancing at a rate of k1 column per stage. (k = k1k2 and k =< T+1)

Page 92: 1 ELEC692 VLSI Signal Processing Architecture Lecture 10 Viterbi Decoder Architecture

92

Hybrid algorithm

Page 93: 1 ELEC692 VLSI Signal Processing Architecture Lecture 10 Viterbi Decoder Architecture

93

Radix-4 Viterbi Decoder

• Radix-2 Trellis and ACS

Radix-2 trellis 2-way ACS Radix-2 ACS unit

Page 94: 1 ELEC692 VLSI Signal Processing Architecture Lecture 10 Viterbi Decoder Architecture

94

Radix-4 ACS

• A 2v-state trellis can be iterated from time index n-k to n by decomposing the trellis into 2v-k sub-trellis, each consisting of k iterations of a 2k-state trellis.

• Each 2k-state subtrellis can be collapsed into an equivalent one-stage radix-2k trellis by applying k levels of lookahead to the recursive ACS udpate

E.g. 8-state Radix-4 trellis

Page 95: 1 ELEC692 VLSI Signal Processing Architecture Lecture 10 Viterbi Decoder Architecture

95

Radix-4 ACS

• Parallel and Serial implementation of the ACS unit– Parallel – one ACS butterfly for each pair of states

– Serial – for large constraint length, parallel implementation may not be feasible, use single/(or fewer # than the # of states) ACS butterfly

• Throughput can be increased if the number of ACS iteration for each stage can be reduced.

• # of ACS iteration is reduced by half using Radix-4 ACS. If the critical path of a radix-4 ACS is the same as that of a radix-2 ACS, a potential 2 fold speed up is achievable.

• Of course the potential speedup comes with a complexity increase since the radix-4 ACS is more complex. Therefore higher-radix ACS is not very practical.
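A minimal Python sketch of one radix-4 ACS step, obtained by collapsing two radix-2 stages of the K = 3, rate-1/2 trellis (g1 = 111, g2 = 101) with one level of lookahead. This is illustrative only (names are mine); z0 and z1 are two consecutive received branch words.

    from itertools import product

    G1, G2 = (1, 1, 1), (1, 0, 1)
    STATES = [(0, 0), (1, 0), (0, 1), (1, 1)]

    def step(state, b):                                   # radix-2 next state and output
        reg = (b,) + state
        out = (reg[0]*G1[0] ^ reg[1]*G1[1] ^ reg[2]*G1[2],
               reg[0]*G2[0] ^ reg[1]*G2[1] ^ reg[2]*G2[2])
        return reg[:2], out

    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))

    def radix4_acs(pm, z0, z1):
        """Process two trellis stages at once: each state keeps the best of 4 candidates."""
        new_pm = {s: float("inf") for s in STATES}
        decisions = {}
        for s, (b0, b1) in product(STATES, product((0, 1), repeat=2)):
            mid, out0 = step(s, b0)
            dst, out1 = step(mid, b1)
            metric = pm[s] + hamming(out0, z0) + hamming(out1, z1)   # lookahead BM sum
            if metric < new_pm[dst]:
                new_pm[dst] = metric
                decisions[dst] = (s, b0, b1)              # one radix-4 decision selects 2 bits
        return new_pm, decisions

    pm0 = {(0, 0): 0, (1, 0): float("inf"), (0, 1): float("inf"), (1, 1): float("inf")}
    print(radix4_acs(pm0, (1, 1), (0, 1))[0])
    # {(0,0): 3, (1,0): 3, (0,1): 2, (1,1): 0} - the same metrics two radix-2 stages would give.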

Radix-4 ACS (cont.)
(Figure: radix-4 trellis, 4-way ACS, and radix-4 ACS unit)


A 4-way ACS Block diagram