ELEC692 VLSI Signal Processing Architecture
Lecture 1: Introduction to DSP Systems
Issues of VLSI Signal Processing Architecture
• Performance
• Area/Cost
• Speed of execution, throughput and clock rate
• Power dissipation, or the amount of energy required to perform a given task
• Fixed-point DSP systems: finite-wordlength performance
  – Quantization and roundoff noise
• Special features of DSP systems
  – Real-time throughput requirements
  – Data-driven property
Typical DSP algorithm and applications (I)
• Speech coding and decoding, speech encryption and decryption
  – Cell phones, cordless phones, multimedia computers, secure communications
• Speech recognition
  – Advanced user interfaces, phones, consumer products, machine/human interfaces
• Speech synthesis
  – Advanced user interfaces, consumer products, machine/human interfaces
• Modem algorithms
  – Phones, wireless communications, data/fax modems, secure communications
Typical DSP algorithm and applications (II)
• Noise cancellation
  – Audio applications, wireless communications
• Audio equalization
  – Audio applications
• Image compression and decompression
  – Digital cameras, video, multimedia applications
• Beamforming
  – Navigation, radar/sonar, wireless communications
• Echo cancellation
  – Speakerphones, modems, telephone switches
Issues in wireless system design
• Ubiquitous services put wireless system spectrum at a premium
• Current spectral efficiency far below theoretical limits
• Emerging solutions
  – Adoption of better spectrum utilization techniques
    • E.g. interference cancellation, multiple antennas, MIMO systems
• Multi-functional, adaptive systems
• Even higher bit-rate wireless applications
  – IEEE 802.11a, wireless IEEE 1394
Improving Spectral Density and higher bit rate comes at a performance and power cost
• Digital baseband processing requirements
Systems compared: wide-band CDMA and FDMA with multiple antennas

                          Match Filter   Blind MMSE   Exact Decorrelator   SVD
Performance (bits/s/Hz)   1              2            2                    6
Multiplications           124            496          230,000              736
Memory                    248            1240         640,000              2120
ALU                       124            502          240,000              800
Word length               8-bit          12-bit       16-bit               16-bit

From Jan Rabaey of UC Berkeley
Shannon beats Moore’s Law
Energy plays a critical role
Battery capacity
Programmable processor vs. ASIC
• DSP Selection guide for mobile multimedia
DSP computation - Convolution
$$y(n) = x(n) * h(n) = \sum_{k=-\infty}^{\infty} x(k)\,h(n-k)$$
• Used to describe and analyze linear time-invariant (LTI) systems, which are completely characterized by their unit-sample (or impulse) response h(n)
• Finite impulse response (FIR): h(n) contains a finite number of nonzero samples, i.e. h(n) is of finite duration
• Infinite impulse response (IIR): h(n) is of infinite duration
• A system is causal if y(n0) depends only on the current and past input samples x(k), k <= n0
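The convolution sum can be sketched in a few lines of Python for finite-length sequences. This is a minimal illustration, not an optimized routine; the function name `convolve` and the sample values are assumptions chosen for the example.

```python
def convolve(x, h):
    """Return y(n) = sum_k x(k) * h(n - k) for finite-length sequences.

    Both sequences are assumed to start at n = 0 and to be zero elsewhere,
    so the output has len(x) + len(h) - 1 samples.
    """
    y = [0.0] * (len(x) + len(h) - 1)
    for k, xk in enumerate(x):
        for j, hj in enumerate(h):
            # x(k) * h(j) contributes to output sample n = k + j
            y[k + j] += xk * hj
    return y


# Hypothetical input and a 2-tap FIR impulse response
x = [1.0, 2.0, 3.0]
h = [1.0, 1.0]
y = convolve(x, h)  # [1.0, 3.0, 5.0, 3.0]
```

Because h(n) has only two nonzero samples here, this is an FIR system and every output needs a finite number of multiply-accumulate operations.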
DSP computation - Correlation
• Widely used in digital communication
• Correlation of two sequences a(n) and x(n):
$$y(n) = \sum_{k=-\infty}^{\infty} a(k)\,x(n+k)$$
• It can be written as a convolution:
$$y(n) = \sum_{k=-\infty}^{\infty} a(-k)\,x(n-k) = a(-n) * x(n)$$
• If a(n) and x(n) have finite length N, i.e. they are nonzero only for n = 0, 1, …, N-1, the digital correlation operation is:
$$y(n) = \sum_{k=0}^{N-1} a(k)\,x(n+k)$$
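The relation between correlation and convolution can be checked numerically. The sketch below assumes both sequences have length N and are zero outside n = 0, …, N-1, and evaluates the lags n = -(N-1), …, N-1; the helper names `correlate` and `convolve` are illustrative only.

```python
def correlate(a, x):
    """y(n) = sum_{k=0}^{N-1} a(k) * x(n + k) for lags n = -(N-1) .. N-1."""
    N = len(a)
    if len(x) != N:
        raise ValueError("this sketch assumes equal-length sequences")
    y = {}
    for n in range(-(N - 1), N):
        # x(n + k) is taken as zero outside 0 .. N-1
        y[n] = sum(a[k] * x[n + k] for k in range(N) if 0 <= n + k < N)
    return y


def convolve(x, h):
    """Plain convolution of two finite sequences that start at index 0."""
    out = [0] * (len(x) + len(h) - 1)
    for k, xk in enumerate(x):
        for j, hj in enumerate(h):
            out[k + j] += xk * hj
    return out


a = [1, 2, 3]
x = [4, 5, 6]
corr = correlate(a, x)
# Convolving the time-reversed a with x gives the same values,
# stored with an index offset of N - 1.
conv = convolve(a[::-1], x)
same = all(corr[n] == conv[n + len(a) - 1] for n in corr)
```

Here `same` evaluates to True, which is the statement y(n) = a(-n) * x(n) for finite sequences.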
DSP computation – Digital Filters
• A causal digital filter is characterized by its unit-sample response h(n), by its frequency response $H(e^{j\omega})$, or by a difference equation.
• A linear, time-invariant, causal filter is given by
$$y(n) = -\sum_{k=1}^{N} a_k\,y(n-k) + \sum_{k=0}^{M-1} b_k\,x(n-k)$$
• If $a_k = 0$ for 1 <= k <= N, we have
$$y(n) = \sum_{k=0}^{M-1} b_k\,x(n-k)$$
• This is a non-recursive M-tap finite impulse response (FIR) filter, where $h(k) = b_k$.
• If at least one $a_k$ is nonzero, the filter is recursive and its unit-sample response has infinite duration. This is referred to as an IIR filter.
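A direct evaluation of the difference equation makes the FIR/IIR distinction concrete. The sketch below assumes the sign convention y(n) = -Σ a_k y(n-k) + Σ b_k x(n-k) with zero initial conditions; the function name and coefficient values are made up for illustration.

```python
def difference_equation(b, a, x):
    """Evaluate y(n) = -sum_{k=1}^{N} a_k y(n-k) + sum_{k=0}^{M-1} b_k x(n-k).

    b: feed-forward coefficients [b_0, ..., b_{M-1}]
    a: feedback coefficients [a_1, ..., a_N] (empty list gives an FIR filter)
    Samples before n = 0 are assumed to be zero.
    """
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, bk in enumerate(b):
            if n - k >= 0:
                acc += bk * x[n - k]
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc -= ak * y[n - k]
        y.append(acc)
    return y


impulse = [1.0, 0.0, 0.0, 0.0]
# Non-recursive case: the impulse response is just the b_k and then stops.
fir_response = difference_equation([1.0, 2.0, 3.0], [], impulse)
# Recursive case, y(n) = 0.5 y(n-1) + x(n): the response never reaches zero.
iir_response = difference_equation([1.0], [-0.5], impulse)
```

The FIR response is [1, 2, 3, 0], while the recursive filter yields 1, 0.5, 0.25, 0.125 and continues to decay indefinitely.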
DSP computation – Digital Filters
• Linear-phase FIR filter
  – Unit-sample responses are symmetric, so only about half the number of multiplications is required
  – For an M-tap linear-phase FIR filter: h(n) = h(M-1-n)
  – E.g. a 7-tap linear-phase FIR filter with impulse response h(0)=h(6)=b0, h(1)=h(5)=b1, h(2)=h(4)=b2, h(3)=b3:
  – y(n) = b0x(n) + b1x(n-1) + b2x(n-2) + b3x(n-3) + b2x(n-4) + b1x(n-5) + b0x(n-6)
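The saving in multiplications comes from adding the two input samples that share a coefficient before multiplying. The sketch below assumes an odd number of taps and zero input before n = 0; `linear_phase_fir` and the coefficient values are hypothetical.

```python
def linear_phase_fir(b, x):
    """Symmetric (linear-phase) FIR filter with an odd number of taps M.

    b holds only the unique coefficients [b_0, ..., b_{(M-1)/2}], so each
    output sample costs len(b) multiplications instead of M.
    """
    half = len(b) - 1
    M = 2 * half + 1

    def sample(n):
        # input is assumed to be zero outside 0 .. len(x)-1
        return x[n] if 0 <= n < len(x) else 0.0

    y = []
    for n in range(len(x)):
        acc = b[half] * sample(n - half)           # centre tap
        for k in range(half):
            # pre-add the two samples that share coefficient b_k
            acc += b[k] * (sample(n - k) + sample(n - (M - 1 - k)))
        y.append(acc)
    return y


# 7-tap example: an impulse input returns h = [b0, b1, b2, b3, b2, b1, b0]
impulse = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
h = linear_phase_fir([1.0, 2.0, 3.0, 4.0], impulse)
```

For the 7-tap case this uses 4 multiplications per output instead of 7, at the cost of 3 extra pre-additions.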
DSP computation – Adaptive Filter
• The filter coefficients change and are updated at each iteration.
• Used for applications such as echo cancellation, channel equalization, voiceband modems and many others.
• It predicts one random process y(n) from observations of another random process x(n) using linear models such as digital filters.
• Coefficients are updated in order to minimize the difference between the filter output and the desired signal. The updating process continues until the coefficients converge.
• Consists of two blocks: a general filter block and a coefficient updating block.
DSP computation – LMS Adaptive Filter
• Notation:
  – $W^T(n) = [w_1(n), w_2(n), \dots, w_N(n)]$ = weight vector
  – $U^T(n) = [u(n), u(n-1), \dots, u(n-N+1)]$ = vector of current and past input samples
  – $\hat{d}(n)$ is the estimated signal and e(n) is the estimation error.
  – We have
$$\hat{d}(n) = W^T(n-1)\,U(n)$$
$$e(n) = d(n) - \hat{d}(n) = d(n) - W^T(n-1)\,U(n)$$
DSP computation – LMS Adaptive Filter
• In the n-th iteration, the LMS algorithm selects $W^T(n)$, which minimizes the squared error $e^2(n)$
• LMS adaptive filters consist of an FIR filter block with coefficient vector $W^T(n)$ and input sequence u(n), and a weight update block.
DSP computation – LMS Adaptive Filter
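• Weight update algorithm
$$\nabla_{W} e^2 = \frac{\partial}{\partial W}\left(d - W^T U\right)^2 = -2dU + 2\left(W^T U\right)U = -2\left(d - W^T U\right)U = -2eU$$
$$W(n) = W(n-1) + \frac{1}{2}\,\mu\left(-\nabla_{W} e^2(n)\right)$$
$$W(n) = W(n-1) + \mu\,e(n)\,U(n)$$
where μ is the step size.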
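The filter block and the weight update block can be sketched together as a short loop. This is a minimal illustration of the update W(n) = W(n-1) + μ e(n) U(n), where μ denotes the step size. The system-identification setup, the unknown response `unknown_h`, the step size and the random seed are all assumptions made for the demonstration.

```python
import random


def lms_filter(u, d, num_taps, mu):
    """LMS adaptive FIR filter; returns the final weights and the error sequence."""
    w = [0.0] * num_taps                      # W(0): start from zero weights
    errors = []
    for n in range(len(u)):
        # U(n) = [u(n), u(n-1), ..., u(n-N+1)], zero before the first sample
        U = [u[n - k] if n - k >= 0 else 0.0 for k in range(num_taps)]
        d_hat = sum(wi * ui for wi, ui in zip(w, U))       # filter block
        e = d[n] - d_hat                                   # estimation error
        w = [wi + mu * e * ui for wi, ui in zip(w, U)]     # weight update block
        errors.append(e)
    return w, errors


# Hypothetical demo: identify an unknown 2-tap FIR system from its input/output.
rng = random.Random(0)
u = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
unknown_h = [0.5, -0.3]
d = [unknown_h[0] * u[n] + (unknown_h[1] * u[n - 1] if n >= 1 else 0.0)
     for n in range(len(u))]
w, errors = lms_filter(u, d, num_taps=2, mu=0.1)
```

In this noise-free setup the weights approach `unknown_h` and the error shrinks towards zero. With noisy data, the step size trades convergence speed against the residual error.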
Other common DSP computations
• Motion estimation
  – Used in interframe predictive coding
• Discrete Cosine Transform
  – Frequency transform used in image processing
• Fast Fourier Transform
  – Frequency transform used in communication and audio/voice processing
• Vector quantization
  – Used for data compression in speech, image and video coding
• Viterbi algorithm
  – Error control coding, used for communication and other data correction applications
• Decimator and expander
  – Multirate systems for image compression, digital audio and adaptive signal processing
Implementation of DSP algorithms
• Many applications can be implemented on a programmable DSP processor or media microprocessor
• For some applications, because of complexity and power constraints, special VLSI architectures or ASICs are still required
• E.g. MPEG-2 encoder
  – Block matching for motion estimation (ME) on HDTV frames needs ~370 GOPs/sec
  – 2D-DCT for HDTV = 3.84 GOPs/sec
DSP representation
• Non-terminating programs, iteration based
[Figure: block "DSP" with input x(n) and output y(n)]
For n = 1 to n = ∞:
$$y(n) = h_0\,x(n) + h_1\,x(n-1) + h_2\,x(n-2)$$
• Iteration period: time required to execute one iteration
• Sampling rate (throughput): number of samples processed per second
• Latency: difference between the time an output is generated and the time at which its corresponding input was received
• Critical path delay
• Clock period (the clock rate is not necessarily equal to the sampling rate)
DSP representation
• Mathematical formulation
• Behavioral description languages
  – Applicative languages
    • Set of equations
  – Prescriptive languages
    • Specify the order of assignment statements
    • E.g. Pascal, C, SystemC
  – Descriptive languages
    • Represent the structure of the DSP system
    • E.g. VHDL, Verilog
• Graphical representation
  – For investigating and analyzing data-flow properties
  – Exhibits parallelism and data-driven (dependency) properties; provides insight into space-time tradeoffs
  – Maps DSP algorithms to hardware implementations
  – Block diagram, signal-flow graph (SFG), data-flow graph (DFG), and dependence graph (DG)
Block Diagram
• Consists of functional blocks connected by directed edges, which represent the data flow from the input block to the output block.
• Edges may or may not contain delay elements
Signal Flow Graph (SFG)
• An SFG is a graph whose nodes represent computations/tasks; a directed edge e(j,k) denotes a branch originating at node j and terminating at node k.
• With the input signal at node j and the output signal at node k, e(j,k) denotes a linear transformation from the signal at node j to the signal at node k.
• In digital networks, the edges are usually restricted to constant-gain multipliers or delay elements
• Adders and multipliers are described by a node with multiple incoming edges and one outgoing edge.
• 2 special nodes – sink and source
Example SFG of a direct-form 3-tap FIR filter
Transposition of SFG
• Linear SFGs can be transformed into different forms
  – Flow graph reversal or transposition for single-input single-output (SISO) systems
  – Transform operations
    • Reversing the direction of all edges
    • Exchanging the input and output nodes while keeping the edge gains and edge delays unchanged
  – The resulting SFG maintains the same functionality
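That transposition preserves functionality can be checked by simulating both structures of an FIR filter sample by sample. The sketch below is illustrative only: the function names are assumptions, and the delay elements are modelled as Python lists initialized to zero.

```python
def fir_direct(h, x):
    """Direct form: delay elements hold past inputs x(n-1), x(n-2), ..."""
    delay_line = [0.0] * (len(h) - 1)
    y = []
    for sample in x:
        taps = [sample] + delay_line
        y.append(sum(hk * tk for hk, tk in zip(h, taps)))
        delay_line = taps[:-1]                 # shift the delay line by one
    return y


def fir_transposed(h, x):
    """Transposed form: delay elements hold partial sums instead of inputs."""
    state = [0.0] * (len(h) - 1)
    y = []
    for sample in x:
        products = [hk * sample for hk in h]   # input broadcast to all multipliers
        y.append(products[0] + (state[0] if state else 0.0))
        # each delay element receives its product plus the next partial sum
        state = [products[i + 1] + (state[i + 1] if i + 1 < len(state) else 0.0)
                 for i in range(len(state))]
    return y


h = [1.0, 2.0, 3.0]
x = [1.0, 2.0, 3.0, 4.0]
y_direct = fir_direct(h, x)
y_transposed = fir_transposed(h, x)
```

Both calls return the same output sequence, even though the delay elements store inputs in one structure and partial sums in the other.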
Data Flow Graph (DFG)
• Graph G = (N,E) where nodes represent computations (or functions or subtasks) and directed edges represent data paths (communications between nodes). Each edge has a non-negative number of delays associated with it.
Data Flow Graph (DFG)
• A DFG captures the data-driven property
• A node can execute only when all its input data are available
• Concurrent execution
• A node with multiple input edges can execute only when all its preceding nodes have executed, which describes the precedence constraints
  – If an edge has zero delay: intra-iteration precedence
  – If an edge has non-zero delay: inter-iteration precedence
• DFGs are generally used for high-level synthesis, to map concurrent implementations of DSP applications onto parallel hardware
  – Task scheduling and resource allocation
Example of DFG
Synchronous Data Flow graph (SDFG)
• Special case of DFG
  – The number of data samples produced or consumed by each node in each execution is specified a priori
  – Used for both single-rate and multi-rate systems
  – Unrolling (unfolding) converts multirate systems to single-rate
Dependence Graph
• A directed graph that shows the dependence of the computation
• Nodes represent computations and edges represent precedence constraints
• Similar to a DFG, except that the nodes in a DFG cover only the computations of one iteration, whereas a DG contains the computations for all iterations. A DFG contains delay elements that store and pass data between iterations, while a DG does not contain delay elements.
Example of a DG
Critical Path of a DFG
• Critical path: the path with the longest computation time among all paths that contain zero delays (i.e. no delay elements)
• The minimum clock period of the DSP system depends on the critical path delay
• In DSP systems, e.g. a filter element, the critical path depends on the delay of the following paths:
  – Input to a delay element
  – Input to the output
  – Delay element to the output
  – Delay element to delay element
[Figure: example filter DFG from In to Out with four delay elements (D), multipliers (X) and adders (+); node computation times are labelled 1 and 2]
Critical path comparison
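[Figure: direct-form 4-tap FIR filter with input x(n), output y(n), three delay elements, four multipliers and three adders]
Direct Form 4-tap FIR
• Critical path = delay(mult) + (N-1) × delay(add)
• Delay elements: shorter bitwidth
[Figure: transposed-form 4-tap FIR filter with input x(n), output y(n), three delay elements, four multipliers and three adders]
Transposed Form 4-tap FIR
• Critical path = delay(mult) + delay(add)
• Delay elements: longer bitwidth
• Fanout of the input is larger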
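The two critical-path expressions can be reproduced by building each structure as a DFG and searching for the longest path over zero-delay edges. The sketch below is a hedged illustration: the node naming, the builder functions and the unit delays (multiplier = 2, adder = 1 time units) are assumptions, not values taken from the figures.

```python
from functools import lru_cache

T_MULT = 2  # assumed multiplier computation time
T_ADD = 1   # assumed adder computation time


def critical_path(times, edges):
    """Longest computation time over paths that use only zero-delay edges.

    times: {node: computation time}; edges: list of (src, dst, delays).
    Assumes the zero-delay subgraph has no cycles.
    """
    zero_succ = {}
    for src, dst, delays in edges:
        if delays == 0:
            zero_succ.setdefault(src, []).append(dst)

    @lru_cache(maxsize=None)
    def longest_from(node):
        tails = [longest_from(nxt) for nxt in zero_succ.get(node, [])]
        return times[node] + max(tails, default=0)

    return max(longest_from(node) for node in times)


def direct_form_fir(taps):
    times, edges = {"x": 0}, []
    for k in range(taps):
        times[f"m{k}"] = T_MULT
        edges.append(("x", f"m{k}", k))        # tap k sees x(n-k)
    prev = "m0"
    for k in range(1, taps):
        times[f"a{k}"] = T_ADD
        edges += [(prev, f"a{k}", 0), (f"m{k}", f"a{k}", 0)]
        prev = f"a{k}"                         # adders form a zero-delay chain
    return times, edges


def transposed_form_fir(taps):
    times, edges = {"x": 0}, []
    for k in range(taps):
        times[f"m{k}"] = T_MULT
        edges.append(("x", f"m{k}", 0))        # input broadcast to every multiplier
    prev = f"m{taps - 1}"
    for k in range(taps - 2, -1, -1):
        times[f"a{k}"] = T_ADD
        edges += [(f"m{k}", f"a{k}", 0), (prev, f"a{k}", 1)]
        prev = f"a{k}"                         # partial sums cross one delay each
    return times, edges


direct_cp = critical_path(*direct_form_fir(4))          # 2 + 3 * 1 = 5
transposed_cp = critical_path(*transposed_form_fir(4))  # 2 + 1 = 3
```

Under these assumed delays the direct form grows with the number of taps, while the transposed form stays at one multiplier plus one adder regardless of filter length.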
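Iteration Period
• Iteration: execution of all computations of an algorithm once
• Iteration period: the time required for execution of one iteration
• E.g. y(n) = a·y(n-1) + x(n)
[Figure: DFG of y(n) = a·y(n-1) + x(n) with input x(n), output y(n), nodes A (computation time 2) and B (computation time 4), coefficient a, and one delay D in the loop]
• Execution order: A0 → B0 → A1 → B1 → A2 → B2 → …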
Loop Bound
• Loop: a directed path that begins and ends at the same node.
• Loop bound of a loop
  – Lower bound on the loop computation time
  – Defined as t_l/w_l, where t_l is the loop computation time and w_l is the number of delays in the loop
• E.g.
[Figure: DFG with input x(n), output y(n), nodes A (2) and B (4), coefficient a, and one delay D in the loop]
  – A → B → A is a loop with t_l = 2 + 4 = 6 and w_l = 1, hence the loop bound is 6/1 = 6
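Loop Bound
• Another example
[Figure: DFG with input x(n), output y(n), nodes A (2) and B (4), coefficient a, and two delays (2D) in the loop]
  – A → B → A is a loop with t_l = 2 + 4 = 6 and w_l = 2 (since the edge carries 2D), hence the loop bound is 6/2 = 3
  – This means one iteration of the loop can be executed in 3 time units. This is possible because the computations form two independent sets of precedence constraints:
    A0 → B0 → A2 → B2 → A4 → B4 → … (even)
    A1 → B1 → A3 → B3 → A5 → B5 → … (odd)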
• Another example
[Figure: DFG with nodes A (2), B (4) and C (5), and delay edges D and 2D]
  – Two loops:
    • A → B → A: T = 6, W = 2, bound = 3
    • A → B → C → A: T = 11, W = 1, bound = 11
  – Hence the bound for this DFG is max{3, 11} = 11
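The loop bound arithmetic in these examples is easy to reproduce. The following sketch takes each loop as a list of nodes together with its delay count; the function name `loop_bound` and the dictionary layout are illustrative assumptions.

```python
from fractions import Fraction


def loop_bound(loop_nodes, num_delays, times):
    """Loop bound t_l / w_l: loop computation time over number of delays."""
    if num_delays <= 0:
        raise ValueError("a loop must contain at least one delay")
    loop_time = sum(times[node] for node in loop_nodes)
    return Fraction(loop_time, num_delays)


# Two-node examples: A takes 2 time units, B takes 4
times_ab = {"A": 2, "B": 4}
one_delay = loop_bound(["A", "B"], 1, times_ab)    # 6
two_delays = loop_bound(["A", "B"], 2, times_ab)   # 3

# Three-node example: A = 2, B = 4, C = 5
times_abc = {"A": 2, "B": 4, "C": 5}
bounds = [
    loop_bound(["A", "B"], 2, times_abc),          # 3
    loop_bound(["A", "B", "C"], 1, times_abc),     # 11
]
largest = max(bounds)                              # 11
```

`Fraction` keeps non-integer bounds such as 9/2 exact, which avoids rounding problems when bounds are compared.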
Iteration Bound
• Critical loop: the loop with the maximum loop bound
• Iteration bound (T_it): the loop bound of the critical loop
$$T_{it} = \max_{i \,\in\, \text{all loops}} \frac{T_i}{W_i}, \qquad T_i = \text{computation time of loop } i, \quad W_i = \text{number of delays in loop } i$$
• It is not possible to achieve an iteration period lower than the iteration bound, even with infinite processing power
• E.g.
[Figure: DFG with nodes A (4), B (3), C (2) and D (4), and delay edges D, D, D and 2D]
  – Loop A → B → A: T/W = 7/1 = 7
  – Loop A → B → C → A: T/W = 9/2 = 4.5
  – Loop B → C → D → B: T/W = 9/3 = 3
  – Iteration bound = max(7, 4.5, 3) = 7
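For small graphs the iteration bound can be found by enumerating every loop and taking the largest loop bound. The sketch below does this with a simple depth-first search. The placement of delays on individual edges is an assumption chosen so that the three loops carry 1, 2 and 3 delays as in the example; the original figure may distribute them differently.

```python
from fractions import Fraction


def simple_cycles(nodes, edges):
    """Enumerate each simple cycle once, rooted at its first node in `nodes` order."""
    order = {v: i for i, v in enumerate(nodes)}
    succ = {v: [] for v in nodes}
    for (u, v) in edges:
        succ[u].append(v)
    cycles = []

    def dfs(start, current, path):
        for nxt in succ[current]:
            if nxt == start:
                cycles.append(list(path))
            elif order[nxt] > order[start] and nxt not in path:
                dfs(start, nxt, path + [nxt])

    for s in nodes:
        dfs(s, s, [s])
    return cycles


def iteration_bound(times, edges):
    """Maximum of T_i / W_i over all loops; edges maps (src, dst) -> delays."""
    best = None
    for cyc in simple_cycles(list(times), edges):
        t = sum(times[v] for v in cyc)
        w = sum(edges[(u, v)] for u, v in zip(cyc, cyc[1:] + cyc[:1]))
        if w == 0:
            raise ValueError("delay-free loop: the graph is not computable")
        bound = Fraction(t, w)
        best = bound if best is None else max(best, bound)
    return best


times = {"A": 4, "B": 3, "C": 2, "D": 4}
# Assumed delay placement (one D on three edges, 2D on one edge)
edges = {("A", "B"): 0, ("B", "A"): 1, ("B", "C"): 1,
         ("C", "A"): 1, ("C", "D"): 0, ("D", "B"): 2}
t_it = iteration_bound(times, edges)   # Fraction(7, 1)
```

Exhaustive loop enumeration can grow exponentially with graph size, which is the motivation for algorithms that avoid listing every loop explicitly.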
Algorithms for computing iteration bound
• Longest Path Matrix Algorithm
• Minimum Cycle Mean Algorithm
Longest Path Matrix Algorithm (LPM)
• Construct a series of matrices; the iteration bound is found by examining their diagonal elements
• Let d be the number of delay elements in the DFG, and d_i be the i-th delay element.
• Construct matrices $L^{(m)}$, m = 1, 2, …, d, such that the element $l^{(m)}_{i,j}$ is the longest computation time of all paths from delay element d_i to d_j that pass through exactly m-1 delays; $l^{(m)}_{i,j} = -1$ if no such path exists.
• $L^{(m+1)}$ can be obtained from $L^{(1)}$ and $L^{(m)}$ recursively:
$$l^{(m+1)}_{i,j} = \max_{k \in K}\left(-1,\; l^{(1)}_{i,k} + l^{(m)}_{k,j}\right)$$
where K is the set of indices k for which $l^{(1)}_{i,k} \neq -1$ and $l^{(m)}_{k,j} \neq -1$; if no such k exists, $l^{(m+1)}_{i,j} = -1$.
LPM algorithm
• The diagonal element $l^{(m)}_{i,i}$ is the longest computation time of all loops with m delays that contain d_i. The iteration bound is then
$$T_{it} = \max_{i,\,m \,\in\, \{1, 2, \dots, d\}} \left\{ \frac{l^{(m)}_{i,i}}{m} \right\}$$
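LPM algorithm (example)
[Figure: DFG with nodes 1, 2, 3 (computation time 1 each) and nodes 4, 5, 6 (computation time 2 each), and four delay elements d1, d2, d3, d4]
$$L^{(1)} = \begin{bmatrix} -1 & 0 & -1 & -1 \\ 4 & -1 & 0 & -1 \\ 5 & -1 & -1 & 0 \\ 5 & -1 & -1 & -1 \end{bmatrix}$$
• E.g. $l^{(1)}_{3,1}$: all paths from d3 to d1 that pass through exactly zero delays. Path: d3 → 5 → 3 → 2 → 1 → d1, so $l^{(1)}_{3,1} = 2 + 1 + 1 + 1 = 5$
• E.g. $l^{(2)}_{2,1} = \max_{k \in \{3\}}\left(-1,\; l^{(1)}_{2,k} + l^{(1)}_{k,1}\right) = \max(-1,\; 0 + 5) = 5$
$$L^{(2)} = \begin{bmatrix} 4 & -1 & 0 & -1 \\ 5 & 4 & -1 & 0 \\ 5 & 5 & -1 & -1 \\ -1 & 5 & -1 & -1 \end{bmatrix}$$
$$L^{(3)} = \begin{bmatrix} 5 & 4 & -1 & 0 \\ 8 & 5 & 4 & -1 \\ 9 & 5 & 5 & -1 \\ 9 & -1 & 5 & -1 \end{bmatrix}$$
$$L^{(4)} = \begin{bmatrix} 8 & 5 & 4 & -1 \\ 9 & 8 & 5 & 4 \\ 10 & 9 & 5 & 5 \\ 10 & 9 & -1 & 5 \end{bmatrix}$$
$$T_{it} = \max_{i,\,m \,\in\, \{1, 2, \dots, d\}} \left\{ \frac{l^{(m)}_{i,i}}{m} \right\} = \max\left\{ \frac{4}{2}, \frac{4}{2}, \frac{5}{3}, \frac{5}{3}, \frac{5}{3}, \frac{8}{4}, \frac{8}{4}, \frac{5}{4}, \frac{5}{4} \right\} = 2$$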
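LPM algorithm (another example)
[Figure: DFG with nodes 1 to 7 (computation times 1, 2, 1, 1, 2, 1 for nodes 1 to 6, and 1 for node 7) and two delay elements d1 and d2]
$$L^{(1)} = \begin{bmatrix} 4 & 4 \\ 8 & 8 \end{bmatrix}, \qquad L^{(2)} = \begin{bmatrix} 12 & 12 \\ 16 & 16 \end{bmatrix}$$
$$T_{it} = \max\left\{ \frac{4}{1}, \frac{8}{1}, \frac{12}{2}, \frac{16}{2} \right\} = 8$$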
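The LPM recursion and the final maximization can be sketched compactly once $L^{(1)}$ is known. The code below assumes $L^{(1)}$ has already been read off the DFG (it does not build it from a graph) and uses -1 as the "no path" marker; the function names are illustrative.

```python
from fractions import Fraction


def next_matrix(L1, Lm):
    """Compute L^(m+1) from L^(1) and L^(m); -1 marks 'no such path'."""
    d = len(L1)
    out = [[-1] * d for _ in range(d)]
    for i in range(d):
        for j in range(d):
            for k in range(d):
                # only combine entries for which both partial paths exist
                if L1[i][k] != -1 and Lm[k][j] != -1:
                    out[i][j] = max(out[i][j], L1[i][k] + Lm[k][j])
    return out


def lpm_iteration_bound(L1):
    """Iteration bound = max over i, m of l^(m)_{i,i} / m, for m = 1 .. d."""
    d = len(L1)
    Lm, best = L1, None
    for m in range(1, d + 1):
        for i in range(d):
            if Lm[i][i] != -1:
                candidate = Fraction(Lm[i][i], m)
                best = candidate if best is None else max(best, candidate)
        if m < d:
            Lm = next_matrix(L1, Lm)
    return best


# L^(1) of the four-delay example
L1_first = [[-1, 0, -1, -1],
            [4, -1, 0, -1],
            [5, -1, -1, 0],
            [5, -1, -1, -1]]
# L^(1) of the two-delay example
L1_second = [[4, 4],
             [8, 8]]

bound_first = lpm_iteration_bound(L1_first)     # 2
bound_second = lpm_iteration_bound(L1_second)   # 8
```

The matrix update is a max-plus product, so the whole procedure needs d-1 such products and never enumerates individual loops.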