7/31/2019 Introduction to Dsp
Maurizio Palesi 1
Introduction to DSP
Maurizio Palesi
Dipartimento di Ingegneria Informatica e delle Telecomunicazioni
University of Catania, Italy
http://www.diit.unict.it/users/mpalesi
Introduction to Embedded Systems
Contents
Part I: Digital Signal Processing: Concepts and Theory
Part II: Signal Processing using a DSP

Part I
Digital Signal Processing
Concepts and Theory
What is a DSP?
Digital: operating by the use of discrete signals to represent data in the form of numbers
Signal: a variable parameter by which information is conveyed through an electronic circuit
Processing: to perform operations on data according to programmed instructions
Digital Signal Processing: changing or analysing information which is measured as discrete sequences of numbers
Main Characteristics
Compared to other embedded computing applications, DSP applications are differentiated by the following:
Computationally demanding
Iterative numeric algorithms
Sensitivity to small numeric errors (audible noise)
Stringent real-time requirements
Streaming data
High data bandwidth
Predictable (though often eccentric) memory access pattern
Predictable program flow (nested loops)
DSP Processors
1970: DSP techniques in telecommunication equipment
Microprocessor: not adequate performance
Custom fixed function hardware: not adequate flexibility and reusability
DSP processors
DSP vs. General Purpose
DSPs adopt a range of specialized features
Single-cycle multiplier
Multiply-accumulate operations
Saturation arithmetic
Separate program and data memories
Dedicated, specialized addressing hardware
Complex, specialized instruction sets
Today, virtually every commercial 32-bit microprocessor architecture (from ARM to 80x86) has been subject to some kind of DSP-oriented enhancement
[Figure: GP and DSP; VLIW, superscalar, SIMD, multiprocessing, ...]
Converting Analogue Signals
A continuous signal, measured against a clock, is first held at each clock tick
The signal is measured, and the measurement converted to a digital value
Reconstruction
Although Nyquist showed that we have all the information needed to reconstruct the signal, the sampling theorem does not say the samples will look like the signal
A high frequency signal sampled fast enough may still look wrong but can be reconstructed
Reconstruction
The signal is properly reconstructed from the samples by low pass filtering
The low pass filter should be the same as the original antialias filter
A sampled signal must be low pass filtered to reconstruct the original
Frequency Resolution
We cannot see slow changes in the signal if we don't wait long enough
We must sample for at least one complete cycle of the lowest frequency we want to resolve
Compromise
We must sample fast to avoid aliasing, and for a long time to achieve a good frequency resolution
Sampling fast for a long time means we will have a lot of samples
Lots of samples means lots of computation
So we will have to compromise between resolving frequency components of the signal, and being able to see high frequencies
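As a rough worked example of this compromise (the sample rate and resolution below are assumed values for illustration, not figures from the slides), observing the signal for a time T gives a frequency resolution of about 1/T, and sampling at a rate f_s for that long produces N samples:

```latex
\Delta f = \frac{1}{T}, \qquad N = f_s\,T = \frac{f_s}{\Delta f}
% Example: f_s = 8000 Hz and \Delta f = 1 Hz give T = 1 s and N = 8000 samples.
% Halving \Delta f to 0.5 Hz doubles the observation time and the sample count.
```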
Quantisation
When the signal is converted to digital form, the precision is limited by the number of bits available
The errors introduced by digitisation are both
Non linear: we cannot calculate their effects using normal maths
Signal dependent: the errors are coherent and so cannot be reduced by simple means
Limited precision leads to errors which are signal dependent
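A minimal sketch of the limited precision described above, assuming a uniform quantiser with round-to-nearest and an input range of -1.0 to just under 1.0. The function names `quantise` and `quantisation_error` are hypothetical, chosen only for this illustration.

```c
/* Quantise a sample in [-1.0, 1.0) to a signed integer code of `bits` bits
 * (two's complement range), rounding to the nearest level. */
long quantise(double sample, int bits)
{
    long levels = 1L << (bits - 1);             /* e.g. 128 for 8 bits */
    double scaled = sample * (double)levels;
    long code = (long)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
    if (code > levels - 1) code = levels - 1;   /* clip at the top code */
    if (code < -levels)    code = -levels;      /* clip at the bottom code */
    return code;
}

/* Error introduced by quantising a sample and converting it back. */
double quantisation_error(double sample, int bits)
{
    long levels = 1L << (bits - 1);
    return (double)quantise(sample, bits) / (double)levels - sample;
}
```

For inputs that are not clipped, the error stays within half a quantisation step, and its value depends on the sample itself, which is the signal dependence noted on the slide.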
Quantisation error
A real DSP system suffers from three sources of error
Limited precision due to word length in A/D conversion
Errors in arithmetic due to limited precision within the processor itself
Limited precision due to word length in D/A conversion
A simple way to get an idea of the effects of limited word length is to model each of the sources of quantisation error as if it were a source of random noise
[Figure: an ideal DSP system (A/D, z^-1 delays, D/A) and the same system with quantisation problems, where a noise source is added at each stage]
Time Domain Processing
Correlation
Autocorrelation to extract a signal from noise
Cross correlation to locate a known signal
Cross correlation to identify a signal
Convolution
Correlation
Correlation is a weighted moving average
$r(n) = \sum_{k} x(k)\, y(n+k)$
Requires a lot of calculation
If one signal is of length M and the other is of length N, then we need (N * M) multiplications to calculate the whole correlation function
Note that really, we want to multiply and then accumulate the result; this is typical of DSP operations and is called a multiply & accumulate operation
To correlate x with y: shift y by n, multiply the two together, integrate
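The shift, multiply and integrate steps can be sketched in C as below. This is an illustration only: the function name `correlate_at` is hypothetical, only non-negative shifts are handled, and samples of y beyond its end are assumed to be zero.

```c
#include <stddef.h>

/* Cross-correlation r(n) = sum over k of x(k) * y(k + n) for a
 * non-negative shift n. Terms where k + n falls outside y count as zero. */
double correlate_at(const double *x, size_t len_x,
                    const double *y, size_t len_y, size_t n)
{
    double acc = 0.0;                       /* the accumulator */
    for (size_t k = 0; k < len_x && k + n < len_y; k++) {
        acc += x[k] * y[k + n];             /* multiply & accumulate */
    }
    return acc;
}
```

Calling this for every shift n gives the whole correlation function, which is where the N * M multiplications come from.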
Correlation
Correlation is a maximum when two signals are similar in shape
Correlation is a measure of the similarity between two signals as a function of time shift between them
If two signals are similar and unshifted, their product is all positive
But as the shift increases, parts of it become negative
The correlation function shows where the signals are similar and unshifted
Detecting Periodicity
Autocorrelation as a way to detect periodicity in signals
[Figure: EEG signal and EEG autocorrelation]

Detecting Periodicity
Although a rhythm is not even visible (upper trace) it is detected by autocorrelation (lower trace)
[Figure: EEG signal with noise (upper trace) and its autocorrelation (lower trace)]
Align Signals
[Figure: signal x, signal y and corr(x,y), with the value 6 marked on the plots]

Align Signals
[Figure: signals x and y]
Cross correlation
Cross correlation (correlating a signal with another) can be used to detect and locate a known reference signal in noise
A radar or sonar chirp signal bounced off a target may be buried in noise
But correlating with the chirp reference clearly reveals when the echo comes
Cross Correlation to Identify a Signal
Cross correlation (correlating a signal with another) can be used to identify a signal by comparison with a library of known reference signals
The chirp of a nightingale correlates strongly with another nightingale, but weakly with a dove or a heron
Cross Correlation to Identify a Signal
Cross correlation is one way in which sonar can identify different types of vessel
Each vessel has a unique sonar signature
The sonar system has a library of pre-recorded echoes from different vessels
An unknown sonar echo is correlated with a library of reference echoes
The largest correlation is the most likely match
Convolution
Convolution is a weighted moving average with one signal flipped back to front
$r(n) = \sum_{k} x(k)\, y(n-k)$
Requires a lot of calculation
If one signal is of length M and the other is of length N, then we need (N * M) multiplications to calculate the whole convolution function
We need to multiply and then accumulate the result; this is typical of DSP operations and is called a multiply & accumulate operation
To convolve one signal with another signal: first flip the second signal, then shift it, then multiply the two together, and integrate under the curve
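A minimal C sketch of the flip, shift, multiply and integrate steps, assuming both signals are zero outside their stored samples. The function name `convolve` is hypothetical.

```c
#include <stddef.h>

/* Full convolution r(n) = sum over k of x(k) * y(n - k).
 * out must have room for len_x + len_y - 1 values. */
void convolve(const double *x, size_t len_x,
              const double *y, size_t len_y, double *out)
{
    for (size_t n = 0; n + 1 < len_x + len_y; n++) {
        double acc = 0.0;
        for (size_t k = 0; k < len_x; k++) {
            if (n >= k && n - k < len_y) {  /* y(n - k) is zero outside its range */
                acc += x[k] * y[n - k];     /* multiply & accumulate */
            }
        }
        out[n] = acc;
    }
}
```

The index n - k is what flips the second signal relative to the first; replacing it with n + k would give correlation instead.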
Convolution vs. Correlation
Convolution is used for digital filtering
Convolving two signals is equivalent to multiplying the frequency spectra of the two signals together
It is easily understood, and is what we mean by filtering
Correlation is equivalent to multiplying the complex conjugate of the frequency spectrum of one signal by the frequency spectrum of the other
It is not so easily understood and so convolution is used for digital filtering
Convolving by multiplying frequency spectra is called fast convolution
Fourier Transform
The Fourier Transform is a mathematical procedure that converts a signal from the time domain to the frequency domain
Any signal or waveform can be made up just by adding together a series of sine waves with appropriate amplitude and phase
A square wave can be made by adding the fundamental, minus 1/3 of the third harmonic, plus 1/5 of the fifth harmonic, minus 1/7 of the 7th harmonic...
Fourier Transform
The Fourier transform is an equation to calculate the frequency, amplitude and phase of each sine needed to make up any given signal
The Fourier Transform (FT) is a mathematical formula using integrals
The Discrete Fourier Transform (DFT) is a discrete numerical equivalent using sums instead of integrals
The Fast Fourier Transform (FFT) is just a computationally fast way to calculate the DFT
The Discrete Fourier Transform involves a summation
$H(f) = \sum_{k} c[k]\, e^{-2\pi j k f}$
The DFT and the FFT involve a lot of multiplying and accumulating of results
This is typical of DSP operations and is called a multiply & accumulate operation
Frequency Spectra
Using the Fourier transform, any signal can be analysed into its frequency components
A recording of speech can be analysed to show the spectrum
[Figure: spectrum plot, amplitude against frequency]
Frequency Spectra
With some signals it is easy to see that they are composed of different frequencies
A chord played on the piano is obviously made up of the different pure tones generated by the keys pressed
You can use a piano as an acoustic spectrum analyser to show that a hand clap has a frequency spectrum
Open the lid of the piano and hold down the 'loud' pedal
Clap your hands loudly over the piano
You will hear (and see) the strings vibrate to echo the clap sound
The strings that vibrate show the frequencies
The amount of vibration shows the amplitude
Short Time Fourier Transform
Fourier transform
The signal is analysed over all time, an infinite duration
$H(f) = \sum_{k} c[k]\, e^{-2\pi j k f}$
Short Time Fourier Transform (STFT)
Evaluates the way frequency content changes with time
Short Signals
If we measure the signal for a short time
What happened to the signal before and after we measured it?
The FFT makes an assumption about what happened before and after
The FFT assumes the data are periodic for all time
Real signals are infinite
Sampled data are finite; the FFT only sees this
The FFT works as if it saw the sampled data repeated for all time
Short Signals
This is the signal: the period fits the sample time
The FFT works as if it saw this signal repeated

Short Signals
This is the signal: not quite an integral number of cycles fit into the total duration of the measurement
The FFT works as if it saw this signal repeated, with glitches where the repeats join
Short Signals
If the period exactly fits the measurement time
The frequency spectrum is correct
If the period does not match the measurement time
The frequency spectrum is incorrect: it is broadened
The size of the glitch depends on when the first measurement occurred in the cycle
The broadening will change if the measurement is repeated
Windowing
If the period does not fit the time, spurious spectral lines result
But windowing the data can narrow the spectrum
Filtering
The function of a filter is to
Remove unwanted parts of the signal (random noise)
Extract useful parts of the signal (components lying within a certain frequency range)
Filters can be analog or digital
[Figure: raw signal -> Filter -> filtered signal]
Analog Filters
An analog filter uses analog electronic circuits
Uses components such as resistors, capacitors and op amps
Widely used in applications such as
Noise reduction
Video signal enhancement
Graphic equalisers in hi-fi systems
..., and many other areas
Digital Filters
A digital filter uses a digital processor to perform numerical calculations on sampled values of the signal
Specialised DSP chip
[Figure: unfiltered analog signal -> A/D -> sampled digitised signal -> DSP -> digitally filtered signal -> D/A -> filtered analog signal]
Advantages of Digital Filters
Programmability: the digital filter can easily be changed without affecting the circuitry
Analog filter circuits are subject to drift and are dependent on temperature
Digital filters can handle low frequency signals accurately
As the speed of DSP technology continues to increase, digital filters are being applied to high frequency signals in the RF domain
Versatility: digital filters can adapt to changes in the characteristics of the signal
Operation of Digital Filters
[Figure: x(t) sampled at interval h, giving samples x_0, x_1, x_2, x_3, x_4, x_5]
$x_i = x(ih), \quad i = 0, 1, \ldots$
$y_n = a_0 x_n + a_1 x_{n-1} + a_2 x_{n-2} + \ldots + a_m x_{n-m}$ (m-order filter)
[Figure: x_n -> Filter -> y_n]
Recursive and Non-Recursive
Non recursive filter, order = m
$y_n = a_0 x_n + a_1 x_{n-1} + a_2 x_{n-2} + \ldots + a_m x_{n-m}$
Recursive filter, order = max{m, p}
$y_n = a_0 x_n + a_1 x_{n-1} + a_2 x_{n-2} + \ldots + a_m x_{n-m} + b_1 y_{n-1} + b_2 y_{n-2} + b_3 y_{n-3} + \ldots + b_p y_{n-p}$
FIR and IIR Filters
Finite Impulse Response (FIR): non-recursive filter
Infinite Impulse Response (IIR): recursive filter
Recursive vs. Non-Recursive
It might seem that recursive filters require more calculations to be performed
However, achieving a given frequency response characteristic using a recursive filter generally requires a much lower order filter
Fewer terms to be evaluated by the processor
Non-recursive: $y_n = a_0 x_n + a_1 x_{n-1} + a_2 x_{n-2} + \ldots + a_m x_{n-m}$
Recursive: $y_n = a_0 x_n + a_1 x_{n-1} + a_2 x_{n-2} + \ldots + a_m x_{n-m} + b_1 y_{n-1} + b_2 y_{n-2} + b_3 y_{n-3} + \ldots + b_p y_{n-p}$
Example of Recursive Filter
$y_n = x_n + y_{n-1}$
Recursive
$y_0 = x_0 + y_{-1}$
$y_1 = x_1 + y_0$
$y_2 = x_2 + y_1$
$y_3 = x_3 + y_2$
...
$y_{10} = x_{10} + y_9$
Non-recursive
$y_0 = x_0 + y_{-1}$
$y_1 = x_1 + x_0 + y_{-1}$
$y_2 = x_2 + x_1 + x_0 + y_{-1}$
$y_3 = x_3 + x_2 + x_1 + x_0 + y_{-1}$
...
$y_{10} = x_{10} + x_9 + x_8 + \ldots + x_3 + x_2 + x_1 + x_0 + y_{-1}$
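The recursive and non-recursive forms can both be evaluated by one small routine. The sketch below follows the sign convention of the recursive equation with coefficients a and b, assumes that samples before index 0 (including y_{-1}) are zero, and uses the hypothetical name `filter_sample`.

```c
#include <stddef.h>

/* One output sample of
 *   y[n] = sum_{i=0..m} a[i]*x[n-i] + sum_{i=1..p} b[i]*y[n-i].
 * x and y point to the full signal arrays; samples before index 0 are
 * treated as zero. b[0] is unused. With p = 0 the filter is non-recursive. */
double filter_sample(const double *a, size_t m,
                     const double *b, size_t p,
                     const double *x, const double *y, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i <= m && i <= n; i++) {
        acc += a[i] * x[n - i];             /* terms using previous inputs */
    }
    for (size_t i = 1; i <= p && i <= n; i++) {
        acc += b[i] * y[n - i];             /* terms using previous outputs */
    }
    return acc;
}
```

With a = {1} and b = {0, 1} this reproduces y_n = x_n + y_{n-1}: each output needs one addition, whereas the equivalent non-recursive form needs one more term for every sample that has passed.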
Symmetrical Form
$y_n = (a_0 x_n + a_1 x_{n-1} - b_1 y_{n-1}) / b_0$
$b_0 y_n + b_1 y_{n-1} = a_0 x_n + a_1 x_{n-1}$
$b_0 y_n + b_1 y_{n-1} + \ldots + b_p y_{n-p} = a_0 x_n + a_1 x_{n-1} + \ldots + a_m x_{n-m}$
Transfer Function (IIR)
$z^{-1}$: unit delay operator
$z^{-1} x_n = x_{n-1}, \quad z^{-2} x_n = x_{n-2}$
$b_0 y_n + b_1 y_{n-1} + b_2 y_{n-2} = a_0 x_n + a_1 x_{n-1} + a_2 x_{n-2}$
$(b_0 + b_1 z^{-1} + b_2 z^{-2})\, y_n = (a_0 + a_1 z^{-1} + a_2 z^{-2})\, x_n$
Transfer function for a second-order recursive (IIR) filter:
$\frac{y_n}{x_n} = \frac{a_0 + a_1 z^{-1} + a_2 z^{-2}}{b_0 + b_1 z^{-1} + b_2 z^{-2}}$
Transfer Function (FIR)
$z^{-1}$: unit delay operator
$z^{-1} x_n = x_{n-1}, \quad z^{-2} x_n = x_{n-2}$
$y_n = a_0 x_n + a_1 x_{n-1} + a_2 x_{n-2}$
$y_n = (a_0 + a_1 z^{-1} + a_2 z^{-2})\, x_n$
Transfer function for a second-order non-recursive (FIR) filter:
$\frac{y_n}{x_n} = a_0 + a_1 z^{-1} + a_2 z^{-2}$
Digital Filter Equation
Output from a digital filter is made up from previous inputs and previous outputs, using the operation of convolution
$y[n] = \sum_{k} c[k]\, x[n-k] + \sum_{j} d[j]\, y[n-j]$
Here y[n] is the output, x[n-k] a previous input, y[n-j] a previous output, and c[k], d[j] the coefficients

Digital Filter Equation
$y[n] = \sum_{k} c[k]\, x[n-k] + \sum_{j} d[j]\, y[n-j]$
[Figure: filter structure between input x and output y, with delays z^-1 and z^-2, input coefficients c[0], c[1], c[2] and output coefficients d[0], d[1], d[2]]
Filter Frequency Response
$H(f) = \frac{\sum_{k} c[k]\, e^{-2\pi j k f}}{1 - \sum_{k} d[k]\, e^{-2\pi j k f}}$
When designing a digital filter we want to do the inverse operation
Calculate the filter coefficients having first defined the desired frequency response
Additional constraint
We usually want to design a filter that meets the requirement but which requires the least possible amount of computation
Using the smallest number of coefficients
FIR Filters
The filter equation is simplified by excluding the possibility of feedback
$y[n] = \sum_{k} c[k]\, x[n-k]$
The filter response is
$H(f) = \sum_{k} c[k]\, e^{-2\pi j k f}$
This frequency response is just the Fourier transform of the filter coefficients
[Figure: FIR structure between input x and output y, with delays z^-1 and z^-2 and coefficients c[0], c[1], c[2]]
FIR Filters
So the coefficients for an FIR filter can be calculated simply by taking the Inverse Fourier Transform of the desired frequency response
Here is a recipe for calculating FIR filter coefficients
Decide upon the desired frequency response
Calculate the inverse Fourier transform
Use the result as the filter coefficients
BUT...
FIR Filters
BUT
The inverse Fourier transform (iFT) has to take samples of the continuous desired frequency response
Defining a sharp filter needs closely spaced frequency samples, so a lot of them
So the iFT will give us a lot of filter coefficients
But we don't want a lot of filter coefficients
A better recipe for calculating FIR filter coefficients is:
Specify the desired frequency response using lots of samples
Calculate the iFT
This gives us a lot of filter coefficients
So truncate the filter coefficients to give us fewer
Then calculate the FT of the truncated set of coefficients to see if it still matches our requirement
BUT...
FIR Filters
Truncating the filter coefficients means we have a truncated signal...
...and a truncated signal has a broad frequency spectrum
Applying a window function is a simple way to sharpen up the frequency spectrum of a truncated signal
Specify the desired frequency response using lots of samples
Calculate the inverse Fourier transform
This gives us a lot of filter coefficients
So truncate the filter coefficients to give us fewer
Apply a window function to sharpen up the filter's frequency response
Then calculate the Fourier transform of the truncated set of coefficients to see if it still matches our requirement
This is called the window method of FIR filter design
Summary so far
Converting analogue signals
Aliasing and Quantisation problems
Time domain processing
Correlation and Convolution
Frequency domain processing
Fourier transform and windowing
Digital filters
FIR and IIR filters
More reading
http://www.bores.com

Part II
Signal Processing using a DSP
DSP Processors
Characteristic features of DSP processors
Special features for arithmetic
Addressing modes
Data formats
Programming a DSP
Characteristics of DSP Processors
DSP processors are mostly designed with the same few basic operations in mind
They share the same set of basic characteristics
Specialised high speed arithmetic
Data transfer to and from the real world
Multiple access memory architectures
Characteristics of DSP Processors
The basic DSP operations
Additions and multiplications
Fetch two operands
Perform the addition or multiplication (usually both)
Store the result or hold it for a repetition
Delays
Hold a value for later use
Array handling
Fetch values from consecutive memory locations
Copy data from memory to memory
[Figure: FIR structure between input x and output y, with delays z^-1 and z^-2 and coefficients c[0], c[1], c[2]]
Characteristics of DSP Processors
To suit these fundamental operations DSP processors often have
Parallel multiply and add
Multiple memory accesses (to fetch two operands and store the result)
Lots of registers to hold data temporarily
Efficient address generation for array handling
Special features such as delays or circular addressing
DSP Processors: Mathematics
[Figure: Lucent DSP32C datapath, showing a single bus, type conversion, blocks labelled P and S, accumulators a0 to a3, and path widths of 32, 40 and 45 bits]
Multiply and accumulate operate in parallel
Address Generation
The ability to generate new addresses efficiently is a characteristic feature of DSP processors
Usually, the next needed address can be generated during the data fetch or store operation, and with no overhead
DSP processors have rich sets of address generation operations
*rP (register indirect): read the data pointed to by the address in register rP
*rP++ (postincrement): having read the data, postincrement the address pointer to point to the next value in the array
*rP-- (postdecrement): having read the data, postdecrement the address pointer to point to the previous value in the array
*rP++rI (register postincrement): having read the data, postincrement the address pointer by the amount held in register rI to point to rI values further down the array
*rP++rIr (bit reversed): having read the data, postincrement the address pointer to point to the next value in the array, as if the address bits were in bit reversed order
Bit Reversed Addressing
DSPs are tightly targeted to a small number of algorithms
It is surprising that an addressing mode has been specifically defined for just one application (the FFT)
Addresses generated by a radix-2 FFT (binary in parentheses)
0 (000) -> 0 (000)
1 (001) -> 4 (100)
2 (010) -> 2 (010)
3 (011) -> 6 (110)
4 (100) -> 1 (001)
5 (101) -> 5 (101)
6 (110) -> 3 (011)
7 (111) -> 7 (111)
Without special support such address transformations would
Take an extra memory access to get the new address
Involve a fair amount of logical instructions
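To give a feel for the "fair amount of logical instructions" needed without hardware support, here is a minimal software sketch of the address transformation in the table. The function name `bit_reverse` is hypothetical.

```c
/* Reverse the lowest `bits` bits of index, e.g. 001 -> 100 for bits = 3. */
unsigned bit_reverse(unsigned index, unsigned bits)
{
    unsigned reversed = 0;
    for (unsigned i = 0; i < bits; i++) {
        reversed = (reversed << 1) | (index & 1u);  /* move the lowest bit across */
        index >>= 1;
    }
    return reversed;
}
```

Each address costs a loop of shifts, masks and ORs here, whereas the bit reversed addressing mode produces it as a side effect of the memory access.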
Memory Addressing
As DSP programmers migrate toward larger programs, they are more attracted to compilers
Such compilers are not able to fully exploit such specific addressing modes
DSP community routinely uses library routines
Programmers may benefit even if they write at a high level

Addressing mode                                                              Percent
Immediate                                                                    30.02%
Displacement                                                                 10.82%
Register indirect                                                            17.42%
Direct                                                                       11.99%
Autoincrement, postincrement                                                 18.84%
Autoincrement, preincrement with 16 bit immediate                             0.77%
Autoincrement, preincrement with circular addressing                          0.08%
Autoincrement, postincrement by contents of AR0                               1.54%
Autoincrement, postincrement by contents of AR0, with circular addressing     2.15%
Autodecrement, postdecrement                                                  6.08%

The first five modes account for ~90% of the total
Compiler for DSPs
Despite the well documented advantages in programmer productivity and software maintenance...
Ratios are compiled code relative to assembly: for execution time >1 means slower, for code space >1 means bigger

TMS320C54 D (C54) for DSPstone kernels    Execution time    Code space
Convolution                               11.8              16.5
FIR                                       11.5               8.7
Matrix 1x3                                 7.7               8.1
FIR2dim                                    5.3               6.5
Dot product                                5.2              14.1
LMS                                        5.1               0.7
N real update                              4.7              14.1
IIR n biquad                               2.4               8.6
N complex update                           2.4               9.8
Matrix                                     1.2               5.1
Complex update                             1.2               8.7
IIR one biquad                             1.0               6.4
Real update                                0.8              15.6
C54 geometric mean                         3.2               7.8

TMS320C6203 (C62) for EEMBC Telecom kernels    Execution time    Code space
Convolution encoder                            44.0              0.5
Fixed-point complex FFT                        13.5              1.0
Viterbi GSM decoder                            13.0              0.7
Fixed-point bit allocation                      7.0              1.4
Autocorrelation                                 1.8              0.7
C62 geometric mean                             10.0              0.8
DSP Processors: Input/Output
DSP is mostly dealing with the real world
Communication with an overall system controller
Signals coming in and going out
Communication with other DSP processors
[Figure: a DSP connected to a system controller, to other DSPs, and to signal in and signal out]
Signals
Signals are usually handled by high speed synchronous serial ports
Serial ports are inexpensive
Having only two or three wires
Well suited to audio or telecommunications data rates up to 10 Mbit/s
Usually operate under DMA
Data presented at the port is automatically written into DSP memory without stopping the DSP
Data Formats
DSP processors store data in fixed or floating point formats
The programmer has to make some decisions
If a fixed point number becomes too large for the available word length, the programmer has to scale the number down, by shifting it to the right
If a fixed point number is small, the programmer has to scale the number up, in order to use more of the available word length
Integer
Bits:    0  1  0  1  0  0  1  1
Weights: $-2^7\ 2^6\ 2^5\ 2^4\ 2^3\ 2^2\ 2^1\ 2^0$
Value:   $2^6 + 2^4 + 2^1 + 2^0 = 83$
Fixed point
Bits:    0  1  0  1  0  0  0  0
Weights: $-2^0\ 2^{-1}\ 2^{-2}\ 2^{-3}\ 2^{-4}\ 2^{-5}\ 2^{-6}\ 2^{-7}$
Value:   $2^{-1} + 2^{-3} = 0.5 + 0.125 = 0.625$
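The fixed point example and the need for scaling can be sketched in C as follows. This assumes an 8-bit word with the bit weights shown above (a layout often called Q7). The function names are hypothetical, and clamping the one product that does not fit is a design choice made for this sketch.

```c
#include <stdint.h>

/* Interpret an 8-bit two's complement word as a fixed point fraction whose
 * bit weights are -2^0, 2^-1, ..., 2^-7. */
double q7_to_double(int8_t word)
{
    return (double)word / 128.0;
}

/* Multiply two such numbers. The raw product has 14 fractional bits, so it
 * is scaled down by 2^7 to return to 7 fractional bits (truncating). */
int8_t q7_multiply(int8_t a, int8_t b)
{
    int product = (int)a * (int)b;      /* 14 fractional bits */
    int result = product / 128;         /* back to 7 fractional bits */
    if (result > 127) result = 127;     /* -1.0 * -1.0 would not fit */
    return (int8_t)result;
}
```

The word 0x50 (bits 0101 0000) converts to 0.625, matching the slide, and multiplying 0.5 by 0.5 (64 by 64) returns 32, which is 0.25.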
Fixed Point
Fixed point can be thought of as just low-cost floating point
It does not include an exponent in every word
No hardware that automatically aligns and normalizes operands
The DSP programmer takes care to keep the exponent in a separate variable
Often this variable is shared by a set of fixed-point variables
Blocked floating point
Floating Point
Floating point format has the remarkable property of automatically scaling all numbers by moving, and keeping track of, the binary point so that all numbers use the full word length available but never overflow
Mantissa
Bits:    0  1  1  0  1  0  0  0  0
Weights: $-2^1\ 2^0\ 2^{-1}\ 2^{-2}\ 2^{-3}\ 2^{-4}\ 2^{-5}\ 2^{-6}\ 2^{-7}$
Value:   $2^0 + 2^{-1} + 2^{-3} = 1 + 0.5 + 0.125 = 1.625$
Exponent
Bits:    0  1  1  0
Weights: $-2^3\ 2^2\ 2^1\ 2^0$
Value:   $2^2 + 2^1 = 6$
Decimal value = $1.625 \times 2^6$
Data Formats
In floating point the hardware automatically scales and normalises every number
Errors due to truncation and rounding depend on the size of the number
These errors can be seen as a source of quantisation noise
Then the noise is modulated by the size of the signal
The signal dependent modulation of the noise is undesirable because it is audible
The audio industry prefers to use fixed point DSP processors over floating point
Saturating Arithmetic
DSPs are often used in real-time applications
No exception on arithmetic overflow
It could miss an event
To support such an environment, DSP architectures use saturating arithmetic
If the result is too large to be represented, it is set to the largest representable number
[Figure: saturating arithmetic compared with normal two's complement arithmetic]
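A minimal software sketch of saturating addition, assuming 16-bit operands. A DSP does this in hardware; the function name `saturating_add16` is hypothetical.

```c
#include <stdint.h>

/* Add two 16-bit values, clamping to the representable range on overflow
 * instead of wrapping around. */
int16_t saturating_add16(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;  /* wide enough never to overflow */
    if (sum > INT16_MAX) return INT16_MAX;
    if (sum < INT16_MIN) return INT16_MIN;
    return (int16_t)sum;
}
```

With normal two's complement arithmetic, 30000 + 10000 in 16 bits would wrap to a large negative value; here it stays at 32767, the largest representable number.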
Programming a DSP Processor
A simple FIR filter program
Using pointers
Avoiding memory bottlenecks
Assembler programming
A Simple FIR Filter
The simple FIR filter equation is
$y[n] = \sum_{k} c[k]\, x[n-k]$
Which can be implemented quite directly in C language
y[n] = 0.0;
for (k=0; k
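The loop on this slide is cut off in the source after `for (k=0; k`. A possible completion, following the FIR equation term by term, is sketched below. The function wrapper, the name `fir_direct` and the parameter `ntaps` (the number of coefficients) are assumptions, not part of the original listing.

```c
#include <stddef.h>

/* y[n] = sum_{k=0..ntaps-1} c[k] * x[n-k], written with array indexing.
 * ntaps is a hypothetical name for the number of coefficients;
 * the caller must ensure n >= ntaps - 1 so that every x[n-k] exists. */
float fir_direct(const float *c, const float *x, size_t n, size_t ntaps)
{
    float y_n = 0.0f;
    for (size_t k = 0; k < ntaps; k++) {
        y_n = y_n + c[k] * x[n - k];
    }
    return y_n;
}
```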
Problem in Addressing
Five operations are needed to calculate the address of the element x[n-k]
Load the start address of the table in memory
Load the value of the index n
Load the value of the index k
Calculate the offset [n-k]
Add the offset to the start address of the array
Only after all five operations can the compiled code actually read the array element
Using Pointers
y[n] = 0.0;
for (k=0; k

float *y_ptr, *c_ptr, *x_ptr;
y_ptr = &y[n];
for (k=0; k
Limiting Memory Accesses
Four memory accesses
Even without counting the need to load the instruction, this exceeds the capacity of a DSP processor
Fortunately, DSP processors have lots of registers
float *y_ptr, *c_ptr, *x_ptr;
y_ptr = &y[n];
for (k=0; k
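The pointer listings on these slides are cut off in the source, so the following is only a sketch of the two ideas they introduce, not a reconstruction of the original code: stepping through the arrays with pointers instead of recomputing x[n-k] addresses, and keeping the running sum in a local variable that a compiler can hold in a register. The function name `fir_pointers`, the parameter `ntaps` and the direction in which each pointer moves are assumptions.

```c
#include <stddef.h>

/* y[n] = sum_{k=0..ntaps-1} c[k] * x[n-k], walking both arrays with pointers.
 * The caller must ensure n >= ntaps - 1 so that every x[n-k] exists. */
float fir_pointers(const float *c, const float *x, size_t n, size_t ntaps)
{
    const float *x_ptr = &x[n + 1 - ntaps]; /* oldest input used, moves forwards */
    const float *c_ptr = c + ntaps;         /* one past the last coefficient */
    float acc = 0.0f;                       /* running sum kept out of memory */
    for (size_t k = 0; k < ntaps; k++) {
        acc += *--c_ptr * *x_ptr++;         /* one coefficient and one input fetched per step */
    }
    return acc;                             /* the caller stores y[n] once */
}
```

Each pass through the loop now makes two memory reads, for the coefficient and the input, and the result is written back only once after the loop.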
Introduction
The TMS320C3x generation of DSPs are high performance 32-bit floating-point devices in the TMS320 family
Extensive internal busing
2 operands from memory & 2 operands from the register file
Powerful DSP instruction set
60 MFLOPS
High degree of on-chip parallelism
Up to 11 operations in a single instruction
General Features
General-purpose register file
Program cache
Dedicated auxiliary register arithmetic units (ARAU)
Internal dual-access memories
Direct memory access (DMA)
Short machine-cycle time
StatusStatus RegisterRegister (ST)(ST) Contains global information about the state of the CPU
Operations usually set the condition flags of the status registeraccording to whether the result is 0, negative, etc.
Status register fields:
Global interrupt enable
Clear cache
Cache enable
Cache freeze
Repeat mode
Overflow mode
Latched floating-point underflow
Latched overflow
Floating-point underflow
Negative
Zero
Overflow
Carry
Repeat Counter (RC) and Block Repeat (RS, RE)
RC is a 32-bit register that specifies the number of times a block of code is to be repeated when a block repeat is performed
If RC=n, the loop is executed n+1 times
RS register contains the starting address of the program-memory block to be repeated when the CPU is operating in the repeat mode
RE register contains the ending address of the program-memory block to be repeated when the CPU is operating in the repeat mode
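The "n+1 times" rule is easy to get wrong by one, so a small behavioural model may help. The function below is only a sketch of the counting rule stated above, not a description of the hardware; the name `block_repeat_count` is an assumption.

```c
/* Hypothetical model of a block repeat: with RC = rc the block between
   RS and RE is executed rc + 1 times. */
unsigned long block_repeat_count(unsigned long rc)
{
    unsigned long executed = 0;

    for (;;) {
        executed++;        /* one pass through the block RS..RE */
        if (rc == 0)
            break;         /* counter exhausted: repeat mode ends */
        rc--;
    }
    return executed;
}
```

Under this model, a loop that must run N times needs RC loaded with N-1.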
Instruction Cache
64 × 32-bit instruction cache
2-way set associative
LRU replacement policy
It allows the use of slow, external memories while still achieving single-cycle access performance
The cache also frees external buses from program fetches so that they can be used by the DMA or other system elements
Addressing Modes
Five types of addressing
Register addressing
Direct addressing
Indirect addressing
Immediate addressing
PC-relative addressing
Plus two specialized addressing modes
Circular addressing
Bit-reverse addressing
Circular Addressing
Many DSP algorithms, such as convolution and correlation, require a circular buffer in memory
In convolution and correlation, the circular buffer acts as a sliding window that contains the most recent data to process
As new data is brought in, the new data overwrites the oldest data
[Figure: logical and physical representations of the circular buffer, with Start and End markers]
Circular Addressing
[Figure: logical and physical representations of a circular buffer holding value0 through value7, with Start and End markers]
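The sliding-window behaviour can be sketched in portable C. The code below is a software model under stated assumptions: the type `window_t`, the functions `window_init`, `window_push` and `window_get`, and the window length of 8 (chosen to match the eight values in the figure) are all illustrative names and choices. A DSP with circular addressing performs the wrap-around in hardware instead of with the `%` operator.

```c
#define WIN_LEN 8   /* assumed window length, matching value0..value7 */

typedef struct {
    float data[WIN_LEN];
    int   next;             /* slot the next sample overwrites (the oldest one) */
} window_t;

void window_init(window_t *w)
{
    int i;
    for (i = 0; i < WIN_LEN; i++)
        w->data[i] = 0.0f;
    w->next = 0;
}

/* Store a new sample over the oldest one and advance with wrap-around. */
void window_push(window_t *w, float sample)
{
    w->data[w->next] = sample;
    w->next = (w->next + 1) % WIN_LEN;
}

/* Read a sample by age: 0 is the most recent, WIN_LEN-1 the oldest. */
float window_get(const window_t *w, int age)
{
    int pos = (w->next - 1 - age + WIN_LEN) % WIN_LEN;
    return w->data[pos];
}
```

After ten pushes of the values 0 to 9, the window holds 2 to 9: `window_get(&w, 0)` gives the newest sample and `window_get(&w, 7)` the oldest surviving one.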
Implementation
BK: length of the circular buffer
(16 bit, buffer length
Length of buffer   BK register value   Starting address of buffer
31                 31                  XXXXXXXXXXXXXXXXXXX00000
32                 32                  XXXXXXXXXXXXXXXXXX000000
1024               1024                XXXXXXXXXXXXX00000000000
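The three table rows are consistent with the rule "the K least significant bits of the starting address are zero, where K is the smallest integer with 2^K > BK" (5 zero bits for 31, 6 for 32, 11 for 1024). The sketch below encodes that rule as read off the table; the function names `circ_align_bits` and `circ_start_is_aligned` are assumptions, and BK is assumed to fit in 16 bits.

```c
/* Smallest K such that 2^K > bk (rule inferred from the table above). */
unsigned circ_align_bits(unsigned long bk)
{
    unsigned k = 0;

    while ((1UL << k) <= bk)   /* bk is a 16-bit value, so k never exceeds 16 */
        k++;
    return k;
}

/* Returns 1 if the K low bits of the start address are all zero, else 0. */
int circ_start_is_aligned(unsigned long start, unsigned long bk)
{
    unsigned long mask = (1UL << circ_align_bits(bk)) - 1UL;

    return (start & mask) == 0;
}
```

For example, a 32-word buffer may start at address 0x40 but not at 0x20, whereas a 31-word buffer may start at either.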
Algorithm for Circular Addressing
[Figure: buffer between Start and End, with the buffer length (BK) and the current Index marked]
circ(index, step) =
if (0 <= index+step < BK)
    index = index+step;
else if (index+step >= BK)
    index = index+step-BK;
else
    index = index+step+BK;
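The update rule translates almost line for line into C. This is a minimal sketch that assumes `0 <= index < bk` and that the step magnitude does not exceed `bk`, so a single correction is enough; the three-argument signature is an illustrative choice, since the slide passes BK implicitly.

```c
/* One circular step: returns the new index inside [0, bk). */
int circ(int index, int step, int bk)
{
    int next = index + step;

    if (next >= bk)
        next -= bk;        /* ran past End: wrap around to Start */
    else if (next < 0)
        next += bk;        /* ran before Start: wrap around to End */
    return next;
}
```

With `bk` equal to 6, `circ(5, 2, 6)` gives 1 and `circ(1, -3, 6)` gives 4.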
Circular Addressing - Example
*ARn++(disp)%   ; addr = ARn
                ; ARn = circ(ARn+disp)
; AR0 is 0; BK is 6
*AR0++(5)%      ; Now AR0 is circ(0+5)=5
*AR0++(2)%      ; Now AR0 is circ(5+2)=1
*AR0--(3)%      ; Now AR0 is circ(1-3)=4
*AR0++(6)%      ; Now AR0 is circ(4+6)=4
*AR0--%         ; Now AR0 is circ(4-1)=3
*AR0%
[Figure: memory map with addresses 0, 1, 2, ... 8]
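The post-modify behaviour (use the current address, then update the register circularly) can be modelled with a small C function. The name `circ_post_modify` and its signature are assumptions made for illustration; negative displacements stand for the decrementing forms of the operand.

```c
/* Hypothetical model of a circular post-modify access: returns the address
   used for the access and leaves the wrapped, updated value in *ar. */
int circ_post_modify(int *ar, int disp, int bk)
{
    int addr = *ar;            /* addr = ARn: address used by this access */
    int next = addr + disp;    /* ARn = circ(ARn + disp) */

    if (next >= bk)
        next -= bk;
    else if (next < 0)
        next += bk;
    *ar = next;
    return addr;
}
```

Starting from `ar0 = 0` with `bk = 6` and applying displacements 5, 2, -3, 6 and -1 in turn leaves `ar0` at 5, 1, 4, 4 and 3, matching the comments in the example.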
Parallel Operations
The 13 parallel-operation instructions make a high degree of parallelism possible
Some of the C3x instructions can occur in pairs that are executed in parallel
Parallel loading of registers
Parallel arithmetic operations
Arithmetic/logical instructions used in parallel with a store instruction
Parallel Operations
Parallel arithmetic with store instructions
Many others
Parallel Operations
Parallel load instructions
Parallel multiply and add/subtract instructions
Matrix-Vector Multiplication
$$[P]_{K \times 1} = [M]_{K \times N} \, [V]_{N \times 1}$$
for (i=0; i
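For reference, the product can be written as a pair of nested loops in C. This is a generic sketch rather than the slide's own listing: the function name `mat_vec_mul`, the parameter names and the row-major storage of M are assumptions.

```c
/* p = M * v, where M has k_rows rows and n_cols columns, stored row by row. */
void mat_vec_mul(const float *m, const float *v, float *p,
                 int k_rows, int n_cols)
{
    int i, j;

    for (i = 0; i < k_rows; i++) {
        float acc = 0.0f;                      /* running sum for row i */

        for (j = 0; j < n_cols; j++)
            acc += m[i * n_cols + j] * v[j];   /* m(i,j) * v(j) */
        p[i] = acc;                            /* result -> p(i) */
    }
}
```

The inner loop is one multiply-accumulate per element, which is the pattern that DSP repeat and parallel instructions are designed to speed up.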
Matrix-Vector Multiplication
* AR0 : ADDRESS OF M(0,0)
* AR1 : ADDRESS OF V(0)
* AR2 : ADDRESS OF P(0)
* AR3 : NUMBER OF ROWS - 1 (K-1)
* R1 : NUMBER OF COLUMNS - 2 (N-2)
MAT LDI R1,IR0 ; Number of columns-2 -> IR0
ADDI 2,IR0 ; Number of columns -> IR0
ROWS LDF 0.0,R2 ; Initialize R2
MPYF3 *AR0++(1),*AR1++(1),R0 ; m(i,0) * v(0) -> R0
RPTS R1 ; Multiply a row by a column
MPYF3 *AR0++(1),*AR1++(1),R0 ; m(i,j) * v(j) -> R0
|| ADDF3 R0,R2,R2 ; m(i,j-1) * v(j-1) + R2 -> R2
SUBI 1,AR3
BNZD ROWS ; Counts the no. of rows left
ADDF R0,R2 ; Last accumulate
STF R2,*AR2++(1) ; Result -> p(i)
NOP *AR1(IR0) ; Set AR1 to point to v(0)
Delay slots: the three instructions after BNZD (ADDF, STF, NOP)
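The inner part of the listing overlaps each multiply with the accumulation of the previous product, which is why one extra ADDF is needed after the repeat. The C model below is one reading of that schedule, assuming the parallel ADDF3 sees the value R0 held before the paired MPYF3 overwrites it; the function name `dot_pipelined` and the requirement `n >= 1` are assumptions.

```c
/* Hypothetical model of one row of the assembly listing:
   r0 and r2 mirror registers R0 and R2. Requires n >= 1. */
float dot_pipelined(const float *m_row, const float *v, int n)
{
    float r2 = 0.0f;                   /* LDF 0.0,R2 */
    float r0 = m_row[0] * v[0];        /* first MPYF3: m(i,0) * v(0) -> R0 */
    int j;

    for (j = 1; j < n; j++) {          /* RPTS with N-2: the pair runs N-1 times */
        float prod = m_row[j] * v[j];  /* MPYF3: m(i,j) * v(j) */
        r2 += r0;                      /* || ADDF3: adds the previous product */
        r0 = prod;                     /* new product becomes visible in R0 */
    }
    r2 += r0;                          /* ADDF R0,R2: last accumulate */
    return r2;
}
```

For a row {1, 2, 3} and a vector {4, 5, 6} the model returns 32, the same as a plain dot product; only the order in which products are added differs.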
C Programming Tips
After writing your application in C language, debug the program and determine whether it runs efficiently
If the program does not run efficiently
Use the optimizer with -o2 or -o3 options when compiling
Use registers to pass parameters (-ms compiling option)
Use inlining (-x compiling option)
Remove the -g option when compiling
Follow some of the efficient code generation tips
Use register variables for often-used variables
Precompute subexpressions
Use *++ to step through arrays
Use structure assignments to copy blocks of data
Use Register Variables
Exchange one object in memory with another
register float *src, *dest, temp;
do {
    temp = *++src;
    *src = *++dest;
    *dest = temp;
} while (--n);
Precompute Subexpression and use *++
Original version: 19 cycles
int main(void) {
    float a[10], b[10];
    int i;
    for (i = 0; i < 10; ++i)
        a[i] = (a[i] * 20) + b[i];
    return 0;
}
With pointers and *++: 12 cycles
int main(void) {
    float a[10], b[10];
    int i;
    register float *p = a, *q = b;
    /* p is advanced in the loop header so that it is not read and
       modified within the same expression */
    for (i = 0; i < 10; ++i, ++p)
        *p = (*p * 20) + *q++;
    return 0;
}
Structure Assignments
The compiler generates very efficient code for structure assignments
Nest objects within structures and use simple assignments to copy them
Member-by-member copy:
int x1, y1, c1;
int x2, y2, c2;
x1 = x2;
y1 = y2;
c1 = c2;
Structure assignment:
struct Pixel {
    int x, y, c;
};
struct Pixel p1, p2;
p1 = p2;
Hints for Assembly Coding
Use delayed branches
Delayed branches execute in a single cycle
Regular branches execute in four cycles
The next three instructions are executed whether the branch is taken or not
If fewer than three instructions are required, use the delayed branch and append NOPs
A reduction in machine cycles still occurs
Hints for Assembly Coding
Apply the repeat single/block construct
In this way, loops are achieved with no overhead
Note that with the RPTS instruction, the repeated instruction is not refetched for execution
This frees the buses for operand fetches
Hints for Assembly Coding
Use parallel instructions
Maximize the use of registers
Use the cache
Use internal memory instead of external memory
Avoid pipeline conflicts
Summary and Conclusions
Characteristic features of DSP processors
Special features for arithmetic & data formats
Addressing modes
Programming a DSP
More reading
TMS320C3x General-Purpose Applications User's Guide
http://focus.ti.com/general/docs/tecdocs.tsp