Department of Electronics & ECE
Indian Institute of Technology, Kharagpur - 721302
Prof. Swapna Banerjee
VLSI-DSP Based Embedded System : An Overview
A Brief History
Digital Signal Processing was born in the 1960s. The (re)discovery of the Fast Fourier Transform marked its birth.
Special-purpose DSP chips first appeared in the late 1970s. This class of chips uses special arithmetic structures and arrays, which have driven much of the work in the VLSI Research Group.
In the 1990s, the two architectures merged into programmable DSP chips with special hardware functions. One example, the TMS320C8x generation, contains several DSP cores with a master RISC controller and an IEEE-standard floating-point ALU.
Field Programmable Gate Arrays (FPGAs) came onto the market in the 1980s. The devices are now ubiquitous, with RAM-based designs. FPGA densities have reached the point where they are serious contenders for implementing small DSP arrays.
The advantage of this programmable approach is a very fast design cycle, with the ability to correct errors and update features without a change of hardware; the sacrifice in efficiency is therefore acceptable.
By 2010 the state of the art allows an increase of two orders of magnitude or more in DRAM and SRAM bits, as the following table shows.

Year of first DRAM shipment              1995   1998     2001     2004   2007   2010   Driver
Feature size (µm)                        0.35   0.25     0.18     0.13   0.10   0.07
DRAM bits/chip                           64M    256M     1G       4G     18G    64G    D
SRAM (cache) bits/cm2                    2M     6M       20M      50M    100M   300M   L(P)
Logic transistors/cm2 (packed)           4M     7M       13M      25M    50M    90M    L(P)
On-chip clock (MHz) (high perf. DSP)     400    600      800      1100   1500   1900   DSP
Wiring levels (logic)                    4-5    5        5-6      6      6-7    7-8    P
Power supply (V) (desktop)               3.3    2.5      1.8      1.5    1.2    0.9    P
Power supply (V) (battery)               2.5    1.8-2.5  0.9-1.8  0.9    0.9    0.9    A
Maximum power (W) (high perf. w. heat-sink)  80  100     120      140    160    180    P
Design issues associated with the technology include:
• Algorithm design
• Architecture design
• Implementation technology
• Verification and test
• Library circuit design
• Arithmetic implementation
The combination of constantly evolving DSP algorithms with DSPP hardware has formed the basis for exponential growth in DSP applications.
Table : Key features of some DSPPs

                         TMS32060   DSP96002   TMS32080    ADSP21060
Divide                   45 ns      233 ns     125 ns      150 ns
Precision                32 bits    32 bits    32 bits     32 bits
RAM/ROM                  1 Mbit     8 Kbytes   50 Kbytes   4 Mbytes
Instruction cycle time   5 ns       33.3 ns    25 ns       25 ns
Figure : Heterogeneity in the top-down design flow of complex systems
Table : Models of computation for describing the signal processing, communications, and control aspects of image and video processing systems.

Subsystem                   Model of Computation
audio processing            1-D dataflow
digital image processing    2-D dataflow
image/video resampling      m-D multirate dataflow
user interface              synchronous/reactive
communication protocols     finite-state machine
digital control             dataflow
image understanding         knowledge-based control
scalable descriptions       process networks
Figure . Typical structure of a large system
Figure . A heterogeneous hardware and software platform.
Platform options: DSP, FPGA, DSP + FPGA
Figure: Macroblock-based pipeline processing (labels: search window, I/P, P, video stream, external video frame buffers, reconstructed frame)
DaVinci technology is an integrated portfolio of DSP-based processors, software, tools, and support for developing a broad spectrum of optimized digital video end equipment.
DaVinci processors :
• Digital cameras
• Video security
• Video telephones
• Portable media players
• IP set-top box
• Medical imaging
• Automotive infotainment
• Networked video for emerging applications
Inter-chip communication is critical in the implementation model.
Figure : Architecture B significantly increases DSP/FPGA/memory interactions.
Ptolemy uses object-oriented software principles to achieve the following goals:
• Agility: Support distinct computational models, so that each subsystem can be simulated and prototyped in a manner that is appropriate and natural to that subsystem.
• Heterogeneity: Allow distinct computational models to coexist seamlessly for the purpose of studying interactions among subsystems.
• Extensibility: Support seamless integration of new computational models and allow them to interoperate with existing models with no changes to Ptolemy or to existing models.
• Friendliness: Use a modern graphical interface with a hierarchical block-diagram style of representation.
A typical SOC architecture will have:
• A processor core
• One or more caches
• An on-chip bus hierarchy
• On-chip memory
• A large number of peripheral cores (they provide application-specific functionality such as multimedia and communication processing)
A designer must have a method for finding a feasible set of parameter values, referred to as a configuration of the SOC, that meets the specification requirements.
The System
Figure: System block diagram (blocks: Patient, Sensor, Pre-processing Unit, ADC, Signal Processing Unit, Computer, Expert System, Adaptive Control, Data Bank, Image Processing Unit, Comm. Interface)
Sensors
• Resistive sensors, inductive/capacitive sensors
• Quasi-digital sensors: give outputs with variable frequency, pulse rate or pulse duration that are easily converted to digital signals
• Piezoelectric sensors
• Thermistors
• Fiber-optic temperature sensor
• Laser
• Photoconductive cells
• Photojunction sensors
Domain of VLSI
• Diagnostic products
• Therapeutic products
• Analytical instruments
• Monitoring instruments
• Rehabilitative devices
• Processing instruments
VLSI's Objectives
• Low power system
• Real-time processing
• High precision design
• Algorithmic applicability
Design Domain
• Analog front-end design
• Digital signal processing units, viz. DFT, DCT, DHT and DST
Implementation Platform
• Xilinx FPGA
• Synopsys and Cadence (ASIC platform)
VLSI based Doppler ultrasonography system
Figure: Processing flow (blocks: spectrogram image, feature extraction, flow diagnosis, probability measurement, contour detection, contour motion detection, probability measure, arterial condition detection, arterial condition (probabilistic), final inference; methods: BP neural network, Bayesian probability, inertial snake)
Doppler Ultrasonography System
Figure: Low-cost colour Doppler ultrasonography system. Blocks:
• PZT transducer, 8 MHz (f0; received frequency f0 ± Δf)
• Analog front end
• A/D conversion, sampling frequency 32 kHz
• 128-point FFT using CORDIC processors (16-bit operation, 8-bit output)
• PCI bus interface (8-bit data to PC)
• Display (spectrogram, 8-bit data)
• Knowledge base
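The figures quoted for this system (32 kHz sampling, 128-point FFT, 8-bit output) fix the frequency resolution of each spectrogram column at 32000/128 = 250 Hz per bin. The sketch below illustrates that arithmetic with a direct DFT in plain Python, which stands in for the CORDIC-based FFT hardware. The function names, the 2 kHz test tone and the peak-normalised 8-bit scaling are assumptions made for illustration; the source does not say how the hardware scales its output.

```python
import cmath
import math

FS_HZ = 32_000   # sampling frequency quoted for the A/D stage
N_FFT = 128      # transform length quoted for the FFT stage


def spectrum_column(frame):
    """Return one spectrogram column: 8-bit magnitudes for bins 0..N/2-1."""
    n_pts = len(frame)
    mags = []
    for k in range(n_pts // 2):
        acc = sum(frame[i] * cmath.exp(-2j * math.pi * k * i / n_pts)
                  for i in range(n_pts))
        mags.append(abs(acc))
    peak = max(mags) or 1.0
    # Assumed scaling: normalise to the column peak, then quantise to 0..255.
    return [round(255 * m / peak) for m in mags]


def bin_to_hz(k):
    """Centre frequency of bin k for the quoted sampling rate and length."""
    return k * FS_HZ / N_FFT


# Hypothetical input: a pure 2 kHz tone standing in for a Doppler shift signal.
TONE_HZ = 2_000
frame = [math.sin(2 * math.pi * TONE_HZ * i / FS_HZ) for i in range(N_FFT)]
column = spectrum_column(frame)
peak_bin = column.index(max(column))
print(peak_bin, bin_to_hz(peak_bin))
```

A real spectrogram would stack one such column per frame of 128 samples; windowing and overlap are left out to keep the sketch short.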
Different Parameters in the spectrogram
Figure labels: A, B, systolic peak (S), diastolic trough (D), systolic window (SW), spectral broadening (SB), period
Structure of the Knowledge Base system
Figure: Flow chart (blocks: input spectrograms, age and region based grouping, feature extraction, decision "I = N?", decision "Known pattern?", ANN-based classifier, inference, add to database, train, upgrade classifier, store weight matrix)
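BPNN Structure
Figure: Three-layer back-propagation neural network with input nodes (i), hidden nodes (j) and output nodes (k), connected by weights Wij and Wjk.
Input features (figure labels): A, B, S/D, P, SW, SB, IC, KC
Output classes: Normal, Distal Stenosis, Proximal Stenosis, Vasodilatation, Ischemic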
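A forward pass through a three-layer network of this shape (i-nodes, j-nodes, k-nodes, weights Wij and Wjk) can be sketched as follows. This is only an illustration: the layer sizes, the sigmoid activation, the absence of bias terms and the random, untrained weights are all assumptions, so the printed class carries no diagnostic meaning. Only the five output class names come from the slide.

```python
import math
import random

CLASSES = ["Normal", "Distal Stenosis", "Proximal Stenosis",
           "Vasodilatation", "Ischemic"]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights):
    """Apply one fully connected layer; weights[a][b] links input a to output b."""
    n_out = len(weights[0])
    return [sigmoid(sum(inputs[a] * weights[a][b] for a in range(len(inputs))))
            for b in range(n_out)]


def forward(features, w_ij, w_jk):
    """i-nodes -> j-nodes (hidden) -> k-nodes (one score per class)."""
    hidden = layer(features, w_ij)
    return layer(hidden, w_jk)


# Hypothetical sizes and untrained weights, for illustration only.
rng = random.Random(0)
N_IN, N_HIDDEN = 8, 6
w_ij = [[rng.uniform(-1, 1) for _ in range(N_HIDDEN)] for _ in range(N_IN)]
w_jk = [[rng.uniform(-1, 1) for _ in range(len(CLASSES))] for _ in range(N_HIDDEN)]
features = [rng.random() for _ in range(N_IN)]   # stand-in spectrogram features

scores = forward(features, w_ij, w_jk)
predicted = CLASSES[scores.index(max(scores))]
print(predicted)
```

Training by back-propagation would adjust w_ij and w_jk from labelled spectrograms; the stored weight matrix mentioned in the knowledge-base flow chart would then replace the random values used here.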
Arterial Distribution in Human Body
Figure labels: Heart, Aorta, Left Common Carotid Artery, Left External Carotid Artery, Right External Carotid Artery, Brachial, Radial, Ulnar, Common Femoral, Popliteal, Anterior Tibial, Posterior Tibial
Knowledge-Base Development
Spectrogram recognition using BPNN
The main components of DSP/DIP systems
• Spectral analysis of time-varying signals (ECG, EEG, EMG, EGG, Doppler ultrasonography signal, etc.): Discrete Fourier Transform (DFT) and Discrete Hartley Transform (DHT)
• Archiving the digitized data in compressed format: Discrete Cosine Transform (DCT) and Discrete Sine Transform (DST). These are also used for reconstruction of MRI images and delineation of the ECG signal into its component waves
• Pattern recognition for bone fractures, tumors and detection of abnormal cell nuclei: Hough Transform (HT)
Unified architecture of DXT (DFT/DCT/DHT/DST)
For a real sample sequence f(n), where n \in \{0, 1, …, N-1\}, DXT can be defined as:

DFT: F(k) = \sum_{n=0}^{N-1} f(n) [\cos(2\pi kn/N) - j \sin(2\pi kn/N)] = F_x(k) + j F_y(k)   … … (1)
DHT: H(k) = \sum_{n=0}^{N-1} f(n) [\cos(2\pi kn/N) + \sin(2\pi kn/N)]   … … (2)
DCT: C(k) = \sum_{n=0}^{N-1} f(n) \cos[(2n+1)k\pi/2N]   … … (3)
DST: Z(k) = \sum_{n=0}^{N-1} f(n+1) \sin[(2n+1)k\pi/2N]   … … (4)
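The DFT, DHT and DCT definitions can be checked numerically with a direct evaluation. The sketch below assumes the standard forms F(k) = sum f(n)[cos(2πkn/N) - j sin(2πkn/N)], H(k) = sum f(n)[cos(2πkn/N) + sin(2πkn/N)] and C(k) = sum f(n) cos[(2n+1)kπ/2N], without any normalisation factor. It covers these three transforms only, and the function names and sample values are illustrative. It also shows why one datapath can serve both DFT and DHT: H(k) = Fx(k) - Fy(k).

```python
import math


def dft(f):
    """Return (Fx, Fy) with F(k) = Fx(k) + j*Fy(k)."""
    n_pts = len(f)
    fx = [sum(f[n] * math.cos(2 * math.pi * k * n / n_pts) for n in range(n_pts))
          for k in range(n_pts)]
    fy = [-sum(f[n] * math.sin(2 * math.pi * k * n / n_pts) for n in range(n_pts))
          for k in range(n_pts)]
    return fx, fy


def dht(f):
    """Discrete Hartley transform with the cos + sin kernel."""
    n_pts = len(f)
    return [sum(f[n] * (math.cos(2 * math.pi * k * n / n_pts)
                        + math.sin(2 * math.pi * k * n / n_pts))
                for n in range(n_pts))
            for k in range(n_pts)]


def dct(f):
    """Unnormalised DCT with kernel cos[(2n+1) k pi / 2N]."""
    n_pts = len(f)
    return [sum(f[n] * math.cos((2 * n + 1) * k * math.pi / (2 * n_pts))
                for n in range(n_pts))
            for k in range(n_pts)]


samples = [1.0, 2.0, 0.5, -1.0, 3.0, 0.0, -2.0, 1.5]   # arbitrary test data
fx, fy = dft(samples)
h = dht(samples)
c = dct(samples)

# The Hartley output is the real part minus the imaginary part of the DFT.
max_gap = max(abs(h[k] - (fx[k] - fy[k])) for k in range(len(samples)))
print(max_gap)
```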
Reformulation in terms of CORDIC rotation

DFT: [F_x(k), F_y(k)]^T = \sum_{n=0}^{N-1} Rot(m\theta) [f(n), 0]^T
DHT: [H(k), H(N-k)]^T = \sum_{n=0}^{N-1} Rot(m\theta) [f(n), f(n)]^T
DCT: [C(k), C(N-k)]^T = \sum_{n=0}^{N-1} Rot(k\phi) [Rot(m\theta) [r(n), 0]^T]
DST: [Z(N-k), Z(k)]^T = \sum_{n=0}^{N-1} Rot(k\phi) [Rot(m\theta) [r(n), 0]^T]

where
k = 0, 1, ..., N-1
m = kn modulo N = <kn>_N
\theta = 2\pi/N, \phi = \theta/4
r(n) = f(2n) for n = 0, 1, ..., (N-1)/2
     = f(2N-2n-1) for n = (N+1)/2, ..., (N-1)

The Unified Equation for DXT
[Y_x(k), Y_y(k)]^T = \sum_{n=0}^{N-1} Rot(k\phi) [Rot(m\theta) [h_x(n), h_y(n)]^T]
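Reformulation in terms of CORDIC rotation (contd.)
Basic CORDIC Matrix
Rot(\theta) = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}
Rot(\theta_1 + \theta_2 + ..... + \theta_n) = Rot(\theta_1) Rot(\theta_2) ..... Rot(\theta_n)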
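The decomposition of one rotation into a product of smaller rotations is what a CORDIC unit exploits: it applies a fixed sequence of micro-rotations by atan(2^-i), each needing only shifts and adds in hardware. The floating-point sketch below illustrates the idea and then accumulates rotated samples to form a DFT. It assumes the clockwise convention Rot(θ)[x, y] = [x cos θ + y sin θ, -x sin θ + y cos θ], under which rotating [f(n), 0] by mθ and summing yields [Fx(k), Fy(k)]. The iteration count, the range reduction step and all names are illustrative choices, not details taken from the chip.

```python
import math

N_ITER = 24
ATAN_TABLE = [math.atan(2.0 ** -i) for i in range(N_ITER)]
GAIN = 1.0
for _i in range(N_ITER):
    GAIN *= math.sqrt(1.0 + 2.0 ** (-2 * _i))   # accumulated CORDIC gain


def cordic_rot(x, y, theta):
    """Approximate Rot(theta)[x, y] under the assumed clockwise convention."""
    # A clockwise turn by theta is a counter-clockwise turn by -theta.
    z = math.atan2(math.sin(-theta), math.cos(-theta))   # wrap to (-pi, pi]
    # Range reduction: a half turn is an exact sign flip of both components.
    if z > math.pi / 2:
        x, y, z = -x, -y, z - math.pi
    elif z < -math.pi / 2:
        x, y, z = -x, -y, z + math.pi
    for i in range(N_ITER):
        d = 1.0 if z >= 0 else -1.0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * ATAN_TABLE[i]
    return x / GAIN, y / GAIN


def dft_by_rotation(f):
    """Accumulate Rot(m*theta)[f(n), 0] over n, with m = kn mod N."""
    n_pts = len(f)
    theta = 2 * math.pi / n_pts
    out = []
    for k in range(n_pts):
        acc_x = acc_y = 0.0
        for n, sample in enumerate(f):
            m = (k * n) % n_pts
            rx, ry = cordic_rot(sample, 0.0, m * theta)
            acc_x += rx
            acc_y += ry
        out.append((acc_x, acc_y))
    return out


spectrum = dft_by_rotation([1.0, 2.0, 0.5, -1.0, 3.0, 0.0, -2.0, 1.5])
print(spectrum[1])
```

In a fixed-point implementation the multiplications by 2^-i become right shifts and the division by the gain is usually folded into a constant scaling stage.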
Arrangement of CORDIC unit for DXT
Figure: Linear array of CORDIC units Cm (stages 1, 2, ..., N-1), each with a PC input; inputs fx(n), fy(n), fx(n-1), fy(n-1), ..., fx(1), fy(1); outputs Ux(N-1), Uy(N-1)
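Processing Element
Figure: CORDIC processing element Cm with inputs x, y and z (PC), outputs xf and yf, registers Rx, Ry and Rz, a mode control unit and a clock (signals: xi, yi, xi-1, yi-1, PCi)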
DXT Architecture
Figure: Switch and control structure (DATA inputs, multiplexers (MUX) with select lines, FIFO bank; outputs [Y(1) ... Y(N-1)] and Y((N-1)/2))
Core and critical path of the DXT Chip
PCI Based Ultrasonography System
Figure: Block diagram, split into hardware and software.
Hardware (FPGA 1 to FPGA 6):
• FPGA 1/2/3/4, for channels 1 to 16: delay value generator, delay memory, weightage, apodization unit, adder, coherently added output
• Amplitude extractor, scan conversion, noise cleaning, compression (DWT + coding), data combiner
• Memory bank 1, memory bank 2, PCI bridge, USB
• TGC gain control, power control of transmit beamformer
Software (PC):
• PC north bridge, application program, driver
• Display (PC monitor), storage (hard disk)
• Fine noise cleaning, contour detection
• Control signal generation for hardware (lossy/lossless, movie/static)
Basic CT system
CT collects projections.
Projected X-ray data.
Polar domain computation.
Need for conversion from Raster Scan Grid to Polar Grid.
Data interpolation.
Improvement
• Slope-Intercept Radon Transform (Beylkin, 1987)
• Fast Radon Transform (FRT) (Kelley-Madisetti, 1993)
• New FRT (Mitra-Banerjee, 2004)
Problems of FBP
Solution: Slope-Intercept (p-\tau) domain Radon Transform
p-\tau Radon Transform
R\{u(x, y)\} = \iint_{Image} u(x, y) \, \delta(y - p x - \tau) \, dx \, dy
Line integrals along various angles and intercepts in an image plane.
Mathematical Basis of the Image Reconstruction from Projections (e.g. X-ray CT, MRI ).
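A discrete version of the slope-intercept line integral can be sketched by summing, for each slope p and intercept tau, the pixels that lie on the line y = p·x + tau. The code below is a direct spatial-domain illustration with nearest-neighbour sampling; it is not the frequency-domain FRT described on these slides. The function name, the row-major image layout (image[y][x]) and the test image are assumptions made for the example.

```python
def radon_slope_intercept(image, slopes, intercepts):
    """Sum image[y][x] along y = p*x + tau for every (p, tau) pair."""
    height = len(image)
    width = len(image[0])
    result = []
    for p in slopes:
        row = []
        for tau in intercepts:
            total = 0.0
            for x in range(width):
                y = round(p * x + tau)          # nearest-neighbour sample
                if 0 <= y < height:
                    total += image[y][x]
            row.append(total)
        result.append(row)
    return result


# Hypothetical 8x8 test image containing the single straight line y = x.
SIZE = 8
image = [[1.0 if x == y else 0.0 for x in range(SIZE)] for y in range(SIZE)]
slopes = [0.0, 0.5, 1.0]
intercepts = list(range(-(SIZE - 1), SIZE))

projections = radon_slope_intercept(image, slopes, intercepts)
peak = max(max(row) for row in projections)
print(peak)
```

The line in the image shows up as a single peak at (p, tau) = (1, 0), which is the point-to-line duality that projection-based reconstruction relies on. Slopes with |p| ≤ 1 correspond to angles of at most 45°, the range in which one sample per column is enough.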
Figure: Line function (line sampler) at 8°
Figure: Projection data; a straight line in an image space
p-\tau Radon Transform in Frequency Domain
RT: R\{u(m, l)\} = F^{-1}[U_L(k) L_P(k)]
IRT: u(m, l) = F^{-1}[W_P(k) P_L(k)]
U_L(k): frequency domain image function
L_P(k): frequency domain line function
W_P(k): frequency domain filtered Radon transform
The compositions of L_P(k) and P_L(k) are different.
Problems of the Frequency Domain Technology
(i) Line function aliasing for angle \theta > \tan^{-1}(r), where r is the image aspect ratio. For a square image r = 1, hence max(\theta) < 45°.
Figure: Line function at 60.95° showing aliasing effect
Figure: Line function (line sampler) at 8°
Our solution
Figure: An image space divided into two angular parts, below 45° and above 45°
New FRT Algorithm
Figure: Flow diagram (input image, FRT below 45°, FRT above 45°, reconstructed image)
FPGA Module for image processing
A module with Virtex-II FPGA & 256 Mb of external RAM.
Result
Figure panels: original image; reconstruction below 45°; reconstruction above 45°; complete reconstruction; back-projection reconstruction
FRT based Image Reconstruction for CT imaging
Figure: Architecture with two processing units (PU1, PU2), one for the sinogram below 45° and one for the sinogram above 45°. Each unit contains RAM, FFT, filter, vector-matrix multiplier and IFFT stages with 12-bit and 16-bit data paths and sin θ / cos θ inputs. A control unit (CLK), the line function LS(k), a core processing unit and an adder produce the output image.
RECONSTRUCTED 3-D IMAGE
Glucose Monitoring and Drug Delivery System
Figure: Measurement set-up (laser, PZT crystal, signal detector, digital oscilloscope, attenuators, joule-meter)
Figure: Closed-loop system (laser source with optics, porous silicon MEMS sensor, signal processing circuit, LUT based controller, infusion pump)
Figure: Variation of PA signal of a subject with time after drinking glucose water (x-axis: time in minutes, 0 to 40; y-axis: PA signal in mV, 0 to 0.8)
ABSORPTION SPECTRUM OF TISSUE & BLOOD
PA SIGNAL VS GLUCOSE CONCENTRATION
Figure: Relative PA change of glucose solution with different concentration (x-axis: glucose concentration (%); y-axis: relative PA change (%))
VARIATION OF INSULIN WITH GLUCOSE

Time (min)   Glucose (mg/dl)   Insulin (µU/ml)
0            92                11
2            350               26
4            287               130
6            251               85
8            240               51
10           216               49
12           211               45
14           205               41
16           196               35
MRI
Signal Localization
• Slice selection
Fourier transform
Spatial Information Encoding
• Frequency encoding
• Phase encoding
Affine Transform
Image Guided Surgery
Imagery sub-system
• MRI skin segmentation
• MRI internal structure segmentation
Laser sub-system
• Laser scanner
• Laser data/MRI registration
• Registration verification
Tracking sub-system
• Head tracking
• Instrument tracking
• Head tracking verification
• Laser/Flashpoint calibration
Visualization sub-system
Other figure labels: a priori, Subject
MR Imaging Process viewed as two mathematical transformations
Figure: object, data space and image, linked by Transform I (spin processing) and Transform II (data processing)
“e-ear”
Seeing through the sand
Telemedicine System
The Destination
Design of VLSI based biomedical instruments with adaptive monitoring and drug-delivery system
May there be peace in heaven.
May there be peace in the sky.
May there be peace on earth.
May there be peace in the water.