7/29/2019 DSP Alg Arch Mm1 Slides
1MM1 DSP Algorithms and Architectures
DSP Algorithms and Architectures
Minimodule 1.
Anders Brødløs Olsen
2MM1 DSP Algorithms and Architectures
P2-18 DSP Algorithms and Architectures
Purpose
The purpose of the course is to help the student understand the concepts needed to map a DSP algorithm onto a real-time architecture so that the two interact well.
Objectives
After the course the student should demonstrate:
Comprehension of basic and advanced concepts in algorithmic and architectural interaction
Application of methods for designing and optimizing data paths and control paths for DSP algorithms
3MM1 DSP Algorithms and Architectures
The course content
Two-part course
1. (ABO) The architectural aspects
2. (PK) The more algorithmic aspects
Contents
Design concepts for DSP systems: from specs to prototype
Cost functions: area, time, power, numeric, ...
General DSP architectures
Algorithmic representation: SFG, DFG, SDF, precedence graphs
Critical path, critical loop
Timing in algorithms: retiming, unfolding, pipelining
Look-ahead transformations
Allocation, assignment, and scheduling
Finite State Machine with Datapath
Methods for data-path and control-path optimization
Memory management in real-time DSP systems
4MM1 DSP Algorithms and Architectures
Course practicalities
Literature:
Gajski book
I will find additional reading in the form of papers
Course web: http://kom.aau.dk/~abo/Teaching/DSP_alg_arch/index.htm
5MM1 DSP Algorithms and Architectures
Topics of today
From functionality to silicon
An introduction
Motivation for application-specific architecture
Cost functions (noise, power, area, time, ...)
Representing a design
Abstraction levels
6MM1 DSP Algorithms and Architectures
The First Computer
The Babbage Difference Engine (1832)
25,000 parts
Cost: £17,470
7MM1 DSP Algorithms and Architectures
ENIAC - The first electronic computer (1946)
8MM1 DSP Algorithms and Architectures
Intel 4004 Micro-Processor
1971
~2000 transistors
9MM1 DSP Algorithms and Architectures
Today's processors
2008
> 300M transistors
> 3000 MHz operation
~150 mm²
10MM1 DSP Algorithms and Architectures
The A3 Paradigm
Application: LP-filter (specification)
Algorithm: FIR, IIR (parallel), IIR (cascade)
Architecture: DSP, µ-Controller, ASIC/FPGA
Design dedicated architectures that fit our algorithmic demands.
CAD tools typically help us, but we need to know why and how.
11MM1 DSP Algorithms and Architectures
The A3 Paradigm
Application: LP-filter (specification)
Algorithm: FIR, IIR (parallel), IIR (cascade)
Architecture: DSP, µ-Controller, ASIC
1:many mapping
Algorithm attributes: numerical properties
Architecture attributes: size, execution time
[Figure labels: This course; Specifications]
12MM1 DSP Algorithms and Architectures
Design representation
13MM1 DSP Algorithms and Architectures
Design Abstraction Levels
[Figure: design abstraction levels, from the device level up to the system level]
DEVICE
CIRCUIT
GATE
MODULE
SYSTEM
14MM1 DSP Algorithms and Architectures
The design process
Top-down design strategies
Refine the specification successively
Decompose each component into smaller components
Continue down to the lowest-level primitive components
Over-sold methodology: only works with plenty of experience
Bottom-up design strategies
Build up from primitive components
Combine them to form more complex components
Risk: wrong interpretation of the specifications
Mixed strategies
Mostly top-down, but also bits of bottom-up
Reality: need to know both top-level and bottom-level constraints
15MM1 DSP Algorithms and Architectures
Typical signal processing algorithms
[Figure: X(n) → H → Y(n); an analog signal passes through a sampler (quantizer), a digital signal processor working on binary words, and a reconstructor back to analog]
Typical filter operations
IIR:
FIR:
Additions and Products
(Control)
REAL-TIME
Vector, Matrix: y=Mx
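The IIR and FIR expressions on this slide did not survive extraction. As a reminder of why these filters reduce to additions and products, here is a minimal sketch using the textbook difference equations; the function names, the coefficient convention (`a[0] == 1`), and the sample values are assumptions for illustration, not taken from the slide.

```python
def fir_filter(b, x):
    """FIR: y[n] = sum_k b[k] * x[n-k] (only multiply-accumulate operations)."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, coeff in enumerate(b):
            if n - k >= 0:
                acc += coeff * x[n - k]
        y.append(acc)
    return y


def iir_filter(b, a, x):
    """IIR: y[n] = sum_k b[k] * x[n-k] - sum_{k>=1} a[k] * y[n-k], assuming a[0] == 1."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, coeff in enumerate(b):
            if n - k >= 0:
                acc += coeff * x[n - k]
        # Feedback part: previously computed outputs are reused.
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y.append(acc)
    return y


if __name__ == "__main__":
    impulse = [1.0, 0.0, 0.0, 0.0]
    print(fir_filter([0.5, 0.5], impulse))         # two-tap moving average
    print(iir_filter([1.0], [1.0, -0.5], impulse))  # one-pole recursive filter
```

Each output sample costs one multiply-accumulate per coefficient, which is the workload the architectures on the following slides are built around.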
16MM1 DSP Algorithms and Architectures
General architectures
µ-controllers
General Purpose Processors (GPP)
Application Specific Instruction-set Processor (ASIP)
Digital Signal Processors (DSP)
Application Specific Integrated Circuit (ASIC)
Field Programmable Gate Array (FPGA)
17MM1 DSP Algorithms and Architectures
µ-controllers and GPPs
Known as a Von Neumann architecture
Product calculations on an ALU!
Shared instruction and data bus
[Figure: Control, Mem, and ALU blocks on one shared bus]
Operation cycle:
C1: Instruction fetch
C2: Data 1 fetch
C3: Data 2 fetch
C4: operation execution
C5: Output data storage
Computational capacity
Bus capacity
18MM1 DSP Algorithms and Architectures
8bit PIC controller
19MM1 DSP Algorithms and Architectures
µ-controllers and GPPs
Introducing a multiplier in the architecture
Precision
M=N: Single precision
M=2N: Double precision
M>2N: Overflow precision
[Figure: Control, Mem, ALU, and MUL blocks]
Computational capacity
Bus capacity
(Still using the same bus for instruction and data)
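The slide does not define M and N. The sketch below assumes N is the operand word length and M the width of the product or accumulator register, both in two's complement. Under that assumption it shows why a full product needs 2N bits and why accumulating many products needs extra guard bits beyond 2N. All function names are hypothetical.

```python
def signed_bits_needed(value: int) -> int:
    """Smallest two's-complement width that can hold `value`."""
    if value >= 0:
        return value.bit_length() + 1
    return (-value - 1).bit_length() + 1


def product_bits(n: int) -> int:
    """Width needed for the worst-case product of two signed n-bit operands."""
    most_negative = -(1 << (n - 1))
    return signed_bits_needed(most_negative * most_negative)


def accumulator_bits(n: int, taps: int) -> int:
    """Width needed to sum `taps` worst-case products without overflow."""
    most_negative = -(1 << (n - 1))
    return signed_bits_needed(taps * most_negative * most_negative)


if __name__ == "__main__":
    n = 16
    print("product width:", product_bits(n))
    print("accumulator width for 256 taps:", accumulator_bits(n, 256))
```

With these worst-case bounds, a 16-bit multiplier feeding a 256-tap sum needs 8 guard bits on top of the 32-bit product, which is one reading of the M>2N case.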
20MM1 DSP Algorithms and Architectures
ARM7
21MM1 DSP Algorithms and Architectures
Digital Signal Processors [1]
Harvard architecture
Individual data and instruction buses
Instruction and data are fetched simultaneously
Micro parallelism (architectural)
[Figure: control path with program memory and control unit; data path with data memory, ALU, and MUL]
Operation cycle:
C1: Instruction fetch
C1: Data 1
C2: Data 2
C3: execution || inst fetch
C4: Output data storage
Computational capacity
Bus capacity
(Two operands!)
22MM1 DSP Algorithms and Architectures
TMS32010
23MM1 DSP Algorithms and Architectures
Digital Signal Processors [2]
Modified Harvard architecture
Duplicated data buses
Multiple data memory banks
[Figure: program memory and control unit; two data memories (Data 1, Data 2) feeding the ALU and MUL]
Operation cycle:
C1: Instruction fetch
C1: Data 1 || Data 2
C2: execution || inst fetch
C3: Output data storage
Computational capacity
Bus capacity
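The operation cycles listed for the three architectures (five steps for Von Neumann, four for Harvard, three for modified Harvard) can be turned into a rough cycle budget. This is only a sketch: it assumes one two-operand operation per filter tap and no overlap between successive operations. The dictionary and function names are hypothetical.

```python
# Cycles per two-operand operation, read off the operation-cycle lists on the slides.
CYCLES_PER_OPERATION = {
    "von_neumann": 5,       # C1..C5 on the shared bus
    "harvard": 4,           # instruction fetch overlaps the first data fetch
    "modified_harvard": 3,  # both operands fetched in the same cycle
}


def cycles_for_fir(taps: int, architecture: str) -> int:
    """Rough cycle count for one FIR output sample, assuming no pipelining."""
    return taps * CYCLES_PER_OPERATION[architecture]


if __name__ == "__main__":
    for name in CYCLES_PER_OPERATION:
        print(name, cycles_for_fir(64, name))
```

Real DSPs overlap these steps further, so the figures are an upper bound that only illustrates how bus capacity limits throughput.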
24MM1 DSP Algorithms and Architectures
Blackfin architecture
25MM1 DSP Algorithms and Architectures
Digital Signal Processors [3]
Utilizing algorithmic and architectural properties
Using the address arithmetic unit, the core of the above algorithm becomes a single line of parallel instructions:
.
.
A0 += data1*data2 || A1 += data3*data4;
.
.
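The parallel instruction above updates two accumulators at once. A minimal behavioural sketch of that idea follows, assuming the algorithm is a dot product (as in an FIR tap sum) split over even and odd indices; the function name and the split are illustrative assumptions, not the slide's actual kernel.

```python
def dual_mac_dot(coeffs, samples):
    """Dot product computed with two accumulators, mimicking A0 += ... || A1 += ..."""
    if len(coeffs) != len(samples):
        raise ValueError("coeffs and samples must have the same length")
    a0 = 0
    a1 = 0
    for i in range(len(coeffs) // 2):
        # On a dual-MAC DSP these two updates would issue in the same cycle.
        a0 += coeffs[2 * i] * samples[2 * i]
        a1 += coeffs[2 * i + 1] * samples[2 * i + 1]
    if len(coeffs) % 2:
        # Odd length: one MAC is idle in the last step.
        a0 += coeffs[-1] * samples[-1]
    return a0 + a1


if __name__ == "__main__":
    print(dual_mac_dot([1, 2, 3, 4], [5, 6, 7, 8]))
```

The loop runs half as many iterations as a single-MAC version, at the cost of one final addition to merge the accumulators.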
26MM1 DSP Algorithms and Architectures
Dual-core DSP processors
27MM1 DSP Algorithms and Architectures
Digital Signal Processors [4]
Question: Is it always possible to utilize two (or more) MACs?
Condition: As long as the inherent algorithmic parallelism is not fully utilized, additional hardware may improve performance!
28MM1 DSP Algorithms and Architectures
ASIC and FPGAs [1]
ASIC
Customized for a particular use, in silicon
A specific combination of functional units, routed by buses.
FPGA
Customized for a particular use, using programmable logic components and programmable interconnecting buses
From an algorithmic point of view, the design methodologies are more or less the same for the two
29MM1 DSP Algorithms and Architectures
ASIC and FPGA [2]
Mapping of an algorithm onto a custom-designed HW architecture!
Example alg.
1:1 mapping (fully utilizing parallelism)
Cost: time T low, area A high
Multiplexed (HW-sharing)
Cost: time T high, area A low (+ control)
30MM1 DSP Algorithms and Architectures
Algorithmic parallelism [1]
[Figure: a single block H between X[n] and Y[n] with operation time T1, and the same transfer function split into a cascade Ha, Hb, Hc, Hd with T2 = Ta+Tb+Tc+Td]
Time of operation, throughput
The operation time of a given transfer function obviously depends on the algorithmic complexity, but also on the implementation technology used.
31MM1 DSP Algorithms and Architectures
Algorithmic parallelism [2]
[Figure: cascade Ha, Hb, Hc, Hd from X[n] to Y[n-3], with latches between the stages]
Factorization
The latency is increased
Can be parallelized
[Figure: parallel branches Ha, Hb, Hc, Hd from X[n], summed into Y[n]]
Partial Fraction Expansion
The latency is not increased
Can be parallelized
Algorithmic manipulation is a very important tool when optimizing architecture designs
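To make the factorization versus partial-fraction point concrete, the sketch below builds the same two-pole transfer function H(z) = 1 / ((1 - p1 z^-1)(1 - p2 z^-1)) both as a cascade of first-order sections and as a parallel sum, and the two give the same response. The pole values are hypothetical, and the expansion used assumes distinct poles.

```python
def first_order(pole, gain, x):
    """One-pole section: y[n] = gain * x[n] + pole * y[n-1]."""
    y = []
    prev = 0.0
    for sample in x:
        prev = gain * sample + pole * prev
        y.append(prev)
    return y


def cascade(p1, p2, x):
    """Factorized form: the second section must wait for the first."""
    return first_order(p2, 1.0, first_order(p1, 1.0, x))


def parallel(p1, p2, x):
    """Partial fraction expansion: both sections work on x[n] independently."""
    g1 = p1 / (p1 - p2)   # residue at pole p1 (requires p1 != p2)
    g2 = -p2 / (p1 - p2)  # residue at pole p2
    branch1 = first_order(p1, g1, x)
    branch2 = first_order(p2, g2, x)
    return [u + v for u, v in zip(branch1, branch2)]


if __name__ == "__main__":
    impulse = [1.0] + [0.0] * 7
    print(cascade(0.5, 0.25, impulse))
    print(parallel(0.5, 0.25, impulse))
```

The two forms compute the same signal but expose different structure to the hardware: the cascade is a chain that can be pipelined with latches, while the parallel branches have no mutual data dependency.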
32MM1 DSP Algorithms and Architectures
Representation methods of alg. [1]
Block diagram
Consists of functional blocks connected with directed edges, each representing data flow from its input block to its output block
33MM1 DSP Algorithms and Architectures
Representation methods of alg. [2]
Signal-Flow Graph
Nodes: represent computations or tasks; a node sums all incoming signals
Edges: denote a linear transformation from the input to the output
34MM1 DSP Algorithms and Architectures
Graphical Representations
Data Flow Graphs (DFG)
Control Flow Graphs (CFG)
Control Data Flow Graphs (CDFG)
State Transition Graphs (STG)
nodes (or vertices)
edges (or arcs)
35MM1 DSP Algorithms and Architectures
Data Flow Graph
Nodes: represent computations (or functions)
Edges: represent data paths (or communications)
Models data dependencies: a node can perform its operation whenever its data is present
The data flow forms a directed acyclic graph (DAG):
x1=a+b
y=a*c
z=x1+d
x2=y-d
x3=x2+c
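The five assignments above can be written down as a data flow graph and evaluated in any order that respects the data dependencies. The sketch below is one possible encoding, using the standard-library `graphlib` module (Python 3.9+); the dictionary layout and the function names are assumptions for illustration.

```python
import operator
from graphlib import TopologicalSorter

# node -> (operation, left operand, right operand), taken from the slide.
DFG = {
    "x1": (operator.add, "a", "b"),
    "y": (operator.mul, "a", "c"),
    "z": (operator.add, "x1", "d"),
    "x2": (operator.sub, "y", "d"),
    "x3": (operator.add, "x2", "c"),
}


def dependencies(dfg):
    """Edges of the DAG: which computed nodes each node has to wait for."""
    return {
        node: {src for src in (lhs, rhs) if src in dfg}
        for node, (_, lhs, rhs) in dfg.items()
    }


def evaluate(dfg, inputs):
    """Fire each node once all of its operands are available."""
    values = dict(inputs)
    for node in TopologicalSorter(dependencies(dfg)).static_order():
        op, lhs, rhs = dfg[node]
        values[node] = op(values[lhs], values[rhs])
    return values


def asap_levels(dfg):
    """Earliest step each node can run in, assuming unlimited hardware."""
    deps = dependencies(dfg)
    levels = {}
    for node in TopologicalSorter(deps).static_order():
        levels[node] = 1 + max((levels[d] for d in deps[node]), default=0)
    return levels


if __name__ == "__main__":
    print(evaluate(DFG, {"a": 1, "b": 2, "c": 3, "d": 4}))
    print(asap_levels(DFG))
```

Nodes that share a level (here `x1` and `y`) have no dependency on each other, which is the parallelism a 1:1 hardware mapping can exploit.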
36MM1 DSP Algorithms and Architectures
CDFG, DFG, CFG
37MM1 DSP Algorithms and Architectures
Cost Functions
38MM1 DSP Algorithms and Architectures
Cost Functions
Implementation quality is determined by cost functions: noise, power, area, time
a_i depends on the importance of the associated cost parameter
Noise: wordlength
Power: technology
Area: circuit
Time: the three above
Interaction
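The cost expression itself is missing from the extracted slide. A common reading of "a_i depends on the importance of the associated cost parameter" is a weighted sum of the individual costs; the sketch below assumes exactly that form, and every number and name in it is hypothetical.

```python
def weighted_cost(costs, weights):
    """Assumed form: total = sum_i a_i * cost_i, with a_i taken from `weights`."""
    return sum(weights[name] * costs[name] for name in costs)


if __name__ == "__main__":
    # Hypothetical, normalized figures for two candidate implementations.
    weights = {"noise": 1.0, "power": 3.0, "area": 1.0, "time": 2.0}
    fully_parallel = {"noise": 0.2, "power": 0.8, "area": 0.9, "time": 0.1}
    hw_shared = {"noise": 0.2, "power": 0.5, "area": 0.3, "time": 0.6}
    print("parallel:", weighted_cost(fully_parallel, weights))
    print("shared:  ", weighted_cost(hw_shared, weights))
```

Changing the weights changes which candidate wins, which is the sense in which the a_i express the importance of each parameter.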
39MM1 DSP Algorithms and Architectures
Minimizing the cost function
Choice of alg. / alg. Manipulation / wordlength
Extraction and utilization of inherent parallelism
Number and types of execution units
Scheduling
Application
Algorithm
Architecture
40MM1 DSP Algorithms and Architectures
Sources of Power Consumption
Short circuit:
Leakage:
Dynamic:
[Figures: CMOS inverter circuits and waveforms (Vin, Vout, Vdd, i(t), v(t), Ioff, Ids versus Vgs with threshold Vth) illustrating the three contributions]
41MM1 DSP Algorithms and Architectures
Controlling Energy Consumption
Largest contributing component to CMOS power consumption is switching power:
$P_{avg} = n_{avg}\, c_{avg}\, f\, V_{dd}^{2}$
What control do you have over each factor?
How does each affect the total energy? (Think about f.)
What control do you have as a designer?
Circuit delay:
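The "think about f" hint can be checked numerically. The sketch below assumes the switching-power expression P_avg = n_avg · c_avg · f · V_dd² and a task that needs a fixed number of clock cycles; the parameter values are hypothetical.

```python
def switching_power(n_avg, c_avg, f, v_dd):
    """Average switching power in watts: n_avg * c_avg * f * V_dd^2."""
    return n_avg * c_avg * f * v_dd ** 2


def task_energy(cycles, n_avg, c_avg, f, v_dd):
    """Energy in joules for a task taking `cycles` clock cycles at frequency f."""
    runtime = cycles / f
    return switching_power(n_avg, c_avg, f, v_dd) * runtime


if __name__ == "__main__":
    design = dict(n_avg=1e5, c_avg=1e-14, v_dd=1.2)  # hypothetical values
    for f in (50e6, 100e6):
        print(f, switching_power(f=f, **design), task_energy(1e6, f=f, **design))
```

Halving f halves the power but doubles the run time, so the energy for the task is unchanged. Under this model only the switched capacitance, the switching activity, or V_dd (quadratically) reduce the energy.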
42MM1 DSP Algorithms and Architectures
Energy and Power
Warning! In everyday language, the term power is used incorrectly in place of energy.
Power is not energy.
Power is not something you can run out of.
Power cannot be lost or used up.
Power is not a thing; it is merely a rate.
Power cannot be put into a battery any more than velocity can be put in the gas tank of a car.
43MM1 DSP Algorithms and Architectures
Design Representation and
Abstraction levels
44MM1 DSP Algorithms and Architectures
Design Representation
Behavioral or functional representation
Specifies the behavior or the functions of a design without any implementation information
Structural representation
Specifies the implementation of a design in terms of components and their interactions
Physical representation
Specifies the physical characteristics of the design (blueprint for manufacturing)
45MM1 DSP Algorithms and Architectures
Digital System Design
[Figure: design flow, with the representation associated with each step]
IDEA: plain English
Behavioral Design: algorithm
Structural Design: state machine, ALU, regs
Logic Design: gate-level netlist
Physical Design: transistor list
Fabrication
Product
46MM1 DSP Algorithms and Architectures
Levels of Design Abstractions
47MM1 DSP Algorithms and Architectures
Implementation Technologies
48MM1 DSP Algorithms and Architectures
HW Design Abstraction
Levels of design abstraction, from lowest to highest:
Polygons of Silicon
Transistors
Logic Gates
RT Level
Processor-Memory Level
49MM1 DSP Algorithms and Architectures
Representation and Abstraction
Y-Chart (each axis listed from highest to lowest abstraction level)
Functional: Algorithm, RT Language, Boolean Eqn, Differential Eqn
Structural: Proc. Mem. Switch, RT, Gate, Transistor
Geometric: Floorplan, Standard Cells, Sticks, Polygons
50MM1 DSP Algorithms and Architectures
Heterogeneous HW/SW Implementations
[Figure: cost versus performance]
Only SW: low cost and low performance.
Only HW: high cost and high performance.
Mixed HW-SW: medium cost and performance.
Additionally, flexibility and tight time-to-market requirements favour SW implementations.
51MM1 DSP Algorithms and Architectures
System-level HW-SW Co-design
IDEA
System-level HW-SW Co-design
Memory hierarchy and mapping
SW behavior, RTOS, schedule policy and processors
Interconnect and buses
Constraints, Specification
HW behavior and components
Components (HW, SW)
52MM1 DSP Algorithms and Architectures
Issues in System-level HW-SW Co-design
Specification of functionality and constraints.
Simulation of functionality.
Components as building blocks
SW processors: DSP and micro-controllers
HW co-processors: ASICs, FPGA
Storage elements: cache, scratchpad, SRAM, DRAM
Interconnection elements: buses and arbiters
Interface and I/O units: DMA, UART, D/A, A/D, ...
Wireless communication
Software platform: RTOS and scheduling
53MM1 DSP Algorithms and Architectures
Issues in System-level HW-SW Co-design
Performance analysis (timing, power, area)
Design and optimization (timing, power, area)
Architecture selection: processing elements, memory units and interconnect.
RTOS and schedule scheme.
54MM1 DSP Algorithms and Architectures
Design flow and Abstraction levels
55MM1 DSP Algorithms and Architectures
The A3 model and design flows
Application: LP-filter (specification)
Algorithm: FIR, IIR (parallel), IIR (cascade)
Architecture: DSP, µ-Controller, ASIC
Design flows
56MM1 DSP Algorithms and Architectures
Summary
Algorithms and Architectures
Data path
Control path
Algorithmic properties
Cost functions
Design flows and representations
Design representations
Design abstractions
Following courses
Architectural optimization (mm2-mm3)
Scheduling concepts (mm4-mm5)
57MM1 DSP Algorithms and Architectures
Exercises
Gajski: 1.1, 1.4, and 1.8
Cost functions: Discuss power vs. energy optimization
Why is there a difference?
How can you optimize energy, taking only the dynamic contribution into account?
Taking as a starting point the paper by C.H. Wang, "Algorithmic Implementation of Low-Power High Performance FIR Filtering IP Cores" (hint: only sections 1 and 2). For these exercises you should prepare a few notes so that you can present your findings next Thursday (no more than 5 minutes).
Gr840:
Find the various representation forms of the FIR filter used, write them in mathematical form, and make a block-diagram representation
Gr841:
Discuss or verify that the data path in figure 2 is reasonable and try to map the algorithms onto it.
Gr842:
Make a 1:1 mapping and propose an architecture for a four-tap FIR filter