Lecture 1 TNE027 FPGA Arithmetic

TNE064 Digital Communication Electronics (Digital Kommunikationselektronik), Lecture 1

Qin-Zhong Ye ITN

Linköping University

email: [email protected]

web: www.itn.liu.se/~qinye/dce


Textbook

U. Meyer-Baese

Digital Signal Processing with Field Programmable Gate Arrays, Second Edition or Third Edition, Springer


Digital Signal Processing and Digital Communication Systems

• Introduction (Chapter 1)
• Computer Arithmetic (Chapter 2)
• Finite Impulse Response (FIR) Digital Filter (Chapter 3)
• Fourier Transforms (Chapter 6)
• Error Control and Cryptography (Chapter 7.2)
• WLAN and Bluetooth


Introduction

• Overview of Digital Signal Processing (DSP)
• FPGA Technology
• DSP Technology Requirements
• Design Implementation
• VHDL



Typical DSP Application



Classification of VLSI Circuits


Custom Chips, Standard Cells, and Gate Arrays

• Custom Chips
  – Largest number of logic gates
  – Highest speed
  – Designer may create any layout.
  – Large design effort
  – Long development time
  – Large production quantity is required.


• Standard Cells
  – Often called Application-Specific Integrated Circuits (ASICs)
  – The layout of individual gates (standard cells) is predesigned and stored in a library.
  – The chip layout can be created automatically by CAD tools because of the regular arrangement of logic gates (cells) in rows.


A section of two rows in a standard-cell chip

(Figure: inputs x1, x2, x3; outputs f1, f2.)


• Gate Arrays
  – Transistor layers on the silicon wafer are first fabricated to produce a gate-array template.
  – Connecting wires are then fabricated on the template to produce a user's circuit.
  – The technology is also known as sea-of-gates technology.



A sea-of-gates gate array


An example of a logic function in a gate array

(Figure: inputs x1, x2, x3; output f1.)


General structure of a PLA

(Figure: inputs x1 … xn pass through input buffers and inverters into the AND plane, which forms product terms P1 … Pk; the OR plane produces outputs f1 … fm.)

• Programmable Logic Array (PLA)
  – A collection of AND gates that feeds a set of OR gates
  – The inputs to each gate are programmable.
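The AND-plane/OR-plane idea can be sketched in a few lines of Python. This is only an illustrative model: the function name `pla_eval` and the example programming (which literals feed which product term, and which product terms feed which output) are hypothetical and are not taken from the lecture figures.

```python
from itertools import product

def pla_eval(inputs, and_plane, or_plane):
    """Evaluate a simple PLA model.

    inputs    : tuple of 0/1 input values (x1 ... xn)
    and_plane : one dict per product term, mapping input index -> required value
                (1 = true literal, 0 = complemented literal); inputs that are
                not listed are not connected to that AND gate
    or_plane  : one list per output, holding the indices of the product terms
                connected to that OR gate
    """
    terms = [int(all(inputs[i] == v for i, v in term.items())) for term in and_plane]
    return tuple(int(any(terms[p] for p in outs)) for outs in or_plane)

# Hypothetical programming: P1 = x1 x2, P2 = x1 x3', P3 = x1' x2' x3
AND_PLANE = [{0: 1, 1: 1}, {0: 1, 2: 0}, {0: 0, 1: 0, 2: 1}]
# Hypothetical programming: f1 = P1 + P2, f2 = P1 + P3
OR_PLANE = [[0, 1], [0, 2]]

# Print the full truth table (x1, x2, x3) -> (f1, f2)
for x in product((0, 1), repeat=3):
    print(x, pla_eval(x, AND_PLANE, OR_PLANE))
```

In this model, a PAL would correspond to keeping `OR_PLANE` fixed and changing only `AND_PLANE`.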


Gate-level diagram of a PLA

(Figure: inputs x1, x2, x3; programmable connections in the AND plane form product terms P1 to P4; the OR plane produces outputs f1 and f2.)


Customary schematic of a PLA

(Figure: inputs x1, x2, x3; AND plane with product terms P1 to P4; OR plane with outputs f1 and f2.)


An example of a PAL

(Figure: inputs x1, x2, x3; AND plane with product terms P1 to P4; outputs f1 and f2.)

• Programmable Array Logic (PAL)
  – The AND gates are programmable, but the OR gates are fixed.


Output circuitry

(Figure: a macrocell with a D flip-flop; labels: Clock, Select, Enable, output f1, and a connection back to the AND plane.)


• Complex Programmable Logic Devices (CPLD)
  – Multiple blocks of sum-of-product logic circuits (PAL-like blocks)
  – Internal wiring resources (interconnection wires) to connect the circuit blocks
  – I/O blocks
  – In-System Programming (ISP) with JTAG port
  – Nonvolatile programming


Structure of a CPLD

(Figure: four PAL-like blocks, each with its own I/O block, connected by interconnection wires.)


A section of a CPLD

(Figure: two PAL-like blocks, one drawn with three D flip-flops and one with details not shown.)


• Field-Programmable Gate Arrays (FPGA)
  – An array of logic blocks
  – Each logic block typically has a small number of inputs and one output.
  – FPGA products have different types of logic blocks.
  – Interconnection wires and switches (routing channels)
  – I/O blocks
  – In-System Programming (ISP) with JTAG port
  – Storage cells are volatile.


Structure of an FPGA

(Figure: an array of logic blocks and interconnection switches, surrounded by four I/O blocks.)


A two-input lookup table

(a) Circuit for a two-input LUT: inputs x1 and x2 select one of four storage cells (each holding 0 or 1) as the output f.

(b) $f_1 = \bar{x}_1 \bar{x}_2 + x_1 x_2$

    x1  x2  |  f1
    0   0   |  1
    0   1   |  0
    1   0   |  0
    1   1   |  1

(c) Storage cell contents in the LUT: 1, 0, 0, 1 (inputs x1, x2; output f1)

LUTs usually have 4 to 6 inputs (16 to 64 storage cells).
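A LUT is simply a small memory addressed by the input bits. The following Python sketch models that idea; the helper name `make_lut` and the choice of x1 as the most significant address bit are assumptions made for illustration. The storage contents 1, 0, 0, 1 are those of the two-input example.

```python
def make_lut(cells):
    """Return a function that models a LUT; cells[k] is the output for address k."""
    n = len(cells).bit_length() - 1
    assert len(cells) == 1 << n, "a LUT with n inputs needs 2**n storage cells"

    def lut(*x):
        assert len(x) == n
        address = 0
        for bit in x:  # assumption: x1 is the most significant address bit
            address = (address << 1) | bit
        return cells[address]

    return lut

# Storage cell contents 1, 0, 0, 1 for (x1, x2) = 00, 01, 10, 11
f1 = make_lut([1, 0, 0, 1])

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, f1(x1, x2))
```

Changing the list passed to `make_lut` reprograms the function without changing the circuit, which is the property an FPGA logic block relies on.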


Inclusion of a flip-flop with a LUT

(Figure: a LUT with inputs In1, In2, In3 followed by a D flip-flop; labels: Clock, Select, Out.)


A section of a programmed FPGA

(Figure: three two-input LUTs with storage contents 0100, 0111, and 0001. Inputs x1, x2 produce f1; inputs x2, x3 produce f2; f1 and f2 produce the output f.)


FPGA Structure

• Small look-up tables (LUT)
  – Xilinx XC4000: Each Configurable Logic Block (CLB) has 2 separate 4-input 1-output LUTs. Each CLB can be used as 16x2- or 32x1-bit RAM or ROM.
  – Altera Flex 10K: Each Logic Element (LE) consists of a flip-flop, a 4-input 1-output LUT or 3-input 1-output LUT, and fast-carry logic.
• Large RAM blocks: Embedded Array Blocks (EABs), e.g., 2-kbit RAM



FPL technology


Advantages of FPLD compared with ASIC

• A reduction in development time (rapid prototyping) by a factor of 3 to 4
• In-circuit reprogrammability
• Lower NRE (non-recurring engineering) costs, resulting in more economical designs for solutions requiring fewer than 1000 units


Comparison of PDSP and FPGA

• Programmable Digital Signal Processors (PDSPs)
  – RISC architecture
  – Multiply and accumulate (MAC) unit with a multistage pipeline architecture
  – Suitable for algorithms using MAC
• FPGA
  – Suitable for high-throughput applications
  – Suitable for front-end applications (e.g., FIR filters, CORDIC algorithms, FFTs)
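To make the MAC operation concrete, here is a minimal Python sketch of one FIR output sample computed as a sequence of multiply-accumulate steps. The function name `fir_mac` and the sample values are hypothetical; a PDSP would run this loop on its MAC unit, whereas an FPGA could compute the products in parallel.

```python
def fir_mac(coeffs, samples):
    """Compute one FIR output sample as a sum of products with a single accumulator.

    coeffs  : filter coefficients c[0] ... c[N-1]
    samples : input samples x[n], x[n-1], ..., x[n-N+1] (most recent first)
    """
    acc = 0
    for c, x in zip(coeffs, samples):
        acc += c * x  # one multiply-accumulate (MAC) step
    return acc

print(fir_mac([1, 2, 3], [4, 5, 6]))  # 1*4 + 2*5 + 3*6 = 32
```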


Computer Arithmetic

• Number Representation: see Fig. 2.1.
• Fixed-point numbers
  – Unsigned integer
  – Signed magnitude (SM)
  – Two's complement (2C)
  – One's complement (1C)
  – Diminished one system (D1)
  – Bias system
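As a small illustration of how these fixed-point formats differ, the Python sketch below encodes the same integer in four of them. The function name `encode`, the 4-bit word length, and the bias value 2^(bits-1) - 1 are assumptions chosen for the example; the diminished-one system is left out.

```python
def encode(value, bits=4):
    """Return bit strings for `value` in several fixed-point integer formats."""
    limit = (1 << (bits - 1)) - 1
    assert -limit <= value <= limit, "value out of range for this sketch"
    mag = abs(value)
    mask = (1 << bits) - 1
    sign = 1 if value < 0 else 0
    fmt = "0{}b".format(bits)
    return {
        # sign bit followed by the magnitude
        "SM": format((sign << (bits - 1)) | mag, fmt),
        # negative values wrap modulo 2**bits
        "2C": format(value & mask, fmt),
        # negative values are the bitwise complement of the magnitude
        "1C": format((mag ^ mask) if value < 0 else mag, fmt),
        # assumption: bias = 2**(bits-1) - 1 is added before storing
        "bias": format(value + limit, fmt),
    }

print(encode(-3))  # {'SM': '1011', '2C': '1101', '1C': '1100', 'bias': '0100'}
print(encode(3))   # {'SM': '0011', '2C': '0011', '1C': '0011', 'bias': '1010'}
```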


• Unconventional fixed-point numbers
  – Signed digit numbers (SD)
    • SD is not unique.
    • Canonic signed digit system (CSD)
      – With minimum number of nonzero elements
    • Classical CSD coding algorithm: starting with the LSB, substitute every sequence of two or more 1s with $10\ldots0\bar{1}$, where $\bar{1}$ denotes the digit $-1$.
    • Classical CSD has at least one zero between two digits which may have values $1$ or $\bar{1}$.
  – Carry-free Addition
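A CSD encoding can be sketched in Python as follows. This is a hedged sketch, not the lecture's own procedure: it uses the non-adjacent-form recurrence (inspect the two lowest bits, emit +1, 0, or -1, then shift), which for the cases checked here gives the same digits as substituting runs of 1s from the LSB. The function names are hypothetical.

```python
def to_csd(n):
    """Return signed digits (MSB first, each in {-1, 0, 1}) for a positive integer."""
    assert n > 0
    digits = []
    while n:
        if n & 1:
            d = 2 - (n & 3)  # +1 if the two low bits are 01, -1 if they are 11
            n -= d           # subtracting -1 creates the carry into the run of 1s
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits[::-1]

def from_digits(digits):
    """Evaluate a signed-digit string (MSB first) back to an integer."""
    value = 0
    for d in digits:
        value = 2 * value + d
    return value

print(to_csd(7))   # [1, 0, 0, -1]         i.e. 8 - 1
print(to_csd(27))  # [1, 0, 0, -1, 0, -1]  i.e. 32 - 4 - 1
```

The binary form of 27 (11011) has four nonzero digits, while the signed-digit form has three, which is why CSD is attractive for constant multipliers.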


Multiplication with a constant coefficient

  – Multiplier Adder Graph (MAG)
    • Factor the coefficient into several factors and realize the individual factors in an optimal CSD sense.
      One adder: $A = 2^{k_0}(2^{k_1} \pm 2^{k_2})$
      Two adders: $A = 2^{k_0}(2^{k_1} \pm 2^{k_2} \pm 2^{k_3})$
                  $A = 2^{k_0}(2^{k_1} \pm 2^{k_2})(2^{k_3} \pm 2^{k_4})$
      Three adders: $A = 2^{k_0}(2^{k_1} \pm 2^{k_2} \pm 2^{k_3} \pm 2^{k_4})$
      ⋮
    See Fig. 2.2 and Fig. 2.3.
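The benefit of factoring can be seen with a small worked example in Python. The coefficient 93 is an illustrative choice, and the function names are hypothetical. A direct CSD realization, 93 = 128 - 32 - 4 + 1, has four nonzero digits and therefore needs three adders, whereas the factored form 93 = 3 · 31 = (2 + 1)(32 - 1) needs only two.

```python
def mul93_csd(x):
    """93*x from the CSD digits 128 - 32 - 4 + 1: three adders/subtractors."""
    return (x << 7) - (x << 5) - (x << 2) + x

def mul93_mag(x):
    """93*x from the factorization (32 - 1) * (2 + 1): two adders/subtractors."""
    t = (x << 5) - x     # first adder: 31*x
    return (t << 1) + t  # second adder: 2*(31*x) + 31*x = 93*x

for x in (1, 5, -7):
    print(x, mul93_csd(x), mul93_mag(x), 93 * x)
```

Shifts are free in hardware (they are only wiring), so the adder count is the cost that the factorization reduces.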


• Logarithmic Number System (LNS)
  – Fixed mantissa (system's radix)
  – Fractional exponent: $x = \pm r^{\pm e_x}$
  – Efficient implementation of multiplication, division, square-rooting, or squaring.
  – Addition and subtraction require look-up tables.
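The following Python sketch illustrates why multiplication is cheap and addition is costly in an LNS. It is a floating-point model only: the radix r = 2, the (sign, exponent) tuple layout, and all function names are assumptions, and the correction term in `lns_add_pos` stands in for the value that hardware would read from a look-up table.

```python
import math

def to_lns(x, r=2.0):
    """Encode a nonzero real number as (sign, exponent) with x = sign * r**exponent."""
    assert x != 0
    return (1 if x > 0 else -1, math.log(abs(x), r))

def from_lns(v, r=2.0):
    sign, e = v
    return sign * r ** e

def lns_mul(a, b):
    return (a[0] * b[0], a[1] + b[1])  # multiplication: add exponents

def lns_div(a, b):
    return (a[0] * b[0], a[1] - b[1])  # division: subtract exponents

def lns_sqrt(a):
    assert a[0] > 0
    return (1, a[1] / 2)               # square root: halve the exponent

def lns_add_pos(a, b, r=2.0):
    """Add two positive LNS numbers; the correction term models the table look-up."""
    assert a[0] > 0 and b[0] > 0
    hi, lo = max(a[1], b[1]), min(a[1], b[1])
    correction = math.log(1 + r ** (lo - hi), r)
    return (1, hi + correction)

print(from_lns(lns_mul(to_lns(6.0), to_lns(-7.0))))     # approximately -42
print(from_lns(lns_add_pos(to_lns(3.0), to_lns(5.0))))  # approximately 8
```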


• Residue Number System (RNS)
  – RNS is defined with respect to a positive integer basis set $\{m_1, m_2, \ldots, m_L\}$, where the $m_l$ are all relatively (pairwise) prime.
  – An integer $X$ is mapped into an RNS $L$-tuple $X \leftrightarrow (x_1, x_2, \ldots, x_L)$, where $x_l = X \bmod m_l$, for $l = 1, 2, \ldots, L$.
  – For $X = (x_1, x_2, \ldots, x_L)$ and $Y = (y_1, y_2, \ldots, y_L)$, the algebraic operations $+$, $-$, or $*$ are defined by $z_l = (x_l \,\square\, y_l) \bmod m_l$, for $l = 1, 2, \ldots, L$, where $\square$ stands for the chosen operation, and the result is $Z = (z_1, z_2, \ldots, z_L)$.
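A short Python sketch shows the digit-wise nature of RNS arithmetic. The basis {3, 5, 7} and the function names are hypothetical choices for illustration. The conversion back to an ordinary integer uses the Chinese remainder theorem, which the slide does not cover and is included here only to check the results. It requires Python 3.8 or later for `math.prod` and `pow(a, -1, m)`.

```python
import operator
from math import prod

MODULI = (3, 5, 7)  # hypothetical pairwise-prime basis; dynamic range 3*5*7 = 105

def to_rns(x, moduli=MODULI):
    """Map an integer to its tuple of residues."""
    return tuple(x % m for m in moduli)

def rns_op(x, y, op, moduli=MODULI):
    """Apply +, - or * independently in every residue channel (no carries between channels)."""
    return tuple(op(a, b) % m for a, b, m in zip(x, y, moduli))

def from_rns(residues, moduli=MODULI):
    """Reconstruct the integer in [0, M) using the Chinese remainder theorem."""
    M = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        total += r * Mi * pow(Mi, -1, m)  # pow(Mi, -1, m) is the inverse of Mi modulo m
    return total % M

x, y = to_rns(17), to_rns(5)
print(x, y)                                  # (2, 2, 3) (2, 0, 5)
print(from_rns(rns_op(x, y, operator.mul)))  # 85
print(from_rns(rns_op(x, y, operator.add)))  # 22
```

Results are only meaningful while they stay inside the dynamic range, here 0 to 104.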
