43
Digital Circuit Lab Introduction to the FPGAs Chun-Jen Tsai and Lan-Da Van Department of Computer Science National Chiao Tung University Taiwan, R.O.C. Fall, 2019

Introduction to the FPGAs

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Introduction to the FPGAs

Digital Circuit Lab

Introduction to the FPGAs

Chun-Jen Tsai and Lan-Da Van

Department of Computer Science

National Chiao Tung University

Taiwan, R.O.C.

Fall, 2019

Page 2: Introduction to the FPGAs

Mat 1

Digital Circuit Lab

History of the FPGAs

Before the invention of Field Programmable Gate

Array (FPGA), there are many different

programmable IC such as PLA, PAL, CPLD, etc.

These programmable IC’s have very limited logic capacity

CPLD IC’s are still popular in industry because their “digital

logic programs” do not disappear when the power is off

R. H. Freeman invented first practical FPGA in 1985†

Freeman and Vonderschmitt co-funded Xilinx in 1984

Xilinx is the first fabless IC design house in the world

TSMC enters the IC OEM business for fabless IC design

houses in 1987

2

† R. H. Freeman, “Configurable Electrical Circuit Having Configurable Logic Elements and Configurable Interconnect,”

U.S. Patent 4,870,302, Sep. 26, 1989

Page 3: Introduction to the FPGAs

Mat 1

Digital Circuit Lab

Combinational PLDs

Programmable two-level logic an AND array and an OR array

Page 4: Introduction to the FPGAs

Mat 1

Digital Circuit Lab

programmable interconnections close (two lines are connected)

or open

A fuse that can be blown by applying a high voltage pulse

Diagram of 32x8 ROM

Page 5: Introduction to the FPGAs

Mat 1

Digital Circuit Lab

An example

F1 = AB + AC + ABC

F2 = (AC + BC)

XOR gates

can invert the outputs

PLA with three inputs, four product terms, and two outputs

PLA Example

Page 6: Introduction to the FPGAs

Mat 1

Digital Circuit Lab

Lan-Da Van DCD-07-6

w = ABC + ABCD

x = A + BCD

y = AB + CD + BD

z = w + ACD + ABCD

Page 7: Introduction to the FPGAs

Mat 1

Digital Circuit Lab

Programming Concept of the FPGA

Theoretically speaking, creating a digital logic on an

FPGA strictly follows the “stored program concept” of

software programming

Thus, FPGA coding should be regarded as “software coding”

Stored program concept:

The computing behavior of a piece of hardware is

controlled by information stored in memory devices.

7

CPU(hardwired logic)

RAM

instructions

data

bus

Page 8: Introduction to the FPGAs

Mat 1

Digital Circuit Lab

Digital Circuits in an FPGA

The basic ideas of FPGA’s is to inter-connect small

“truth tables” to create complex digital circuits

In an FPGA, these tables are called “Lookup Table” (LUT)

8

Table 5

Table 1

Table 4

1

0

1

outputinput

0000

0001

1111

0

1

1

outputinput

0000

0001

1111

0

0

1

outputinput

0000

0001

1111

Each table is equivalent to an-to-1 gate, where n = 4 or 6for Xilinx FPGA’s

Page 9: Introduction to the FPGAs

Mat 1

Digital Circuit Lab

Programmability of the FPGA

FPGA still adopts the “stored program concept”

Both the n-to-1 gates and the wires can be programmed

9

logic cell logic cell

logic cell logic cell

FPGA programs are stored in distributed memory bits

Routing grid

n

n

1

1 1

1n

n

clock

Page 10: Introduction to the FPGAs

Mat 1

Digital Circuit Lab

A Generic Logic Cell

Basic logic cell (LC) structure:

10

inputs output

0 0 0 0 1

0 0 0 1 0

. . .

1 1 1 1 1

LC LC LC

LC LC LC

LC LC LC

IOB

IOB

IOB

IOB

IOB

IOB

Interconnects

Interconnects

Logic Cell

Lookup Table

(LUT) D Q

select bit

LC out

clock

Page 11: Introduction to the FPGAs

Mat 1

Digital Circuit Lab

Features of SRAM-based LUT

SRAM is often used to implement LUT, the properties

of this type of LUT is as follows:

n-input LUT can handle function of 2n different inputs

All logic functions take the same amount of area

All functions have the same delay

For CMOS custom logics, XOR is much slower than NAND

Burns power even at idle

An LUT is more “powerful” than a two-input gate

Gate-count is not a good measure of FPGA logic cost

For static gate, n input NAND/NOR gate has ~ 2n transistors

For an FPGA LC, 4-input LUT has 128 transistors in SRAM,

96 in multiplexer

11

Page 12: Introduction to the FPGAs

Mat 1

Digital Circuit Lab

Function Implementation with LUTs

The datapath that implements F = ABC + ABC + ABis as follows, the LUT4 has entries as follows:

A function with more than 4 variables can always be decomposed to the sum (OR) of 4-variable function

12

X1 X2 X3 X4 F

0000

0001

0

0

0010

0011

1

1

... ...

1111 1

LUT4 table entries(red means don’t care)

Page 13: Introduction to the FPGAs

Mat 1

Digital Circuit Lab

Carry Chains in FPGA

Since additions are very important, many FPGAs

have dedicated circuits for carry calculation and

propagation

Both carry look-ahead and carry-ripple adders can be

implemented efficiently on FPGAs

13

Page 14: Introduction to the FPGAs

Mat 1

Digital Circuit Lab

Review: Look-ahead Carry Generator

A look-ahead carry adder is more efficient than a

carry-ripple adder:

14

Page 15: Introduction to the FPGAs

Mat 1

Digital Circuit Lab

Review: Look-ahead Carry Generator

CLA GEN

Source: M. Morris Mano and Charles R. Kime, Logic and Computer Design Fundamentals.

Page 16: Introduction to the FPGAs

Mat 1

Digital Circuit Lab

Programmable Interconnect Points

The routing wires in an FPGA is programmed using a

single-bit SRAM cell connected to a transistor:

16

gate

drainsource

Connect/disconnect a single wire: Routing a signal:

Page 17: Introduction to the FPGAs

Mat 1

Digital Circuit Lab

Interconnect Architecture

On an FPGA, we must be able to control

Connections from wiring channels to LCs

Connections between wires in the wiring channels

17

LC LCHoriz. channel Horiz. channel

Ve

rt. ch

an

ne

lV

ert

. ch

an

ne

l

Page 18: Introduction to the FPGAs

Mat 1

Digital Circuit Lab

Place and Route Problem

Wiring among LCs are organized into channels

Channels are arranged horizontally and vertically on the chip

There are many wires per channel

Connections between wires made at programmable

interconnection points

An EDA tool must choose:

Channels from source to

destination

Wires within the channels

18

LC LC LC

LC LC LC

LC LC LC

LC

LC

LC

Page 19: Introduction to the FPGAs

Mat 1

Digital Circuit Lab

Xilinx FPGA Family

The Xilinx FPGA generations are coupled with the

TSMC manufacturing process

Usually two lines of product before the TSMC 28nm process

Old products before the 28 nm TSMC process

Virtex-II / Spartan-2 (130 nm)

Virtex-4 / Spartan-3 (90nm)

Virtex-5 (65 nm)

Virtex-6 (40 nm)/ Spartan-6 (45 nm)

New products after the 28 nm TSMC process

Virtex-7 / Kintex-7 / Artix-7 / Zynq-7 (28 nm)

Virtex UltraScale / Kintex UltraScale (20nm)

Virtex UltraScale+ / Zynq UltraScale+ (16 nm FinFET+)

19

4-input LUT

6-input LUT

6-input LUT

Page 20: Introduction to the FPGAs

Mat 1

Digital Circuit Lab

Xilinx FPGA Architecture

Xilinx FPGAs are composed of CLBs, BRAMs,

Multipliers (in DSP slices), CMTs, and IOBs

20

XC7A35T: 2,600 CLBs 1800 Kb BRAMs 90 25x18-bit multipliers 5 CMTs

Page 21: Introduction to the FPGAs

Mat 1

Digital Circuit Lab

CLB Architecture of Artix-7

The CLB architecture of an Artix-7 FPGA:

The main resources for logic synthesis

Each CLB contains two slices, each slice four logic cells

Each CLB is connected to a switch matrix

Carry chain runs vertically in a column from one slice to the

one above

21

CLB CLB

Sw

itch

Ma

trix

Slic

e 0

Slic

e 1

Sw

itch

Ma

trix

Slic

e 0

Slic

e 1

Clocks

Data Data cin

cout

Page 22: Introduction to the FPGAs

Mat 1

Digital Circuit Lab

CLB Architecture of Spartan II

Each CLB contains four

logic cells, organized as

a pair of slices.

Each slice has a four-

input lookup table, logic

for carry and control,

and a D-type flip-flop.

Spartan II part family

provides the flexibility

and capacity of an on-

chip block RAM.

1 CLB = 2 slices = 4

logic cells

Page 23: Introduction to the FPGAs

Mat 1

Digital Circuit Lab

Slice Structure

The left-hand and right-hand slices have different

architecture:

23

LUT6 can be used as a shift-register or a small RAM block, i.e., a distributed RAM

LUT6

LUT6 LUT6

LUT6

SRL32

SRL32

RAM32

RAM32

(M means memory) (L means logic)

Page 24: Introduction to the FPGAs

Mat 1

Digital Circuit Lab

Artix-7 Slice Structure (SliceM)

24

Output flip-flop can beconfigured as a latch!

Carry

pro

pa

gatio

n c

ha

in

Page 25: Introduction to the FPGAs

Mat 1

Digital Circuit Lab

Synthesis of a Large Multiplexor

Several 2-to-1 MUXes can be

combined to form a large mux

Large mux consumes FPGA

resources significantly and

creating long signal path delay

Please use them wisely

25

input 1

input 2

input 3

input 4

input 5

input 6

input 7

input 8

output

Page 26: Introduction to the FPGAs

Mat 1

Digital Circuit Lab

Configurable LUT6 or LUT5

Each 6-input LUT can also be configured as two 5-

input LUT’s for logic synthesis:

26

Page 27: Introduction to the FPGAs

Mat 1

Digital Circuit Lab

Multi-Purpose LUT in SliceM

The LUTs in SliceM can also be configured as either

32-bit shift registers or 32-bit RAM blocks (called

distributed RAM)

27

32-bit

32x1

6

D1 ~ D6

Page 28: Introduction to the FPGAs

Mat 1

Digital Circuit Lab

LUT as Distributed RAM

LUT can be used as Distributed RAM

Each unit is a 321 RAM;

5 of the LUT input lines becomes the

address lines of a RAM block

LUTs can be cascaded to increase RAM size

Each LUTs can be used to Implement either

a 641-bit RAM

a 322-bit RAM

a 321-bit RAM with dual output ports

Synchronous write

Synchronous / Asynchronous read

Output flip-flop is used for synchronous read

28

LUT

LUT

LUT

R

R

R

W

W

ADDR

Page 29: Introduction to the FPGAs

Mat 1

Digital Circuit Lab

LUT as Shift Register

Each LUT can be configured

as a shift register

Bit depth of the shift register

is configurable from 1 to 32 bits

The 5 of the input lines of

the LUT is used to set the

depth of the shift register

Multiple LUTs can be

cascaded to form larger

shift registers

29

D Q

CE

D Q

CE

D Q

CE

D Q

CE

LUT

INCE

CLK

DEPTH[4:0]

OUT

Page 30: Introduction to the FPGAs

Mat 1

Digital Circuit Lab

DSP Slice

Each DSP slice contains:

25x18 multiplier, 25-bit pre-adder, 48-bit ALU, 17-bit shifter

configurable pipeline

30

Page 31: Introduction to the FPGAs

Mat 1

Digital Circuit Lab

XADC: Dual 12-Bit 1-MSPS ADCs

31

Page 32: Introduction to the FPGAs

Mat 1

Digital Circuit Lab

Structure of the IOB

32

inout control signal

output driver

input signal

Page 33: Introduction to the FPGAs

Mat 1

Digital Circuit Lab

DDR RAM Access

IOB provides double-edge DDR read/write

capabilities:

33

DDR inputDDRoutput

ToFabric

D1

D2

CLK1 and CLK2 are 180 phase shifted

DDR input timing diagrams DDR output timing diagrams

data fetch

Page 34: Introduction to the FPGAs

Mat 1

Digital Circuit Lab

Clock Distribution Network (Spartan 3)

34

A

A

B

B

C

C

D

D

Page 35: Introduction to the FPGAs

Mat 1

Digital Circuit Lab

Digital Clock Manager (DCM)

Each Clock Manager Tile (CMT) contains a DCM

circuit

DCMs can be used to

Create different clock domains in a complex circuit design

De-skew the clock signals

Phase-shift a clock

35

Page 36: Introduction to the FPGAs

Mat 1

Digital Circuit Lab

Clock Skew Problem

In the real world, the clock inputs may be skewed:

36

FPGAD Q

OthercircuitsA

B

Cb c

b

c

A

B

C

Page 37: Introduction to the FPGAs

Mat 1

Digital Circuit Lab

Delay-Locked Loop for De-skewing

We can use a DLL to compensate for the clock skew

37

PhaseDetector

3

clock

feedbackSEL

clock

feedback

Synchronous Circuit I

Synchronous

Circuit II

Page 38: Introduction to the FPGAs

Mat 1

Digital Circuit Lab

DCM Block Diagram

The block diagram of a DCM for Spartan 3 FPGAs

38

To othercircuits

Page 39: Introduction to the FPGAs

Mat 1

Digital Circuit Lab

FPGA Configuration Technologies

FPGA’s logic elements, interconnect switch, and I/O

pins can be programmed using one of the following

three technologies:

SRAM-based

Can be programmed many times

Must be programmed after power-up

Antifuse-based

Programmed once via a burn-in step

Often called Complex Programmable Logic Device (CPLD)

Flash-based

Similar to SRAM but using flash memory

39

Page 40: Introduction to the FPGAs

Mat 1

Digital Circuit Lab

Pros and Cons of SRAM FPGAs

SRAM-based FPGA is the most popular FPGA

technology today

Advantages:

Re-programmable

Dynamically reconfigurable

Uses standard semiconductor processes

Disadvantages:

SRAM burns power

Configuration lost at power-down (but not on reset!)

Possible to steal, disrupt configuration bits

Just like piracy & virus issues of software

40

Page 41: Introduction to the FPGAs

Mat 1

Digital Circuit Lab

Configuring SRAM-based FPGA

There are several ways to configure an FPGA

A host PC configures FPGA using JTAG interface, not good

for “turn-key” systems

FPGA boots in master mode, reads configuration data from

a flash ROM (or SD card) through the SPI bus

FPGA boots in slave mode, a microcontroller (through GPIO)

or a CPLD can be used to configure an FPGA

41

Page 42: Introduction to the FPGAs

Mat 1

Digital Circuit Lab

Behavior of FPGA Configuration

When a Xilinx FPGA is configured or reconfigured,

every cells are initialized

Configuration has the same effect as a global reset, it

set/preset all the flip-flops and initializes all RAM cells

42

FDC – D Flip-flops with asynchronous clearFDP – D Flip-flops with asynchronous preset

Page 43: Introduction to the FPGAs

Mat 1

Digital Circuit Lab

References

Mentor Graphics, The Design Warrior’s Guide to FPGAs

Devices, Tools, and Flows, ISBN 0750676043, 2004.

Xilinx, 7 Series FPGAs Configurable Logic Block User Guide,

Xilinx UG474, v1.8, Sep. 27, 2016.

P. Garrault and B. Philofsky, HDL Coding Practices to

Accelerate Design Performance, Xilinx WP231, Jan. 6, 2006.

Xilinx, Using Digital Clock Managers (DCMs) in Spartan-3

FPGAs, XAPP462, Jan. 2006.

M. Peattie, Using a Microprocessor to Configure Xilinx FPGAs

via Slave Serial or SelectMAP Mode, Xilinx AN502, Aug. 2009.

43