Digital Circuit Lab
Introduction to the FPGAs
Chun-Jen Tsai and Lan-Da Van
Department of Computer Science
National Chiao Tung University
Taiwan, R.O.C.
Fall, 2019
Mat 1
History of the FPGAs
Before the invention of the Field Programmable Gate
Array (FPGA), there were many different
programmable ICs such as the PLA, PAL, CPLD, etc.
These programmable ICs have very limited logic capacity
CPLDs are still popular in industry because their “digital
logic programs” do not disappear when the power is off
R. H. Freeman invented the first practical FPGA in 1985†
Freeman and Vonderschmitt co-founded Xilinx in 1984
Xilinx was the first fabless IC design house in the world
TSMC entered the IC foundry business for fabless IC design
houses in 1987
† R. H. Freeman, “Configurable Electrical Circuit Having Configurable Logic Elements and Configurable Interconnect,”
U.S. Patent 4,870,302, Sep. 26, 1989
Combinational PLDs
Programmable two-level logic: an AND array and an OR array
Programmable interconnections are either closed (two lines are connected) or open
A fuse that can be blown by applying a high voltage pulse
Diagram of 32x8 ROM
An example
F1 = AB' + AC + A'BC'
F2 = (AC + BC)'
XOR gates can invert the outputs
PLA with three inputs, four product terms, and two outputs
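The AND array, OR array, and XOR output stage of this PLA can be sketched in a few lines of Python. This is only an illustrative model: it assumes the complemented literals of the usual textbook form of this example, F1 = AB' + AC + A'BC' and F2 = (AC + BC)', and the names `PRODUCT_TERMS`, `OR_ARRAY`, `INVERT`, and `pla` are made up for the sketch.

```python
from itertools import product

# AND array: each product term lists the literals it needs.
# True = uncomplemented input, False = complemented input.
# ASSUMPTION: complement placement follows F1 = AB' + AC + A'BC',
# F2 = (AC + BC)'.
PRODUCT_TERMS = [
    {"A": True, "B": False},              # AB'
    {"A": True, "C": True},               # AC
    {"B": True, "C": True},               # BC
    {"A": False, "B": True, "C": False},  # A'BC'
]

# OR array: which product terms feed each output.
OR_ARRAY = {"F1": [0, 1, 3], "F2": [1, 2]}

# XOR stage: True means the OR result is inverted.
INVERT = {"F1": False, "F2": True}


def pla(a, b, c):
    """Evaluate the PLA for one input combination (each input is 0 or 1)."""
    inputs = {"A": bool(a), "B": bool(b), "C": bool(c)}
    terms = [
        all(inputs[name] == wanted for name, wanted in term.items())
        for term in PRODUCT_TERMS
    ]
    outputs = {}
    for name, used in OR_ARRAY.items():
        or_result = any(terms[i] for i in used)
        outputs[name] = int(or_result != INVERT[name])  # XOR with the invert bit
    return outputs


if __name__ == "__main__":
    for a, b, c in product((0, 1), repeat=3):
        print(a, b, c, pla(a, b, c))
```

Changing the three tables "reprograms" the device without touching the evaluation code, which is the point of a programmable logic array: four product terms are shared between two outputs.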
PLA Example
w = ABC' + A'B'CD'
x = A + BCD
y = A'B + CD + B'D'
z = w + AC'D' + A'B'C'D
Programming Concept of the FPGA
Theoretically speaking, creating digital logic on an
FPGA strictly follows the “stored program concept” of
software programming
Thus, FPGA coding should be regarded as “software coding”
Stored program concept:
The computing behavior of a piece of hardware is
controlled by information stored in memory devices.
[Figure: a CPU (hardwired logic) connected through a bus to a RAM that holds instructions and data]
Digital Circuits in an FPGA
The basic idea of an FPGA is to interconnect small
“truth tables” to create complex digital circuits
In an FPGA, these tables are called “Lookup Tables” (LUTs)
[Figure: interconnected lookup tables (Table 1, Table 4, Table 5), each mapping 4-bit inputs 0000 through 1111 to a 1-bit output]
Each table is equivalent to an n-to-1 gate, where n = 4 or 6 for Xilinx FPGAs
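As a rough illustration of "interconnecting small truth tables", the sketch below models an n-input LUT as a list of 2^n stored bits and wires two 4-input tables together to compute the parity of seven inputs. The class name `LUT`, the helper `from_function`, and the function `parity7` are hypothetical names chosen for this example, not part of any vendor tool.

```python
class LUT:
    """An n-input lookup table: 2**n stored bits, one output bit."""

    def __init__(self, n, bits):
        assert len(bits) == 2 ** n, "a LUT stores one bit per input combination"
        self.n = n
        self.bits = list(bits)

    @classmethod
    def from_function(cls, n, func):
        """Fill the table by evaluating func on every input combination."""
        bits = []
        for index in range(2 ** n):
            # The first input is the most significant bit of the table index.
            args = [(index >> (n - 1 - i)) & 1 for i in range(n)]
            bits.append(int(func(*args)))
        return cls(n, bits)

    def __call__(self, *inputs):
        """Look up the output bit; the inputs only form an address."""
        index = 0
        for bit in inputs:
            index = (index << 1) | (bit & 1)
        return self.bits[index]


# One table configuration, used twice: a 4-input XOR.
xor4 = LUT.from_function(4, lambda a, b, c, d: a ^ b ^ c ^ d)


def parity7(x):
    """Two interconnected 4-input tables computing the parity of 7 bits."""
    first = xor4(x[0], x[1], x[2], x[3])
    return xor4(first, x[4], x[5], x[6])
```

Evaluating a LUT never inspects the logic function itself, only the stored bits, which is why any function of the same inputs costs the same table.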
Programmability of the FPGA
FPGA still adopts the “stored program concept”
Both the n-to-1 gates and the wires can be programmed
FPGA programs are stored in distributed memory bits
[Figure: four logic cells, each with n inputs and 1 output, connected through a routing grid and driven by a common clock]
A Generic Logic Cell
Basic logic cell (LC) structure:
[Figure: a grid of logic cells (LCs) joined by interconnects and surrounded by I/O blocks (IOBs). Inside each logic cell, a lookup table (LUT) with 4 inputs and 1 output feeds a D flip-flop driven by the clock; a select bit chooses the LUT output or the flip-flop output as the LC output]
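A generic logic cell of this kind can be modeled as a small Python class: a 16-bit table for the 4-input LUT, one flip-flop, and a select bit that picks the combinational or the registered output. This is a behavioral sketch under those assumptions only; `LogicCell` and its method names are invented for illustration.

```python
class LogicCell:
    """Behavioral model of a generic logic cell: LUT4 + D flip-flop + output select."""

    def __init__(self, lut_bits, registered):
        assert len(lut_bits) == 16, "a 4-input LUT stores 16 configuration bits"
        self.lut_bits = list(lut_bits)  # configuration: the truth table
        self.registered = registered    # configuration: the select bit
        self.q = 0                      # state of the D flip-flop

    def lut(self, inputs):
        """Combinational LUT output for four input bits (first input is the MSB)."""
        index = 0
        for bit in inputs:
            index = (index << 1) | (bit & 1)
        return self.lut_bits[index]

    def clock(self, inputs):
        """Rising clock edge: the flip-flop captures the current LUT output."""
        self.q = self.lut(inputs)

    def output(self, inputs):
        """LC out: flip-flop output if the select bit is set, else the LUT output."""
        return self.q if self.registered else self.lut(inputs)


# Example configuration: a 4-input AND, once combinational and once registered.
AND4_BITS = [0] * 15 + [1]
comb_cell = LogicCell(AND4_BITS, registered=False)
reg_cell = LogicCell(AND4_BITS, registered=True)
```

The same cell hardware serves both combinational and sequential logic; only the stored configuration bits differ.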
Features of SRAM-based LUT
SRAM is often used to implement LUTs; the properties
of this type of LUT are as follows:
An n-input LUT can implement any function of its 2^n input combinations
All logic functions take the same amount of area
All functions have the same delay
In custom CMOS logic, XOR is much slower than NAND
Burns power even at idle
An LUT is more “powerful” than a two-input gate
Gate-count is not a good measure of FPGA logic cost
For a static CMOS gate, an n-input NAND/NOR gate has ~ 2n transistors
For an FPGA LC, a 4-input LUT has 128 transistors in SRAM and
96 in the multiplexer
Function Implementation with LUTs
The datapath that implements F = ABC + ABC + AB is as follows; the LUT4 has the following entries:
A function of more than 4 variables can always be decomposed into a network of 4-variable functions, e.g., by Shannon expansion on the extra variables
X1 X2 X3 X4   F
0  0  0  0    0
0  0  0  1    0
0  0  1  0    1
0  0  1  1    1
...           ...
1  1  1  1    1
LUT4 table entries (red means don’t care)
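One standard way to map a function of more than four variables onto 4-input LUTs is Shannon expansion: f(a,b,c,d,e) = e'·f(a,b,c,d,0) + e·f(a,b,c,d,1). The sketch below builds the two cofactor tables and a 2-to-1 selector table for a made-up 5-variable function `f5`; the function and all helper names are assumptions chosen for illustration.

```python
from itertools import product


def make_lut(n, func):
    """Truth table of func as a list of 2**n bits (first argument is the MSB)."""
    return [int(func(*bits)) for bits in product((0, 1), repeat=n)]


def read_lut(table, *inputs):
    """Look up one entry; the inputs form the table address."""
    index = 0
    for bit in inputs:
        index = (index << 1) | bit
    return table[index]


def f5(a, b, c, d, e):
    """Hypothetical 5-variable function used only as a test case."""
    return (a & b) | (c ^ d ^ e)


# Cofactors with respect to e: each is a 4-variable function, so each fits a LUT4.
lut_e0 = make_lut(4, lambda a, b, c, d: f5(a, b, c, d, 0))
lut_e1 = make_lut(4, lambda a, b, c, d: f5(a, b, c, d, 1))

# The 2-to-1 selector is a 3-variable function, so it also fits a LUT4.
lut_mux = make_lut(3, lambda sel, lo, hi: hi if sel else lo)


def f5_from_luts(a, b, c, d, e):
    """Evaluate f5 using only table lookups."""
    lo = read_lut(lut_e0, a, b, c, d)
    hi = read_lut(lut_e1, a, b, c, d)
    return read_lut(lut_mux, e, lo, hi)
```

Three small tables replace one 32-entry table here; each further variable doubles the number of cofactor tables, which is one reason wide functions are expensive on LUT-based fabrics.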
Carry Chains in FPGA
Since additions are very important, many FPGAs
have dedicated circuits for carry calculation and
propagation
Both carry look-ahead and carry-ripple adders can be
implemented efficiently on FPGAs
Review: Look-ahead Carry Generator
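A carry look-ahead adder is faster than a carry-ripple adder because
every carry is computed directly from the generate and propagate signals
instead of rippling through the bit positions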
Source: M. Morris Mano and Charles R. Kime, Logic and Computer Design Fundamentals.
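The difference between the two adders can be sketched in Python using the usual generate (g = a·b) and propagate (p = a xor b) signals. In the ripple version each carry waits for the previous one; in the 4-bit look-ahead version every carry is a two-level expression of g, p, and c0. Function names are illustrative, and bits are listed least significant first.

```python
def to_bits(value, width=4):
    """Integer to a list of bits, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """List of bits (LSB first) back to an integer."""
    return sum(bit << i for i, bit in enumerate(bits))


def ripple_carry_add(a_bits, b_bits, c0=0):
    """Each stage needs the carry produced by the stage before it."""
    carry = c0
    total = []
    for a, b in zip(a_bits, b_bits):
        total.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return total, carry


def lookahead_add4(a_bits, b_bits, c0=0):
    """4-bit adder whose carries depend only on g, p, and c0."""
    g = [a & b for a, b in zip(a_bits, b_bits)]  # generate
    p = [a ^ b for a, b in zip(a_bits, b_bits)]  # propagate
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    carries = [c0, c1, c2, c3]
    total = [p[i] ^ carries[i] for i in range(4)]
    return total, c4
```

Both functions return the same sums; what differs in hardware is the length of the longest path, which grows with the word width for the ripple adder and stays at a fixed number of gate levels inside one look-ahead group.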
Programmable Interconnect Points
The routing wires in an FPGA are programmed using a
single-bit SRAM cell connected to a transistor:
[Figure: a transistor (gate, source, drain) controlled by an SRAM bit. Left panel: connecting/disconnecting a single wire. Right panel: routing a signal]
Interconnect Architecture
On an FPGA, we must be able to control
Connections from wiring channels to LCs
Connections between wires in the wiring channels
[Figure: two LCs connected to horizontal and vertical wiring channels]
Place and Route Problem
Wires among LCs are organized into channels
Channels are arranged horizontally and vertically on the chip
There are many wires per channel
Connections between wires are made at programmable
interconnection points
An EDA tool must choose:
Channels from source to
destination
Wires within the channels
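The routing half of this problem can be illustrated with a very small maze router. The sketch below does a breadth-first search over a grid of routing cells and returns a shortest path that avoids cells already taken by other nets. It is a toy model under simplifying assumptions (one wire per grid cell, unit cost per step); production place-and-route tools use far more elaborate algorithms, and the name `route` is hypothetical.

```python
from collections import deque


def route(width, height, source, dest, blocked):
    """Shortest path from source to dest on a width x height grid.

    Cells are (x, y) tuples; blocked is a set of cells that are already
    occupied. Returns the list of cells on the path, or None if no path exists.
    """
    previous = {source: None}
    queue = deque([source])
    while queue:
        cell = queue.popleft()
        if cell == dest:
            path = []
            while cell is not None:  # walk back to the source
                path.append(cell)
                cell = previous[cell]
            return path[::-1]
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nxt
            inside = 0 <= nx < width and 0 <= ny < height
            if inside and nxt not in blocked and nxt not in previous:
                previous[nxt] = cell
                queue.append(nxt)
    return None


def route_nets(width, height, nets):
    """Route nets one after another; each routed net blocks cells for later ones."""
    used = set()
    paths = []
    for source, dest in nets:
        path = route(width, height, source, dest, used)
        paths.append(path)
        if path is not None:
            used.update(path)
    return paths
```

Because earlier nets block later ones, the order in which nets are routed changes the result, which hints at why place and route is a hard optimization problem rather than a simple lookup.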
[Figure: a grid of LCs with wiring channels running between them]
Xilinx FPGA Family
The Xilinx FPGA generations are coupled with the
TSMC manufacturing process
Usually two product lines per generation before the TSMC 28 nm process
Old products before the 28 nm TSMC process
Virtex-II / Spartan-2 (130 nm): 4-input LUT
Virtex-4 / Spartan-3 (90 nm): 4-input LUT
Virtex-5 (65 nm): 6-input LUT
Virtex-6 (40 nm) / Spartan-6 (45 nm): 6-input LUT
New products after the 28 nm TSMC process
Virtex-7 / Kintex-7 / Artix-7 / Zynq-7 (28 nm): 6-input LUT
Virtex UltraScale / Kintex UltraScale (20 nm): 6-input LUT
Virtex UltraScale+ / Zynq UltraScale+ (16 nm FinFET+): 6-input LUT
Xilinx FPGA Architecture
Xilinx FPGAs are composed of CLBs, BRAMs,
Multipliers (in DSP slices), CMTs, and IOBs
XC7A35T: 2,600 CLBs, 1,800 Kb of BRAM, 90 25x18-bit multipliers, 5 CMTs
CLB Architecture of Artix-7
The CLB architecture of an Artix-7 FPGA:
The main resources for logic synthesis
Each CLB contains two slices, each slice four logic cells
Each CLB is connected to a switch matrix
Carry chain runs vertically in a column from one slice to the
one above
[Figure: two CLBs, each containing Slice 0 and Slice 1 attached to a switch matrix; clocks and data reach the slices through the switch matrix, and the carry chain runs from cin to cout]
CLB Architecture of Spartan II
Each CLB contains four logic cells, organized as a pair of slices.
Each logic cell has a four-input lookup table, logic for carry and control, and a D-type flip-flop.
The Spartan II part family also provides the flexibility and capacity of on-chip block RAM.
1 CLB = 2 slices = 4 logic cells
Slice Structure
The left-hand and right-hand slices have different
architecture:
A LUT6 in a SliceM can be used as a shift register or a small RAM block, i.e., a distributed RAM
[Figure: a SliceM (M means memory), whose LUT6s can also serve as SRL32 or RAM32, next to a SliceL (L means logic) with plain LUT6s]
Artix-7 Slice Structure (SliceM)
[Figure: schematic of an Artix-7 SliceM, with the carry propagation chain running through it]
Output flip-flop can be configured as a latch!
Synthesis of a Large Multiplexor
Several 2-to-1 MUXes can be
combined to form a large mux
A large mux consumes significant FPGA
resources and creates a
long signal path delay
Please use them wisely
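The cost of a large mux can be made concrete with a small model that builds a 2^k-to-1 mux from a tree of 2-to-1 muxes and counts how many muxes and how many levels it needs. The function names are illustrative only, and the counts describe this tree model, not the exact resources a synthesis tool would report.

```python
def mux2(sel, a, b):
    """2-to-1 multiplexer: returns a when sel is 0, b when sel is 1."""
    return b if sel else a


def mux_tree(inputs, select_bits):
    """Build a 2**k-to-1 mux from 2-to-1 muxes.

    inputs: list of 2**k values; select_bits: k select bits, LSB first.
    Returns (selected value, number of levels, number of 2-to-1 muxes).
    """
    assert len(inputs) == 2 ** len(select_bits)
    level = list(inputs)
    levels = 0
    mux_count = 0
    for sel in select_bits:
        # Each level halves the number of candidate signals.
        level = [mux2(sel, level[i], level[i + 1]) for i in range(0, len(level), 2)]
        mux_count += len(level)
        levels += 1
    return level[0], levels, mux_count


def select(inputs, index):
    """Convenience wrapper: choose inputs[index] through the mux tree."""
    k = (len(inputs) - 1).bit_length()
    bits = [(index >> i) & 1 for i in range(k)]
    return mux_tree(inputs, bits)
```

For eight inputs the tree uses seven 2-to-1 muxes in three levels; doubling the number of inputs roughly doubles the mux count and adds one more level of delay.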
[Figure: an 8-to-1 mux built from 2-to-1 muxes, with input 1 through input 8 and a single output]
Configurable LUT6 or LUT5
Each 6-input LUT can also be configured as two 5-input LUTs
that share the same inputs, for logic synthesis
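A behavioral sketch of such a dual-output LUT follows. It treats the 64 configuration bits as one integer: the lower 32 bits form one 5-input table (output O5), and with the sixth input tied high the upper 32 bits form a second 5-input table (output O6). The bit ordering of the configuration word and the names `lut6_2` and `table5` are assumptions of this sketch, loosely modeled on the dual-output LUT described in the 7 Series CLB user guide.

```python
def lut6_2(init, i5, i4, i3, i2, i1, i0):
    """Return (o6, o5) for a 64-bit configuration word init.

    o5 reads the lower 32 bits using i4..i0 only.
    o6 reads all 64 bits using i5..i0.
    """
    addr5 = (i4 << 4) | (i3 << 3) | (i2 << 2) | (i1 << 1) | i0
    o5 = (init >> addr5) & 1
    o6 = (init >> ((i5 << 5) | addr5)) & 1
    return o6, o5


def table5(func):
    """Pack a 5-input function into a 32-bit table (bit address = i4..i0)."""
    word = 0
    for addr in range(32):
        bits = [(addr >> k) & 1 for k in range(5)]  # bits[0] is i0
        if func(*bits):
            word |= 1 << addr
    return word


# Two unrelated 5-input functions packed into one LUT6:
# upper half = 5-input XOR, lower half = 5-input AND.
XOR5 = table5(lambda a, b, c, d, e: a ^ b ^ c ^ d ^ e)
AND5 = table5(lambda a, b, c, d, e: a & b & c & d & e)
INIT = (XOR5 << 32) | AND5


def xor_and(i4, i3, i2, i1, i0):
    """With i5 tied to 1, o6 is the XOR and o5 is the AND of the same inputs."""
    return lut6_2(INIT, 1, i4, i3, i2, i1, i0)
```

The two functions must use the same five input signals, which is the main restriction when a synthesis tool tries to pack two small functions into one LUT6.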
Multi-Purpose LUT in SliceM
The LUTs in SliceM can also be configured as either
32-bit shift registers or 32-bit RAM blocks (called
distributed RAM)
[Figure: a SliceM LUT with inputs D1 ~ D6, configurable as a 32-bit shift register or a 32x1 RAM]
LUT as Distributed RAM
LUT can be used as Distributed RAM
Each unit is a 32×1 RAM;
five of the LUT input lines become the
address lines of a RAM block
LUTs can be cascaded to increase RAM size
Each LUT can be used to implement either
a 64×1-bit RAM
a 32×2-bit RAM
a 32×1-bit RAM with dual output ports
Synchronous write
Synchronous / Asynchronous read
Output flip-flop is used for synchronous read
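The read and write behavior listed above can be sketched as a small Python model of a 32×1 distributed RAM: writes happen only on a clock edge, the asynchronous read follows the address immediately, and the synchronous read goes through an output flip-flop. The class name is made up, and the choice that the flip-flop captures the old content on a simultaneous write is an assumption of this sketch.

```python
class DistributedRAM32x1:
    """Behavioral model of a LUT used as a 32x1 RAM."""

    def __init__(self):
        self.cells = [0] * 32  # the LUT's storage bits
        self.read_reg = 0      # output flip-flop used for synchronous read

    def read_async(self, addr):
        """Asynchronous read: the output follows the address with no clock."""
        return self.cells[addr & 31]

    def clock(self, addr, data_in=0, write_enable=False):
        """Rising clock edge.

        ASSUMPTION: the output flip-flop captures the content stored
        before this edge's write takes effect (read-before-write).
        """
        self.read_reg = self.cells[addr & 31]
        if write_enable:
            self.cells[addr & 31] = data_in & 1

    def read_sync(self):
        """Synchronous read: the value captured at the last clock edge."""
        return self.read_reg
```

A usage sequence would write with `clock(addr, data_in=1, write_enable=True)`, see the new bit at once through `read_async(addr)`, and see it through `read_sync()` one clock edge later.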
[Figure: cascaded LUTs sharing the ADDR lines, with read (R) and write (W) ports]
LUT as Shift Register
Each LUT can be configured
as a shift register
Bit depth of the shift register
is configurable from 1 to 32 bits
Five of the input lines of
the LUT are used to set the
depth of the shift register
Multiple LUTs can be
cascaded to form larger
shift registers
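A LUT-based shift register can be modeled as 32 stored bits plus a 5-bit tap selector. In the sketch below a depth select value of d reads the bit that entered d + 1 clock cycles ago, so values 0 to 31 give depths 1 to 32. The class `SRL32` and this numbering convention are assumptions of the sketch, not a description of a specific vendor primitive.

```python
class SRL32:
    """Behavioral model of a LUT configured as a 1-to-32-bit shift register."""

    def __init__(self):
        self.bits = [0] * 32  # bits[0] holds the most recently shifted-in bit

    def clock(self, data_in, ce=1):
        """Rising clock edge: shift by one position when clock enable is set."""
        if ce:
            self.bits = [data_in & 1] + self.bits[:-1]

    def out(self, depth_sel):
        """Read the tap chosen by the 5-bit depth select (0..31).

        ASSUMPTION: depth_sel = d returns the bit shifted in d + 1 clocks ago.
        """
        return self.bits[depth_sel & 31]


def delay_line(samples, depth_sel):
    """Pass a bit sequence through the shift register at a fixed depth."""
    srl = SRL32()
    result = []
    for bit in samples:
        srl.clock(bit)
        result.append(srl.out(depth_sel))
    return result
```

Because the depth is just an address into the stored bits, it can change at run time without reloading the configuration, which makes this structure handy for small programmable delays.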
[Figure: a LUT configured as a chain of D flip-flops with a shared clock enable (CE); inputs IN, CE, CLK, and DEPTH[4:0]; output OUT]
DSP Slice
Each DSP slice contains:
25x18 multiplier, 25-bit pre-adder, 48-bit ALU, 17-bit shifter
configurable pipeline
XADC: Dual 12-Bit 1-MSPS ADCs
Structure of the IOB
[Figure: IOB schematic showing the inout control signal, the output driver, and the input signal]
DDR RAM Access
IOB provides double-edge DDR read/write
capabilities:
CLK1 and CLK2 are 180° phase shifted
[Figure: DDR input and DDR output registers between the pad and the fabric (D1, D2), with DDR input and DDR output timing diagrams]
Clock Distribution Network (Spartan 3)
Digital Clock Manager (DCM)
Each Clock Manager Tile (CMT) contains a DCM
circuit
DCMs can be used to
Create different clock domains in a complex circuit design
De-skew the clock signals
Phase-shift a clock
Clock Skew Problem
In the real world, the clock inputs may be skewed:
[Figure: a clock distributed to a D flip-flop inside the FPGA and to other circuits, with a timing diagram of the skewed clock at points A, B, and C]
Delay-Locked Loop for De-skewing
We can use a DLL to compensate for the clock skew
[Figure: DLL block diagram with a phase detector comparing clock and feedback, a SEL control, and two clocked blocks, Synchronous Circuit I and Synchronous Circuit II]
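The idea behind de-skewing with a DLL can be sketched numerically: add just enough delay in front of the clock distribution network that the total delay becomes a whole clock period, so the delayed clock edge lines up with the next input edge. The model below steps through the taps of a delay line until a phase detector reports that the remaining misalignment is smaller than one tap. All names and the integer picosecond units are assumptions of this toy model, not the behavior of a specific device.

```python
def dll_lock(period_ps, distribution_delay_ps, tap_delay_ps, num_taps):
    """Find the delay-line tap that aligns the feedback clock with the input clock.

    Returns (selected tap, residual misalignment in ps).
    Raises ValueError if the delay line is too short to reach alignment.
    """
    for sel in range(num_taps):
        total_delay = sel * tap_delay_ps + distribution_delay_ps
        # Phase detector: how much more delay is needed to reach
        # the next edge of the input clock.
        remaining = (-total_delay) % period_ps
        if remaining < tap_delay_ps:
            return sel, remaining  # adding another tap would overshoot
    raise ValueError("delay line too short to align the clocks")


if __name__ == "__main__":
    # Example: 100 MHz clock (10,000 ps period), 3,200 ps of distribution
    # delay, and a delay line with 128 taps of 100 ps each.
    print(dll_lock(10_000, 3_200, 100, 128))
```

With the example numbers the loop settles where the inserted delay plus the distribution delay equals exactly one clock period, so the circuit at the end of the clock tree sees edges aligned with the input clock.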
DCM Block Diagram
The block diagram of a DCM for Spartan 3 FPGAs
FPGA Configuration Technologies
An FPGA’s logic elements, interconnect switches, and I/O
pins can be programmed using one of the following
three technologies:
SRAM-based
Can be programmed many times
Must be programmed after power-up
Antifuse-based
Programmed once via a burn-in step
Non-volatile: the configuration is retained when the power is off
Flash-based
Reprogrammable like SRAM, but the configuration is held in non-volatile flash memory
Pros and Cons of SRAM FPGAs
SRAM-based FPGA is the most popular FPGA
technology today
Advantages:
Re-programmable
Dynamically reconfigurable
Uses standard semiconductor processes
Disadvantages:
SRAM burns power
Configuration lost at power-down (but not on reset!)
Possible to steal, disrupt configuration bits
Just like piracy & virus issues of software
Configuring SRAM-based FPGA
There are several ways to configure an FPGA
A host PC configures the FPGA through the JTAG interface; not suitable
for “turn-key” systems
The FPGA boots in master mode and reads its configuration data from
a flash ROM (or SD card) through the SPI bus
The FPGA boots in slave mode, and a microcontroller (through GPIO)
or a CPLD configures it
Behavior of FPGA Configuration
When a Xilinx FPGA is configured or reconfigured,
every cell is initialized
Configuration has the same effect as a global reset: it
clears or presets all the flip-flops and initializes all RAM cells
FDC: D flip-flop with asynchronous clear
FDP: D flip-flop with asynchronous preset
References
Mentor Graphics, The Design Warrior’s Guide to FPGAs
Devices, Tools, and Flows, ISBN 0750676043, 2004.
Xilinx, 7 Series FPGAs Configurable Logic Block User Guide,
Xilinx UG474, v1.8, Sep. 27, 2016.
P. Garrault and B. Philofsky, HDL Coding Practices to
Accelerate Design Performance, Xilinx WP231, Jan. 6, 2006.
Xilinx, Using Digital Clock Managers (DCMs) in Spartan-3
FPGAs, XAPP462, Jan. 2006.
M. Peattie, Using a Microprocessor to Configure Xilinx FPGAs
via Slave Serial or SelectMAP Mode, Xilinx AN502, Aug. 2009.