05/12/06 BR Fall 99 1
Programmable Logic
• So far, we have only talked about PALs (see the 22V10 figure on the next page).
• What is the next step in the evolution of PLDs?
– More gates!
• How do we get more gates? We could put several PALs on one chip and put an interconnection matrix between them.
– This is called a Complex PLD (CPLD).
22V10 PLD
Cypress CPLD
Each logic block is similar to a 22V10.
Programmable interconnect matrix.
Any other approaches?
Another approach to building a “better” PLD is to place a lot of primitive gates on a die, and then place programmable interconnect between them.
Field Programmable Gate Arrays
The FPGA approach is to arrange primitive logic elements (logic cells) in rows and columns with programmable routing between them.
What constitutes a primitive logic element? Lots of different choices can be made! Primitive element must be classified as a “complete logic family”.
• A primitive gate like a NAND gate
• A 2/1 mux (this happens to be a complete logic family)
• A lookup table (i.e., a 16x1 lookup table can implement any 4-input logic function).
Often combine one of the above with a DFF to form the primitive logic element.
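The lookup-table idea can be illustrated with a minimal sketch: the four inputs form an address into a 16-entry table, so any 4-input function reduces to a table read. The helper names (`make_lut`, `lut_read`) and the example function `f` are hypothetical, chosen only for illustration.

```python
from itertools import product

def make_lut(func, n_inputs=4):
    """Build the 2**n_inputs-entry truth table (the LUT contents) for func."""
    return [int(bool(func(*bits))) for bits in product((0, 1), repeat=n_inputs)]

def lut_read(table, *bits):
    """Evaluate the LUT: the input bits form the address into the table."""
    address = 0
    for b in bits:
        address = (address << 1) | b  # first input is the most significant bit
    return table[address]

# Hypothetical example function: f = (a AND b) XOR (c OR d)
def f(a, b, c, d):
    return (a & b) ^ (c | d)

table = make_lut(f)
print(len(table), "entries:", table)
```

Changing the function only changes the 16 stored bits, not the hardware structure, which is why a LUT is a convenient primitive logic element.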
Other FPGA features
• Besides primitive logic elements and programmable routing, some FPGA families add other features.
• Embedded memory
– Many hardware applications need memory for data storage. Many FPGAs include blocks of RAM for this purpose.
• Dedicated logic for carry generation, or other arithmetic functions
• Phase locked loops for clock synchronization, division, multiplication.
Altera Flex 10K FPGA Family
Altera Flex 10K FPGA Family (cont)
Dedicated memory
[Figure: logic element with a 16 x 1 LUT and a DFF]
Embedded Array Block
• Memory block, can be configured as:
– 256 x 8, 512 x 4, 1024 x 2, 2048 x 1
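As a quick check of the configurations listed above, the short sketch below confirms that each depth x width option uses the same number of bits and shows how many address lines each one needs. The variable names are illustrative only.

```python
# The four (depth, width) configurations listed on the slide.
configs = [(256, 8), (512, 4), (1024, 2), (2048, 1)]

for depth, width in configs:
    total_bits = depth * width
    address_bits = depth.bit_length() - 1  # log2(depth), valid for powers of two
    print(f"{depth:4d} x {width}: {total_bits} bits, {address_bits} address lines")
```

Every option trades one data bit of width for one extra address line, keeping the block at 2048 bits.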
Issues in FPGA Technologies
• Complexity of Logic Element
– How many inputs/outputs for the logic element?
– Does the basic logic element contain a FF? What type?
• Interconnect
– How fast is it? Does it offer ‘high speed’ paths that cross the chip? How many of these?
– Can I have on-chip tri-state busses?
– How routable is the design? If 95% of the logic elements are used, can I route the design?
• More routing means more routability, but less room for logic elements
Issues in FPGA Technologies (cont)
• Macro elements
– Are there SRAM blocks? Is the SRAM dual ported?
– Is there fast adder support (i.e. fast carry chains?)
– Is there fast logic support (i.e. cascade chains)
– What other types of macro blocks are available (fast decoders? register files? )
• Clock support
– How many global clocks can I have?
– Are there any on-chip Phase Locked Loops (PLLs) or Delay Locked Loops (DLLs) for clock synchronization, clock multiplication?
Issues in FPGA Technologies (cont)
• What type of IO support do I have?
– TTL, CMOS are a given
– Support for mixed 5 V, 3.3 V IOs?
• 3.3 V internal, but 5 V tolerant inputs?
– Support for new low voltage signaling standards?
• GTL+, GTL (Gunning Transceiver Logic) - used on Pentium II
• HSTL - High Speed Transceiver Logic
• SSTL - Stub Series Terminated Logic
• USB - IO used for Universal Serial Bus (differential signaling)
• AGP - IO used for Accelerated Graphics Port
– Maximum number of IO? Package types?
• Ball Grid Array (BGA) for high density IO
Altera FPGA Family Summaries
• Altera Flex10K/10KE
– LEs (logic elements) have 4-input LUTs (look-up tables) + 1 FF
– Fast carry chain between LEs, cascade chain for logic operations
– Large blocks of SRAM available as well
• Altera Max7000/Max7000A
– EEPROM based, very fast (Tpd = 7.5 ns)
– Basically a PLD architecture with programmable interconnect.
– Max 7000A family is 3.3 V
Xilinx FPGA Family Summaries
• Virtex Family
– SRAM Based
– Largest device has 1M gates
– Configurable Logic Blocks (CLBs) have two 4-input LUTs and 2 DFFs
– Four onboard Delay Locked Loops (DLLs) for clock synchronization
– Dedicated RAM blocks (LUTs can also function as RAM).
– Fast Carry Logic
• XC4000 Family
– Predecessor of the Virtex family
– No DLLs, No dedicated RAM blocks
Actel FPGA Family Summaries
• MXDS Family
– Fine grain Logic Elements that contain Mux logic + DFF
– Embedded Dual Port SRAM
– One Time Programmable (OTP): no configuration loading on power-up and no external serial ROM
– AntiFuse technology for programming (AntiFuse means that programming the fuse makes the connection rather than breaking it).
– Fast (Tpd = 7.5 ns)
– Low density compared to Altera, Xilinx - maximum number of gates is 36,000
Cypress CPLDs
• Ultra37000 Family
– 32 to 512 macrocells
– Fast (Tpd 5 to 10 ns depending on number of macrocells)
– Very good routing resources for a CPLD
Evolution of Implementation Technologies
(trend toward higher levels of integration)
• Discrete devices: relays, transistors (1940s-50s)
• Discrete logic gates (1950s-60s)
• Integrated circuits (1960s-70s)
– e.g. TTL packages: Data Book for 100s of different parts
– Map your circuit to the Data Book parts
• Gate Arrays (IBM 1970s)
– “Custom” integrated circuit chips
– Design using a library (like TTL)
– Transistors are already on the chip
– Place and route software puts the chip together automatically
– + Large circuits on a chip
– + Automatic design tools (no tedious custom layout)
– - Only good if you want 1000s of parts
Gate Array Technology (IBM - 1970s)
• Simple logic gates
– Use transistors to implement combinational and sequential logic
• Interconnect
– Wires to connect inputs and outputs to logic blocks
• I/O blocks
– Special blocks at periphery for external connections
• Add wires to make connections
– Done when chip is fabbed (“mask-programmable”)
– Construct any circuit
Programmable Logic
• Disadvantages of the Data Book method
– Constrained to parts in the Data Book
– Parts are necessarily small and standard
– Need to stock many different parts
• Programmable logic
– Use a single chip (or a small number of chips)
– Program it for the circuit you want
– No reason for the circuit to be small
Programmable Logic Technologies
• Fuse and anti-fuse
– Fuse makes or breaks link between two wires
– Typical connections are 50-300 ohm
– One-time programmable (testing before programming?)
– Very high density
• EPROM and EEPROM
– High power consumption
– Typical connections are 2K-4K ohm
– Fairly high density
• RAM-based
– Memory bit controls a switch that connects/disconnects two wires
– Typical connections are .5K-1K ohm
– Can be programmed and re-programmed in the circuit
– Low density
Programmable Logic
• Program a connection
– Connect two wires
– Set a bit to 0 or 1
• Regular structures for two-level logic (1960s-70s)
– All rely on two-level logic minimization
– PROM connections - permanent
– EPROM connections - erase with UV light
– EEPROM connections - erase electrically
• PROMs
– Program connections in the _____________ plane
• PLAs
– Program the connections in the ____________ plane
• PALs
– Program the connections in the ____________ plane
Making Large Programmable Logic Circuits
• Alternative 1: “CPLD”
– Put a lot of PLDs on a chip
– Add wires between them whose connections can be programmed
– Use fuse/EEPROM technology
• Alternative 2: “FPGA”
– Emulate gate array technology
– Hence Field Programmable Gate Array
– You need:
• A way to implement logic gates
• A way to connect them together
Field-Programmable Gate Arrays
• PALs, PLAs = 10 - 100 Gate Equivalents
• Field Programmable Gate Arrays = FPGAs
– Altera MAX Family
– Actel Programmable Gate Array
– Xilinx Logic Cell Array
• 100 - 1000(s) of Gate Equivalents!
Field-Programmable Gate Arrays
• Logic blocks
– To implement combinational and sequential logic
• Interconnect
– Wires to connect inputs and outputs to logic blocks
• I/O blocks
– Special logic blocks at periphery of device for external connections
• Key questions:
– How to make logic blocks programmable?
– How to connect the wires?
– After the chip has been fabbed
Tradeoffs in FPGAs
• Logic block - how are functions implemented: fixed functions (manipulate inputs) or programmable?
– Support complex functions: need fewer blocks, but they are bigger, so fewer of them fit on the chip
– Support simple functions: need more blocks, but they are smaller, so more of them fit on the chip
• Interconnect
– How are logic blocks arranged?
– How many wires will be needed between them?
– Are wires evenly distributed across chip?
– Programmability slows wires down. Are some wires specialized to long distances?
– How many inputs/outputs must be routed to/from each logic block?
– What utilization are we willing to accept? 50%? 20%? 90%?
Altera EPLD (Erasable Programmable Logic Devices)
• Historical perspective
– PALs: same technology as programmed-once bipolar PROM
– EPLDs: CMOS erasable programmable ROM (EPROM), erased by UV light
• Altera building block = MACROCELL
[Figure: macrocell with an 8 product term AND-OR array and programmable MUXes (Clk MUX, Output MUX, F/B MUX), invert control for programmable polarity, a sequential logic block, programmable feedback, and an I/O pin]
Altera EPLD
• Altera EPLDs contain 8 to 48 independently programmed macrocells
• Personalized by EPROM bits:
– Synchronous mode: flip-flop controlled by global clock signal; local signal computes output enable
– Asynchronous mode: flip-flop controlled by locally generated clock signal
• Seq. logic: could be D, T, positive or negative edge triggered, plus a product term to implement the clear function
[Figure: Clk MUX selecting between Global CLK and OE/Local CLK under control of an EPROM cell, shown for synchronous and asynchronous modes]
Altera Multiple Array Matrix (MAX)
• AND-OR structures are relatively limited
– Cannot share signals/product terms among macrocells
• Logic Array Blocks (LABs), similar to macrocells
• Global routing: Programmable Interconnect Array (PIA)
• EPM5128:
– 8 fixed inputs
– 52 I/O pins
– 8 LABs
– 16 macrocells/LAB
– 32 expanders/LAB
[Figure: LABs A through H arranged around the central PIA]
LAB Architecture
• Expander terms shared among all macrocells within the LAB
[Figure: LAB with macrocell array, expander product term array, and I/O block; inputs arrive from the PIA and the I/O pads; macrocell P-terms and expander P-terms feed the arrays]
P22V10 PAL
• Supports large number of product terms per output
• Latches and muxes associated with output pins
[Figure: P22V10 fuse map showing the AND array, ten output logic macrocells, the asynchronous reset and synchronous preset lines (to all registers), and first fuse numbers]
Actel Programmable Gate Arrays
• Rows of programmable logic building blocks + rows of interconnect
• Anti-fuse technology: program once
• 8 input, single output combinational logic blocks
• FFs constructed from discrete cross coupled gates
• Use anti-fuses to build up long wiring runs from short segments
[Figure: rows of logic modules and wiring tracks surrounded by I/O buffers, programming and test logic]
Actel Logic Module
• Basic module is a modified 4:1 multiplexer
• Example: implementation of S-R latch
[Figure: logic module built from 2:1 MUXes with data inputs D0-D3, selects SOA, SOB, S0, S1, and output Y; S-R latch built from 2:1 MUXes with inputs R and S, constants "0" and "1", and output Q]
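To make the mux-based logic module concrete, here is a minimal sketch of how a 2:1 mux can realize NOT, AND, and OR, plus the next-state function of a set-dominant S-R latch. The wiring used here is an assumption for illustration and is not taken from the figure; all function names are hypothetical.

```python
from itertools import product

def mux2(sel, d0, d1):
    """2:1 multiplexer: output d0 when sel is 0, d1 when sel is 1."""
    return d1 if sel else d0

def not_gate(a):
    return mux2(a, 1, 0)   # select constant 1 when a=0, constant 0 when a=1

def and_gate(a, b):
    return mux2(a, 0, b)   # a=0 gives 0, a=1 passes b

def or_gate(a, b):
    return mux2(a, b, 1)   # a=0 passes b, a=1 gives 1

def sr_latch_next(s, r, q):
    """Next state of a set-dominant S-R latch (assumed wiring, two muxes)."""
    hold_or_reset = mux2(r, q, 0)     # R=1 clears, otherwise hold Q
    return mux2(s, hold_or_reset, 1)  # S=1 sets

for a, b in product((0, 1), repeat=2):
    print(a, b, "AND:", and_gate(a, b), "OR:", or_gate(a, b), "NOT a:", not_gate(a))
```

Because NOT, AND, and OR are all available, the mux qualifies as a complete logic family, and feeding the output back as `q` gives sequential behavior.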
Actel Interconnect
• Interconnection fabric
[Figure: logic modules connected by horizontal and vertical tracks, joined by anti-fuses]
Actel Routing Example
• Jogs cross an anti-fuse
• Minimize the # of jogs for speed critical circuits
• 2 - 3 hops for most interconnections
[Figure: a logic module output routed to the inputs of two other logic modules]
Xilinx Programmable Gate Arrays
• CLB - Configurable Logic Block
– 5-input, 1 output function
– or 2 4-input, 1 output functions
– optional register on outputs
• Built-in fast carry logic
• Can be used as memory
• Three types of routing
– direct
– general-purpose
– long lines of various lengths
• RAM-programmable
– can be reconfigured
[Figure: array of CLBs surrounded by IOBs, with wiring channels between them]
[Figure: programmable interconnect with switch matrices connecting Configurable Logic Blocks (CLBs) and I/O Blocks (IOBs); IOB with input and output buffers, slew rate control, passive pull-up/pull-down, and delay; CLB with F, G, and H function generators (inputs F1-F4, G1-G4, C1-C4), two D flip-flops with S/R control and clock enable, clock input K, and outputs X and Y]
The Xilinx 4000 CLB
Two 4-input functions, registered output
5-input function, combinational output
CLB Used as RAM
Fast Carry Logic
Xilinx 4000 Interconnect
Switch Matrix
Xilinx 4000 Interconnect Details
Global Signals - Clock, Reset, Control
Xilinx 4000 IOB
Xilinx FPGA Combinational Logic Examples
• Key: general functions are limited to 5 inputs (4 even better - 1/2 CLB)
– No limitation on function complexity
• Example: 2-bit comparator
– A B = C D and A B > C D implemented with 1 CLB
– (GT) F = A C' + A B D' + B C' D'
– (EQ) G = A'B'C'D' + A'B C'D + A B'C D' + A B C D
• Can implement some functions of > 5 inputs
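The two comparator equations can be checked exhaustively. The sketch below treats A B and C D as 2-bit numbers with A and C as the high bits (an assumption consistent with the equations) and compares each sum-of-products form against ordinary integer comparison. The function names `gt` and `eq` are illustrative.

```python
from itertools import product

def gt(a, b, c, d):
    """F = A C' + A B D' + B C' D'"""
    nc, nd = 1 - c, 1 - d
    return (a & nc) | (a & b & nd) | (b & nc & nd)

def eq(a, b, c, d):
    """G = A'B'C'D' + A'B C'D + A B'C D' + A B C D"""
    na, nb, nc, nd = 1 - a, 1 - b, 1 - c, 1 - d
    return (na & nb & nc & nd) | (na & b & nc & d) | (a & nb & c & nd) | (a & b & c & d)

mismatches = []
for a, b, c, d in product((0, 1), repeat=4):
    ab, cd = 2 * a + b, 2 * c + d  # A and C are the high-order bits
    if gt(a, b, c, d) != int(ab > cd) or eq(a, b, c, d) != int(ab == cd):
        mismatches.append((a, b, c, d))

print("mismatches:", mismatches)
```

Both functions depend on only four inputs, so each fits in one 4-input function generator and the pair fits in a single CLB.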
Xilinx FPGA Combinational Logic Examples
• N-input majority function: 1 whenever n/2 or more inputs are 1
• N-input parity functions: 5 input/1 CLB; 2 levels yield 25 inputs!
[Figures: 5-input majority circuit (1 CLB); 7-input majority circuit (3 CLBs); 9-input parity logic (2 CLBs)]
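The "2 levels yield 25 inputs" claim for parity can be sketched as a tree: five 5-input parity blocks feed a sixth. In this minimal model each call to the hypothetical `parity5` stands in for one 5-input function generator; it is not a description of how any particular tool maps the logic.

```python
from functools import reduce
from operator import xor

def parity5(*bits):
    """One 5-input function block computing odd parity (XOR of its inputs)."""
    assert len(bits) <= 5
    return reduce(xor, bits, 0)

def parity25(bits):
    """Two-level tree: five first-level blocks feed one second-level block."""
    assert len(bits) == 25
    first_level = [parity5(*bits[i:i + 5]) for i in range(0, 25, 5)]
    return parity5(*first_level)

example = [1, 0, 1, 1, 0] * 5   # 15 ones in total
print(parity25(example))
```

The tree works because XOR is associative, so grouping the inputs in fives does not change the result.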
Xilinx FPGA Adder Example
• Example: 2-bit binary adder - inputs: A1, A0, B1, B0, CIN; outputs: S0, S1, Cout
• Full adder: 4 CLB delays to final carry out
• 2 x two-bit adders (3 CLBs each) yields 2 CLBs to final carry out
[Figure: four full-adder CLBs chained through carries C0-C2 (inputs A0-A3, B0-B3, Cin; outputs S0-S3, Cout), and an alternative built from two 2-bit adders linked by C2]
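The two-bit adder decomposition can be sketched behaviorally. Each output of the hypothetical `two_bit_adder` below is a function of the same five inputs (A1, A0, B1, B0, CIN), which is what lets it map onto 5-input functions; chaining two of them gives a 4-bit adder whose carry passes through only two stages. This is an illustrative model, not a CLB-accurate netlist.

```python
def two_bit_adder(a1, a0, b1, b0, cin):
    """Return (s1, s0, cout); every output depends on the same 5 inputs."""
    total = (2 * a1 + a0) + (2 * b1 + b0) + cin
    return (total >> 1) & 1, total & 1, (total >> 2) & 1

def four_bit_adder(a, b, cin=0):
    """Chain two 2-bit adders; the carry crosses two stages instead of four."""
    a_bits = [(a >> i) & 1 for i in range(4)]
    b_bits = [(b >> i) & 1 for i in range(4)]
    s1, s0, c2 = two_bit_adder(a_bits[1], a_bits[0], b_bits[1], b_bits[0], cin)
    s3, s2, cout = two_bit_adder(a_bits[3], a_bits[2], b_bits[3], b_bits[2], c2)
    return (s3 << 3) | (s2 << 2) | (s1 << 1) | s0, cout

print(four_bit_adder(9, 7))       # 9 + 7 = 16: sum bits 0, carry out 1
print(four_bit_adder(5, 3, 1))    # 5 + 3 + 1 = 9: sum bits 9, carry out 0
```

The trade-off matches the slide: more logic per stage, but fewer stages on the carry path.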
Computer-Aided Design
• Can't design FPGAs by hand
– Way too much logic to manage, hard to make changes
• Hardware description languages
– Specify functionality of logic at a high level
• Validation: high-level simulation to catch specification errors
– Verify pin-outs and connections to other system components
– Low-level to verify mapping and check performance
• Logic synthesis
– Process of compiling HDL program into logic gates and flip-flops
• Technology mapping
– Map the logic onto elements available in the implementation technology (LUTs for Xilinx FPGAs)
CAD Tool Path (cont’d)
• Placement and routing
– Assign logic blocks to functions
– Make wiring connections
• Timing analysis - verify paths
– Determine delays as routed
– Look at critical paths and ways to improve
• Partitioning and constraining
– If design does not fit or is unroutable as placed, split into multiple chips
– If design is too slow, prioritize critical paths, fix placement of cells, etc.
– Few tools to help with these tasks exist today
• Generate programming files - bits to be loaded into chip for configuration
Xilinx CAD Tools
• Verilog (or VHDL) used to specify logic at a high level
– Combine with schematics, library components
• Synopsys
– Compiles Verilog to logic
– Maps logic to the FPGA cells
– Optimizes logic
• Xilinx APR - automatic place and route (simulated annealing)
– Provides controllability through constraints
– Handles global signals
• Xilinx Xdelay - measure delay properties of mapping and aid in iteration
• Xilinx XACT - design editor to view final mapping results
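Since the place-and-route step is described as using simulated annealing, a toy sketch of annealing-based placement may help. This is a generic textbook-style illustration, not the Xilinx APR algorithm: the cost function (total Manhattan wire length), the swap move, the cooling schedule, and every name and parameter below are assumptions.

```python
import math
import random

def wirelength(placement, nets):
    """Total Manhattan distance over all two-pin nets (cell index pairs)."""
    return sum(abs(placement[a][0] - placement[b][0]) +
               abs(placement[a][1] - placement[b][1]) for a, b in nets)

def anneal_placement(nets, n_cells, width, steps=4000,
                     t_start=4.0, t_end=0.05, seed=1):
    """Place n_cells on a grid by swapping slots; returns (best, initial, best_cost)."""
    rng = random.Random(seed)
    height = -(-n_cells // width)  # ceiling division
    placement = [(i % width, i // width) for i in range(width * height)]
    rng.shuffle(placement)         # cell i sits at placement[i]
    cost = initial = wirelength(placement, nets)
    best, best_cost = list(placement), cost
    for step in range(steps):
        temp = t_start * (t_end / t_start) ** (step / steps)  # geometric cooling
        i, j = rng.sample(range(len(placement)), 2)
        placement[i], placement[j] = placement[j], placement[i]
        new_cost = wirelength(placement, nets)
        delta = new_cost - cost
        # Always accept improvements; accept worse moves with a
        # probability that shrinks as the temperature drops.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = list(placement), cost
        else:
            placement[i], placement[j] = placement[j], placement[i]  # undo swap
    return best, initial, best_cost

# Hypothetical design: 9 cells connected in a chain, placed on a 3 x 3 grid.
nets = [(i, i + 1) for i in range(8)]
best, initial_cost, best_cost = anneal_placement(nets, n_cells=9, width=3)
print("initial wire length:", initial_cost, "best found:", best_cost)
```

Accepting some cost-increasing swaps early on is what lets the search escape poor local arrangements; real tools add routing congestion and timing terms to the cost.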
Applications of FPGAs
• Implementation of random logic
– Easier changes at system-level (one device is modified)
– Can eliminate need for full-custom chips
• Prototyping
– Ensemble of gate arrays used to emulate a circuit to be manufactured
– Get more/better/faster debugging done than with simulation
• Reconfigurable hardware
– One hardware block used to implement more than one function
– Functions must be mutually-exclusive in time
– Can greatly reduce cost while enhancing flexibility
– RAM-based only option
• Special-purpose computation engines
– Hardware dedicated to solving one problem (or class of problems)
– Accelerators attached to general-purpose computers