EET 3143 Programmable Logic Devices
Michigan Technological University
Electrical Engineering Technology
Instructor: Dr. Nasser Alaraje
Slide - 2
Contact Information
• Name: Abdulnasser (Nasser) Alaraje
• Office: 417 EERC Building
• Phone (O): 487-1661
• Email: [email protected]
• Office Hours: MWF 10:00 am – 12:00 pm (or by appointment)
Slide - 3
Practical Course
Course Objectives: Upon successful completion of this course, students should:
• Learn how to use an HDL to model the basic building blocks of digital systems
• Learn FPGA technology and the impact of using FPGAs in logic design
• Learn the FPGA design flow using Altera’s Quartus® II development software
• Gain FPGA design experience by synthesizing, mapping, and placing and routing a given design on Altera’s DE2 FPGA evaluation board
• Work in groups of two or three and thereby learn how to cooperate in teams
• Gain a basic understanding of timing analysis
• Learn how to build SDC files for constraining FPGA designs
• Learn how to verify timing on a simple design using the TimeQuest analyzer
Slide - 4
Why FPGA?
• Respond to the market's need for skilled FPGA engineers.
• FPGA-based reprogrammable logic design has become more attractive as a design medium during the last decade.
• Only 19.5% of 4-year and 16.5% of 2-year electrical and computer engineering technology programs at US academic institutions currently have a curriculum component in hardware description languages and programmable logic design.
• Curriculum has not yet “caught up” with industry needs; industry needs must drive curriculum development.
Slide - 5
What projects are FPGAs good for
• Aerospace & Defense: Radiation-tolerant FPGAs along with intellectual property for image processing, waveform generation, and partial reconfiguration for SDRs.
• Automotive: Automotive silicon and IP solutions for gateway and driver assistance systems, comfort, convenience, and in-vehicle infotainment.
• Broadcast: Solutions enabling a vast array of broadcast chain tasks as video and audio find their way from the studio to production and transmission and then to the consumer.
• Consumer: Cost-effective solutions enabling next-generation, full-featured consumer applications, such as converged handsets, digital flat panel displays, information appliances, home networking, and residential set-top boxes.
• Industrial/Scientific/Medical: Industry-compliant solutions addressing market-specific needs and challenges in industrial automation, motor control, and high-end medical imaging.
• Storage & Server: Data processing solutions for Network Attached Storage (NAS), Storage Area Network (SAN), servers, storage appliances, and more.
• Wireless Communications: RF, baseband, connectivity, transport, and networking solutions for wireless equipment, addressing standards such as WCDMA, HSDPA, WiMAX, and others.
• Wired Communications: End-to-end solutions for reprogrammable networking linecard packet processing, framer/MAC, serial backplanes, and more.
Slide - 6
Who uses them
www.fpgajobs.com
Slide - 7
Why are they important
• They can revolutionize the way prototyping is done.
• They allow companies to get to market more quickly and stay in the market longer.
Slide - 8
Xilinx
• Largest manufacturer of HW
• Develops hardware and software
• Embedded PowerPC
• University Program
Slide - 9
Altera
• Second largest manufacturer
• Develops HW and SW
• University Program
Slide - 10
Which is best?
• It depends
– Time
– Existing resources
– Money
– Level of effort
– Preference
Slide - 11
Hardware/Software?
• Software: Quartus Software
• Hardware: DE2 FPGA board
Slide - 12
Welcome to the Quartus II Software!
Turn on or off in Tools > Options
Slide - 13
Altera DE2 Development Board
Slide - 14
Entity
• Describes all inputs and outputs
• Every VHDL design must have at least one entity
• Requires the use of identifiers for naming the entity itself as well as the inputs and outputs
• entity is a keyword and is reserved in VHDL for this purpose
entity <entity identifier> is
port (signal identifier);
end entity <entity identifier>;
ENTITY Or2 IS
PORT (x: IN std_logic; y: IN std_logic; F: OUT std_logic);
END Or2;
Slide - 15
Architecture
• The architecture declaration is where the operation of the logic function is specified
• For each entity there must be a corresponding architecture
• Each architecture must be associated by name with an entity
architecture <architecture name> of <entity name> is
begin
The description of the logic function goes here
end architecture <architecture name>;
ARCHITECTURE Or2_beh OF Or2 IS
BEGIN
PROCESS(x, y) BEGIN F <= x OR y; END PROCESS;
END Or2_beh;
Slide - 16
VHDL Processes
• The statements inside a process are executed in sequence
• The sensitivity list is the list of signals to which the process is sensitive; it is optional, but a process without one must use wait statements
Name: process (sensitivity list)
Declarations
begin
Sequential statements
end process;
PROCESS(x, y) BEGIN F <= x OR y; END PROCESS;
Slide - 17
VHDL Components
• Predefined logic
• Placed in a VHDL library and used repeatedly
• Any logic function can become a component and be used in large programs
component name_of_component is
port (port definition);
end component name_of_component;
COMPONENT And2 IS
PORT (x: IN std_logic; y: IN std_logic; F: OUT std_logic);
END COMPONENT;
Slide - 18
Conditional Statements
• if-then
• if-then-else
• elsif
• case
Slide - 19
If statement
• Causes a decision to be made
• When the if condition is true, the code following the if statement is executed
• When the if condition is false, the code between the if statement and the end if is skipped
if conditional statement then
VHDL statements
end if;
Slide - 20
If-Then-Else statement
• else is an alternative path for the if statement
if conditional statement then
VHDL statements
else
VHDL statements
end if;
Slide - 21
Elsif statement
• Used to allow multiple alternative paths
if conditional statement then
VHDL statements
elsif conditional statement then
VHDL statements
elsif conditional statement then
VHDL statements
end if;
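As a minimal sketch of the if / elsif / else chain, the hypothetical 4-input priority encoder below tests its request bits from highest to lowest priority. The entity name prio_enc4 and its ports are assumptions made for illustration and are not part of the course material.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example entity: 4-input priority encoder.
entity prio_enc4 is
  port (req   : in  std_logic_vector(3 downto 0);
        code  : out std_logic_vector(1 downto 0);
        valid : out std_logic);
end entity prio_enc4;

architecture behav of prio_enc4 is
begin
  process (req)
  begin
    valid <= '1';               -- default, overridden in the else branch
    if req(3) = '1' then        -- highest priority is tested first
      code <= "11";
    elsif req(2) = '1' then
      code <= "10";
    elsif req(1) = '1' then
      code <= "01";
    elsif req(0) = '1' then
      code <= "00";
    else                        -- no request active
      code  <= "00";
      valid <= '0';
    end if;
  end process;
end architecture behav;
```

Because every path through the chain assigns both outputs, the process describes purely combinational logic and no latch is implied.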
Slide - 22
Case statement example
case expression is
when choice =>
VHDL statement;
when choice =>
VHDL statement;
when others =>
VHDL statements;
end case;
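The case syntax above could be used, for instance, to describe a 4-to-1 multiplexer. This is only a sketch; the entity name mux4 and the port names are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example entity: 4-to-1 multiplexer.
entity mux4 is
  port (d0, d1, d2, d3 : in  std_logic;
        sel            : in  std_logic_vector(1 downto 0);
        y              : out std_logic);
end entity mux4;

architecture behav of mux4 is
begin
  process (d0, d1, d2, d3, sel)
  begin
    case sel is
      when "00"   => y <= d0;
      when "01"   => y <= d1;
      when "10"   => y <= d2;
      when others => y <= d3;   -- "11" and any non-binary value of sel
    end case;
  end process;
end architecture behav;
```

The when others branch is needed here because a std_logic_vector can take values other than the four binary combinations, and a case statement must cover every possible choice.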
Slide - 23
Processes in VHDL
• Processes describe sequential behavior
• Processes in VHDL are very powerful statements
– They allow you to define arbitrary behavior that may be difficult to represent with a real circuit
– Not every process can be synthesized
• Use processes with caution in code to be synthesized
• Use processes freely in testbenches
Slide - 24
Logic Operators
• Logic operators: and, or, nand, nor, xor, not, xnor (xnor only in VHDL-93)
• Logic operators precedence
Highest: not
Lowest: and, or, nand, nor, xor, xnor (all of equal precedence)
Slide - 25
Logic Operators - example
Order of evaluation
• Need to describe XOR using and, or, not
• C = a and not b or not a and b
• The and and or operators have equal precedence, so VHDL does not accept them mixed in one expression without parentheses; read left to right, the expression would be interpreted as:
• C = ((a and (not b)) or (not a)) and b
• C = (ab’ + a’)b, which is not correct
• Need to use parentheses as follows
• C = (a and not b) or (not a and b)
Associative logical operators
• and, or, xor, xnor are associative.
• f <= a and b and c; allowed
• nand and nor are not associative.
• g <= a nand b nand c; invalid
• g <= not (a and b and c); valid
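The parenthesized XOR expression can be dropped into a complete design unit as sketched below. The entity name xor_sop is an assumption for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example entity: XOR built from and, or, not.
entity xor_sop is
  port (a, b : in  std_logic;
        c    : out std_logic);
end entity xor_sop;

architecture dataflow of xor_sop is
begin
  -- The parentheses group the two product terms before they are ORed.
  c <= (a and not b) or (not a and b);
end architecture dataflow;
```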
Slide - 26
Loops
• A loop repeatedly executes the sequential statements contained within the loop structure
• for loop
– Entry point
– Iteration
– Terminal test
for identifier in starting value to stopping value loop
VHDL statements
end loop;
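A for loop might be used as in the following sketch, which XORs the eight bits of a bus together to produce a parity bit. The entity name parity8 and its ports are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example entity: parity of an 8-bit bus.
entity parity8 is
  port (data : in  std_logic_vector(7 downto 0);
        odd  : out std_logic);   -- '1' when data has an odd number of ones
end entity parity8;

architecture behav of parity8 is
begin
  process (data)
    variable p : std_logic;
  begin
    p := '0';
    for i in 0 to 7 loop        -- fixed number of iterations
      p := p xor data(i);
    end loop;
    odd <= p;
  end process;
end architecture behav;
```

The loop bounds are constants, so a synthesis tool can unroll the loop into a chain of XOR gates.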
Slide - 27
While loop
• A for loop stops after a fixed number of iterations
• A while loop continues to loop as long as its condition is true and exits when the condition becomes false
• Structure
– Entry point
– Terminal test
– Exit point
while Boolean expression loop
VHDL statements
end loop;
Slide - 28
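Data Types
• bit
• bit_vector
• integer
– natural
– positive
• boolean
• All are predefined VHDL types (identifiers declared in the standard package rather than reserved words)
• Data types define the type of data and the set of values that can be assigned to an object.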
Slide - 29
Integer Data Type
• Can contain positive and negative whole numbers
• The entity declaration sets a range
• In the example the output will require 4 pins for the integer
entity integer_1 is
port( A, B: in bit; Z: out integer range 0 to 15);
end entity integer_1;
Slide - 30
Natural data subtype
• A subtype of the integer data type
• Holds whole numbers greater than or equal to zero
• In an application, limit the range so that you limit the number of pins assigned
entity natural_1 is
port( A: in natural range 0 to 16; X: out natural range 0 to 31);
end entity natural_1;
Slide - 31
Positive data subtype
• A subtype of the integer data type
• Restricts integers to the range from 1 to the specified range limit.
entity positive_1 is
port( A, B: in bit; Z: out positive range 1 to 31);
end entity positive_1;
Slide - 32
Boolean Data Type
• Has two possible values: true and false
• In the example below two variables are declared, one as false and the other as true
variable v1: boolean := false;
variable v2: boolean := true;
Slide - 33
User-defined enumeration types - Examples
type state is (S0, S1);
type alu_function is (disable, pass, add, subtract, multiply, divide);
type octal_digit is ('0', '1', '2', '3', '4', '5', '6', '7');
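Enumeration types are commonly used to name the states of a state machine. The sketch below reuses the two-value type state from the examples; the entity toggle_fsm, its ports, and its behavior are assumptions chosen only to show the type in use.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example entity: two-state machine.
entity toggle_fsm is
  port (clk, reset, go : in  std_logic;
        busy           : out std_logic);
end entity toggle_fsm;

architecture behav of toggle_fsm is
  type state is (S0, S1);            -- user-defined enumeration type
  signal current : state := S0;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if reset = '1' then
        current <= S0;
      else
        case current is
          when S0 =>
            if go = '1' then
              current <= S1;         -- leave idle when go is asserted
            end if;
          when S1 =>
            current <= S0;           -- busy for exactly one clock cycle
        end case;
      end if;
    end if;
  end process;

  busy <= '1' when current = S1 else '0';
end architecture behav;
```

Since the case statement lists every value of the enumeration, no when others branch is required.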
Slide - 34
Functions and Procedures
• The two types of subprograms in VHDL
• Allow for modularization and code reuse
• Like a process, a subprogram contains sequential statements; think of a subprogram as a process that is located outside of the architecture of a program.
• A function is a subprogram that operates on a set of inputs and returns an output
• A procedure is a subprogram that operates on an argument list and passes values back through the argument list
• Functions and procedures must be called in order to execute
Slide - 35
Function syntax
FUNCTION function_name (<parameter_list>) RETURN data_type IS
[declarations]
BEGIN
(function statements)
RETURN value;
END function_name;
Slide - 36
Function example
function and_gate (X, Y: in std_logic) return std_logic is
begin
return X and Y;
end and_gate;
To call a function: the output of a function can be assigned to an output port of the same data type. Information is passed into the function by value.
AND1: x <= and_gate(A, B);
AND2: x <= and_gate('1', B);
Slide - 37
Procedure syntax
PROCEDURE procedure_name (<parameter_list>) IS
[declarations]
BEGIN
(procedure statements)
END procedure_name;
Procedure: similar to a function; however, the arguments of a procedure can include both inputs and outputs (a function has inputs only).
Slide - 38
Procedure example
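procedure or_gate(signal X, Y : in std_logic; signal Z : out std_logic) is
begin
Z <= X or Y;
end or_gate;
To call a procedure: inputs and outputs are used to pass data into and out of a VHDL procedure (same data types). In named association, each formal parameter (X, Y, Z) goes on the left of => and the actual signal on the right.
B1: or_gate (X => A, Y => B, Z => V1);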
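For context, a procedure like this could be declared in the declarative part of an architecture and then called concurrently, as in the sketch below. The entity or_demo and its ports are hypothetical. The parameters are declared with the signal class so that the signal assignment to Z is legal inside the procedure, and the call uses formal => actual named association.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example entity wrapping the procedure call.
entity or_demo is
  port (A, B : in  std_logic;
        V1   : out std_logic);
end entity or_demo;

architecture behav of or_demo is
  -- Signal-class parameters allow "<=" on Z inside the procedure.
  procedure or_gate (signal X, Y : in  std_logic;
                     signal Z    : out std_logic) is
  begin
    Z <= X or Y;
  end procedure or_gate;
begin
  -- Concurrent procedure call: re-executes whenever A or B changes.
  B1: or_gate (X => A, Y => B, Z => V1);
end architecture behav;
```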
Slide - 39
Libraries, Packages and Package Bodies
• They hold commonly used elements and allow them to be stored and reused without having to rewrite them.
• Components, procedures, and functions are placed in packages
• Packages can be user defined or vendor supplied
• Libraries are used to hold packages
Slide - 40
Libraries
• Two types
– Standard libraries (like the IEEE standard library)
– User defined (hold user-defined packages)
• IEEE Standard Library
VHDL library coding
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_1164.std_logic;
Keyword library: makes the packages in the IEEE library visible to the VHDL code.
Keyword use: tells the VHDL code what is to be used from the IEEE library.
You can name specific features from the package, or you can use the keyword all to make them all available.
Slide - 41
Packages
• Used to hold reusable code
– Components
– Functions
– Procedures
Package declaration
package user_defined_name is
package declarations
end package user_defined_name;
Slide - 42
Package Body
• The package body is where items listed in the package declaration are defined.
Package body syntax
package body user_defined_name is
package body definitions
end package body user_defined_name;
Slide - 43
Package containing a function (1)
LIBRARY IEEE;
USE IEEE.std_logic_1164.all;
PACKAGE specialFunctions IS
FUNCTION AndGate(A, B: in std_logic) RETURN std_logic;
END specialFunctions;
PACKAGE BODY specialFunctions IS
FUNCTION AndGate(A, B: in std_logic) RETURN std_logic IS
BEGIN
RETURN A AND B;
END AndGate;
END specialFunctions;
Slide - 44
Package containing a function (2)
• The package is saved as specialFunctions in the library named work, which is the default library.
• Once the package is compiled, it can be used by other VHDL programs.
• Example:
LIBRARY IEEE;
USE IEEE.std_logic_1164.all;
USE work.specialFunctions.all;
Entity ExamplePackage is
Port (A, B: in std_logic; X: out std_logic);
End Entity ExamplePackage;
Architecture MyGate of ExamplePackage is
Begin
Process (A, B)
Begin
A1: X <= AndGate(A, B);
End process;
End architecture MyGate;
Slide - 45
FPGA
• Introduced by Xilinx in the mid-1980s for implementing digital logic
• FPGA stands for Field Programmable Gate Array
• An FPGA can be visualized as a set of programmable logic blocks embedded in programmable interconnect
• The interconnect architecture provides the connectivity between logic blocks
• The programming technology determines the method of storing the configuration
Slide - 46
FPGA Re-programmable Logic Applications
• When the FPGA was first introduced, it was considered just another form of gate array
• The in-circuit reprogrammability of SRAM-based FPGAs provides more than a standard gate array can
• FPGAs have gained rapid acceptance and growth over the past decade because they can be applied to a very wide range of applications
– Random logic
– Custom computing machines
– Device controllers
– Communication encoding and filtering
• Programmable logic has become the dominant form of digital logic design and implementation
Slide - 47
FPGA design flow
• The design flow is the step-by-step methodology for carrying out an FPGA design
• The design flow can be divided into 6 basic steps
– Design Entry
– Functional Verification and Simulation
– FPGA Synthesis
– FPGA Place & Route
– Circuit Analysis (Timing, Power …)
– Programming FPGA devices
Slide - 48
Description of Design steps
• Design Entry – describes the design that is to be implemented on the FPGA
• Functional Verification and Simulation – checks the logical correctness of the design
• FPGA Synthesis – converts the design entry into the actual gates/blocks needed
• FPGA Place & Route – selects the optimal positions for the blocks and minimizes the length of the interconnections on the device
• Timing Analysis – determines the speed of the circuit once it has been completely placed and routed
• Programming the FPGA – downloads the bitstream onto the FPGA device
Slide - 49
FPGA Design Flow
design entry (VHDL)
FPGA Place and Route
FUNCTIONAL VERIFICATION& SIMULATION
FPGA Synthesis
Download to FPGA
CIRCUIT ANALYSIS
(Timing)
Let's put these design steps in order
Slide - 50
FPGA Design Flow
FUNCTIONAL VERIFICATION& SIMULATION
CIRCUIT ANALYSIS
(Timing)
Implementation PathAnalysis Path
design entry (VHDL)
FPGA Synthesis
FPGA Place and Route
Download to FPGA
Slide - 51
The origin of FPGA
• The first transistor was created at Bell Labs in 1947.
• The first phase-shift oscillator fabricated on a single chip was built by TI in 1958. Around the mid-1960s, TI introduced the 54xx and 74xx series.
• In 1971, Intel announced the world’s first microprocessor (the 4004), which contained 2,300 transistors and could execute 60,000 operations per second.
• The first programmable ICs, referred to as Programmable Logic Devices (PLDs), arrived in 1970 in the form of PROMs (simple compared with the later devices called Complex PLDs).
[Figure: timeline from 1945 to 2000 showing when each technology appeared: transistors, ICs (general), SRAMs & DRAMs, microprocessors, SPLDs, CPLDs, ASICs, FPGAs]
Slide - 52
PLDs
PLD (Programmable Logic Device)
– Contains thousands of basic logic gates in a single package
– Capable of performing advanced sequential functions
– Must be configured to perform a specific function
[Figure: PLD family tree. PLDs divide into SPLDs and CPLDs; SPLDs include PROMs, PLAs, PALs, GALs, etc.]
Slide - 53
PROMs
The first PLD
– Consists of a fixed (predefined) array of AND functions driving a programmable array of OR functions
– Example: a 3-input, 3-output PROM; each OR gate has 8 inputs with programmable links, and the device is used to implement simple logic functions
– Must be configured to perform a specific function
[Figure: unprogrammed PROM. Inputs a, b, c and their complements feed a predefined AND array that decodes addresses 0 to 7 (!a & !b & !c through a & b & c); a programmable OR array drives outputs w, x, y. Legend: predefined link, programmable link]
Slide - 54
PROMs – Example, 3-input, 3-output function
w = (a & b)
x = !(a & b)
y = (a & b) ^ c
Truth table
a b c | w x y
0 0 0 | 0 1 0
0 0 1 | 0 1 1
0 1 0 | 0 1 0
0 1 1 | 0 1 1
1 0 0 | 0 1 0
1 0 1 | 0 1 1
1 1 0 | 1 0 1
1 1 1 | 1 0 0
PROM programmed to implement the three functions w, x, and y
[Figure: the PROM with its programmable OR-array links set so that each address 0 to 7 of the predefined AND array drives the w, x, y values in the truth table. Legend: predefined link, programmable link]
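A PROM behaves like a small lookup memory addressed by its inputs. As a sketch only, the truth table for w = (a & b), x = !(a & b), y = (a & b) ^ c can be modeled in VHDL as a constant array indexed by the address (a b c). The entity name prom_example, the type rom_t, and the constant CONTENTS are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example entity: 8-word x 3-bit PROM model.
entity prom_example is
  port (a, b, c : in  std_logic;
        w, x, y : out std_logic);
end entity prom_example;

architecture rom of prom_example is
  type rom_t is array (0 to 7) of std_logic_vector(2 downto 0);
  -- One word per address (a b c, with a as the most significant bit).
  -- Each word holds the output bits in the order (w, x, y).
  constant CONTENTS : rom_t :=
    ("010", "011", "010", "011", "010", "011", "101", "100");
  signal addr : unsigned(2 downto 0);
  signal word : std_logic_vector(2 downto 0);
begin
  addr <= a & b & c;                       -- fixed "AND array": address decode
  word <= CONTENTS(to_integer(addr));      -- programmable "OR array": contents
  w <= word(2);
  x <= word(1);
  y <= word(0);
end architecture rom;
```

Changing the function only requires changing the constant, which mirrors how reprogramming the OR array changes a PROM's outputs without touching the address decoding.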
Slide - 55
PLAs
The first became available in 1975
– Both the AND and OR arrays are programmable.
– Example: a 3-input, 3-output PLA. Unlike a PROM, the number of AND functions is independent of the number of inputs.
– The number of OR functions is independent of both the number of AND functions and the number of inputs.
[Figure: unprogrammed PLA. Inputs a, b, c and their complements feed a programmable AND array (product terms not yet assigned, shown as N/A); a programmable OR array drives outputs w, x, y. Legend: predefined link, programmable link]
Slide - 56
PLAs - Example
w = (a & c) | (!b & !c)
x = (a & b & c) | (!b & !c)
y = (a & b & c)
PLA programmed to implement the three functions w, x, and y
[Figure: the PLA with its AND array programmed to form the product terms a & b & c, a & c, and !b & !c, and its OR array programmed to combine them into w, x, y. Legend: predefined link, programmable link]
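The PLA example can be sketched in VHDL by naming each product term once (the AND array) and then combining the shared terms (the OR array). The entity name pla_example and the signal names p_abc, p_ac, and p_nbnc are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example entity: the three PLA functions w, x, y.
entity pla_example is
  port (a, b, c : in  std_logic;
        w, x, y : out std_logic);
end entity pla_example;

architecture sop of pla_example is
  signal p_abc, p_ac, p_nbnc : std_logic;   -- product terms
begin
  -- Programmable AND array: three product terms
  p_abc  <= a and b and c;
  p_ac   <= a and c;
  p_nbnc <= (not b) and (not c);

  -- Programmable OR array: product terms shared between outputs
  w <= p_ac  or p_nbnc;
  x <= p_abc or p_nbnc;
  y <= p_abc;
end architecture sop;
```

Only three product terms are needed for three outputs, whereas the PROM always decodes all eight input combinations.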
Slide - 57
PALs
The first became available in the late 1970s
– The exact opposite of a PROM: programmable AND array, fixed (predefined) OR array.
– Example: a 3-input, 3-output PAL. PALs are faster because only one array is programmable.
– Only a restricted number of product terms can be ORed together.
[Figure: unprogrammed PAL. Inputs a, b, c and their complements feed a programmable AND array; a predefined OR array drives outputs w, x, y. Legend: predefined link, programmable link]
Slide - 58
CPLDs
The first Complex PLDs became available in the early 1980s
– The Mega-PAL was composed of four standard PALs with some interconnect linking them together.
– Altera introduced a CPLD based on EPROM technology.
– A generic device consists of a number of SPLD blocks sharing a common programmable interconnection matrix.
[Figure: generic CPLD. SPLD-like blocks and input/output pins surround a programmable interconnect matrix; a programmable multiplexer selects 30 wires out of 100 to feed a block]
Slide - 59
Programming PLDs
o Use a device programmer. Each vendor had its own file format, which made for a very time-consuming design flow.
o In 1980, a committee of the Joint Electron Device Engineering Council (JEDEC) proposed a standard format for PLD programming text files.
[Figure: (a) host computer and (b) device programmer; an unprogrammed device goes in and a programmed device comes out]
Slide - 60
ASICs
o Four main classes of ASIC (Application-Specific Integrated Circuit).
o Full custom: engineers have complete control over every mask layer used to fabricate the silicon chip. The ASIC vendor does not prefabricate any components on the silicon and does not provide any libraries of predefined logic gates and functions.
o The design process is highly complex and time consuming.
[Figure: the ASIC classes (structured ASICs, gate arrays, standard cell, full custom) arranged along an axis of increasing complexity]
Slide - 61
ASICs
o Gate arrays: based on the idea of a basic cell consisting of a collection of unconnected transistors and resistors.
o The ASIC vendor prefabricates silicon chips containing arrays of these basic cells.
o Channeled gate arrays come as either single-column or dual-column arrays.
o The vendor defines a set of logic functions (a MUX, for example), referred to as cells, for use by design engineers.
o The ASIC design flow is beyond the scope of this course.
[Figure: (a) single-column arrays and (b) dual-column arrays, showing I/O cells/pads, channels, and basic cells; (a) pure CMOS basic cell and (b) BiCMOS basic cell]
Slide - 62
FPGAs
o Around the early 1980s, there was a gap in the range of digital ICs.
o SPLDs and CPLDs were programmable and had fast design and modification times, but could not support large or complex functions.
o ASICs could support extremely large and complex functions, but were painfully expensive and time-consuming to design; once a design had been implemented, it was frozen in silicon.
o To address this gap, Xilinx developed a new class of IC called the Field-Programmable Gate Array (FPGA).
[Figure: “The GAP” between PLDs (SPLDs, CPLDs) and ASICs (gate arrays, structured ASICs*, standard cell, full custom), filled by FPGAs. *Not available circa early 1980s]
Slide - 63
FPGAs
o The first FPGAs were based on the concept of a programmable logic block containing a simple 3-input lookup table (LUT), a register, and a MUX.
o Each FPGA contained a large number of these programmable logic blocks embedded in a configurable routing architecture.
o Every block could be configured to perform a different function; the register could be configured to trigger on the positive or negative clock edge.
o The MUX feeding the flip-flop could be configured to accept the output of the LUT or a separate input to the logic block; the LUT could be configured to implement any 3-input logic function.
[Figure: programmable logic block with inputs a, b, c, d and clock; a 3-input LUT produces y, and a mux selects the flip-flop input that produces q]
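Slide - 64
FPGAs
o Example: configure the LUT to perform
o y = (a and b) or (not c)
Required function: y = (a & b) | !c
Truth table (the y column is the pattern loaded into the LUT's SRAM cells)
a b c | y
0 0 0 | 1
0 0 1 | 0
0 1 0 | 1
0 1 1 | 0
1 0 0 | 1
1 0 1 | 0
1 1 0 | 1
1 1 1 | 1
[Figure: programmed LUT. Eight SRAM cells hold the y column, and an 8:1 multiplexer addressed by a, b, c selects one cell as the output y]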
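A LUT of this kind can be sketched in VHDL as an 8-bit constant indexed by the three inputs. The contents below are worked out from the equation y = (a and b) or (not c); the entity name lut3_example and the constant name LUT_CONTENTS are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example entity: 3-input LUT model.
entity lut3_example is
  port (a, b, c : in  std_logic;
        y       : out std_logic);
end entity lut3_example;

architecture lut of lut3_example is
  -- "SRAM cell" contents. Index 0 corresponds to a b c = 000 and
  -- index 7 to a b c = 111, with a as the most significant address bit.
  constant LUT_CONTENTS : std_logic_vector(0 to 7) := "10101011";
  signal addr : unsigned(2 downto 0);
begin
  addr <= a & b & c;                         -- inputs form the address
  y    <= LUT_CONTENTS(to_integer(addr));    -- 8:1 multiplexer selects a cell
end architecture lut;
```

Loading a different 8-bit pattern turns the same structure into any other 3-input function, which is the point of a LUT-based logic block.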
Slide - 65
FPGAs
Large number of programmable logic blocks (islands) surrounded by a (sea) of programmable interconnect
[Figure: programmable logic blocks embedded in programmable interconnect]
Slide - 66
Altera’s Quartus II Tutorial
• Start the Quartus II software and prepare to implement the Boolean equation X = AB + CD.
Slide - 67
Altera’s Quartus II Tutorial
• Create a new project
• Create a block design file (bdf)
• Draw the digital logic for the Boolean equation
• Make the circuit connections
• Compile the project
Slide - 68
Altera’s Quartus II Tutorial
• Create a vector waveform file (vwf)
• Add inputs and outputs to the waveform display
• Create timing waveforms for the inputs
• Perform a functional simulation of the X output
Slide - 69
Altera’s Quartus II Tutorial
• Use the Altera development and education board to program an FPGA.
– Assign pins
– Recompile the project
– Program the FPGA
– Test the logic
• Use the VHDL text editor to recreate the design used in the block design.
• http://www.youtube.com/user/billkleitz#p/c/57F1D26AD6D50FA7/0/oVvmeyVMtEI
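A VHDL version of the tutorial equation X = AB + CD might look like the sketch below. The entity name tutorial_logic is an assumption; the tutorial itself may use different names.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity name for the tutorial equation.
entity tutorial_logic is
  port (A, B, C, D : in  std_logic;
        X          : out std_logic);
end entity tutorial_logic;

architecture dataflow of tutorial_logic is
begin
  -- Parentheses are required because and and or are mixed.
  X <= (A and B) or (C and D);
end architecture dataflow;
```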
Slide - 70
FPGA Programming Technology
SRAM-based FPGA
• Fabric: the underlying structure of the device.
• The majority of FPGAs are SRAM based. They can be configured over and over again.
• They benefit from memory R&D: SRAM cells are created using exactly the same technology as the rest of the device.
• Downside: they have to be reconfigured every time the system is powered up. The configuration file is stored in external memory.
• This raises security issues with protecting your IP.
• Some SRAM-based FPGAs support encryption.
Slide - 71
Antifuse-based FPGA
• Programmed offline using a special programmer.
• Nonvolatile: the configuration remains when power is off.
• No external memory device is needed to store configuration data.
• Applications: military and aerospace.
• Once programmed, it cannot be altered.
• No security issues with protecting your IP.
• Downside: they are one-time programmable (OTP); once programmed, the function is set in stone.
Slide - 72
EPROM/Flash-based FPGA
• Can be configured offline or using in-system programming.
• Nonvolatile: once programmed, the configuration is retained when power is off.
• Support protection mechanisms.
• Applications: military and aerospace.
Slide - 73
Summary
Feature | SRAM | Antifuse | E2PROM / FLASH
Technology node | State-of-the-art | One or more generations behind | One or more generations behind
Reprogrammable | Yes (in system) | No | Yes (in-system or offline)
Reprogramming speed (inc. erasing) | Fast | ---- | 3x slower than SRAM
Volatile (must be programmed on power-up) | Yes | No | No (but can be if required)
Requires external configuration file | Yes | No | No
Good for prototyping | Yes (very good) | No | Yes (reasonable)
Instant-on | No | Yes | Yes
IP Security | Acceptable (especially when using bitstream encryption) | Very Good | Very Good
Size of configuration cell | Large (six transistors) | Very small | Medium-small (two transistors)
Power consumption | Medium | Low | Medium
Rad Hard | No | Yes | Not really
Slide - 74
FPGA architectures (fine-, medium-, and coarse-grained)
• Reminder: a large number of programmable logic blocks (islands) embedded in a (sea) of programmable interconnect.
• Fine-grained: each logic block can be used to implement only a very simple function, such as any 3-input function.
• Coarse-grained: each logic block is relatively larger.
• As the granularity of the blocks increases to medium or coarse, the number of connections into the blocks decreases relative to the functionality they can support.
[Figure: programmable logic blocks embedded in programmable interconnect]
Slide - 75
MUX based logic block
• Consider the example y = (a AND b) OR c.
• Each input to the block is presented with a logic 0, a logic 1, or the true or inverse version of a signal.
• The function is implemented using MUXes.
[Figure: y = (a & b) | c drawn as an AND gate feeding an OR gate, and the same function built from 2:1 MUXes whose inputs are tied to 0, 1, a, b, or c, with intermediate signal x]
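One way to picture a MUX-based block is to build y = (a AND b) OR c from two 2:1 multiplexers whose data inputs are tied to constants or signals. The wiring below is an assumption chosen for illustration and may not match the figure exactly; the entity mux_logic and the helper function mux2 are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example entity: y = (a and b) or c built from 2:1 muxes.
entity mux_logic is
  port (a, b, c : in  std_logic;
        y       : out std_logic);
end entity mux_logic;

architecture mux_based of mux_logic is
  -- 2:1 multiplexer: returns d1 when sel = '1', otherwise d0.
  function mux2 (sel, d0, d1 : std_logic) return std_logic is
  begin
    if sel = '1' then
      return d1;
    else
      return d0;
    end if;
  end function mux2;

  signal x : std_logic;
begin
  -- AND stage: when a = '1' pass b, otherwise output logic 0.
  x <= mux2(sel => a, d0 => '0', d1 => b);
  -- OR stage: when x = '1' output logic 1, otherwise pass c.
  y <= mux2(sel => x, d0 => c, d1 => '1');
end architecture mux_based;
```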
Slide - 76
LUT based logic block
• Consider the example y = (a AND b) OR c.
• A group of input signals is used as an index (address) into the lookup table.
• Load the 3-input LUT with the appropriate values.
• The LUT is SRAM based.
Required function: y = (a & b) | c
Truth table
a b c | y
0 0 0 | 0
0 0 1 | 1
0 1 0 | 0
0 1 1 | 1
1 0 0 | 0
1 0 1 | 1
1 1 0 | 1
1 1 1 | 1
Slide - 77
LUT based logic block
[Figure: transmission-gate implementation of the LUT. SRAM cells feed a tree of transmission gates (active low and active high) controlled by a, b, and c, which selects one cell as the output y]
Slide - 78
MUX versus LUT logic block?
• The majority of today’s FPGA architectures are LUT based.
• MUX-based architectures do not provide high-speed carry logic chains, so LUT-based devices lead in anything to do with arithmetic processing.
• The first FPGAs were based on 3-input LUTs.
• 4-input LUT architectures are now the most common.
Slide - 79
CLBs versus LABs?
• An FPGA cannot live by LUTs alone.
• Logic blocks also contain other elements such as MUXes and registers.
[Figure: a 4-input LUT can also act as a 16 x 1 RAM or a 16-bit shift register (SR)]
Slide - 80
Xilinx logic cell
• Each vendor has its own names for things.
• Xilinx calls it a logic cell (LC), which comprises:
– 4-input LUT
– MUX
– Register
• The clock can be configured as rising or falling edge.
• The register can be configured as a flip-flop or as a latch.
• Altera calls it a logic element (LE).
[Figure: logic cell with inputs a, b, c, d, e, clock, clock enable, and set/reset; a 4-input LUT (also usable as a 16x1 RAM or a 16-bit SR) produces y, and a mux selects the flip-flop input that produces q]
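A behavioral sketch of such a logic cell is shown below: a 4-input LUT, a mux choosing the flip-flop's data source, and a register with clock enable and synchronous reset. This is a simplified teaching model, not a vendor primitive; the entity name logic_cell, the generics LUT_INIT and D_FROM_LUT, and the port names are all assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical, simplified model of a LUT + MUX + register logic cell.
entity logic_cell is
  generic (
    -- LUT contents; index 0 is a b c d = 0000. The default gives a 4-input AND.
    LUT_INIT   : std_logic_vector(0 to 15) := "0000000000000001";
    -- Configuration bit: true = register the LUT output, false = register e.
    D_FROM_LUT : boolean := true);
  port (clk, ce, sr : in  std_logic;   -- clock, clock enable, reset
        a, b, c, d  : in  std_logic;   -- LUT inputs
        e           : in  std_logic;   -- separate input to the register mux
        y           : out std_logic;   -- combinational LUT output
        q           : out std_logic);  -- registered output
end entity logic_cell;

architecture behav of logic_cell is
  signal addr    : unsigned(3 downto 0);
  signal lut_out : std_logic;
  signal ff_d    : std_logic;
begin
  addr    <= a & b & c & d;
  lut_out <= LUT_INIT(to_integer(addr));
  y       <= lut_out;

  -- MUX feeding the flip-flop
  ff_d <= lut_out when D_FROM_LUT else e;

  process (clk)
  begin
    if rising_edge(clk) then
      if sr = '1' then
        q <= '0';
      elsif ce = '1' then
        q <= ff_d;
      end if;
    end if;
  end process;
end architecture behav;
```

The generics stand in for configuration bits that are fixed when the device is programmed, which is why they are not ports.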
Slide - 81
Slicing
• The next step up the hierarchy is a slice.
• A slice has one set of clock, clock enable, and set/reset signals common to both logic cells.
[Figure: a slice containing two logic cells (LC), each with a LUT (4-input LUT, 16x1 RAM, or 16-bit SR), a MUX, and a register]
Slide - 82
CLBs versus LABs?
• The next step up is the configurable logic block (CLB) for Xilinx and the logic array block (LAB) for Altera.
• Xilinx has two or more slices in each CLB; the example shows four slices per CLB. In addition, there is fast programmable interconnect within the CLB to connect neighboring slices.
[Figure: an array of CLBs; each configurable logic block (CLB) contains four slices, and each slice contains two logic cells]
Slide - 83
CLBs versus LABs?
• Each 4-input LUT can be used as a 16x1 RAM. With four slices per CLB, all the LUTs in a CLB can be configured together to implement the following:
– Single-port 16x8-bit RAM
– Single-port 32x4-bit RAM
– Single-port 64x2-bit RAM
– Single-port 128x1-bit RAM
– Dual-port 16x8-bit RAM
– Dual-port 32x4-bit RAM
– Dual-port 64x2-bit RAM
• Each 4-input LUT can also be used as a 16-bit shift register
Slide - 84
Embedded RAMs
• Every application needs memory.
• FPGAs now include large chunks of embedded RAM called e-RAM or block RAM.
• The blocks are usually organized in columns.
• Each block can be used independently, or multiple blocks can be combined to implement larger blocks.
• Useful for implementing single- and dual-port RAMs, FIFOs, state machines, and so on.
[Figure: columns of embedded RAM blocks among arrays of programmable logic blocks]
Slide - 85
Embedded multipliers, adders, …
• Some functions are inherently slow if they are implemented by connecting a large number of programmable logic blocks.
• Many FPGAs incorporate special hard-wired multiplier blocks.
• These are located in close proximity to the embedded RAM blocks.
[Figure: columns of RAM blocks and multipliers among the logic blocks]
Slide - 86
Embedded multipliers, adders, …
• Some FPGAs offer dedicated adder blocks (very useful in DSP applications).
• A common DSP operation is multiply-and-accumulate (MAC).
• If the FPGA only provides multiplier blocks, you can combine a multiplier with an adder and store the results in registers.
[Figure: MAC. A multiplier with inputs A[n:0] and B[n:0] feeds an adder and an accumulator register that produces Y[(2n - 1):0]]
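A multiply-and-accumulate function can be sketched in VHDL as a multiplier feeding an adder whose result is stored in a register. The entity name mac8, the 8-bit operand width, and the 20-bit accumulator width are assumptions chosen for illustration; whether a tool maps this onto dedicated multiplier or MAC blocks depends on the device and the tool.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example entity: 8 x 8 multiply-and-accumulate.
entity mac8 is
  port (clk, clear : in  std_logic;
        a, b       : in  unsigned(7 downto 0);
        acc        : out unsigned(19 downto 0));
end entity mac8;

architecture rtl of mac8 is
  signal sum : unsigned(19 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if clear = '1' then
        sum <= (others => '0');       -- restart the accumulation
      else
        -- a * b is a 16-bit product; the 20-bit accumulator leaves
        -- headroom, but the sum still wraps around if it overflows.
        sum <= sum + (a * b);
      end if;
    end if;
  end process;

  acc <= sum;
end architecture rtl;
```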
Slide - 87
Embedded processor cores
• Many applications make use of microprocessors in one form or another.
• High-end FPGAs contain one or more embedded microprocessors, referred to as microprocessor cores.
• Hard processor cores are dedicated, predefined blocks. One option is to locate the core in a strip (the “stripe”) beside the main fabric. Advantage: the main FPGA fabric stays identical, which is easier for the design tools.
[Figure: the “stripe”, holding the microprocessor core (uP), special RAM, peripherals and I/O, etc., alongside the main FPGA fabric]
Slide - 88
Embedded processor cores
• The other option is to embed the core within the main fabric; the design tools then need to account for the presence of these blocks in the fabric.
[Figure: (a) one embedded core and (b) four embedded cores (uP) within the main fabric]
Slide - 89
Embedded processor cores - soft
• Configure a group of programmable logic blocks to act as a microprocessor; these are called soft cores.
• Soft cores are simpler and slower than hard cores.
• Advantages:
– You implement one only if you need it.
– You can instantiate as many as you need.
Slide - 90
Clock trees and clock managers
• All of the synchronous elements need to be driven by clock signals.
• The clock signal originates outside the FPGA, enters the FPGA via a special clock input pin, and is then routed through the device.
• Clock tree: the main clock signal branches again and again. This structure ensures that all of the flip-flops see their versions of the clock signal as close together in time as possible.
[Figure: a clock signal from the outside world enters through a special clock pin and pad and fans out through the clock tree to the flip-flops]
Slide - 91
Clock trees and clock managers – cont’d
• If the clock were distributed as a single long track driving all registers one after another, the registers closer to the clock pin would see the clock signal sooner. This is referred to as skew and should be avoided.
• The clock tree is implemented using special tracks, separate from the general-purpose programmable interconnect.
• Usually, you will have multiple clock domains and multiple clock pins.
Slide - 92
Clock trees and clock managers – cont’d
• Instead of connecting the clock pin directly to an internal clock tree, the pin can drive a special hard-wired function (block) called a clock manager.
• The clock manager generates a number of daughter clocks.
• Daughter clocks can drive internal clock trees or external output pins to provide an external clock.
[Figure: a clock signal from the outside world enters through a special clock pin and pad into the clock manager, which produces daughter clocks used to drive internal clock trees or output pins]
Slide - 93
Clock trees and clock managers – cont’d
• Each FPGA family has its own type of clock manager.
• Clock managers support jitter removal. Jitter means that clock edges may arrive a little early or a little late.
• The FPGA clock manager can be used to detect and correct this jitter and to provide clean daughter clock signals for use inside the device.
[Figure: an ideal clock signal compared with a real clock signal with jitter, with cycles 1 to 4 superimposed; a clock signal with jitter from the outside world enters through the special clock pin and pad, and the clock manager produces “clean” daughter clocks used to drive internal clock trees or output pins]
Slide - 94
Clock trees and clock managers – cont’d
• Frequency synthesis: the frequency of the outside clock is not always what the engineers wish for.
• The clock manager can be used to generate daughter clocks with frequencies derived from the original clock.
• Example: three daughter clocks at 1.0x, 2.0x, and 0.5x the original clock frequency.
[Figure: waveforms at 1.0x, 2.0x, and 0.5x the original clock frequency]
Slide - 95
Clock trees and clock managers – cont’d
• Phase shifting: some designs require clocks that are phase shifted (delayed) with respect to each other.
• Clock managers allow you to select from fixed phase shifts of 90°, 180°, and 270°, or to configure the exact amount of phase shift.
• Example: the first clock is in phase with the original, the second is shifted by 90°, and so forth.
[Figure: daughter clocks phase shifted by 0°, 90°, 180°, and 270°]
Slide -
01
54
6
7
3
2
General-purpose I/Obanks 0 through 7
96
General Purpose I/O
• Today’s FPGA packages can have 1,000 or more pins, arranged as an array across the base of the package.
• Each general-purpose I/O can be configured to accept and generate signals conforming to whichever I/O standard is required.
• The general-purpose I/Os are split into a number of banks, here numbered 0 through 7.
Slide -
97
General Purpose I/O – cnt’d
• Each bank can be configured to support a particular I/O standard:
  – LVTTL
  – LVCMOS
  – PCI
  – LVDS
• This allows the FPGA to work with multiple I/O standards, or to translate between different protocols that are based on particular electrical standards.
Slide - 98
FPGA Families
• Many different types suited for almost every kind of application.
• FPGAs are grouped into categories, often referred to as families or series, each with common characteristics.
• Families may be characterized by attributes such as high volume, low cost, or high-temperature operation, and are available in various sizes, packages, and speeds.
• Manufacturers also group FPGAs according to their application (automotive, space, medical, etc.).
Slide - 99
Altera Families
• Altera refers to its FPGA families as series
• Stratix:
  – High end and high density
  – On-chip transceivers
• Arria:
  – Midrange
  – Transceiver based
• Cyclone:
  – Low cost
  – Low power consumption
Slide -
A Complete Solutions Portfolio
100
• High-density, high-performance FPGAs
• Mid-range transceiver FPGAs
• Low-cost FPGAs
• CPLDs
• ASICs
• Design software
• Development kits
• Embedded soft processors
• Intellectual Property (IP)
Slide - 101
Altera – IP
• Many FPGA manufacturers offer a variety of intellectual property (IP) cores or functions.
• IP cores allow the designer to select and customize a specific desired function.
• Advantages:
  – Faster code development time
  – Reduced design risk (less likelihood of errors)
  – Better and faster compiling
• Some IP cores or functions are free; others are fee based. IP cores or functions are manufacturer dependent.
• Altera’s IP functions are called megafunctions and are designed only for Altera FPGAs.
Slide -
Altera Megafunctions
• Pre-made design blocks
• Benefits
  – Configurable, parameterized settings add flexibility & portability
  – “Drop-in” support to accelerate design entry
  – Pre-optimized for Altera architecture
• Two versions
  – Quartus II megafunctions
  – Intellectual Property (IP) megafunctions
102
Slide -
Quartus II Megafunctions
• Free & installed with Quartus II software
  – Non-encrypted functions written in AHDL (Altera HDL)
  – HDL simulation models installed in Quartus II libraries
• Two types
  – Altera-specific megafunctions (begin with “ALT”)
  – Library of parameterized modules (LPMs)
• Examples
  – Arithmetic
  – On-chip RAM/ROM
  – PLLs
  – DDR/QDR/RLDRAM memory controllers
103
Slide -
IP Megafunctions
• Must purchase license (except IP base suite)
  – Logic for IP function is encrypted
• Two types
  – MegaCore® IP: developed by Altera
  – Altera Megafunctions Partner Program (AMPP℠) IP
• All MegaCore functions & some AMPP functions support OpenCore® Plus feature
  – Develop design using free version of core
  – HDL simulation models provided with IP
  – Generate time-limited configuration/programming files
  – See AN320: OpenCore Plus Evaluation of Megafunctions
104
Slide -
MegaCore IP Examples
• Included in IP base suite
  – FIR Compiler
  – Fast Fourier Transform
  – DDR/DDR2 High Performance Memory Controller
• License required
  – Triple-Speed Ethernet MAC
  – CRC Compiler
  – PCI Compiler
105
See http://www.altera.com/products/ip/ipm-index.html for a complete list of Altera IP solutions
Slide -
MegaWizard Plug-in Manager
• Eases implementation and configuration of megafunctions & IP
• GUI, command line, or both
106
Command line: qmegawiz <-silent> <module | wizard>=<mf_name> <ports & parameters options> file_name
GUI: Tools > MegaWizard Plug-In Manager, or the Tasks window
Screenshot callouts: select the megafunction or IP, the language, and the file name.
Slide -
MegaWizard Example
107
Screenshot callouts (Multiply-Add megafunction):
• Updating graphical representation
• Customization options
• Locate documentation in Quartus II Help or on the web
• Three-step process to configure the megafunction
Slide -
MegaWizard Output File Selection
108
Slide -
Programming an FPGA – configuration cells
• Configuration file: contains the information that will be uploaded into the FPGA in order to program it (bit file).
• Programming is simple in concept: load the configuration file into the device.
• Programmable interconnect: connects the device’s primary inputs and outputs to the programmable logic blocks, and the blocks to each other.
Figure: programmable logic blocks surrounded by programmable interconnect.
109
Slide -
Programming an FPGA – configuration cells
110
• The figure illustrates two applications of SRAM cells in SRAM-controlled switches: controlling the gate nodes of pass-transistor switches, and controlling the select lines of multiplexers that drive logic block inputs.
• In the example, one logic block (represented by the AND gate in the upper left corner) is connected to another through two pass-transistor switches and then a multiplexer, all controlled by SRAM cells.
Slide -
Figure: a simple programmable logic block with inputs a, b, c, d, e and a clock; a 4-input LUT (output y), a mux, and a flip-flop (output q).
Programming an FPGA – configuration cells
• A simple programmable logic block: 4-input LUT, MUX, and a register.
• Configuration cells determine:
  – MUX: which input is to be selected.
  – Register: edge-triggered flip-flop or latch, positive or negative clock edge, active-low or active-high enable, and whether it is initialized to 0 or 1.
  – LUT: 16 configuration cells
111
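To make the "16 configuration cells" concrete, here is a minimal behavioral sketch of a 4-input LUT. It is an illustration only, not a vendor primitive: the entity name lut4_sketch, the generic name INIT, and the bit ordering (bit i of INIT is the output for input value i) are all assumptions made for this example.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;

ENTITY lut4_sketch IS
    GENERIC (
        -- Hypothetical stand-in for the 16 configuration cells.
        -- Bit i holds the output for input value i, so x"8000"
        -- (only bit 15 set) behaves as a 4-input AND gate.
        INIT : STD_LOGIC_VECTOR (15 DOWNTO 0) := x"8000"
    );
    PORT (
        a, b, c, d : IN  STD_LOGIC;
        y          : OUT STD_LOGIC
    );
END ENTITY lut4_sketch;

ARCHITECTURE logic OF lut4_sketch IS
    SIGNAL addr : STD_LOGIC_VECTOR (3 DOWNTO 0);
BEGIN
    -- The four inputs form an address into the 16-entry truth table
    addr <= d & c & b & a;
    y    <= INIT (to_integer (unsigned (addr)));
END ARCHITECTURE logic;
```

Changing INIT changes the logic function without changing the structure, which is the idea behind configuring a LUT.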
Slide -
Figure: I/O pins/pads and SRAM configuration cells linked in one chain from “configuration data in” to “configuration data out”.
Programming an FPGA – SRAM based
• Volatile: the device has to be programmed in-system and must be reprogrammed every time power is first applied to the system.
• All SRAM configuration cells can be viewed as one long shift register. The beginning and end of the register are accessible from the outside world.
• Configuration data out is only used when multiple FPGAs are configured by cascading (daisy-chaining) them together.
• An FPGA can contain 25 million configuration cells, which means clocking 25 million bits of configuration data into the device.
112
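The "long shift register" view of the configuration cells can be sketched as follows. This is a conceptual model under stated assumptions, not how any particular device implements configuration: the names cfg_chain_sketch, N_CELLS, cclk, cdata_in, cdata_out, and cells are hypothetical, and N_CELLS is assumed to be at least 2.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY cfg_chain_sketch IS
    GENERIC (N_CELLS : POSITIVE := 16);  -- a real device has millions of cells
    PORT (
        cclk      : IN  STD_LOGIC;       -- configuration clock (assumed name)
        cdata_in  : IN  STD_LOGIC;       -- configuration data in
        cdata_out : OUT STD_LOGIC;       -- configuration data out (for daisy-chaining)
        cells     : OUT STD_LOGIC_VECTOR (N_CELLS - 1 DOWNTO 0)
    );
END ENTITY cfg_chain_sketch;

ARCHITECTURE logic OF cfg_chain_sketch IS
    SIGNAL chain : STD_LOGIC_VECTOR (N_CELLS - 1 DOWNTO 0);
BEGIN
    PROCESS (cclk)
    BEGIN
        IF rising_edge (cclk) THEN
            -- Each clock shifts one configuration bit into the chain
            chain <= chain (N_CELLS - 2 DOWNTO 0) & cdata_in;
        END IF;
    END PROCESS;

    cdata_out <= chain (N_CELLS - 1);  -- end of the chain feeds the next device
    cells     <= chain;                -- each bit would control a LUT entry, mux, or switch
END ARCHITECTURE logic;
```

Connecting cdata_out of one instance to cdata_in of the next models the daisy-chain case mentioned above.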
Slide -
Programming an FPGA – SRAM based
• LUT: can be configured to act as a LUT, as a 16x1 chunk of distributed RAM, or as a 16-bit shift register.
• Configuration port: a small dedicated group of pins used to inform the device which configuration mode is going to be used; two pins provide four modes.
• Mode pins are hardwired to the desired logic level (0 or 1).

Mode pins   Mode
0 0         Serial load with FPGA as master
0 1         Serial load with FPGA as slave
1 0         Parallel load with FPGA as master
1 1         Parallel load with FPGA as slave
113
Slide -
Programming an FPGA – SRAM based
Serial load with FPGA as a master
• Simplest mode: uses an external PROM (now flash) that has a single data output pin connected to the FPGA’s configuration data in pin.
• The FPGA uses several bits to control the external memory device (reset, clock).
• The FPGA clocks the configuration data out of the memory device.
• Configuration data out can be used to read the configuration data back from the device, or to daisy-chain several FPGAs that share a single memory device.
Figure: memory device connected to the FPGA (control signals; configuration data in to Cdata In). In the daisy-chain variant, each FPGA’s Cdata Out feeds the next FPGA’s Cdata In.
114
Slide -
Programming an FPGA – SRAM based
Parallel load with FPGA as a master
• Very similar to serial mode, except that data is read from the memory device in 8-bit chunks.
• The FPGA also supplies the external memory with an address bus. An internal counter in the FPGA generates the address for the external memory and keeps incrementing.
• Does it offer more speed? Not in early devices, where the data read still had to be clocked in serially; in current devices, yes.
• Signal integrity can be an issue with an 8-bit data bus and a 24-bit address bus. Newer external memories do not require an external address, so the FPGA no longer requires the counter.
Figure: memory device connected to the FPGA by control signals and configuration data [7:0] (Cdata In[7:0]), shown with and without an address bus.
115
Slide -
Programming an FPGA – SRAM based
Parallel load with FPGA as a slave
• FPGA as a master is attractive because only the FPGA and the external memory are involved.
• Alternatively, a microprocessor can be used to load the FPGA: it informs the FPGA to start the configuration process, reads a byte of data from the memory device, and writes it into the FPGA.
Figure: microprocessor with address, data, and control buses connected to the memory device, the FPGA (Cdata In[7:0]), and other peripherals, ports, etc.
116
Slide -
Figure: JTAG data in passes through a chain of JTAG flip-flops (each fed from the previous JTAG flip-flop and feeding the next) that sit between the input/output pads and the internal logic, and leaves at JTAG data out.
Using the JTAG port
• Today’s FPGAs are equipped with a JTAG port (Joint Test Action Group, IEEE 1149.1 standard), originally used for testing circuit boards.
• JTAG port: input data, output data; the JTAG registers are daisy-chained.
• Data is clocked serially into the JTAG registers, the FPGA operates on the data, and the results are ultimately clocked back out of the JTAG port.
117
Slide -
Using the JTAG port
• JTAG can be used for more than boundary scan: the FPGA can connect its SRAM configuration shift register to the JTAG scan chain, in which case JTAG can be used to program the FPGA.
• Today’s FPGAs can support five different programming modes and thus require three mode pins.

Mode pins   Mode
0 0 0       Serial load with FPGA as master
0 0 1       Serial load with FPGA as slave
0 1 0       Parallel load with FPGA as master
0 1 1       Parallel load with FPGA as slave
1 x x       Use only the JTAG port
118
Slide -
Figure: FPGA with a primary scan chain (JTAG data in to JTAG data out) and an internal scan chain for the processor core.
Using an embedded processor
• When an FPGA contains an embedded processor, the processor may have its own dedicated JTAG port.
• JTAG can be used to initialize the internal microprocessor core; configuration can then be handled by the processor.
119
Slide -
FPGA Design Flow – Design Phase
• The first development phase is Design
• FPGA design can be:
  – Converting a schematic to HDL
  – Modifying an existing design
  – A totally new design
• Very critical phase?
• Goals:
  – Learn how to evaluate the design package
  – Decisions to make prior to creating the design
  – How to create the design
120
Figure: FPGA design flow.
• Implementation path: design entry (VHDL), FPGA synthesis, FPGA place and route, download to FPGA
• Analysis path: functional verification & simulation, circuit analysis (timing)
Slide -
Design Phase
• More than just creating the design
• The design materials must be understood. The “design package” contains the requirements that define the FPGA’s features and functions: what the design must do and how.
• Success or failure of the design largely depends on:
  – The quality of the design inputs
  – Making key decisions
  – Development tools
121
Slide -
Design Package
• Usually written by a system engineer or architect.
• Includes:
  – Creation of the design architecture
  – Partitioning of the design into sections
  – Creation of design requirements
  – Creation of timing and other diagrams (supporting documents)
  – Do not create your own requirements; always ask.
• You should always evaluate its content prior to starting the design.
122
Slide -
Design Package example
• Contents: timing diagrams, requirement documents, state machines, schematics, etc.
• Evaluate (package analysis): be sure to have a clear understanding of what you are to design. If you have questions, always ask.
• Getting clarification: not all design packages are crystal clear; go directly to the source.
• Organize: make sure you work from the latest and most accurate information.
123
Slide -
Pre-design Decisions
• Design format? FPGA vendor? Tools used?
• The design requirements may define one or more of the pre-design decisions.
• Making one decision can automatically determine another: selecting Altera as the FPGA vendor determines the Quartus tool.
• For manufacturer-dependent designs, the manufacturer must be known in the design phase; for manufacturer-independent designs, the manufacturer and part number are not needed until the synthesis phase.
124
Slide -
Design Format
• Prior to creating a design, you must select the design’s format: schematic capture, HDL, or a combination.
• Sometimes the decision has already been made by your design package. You select the manufacturer and development tools.
• If you are starting a new design, you may have the option to select the design format.
• Schematic capture:
  – Pros: the design is drawn as a schematic; easier to create, read, and understand.
  – Cons:
    • Logic symbols are proprietary, so the design is manufacturer dependent and less flexible.
    • Options for development tools are limited.
• HDL:
  – Pros: more design and manufacturer flexibility; manufacturer independent.
  – Cons: may be difficult to read and understand.
125
Slide -
FPGA Manufacturer
• How to select a device: you need to know how many resources your design requires, which can be difficult at first.
• A good way: pick a device arbitrarily, synthesize the design, and review the resources required in the output report.
• With this information, use a datasheet to select a more appropriately sized device.
• Factors to consider when selecting the device:
  – Design application: avionics, military, automotive, medical, and so forth
  – Environment: military, industrial, commercial
  – Temperature range: commercial 0 to 85 °C, industrial -40 to 100 °C, military -55 to 125 °C
  – Design size: allocated board space, package
126
Slide -
Development Tools
• Each development phase utilizes specific tools. The design-phase development tool depends mainly on the output format: if your design uses schematic capture, the design entry tool must support schematic capture.
• Cost: fees can be very expensive (license fees, yearly maintenance); know your needs!
• Design sharing: have a set of tools to manage and control the design and its revisions.
• Complete or standalone: manufacturers offer complete development tool suites (Altera’s Quartus, Xilinx ISE). Standalone tools perform a single function, such as simulation or synthesis, for example Mentor Graphics’ ModelSim for simulation and Synopsys’s Synplify for design synthesis.
127
Slide -128
Advanced VHDL – Design Phase
• Writing synthesizable VHDL
• Inferring common logic functions
• Coding state machines
• Improving logic utilization & performance
• Writing parameterized code
Slide -129
Simulation vs. Synthesis
• Simulation
  – Code executed in the exact way it is written
  – User has flexibility in writing
  – Initialization of logic supported
• Synthesis
  – Code is interpreted & hardware created
    • Knowledge of PLD architecture is important
  – Synthesis tools require certain coding to generate correct logic
    • Subset of VHDL language supported
    • Coding style is important for fast & efficient logic
  – Initialization controlled by device
  – Logic implementation can be adjusted to support initialization
• Pre- & post-synthesis logic should operate the same
Slide -130
Writing Synthesizable VHDL
• Synthesizable VHDL constructs
• Sensitivity lists
• Latches vs. registers
• IF-THEN-ELSE structures
• CASE statements
• Variables
• Synthesizable subprograms
• Combinatorial loops
• Gated clocks
Slide -131
Some Synthesizable VHDL Constructs
• ENTITY
• ARCHITECTURE
• CONFIGURATION
• PACKAGE
• Concurrent signal assignments
• PROCESS
• SIGNAL
• VARIABLE (non-shared)
• CONSTANT
• IF-ELSE
• CASE
• Loops (fixed iteration)
• Multi-dimensional arrays
• PORT
• GENERIC (constant)
• COMPONENT
• Component & direct instantiation
• GENERATE
• FUNCTION
• PROCEDURE
• ASSERT (constant false)
• WAIT (one per process)
• TYPE
• SUBTYPE
  – Synthesis tools may place certain restrictions on supported constructs
  – See the online help in Quartus II (or your target synthesis tool) for a complete list
Slide -132
Some Non-Synthesizable VHDL Constructs
• ACCESS
• ASSERT
• DISCONNECT
• FILE
• GROUP
• NEW
• Physical delay types
• PROTECTED
• SHARED VARIABLE
• Signal assignment delays
  – These are some of the constructs not supported by Quartus II synthesis
  – See the online help in Quartus II (or your target synthesis tool) for a complete list
Slide -133
Two Types of RTL PROCESS Statements
• Combinatorial PROCESS
  – Sensitive to all signals used on the right-hand side of assignment statements
  – Sensitivity list includes all inputs used in the combinatorial logic
  – Example: PROCESS (a, b, sel)
• Sequential PROCESS
  – Sensitive to a clock and control signals
  – Sensitivity list does not include the d input, only the clock and/or control signals
  – Example: PROCESS (clr, clk)
Figure: combinatorial logic with inputs a, b, sel and output c; a register (D, Q, ENA, CLRN) with inputs d, clk, clr and output q.
Slide -134
Sensitivity Lists
• An incomplete sensitivity list in a combinatorial PROCESS block may result in differences between RTL & gate-level simulations
  – The synthesis tool synthesizes as if the sensitivity list were complete

Incorrect way: c is missing, so the simulated behavior is not that of the synthesized 3-input AND gate

    PROCESS (a, b)
    BEGIN
        y <= a AND b AND c;
    END PROCESS;

Correct way for the intended AND logic

    PROCESS (a, b, c)
    BEGIN
        y <= a AND b AND c;
    END PROCESS;
Slide - 135
Common Pitfall – Missing Inputs from Sensitivity List
• Pitfall: missing inputs from the sensitivity list when describing combinational behavior
  – Results in sequential behavior
  – Wrong 4x1 mux example
    • Has memory
    • No compiler error
  – Just not a mux

    LIBRARY ieee;
    USE ieee.std_logic_1164.ALL;

    ENTITY Mux4 IS
        PORT (i3, i2, i1, i0 : IN  std_logic;
              s1, s0         : IN  std_logic;
              d              : OUT std_logic);
    END Mux4;

    ARCHITECTURE Beh OF Mux4 IS
    BEGIN
        -- Note: missing i3, i2, i1, i0
        PROCESS (s1, s0)
        BEGIN
            IF (s1='0' AND s0='0') THEN
                d <= i0;
            ELSIF (s1='0' AND s0='1') THEN
                d <= i1;
            ELSIF (s1='1' AND s0='0') THEN
                d <= i2;
            ELSE
                d <= i3;
            END IF;
        END PROCESS;
    END Beh;

• Missing i3-i0 from sensitivity list
  – Recomputes d if s1 or s0 changes
  – Fails to recompute d if i3 (or i2-i0) changes

Reminder
• Combinational behavior: Output value is purely a function of the present input values
• Sequential behavior: Output value is a function of present and past input values, i.e., the system has memory
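For comparison, a corrected version of the mux architecture might look like the sketch below. The only change is the sensitivity list; the architecture name Beh_fixed is made up for this example.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY Mux4 IS
    PORT (i3, i2, i1, i0 : IN  std_logic;
          s1, s0         : IN  std_logic;
          d              : OUT std_logic);
END Mux4;

ARCHITECTURE Beh_fixed OF Mux4 IS
BEGIN
    -- Every signal read inside the process appears in the sensitivity list,
    -- so d is recomputed whenever a data input or a select input changes.
    PROCESS (i3, i2, i1, i0, s1, s0)
    BEGIN
        IF (s1 = '0' AND s0 = '0') THEN
            d <= i0;
        ELSIF (s1 = '0' AND s0 = '1') THEN
            d <= i1;
        ELSIF (s1 = '1' AND s0 = '0') THEN
            d <= i2;
        ELSE
            d <= i3;
        END IF;
    END PROCESS;
END Beh_fixed;
```

VHDL-2008 also allows PROCESS (ALL), which avoids maintaining the list by hand; whether your synthesis tool version accepts it should be checked in its documentation.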
Slide -136
Latches vs. Registers
• Altera devices have registers in logic elements, not latches
• Latches are implemented using combinatorial logic & can make timing analysis more complicated
  – Look-up table (LUT) devices use LUTs in combinatorial loops
  – Product-term devices use more product-terms
• Recommendations
  – Design with registers (RTL)
  – Watch out for inferred latches
    • Latches are inferred on combinatorial outputs when results are not specified for a set of input conditions
    • Lead to simulation/synthesis mismatches
Slide -137
IF-ELSE Structure
• IF-ELSE (like the WHEN-ELSE concurrent assignment) structure implies prioritization & dependency
  – Nth clause implies all N-1 previous clauses not true
  – Logical equation: (<cond1> • A) + (<cond1>’ • <cond2> • B) + (<cond1>’ • <cond2>’ • <cond3> • C) + …
• Beware of needlessly “ballooning” logic
  – Consider restructuring IF statements
    • May flatten the multiplexer and reduce logic
    • Example: the nested tests

          IF <cond1> THEN
              IF <cond2> THEN

      can be combined into

          IF <cond1> AND <cond2> THEN

• If sequential statements are mutually exclusive, individual IF structures may be more efficient
Slide -138
When Writing IF-ELSE Structures…
• Cover all cases
  – Uncovered cases in combinatorial processes result in latches
• For efficiency, consider
  – Using don’t cares (‘-’ or ‘X’) for the final ELSE clause (avoiding unnecessary default conditions)
    • Synthesis tool has freedom to encode don’t cares for maximum optimization
  – Assigning initial values and explicitly covering only those results different from initial values
Slide -139
Unwanted Latches
• Combinatorial processes that do not cover all possible input conditions generate latches

    PROCESS (sel, a, b, c)
    BEGIN
        IF sel = "001" THEN
            output <= a;
        ELSIF sel = "010" THEN
            output <= b;
        ELSIF sel = "100" THEN
            output <= c;
        END IF;
    END PROCESS;

Figure: sel(2), sel(1), sel(0) and A, B, C feed LOGIC, followed by a LATCH that drives output.
Slide -140
Unwanted Latches Removed
• Close all IF-ELSE structures
  – If possible, assign “don’t cares” to the ELSE clause for improved logic optimization

    PROCESS (sel, a, b, c)
    BEGIN
        IF sel = "001" THEN
            output <= a;
        ELSIF sel = "010" THEN
            output <= b;
        ELSIF sel = "100" THEN
            output <= c;
        ELSE
            output <= (OTHERS => 'X');
        END IF;
    END PROCESS;

Figure: sel(2), sel(1), sel(0) and A, B, C feed LOGIC, which drives output directly (no latch).
Slide - 141
Common Pitfall – Output not Assigned on Every Pass
• Pitfall: failing to assign every output on every pass through the process for combinational behavior
  – Results in sequential behavior
    • Referred to as an inferred latch
  – Wrong 2x4 decoder example
    • Has memory
    • No compiler error
  – Just not a decoder

    LIBRARY ieee;
    USE ieee.std_logic_1164.ALL;

    ENTITY Dcd2x4 IS
        PORT (i1, i0         : IN  std_logic;
              d3, d2, d1, d0 : OUT std_logic);
    END Dcd2x4;

    ARCHITECTURE Beh OF Dcd2x4 IS
    BEGIN
        PROCESS (i1, i0)
        BEGIN
            IF (i1='0' AND i0='0') THEN
                d3 <= '0'; d2 <= '0'; d1 <= '0'; d0 <= '1';
            ELSIF (i1='0' AND i0='1') THEN
                d3 <= '0'; d2 <= '0'; d1 <= '1'; d0 <= '0';
            ELSIF (i1='1' AND i0='0') THEN
                d3 <= '0'; d2 <= '1'; d1 <= '0'; d0 <= '0';
            ELSIF (i1='1' AND i0='1') THEN
                d3 <= '1';
            END IF;
            -- Note: missing assignments
            -- to all outputs in last ELSIF
        END PROCESS;
    END Beh;

• Missing assignments to outputs d2, d1, d0
  – i1i0=10: d2=1, others=0
  – i1i0=11: d3=1, but d2 stays the same
Slide - 142
Common Pitfall – Output not Assigned on Every Pass
• The same pitfall often occurs due to not considering all possible input combinations

    PROCESS (i1, i0)
    BEGIN
        IF (i1='0' AND i0='0') THEN
            d3 <= '0'; d2 <= '0'; d1 <= '0'; d0 <= '1';
        ELSIF (i1='0' AND i0='1') THEN
            d3 <= '0'; d2 <= '0'; d1 <= '1'; d0 <= '0';
        ELSIF (i1='1' AND i0='0') THEN
            d3 <= '0'; d2 <= '1'; d1 <= '0'; d0 <= '0';
        END IF;
    END PROCESS;

• Last "ELSE" missing, so not all input combinations are covered (i.e., i1i0=11 not covered): no update to the outputs
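One common way to avoid both decoder pitfalls is to give every output a default value at the top of the process and then override only what changes. The sketch below assumes that style; the architecture name Beh_fixed is made up for this example.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY Dcd2x4 IS
    PORT (i1, i0         : IN  std_logic;
          d3, d2, d1, d0 : OUT std_logic);
END Dcd2x4;

ARCHITECTURE Beh_fixed OF Dcd2x4 IS
BEGIN
    PROCESS (i1, i0)
    BEGIN
        -- Defaults: every output is assigned on every pass, so no latch is inferred
        d3 <= '0'; d2 <= '0'; d1 <= '0'; d0 <= '0';

        IF (i1 = '0' AND i0 = '0') THEN
            d0 <= '1';
        ELSIF (i1 = '0' AND i0 = '1') THEN
            d1 <= '1';
        ELSIF (i1 = '1' AND i0 = '0') THEN
            d2 <= '1';
        ELSE
            -- Final ELSE closes the structure; in simulation it also
            -- catches non-'0'/'1' input values, not just i1i0 = 11
            d3 <= '1';
        END IF;
    END PROCESS;
END Beh_fixed;
```

Because the last signal assignment in a process wins, the defaults cost nothing in hardware; they simply describe the value each output takes when no branch overrides it.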
Slide -143
Mutually Exclusive IF-ELSE Latches
• Beware of building unnecessary dependencies
  – e.g. outputs x, y, z are mutually exclusive; IF-ELSIF causes all outputs to be dependent on all tests & creates latches

    PROCESS (sel, a, b, c)
    BEGIN
        IF sel = "010" THEN
            x <= a;
        ELSIF sel = "100" THEN
            y <= b;
        ELSIF sel = "001" THEN
            z <= c;
        ELSE
            x <= '0';
            y <= '0';
            z <= '0';
        END IF;
    END PROCESS;

Figure: three LOGIC blocks, each followed by a LATCH, drive X, Y, and Z from sel(2), sel(1), sel(0) and A, B, C respectively.
Slide -144
Mutually Exclusive Latches Removed
• Separate the IF statements and close each one

    PROCESS (sel, a, b, c)
    BEGIN
        IF sel = "010" THEN
            x <= a;
        ELSE
            x <= '0';
        END IF;
        IF sel = "100" THEN
            y <= b;
        ELSE
            y <= '0';
        END IF;
        IF sel = "001" THEN
            z <= c;
        ELSE
            z <= '0';
        END IF;
    END PROCESS;

• Or assign default values first

    PROCESS (sel, a, b, c)
    BEGIN
        x <= '0';
        y <= '0';
        z <= '0';
        IF sel = "010" THEN
            x <= a;
        END IF;
        IF sel = "100" THEN
            y <= b;
        END IF;
        IF sel = "001" THEN
            z <= c;
        END IF;
    END PROCESS;

Figure: three LOGIC blocks drive X, Y, and Z directly from sel(2), sel(1), sel(0) and A, B, C respectively (no latches).
Slide -145
Nested IF Generating Unwanted Latches
• Use nested IF statements with care
  – e.g. these nested IF statements do not cover all possible conditions (open IF statements) & a latch is created

    PROCESS (ina, inb)
    BEGIN
        IF ina = '1' THEN
            IF inb = '1' THEN
                y <= '1';
            END IF;
        ELSE
            y <= '0';
        END IF;
    END PROCESS;

    ina  inb  out
    1    1    1
    0    0    0
    0    1    0
    1    0    ?

• Uncovered cases infer latches
• No default value for objects
Slide -146
Nested IF – Unwanted Latches Removed

    PROCESS (ina, inb)
    BEGIN
        y <= '0';
        IF ina = '1' THEN
            IF inb = '1' THEN
                y <= '1';
            END IF;
        END IF;
    END PROCESS;

    ina  inb  out
    1    1    1
    0    0    0
    0    1    0
    1    0    0

• Using initialization to cover all cases; no latch inferred
Slide -147
Case Statements
• Case statements usually synthesize more efficiently when mutual exclusivity exists
• Define outputs for all cases
  – Undefined outputs for any given case generate latches
• VHDL already requires all case conditions be covered
  – Use WHEN OTHERS clause to close undefined cases (if any remain)
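As a small illustration of closing a CASE statement, the sketch below selects one of three inputs with a 2-bit STD_LOGIC_VECTOR selector. The entity name case_mux and its ports are hypothetical. Because STD_LOGIC has nine values per bit, the three listed choices do not cover every selector value, so the WHEN OTHERS clause is required by the language here.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY case_mux IS
    PORT (
        sel     : IN  STD_LOGIC_VECTOR (1 DOWNTO 0);
        a, b, c : IN  STD_LOGIC;
        y       : OUT STD_LOGIC
    );
END ENTITY case_mux;

ARCHITECTURE logic OF case_mux IS
BEGIN
    PROCESS (sel, a, b, c)
    BEGIN
        CASE sel IS
            WHEN "00"   => y <= a;
            WHEN "01"   => y <= b;
            WHEN "10"   => y <= c;
            -- Closes the remaining selector values; the don't care
            -- leaves the synthesis tool free to optimize
            WHEN OTHERS => y <= '-';
        END CASE;
    END PROCESS;
END ARCHITECTURE logic;
```

If the design needs a defined value for sel = "11", replace the don't care with that value.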
Slide -148
Case Statement Recommendations
• Initialize all case outputs or ensure outputs are assigned in each case
• Assign initialized or default values to don’t cares (X) for further optimization, if logic allows
Slide -149
Unwanted Latches - Case Statements
• Conditions where output is undetermined

    output: PROCESS (filter)
    BEGIN
        CASE filter IS
            WHEN idle =>              -- sel missing
                nxt <= '0';
                first <= '0';
            WHEN tap1 =>              -- nxt missing
                sel <= "00";
                first <= '1';
            WHEN tap2 =>              -- nxt missing
                sel <= "01";
                first <= '0';
            WHEN tap3 =>              -- nxt & first missing
                sel <= "10";
            WHEN tap4 =>              -- first missing
                sel <= "11";
                nxt <= '1';
        END CASE;
    END PROCESS output;

  – Undetermined output conditions imply memory
  – Latch generated for ALL 3 outputs
Slide -150
Latches Removed - Case Statements
• Conditions where output is determined

    output: PROCESS (filter)
    BEGIN
        -- Signals initialized
        first <= '0';
        nxt <= '0';
        sel <= "00";
        CASE filter IS
            WHEN idle =>
            WHEN tap1 =>
                first <= '1';
            WHEN tap2 =>
                sel <= "01";
            WHEN tap3 =>
                sel <= "10";
            WHEN tap4 =>
                sel <= "11";
                nxt <= '1';
        END CASE;
    END PROCESS output;

• To remove latches & ensure outputs are never undetermined
  – Use signal initialization at beginning of case statement (case statement only deals with changes)
  – Use don’t cares (‘-’) for WHEN OTHERS clause, if design allows (for better logic optimization)
  – Manually set output in each case
Slide -151
Variable Declarations
• Variables are declared inside a process
• Variable assignments are represented by :=
• Variable declaration

    VARIABLE <name> : <DATA_TYPE> := <value>;
    VARIABLE temp : STD_LOGIC_VECTOR (7 DOWNTO 0);

• Variable assignments are updated immediately
  – Do not incur a delay
• Variables provide temporary storage
Slide -152
Assigning Values to Variables

    VARIABLE temp : STD_LOGIC_VECTOR (7 DOWNTO 0);

• Variable assignments are represented by :=
• Examples
  – All bits

        temp := "10101010";
        temp := x"aa";    -- hexadecimal (1076-1993)

  – VHDL also supports ‘o’ for octal and ‘b’ for binary
  – Bit-slicing

        temp (7 DOWNTO 4) := "1010";

  – Single bit

        temp(7) := '1';

• Use double-quotes (" ") to assign multi-bit values and single-quotes (' ') to assign single-bit values
Slide -153
Variable Assignment

    LIBRARY IEEE;
    USE IEEE.STD_LOGIC_1164.ALL;

    ENTITY var IS
        PORT (
            a, b : IN STD_LOGIC;
            y : OUT STD_LOGIC
        );
    END ENTITY var;

    ARCHITECTURE logic OF var IS
    BEGIN
        PROCESS (a, b)
            VARIABLE c : STD_LOGIC;    -- variable declaration
        BEGIN
            c := a AND b;              -- variable assignment
            y <= c;                    -- variable is assigned to a signal to synthesize to a piece of hardware
        END PROCESS;
    END ARCHITECTURE logic;

• Variable c is updated immediately, and the new value is available for assigning to y
Slide -154
Signal and Variable Scope

    ARCHITECTURE
        {SIGNAL declarations}          -- declared outside of the process statements (visible to all process statements)
        label1: PROCESS
            {VARIABLE declarations}    -- declared inside the PROCESS statement (locally visible to that process)
        label2: PROCESS
            {VARIABLE declarations}
Slide -155
Signals vs. Variables
            Signals (<=)                              Variables (:=)
Assign      assignee <= assignment                    assignee := assignment
Utility     Represent circuit interconnect            Represent local storage
Scope       Architecture scope (communicate between   Local scope (inside processes)
            processes within architecture)
Behavior    Updated at end of current delta cycle     Updated immediately
            (new value not immediately available)
Slide -156
Variables
• May synthesize to hardware depending on use
• Advantages vs. signals
  – Variables are a more behavioral construct, as they don’t have a direct correlation to hardware (like signals do), and may lead to more efficient logic
  – Simulate more efficiently as they require less memory
    • Signals are not updated immediately, so the simulator must store two values (current and next value) for every changing signal
    • Variables are updated immediately, so the simulator stores a single value
• Disadvantages vs. signals
  – Must be assigned to a signal before the process ends
    • Do not represent physical hardware unless equated with a signal
  – Must be handled with care
    • Requires a full understanding of how values are assigned to variables and signals in the same process and how dataflow is affected
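The difference in update timing is easiest to see when a signal and a variable are used side by side in one clocked process. The sketch below is illustrative only; the entity name sig_vs_var and its ports are hypothetical.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY sig_vs_var IS
    PORT (
        clk, d : IN  STD_LOGIC;
        q_sig  : OUT STD_LOGIC;
        q_var  : OUT STD_LOGIC
    );
END ENTITY sig_vs_var;

ARCHITECTURE logic OF sig_vs_var IS
    SIGNAL s : STD_LOGIC;
BEGIN
    PROCESS (clk)
        VARIABLE v : STD_LOGIC;
    BEGIN
        IF rising_edge (clk) THEN
            s     <= d;  -- scheduled: s keeps its old value until the process suspends
            q_sig <= s;  -- reads the OLD s, so d reaches q_sig after two clock edges
            v     := d;  -- immediate: v already holds the new value
            q_var <= v;  -- reads the NEW v, so d reaches q_var after one clock edge
        END IF;
    END PROCESS;
END ARCHITECTURE logic;
```

In this sketch the signal path describes two registers in series, while the variable path describes one register, because v is only a name for the value of d within the same process pass.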
Slide -157
Variables & Latches (Recommendations)
• Assign an initial value or signal to a variable unless feedback is desired
• If a variable is not assigned an initial value or signal in a combinatorial process, a latch will be generated
  – This could cause your design to not function as intended
Slide -158
Variable Uninitialized

    ARCHITECTURE logic OF cmb_vari IS
    BEGIN
        PROCESS (i0, i1, a)
            VARIABLE val : INTEGER RANGE 0 TO 1;
        BEGIN
            -- Variable used without initialization
            IF (a = '0') THEN
                val := val;
            ELSE
                val := val + 1;
            END IF;

            CASE val IS
                WHEN 0 =>
                    q <= i0;
                WHEN OTHERS =>
                    q <= i1;
            END CASE;
        END PROCESS;
    END ARCHITECTURE logic;

Figure: a drives logic with feedback that holds val (0 or 1), which feeds case (val).
Slide -159
Assign Initial Value to Variable

    ARCHITECTURE logic OF cmb_vari IS
    BEGIN
        PROCESS (i0, i1, a)
            VARIABLE val : INTEGER RANGE 0 TO 1;
        BEGIN
            val := 0;    -- assign initial value or signal to variable
            IF (a = '0') THEN
                val := val;
            ELSE
                val := val + 1;
            END IF;

            CASE val IS
                WHEN 0 =>
                    q <= i0;
                WHEN OTHERS =>
                    q <= i1;
            END CASE;
        END PROCESS;
    END ARCHITECTURE logic;

Figure: a feeds case (val) directly, with no feedback.
Slide -160
Subprograms
• VHDL has 2 kinds of subprograms
  – FUNCTION
    • Performs calculation and returns value
  – PROCEDURE
    • Performs sequence of defined sequential statements
• Uses
  – Replacing repetitive code
  – Enhancing readability
  – Breaking processes into executable sections
• Defined by means of subprogram declaration (optional) and subprogram body
  – Subprogram declarations required if subprogram is called before subprogram body is read
• Consist of sequential statements (like a process)
• May be declared in process, architecture or package
  – Determines visibility
  – When placed in package, subprogram declaration goes in package declaration and subprogram body goes in package body (see earlier package example)
• Synthesis places restrictions on use of subprograms
Slide -
Subprogram Diagram
Figure: the ARCHITECTURE passes PARAMETERS to a FUNCTION, which gives back a RETURN VALUE; it passes IN PARAMETERS to a PROCEDURE, which gives back OUT PARAMETERS and exchanges INOUT PARAMETERS.
161
Slide -162
Function Definition & Call
• Must return a single value based on zero or more inputs
• Must be called in an expression
• Can be passed classes CONSTANT (default), SIGNAL or FILE
• Class for internal objects must be VARIABLE

Function Declaration

    FUNCTION ones_count (SIGNAL a : STD_LOGIC_VECTOR) RETURN INTEGER;

Function Body

    FUNCTION ones_count (SIGNAL a : STD_LOGIC_VECTOR) RETURN INTEGER IS
        VARIABLE r : INTEGER;
    BEGIN
        r := 0;
        FOR i IN a'RANGE LOOP
            IF a(i) /= '0' THEN
                r := r + 1;
            END IF;
        END LOOP;
        RETURN r;    -- Required
    END FUNCTION ones_count;

Invoking a Function

    total_ones <= ones_count (input) WHEN test_ones = '1';

Note: 'RANGE is a VHDL attribute which returns the range of the object it is applied to (e.g. 7 DOWNTO 0)
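To show where the declaration, the body, and the call live, here is a self-contained sketch that places ones_count in a package and calls it from an architecture. The package name count_pkg, the entity ones_counter, and its ports are hypothetical; the parameter uses the default CONSTANT class so the function can be called with any expression.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

PACKAGE count_pkg IS
    -- Function declaration goes in the package declaration
    FUNCTION ones_count (a : STD_LOGIC_VECTOR) RETURN INTEGER;
END PACKAGE count_pkg;

PACKAGE BODY count_pkg IS
    -- Function body goes in the package body
    FUNCTION ones_count (a : STD_LOGIC_VECTOR) RETURN INTEGER IS
        VARIABLE r : INTEGER := 0;
    BEGIN
        FOR i IN a'RANGE LOOP
            IF a(i) /= '0' THEN
                r := r + 1;
            END IF;
        END LOOP;
        RETURN r;
    END FUNCTION ones_count;
END PACKAGE BODY count_pkg;

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE work.count_pkg.ALL;

ENTITY ones_counter IS
    PORT (
        data_in    : IN  STD_LOGIC_VECTOR (7 DOWNTO 0);
        total_ones : OUT INTEGER RANGE 0 TO 8
    );
END ENTITY ones_counter;

ARCHITECTURE logic OF ones_counter IS
BEGIN
    -- The function is called inside an expression
    total_ones <= ones_count (data_in);
END ARCHITECTURE logic;
```

The loop bound comes from a'RANGE, so the same function works for any vector width; the loop has a fixed number of iterations for each call, which is what synthesis requires.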
Slide -163
Procedure Definition & Call
• May have inputs, inouts and outputs
• May return zero or multiple outputs
• Must be called as a separate sequential statement
• Parameters may be any class
  – Inputs are CONSTANT by default
  – Outputs/inouts are VARIABLE by default

Procedure Declaration

    PROCEDURE incr_comp (
        SIGNAL cnt_sig : INOUT STD_LOGIC_VECTOR;
        CONSTANT max : IN INTEGER;
        SIGNAL maxed_out : OUT BOOLEAN
    );

Procedure Body

    PROCEDURE incr_comp (
        SIGNAL cnt_sig : INOUT STD_LOGIC_VECTOR;
        CONSTANT max : IN INTEGER;
        SIGNAL maxed_out : OUT BOOLEAN) IS
        -- declare any local objects (i.e. constants,
        -- variables,…)
    BEGIN
        -- >= and + on STD_LOGIC_VECTOR need an arithmetic package
        -- (e.g. ieee.std_logic_unsigned) to be visible
        IF cnt_sig >= max THEN
            maxed_out <= TRUE;
        ELSE
            maxed_out <= FALSE;
            cnt_sig <= cnt_sig + 1;
        END IF;
    END PROCEDURE incr_comp;

Invoking a Procedure

    incr_comp (err_cnt, 12, err_cnt_maxed);
    incr_comp (code_cnt, 144, code_cnt_maxed);
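The procedure above compares and increments a STD_LOGIC_VECTOR, which only compiles when an arithmetic package is visible. As an alternative sketch, the version below uses the UNSIGNED type from ieee.numeric_std and calls the procedure from a clocked process. The entity err_counter, the 8-bit counter width, and the power-up value are assumptions made for this example, and synthesis support for signal-class procedure parameters should be confirmed for your tool.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;

ENTITY err_counter IS
    PORT (
        clk           : IN  STD_LOGIC;
        err_cnt_maxed : OUT BOOLEAN
    );
END ENTITY err_counter;

ARCHITECTURE logic OF err_counter IS
    SIGNAL err_cnt : UNSIGNED (7 DOWNTO 0) := (OTHERS => '0');

    -- Same structure as incr_comp on the slide, with UNSIGNED in place
    -- of STD_LOGIC_VECTOR so that >= and + come from numeric_std
    PROCEDURE incr_comp (
        SIGNAL   cnt_sig   : INOUT UNSIGNED;
        CONSTANT max       : IN    INTEGER;
        SIGNAL   maxed_out : OUT   BOOLEAN) IS
    BEGIN
        IF cnt_sig >= max THEN
            maxed_out <= TRUE;
        ELSE
            maxed_out <= FALSE;
            cnt_sig   <= cnt_sig + 1;
        END IF;
    END PROCEDURE incr_comp;
BEGIN
    PROCESS (clk)
    BEGIN
        IF rising_edge (clk) THEN
            -- Called as a separate sequential statement; the signal
            -- assignments inside it become registers
            incr_comp (err_cnt, 12, err_cnt_maxed);
        END IF;
    END PROCESS;
END ARCHITECTURE logic;
```

The counter stops incrementing once it reaches max, at which point maxed_out is registered as TRUE on the following clock edge.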
Slide -164
Functions vs. Procedures
Functions
• Always execute in zero time
  – Cannot pause their execution
  – Cannot contain any delay, event, or timing control statements
• May have zero or more input arguments
  – Inputs may not be affected by the function
• Arguments may not be outputs or inouts
• Always return a single value

Procedures
• May execute in non-zero simulation time
  – May contain delay, event, or timing control statements
• May have zero or more input, output, or inout arguments
• Modify zero or more values
• Return values by means of parameter arguments
Slide -165
Synthesizable Subprograms
• Make code more readable/reusable
• Two types
  – Functions
    • Synthesize to combinatorial logic
  – Procedures
    • Can synthesize to combinatorial or sequential logic
      – Signal assignments in procedures called from clocked processes generate registers
      – May test for clock edges
        » May not be supported by all synthesis tools
    • Must not contain WAIT statements
• Each call generates a separate block of logic
  – No logic sharing
  – Implement manual resource sharing, if possible (discussed later)
Slide -166
Combinational Loops
• Common cause of instability
• Behavior of loop depends on the relative propagation delays through logic
  – Propagation delays can change
• Simulation tools may not match hardware behavior

    PROCESS (clk, clrn)
    BEGIN
        IF clrn = '0' THEN
            q <= '0';
        ELSIF rising_edge (clk) THEN
            q <= d;
        END IF;
    END PROCESS;

    -- Combinational feedback: q drives its own asynchronous clear
    clrn <= (ctrl1 XOR ctrl2) AND q;

Figure: register (d, clk, q) whose output q feeds Logic that drives the register’s own CLRN input.
Slide -167
Combinational Loops
• All feedback loops should include registers

[Figure: the logic output is captured in a second register before it drives the CLRN input of the d/q register]

PROCESS (clk, clrn)
BEGIN
    IF clrn = '0' THEN
        q <= '0';
    ELSIF rising_edge (clk) THEN
        q <= d;
    END IF;
END PROCESS;

PROCESS (clk)
BEGIN
    IF rising_edge (clk) THEN
        clrn <= (ctrl1 XOR ctrl2) AND q;
    END IF;
END PROCESS;
Slide -168
Gated Clocks
• Can lead to both functional and timing problems
  – Clock behavior subject to both synthesis and placement & routing
  – Can be a source of additional clock skew
  – Glitches on clock path possible
• Recommendations:
  – Use clock enables for clock gating functionality
  – Use dedicated device resources (e.g. clock control blocks) to gate clocks synchronously and reduce power
  – If you must build your own gating logic
    • Use a synchronous gating structure
    • Ensure global clock routing is used for clock signal
    • Gate the clock at the source
Slide -169
Gated Clock Examples

Poor clock gating – Active clock edges occurring near gate signal changes may result in glitches

g_clk <= gate AND clk;

PROCESS (g_clk, clrn)
BEGIN
    IF clrn = '0' THEN
        q <= '0';
    ELSIF rising_edge (g_clk) THEN
        q <= d;
    END IF;
END PROCESS;

Better clock gating – Gate signal clocked by falling edge clk, so gate may only change on inactive clock edge (Use OR gate when falling edge is the active clock edge)

PROCESS (clk)
BEGIN
    IF falling_edge (clk) THEN
        sgate <= gate;
    END IF;
END PROCESS;

g_clk <= sgate AND clk;

PROCESS (g_clk, clrn)
BEGIN
    IF clrn = '0' THEN
        q <= '0';
    ELSIF rising_edge (g_clk) THEN
        q <= d;
    END IF;
END PROCESS;
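The first recommendation on the Gated Clocks slide, using a clock enable instead of gating the clock, can be sketched as follows. This is an illustrative rewrite of the examples above, not code from the course. The entity name dff_ce is hypothetical, and the gate signal is assumed to be synchronous to clk.

```vhdl
LIBRARY IEEE;
USE IEEE.STD_LOGIC_1164.ALL;

ENTITY dff_ce IS
    PORT (
        clk, clrn, gate, d : IN  STD_LOGIC;
        q                  : OUT STD_LOGIC
    );
END ENTITY dff_ce;

ARCHITECTURE behavior OF dff_ce IS
BEGIN
    PROCESS (clk, clrn)
    BEGIN
        IF clrn = '0' THEN
            q <= '0';
        ELSIF rising_edge (clk) THEN
            -- gate acts as a clock enable; the clock net itself is never gated
            IF gate = '1' THEN
                q <= d;
            END IF;
        END IF;
    END PROCESS;
END ARCHITECTURE behavior;
```

The register holds its value while gate is '0', which gives the same function as the gated clock without putting logic on the clock path.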
Slide -170
How Many Registers?

LIBRARY IEEE;
USE IEEE.STD_LOGIC_1164.ALL;

ENTITY reg1 IS
    PORT (
        d   : IN  STD_LOGIC;
        clk : IN  STD_LOGIC;
        q   : OUT STD_LOGIC
    );
END ENTITY reg1;

ARCHITECTURE logic OF reg1 IS
    SIGNAL a, b : STD_LOGIC;
BEGIN
    PROCESS (clk)
    BEGIN
        IF rising_edge (clk) THEN
            a <= d;
            b <= a;
            q <= b;
        END IF;
    END PROCESS;
END ARCHITECTURE logic;
Slide -171
How Many Registers?
• Signal assignments inside the IF-THEN statement that checks the clock condition infer registers

[Figure: three registers in series, all clocked by clk: d → a → b → q]
Slide -172
How Many Registers?

LIBRARY IEEE;
USE IEEE.STD_LOGIC_1164.ALL;

ENTITY reg2 IS
    PORT (
        d   : IN  STD_LOGIC;
        clk : IN  STD_LOGIC;
        q   : OUT STD_LOGIC
    );
END ENTITY reg2;

ARCHITECTURE logic OF reg2 IS
    SIGNAL a, b : STD_LOGIC;
BEGIN
    PROCESS (clk)
    BEGIN
        IF rising_edge (clk) THEN
            a <= d;
            b <= a;
        END IF;
    END PROCESS;
    q <= b;    -- Signal Assignment Moved
END ARCHITECTURE logic;
Slide -173
How Many Registers?
• The assignment of signal b to signal q is no longer edge-sensitive because it is not inside the IF-THEN statement that checks the clock condition

[Figure: two registers in series, both clocked by clk: d → a → q]
Slide -174
How Many Registers?

LIBRARY IEEE;
USE IEEE.STD_LOGIC_1164.ALL;

ENTITY reg3 IS
    PORT (
        d   : IN  STD_LOGIC;
        clk : IN  STD_LOGIC;
        q   : OUT STD_LOGIC
    );
END ENTITY reg3;

ARCHITECTURE logic OF reg3 IS
BEGIN
    PROCESS (clk)
        VARIABLE a, b : STD_LOGIC;    -- Signals changed to variables
    BEGIN
        IF rising_edge (clk) THEN
            a := d;
            b := a;
            q <= b;
        END IF;
    END PROCESS;
END ARCHITECTURE logic;
Slide -175
How Many Registers?
• Variable assignments are updated immediately
• Signal assignments are updated on clock edge

[Figure: a single register clocked by clk: d → q]
Slide -176
Inferring Logic Functions
• Using behavioral modeling to describe logic blocks
• Synthesis tools recognize description & insert equivalent logic functions (e.g. megafunctions)
  – Functions typically pre-optimized for utilization or performance over general purpose functionally equivalent logic
  – Use synthesis tool’s templates (if available) as starting point
  – Use synthesis tool’s graphic display to verify logic recognition
• Makes code vendor-independent
Slide -177
Logic Inference Example

Synthesis tool sees

PROCESS (clock)
BEGIN
    IF rising_edge (clock) THEN
        IF wren = '1' THEN
            mem(conv_integer(address)) <= data;
        END IF;
        q <= mem(conv_integer(address));
    END IF;
END PROCESS;

Replaces with

Altera megafunction and/or library cells
Slide -178
Quartus II VHDL Templates
Insert Template (Edit menu)
Preview window: edit before inserting & save
as user template
Slide -179
Quartus II Software RTL Viewer
• Graphically represents results of synthesis

[Figure: RTL Viewer window showing the Toolbar, Hierarchy List, and Schematic View]

Starting RTL Viewer
1. Run Analysis & Elaboration (Processing menu or Tasks window)
   • Any processing that performs elaboration
2. Open RTL Viewer (Tools menu or Tasks window)
   • Displays last successful analysis
Slide -180
Inferring Common Functions
• Latches
• Registers
• Counters
• Tri-states
• Memory
Slide -181
Latch Inference – “Wanted” Latch

LIBRARY IEEE;
USE IEEE.std_logic_1164.ALL;

ENTITY latch IS
    PORT (
        data : IN  std_logic;
        gate : IN  std_logic;
        q    : OUT std_logic
    );
END ENTITY latch;

ARCHITECTURE behavior OF latch IS
BEGIN
    label_1: PROCESS (data, gate)    -- sensitivity list includes both inputs
    BEGIN
        IF gate = '1' THEN           -- level sensitive…not edge
            q <= data;
        END IF;                      -- What happens if gate = '0'? Implicit memory & feedback
    END PROCESS label_1;
END ARCHITECTURE behavior;

[Figures: Latch in RTL Viewer; Latch in Technology Viewer]
Slide -182
DFF Using rising_edge Function

rising_edge
– IEEE function that is defined in the std_logic_1164 package
– Specifies that the signal value must change from '0' to '1'
– An 'X' or 'Z' to '1' transition is not allowed

[Figure: D flip-flop with inputs d and clk and output q]

LIBRARY IEEE;
USE IEEE.std_logic_1164.ALL;

ENTITY dff_b IS
    PORT (
        d   : IN  std_logic;
        clk : IN  std_logic;
        q   : OUT std_logic
    );
END ENTITY dff_b;

ARCHITECTURE behavior OF dff_b IS
BEGIN
    PROCESS (clk)
    BEGIN
        IF rising_edge(clk) THEN
            q <= d;
        END IF;
    END PROCESS;
END ARCHITECTURE behavior;
Slide -183
DFF Using clk'event and clk='1'

clk'event and clk='1'
– clk is the signal name (any name)
– 'event is a VHDL attribute, specifying that there needs to be a change in signal value
– clk='1' means positive-edge triggered

[Figure: D flip-flop with inputs d and clk and output q]

LIBRARY IEEE;
USE IEEE.STD_LOGIC_1164.ALL;

ENTITY dff_a IS
    PORT (
        d   : in  std_logic;
        clk : in  std_logic;
        q   : out std_logic
    );
END ENTITY dff_a;

ARCHITECTURE behavior OF dff_a IS
BEGIN
    PROCESS (clk)
    BEGIN
        IF (clk'event and clk = '1') THEN
            q <= d;
        END IF;
    END PROCESS;
END ARCHITECTURE behavior;
Slide -184
Recommended DFF Inference
• Use the rising_edge function for consistent simulation
  – 'X' to '1' transitions trigger the DFF when clk'event and clk='1' is used, but not when rising_edge is used
• Both clk'event and clk='1' & rising_edge produce the same synthesis
• Must use std_logic_1164 package for rising_edge or falling_edge functions
Slide -185
Secondary Control Signals
• Register control signals vary between FPGA & CPLD families
  – Clear, preset, load, clock enable, etc.
• Avoid using signals not available in architecture
  – Functionality of design supported by creating extra logic cells
  – Less efficient, possibly slower results
Slide -186
DFF with Secondary Control Signals

ARCHITECTURE behavior OF dff IS
BEGIN
    PROCESS (clk, aclr, apre, aload, adata)
    BEGIN
        IF aclr = '1' THEN
            q <= '0';
        ELSIF apre = '1' THEN
            q <= '1';
        ELSIF aload = '1' THEN
            q <= adata;
        ELSIF rising_edge(clk) THEN
            IF ena = '1' THEN
                IF sclr = '1' THEN
                    q <= '0';
                ELSIF sload = '1' THEN
                    q <= sdata;
                ELSE
                    q <= d;
                END IF;
            END IF;
        END IF;
    END PROCESS;
END ARCHITECTURE behavior;

– This is how to implement all asynchronous and synchronous control signals for the Altera PLD registers
– Conditions outside of the rising_edge statement are asynchronous
– Conditions inside of the rising_edge statement are synchronous
– Remove signals not required by your logic
– Synchronous controls are not included in sensitivity list
Slide -187
Incorrect Control Signal Priority

ARCHITECTURE behavior OF dff_clr IS
BEGIN
    PROCESS (clk)
    BEGIN
        IF rising_edge(clk) THEN
            IF sclr = '1' THEN
                q <= '0';
            ELSIF ena = '1' THEN
                q <= d;
            END IF;
        END IF;
    END PROCESS;
END ARCHITECTURE behavior;

– 2 control signals
– Considerations
  – Do the registers in the hardware have both ports available?
  – How does hardware behave? Does clear or enable have priority?
– Sync clear has priority over enable in code
– Enable has priority over sync clear in silicon
– Additional logic needed to force code priority
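For comparison, a version that follows the silicon priority (enable over synchronous clear) might look like the sketch below. The entity name dff_ena_sclr is hypothetical. Note that this changes behaviour compared with the code above: sclr only takes effect while ena is '1', so it is only a valid substitute if the design can tolerate that.

```vhdl
LIBRARY IEEE;
USE IEEE.STD_LOGIC_1164.ALL;

ENTITY dff_ena_sclr IS
    PORT (
        clk, ena, sclr, d : IN  STD_LOGIC;
        q                 : OUT STD_LOGIC
    );
END ENTITY dff_ena_sclr;

ARCHITECTURE behavior OF dff_ena_sclr IS
BEGIN
    PROCESS (clk)
    BEGIN
        IF rising_edge(clk) THEN
            -- Enable is tested first, matching the register's native priority
            IF ena = '1' THEN
                IF sclr = '1' THEN
                    q <= '0';
                ELSE
                    q <= d;
                END IF;
            END IF;
        END IF;
    END PROCESS;
END ARCHITECTURE behavior;
```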
Slide -188
Control Signals Priority
1. Asynchronous clear (aclr)
2. Asynchronous preset (pre)
3. Asynchronous load (aload)
4. Enable (ena)
5. Synchronous clear (sclr)
6. Synchronous load (sload)

• Same for all Altera FPGA families
  – Not all signals are supported by all families
• Re-ordering generates extra logic
Slide -189
Incorrect Control Logic

PROCESS (clk, clr_n)
BEGIN
    IF clr_n = '0' THEN
        x <= '0';
    ELSIF rising_edge(clk) THEN
        x <= a;
        y <= b;
    END IF;
END PROCESS;

– y is not included in clr_n condition
– What is the behaviour specified for y when clr_n is asserted?
– While clr_n clears x, it acts like an enable for y

[Figure: register x (input a) with clr_n on its CLRN port; register y (input b) with clr_n on its ENA port]
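One way to remove the unintended enable is to give every register assigned in the process a value in the reset branch. The sketch below assumes y should also clear to '0'. The entity name two_regs is hypothetical.

```vhdl
LIBRARY IEEE;
USE IEEE.STD_LOGIC_1164.ALL;

ENTITY two_regs IS
    PORT (
        clk, clr_n, a, b : IN  STD_LOGIC;
        x, y             : OUT STD_LOGIC
    );
END ENTITY two_regs;

ARCHITECTURE behavior OF two_regs IS
BEGIN
    PROCESS (clk, clr_n)
    BEGIN
        IF clr_n = '0' THEN
            -- Both registers are covered by the asynchronous clear
            x <= '0';
            y <= '0';
        ELSIF rising_edge(clk) THEN
            x <= a;
            y <= b;
        END IF;
    END PROCESS;
END ARCHITECTURE behavior;
```

If y must not be reset, an alternative is to move it into its own clocked process that has no clr_n branch.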
Slide -190
DFF with Clock Enable

ARCHITECTURE behavior OF dff_all IS
    SIGNAL ena : std_logic;
BEGIN
    PROCESS (clk, clr_n)
    BEGIN
        IF clr_n = '0' THEN
            q <= '0';
        ELSIF rising_edge(clk) THEN
            IF ena = '1' THEN
                q <= d;
            END IF;
        END IF;
    END PROCESS;

    ena <= (ena_a OR ena_b) XOR ena_c;
END ARCHITECTURE behavior;

[Figure: D flip-flop with inputs d, ena, clr_n and clk, and output q]

– To ensure that this is synthesised using DFFE primitives (DFF with enable)
  – Place the enable statement directly after the rising edge statement
  – Place enable expressions in separate process or assignment
– If the synthesis tool does not recognize this as an enable it will be implemented using extra LUTs
Slide -191
Shift Registers

ARCHITECTURE behavior OF shift IS
    SIGNAL qi : STD_LOGIC_VECTOR (7 DOWNTO 0);
BEGIN
    PROCESS (clock, aclr)
    BEGIN
        IF aclr = '1' THEN
            qi <= (OTHERS => '0');
        ELSIF rising_edge(clock) THEN
            IF enable = '1' THEN
                qi <= qi (6 DOWNTO 0) & shiftin;    -- Shift function (& = Concatenation)
            END IF;
        END IF;
    END PROCESS;
    q <= qi;
END ARCHITECTURE behavior;

– Shift register with parallel output, serial input, asynchronous clear and enable which shifts left
– Add or remove secondary controls similar to DFF
Slide -192
Basic Counter

PROCESS (clock, aclr)
    VARIABLE cnt : std_logic_vector (7 DOWNTO 0);
BEGIN
    IF aclr = '1' THEN
        cnt := (OTHERS => '0');
    ELSIF rising_edge(clock) THEN
        cnt := cnt + 1;    -- Count function
    END IF;
    q <= cnt;
END PROCESS;

– Binary up counter with asynchronous clear
– Add or remove secondary controls similar to DFF

Note: These examples use the VARIABLE class as the count variable but a SIGNAL could have been used just as easily
Slide -193
Counter Using Integers

PROCESS (clock, aclr)
    VARIABLE cnt : INTEGER RANGE 0 TO 255;
BEGIN
    IF aclr = '1' THEN
        cnt := 0;
    ELSIF rising_edge(clock) THEN
        IF cnt = 255 THEN
            cnt := 0;
        ELSE
            cnt := cnt + 1;
        END IF;
    END IF;
    q <= conv_std_logic_vector(cnt, 8);
END PROCESS;

– Range determines bit width for counter
  – If range is left out, counter will default to at least 32 bits
– Must manually account for rollover
  – No automatic rollover for integers (unlike std_logic)
  – If missing, code causes end of range errors in simulation (synthesizes correctly)

• conv_std_logic_vector(<integer_name_or_value>, <bus_width>) converts integer to std_logic_vector
• Found in std_logic_arith package
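The counter slides rely on the std_logic_arith and std_logic_unsigned packages. As a hedged alternative, the same 8-bit up counter can be sketched with the IEEE numeric_std package, where an UNSIGNED signal rolls over on its own and a type conversion replaces conv_std_logic_vector. The entity name counter_ns is hypothetical.

```vhdl
LIBRARY IEEE;
USE IEEE.STD_LOGIC_1164.ALL;
USE IEEE.NUMERIC_STD.ALL;

ENTITY counter_ns IS
    PORT (
        clock, aclr : IN  STD_LOGIC;
        q           : OUT STD_LOGIC_VECTOR (7 DOWNTO 0)
    );
END ENTITY counter_ns;

ARCHITECTURE behavior OF counter_ns IS
    SIGNAL cnt : UNSIGNED (7 DOWNTO 0);
BEGIN
    PROCESS (clock, aclr)
    BEGIN
        IF aclr = '1' THEN
            cnt <= (OTHERS => '0');
        ELSIF rising_edge(clock) THEN
            cnt <= cnt + 1;    -- UNSIGNED arithmetic wraps from 255 back to 0
        END IF;
    END PROCESS;

    -- Type conversion replaces conv_std_logic_vector
    q <= STD_LOGIC_VECTOR(cnt);
END ARCHITECTURE behavior;
```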
Slide -194
Up / Down Counter

PROCESS (clock, aclr)
    VARIABLE cnt       : std_logic_vector(7 DOWNTO 0);
    VARIABLE direction : integer RANGE -1 TO 1;
BEGIN
    IF aclr = '1' THEN
        cnt := (OTHERS => '0');
    ELSIF rising_edge(clock) THEN
        IF updown = '1' THEN
            direction := 1;
        ELSE
            direction := -1;
        END IF;
        cnt := cnt + direction;
    END IF;
    q <= cnt;
END PROCESS;
Slide -195
Modulus 200 Counter

PROCESS (clock, aclr)
    VARIABLE cnt     : std_logic_vector(7 DOWNTO 0);
    CONSTANT modulus : INTEGER := 200;
BEGIN
    IF aclr = '1' THEN
        cnt := (OTHERS => '0');
    ELSIF rising_edge(clock) THEN
        IF cnt = modulus-1 THEN
            cnt := (OTHERS => '0');
        ELSE
            cnt := cnt + 1;
        END IF;
    END IF;
    q <= cnt;
END PROCESS;
Slide -196
Modulus 200 Counter Using Integers

PROCESS (clock, aclr)
    VARIABLE cnt     : INTEGER RANGE 0 TO 199;
    CONSTANT modulus : INTEGER := 200;
BEGIN
    IF aclr = '1' THEN
        cnt := 0;
    ELSIF rising_edge(clock) THEN
        IF cnt = modulus-1 THEN
            cnt := 0;
        ELSE
            cnt := cnt + 1;
        END IF;
    END IF;
    q <= conv_std_logic_vector(cnt, 8);
END PROCESS;

– Cannot simply change range
  – Same logic if range was 0 to 255
  – Range used by synthesis tool to define bit width; does not build decode logic for synchronous reset
– Logic must be defined explicitly
Slide -197
Integers vs. Standard Logic Arrays

Integers
• Represent numbers only
  – Are more behavioral than standard logic
  – Synthesis tools more free to generate resulting logic
  – May generate less logic
• Integers use less storage space during processing
  – Simulate faster
• Always use RANGE to constrain integers for synthesis
  – Defaults to 32 bits
• Use for internal calculations and describing internal logic

Standard Logic Arrays
• Represent an array of signals, each taking one of 9 values
  – Can be “sliced”
  – Are more structural than integers
  – Structure must be optimized down into efficient logic
• Can be set to bus widths wider than 32 bits
• Automatically roll over during calculations
• Use for I/O ports & data path
Slide -198
Tri-states
• IEEE defines 'Z' value in the std_logic_1164 package
  – Simulation: Behaves like high-impedance state
  – Synthesis: Converted to tri-state buffers
• Altera devices have tri-state buffers only in I/O cells
  – Benefits:
    • Eliminates possible bus contention
    • Location of internal logic is a non-issue
    • Cost savings
      – Don’t pay for unused tri-state buffers
      – Less testing required of devices
  – Internal tri-states must be converted to combinatorial logic
  – Complex output enable may cause errors or inefficient logic
Slide -199
Inferring Tri-states Correctly

Conditional Signal Assignment

ARCHITECTURE behavior OF tri1 IS
BEGIN
    out_sig <= in_sig WHEN ena = '1' ELSE 'Z';
END ARCHITECTURE behavior;

Process Statement

ARCHITECTURE behavior OF tri2 IS
BEGIN
    driver1 : PROCESS (ena, in_sig)
    BEGIN
        IF (ena = '1') THEN
            out_sig <= in_sig;
        ELSE
            out_sig <= 'Z';
        END IF;
    END PROCESS;
END ARCHITECTURE behavior;

– Only 1 assignment to output signal
– Uses tri-state buffer in I/O cell

[Figure: device with in_sig and ena driving a tri-state buffer in the I/O cell, output out_sig]
Slide -200
Inferring Tri-states Incorrectly

ARCHITECTURE behavior OF tri3 IS
BEGIN
    out_sig <= in_sig1 WHEN ena1 = '1' ELSE 'Z';
    out_sig <= in_sig2 WHEN ena2 = '1' ELSE 'Z';
END ARCHITECTURE behavior;

– 2 Assignments to Same Signal Not Allowed in Synthesis Unless 'Z' Is Used
– Output Enable Logic Emulated in LEs
– Simulation & Synthesis Do Not Match

[Figure: APEX II device in which in_sig1, in_sig2, ena1 and ena2 pass through internal logic before reaching out_sig in the I/O cells]
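A possible single-driver rewrite of tri3 is sketched below: the two sources are merged in one conditional assignment, so out_sig has exactly one driver and one 'Z' branch. The entity name tri_mux and the choice to give ena1 priority when both enables are high are assumptions, not part of the original slide.

```vhdl
LIBRARY IEEE;
USE IEEE.STD_LOGIC_1164.ALL;

ENTITY tri_mux IS
    PORT (
        in_sig1, in_sig2 : IN  STD_LOGIC;
        ena1, ena2       : IN  STD_LOGIC;
        out_sig          : OUT STD_LOGIC
    );
END ENTITY tri_mux;

ARCHITECTURE behavior OF tri_mux IS
BEGIN
    -- One assignment to out_sig: source selection is ordinary logic,
    -- and only the final ELSE releases the pin
    out_sig <= in_sig1 WHEN ena1 = '1' ELSE
               in_sig2 WHEN ena2 = '1' ELSE
               'Z';
END ARCHITECTURE behavior;
```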
Slide -201
Bidirectional Pins

ENTITY bidir_pin IS
    PORT (
        bidir               : INOUT std_logic;
        oe, clk, from_core  : IN    std_logic;
        to_core             : OUT   std_logic;
        ●●●
END ENTITY bidir_pin;

ARCHITECTURE behavior OF bidir_pin IS
BEGIN
    bidir   <= from_core WHEN oe = '1' ELSE 'Z';    -- bidir as a tri-stated output
    to_core <= bidir;                               -- bidir as an input
    ●●●
END ARCHITECTURE behavior;

– Declare pin as direction INOUT
– Use INOUT as both input & tri-stated output
– Input side always “on”
– For registered bidirectional I/O, use separate process to infer registers
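The last point, registered bidirectional I/O, might be sketched as follows. The entity name bidir_reg and the internal signals out_reg and oe_reg are hypothetical. The registers are inferred in a separate clocked process, and the tri-state assignment stays a single concurrent statement.

```vhdl
LIBRARY IEEE;
USE IEEE.STD_LOGIC_1164.ALL;

ENTITY bidir_reg IS
    PORT (
        bidir     : INOUT STD_LOGIC;
        oe, clk   : IN    STD_LOGIC;
        from_core : IN    STD_LOGIC;
        to_core   : OUT   STD_LOGIC
    );
END ENTITY bidir_reg;

ARCHITECTURE behavior OF bidir_reg IS
    SIGNAL out_reg, oe_reg : STD_LOGIC;
BEGIN
    -- Separate process infers the output, output-enable and input registers
    PROCESS (clk)
    BEGIN
        IF rising_edge(clk) THEN
            out_reg <= from_core;
            oe_reg  <= oe;
            to_core <= bidir;
        END IF;
    END PROCESS;

    -- Single tri-state driver for the pin
    bidir <= out_reg WHEN oe_reg = '1' ELSE 'Z';
END ARCHITECTURE behavior;
```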
Slide -202
Memory
• Synthesis tools have different capabilities for recognizing memories
• Synthesis tools are sensitive to certain coding styles in order to recognize memories
  – Usually described in the tool documentation
• Tools and target devices may have limitations in architecture implementation
  – Synchronous inputs only
  – Limitations in clocking schemes
  – Memory size limitations
  – Read-during-write support
• Must declare an array data type to hold memory values
• Recommendation: Read Quartus II Handbook, Volume 1, Chapter 6 for more information on inferring memories and read-during-write behavior
Slide -203
Inferred Single-Port Memory (1)

ARCHITECTURE logic OF sp_ram IS
    TYPE mem_type IS ARRAY (0 TO 63) OF std_logic_vector (7 DOWNTO 0);
    SIGNAL mem : mem_type;
BEGIN
    PROCESS (clock)
    BEGIN
        IF rising_edge(clock) THEN
            IF (wren = '1') THEN
                mem(conv_integer(address)) <= data;
            END IF;
        END IF;
    END PROCESS;

    q <= mem(conv_integer(address));
END ARCHITECTURE logic;

– Code describes a 64 x 8 RAM with synchronous write & asynchronous read
– Cannot be implemented in Altera embedded RAM due to asynchronous read
  – Uses general logic and registers
– conv_integer is a function found in the std_logic_unsigned (or signed) package
  – Use TO_INTEGER if using numeric_std package
Slide -204
Inferred Single-Port Memory (2)

ARCHITECTURE logic OF sp_ram IS
    TYPE mem_type IS ARRAY (0 TO 63) OF std_logic_vector (7 DOWNTO 0);
    SIGNAL mem : mem_type;
BEGIN
    PROCESS (clock)
    BEGIN
        IF rising_edge(clock) THEN
            IF (wren = '1') THEN
                mem(conv_integer(address)) <= data;
            END IF;
            q <= mem(conv_integer(address));
        END IF;
    END PROCESS;
END ARCHITECTURE logic;

– Code describes a 64 x 8 RAM with synchronous write & synchronous read
– Old data read-during-write behaviour
  – Memory read in same process/cycle as memory write
– Check target architecture for support as unsupported features built using LUTs/registers
Slide -205
Inferred Single-Port Memory (3)

ARCHITECTURE logic OF sp_ram IS
    SUBTYPE byte IS std_logic_vector (7 DOWNTO 0);    -- Using subtype for vector width
    TYPE mem_type IS ARRAY (0 TO 63) OF byte;
    SIGNAL mem : mem_type;
    SIGNAL rdaddr_reg : byte;
BEGIN
    PROCESS (clock)
    BEGIN
        IF rising_edge(clock) THEN
            IF (wren = '1') THEN
                mem(conv_integer(address)) <= data;
            END IF;
            rdaddr_reg <= address;
        END IF;
    END PROCESS;

    q <= mem(conv_integer(rdaddr_reg));
END ARCHITECTURE logic;

– Same memory with new data read-during-write behaviour
  – Read performed by separate concurrent statement/process
– Check target architecture for support
– Use ramstyle attribute set to “no_rw_check” to disable checking and prevent extra logic generation
Slide -206
Simple Dual-Port, Single-Clock Memory

ARCHITECTURE logic OF sdp_ram IS
    TYPE mem_type IS ARRAY (63 DOWNTO 0) OF std_logic_vector (7 DOWNTO 0);
    SIGNAL mem : mem_type;
BEGIN
    PROCESS (clock)
    BEGIN
        IF rising_edge(clock) THEN
            IF (wren = '1') THEN
                mem(conv_integer(wraddress)) <= data;
            END IF;
            q <= mem(conv_integer(rdaddress));
        END IF;
    END PROCESS;
END ARCHITECTURE logic;

– Code describes a simple dual-port (separate read & write addresses) 64 x 8 RAM with single clock
– Code implies old data read-during-write behaviour
  – New data support in simple dual-port requires additional RAM bypass logic
Slide -207
True Dual-Port, Dual-Clock Memory

ARCHITECTURE logic OF dp_dc_ram IS
    TYPE mem_type IS ARRAY (63 DOWNTO 0) OF std_logic_vector (7 DOWNTO 0);
    SIGNAL mem : mem_type;
    SIGNAL addr_reg_a, addr_reg_b : std_logic_vector (7 DOWNTO 0);
BEGIN
    PROCESS (clock_a)
    BEGIN
        IF rising_edge(clock_a) THEN
            IF (wren_a = '1') THEN
                mem(conv_integer(address_a)) <= data_a;
            END IF;
            addr_reg_a <= address_a;
        END IF;
        q_a <= mem(conv_integer(addr_reg_a));
    END PROCESS;

    PROCESS (clock_b)
    BEGIN
        IF rising_edge(clock_b) THEN
            IF (wren_b = '1') THEN
                mem(conv_integer(address_b)) <= data_b;
            END IF;
            addr_reg_b <= address_b;
        END IF;
        q_b <= mem(conv_integer(addr_reg_b));
    END PROCESS;
END ARCHITECTURE logic;

– Code describes a true dual-port (two individual addresses) 64 x 8 RAM
– May not be supported in all synthesis tools
– New data same-port read-during-write behaviour shown
– Mixed port behaviour undefined with multiple clocks
Slide -208
Initializing Memory Contents Using Files

ARCHITECTURE logic OF sp_ram IS
    TYPE mem_type IS ARRAY (0 TO 63) OF std_logic_vector (7 DOWNTO 0);
    SIGNAL mem : mem_type;
    ATTRIBUTE ram_init_file : STRING;
    ATTRIBUTE ram_init_file OF mem : SIGNAL IS "init_file_name.hex";
BEGIN
    PROCESS (clock)
    BEGIN
        IF rising_edge(clock) THEN
            IF (we = '1') THEN
                mem(conv_integer(address)) <= data;
            END IF;
            q <= mem(conv_integer(address));
        END IF;
    END PROCESS;
END ARCHITECTURE logic;

– Use VHDL attribute to assign initial contents to inferred memory
– Store initialization data as .HEX or .MIF
– Contents of initialization file downloaded into FPGA during configuration
Slide -209
Initializing Memory Using Default

ARCHITECTURE logic OF sp_ram IS
    TYPE mem_type IS ARRAY (0 TO 63) OF std_logic_vector (7 DOWNTO 0);

    FUNCTION init_ram RETURN mem_type IS
        VARIABLE mem_out : mem_type;
    BEGIN
        FOR i IN 0 TO 63 LOOP    -- Loop used to assign each memory address
            mem_out(i) := conv_std_logic_vector(i, 8);
        END LOOP;
        RETURN mem_out;
    END FUNCTION init_ram;

    SIGNAL mem : mem_type := init_ram;    -- Default initial value for memory
BEGIN

– Assign default value when declaring memory
– This example uses a function to establish memory values
  – Recommendation: Use when initializing memory with patterned data
– Can also use a constant (see ROM example)
  – Recommendation: Use when initializing memory with non-patterned data or single value (e.g. OTHERS => “11111111”;)
– MIF file automatically generated during synthesis due to initialization
Slide -210
Unsupported Control Signals
• e.g. Clearing RAM contents with reset

BEGIN
    PROCESS (clock, reset)
    BEGIN
        IF reset = '1' THEN
            mem(conv_integer(address)) <= (OTHERS => '0');
        ELSIF rising_edge(clock) THEN
            IF (we = '1') THEN
                mem(conv_integer(address)) <= data;
            END IF;
        END IF;
    END PROCESS;

    q <= mem(conv_integer(address));
END ARCHITECTURE logic;

– Memory content cannot be cleared with reset
– Synthesizes to general logic resources
– Recommendations
  1. Avoid reset checking in RAM read or write processes
  2. Be wary of other control signals (i.e. clock enable) until validated with target architecture
Slide -211
Inferred ROM (Case Statement)

SIGNAL q : std_logic_vector (6 DOWNTO 0);

BEGIN
    PROCESS (clock)
    BEGIN
        IF rising_edge(clock) THEN
            CASE address IS
                WHEN "0000" => q <= "0111111";
                WHEN "0001" => q <= "0011000";
                WHEN "0010" => q <= "1101101";
                WHEN "0011" => q <= "1111100";
                WHEN "0100" => q <= "1011010";
                …
                WHEN "1101" => q <= "1111001";
                WHEN "1110" => q <= "1100111";
                WHEN "1111" => q <= "1000111";
                WHEN OTHERS => q <= "XXXXXXX";
            END CASE;
        END IF;
    END PROCESS;

– Automatically converted to ROM
  – Tools generate ROM using embedded RAM & initialization file
– Requires constant explicitly defined for each choice in CASE statement
– May use romstyle synthesis attribute to control implementation
– Like RAMs, address or output must be registered to implement in Altera embedded RAM
Slide -212
Inferred ROM (Constant)

ARCHITECTURE logic OF rom16x7 IS
    TYPE rom_type IS ARRAY (0 TO 15) OF STD_LOGIC_VECTOR (6 DOWNTO 0);
    CONSTANT rom : rom_type := (
        "0111111", "0011000", "1101101", "1111100",
        "1011010", "1110110", "1110111", "0011100",
        "1111111", "1111110", "1011111", "1110011",
        OTHERS => "0000000"
    );
BEGIN
    PROCESS (clock)
    BEGIN
        IF rising_edge (clock) THEN
            qa <= rom(CONV_INTEGER(addr_a));
            qb <= rom(CONV_INTEGER(addr_b));
        END IF;
    END PROCESS;
END ARCHITECTURE logic;

– Needs 1 constant value for each ROM address
– Example shows dual-port access
– May place type & constant declaration in package for re-use
– Alternate: Create and use initialization function routine (see RAM example)
Slide -213
State Machine Coding
• Enumerated data type is used to define the different states in the state machine
  – Using constants for states may not be recognized as state machine
• One or two signals of that type are declared as the state variables:

TYPE state_type IS (idle, fill, heat_w, wash, drain);
SIGNAL current_state, next_state : state_type;

• Use CASE statement to do the next-state logic, instead of IF-THEN statement
  – Synthesis tools recognize CASE statements for implementing state machines
• Use CASE or IF-THEN-ELSE for output logic
Slide -214
Quartus II Software State Machine Viewer
• Use to verify correct coding of state machine
• Open from the Tools menu (State Machine Viewer)
• Use drop-down to select state machine
• Highlighting a state in the State Transition Table highlights the corresponding state in the State Flow Diagram

[Figure: State Machine Viewer window showing the State Flow Diagram and the State Transition/Encoding Table]
Slide -215
State Declaration

ENTITY wm IS
    PORT (
        clk, reset, door_closed, full : in  std_logic;
        heat_demand, done, empty      : in  std_logic;
        water, spin, heat, pump       : out std_logic
    );
END ENTITY wm;

ARCHITECTURE behave OF wm IS
    TYPE state_type IS (idle, fill, heat_w, wash, drain);
    SIGNAL current_state, next_state : state_type;
BEGIN

State diagram (outputs per state):

State    Water  Spin  Heat  Pump
IDLE     0      0     0     0
FILL     1      0     0     0
HEAT_W   0      1     1     0
WASH     0      1     0     0
DRAIN    0      1     0     1

Transitions:
IDLE   → FILL     when Door_closed = 1
FILL   → HEAT_W   when Full = 1
HEAT_W → WASH     when Heat_demand = 0
WASH   → HEAT_W   when Heat_demand = 1
WASH   → DRAIN    when Done = 1
DRAIN  → IDLE     when Empty = 1
Slide -216
Next State Logic

PROCESS (clk, reset)                 -- Sequential state transitions
BEGIN
    IF reset = '1' THEN
        current_state <= idle;
    ELSIF rising_edge(clk) THEN
        current_state <= next_state;
    END IF;
END PROCESS;

-- Combinatorial next state logic
PROCESS (current_state, door_closed, full, heat_demand, done, empty)
BEGIN
    next_state <= current_state;     -- Default next state is current state
    CASE current_state IS
        WHEN idle =>
            IF door_closed = '1' THEN
                next_state <= fill;
            END IF;
        WHEN fill =>
            IF full = '1' THEN
                next_state <= heat_w;
            END IF;
        …
Slide -217
Combinatorial Outputs

PROCESS (current_state)
BEGIN
    -- Default output conditions
    water <= '0';
    spin  <= '0';
    heat  <= '0';
    pump  <= '0';
    CASE current_state IS
        WHEN idle =>
        WHEN fill =>
            water <= '1';
        WHEN heat_w =>
            spin <= '1';
            heat <= '1';
        WHEN wash =>
            spin <= '1';
        WHEN drain =>
            spin <= '1';
            pump <= '1';
    END CASE;
END PROCESS;

– Output logic function of current state only
Slide -218
State Machine Encoding Styles

State    Binary     Gray-Code   One-Hot    Custom
         Encoding   Encoding    Encoding   Encoding
Idle     000        000         00001      ?
Fill     001        001         00010      ?
Heat_w   010        011         00100      ?
Wash     011        010         01000      ?
Drain    100        110         10000      ?

Quartus II default encoding styles for Altera devices
- One-hot encoding for look-up table (LUT) devices
  Architecture features lesser fan-in per cell and an abundance of registers
- Binary (minimal bit) or gray-code encoding for product-term devices
  Architecture features fewer registers and greater fan-in
Slide -219
Quartus II Encoding Style

Apply Assignment to State Variable

Options:
• One-Hot
• Gray
• Minimal Bits
• Sequential
• User-Encoded
• Johnson
Slide -220
Undefined States
• Noise and spurious events in hardware can cause state machines to enter undefined states
• If state machines do not consider undefined states, it can cause mysterious “lock-ups” in hardware
• Good engineering practice is to consider these states
• To account for undefined states
  – Explicitly code for them (manual)
  – Use “safe” synthesis constraint (automatic)
Slide -221
‘Safe’ Binary State Machine?

TYPE state_type IS (idle, fill, heat_w, wash, drain);
SIGNAL current_state, next_state : state_type;

PROCESS (current_state, door_closed, full, heat_demand, done, empty)
BEGIN
    next_state <= current_state;
    CASE current_state is
        WHEN idle =>
            IF door_closed = '1' THEN
                next_state <= fill;
            END IF;
        WHEN fill =>
            IF full = '1' THEN
                next_state <= heat_w;
            END IF;
        WHEN heat_w =>
            IF heat_demand = '0' THEN
                next_state <= wash;
            END IF;
        WHEN wash =>
            IF heat_demand = '1' THEN
                next_state <= heat_w;
            ELSIF done = '1' THEN
                next_state <= drain;
            END IF;
        WHEN drain =>
            IF empty = '1' THEN
                next_state <= idle;
            END IF;
        WHEN others =>
            next_state <= idle;
    END CASE;
END PROCESS;

– This code does not consider undefined states
– The “when others” statement only considers other enumerated states
– The states “101”, “110” & “111” are not considered
Slide -222
Creating “Safe” State Machines
• WHEN OTHERS clause does not make state machines “safe”
  – Once state machine is recognized, synthesis tool only accounts for explicitly defined states
  – Exception: Number of states equals power of 2 AND binary/gray encoding enabled
• Safe state machines created using synthesis constraints
  – Quartus II software uses
    • SAFE STATE MACHINE assignment applied project-wide and to individual FSMs
    • VHDL synthesis attribute
  – May increase logic usage
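The "VHDL synthesis attribute" route could look like the sketch below. It assumes the Quartus II syn_encoding attribute accepts the value "safe", which should be checked against the Quartus II documentation for the tool version in use. The entity safe_fsm, its ports and its three states are hypothetical.

```vhdl
LIBRARY IEEE;
USE IEEE.STD_LOGIC_1164.ALL;

ENTITY safe_fsm IS
    PORT (
        clk, reset, go, done : IN  STD_LOGIC;
        busy                 : OUT STD_LOGIC
    );
END ENTITY safe_fsm;

ARCHITECTURE behave OF safe_fsm IS
    TYPE state_type IS (idle, active, cleanup);

    -- Assumption: "safe" asks the synthesis tool to add recovery logic
    -- for illegal state codes; verify in the tool documentation
    ATTRIBUTE syn_encoding : STRING;
    ATTRIBUTE syn_encoding OF state_type : TYPE IS "safe";

    SIGNAL current_state, next_state : state_type;
BEGIN
    -- Sequential state transitions
    PROCESS (clk, reset)
    BEGIN
        IF reset = '1' THEN
            current_state <= idle;
        ELSIF rising_edge(clk) THEN
            current_state <= next_state;
        END IF;
    END PROCESS;

    -- Combinatorial next state logic
    PROCESS (current_state, go, done)
    BEGIN
        next_state <= current_state;
        CASE current_state IS
            WHEN idle =>
                IF go = '1' THEN
                    next_state <= active;
                END IF;
            WHEN active =>
                IF done = '1' THEN
                    next_state <= cleanup;
                END IF;
            WHEN cleanup =>
                next_state <= idle;
        END CASE;
    END PROCESS;

    busy <= '1' WHEN current_state = active ELSE '0';
END ARCHITECTURE behave;
```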
Slide -223
Using Custom Encoding Styles
• Remove glitches without output registers
• Eliminate combinatorial output logic
• Outputs mimic state bits
  – Use additional state bits for states that do not have exclusive outputs

State    Water  Spin  Heat  Pump   Custom Encoding
Idle     0      0     0     0      0000
Fill     1      0     0     0      1000
Heat_w   0      1     1     0      0110
Wash     0      1     0     0      0100
Drain    0      1     0     1      0101
Slide -224
Quartus II Custom State Encoding

ENTITY wm IS
    PORT (
        clk, reset, door_closed, full : in  std_logic;
        heat_demand, done, empty      : in  std_logic;
        water, spin, heat, pump       : out std_logic
    );
END wm;

ARCHITECTURE behave OF wm IS
    TYPE state_type IS (idle, fill, heat_w, wash, drain);
    ATTRIBUTE syn_encoding : STRING;
    ATTRIBUTE syn_encoding OF state_type : TYPE IS "0000 1000 0110 0100 0101";
    SIGNAL current_state, next_state : state_type;
BEGIN

– Must also set State Machine Processing assignment to “User Encoded”
– Output assignments are coded per previous examples
  – Synthesis automatically handles reduction of output logic
– Some tools use VHDL attributes like enum_encoding OR syn_enum_encoding to perform custom state encoding
Slide -225
Writing Efficient State Machines
• Remove counting, timing, arithmetic functions from state machine & implement externally
• Reduces overall logic & improves performance
Slide -226
VHDL Logic Optimization & Performance
• Balancing operators
• Resource sharing
• Logic duplication
• Pipelining
Slide -227
Operators
• Synthesis tools replace operators with pre-defined (pre-optimized) blocks of logic
• Designer should control when & how many operators
  – Ex. Dividers
    • Dividers are large blocks of logic
    • Every ‘/’, mod and rem inserts a divider block and leaves it up to synthesis tool to optimize
    • Better resource optimization usually involves cleverly using multipliers or shift operations to do divide
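As an illustration of replacing a divider with a shift, the sketch below divides an unsigned 8-bit value by 8 by shifting right three places, using the numeric_std package. The entity name div_by_8 is hypothetical, and the trick only applies when the divisor is a constant power of two.

```vhdl
LIBRARY IEEE;
USE IEEE.STD_LOGIC_1164.ALL;
USE IEEE.NUMERIC_STD.ALL;

ENTITY div_by_8 IS
    PORT (
        a : IN  STD_LOGIC_VECTOR (7 DOWNTO 0);
        y : OUT STD_LOGIC_VECTOR (7 DOWNTO 0)
    );
END ENTITY div_by_8;

ARCHITECTURE logic OF div_by_8 IS
BEGIN
    -- Unsigned divide by 2**3: a right shift by 3 is only wiring, no divider block
    y <= STD_LOGIC_VECTOR(shift_right(UNSIGNED(a), 3));
END ARCHITECTURE logic;
```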
Slide -228
Generating Logic from Operators

IF (sel < 10) THEN
    y <= a + b;
ELSE
    y <= a + 10;
END IF;

– Synthesis tools break down code into logic blocks
  – 1 Comparator
  – 2 Adders
  – 1 Multiplexer
– They then assemble, optimize & map to hardware
Slide -229
Balancing Operators
• Use parentheses to define logic groupings
  – Increases performance
  – May increase utilization
  – Balances delay from all inputs to output
  – Circuit functionality unchanged

Unbalanced: z <= a * b * c * d
[Figure: three multipliers chained in series: ((a × b) × c) × d]

Balanced: z <= (a * b) * (c * d)
[Figure: a × b and c × d computed in parallel, then multiplied together]
Slide -230
Balancing Operators: Example
• a, b, c, d: 4-bit vectors

Unbalanced: z <= a * b * c * d
– Multipliers: 4 x 4, then 8 x 4, then 12 x 4 (16-bit result)
– Delay through 3 stages of multiply

Balanced: z <= (a * b) * (c * d)
– Multipliers: 4 x 4 and 4 x 4 in parallel, then 8 x 8 (16-bit result)
– Delay through 2 stages of multiply
Slide -231
Resource Sharing
• Reduces number of operators needed
  – Reduces area
• Two types
  – Sharing operators among mutually exclusive functions
  – Sharing common subexpressions
• Synthesis tools can perform automatic resource sharing
  – Feature can be enabled or disabled
Slide -232
Mutually Exclusive Operators

process (rst, clk)
    variable tmp_q : std_logic_vector(7 DOWNTO 0);
begin
    if rst = '0' then
        tmp_q := (OTHERS => '0');
    elsif rising_edge(clk) then
        if updn = '1' then
            tmp_q := tmp_q + 1;
        else
            tmp_q := tmp_q - 1;
        end if;
    end if;
    q <= tmp_q;
end process;

– Up/down counter
– 2 adders are mutually exclusive & can be shared (typically IF-THEN-ELSE with same operator in both choices)

[Figure: two adders (+1 and -1) feeding the registers that drive q; rst and clk control the registers]
Slide -233
Sharing Mutually Exclusive Operators

process (rst, clk)
    variable tmp_q : std_logic_vector(7 DOWNTO 0);
    variable dir   : integer range -1 to 1;
begin
    if rst = '0' then
        tmp_q := (OTHERS => '0');
    elsif rising_edge(clk) then
        if updn = '1' then
            dir := 1;
        else
            dir := -1;
        end if;
        tmp_q := tmp_q + dir;
    end if;
    q <= tmp_q;
end process;

– Up/down counter
– Only one adder required

[Figure: +1 or -1 is selected first and feeds a single adder ahead of the registers that drive q]
Slide -234
How Many Multipliers?

y <= a * b * c
z <= b * c * d
Slide -235
How Many Multipliers? (Answer)

y <= a * b * c
z <= b * c * d

4 Multipliers!

[Figure: y built as (a × b) × c and z built as (b × c) × d, with no multiplier shared]
Slide -236
How Many Multipliers Again?
y <= a * (b * c)
z <= (b * c) * d
Slide -237
How Many Multipliers Again? (Answer)

y <= a * (b * c)
z <= (b * c) * d

[Figure: one multiplier computes b * c; its output feeds a multiplier with a (output y) and a multiplier with d (output z)]

3 Multipliers!
– This is called sharing common subexpressions
– Some synthesis tools do this automatically, but some don't!
– Parentheses guide synthesis tools
– If (b*c) is used repeatedly, assign to temporary signal
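The temporary-signal suggestion can be sketched as follows. The entity name share_bc, the signal name bc, and the 4-bit unsigned ports from ieee.numeric_std are illustrative assumptions, not part of the original slides.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.all;
USE ieee.numeric_std.all;

-- Hypothetical entity: the product b * c is computed once and reused
ENTITY share_bc IS
    PORT (a, b, c, d : IN  unsigned(3 DOWNTO 0);
          y, z       : OUT unsigned(11 DOWNTO 0));
END ENTITY share_bc;

ARCHITECTURE rtl OF share_bc IS
    SIGNAL bc : unsigned(7 DOWNTO 0);  -- shared subexpression b * c
BEGIN
    bc <= b * c;   -- multiplier 1: 4 x 4 -> 8 bits
    y  <= a * bc;  -- multiplier 2: 4 x 8 -> 12 bits
    z  <= bc * d;  -- multiplier 3: 8 x 4 -> 12 bits
END ARCHITECTURE rtl;
```

Naming the shared product makes the intent explicit, so the result does not depend on whether a given synthesis tool detects the common subexpression on its own.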
Slide - 238
Topics
• PLD
  – PROM
  – PLA
  – PAL
  – CPLD
  – Programming PLD
  – ASIC
• FPGA Architecture
• Quartus Development software
• FPGA Programming Technology
• SRAM versus Antifuse FPGA
• EEPROM/Flash FPGA
• Xilinx FPGA Architecture
• FPGA basic building blocks
• FPGA Embedded Blocks
• FPGA Clocking Mechanism
• FPGA Family
• Altera Megafunctions
• FPGA Design flow
• Design phase
• Advanced VHDL Topics
• Simulation versus Synthesis
• Latches versus registers
• Common pitfalls
• Unwanted latches
• Case statement
• Variable versus signals
• Synthesizable subprograms
• Gated clocks
• Inferring Logic Functions
• Control Signal Priority
• Tri-state
• Memory
Slide - 239
Example - 1
• Explain the problem with gated clocks. How can you implement a gated clock in your design?
• Problems:
  – Cause of functional and timing problems
  – Source of additional clock skew
• To solve:
  – Use a synchronous gating structure
  – Ensure global clock routing is used for the clock signal
  – Gate the clock at the source
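A closely related and widely used alternative is to leave the clock ungated and use a synchronous clock enable on the register instead. The sketch below shows that alternative, which is not necessarily the gating structure the slide has in mind; the entity reg_en and the port en are hypothetical names.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.all;

-- Hypothetical register with a synchronous enable instead of a gated clock
ENTITY reg_en IS
    PORT (clk, en, d : IN  std_logic;
          q          : OUT std_logic);
END ENTITY reg_en;

ARCHITECTURE rtl OF reg_en IS
BEGIN
    PROCESS (clk)
    BEGIN
        IF rising_edge(clk) THEN
            IF en = '1' THEN   -- enable is sampled on the clock edge
                q <= d;        -- register updates only when enabled
            END IF;
        END IF;
    END PROCESS;
END ARCHITECTURE rtl;
```

Because clk itself is never combined with logic, it can stay on global clock routing and no extra skew is introduced.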
Slide - 240
Example - 2
• How many registers are inferred? Four registers.

ARCHITECTURE logic OF reg1 IS
    SIGNAL a, b, c : STD_LOGIC;
BEGIN
    PROCESS (clk)
    BEGIN
        IF rising_edge (clk) THEN
            a <= d;
            b <= a;
            c <= b;
            q <= c;
        END IF;
    END PROCESS;
END ARCHITECTURE logic;

• Use variables, which are updated immediately, as shown. One register now!

ARCHITECTURE logic OF reg1 IS
BEGIN
    PROCESS (clk)
        -- Variables must be declared inside the process
        VARIABLE a, b, c : STD_LOGIC;
    BEGIN
        IF rising_edge (clk) THEN
            a := d;
            b := a;
            c := b;
            q <= c;
        END IF;
    END PROCESS;
END ARCHITECTURE logic;
Slide - 241
Example - 3
• Explain the problem with the following code.
• Two drivers drive the same signal; use tri-state drivers so that only one source drives q at a time.

ARCHITECTURE beh OF example3 IS
BEGIN
    q <= d;
    q <= i;
END ARCHITECTURE beh;
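One possible tri-state version is sketched below. The select input sel and the entity name example3_tri are assumptions added for illustration; the original example does not say how the active driver is chosen.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.all;

-- Hypothetical fix: each driver releases the line ('Z') when not selected
ENTITY example3_tri IS
    PORT (d, i : IN  std_logic;
          sel  : IN  std_logic;   -- assumed select input, not in the original
          q    : OUT std_logic);
END ENTITY example3_tri;

ARCHITECTURE beh OF example3_tri IS
BEGIN
    -- std_logic is a resolved type, so two drivers are legal;
    -- at most one of them drives a non-'Z' value at any time
    q <= d WHEN sel = '0' ELSE 'Z';
    q <= i WHEN sel = '1' ELSE 'Z';
END ARCHITECTURE beh;
```

For signals that stay inside the device, many FPGA synthesis flows convert such tri-state buses into multiplexers, so the result may be equivalent to a simple 2-to-1 mux.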
Slide - 242
Example - 4
• Explain the problem with the following VHDL model.
• Fix it.

LIBRARY ieee;
USE IEEE.std_logic_1164.all;

ENTITY nolatch IS
    PORT (a, b, c : IN STD_LOGIC;
          sel : IN STD_LOGIC_VECTOR (4 DOWNTO 0);
          oput : OUT STD_LOGIC);
END nolatch;

ARCHITECTURE rtl OF nolatch IS
BEGIN
    PROCESS (a, b, c, sel)
    BEGIN
        IF sel = "00000" THEN
            oput <= a;
        ELSIF sel = "00001" THEN
            oput <= b;
        ELSIF sel = "00010" THEN
            oput <= c;
        END IF;
    END PROCESS;
END rtl;
Slide - 243
Example - 4
• Explain the problem with the following VHDL model.
• Unwanted latch: oput is not assigned for every value of sel. The code is updated to remove the unwanted latch.

LIBRARY ieee;
USE IEEE.std_logic_1164.all;

ENTITY nolatch IS
    PORT (a, b, c : IN STD_LOGIC;
          sel : IN STD_LOGIC_VECTOR (4 DOWNTO 0);
          oput : OUT STD_LOGIC);
END nolatch;

ARCHITECTURE rtl OF nolatch IS
BEGIN
    PROCESS (a, b, c, sel)
    BEGIN
        IF sel = "00000" THEN
            oput <= a;
        ELSIF sel = "00001" THEN
            oput <= b;
        ELSIF sel = "00010" THEN
            oput <= c;
        ELSE                -- Prevents latch inference
            oput <= 'X';
        END IF;
    END PROCESS;
END rtl;
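Another common way to avoid the latch is to give the output a default value at the top of the process, so every path through the IF assigns it. The sketch below assumes a default of '0' and uses the hypothetical entity name nolatch_default; neither comes from the slides.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.all;

-- Hypothetical variant: default assignment instead of a final ELSE
ENTITY nolatch_default IS
    PORT (a, b, c : IN  std_logic;
          sel     : IN  std_logic_vector(4 DOWNTO 0);
          oput    : OUT std_logic);
END ENTITY nolatch_default;

ARCHITECTURE rtl OF nolatch_default IS
BEGIN
    PROCESS (a, b, c, sel)
    BEGIN
        oput <= '0';               -- assumed default; overridden below when a branch matches
        IF sel = "00000" THEN
            oput <= a;
        ELSIF sel = "00001" THEN
            oput <= b;
        ELSIF sel = "00010" THEN
            oput <= c;
        END IF;
    END PROCESS;
END ARCHITECTURE rtl;
```

The last assignment made during a process run is the one that takes effect, so the default applies only when no branch matches.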
Slide -244
Pipelining
• Purposefully inserting register(s) into the middle of a combinatorial data (critical) path
• Increases clocking speed
• Adds levels of latency
  – More clock cycles needed to obtain output
• Some tools perform automatic pipelining
  – Same advantages/disadvantages as automatic fan-out
Slide -245
Adding Single Pipeline Stage
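[Figure: without pipelining, counter/state machine logic followed by a decode of value x forms a 40 ns path (25 MHz system). With one pipeline stage, the logic decodes value x-1 and the path is split into two 20 ns segments (50 MHz system).]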
Slide -246
Adding Single Pipeline Stage In VHDL

Non-Pipelined:

mult : PROCESS (clk, clr)
BEGIN
    IF (clr = '0') THEN
        atemp <= (OTHERS => '0');
        btemp <= (OTHERS => '0');
        ctemp <= (OTHERS => '0');
        dtemp <= (OTHERS => '0');
        result <= (OTHERS => '0');
    ELSIF rising_edge(clk) THEN
        atemp <= a;
        btemp <= b;
        ctemp <= c;
        dtemp <= d;
        result <= (atemp * btemp) * (ctemp * dtemp);
    END IF;
END PROCESS;

Pipelined:

mult_pipe : PROCESS (clk, clr)
BEGIN
    IF (clr = '0') THEN
        atemp <= (OTHERS => '0');
        btemp <= (OTHERS => '0');
        ctemp <= (OTHERS => '0');
        dtemp <= (OTHERS => '0');
        int1 <= (OTHERS => '0');
        int2 <= (OTHERS => '0');
        result <= (OTHERS => '0');
    ELSIF rising_edge(clk) THEN
        atemp <= a;
        btemp <= b;
        ctemp <= c;
        dtemp <= d;
        int1 <= atemp * btemp;   -- pipeline register
        int2 <= ctemp * dtemp;   -- pipeline register
        result <= int1 * int2;
    END IF;
END PROCESS;
Slide -247
Pipelined 4-input Multiplier
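[Figure: a * b and c * d computed by two multipliers, followed by pipeline registers and a final multiplier producing z]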
Slide -248
Parameterized Code
• Logic blocks that are made scalable for reuse
• Code is written for flexibility
  – Different configurations of same model
• 4 constructs
  – Pre-defined attributes
  – Generics
  – For generate
  – If generate
Slide -249
Pre-Defined Attributes
• Return information regarding associated object
• Object changes will automatically be reflected in returned values
• Uses
  – Improving readability of code
  – Creating parameterized models
    • Improve flexibility of code, especially using loops
    • Limit hard-coding logic resources
• Examples
  – Array attributes
  – Signal attributes (not discussed)
    • e.g. 'EVENT, 'STABLE
Slide -250
Pre-Defined Array Attributes
a : IN STD_LOGIC_VECTOR(7 DOWNTO 0)

• a'HIGH = 7
  – Upper bound of array index
• a'LOW = 0
  – Lower bound of array index
• a'RIGHT = 0
  – Right-most bound of array index
• a'LEFT = 7
  – Left-most bound of array index
• a'RANGE = 7 DOWNTO 0
  – Range declared for object, either TO or DOWNTO
• a'REVERSE_RANGE = 0 TO 7
  – Reverse of the range declared for object
• a'LENGTH = 8
  – Number of values in range index
• a'ASCENDING = FALSE
  – Returns TRUE if array range uses TO and FALSE if array range uses DOWNTO

– These array attributes are synthesizable
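To illustrate how array attributes keep code free of hard-coded bounds, here is a small sketch of a bit-reversal block that loops over a'RANGE. The entity bit_reverse and its ports are hypothetical and chosen only for this illustration.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.all;

-- Hypothetical example: reverse the bit order of a using array attributes
ENTITY bit_reverse IS
    PORT (a : IN  std_logic_vector(7 DOWNTO 0);
          y : OUT std_logic_vector(7 DOWNTO 0));
END ENTITY bit_reverse;

ARCHITECTURE rtl OF bit_reverse IS
BEGIN
    PROCESS (a)
    BEGIN
        -- a'RANGE follows the declared range, so the loop needs no literal bounds
        FOR i IN a'RANGE LOOP
            y(a'HIGH + a'LOW - i) <= a(i);  -- mirror index i around the middle
        END LOOP;
    END PROCESS;
END ARCHITECTURE rtl;
```

If the port widths are later changed (both ports together), the loop body does not need to be edited.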
Slide -251
Generics (Review)
• Used to pass information to an entity instance
  – Timing values (for simulation)
  – Scalable code

ENTITY reg_bank IS
    GENERIC (
        tplh, tphl : time := 5 ns;
        tphz, tplz : time := 3 ns;
        size : integer := 1
    );
    PORT (
        clk : IN std_logic;
        d : IN std_logic_vector (size - 1 DOWNTO 0);
        q : OUT std_logic_vector (size - 1 DOWNTO 0)
    );
END ENTITY reg_bank;
Slide -252
Parameterized Counter
LIBRARY ieee;
USE ieee.std_logic_1164.all;
USE ieee.std_logic_unsigned.all;   -- provides "+" for std_logic_vector

ENTITY counter IS
    GENERIC (width : INTEGER);
    PORT (
        clk, clr, sload, cnt_en : IN std_logic;
        data : IN std_logic_vector (width - 1 DOWNTO 0);
        q : OUT std_logic_vector (width - 1 DOWNTO 0));
END ENTITY counter;

ARCHITECTURE logic OF counter IS
BEGIN
    PROCESS (clk, clr)
        VARIABLE cnt : std_logic_vector (width - 1 DOWNTO 0);
    BEGIN
        IF clr = '1' THEN
            cnt := (OTHERS => '0');
        ELSIF rising_edge(clk) THEN
            IF sload = '1' THEN
                cnt := data;
            ELSIF cnt_en = '1' THEN
                cnt := cnt + 1;
            END IF;
        END IF;
        q <= cnt;
    END PROCESS;
END ARCHITECTURE logic;

Generic width used to scale counter
Slide -253
Using A Parameterized Function
• Must map to generics & ports
• Generic & port resolution done at compile time

u1 : counter
    GENERIC MAP (width => 16)
    PORT MAP (clk => tclk, clr => tclr, sload => tsload,
              cnt_en => tcnt_en, data => tdata, q => tq);

[Figure: top_counter containing a 16-bit counter instance; counter ports clk, clr, cnt_en, sload, data, q connect to top-level signals tclk, tclr, tcnt_en, tsload, tdata, tq]
Slide -254
Complete Code

LIBRARY IEEE;
USE IEEE.std_logic_1164.all;
USE IEEE.std_logic_arith.all;

ENTITY top_counter IS
    PORT (
        tclk, tclr, tsload, tcnt_en : IN std_logic;
        tdata : IN std_logic_vector (15 DOWNTO 0);
        tq : OUT std_logic_vector (15 DOWNTO 0));
END ENTITY top_counter;

ARCHITECTURE logic OF top_counter IS
    COMPONENT pcounter
        GENERIC (width : INTEGER);
        PORT (
            clk, clr, sload, cnt_en : IN std_logic;
            data : IN std_logic_vector (width - 1 DOWNTO 0);
            q : OUT std_logic_vector (width - 1 DOWNTO 0));
    END COMPONENT;
BEGIN
    u1 : pcounter
        GENERIC MAP (width => 16)
        PORT MAP (clk => tclk, clr => tclr, sload => tsload,
                  cnt_en => tcnt_en, data => tdata, q => tq);
END ARCHITECTURE logic;
Slide -255
Generate Statements
• Used to create structural blocks
• Resolved at compile time
• Reduce amount of code
• Can be nested
• For-generate
  – Creates zero or a set number of duplicates of a structure
  – No need to individually instantiate each duplicate
• If-generate
  – Conditionally selects whether zero or one structure is made
Slide -256
For-Generate
• Syntax

label : FOR <identifier> IN <range> GENERATE
    --concurrent statements
END GENERATE label;

• Sets the number of structures created
• Similar to FOR loop
  – Can only use concurrent statements
• Label is required
Slide -
PARITY: Block Diagram
257
Slide -
PARITY: Entity Declaration
LIBRARY ieee;
USE ieee.std_logic_1164.all;

ENTITY parity IS
    PORT (
        parity_in : IN STD_LOGIC_VECTOR(7 DOWNTO 0);
        parity_out : OUT STD_LOGIC
    );
END parity;
258
Slide -
PARITY: Block Diagram
[Figure: chain of XOR gates with intermediate signals xor_out(1) through xor_out(6)]
259
Slide -
PARITY: Architecture

ARCHITECTURE parity_dataflow OF parity IS
    SIGNAL xor_out : std_logic_vector (6 DOWNTO 1);
BEGIN
    xor_out(1) <= parity_in(0) XOR parity_in(1);
    xor_out(2) <= xor_out(1) XOR parity_in(2);
    xor_out(3) <= xor_out(2) XOR parity_in(3);
    xor_out(4) <= xor_out(3) XOR parity_in(4);
    xor_out(5) <= xor_out(4) XOR parity_in(5);
    xor_out(6) <= xor_out(5) XOR parity_in(6);
    parity_out <= xor_out(6) XOR parity_in(7);
END parity_dataflow;
260
Slide -
PARITY: Block Diagram (2)
[Figure: chain of XOR gates with intermediate signals xor_out(0) through xor_out(7)]
261
Slide -
PARITY: Architecture

ARCHITECTURE parity_dataflow OF parity IS
    SIGNAL xor_out : STD_LOGIC_VECTOR (7 DOWNTO 0);
BEGIN
    xor_out(0) <= parity_in(0);
    xor_out(1) <= xor_out(0) XOR parity_in(1);
    xor_out(2) <= xor_out(1) XOR parity_in(2);
    xor_out(3) <= xor_out(2) XOR parity_in(3);
    xor_out(4) <= xor_out(3) XOR parity_in(4);
    xor_out(5) <= xor_out(4) XOR parity_in(5);
    xor_out(6) <= xor_out(5) XOR parity_in(6);
    xor_out(7) <= xor_out(6) XOR parity_in(7);
    parity_out <= xor_out(7);
END parity_dataflow;
262
Slide -
PARITY: Architecture (2)

ARCHITECTURE parity_dataflow OF parity IS
    SIGNAL xor_out : STD_LOGIC_VECTOR (7 DOWNTO 0);
BEGIN
    xor_out(0) <= parity_in(0);

    G2: FOR i IN 1 TO 7 GENERATE
        xor_out(i) <= xor_out(i-1) XOR parity_in(i);
    END GENERATE G2;

    parity_out <= xor_out(7);
END parity_dataflow;
263
Slide -
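Example – 16X1 Mux

[Figure: 16-to-1 multiplexer built from five 4-to-1 multiplexers. Four first-level muxes select among w0..w3, w4..w7, w8..w11 and w12..w15 using s1 and s0; a second-level mux selects among their outputs using s3 and s2 to produce f.]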
264
Slide -
A 4-to-1 Multiplexer

LIBRARY ieee ;
USE ieee.std_logic_1164.all ;

ENTITY mux4to1 IS
    PORT ( w0, w1, w2, w3 : IN STD_LOGIC ;
           s : IN STD_LOGIC_VECTOR(1 DOWNTO 0) ;
           f : OUT STD_LOGIC ) ;
END mux4to1 ;

ARCHITECTURE Dataflow OF mux4to1 IS
BEGIN
    WITH s SELECT
        f <= w0 WHEN "00",
             w1 WHEN "01",
             w2 WHEN "10",
             w3 WHEN OTHERS ;
END Dataflow ;
265
Slide -
Straightforward code for 16X1 Mux

LIBRARY ieee ;
USE ieee.std_logic_1164.all ;

ENTITY Example1 IS
    PORT ( w : IN STD_LOGIC_VECTOR(0 TO 15) ;
           s : IN STD_LOGIC_VECTOR(3 DOWNTO 0) ;
           f : OUT STD_LOGIC ) ;
END Example1 ;
266
Slide -
Straightforward code for 16X1 Mux
ARCHITECTURE Structure OF Example1 IS
    COMPONENT mux4to1
        PORT ( w0, w1, w2, w3 : IN STD_LOGIC ;
               s : IN STD_LOGIC_VECTOR(1 DOWNTO 0) ;
               f : OUT STD_LOGIC ) ;
    END COMPONENT ;

    SIGNAL m : STD_LOGIC_VECTOR(0 TO 3) ;
BEGIN
    Mux1: mux4to1 PORT MAP ( w(0), w(1), w(2), w(3), s(1 DOWNTO 0), m(0) ) ;
    Mux2: mux4to1 PORT MAP ( w(4), w(5), w(6), w(7), s(1 DOWNTO 0), m(1) ) ;
    Mux3: mux4to1 PORT MAP ( w(8), w(9), w(10), w(11), s(1 DOWNTO 0), m(2) ) ;
    Mux4: mux4to1 PORT MAP ( w(12), w(13), w(14), w(15), s(1 DOWNTO 0), m(3) ) ;
    Mux5: mux4to1 PORT MAP ( m(0), m(1), m(2), m(3), s(3 DOWNTO 2), f ) ;
END Structure ;
267
Slide -
Modified code for 16X1 Mux

ARCHITECTURE Structure OF Example1 IS
    COMPONENT mux4to1
        PORT ( w0, w1, w2, w3 : IN STD_LOGIC ;
               s : IN STD_LOGIC_VECTOR(1 DOWNTO 0) ;
               f : OUT STD_LOGIC ) ;
    END COMPONENT ;

    SIGNAL m : STD_LOGIC_VECTOR(0 TO 3) ;
BEGIN
    G1: FOR i IN 0 TO 3 GENERATE
        Muxes: mux4to1 PORT MAP (
            w(4*i), w(4*i+1), w(4*i+2), w(4*i+3), s(1 DOWNTO 0), m(i) ) ;
    END GENERATE ;
    Mux5: mux4to1 PORT MAP ( m(0), m(1), m(2), m(3), s(3 DOWNTO 2), f ) ;
END Structure ;
268
Slide -
Example – 4X16 Decoder

[Figure: 4-to-16 decoder built from five 2-to-4 decoders. One decoder takes w3, w2 and En and enables one of four second-level decoders; each second-level decoder takes w1, w0 and drives y0..y3, y4..y7, y8..y11 or y12..y15.]
269
Slide -
A 2-to-4 binary decoder

LIBRARY ieee ;
USE ieee.std_logic_1164.all ;

ENTITY dec2to4 IS
    PORT ( w : IN STD_LOGIC_VECTOR(1 DOWNTO 0) ;
           En : IN STD_LOGIC ;
           y : OUT STD_LOGIC_VECTOR(0 TO 3) ) ;
END dec2to4 ;

ARCHITECTURE Dataflow OF dec2to4 IS
    SIGNAL Enw : STD_LOGIC_VECTOR(2 DOWNTO 0) ;
BEGIN
    Enw <= En & w ;
    WITH Enw SELECT
        y <= "1000" WHEN "100",
             "0100" WHEN "101",
             "0010" WHEN "110",
             "0001" WHEN "111",
             "0000" WHEN OTHERS ;
END Dataflow ;
270
Slide -
VHDL code for 4X16 decoder
LIBRARY ieee ;
USE ieee.std_logic_1164.all ;

ENTITY dec4to16 IS
    PORT ( w : IN STD_LOGIC_VECTOR(3 DOWNTO 0) ;
           En : IN STD_LOGIC ;
           y : OUT STD_LOGIC_VECTOR(0 TO 15) ) ;
END dec4to16 ;
271
Slide -
VHDL code for 4X16 decoder (2)

ARCHITECTURE Structure OF dec4to16 IS
    COMPONENT dec2to4
        PORT ( w : IN STD_LOGIC_VECTOR(1 DOWNTO 0) ;
               En : IN STD_LOGIC ;
               y : OUT STD_LOGIC_VECTOR(0 TO 3) ) ;
    END COMPONENT ;

    SIGNAL m : STD_LOGIC_VECTOR(0 TO 3) ;
BEGIN
    G1: FOR i IN 0 TO 3 GENERATE
        Dec_ri: dec2to4 PORT MAP ( w(1 DOWNTO 0), m(i), y(4*i TO 4*i+3) ) ;
    END GENERATE ;
    Dec_left: dec2to4 PORT MAP ( w(3 DOWNTO 2), En, m ) ;
END Structure ;
272
Slide -273
If Generate
• Syntax

label : IF <expression> GENERATE
    --concurrent statements
END GENERATE label;

• Condition controls whether a structure is created
• Can only use concurrent statements
• Label is required
Slide -
Example 1
• Based on Lshift, either a left-shift register or a right-shift register is generated.
• If Lshift is true: N-bit left-shift register
• If false: right-shift register
274

entity shift_reg is
    generic(N: positive := 4; Lshift: Boolean := true); -- generic parameters used
    port(D: in bit_vector(N downto 1); -- named association
         Qout: out bit_vector(N downto 1);
         CLK, Ld, Sh, Shiftin: in bit);
end shift_reg;

architecture SRN of shift_reg is
    signal Q, shifter: bit_vector(N downto 1);
begin
    Qout <= Q;

    genLS: if Lshift generate -- conditional generate of left shift register
        shifter <= Q(N-1 downto 1) & Shiftin;
    end generate;

    genRS: if not Lshift generate -- conditional generate of right shift register
        shifter <= Shiftin & Q(N downto 2);
    end generate;

    process(CLK)
    begin
        if CLK'event and CLK = '1' then
            if Ld = '1' then
                Q <= D;
            elsif Sh = '1' then
                Q <= shifter;
            end if;
        end if;
    end process;
end SRN;
Slide -275
If Generate Example 2

ENTITY counter IS
    GENERIC (width : INTEGER; rise_or_fall : INTEGER);
    PORT (
        clk, clr, sload, cnt_en : IN std_logic;
        data : IN std_logic_vector (width - 1 DOWNTO 0);
        q : OUT std_logic_vector (width - 1 DOWNTO 0));
END ENTITY counter;

ARCHITECTURE logic OF counter IS
    SIGNAL clk_buf : std_logic;
BEGIN
    clock : IF (rise_or_fall > 0) GENERATE
        clk_buf <= clk;
    END GENERATE;

    not_clock : IF (rise_or_fall <= 0) GENERATE
        clk_buf <= NOT clk;
    END GENERATE;

    PROCESS (clk_buf, clr)
        VARIABLE cnt : INTEGER RANGE 0 TO (2**width)-1;
    BEGIN
        IF clr = '1' THEN
            cnt := 0;
        ELSIF rising_edge(clk_buf) THEN
            IF sload = '1' THEN
                cnt := conv_integer(data);
            ELSIF cnt_en = '1' THEN
                cnt := cnt + 1;
            END IF;
        END IF;
        q <= conv_std_logic_vector(cnt, width);
    END PROCESS;
END ARCHITECTURE logic;

Instantiation in the top level (rise_or_fall selects rising or falling edge clock behavior):

    u1 : pcounter3
        GENERIC MAP (width => 16, rise_or_fall => 0)
        PORT MAP (clk => tclk, clr => tclr,
                  sload => tsload, data => tdata,
                  cnt_en => tcnt_en, q => tq);
END ARCHITECTURE logic;

– One code slice can implement both a rising & falling edge counter
– Different (& better) than using IF-THEN-ELSE
– No clock mux is created; either clock inversion is implemented or it is not
FPGA Design Validation: Simulation & Design Verification
Slide -
FPGA Design Flow
• Requirements:
  – Provided by customer or generated internally
  – May be ambiguous
  – Little or no implementation details
  – The customer (internal or external) may not know exactly what they want or what is possible

[Figure: design flow stages: Requirements, Specifications, Design Implementation, Design Verification, Product Delivery]
277
Slide -
FPGA Design Flow
• Specification:
  – Identify what the requirements mean
  – Narrow the requirements to specifics
    • Design blocks
    • Components
    • Input/Output
    • What the design should and shouldn't do

[Figure: design flow stages: Requirements, Specifications, Design Implementation, Design Verification, Product Delivery]
278
Slide -
Design Implementation
• Synthesis of specification into a workable design
• Other names: Design Engineering Cycle
• Initially iterative

[Figure: iterative cycle: Design, Test, Examine Results, Modify Specification]
279
Slide -
Design Verification
• Verify that your design functions according to the specification
• A complete specification will cover all cases
• A poor specification is not an excuse for a sloppy design
280
Slide -
Product Delivery
• Output of the design process:
  – A product that performs according to the provided requirements
  – Internally/mutually developed specification
  – Verification of the performance to the specification
• Documentation of due diligence
  – Documentation and customer acknowledgement of all known design faults
    • Assessment of risk severity
    • DFMEA
281
Slide -
DFMEA
• Design Failure Mode and Effects Analysis
• Basics:
  – Identify all possible design failure modes
  – Assign a severity to each failure mode
  – Assess the risk (probability) of this type of failure
  – For all failure modes above a certain severity/probability, develop a mitigation plan
  – Assign test criteria based on failure mode
282
Slide -
Product Development for FPGAs
• Simulation and Design Verification
• Crucial part of the design process
• FPGAs are neither purely hardware nor purely software
• Hardware:
  – Deterministic
  – My schematic is my schematic
• Software:
  – Non-deterministic
  – Different compilers may produce different operations for the same high-level program
283
Slide -
Product Development for FPGAs – What is Simulation
• Simulation is the process of applying stimulus or inputs that mimic actual data to the design and observing the output.
• Input to simulation phase:
  – Design
  – Synthesis netlist
  – Implementation netlist
284

Simulation Phase Inputs and Outputs
[Figure: the development phases (Design, Synthesis, Implementation) supply the original design, the synthesized netlist, and the placed & routed netlist to Simulation. Simulation outputs: listing, graphical, pass/fail, resultant file.]

Example listing output:
Clk  Reset  Input1
0    1      0
1    1      0
0    0      1
1    0      1
0    0      0
1    0      0
Slide -
Product Development for FPGAs – Simulation Tools
• Editor to create the inputs
  – Text editor
  – Graphical editor
• Simulator: compiles or connects the test inputs to the design, causing outputs to change based on input data.
• Input to simulation phase:
  – Design
  – Synthesis netlist
  – Implementation netlist
• Example: Mentor Graphics
285

Design and Test Input Compile Flow
[Figure: Design and Test Inputs are compiled together, then run]
Slide -
VHDL Design Validation
• Levels of Simulation
  – Register Transfer Level (RTL)
  – Functional
  – Gate Level
286

Simulation Levels
[Figure: Design feeds RTL simulation; Synthesis feeds functional simulation; Implementation feeds gate-level simulation. After each simulation level the design may be edited and recompiled:
  – RTL: edit design to correct logic errors, change design &...
  – Functional: edit design to correct synthesis & other errors, change logic & ...; then re-synthesize
  – Gate-level: edit design to correct timing & other errors, change logic & ...; then re-implement]
Slide -
RTL Simulation
• Checks for logic and syntax errors
• Does the design work on the target hardware?
• Will it compile?
• Contains no timing evaluation
287
Slide -
Functional Simulation
• Performed on netlist or code generated by synthesis tool
• Sometimes necessary to direct synthesis tool to provide netlist
• Initial timing analysis
• Will the synthesized design fit or work on the target hardware?
288
Slide -
Gate Level Simulation
• Performed on the netlist generated by the implementation tool
• Contains actual timing information
  – Representative of hardware
  – Most realistic
  – Detects design timing problems
289
Slide -
Simulation in the Design Process
• Complete RTL simulation
  – Does the design function/compile?
• Complete functional simulation
  – Will it function on the target hardware?
• Gate level simulation
  – Will it work as expected over all operational conditions?
• A failure at any of these levels requires the other steps to be revisited
290
Slide -
Developing an RTL Simulation
• Identify inputs/outputs
• Identify test cases
• For each test case, develop a vector waveform
• Run each test case and verify output
• Should hit every area of your design
• Test cases are referred to as stimulus
291
Slide -
Vector Waveform Files (VWF)
292
Slide -
Functional Simulation
• Verify the functional operation
• Expand on RTL simulation
• Include some timing variation
  – Looking for timing hazards
• VWF may include timing variations
  – Pulse width
  – Pulse spacing
293
Slide -
Gate Level Simulation
• A full timing analysis including hardware effects
• Repeat of functional simulation
294
Slide -
Hardware Verification
• Stimuli developed in simulation can be supplied to a hardware test case generator
• Build and program target hardware
  – FPGA level
  – Board level
  – System level
295
FPGA Design Validation: Simulation & Design Verification
Slide -
FPGA Design Flow
• Simulation:
  – RTL
  – Functional
  – Gate Level

[Figure: design flow stages: Requirements, Specifications, Design Implementation, Design Verification, Product Delivery]
297
Slide -
Simulation in the design process
• Good practice to return at least to functional simulation before approving design changes
• Gate level simulation involving multiple timing cases can be time consuming
298
Slide -
Stimulus
• Test cases/Stimulus:
  – One test case for each condition

Test Case   Input 1   Input 2   Q
1           Wide      N/A       Low
2           Default   Short     Low
3           Wide      Wide      High
…and so on
299
Slide -
Choosing a simulation tool
• Hardcore:
  – Develop HDL
  – Company-specific automated script generation tools
• IDE: Development Toolchain
  – ModelSim
    • Mentor Graphics
300
Slide -301
Introduction to Testbenches
• Purpose of testbench
• Three classes of traditional testbenches
• General testbench methods
• Self verification methods
• Arrays for stimulus & results
• TEXTIO for stimulus & results
Slide -302
Purpose of Testbench
• Generate stimulus to test design for normal transactions, corner cases and error conditions
  – Direct tests
  – Random tests
• Automatically verify design to spec and log all errors
  – Regression tests
• Log transactions in a readable format for easy debugging
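As a minimal sketch of automatic checking and logging, a testbench can use ASSERT and REPORT statements after applying each stimulus. The entity and2_tb and the inline AND gate standing in for a real design are assumptions made to keep the example self-contained.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.all;

ENTITY and2_tb IS   -- hypothetical testbench entity, no ports
END ENTITY and2_tb;

ARCHITECTURE sim OF and2_tb IS
    SIGNAL x, y, f : std_logic := '0';
BEGIN
    -- Stand-in for the design under test (assumed for illustration)
    f <= x AND y;

    check : PROCESS
    BEGIN
        -- Direct test 1
        x <= '1'; y <= '0';
        WAIT FOR 10 ns;
        ASSERT f = '0'
            REPORT "f should be '0' when y = '0'" SEVERITY error;

        -- Direct test 2
        x <= '1'; y <= '1';
        WAIT FOR 10 ns;
        ASSERT f = '1'
            REPORT "f should be '1' when x = y = '1'" SEVERITY error;

        REPORT "and2_tb finished" SEVERITY note;  -- readable log entry
        WAIT;                                     -- stop the process
    END PROCESS;
END ARCHITECTURE sim;
```

A failed assertion prints its message with the simulation time, so the log alone is usually enough to locate the failing case.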
Slide -303
Three Classes of Traditional Testbenches
I. Test bench applies stimulus to target code and outputs are manually reviewed
II. Test bench applies stimulus to target code and verifies outputs functionally
  • Requires static timing analysis
III. Test bench applies stimulus to target code and verifies outputs with timing
  • Does not require full static timing analysis
  • Code and test bench data more complex
  • Not covered
Slide -304
Advantages/Disadvantages

Class I
  Advantages:
    • Simple to write
  Disadvantages:
    • Requires manual verification
    • Takes longer for others (not original designer) to verify
    • Easy for others to miss errors
  Recommendation:
    • Great for verifying simple code
    • Not intended for re-use

Class II
  Advantages:
    • Easy to perform verification once complete
    • "Set and forget it"
  Disadvantages:
    • Takes longer to write
    • More difficult to debug initially
  Recommendation:
    • Better for more complicated designs, designs with complicated stimulus/outputs and higher-level designs
    • Promotes re-usability

Class III
  Advantages:
    • Most in-depth
    • "Guarantees" design operation, if successful (subject to model accuracy)
  Disadvantages:
    • Takes longest to write
    • Most difficult to debug
    • Physical changes (i.e. target device, process) require changing testbench
  Recommendation:
    • Might be overkill for many FPGA designs
    • Required for non-Altera ASIC designs
Slide -305
General Testbench Methods
• Create "test harness" code to instantiate the device under test (DUT) or target code
• Create stimulus signals to connect to DUT
• Single process to control each signal

[Figure: mycode_tb.vhd wraps mycode.vhd (the DUT). clk_assignment drives clk, reset_assignment drives rst, datagen_process drives in1, in2, in3; DUT outputs are out1 and out2.]
Slide -306
Test Vector Generation
• Develop sequence of fixed input values
• Test vector development from bottom up
  – Write basic tasks
  – Write more complex tasks based on basic tasks
  – Perform tests
• Example – memory testing
  – Basic tasks: readmem, writemem
  – 2nd level tasks: initmem, copymem, comparemem
  – Generation of tests based on tasks
Slide -
Testbench Anatomy

ENTITY my_entity_tb IS
    --TB entity has no ports
END my_entity_tb;

ARCHITECTURE behavioral OF my_entity_tb IS

    --Local signals and constants

    COMPONENT TestComp --All Design Under Test component declarations
        PORT ( );
    END COMPONENT;
-----------------------------------------------------
BEGIN
    DUT: TestComp PORT MAP ( -- Instantiations of DUTs
    );

    testSequence: PROCESS
        -- Input stimuli
    END PROCESS;

END behavioral;
307
Slide -
Testbench for XOR3 (1)
LIBRARY ieee;
USE ieee.std_logic_1164.all;

ENTITY xor3_tb IS
END xor3_tb;

ARCHITECTURE behavioral OF xor3_tb IS
    -- Component declaration of the tested unit
    COMPONENT xor3
        PORT (
            A : IN STD_LOGIC;
            B : IN STD_LOGIC;
            C : IN STD_LOGIC;
            Result : OUT STD_LOGIC );
    END COMPONENT;

    -- Stimulus signals - signals mapped to the input and inout ports of tested entity
    SIGNAL test_vector : STD_LOGIC_VECTOR(2 DOWNTO 0);
    SIGNAL test_result : STD_LOGIC;
308
Slide -
Testbench for XOR3 (2)

BEGIN
    UUT : xor3
        PORT MAP (
            A => test_vector(2),
            B => test_vector(1),
            C => test_vector(0),
            Result => test_result);

    Testing: PROCESS
    BEGIN
        test_vector <= "000"; WAIT FOR 10 ns;
        test_vector <= "001"; WAIT FOR 10 ns;
        test_vector <= "010"; WAIT FOR 10 ns;
        test_vector <= "011"; WAIT FOR 10 ns;
        test_vector <= "100"; WAIT FOR 10 ns;
        test_vector <= "101"; WAIT FOR 10 ns;
        test_vector <= "110"; WAIT FOR 10 ns;
        test_vector <= "111"; WAIT FOR 10 ns;
    END PROCESS;
END behavioral;
309
Slide -
Generating selected values of one input
    SIGNAL test_vector : STD_LOGIC_VECTOR(2 DOWNTO 0);
BEGIN
    .......
    testing: PROCESS
    BEGIN
        test_vector <= "000"; WAIT FOR 10 ns;
        test_vector <= "001"; WAIT FOR 10 ns;
        test_vector <= "010"; WAIT FOR 10 ns;
        test_vector <= "011"; WAIT FOR 10 ns;
        test_vector <= "100"; WAIT FOR 10 ns;
    END PROCESS;
    ........
END behavioral;
310
Slide -
Generating all values of one input
    SIGNAL test_vector : STD_LOGIC_VECTOR(3 DOWNTO 0) := "0000";
BEGIN
    .......
    testing: PROCESS
    BEGIN
        WAIT FOR 10 ns;
        -- "+" on STD_LOGIC_VECTOR requires ieee.std_logic_unsigned
        test_vector <= test_vector + 1;
    END PROCESS testing;
    ........
END behavioral;
311
Slide -
Generating periodical signals, such as clocks
    CONSTANT clk1_period : TIME := 20 ns;
    CONSTANT clk2_period : TIME := 200 ns;
    SIGNAL clk1 : STD_LOGIC;
    SIGNAL clk2 : STD_LOGIC := '0';
BEGIN
    .......
    clk1_generator: PROCESS
    BEGIN
        clk1 <= '0';
        WAIT FOR clk1_period/2;
        clk1 <= '1';
        WAIT FOR clk1_period/2;
    END PROCESS;

    clk2 <= NOT clk2 AFTER clk2_period/2;
    .......
END behavioral;
312
Slide -
Generating one-time signals, such as resets
    CONSTANT reset1_width : TIME := 100 ns;
    CONSTANT reset2_width : TIME := 150 ns;
    SIGNAL reset1 : STD_LOGIC;
    SIGNAL reset2 : STD_LOGIC := '1';
BEGIN
    .......
    reset1_generator: PROCESS
    BEGIN
        reset1 <= '1';
        WAIT FOR reset1_width;
        reset1 <= '0';
        WAIT;
    END PROCESS;

    reset2_generator: PROCESS
    BEGIN
        WAIT FOR reset2_width;
        reset2 <= '0';
        WAIT;
    END PROCESS;
    .......
END behavioral;
313
Slide -314
Concurrent Statements
• Signals with regular or limited transitions can be created with concurrent statements
• These statements can begin a testbench and reside outside any processes

[Figure: CLK and RESET waveforms from 0 to 55 ns. CLK starts low and toggles every 10 ns; RESET is high until 20 ns, low from 20 ns to 40 ns, then high.]

ARCHITECTURE logic OF test_b IS

    -- Use clkperiod constant to create 50 MHz clock
    CONSTANT clkperiod : TIME := 20 ns;

    -- clk initialized to '0'
    SIGNAL clk : std_logic := '0';

    SIGNAL reset : std_logic;

BEGIN

    -- clock must be initialized when declared to use
    -- this notation
    clk <= NOT clk AFTER clkperiod/2;

    reset <= '1', '0' AFTER 20 ns, '1' AFTER 40 ns;

END ARCHITECTURE logic;
Slide -315
Sequential Statements
• More complex combinations can be created using sequential statements (i.e. LOOP, WAIT, IF-THEN, CASE)
  – Statements dependent on clock edges
  – Multiple processes & loops executing at once

clkgen: PROCESS -- Another clock generation example
    CONSTANT clkperiod : TIME := 20 ns;
BEGIN
    clk <= '0';        -- Initialize clock
    WAIT FOR 500 ns;   -- Delay clock for 500 ns
    LOOP               -- Infinite loop to create free-running clock
        clk <= '1';
        WAIT FOR clkperiod/2;
        clk <= '0';
        WAIT FOR clkperiod/2;
    END LOOP;
END PROCESS clkgen;

buscount: PROCESS (clk) -- Generate counting pattern
BEGIN
    IF rising_edge (clk) THEN
        inbus <= count;
        count <= count + 1;
    END IF;
END PROCESS buscount;
Slide -316
Sequential Statements (cont.)
• Example shows more complex stimulus generation
• Process uses WAIT statements and therefore has no sensitivity list (a process cannot have both); this style is intended for simulation only
(uses IEEE.numeric_std.all)

bus_gray: PROCESS
    CONSTANT buswidth : INTEGER := 16;
BEGIN
    inbus <= (OTHERS => '0');
    FOR n IN 0 TO 2**buswidth - 1 LOOP
        -- Gray code of n: n XOR (n shifted right by one)
        inbus <= TO_UNSIGNED(n, buswidth) XOR
                 shift_right(TO_UNSIGNED(n, buswidth), 1);
        WAIT UNTIL rising_edge(clk);
    END LOOP;
END PROCESS;
Slide -317
Sample VHDL Class I Testbench

LIBRARY ieee;
USE ieee.std_logic_1164.all;
USE ieee.std_logic_unsigned.all;

ENTITY addtest IS -- Top-level entity with no ports
END ENTITY addtest;

ARCHITECTURE stimulus OF addtest IS

    -- Declare design being tested
    COMPONENT adder
        PORT (
            clk : IN std_logic;
            a, b : IN std_logic_vector(3 DOWNTO 0);
            sum : OUT std_logic_vector(3 DOWNTO 0));
    END COMPONENT;

    -- Signals to assign values and observe results
    SIGNAL a, b, sum : std_logic_vector(3 DOWNTO 0);
    SIGNAL clk : std_logic := '0';

    -- Constants for timing values
    CONSTANT clkperiod : TIME := 20 ns;

BEGIN

    -- Create clock to synchronize actions
    clk <= NOT clk AFTER clkperiod/2;

    -- Instantiate design being tested
    add1 : adder PORT MAP (
        clk => clk, a => a, b => b, sum => sum);

    -- Process to generate stimulus; note operations
    -- take place on inactive clock edge
    PROCESS
        CONSTANT period : TIME := 40 ns;
        VARIABLE ina, inb : std_logic_vector(3 DOWNTO 0);
    BEGIN
        WAIT UNTIL falling_edge (clk);
        ina := (OTHERS => '0');
        inb := (OTHERS => '0');

        stim_loop: LOOP
            -- Apply generated stimulus to inputs
            a <= ina;
            b <= inb;
            WAIT FOR period;

            -- Exit loop once simulation reaches 1 us
            EXIT stim_loop WHEN NOW > 1 us;

            -- Use equations below to generate new stimulus
            -- values
            WAIT UNTIL falling_edge (clk);
            ina := ina + 2;
            inb := inb + 3;
        END LOOP stim_loop;

        -- Final wait to keep process from repeating
        WAIT;
    END PROCESS;
END ARCHITECTURE stimulus;
Slide -318
Example Results
Slide - 319
Topics – Exam II
• State Machine Coding
• VHDL Logic Optimization & Performance
  – Balancing operators
  – Resource Sharing
  – Pipelining
• Parameterized Code
  – Constructs
    • Pre-Defined Attributes
    • Generics
    • For Generate
    • If Generate
• Simulation
  – RTL Simulation
  – Functional Simulation
  – Gate Level Simulation
• Testbenches
  – Classes of Testbenches: Advantages and Disadvantages
  – Test Vector Generation
Slide - 320
Example - 1
• Explain the one-hot encoding used by Altera's Quartus. Show how you can encode the following 5 states.
  • State 0
  • State 1
  • State 2
  • State 3
  • State 4
Slide - 321
Example - 1
• Explain the one-hot encoding used by Altera's Quartus. Show how you can encode the following 5 states.
• One-hot encoding: the default encoding style, requiring N bits, where N is the number of enumeration literals in the enumeration type.

State     Encoding
State 0   0 0 0 0 1
State 1   0 0 0 1 0
State 2   0 0 1 0 0
State 3   0 1 0 0 0
State 4   1 0 0 0 0
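When states are declared as an enumeration type, the synthesis tool assigns the codes itself. As a hand-written equivalent of the table above, the five codes could be captured as constants; the package name onehot_pkg, the subtype state_t and the constant names are hypothetical.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.all;

-- Hypothetical package spelling out the one-hot codes for 5 states
PACKAGE onehot_pkg IS
    SUBTYPE state_t IS std_logic_vector(4 DOWNTO 0);  -- N = 5 states -> 5 bits

    CONSTANT STATE0 : state_t := "00001";
    CONSTANT STATE1 : state_t := "00010";
    CONSTANT STATE2 : state_t := "00100";
    CONSTANT STATE3 : state_t := "01000";
    CONSTANT STATE4 : state_t := "10000";
END PACKAGE onehot_pkg;
```

Exactly one bit is set in each code, so a state can be decoded by testing a single flip-flop output.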
Slide - 322
Example - 2
• Generate the possible logic blocks from the given operators:

IF (A > 20) THEN
    X <= B * C;
ELSE
    X <= C * D;
END IF;
Slide - 323
Example - 2
• Generate the possible logic blocks from the given operators:

IF (A > 20) THEN
    X <= B * C;
ELSE
    X <= C * D;
END IF;

[Figure: inferred logic blocks: 1 comparator, 2 multipliers (X), 1 multiplexer]
Slide - 324
Example - 3
• Use parentheses to balance the following operators:
• z <= a * b * c * d * e * f
Slide - 325
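Example - 3
• Use parentheses to balance the following operators:
• z <= a * b * c * d * e * f

Unbalanced: z <= a * b * c * d * e * f
Balanced:   z <= (a * b) * (c * d) * (e * f)

[Figure: unbalanced chain of five multipliers vs. balanced structure in which (a * b), (c * d) and (e * f) are computed in parallel and then combined]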
Slide -326
Example - 4
• Draw the test waveforms generated by the following testbench.

ARCHITECTURE logic OF test_b IS

    -- Use clkperiod constant to create 50 MHz clock
    CONSTANT clkperiod : TIME := 20 ns;

    -- clk initialized to '1'
    SIGNAL clk : std_logic := '1';

    SIGNAL reset : std_logic;

BEGIN

    -- clock must be initialized when declared to use
    -- this notation
    clk <= NOT clk AFTER clkperiod/2;

    reset <= '1', '0' AFTER 25 ns, '1' AFTER 40 ns;

END ARCHITECTURE logic;
Slide -327
Example - 4
• Draw the test waveforms generated by the following testbench.
[Waveform: CLK and RESET plotted against time in ns, axis marked every 5 ns from 0 to 55]
• CLK starts at '1' and toggles every 10 ns (20 ns period).
• RESET is '1' from 0 ns, goes to '0' at 25 ns, and returns to '1' at 40 ns.
ARCHITECTURE logic OF test_b IS
    -- Use clkperiod constant to create 50 MHz clock
    CONSTANT clkperiod : TIME := 20 ns;
    -- clk initialized to '1'
    SIGNAL clk : std_logic := '1';
    SIGNAL reset : std_logic;
BEGIN
    -- clock must be initialized when declared to use
    -- this notation
    clk <= NOT clk AFTER clkperiod/2;
    reset <= '1', '0' AFTER 25 ns, '1' AFTER 40 ns;
END ARCHITECTURE logic;
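One way to check a hand-drawn waveform is to add a process that samples both signals between edges and asserts the expected values. The sketch below reuses the two assignments from the example; the entity name `test_b_check`, the `check` process and the chosen sample times are assumptions added for illustration.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY test_b_check IS
END ENTITY test_b_check;

ARCHITECTURE logic OF test_b_check IS
    CONSTANT clkperiod : TIME := 20 ns;
    SIGNAL clk   : std_logic := '1';
    SIGNAL reset : std_logic;
BEGIN
    clk   <= NOT clk AFTER clkperiod/2;
    reset <= '1', '0' AFTER 25 ns, '1' AFTER 40 ns;

    -- Sample away from the edges (clk toggles at 10, 20, 30, ... ns)
    check : PROCESS
    BEGIN
        WAIT FOR 5 ns;   -- t = 5 ns
        ASSERT clk = '1' AND reset = '1' REPORT "mismatch at 5 ns" SEVERITY ERROR;
        WAIT FOR 10 ns;  -- t = 15 ns
        ASSERT clk = '0' AND reset = '1' REPORT "mismatch at 15 ns" SEVERITY ERROR;
        WAIT FOR 12 ns;  -- t = 27 ns
        ASSERT clk = '1' AND reset = '0' REPORT "mismatch at 27 ns" SEVERITY ERROR;
        WAIT FOR 8 ns;   -- t = 35 ns
        ASSERT clk = '0' AND reset = '0' REPORT "mismatch at 35 ns" SEVERITY ERROR;
        WAIT FOR 10 ns;  -- t = 45 ns
        ASSERT clk = '1' AND reset = '1' REPORT "mismatch at 45 ns" SEVERITY ERROR;
        REPORT "waveform check finished";
        WAIT;
    END PROCESS check;
END ARCHITECTURE logic;
```

Because the clock assignment never stops, the simulation has to be run for a fixed time (60 ns is enough here) rather than until it ends on its own.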
Slide - 328
Example - 5
• Add to the following entity interface a generic clause defining generic constants Tpw_clk_h and Tpw_clk_l that specify the minimum clock pulse width timing. Both generic constants have a default value of 3 ns.
ENTITY flipflop IS
    PORT (clk, d : IN STD_LOGIC; q, q_n : OUT STD_LOGIC);
END ENTITY flipflop;
Slide - 329
Example - 5
• Add to the following entity interface a generic clause defining generic constants Tpw_clk_h and Tpw_clk_l that specify the minimum clock pulse width timing. Both generic constants have a default value of 3 ns.
ENTITY flipflop IS
    PORT (clk, d : IN STD_LOGIC; q, q_n : OUT STD_LOGIC);
END ENTITY flipflop;

ENTITY flipflop IS
    GENERIC (Tpw_clk_h, Tpw_clk_l : delay_length := 3 ns);
    PORT (clk, d : IN STD_LOGIC; q, q_n : OUT STD_LOGIC);
END ENTITY flipflop;
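The default values matter at instantiation time. The sketch below shows one instance that keeps the 3 ns defaults and one that overrides them with a GENERIC MAP. The architecture `behav`, the testbench `flipflop_tb`, the signal names and the 5 ns / 4 ns override values are all assumptions; the slide defines only the entity.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY flipflop IS
    GENERIC (Tpw_clk_h, Tpw_clk_l : delay_length := 3 ns);
    PORT (clk, d : IN STD_LOGIC; q, q_n : OUT STD_LOGIC);
END ENTITY flipflop;

-- Assumed behavior: plain rising-edge D flip-flop
ARCHITECTURE behav OF flipflop IS
BEGIN
    PROCESS (clk)
    BEGIN
        IF rising_edge(clk) THEN
            q   <= d;
            q_n <= NOT d;
        END IF;
    END PROCESS;
END ARCHITECTURE behav;

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY flipflop_tb IS
END ENTITY flipflop_tb;

ARCHITECTURE tb OF flipflop_tb IS
    SIGNAL clk : std_logic := '0';
    SIGNAL d   : std_logic := '0';
    SIGNAL q1, q1_n, q2, q2_n : std_logic;
BEGIN
    clk <= NOT clk AFTER 10 ns;
    d   <= '1' AFTER 25 ns;

    -- No GENERIC MAP: both generics keep their 3 ns default
    ff_default : ENTITY work.flipflop
        PORT MAP (clk => clk, d => d, q => q1, q_n => q1_n);

    -- GENERIC MAP overrides the defaults for this instance only
    ff_slow : ENTITY work.flipflop
        GENERIC MAP (Tpw_clk_h => 5 ns, Tpw_clk_l => 4 ns)
        PORT MAP (clk => clk, d => d, q => q2, q_n => q2_n);
END ARCHITECTURE tb;
```

In this sketch the generics are declared but not yet used inside the architecture; a timing-check process would be the natural place to read them.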
Slide -330
Class II (& III) Methods
• Add a compare process so that DUT outputs can be monitored
– Allows testbench to do “self-verification”
[Block diagram: mycode_tb.vhd contains clk_assignment, reset_assignment and wavegen_process, which drive the clk, in1, in2 and in3 inputs of mycode.vhd; compare_process monitors the out1 and out2 outputs]
Slide -331
Self Verification Methods
• Use “compare_process” or equivalent to check results generated by design against expected results
• Single simulation can use one or multiple testbench files
– Single testbench file containing all stimulus and all expected results
– Multiple testbench files based on stimulus, expected results or functionality (e.g. data generator, control stimulus)
• Many times signaling is too complicated to model without using vectors saved in “time-slices”
Slide -332
Simple Self Verifying Test Benches
clk <= NOT clk AFTER clkperiod/2;

add1 : adder PORT MAP (clk => clk, a => a, b => b, sum => sum);

stim: PROCESS
    VARIABLE error : BOOLEAN;
BEGIN
    WAIT UNTIL falling_edge(clk);
    a <= (OTHERS => '0');
    b <= (OTHERS => '0');
    WAIT FOR 40 ns;
    IF (sum /= 0) THEN
        error := TRUE;
    END IF;

    WAIT UNTIL falling_edge(clk);
    a <= "0010";
    b <= "0011";
    WAIT FOR 40 ns;
    IF (sum /= 5) THEN
        error := TRUE;
    END IF;

    -- repeat above varying values of a and b

    WAIT;
END PROCESS stim;
• Code repeated for each test case
• Result checked
– Simple self verifying test bench
– Each sub-block within process assigns values to a, b and waits to compare sum to its predetermined result
– Code not very efficient
– Each test case may require a lot of repeated code
– Improve this code by introducing a procedure
Slide -333
Simplifying Test Bench with Procedure
PROCEDURE test (
    SIGNAL clk : IN std_logic;
    inval_a, inval_b, result : IN INTEGER RANGE 0 TO 15;
    SIGNAL in_a, in_b : OUT std_logic_vector(3 DOWNTO 0);
    SIGNAL sum_out : IN std_logic_vector(3 DOWNTO 0);
    SIGNAL error : INOUT BOOLEAN) IS
BEGIN
    WAIT UNTIL falling_edge(clk);
    in_a <= conv_std_logic_vector(inval_a,4);
    in_b <= conv_std_logic_vector(inval_b,4);
    WAIT FOR 40 ns;
    IF sum_out /= result THEN
        error <= TRUE;
    ELSE
        error <= FALSE;
    END IF;
END PROCEDURE;

BEGIN -- architecture begin

    clk <= NOT clk AFTER clkperiod/2;

    add1 : adder PORT MAP (clk => clk, a => a, b => b, sum => sum);

    PROCESS
    BEGIN
        test(clk, 0, 0, 0, a, b, sum, error);
        test(clk, 2, 3, 5, a, b, sum, error);
        test(clk, 4, 6, 10, a, b, sum, error);
        test(clk, 6, 9, 15, a, b, sum, error);
        test(clk, 8, 12, 4, a, b, sum, error);
        WAIT;
    END PROCESS;
END ARCHITECTURE;
• Procedure used to simplify test bench
• Each procedure call passes in
• clock
• 3 integers representing input stimulus and expected result
• ports connecting to adder
• error flag
– Procedure improves efficiency and readability of testbench
– Advantage: Easier to write
– Disadvantages
– Each procedure call (like last example) assigns values to a, b then waits to compare sum to its predetermined result
– Very difficult to do for complicated signaling
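For reference, here is a self-contained variant of the procedure-based testbench that can be compiled on its own. It is a sketch under several assumptions: the `adder` entity is a hypothetical 4-bit registered adder standing in for the real design, `numeric_std` conversions replace `conv_std_logic_vector`, and the error flag is replaced by an ASSERT inside the procedure. The testbench name `adder_proc_tb` is also illustrative.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;

-- Hypothetical DUT: 4-bit registered adder, result wraps modulo 16
ENTITY adder IS
    PORT (clk  : IN  std_logic;
          a, b : IN  std_logic_vector(3 DOWNTO 0);
          sum  : OUT std_logic_vector(3 DOWNTO 0));
END ENTITY adder;

ARCHITECTURE rtl OF adder IS
BEGIN
    PROCESS (clk)
    BEGIN
        IF rising_edge(clk) THEN
            sum <= std_logic_vector(unsigned(a) + unsigned(b));
        END IF;
    END PROCESS;
END ARCHITECTURE rtl;

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;

ENTITY adder_proc_tb IS
END ENTITY adder_proc_tb;

ARCHITECTURE tb OF adder_proc_tb IS
    CONSTANT clkperiod : TIME := 20 ns;
    SIGNAL clk : std_logic := '0';
    SIGNAL a, b, sum : std_logic_vector(3 DOWNTO 0);

    -- Apply one stimulus pair, wait, then compare against the expected result
    PROCEDURE test (
        SIGNAL clk : IN std_logic;
        inval_a, inval_b, result : IN INTEGER RANGE 0 TO 15;
        SIGNAL in_a, in_b : OUT std_logic_vector(3 DOWNTO 0);
        SIGNAL sum_out : IN std_logic_vector(3 DOWNTO 0)) IS
    BEGIN
        WAIT UNTIL falling_edge(clk);
        in_a <= std_logic_vector(to_unsigned(inval_a, 4));
        in_b <= std_logic_vector(to_unsigned(inval_b, 4));
        WAIT FOR 40 ns;
        ASSERT unsigned(sum_out) = result
            REPORT "expected " & INTEGER'IMAGE(result) &
                   ", got " & INTEGER'IMAGE(to_integer(unsigned(sum_out)))
            SEVERITY ERROR;
    END PROCEDURE test;
BEGIN
    clk <= NOT clk AFTER clkperiod/2;

    add1 : ENTITY work.adder
        PORT MAP (clk => clk, a => a, b => b, sum => sum);

    PROCESS
    BEGIN
        test(clk, 0, 0, 0, a, b, sum);
        test(clk, 2, 3, 5, a, b, sum);
        test(clk, 4, 6, 10, a, b, sum);
        test(clk, 6, 9, 15, a, b, sum);
        test(clk, 8, 12, 4, a, b, sum);  -- 20 wraps to 4 in 4 bits
        REPORT "all vectors applied";
        WAIT;
    END PROCESS;
END ARCHITECTURE tb;
```

Reporting from inside the procedure means each failing vector produces its own message, at the cost of no longer exposing a single error flag to the rest of the testbench.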
Slide -334
“Time-Slice” Vectors
• Allows you to apply input stimulus and check results at specific simulation times
• Two methods for storage
– Internal arrays
• Faster simulation times
• Harder to write, creates very large VHDL file
– External files
• Slower simulation times
• Easier to write
• Use TEXTIO or STD_LOGIC_TEXTIO package
– TEXTIO for reading/writing built-in data types
– STD_LOGIC_TEXTIO for reading/writing standard logic
Slide -335
Add’l Useful VHDL Constructs for Testbenches
• Record data types
• Assert & report statements
• Type conversion to STRING
• TEXTIO/File operations
Slide -336
Record Data Types
• Declares a new data type with multiple elements
– Allows grouping of related data types/objects
• Each element may be of any previously defined data type, including arrays, enumerated types and even other records
• Similar to a struct in C
• Using in a testbench
– Set each record to the values for one time slice
• Cycle through records to apply stimulus and check results
– Examples
• Store input and output values in different elements
• Store different inputs in different elements
TYPE test_record_type IS RECORD
    a, b : std_logic_vector(3 DOWNTO 0);  -- element names : element data types
    sum  : std_logic_vector(3 DOWNTO 0);
END RECORD;
Slide -337
Accessing Values in a Record
VARIABLE vector : test_record_type;
• Use selected name to access single record element
vector.a := "0010";
vector.b := "0011";
vector.sum := "0101";
• Use aggregate to access entire record
vector := (a => "0010", b => "0011", sum => "0101");
Slide -338
Using Internal Arrays for Stimulus & Results
• Create array to store values (e.g. array of records)
-- Create unconstrained array so the array depth can be set when object is
-- declared of the array type
TYPE test_array_type IS ARRAY (POSITIVE RANGE <>) OF test_record_type;
• Assign values to array
-- Constant array with 6 records
CONSTANT test_patterns : test_array_type := (
    (a => "0000", b => "0000", sum => "0000"),
    (a => "0010", b => "0011", sum => "0101"),
    (a => "0100", b => "0110", sum => "1010"),
    (a => "0110", b => "1001", sum => "1111"),
    (a => "1000", b => "1100", sum => "0100"),
    (a => "1010", b => "1111", sum => "1001")
);
* POSITIVE is a subtype of INTEGER with a range of 1 to the highest integer value
Slide -339
Assert Statements
• Checks the condition expression and reports the assertion message if the condition evaluates to false
– Use as concurrent or sequential statement
• Syntax
ASSERT <condition_expression> REPORT <text_string> SEVERITY <expression>;
• Report (optional)
– Displays text in simulator window
– Must be type string
• Enclose character strings in " "
• Other data types must be converted (discussed later)
• Severity (optional)
– Expression choices: NOTE, WARNING, ERROR, FAILURE
• ERROR is the default
– Results of severity depend on simulator
• e.g. By default, ModelSim tool ends simulation on failure only
Slide -340
Report Statements
• Displays message without ASSERT statement
– No expression to check
– Sequential statement only
• Text must be type string
– Enclose character strings in " "
– Other data types must be converted (next slide)
• Syntax
REPORT <text_string> SEVERITY <expression>;
• Severity (optional)
– Same options as ASSERT except NOTE is the default
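A short sketch showing both statement forms together, including the default severities described above. The entity name `assert_demo`, the `count` signal and the messages are assumptions made for illustration.

```vhdl
ENTITY assert_demo IS
END ENTITY assert_demo;

ARCHITECTURE sim OF assert_demo IS
    SIGNAL count : INTEGER RANGE 0 TO 15 := 0;
BEGIN
    -- Concurrent assertion: re-evaluated whenever count changes
    ASSERT count < 10
        REPORT "count reached " & INTEGER'IMAGE(count)
        SEVERITY WARNING;

    stim : PROCESS
    BEGIN
        FOR i IN 1 TO 12 LOOP
            WAIT FOR 10 ns;
            count <= i;
        END LOOP;
        WAIT FOR 10 ns;

        -- Sequential assertion with no SEVERITY clause: defaults to ERROR
        ASSERT count = 12 REPORT "count did not reach 12";

        -- REPORT with no SEVERITY clause: defaults to NOTE
        REPORT "finished at " & TIME'IMAGE(NOW);
        WAIT;
    END PROCESS stim;
END ARCHITECTURE sim;
```

In this run the concurrent assertion fires each time `count` takes the values 10, 11 and 12, while the sequential assertion stays silent because its condition holds.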
Slide -341
Type Conversions to STRING
• Use to display formatted messages
• <data_type>’IMAGE(obj)
– Type attribute that converts obj of type <data_type> to its string equivalent with no leading or trailing whitespace
– Examples
• INTEGER’IMAGE(integer_variable)
• TIME’IMAGE(time_variable)
• std_logic’IMAGE(1_bit_std_logic_variable)
• Conversion utilities
– Cannot use ’IMAGE for vectors
• <data_type> must be a scalar type or subtype
– Simple web search can provide most (if not all) required conversion utilities
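Since 'IMAGE cannot be applied to a vector, a small conversion function is usually written once and reused. The sample testbench in these slides calls a function named `slv_to_string` without showing its definition; the version below is one possible implementation, not the course's own. The package name `slv_image_pkg` and the lookup-table approach are assumptions.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

PACKAGE slv_image_pkg IS
    FUNCTION slv_to_string (v : std_logic_vector) RETURN STRING;
END PACKAGE slv_image_pkg;

PACKAGE BODY slv_image_pkg IS
    -- One character for each of the nine std_ulogic values, in declaration order
    TYPE char_lut IS ARRAY (std_ulogic) OF CHARACTER;
    CONSTANT to_char : char_lut := ('U', 'X', '0', '1', 'Z', 'W', 'L', 'H', '-');

    FUNCTION slv_to_string (v : std_logic_vector) RETURN STRING IS
        VARIABLE result : STRING(1 TO v'LENGTH);
        VARIABLE pos    : POSITIVE := 1;
    BEGIN
        -- Walk the vector from its left bound so the string reads as declared
        FOR i IN v'RANGE LOOP
            result(pos) := to_char(v(i));
            pos := pos + 1;
        END LOOP;
        RETURN result;
    END FUNCTION slv_to_string;
END PACKAGE BODY slv_image_pkg;
```

A testbench would make it visible with `USE work.slv_image_pkg.ALL;` and then build messages such as `"Calc= " & slv_to_string(sum)`.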
Slide -342
Sample Testbench Using Internal Array
-- entity and some of architecture declaration not shown

SIGNAL testclk : std_logic := '0';
SIGNAL a, b : std_logic_vector (3 DOWNTO 0);
SIGNAL sum : std_logic_vector (3 DOWNTO 0);
CONSTANT clk_period : time := 20 ns;

TYPE test_record_type IS RECORD
    a, b : std_logic_vector(3 DOWNTO 0);
    sum : std_logic_vector(3 DOWNTO 0);
END RECORD;

TYPE test_array_type IS ARRAY(POSITIVE RANGE <>) OF test_record_type;

CONSTANT test_patterns : test_array_type := (
    (a => "0000", b => "0000", sum => "XXXX"),
    (a => "0010", b => "0011", sum => "0000"),
    (a => "0100", b => "0110", sum => "0101"),
    (a => "0110", b => "1001", sum => "1010"),
    (a => "1000", b => "1100", sum => "1111"),
    (a => "1000", b => "1100", sum => "0100")
);

BEGIN -- beginning of architecture body

    -- instantiate unit under test (adder)
    add1 : adder PORT MAP
        ( clk => testclk, a => a, b => b, sum => sum);

    -- free-running clock process --
    testclk <= NOT testclk AFTER clk_period/2;

    test: PROCESS
        VARIABLE vector : test_record_type;
        VARIABLE found_error : BOOLEAN := FALSE;
    BEGIN
        -- Loop through all the values in test_patterns
        FOR i IN test_patterns'RANGE LOOP
            vector := test_patterns(i);

            -- apply the stimulus on a falling edge clock
            WAIT UNTIL falling_edge(testclk);
            a <= vector.a;
            b <= vector.b;

            -- check result on next falling edge of clock
            WAIT UNTIL falling_edge(testclk);
            IF (sum /= vector.sum) THEN
                REPORT TIME'IMAGE(NOW) & " : Calc= " & slv_to_string(sum) & ", Exp= " & slv_to_string(vector.sum);
                found_error := TRUE;
            END IF;
        END LOOP;

        ASSERT NOT found_error REPORT "---VECTORS FAILED---" SEVERITY FAILURE;
        ASSERT found_error REPORT "---VECTORS PASSED---" SEVERITY FAILURE;
    END PROCESS;
END ARCHITECTURE;
Slide -343
Example Results
• Testbench fails (expected results ≠ actual results)
ModelSim Transcript Window:
** Note: 72 ns : Calc = 0100, Exp= 1001
   Time: 72 ns Iteration: 0 Instance: /record_add_tb
** Failure: ---VECTORS FAILED---
   Time: 288 ns Iteration: 0 Process: /record_add_tb/test File: …
Break in Process test at record_tb.vhd line 56
• Testbench passes
ModelSim Transcript Window:
** Failure: ---VECTORS PASSED---
   Time: 288 ns Iteration: 0 Process: /record_add_tb/test File: …
Break in Process test at record_tb.vhd line 59
Slide -344
TEXTIO/FILE Operations
• FILE declaration
– Creates file handle to represent file
– Opens file in READ_MODE, WRITE_MODE or APPEND_MODE
• LINE declaration
– Creates line variable for reading and writing to files
• READLINE(<file_handle>,<line_variable>)
– Reads a line from a file and stores information in a variable of type LINE
• READ(<line_variable>,<data_object>)
– Reads text from line variable and writes to data object depending on size/type of data object
– Use STD_LOGIC_TEXTIO package to read directly into std_logic data objects
• Only built-in data types supported by TEXTIO package READ (e.g. BIT, BOOLEAN, STRING, TIME)
• WRITE(<line_variable>,<data_object>)
– Writes data object to a variable of type LINE as text
– Use STD_LOGIC_TEXTIO package to write directly from std_logic data objects
• Only built-in data types supported by TEXTIO package WRITE (e.g. BIT, BOOLEAN, STRING, TIME)
• WRITELINE(<file_handle>,<line_variable>)
– Writes information from variable of type LINE to file
Slide -345
Sample Testbench Using External File
-- Declare packages to enable file operations
LIBRARY ieee;
USE STD.TEXTIO.ALL;
USE ieee.std_logic_1164.ALL;
USE ieee.std_logic_textio.ALL;

ENTITY file_tb IS
END ENTITY file_tb;

ARCHITECTURE stimulus OF file_tb IS

    COMPONENT adder
        PORT (clk : IN std_logic;
              a, b: IN std_logic_vector(3 DOWNTO 0);
              sum: OUT std_logic_vector(3 DOWNTO 0));
    END COMPONENT;

    -- create file handles to access text files, one for reading vectors and
    -- another to write output messages
    FILE vectorfile: TEXT OPEN READ_MODE IS "vectors.txt";
    FILE results: TEXT OPEN WRITE_MODE IS "results.txt";

    SIGNAL a, b, sum : std_logic_vector (3 DOWNTO 0);
    SIGNAL testclk : std_logic := '0';
    CONSTANT clk_period : TIME := 20 ns;

BEGIN -- beginning of architecture body

    -- instantiate unit under test (adder)
    add1 : adder PORT MAP
        ( clk => testclk, a => a, b => b, sum => sum);

    -- free-running clock process --
    testclk <= NOT testclk AFTER clk_period/2;
Slide -346
Sample Testbench Using External File (cont.)
    test: PROCESS
        VARIABLE found_error : BOOLEAN := FALSE;
        VARIABLE a_var, b_var, sum_var : std_logic_vector (3 DOWNTO 0);
        VARIABLE vectorline, resultsline : LINE;
    BEGIN
        WHILE NOT ENDFILE (vectorfile) LOOP
            -- read file into line and line into variables
            READLINE (vectorfile, vectorline);
            READ (vectorline, a_var);
            READ (vectorline, b_var);
            READ (vectorline, sum_var);

            -- apply the stimulus on a falling edge clock
            WAIT UNTIL falling_edge(testclk);
            a <= a_var;
            b <= b_var;

            -- check result on next falling clock edge
            WAIT UNTIL falling_edge(testclk);
            IF (sum /= sum_var) THEN
                -- write current simulation time to line variable
                WRITE (resultsline, NOW);
                -- write string
                WRITE (resultsline, string'(" : Calc= "));
                -- write result value
                WRITE (resultsline, sum);
                -- write string
                WRITE (resultsline, string'(", Exp= "));
                -- write expected value
                WRITE (resultsline, sum_var);
                -- write entire line to text file
                WRITELINE (results, resultsline);
                found_error := TRUE;
            END IF;
        END LOOP;

        ASSERT NOT found_error
            REPORT "---VECTORS FAILED---"
            SEVERITY FAILURE;

        ASSERT found_error
            REPORT "---VECTORS PASSED---"
            SEVERITY FAILURE;

    END PROCESS test;
END ARCHITECTURE stimulus;
Slide -347
Example Files
• vectors.txt
– No inherent formatting except white-space skipping
– Options
• Use separate files for stimulus and expected results
• Design custom tasks to extend capabilities (e.g. support comments)
0000 0000 0000
0010 0011 0101
0100 0110 1010
0110 1001 1111
1000 1100 0100
1010 1111 1001
• results.txt (failure example)
240 ns Calc= 0100, Exp= 1001
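As one example of extending the file format, the reader loop can skip comment lines before calling READ. The sketch below assumes that a comment is any line whose first character is '#'; that convention, the entity name `vector_reader_demo` and the variable names are not from the slides. It uses the READ overloads with a GOOD flag so a malformed line produces a warning instead of stopping the simulation.

```vhdl
LIBRARY ieee;
USE std.textio.ALL;
USE ieee.std_logic_1164.ALL;
USE ieee.std_logic_textio.ALL;

ENTITY vector_reader_demo IS
END ENTITY vector_reader_demo;

ARCHITECTURE sim OF vector_reader_demo IS
    FILE vectorfile : TEXT OPEN READ_MODE IS "vectors.txt";
BEGIN
    reader : PROCESS
        VARIABLE vectorline : LINE;
        VARIABLE a_var, b_var, sum_var : std_logic_vector(3 DOWNTO 0);
        VARIABLE good_a, good_b, good_sum : BOOLEAN;
        VARIABLE is_data : BOOLEAN;
        VARIABLE count : NATURAL := 0;
    BEGIN
        WHILE NOT ENDFILE(vectorfile) LOOP
            READLINE(vectorfile, vectorline);

            -- Empty lines and lines starting with '#' carry no vector
            is_data := FALSE;
            IF vectorline'LENGTH > 0 THEN
                IF vectorline(vectorline'LOW) /= '#' THEN
                    is_data := TRUE;
                END IF;
            END IF;

            IF is_data THEN
                READ(vectorline, a_var, good_a);
                READ(vectorline, b_var, good_b);
                READ(vectorline, sum_var, good_sum);
                ASSERT good_a AND good_b AND good_sum
                    REPORT "malformed vector line ignored"
                    SEVERITY WARNING;
                IF good_a AND good_b AND good_sum THEN
                    count := count + 1;
                END IF;
            END IF;
        END LOOP;

        REPORT INTEGER'IMAGE(count) & " vectors read";
        WAIT;
    END PROCESS reader;
END ARCHITECTURE sim;
```

A comment preceded by spaces is not recognized by this check; it would be reported as a malformed line.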
Slide -348
Example Test Plans
• Develop high-level behavioral (i.e. non-synthesizable) model of design
• Create stimulus/test vectors to simulate model
• Generate expected results from behavioral model simulation
• Replace behavioral blocks with RTL model blocks
– Simulate each RTL block with other behavioral blocks to ensure functionality is the same
Slide - 349
Synthesis
• The first step in which the HDL (or other design format) is associated with the FPGA’s internal logic.
• Input: the design. Output: a design netlist that feeds into the implementation tools. Other outputs: a functional simulation netlist, and reports that provide pertinent information about the synthesized design.
• Can be performed immediately after the design phase. Synthesis is mandatory; simulation is optional.
• Netlist: connects FPGA resources to perform the same function defined by the high-level design.
Slide - 350
What is Design Synthesis?
• An FPGA consists of logic blocks that can be configured to perform functions.
• Synthesis takes the high-level design, associates it with FPGA resources, and reduces logic to make the design more efficient.
• The synthesis process needs information about the FPGA device, such as speed and internal resources.
• The FPGA is identified by selecting the family, device number, package, and speed.
Slide - 351
What is Design Synthesis?
• Three basic synthesis operations:
• Syntax check and resource association: the design is checked for syntax and synthesis errors. Once the design is error free, it is converted into structural elements, and logic elements are inserted as replacements for arithmetic operators (×, -, ...).
• Optimization: the design is first put together without concern for redundant logic, timing constraints (if provided), clock speed, or other design considerations. Next, algorithms are used to optimize the design:
• Check for redundant logic and clock speed, and evaluate multiple paths to ensure the fastest timing is achieved.
Slide - 352
Optimization
• The shortest path is not necessarily the fastest, because of the resource layout and how those resources are used.
• Example: option 2 is longer; however, option 1 has more resource delays, and therefore option 2 is faster.
Slide - 353
What is Design Synthesis? Cnt’d
• Technology mapping: maps the optimized design to the technology associated with the targeted FPGA.
• Synthesis tools use advanced techniques to predict how the design will be placed and routed in the target device.
• Synthesis tools produce timing estimates that are near the actual post-implementation timing; the real timing is unknown until after the design has been placed and routed.
• Example of some technology view symbols
Slide - 354
Synthesis Phase Tools
• Synthesis tools are available as standalone tools or as part of a complete package.
• Complete package synthesis. Examples: Xilinx ISE, Altera Quartus
• Advantages:
• Single tool: need to know only one tool
• Faster: eliminates time to switch between tools
• Cheaper
• Manufacturer understands the device better than a third party, so data are more accurate
• Disadvantages:
• Manufacturer dependent
• Standalone package synthesis. Examples: LeonardoSpectrum by Mentor Graphics, Synplify Pro by Synopsys
• Advantages:
• Manufacturer independent
• Disadvantages:
• Separate tools for synthesis and implementation
• More expensive than the complete package
• Not expert on device
Slide - 355
Synthesis Setup
• Synthesis setup consists of:
• Device information (family, device number, package, and speed)
• Input design
• User-defined constraint file(s)
• Input design: Altera’s Quartus accepts:
• AHDL (Altera Hardware Description Language)
• VHDL
• Verilog
• Schematic capture
• EDIF: vendor-independent netlist file
• Outputs:
• Netlist: the synthesized design
• Status reports: utilization, timing, ...
• Schematic view: RTL
Slide - 356
Netlists
• The design netlist is what your design looks like after it has been synthesized (optimized and connected using internal FPGA logic).
• Functional simulation netlist: lets you verify that the synthesis process did not alter the design. You should expect the same results when using the same testbenches.
• Functional simulation is done using a simulator (ModelSim).
Slide - 357
Status Reports
• Optional: reports on resource utilization, timing information, critical paths, warnings and errors.
• Not used as input to other development phases.
• Very helpful information that allows you to identify real or potential problems, such as a design that is not meeting timing or other constraints.
Slide - 358
Schematic View
• Synthesis tools generate two schematic views: RTL and technology.
• RTL: shows the pre-optimized design in terms of generic symbols, such as adders, multipliers, counters, AND gates, etc.
• RTL is manufacturer independent; it is not yet associated with a manufacturer.
• Technology: shows gates and elements as they will look in the device.
Slide - 359
RTL Schematic View
• RTL: how the design looks as it is converted to logic elements
Slide - 360
Technology Schematic View
• Technology: shows the internal technology, such as lookup tables connected to create the design.
Slide - 361
Key points to remember
• Synthesis is required and must be performed prior to implementation.
• Tools come either as part of a complete package or as standalone products.
• Functional simulation should be performed, time permitting.
• RTL and technology views show what logic makes up the design.
Slide - 362
Quartus II Full Compilation Flow
[Flow diagram: Design Files → Analysis & Elaboration → Synthesis → Fitter, with Constraints & settings feeding both Synthesis and the Fitter. Synthesis produces a Functional Netlist for Functional Simulation. After the Fitter, the Assembler (Programming & Configuration files), TimeQuest Timing Analysis, and the EDA Netlist Writer (Post-Fit Simulation Files for Gate-Level Simulation) are executed in parallel (multi-processor or multi-core systems only).]
Slide - 363
Netlist Viewers
• RTL Viewer
– Schematic of design after Analysis and Elaboration
– Visually check initial HDL before synthesis optimizations
– Locate synthesized nodes for assigning constraints
– Debug verification issues
• Technology Map Viewers (Post-Mapping or Post-Fitting)
– Graphically represents results of mapping (post-synthesis) & fitting
– Analyze critical timing paths graphically
– Locate nodes & node names after optimizations
Slide - 364
RTL Viewer
[Screenshot: RTL Viewer window with the schematic view, hierarchy list and Find in hierarchy labeled]
Note: Must perform elaboration first (e.g. Analysis & Elaboration OR Analysis & Synthesis)
Open from Tools menu → Netlist Viewers, or from the Tasks window “Compile Design” tasks
Slide - 365
Schematic View (RTL Viewer)
• Represents design using logic blocks & nets
– I/O pins
– Registers
– Muxes
– Gates (AND, OR, etc.)
– Operators (adders, multipliers, etc.)
Place pointer over any element in schematic to see details
• Name
• Internal resource count
Slide - 366
Schematic Hierarchy Navigation
• Descend hierarchy
– Double-click on instance
– Right-click & select Hierarchy Down
• Ascend hierarchy
– Double-click in white space
– Right-click & select Hierarchy Up
• Middle hierarchy
– Double-click on instance descends
– Double-click in white space ascends
Slide - 367
Technology Map Viewers
[Screenshot: Technology Map Viewer window with the schematic view and hierarchy list labeled]
Open from Tools menu → Netlist Viewers, or from the Tasks window “Compile Design” tasks
Note: Must run synthesis and/or fitting first
Slide - 368
Schematic View (Technology Viewer)
• Represents design using atoms
– I/O pins & cells
– Lcells
– Memory blocks
– MAC (DSP blocks)
Place pointer over any element in schematic to see details
• Name
• Internal resource count
• Logic equation
Slide - 369
Hierarchy List
• Traverse between levels of design hierarchy
• View logic schematic for each hierarchical level
• Break down each hierarchical level into netlist elements or atoms
– Instances
– Primitives
– Pins
– Nets
– State machines
– Logic clouds (if enabled)
Slide - 370
Using Hierarchy List
Expanding instances shows
• Instances
• Pins
• Nets
Highlighting netlist element in hierarchy list highlights/views that element in schematic view
Slide - 371
Timing Analysis - Quartus
• TimeQuest GUI
• Using the TimeQuest Timing Analyzer in the Quartus II flow
• Online training course by Altera:
• http://www.altera.com/customertraining/webex/TimeQuest/player.html
Note: For more details on verifying designs for timing, please attend the course “Quartus II Software Design Series: Timing Analysis”
Online training also available: TimeQuest Timing Analyzer
Slide - 372
TimeQuest Timing Analyzer
• Timing engine in Quartus II software
• Provides timing analysis solution for all levels of experience and design complexity
Features
- Synopsys Design Constraints (SDC) support
- Easy-to-use interface
- Scripting emphasis
Slide - 373
Opening the TimeQuest Interface
• Toolbar button
• Tools menu
• Tasks window
• Stand-alone mode (run w/o opening the Quartus II software)
– quartus_staw
Slide - 374
Quartus Settings File (QSF)
• SDC constraints are not stored in QSF
• For 90 nm and older devices, TimeQuest TA provides a script to convert QSF timing assignments to SDC
Slide - 375
TimeQuest GUI
[Screenshot: TimeQuest GUI with the Report pane, Tasks pane, Console pane and View pane labeled, plus menu access to all TimeQuest features]
Slide - 376
SDC File Editor (1)
• Use Quartus II editor to create and/or edit SDC
TimeQuest: File menu → New/Open SDC File
Quartus II: File menu → New → Other Files
Features
- Access to GUI dialog boxes for constraint entry
- Syntax coloring
- Tooltip syntax help
[Screenshot: editor showing a command tooltip]
Slide - 377
SDC File Editor (2)
Construct an SDC file using the TimeQuest graphical constraint creation tools
Constraints inserted at cursor location
Slide - 378
Using TimeQuest TA in Quartus II Flow
Slide - 379
Steps to Using TimeQuest Tool
1. Generate timing netlist
2. Enter SDC constraints by creating or reading in an SDC file
3. Update timing netlist
4. Generate timing reports
Slide - 380
1. Generate Timing Netlist
• Create a timing netlist based on compilation results
– Post-synthesis (mapping) or post-fit (fully compiled)
– Delay model (slow or fast)
• Netlist menu gives complete control
• Tasks pane uses default (post-fit, slow)
[Screenshot: Netlist menu and Tasks pane, with the Tcl equivalent of the command shown]
Slide - 381
2. Create or Read in SDC File
• Create SDC file using SDC file editor
- Don’t enter constraints using Constraints menu
• Read in constraints & exceptions from existing SDC file
• Execution
- Read SDC File (Tasks pane or Constraints menu)
• File precedence (if no filename specified)
– Files specifically added to Quartus II project
– <current_revision>.sdc (if it exists in project directory)
Tcl: read_sdc [<filename>]
Slide - 382
Constraining
• User MUST enter constraints for all paths to fully analyze design
– Timing analyzer only performs slack analysis on constrained design paths
– Constraints guide the fitter to place & route design in order to meet timing requirements
• Not as difficult a task as it may sound
– Wildcards
– Single, generalized constraints cover many paths, even all paths in an entire clock domain
• See Altera TimeQuest Timing Analyzer online training for information about basic SDC constraints
Slide - 383
3. Update Timing Netlist
• Apply SDC constraints/exceptions to current timing netlist
• Generates warnings
– Undefined clocks
– Partially defined I/O delays
– Combinational loops
• Update timing netlist after adding any new constraint
• Execution
– Update Timing Netlist (Tasks pane or Netlist menu)
Tcl: update_timing_netlist
Slide - 384
4. Generate Timing Reports
• Verify timing requirements and locate violations
• Check for fully constrained design or ignored timing constraints
• Two methods
– Tasks pane
- Shortcut: Automatically creates/updates netlist & reads default SDC file if needed
- Double-click individual report (shortcut to skip steps 1-3)
– Reports menu
- Must have valid netlist to access
Slide - 385
Reset Design Command
• Located in Tasks pane or Constraints menu
• Flushes all timing constraints from current timing netlist
– Functional Tcl equivalent: delete_timing_netlist command followed by create_timing_netlist
• Uses
– “Re-starting” timing analysis on same timing netlist applying different constraints or SDC file
– Starting analysis over if results seem to be unexpected
Slide - 386
Using TimeQuest TA in Quartus II Flow
[Flow diagram with the following steps:]
• Enable TimeQuest TA in Quartus II project
• Synthesize Quartus II project
• Use TimeQuest TA to specify timing requirements
• Perform full compilation (run Fitter)
• Verify timing in TimeQuest TA
Slide - 387
Enable TimeQuest TA in Quartus II Software
• Tells the Quartus II software to use SDC constraints during fitting
• File order precedence
1. Any SDC files manually added to Quartus II project (in order)
2. <current_revision>.SDC located in project directory
Slide - 388
Enabling in the Quartus II Software
Notes:
• Arria GX and newer devices only support TimeQuest TA.
• TimeQuest TA is enabled by default for new Stratix III and Cyclone III designs.
Slide - 389
Adding SDC File to Quartus II Project
• Add SDC files to TimeQuest Timing Analyzer
– Click Add to add SDC to list
• Multicorner timing analysis checks all process corners (On by default for Cyclone II, Stratix II, & newer devices)
– Analyze fast and slow corners during compile
Slide - 390
Using TimeQuest TA in Quartus II Flow
[Flow diagram with the following steps:]
• Enable TimeQuest TA in Quartus II project
• Synthesize Quartus II project
• Use TimeQuest TA to specify timing requirements
• Perform full compilation (run Fitter)
• Verify timing in TimeQuest TA
Slide - 391
Verifying Timing Requirements
• View TimeQuest summary information directly in Quartus II Compilation Report
• Open TimeQuest TA for more thorough analysis
– Follow TimeQuest flow using Post-fit netlist
– Run TimeQuest easy-to-use reporting capabilities (Tasks pane)
– Place Tcl reporting commands into script file
- Easy repetition
• Verify whether Fitter was able to meet timing requirements
Slide - 392
3rd-Party Timing Analysis Tool Support
• Synopsys
– PrimeTime
• Mentor Graphics
– TAU
Slide - 393
Design Constraints: An Example
• The figure shows an example circuit including two clocks, a PLL, and other common synchronous design elements
Slide - 394
SDC - Example
# Create clock constraints
create_clock -name clockone -period 10.000 [get_ports {clk1}]
create_clock -name clocktwo -period 10.000 [get_ports {clk2}]
# Specify that clockone and clocktwo are unrelated by assigning
# them to separate exclusive groups
set_clock_groups -exclusive -group [get_clocks {clockone}] -group [get_clocks {clocktwo}]
# set input and output delays
set_input_delay -clock { clockone } -max 4 [get_ports {data1}]
set_input_delay -clock { clockone } -min -1 [get_ports {data1}]
set_input_delay -clock { clockone } -max 4 [get_ports {data2}]
set_input_delay -clock { clockone } -min -1 [get_ports {data2}]
Slide - 395
SDC Example
• The SDC file shown contains the following basic constraints you should include for most designs:
– Definitions of clockone and clocktwo as base clocks, and assignment of those settings to nodes in the design.
• create_clock command
• create_clock -period 10 -name clk_sys [get_ports clk_sys]
– Specification of two mutually exclusive clock groups, one containing clockone and the other containing clocktwo. This overrides the default analysis of all clocks in the design as related to each other.
• set_clock_groups -exclusive -group [get_clocks {clockone}] -group [get_clocks {clocktwo}]
– Specification of input delays for the design to specify the external input delay requirement with reference to clock.
• set_input_delay -clock { clockone } -max 4 [get_ports {data1}]
Slide - 396
Summary
• TimeQuest timing analyzer provides an easy-to-use tool to verify timing
– Enter timing constraints
– Run various timing reports
Slide - 397
Implementation
• Also referred to as Place and Route (PAR); the hardest job.
• Input: synthesized netlist. Output: bit stream or programming file, with an optional gate-level simulation netlist.
• Maps the synthesized netlist to the target FPGA’s resources and interconnects them to the FPGA’s internal logic and I/O resources. The physical layout is determined.
• Takes four steps to convert the mid-level netlist to a final programming file: translate, map, place and route, and generate programming file.
Slide - 398
Translate
• The translation process takes the input netlist and merges it with the design constraints (if provided) to create a native generic database (NGD) output file.
• The synthesized netlist is automatically fed into the translation process.
• If an error is detected, the tool stops.
• Once completed, the NGD output netlist is automatically fed into the mapping process.
Slide - 399
Map
• Mapping takes the NGD netlist (the logical design) and maps it to the target FPGA.
• First, a logical DRC (design rule check) is performed on the NGD netlist.
• The logic is mapped to the target FPGA’s logic cells, I/O cells, and other internal resources.
• The output is a native circuit description (NCD) file.
• NCD: the physical representation of the design, mapped to the target FPGA’s internal resources and components.
• The NCD feeds into the place-and-route stage.
Slide - 400
Place and Route
• Takes the NCD file and interconnects the design (places and routes it).
• The output is an NCD file, which is used to create the programming bit stream.
• An optional gate-level simulation provides actual gate delays based on routing and placement.
• If the functional simulation was successful but the gate-level simulation was not, you need to narrow down where the problem first occurred.
Slide - 401
Generate Program File
• The final step is to generate the programming file. The NCD output file from the place-and-route step is the input; the output is the FPGA’s programming file.
• This programming file resides on a nonvolatile device like PROM or within the FPGA device.
• This bit stream is automatically downloaded to the FPGA at power-up; this process is called configuration.
• The implementation tool provides various options: the bit stream can be compressed or uncompressed, and security options are available to prevent unauthorized downloading of the bit stream.
• Once the bit stream is ready, the next step is to program the FPGA.
Slide - 402
Implementation Tools
• The implementation tool is offered by the FPGA’s manufacturer and generally not by a third-party company.
• The tools use proprietary algorithms to process the synthesized netlist and produce the final programming file.
• Setup is easy: with a complete-package development tool, the synthesized netlist is automatically fed into the implementation process.
• For a third-party netlist, the tools must be directed to the synthesized netlist.
• Putting the design into the FPGA and interconnecting it can be the most challenging and time-consuming part of the development process.
• Minimum input: synthesized netlist, with an optional user-defined constraints file.
Slide - 403
Implementation Tools – cnt’d
User Constraints
• User-defined constraint files contain such information as timing, pin assignments, and internal placement for logic.
• Constraints make the tool work harder. Make sure to consider all the factors when determining when and what should be constrained.
• Try to keep the device utilization below a reasonable percentage. Consider the room needed for potential growth and spare pins.
• Pin assignment is the most used constraint since it impacts the board routing.
• Either the tool or you should assign pins.
• One option is to let the tools make the initial pin assignment, review the list, and make changes as necessary.
Slide - 404
Implementation Phase Tips
• Remember to lock pin assignments; otherwise they are subject to change.
• Create constraints only when necessary.
• The implementation processes can run one after another without intervention if no errors are encountered.
• Consult the data sheet, user’s guide, or other manufacturer’s materials to find acceptable configuration options for your FPGA.
Slide - 405
Programming
• Programming is the final development phase and the point at which hardware is introduced.
• Programming involves transferring the bit stream into a nonvolatile or volatile memory device and configuring (programming) the FPGA. The data transfer can be serial or parallel.
• Configuration can involve a single FPGA or a series of daisy-chained (connected) FPGAs.
• The nonvolatile device can be located on the same board as the targeted FPGA or on another board.
• The FPGA may operate in either master (controlling configuration) or slave (not controlling configuration) mode.
Slide - 406
Tools and Hardware
• If a microprocessor holds the bit stream, the bit stream is merged with the software build. The processor configures the FPGA on power-up.
• For nonvolatile memory, programming options include:
  – JTAG (Joint Test Action Group)
  – In-system programming (ISP)
  – Third-party programmer
Slide - 407
JTAG - Joint Test Action Group
• IEEE 1149.1, Standard Test Access Port and Boundary-Scan Architecture.
• Access pins on a JTAG-compatible device provide visibility inside the device.
• A testing and debugging mechanism used to detect manufacturing faults on populated boards.
• Tools include JTAG software and a software host; the hardware is a JTAG cable.
• JTAG software is the interface used to transfer the bit stream from the host to the programmable device.
Slide - 408
JTAG - Joint Test Action Group – cont’d
• Over time, it was realized that JTAG ports could be used for programming.
• The pins include:
  – TDI (Test Data In)
  – TDO (Test Data Out)
  – TCK (Test Clock)
  – TMS (Test Mode Select)
  – Optional TRST (Test Reset)
• JTAG programming involves transferring the bit stream from the host through the JTAG cable to a header, test pins, or a connector on the board that connects to the JTAG-compatible nonvolatile memory devices.
• FPGA manufacturers generally offer JTAG programming tools, cables, and any necessary supplies.
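The roles of TCK and TMS can be illustrated with the 16-state TAP controller that IEEE 1149.1 defines: on each rising edge of TCK, the value of TMS selects the next state. The following is only a minimal sketch of that state machine, not any vendor's implementation. The entity name tap_fsm, the helper function pick, the outputs shift_dr and shift_ir, and the active-low polarity of the reset input are assumptions made for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Sketch of the IEEE 1149.1 TAP controller state machine.
-- Names other than tck and tms are hypothetical.
entity tap_fsm is
  port (
    tck      : in  std_logic;  -- Test Clock
    tms      : in  std_logic;  -- Test Mode Select, sampled on rising tck
    trst_n   : in  std_logic;  -- optional Test Reset, assumed active low
    shift_dr : out std_logic;  -- '1' while the controller is in Shift-DR
    shift_ir : out std_logic   -- '1' while the controller is in Shift-IR
  );
end entity tap_fsm;

architecture rtl of tap_fsm is
  type tap_state is (
    test_logic_reset, run_test_idle,
    select_dr, capture_dr, shift_dr_s, exit1_dr, pause_dr, exit2_dr, update_dr,
    select_ir, capture_ir, shift_ir_s, exit1_ir, pause_ir, exit2_ir, update_ir);
  signal state : tap_state := test_logic_reset;

  -- Return if_one when sel = '1', otherwise if_zero.
  function pick (sel : std_logic; if_one, if_zero : tap_state) return tap_state is
  begin
    if sel = '1' then
      return if_one;
    else
      return if_zero;
    end if;
  end function pick;
begin
  process (tck, trst_n)
  begin
    if trst_n = '0' then
      state <= test_logic_reset;
    elsif rising_edge(tck) then
      case state is
        when test_logic_reset => state <= pick(tms, test_logic_reset, run_test_idle);
        when run_test_idle    => state <= pick(tms, select_dr, run_test_idle);
        -- Data register column
        when select_dr        => state <= pick(tms, select_ir, capture_dr);
        when capture_dr       => state <= pick(tms, exit1_dr, shift_dr_s);
        when shift_dr_s       => state <= pick(tms, exit1_dr, shift_dr_s);
        when exit1_dr         => state <= pick(tms, update_dr, pause_dr);
        when pause_dr         => state <= pick(tms, exit2_dr, pause_dr);
        when exit2_dr         => state <= pick(tms, update_dr, shift_dr_s);
        when update_dr        => state <= pick(tms, select_dr, run_test_idle);
        -- Instruction register column
        when select_ir        => state <= pick(tms, test_logic_reset, capture_ir);
        when capture_ir       => state <= pick(tms, exit1_ir, shift_ir_s);
        when shift_ir_s       => state <= pick(tms, exit1_ir, shift_ir_s);
        when exit1_ir         => state <= pick(tms, update_ir, pause_ir);
        when pause_ir         => state <= pick(tms, exit2_ir, pause_ir);
        when exit2_ir         => state <= pick(tms, update_ir, shift_ir_s);
        when update_ir        => state <= pick(tms, select_dr, run_test_idle);
      end case;
    end if;
  end process;

  shift_dr <= '1' when state = shift_dr_s else '0';
  shift_ir <= '1' when state = shift_ir_s else '0';
end architecture rtl;
```

With this transition table, holding TMS high for five TCK cycles returns the controller to the reset state from any other state, which is why TRST can be optional.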
Slide - 409
In-System Programming
• The device can be programmed while the system is still operating.
• The datasheet specifies whether the device supports ISP.
• Tools needed: ISP software on the host and a download cable.
• Programming can be done by connecting the test pins to ATE or to a board connector.
• Supported protocols are the IEEE standard for boundary-scan-based in-system configuration (IEEE 1532), JTAG, and serial peripheral interface (SPI).
• Where available, ISP is the better option.
Slide - 410
Third Party Programming
• Programmers are available from third-party manufacturers.
• They include a GUI, a programming base that connects to a computer, and some socket adaptors, or come as an all-in-one programmer.
• A socket adaptor is where the programmable device is placed to be programmed.
• Each adaptor is designed to hold a specific package type.
• Example: Data I/O
• Manual programming
Slide - 411
Hardware Configuration
• The FPGA can be master or slave.
• Configuration pins are set to specific values to indicate whether it is a master or a slave.
• Always make the programming pins accessible via test points or a connector.
Mode             M2  M1  M0
Master serial    0   0   0
Slave serial     1   1   1
Master parallel  0   1   1
Slave parallel   1   1   0
JTAG             1   0   1
Slide - 412
Board Design Tips
• Tip 1: When daisy-chaining devices, make sure to add the ability to jumper out or remove any of the devices if necessary.
• Tip 2: Design with a troubleshooting mindset: test points, pads, and connectors are valuable. Consider using test connectors that mate with the lab equipment hardware.
• Tip 3: Select the FPGA package based on the ability to upgrade to a larger device in the same package without re-spinning the board. Make sure the two devices are pin-to-pin compatible.
• Tip 4: For unused pins, consult the datasheet for the appropriate level (terminate unused pins).
Slide - 413
DE2 board
• The DE2 board contains a serial EEPROM chip that stores configuration data for the Cyclone II FPGA.
• This configuration data is automatically loaded from the EEPROM chip into the FPGA each time power is applied to the board.
• Using the Quartus II software, it is possible to reprogram the FPGA at any time, and it is also possible to change the non-volatile data that is stored in the serial EEPROM chip.
• JTAG programming: In this method of programming, named after the IEEE-standard Joint Test Action Group, the configuration bit stream is downloaded directly into the Cyclone II FPGA. The FPGA retains this configuration as long as power is applied to the board; the configuration is lost when the power is turned off.
• AS programming: In this method, called Active Serial programming, the configuration bit stream is downloaded into the Altera EPCS16 serial EEPROM chip. It provides non-volatile storage of the bit stream, so that the information is retained even when the power supply to the DE2 board is turned off. When the board's power is turned on, the configuration data in the EPCS16 device is automatically loaded into the Cyclone II FPGA.
Slide - 414
Final Exam Scope – Wednesday Dec 19 @ 12:45 pm
• PLD
  – PROM
  – PLA
  – PAL
  – CPLD
  – Programming PLD
  – ASIC
• FPGA Architecture
• Quartus Development software
• FPGA Programming Technology
• SRAM versus Antifuse FPGA
• EEPROM/Flash FPGA
• Xilinx FPGA Architecture
• FPGA basic building blocks
• FPGA Embedded Blocks
• FPGA Clocking Mechanism
• FPGA Family
• Altera Megafunctions
• FPGA Design flow
• Design phase
• Advanced VHDL Topics
• Simulation versus Synthesis
• Latches versus registers
• Common pitfalls
• Unwanted latches
• Case statement
• Variable versus signals
• Synthesizable subprograms
• Gated clocks
• Inferring Logic Functions
• Control Signal Priority
• Tri-state
• Memory
• State Machine Coding
• VHDL Logic Optimization & Performance
• Balancing operators
• Resource Sharing
• Logic Duplication
• Pipelining
Slide - 415
Final Exam Scope – Wednesday Dec 19 @ 12:45 pm
• Parameterized Code
  – Constructs
    • Pre-Defined Attributes
    • Generics
    • For Generate
    • If generate
• RTL Simulation
• Functional Simulation
• Gate Level simulation
• Testbenches
  – Classes of Testbenches: Advantages and Disadvantages
  – Test Vector Generation
  – Self Verifying Testbenches
  – Useful VHDL constructs for Testbenches
• Synthesis
  – Synthesis Operation
    • Syntax Check and resource association
    • Optimization
    • Technology Mapping
  • Synthesis Tools
  • Netlists
  • Status Reports
  • Schematic View (RTL and Technology View)
• Timing Analysis using TimeQuest
• Implementation
  • Implementation Processes
  • Tools
• Programming
  • Tools and hardware
Slide - 416
Example - 1
• What is DRC, and where does it happen in the implementation phase?
• State the four processes of the implementation phase.
• Explain the difference between functional simulation and gate-level simulation.
Slide - 417
Example - 1
• What is DRC, and where does it happen in the implementation phase?
  – DRC is the Design Rule Check; it is performed on the NGD netlist during mapping.
• State the four processes of the implementation phase.
  – Translate, Map, Place and Route, and Generate Program File
Slide - 418
Example – 1: Functional vs. Gate-Level
Functional simulation
• Performed on the netlist or code generated by the synthesis tool
• Sometimes necessary to direct the synthesis tool to provide the netlist
• Initial timing analysis
• Will the synthesized design fit or work on the target hardware?
Gate-level simulation
• Performed on the netlist generated by the implementation tool
• Contains actual timing information
• Will it work as expected over all operational conditions?
• Detects design timing problems
• It is
  – Representative of hardware
  – Most realistic
Slide - 419
Example - 2
• Given the following entity declaration of a register:
ENTITY reg IS
  GENERIC (width : positive);
  PORT ( d : IN STD_LOGIC_VECTOR (0 to width - 1);
         q : OUT STD_LOGIC_VECTOR (0 to width - 1);
         clk, reset : IN STD_LOGIC);
END ENTITY reg;
• Write a component instantiation that instantiates the reg entity to implement a 4-bit control register. The register data input connects to the rightmost four bits of data_out, the clk input to io_write, the reset input to io_reset and the data output to control signals io_en, io_int_en, io_dir, and io_mode.
Slide - 420
Example - 2
• Write a component instantiation that instantiates the reg entity to implement a 4-bit control register. The register data input connects to the rightmost four bits of data_out, the clk input to io_write, the reset input to io_reset and the data output to control signals io_en, io_int_en, io_dir, and io_mode.
io_control_reg : reg
  GENERIC MAP (width => 4)
  PORT MAP ( d => data_out (3 downto 0),
             q(0) => io_en,
             q(1) => io_int_en,
             q(2) => io_dir,
             q(3) => io_mode,
             clk => io_write,
             reset => io_reset);
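To try the instantiation in a simulator, it needs an architecture for reg and an enclosing design that declares the connected signals. The following is a sketch under stated assumptions: the behavior of reg (rising-edge register with asynchronous, active-high reset), the wrapper entity io_controller, and the 8-bit descending range of data_out are all hypothetical and are not given in the exercise.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity reg is
  generic (width : positive);
  port (d          : in  std_logic_vector(0 to width - 1);
        q          : out std_logic_vector(0 to width - 1);
        clk, reset : in  std_logic);
end entity reg;

-- Assumed behavior: rising-edge register, asynchronous active-high reset.
architecture behavior of reg is
begin
  process (clk, reset)
  begin
    if reset = '1' then
      q <= (others => '0');
    elsif rising_edge(clk) then
      q <= d;
    end if;
  end process;
end architecture behavior;

library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical wrapper that supplies the signals named in the exercise.
entity io_controller is
  port (data_out           : in  std_logic_vector(7 downto 0);  -- range assumed
        io_write, io_reset : in  std_logic;
        io_en, io_int_en   : out std_logic;
        io_dir, io_mode    : out std_logic);
end entity io_controller;

architecture structural of io_controller is
  component reg is
    generic (width : positive);
    port (d          : in  std_logic_vector(0 to width - 1);
          q          : out std_logic_vector(0 to width - 1);
          clk, reset : in  std_logic);
  end component reg;
begin
  io_control_reg : reg
    generic map (width => 4)
    port map (d     => data_out(3 downto 0),
              q(0)  => io_en,
              q(1)  => io_int_en,
              q(2)  => io_dir,
              q(3)  => io_mode,
              clk   => io_write,
              reset => io_reset);
end architecture structural;
```

Note that VHDL matches array elements by position, not by index value: with these ranges d(0) receives data_out(3), so io_en follows data_out(3) and io_mode follows data_out(0).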
Slide - 421
Example - 3
• Draw a diagram illustrating the circuit described by the following generate statement:
synch_delay_line : for stage in 1 to 4 generate
  delay_ff : component d_ff
    port map (clk => sys_clock,
              d => delayed_data (stage - 1),
              q => delayed_data (stage) );
end generate synch_delay_line;
Slide - 422
Example - 3
• Draw a diagram illustrating the circuit described by the following generate statement:
synch_delay_line : for stage in 1 to 4 generate
  delay_ff : component d_ff
    port map (clk => sys_clock,
              d => delayed_data (stage - 1),
              q => delayed_data (stage) );
end generate synch_delay_line;
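The generate statement expands into four d_ff instances connected in series, all clocked by sys_clock: stage 1 takes delayed_data(0) as its input, each stage feeds the next, and stage 4 drives delayed_data(4). A minimal sketch that places the statement in a complete design follows. The d_ff architecture, the wrapper entity delay_line, and its data_in and data_out ports are assumptions added so the example can be compiled on its own.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Assumed d_ff: a plain rising-edge D flip-flop.
entity d_ff is
  port (clk, d : in std_logic; q : out std_logic);
end entity d_ff;

architecture rtl of d_ff is
begin
  process (clk)
  begin
    if rising_edge(clk) then
      q <= d;
    end if;
  end process;
end architecture rtl;

library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical wrapper around the generate statement from the exercise.
entity delay_line is
  port (sys_clock, data_in : in  std_logic;
        data_out           : out std_logic);
end entity delay_line;

architecture structural of delay_line is
  component d_ff is
    port (clk, d : in std_logic; q : out std_logic);
  end component d_ff;
  -- Element 0 is the undelayed input; elements 1 to 4 are the stage outputs.
  signal delayed_data : std_logic_vector(0 to 4);
begin
  delayed_data(0) <= data_in;

  synch_delay_line : for stage in 1 to 4 generate
    delay_ff : component d_ff
      port map (clk => sys_clock,
                d   => delayed_data(stage - 1),
                q   => delayed_data(stage));
  end generate synch_delay_line;

  -- The input reappears here four clock edges later.
  data_out <= delayed_data(4);
end architecture structural;
```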
Slide - 423
Example - 4
• Write a conditional generate statement that connects a signal external_clock directly to a signal internal_clock if a Boolean generic constant positive_clock is true. If the generic is false, the statement should connect external_clock to internal_clock via an instance of an inverter component.
Slide - 424
Example - 4
• Write a conditional generate statement that connects a signal external_clock directly to a signal internal_clock if a Boolean generic constant positive_clock is true. If the generic is false, the statement should connect external_clock to internal_clock via an instance of an inverter component.
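The worked answer did not survive on this slide, so the following is one possible solution sketch rather than the instructor's answer. It assumes VHDL-93, where an if generate has no else branch, so two complementary if generate statements are used. The inverter entity and its port names a and y, the wrapper entity clock_select, and the generate labels are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Assumed inverter component; port names a and y are illustrative.
entity inverter is
  port (a : in std_logic; y : out std_logic);
end entity inverter;

architecture rtl of inverter is
begin
  y <= not a;
end architecture rtl;

library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical wrapper carrying the generic and the two clock signals.
entity clock_select is
  generic (positive_clock : boolean := true);
  port (external_clock : in  std_logic;
        internal_clock : out std_logic);
end entity clock_select;

architecture rtl of clock_select is
  component inverter is
    port (a : in std_logic; y : out std_logic);
  end component inverter;
begin
  -- Generic true: connect the clock straight through.
  direct_clock : if positive_clock generate
    internal_clock <= external_clock;
  end generate direct_clock;

  -- Generic false: route the clock through an inverter instance.
  inverting_clock : if not positive_clock generate
    clock_inverter : component inverter
      port map (a => external_clock, y => internal_clock);
  end generate inverting_clock;
end architecture rtl;
```

Because the two conditions are complementary, exactly one of the generate bodies is elaborated, so internal_clock always has a single driver.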
Slide - 425
Logic Duplication
• Intentional duplication of logic to improve performance
• Synthesis tools can perform it automatically
  – User sets maximum fan-out of a node
Slide - 426
Fan-out Problems
• High fan-out increases placement difficulty
  – High fan-out node cannot be placed close to all destinations
  – Ex: Fan-out of 1 & 15
Slide - 427
Controlling Fan-out
• Logic fan-out reduced by replication
  – Path now contains fan-out of 3 & 5
Slide - 428
Logic Duplication Example
• High fan-out node duplicated & placed to reduce delay
Slide - 429
Automatic Fan-out Control
• Most synthesis tools feature options which limit fan-out
• Advantage: Easy experimentation
• Disadvantage: Less control over results
  – Knowing which nodes have high fan-out & their destinations helps floor-planning
Slide - 430
Quartus II Software Fan-out Control
Select Signal Details
Slide - 431
Shift Register Example
PROCESS (clk)
BEGIN
  IF rising_edge(clk) THEN
    IF sclr_cell = '1' THEN
      regc <= (others => '0');
    ELSE
      regc <= regc(62 downto 0) & regb(63);
    END IF;
    IF sclr_cell = '1' THEN
      regb <= (others => '0');
    ELSE
      regb <= regb(62 downto 0) & rega(63);
    END IF;
    IF sclr_cell = '1' THEN
      rega <= (others => '0');
    ELSE
      rega <= rega(62 downto 0) & d;
    END IF;
  END IF;
END PROCESS;
q_out <= regc(63);
– sclr_cell fans out to each DFF within three 64-bit shift registers
– The shift registers are cascaded to produce one 192-bit shift register
– sclr_cell provides a synchronous clear function
Slide - 432
Fan-out to 192 Registers
Slide - 433
Shift Reg with Reduced Fan-out
PROCESS (clk)
BEGIN
  IF rising_edge(clk) THEN
    IF sclr_cell(2) = '1' THEN
      regc <= (others => '0');
    ELSE
      regc <= regc(62 downto 0) & regb(63);
    END IF;
    IF sclr_cell(1) = '1' THEN
      regb <= (others => '0');
    ELSE
      regb <= regb(62 downto 0) & rega(63);
    END IF;
    IF sclr_cell(0) = '1' THEN
      rega <= (others => '0');
    ELSE
      rega <= rega(62 downto 0) & d;
    END IF;
  END IF;
END PROCESS;
q_out <= regc(63);
– sclr_cell is replicated so that it appears 3 times
– The fan-out of the cell driving sclr_cell has gone from 1 to 3, but this is insignificant
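The process above uses sclr_cell(2 downto 0) but does not show where the three copies come from. One way to create them is to register the clear input three times, as in the sketch below. The entity name shift192 and the input sclr are hypothetical. The dont_merge attribute is an assumption about the synthesis tool: without some directive of this kind, a synthesizer is free to merge the three identical registers back into one, so check the tool documentation for the exact attribute name and placement.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical top level for the reduced fan-out shift register.
entity shift192 is
  port (clk, sclr, d : in  std_logic;
        q_out        : out std_logic);
end entity shift192;

architecture rtl of shift192 is
  signal rega, regb, regc : std_logic_vector(63 downto 0);
  -- Three identical copies of the registered clear, one per 64-bit register.
  signal sclr_cell : std_logic_vector(2 downto 0);

  -- Assumed synthesis directive asking the tool to keep the duplicates.
  -- A simulator treats it as an ordinary user-defined attribute.
  attribute dont_merge : boolean;
  attribute dont_merge of sclr_cell : signal is true;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- The driving cell now fans out to 3 registers instead of 1.
      sclr_cell <= (others => sclr);

      if sclr_cell(2) = '1' then
        regc <= (others => '0');
      else
        regc <= regc(62 downto 0) & regb(63);
      end if;

      if sclr_cell(1) = '1' then
        regb <= (others => '0');
      else
        regb <= regb(62 downto 0) & rega(63);
      end if;

      if sclr_cell(0) = '1' then
        rega <= (others => '0');
      else
        rega <= rega(62 downto 0) & d;
      end if;
    end if;
  end process;

  q_out <= regc(63);
end architecture rtl;
```

Each copy of sclr_cell now drives 64 registers instead of 192, which gives the placer the freedom to put each copy close to the shift register it clears.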
Slide - 434
Fan-out to 64 Registers
Slide - 436
Topics – Exam II
• State Machine Coding
• VHDL Logic Optimization & Performance
  – Balancing operators
  – Resource Sharing
  – Pipelining
• Parameterized Code
  – Constructs
    • Pre-Defined Attributes
    • Generics
    • For Generate
    • If generate
• Simulation
  – RTL Simulation
  – Functional Simulation
  – Gate Level simulation
• Testbenches
  – Classes of Testbenches: Advantages and Disadvantages
  – Test Vector Generation
  – Self Verifying Testbenches
  – Useful VHDL constructs for Testbenches
• Synthesis
  • Synthesis Operation
    • Syntax Check and resource association
    • Optimization
    • Technology Mapping
  • Synthesis Tools
  • Netlists
  • Status Reports
  • Schematic View (RTL and Technology View)