© Copyright 2013 Xilinx.
Vivado Design Suite: UltraFast™ Design Methodology Guidelines for Predictable Success
Xilinx Delivers an ASIC-Class Advantage Through Silicon, Tools, and Methodology
UltraScale ASIC-Class Architecture
– ASIC-class capabilities
– Enables massive data flow
– Removes interconnect bottlenecks
Vivado ASIC-Strength Design Suite
– Accelerate system integration:
• 15x faster C++ verification
• Interface-level connections
– Accelerate implementation:
• 4x faster analytical P&R
UltraFast Design Methodology
– Best practices for PCB planning, HDL design, closure
– Predictable success in weeks, not months
Agenda
UltraFast Methodology Introduction
Write HDL code that best fits the hardware
Timing constraints creation and validation
Clock planning, pin planning, floorplanning
UltraFast™ Methodology Benefits
Fast Compile Times and Predictable Results
– Require good methodology
Project Schedules Drive Time to Market
– Manage risk effectively
– Minimize iterations, especially late-stage changes
– Explore options early with estimation and progressive analysis
Proven Recommendations from Successful Customers
– Best practices with checklists and links to documentation
– Verification tools and reports
– Linting and DRC
UltraFast User Guide: UG949
PCB planning: Avoid board re-spins
– Use XPE to validate power against budget
– Use Vivado I/O planning & DRC on a top level including all interfaces
Design Creation: Coding style for best QoR
– Use HDL language templates in Vivado
– New linting capability: Methodology DRC rule deck
Implementation: Rapid convergence & signoff timing
– Rapid convergence technique: Closure with the simplest constraints
– Signoff convergence: Closure with pristine constraints
– Use XDC language templates & Timing DRC rule deck
Overall Strategy for Accelerated Design Cycle: Earlier Iterations
Start closure at the front-end of the design flow
– Engage UltraFast early
– Faster iterations than in the back-end
– Greater impact on Quality of Results (QoR)
[Figure: design flow stages (Device/IP selection; PCB planning; IP integration, RTL design, verification; implementation closure; configuration, bring-up, debug) annotated with "Impact on QoR" values 100x, 10x, 1.2x, 1.1x and the goal "Reduce Design Cycle Time & Cost"]
UltraFast Design Methodology Guide – UG949
Project Planning & Kickoff
Board Planning & Schematic Creation
Design Creation & IP Integration
Implementation & Design Closure
Configuration Programming & Hardware Debug
Design Methodology Checklist in DocNav: Sample Section
Checklist
Spreadsheet-based checklist to be used by the designer and FAE to review key portions of the board schematic for the FPGA/SoC
– Power Distribution System, Configuration, Transceivers, XADC, I/O Interfaces
Vivado Enables Design Methodology
Key Technology: Shared, Scalable Data Model
Shares design information between implementation steps
– Ensures fast convergence and timing closure
Enables use of the same commands & reports to analyze design at every step
Enables cross-probing
Highly efficient memory utilization
– Scalable for next decade of designs
[Figure: Estimation, IP Integration, RTL Design, Synthesis, and Place & Route on a shared, scalable data model, with progressive estimation accuracy across the entire flow to reduce iterations late in the cycle. Cross-probing panels show RTL (a VHDL entity FIR excerpt), a timing report with Timing Paths #1 to #3, schematics, and placement; reports lead to code changes, tool settings, and placement edits.]
Technique for Rapid Timing Closure: Baselining
Prioritize and close one step at a time
– Converge first at synthesis (faster, higher impact), then in the back-end
– Start with the simplest (baseline) constraints:
• Internal Fmax (flop-to-flop constraints), which is the problem 9 times out of 10
• Define proper clock dependencies
– Make sure the design & constraints are reasonable
– Analyze, get to the root cause, then decide how to fix it
• Clock path vs. data path vs. interconnect delay vs. logic delay…
– Add I/O constraints (with Vivado XDC templates) and redo…
Do not confuse with "Signoff" constraints
– You still want complete constraints
View QuickTime Video for UltraFast Design Methodology for Timing Closure
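A baseline constraint file, as described above, might look like the following minimal sketch. The port names clk_a and clk_b and the periods are hypothetical placeholders, and the clock group assumes the two clocks really are asynchronous with synchronizers on every crossing.

```tcl
# Baseline XDC sketch: primary clocks and clock relationships only.
# clk_a, clk_b and both periods are hypothetical placeholders.
create_clock -name clk_a -period 5.000 [get_ports clk_a]
create_clock -name clk_b -period 8.000 [get_ports clk_b]

# Assumption: clk_a and clk_b are asynchronous and every crossing
# between them uses a proper synchronizer.
set_clock_groups -asynchronous \
    -group [get_clocks -include_generated_clocks clk_a] \
    -group [get_clocks -include_generated_clocks clk_b]

# Deliberately no set_input_delay / set_output_delay yet:
# I/O constraints are added once internal timing is reasonable.
```

Running report_timing_summary after synthesis with only these constraints shows the internal flop-to-flop picture before I/O timing is added.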
Progressive Approach to Design Closure
Baseline Constraints: Optimize Internal Paths
– Synthesis, Place, and Route, each followed by Analysis
– Fmax, Baseline XDC
Add I/O Constraints: Optimize Entire Chip
– Synthesis, Place, and Route, each followed by Analysis
– Fmax, Complete XDC
If needed, Add Timing Exceptions and/or Floorplan: Fine-tune
– Synthesis, Place, and Route, each followed by Analysis
– Fmax, Final XDC
Critical Path Could Be a Moving Target
Example from a Real Design
Post-synthesis estimates (the real problem)
– Worst path: 13 levels of logic
Post-place
– Worst path: 7 levels
– Paths with 7-13 levels got placed locally
Post-route (the side effect of the real problem)
– Worst path: 4 levels of logic
– Paths with 5-13 levels got preferred routing
[Figure: three design views labeled "worst path: 4.1ns", "worst path: 4.3ns", and "worst path: 4.2ns"]
Analyze & fix timing issues at early stages for faster timing convergence
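One way to watch the logic-level counts discussed above is to list them for the worst paths after each stage. The sketch below is meant for the Vivado Tcl console on an open design. It assumes the timing path objects expose the SLACK, LOGIC_LEVELS, STARTPOINT_PIN, and ENDPOINT_PIN properties, and the path count of 20 is arbitrary.

```tcl
# Sketch: print slack and logic levels for the 20 worst setup paths.
# Run it after synthesis, after placement, and after routing to see
# whether the critical path keeps moving.
foreach path [get_timing_paths -setup -max_paths 20 -nworst 1] {
    set slack  [get_property SLACK $path]
    set levels [get_property LOGIC_LEVELS $path]
    set from   [get_property STARTPOINT_PIN $path]
    set to     [get_property ENDPOINT_PIN $path]
    puts "slack=$slack levels=$levels $from -> $to"
}
```

Paths with many logic levels right after synthesis are the ones worth fixing in RTL, even if they no longer top the list after routing.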
Vivado Design Suite: Writing HDL Code that Best Fits the Hardware
Impact of HDL Coding Style
Block inference
– Follow recommended templates for RAM, DSP, LUTRAM, SRL inference
Pipeline your design to reduce levels of logic
Think about reset
– Taxes routing and is not always needed: Xilinx devices boot in a known state
– Dedicated shifters (SRLs) and RAM memory arrays don't use resets
Synchronous resets are preferred
– Allows packing of registers into dedicated RAM and DSP blocks
– Tools have the option to implement reset in the datapath (LUT)
Give more freedom to synthesis
– Revisit attributes needed by other synthesis engines or older releases
– Avoid KEEP, dont_touch, syn_preserve, max_fanout attributes…
Review the Design Creation chapter in UG949
Review the Design Creation tab in the Design Methodology Checklist
Using HDL Language Templates
Accessing templates in IDE
– Window → Language Templates
Synthesis Templates
– BRAM, LUTRAM, ROM, SRL
– Counter, MULT
– FSM, Decoder, Encoder
– …
Coding to Match the Hardware
DSP48 Blocks and BRAM Blocks
Leverage DSP block cascading capabilities
– Adder tree becomes a performance bottleneck
– Pipelined adder chain delivers optimal performance
Avoid Block RAM collision avoidance logic (*)
– Synthesis assumes collision
– Inference with collision check disabled
[Figure: DSP48 adder tree compared with a cascaded chain of DSP48 blocks; RAMB with rdaddr = wraddr compare logic on dout compared with RAMB inferred without it]
(*) Logic added by default by Synplify (attribute syn_no_rw_check removes the logic)
The Impact of Resets
Increase performance with the right reset choice
– Think local, not global, with resets
– No reset at all (if possible) is best
– Synchronous rather than asynchronous reset
– Active-high rather than active-low reset
– Default register value can be controlled via the INIT property or at signal declaration in RTL
From: UG949 Chapter 4 Design Creation – Control Signals and Control Sets
Reset Routing
Resets compete for the same resources as the rest of the active signals of the design
– Including the critical datapaths
Designs that minimize or eliminate resets have
– About 18% fewer timing paths on average
– About 15% less runtime on average
– 10% fewer registers and 7% fewer LUTs
– 20% lower timing scores
– Lower memory use
Be selective with where you code resets
Initialize all registers in the VHDL / Verilog code
More on Resets
Many designs need some resets
– Very few designs require resets on all registers
• Most ASICs require a described reset on every register for testability
• But the FPGA has a built-in Global Set/Reset (GSR)
Guideline: Be selective with where you code resets
– Only place resets that have impact on functionality
• I/O, state machines, critical control logic, etc.
– Omit resets that do not
Initialize all registers in the VHDL / Verilog code
– This should be done whether using a reset or not
VHDL:
signal my_register : std_logic_vector (7 downto 0) := "01010101";
Verilog:
reg [7:0] my_register = 8'h55;
Gauging Other Design Metrics
report_high_fanout_nets
– To reduce fanout on a net use…
• max_fanout (Vivado synthesis and XST)
• syn_maxfan (Synplify)
– Use phys_opt_design for timing-driven replication
From: Design Methodology Checklist – Design Creation tab
Gauging Other Design Metrics
report_control_sets
– Indicator of possible packing fragmentation and fitting issues
– Run with the -verbose option to generate a full list
– Use Synplify's syn_reduce_controlset_size attribute for control
• Default is 2; set it to 8 to eliminate most of the lowest-fanout control sets
From: Design Methodology Checklist – Design Creation tab
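The fanout and control-set reports named on these two slides can be run back to back in the Vivado Tcl console on an open synthesized or placed design. This is a minimal sketch; the net count of 20 is an arbitrary choice.

```tcl
# Sketch: gauge fanout and control sets on the open design.
# List the 20 highest-fanout nets, with timing slack per net.
report_high_fanout_nets -max_nets 20 -timing

# Full list of control sets (clock / enable / set-reset combinations).
report_control_sets -verbose

# After place_design, timing-driven replication of critical
# high-fanout drivers is handled by physical optimization.
phys_opt_design
```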
Methodology DRCs
Two new rule decks in 2013.3
– methodology_checks
– timing_checks
Usage:
– report_drc -ruledeck methodology_checks
– report_drc -ruledeck timing_checks
– Specific "methodology_checks" available only for the elaborated design
Tools → Report → Report DRCs
Review and Resolve Critical Warnings
Vivado does not stop for Critical Warnings
– Enables fixing many issues at once
– Bitstream generation will error with unresolved critical warnings
From: UG949 Chapter 5 Implementation – Moving past Synthesis
Review and Resolve Critical Warnings
Critical warnings are serious design issues
– Invalid constraints or XDC syntax errors
– Path segmentation
– Netlist or target objects not found or invalid
Address these warnings before moving forward
– Results of design analysis may be inaccurate
– Critical Warnings may prevent design success
Vivado Design Suite: Timing Constraints Creation and Validation
Timing Constraints Need to Be "Clean"
When constraints (clock, I/O) are missing
– The corresponding paths are timed optimistically
– No violation will be reported, but the design may not work on hardware
When paths are incorrectly constrained
– Runtime and optimization effort will be spent on the wrong paths
– Reported timing violations may not result in any issues on hardware
When constraints create wrong HOLD violations
– May result in long runtime and SETUP violations
– P&R fixes HOLD violations as #1 priority, because:
• Designs with HOLD violations won't work on hardware
• Designs with SETUP violations will work, but slower
Review the Creating Constraints section of the Design Creation chapter in UG949 & checklist
Include IP Constraints
Many cores have their own constraints / exceptions
– PCIE, MIG, RAM-based asynchronous FIFOs…
Non-native IP: Be careful!
– Very easy to drop the IP constraints, especially if provided as .ngc files
Native IP: Constraints included
– Sources window in IDE: Compile Order Constraints
– Use report_compile_order -constraints to identify constraint file sources
Method to Create Good Constraints
Create clocks and define clock interactions
– Four-step guideline
Set input and output delays
– Beware of creating incorrect HOLD violations
Set timing exceptions
– Less is more!
– Beware of creating incorrect HOLD violations
Use report commands to validate each step
Clock Ground Rules
For SDC-based timers, clocks only exist if you create them
– Use create_clock for primary clocks
Clocks propagate automatically through clocking modules
– MMCM and PLL output clocks are automatically generated
– Gigabit transceivers are not supported: create their clocks manually
Use create_generated_clock for internal clocks (if needed)
All inter-clock paths are evaluated by default
[Figure: clock network with the annotations "create_clock here" and "don't create_clock here"]
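The ground rules above can be sketched in XDC as follows. Every port, pin, and cell name and every period here is a hypothetical placeholder. The transceiver clock in particular is defined on an assumed output pin path that will differ per design.

```tcl
# Primary clock on a top-level port (hypothetical 100 MHz sys_clk).
# MMCM/PLL outputs driven by it are generated automatically.
create_clock -name sys_clk -period 10.000 [get_ports sys_clk]

# Transceiver output clock: not derived automatically, so it is
# created by hand on the transceiver pin (hypothetical pin path).
create_clock -name gt_rxclk -period 3.200 [get_pins gt0/gt_inst/RXOUTCLK]

# Internal divide-by-2 clock produced by a register (hypothetical
# cell clk_div_reg), declared only because the design needs it.
create_generated_clock -name clk_div2 -source [get_ports sys_clk] \
    -divide_by 2 [get_pins clk_div_reg/Q]
```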
Four Steps for Creating Clocks
Run report_timing_summary before starting constraint capture
– View report_clocks section to see all signals driving clock pins
Step 1
– Use create_clock for all primary clocks on top-level ports
– Run the design (synthesis) or open the netlist design
Step 2
– Run report_clocks
– Study the report to verify period, phase and propagation
– Apply corrections to your constraints (if needed)
Output of report_clocks (excerpt):
Attributes
  P: Propagated
  G: Generated
Clock          Period  Waveform       Attributes  Sources
sys_clk        10.000  {0.000 5.000}  P           {sys_clk}
pll0/clkfbout  10.000  {0.000 5.000}  P,G         {pll0/plle2_adv_inst/CLKFBOUT}
pll0/clkout0   2.500   {0.000 1.250}  P,G         {pll0/plle2_adv_inst/CLKOUT0}
pll0/clkout1   10.000  {0.000 5.000}  P,G         {pll0/plle2_adv_inst/CLKOUT1}
Four Steps for Creating Clocks (continued)
Step 3
– Evaluate the clock interaction using report_clock_interaction
BEWARE: All inter-clock paths are constrained by default!
– Mark inter-clock paths (Clock Domain Crossing) as asynchronous
• Make sure you designed proper CDC synchronizers
• Use set_clock_groups (preferred over set_false_path)
BEWARE: This overrides any set_max_delay constraints!
– Do you have unconstrained objects?
• Find out with check_timing
Step 4
– Run report_clock_networks
– You want the design to have clean clock lines without logic
• Tip: Use the clock gating option in synthesis to remove LUTs on the clock line
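Taken together, the four steps amount to a short console session on an open synthesized design. This sketch assumes the primary clocks from Step 1 are already in the XDC and the netlist is loaded.

```tcl
# Step 2: verify period, waveform and propagation of every clock.
report_clocks

# Step 3: see which clock pairs are timed together, then look for
# missing clocks and other unconstrained objects.
report_clock_interaction
check_timing

# Step 4: inspect the clock trees for logic on the clock lines.
report_clock_networks
```

Constraints are corrected after each report and the report is rerun before moving to the next step.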
Defining & Validating Clock Interactions
Constraining Cross Clock Domains
Use appropriate synchronizing techniques
– 2 or more register stages, for single bit
– FIFO for buses
Maximize MTBF
– ASYNC_REG to place synchronizing flops in the same slice for best Mean Time Between Failures (MTBF)
set_property ASYNC_REG TRUE [get_cells [list sync0_reg sync1_reg]]
Constraints for Asynchronous CDC
Ignoring timing paths between individual clocks
set_clock_groups -asynchronous -group {clk1} -group {clk2}
This is equivalent to:
set_false_path -from [get_clocks clk1] -to [get_clocks clk2]
set_false_path -from [get_clocks clk2] -to [get_clocks clk1]
BEWARE: This overrides any set_max_delay constraints!
Ignoring timing paths between groups of clocks
# SDC create_clock for the two primary clocks
create_clock -name clk_oxo -period 10 [get_ports clk_oxo]
create_clock -name clk_core -period 10 [get_ports clk_core]
# Set Asynchronous Clock Groups
set_clock_groups -asynchronous \
-group [get_clocks -include_generated_clocks clk_oxo] \
-group [get_clocks -include_generated_clocks clk_core]
BEWARE: This overrides any set_max_delay constraints!
Setting Input / Output Delays
Start with no I/O constraints
– Focus on finding and fixing core timing issues
– Vivado does not time from I/Os without I/O constraints
• No need to use set_false_path -from or -to get_ports to ignore I/O timing
Specify realistic I/O delays once core timing is reasonable
– Use set_input_delay and set_output_delay
– A wrong delay value (e.g. <0 ns) can cause invalid analysis
The delay value specified is the external delay
– Default in UCF: internal delay
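As an illustration of external delays, the sketch below constrains a hypothetical 100 MHz system-synchronous interface. The port names din and dout and all delay values are placeholders. Real values come from the data sheets of the external devices plus board trace delays.

```tcl
# Hypothetical interface clock (10 ns period).
create_clock -name sys_clk -period 10.000 [get_ports sys_clk]

# Input delay = delay OUTSIDE the FPGA before the data arrives:
# upstream clock-to-out plus board trace (max and min cases).
set_input_delay -clock sys_clk -max 4.000 [get_ports {din[*]}]
set_input_delay -clock sys_clk -min 1.000 [get_ports {din[*]}]

# Output delay = delay OUTSIDE the FPGA after the data leaves:
# board trace plus downstream setup (max), and the
# corresponding hold-side value (min).
set_output_delay -clock sys_clk -max 2.500 [get_ports {dout[*]}]
set_output_delay -clock sys_clk -min 0.300 [get_ports {dout[*]}]
```

Setting both -max and -min covers the setup and hold analysis of each port. A missing -min value is one way incorrect HOLD results creep in.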
Multicycle Paths
set_multicycle_path N implies a HOLD check at N-1
– E.g., a multicycle path of 10 implies a HOLD requirement of 9 cycles!
Whenever the setup check is changed, the hold check is also changed
Guidelines for proper multicycle path constraints
– Should always be pairs of set_multicycle_path constraints
• One for -setup and one for -hold
– Bring the HOLD requirement back to 0 (reduce by N-1) to avoid incorrect HOLD violations
[Figure: regA (D, Q, CE) driving regB (D, Q, CE) with Multicycle Path = 3T; waveforms for CLK, regA/CLK, regB/CLK, and regB/D with the SETUP and HOLD edges marked]
set_multicycle_path -from [get_cells regA] -to [get_cells regB] 3 -setup
set_multicycle_path -from [get_cells regA] -to [get_cells regB] 2 -hold
Hold checked at edge 3 - 1 - 2 = 0
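The pairing rule generalizes to any N for a path inside one clock domain. The sketch below uses a variable N and the cell names regA and regB from the slide. The value 4 is an arbitrary example, and the file is assumed to be read as XDC or sourced as Tcl, both of which accept set and expr.

```tcl
# Sketch: N-cycle path from regA to regB within one clock domain.
set N 4

# Setup is checked N cycles after the launch edge.
set_multicycle_path $N -setup -from [get_cells regA] -to [get_cells regB]

# Move the hold check back by N-1 cycles so it stays at edge 0.
set_multicycle_path [expr {$N - 1}] -hold -from [get_cells regA] -to [get_cells regB]
```

Running report_timing with the same -from and -to filters, once with -setup and once with -hold, confirms which edges are actually being checked.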
Using Vivado Language Templates: XDC Template
Accessing templates in IDE
– Window → Language Templates
SDR & DDR Templates
– Inputs and outputs
– Source / System synchronous
– Center / Edge aligned
Reading the Reports
Reading the report_timing_summary
– Intra-clock report
– Inter-clock report
Use report_timing for interactivity and advanced options
– You would typically use it in the Tcl Console
• report_timing -through [get_nets {/cpu_top/crit_net_name}]
• report_timing -setup -max_paths 10 (for the 10 worst setup paths)
• report_timing -hold -to [get_cells {/top/item}] (hold on "item")
– Use filters from your XDC files to check each expression
• set_multicycle_path -from [get_pins regA/C] -to [get_pins regB/D]
• report_timing -from [get_pins regA/C] -to [get_pins regB/D]
Timing Command Summary
Obtain full timing summary of the design
– report_timing_summary: summary subsections for all timing checks
Create and validate clocks
– check_timing: for missing clocks and I/O constraints
– report_clocks: check frequency and phase
– report_clock_networks: possible clock root
Validate clock groups
– report_clock_interaction
Validate I/O delays
– report_timing -from [input_port] -setup/-hold
– report_timing -to [output_port] -setup/-hold
Add exceptions if necessary
– Validate using report_timing
Managing Constraint Files
Using a single XDC file
– The XDC applies to both synthesis & implementation
Using multiple XDC files
– Main XDC with top-level constraints
• Primary clocks and I/O delays
• Exceptions on clocks and RTL objects
– Implementation-specific XDC
• Physical constraints
• Exceptions based on physical netlist
The order of constraint files matters!
– To report the order of XDC files: report_compile_order -constraints
[Figure: main.xdc and impl.xdc mapped onto the Elaboration, Synthesis, and Implementation stages]
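In project mode, the two-file arrangement above could be set up roughly as follows. This is a sketch that assumes the default constraint fileset constrs_1 and uses the file names main.xdc and impl.xdc from the slide's figure.

```tcl
# Add both constraint files to the project's constraint fileset.
add_files -fileset constrs_1 {main.xdc impl.xdc}

# Keep the implementation-specific file out of synthesis, so its
# physical constraints and netlist-based exceptions only apply
# during implementation.
set_property USED_IN_SYNTHESIS false [get_files impl.xdc]

# Confirm which files are read, and in which order.
report_compile_order -constraints
```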
Managing IP Constraint Files
Some IP come with their own XDC constraints
– Example: The clocking wizard
The order of constraint files matters!
– To report the order of XDC files: report_compile_order -constraints
– Always verify the clocks using report_clocks (step 2 of the 4-step process)
– To change the default processing order:
set_property PROCESSING_ORDER EARLY|LATE [get_files IP_XDC_File]
– If necessary, IP XDC files can be enabled/disabled
The clocking wizard XDC is read before the user XDC by default (user constraints can override IP-defined clocks by default)
Vivado Design Suite: Clock Planning, Pin Planning, and Floorplanning
Clock and Pin Planning
Pin and clock planning often happens early in the project
– Decisions here can have far-reaching effects throughout the design
• Excessive clock skew
• Poor I/O timing
• Timing-hazardous clock domain crossings
• Less flexible logic placement
• Fewer clocking resource choices
• Excessive routing delays
• Reduced device utilization
Pin and clock planning should be considered together
– Choices made for clock pins affect clock timing and resource choices
– Choices made for data pins affect clock pin placement decisions
Review the Board & Device Planning chapter in UG949
Review the Board and FPGA Planning tab in the Design Methodology Checklist
Clock and Pin Planning
Considerations for clock pin planning
– Generate all I/O interface and clocking IP prior to pin assignment
– Consolidate clocking where possible and consolidate MMCMs
• Fewer clocks and MMCMs mean fewer clock resources and crossings
– Consider all CDC when assigning clocking resources and pins
Considerations for data pin planning
– Group related data pins in the same bank, or in adjacent banks if a single bank is not possible
• Place the associated I/O clock in the same bank when possible
– Consider associated control signal placement along with data paths
– Consider data flow when planning the pinout
• Choose a pinout that has clean passage through the device
– Place high-fanout signals towards the middle of the chip
• Really high-fanout signals should be considered for CCIO pins with BUFG resources
– Evaluate all pin attributes (I/O standard, slew, etc.) during placement
Clock and Pin Planning
Use Vivado pin planning capabilities
– Import pin & clocking assignments from generated IP
– Visualization of I/O resource placement on package and in device
– DRC, SSN and other checks available to validate choices
– Configuration pin assignments & possible device migration considerations
Re-evaluate in Vivado any subsequent pin changes
– Understand how PCB pin swaps affect timing & resources
Vivado I/O & Clock Planning Tutorial UG935
– Available in DocNav and Vivado
Additional Considerations for SSI Devices
Clocking
– High-fanout clocks should be placed in center SLRs
– Place regional clocks on the center clock region within an SLR
– Place clock pins / MMCMs in the same SLR as timing-critical I/O interfaces (avoid driving timing-critical I/O interfaces from a different SLR)
– Clock pin choices should be balanced across upper & lower SLRs:
• 2 upper SLR clock domains have 8 BUFG x 2
• 4 lower SLR clock domains have 4 BUFG x 4
Pinout
– High-fanout signals feeding all SLRs should be placed in center SLRs
– I/O interfaces should not span across SLRs
– Pay attention to data flow across SLRs
• Avoid the need for multiple SLR crossings due to pinout decisions
For more details, consult UG872: Large FPGA Methodology Guide
Improving Placement Through Floorplanning
First improve HDL, synthesis & constraints
– It is easier and more repeatable not to floorplan when avoidable
Start the design without any floorplanning
– See what P&R algorithms can do without restrictions
Using Vivado IDE
– Highlight placement per module as a guideline
– Visualize placement of critical timing paths
• Understand data flow in & out of Pblocks
• Understand the effects of a Pblock inside & out
• Resources around the placement can affect data flow
– Create Pblocks minding resource utilization
Careful not to over-floorplan: less is best
– Only floorplan the critical areas of the design
– Do not create Pblocks with very high utilization
• Can create routing congestion or new timing problems
– Avoid overlapping Pblocks
• Creates more complex placement and clock scenarios
[Figure: Baseline run with highlighted regions]
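When a critical module does need a Pblock, the constraints could look like this minimal sketch. The Pblock name, the cell u_filter, and the site ranges are hypothetical. Valid ranges depend on the target device and should leave utilization headroom inside the Pblock.

```tcl
# Sketch: floorplan one critical module only.
create_pblock pblock_filter
add_cells_to_pblock [get_pblocks pblock_filter] [get_cells u_filter]

# Hypothetical, device-specific site ranges; size them from the
# placement seen in the unconstrained baseline run.
resize_pblock [get_pblocks pblock_filter] -add {SLICE_X0Y0:SLICE_X49Y99}
resize_pblock [get_pblocks pblock_filter] -add {DSP48_X0Y0:DSP48_X2Y39}
```

Deriving the region from where the unconstrained run placed the module keeps the Pblock aligned with the data flow the tools already found.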
Vivado Design Suite: Summary
UltraFast™ Methodology Review
For optimal results, adapt your HDL style to the FPGA
– Be mindful of BRAM, LUTRAM, DSP, SRL inference needs
– Avoid asynchronous reset and wired resets in general
– Minimize control signals
– For large FPGAs, design with the dataflow and floorplanning in mind
Baseline your constraints to converge rapidly
Provide clean timing constraints
– Bad constraints result in bad runtime, performance and hardware failures
– Learn the essentials of timing creation & validation methods
Follow pin/clock planning guidelines
– Must follow dataflow
– Place large-fanout clocks and pins in the center of SSIT devices
Vivado Design Suite: Thank You