24
ULTRASCALE FPGA DDR4 2400 MBPS SYSTEM LEVEL DESIGN OPTIMIZATION AND VALIDATION

ULTRASCALE FPGA DDR4 2400 MBPS SYSTEM LEVEL DESIGN OPTIMIZATION AND VALIDATION … · 2021. 2. 4. · ultrascale fpga ddr4 2400 mbps system level design optimization and validation

  • Upload
    others

  • View
    45

  • Download
    0

Embed Size (px)

Citation preview

  • ULTRASCALE FPGA DDR4 2400 MBPS

    SYSTEM LEVEL DESIGN OPTIMIZATION

    AND VALIDATION

  • Massive amount of High Performance IO can be used for DDR4

    FPGA High Speed High Bandwidth Unique

    Challenges

  • High Speed High Performance IO supports many memory

    interface; hence, the IO capacitance is higher than in ASIC

    design.

    Multiple Supported for High Performance IO

    (HPIO) Standards

    Memory IO Standards

    DDR4 POD12

    DDR3 SSTL15

    DDR3L SSTL135

    LPDDR3 HSUL

    RLDRAM3 SSTL12

    QDR4 HSTL/SSTL:1.2V,1.25V

    POD:1.1V,1,2V

    QDR2+ HSTL:1.2V,1.8V

  • High Performance IOs are located at the center of the package,

    IO breakouts are more susceptible to cross talk

    FPGA HP IO location

  • High Speed Parallel Bus System Considerations

    Page 5

    What is the memory interface speed?

    What is the memory devices variations & electrical limits?

    What is the electrical channel characteristics ?

    What is the optimal design space ?

    FPGA

  • High Speed Parallel IO Bus System Design Flow Memory System

    Analysis Start

    i=0

    Channel Config_i

    Set Up Design

    Of

    Experiment Runs

    Analyze Factors

    Main Effects

    On Output Response

    Meet

    Budget

    ?

    Enablers

    i=i++

    No

    End

    Yes

  • Statistical Design Of Experiment Approach

    Identify the performance output (Response)

    Identify the design factors (parameters) limits

    Create design run table & simulate response(s)

    Analyze response & Identify key design parameters from Prediction

    Profiles

  • Design Parameter Table & Design Data Eye

    Response

  • Mother Board via Improvement Quantification

    –Upper Routing vs Lower Routing improvement

    Tx Feature

    – POD12 Driver with De-emphasis Equalization

    Rx Feature

    –Continuous Time Linear Equalization

    De-skew Feature

    –Data (DQ) & Data Strobe (DQS) per bit de-skew

    IO Key Features to Enable DDR4 Interface

  • Mother Board Via Cross Talk Quantification

  • Upper and Lower Routing Eye Diagram

    Comparison

    Lower Layer Upper Layer

    ~ 7.2% jitter improvement using upper layer

  • Tx De-emphasis Architecture

    0 0 1 1 0 0 1 1 0 0 1 1 0 0 1

    Vshelf Vswing

    2 tap De-emphasis Spec(dB)= −𝟐𝟎 𝒍𝒐𝒈(𝑽𝒔𝒉𝒆𝒍𝒇

    𝑽𝒔𝒘𝒊𝒏𝒈)

    = −𝟐𝟎 𝒍𝒐𝒈(( 𝒄𝟎 −|𝒄−𝟏|)/𝟐

    ( 𝒄𝟎 +|𝒄−𝟏|)/𝟐)

    𝒄𝒌 = 𝟏

    𝒌=−𝟏

    𝒌=𝟎

    D

    X n

    D Q T x s ig n a l

    W it h

    D e - e m p h a s is

    C - 1

    X n - 1

    C 0

    D Q 0 ( I / O )

    D Q 1 ( I / O )

    D Q S ( I / O )

    D Q S # ( I / O )

    C K

    G E N

    D

    M a in

    D r v

    E Q

    D r v

    T X

    F I R

    D Q 7 ( I / O )

    D Q 0 ( I / O )

    D Q 1 ( I / O )

    D Q S ( I / O )

    D Q S # ( I / O )

    D Q 7 ( I / O )

    C o n t r o l le r

    P h y ( I / O )D R A M s

  • Write Data Eye Improvement with De-emphasis S

    21

    (dB

    )

    With De-emphasis ~ 4% improvement

  • Receiver Continuous Time Linear Equalizer

    DQ0 (I/O)

    DQ1 (I/O)

    DQS(I/O)

    DQS#(I/O)

    CTLE RCV

    DQ7(I/O)

    DQ0 (I/O)

    DQ1 (I/O)

    DQS (I/O)

    DQS# (I/O)

    DQ7 (I/O)

    Vref

    DRAMs FPGA

  • Read Data Eye Improvement with CTLE

    Improvement S

    21

    (dB

    )

    With CTLE ~ 9.6% improvement

  • D Q 0 (I/O )

    D Q 1 (I/O )

    D Q S(I/O )

    D Q S#(I/O )

    CK_G EN _

    D Q

    D

    M ain

    D rv

    EQ

    D rv

    TX

    FIR

    D Q 7(I/O )

    D Q 0 (I/O )

    D Q 1 (I/O )

    D Q S (I/O )

    D Q S# (I/O )

    D Q 7 (I/O )

    CTLE

    R CV

    V ref

    D elay

    CK_G EN _

    D Q S

    D

    M ain

    D rv

    EQ

    D rv

    TX

    FIR

    CTLE

    R CV

    Vref

    D elay

    Per Bit De skew Capability

    DRAMs FPGA

    CK_GEN

    Delay

    Delay

  • Validation System Configuration

    Write Shmoo Procedure Overview

    Read Shmoo Procedure Overview

    Data Eye Scope Capture

    Over Clocking Results

    Experimental Data Validation

  • Validation System

    FPGA

    5 DRAMs

    DIMM

    4 DRAMs

    9 DRAMs

  • Write Shmoo Margining Test Flow

    Margining

    Shmoo

    DQ[7:0]

    DQS_w

    Delayed DQS_w

    Margin

    = Starting Pt

    After Calibration

    Write DQS pushes to find the min. passing eye

    DQ[7:0]

    DQS_r

    = FPGA

    Internal Cal.

    Strobe position

    DQ[7:0]

    DQS_r

    F P F F P P P P P F F F DRAM Vref_i F F F F P P P P F F F F DRAM Vref_i+1

    DRAM Vref_i-1 F F F F P P P P F F F F

    F F F F P P F F F F F F F F F F F F F F F F F F F F

    F F F F P P F F F F F F F F F F F F F F F F F F F F

  • Read Shmoo Margining Test Flow Write Data Send to DRAM (like regular Write)

    = Starting Pt

    After Calibration

    Read

    Margining

    DQ[7:0]

    DQS_w

    DQ[7:0]

    DQS_r

    = FPGA

    Internal Cal.

    Strobe position

    DQ[7:0]

    Margin

    Delayed

    Internal DQS_r

    Internal DQS_r

    FPGA Vref_ j FPGA Vref_ j+1

    FPGA Vref_ j-1 F F F F P P P P F F F F

    F F F F P P F F F F F F F F F F F F F F F F F F F F

    F F F F P P F F F F F F F F F F F F F F F F F F F F

    F F F F P P P P F F F F F P F F P P P P P F F F

  • Write and Read Eye Shmoo at 2400MTs

    Write Eye Shmoo Read Eye Shmoo

    UI UI

  • DDR4 Memory Write Eye (Scope) Measurement

    Write Eye Capture at 2400MTs

    Probes Attachment Write Data Eye Capture

  • Over Clocking Results

    (at 2933MTs)

    No Error

    System

    Clock has low

    Jitter.

    Data

    Eye has

    sufficient

    margin.

  • Summary & Conclusions

    A top down systematic approach using statistical DOE enabled

    an effective method to ensure design robustness.

    System enablers such as routing selection, IO equalization

    circuits improvement were quantified.

    Validation procedures and empirical data showed healthy

    margin for the DDR4 running at 2400MTs.

    Over clocking data indicated that the Interface is functioning at

    2993MTs with lower system clock jitter and sufficient data eye

    margin.