Credes Report Fan

  • 8/10/2019 Credes Report Fan

    1/45

    IHP

    Im Technologiepark 25, 15236 Frankfurt (Oder)

    Germany

    IHP Im Technologiepark 25 15236 Frankfurt (Oder) Germany www.ihp-microelectronics.com 2010 - All rights reserved

    Pausible Clocking Based GALS Design:

    Analysis, Optimization and Applications

    Xin FAN

    fan@ihp-microelectronics.com


    Outline

    Overview of GALS design methodology

    Performance analysis of pausible clocking based GALS data link

    System optimization for area/power/noise efficient GALS design

    Moonrake chip: SYNC/GALS OFDM TX in IFX 40nm technology

    Conclusions


    Overview of GALS design methodology

    What is GALS design?

    Globally-asynchronous locally-synchronous design for large-scale

    digital system integration.

    Processing is performed by synchronous functional modules;

    Communication is accomplished by asynchronous interfaces.

    [Figure: two synchronous cores (Sync Core Logic), clocked by Clock A and Clock B, each wrapped by asynchronous interfaces (Async IF) and connected through req/ack/data handshake channels.]


    Overview of GALS design methodology (Cont)

    Why do we need GALS design?

    Relaxing the timing constraints at the system level

    GALS design requires no global clock reference, only local clocks.

    Each locally-timed compact GALS block could be optimized much more efficiently and aggressively, leading to lower power and area overheads with better timing performance.

    The simplified clock trees also contribute to the power/area saving at the system level.

    Reducing simultaneous switching noise of digital circuits

    The switching activity in GALS design is naturally randomized and spread over time, resulting in a lower switching noise.

    Facilitating the system integration based on modular design

    GALS design presents an infrastructure for dynamic voltage and frequency scaling (DVFS) and SoC/NoC integration.


    Overview of GALS design methodology (Cont)

    How to implement a GALS system?

    The main issue is the design of robust interface circuits with low overhead.

    Robust means resolving metastability at an acceptable mean time between failures (MTBF).

    Two aspects of GALS overhead:

    A. Hardware overhead: power and area;

    B. Performance overhead: arbitration latency and throughput drop.

    Three asynchronous communication schemes:

    A. Synchronizer;

    B. Dual-clock FIFO;

    C. Pausible clocking.


    Overview of GALS design methodology (Cont)

    Boundary synchronizer

    Cascaded double-DFF:

    One extra clock cycle is reserved for resolving metastability.

    Simple but slow:

    4-phase protocol: 6 TX cycles plus 6 RX cycles for each data transfer.

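    [Figure: boundary synchronizer between the tx_clock domain and the rx_clock domain. tx_data is held in an enabled register and captured as rx_data; the req and ack handshake signals each cross the boundary through cascaded D flip-flops, with FSMs on both sides driving the register enables and the vld_in/vld_out flags.]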

  • 8/10/2019 Credes Report Fan

    7/45

    IHP Im Technologiepark 25 15236 Frankfurt (Oder) Germany www.ihp-microelectronics.com 2011 - All rights reserved

    Overview of GALS design methodology (Cont)

    Dual-clock FIFO

    Data is written into the FIFO at the TX clock and read out from the

    FIFO at the RX clock.

    Write and read pointers, instead of data, need to be synchronized

    through the clock boundary.

    The FIFO has to be sufficiently large to avoid the throughput drop

    caused by write/read pointer synchronization.
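The B2G and G2B labels in the block diagram suggest that the pointers cross the clock boundary in Gray code, so that only one bit changes per increment. A minimal sketch of the two conversions follows; the function names are hypothetical and the slide does not give the actual implementation.

```python
def bin_to_gray(value: int) -> int:
    """Binary-to-Gray: consecutive pointer values differ in exactly one bit."""
    return value ^ (value >> 1)


def gray_to_bin(gray: int) -> int:
    """Gray-to-binary: XOR together all right-shifted copies of the Gray word."""
    value = 0
    while gray:
        value ^= gray
        gray >>= 1
    return value


if __name__ == "__main__":
    # Walk a 3-bit pointer through one full wrap and show the round trip.
    for ptr in range(8):
        g = bin_to_gray(ptr)
        print(f"{ptr:03b} -> {g:03b} -> {gray_to_bin(g):03b}")
```

Because a Gray-coded pointer changes one bit at a time, a synchronizer that samples it mid-transition sees either the old or the new value, never an unrelated one.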

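    [Figure: dual-clock FIFO between the tx_clock domain and the rx_clock domain, with tx_data in and rx_data out. The binary write pointer (bWrPtr) passes through B2G, Sync and G2B stages to the empty logic; the binary read pointer (bRdPtr) passes through B2G, Sync and G2B stages to the full logic; outputs are the full and empty flags.]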


    Overview of GALS design methodology (Cont)

    Pausible clocking scheme

    The local clock can be re-scheduled (paused and stretched), when

    necessary, to avoid metastability at data sampling;

    The data transfer is initiated by the synchronous TX/RX cores through some output/input flow control logic.

    The communication between TX and RX is performed by the asynchronous

    handshaking channels;

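    [Figure: pausible clocking data link. TX side: TX CORE, OUT_FLOW_CNTR, OUT_REG, output port controller (OPC) and TX PAUSIBLE CLOCK, with signals op_te, op_ta, op_req, op_ack, op_ri, op_ai, op_gi and tx_clk. RX side: RX CORE, IN_FLOW_CNTR, IN_REG, SYNC_REG, input port controller (IPC) and RX PAUSIBLE CLOCK, with signals ip_te, ip_ta, ip_req, ip_ack, ip_ri, ip_ai, ip_gi and rx_clk. OPC and IPC are connected by handshake signals.]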


    Overview of GALS design methodology (Cont)

    Pausible clock generator

    The clock is generated based on a programmable ring oscillator;

    A C-element is inserted to gate the incoming clock rising edge;

    An array of MUTEX is used as arbiter of concurrent requests.
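As a reminder of what the C-element contributes, here is a behavioral sketch of a two-input Muller C-element using its standard definition: the output follows the inputs when they agree and holds its value otherwise. The class name is hypothetical, and how the element is wired into the ring oscillator and MUTEX array is not modeled.

```python
class CElement:
    """Behavioral model of a two-input Muller C-element."""

    def __init__(self, initial: int = 0) -> None:
        self.y = initial  # stored output state

    def update(self, a: int, b: int) -> int:
        """Set the output to the common input value; hold when the inputs differ."""
        if a == b:
            self.y = a
        return self.y


if __name__ == "__main__":
    c = CElement()
    # One input rises early: the output waits until the other input agrees.
    for a, b in [(1, 0), (1, 1), (0, 1), (0, 0)]:
        print(a, b, c.update(a, b))
```

In a clock generator this holding behavior is what allows one input to keep the next rising edge back until the other input permits it.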

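    [Figure: pausible clock generator. A programmable delay line and a C-element (inputs A, B; output Y) form the ring oscillator; MUTEX 0 and MUTEX 1 arbitrate Req0/Ack0 and Req1/Ack1 against the request clock RClk, and the local clock LClk is the output. Insets show the MUTEX and C-element circuits.]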


    Overview of GALS design methodology (Cont)

    Asynchronous FSM

    All the state transitions are triggered by events on the inputs and on the fed-back outputs.

    Behavior is described using the signal transition graph (STG).

    Simple and fast.

    Sensitive to glitches.

    Requires a dedicated synthesis tool: Petrify.

    [Figure: asynchronous I/O port controllers, with their signal transition graphs over ip_ri, ip_ai, ip_rp, ip_ap, ip_te, ip_ta, ip_req and ip_ack.]


    Performance analysis of GALS data link

    Data synchronization latency

    [Figure: timing of lClk and rClk with clock period T, request arrival time t and acknowledge window w; the MUTEX arbitrates the request ri against rClk and produces ai and gi.]

    Acknowledge window w

    If the request arrives during the off-phase of rClk, it can be acknowledged immediately by the MUTEX, and the data will be sampled at the current rising edge of the clock;

    If the request arrives during the on-phase of rClk, it cannot be granted by the MUTEX until rClk goes low, and the data will be sampled at the next rising edge of the clock.


    Performance analysis of GALS data link (Cont)

    Synchronization latency function L (t, w)

    \[
    L(t, w) =
    \begin{cases}
    T - t, & t \in [0, w); \\
    3T/2 - w, & t \in [w, w]; \\
    2T - t, & t \in (w, T).
    \end{cases}
    \]

    \[
    T - w \le L(t, w) \le 2T - w
    \]

    [Figure: L/T versus t over one clock period for w = T/4, w = T/2 and w = 3T/4; timing diagram of lClk, rClk, ri and data marking the latencies T - t and 2T - t.]
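The piecewise latency function on this slide can be written as a small Python function. This is a sketch under the slide's definitions only: t is the request arrival time within one period T, w is the acknowledge window, and the function name is hypothetical.

```python
def sync_latency(t: float, w: float, T: float) -> float:
    """Synchronization latency L(t, w) for a request arriving at time t in [0, T)."""
    if not 0 <= t < T:
        raise ValueError("t must lie in [0, T)")
    if not 0 < w < T:
        raise ValueError("w must lie in (0, T)")
    if t < w:
        return T - t          # acknowledged immediately, sampled at the current edge
    if t > w:
        return 2 * T - t      # granted later, sampled at the next edge
    return 1.5 * T - w        # boundary value at t = w


if __name__ == "__main__":
    T, w = 1.0, 0.5
    for t in (0.0, 0.25, 0.5, 0.75):
        print(t, sync_latency(t, w, T))
```

For w = T/2 the values range between T/2 and 3T/2, which matches the axis ticks of the middle plot.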


    Performance analysis of GALS data link (Cont)

    Average synchronization latency L_AVG

    We are not interested in the synchronization latency for a particular incoming data item, but in the average synchronization latency, L_AVG, over a large number of requests.

    The value of L_AVG is determined by: (1) the synchronization latency function L(t, w) for any t, and (2) the distribution of t in a data link.

    For example, assuming a uniform distribution on t, the average latency due to data synchronization can be derived as:

    \[
    L_{AVG} = \frac{1}{T} \int_0^T L(t, w)\, dt = \frac{3T}{2} - w
    \]

    For a uniform distribution on t, the average latency of data synchronization is determined by the relative width of the acknowledge window to the clock period, w/T.

    If w/T = 1/2, then L_AVG = T; if w/T > 1/2, then L_AVG < T.
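The uniform-distribution average can be checked numerically. The sketch below averages the piecewise latency over evenly spaced arrival times and compares it with the closed form 3T/2 - w, which follows from integrating the piecewise function and agrees with the w/T = 1/2 case stated above. Function names and the sample count are assumptions made for illustration.

```python
def sync_latency(t: float, w: float, T: float) -> float:
    """Piecewise synchronization latency L(t, w) for t in [0, T)."""
    if t < w:
        return T - t
    if t > w:
        return 2 * T - t
    return 1.5 * T - w


def average_latency(w: float, T: float, samples: int = 10_000) -> float:
    """Midpoint-rule average of L(t, w) for t uniformly distributed over [0, T)."""
    step = T / samples
    total = sum(sync_latency((k + 0.5) * step, w, T) for k in range(samples))
    return total / samples


if __name__ == "__main__":
    T = 1.0
    for w in (0.25, 0.5, 0.75):
        print(w, average_latency(w, T), 1.5 * T - w)
```

Widening the acknowledge window beyond half a period therefore pushes the average latency below one clock cycle.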


    Performance analysis of GALS data link (Cont)

    Data throughput

    Another important issue is the maximum data throughput which

    could be achieved by an asynchronous handshaking data link.

    In particular, if TX and RX both support burst-mode data transfer (one data item per cycle), what is the throughput of the data link?

    Previous studies reported that the data throughput of a pausible clocking based GALS data link could reach at most 0.5 (one data item every other RX cycle).

    Why?

    Only experimental results were given, without any analysis.


    Performance analysis of GALS data link (Cont)

    The throughput is determined not by T_RX or T_TX, but by the period T_Loop of the handshaking loop of the asynchronous link.

    In a tightly coupled data link, the transition on ap is always triggered by rx_clk+.

    This ap is then sampled by tx_clk+ with a synchronization latency of L(t, w).

    After synchronization, the ap further triggers the next transition on rp.

    The arrival time t of the next rp is exactly the synchronization latency of TX, which satisfies T_TX - w_TX + d_w < t < 2T_TX - w_TX + d_w.

    Case I. 2T_TX - w_TX + d_w < w_RX:

    T_Loop = T_RX, and Th = T_RX/T_Loop = 1.

    [Figure: timing of rx_clk, ip_ap and ip_rp for Case I, marking w_RX, T_RX, T_Loop = T_RX, and the earliest and latest arrivals of ip_rp at T_TX - w_TX + d_w and 2T_TX - w_TX + d_w.]


    Performance analysis of GALS data link (Cont)

    Case II. T_TX - w_TX + d_w < w_RX < 2T_TX - w_TX + d_w < T_RX + w_RX:

    T_Loop = T_RX when 0 < t


    Performance analysis of GALS data link (Cont)

    Case III. w_RX < T_TX - w_TX + d_w < 2T_TX - w_TX + d_w < T_TX + w_TX:

    T_Loop = 2T_RX and Th = 0.5.

    [Figure: timing of rx_clk, ip_ap and ip_rp for Case III, marking w_RX, T_RX, T_Loop = 2T_RX, and the earliest and latest arrivals of ip_rp at T_TX - w_TX + d_w and 2T_TX - w_TX + d_w.]

    T_Loop is not a linearly increasing function of the clock ratio R = T_TX/T_RX. There are critical thresholds of R, which depend on w_RX, w_TX and d_w.

    To improve the throughput of the tightly coupled asynchronous data link, w_RX and w_TX need to be maximized and d_w should be minimized.
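The three cases can be told apart by comparing the earliest and latest arrival times of the next rp with the RX acknowledge window. The sketch below checks only those distinguishing inequalities; the outer upper-bound conditions on the slides are not checked, no throughput is returned for Case II because the slide text for that case is incomplete, and the function and parameter names are hypothetical.

```python
from typing import Optional, Tuple


def tightly_coupled_case(
    T_tx: float, w_tx: float, w_rx: float, d_w: float
) -> Tuple[str, Optional[float]]:
    """Classify a tightly coupled link as Case I, II or III and give Th where stated."""
    rp_min = T_tx - w_tx + d_w        # earliest arrival of the next rp
    rp_max = 2 * T_tx - w_tx + d_w    # latest arrival of the next rp
    if rp_max < w_rx:
        return "I", 1.0               # rp always lands inside the current window
    if w_rx < rp_min:
        return "III", 0.5             # rp always misses the current window
    return "II", None                 # rp sometimes lands inside, sometimes misses


if __name__ == "__main__":
    for w_rx in (20.0, 10.0, 4.0):
        print(w_rx, tightly_coupled_case(T_tx=10.0, w_tx=5.0, w_rx=w_rx, d_w=0.5))
```

The comparison makes the slide's advice concrete: a larger w_RX or w_TX, or a smaller d_w, moves a link from Case III toward Case I.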


    Performance analysis of GALS data link (Cont)

    Throughput of tightly coupled data link

    Max error < 15%, average error < 4%.

    [Figure: Throughput comparison between simulation and analysis. Data transfers per cycle (0.3 to 1) versus clock ratio T_TX/T_RX (1.49 down to 0.27). Series: simulated and estimated at d_w = 2.0 ns; simulated and estimated at d_w = 0.5 ns.]


    Performance analysis of GALS data link (Cont)

    Improving throughput by extending acknowledge window

    [Figure: Estimated throughput at different acknowledge windows. Data transfers per cycle (0.5 to 1) versus clock ratio T_TX/T_RX (1.17 down to 0.30). Series: W_RX = T_RX/2, W_TX = T_TX/2; W_RX = 3T_RX/4, W_TX = T_TX/2; W_RX = 3T_RX/4, W_TX = 3T_TX/4. Annotations: 1/3, 1/2, 3/5; max throughput < 0.7.]


    Performance analysis of GALS data link (Cont)

    Loosely coupled asynchronous data link

    By introducing concurrency in the IPC, the handshaking loop of the data link is partially decoupled.

    Here, ap is asserted by the IPC once rp is acknowledged by the MUTEX.

    Therefore, the transitions in the OPC are partially concurrent with those in the IPC.

    By this means, the period of the handshaking loop can be reduced.

    [Figure: signal transition graphs of the OPC (op_te, op_req, op_ack, op_ri, op_ai, op_ta) and of the IPC (ip_ri, ip_ai, ip_ta, ip_req, ip_te, ip_ack) for the loosely coupled link.]


    Performance analysis of GALS data link (Cont)

    Now, the transition of op is not triggered by rx_clk+, but is randomly distributed within (0, w_RX).

    Each time it receives an op transition, the OPC will trigger the next rp within one TX clock cycle.

    Therefore, the maximum arrival time of the next rp is (w_RX + T_TX).

    [Figure: timing of rx_clk with ip_op between ip_opmin and ip_opmax and the latest arrival ip_rpmax, marking w_RX, T_RX and T_TX.]

    Condition for Th = 1: w_RX + T_TX < w_RX + T_RX, i.e. T_TX < T_RX.

    Otherwise, T_Loop = T_TX and Th = T_RX/T_TX.
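The loosely coupled result reduces to a one-line rule: full throughput while TX is faster than RX, and the clock-period ratio otherwise. A minimal sketch, with a hypothetical function name and throughput expressed in data items per RX cycle:

```python
def loosely_coupled_throughput(T_tx: float, T_rx: float) -> float:
    """Throughput Th of the loosely coupled link, in data items per RX cycle."""
    if T_tx <= 0 or T_rx <= 0:
        raise ValueError("clock periods must be positive")
    if T_tx < T_rx:
        return 1.0            # the next rp always arrives in time: T_Loop = T_RX
    return T_rx / T_tx        # otherwise T_Loop = T_TX


if __name__ == "__main__":
    for T_tx, T_rx in [(8.0, 10.0), (10.0, 10.0), (10.0, 8.0)]:
        print(T_tx, T_rx, loosely_coupled_throughput(T_tx, T_rx))
```

Unlike the tightly coupled case, this rule has no dependence on the acknowledge windows or on d_w.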


    Performance analysis of GALS data link (Cont)

    Improving throughput by loosely coupled data link

    [Figure: Throughput comparison of loosely coupled data link. Data transfers per cycle (0.6 to 1) versus clock ratio T_TX/T_RX (1.46 down to 0.73). Series: d_w = 0 ns, W_RX = T_RX/2; d_w = 2 ns, W_RX = T_RX/2; d_w = 2 ns, W_RX = 3T_RX/4.]


    Performance analysis of GALS data link

    Design of loosely coupled asynchronous data link

    Compared to the tightly coupled data link design, two stages of D-latches are used on the RX side to lock the input data, since TX could overwrite the output data before it is finally sampled into the RX. This is the penalty of decoupling the handshaking loop.

    [Figure: schematic of the loosely coupled data link. TX side: OPC, TX PAUSIBLE CLOCK GENERATOR (op_ri, op_ai, op_gi, tx_clk) and the output registers (tx_data, tx_data_latch, op_te, op_ta, tx_te, tx_ta, tx_te_pending). RX side: IPC, RX PAUSIBLE CLOCK GENERATOR (ip_ri, ip_ai, ip_gi, rx_clk), two D-latch stages (rx_data_l) feeding rx_data, and ip_te, ip_ta, rx_te, rx_ta, rx_te_pending. The two sides are connected by op_rp/ip_rp and op_ap/ip_ap; T1 and T2 mark delays on the TX and RX sides.]


    System optimization by GALS design

    GALS design for power saving

    Simplify the on-chip clock tree distribution by GALS partitioning with

    averaged area occupation and clock fanout load.

    Some evaluations on ASIC designs were reported with 70% reduction

    in the power dissipation of clock networks.

    Modeling on GALS processor shows marginal system power saving.

    GALS design for EMI noise suppression

    Partition the system according to the average power dissipation.

    Introduce clock phase/frequency modulation for efficiently spreading

    the switching activity of different GALS blocks over time/spectrum.


    System optimization by GALS design (Cont)

    GalsEmilator: modeling EMI in digital systems at high level

    A MATLAB software tool to investigate EMI in digital systems with different structures and topologies

    Programmable in:

    Switching current profile

    Clock jitter percentage

    System topologies

    Partitioning granularity


    System optimization by GALS design (Cont)

    Supply current profile:

    The supply current profile could be modeled as triangular shape

    or as a superposition of different triangular shapes.

    It is possible to describe up to five different supply current profiles and specify the probability of their appearance in the system.
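To illustrate why spreading switching activity over time lowers spectral peaks, here is a toy Python model, not GalsEmilator itself: four modules draw the same triangular supply current once per period, first with aligned clocks and then with fixed phase offsets of a quarter period. Sample counts, pulse width and function names are assumptions, and real GALS blocks run at different, modulated frequencies rather than fixed offsets.

```python
import cmath


def triangular_pulse(n_samples: int, width: int, peak: float = 1.0) -> list[float]:
    """One clock period of supply current: a triangle `width` samples wide, zero elsewhere."""
    half = width / 2
    return [max(0.0, peak * (1.0 - abs(i - half) / half)) for i in range(n_samples)]


def rotate(profile: list[float], shift: int) -> list[float]:
    """Delay a periodic profile by `shift` samples (a clock phase offset)."""
    n = len(profile)
    return [profile[(i - shift) % n] for i in range(n)]


def total_current(profile: list[float], shifts: list[int]) -> list[float]:
    """Sum the supply currents of several modules, one phase shift per module."""
    rotated = [rotate(profile, s) for s in shifts]
    return [sum(samples) for samples in zip(*rotated)]


def harmonic_magnitude(signal: list[float], k: int) -> float:
    """Magnitude of the k-th harmonic of one period, via a direct DFT."""
    n = len(signal)
    acc = sum(x * cmath.exp(-2j * cmath.pi * k * i / n) for i, x in enumerate(signal))
    return abs(acc) / n


N, WIDTH = 64, 16
pulse = triangular_pulse(N, WIDTH)
aligned = total_current(pulse, [0, 0, 0, 0])      # synchronous: all modules switch together
spread = total_current(pulse, [0, 16, 32, 48])    # quarter-period phase offsets

if __name__ == "__main__":
    for k in (1, 2, 4):
        print(k, harmonic_magnitude(aligned, k), harmonic_magnitude(spread, k))
```

In this toy setup the fundamental and second harmonic cancel in the phase-spread case while the fourth harmonic is unchanged, so the energy is redistributed in the spectrum rather than removed.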


    System optimization by GALS design (Cont)

    Evaluated topologies of digital systems

    [Figure: the three evaluated topologies, each built from Module 1 to Module 4: (a) pipelined, (b) star, (c) mesh.]


    System optimization of GALS design (Cont)

    EMI features of the synchronous systems

    clock jitter + clock phase shift


    System optimization by GALS design (Cont)

    EMI features of the GALS systems

    with different GALS granularity and frequency distribution


    System optimization by GALS design (Cont)

    EMI comparison between the synchronous and GALS designs

    Low-EMI synchronous design: theoretically possible, practically difficult.


    System optimization by GALS design (Cont)

    Example: a low-EMI 64-point pipelined FFT processor

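    [Figure: 64-point pipelined FFT processor partitioned into four blocks, each driven by its own pausible clock generator (Pausible Clock Gen 1 to 4) and linked through PIP and DOP ports. Labeled stages: BF1 (32), BF2 (16), BF3 (8), CMULT with ROM, BF4 (4), BF5 (2), BF6 (1).]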


    System optimization by GALS design (Cont)

    [Figure: measurements of the core VDD spectrum in synchronous mode (a) and in low-EMI GALS mode (b).]


    Moonrake chip design and test

    Top-level block diagram of Moonrake chip

    A synchronous OFDM baseband TX and its GALS counterpart were implemented on the same die, allowing for an objective performance comparison in a homogeneous setting: identical both in function and in process.

    All the data pads were shared by the two TX cores to save area.

    [Figure: top-level blocks: SYNC OFDM TX, GALS OFDM TX, JTAG, PRNG, PLL, CLK MUX, INPUT CNTR, OUTPUT CNTR, MISR.]


    Moonrake chip design and test (Cont)

    Datapath structure of the synchronous TX

    The starting point of our work was the synchronous baseline TX. It was highly pipelined and parallelized in the datapath to reach gigabit throughput: 12 symbol coding channels, 6 interleavers and 4 64-point IFFTs.

    [Figure: datapath of the synchronous TX: input FIFO, input control, universal scrambler, symbol mapping, middle control, FEC encoders 1 to 12, interleaver interface, interleavers 1 to 6, pilot inserter, subcarrier mappers 1 to 4, 64-point IFFTs 1 to 4, 4-point IFFT, output stage.]


    Moonrake chip design and test (Cont)

    Power/area estimation and GALS partitioning

    GALS Block 1
                Input       Symbol   Universal  Middle      FEC encoder  Output     Pilot      Mapping  Total
                controller  mapping  scrambler  controller  [12:1]       interface  insertion  [4:1]
    Power       0.1%        0.5%     0.0%       7.0%        0.09%        0.1%       3.1%       0.08%    10.97%
    Area        0.1%        1.0%     0.0%       12.8%       0.06%        0.1%       5.1%       0.14%    19.3%

    GALS Blocks 2 to 4
                GALS Block 2                            GALS Block 3                            GALS Block 4
                Interleave 1  Interleave 2  Total       Interleave 3  Interleave 4  Total       Interleave 5  Interleave 6  Total
    Power       8.7%          8.7%          17.4%       8.7%          8.7%          17.4%       8.7%          8.7%          17.4%
    Area        8.9%          8.9%          17.8%       8.9%          8.9%          17.8%       8.9%          8.9%          17.8%

    GALS Blocks 5 and 6, and post-synthesis totals
                GALS Block 5                                           GALS Block 6                   Post-synth
                FFT_64P 1  FFT_64P 2  FFT_64P 3  FFT_64P 4  Total      FFT_4P  Out Stage  Total       OFDM TX
    Power       4.9%       4.3%       4.3%       4.3%       17.8%      11.3%   7.2%       18.5%       240 mW
    Area        2.7%       2.4%       2.4%       2.4%       9.9%       10.3%   6.7%       17%         2.2 mm²


    Moonrake chip design and test (Cont)

    GALS TX top-level block diagram

    6 GALS blocks, 16 data links, 32 asynchronous I/O port controllers.

    [Figure: GALS TX top level: GALS BLOCK 1 to GALS BLOCK 6, each with its own pausible clock generator (Pausible Clock GEN 1 to 6), connected through P-IN and D-OUT ports. Labeled units: input data FIFO, input control, symbol mapping, universal scrambler, universal FEC encoder [12:1], middle control, interleaver interface, pilot inserter, mapper [4:1], interleavers [2:1], [4:3] and [6:5], IFFT 64p [4:1], IFFT 4p, output stage.]


    Moonrake chip design and test (Cont)

    16M equivalent gates, 30% core logic;

    218 memories: 8 FIFOs (64 Kb), 86 SROMs (192 Kb), 134 SRAMs (400 Kb);

    219 pads: 136 TX/shared pads, 20 NoC dedicated pads, 63 power pads.

    IFX 40-nm CMOS process;

    4000 µm x 2250 µm = 9 mm²;

    LBGA-345 package;

    Bondlib 55 µm pitch.


    Moonrake chip design and test (Cont)

    Complexity of clock trees after layout

    [Figure: bar chart of the number of clock tree levels (axis 0 to 30) for CLK_PLLO and GALS_CLK1 to GALS_CLK6.]

                      SYNC CLK  GALS CLK1  GALS CLK2  GALS CLK3  GALS CLK4  GALS CLK5  GALS CLK6
    No. of levels     27        10         6          7          5          9          8
    Max local skew    10 ps     3 ps       < 2 ps     < 2 ps     < 2 ps     < 2 ps     3 ps

    1st pro of GALS design: simplified clock trees with better timing balance.


    Moonrake chip design and test (Cont)

    Cell area occupation after layout

    Item                                    Cell area          Share
    OFDM TX, GALS: core                     2220080
    OFDM TX, GALS: clock gen & IO ports     included in core
    OFDM TX, GALS: total                    2220080            41%
    OFDM TX, SYNC: core                     2234712
    OFDM TX, SYNC: PLL                      100000
    OFDM TX, SYNC: total                    2334712            43.2%
    OFDM TX, others                         91916              1.7%
    OFDM TX, total                          4643900            85.9%
    NOC                                     227374             4.2%
    Pads                                    537075             9.9%
    Total                                   5406853            100%

    [Figure: pie charts of the area breakdown, with labels 74.2%, 12.2%, 9.9% and 41%, 43.2%, 9.9%.]

    2nd pro of GALS design: smaller area by more aggressive optimization.


    Moonrake chip design and test (Cont)

    Power consumption after layout

                IO       Memory   Clock    Logic    Total
    SYNC TX     0.0489   0.1731   0.0419   0.0255   0.2894
                16.89%   59.81%   14.49%   8.81%    100%
    GALS TX     0.0488   0.1693   0.0316   0.0280   0.2777
                17.56%   60.98%   11.37%   10.09%   100%

    [Figure: Power distribution over GALS clock domains. Bar values 25.80, 35.60, 35.60, 35.60, 44.10 and 48.30 (axis 0 to 50) for LCLK 1 to LCLK 6.]

    3rd pro of GALS design:

    > 20% saving in the clock tree dissipation;

    6% saving in the system power dissipation.


    Moonrake chip design and test (Cont)

    EMI measurements

    Spectrum of core VDD

    Measurement points on the Moonrake adapter board:

    A. VDD_AE22

    B. VDD_BOARD

    At fundamental frequency:

    A. 26 dB attenuation on chip;

    B. 19 dB attenuation on board.

    [Figure: amplitude of the on-chip core VDD from the SYNC TX and from the GALS TX.]

    4th pro of GALS design: attenuation of EMI noise on the on-chip core VDD.


    Moonrake chip design and test (Cont)

    Synchronous/GALS TX comparison

    Area, power dissipation, and EMI noise

                  Area(1)    Power dissipation(2)   Spectral amplitude of core VDD(3) (dBm)
                  (µm²)      (mW)                   1st peak   2nd peak   3rd peak
    SYNC TX       2325823    252                    -15        -32        -23
    GALS TX       2220080    237                    -41        -48        -53
    Difference    -5.0%      -6.0%                  -26 dB     -16 dB     -30 dB

    Notes:

    1. The area is estimated based on the layout netlist;

    2. The power is measured when the chip is working at 160 MHz in both SYNC and GALS modes;

    3. The spectrum is measured on the SMA socket which is connected to the on-chip power pad VDD_AE22.
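The Difference row of the comparison can be recomputed from the SYNC TX and GALS TX rows. The short Python sketch below does so; the variable and function names are hypothetical, and the input numbers are the ones on the slide.

```python
def percent_change(baseline: float, value: float) -> float:
    """Relative change of `value` with respect to `baseline`, in percent."""
    return 100.0 * (value - baseline) / baseline


# Values taken from the comparison slide.
sync_tx = {"area": 2325823, "power_mw": 252, "peaks_dbm": (-15, -32, -23)}
gals_tx = {"area": 2220080, "power_mw": 237, "peaks_dbm": (-41, -48, -53)}

area_diff = percent_change(sync_tx["area"], gals_tx["area"])
power_diff = percent_change(sync_tx["power_mw"], gals_tx["power_mw"])
# Differences between dBm readings are plain subtractions, expressed in dB.
peak_diffs_db = [g - s for s, g in zip(sync_tx["peaks_dbm"], gals_tx["peaks_dbm"])]

if __name__ == "__main__":
    print(f"area:  {area_diff:.2f} %")
    print(f"power: {power_diff:.2f} %")
    print(f"peaks: {peak_diffs_db} dB")
```

The power and spectral differences reproduce the slide's figures; the area difference computes to about -4.5%, slightly smaller in magnitude than the -5.0% shown, which may reflect rounding or a different baseline.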


    Conclusions

    The pausible clocking scheme presents an alternative for area- and power-efficient GALS design;

    The hardware overhead for introducing the pausible clocking scheme is negligible;

    Balanced GALS partitioning results in a group of compact locally-timed blocks, which can be optimized much more efficiently and aggressively.

    Therefore, the marginal hardware overhead due to the pausible clocking based GALS infrastructure can be fully compensated at the system level.

    Also, with careful design optimization, performance overhead due to the asynchronous communication can be minimized;

    Sub-cycle data synchronization latency can be achieved;

    Decoupling of the handshaking loop contributes to high data throughput.

    Behavioral modeling and silicon measurement both demonstrate the efficiency of GALS design for EMI-noise suppression.


    Thank you!

    For more information about IHP: www.ihp-microelectronics.com.

    For more details about pausible clocking: www.galaxy-project.org.