Aman Occ Final

  • 8/8/2019 Aman Occ Final

    ON-CHIP OPTICAL COMMUNICATION

    Presented by Aman Chitransh

    Moore's Gap

    [Figure: transistors and performance (GOPS) versus time, 1992 to 2010, on a log scale from .001 to 1000. Labeled techniques: pipelining, superscalar, OOO, SMT/FGMT/CGMT, multicore, tiled multicore. Annotation: "The GOPS Gap".]

    Diminishing returns from single-CPU mechanisms (pipelining, caching, etc.)
    Wire delays
    Power envelopes

    The Future of Multicore

    Number of cores doubles every 18 months
    Parallelism replaces clock frequency scaling and core complexity

    Resulting challenges
      Scalability
      Programming
      Power

    [Figure: MIT Raw, Sun UltraSPARC T2, IBM XCell 8i, Tilera TILE64]

    Multicore Challenges

    Scalability
      How do we turn additional cores into additional performance?
      Must accelerate single apps, not just run more apps in parallel
      Efficient core-to-core communication is crucial
      Architectures that grow easily with each new technology generation

    Programming
      Traditional parallel programming techniques are hard
      Parallel machines were rare and used only by rocket scientists
      Multicores are ubiquitous and must be programmable by anyone

    Power
      Already a first-order design constraint
      More cores and more communication mean more power
      Previous tricks (e.g. lower Vdd) are running out of steam

    Multicore Communication Today

    Bus-based interconnect
      Single shared resource
      Uniform communication cost
      Communication through memory
      Doesn't scale to many cores due to contention and long wires
      Scalable up to about 8 cores

    [Figure: processors (p) with caches (c) on a shared bus, with an L2 cache and DRAM]

    Optical Broadcast Network

    Waveguide passes through every core
    Multiple wavelengths (WDM) eliminate contention
    Signal reaches all cores in

    Optical Broadcast Network

    Electronic-photonic integration using standard CMOS process
    Cores communicate via optical WDM broadcast-and-select network
    Each core sends on its own dedicated wavelength using modulators
    Cores can receive from some set of senders using optical filters

    [Figure: N cores]
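
    The broadcast-and-select idea can be sketched as a toy model. This is only an illustration under stated assumptions: the `Core` class, its field names, and the `broadcast` function are hypothetical, and a wavelength is represented simply by the sender's core index.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Core:
    # Assumption: the core index doubles as its dedicated wavelength index.
    core_id: int
    # Sender wavelengths this core's optical filters are tuned to select.
    listens_to: Set[int] = field(default_factory=set)
    # Messages actually received, as (sender_id, payload) pairs.
    inbox: List[Tuple[int, str]] = field(default_factory=list)


def broadcast(cores: List[Core], sender_id: int, payload: str) -> None:
    """Send once on the sender's wavelength; only cores filtering for it keep the data."""
    for core in cores:
        if core.core_id != sender_id and sender_id in core.listens_to:
            core.inbox.append((sender_id, payload))


cores = [Core(i) for i in range(4)]
cores[1].listens_to = {0, 2}   # core 1 selects senders 0 and 2
cores[3].listens_to = {0}      # core 3 selects sender 0 only

# Two senders transmit without contending, since each uses its own wavelength.
broadcast(cores, 0, "a")
broadcast(cores, 2, "b")
```

    In this model a sender never waits for another sender, which is the contention-free property the slide attributes to WDM.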

    Optical bit transmission

    [Figure: optical link between a sending core and a receiving core. Labels: flip-flop, modulator driver, modulator, multi-wavelength source waveguide, data waveguide, filter, photodetector, transimpedance amplifier, flip-flop]

    Each core sends data using a different wavelength, so there is no contention
    Data is sent once, and any or all cores can receive it, giving efficient broadcast

    ATAC Bandwidth

    64 cores, 32 lines, 1 Gb/s
    Transmit BW: 64 cores x 1 Gb/s x 32 lines = 2 Tb/s
    Receive-weighted BW: 2 Tb/s x 63 receivers = 126 Tb/s
      Good metric for broadcast networks; reflects WDM

    ATAC allows better utilization of computational resources because less time is spent performing communication
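
    The bandwidth arithmetic can be checked with a few lines. This is a sketch using only the figures on the slide; the variable names are illustrative, and the rounding step to 2 Tb/s is an assumption about how the slide reaches 126 Tb/s.

```python
CORES = 64
LINES = 32
RATE_GBPS = 1  # per line, per core

# Transmit bandwidth: every core drives all of its lines at the line rate.
transmit_gbps = CORES * RATE_GBPS * LINES           # 2048 Gb/s
transmit_tbps = transmit_gbps / 1000                # 2.048 Tb/s

# Assumption: the slide rounds 2.048 Tb/s to 2 Tb/s before weighting.
transmit_tbps_rounded = round(transmit_tbps)        # 2 Tb/s

# Receive-weighted bandwidth: each transmitted bit can be taken by the other 63 cores.
receivers = CORES - 1
receive_weighted_tbps = transmit_tbps_rounded * receivers   # 126 Tb/s
receive_weighted_unrounded = transmit_tbps * receivers      # about 129 Tb/s

print(transmit_tbps, receive_weighted_tbps)
```

    Without the intermediate rounding the receive-weighted figure comes out slightly higher, at about 129 Tb/s.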

    System Capabilities and Performance

                              Baseline: Raw Multicore Chip    ATAC Multicore Chip
    Description               Leading-edge tiled multicore    Future optical-interconnect multicore
    System                    64-core (65nm process)          64-core (65nm process)
    Peak performance          64 GOPS                         64 GOPS
    Chip power                24 W                            25.5 W
    Theoretical power eff.    2.7 GOPS/W                      2.5 GOPS/W
    Effective performance     [...].3 GOPS                    [...]8.0 GOPS
    Effective power eff.      [...].3 GOPS/W                  [...].5 GOPS/W
    Total system power        150 W                           153 W

    Optical communications require a small amount of additional system power but allow for much better utilization of computational resources.
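
    The theoretical power-efficiency figures follow from peak performance divided by chip power. The sketch below reproduces only that part and the system-power overhead; the function name is hypothetical, and the effective-performance figures are left out because they are incomplete in the source.

```python
def gops_per_watt(gops: float, watts: float) -> float:
    """Power efficiency as performance divided by power."""
    return gops / watts


# Theoretical efficiency = peak performance / chip power.
baseline_eff = round(gops_per_watt(64, 24), 1)    # Raw baseline
atac_eff = round(gops_per_watt(64, 25.5), 1)      # ATAC

# Extra total system power that the optical interconnect costs, in percent.
system_power_overhead_pct = (153 - 150) / 150 * 100

print(baseline_eff, atac_eff, system_power_overhead_pct)
```

    The overhead works out to 2 percent of total system power, which is the "small amount of additional system power" the slide refers to.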

    Programming ATAC

    Cores can directly communicate with any other core in one hop (

    Communication-centric Computing

    Operation               Energy    Latency
    Network transfer        3 pJ      3 cycles
    ALU add operation       2 pJ      1 cycle
    32KB cache read         50 pJ     1 cycle
    Off-chip memory read    500 pJ    250 cycles

    ATAC reduces off-chip memory calls, and hence energy and latency
    View of extended global memory can be enabled cheaply with on-chip distributed cache memory and ATAC network

    [Figure: bus-based multicore (processors p with caches c on a bus, L2 cache, memory) shown beside ATAC; transfers are labeled 500 pJ and 3 pJ]
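
    To make the table concrete, the sketch below compares one off-chip memory read with fetching the same word from another core's cache over the network. The per-operation costs are from the table; treating a remote fetch as one network transfer plus one cache read, and adding latencies serially, are simplifying assumptions, and all names are illustrative.

```python
# (energy in pJ, latency in cycles) per operation, taken from the table.
COSTS = {
    "network_transfer": (3, 3),
    "alu_add": (2, 1),
    "cache_read_32kb": (50, 1),
    "offchip_memory_read": (500, 250),
}


def total_cost(op_counts):
    """Sum energy and latency over a mix of operations (assumes no overlap)."""
    energy_pj = sum(COSTS[op][0] * n for op, n in op_counts.items())
    cycles = sum(COSTS[op][1] * n for op, n in op_counts.items())
    return energy_pj, cycles


# Fetching a word from off-chip memory.
offchip = total_cost({"offchip_memory_read": 1})

# Assumption: fetching the word from a remote on-chip cache instead.
onchip = total_cost({"network_transfer": 1, "cache_read_32kb": 1})

print(offchip, onchip)
```

    Under these assumptions the on-chip path costs 53 pJ and 4 cycles against 500 pJ and 250 cycles off-chip.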

    ATAC is an Efficient Network

    Modulators are the primary source of power consumption
      Receive power: requires only ~2 fJ/bit even with -5 dB link loss
      Modulator power: Ge-Si EA design, ~75 fJ/bit (assume 50 fJ/bit for modulator driver)

    Example: 64-core communication
    (i.e. N = 64 cores = 64 wavelengths; for a 32-bit word: 2048 drops/core and 32 adds/core)
      Receive power: 2 fJ/bit x 1 Gbit/s x 32 bits x N^2 = 262 mW
      Modulator power: 75 fJ/bit x 1 Gbit/s x 32 bits x N = 153 mW
      Total energy/bit = 75 fJ/bit + 2 fJ/bit x (N - 1) = 201 fJ/bit

    Comparison: electrical broadcast across 64 cores
      Requires 64 x 150 fJ/bit ≈ 10 pJ/bit (~50x more power)
      (Assumes 150 fJ/mm/bit, 1 mm spaced tiles)
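
    The power figures above can be reproduced directly from the per-bit energies. This is a sketch with illustrative variable names; it uses only the numbers given on the slide and shows that the two power products come out in milliwatts.

```python
N = 64                 # cores, one wavelength each
BITS = 32              # word width
RATE_BPS = 1e9         # 1 Gbit/s per line
FJ = 1e-15             # joules per femtojoule

RX_FJ_PER_BIT = 2      # receive energy per bit
MOD_FJ_PER_BIT = 75    # modulator plus driver energy per bit
WIRE_FJ_PER_BIT = 150  # electrical wire energy per bit per 1 mm tile hop

# Every core receives every other core's word: the receive term grows as N^2.
receive_power_w = RX_FJ_PER_BIT * FJ * RATE_BPS * BITS * N**2   # about 0.262 W

# Each core modulates only its own word: the transmit term grows as N.
modulator_power_w = MOD_FJ_PER_BIT * FJ * RATE_BPS * BITS * N   # about 0.154 W

# Energy to deliver one bit to all other cores.
optical_fj_per_bit = MOD_FJ_PER_BIT + RX_FJ_PER_BIT * (N - 1)   # 201 fJ/bit
electrical_fj_per_bit = N * WIRE_FJ_PER_BIT                     # 9600 fJ/bit

ratio = electrical_fj_per_bit / optical_fj_per_bit              # about 48x

print(receive_power_w, modulator_power_w, optical_fj_per_bit, ratio)
```

    The computed ratio is about 48, which the slide rounds to roughly 50x after rounding 9.6 pJ/bit up to 10 pJ/bit.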

    Summary

    ATAC uses optical networks to enable multicore programming and performance scaling
    ATAC encourages communication-centric architecture, which helps multicore performance and power scalability
    ATAC simplifies programming with a contention-free all-to-all broadcast network
    ATAC is enabled by recent advances in CMOS integration of optical components

    What Does the Future Look Like?

    Corollary of Moore's law: number of cores will double every 18 months

                '02      '05    '08    '11     '14
    Research    16       64     256    1024    4096
    Industry    [...]    16     64     256     1024

    (Cores minimally big enough to run a self-respecting

    [...]K cores by 2014! Are we ready?
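
    Doubling every 18 months means a factor of four every three years, which is the step between the columns of the projection. A minimal sketch, assuming a hypothetical helper `projected_cores` and taking 16 cores in 2005 as the industry starting point:

```python
def projected_cores(base_cores, base_year, year, doubling_months=18):
    """Core count if the count doubles every `doubling_months` months."""
    months = (year - base_year) * 12
    return base_cores * 2 ** (months / doubling_months)


# Industry row: 16 cores in 2005, projected forward in three-year steps.
industry = [projected_cores(16, 2005, y) for y in (2005, 2008, 2011, 2014)]

# Research row: 64 cores in 2005.
research = [projected_cores(64, 2005, y) for y in (2005, 2008, 2011, 2014)]

print(industry)
print(research)
```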

    Scaling to 1000 Cores

    Purely optical design scales to about 64 cores
    After that, clusters of cores share optical hubs
    ENet and BNet move data to/from optical hub
      Dedicated, special-purpose electrical networks

    [Figure: 64 optically connected clusters on the ONet; within each cluster, electrical networks (ENet, BNet) connect 16 cores to the optical hub. Core detail labels: Proc, Dir $, $, memory]
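
    The clustered layout can be sketched as a simple index mapping. The cluster and core counts are from the slide; the linear core numbering and the names `locate` and `shares_hub` are assumptions made only for illustration.

```python
CLUSTERS = 64            # optically connected clusters, one hub each
CORES_PER_CLUSTER = 16   # cores attached to each hub by electrical networks
TOTAL_CORES = CLUSTERS * CORES_PER_CLUSTER


def locate(core_id):
    """Map a global core index to (cluster index, index within the cluster)."""
    if not 0 <= core_id < TOTAL_CORES:
        raise ValueError("core_id out of range")
    return divmod(core_id, CORES_PER_CLUSTER)


def shares_hub(a, b):
    """True when two cores sit in the same cluster and so share one optical hub."""
    return locate(a)[0] == locate(b)[0]


print(TOTAL_CORES, locate(17), shares_hub(15, 16))
```

    With 64 hubs on the optical network and 16 cores behind each, the design reaches 1024 cores while the optical part stays at the 64-endpoint scale.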