Memcon 2015 Serial Memories Fill a Need

Page 1: Serial Memories Fill a Need-Final

Memcon 2015

Serial Memories Fill a Need

Page 2: Serial Memories Fill a Need-Final

Agenda

Michael Sporer – Director of Marketing

The future of parallel versus serial interface for memory

Mark Baumann – Director of Applications Engineering

Based on experience at MoSys developing and introducing the GigaChip interface and the 1st, 2nd and 3rd generations of Bandwidth Engine ICs, we will describe several options for future memory interface solutions.

Copyright ©MoSys, Inc. 2015. All rights reserved. 2 MemCon 2015 - October 12th

Page 3: Serial Memories Fill a Need-Final

Discrete DRAM doesn’t do Serial… yet

Memory is the last holdout that still hasn’t gone serial

Page 4: Serial Memories Fill a Need-Final

Challenges of Implementing DDR

Source: Agilent

DRAM bus trace length matching requirements

Design, Development & Qualification

Page 5: Serial Memories Fill a Need-Final

Tradeoffs: Serial vs. Parallel

On the Chip

• SerDes adds costs on chip
  • MUX deMUX
  • 2.5GHz chip with 25 Gbps IO
• IO Bandwidth / Chip Area
  • Roughly the same on chip
  • Depends on the range
• IO Bandwidth / Power
  • It depends on reach

On the Board

• Fewer lanes
  • 25GHz is more challenging, but is solvable
• Longer reach than parallel
  • Easier board floor planning
  • Distributed thermal loads
• Greater noise immunity

Is it a balanced tradeoff?
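The "2.5GHz chip with 25 Gbps IO" point implies a 10:1 MUX/deMUX between the core clock domain and each serial lane. The following is a minimal Python sketch of that ratio and of the parallel-to-serial conversion. The function names and the MSB-first bit order are illustrative assumptions, not something the slides specify.

```python
def mux_ratio(line_rate_gbps, core_clock_ghz):
    """Bits that must be multiplexed onto one lane per core clock cycle."""
    return line_rate_gbps / core_clock_ghz


def serialize(word, width):
    """MUX side (parallel in, serial out): emit bits MSB first (assumed order)."""
    return [(word >> i) & 1 for i in reversed(range(width))]


def deserialize(bits):
    """deMUX side (serial in, parallel out): rebuild the parallel word."""
    word = 0
    for bit in bits:
        word = (word << 1) | bit
    return word


if __name__ == "__main__":
    ratio = int(mux_ratio(25, 2.5))          # figures quoted on the slide
    word = 0b1011001110                      # arbitrary 10-bit example word
    assert deserialize(serialize(word, ratio)) == word
    print("bits per core clock per lane:", ratio)
```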

Page 6: Serial Memories Fill a Need-Final

HMC gives them the bandwidth they need

“DDR has run out of pins on the package”

Source: Xilinx Technology Outlook - Liam Madden, FPL, Sept-2014

Page 7: Serial Memories Fill a Need-Final

TSV Based DRAM Stacks

The performance potential of TSV based DRAM stacks can be realized with two very different interface and packaging solutions.

• High Bandwidth Memory (HBM): evolutionary wide, parallel interface
• Hybrid Memory Cube (HMC): high performance serial interface

Both solutions have their place in new systems design and there are advancements in both options on the horizon.

Page 8: Serial Memories Fill a Need-Final

and HBM is coming …

Just look at what AMD and NVIDIA have planned

HBM Gen1 shipping now

HBM Gen2 coming soon

Page 9: Serial Memories Fill a Need-Final

Interposer based MCM

Xilinx highlighted that the technology wasn’t the critical element, it was the supply chain.

Source: Xilinx Technology Outlook - Liam Madden, FPL, Sept-2014

Page 10: Serial Memories Fill a Need-Final

Economics of Direct Attach HBM

@Customer: Can customer afford Direct Attach HBM?

• Interposer development costs
• Fixed memory footprint
• Special Supply Chain

What is the volume required to recoup incremental costs?

@Manufacturer: Can DA-HBM exist in a low volume, high mix manufacturing environment?

Page 11: Serial Memories Fill a Need-Final

Serial HBM: High Performance, Low Pin count

Page 12: Serial Memories Fill a Need-Final

Serial HBM Solution

Serial HBM Reduces Risk at the Customer

Lower Technology Risk
• Pin count advantage for host device
• Ease of routing a serial interface
• Standard CEI interface
• Scalable and versatile

Component type Supply Chain
• Inventories
• Test and Burn-In

Cost Advantages
• Standard board assembly

Serial HBM Markets

Networking
• Packet Buffering and high capacity tables

Embedded
• Supports a range of capacity and speeds with long product lifecycles
• Protects customers from changing HBM memory interface on host

All the Bandwidth but none of the headaches of DA-HBM

[Figure: Serial Interface HBM. Labels: shim, GCI.]

Page 13: Serial Memories Fill a Need-Final

Flexible Capacity Expansion : Serial

One host port of 16 lanes can connect to 1, 2 or 4 devices

No additional bus loading or pin count

No throughput degradation

Expansion example shows MoSys Bandwidth Engine

[Figure: three host configurations. 1x: one device on all 16 lanes. 2x: two devices on 8 lanes each. 4x: four devices on 4 lanes each.]
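The 1x / 2x / 4x expansion above amounts to dividing one 16-lane host port evenly among the attached devices. A minimal sketch follows, assuming an even split and a per-lane rate passed in by the caller. The lane rate used in the example is a placeholder, not a figure from the slide.

```python
HOST_PORT_LANES = 16  # one host port, as stated on the slide


def lanes_per_device(devices, host_lanes=HOST_PORT_LANES):
    """Lanes each device gets when the port is split evenly (assumed)."""
    if devices not in (1, 2, 4):
        raise ValueError("the slide only shows 1, 2 or 4 devices per port")
    return host_lanes // devices


def port_bandwidth_gbps(devices, lane_rate_gbps, host_lanes=HOST_PORT_LANES):
    """Aggregate port bandwidth; independent of how many devices share it."""
    return devices * lanes_per_device(devices, host_lanes) * lane_rate_gbps


if __name__ == "__main__":
    for n in (1, 2, 4):
        # 10.0 Gbps per lane is a hypothetical rate used only for illustration.
        print(n, "device(s):", lanes_per_device(n), "lanes each,",
              port_bandwidth_gbps(n, 10.0), "Gbps total")
```

Because the host pin count is fixed at 16 lanes, adding devices trades lanes per device for capacity while the aggregate port bandwidth stays the same, which is the "no additional pin count, no throughput degradation" claim in arithmetic form.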

Page 14: Serial Memories Fill a Need-Final

HBM MCM Yield Analysis

Page 15: Serial Memories Fill a Need-Final

HBM Memory Solutions

Direct Attach HBM – 4 HBM
• MCM Yield
• Single Sourced
• Interface support longevity
• Memory controller complexity and power added to ASIC

Serial HBM Package on Package
• Tested and optional burn in of component HBM before MCM assembly
• shim features optimized for application
• Incremental power for additional shim ASIC
• USR SerDes for MCM

Serial HBM On Motherboard
• VSR SerDes for Motherboard
• Lowest Cost, highest yield solution
• 30% board area increase
• Easiest thermal solution

[Figure: package diagrams for the three options. Labels: ASIC 55 um, ASIC 180 um, HBM, shim.]

Page 16: Serial Memories Fill a Need-Final

Serial vs. Direct Attach Value Comparison

Attribute | Serial HBM | Direct Attach HBM
Technical Risk | + + Smaller Interposer; Discrete Component BI & Test | - - MCM Yield; HBM Repair
Cost | + + Lower yielded cost; Supply Chain Inventory | - - MCM Development Cost; MCM Yield
Power | - Incremental power / BW | + Lower power
Thermal | + Distributed sources | - Higher Thermal Density
Time to Market | + + Proven Standard SerDes; Discrete Component Design | - - HBM Interface IP Availability; MCM Complexity
Flexibility | + + + On or Off substrate; Memory expansion; Fungible SerDes | - - Depopulate or not; Single purpose HBM IO Block
Reliability | + + Burn-In Option; Field Repair managed in Serial HBM | - JEDEC Field Repair in host ASIC
Supply Chain Ownership | + + + Single Point; Discrete component; Multi-sourced | - - - Multiple or Single Points; MCM Model; Single Sourced
Board Area | - 0% to 30% larger | + baseline

Page 17: Serial Memories Fill a Need-Final

Normalized Yielded Cost of HBM

Assembly yield expected to be 95%
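As a rough illustration of how an assembly yield feeds into yielded cost, the sketch below divides the build cost by the yield, on the assumption that every failed assembly is scrapped along with its parts. The slide does not say whether the 95% figure applies per attached stack or to the whole module, so the number of assembly steps is left as a parameter. All function names are hypothetical.

```python
def compound_yield(step_yield, steps):
    """Overall yield when `steps` independent assembly steps must all succeed."""
    if not 0.0 < step_yield <= 1.0:
        raise ValueError("step_yield must be in (0, 1]")
    return step_yield ** steps


def yielded_cost(build_cost, assembly_yield):
    """Cost per good unit, assuming failed assemblies are scrapped entirely."""
    if not 0.0 < assembly_yield <= 1.0:
        raise ValueError("assembly_yield must be in (0, 1]")
    return build_cost / assembly_yield


if __name__ == "__main__":
    # Build cost is normalized to 1.0; only the 95% yield comes from the slide.
    print("95% applied once:", yielded_cost(1.0, 0.95))
    print("95% applied per step, 4 steps (assumed):",
          yielded_cost(1.0, compound_yield(0.95, 4)))
```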

Page 18: Serial Memories Fill a Need-Final

HMC – Hybrid Memory Cube

Breakthrough in power due to TSV based construction
• 5 pJ/b DRAM only

Combined with Logic die resulting in 24.5W per 1Tbps
• 3 links @ 12.5G
• 24.5 pJ/b total (vs. 39 for DDR4)
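The link between the pJ/b and W-per-Tbps figures is a unit conversion: energy per bit times bits per second gives power. A small sketch using the numbers quoted above (the function name is illustrative):

```python
def power_watts(energy_pj_per_bit, bandwidth_bps):
    """Power in watts = energy per bit (pJ/b, converted to J/b) x bits per second."""
    return energy_pj_per_bit * 1e-12 * bandwidth_bps


if __name__ == "__main__":
    one_tbps = 1e12
    for name, pj_per_bit in (("HMC DRAM only", 5.0),
                             ("HMC total", 24.5),
                             ("DDR4", 39.0)):
        print(name, power_watts(pj_per_bit, one_tbps), "W at 1 Tbps")
```

At 1 Tbps the pJ/b value and the wattage are numerically equal, which is why 24.5 pJ/b and 24.5 W per 1 Tbps appear side by side.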

Page 19: Serial Memories Fill a Need-Final

Serial vs. Parallel Memory Comparison

Attribute | Bandwidth Engine BE-2 | Bandwidth Engine BE-3 | Hybrid Memory Cube (HMC) | High Bandwidth Memory (JEDEC) | DDR4 (JEDEC)
Physical Interface | Serial CEI Standard | Serial CEI Standard | Serial CEI Std | JEDEC HBM IO | JEDEC DDR4 IO
Protocol | GigaChip™ Interface | GigaChip™ Interface | HMC Consortium | RAS/CAS | RAS/CAS
Source of Supply | Dual-Sourced | Dual-Sourced | Single Sourced | Multi-Sourced | Multi-Sourced
Access | TDM Scheduler | TDM Scheduler | Sched./Switch | Banked RAM | Banked RAM
Capacity | 576 Mb | 1152 Mb | 16~32 Gb | 32-64 Gb | 4-8 Gb
Buffer Bandwidth | 400 Gbps | 800 Gbps | 1280 Gbps | 2048 Gbps | 38 Gbps
Transaction Rate | >4.5 Bt/s | >10 Bt/s | 2.6~2.9 Bt/s | TBD | 0.2 Bt/s
Signal Pins | 66 | 66 | 272 | ~1600 | 42
Package | BGA 19x19 | BGA 25x25 | BGA 31x31 | KGSD | BGA 8x12
Power | 7-11W | TBA | ~28W | 8W estimated | 0.7W
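One way to read the comparison is bandwidth per signal pin, which follows directly from the Buffer Bandwidth and Signal Pins rows. The sketch below only divides the quoted figures. The HBM pin count is approximate in the source (~1600), so its result is approximate as well.

```python
# (buffer bandwidth in Gbps, signal pins), copied from the comparison table.
DEVICES = {
    "BE-2": (400, 66),
    "BE-3": (800, 66),
    "HMC": (1280, 272),
    "HBM": (2048, 1600),   # pin count is approximate in the source
    "DDR4": (38, 42),
}


def gbps_per_pin(bandwidth_gbps, signal_pins):
    """Buffer bandwidth delivered per signal pin."""
    return bandwidth_gbps / signal_pins


PER_PIN = {name: gbps_per_pin(bw, pins) for name, (bw, pins) in DEVICES.items()}

if __name__ == "__main__":
    for name, value in sorted(PER_PIN.items(), key=lambda kv: -kv[1]):
        print(f"{name:5s} {value:6.2f} Gbps/pin")
```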

[Figure: block diagrams of the access architectures (serial IO with TDM / scheduler, serial IO with switch, DDR4, HBM channels). HBM – 8 channels & 128 banks, ~1600 pins, Si interposer.]

Page 20: Serial Memories Fill a Need-Final

Future TSV DRAM Comparison

Attribute | Direct Attach HBM | Serial HBM concept | HMC
Bandwidth | equal
Interposer / Yield cost | CPU | Memory | Memory
Power | 1x | <2x | >3x
Latency | Lowest | Low | ?
Deterministic | Yes | Yes | No
Longevity of Interface | 5 years | indefinitely
Field Repair | Host based | Serial HBM based | HMC based
Host IO (PHY & pins) | Single Purpose | General Purpose and LP SerDes
Test or Burn-In | Not possible | Possible
Supply Chain | MCM-type | Component
Application Performance | none | Optimized for application | Generic HMC Specification
Source | Multi-sourced | Single Source

Page 21: Serial Memories Fill a Need-Final

What to build with? It depends…

Page 22: Serial Memories Fill a Need-Final

The Ultimate Network Processor’s Memory Implementation

At MemCon 2014, MoSys presented on extreme memories for networking and showed the relative position and value of different memories for a 1.2 Tbps network processor.

• HBM for buffering
• Serial memories for header processing and search
• Off chip PHY to optimize datapath

This is a great point solution for a 1.2 Tbps datapath.

What about less extreme systems?

Page 23: Serial Memories Fill a Need-Final

Example 400G Line Card w/ EZchip NPS Z30 Adds 50% System Memory Bandwidth

[Figure: line card block diagram, Front Panel to Backplane. Labels: MoSys Framer/Gear Box; MoSys MSRZ30 on 8-16 serial lanes; FIC; Packet Buffer, 24 x DDR4 devices; 16 uP blocks.]

• Flexibility + Performance: “C” Programmable Processors + L2-L7 Accelerators
• Embedded Memory
• Intelligent Offload: Flexible Feature & Performance Expansion
• Memory I/O: Memory bandwidth for Packet Buffering, cores and HW Accelerators
• Packet Forwarding Engine
• Hardware Accelerators

Page 24: Serial Memories Fill a Need-Final

800GE Using Serial HBM & BE3

[Figure: two 400G PFEs (ASIC/FPGA), each with 4 x 100G to an Optics Module through a GB/RT (LineSpeed Gearbox, Retimer). Memory blocks: Serial HBM (shim, GCI) and Bandwidth Engine Gen 3.]

Bandwidth Engine Gen 3

Shared:
• FIB Tables
• Statistics
• Metering
• Semaphores
• Packet Buffers

Page 25: Serial Memories Fill a Need-Final

Conclusion

Serial memory offers advantages over Direct Attach HBM
• Economics driven by Supply Chain
• Flexible and adaptable
• Scalable performance
• Quality and reliability
• Simplifying board design and cooling

Pick your memory for your application
• Memory core performance and capacity (DRAM vs. others)
• Architecture (Point to Point versus Chainable)
• IO serial vs. parallel

DDR DRAM is the de facto standard based on decades of evolution and optimization. If DDR doesn’t meet your needs, there are other options available.

Page 26: Serial Memories Fill a Need-Final

Mark Baumann Director of Applications

Bandwidth Engine Serial Interface (GCI)

Page 27: Serial Memories Fill a Need-Final

Topics

• Parallel Interface evolution – faster, wider
  • How long can this last?
• Serial Interface evolution – NRZ, PAM4 emerging
• Interface efficiency – HMC vs. GCI vs. ILA
  • Standards based solutions vs. proprietary
  • Interface for offload (abstracted): serial is better (variable size transfers)
  • Splitting transaction layer from transport layer
• Purpose built vs. Fungible IO

Page 28: Serial Memories Fill a Need-Final

NPU Interface Options Today

[Figure: NPU with SSTL/HSTL and SerDes IO. Memory & CoProcessor side (DDR style): DDR-3 SDRAM, RLDRAM, QDR SRAM, KBP/TCAM. Network & Backplane Interfaces side (serial style): XAUI, 10G KR, Interlaken, PCIe.]

Page 29: Serial Memories Fill a Need-Final

NPU Interfaces Using Serial

[Figure: NPU with SerDes IO on both sides (serial style). Memory & CoProcessor side: BE and "Serial SRAM?" on GCI, KBP/TCAM on Interlaken, DDR-3 SDRAM behind a DDR-3 Bridge (GCI to SSTL/HSTL). Network & Backplane Interfaces side: XAUI, 10G KR, Interlaken, PCIe.]

• Enabled by 10G KR GCI enabled SerDes
• 3x to 4x Bandwidth Density per mm2

Page 30: Serial Memories Fill a Need-Final

NPU Interfaces Using Serial

[Figure: NPU with SerDes IO on both sides (serial style). Memory & CoProcessor side: BE and "Serial SRAM?" on GCI, KBP/TCAM on Interlaken, and HMC or Ser. HBM in place of the DDR-3 bridge. Network & Backplane Interfaces side: XAUI, 10G KR, Interlaken, PCIe.]

• Enabled by 10G KR GCI enabled SerDes
• 3x to 4x Bandwidth Density per mm2

Page 31: Serial Memories Fill a Need-Final

Parallel vs Serial

Page 32: Serial Memories Fill a Need-Final

GigaChip Interface Layers & Frame Format

GigaChip Interface Protocol layers (between BE, QDR, TCAM… and the PC board trace)
• Transaction: application specific
• Data Link: reliable transport of frames via CRC & positive Ack
• Physical Coding Sublayer (PCS): link initialization, lane deskew, scrambling
• Physical Media Access: electrical, CEI compatible SerDes

Data Link Layer Frame Format

Payload (72b) | DLL (1b) | Rx Ack (1b) | CRC (6b)

• Frame striped across SerDes lanes (1, 2, 4, 8, 16)
  • Modulo 10 UI, fixed size
  • Sized to meet needs of application
  • >90% bandwidth efficiency at 80b
• Data Link Layer operations
  • DLL indicates if payload is Transaction Link Layer operation or Data Payload
  • Data Link Layer operations: Replay, Pause (no-op)
• Data Payload format up to application
  • Op codes, address, data… formatting left to higher level
  • For memory transactions: 1 frame = transaction
  • For packets: variable number of frames can be used
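The 80-bit frame layout (72b payload, 1b DLL, 1b Rx Ack, 6b CRC) can be sketched as a pack/unpack pair. This is an illustration under stated assumptions, not the GCI specification: the field order within the frame, the CRC polynomial (x^6 + x + 1 is a placeholder), and the choice to compute the CRC over the other 74 bits are all assumptions.

```python
PAYLOAD_BITS, DLL_BITS, ACK_BITS, CRC_BITS = 72, 1, 1, 6
BODY_BITS = PAYLOAD_BITS + DLL_BITS + ACK_BITS   # 74 bits covered by the CRC (assumed)
FRAME_BITS = BODY_BITS + CRC_BITS                # 80 bits, as on the slide
CRC_POLY = 0x03  # x^6 + x + 1 with the x^6 term implicit; placeholder polynomial


def crc6(value, nbits):
    """Bitwise CRC-6 over the low `nbits` of `value`, MSB first, zero initial state."""
    reg = 0
    for i in reversed(range(nbits)):
        feedback = ((reg >> (CRC_BITS - 1)) & 1) ^ ((value >> i) & 1)
        reg = (reg << 1) & 0x3F
        if feedback:
            reg ^= CRC_POLY
    return reg


def pack_frame(payload, dll, ack):
    """Build one 80-bit frame as an int: payload, DLL bit, Ack bit, CRC."""
    if payload < 0 or payload >> PAYLOAD_BITS:
        raise ValueError("payload does not fit in 72 bits")
    body = (payload << 2) | ((dll & 1) << 1) | (ack & 1)
    return (body << CRC_BITS) | crc6(body, BODY_BITS)


def unpack_frame(frame):
    """Return (payload, dll, ack, crc_ok) for a received 80-bit frame."""
    body, crc = frame >> CRC_BITS, frame & 0x3F
    payload, dll, ack = body >> 2, (body >> 1) & 1, body & 1
    return payload, dll, ack, crc6(body, BODY_BITS) == crc


if __name__ == "__main__":
    frame = pack_frame(0x0123456789ABCDEF01, dll=0, ack=1)
    print(unpack_frame(frame))
    print("payload share of frame:", PAYLOAD_BITS / FRAME_BITS)
```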

Page 33: Serial Memories Fill a Need-Final

CRC Error Handling w/Positive Ack

[Figure: replay path between Device A (CSI Tx) and Device B (CSI Rx). Device A: Tx Request Transactor Queue, Tx Replay Queue, CRC Gen, PISO, Tx SerDes, Rx SerDes, Prev Rx Ack Count. Device B: Rx SerDes, SIPO, CRC Error Check, Rx Ack Counter, Rx Target Transactor Queue. Datapath widths: 72 bits, 72 + 6 bits with CRC, 1-bit Ack Count.]

• Transmit side: compare Ack count with the previous Rx Ack count; set Tx Replay if “stuck”
• Ack counter: freeze Ack if CRC error; resume on Replay Frame
• Receive queue: post if CRC OK; freeze if not OK; resume posting on Replay Frame
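The freeze-and-replay behaviour can be illustrated with a toy model. It is a simplified sketch, not the GCI implementation: link latency is ignored, the ack count is a plain integer rather than the 1-bit field shown, and the "stuck" threshold (`stuck_limit`) is a hypothetical parameter.

```python
class Receiver:
    """Device B side: posts good frames, freezes on a CRC error."""

    def __init__(self):
        self.ack_count = 0
        self.frozen = False
        self.posted = []

    def receive(self, kind, payload=None, crc_ok=True):
        if kind == "replay":          # a Replay frame resumes posting
            self.frozen = False
            return
        if self.frozen:               # frames after an error are discarded
            return
        if not crc_ok:                # CRC error: freeze the ack count
            self.frozen = True
            return
        self.posted.append(payload)
        self.ack_count += 1


def transfer(payloads, corrupt_once=(), stuck_limit=2):
    """Send payloads in order; frames listed in corrupt_once fail CRC one time."""
    rx = Receiver()
    to_corrupt = set(corrupt_once)
    next_to_send = last_ack = stuck = replays = 0
    while rx.ack_count < len(payloads):
        if next_to_send < len(payloads):
            crc_ok = next_to_send not in to_corrupt
            to_corrupt.discard(next_to_send)
            rx.receive("data", payloads[next_to_send], crc_ok)
            next_to_send += 1
        if rx.ack_count != last_ack:  # ack advanced: the link is healthy
            last_ack, stuck = rx.ack_count, 0
        else:
            stuck += 1
        if stuck >= stuck_limit and last_ack < next_to_send:
            rx.receive("replay")      # ack is "stuck": replay from first unacked
            next_to_send = last_ack
            stuck = 0
            replays += 1
    return rx.posted, replays


if __name__ == "__main__":
    print(transfer(["a", "b", "c", "d"], corrupt_once={1}))
```

In this model the receiver never posts a frame out of order: everything after the corrupted frame is dropped until the Replay frame arrives, and the transmitter rewinds to the first unacknowledged frame.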

Page 34: Serial Memories Fill a Need-Final

Multi Core => Multi-Partition & Multi-bank

[Figure: Packet Processors 0, 1, … n-1, n, each connected by a Serial Link (ingress / egress) to the Bandwidth Engine. Bandwidth Engine blocks: Multi-cycle Scheduler, ALU, BIST / Self-repair. Labels: 10 GA, 800 Gb/s.]

• Multi-bank, multi-partitions allow for high access availability
• Multi-threaded multi-cores allow for high processing throughput
• Multi-linked: allows for concurrent transport operations
• ALU for functional acceleration: local processing minimizes intra-chip traffic
• BIST, self-repair: allows Extended Carrier Class & in-package repair

Page 35: Serial Memories Fill a Need-Final

Protocol Transfer Efficiency Comparison: Range of Payload Sizes and Applications

Transfer Efficiency = Data / (CMD + Address + Data + Transport Protocol)

Efficiency includes Transaction & Transport protocol. Note GCI: GCI + TL 2.0

[Figure: Read-Only Data Efficiency (0% to 100%) vs. Payload Size (0 to 40 B). Series: BE, ILA, HMC.]

[Figure: Read/Write Data Transfer Efficiency (0% to 100%) vs. Payload Size (0 to 180 B). Series: BE 50:50, HMC 50:50, HMC 128B Block Size, HMC 64B, HMC 32B Block Size.]

Region labels: Packet Header Processing Application; Packet Buffering Applications.
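To make the efficiency formula concrete for a fixed-size frame, the sketch below assumes the 80-bit GCI frame with 72 payload bits described earlier and treats CMD + Address as a single `overhead_bits` parameter. The overhead value is a hypothetical input, since the slides do not give the transaction-layer field sizes. It reproduces the shape of such curves, not the plotted data.

```python
FRAME_BITS = 80      # fixed GCI frame size from the frame-format slide
PAYLOAD_BITS = 72    # bits per frame available to the transaction layer


def frames_needed(data_bytes, overhead_bits=0):
    """Whole frames required for the data plus CMD/address overhead."""
    total_bits = data_bytes * 8 + overhead_bits
    return max(1, -(-total_bits // PAYLOAD_BITS))   # integer ceiling


def transfer_efficiency(data_bytes, overhead_bits=0):
    """Data / (CMD + Address + Data + transport), counting frame padding too."""
    return (data_bytes * 8) / (frames_needed(data_bytes, overhead_bits) * FRAME_BITS)


if __name__ == "__main__":
    for size in (4, 8, 9, 10, 18, 32, 64):
        # overhead_bits=8 is a placeholder for CMD + address, not a spec value.
        print(size, "B:", round(transfer_efficiency(size, overhead_bits=8), 3))
```

Because frames are fixed size, efficiency steps down each time a payload spills into an additional frame and recovers as that frame fills.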

Page 36: Serial Memories Fill a Need-Final

Protocol Transport Efficiency Comparison: GCI Optimized For Smaller Transfers

[Figure: transport efficiency (0% to 100%) vs. Frame Size (0 to 80 Bytes). Series: ILA, Interlaken, GCI 2.0, GCI + TL 2.0. Region labels: Header Processing; Packet Transfers. Annotations: GCI ~ 2x Interlaken; GCI ≈ Interlaken.]

Page 37: Serial Memories Fill a Need-Final

Serial Link Rate Road Map

Xilinx UltraScale+ 2016 33G GTY SerDes

BE3 2016 Q1 31G SerDes

56G PAM4 is being demonstrated now

Page 38: Serial Memories Fill a Need-Final

CEI-56G Will Address Chip to Chip, Module, +

Page 39: Serial Memories Fill a Need-Final

Summary

• GCI is a proven chip to chip reliable transport protocol
  • Multiple designs in FPGA, ASIC and ASSP in production systems
• GCI Specification is freely available without restriction on use
  • Same as Interlaken model
• GCI protocol is designed to evolve as the CEI standard evolves
• The inherent performance efficiency of GCI naturally equates to improved energy efficiency

Page 40: Serial Memories Fill a Need-Final

Thank You

Page 41: Serial Memories Fill a Need-Final

CMOS Memory Core Technologies

[Figure: memory core technologies arranged by #BitCells per SenseAmp, with transaction rate, power, mm2/bit and cost as the tradeoff axes. Technologies shown: DDR, LL/RL DRAM, eDRAM, SRAM, TCAM, Mobile DRAM, HMC, HBM. Regions: Logic Fab; DRAM Fab (limited metal).]