60
16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array Realizing Adaptive Data Converters and Charge Domain Computing © 2021 IEEE International Solid-State Circuits Conference 1 of 60 eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array Realizing Adaptive Data Converters and Charge Domain Computing Shanshan Xie 1 , Can Ni 1 , Aseem Sayal 1 , Pulkit Jain 2 , Fatih Hamzaoglu 2 , Jaydeep P. Kulkarni 1 1 University of Texas at Austin, Austin, TX 2 Intel, Hillsboro, OR

eDRAM-CIM: Compute-In-Memory Design with Reconfigurable

  • Upload
    others

  • View
    14

  • Download
    1

Embed Size (px)

Citation preview

Page 1: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable

16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array Realizing Adaptive Data Converters and Charge Domain Computing

© 2021 IEEE International Solid-State Circuits Conference 1 of 60

eDRAM-CIM: Compute-In-Memory Design with

Reconfigurable Embedded Dynamic Memory Array Realizing Adaptive Data Converters

and Charge Domain Computing

Shanshan Xie1, Can Ni1, Aseem Sayal1, Pulkit Jain2, Fatih Hamzaoglu2, Jaydeep P. Kulkarni1

1University of Texas at Austin, Austin, TX 2Intel, Hillsboro, OR

Page 2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable

16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array Realizing Adaptive Data Converters and Charge Domain Computing

© 2021 IEEE International Solid-State Circuits Conference 2 of 60

Self Introduction

Shanshan XieEmail: [email protected]

Website: http://cyanxie.com/

Education• B.S. 2018 Worcester Polytechnic Institute• Ph.D. student at University of Texas at Austin since 2018

• Circuit Research Lab: https://sites.utexas.edu/CRL/• Advisor: Professor Jaydeep P. Kulkarni

Experience• Analog Design Intern, Texas Instruments, 2019• Application Engineering Intern, Analog Devices Inc., 2018• Test Engineering Intern, Analog Devices Inc., 2017• Product Engineering Intern, Analog Devices Inc., 2016

Research Area• Mixed signal approaches for ML accelerators• Compute-in-memory techniques

Page 3: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable

16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array Realizing Adaptive Data Converters and Charge Domain Computing

© 2021 IEEE International Solid-State Circuits Conference 3 of 60

n Compute-in-memory: motivation and challenges

n Proposed eDRAM-CIM macro

l Overall architecture

l Digital-to-analog conversion (DAC)

l Multiply-accumulate-averaging (MAV) computation

l Analog-to-digital conversion (ADC)

n Chip measurement and classification accuracy results

n Summary

Outline

Page 4: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable

16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array Realizing Adaptive Data Converters and Charge Domain Computing

© 2021 IEEE International Solid-State Circuits Conference 4 of 60

n Compute-in-memory: motivation and challenges

n Proposed eDRAM-CIM macro

l Overall architecture

l Digital-to-analog conversion (DAC)

l Multiply-accumulate-averaging (MAV) computation

l Analog-to-digital conversion (ADC)

n Chip measurement and classification accuracy results

n Summary

Outline

Page 5: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable

16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array Realizing Adaptive Data Converters and Charge Domain Computing

© 2021 IEEE International Solid-State Circuits Conference 5 of 60

Deep Neural Network Applications

AI High Performance Data Center

AI Cloud Computing

Edge Computing Object Detection

Edge Computing Face Detection

Page 6: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable

16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array Realizing Adaptive Data Converters and Charge Domain Computing

© 2021 IEEE International Solid-State Circuits Conference 6 of 60

DNN Model Size Challenges

341k LeNet-5

724M AlexNet

2.8G OverFeat

3.9G ResNet50

1.43G GoogleNet v1

15.5G VGG16

Energy Efficiency Challenges

Data Intensive

Huge Data Movement

Data Transfer Latency

Computation Energy Cost

Total Number of MACs

Memory Access Energy Cost

1

2

3

4

5Ref: Vivienne Sze, Proceedings of the IEEE 105.12 2017

Page 7: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable

16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array Realizing Adaptive Data Converters and Charge Domain Computing

© 2021 IEEE International Solid-State Circuits Conference 7 of 60

eDRAM-CIM MotivationOperation Energy(pJ)

Computation Energy CostInteger Add (32b) 0.1

Integer Multiply (32b) 3.1Floating Point Add (32b) 0.9

Floating Point Multiply (32b) 3.7Memory Access Energy Cost

8KB SRAM (64b) 101MB SRAM (64b) 100

DRAM 2000

~650X

DRAM memory access is expensive: ~650X higher than computation energy costRef: Mark Horowitz,

ISSCC 2014

Page 8: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable

16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array Realizing Adaptive Data Converters and Charge Domain Computing

© 2021 IEEE International Solid-State Circuits Conference 8 of 60

eDRAM Advantages for CIM

ü Highest bitcell density

ü High access speed

ü Low read/write energy

ü High write endurance

ü High bandwidth

Desired attributes for CIM designs

Ref: Gregory Fredeman, JSSC 2015; Fatih Hamzaoglu, JSSC 2014; Nasser Kurd, JSSC 2015

Page 9: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable

16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array Realizing Adaptive Data Converters and Charge Domain Computing

© 2021 IEEE International Solid-State Circuits Conference 9 of 60

Our Approach: eDRAM based CIM

• Directly perform computation inside embedded DRAM (eDRAM)• Reconfigure existing 1T1C eDRAM columns as charge domain computing units• Reduce off-chip memory access and latency

MAV Compute Array

Page 10: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable

16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array Realizing Adaptive Data Converters and Charge Domain Computing

© 2021 IEEE International Solid-State Circuits Conference 10 of 60

n Compute-in-memory: motivation and challenges

n Proposed eDRAM-CIM macro

l Overall architecture

l Digital-to-analog conversion (DAC)

l Multiply-accumulate-averaging (MAV) computation

l Analog-to-digital conversion (ADC)

n Chip measurement and classification accuracy results

n Summary

Outline

Page 11: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable

16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array Realizing Adaptive Data Converters and Charge Domain Computing

© 2021 IEEE International Solid-State Circuits Conference 11 of 60

eDRAM CIM Overall Architecture: DACData Flow

Va & Va_bar

2. Generate Analog Values Va & Va_bar

Load Input to DAC eDRAM array

1. Load Data to eDRAM

Bitcell

*DAC: Digital-to-Analog Converter

Page 12: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable

16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array Realizing Adaptive Data Converters and Charge Domain Computing

© 2021 IEEE International Solid-State Circuits Conference 12 of 60

eDRAM CIM Overall Architecture: MAV

3. Load Weights to MAV Compute Array

*DAC: Digital-to-Analog Converter*MAV: Multiply-Accumulation-Average

4. Generate VMAV

VMAV

Page 13: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable

16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array Realizing Adaptive Data Converters and Charge Domain Computing

© 2021 IEEE International Solid-State Circuits Conference 13 of 60

eDRAM CIM Overall Architecture: Partial Sum

Store Partial Sum

*DAC: Digital-to-Analog Converter*MAV: Multiply-Accumulation-Average

Page 14: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable

16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array Realizing Adaptive Data Converters and Charge Domain Computing

© 2021 IEEE International Solid-State Circuits Conference 14 of 60

eDRAM CIM Overall Architecture: Partial Sum

Pooling & ReLU

*DAC: Digital-to-Analog Converter*MAV: Multiply-Accumulation-Average

Page 15: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable

16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array Realizing Adaptive Data Converters and Charge Domain Computing

© 2021 IEEE International Solid-State Circuits Conference 15 of 60

eDRAM CIM Overall Architecture: ADC

5. Generate VREF

6. Final YOUT

*DAC: Digital-to-Analog Converter*MAV: Multiply-Accumulation-Average

*SAR: Successive-Approximation*ADC: Analog-to-Digital Converter

Page 16: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable

16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array Realizing Adaptive Data Converters and Charge Domain Computing

© 2021 IEEE International Solid-State Circuits Conference 16 of 60

eDRAM CIM Overall Architecture*DAC: Digital-to-Analog Converter*MAV: Multiply-Accumulation-Average

*SAR: Successive-Approximation*ADC: Analog-to-Digital Converter

1. Load Data to eDRAM

Bitcell

2. Generate Analog Values Va & Va_bar

3. Load Weights to MAV Compute Array

4. Generate VMAV

5. Generate VREF

6. Final YOUT

Page 17: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable

16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array Realizing Adaptive Data Converters and Charge Domain Computing

© 2021 IEEE International Solid-State Circuits Conference 17 of 60

In-eDRAM Differential Digital-to-Analog Converter (DAC)

Page 18: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable

16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array Realizing Adaptive Data Converters and Charge Domain Computing

© 2021 IEEE International Solid-State Circuits Conference 18 of 60

In-eDRAM DAC – Positive DAC

𝑉! = 𝑉"" ∗16×1 + 8×𝑋# + 4×𝑋$ + 2×𝑋% + 1×𝑋&

32

eDRAM Bitcell Storage

Constant “1”

X3

X2

X1

X0

Constant “0”

Input Data (4b)

32 capacitors charge share result

Page 19: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable

16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array Realizing Adaptive Data Converters and Charge Domain Computing

© 2021 IEEE International Solid-State Circuits Conference 19 of 60

In-eDRAM DAC – Negative DACeDRAM Bitcell

Storage

Constant “0”

X3

X2

X1

X0

Constant “1”

32 capacitors charge share result

Complimentary to positive DAC

𝑉!_(!) = 𝑉"" ∗8×𝑋# + 4×𝑋$ + 2×𝑋% + 1×𝑋& + 1

32

Page 20: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable

16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array Realizing Adaptive Data Converters and Charge Domain Computing

© 2021 IEEE International Solid-State Circuits Conference 20 of 60

Differential Analog Voltages, Va and Va_barVDD

VSS

Zero Reference=VDD/2Va

Va_bar

Positive DAC

Negative DAC

• Prepare differential analog voltages (Va and Va_bar) for sign computation• Positive DAC: Va; Negative DAC: Va_bar

• Depends on sign of the weight, choose either Va or Va_bar

Page 21: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable

16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array Realizing Adaptive Data Converters and Charge Domain Computing

© 2021 IEEE International Solid-State Circuits Conference 21 of 60

Positive DAC Demonstration

Page 22: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable

16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array Realizing Adaptive Data Converters and Charge Domain Computing

© 2021 IEEE International Solid-State Circuits Conference 22 of 60

8b In-eDRAM DAC Dataflow – Step 1

Load data on to memory array using eDRAM peripheral circuits (not shown in the diagram)

Turn on twice to make sure 16 capacitors are fully charged

FloatingVa/Va_bar

Vx16, Vx1

WL<5>WL<4>WL<3>WL<2>

WL<0>WL<1>

WLx16WLx1

DAC_CS

Va

Step 1: Load lower 4-bit to DAC

Step 1

Input Data

0

0

0

1

Constant ‘0’

Constant ‘1’

Page 23: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable

16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array Realizing Adaptive Data Converters and Charge Domain Computing

© 2021 IEEE International Solid-State Circuits Conference 23 of 60

0

0

0

1

0

1

Vx16

Vx1

8b In-eDRAM DAC Dataflow – Step 2

Activate all wordlines for charge share to convert digital data to analog voltage

Voltage is sampled on Cx1 capacitor

Va/Va_bar

Vx16, Vx1

WL<5>WL<4>WL<3>WL<2>

WL<0>WL<1>

WLx16WLx1

DAC_CS

Va

Step 2: Charge Share

Step 1 Step 2

Data Flow

Page 24: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable

16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array Realizing Adaptive Data Converters and Charge Domain Computing

© 2021 IEEE International Solid-State Circuits Conference 24 of 60

8b In-eDRAM DAC Dataflow – Step 3

Vx16, Vx1

WL<5>WL<4>WL<3>WL<2>

WL<0>WL<1>

WLx16WLx1

DAC_CS

Va

Step 3: Load higher 4-bit to DAC

Step 1 Step 2 Step 3

Va/Va_bar

Input Data

1

1

0

1

Constant ‘0’

Constant ‘1’

Page 25: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable

16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array Realizing Adaptive Data Converters and Charge Domain Computing

© 2021 IEEE International Solid-State Circuits Conference 25 of 60

1

1

0

1

0

1

Va/Va_bar

Vx16

Vx1

8b In-eDRAM DAC Dataflow – Step 4

Voltage is sampled on Cx16 capacitor

Vx16, Vx1

WL<5>WL<4>WL<3>WL<2>

WL<0>WL<1>

WLx16WLx1

DAC_CS

Va

Step 4: Charge Share

Step 1 Step 2 Step 3 Step 4

Data Flow

Page 26: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable

16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array Realizing Adaptive Data Converters and Charge Domain Computing

© 2021 IEEE International Solid-State Circuits Conference 26 of 60

Vx16

Vx1

8b In-eDRAM DAC Dataflow – Step 5

Vx16, Vx1

WL<5>WL<4>WL<3>WL<2>

WL<0>WL<1>

WLx16WLx1

DAC_CS

VaVa generated

Step 1 Step 2 Step 3 Step 4 Step 5

Charge share to generate final analog voltage

Step 5: Generate Va and Va_barData Flow

Page 27: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable

16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array Realizing Adaptive Data Converters and Charge Domain Computing

© 2021 IEEE International Solid-State Circuits Conference 27 of 60

1T1C In-eDRAM Differential DAC Demonstration

WL<0>WL<5>

WLx1WLx16

DAC_CS

VCx1

Va

Va_bar

Load 4b LSB CS Load 4b MSB Generate Va & Va_bar

VDD/2 = 600mV

2 cycles: Make sure 16 capacitors are fully charged

CS

100ns0.5V

• Signals match the expected waveforms• Va and Va_bar are symmetrical around VDD/2 (600mV)• Higher 4 bits are loaded last: minimize capacitor leakage on final charge share value

VCx16

*CS=Charge Share

Page 28: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable

16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array Realizing Adaptive Data Converters and Charge Domain Computing

© 2021 IEEE International Solid-State Circuits Conference 28 of 60

In-eDRAM DAC Characterization: Measured Results

• Systematic deviation: positive DAC saturate at high value (can be correct during training)

Negative DACPositive DAC

Page 29: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable

16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array Realizing Adaptive Data Converters and Charge Domain Computing

© 2021 IEEE International Solid-State Circuits Conference 29 of 60

• Good linearity, DNL<1LSB

In-eDRAM DAC Characterization: Zoomed InNegative DACPositive DAC

Page 30: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable

16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array Realizing Adaptive Data Converters and Charge Domain Computing

© 2021 IEEE International Solid-State Circuits Conference 30 of 60

In-eDRAM Multiply-Accumulate-

Averaging (MAV) Computation

Page 31: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable

16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array Realizing Adaptive Data Converters and Charge Domain Computing

© 2021 IEEE International Solid-State Circuits Conference 31 of 60

eDRAM-CIM: MAV ComputationWeights are stored in 1T1C eDRAM bitcell: W7 (sign bit); W6~W0 (magnitude bits)

Analog Voltage from DAC

Page 32: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable

16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array Realizing Adaptive Data Converters and Charge Domain Computing

© 2021 IEEE International Solid-State Circuits Conference 32 of 60

eDRAM-CIM: MAV Computation

Weight magnitude (W0~W6) are used as MUX control signals

W7 (sign bit) is used to select between Va and Va_bar

Data Flow

Voltage from DAC

Zero reference

Page 33: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable

16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array Realizing Adaptive Data Converters and Charge Domain Computing

© 2021 IEEE International Solid-State Circuits Conference 33 of 60

eDRAM-CIM: MAV Computation

Voltages are sampled on binary scaled 1T1C eDRAM capacitors

Reuse 1T1C eDRAM bitcell

2:1 MUX: analog dot-product unit

Data Flow

Page 34: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable

16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array Realizing Adaptive Data Converters and Charge Domain Computing

© 2021 IEEE International Solid-State Circuits Conference 34 of 60

eDRAM-CIM: MAV Computation

• Switches (S0~S6) are closed to charge share between all the 1T1C eDRAM cells• VMAV = final MAV computation result

Data Flow

Page 35: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable

16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array Realizing Adaptive Data Converters and Charge Domain Computing

© 2021 IEEE International Solid-State Circuits Conference 35 of 60

ReLU

Final MAV Analog Results

Zero Reference VDD/2

VMAV > VDD/2

VDD/2VMAV < VDD/2

VMAV

• Comparator compares VMAV and zero reference voltage VDD/2

• If VMAV > VDD/2 à output VMAV

• If VMAV < VDD/2 à output VDD/2

Comparator

Page 36: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable

16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array Realizing Adaptive Data Converters and Charge Domain Computing

© 2021 IEEE International Solid-State Circuits Conference 36 of 60

In-eDRAM Analog-to-Digital Converter (ADC)

Page 37: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable

16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array Realizing Adaptive Data Converters and Charge Domain Computing

© 2021 IEEE International Solid-State Circuits Conference 37 of 60

VREF1

eDRAM-CIM: ADC Operation

VREF3

VREF2

VREF Generation Waveforms for 1st

(b=1) conversion cycle

• Step 1: Charge corresponding bitcells

VREF1 VREF2 VREF3

VMAV

VREF3

VREF1VREF2

YOUT[7]SA2_OUTSA3_OUTYOUT[6]

Page 38: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable

16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array Realizing Adaptive Data Converters and Charge Domain Computing

© 2021 IEEE International Solid-State Circuits Conference 38 of 60

VREF1

eDRAM-CIM: ADC Operation

VREF3

VREF2

• Step 1: Charge corresponding bitcells• Step 2: IDLE

VREF1 VREF2 VREF3

VREF Generation Waveforms for 1st

(b=1) conversion cycle

VMAV

VREF3

VREF1VREF2

YOUT[7]SA2_OUTSA3_OUTYOUT[6]

Page 39: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable

16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array Realizing Adaptive Data Converters and Charge Domain Computing

© 2021 IEEE International Solid-State Circuits Conference 39 of 60

eDRAM-CIM: ADC Operation

VREF1½VDD

VREF2¾VDD

VREF3¼VDD

¾VDD½VDD¼VDD

VREF1VREF3

VREF2

• Step 1: Charge corresponding bitcells• Step 2: IDLE• Step 3: Charge Share to generate reference voltages

VMAV > VREF1

VREF Generation Waveforms for 1st

(b=1) conversion cycle

VMAV

VREF3

VREF1VREF2

YOUT[7]SA2_OUTSA3_OUTYOUT[6]

VMAV < VREF2

YOUT[7] YOUT[6]

Page 40: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable

16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array Realizing Adaptive Data Converters and Charge Domain Computing

© 2021 IEEE International Solid-State Circuits Conference 40 of 60

VREF1

eDRAM-CIM: ADC Operation

VREF3

• Step 1: Charge corresponding bitcells• Step 2: IDLE• Step 3: Charge Share to generate reference voltages• Step 4: Discharge and reset 1T1C eDRAM capacitors

VREF2VMAV

VREF3

VREF1VREF2

YOUT[7]SA2_OUTSA3_OUTYOUT[6]

VREF1 VREF2 VREF3

VREF Generation Waveforms for 1st

(b=1) conversion cycle

Page 41: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable

16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array Realizing Adaptive Data Converters and Charge Domain Computing

© 2021 IEEE International Solid-State Circuits Conference 41 of 60

2b/Conversion In-eDRAM ADC WaveformsVMAV VREF1 VREF2 VREF3

10 10 00 11

Step 1 Step 2 Step 3

Step 4

• 2b/conversion cycle

• 2 extra eDRAM columns

• 3 reference voltages (VREF1~VREF3)

• Minimize VMAV capacitor leakage

during conversion

VREF2

VREF1VREF3

VMAV

Page 42: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable

16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array Realizing Adaptive Data Converters and Charge Domain Computing

© 2021 IEEE International Solid-State Circuits Conference 42 of 60

eDRAM-CIM: Adaptive ADC Motivation

X = Clipping Threshold

VLB VHB

X

Output Data YOUT

Occ

urre

nce

Freq

uenc

y

Occ

urre

nce

Freq

uenc

y

Output Data YOUT

X

Infrequent appearing MAV computation outputs (VMAV) are trimmed to mitigate ADC energy and latency

Convolution Outputs (CiFAR-10 dataset Layer 1)

Outputs are

clipped

Outputs are

clipped

VLB VHB

µ=-0.07σ=25.7

Page 43: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable

16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array Realizing Adaptive Data Converters and Charge Domain Computing

© 2021 IEEE International Solid-State Circuits Conference 43 of 60

Clipping In-eDRAM ADC Data FlowV R

EF2

= V H

B

V REF

3=

V LB

VREF2 = VHB

VMAV

VREF3

VREF2

ADC_DONE

SA2_OUTSA3_OUT

• Step 1: Charge corresponding bitcells• Step 2: IDLE

Page 44: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable

16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array Realizing Adaptive Data Converters and Charge Domain Computing

© 2021 IEEE International Solid-State Circuits Conference 44 of 60

Clipping In-eDRAM ADC Data Flow

VREF3 = VLB

VREF2 = VHB

VMAV

VREF3

VREF2

• Step 1: Charge corresponding bitcells• Step 2: IDLE• Step 3: Charge share to generate reference voltages• Step 4: Compare à VMAV>VHB? or VMAV<VLB? à Yes• Step 5: Skip subsequent conversion & discharge BLs

V REF

2=

V HB

= ¾

VD

D

V REF

3=

V LB

= ¼

VD

D

VMAV>VHB

ADC_DONE

SA2_OUTSA3_OUT

Page 45: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable

16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array Realizing Adaptive Data Converters and Charge Domain Computing

© 2021 IEEE International Solid-State Circuits Conference 45 of 60

Clipping ADC Simulation WaveformsVMAV ADC_DONEVREF2 VREF3

Step 1 Step 2 Step 3 Step 4 Step 5

• Skip subsequent ADC conversion cycles if VMAV is higher than VHB or lower than VLB

• If VMAV is in conversion range, in-eDRAM ADC will perform regular SAR conversion

VMAV>VHB

Page 46: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable

16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array Realizing Adaptive Data Converters and Charge Domain Computing

© 2021 IEEE International Solid-State Circuits Conference 46 of 60

Clipping Threshold vs Accuracy and Energy

Optimum clipping threshold = 60%

*ADC with clipping energy/without clipping energy

60%

Clipping threshold

DNN accuracy

ADC Energy

Page 47: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable

16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array Realizing Adaptive Data Converters and Charge Domain Computing

© 2021 IEEE International Solid-State Circuits Conference 47 of 60

Clipping vs Non-Clipping ADC Tradeoff

Mea

sure

d To

p-1

Acc

urac

y (%

)

AD

C E

nerg

y (p

J)

1.21%3.79%

1.14X1.14X

• Throughput is improved by 1.14X while incurring 3.79% (1.21%) drop in Top-1 (Top-5) in CIFAR-10

Thro

ughp

ut

(GO

PS)

Mea

sure

d To

p-5

Acc

urac

y (%

)

Page 48: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable

16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array Realizing Adaptive Data Converters and Charge Domain Computing

© 2021 IEEE International Solid-State Circuits Conference 48 of 60

In-eDRAM ADC Measurement Results

(VLB, 64)

(VHB, 191)

• Linear for both clipping and non-clipping in-eDRAM ADC

Simulated

DN

L (L

SB)Simulated

DN

L (L

SB)

Code Code

Page 49: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable

16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array Realizing Adaptive Data Converters and Charge Domain Computing

© 2021 IEEE International Solid-State Circuits Conference 49 of 60

n Compute-in-memory: motivation and challenges

n Proposed eDRAM-CIM macro

l Overall architecture

l Digital-to-analog conversion (DAC)

l Multiply-accumulate-averaging (MAV) computation

l Analog-to-digital conversion (ADC)

n Chip measurement and classification accuracy results

n Summary

Outline

Page 50: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable

16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array Realizing Adaptive Data Converters and Charge Domain Computing

© 2021 IEEE International Solid-State Circuits Conference 50 of 60

Chip Micrograph

0.66

9mm

1.759mm

• 65nm native CMOS process

• Supply Voltage: 1~1.2V

• Input Clock Frequency:

50~200MHz

• Bitcell Size: 22.08um2

• Macro Area: 0.57mm2

• eDRAM Capacitor Type:

Metal-Oxide-Metal (MOM)

DAC

Weight array

Main ADC

Input image eDRAM

ADC test structure

MAV computation arrayscanchain2

MAV test

structure

scanchain1

Page 51: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable

16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array Realizing Adaptive Data Converters and Charge Domain Computing

© 2021 IEEE International Solid-State Circuits Conference 51 of 60

Test-chip Area Breakdown

0.66

9mm

1.759mm

DAC

Weight array

Main ADC

Input image eDRAM

ADC test structure

MAV computation arrayscanchain2

MAV test

structure

scanchain1

Non-Macro: scan-chains & test structures

DAC array

ADC array

Weight array

Input image array

MAV computation array

eDRAM-CIM Macro Area

Unit: mm2

Page 52: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable

16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array Realizing Adaptive Data Converters and Charge Domain Computing

© 2021 IEEE International Solid-State Circuits Conference 52 of 60

CIFAR-10 Classification Details

• 4 convolution layers, 2 average pooling layers, and 2 fully connected layers• Input precision: 8 bits• Weight precision: signed 8 bits• Output precision: 8 bits

Avg. Poolingc

Input Image 32 Filters 32 Filters 64 Filters 64 Filters

Input Xi(32x32x3)

CONV1(32,32,32)

CONV2(32,32,32)

CONV3(16,16,64)

CONV3(16,16,64)

Avg. Pooling

FC14096x512

FC2512x10

3x33x3 3x32x2

2x2

Page 53: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable

16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array Realizing Adaptive Data Converters and Charge Domain Computing

© 2021 IEEE International Solid-State Circuits Conference 53 of 60

83.27% 82.96% 82.60% 80.10% 76.31%

0%

20%

40%

60%

80%

100%98.43% 98.40% 98.40% 98.10% 96.89%

0%

20%

40%

60%

80%

100%

Classification Accuracy on CIFAR-10 Dataset

Software 8-bit INT

Without Clip

With 60% Clip

Without Clip

With 60% Clip

Simulated over 10,000 images

Measured over 5,000 images

Software 8-bit INT

Without Clip

With 60% Clip

Without Clip

With 60% Clip

Simulated over 10,000 images

Measured over 5,000 images

• Measured Top-5 accuracy is dropped 0.3% with non-clipping ADC and dropped 1.51% with clipping ADC• Comparing with 8b software accuracy, eDRAM-CIM preserved good accuracy results

Top-

1 A

ccur

acy

Top-

5 A

ccur

acy

0.3%

1.51%

2.86%

6.29%

1T1C eDRAM-CIM 1T1C eDRAM-CIM

Top-1 Accuracy Top-5 Accuracy

Page 54: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable

16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array Realizing Adaptive Data Converters and Charge Domain Computing

© 2021 IEEE International Solid-State Circuits Conference 54 of 60

Comparison with Prior WorksThis work ISSCC’20 [2] ISSCC’20 [3] ISSCC’20 [4] ISSCC'18 [5]

Technology 65nm 28nm 28nm 22nm 65nm

Memory Cell Structure 1T1C eDRAM 6T SRAM

6T + Local Computing

SRAM

1T1R SLC ReRAM 6T SRAM

Array Size 16Kb 64Kb 64Kb 2Mb 128KbInput Precision (bit) 8 8 8 4 8

Weight Precision (bit) 8 8 8 4 8Supply Voltage (V) 1~1.2 0.85~1.0 0.7~0.9 0.8 1

Dataset CIFAR-10 CIFAR-10 MNIST

Model CNN: 4 CONV + 2 Pooling + 2 FC

CNN: ResNet-20

CNN: ResNet-20 N/A SVM

Measured Accuracy 80.1% (Top-1), 98.1 % (Top-5)

591.91% 592.02% N/A 583.27%

Throughput (GOPS) 1,34.71 N/A N/A N/A 4Average Energy

Efficiency (TOPS/W)14.76 7.3

2(1.35)14.082(2.61)

28.932(3.31) 3.125

GOPS/mm2 8.26 N/A N/A N/A 2.784FoM 304.6 86.4 167 53 201.6

1measured at 1.1V5Top-1 or Top-5 is not mentioned

2Scaled to 65nm, assume energy ∝ (Tech.)2 [7] 3 Limited by clocking infrastructure, chip size, technology and bit cell area4FoM = input precision x weight precision x energy efficiency (scaled to 65nm)

Page 55: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable

16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array Realizing Adaptive Data Converters and Charge Domain Computing

© 2021 IEEE International Solid-State Circuits Conference 55 of 60

Comparison with Prior WorksThis work ISSCC’20 [2] ISSCC’20 [3] ISSCC’20 [4] ISSCC'18 [5]

Technology 65nm 28nm 28nm 22nm 65nm

Memory Cell Structure 1T1C eDRAM 6T SRAM

6T + Local Computing

SRAM

1T1R SLC ReRAM 6T SRAM

Array Size 16Kb 64Kb 64Kb 2Mb 128KbInput Precision (bit) 8 8 8 4 8

Weight Precision (bit) 8 8 8 4 8Supply Voltage (V) 1~1.2 0.85~1.0 0.7~0.9 0.8 1

Dataset CIFAR-10 CIFAR-10 MNIST

Model CNN: 4 CONV + 2 Pooling + 2 FC

CNN: ResNet-20

CNN: ResNet-20 N/A SVM

Measured Accuracy 80.1% (Top-1), 98.1 % (Top-5)

591.91% 592.02% N/A 583.27%

Throughput (GOPS) 1,34.71 N/A N/A N/A 4Average Energy

Efficiency (TOPS/W)14.76 7.3

2(1.35)14.082(2.61)

28.932(3.31) 3.125

GOPS/mm2 8.26 N/A N/A N/A 2.784FoM 304.6 86.4 167 53 201.6

1measured at 1.1V5Top-1 or Top-5 is not mentioned

2Scaled to 65nm, assume energy ∝ (Tech.)2 [7] 3 Limited by clocking infrastructure, chip size, technology and bit cell area4FoM = input precision x weight precision x energy efficiency (scaled to 65nm)

Page 56: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable

16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array Realizing Adaptive Data Converters and Charge Domain Computing

© 2021 IEEE International Solid-State Circuits Conference 56 of 60

Comparison with Prior WorksThis work ISSCC’20 [2] ISSCC’20 [3] ISSCC’20 [4] ISSCC'18 [5]

Technology 65nm 28nm 28nm 22nm 65nm

Memory Cell Structure 1T1C eDRAM 6T SRAM

6T + Local Computing

SRAM

1T1R SLC ReRAM 6T SRAM

Array Size 16Kb 64Kb 64Kb 2Mb 128KbInput Precision (bit) 8 8 8 4 8

Weight Precision (bit) 8 8 8 4 8Supply Voltage (V) 1~1.2 0.85~1.0 0.7~0.9 0.8 1

Dataset CIFAR-10 CIFAR-10 MNIST

Model CNN: 4 CONV + 2 Pooling + 2 FC

CNN: ResNet-20

CNN: ResNet-20 N/A SVM

Measured Accuracy 80.1% (Top-1), 98.1 % (Top-5)

591.91% 592.02% N/A 583.27%

Throughput (GOPS) 1,34.71 N/A N/A N/A 4Average Energy

Efficiency (TOPS/W)14.76 7.3

2(1.35)14.082(2.61)

28.932(3.31) 3.125

GOPS/mm2 8.26 N/A N/A N/A 2.784FoM 304.6 86.4 167 53 201.6

1measured at 1.1V5Top-1 or Top-5 is not mentioned

2Scaled to 65nm, assume energy ∝ (Tech.)2 [7] 3 Limited by clocking infrastructure, chip size, technology and bit cell area4FoM = input precision x weight precision x energy efficiency (scaled to 65nm)

Page 57: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable

16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array Realizing Adaptive Data Converters and Charge Domain Computing

© 2021 IEEE International Solid-State Circuits Conference 57 of 60

Test-chip SummaryTest-chip Summary Advanced Technology

Node EstimationTechnology 65nm CMOS 22nm tri-gate CMOS

Total eDRAM Size 16Kb 238.4MB

Bitcell Size (µm2) 22.08 0.029

Unit eDRAM Capacitor Size (fF) 13 13

eDRAM Capacitor Type Metal-oxide-metal (MOM) Capacitor-over-bitline (COB)

Input Clock Frequency 50 MHz (measured) 1.6GHz200MHz (simulation)GOPS/mm2 8.26 1305

MAV Operation Latency (ns) 10 (2 cycles) 1.25 (2 Cycles)1Throughput @ 1.1V w/o ADC Clipping 4.037 GOPS 19.5 TOPS

with 60% ADC Clipping 4.71 GOPS 22.2 TOPS3Average Energy Efficiency (TOPS/W) @ 1.1V 4.76 11.12

Energy/MAV (fJ/MAV) with 60% ADC Clipping @ 1.1V 30.6 40.21

1Assume 1MAV = 2 operations2Assume 30% utilization of 128MB (77mm2) [6] eDRAM memory for CIM usage

3Excluded scan-chains power4Increased parallelism in large-capacity eDRAM array

Page 58: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable

16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array Realizing Adaptive Data Converters and Charge Domain Computing

© 2021 IEEE International Solid-State Circuits Conference 58 of 60

n Compute-in-memory: motivation and challenges

n Proposed eDRAM-CIM macro

l Overall architecture

l Digital-to-analog conversion (DAC)

l Multiply-accumulate-averaging (MAV) computation

l Analog-to-digital conversion (ADC)

n Chip measurement and classification accuracy results

n Summary

Outline

Page 59: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable

16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array Realizing Adaptive Data Converters and Charge Domain Computing

© 2021 IEEE International Solid-State Circuits Conference 59 of 60

Summary• 1T1C eDRAM bit cell: good candidate for compute-in-memory design

• Repurposing existing 1T1C eDRAM bitcell

• Intrinsic charge share operation in eDRAM bitcell

• Support 8b input and 8b signed/unsigned weight for MAV operations

• Measured Top-5 (Top-1) peak accuracy 98.1% (80.1%) on CIFAR-10

• Measured average energy efficiency 4.76 TOPS/W

• Measured throughput 4.71GOPS

• FoM metrics can exceed significantly in an advanced eDRAM technology

• Potential for scalability to advanced eDRAM technology node

Page 60: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable

16.2: eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded Dynamic Memory Array Realizing Adaptive Data Converters and Charge Domain Computing

© 2021 IEEE International Solid-State Circuits Conference 60 of 60

Acknowledgements• Clifford Ong for technical discussions

• Intel for funding support

Thank you for your attention!