
Energy-efficient and High Throughput Sparse Distributed Memory Architecture

Mingu Kang, Eric P. Kim, Min-Sun Keel, Naresh R. Shanbhag (mkang17@illinois.edu)

May 27, 2015

Brain-inspired Computing

• Energy efficient: 100X over von Neumann architecture
• Associates stored data with information
• Extracts information from noisy and incomplete data to achieve cognition and decision making

Sparse Distributed Memory (SDM)

• Mathematical model inspired by human long-term memory (Kanerva, 1988)
• Used as auto-/hetero-associative memory
• Hamming-distance-based address decoder + counter array
• Tolerance to noisy data translates to tolerance to hardware non-idealities

Read/Write Operation of SDM

[Figure: Sparse distributed memory (SDM) block diagram showing the address decoder and the counter array, with signals A(i), S, C(i), P, D, and sums; panels: Write operation of SDM, Read operation of SDM]
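A minimal software sketch of the SDM read/write flow, assuming the standard Kanerva formulation: fixed random binary hard-location addresses A(i), an address decoder that selects every row within a Hamming radius of the input address P, and a counter array C(i) updated by +1/-1 per data bit of D. The class name, sizes, radius, and seeds are hypothetical illustration choices, not the parameters of the proposed hardware.

```python
import random


class SparseDistributedMemory:
    """Kanerva-style SDM sketch: fixed random hard-location addresses A(i)
    plus a counter array C(i). All sizes here are illustrative assumptions."""

    def __init__(self, num_locations, addr_bits, data_bits, radius, seed=0):
        rng = random.Random(seed)
        # Hard-location addresses A(i): fixed random binary vectors.
        self.addresses = [[rng.randint(0, 1) for _ in range(addr_bits)]
                          for _ in range(num_locations)]
        # Counter array C(i): one signed counter per data bit per location.
        self.counters = [[0] * data_bits for _ in range(num_locations)]
        self.data_bits = data_bits
        self.radius = radius

    def select(self, p):
        """Address decoder: indices of rows whose Hamming distance to p
        is at most the activation radius."""
        return [i for i, a in enumerate(self.addresses)
                if sum(x ^ y for x, y in zip(a, p)) <= self.radius]

    def write(self, p, d):
        """Write: add +1 (bit 1) or -1 (bit 0) of d into every selected row."""
        for i in self.select(p):
            row = self.counters[i]
            for j, bit in enumerate(d):
                row[j] += 1 if bit else -1

    def read(self, p):
        """Read: column-wise sums over the selected rows, thresholded at 0."""
        rows = self.select(p)
        sums = [sum(self.counters[i][j] for i in rows)
                for j in range(self.data_bits)]
        return [1 if s > 0 else 0 for s in sums]


if __name__ == "__main__":
    sdm = SparseDistributedMemory(num_locations=1000, addr_bits=64,
                                  data_bits=64, radius=24, seed=1)
    rng = random.Random(2)
    pattern = [rng.randint(0, 1) for _ in range(64)]
    sdm.write(pattern, pattern)  # auto-associative: store P at address P
    print(sdm.read(pattern) == pattern)
```

A single auto-associative write followed by a read at the same address returns the stored pattern as long as at least one hard location lies within the radius, since every selected row then holds the same +1/-1 value in each column. Calling `write(p, d)` with `d` different from `p` gives the hetero-associative case.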


SDM Implementation Challenges

• Address decoder is a throughput and energy bottleneck
– Accounts for over 90% of the delay and 70% of the energy of a memory read
• Counter array requires high-data-rate communication between memory banks
– Large numbers of routing lines toggled (i.e., 30K lines toggled per read)
• Proposed solution
– Compute Memory-based address decoder
– Hierarchical Binary Decision (HBD)-based counter array

Compute Memory

• In-memory computing platform
• Multi-row (MR) READ
• Embedded analog processing w/ low voltage swing
• Pattern matching (ICASSP '14)
• Conv. Net. (ICASSP '15)

Conventional System

• Separate memory (low-swing) and processor (high-swing)
• Memory-processor interface bottleneck → severe for data-rich applications
• Memory hierarchy and latency

Compute Memory-based Address Decoder

[Figure panels: CM-based address decoder; Capacitive adder]

• All columns read at a time
• No bottleneck from I/O bus width
– Higher throughput → less leakage w/ power gating
• Embedded analog signal processing
– No digital blocks → energy & area savings

$H(i) = \sum_{j} a(i,j) \oplus p(j)$, where $a(i,j)$ and $p(j)$ are the $j$-th bits of the stored address A(i) and the input P
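The decoder equation can be checked against a behavioral sketch. It assumes an idealized capacitive adder in which equal unit capacitors are charged according to the per-column mismatch a(i,j) XOR p(j) and then shorted together, so charge sharing leaves the shared node at a voltage proportional to H(i); a comparator threshold then plays the role of the Hamming radius. The function names and the `noise_sigma` knob (standing in for analog non-idealities) are hypothetical, and this is not the authors' circuit.

```python
import random


def hamming_distance(a_row, p):
    """Digital reference: H(i) = sum_j a(i,j) XOR p(j)."""
    return sum(a ^ b for a, b in zip(a_row, p))


def shared_node_voltage(a_row, p, vdd=1.0, noise_sigma=0.0, rng=None):
    """Idealized charge-sharing model (an assumption, not the real circuit).

    Column j charges an equal unit capacitor to vdd on a mismatch and to 0 V
    otherwise. Shorting the N capacitors together averages their voltages,
    so the shared node settles at vdd * H(i) / N. Optional Gaussian noise
    stands in for analog non-idealities.
    """
    v = vdd * hamming_distance(a_row, p) / len(p)
    if noise_sigma > 0:
        v += (rng or random.Random()).gauss(0.0, noise_sigma)
    return v


def decode(addresses, p, radius, vdd=1.0, noise_sigma=0.0, seed=0):
    """Select rows whose shared-node voltage is at or below vdd * radius / N,
    i.e. (without noise) rows within Hamming distance `radius` of p."""
    rng = random.Random(seed)
    v_th = vdd * radius / len(p)
    return [i for i, a in enumerate(addresses)
            if shared_node_voltage(a, p, vdd, noise_sigma, rng) <= v_th]


addresses = [[0, 0, 0, 0], [1, 1, 1, 1], [1, 0, 0, 0], [1, 1, 0, 0]]
print(decode(addresses, [0, 0, 0, 0], radius=1))  # [0, 2]
```

With `noise_sigma=0` the analog decision matches the digital Hamming test exactly; raising it shows how rows near the radius boundary start to flip, which is the kind of error SDM is meant to tolerate.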

Counter Array using Hierarchical Binary Decision (HBD)

[Figure: Multi-block counter array w/ HBD]

• Local binary decision from each block
– Reduced global routing lines and toggling energy
• Final decision based on a summation of the local decisions, weighted by the number of chosen rows in the m-th block ($N_m$)
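A sketch of the HBD read path next to a flat read that routes full column sums. Weighting each block's decision by N_m follows the slide; the function names, the data layout, and the tie rule (a zero local sum becomes bit 0) are assumptions for illustration.

```python
def flat_read(blocks, selected):
    """Reference read: sum every chosen counter row across all blocks, then
    threshold at 0 (needs multi-bit partial sums on the global lines)."""
    n_cols = len(blocks[0][0])
    totals = [0] * n_cols
    for block, rows in zip(blocks, selected):
        for i in rows:
            for j in range(n_cols):
                totals[j] += block[i][j]
    return [1 if t > 0 else 0 for t in totals]


def hbd_read(blocks, selected):
    """Hierarchical binary decision sketch.

    blocks[m] holds the counter rows of block m; selected[m] lists the chosen
    row indices in block m, so N_m = len(selected[m]). Each block makes a
    local 1-bit decision per column, so only one bit per column per block
    needs global routing. The final bit thresholds the N_m-weighted sum of
    the local decisions mapped to -1/+1.
    """
    n_cols = len(blocks[0][0])
    weighted = [0] * n_cols
    for block, rows in zip(blocks, selected):
        n_m = len(rows)
        if n_m == 0:
            continue  # a block with no chosen rows contributes nothing
        for j in range(n_cols):
            local_sum = sum(block[i][j] for i in rows)
            local_bit = 1 if local_sum > 0 else 0       # local binary decision
            weighted[j] += n_m * (2 * local_bit - 1)    # map {0, 1} -> {-1, +1}
    return [1 if w > 0 else 0 for w in weighted]


# Typical case: both reads agree.
print(flat_read([[[3, -1], [2, -2]], [[-1, 1]]], [[0, 1], [0]]))  # [1, 0]
print(hbd_read([[[3, -1], [2, -2]], [[-1, 1]]], [[0, 1], [0]]))   # [1, 0]
# Corner case: one block holds a large counter value and the reads differ.
print(flat_read([[[1], [1]], [[-5]]], [[0, 1], [0]]))  # [0]
print(hbd_read([[[1], [1]], [[-5]]], [[0, 1], [0]]))   # [1]
```

In this sketch HBD is an approximation of the flat sum: it discards the magnitude of each block's partial sum, which can change the outcome when a block with few chosen rows holds large counter values.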

Example of proposed SDM operation

Auto-associative Memory Recall

[Figure: recalled images for the input and after each recall iteration]
Error rate: Input 25%, 1st iteration 21%, 2nd iteration 4%, 3rd iteration 0%

• 45nm SOI process technology
• Read/Write operation (auto-associative memory)
– Nine gray-scale patterns (P) shaped as the digits 1 to 9, with 25% of bits randomly reversed
– 225 noisy copies of each P written → 2025 training data in total
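A sketch of the write set and the iterative recall loop described above. The 64-bit pattern length, flipping an exact count of bits per copy (per-bit random flipping is another reading of "25% randomly reversed bits"), and stopping early once the output stops changing are assumptions; `read` stands for any SDM read function, and the perfect-oracle lambda below is only a toy stand-in.

```python
import random


def make_training_set(patterns, copies_per_pattern=225, flip_fraction=0.25, seed=0):
    """For each clean pattern P, build noisy copies with round(flip_fraction * n)
    randomly chosen bits reversed. Returns (pattern_index, noisy_copy) pairs;
    each copy would be written auto-associatively (address = data = copy)."""
    rng = random.Random(seed)
    data = []
    for k, p in enumerate(patterns):
        n_flip = round(flip_fraction * len(p))
        for _ in range(copies_per_pattern):
            noisy = list(p)
            for j in rng.sample(range(len(p)), n_flip):
                noisy[j] ^= 1  # reverse the bit
            data.append((k, noisy))
    return data


def error_rate(x, target):
    """Fraction of incorrect pixels (bits) relative to the clean pattern."""
    return sum(a != b for a, b in zip(x, target)) / len(target)


def iterative_recall(read, x, max_iters=3):
    """Auto-associative recall: feed each read-out back in as the next
    address; stop early once the output no longer changes."""
    history = [list(x)]
    for _ in range(max_iters):
        y = read(history[-1])
        history.append(y)
        if y == history[-2]:
            break
    return history


rng = random.Random(5)
patterns = [[rng.randint(0, 1) for _ in range(64)] for _ in range(9)]  # stand-ins for digits 1-9
data = make_training_set(patterns)
print(len(data))  # 2025 = 9 patterns x 225 copies
history = iterative_recall(lambda x: patterns[0], data[0][1])  # toy oracle read
print([error_rate(x, patterns[0]) for x in history])  # [0.25, 0.0, 0.0]
```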

Error Rate vs. Energy

[Figure panels: Bit error rate (% of incorrect pixels); Energy consumption]

• 14.5X delay reduction
• 2.7X energy saving
• Negligible error rate degradation (< 0.4%)

Conclusion

• In-memory computing platform (Compute Memory) applied to the address decoder

• Hierarchical binary decision (HBD) used to reduce hardware in the counter array

• Delay reduction: 14.5X
• Energy saving: 2.7X
• Energy-delay product reduction: 39X
• Negligible accuracy degradation
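Since the energy-delay product (EDP) is the product of energy and delay, the 39X figure is consistent with multiplying the two individual improvements listed above (subscripts base and prop denote the baseline and proposed designs):

```latex
\mathrm{EDP} = E \times D,
\qquad
\frac{\mathrm{EDP}_{\text{base}}}{\mathrm{EDP}_{\text{prop}}}
= \frac{E_{\text{base}}}{E_{\text{prop}}} \times \frac{D_{\text{base}}}{D_{\text{prop}}}
= 2.7 \times 14.5 = 39.15 \approx 39
```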


Q & A
