Architecting for Causal Intelligence at Nanoscale* Santosh Khasanvis Senior Research Scientist BlueRISC Inc., Amherst, MA *Research part of PhD at University of Massachusetts Amherst (Directed by Prof. C. Andras Moritz) Contact: [email protected], [email protected]

Architecting for Causal Intelligence at Nanoscale* · 2016. 1. 22. · Architecting for Causal Intelligence at Nanoscale* Santosh Khasanvis Senior Research Scientist BlueRISC Inc.,




Introduction

Emerging opportunities

• Personalized medicine, big data analytics, cyber-security, etc.

• Cognitive computing frameworks such as Bayesian networks (BNs) may be helpful

Challenges

• High computational complexity; require persistence

• Implementation on CMOS von Neumann microprocessors is inefficient: layers of abstraction, emulation on deterministic Boolean logic, rigid separation of memory and computation

Rethink computing from the ground-up leveraging emerging nanotechnology

• Architecting with Physical Equivalence – as direct mapping as possible of conceptual framework to physical layer

• Disruptive technology: Potential for orders of magnitude efficiency

• This talk: Architecting for probabilistic reasoning with BNs


Probabilistic modeling of domain knowledge for reasoning under uncertainty

Graphical representation of a domain

• Structure: Directed Acyclic Graph; nodes are domain variables (with several states); edges are relationships/dependence between variables

• Parameters: Conditional probability distributions (or tables; CPTs) for strength of relationship

• Inference task: Find probability of unobserved variables given observed quantities (evidence)

[Figure: example network with a conditional probability table, columns C, B, D=1, D=0]

Bayesian Networks (BNs)

Adapted from Slides by Irina Rish, IBM – “A Tutorial on Inference and Learning in Bayesian Networks”

Available online: http://www.ee.columbia.edu/~vittorio/Lecture12.pdf

[Figure: inference example, from evidence to BEL(lung cancer)]

Bayesian Networks are graphs that represent domain knowledge using probabilities; inference on them involves probability computations
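To make the inference task concrete, here is a minimal Python sketch that computes a belief by brute-force enumeration on a three-node chain. The chain Smoking -> Cancer -> XRay and every probability value are hypothetical, chosen only for illustration; they are not taken from the talk or from the cited tutorial.

```python
# Hypothetical chain: Smoking (S) -> Cancer (C) -> XRay (X), all binary.
# Every number below is made up for illustration.
P_S = {1: 0.3, 0: 0.7}                                    # P(S)
P_C_GIVEN_S = {1: {1: 0.1, 0: 0.9}, 0: {1: 0.01, 0: 0.99}}  # P(C | S), indexed [s][c]
P_X_GIVEN_C = {1: {1: 0.9, 0: 0.1}, 0: {1: 0.2, 0: 0.8}}    # P(X | C), indexed [c][x]


def joint(s, c, x):
    """Joint probability from the chain factorization P(S) P(C|S) P(X|C)."""
    return P_S[s] * P_C_GIVEN_S[s][c] * P_X_GIVEN_C[c][x]


def belief_cancer(x_observed):
    """BEL(C) = P(C | X = x_observed), by summing out S and normalizing."""
    unnormalized = {
        c: sum(joint(s, c, x_observed) for s in (0, 1)) for c in (0, 1)
    }
    total = sum(unnormalized.values())
    return {c: value / total for c, value in unnormalized.items()}


if __name__ == "__main__":
    print(belief_cancer(x_observed=1))
```

Enumeration scales exponentially with the number of unobserved variables, which is one reason message-passing schemes are used for larger networks.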


Overview of Approach: Architecting for Causal Intelligence

Architectural Approach

• Reconfigurable Bayesian Cell Architecture to map Bayesian Networks

Information Encoding

• Probabilities tied to physical layer, encoded in electrical signals/S-MTJ resistances used in circuits

Circuit Framework

• Mixed-signal hybrid circuits (S-MTJ + CMOS)

• Direct computation on probabilities (memory in-built)

• Bayesian Cells incorporate these circuits

Physical Layer

Non-volatile straintronic magnetic tunneling junctions (S-MTJs) + CMOS


Outline

Technology Overview: Nanoscale Straintronic MTJs (S-MTJs)

Physically Equivalent Intelligent System for Reasoning with BNs

• Data Encoding: Mapping probabilities in physical layer

• Circuit Framework: Mixed-signal circuits operating on probabilities for Bayesian computations

• Reconfigurable Bayesian Cell Architecture for BN Mapping

Evaluation

Summary


Non-Volatile Straintronic-MTJ (S-MTJ)

[Figure panels: device structure schematic; circuit schematic; device characteristics (input voltage vs. resistance, with levels Rhigh and Rlow)]

A. K. Biswas, Prof. Bandyopadhyay, Prof. Atulasimha, Virginia Commonwealth Univ.

A. K. Biswas, S. Bandyopadhyay and J. Atulasimha, “Energy-efficient magnetoelastic non-volatile memory,” Appl. Phys. Lett., 104, 232403, 2014.

Voltage-controlled magneto-electric devices

Stacked nanomagnets separated by spacer layer: Resistance depends on relative magnetization orientation of nanomagnets

Strain-based switching




Encoding Probability

Represented as a non-Boolean flat probability vector of spatially distributed digits

Physical Equivalence: Direct correlation to S-MTJ resistances and electrical signals

E.g. using 10 digits, pi ∈ {0, 1} ↔ Voltages Vi1, Vi2 ∈ {0 V, 40 mV} ↔ Resistance ri ∈ {ROFF, RON}

Digit pi related to S-MTJ resistance ri as follows

β and ε are constants

Digits p1 p2 p3 … pn: 1 1 1 1 0 0 0 0 0 0, giving P = 0.4

Resolution = 1/n, where n is the number of digits

Equivalent S-MTJ Resistances

r1 = r2 = r3 = r4 = Rlow; r5 = r6 = r7 = r8 = r9 = r10 = Rhigh

Equivalent Digital Voltages

[Figure: digital voltage waveforms, levels 0 and Vh, one per S-MTJ]
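The encoding above can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: the function names are hypothetical, the ones are placed first only because the slide's example shows them that way, and the resistances are kept as the symbolic labels Rlow and Rhigh since the slide gives no numeric device values here.

```python
def encode_probability(p, n=10):
    """Encode p as n digits in {0, 1}; the fraction of ones approximates p.

    Resolution is 1/n. Ones are placed first, as in the slide's example.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    ones = round(p * n)  # nearest representable level (Python tie rule applies)
    return [1] * ones + [0] * (n - ones)


def decode_probability(digits):
    """Recover the encoded probability as the fraction of ones."""
    return sum(digits) / len(digits)


def digits_to_resistances(digits, r_low="Rlow", r_high="Rhigh"):
    """Map digit 1 to the low-resistance state and digit 0 to the high one."""
    return [r_low if d == 1 else r_high for d in digits]


if __name__ == "__main__":
    digits = encode_probability(0.4)
    print(digits)
    print(decode_probability(digits))
    print(digits_to_resistances(digits))
```

With n = 10 the representable values are multiples of 0.1, which matches the resolution of 1/n stated on the slide.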


Circuit Framework

Unconventional magneto-electric mixed-signal circuit framework

Physical Equivalence: Directly implements Bayesian computations on probabilities using underlying circuit principles in analog domain

• Input: Digital; Output: Analog

Approach

• Operates on spatial digital probability vectors, which are converted into an analog representation of a single probability value by a circuit referred to as the Probability Composer

• Probability Addition, Multiplication Composers internally use Probability Composers

• Cascade computational blocks for Bayesian functions: Enabled by Decomposers*

Incorporates S-MTJs + CMOS support for mixed-signal computations

* S. Khasanvis, et al., “Self-similar magneto-electric nanocircuit technology for probabilistic inference engines,” IEEE Transactions on Nanotechnology, Special Issue on Cognitive Computing with Nanotechnology, in press, 2015.


Needed to convert the spatial probability representation (digital) into an analog quantity representing the total probability value in the current/voltage domain

Parallel topology of S-MTJs; effective resistance encodes probability

• Individual S-MTJ resistances set using digital voltages as shown earlier

Non-volatility

Resistance read-out using reference voltage

Probability Composer Circuit

Symbols: RPC: effective resistance; ri: resistance of i-th S-MTJ; P: encoded probability value; n: number of digits = number of S-MTJs; β, ε: S-MTJ device parameters

Output: Vout = Iout·RL, with RL << RPC

Simulated Output Characteristics (HSPICE): VREF = 1 V, RL = 100 kΩ, RPC = 2-4 MΩ, Radj = 4 MΩ

[Figure: output voltage (V) vs. input probability for a composer of 10 S-MTJs; labeled points: all S-MTJs OFF, 1 S-MTJ ON, 2 S-MTJs ON]

Probability Composer: Collection of S-MTJs

- Probability value encoded in 1/RPC

- Read-out in current/voltage
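The read-out principle can be illustrated with a simplified Python model of parallel resistors in series with a load. Several things here are assumptions rather than values from the slides: Rlow = 20 MΩ and Rhigh = 40 MΩ are inferred only from the stated RPC range of 2-4 MΩ for 10 parallel devices, the role of Radj is not modeled, and the slide's β and ε relation is not used.

```python
def effective_resistance(resistances):
    """Effective resistance of resistors connected in parallel."""
    return 1.0 / sum(1.0 / r for r in resistances)


def composer_output(digits, r_low=20e6, r_high=40e6, v_ref=1.0, r_load=100e3):
    """Return (R_PC, V_out) for a digit vector under a simplified model.

    Assumed model: the parallel S-MTJ bank (R_PC) in series with the load
    R_L across V_REF, so I_out = V_REF / (R_PC + R_L) and V_out = I_out * R_L.
    r_low and r_high are illustrative guesses, not device data.
    """
    bank = [r_low if d else r_high for d in digits]
    r_pc = effective_resistance(bank)
    i_out = v_ref / (r_pc + r_load)
    return r_pc, i_out * r_load


if __name__ == "__main__":
    for ones in range(11):
        digits = [1] * ones + [0] * (10 - ones)
        r_pc, v_out = composer_output(digits)
        print(f"P={ones / 10:.1f}  R_PC={r_pc / 1e6:.2f} MOhm  Vout={v_out * 1e3:.2f} mV")
```

In this model the conductance 1/RPC grows linearly with the number of S-MTJs that are ON, which is why the probability is described as encoded in 1/RPC rather than in RPC itself.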


Elementary Arithmetic Composer Circuits

Addition Composer Circuit

• Current addition; Vout = Iout·RL

• [Figure: simulated output characteristics (HSPICE), output voltage (V) vs. sum of probabilities]

Multiplication Composer Circuit

• Input PA: voltage domain; Input PB: S-MTJ resistance

• Ohm’s law; Vout = Iout·RL

• [Figure: simulated output characteristics (HSPICE), output voltage (V) vs. output probability]
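The two principles named above, current addition and Ohm's law, can be sketched with an idealized Python model. It assumes a perfectly linear encoding with zero offset (probability 1 maps to V_FULL volts or G_FULL siemens, probability 0 maps to zero), which real devices with finite OFF conductance would not satisfy; V_FULL, G_FULL, and all function names are hypothetical.

```python
V_FULL = 1.0    # assumed voltage that represents probability 1
G_FULL = 5e-7   # assumed conductance (siemens) that represents probability 1


def to_voltage(p):
    """Idealized voltage-domain encoding of a probability."""
    return p * V_FULL


def to_conductance(p):
    """Idealized conductance (1/resistance) encoding of a probability."""
    return p * G_FULL


def multiply_current(p_a, p_b):
    """Ohm's law: I = V * G, so the current is proportional to P_A * P_B."""
    return to_voltage(p_a) * to_conductance(p_b)


def add_current(p_a, p_b):
    """Current addition: two branch currents driven by V_FULL sum at a node."""
    return V_FULL * to_conductance(p_a) + V_FULL * to_conductance(p_b)


def current_to_probability(current):
    """Scale a current back to a probability under the idealized encoding."""
    return current / (V_FULL * G_FULL)


if __name__ == "__main__":
    print(current_to_probability(multiply_current(0.4, 0.5)))
    print(current_to_probability(add_current(0.3, 0.4)))
```

The sketch shows why one multiplication input is in the voltage domain and the other in the resistance domain: the product falls out of a single current measurement.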


Combining Elementary Composers: Add-Multiply

Example: Pout = Pa·Pb + Pc·Pd; typical in BN inference computations

ADD(MUL(Pa, Pb), MUL(Pc, Pd)); two levels of hierarchical instantiation

Elementary Composers = MUL, arranged in topology self-similar to ADD (Dominator Composer)

[Figure: Add-Multiply Composer circuit and simulated output characteristics (HSPICE), output voltage (V) vs. output probability]
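The hierarchical composition ADD(MUL(Pa, Pb), MUL(Pc, Pd)) can be mirrored functionally in Python. This sketch works directly on probability values and ignores all circuit effects; the `quantize` helper is an assumed stand-in for snapping a result back to the 1/n digit resolution and is not a model of the Decomposer circuit. The example numbers are hypothetical and have the shape of a prior estimate for one state of a node with a binary parent, P(x|u0)·π(u0) + P(x|u1)·π(u1).

```python
def mul(p_a, p_b):
    """Elementary multiplication on probability values."""
    return p_a * p_b


def add(*terms):
    """Elementary addition on probability values."""
    return sum(terms)


def add_multiply(p_a, p_b, p_c, p_d):
    """Two-level composition: ADD(MUL(Pa, Pb), MUL(Pc, Pd))."""
    return add(mul(p_a, p_b), mul(p_c, p_d))


def quantize(p, n=10):
    """Snap a probability to the nearest multiple of 1/n (assumed helper)."""
    return round(p * n) / n


if __name__ == "__main__":
    # Hypothetical values: P(x|u0)=0.4, pi(u0)=0.5, P(x|u1)=0.6, pi(u1)=0.5
    p_out = add_multiply(0.4, 0.5, 0.6, 0.5)
    print(p_out, quantize(p_out))
```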


Outline

Technology Overview: Nanoscale Straintronic MTJs

Physically Equivalent Intelligent System for Reasoning with BNs

• Data Encoding: Mapping probabilities in physical layer

• Circuit Framework: Mixed-signal circuits operating on probabilities for Bayesian computations

• Elementary Arithmetic Composers

• Inference in BNs: Belief Propagation Algorithm Overview

• Composers for BN Inference Operations

• Reconfigurable Bayesian Cell Architecture for BN Mapping

Evaluation

Summary


Bayesian Inference: Pearl’s Belief Propagation

Compute belief P(Xi | E) based on evidence E using local computations and message propagation

Each node maintains

• Conditional probability tables (CPTs): CPTjk(Xi) = P(Xi=j | Pa(Xi)=k)

• Likelihood λ(Xi) = P(E-|Xi) and Prior π(Xi) = P(Xi|E+) vectors

• Belief vector BEL(Xi) = P(Xi | E)

Local node computations using messages from neighbors

• λ messages from child to parent to compute λ(Xi)

• π messages from parent to child nodes for π(Xi)

• BEL(Xi) = λ(Xi) . π(Xi)

Applicable to trees and poly-trees

[Figure: network split into evidence sets E+ and E-; repeated application of Bayes Rule]

J. Pearl, Probabilistic reasoning in intelligent systems: Networks of plausible inference, San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1988.
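The local computations listed above can be sketched in plain Python for a node with a single parent, as in a tree. This is a minimal illustration under assumptions: function names and example numbers are hypothetical, the CPT is indexed as cpt[j][k] = P(X=j | parent=k) to follow the slide's CPTjk notation, and the belief is normalized to sum to 1, as is standard in Pearl's formulation even though the slide writes the product without a normalizing constant.

```python
def normalize(vector):
    """Scale a non-negative vector so that it sums to 1."""
    total = sum(vector)
    return [v / total for v in vector]


def likelihood(child_lambdas, n_states):
    """lambda(x): element-wise product of the lambda messages from children."""
    lam = [1.0] * n_states
    for message in child_lambdas:
        lam = [a * b for a, b in zip(lam, message)]
    return lam


def prior(cpt, pi_from_parent):
    """pi(x=j) = sum_k P(X=j | parent=k) * pi_message(k)."""
    return [sum(row[k] * pi_from_parent[k] for k in range(len(pi_from_parent)))
            for row in cpt]


def belief(lam, pi):
    """BEL(x) proportional to lambda(x) * pi(x), normalized."""
    return normalize([l * p for l, p in zip(lam, pi)])


def lambda_to_parent(cpt, lam):
    """Diagnostic support: lambda_X(k) = sum_j lambda(j) * P(X=j | parent=k)."""
    n_parent_states = len(cpt[0])
    return [sum(cpt[j][k] * lam[j] for j in range(len(lam)))
            for k in range(n_parent_states)]


def pi_to_child(pi, child_lambdas, child_index):
    """Predictive support to one child: pi(x) times the other children's lambdas."""
    others = [m for i, m in enumerate(child_lambdas) if i != child_index]
    lam_others = likelihood(others, len(pi))
    return normalize([p * l for p, l in zip(pi, lam_others)])


if __name__ == "__main__":
    cpt = [[0.9, 0.2], [0.1, 0.8]]   # hypothetical P(X=j | parent=k)
    pi_msg = [0.5, 0.5]              # hypothetical pi message from the parent
    lambdas = [[1.0, 0.5]]           # hypothetical lambda message from one child
    lam = likelihood(lambdas, 2)
    pi = prior(cpt, pi_msg)
    print(belief(lam, pi), lambda_to_parent(cpt, lam), pi_to_child(pi, lambdas, 0))
```

Each function is either a product of vectors or a sum of products, which is why multiplication and add-multiply blocks are sufficient building blocks for these operations.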


Composer Circuits for BN Inference Operations

Uses either elementary arithmetic composers or combines them

Operations: Likelihood Estimation, Prior Estimation, Belief Update, Diagnostic Support to Parent, Predictive Support to Child nodes

• Multiplication Composers for Likelihood Estimation, Belief Update, Predictive Support

• Add-Multiply Composers for Prior Estimation, Diagnostic Support




Physically Equivalent Architecture for BNs

Physical Equivalence: Every node in DAG mapped to a Bayesian Cell in H/W; incorporates non-volatile Arithmetic Composers for Bayesian computations

Reconfigurable links using Switch Boxes (similar to FPGAs) to map any BN structure

Persistence in configuration + computation through non-volatile Composers; no need for external memory


Outline

Technology Overview: Nanoscale Straintronic MTJs

Physically Equivalent Intelligent System for Reasoning with BNs

Evaluation

• Methodology

• System-level Evaluation for BN Inference using Physically Equivalent Framework

• Analytical Modeling of BN Inference Performance on CMOS Multi-core Processors and Comparison

Summary


Example Bayesian Graph to Estimate System-level Performance

Assuming a balanced binary tree structure for system-level performance estimation

• Each parent has 2 child nodes; each node has 4 states (applications like gene expression networks require 3*)

• All leaf nodes are treated as evidence variables

Total number of nodes scaled from ~100 to ~1 million

* N. Friedman, M. Linial, I. Nachman, and D. Pe'er, “Using Bayesian networks to analyze expression data,” J. Comput. Biol., 7(3-4), pp. 601-20, 2000.

BN inference execution time is estimated for the worst case from the critical path delay (TBC) in each Bayesian Cell (BC) and the Switch Box communication delay (TSB)

For a Bayesian Network with n levels (active nodes in a time-step operate in parallel):

Texec = (2n-1) × TBC + Tcomm

[Figure: balanced binary tree with Level 0 (leaf nodes), Level 1, …, Level n-3, Level n-2, and the root at Level n-1]
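The execution-time estimate can be worked through numerically with a short Python sketch. Function names are hypothetical. The number of levels is derived on the assumption that the balanced binary tree is complete, so that n levels hold 2^n - 1 nodes. Tcomm is left as an input because the slides do not spell out how it is built from TSB, and the TBC value of 998.2 ns used in the example is the largest path delay reported in the talk's path-delay table.

```python
import math


def levels_for_nodes(num_nodes):
    """Levels n of a complete binary tree holding at least num_nodes nodes.

    Assumes n levels hold 2**n - 1 nodes (leaves at level 0, root at n-1).
    """
    return math.ceil(math.log2(num_nodes + 1))


def execution_time_ns(n_levels, t_bc_ns, t_comm_ns=0.0):
    """T_exec = (2n - 1) * T_BC + T_comm, all times in nanoseconds."""
    return (2 * n_levels - 1) * t_bc_ns + t_comm_ns


if __name__ == "__main__":
    for nodes in (127, 2**20 - 1):
        n = levels_for_nodes(nodes)
        # T_comm is set to 0 here only because its value is not given.
        t = execution_time_ns(n, t_bc_ns=998.2)
        print(f"{nodes} nodes, n={n}: T_exec >= {t / 1e3:.2f} us (excluding T_comm)")
```

Because active nodes at a level operate in parallel, the estimate grows with the number of levels, that is, logarithmically with the number of nodes.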


Evaluation Methodology for BN Composer Circuits

Delay, power measured using HSPICE simulations

• HSPICE behavioral macromodels built for S-MTJs

Area determined by number of S-MTJs + CMOS support

• Accounting for S-MTJ spacing to minimize magnetic interactions

[Figure: dipole coupling vs. S-MTJ center-center distance] Low coupling energy implies minimal magnetic interaction. Collaboration: Data provided by VCU group (Prof. Atulasimha, Prof. Bandyopadhyay)

[Figure: S-MTJ cell area, 500nm x 500nm]

Module | Critical Path Delay (ns) | Area (μm2) | Worst-case Power (μW)

Likelihood Estimation (Multiplication Composers x4) | 144 | 20 | 4.57

Belief Update (Multiplication Composers x4) | 144 | 20 | 4.57

Prior Estimation (Add-multiply Composers x4) | 137 | 50 | 11.24

Diagnostic Support (Add-multiply Composers x4) | 137 | 50 | 11.24

Prior Support (Multiplication Composers x8) | 144 | 40 | 9.14

Decomposer (x60) | 132.9 | 240 | 11.37

CMOS Op-Amp (x176) | 100 | 95.4 | 89.32

Switch Box | 10 | 398.8 | 0.85


Path Delays within Bayesian Cell for Inference

Path Label | Total Path Delay (ns)

1 | 746.8

2 | 754.2

3 | 998.2 (worst-case delay, TBC)

4 | 991.2

[Figure: all possible paths (1-4) for information flow through the Bayesian Cell of Node X, with neighboring nodes A, Y, Z; signals: λ from child, λ to parent, π from parent, π to child, BEL]


Implementation of BNs on Multi-core Processors

Hardware platform: Multi-core processor (100 cores) based on TILEPro from Tilera Corp.*

Lower bound execution time analytically estimated based on computation + memory requirements for inference using Belief Propagation algorithm

• Maximum idealized parallelism and operation cost, no network contention, no synchronization cost

Power and area from specifications

[Figure: Architecture of a Tilera 100-Core Processor]

* “Tile Processor Architecture Overview for the TILEPro Series”, Doc No. UG120, Feb. 2013, Tilera Corporation.

* C. Ramey, “TILE-Gx100 manycore processor: Acceleration interfaces and architecture”, Aug. 2011, Tilera Corporation.


Comparison vs. Multi-Core Processors

[Figure: delay comparison for Bayesian inference (log scale), 100-core processors vs. PEAR; annotated speedups over 100-core processors: 8686x, 80x, 12x]


Comparison vs. Multi-Core Processors (contd.)

[Figure panels: efficiency (power x delay, log scale), annotated 4788x; power comparison (log scale); area comparison]


Summary

Physically equivalent intelligent system for probabilistic reasoning using Bayesian Networks (BNs)

• Architected from ground-up and enabled by emerging nanotechnology

• Probability encoding based mixed-signal magneto-electric circuit framework

• Reconfigurable Bayesian Cell architecture

Up to 8686x inference speed-up, 4788x lower energy for BNs with ~1M nodes for resolution 0.1 vs. 100-core processor

Reasoning/learning tasks on complex problems with million variables made feasible

Embed real-time intelligence capabilities at smaller scale (100s of variables) everywhere


Thank you

Acknowledgements

Collaboration with Prof. Atulasimha, Prof. Bandyopadhyay, VCU

Sponsored by National Science Foundation (CCF-1407906, ECCS-1124714, CCF-1216614, CCF-1253370)