24
AnalogCircuitWorks 50% Power Reduction Through Automated Synthesis of an Asynchronous Microprocessor and Logic from Synchronous RTL Bill Ellersick, Analog Circuit Works Dylan Hand, Reduced Energy Microsystems Peter A. Beerel, University of Southern California April 19, 2018

50% Power Reduction Through Automated …...AnalogCircuitWorks 50% Power Reduction Through Automated Synthesis of an Asynchronous Microprocessor and Logic from Synchronous RTL Bill

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: 50% Power Reduction Through Automated …...AnalogCircuitWorks 50% Power Reduction Through Automated Synthesis of an Asynchronous Microprocessor and Logic from Synchronous RTL Bill

AnalogCircuitWorks

50% Power Reduction Through Automated Synthesis of an Asynchronous Microprocessor and Logic from

Synchronous RTL

Bill Ellersick, Analog Circuit Works

Dylan Hand, Reduced Energy Microsystems

Peter A. Beerel, University of Southern California

April 19, 2018

Page 2: 50% Power Reduction Through Automated …...AnalogCircuitWorks 50% Power Reduction Through Automated Synthesis of an Asynchronous Microprocessor and Logic from Synchronous RTL Bill

BIG PICTURE

Power-efficiency is key to mobile and datacenters

Industry shifting from raw performance to performance/W

REM and ACW are fabless semiconductor companies collaborating on a project using unique technology to build ultra-efficient processors

1 Copyright 2018 Reduced Energy Microsystems, Analog Circuit Works

Page 3: 50% Power Reduction Through Automated …...AnalogCircuitWorks 50% Power Reduction Through Automated Synthesis of an Asynchronous Microprocessor and Logic from Synchronous RTL Bill

Copyright 2018 Reduced Energy Microsystems, Analog Circuit Works

POWER EFFICIENT MICROPROCESSOR PROJECT

2

Synchronous µProcessor

µProcessor

Sync Core

Memory

SERDES

I/O

Sync.

Logic

AnalogLegend: Digital Interface

• Asynchronous logic to reduce power (same functionality from outside)

• Integrated power management with 5 zones to optimize different circuit types

Asynchronous µProcessor

µProcessor

Async Core

Memory

Power

SERDES

I/O

Async

Logic

Vdd<4:0>

Sync to Async Scripts

Integrated Power Mgmt

Optimize/Verify vs.Vdd

Page 4: 50% Power Reduction Through Automated …...AnalogCircuitWorks 50% Power Reduction Through Automated Synthesis of an Asynchronous Microprocessor and Logic from Synchronous RTL Bill

WORST-CASE PERFORMANCE

Instruction Time

1 2 3Traditional

SynchronousProcessor

Wasted Time

3

Block Size

Synchronous designs synthesized to meet worst-case

Area and power overhead required to hit performance target can be substantial

Power-

Optimized

Block Size

Extra

Area to

Meet

Timing

Copyright 2018 Reduced Energy Microsystems, Analog Circuit Works

Page 5: 50% Power Reduction Through Automated …...AnalogCircuitWorks 50% Power Reduction Through Automated Synthesis of an Asynchronous Microprocessor and Logic from Synchronous RTL Bill

Copyright 2018 Reduced Energy Microsystems, Analog Circuit Works

OVERVIEW

• Introduction

Asynchronous bundled data technology

• Supply voltage scaling

• Synchronous to asynchronous flow

• Summary

4

Page 6: 50% Power Reduction Through Automated …...AnalogCircuitWorks 50% Power Reduction Through Automated Synthesis of an Asynchronous Microprocessor and Logic from Synchronous RTL Bill

RESILIENT BUNDLED DATA

Static combinational logic, with error detecting latches to pass data to next block

Delay matches worst-case combinational logic delay 90% of the time

10% of the time, outputs change after Delay: Control holds off next stage

5 Copyright 2018 Reduced Energy Microsystems, Analog Circuit Works

Page 7: 50% Power Reduction Through Automated …...AnalogCircuitWorks 50% Power Reduction Through Automated Synthesis of an Asynchronous Microprocessor and Logic from Synchronous RTL Bill

1 2 3

AVERAGE-CASE PERFORMANCEInstruction Time

1 2 3Traditional

SynchronousProcessor

ACW/REM

Asynchronous Processor

Wasted Time

6

Block Size

Relaxing timing constraints allow some instructions to take longer

On average, computational performance does not change

Smaller logic blocks are more power and area efficient

Power-

Optimized

Block Size

Extra

Area to

Meet

Timing

Power-

Optimized

Block Size

Async

Control

Logic

Copyright 2018 Reduced Energy Microsystems, Analog Circuit Works

Page 8: 50% Power Reduction Through Automated …...AnalogCircuitWorks 50% Power Reduction Through Automated Synthesis of an Asynchronous Microprocessor and Logic from Synchronous RTL Bill

DATA DEPENDENCY

7

Instruction Time

Resilient BDa +

bc + d

e +

f

Time Saved

Block Size

Resilient BD allows cycle time to change depending on input data

Assumes long data dependent cycles are rare: average performance improves

Power-

Optimized

Block Size

Async

Control

Logic

Copyright 2018 Reduced Energy Microsystems, Analog Circuit Works

Page 9: 50% Power Reduction Through Automated …...AnalogCircuitWorks 50% Power Reduction Through Automated Synthesis of an Asynchronous Microprocessor and Logic from Synchronous RTL Bill

SELECTABLE DELAYS

Identify blocks with variable delay based on operation (manually)

Change delay of stage on cycle-by-cycle basis

Achieve averaging over time

8 Copyright 2018 Reduced Energy Microsystems, Analog Circuit Works

Page 10: 50% Power Reduction Through Automated …...AnalogCircuitWorks 50% Power Reduction Through Automated Synthesis of an Asynchronous Microprocessor and Logic from Synchronous RTL Bill

RESILIENT BD ADVANTAGESExploits variations in delay regardless of logical function enabling a more generic approach

Allows designers to cut margins as timing violations will be detected and corrected

Tolerates and averages random delay variations

9 Copyright 2018 Reduced Energy Microsystems, Analog Circuit Works

Page 11: 50% Power Reduction Through Automated …...AnalogCircuitWorks 50% Power Reduction Through Automated Synthesis of an Asynchronous Microprocessor and Logic from Synchronous RTL Bill

DELAY DISTRIBUTION

Delay

Num

ber

of O

pera

tions

Slower

Traditional design

runs at “worst-case” performance

REM design runs

at “average-case” performance

10

Worst-case operations occur infrequently. Wide variability in delay shows

potential to optimize designs for average case performance, saving time.

Distribution of Operation Delays in Synchronous Processor

Copyright 2018 Reduced Energy Microsystems, Analog Circuit Works

Page 12: 50% Power Reduction Through Automated …...AnalogCircuitWorks 50% Power Reduction Through Automated Synthesis of an Asynchronous Microprocessor and Logic from Synchronous RTL Bill

VOLTAGE SCALING

11

Dropping voltage trades time savings for power savings

Matches synchronous performance at a lower voltage

Instruction TimePower

HL

1 2 3

1 2 3

HL

Time Saved

Copyright 2018 Reduced Energy Microsystems, Analog Circuit Works

Page 13: 50% Power Reduction Through Automated …...AnalogCircuitWorks 50% Power Reduction Through Automated Synthesis of an Asynchronous Microprocessor and Logic from Synchronous RTL Bill

Copyright 2018 Reduced Energy Microsystems, Analog Circuit Works

TIME SAVINGS CAN TRADE FOR EVEN LARGER POWER SAVINGS

• When peak speed not needed, run at lower Vdd

• Power drops faster than speed as Vdd drops => efficiency increases• Chart shows 28nm normalized logic efficiency (Gops/W) vs. Vdd

• Blue curve includes efficient, integrated buck and charge pump converters

• Async designs use 2-3x less power than sync designs [D. Hand, et al, "Blade -- A Timing Violation Resilient Asynchronous Template," ASYNC 2015

12

Page 14: 50% Power Reduction Through Automated …...AnalogCircuitWorks 50% Power Reduction Through Automated Synthesis of an Asynchronous Microprocessor and Logic from Synchronous RTL Bill

EDA FLOW OVERVIEW

13

Design flow on top of existing synchronous flow

Custom Tcl / Perl scripts to glue between tools and generate constraints

Benefits from advancements in tools & synchronous IP (e.g. RISC-V core)

Sync RTL

(Verilog)

Async

Synthesis

Place & Route,

Optimization

Timing

Analysis

Verification

Analog

Verification

Copyright 2018 Reduced Energy Microsystems, Analog Circuit Works

Page 15: 50% Power Reduction Through Automated …...AnalogCircuitWorks 50% Power Reduction Through Automated Synthesis of an Asynchronous Microprocessor and Logic from Synchronous RTL Bill

Copyright 2018 Reduced Energy Microsystems, Analog Circuit Works

SYNC TO ASYNC• Automated conversion of synchronous netlist to asynchronous

• Choose asynchronous logic domains• Replace flip-flops with latches• Add custom cells: delay lines, error detecting latches, and control logic

14

Async TCL Scripts ReSynthesis

Async

Netlist

Sync

RTLSync Synthesis

Sync

Netlist

Sync

Timing.sdf

Custom Cell Design/Verif/Char

Async Constraints

File (Tcl)

Sync Constraints

File (Tcl)

Page 16: 50% Power Reduction Through Automated …...AnalogCircuitWorks 50% Power Reduction Through Automated Synthesis of an Asynchronous Microprocessor and Logic from Synchronous RTL Bill

Copyright 2018 Reduced Energy Microsystems, Analog Circuit Works

INTERACTIVE OPTIMIZATION, TIMING CLOSURE

Define virtual clock for each async controller to establish constraints

Periods, delays of clocks based on analysis of asynchronous circuit

Async Scripts analyze average case performance and generate constraintsIterate to obtain desired performance post-synthesis and place/routeDelay lines synthesized, modified during P&R

15

Place/Route

OptimizationAsync Scripts

Resynthesis

Modified

Netlist

Intermediate

Layout

Timing

ReportFinal

Layout

Target Hit

Target Missed

Performance

Target (Tcl)

Async

Netlist

Power Intent

(UPF)

Async Constraints

File (Tcl)Async Constraints

File (Tcl)

Page 17: 50% Power Reduction Through Automated …...AnalogCircuitWorks 50% Power Reduction Through Automated Synthesis of an Asynchronous Microprocessor and Logic from Synchronous RTL Bill

TECHNOLOGY SUMMARY

16

Power Savings

Smaller LogicLess Wasted

TimeLower Voltage

Copyright 2018 Reduced Energy Microsystems, Analog Circuit Works

Page 18: 50% Power Reduction Through Automated …...AnalogCircuitWorks 50% Power Reduction Through Automated Synthesis of an Asynchronous Microprocessor and Logic from Synchronous RTL Bill

Copyright 2018 Reduced Energy Microsystems, Analog Circuit Works

RECENT RESULTS

• 130nm computational logic ASIC• Flawless operation from 0.65 to 1.3V, silicon-verified

• 3.39M transistors in 10.35mm2 with 6 power domains

• Excellent power efficiency, matching simulation

• 28nm 3-stage OpenCore MIPS CPU• 14k gates

• 15,226 um2: 9% area increase vs. 14,000 um2 synchronous design

• 35% speedup vs. synchronous design

17

Page 19: 50% Power Reduction Through Automated …...AnalogCircuitWorks 50% Power Reduction Through Automated Synthesis of an Asynchronous Microprocessor and Logic from Synchronous RTL Bill

Copyright 2018 Reduced Energy Microsystems, Analog Circuit Works

UPCOMING SILICON

• REM’s upcoming 22nm Machine Vision processor• Custom architecture tailored for edge computing

• Expected order-of-magnitude smaller power envelope than existing solutions

• Heterogeneous processors for maximum versatility• Dual RISC-V processors

• Powerful Tensilica Vision DSP

• Proprietary Asynchronous Neural Network Accelerator designed in-house

18

Vision

DSPAI Core

R1 Memory System

RISC-V

CPU

RISC-V

CPUISP

DDR USB GPIO I2C I2S UART SPI CSI

Page 20: 50% Power Reduction Through Automated …...AnalogCircuitWorks 50% Power Reduction Through Automated Synthesis of an Asynchronous Microprocessor and Logic from Synchronous RTL Bill

Copyright 2018 Reduced Energy Microsystems, Analog Circuit Works

22FDX PROCESS

• 22nm Fully-Depleted Silicon-on-Insulator• FinFET performance at 28nm planar cost

• Energy efficient for logic, analog, RF

• Roadmap to 12nm FD-SOI

• Ultra-low power dissipation with supply voltage as low as 0.4V

• Body biasing can super-charge or throttle with no layout changes

Reference: https://www.globalfoundries.com/technology-solutions/cmos/fdx/22fdx

19

Page 21: 50% Power Reduction Through Automated …...AnalogCircuitWorks 50% Power Reduction Through Automated Synthesis of an Asynchronous Microprocessor and Logic from Synchronous RTL Bill

Copyright 2018 Reduced Energy Microsystems, Analog Circuit Works

WHY ASYNCHRONOUS LOGIC

• Eliminates inefficiencies in synchronous RTL methodology• Waits for worst case transistor, process, voltage, temperature, noise

• It may finally be time• ILLIAC II was an early asynchronous processor, in 1962

• In ILLIAC’s first 3 weeks of operation, it discovered 3 new Mersenne primes

• (In one of my early school projects, I built the WILLIAC, powered by paper tape)

• Asynchronous arithmetic units are widely used

• Intel recently announced the Loihi neural asynchronous processor

20

Page 22: 50% Power Reduction Through Automated …...AnalogCircuitWorks 50% Power Reduction Through Automated Synthesis of an Asynchronous Microprocessor and Logic from Synchronous RTL Bill

Copyright 2018 Reduced Energy Microsystems, Analog Circuit Works

WHY RESILIENT BUNDLED DATA

• Automatic conversion from sync RTL: no costly manual design

• Tolerates variations and recovers in realtime• Performance improves as supply drops (variations increase)

• Handles rapid changes in supply voltage gracefully

• 100s of timing constraints vs. 100K in previous async flows• Combinational blocks are similar to synchronous combinational blocks

• Meets urgent need for improved power efficiency

21

Page 23: 50% Power Reduction Through Automated …...AnalogCircuitWorks 50% Power Reduction Through Automated Synthesis of an Asynchronous Microprocessor and Logic from Synchronous RTL Bill

SUMMARY

22 Copyright 2018 Reduced Energy Microsystems, Analog Circuit Works

• REM and ACW are building highly power-efficient processors

• REM’s asynchronous technology enables average-case design• Dramatic power-efficiency gains

• Can optimize asynchronous flow with latest IP advances

• Analog Circuit Works enhances power efficiency, enables SoCs• DDR Memory I/F, CSI Camera I/F, USB

• Integrated power management optimizes each circuit

• We are open to partnerships on products

Page 24: 50% Power Reduction Through Automated …...AnalogCircuitWorks 50% Power Reduction Through Automated Synthesis of an Asynchronous Microprocessor and Logic from Synchronous RTL Bill

Copyright 2018 Reduced Energy Microsystems, Analog Circuit Works

ANALOG CIRCUIT WORKS