Sketching ( in ) Hardware

Jonathan Bachrach + Huy Vo + Andrew Waterman + Christopher Celio + Patrick Li + Ben Keller + Palmer Dabbelt + Sebastian Mirolo + John Wawrzynek + Krste Asanović + many more

faculty @ EECS UC Berkeley, cofounder @ Otherlab

July 21, 2013



I Have a Hardware Sketching Dream

I want to sketch
  arbitrary hardware building blocks
  bigger blocks from smaller blocks
  all the way down to digital logic

[Figure: example building blocks: pwm, radio, cpu, r/c servo, usb, i2c, mem ctlr, eth, quad decoder]

Sketching All The Way Down

Can sketch both audio scripts and engines
Can delay decision of what’s script and what’s engine

[Figure: layered stack: Audio Scripting, Audio Engine, DSP Code]

Can Sketch Truly Reusable Modules

sketch as succinct specification as generator
parameterized by numbers, types, functions
abstract data types
procedural construction

Open Source and Networkable

open source
complete library of all components
apt-get interface
common interface

[Figure: component library blocks: pwm, radio, cpu, r/c servo, lcd driver, usb, i2c, mem ctlr, eth, filter, quad decoder, accelerator]

Want Powerful + Inexpensive Logic Substrate

[Figure: two renderings of the same operator graph joined by "=>"; node labels: eat, sub, and, mux, not, rnd, or, lt, add, reg, eq]

fast clock rates
scalable parallelism
fast compilation
automatically mapped
logic, blocks, chips
sketchable

State of the Art

Specification
  C: too high level; not enough parallelism
  Verilog: clumsy and longwinded; minimal abstraction
  Simulink: limited parameterization; WYSIWYG wiring
  limited reusability! lots of manual steps!

Realization
  Network of DSPs: limited hardware choices; hard to meet timing
  FPGA: slow to compile for; no virtualization
  ASIC: complex; expensive
  tedious to program; slow to compile

today

chisel
  design hw like software
  soup to nuts

DREAMER
  new highly programmable hardware fabric
  fast, cheap and scalable

Chisel is ...

Best of hardware and software design ideas
Embedded within Scala language to leverage mindshare and language design
Not Scala -> Verilog
Algebraic construction and wiring
Hierarchical, object oriented, and functional construction
Abstract data types and interfaces
Bulk connections
Multiple targets
  Simulation and synthesis
  Memory IP is target-specific

[Figure: single source (Chisel) feeding multiple targets: CPU via C++, FPGA via Verilog, ASIC via Verilog]

The Scala Programming Language

Compiled to JVM
  Good performance
  Great Java interoperability
  Mature debugging, execution environments

Object Oriented
  Factory Objects, Classes
  Traits, overloading etc

Functional
  Higher order functions
  Anonymous functions
  Currying etc

Extensible
  Domain Specific Languages (DSLs)

Primitive Datatypes

Chisel has 3 primitive datatypes
  UInt – Unsigned Integer
  SInt – Signed Integer
  Bool – Boolean value

Can do arithmetic and logic with these datatypes

Example Literal Constructions

    val sel = Bool(false)
    val a = UInt(25)
    val b = SInt(-35)

where val is a Scala keyword used to declare variables whose values won’t change

Aggregate Data Types

Bundle
  User-extendable collection of values with named fields
  Similar to structs

    class MyFloat extends Bundle {
      val sign = Bool()
      val exponent = UInt(width=8)
      val significand = UInt(width=23)
    }

Vec
  Create indexable collection of values
  Similar to array

    val myVec = Vec(5){ SInt(width=23) }

Abstract Data Types

The user can construct new data types
  Allows for compact, readable code

Example: Complex numbers
  Useful for FFT, Correlator, other DSP
  Define arithmetic on complex numbers

    class Complex(val real: SInt, val imag: SInt)
        extends Bundle {
      def + (b: Complex): Complex =
        new Complex(real + b.real, imag + b.imag)
      ...
    }

    val a = new Complex(SInt(32), SInt(-16))
    val b = new Complex(SInt(-15), SInt(21))
    val c = a + b
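
The operator-overloading idea can be tried without any hardware library. The following is a minimal plain-Scala sketch, not Chisel: Int stands in for SInt, and the * and - operators are assumptions added for illustration (the slide only shows +).

```scala
// Plain-Scala analogue of the slide's Complex type; Int stands in for SInt.
final case class Complex(real: Int, imag: Int) {
  // Component-wise addition, as on the slide.
  def +(b: Complex): Complex = Complex(real + b.real, imag + b.imag)

  // Assumed extra operators, not shown on the slide.
  def -(b: Complex): Complex = Complex(real - b.real, imag - b.imag)
  def *(b: Complex): Complex =
    Complex(real * b.real - imag * b.imag, real * b.imag + imag * b.real)
}

object ComplexDemo {
  val a = Complex(32, -16)
  val b = Complex(-15, 21)
  val sum = a + b       // same operands as the slide's example
  val product = a * b
}
```

Because the arithmetic lives on the type, code such as an FFT butterfly can be written with ordinary infix operators.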

Example

    class GCD extends Module {
      val io = new Bundle {
        val a = UInt(INPUT, 16)
        val b = UInt(INPUT, 16)
        val z = UInt(OUTPUT, 16)
        val valid = Bool(OUTPUT) }
      val x = Reg(resetVal = io.a)
      val y = Reg(resetVal = io.b)
      when (x > y) {
        x := x - y
      } .otherwise {
        y := y - x
      }
      io.z := x
      io.valid := y === UInt(0)
    }

[Figure: GCD block with inputs a, b (UFix) and outputs z (UFix), valid (Bool)]
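
The register behaviour of this GCD circuit can be checked with a small software model. This is a plain-Scala sketch, not Chisel: Int stands in for the 16-bit UInt registers, one call to step plays the role of one clock edge, and the names State, step and run are hypothetical.

```scala
object GcdModel {
  // Register state of the circuit; z and valid mirror the io outputs.
  final case class State(x: Int, y: Int) {
    def valid: Boolean = y == 0   // io.valid := y === UInt(0)
    def z: Int = x                // io.z := x
  }

  // One clock edge: the when/otherwise update from the slide.
  def step(s: State): State =
    if (s.x > s.y) s.copy(x = s.x - s.y) else s.copy(y = s.y - s.x)

  // Clock the model until valid; returns (result, cycles taken).
  // Assumes a > 0 and b >= 0, otherwise the subtraction loop never ends.
  def run(a: Int, b: Int): (Int, Int) = {
    require(a > 0 && b >= 0, "model assumes a > 0 and b >= 0")
    var s = State(a, b)
    var cycles = 0
    while (!s.valid) { s = step(s); cycles += 1 }
    (s.z, cycles)
  }
}
```

For example, GcdModel.run(48, 18) walks through (30,18), (12,18), (12,6), (6,6), (6,0) before valid goes high.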

Valid Wrapper

    class Valid[T <: Data](dtype: T) extends Bundle {
      val data = dtype.clone
      val valid = Bool()
      override def clone = new Valid(dtype)
    }

    class GCD extends Module {
      val io = new Bundle {
        val a = UInt(INPUT, 16)
        val b = UInt(INPUT, 16)
        val out = new Valid(UInt(OUTPUT, 16))
      }
      ...
      io.out.data := x
      io.out.valid := y === UInt(0)
    }

[Figure: Valid bundle with fields data (T) and valid (Bool)]

Function Filters

    abstract class Filter[T <: Data](dtype: T) extends Module {
      val io = new Bundle {
        val in = new Valid(dtype).asInput
        val out = new Valid(dtype).asOutput
      } }

    class FunctionFilter[T <: Data](f: T => T, dtype: T) extends Filter(dtype) {
      io.out.valid := io.in.valid
      // f maps T => T, so it is applied to the data field, not the whole bundle
      io.out.data := f(io.in.data)
    }

[Figure: block f between a Valid input and a Valid output, each with data (UFix) and valid (Bool)]

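Clipping Filter

    def clippingFilter[T <: Num](limit: Int, dtype: T) =
      // explicit parameter: a nested _ would bind to max(...) only
      new FunctionFilter((x: T) => min(limit, max(-limit, x)), dtype)

[Figure: block clip between a Valid input and a Valid output, each with data (UFix) and valid (Bool)]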

Shifting Filter

    def shiftingFilter[T <: Num](shift: Int, dtype: T) =
      new FunctionFilter(_ >> shift, dtype)

[Figure: block shift between a Valid input and a Valid output, each with data (UFix) and valid (Bool)]

Chained Filter

    class ChainedFilter[T <: Num](dtype: T) extends Filter(dtype) {
      // built with the shiftingFilter and clippingFilter generators defined earlier
      val shift = shiftingFilter(2, dtype)
      val clipper = clippingFilter(1 << 7, dtype)
      io.in <> shift.io.in
      shift.io.out <> clipper.io.in
      clipper.io.out <> io.out
    }

[Figure: Valid input -> Shift -> clip -> Valid output; every port carries data (UFix) and valid (Bool)]
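
The filter slides compose a valid-tagged value with a data function. The following plain-Scala sketch models that composition in software. It is not Chisel: Int stands in for the hardware number type, andThen stands in for the <> port connections, and the shapes of Valid and FunctionFilter here are simplified assumptions.

```scala
object FilterModel {
  // Software stand-in for the Valid bundle: a value plus its valid flag.
  final case class Valid[T](data: T, valid: Boolean)

  // A filter transforms the data and passes the valid flag through unchanged.
  final case class FunctionFilter[T](f: T => T) {
    def apply(in: Valid[T]): Valid[T] = Valid(f(in.data), in.valid)

    // Chaining two filters plays the role of wiring out to in with <>.
    def andThen(next: FunctionFilter[T]): FunctionFilter[T] =
      FunctionFilter(f.andThen(next.f))
  }

  def clippingFilter(limit: Int): FunctionFilter[Int] =
    FunctionFilter(x => math.min(limit, math.max(-limit, x)))

  def shiftingFilter(shift: Int): FunctionFilter[Int] =
    FunctionFilter(_ >> shift)

  // Shift right by 2, then clip to +/-128, as in the chained filter slide.
  val chained: FunctionFilter[Int] =
    shiftingFilter(2).andThen(clippingFilter(1 << 7))
}
```

Applying FilterModel.chained to Valid(1000, true) shifts 1000 down to 250 and then clips it to 128, with the valid flag carried along.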

Functional Composition

    Map(ins, x => x * y)

[Figure: ins[0], ins[1], ins[2] each feed their own "* y" block]

    Chain(n, in, x => f(x))

[Figure: in passes through three f blocks in series]

    Reduce(ins, Max)

[Figure: ins[0] through ins[3] combined by three Max blocks]
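
The three composition patterns have direct software counterparts. The following plain-Scala sketch uses ordinary collections rather than Chisel nodes. The names mapAll, chain and reduceTree are hypothetical, and the balanced-tree shape of the reduction is an assumption about how a generator might arrange the Max blocks.

```scala
object Compose {
  // Map: apply the same operator to every input, side by side.
  def mapAll[A, B](ins: Seq[A])(f: A => B): Seq[B] = ins.map(f)

  // Chain: feed the input through n copies of f in series.
  def chain[A](n: Int, in: A)(f: A => A): A =
    (0 until n).foldLeft(in)((acc, _) => f(acc))

  // Reduce: combine all inputs with a binary operator, splitting the
  // inputs in half at each level so the operators form a balanced tree.
  def reduceTree[A](ins: Seq[A])(op: (A, A) => A): A = {
    require(ins.nonEmpty, "reduceTree needs at least one input")
    if (ins.length == 1) ins.head
    else {
      val (left, right) = ins.splitAt(ins.length / 2)
      op(reduceTree(left)(op), reduceTree(right)(op))
    }
  }
}
```

In a hardware generator each function call would instantiate a block, so the depth of the call tree corresponds to the depth of the logic.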

Generator

    // x followed by n-1 successively registered (delayed) copies of x
    def delays[T <: Data](x: T, n: Int): List[T] =
      if (n <= 1) List(x) else x :: delays(Reg(x), n-1)

    // multiply each tap by its delayed sample and sum the products
    def FIR[T <: Num](hs: Seq[T], x: T): T =
      (hs, delays(x, hs.length)).zipped.map( _ * _ ).reduce( _ + _ )

    // declares its own io, so it extends Module rather than Filter
    class TstFIR extends Module {
      val io = new Bundle{ val x = SInt(INPUT, 8); val y = SInt(OUTPUT, 8) }
      val h = Array(SInt(1), SInt(2), SInt(4))
      io.y := FIR(h, io.x)
    }
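
The behaviour of the generated FIR can be previewed with a cycle-by-cycle software model. This plain-Scala sketch is not Chisel: Int stands in for SInt, the delay registers are assumed to start at 0, and the names Fir, step and run are hypothetical.

```scala
object FirModel {
  // Streaming FIR: keeps the last taps.length samples, newest first,
  // like the chain of registers built by delays().
  final class Fir(taps: Seq[Int]) {
    require(taps.nonEmpty, "need at least one tap")
    private var line: Vector[Int] = Vector.fill(taps.length)(0)

    // One clock cycle: shift in the new sample, then multiply and accumulate.
    def step(x: Int): Int = {
      line = x +: line.init
      taps.zip(line).map { case (h, d) => h * d }.sum
    }
  }

  // Run a whole input sequence through a fresh filter instance.
  def run(taps: Seq[Int], xs: Seq[Int]): Seq[Int] = {
    val fir = new Fir(taps)
    xs.map(fir.step)
  }
}
```

Feeding a unit impulse through taps 1, 2, 4 returns the taps themselves, which is a quick sanity check on tap ordering.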

Chisel Audio Support

Flo and Dbl data types and ops
Add FP support in C++ backend
Audio harness with mics, speakers, and controls

Emulated Korg Monotron

Monotron is a portable classic analog synth
Built out of SawWave, LFO, mixer, and VCF
Use laptop / C++ for emulation
Use BCF-2000 USB based mixer for controls

Chiseled Korg Monotron

    class Monotron extends Module {
      val io = new Bundle {
        val swof = Dbl(INPUT);
        val lfof = Dbl(INPUT); val lfoi = Dbl(INPUT);
        val vcfc = Dbl(INPUT); val vcfq = Dbl(INPUT);
        val out = Dbl(OUTPUT);
      }
      val lfo = io.lfoi * SawWave(io.lfof);
      val vco = SawWave(io.swof + lfo)
      val vcf = VCF(io.vcfc, io.vcfq, vco);
      io.out := vcf
    }

[Figure: signal flow LFO -> VCO -> VCF with control inputs LFOF, LFOI, SWOF, VCFC, VCFQ; the LFO output is scaled by LFOI (*) and added (+) to SWOF]
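
The same LFO -> VCO -> VCF wiring can be modelled in software to hear or test the structure before committing it to hardware. The sketch below is plain Scala with Double samples, not Chisel. The phase-accumulator sawtooth, the frequency unit (cycles per sample) and the caller-supplied vcf function are all assumptions, since the slide does not show how SawWave or VCF are built.

```scala
object MonotronModel {
  // Assumed sawtooth: a phase accumulator mapped to the range [-1, 1).
  final class SawWave {
    private var phase = 0.0
    def step(freq: Double): Double = {
      val out = 2.0 * phase - 1.0
      val next = phase + freq
      phase = next - math.floor(next)   // wrap the phase into [0, 1)
      out
    }
  }

  // vcf(cutoff, resonance, sample) is a hypothetical stand-in for the VCF block.
  final class Monotron(vcf: (Double, Double, Double) => Double) {
    private val lfoOsc = new SawWave
    private val vcoOsc = new SawWave

    // One sample period, wired like the slide's module body.
    def step(swof: Double, lfof: Double, lfoi: Double,
             vcfc: Double, vcfq: Double): Double = {
      val lfo = lfoi * lfoOsc.step(lfof)
      val vco = vcoOsc.step(swof + lfo)
      vcf(vcfc, vcfq, vco)
    }
  }
}
```

Passing an identity filter and zero LFO intensity reduces the model to a bare sawtooth, which makes the oscillator easy to check in isolation.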

Wiring All The Way Down

Can write both audio scripts and engines in Chisel
Can choose which part is baked into hardware
For example, can map entire DSP to FPGA or ASIC

[Figure: layered stack: Audio Scripting, Audio Engine, DSP Code]

Chisel Graph Execution on DREAMER

spatial fabric of graph execution tiles
map piece of graph to each core
have network route intertile dataflow values
use dataflow scheduling to hide latency
coarser grained high level chisel instructions

[Figure: operator graph with node labels eat, sub, and, mux, not, rnd, or, lt, add, reg, eq]

DREAMER Workflow

[Figure: the operator graph redrawn at successive workflow stages joined by "=>"; stage labels as extracted: chisel, graph, netlist, then netlist, layout, execution]

DREAMER Properties

efficient to compile to – 10-100x faster than FPGA
efficient to run – nearly as fast as FPGAs
quick to probe any signal – no recompile necessary
easily scalable – multiple chips
easy to map large designs – auto FAME + nice DRAM interface

additional facilities
  debugging and tracing
  activity counters for energy
  fault injection

[Figure: operator graph with node labels eat, sub, and, mux, not, rnd, or, lt, add, reg, eq]

FPGA Mapping Opportunity

FPGAs have great density and economies of scale
program FPGA with DREAMER once
then throw away Xilinx tools
match DSP + BRAM density
map to few BRAMs using port scheduling
double pump BRAM for extra ports

[Figure: FPGA resources (BRAM, DSP, LUTs, Registers) and a tool-flow diagram with labels dreamer.scala, chisel, dreamer.v, xilinx tools, bitstream, Zynq, cpu.scala, chisel, cpu.dm, DREAMER, cpu emulator]

Chisel is Real

Digital Circuits Written in Chisel

Chisel is Open Source

chisel.eecs.berkeley.edu

BSD License
complete set of documentation
one goal is creation of library of high level and reusable components

Chisel Contains Library of Modules

queues, pipe, priority mux, decoders, encoders,
fixed-priority arbiters, round-robin arbiters,
popcount, scoreboards
ROMs, RAMs, CAMs, TLB, caches, prefetcher,
integer ALUs, LFSR, Booth multiplier, iterative divider
IEEE-754/2008 floating-point units

RISC-V

fifth Berkeley RISC ISA
open source specification
fast functional simulator
boots linux
lots of open source implementations

Teaching Computer Architecture with Sodor

[Figure: Sodor datapath diagrams for the 1 stage, 2 stage, and 5 stage pipelines (program counter, instruction and data memories, register file, decoder, ALU, branch and jump target generation, bypasses, co-processor registers); microcode and out of order variants are also listed]

UC Berkeley Classes

2x CS152 – Undergraduate Computer Architecture
  Sodor
  Multicore and Vector

2x CS250 – VLSI System Design
  Processors
  Image Processing

1x CS294-88 – Declarative Design Seminar
  High Level Specification
  Automated Design Space Exploration

Outside Projects

NOC generator – MSR
Monte Carlo Simulator – TU Kaiserslautern
Precision Timed Machine (PRET) – Edward Lee’s Group
Chisel-Q – Quantum Backend – John Kubiatowicz’s Group

Conclusions

sketching all the way down
powerful new hardware substrate
truly open source reusable hardware
printable electronics ready

funding
  Project Isis: DoE Award DE-SC0003624.
  Par Lab: Microsoft (Award #024263) and Intel (Award #024894) funding and by matching funding by U.C. Discovery (Award #DIG07-10227). Additional support came from Par Lab affiliates Nokia, NVIDIA, Oracle, and Samsung.
  ASPIRE: DARPA PERFECT program, Award HR0011-12-2-0016.