29
VLSI Digital Signal Processing EEC 281 Lecture 1 Bevan M. Baas Tuesday, January 9, 2018

VLSI Digital Signal Processing - UC Davis ECEbbaas/281/notes/Lecture01.pdf · VLSI Digital Signal Processing EEC 281 Lecture 1 ... •I assign a letter grade only for the final course

Embed Size (px)

Citation preview

VLSI Digital Signal Processing

EEC 281

Lecture 1

Bevan M. Baas

Tuesday, January 9, 2018

EEC 281, B. Baas3

Today

• Administrative items

• Syllabus and course overview

• My background

• Digital signal processing overview

• Read Programmable DSP Architectures, Part Iby E. A. Lee

EEC 281, B. Baas4

Course Communication

• Email– Urgent announcements

• Web page– http://www.ece.ucdavis.edu/~bbaas/281/

• Office hours– After lecture Tuesday

– Tentatively Wednesday afternoon

– After lecture Thursday

EEC 281, B. Baas5

Course Workload

• 4 unit graduate course

• This course requires significant effort and time– Multi-disciplinary field coverage

• DSP algorithms

• Digital processor architectures

• Arithmetic

– Utilizes robust industry-standard CAD tools (but we will make use of only the core essential features)

• Verilog

• Synthesis tool

• Matlab

EEC 281, B. Baas6

Course Overview

• EEC 281 web page contents– Reading materials

and references

– Hwk/Project descriptions

– Handouts– http://www.ece.ucdavis.

edu/~bbaas/281/

EEC 281, B. Baas7

Course Overview

• Canvas– Grades posted here

• Let me know if you ever see a score different than you expect

– Upload electronic portions of hwk/projects here

• Syllabus– Posted on course web page

EEC 281, B. Baas8

Lectures

• Ask questions at any time

• Please hold conversations outside of class

• Please silence phones

• Integrated Solid-State Circuits Conference (ISSCC) February 12–14– Quiz and guest lecture on Tue, Feb 13; or possibly a special

make-up lecture

• I assign a letter grade only for the final course grade

• I look at the final exams and course record of the class and assign two key dividing points: the A/A+ and (probably B/B+) boundaries, and assign course grades from there using equally-sized intervals

– No required numbers of any particular letter grades

– Absolute scores are not important; the boundaries shift accordingto the difficulty of the exams in any quarter

– Ignore any letter grades you might see on canvas

EEC 180A, B. Baas9

Letter Grade Assignments

A A+

A/A+

B/B+

B B+

Example with hypothetical data:

© B. Baas19

Critical Challenges Facing Industry

• Energy Efficiency

• Performance

• Software development cost and time

• Hardware development cost and time

• Opportunity: Critical workloads sometimes/frequently have

relatively simple tasks as critical kernels

(e.g., machine learning, digital signal processing,

multimedia, data record processing, pattern matching, etc.)

– Embedded (e.g., IoT)

– Mobile

– Datacenter

Number of Processors on a Single Die vs. Year

23

Note: Each processor capable of independent program execution

Academic

Industry

© B. Baas24

Processor Eras

• Transistor Era: the Intel 4004 was

the first commercial single-chip

microprocessor and it contained

2300 hand-drawn transistors

• Single/Multi-Processor Era: focus

on components of single processors

and multi-processors, which

generally scale well to only small

numbers of processors

• 1000-Processor Era: focus on

making systems scalable and

working with processors as

building blocks. The 32 nm

1000-processor KiloCore chip would contain approximately 2300-3700

processors if its area were the same as a 32 nm Intel Core i7 processor,

or 11,000 processors if its area were the same as an Nvidia GP100

EEC 281, B. Baas25

• Basic trends– Number of available devices: continually increasing– Energy dissipation per operation: decreasing too slowly

• There are a lot of ways to place and connect a billion transistors• The most efficient implementations (throughput, energy, area) will have:

– Processor sizes that capture computational kernels with few excess circuits

– Optimized clock frequencies and supply voltages matched to dynamic workloads

Future Fabrication Technologies

VDD1VDD3

VDD2

26

Optimal Computational Tile Size

• The most efficient implementations (energy, throughput, chip area) have: Processor sizes that capture computational kernels with few excess circuits

~~~ ~~~

Energy Effic.

Clock rate

Area Effic.

Unused or low

benefit-per-cost

circuits

Inter-

processor

interconnect

Tile Size

Single Processor

Cores per

area of

ARM A9

22 mW/GHz

per

0.055 mm^2 area

ARM Cortex-A9 1 1643 mW/GHz

Intel Atom Clover Trail 1.5 1120 mW/GHz

ARM Cortex-A15 7.8 212 mW/GHz

MIT RAW 8.3 198 mW/GHz

UC Davis KiloCore 74.7 22 mW/GHz

27

Some Benefits of Fine-Grain Many-Core

• Die drawn approximately

to scale

UC Davis

KiloCoreMIT

RAW

ARM

Cortex-A9

ARM

Cortex-A15

Intel Atom

Clover Trail Saltwell

KiloCore Chip

Slide 28

7.8

2 m

m

7.67 mm8 mm

8 m

m

Technology32nm IBM

PDSOI CMOS

Num. Procs. 1000

Num. Mems. 12

Num. Oscs. 2012

Die Area 64 mm2

Array Area 60 mm2

Transistors 621 Million

C4 Bumps 564 (162 I/O)

Package676 Pad

Flip-Chip BGA

KiloCore Chip

29

EEC 281, B. Baas31

Advancing CMOS Technologies

• Moore’s “Law” (Observation) was made in 1965 and notes that transistor density ~doubles every year (every 1.5 years now)

• "Cramming more components onto integrated circuits," Gordon Moore, Electronics, April 19, 1965.

© B. Baas32

Original data up to the year 2010 collected and plotted by M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond, and C. Batten

New plot and data collected for 2010-2015 by K. Rupp

New data added by B. Baas

Number of Logical Cores

Transistors(thousands)

EEC 281, B. Baas33

Digital Signal Processing

• Digital– Discrete time

– Discrete valued

• Signal– 1, 2, 3,… dimensional

• Processing– Analysis

– Synthesis

– Enhancement

EEC 281, B. Baas34

DSP Workloads

• Often “real-time”– Data producer and consumer can not be paused or held up

• Examples: antenna, controller, camera, video monitor,…

– Very strict minimum performance levels

– Performance above that minimum is often of little value

DSPsystem

Dataproducer

Dataconsumer

EEC 281, B. Baas35

DSP Workloads

DSPsystem

Dataconsumer

Synthesis. Ex: music keyboard

DSPsystem

Dataproducer

Analysis. Ex: anti-lock brakes

Maybe

MSamples/sec

Maybe

1 Sample/sec

EEC 281, B. Baas36

DSP Workloads

• Data stream can be considered infinite duration– Length of data stream >> any buffering

– Ex: high-pass filter, automotive collision-detection radar distance measurement system

DSPsystem

1

0

0

1

0

0

1

0

0

1

1

0

0

1

1

0

1

0

0

0

1

1

1

0

……… …

EEC 281, B. Baas37

DSP Workloads

• Digital signal processing

• Typically very numerically intensive– Lots of +, -, x

DSP

system1

0

0

1

0

0

1

0

0

1

1

0

0

1

1

0

1

0

0

0

1

1

1

0

……… …

EEC 281, B. Baas38

DSP Compared withAnalog Processing

• Digital signal processing– Compare with analog signal processing

• If possible in analog domain (at required precision), analog processing will likely require far fewer devices

• If possible in analog domain, either domain may produce the most energy-efficient solution

• Many algorithms are possible only with DSP (arbitrarily high precision, non-causal, …)

• DSP arithmetic is completely stable over process, temperature, and voltage variations

– Ex: 2.0000 + 3.0000 = 5.0000 will always be true as long as the circuit is functioning correctly

EEC 281, B. Baas39

DSP Compared withAnalog Processing

• Digital signal processing– Compare with analog signal processing

• DSP energy-efficiencies are rapidly increasing

• Once a DSP processor has been designed in a portable format (gate netlist, HDL, software), very little effort is required to “port” (re-target) the design to a different processing technology. Analog circuits typically require a nearly-complete re-design.

• DSP capabilities are rapidly increasing

– Analog A/D speed x resolution product doubles every 5 years

– Digital processing performance doubles every 18-24 months (6x to 10x every 5 years)

EEC 281, B. Baas40

Common DSP Applications

• Early applications

– Instrumentation

– Radar

– Communication

– Imaging

• Current applications

– Consumer audio, video

– Networking

– Telecommunications

– Machine learning

– Imaging

– Many many more…

EEC 281, B. Baas41

Consumer Products’ Trends

• Analog based Digital based– Music records, tapes CDs, MP3s

– Video VHS, 8mm DVD, Blu-ray, H.264, H.265

– Telephony analog mobile (1G) digital (4G, LTE,…)

– Television NTSC/PAL digital (DVB, ATSC, ISDB, …)

– Many products use digital data and “speak” digital: computers, networks, digital appliances

• Impacts– Processing

– Transmission

– Storage

– etc.

EEC 281, B. Baas42

Future Applications

• Very limited power budgets

• Require significant digital signal processing

EEC 281, B. Baas43

Key Design Metrics

1) Performancea) Throughput (high); e.g., 250 MSamples/secb) Latency (low); e.g., 2.7 µsec from first sample in -> first outc) Numerical precision

2) Chip area (cost); e.g., mm2 die area, area of standard cell netlist

3) Energy dissipation per workload, e.g., Joules per JPEG image

4) Design complexity

– Design time = lower performance

– Software more important as systems become more complex

5) Suitability for future fabrication technologies

– Many transistors

– Faulty devicesi) During manufacturing processii) device wear out due to effects such as NBTI