60
Tut orial Tut orial Hot Chips Hot Chips 01 01 Jan M. Rabaey Jan M. Rabaey BWRC University of California @ Berkeley http://www.eecs.berkeley.edu/~jan Silicon Architectures f or Silicon Architectures f or Wireless Systems Wireless Systems Part 1 Part 1

S ilicon Archit ect ures f or W ireless S yst ems – Part 1

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Tut orial Tut orial Hot ChipsHot Chips 01 01

Jan M. RabaeyJan M. Rabaey

BWRC

University of California @ Berkeley

http://www.eecs.berkeley.edu/~jan

S ilicon Archit ect ures f orS ilicon Archit ect ures f or

W ireless S yst ems W ireless S yst ems –– Part 1 Part 1

The The FibonacciFibonacci Law on Wireless Growth Law on Wireless Growth

Source: Goldman-Sachs

“The number of world-wide wireless

subscribers (in tens of millions)

grows as a Fibonacci series”

1999 2000 2001 2002 2003

46

57

67

?

From Handsets to Mobile DevicesFrom Handsets to Mobile Devices

Internet access the most important driver

(text, graphics, multimedia)

Berkeley Infopad,

One of the first wireless

internet applicance

1990-1996

A A Smorgasboard Smorgasboard of Choicesof Choices

NARROWBANDNARROWBAND WIDEBAND WIDEBANDCIRCUIT PACKETCIRCUIT PACKETVOICE DATAVOICE DATA

9.6Kbps9.6Kbps

GSM

IS136

IS95

PCS1900

DCS1800

IS54B

22ndnd generation generation

HCSD

IS95-B

GPRS

64-384Kbps64-384Kbps

IS136+

generation 2generation 2.5.5

384-2000Kbps384-2000Kbps

WCDMA

Edge

cdma2000

HDR

W-TDMA

ARIB-WCDMA

WCDMA-FDD

33rdrd generation generation

And the AlternativesAnd the Alternatives

• Metropolitan Wireless Networks

– Various proprietary solutions

• Ricochet (up to 138 Kb/sec)

• Flarion Flash-OFDM (> 384 kBits/sec)

• Wireless LANs– 802.11 (b,a); Hyperlan

– from 1 to 56 Mbit/sec

– Restricted to the 50 meter range (at present)

(Projected) Growth in 802.11 WLAN(Projected) Growth in 802.11 WLAN

0

2000

4000

6000

8000

10000

12000

14000

16000

18000

2001 2002 2003 2004 2005

1-2Mbps

10-11Mbps

20Mbps+

54 Mbps

Total

Units, k

Source: Cahners In-Stat 2001

The New InternetThe New Internet

Clusters

Massive Cluster

Gigabit Ethernet

Pre-1990:

Client-Server Systems

The 2000s:

Extending toward the Small

Enabled by integration

and wireless connectivity

The 1990s:

Conquering the World

The Network revolution

The Post-PC Era: The Distributed Approach toThe Post-PC Era: The Distributed Approach to

Information ProcessingInformation Processing

The emergence of ad-hoc wirelessThe emergence of ad-hoc wireless

networks networks –– the wire replacement the wire replacement

• Bluetooth, HomeRF

ü up to 800 kBit/sec

• Sensor networks

ü Low data-rates

The Evolving Wireless SceneThe Evolving Wireless Scene

Range

Dat

a R

ate

1m 10m 100m 1km 10km

1Kb

10Kb

100Kb

1Mb

10Mb

100Mb

Cellular (WAN)

3G Cellular

2.5 G Cellular

802.11 (LAN)

802.1a

Bluetooth (PAN)

Sensor networks

Metropolitan

More bit/secMore bit/sec

Compelling Issues in Wireless (1)Compelling Issues in Wireless (1)

Ubiquitous services put wireless spectrum at apremium

• Effective use of aether hampered by standardizationand fragmentation

• Current spectral efficiency far below theoreticallimits

• Emerging Solutions

– Adoption of better spectrum utilization techniques (interference

cancellation, multi-path fading mitigation and exploitation)

– multi-functional, adaptive systems

• But … huge appetite for computations

Evolution of MOPS Requirements inEvolution of MOPS Requirements in

CellularCellularFunction MOPS

Digital RRC Channel 3600

Searcher 2100

RAKE 2050

Maximal Ratio Combiner 24

Channel Estimator 12

AGC, AFC 10

Deinterleaver 15

Turbo Coder 90

TOTAL 7901

Single 384 kbps UTRA W-CDMA Channel

Source: IEEE Comm Theory Workshop, May 1999

1

10

100

1000

10000

GSM HSCSD GPRS 3G

DS

P M

IPS

Comm

Application

1

10

100

1000

10000

100000

0 2 4 6 8 10 12

SNR (db)

Rela

tive C

om

ple

xit

yThe Cost of Approaching ShannonThe Cost of Approaching Shannon’’s Bounds Bound

Courtesy Engling Yeo, UCB

The Bliss and Challenge of Error CodingThe Bliss and Challenge of Error Coding

8/9 Turbo, =4, N=4k

2/3 Turbo, =4, N=64k 1,2, and 3 iterations

1/2 Turbo, =4, N=64k 1,2,

and 3 iterations

8/9 LDPC, N=4k

1,3, and 5 iterations

8/9 Conv. Code,=3, N=4k

1/2 Conv. Code,=4, N=64k

2/3 Conv. Code,=4, N=64k

8/9 Capacity Bound

2/3 Capacity Bound

1/2 Capacity Bound

1/2 LDPC, N=107, 1100 iterations

for BER of 10-5

Dealing with Non-ideal Channels (e.g., fading)Dealing with Non-ideal Channels (e.g., fading)

• Multi-antenna approach exploitsmulti-path fading by sending dataalong good channels

• Results in large theoreticalimprovements in bandwidthefficiency for fading channels

• But…computationally hungry

)(tx

Array

Processing

1st path, 1 = 1

2nd path, 2 = 0.6

SNR (dB)0 5 10 15 20 250

5

10

15

20

25

30

Capacity (

bits/s

/Hz)

(4,4) With Feedback

(4,4) No Feedback(4,1) Orthogonal Design (1,1) Baseline

The Cost of Dealing with Non-ideal ChannelsThe Cost of Dealing with Non-ideal Channels

0

1000

2000

3000

4000

5000

6000

7000

Performance

MIPS

0.8 Mb/s

0.9 b/s/Hz

1.6 Mb/s

1.8 b/s/Hz

1.9 Mb/s

2.1 b/s/Hz

3.8 Mb/s

4.2 b/s/Hz

5.6 Mb/s

6.3 b/s/Hz

Data rate per user

Spectral efficiency

* Assume 25 MHz bandwidth and 28 users

Matched

Filter

Multi-User

Detector

OFDM +

Coding

OFDM +

Multi-Antenna

OFDM + Multi-

Antenna + Coding

Source: Ning Zhang, UCB

Shannon beats Shannon beats MooreMoore’’s s lawlaw

1

10

100

1000

10000

100000

1000000

10000000

1980

1984

1988

1992

1996

2000

2004

2008

2012

2016

2020

Algorithmic Complexity(Shannon’s Law)

Processor Performance (~Moore’s Law)

Courtesy: Ravi Subramanian (Morphics)

1G

2G

3G

Single-Chip DSPs are Lagging ...Single-Chip DSPs are Lagging ...

1

10

100

1000

10000

1980 1985 1990 1995 2000

Year

Megam

acs

DSP Trend: x 1.4/year

Moore’s law: x 1.58/year

Source: TI

DSPs

While algorithms are beating While algorithms are beating MooreMoore’’s s law!law!

Digital Processor Performance Digital Processor Performance

1.000E+00

1.000E+01

1.000E+02

1.000E+03

1.000E+04

1.000E+05

1.000E+06

1.000E+07

1.000E+08

1.000E+09

1.000E+10

1.000E+11

1.000E+12

1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51

memory

processors

1960 1970 1980 1990 2000 2010

100

10

1

0.1

0.01

0.001

No

rmali

zed

p

rocesso

r sp

eed

microprocessor/DSP

mA/ MIP

computational efficiency

Sources: Proc ISSCC, ICSPAT, DAC, DSPWorld

memory

Tra

nsis

tors

/ch

ip

Courtesy of Ravi Subramanian (Morphics)

The Law of Diminishing ReturnsThe Law of Diminishing Returns

• More transistors are being thrown atimproving general-purpose CPU and DSPperformance

• Fundamental bounds are being pushed– limits on instruction-level parallelism

– limits on memory system performance

• Returns per transistor are diminishing– new architectures realizing only 2-3 instructions/clock

– increasingly large caches to hide DRAM latency

Some observationsSome observations• Von-Neuman style instruction set architectures

were perceived when switching devices andinterconnections were extraordinarily expensive,and multiplexing-in-time provided the mosteconomical solution– Intel 4004: 2000 transistors, 1 MHz clock frequency, 1 metal

layer

• This led to the “clock-speed” affixation, which infact is only a secondary measure of performance

• Power is rapidly becoming a limiting factor– Newest processors are including thermal sensors and

automatic slow-down (throttling) using pipeline bubbles andnop’s to combat overheating and meltdown

Compelling Issues in Wireless (2)Compelling Issues in Wireless (2)

“The Last Meter Problem”Ubiquitous wireless networking requiressteep reduction in cost and energydissipation

• To be acceptable, radio cost has to be below 1$

• Frequent battery replacement on 100’s of devicesunacceptable

• Technology not likely to be of major help

Energy to Play a Major RoleEnergy to Play a Major Role

1

10

100

1000

10000

100000

1000000

10000000

1980

1984

1988

1992

1996

2000

2004

2008

2012

2016

2020

Algorithmic Complexity(Shannon’s Law)

Processor Performance (~Moore’s Law)

Battery Capacity1G

2G

3G

Courtesy: Ravi Subramanian (Morphics)

Energy Trends in DSPsEnergy Trends in DSPs

C15 @ 3v

C52 @ 3v

32010 @ 5v

C25 @5v

ATT16xx @2.7V

1v DSP

C52 @ 5v

0.5v DSP

2v DSP

C5x @ 2v

C15 @ 5v

0.001

0.01

0.1

1

10

100

1000

1982

1984

1986

1988

1990

1992

1994

1996

1998

2000

2002

2004

2006

2008

Year

mW

/MIP

S

DSP

Power

Gene's

Law

Factor 1.6 reductionper year

Source: TI

Energy Trends in Energy Trends in DSPsDSPsGene (Frantz)Gene (Frantz)’’s Laws Law

Gene’s Law

DSP Power

1,000

100

10

1

0.1

0.01

0.001

0.0001

0.00001

mW

/MIP

S

1982

1984

1986

1988

1990

1992

1994

1996

1998

2000

2002

2004

2006

2008

YearSource: Gene Frantz (TI)

Energy to Play a Major RoleEnergy to Play a Major RoleA holistic perspectiveA holistic perspective

Energy = upper bound on the amount of availablecomputation

– Total Energy of Milky Way Galaxy: 1059 J

– Minimum switching energy for digital gate(1 electron@100 mV): 1.6 10-20 J (limited by thermalnoise)

– Upper bound on number of digital operations: 6 1078

– Operations/year performed by 1 billion 100 MOPScomputers: 3 1024

– Energy consumed in 180 years assuming a doublingof computational requirements every year.

Putting energy in perspectivePutting energy in perspective

• Energy cost of digital computation– 1999 (0.25µm): 1pJ/op (custom) … 1nJ/op (µproc)

– 2004 (0.1µm): 0.1pJ/op (custom) … 100pJ/op (µproc)

• Factor 1.6 per year; Factor 10 over 5 years

• Assuming reconfigurable implementation: 1 pJ/op

• Energy cost of communication– 1999 Bluetooth

• 1 nJ/bit transmission energy (thermal limit 30 pJ/bit)

– 2.4 GHz band, 10 m distance

• Overall energy: 170 nJ/bit reception / 150 nJ/bit transmission (!)

• Standby power: 300 µW

– 2004 Radio (10 m)

• Only minor reduction in transmission energy

• Reduce transceiver energy with at least a factor 10-50

The Changing MetricsThe Changing Metrics

• Power and/or Energy have becomedominant drivers

– Limiting factor for performance and reliability in

wall-plugged applications

– Enabler for wide-spread use of distributed

computing and data access

• Energy reduction requires jointoptimization process between applicationand implementation

The Changing MetricsThe Changing Metrics

• Design complexity, and “context complexity” issufficiently high that design verification is amajor limitation on time-to-market

• Cost of fabrication facilities and mask making hasincreased significantly

– NRE cost of new design has increased significantly

• Physical effects (parasitics, reliability issues,power management) are increasingly significantin the design process

– These must now be considered explicitly at the circuit level

The Changing MetricsThe Changing Metrics

Flexibility

Power

Cost

Performance as a Functionality ConstraintPerformance as a Functionality Constraint

((““Just-in-Time ComputingJust-in-Time Computing””))

Challenges in Single-Chip Radio DesignChallenges in Single-Chip Radio Design

Physical

+ RF

Mac/

Data Link

NetworkApplicationDataData

Data Acquisition

DataEncoding

DataFormatting

Mod/Demod

UI

ControlControl

Synchron-ization

SlotAllocation

CallSetup

Data and Time Granularity

nsecµsecmsecsecbitspacketsstreamssource data

RadioRadio

The Software RadioThe Software Radio

A/D Converter

D/A ConverterDSP

• Idea: Digitize (wideband) signal at antenna anduse signal processing to extract desired signal

• Leverages of advances in technology, circuitdesign, and signal processing

• Software solution enables flexibility andadaptivity, but at huge price in power and cost

• 16 bit A/D converter at 2.2 GHz dissipates 1 to 10W

The Mostly Digital RadioThe Mostly Digital Radio

DigitalBasebandReceiver

RF input(fc = 2GHz)

LNA

cos[2π(2GHz)t]

RF filter

chip boundary

I (50MS/s)

Q (50MS/s)

A/D

A/D

sin[2π(2GHz)t]

Analog Digital

The Opportunities The Opportunities ……• Scaling of technology, of course

– Performance doubles every 18 months

– Energy reduced by 10 every 5 years

• The system-on-a-chip– Integration of heterogeneous functions on a single die leads

to new solutions

• Novel architectures– Processor architectures exploiting concurrency (e.g. VLIW)

– Reconfigurable hardware

– Network-on-a-chip

• Novel circuit solutions– Voltage as a design variable

• Novel design methodologies– Communication-based Design

– Platform-based Design

Reconfigurable

DataPath

Reconfigurable

State Machines

Embedded uP

+ DSPs

FPGA

Dedicated

DSP

The Ideal The Ideal ““Radio-on-a-ChipRadio-on-a-Chip”” Platform Platform

• Heterogeneous

• Supports massive

concurrency

• Matches the computational

model

• Operates at minimum supply

voltage and clock frequency

• Provides flexibility only where

needed and desirable and at

the right granularity

Combines performance, flexibility and energy-efficiency

The System-on-a-Chip NightmareThe System-on-a-Chip Nightmare

““Femme seFemme se coiffant coiffant””

Pablo Ruiz PicassoPablo Ruiz Picasso

19401940

The System-on-a-Chip NightmareThe System-on-a-Chip Nightmare

Bridge

DMA CPU DSP

Mem

Ctrl.

MPEG

CI O O

System Bus

PeripheralBus

Control Wires

CustomInterfaces

The “Board-on-a-Chip”

Approach

Courtesy: Sonics, Inc

Current Integrated Wireless TransceiversCurrent Integrated Wireless Transceivers

AD

Timing

recovery

phone

bookJava

VM

ARQ

Keypad,

Display

Control

FiltersAdaptiveAntenna

Algorithms

Equalizers MUD

Accelerators

(bit level)

analog digital

Logic

A “Board-on-a-Chip” Approach

DSP

µprocASIC

Example: Single-Chip BluetoothExample: Single-Chip Bluetooth

Source: Alcatel, ISSCC 2001

System-on-a-ChipSystem-on-a-Chip

A Renaissance in DesignA Renaissance in Design

ApplicationsApplicationsMultimediaConsumerCommunications

ImplementationImplementationFabricsFabricsSilicon reuseSilicon networks

DesignDesignMethodologyMethodologyHard+Soft

Aart De Geus

DAC’99

ConvergenceConvergence

A Central Theme:A Central Theme:

Raising the Reuse FactorRaising the Reuse Factor

Reuse comes in generations

Generation Reuse e lement Status

1st Standard ce lls Well established

2nd IP blocks Be ing int roduced

3rd Architec ture Emerging

4th IC Ear ly research

Source: Theo Claasen (Philips) – DAC 00

Platform-Based DesignPlatform-Based Design

• A platform is a restriction on the space of possibleimplementation choices, providing a well-defined abstractionof the underlying technology for the application developer

• New platforms will be defined at the architecture-micro-architecture boundary

• They will be component-based, and will provide a range ofchoices from structured-custom to fully programmableimplementations

• Key to such approaches is the representation ofcommunication in the platform model

““Only the consumer gets freedom of choice;Only the consumer gets freedom of choice;

designers need freedomdesigners need freedom fromfrom choicechoice””

((OrfaliOrfali, et al, 1996, p.522), et al, 1996, p.522)

Source:R.Newton

Hardware PlatformsHardware Platforms

Hardware Platform: not only a fully specified SoC butalso a family of architectures that share somecommon feature:

A Hardware Platform is a family of architectures thatsatisfy a set of architectural constraints imposed toallow the re-use of hardware and softwarecomponents.

The stronger the constraints the more component re-use but stronger constraints imply fewerarchitectures to choose from!

Hardware Platforms Not Enough!Hardware Platforms Not Enough!

• Hardware platform has to be abstracted

• Interface to the application software is API

• Software layer performs abstraction:

– Programmable cores and memory subsystem with

RTOS

– I/O subsystem via Device Drivers

Software PlatformsSoftware Platforms

Output DevicesInput devices

Hardware Platform

I O

Hardware

Software

network

Software Platform

Application Software

Platform API

Software Platform

RT

OS

BIOS

Device Drivers

Net

wo

rk

Co

mm

un

icat

ion

The Platform TensionThe Platform Tension

Architectural Space

Application Space

The Platform ApproachThe Platform Approach

Architectural Space

Application Space

Application Instance

Platform Instance

System

Platform

Platform Design Space

Exploration

Platform

Specification

Source:

Alberto Sangiovanni-Vincentelli

TM-xxxxD$

I$

Tr iMedia CPU

DEVICE I /P BLOCK

DEVICE I /P BLOCK

DEVICE I /P BLOCK

.

.

.

DVP System Silicon

VLIW MediaProcessor :• 100 to 300+MHz• 32-bit or 64-bit

Nexper iaSystem Busses• PI bus• Memory bus• 32-128 bit

PI

BU

S

SDRAM

MMI

DV

P M

EM

OR

YB

US

DEVICE I /P BLOCK

PRxxxxD$

I$

MIPS CPU

DEVICE I /P BLOCK

.

.

.

DEVICE I /P BLOCK

PI

BU

S

Genera lPurpose RISCProcessor• 50 to 300+ MHz• 32-bit or 64-bitLibra ry ofDevice Blocks• Image

coprocessors• DSPs• UART• 1394• USB

•…and more

Tr iMedia TMMIPS TM

Example: PhilipsExample: Philips Nexperia NexperiaTMTM

DVP DVP

Flexible a rchitec ture for digita l video applica t ion sSource: Theo Claasen (Philips) – DAC 00

NexperiaNexperiaTMTM

ScalabilityScalability

VLIW

SDRAM

MMI

Tr iMedia CPU + Device blocks when

control func t ions a reminima l

MIPS CPU +Tr imedia CPU

replac ing some Device blocks

RISC

SDRAM

VLIWMMI

• Single architecture, multiple product configurations– Processor core options - TM32, TM64, MIPS32, MIPS64 ...

– Device block options

• Highly programmable to weakly programmable

MIPS CPU +Device blocks +

Software

RISC

SDRAM

MMI

Source: Theo Claasen (Philips) – DAC 00

An Example of an InstantiationAn Example of an Instantiation

Philips Nexperia NX-2700

A programmable HDTV

media processor

Combines Trimedia VLIW with

Configurable media co-processors

Pleiades:Pleiades:

Digital Wireless PlatformDigital Wireless Platform

AD

Timing

recovery

phone

book

Java

VM

ARQ

Keypad,

Display

Control

FiltersAdaptive

Antenna

Algorith

ms

Equalizers MUD

Accelerators

(bit level)

analog digital

DSP core

uC core

(ARM)

Logic

Dedicated Logic

and Memory

Source: Berkeley Wireless Research Center

The Architectural ChoicesThe Architectural Choices

µP

Prog Mem

MACUnit

AddrGenµP

Prog Mem

µP

Prog Mem

Satellite

ProcessorDedicated

Logic

Satellite

Processor

Satellite

Processor

GeneralPurpose

µP

Software

DirectMapped

Hardware

HardwareReconfigurable

Processor

ProgrammableDSP

Fle

xib

ility

1/Efficiency

An Attractive Option:An Attractive Option:

Multi-Processor System-on-a-ChipMulti-Processor System-on-a-Chip

Courtesy: Chris Rowen, Tensilica

Copyright Tensilica, Inc 2001

“The New NAND gate”

The Energy-Flexibility GapThe Energy-Flexibility Gap

Embedded ProcessorsSA1100.4 MIPS/mW

ASIPsDSPs 2 V DSP: 3 MOPS/mW

DedicatedHW

Flexibility (Coverage)

En

ergy E

ffic

ien

cy

MO

PS

/mW

(or

MIP

S/m

W)

0.1

1

10

100

1000

ReconfigurableProcessor/Logic

Pleiades10-80 MOPS/mW

Communication-Based DesignCommunication-Based Design

DSPCPU

MPEG

MemCtrl.

C

Bridge

Orthogonalizing Communication fromOrthogonalizing Communication from

BehaviorBehavior

• Historically lots of work on Behavior– hierarchy well established

– several descriptions available (with variable levels of precision)

– synthesis available

• Communication less well investigated– hard to separate from behavior, usually intertwined

– telecomm protocols are the best existing example

• Need to understand Formalism, Abstraction, andDecomposition for communication

DEFINED BY MODELS OF COMPUTATION!

Communication-based DesignCommunication-based Design

DSP MPEGCPUDMA

C MEMI O

An Integrated Radio Processor (TCI)An Integrated Radio Processor (TCI)

Embedded

Processor

MemoryMemory

Sub-systemSub-system

Baseband

Processing

FixedFixed

Protocol StackProtocol Stack

Programmable

Protocol Stack

The The ““Network-on-a-ChipNetwork-on-a-Chip””

N Inputs

B Buses

M Outputs

Multi-Bus

cluster

cluster

cluster

Hierarchical MeshMesh

Module

Platform ExplorationPlatform Exploration

The Y-Chart ApproachThe Y-Chart Approach

ApplicationsApplicationsArchitecture

Instance

Mapping

Applications

Performance

Analysis

Performance

Numbers

Source: B. Kienhuis

• Technology scaling is redefining the term “complexity”

• System-on-a-Chip fosters renaissance in processorarchitecture, opening the door for new models andcombinations thereof:Platform and Communication Based Design

• SOC for wireless driven by new set of metrics: how tosimultaneously optimize flexibility, cost, energy, andperformance?

• Application-Architecture Exploration is Focal Part ofImplementation Methodology

Summary and PerspectiveSummary and Perspective