A 0.4-V UWB Baseband Processor...Outline UWB Specifications and System Architecture Baseband...

Preview:

Citation preview

ISLPED – Portland, OR – August 28, 2007

A 0.4-V UWB Baseband Processor

Vivienne Sze

Anantha P. Chandrakasan

Massachusetts Institute of Technology

OutlineOutline

� UWB Specifications and System Architecture

� Baseband Algorithm and Architecture

� Parallelism for Energy Efficiency� aggressive voltage scaling� reduced receiver on-time

� Chip Measurements

� Conclusions

UltraUltra --wideband (UWB) Radiowideband (UWB) Radio

distance1m 10m 100m

500Mb

50Mb

5Mb

500kb

WLAN

Wireless USB &

Multimedia

Locationing/Tagging

� Advantages of UWB communications include�High Data Rate�Excellent Multi-path Resolution�Low Interference

� Integrate UWB radios on battery operated devices

� Need an energy efficient UWB System

Narrowband

Narrowband

UWB

UWB

Possible UWB ApplicationsUWB versus Narrowband

freq

time

UWB System ArchitectureUWB System Architecture

3.1 10.6-61.3

-51.3

-41.3

Frequency [GHz]

Pow

er d

ensi

ty [d

Bm

]

14 Channel Frequency Plan

Figure courtesy of D.Wentzloff

Sampling Rate500 MSPS

Data Rate 100 Mbps

[F.S. Lee et al., ICUWB 2006]

Packet Structure (PHY layer)Packet Structure (PHY layer)

� Goal : Reduce overhead energy (acquisition)

� Majority of acquisition energy spent on computation of cross-correlation

PREAMBLE PAYLOAD

...

State 1

Channel

Estimation

State 1

Acquisition

State 2

Demodulation

Packet Begins

Receiver

Turns OFF

Receiver

Turns ON

40ns

10ns

... Payload BitsGold

Code

31-bits

Gold

Code

Gold

Code

Gold

Code

Gold

Code

CrossCross --Correlation ComputationCorrelation Computation

∑=

−×=619

0

][][][k

nkxkhny

� Points can be computed independently

Baseband ArchitectureBaseband Architecture

10100110

� Cross-correlation requires a fixed number of operations

� Reduce energy of each operation in order to reduce baseband energy

� Map operations to architecture that reduces system energy

Energy Efficiency Using ParallelismEnergy Efficiency Using Parallelism

� Ultra-Low Voltage Operation - Maintain Throughput (L)� Reduce supply voltage � Minimize energy of baseband processor

� Reducing Acquisition Time - Parallelized Computation (M)� Reduce receiver on-time � Minimize energy of entire receiver (system)

Exploit TWO forms of parallelism in Correlator Bank

Baseband Energy SavingsBaseband Energy Savings

� Correlators compute the cross-correlation function

� Voltage scaling to reduce energy per operation

� Parallelize to maintain throughput of 500 MSPS

� Designed and simulated in a 90-nm process

Correlator Architecture ∑ −×= ][][][ nkxkhny

0 0.2 0.4 0.6 0.8 110-3

10-2

10-1

100

VDD

(V)

Ene

rgy

per

oper

atio

n (n

orm

aliz

ed)

Leakage Energy

Dynamic Energy

Total Energy

Minimum Energy Point

Baseband Energy SavingsBaseband Energy Savings

� At the minimum energy point of 0.3 V � 9X energy reduction

� Set clock frequency to 25 MHz (preamble PRF)

� Parallelize by L=20 to maintain 500 MSPS throughput

� Need to raise voltage to 0.4 V to achieve 25 MHz

� At 0.4 V, reduce energy per operation by 5.8X

Minimum Energy Point

Total Energy

Active Energy

Leakage Energy

leakageactivetotal EEE +=

leakDDperiodleakage IVTE ∝2DDeffactive VCE ∝

Summary of MethodologySummary of Methodology

� Select the optimal degree of parallelism that� minimizes energy consumption� meets performance constraints

1. Determine VDDMEP (at minimum energy point)

2. Determine delay and throughput at VDDMEP

3. Divide required throughput by the throughput at VDDMEP to obtain the necessary degree of parallelism (L)

System Energy SavingsSystem Energy Savings

� Trade-off area for time by mapping to parallel architecture

� Reducing acquisition time allows for fewer number of Gold Code repetitions in the preamble

� RF front-end and ADC can be turned off earlier

� Energy savings across the entire system

Acquisition Energy ReductionAcquisition Energy Reduction

0.0

0.2

0.4

0.6

0.8

1.0

1.2

1 4 8 11 16 31

Parallelism (M)

Syn

chro

niza

tion

ener

gy

(nor

mal

ized

)

Digital Baseband ADCs Baseband Amplifiers RF front end

Acq

uisi

tion

Ene

rgy

Reduce RF front-end and ADC energy! 14.7X overall

reduction

[V. Sze, R. Blázquez, M. Bhardwaj, A. Chandrakasan, ICASSP 2006.]

Maximum Ratio Combiner (MRC)Maximum Ratio Combiner (MRC)

� Demodulation uses a 5-fingered RAKE receiver � A hard decision is made at the output MRC to resolve a bit

Parallelized Baseband ArchitectureParallelized Baseband Architecture

L = 20; M = 31Total # of correlators = 620

Correlator Bank

Demodulation

Correlator Sub-bank 1

Correlator 1

Correlator 2

Correlator L

Correlator Sub-bank 2

Retim

ing Block

5-finger RAKE MRC

5-finger RAKE MRC

5-finger RAKE MRC

5-finger RAKE MRC

………

……

Correlator L+1

Correlator L+2

Correlator 2L

Correlator Sub-bank M

Correlator (M-1)L+1

Correlator (M-1)L+2

Correlator ML

5-bit complex input fromA

DC

s

Serial to P

arallel

Peak D

etector

Bit D

ecoder

………

………

………

DemodulatedBits

Synchronization/Timing Control

ChannelEstimation

Cross-CorrelationFunction

npeak

n

Cross-CorrelationFunction

PeakDetector

npeak

EnergyEnergy --Area TradeoffArea Tradeoff

Energy-Area Tradeoff for Digital Baseband Processor

0

1

2

3

4

5

6

7

8

1 10 100 1000

Normalized Baseband Processor Area

Nor

mal

ized

Bas

eban

d P

roce

ssor

Ene

rgy

This Design

5.8x

9.6x

EnergyEnergy --Area TradeoffArea Tradeoff

Receiver Energy - Digital Baseband Area Tradeoff

0

2

4

6

8

10

12

14

16

0 2 4 6 8 10

Normalized Digital Baseband Processor Area

Nor

mal

ized

Rec

eive

r E

nerg

y

This Design

14.7x

8.6x

400400--mV mV Baseband ProcessorBaseband Processor

� STMicroelectronicsstandard-VT 90-nm CMOS process

� 281,260 gates

� Includes 620 Correlators & 4 Maximum Ratio Combiners

� Die area: 10.94mm2

� Active area 23%3.

3 m

m

3.3 mm

Correct Operation at 400 mVCorrect Operation at 400 mV

� Oscilloscope plot shows correct functionality at 400mV� Note: I/O has a 1V power supply

� Operating frequency of 25 MHz

� Four bits demodulated in parallel every 40-ns cycle

Data Ready

Output Clock

Output Data [1-0]

Energy Per BitEnergy Per Bit

Breakdown of Baseband Processor's Energy Per Bit

0

5

10

15

20

25

30

35

40

45

50

512b 1kb 2kb 4kb 8kb

Size of Packet (bits)

Ene

rgy

per

bit (

pJ)

Acquisition Energy

Demodulation Energy

16.8 pJ/bit

� Power Measurements� Acquisition 7 mW / Demodulation 1.7 mW

Power GatingPower Gating

� Reduce leakage power using a high VT “sleep” transistor to gate the leakage current when block is idle

� Breakeven time 137 µs

Instantaneous Power

-5

0

5

10

15

20

0 50 100 150 200 250 300 350 400

Time (us)P

ower

(mW

)

Sleep = 1Sleep = 0Sleep = 1

Recovery

Acquisition

Demodulation

ConclusionsConclusions

� Reduce energy to receive a UWB packet by� Scaling to optimum supply voltage� Mapping algorithm to parallel architecture

� Voltage scaling to ultra-low voltage (1 V � 0.4 V)� 5.8X reduction in energy per operation of correlators

� Reduced acquisition time � 14.7X reduction in receiver acquisition energy

� 400-mV 100 Mbps UWB Baseband Processor� 16.8 pJ/bit for demodulation� 20 pJ/bit for a 4-kb packet

� Demonstrate high performance at ultra-low voltage

� Can be applied to other high performance communication and signal processing applications

AcknowledgementsAcknowledgements

� Funding: DARPA and NSERC Fellowship

� Chip Fabrication: STMicroelectronics

Recommended