From Algorithms to Systems-on-a-Chip in a Semester E225C - 2000 Borivoje Nikolić


Fall 2000 - EE225C
• Course topics:
  – Communication systems oriented
  – Building blocks
    • Datapaths, arithmetic (adders, multipliers, MACs, dividers, CORDICs)
    • Parallelization, pipelining, unrolling, etc.
    • Transformations: FIR filters, Viterbi decoders
  – Systems
    • Finite wordlengths, ADCs, AGC, adaptive equalizers, sequence detection
    • Applied to xDSL, Gigabit Ethernet, wireless, disk drives
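As an illustration of one of the arithmetic building blocks listed above, here is a minimal sketch of a CORDIC rotation. It uses floating point for readability; a hardware datapath would replace the multiplications by powers of two with shifts and store the arctangent table in a small ROM. The function name and the iteration count are assumptions made for this example, not course material.

```python
import math

def cordic_rotate(x, y, angle, iterations=16):
    """Rotate the vector (x, y) by `angle` radians using CORDIC micro-rotations.

    Converges for |angle| up to roughly 1.74 rad.
    """
    # Constant gain accumulated by the micro-rotations (precomputable in hardware).
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))

    z = angle  # residual angle still to be rotated
    for i in range(iterations):
        d = 1 if z >= 0 else -1           # rotation direction for this step
        step = 2.0 ** -i                  # a right shift by i in fixed point
        x, y = x - d * y * step, y + d * x * step
        z -= d * math.atan(step)          # table lookup in hardware
    return x / gain, y / gain

cos30, sin30 = cordic_rotate(1.0, 0.0, math.pi / 6)
```

Rotating the unit vector (1, 0) yields the cosine and sine of the angle, which is why the same block also serves as a sine/cosine generator.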

Projects
• 18 students
• Two phases:
  – Block design
  – Putting a system together
• Simulink + Module Compiler + functional equivalence (VSS)

[Figure: design flow. Labels: Simulink model, Simulink, MCL code, Module Compiler, std. cell netlist, behavioral VHDL, VHDL simulation, test vectors, correspondence report.]

Design Projects
– Timing recovery for CDMA
– OFDM receiver with multi-antenna support
– 3G Turbo decoder
– LDPC iterative decoder
– Polyphase filter bank
– RAKE receiver
– Adaptive image-reject mixer
– Decoder for maskless lithography

OFDM Receiver
Similar to 802.11a system specification
• Blocks
  – Synchronization
  – FFT
  – Viterbi decoder
  – SVD
• System integration and simulation
• Students: Hayun Tang, Ning Zhang, Dejan Markovic, Yun Chiu
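To make the role of the FFT block concrete, here is a minimal sketch of one OFDM symbol passing through a two-tap channel. The 64-point transform and 48 data subcarriers come from the project's block diagram. The 16-sample cyclic prefix is the 802.11a value and is assumed here, as are the carrier placement, the channel taps, and all function names. The receiver is given ideal channel knowledge, which a real design would have to estimate.

```python
import cmath

N_FFT = 64    # IFFT/FFT size
N_DATA = 48   # data subcarriers per OFDM symbol
CP_LEN = 16   # assumption: 802.11a-style cyclic prefix length

def dft(x, sign=-1):
    """Naive O(N^2) DFT; sign=+1 gives the unnormalized inverse."""
    n = len(x)
    return [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(x):
    return [v / len(x) for v in dft(x, sign=+1)]

def ofdm_modulate(data_symbols):
    bins = [0j] * N_FFT
    bins[1:1 + N_DATA] = data_symbols      # hypothetical carrier placement
    time = idft(bins)
    return time[-CP_LEN:] + time           # prepend the cyclic prefix

def channel(samples, taps):
    """Linear convolution with a short multipath channel."""
    return [sum(taps[m] * samples[n - m] for m in range(len(taps)) if n - m >= 0)
            for n in range(len(samples))]

def ofdm_demodulate(samples, taps):
    body = samples[CP_LEN:CP_LEN + N_FFT]  # drop the cyclic prefix
    rx_bins = dft(body)
    h_bins = dft(list(taps) + [0j] * (N_FFT - len(taps)))
    # One complex division per subcarrier equalizes the channel.
    return [rx_bins[k] / h_bins[k] for k in range(1, 1 + N_DATA)]

qpsk = [1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]
tx_symbols = [qpsk[k % 4] for k in range(N_DATA)]
taps = [1.0, 0.4j]
rx_symbols = ofdm_demodulate(channel(ofdm_modulate(tx_symbols), taps), taps)
```

Because the prefix is longer than the channel memory, the linear convolution looks circular to the receiver, so each subcarrier sees only a single complex gain.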

OFDM receiver

SVD for multi-antenna

[Figure: four-antenna OFDM transceiver block diagram. Transmitter labels: coding and modulation, S/P, SVD 4×4, transpose, IFFT (64), cyclic prefix, P/S, D/A, RF. Receiver labels: RF, A/D, S/P, cyclic prefix removal, FFT (64), SVD 4×4, transpose, P/S, decoding and demodulation. Data paths are annotated with 48 and 64.]

CDMA Baseband
Students: Josie Ammer, Mike Sheets
• Design a 1.6 Mbps DSSS timing recovery unit
• Modulation
  – Length 31 PN code
  – QPSK symbol constellation
• System specifications
  – Maximum frequency offset of +/- 200 kHz
  – Minimum input SNR of +1 dB
  – Input is in-phase and quadrature samples at 200 MHz with 7 bits each
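A length-31 PN code can be produced by a 5-bit linear feedback shift register. The sketch below assumes the primitive polynomial x^5 + x^2 + 1 and an arbitrary nonzero seed, neither of which is given in the specification. It shows the property that timing acquisition relies on: the circular autocorrelation peaks at 31 when aligned and equals -1 at every other shift. The last lines work out the oversampling implied by the specification, taking the nominal 1.6 Mbps at face value.

```python
def pn_sequence(length=31, seed=(1, 0, 0, 0, 0)):
    """Generate +/-1 chips from the recurrence a[n+5] = a[n+2] XOR a[n]."""
    bits = list(seed)                       # any nonzero seed works
    while len(bits) < length:
        bits.append(bits[-3] ^ bits[-5])
    return [1 - 2 * b for b in bits]        # map bit 0 -> +1, bit 1 -> -1

def circular_autocorrelation(chips, shift):
    n = len(chips)
    return sum(chips[i] * chips[(i + shift) % n] for i in range(n))

chips = pn_sequence()
correlation = [circular_autocorrelation(chips, s) for s in range(31)]

# Oversampling implied by the specification (QPSK carries 2 bits per symbol).
bit_rate = 1.6e6
symbol_rate = bit_rate / 2
chip_rate = symbol_rate * 31                # 24.8 Mchip/s
samples_per_chip = 200e6 / chip_rate        # roughly 8 samples per chip
```

The sharp correlation peak is what a coarse timing acquisition stage can search for, and the roughly eightfold oversampling sets the timing resolution available to it.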

CDMA Baseband

[Figure: timing recovery block diagram. Blocks: CONTROLLER, Coarse Timing Acquisition, PLL, Frequency Offset Estimation and Fine Timing Acquisition, Rotate and Correlate, two MUXes, CONTROL. Data signals: streams, PN_PILOT, PN_DATA, SOFT SYMB, HARD SYMB. Control signals: en, start, clk, pilot det, sym strobe, sel, sel1, sel2, PN, s_start, s_end, clear, freq est, pilot mode, correction. Bus-width annotations include 8*14, 3*14, and 1*14.]

RAKE Receiver

[Figure: one finger. Labels: received samples R[n], Z⁻¹ delay elements, taps 0th to 3rd with conjugate coefficients C*0 to C*3, accumulators (Σ), outputs S[n] to S[n-3], I and Q paths, Ul. Bus-width annotations: 22, 24, 8.]

Correlator: the third multipath component can be observed here.

Dissipates 4 mW, runs at 25 MHz, and has an area of 0.4 mm².

Student: Tufan Karalar

Polyphase Filter Bank

[Figure: polyphase filter bank. Labels: 480 MHz, 15 MHz*.]

Students: Kevin Camera, Changchun Shi

Adaptive Image-Reject Mixer

[Figure: receiver block diagram. Labels: rf filter, LNA, LO1, LO2, I and Q paths, A/D, DSP, Mixer 2 Gain, Image Tone.]

Students: Gabriel Desjardins, Isaac Sever

[Figure: image rejection (dB), 45 to 70, versus phase mismatch (deg.), 0.02 to 0.18, with curves for 1 + ΔA = 1.001, 1.003, and 1.01.]

• Image-rejection ratio (IRR) is reduced by circuit mismatches
  – Phase mismatch in quadrature oscillators
  – Gain (ΔA) mismatch in I and Q paths
• Need 60 dB IRR
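The dependence of IRR on the two mismatches can be checked with the standard textbook expression for a quadrature image-reject architecture, IRR = (1 + 2A cos θ + A²) / (1 - 2A cos θ + A²), where A = 1 + ΔA is the I/Q gain ratio and θ the quadrature phase error. The slides do not state this formula; it is assumed here as the model, and the function name is illustrative.

```python
import math

def image_rejection_db(gain_ratio, phase_error_deg):
    """IRR in dB for an I/Q gain ratio A = 1 + dA and a phase error in degrees."""
    theta = math.radians(phase_error_deg)
    wanted = 1 + 2 * gain_ratio * math.cos(theta) + gain_ratio ** 2
    image = 1 - 2 * gain_ratio * math.cos(theta) + gain_ratio ** 2
    return 10 * math.log10(wanted / image)

irr_tight = image_rejection_db(1.001, 0.02)   # 0.1% gain mismatch
irr_loose = image_rejection_db(1.01, 0.02)    # 1% gain mismatch
```

Under this model a 0.1% gain mismatch with 0.02 degrees of phase error clears the 60 dB target, while a 1% gain mismatch falls well short of it, which is the motivation for tuning gain and phase adaptively.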

Adaptation via Spectral Estimation

• Two components
  – Discrete Fourier Transform, Finite State Machine
• FSM uses DFT output to make gain and phase tuning decisions

[Figure: block diagram. Labels: MIXER + A2D, DFT, FSM, GAIN TUNE, PHASE TUNE. Bus-width annotations: 6, 6, 13, 32.]

Adaptation via LMS

[Figure: block diagram. Labels: Mixer & ADC, I Channel, Q Channel, Gain Tune, Phase Tune.]

LMS update equation: G_A[n+1] = G_A[n] + … X[n]
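The general shape of an LMS tuning loop can be sketched with a single scalar gain. This is the generic LMS recursion g[n+1] = g[n] + mu * e[n] * x[n], not the project's actual update. The step size, the reference signal, and the function name are assumptions made for illustration.

```python
def lms_gain_tune(x, d, mu=0.05, g0=0.0):
    """Adapt a scalar gain g so that g * x[n] tracks the reference d[n]."""
    g = g0
    for xn, dn in zip(x, d):
        err = dn - g * xn        # instantaneous error
        g += mu * err * xn       # stochastic-gradient step
    return g

# Toy scenario: one path carries a 1% gain error that the loop should learn.
x = [1.0 if n % 2 == 0 else -1.0 for n in range(1000)]
d = [1.01 * xn for xn in x]
learned_gain = lms_gain_tune(x, d)
```

With a unit-power input the error shrinks by a factor of (1 - mu) per sample, so a smaller step size trades convergence speed for lower noise in the tuned value.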

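3G Turbo Decoder

Encoder, parallel concatenation

[Figure: encoder. Labels: input uk, Encode 1, Encode 2, output x with components xs, xp1, xp2.]

Decoder

[Figure: decoder. Labels: input y with components ys, yp1, yp2, SISO 1, SISO 2, an inverse block marked -1, output ûk.]

Students: Stephanie Augsburger, Chris Savarese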

SISO Block: SOVA Implementation

• Standard Viterbi algorithm plus soft output

• Reliability Measure Unit computes soft outputs

• Less complex than MAP

• Expected higher BER than MAP


SISO Block: MAP Implementation

• Double Viterbi algorithm: forward and backward

• Soft output is a Log-Likelihood Ratio (LLR)

• More complex than SOVA

• Expected BER improvement over SOVA
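For reference, the log-likelihood ratio of an information bit is conventionally defined as below. The slides do not give the formula; the symbols u_k (information bit) and y (received sequence) follow the labels in the turbo decoder diagram.

```latex
L(u_k) = \ln \frac{P(u_k = 1 \mid y)}{P(u_k = 0 \mid y)}
```

The sign of L(u_k) gives the hard decision and its magnitude gives the reliability, which is the information the two SISO blocks exchange between iterations.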


High-Speed Iterative Decoder

[Figure: system block diagram. Labels: Outer Encoder, Inner Encoder, Noise, Inner Decoder, Outer Decoder, an inverse block marked -1.]

Students: Yeo, Zlatanovici

Outer: Turbo (convolutional) or low-density parity-check code
Inner: Channel with MAP (BCJR) or SOVA decoder

MAP Decoder

BCJR (Bahl, Cocke, Jelinek, Raviv) algorithm

Bi-directional trellis decoding

LDPC Decoder

• Fine-grained pipelining

• Carry-save operations

• Shift registers for memory and automatic pipelining for address decoding

Bit-to-Check (each bit node connected to 4 check nodes)

Check-to-Block (each check node connected to 36 bit nodes)
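The two node degrees fix the design rate of the code. Counting edges from both sides of the bipartite graph gives 4 × (number of bit nodes) = 36 × (number of check nodes), so there is one check per nine bits. A short sketch of that arithmetic follows; the function name is illustrative, and the true rate can be slightly higher if some parity checks are linearly dependent.

```python
from fractions import Fraction

def ldpc_design_rate(bit_degree, check_degree):
    """Design rate of a regular LDPC code: 1 - (checks per bit)."""
    checks_per_bit = Fraction(bit_degree, check_degree)
    return 1 - checks_per_bit

rate = ldpc_design_rate(4, 36)   # Fraction(8, 9)
```

A rate this high is typical of storage-channel applications such as the disk drives mentioned in the course topics.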

Maskless Lithography

[Figure: data path. Storage disks, 640 GBit (all layers compressed 25 to 1) → 1.1 GBit/s → processor board, 64 GBit memory (single layer compressed 25 to 1) → 400 GBit/s → on-chip hardware: decompress → 10 TBit/s → writers.]

[Figure: chip floorplan, 10 mm by 20 mm. Labels: I/O, Demux + Buffer, parallel decompression paths (Decomp.), Writers.]

Writers built on ‘smart’ memory array

Decoding for Maskless Lithography

[Figure: decoder pipeline. Blocks: Stream Decode, Huffman Decode, Buffer, LZ Systolic Array, Writers. Annotations: Huffman decoding, 1D match decoding, 1 bit / cycle, 1 pixel / cycle, 400 GBit/s in, 10 TBit/s out, 10 kHz flash.]

Students: Vito Dai, Yasheh Shroff, Mason Freed
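The "1D match decoding" stage can be illustrated with an LZ77-style copy loop, in which a match token repeats pixels already written to the output. The token format below is hypothetical and chosen only for this sketch; the project's bitstream format is not described in the slides. The last line checks that the quoted input and output rates are consistent with 25-to-1 compression.

```python
def lz_decode(tokens):
    """Expand ("lit", value) and ("match", distance, length) tokens into pixels."""
    out = []
    for token in tokens:
        if token[0] == "lit":
            out.append(token[1])
        else:
            _, distance, length = token
            # Copy one pixel at a time so a match may overlap its own output.
            for _ in range(length):
                out.append(out[-distance])
    return out

pixels = lz_decode([("lit", 0), ("lit", 1), ("match", 2, 6)])

# 400 GBit/s into the decoder and 10 TBit/s to the writers imply 25:1 expansion.
expansion_ratio = 10e12 / 400e9
```

Copying one pixel per step mirrors the "1 pixel / cycle" annotation on the decoder pipeline, although the systolic-array implementation itself is not shown here.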

Separate Class Project

SCF 1: 0.25 µm CMOS, fully functional first time

SCF 2: Bob Brodersen, Mats Torkelsen, Nathan Chan, using the new design flow

What did we learn?

• Flow works surprisingly well

• Easy to learn

• Still fragile

• Need to add support for SRAM

• Need block-level timing analysis

• For faster designs will need regular placement