Ultra-Low-Power Software-Defined Radio for LTE Wireless Baseband: an embedded systems grand challenge. Chris Rowen, Founder and CTO, Tensilica. 18 March 2010.


Page 1: Ultra-Low-Power Software-Defined Radio for LTE Wireless Baseband

Ultra-Low-Power Software-Defined Radio for LTE Wireless Baseband:
an embedded systems grand challenge

Chris Rowen
Founder and CTO, Tensilica

18 March 2010

Page 2: Ultra-Low-Power Software-Defined Radio for LTE Wireless Baseband

Two Curves

[Figure: 2009 ITRS feature sizes in nanometers, 1995 to 2025. Series: MPU/ASIC Metal 1 half pitch, MPU printed gate length, MPU physical gate length.]

[Figure: Mobile Broadband Terminal Units, in millions of units, 2006 to 2013. Series: smartphones; notebooks, netbooks, MIDs.]

Copyright © 2009, Tensilica, Inc.

Page 3: Ultra-Low-Power Software-Defined Radio for LTE Wireless Baseband

Two Curves

Why Are These Curves Exciting?
• Incredible density means true single-chip integration
• High volume on range of semiconductor designs
• New opportunities for processors

Why Are These Curves Frightening?
• Moore’s Law enables high density, but how do we cope with complexity and rate of change?
• Many required functions, but few viable single-function chips: silicon platforms must integrate or die
• So many transistors, so little battery capacity: steeper improvement in density than energy efficiency (new transistor types?)

Page 4: Ultra-Low-Power Software-Defined Radio for LTE Wireless Baseband

A Grand Challenge Problem: LTE Handsets

Design Goals
• High data rates: 150Mbps DL/50Mbps UL
• High spectral efficiency with
  – Orthogonal Frequency Division Multiplexing
  – Multiple-Input Multiple-Output
• Scalable bandwidth: 1.25MHz to 20MHz
• Both Frequency-Division and Time-Division Duplex
• All IP Networks

Implementation Needs
• Low silicon cost: < 20mm2 for digital PHY
• Low power: < 2mW/Mbps DL

Paradox: Reach performance and power goals while building a programmable system
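The power target is stated per Mbps, so it implies an absolute ceiling at the peak downlink rate. A quick check, taking the slide's 2mW/Mbps and 150Mbps figures at face value (the function name is illustrative, not from the talk):

```python
def dl_power_budget_mw(rate_mbps: float, mw_per_mbps: float) -> float:
    """Upper bound on digital PHY downlink power, in milliwatts."""
    return rate_mbps * mw_per_mbps


# Figures from the slide: 150 Mbps peak DL, less than 2 mW per Mbps.
budget = dl_power_budget_mw(150.0, 2.0)
print(f"Peak DL power budget: under {budget:.0f} mW")
```

At the peak rate this works out to under 300 mW for the digital PHY downlink path.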

Page 5: Ultra-Low-Power Software-Defined Radio for LTE Wireless Baseband

The World’s Quickest Introduction to LTE

[Figure: LTE baseband block diagram between the apps/IP domain and the RF.
Transmit blocks: PDCP Encrypt; RLC, MAC; CRC-Scrambler, Bit Interleave, HARQ, Rate Match, FEC Encode; PUCCH Signal Gen; Ref Signal Gen; Resource Block Map; FFT-DFT; Frequency Rotator; Pulse Shaper FIR.
Receive blocks: Receive RF; Frequency Rot; Synchronization; FFT; Channel Estimation; on control: Decoder (ML), Soft Demapper, Viterbi Decode, Descrambler, Deinterleaver, Rate Dematch; on data: FFT, Soft Demapper, Turbo Decode, Rate Dematch, HARQ; MAC, RLC, Decrypt, PDCP.]

Data stream types: I,Q (16b,16b); Soft bit (8b); Hard bit (1b)

Frame structure: 10ms = 10 subframes
Subframe structure: 1ms = 2 slots
Slot = 7 symbols
1 symbol = 2048 tones (sub-frequencies), 1200 useful tones
1200 useful tones = 100 RBs x 12 tones
1 tone = up to 6b (64QAM)
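The frame numbers on this slide are enough to estimate the raw bit rate of the resource grid. The sketch below multiplies them out. The `layers` parameter is an assumption: the talk lists MIMO as a design goal but this slide does not give a layer count. The result ignores reference signals, control channels, and coding overhead, so it is an upper bound rather than a delivered data rate.

```python
USEFUL_TONES = 100 * 12        # 100 resource blocks x 12 tones each
SYMBOLS_PER_SLOT = 7
SLOTS_PER_SUBFRAME = 2
BITS_PER_TONE = 6              # 64QAM carries up to 6 bits per tone
SUBFRAME_MS = 1                # one subframe lasts 1 ms


def raw_bits_per_subframe(layers: int = 1) -> int:
    """Raw bits carried by all useful tones in one subframe."""
    symbols = SYMBOLS_PER_SLOT * SLOTS_PER_SUBFRAME
    return USEFUL_TONES * symbols * BITS_PER_TONE * layers


def raw_rate_mbps(layers: int = 1) -> float:
    """Raw rate in Mbps: bits per millisecond divided by 1000."""
    return raw_bits_per_subframe(layers) / (SUBFRAME_MS * 1000)


for layers in (1, 2):          # the layer count is an assumption
    print(layers, raw_bits_per_subframe(layers), raw_rate_mbps(layers))
```

One layer gives 100,800 raw bits per subframe, or 100.8 Mbps; two layers give 201.6 Mbps, which leaves room for overhead above the 150Mbps downlink target.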

Page 6: Ultra-Low-Power Software-Defined Radio for LTE Wireless Baseband

Building Solutions to Solve Problems

The raw material: more efficient processors

A range of building blocks: baseband processors

A solution architecture: LTE Reference Architecture

Page 7: Ultra-Low-Power Software-Defined Radio for LTE Wireless Baseband

The Essential Building Block
Xtensa LX3 Extensible Dataplane Processor

[Figure: Xtensa LX3 block diagram. Labeled blocks: instruction fetch/decode with instruction RAM, instruction ROM, and instruction cache; instruction memory management, protection and error recovery; base ISA execution pipeline (register file, base ALU) with optional functional units; VLIW (FLIX) parallel execution pipelines with designer-defined register files, processor state, and functional units; load/store unit and dual load/store unit with data RAM, data ROM, and data cache; data memory management, protection and error recovery; XLMI local memory interface; processor controls (exception support, exception registers, timers, interrupt control); on-chip debug (trace port, JTAG TAP control, instruction and data address watch registers); external interface (processor interface control, write buffer, PIF, bridge to AHB-Lite/AXI system bus with RAM, DMA, and devices); designer-defined queues, ports and lookups (QIF32, GPIO32) to external RTL, FIFO, and memory.
Key: base ISA feature; configurable function; optional function; optional and configurable function; designer-defined features (TIE); external RTL and peripherals.]

Page 8: Ultra-Low-Power Software-Defined Radio for LTE Wireless Baseband

The Essential Automation
Xtensa Processor Generator

Processor Configuration and Processor Extensions
1. Select from menu
2. Explicit instruction description (TIE)
3. Automatic instruction discovery (XPRES)

Xtensa Processor Generator
• Complete Hardware Design: source pre-verified RTL, EDA scripts, test suite. Use standard ASIC/COT design techniques and libraries for any IC fabrication process
• Customized Software Tools: C/C++ compiler, debuggers, simulators, RTOSes

Page 9: Ultra-Low-Power Software-Defined Radio for LTE Wireless Baseband

Multiple Core Communication = Interconnect + Software

Software for Multi-core:
• Modeling at every level:
  1. Gate/RTL level
  2. FPGA netlist
  3. Pin exact
  4. Cycle accurate
  5. Instruction accurate
• Multi-core debug and analysis
• Communications APIs
• Complete DSP Libraries
• Solution stacks for LTE
  • L1 PHY
  • L2/L3 MAC, RLC

[Figure: Xtensa Base (as small as 16K gates) with TIE Extensions, connected through fast memory interfaces to RAMs, through bus interfaces to the on-chip bus, and through extended RTL interfaces and extended comms interfaces to RTL blocks and other processors.]

Page 10: Ultra-Low-Power Software-Defined Radio for LTE Wireless Baseband

Scalable platform for all design approaches

• Small, programmable, deeply integrated Task Engines (Xtensas perform specific tasks in HW flow)
• DPU/DSP (Xtensas control HW accelerators)
• Function-Specific Light-Weight DSP (Xtensa Foundation + DSP Module Extensions)
• Baseband Engines DSPs, Multi-Standard SDR

Page 11: Ultra-Low-Power Software-Defined Radio for LTE Wireless Baseband

Core 1: ConnX BBE16 Baseband Engine
Ultra-High Performance and Programming Ease

Architecture:
• 16 simultaneous 18bx18b MAC per cycle
• 8 way SIMD + 3 way VLIW
• Dual load/store unit (128b wide)
• Scalar CPU pipeline with general 32b RISC instructions
• Extended precision with guard bits
• 6 addressing modes

Performance:
• 16 multiply-adds per cycle
• Three 8-way ops per cycle
• 4 complex FIR taps / cycle
• 1 Radix-4 FFT butterfly / cycle
• 17GB/s memory bandwidth (@ 550MHz)
• 40-bit accumulation on all MAC operations without performance penalty
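Two of the performance figures follow from the architecture figures on the same slide. The sketch below checks them under two assumptions that the slide does not state: both load/store units move a full 128 bits every cycle, and a complex multiply is done the straightforward way with four real multiplies. The constant and function names are illustrative.

```python
CLOCK_HZ = 550e6                 # clock rate quoted with the bandwidth figure
LOAD_STORE_UNITS = 2             # dual load/store unit
LOAD_STORE_WIDTH_BITS = 128      # each unit is 128b wide
MACS_PER_CYCLE = 16              # 16 simultaneous 18bx18b MACs
REAL_MACS_PER_COMPLEX_TAP = 4    # (a+jb)(c+jd) needs ac, bd, ad, bc


def memory_bandwidth_gb_s() -> float:
    """Peak bandwidth if both units transfer every cycle (assumption)."""
    bytes_per_cycle = LOAD_STORE_UNITS * LOAD_STORE_WIDTH_BITS // 8
    return bytes_per_cycle * CLOCK_HZ / 1e9


def complex_taps_per_cycle() -> int:
    """Complex FIR taps per cycle when every MAC unit is busy."""
    return MACS_PER_CYCLE // REAL_MACS_PER_COMPLEX_TAP


print(f"{memory_bandwidth_gb_s():.1f} GB/s")
print(f"{complex_taps_per_cycle()} complex taps per cycle")
```

Under these assumptions the peak bandwidth is 17.6 GB/s, in line with the quoted 17GB/s, and 16 real MACs yield the quoted 4 complex FIR taps per cycle.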

Page 12: Ultra-Low-Power Software-Defined Radio for LTE Wireless Baseband

Core 2: ConnX SSP16 Processor
Processing soft bit streams: 3x more efficient than std DSP

Data-types:
• 10-bit and 8-bit Vectors
• 8-bit, 16-bit and 32-bit Scalars

Performance:
• 16 arithmetic operations per cycle
• 2 issue VLIW architecture
• ~10GB/s memory bandwidth

Software:
• Vectorizing compiler
• Multi-core debugger
• Fast, cycle-accurate simulator
• Energy models
• Function libraries and 3rd party cellular stacks

Page 13: Ultra-Low-Power Software-Defined Radio for LTE Wireless Baseband

Core 3: ConnX BSP3 Processor
“Bit Stream Processing at minimal size and power”

Data-types:
• 8-bit, 16-bit and 32-bit Scalars

Performance:
• Dual Load/Store
• 3 issue VLIW architecture
• 600MHz in 45nm
• >4GB/s memory bandwidth

Software:
• C/C++ compiler
• Multi-core debugger
• Fast, cycle-accurate simulator
• Energy models
• Function libraries and 3rd party cellular stacks

[Figure: BSP3 block diagram. Two 32-bit load/store units, each with local memory and/or cache; AR general registers (32 or 64 x 32 bits); three issue slots, each with arithmetic, logical, and shift ops and Xtensa 32-bit base operations.]

Page 14: Ultra-Low-Power Software-Defined Radio for LTE Wireless Baseband

Core 4: ConnX Turbo16 Programmable Turbo Decoding Engine

• Customized processor-based solution for LTE Turbo decoding at 150 Mbps
• Matches throughput of hardwired RTL solution with similar area, power
• MAX-Log-MAP based decoding
  • Uses: log(exp(a) + exp(b)) ~ max(a,b)
• 8-parallel windows based decoding
  • Two bits from each window processed per update
  • Interleaving and deinterleaving operations integrated with load/store operations to the local memories (Odd and Even D-RAM)
• Confidence estimator for software-based early termination to lower power
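Max-Log-MAP replaces the log-sum-exp at the heart of MAP decoding, log(exp(a) + exp(b)), with a plain max(a, b). The sketch below compares the two on a few sample metric pairs. It illustrates the approximation only and is not Turbo16 code; the function names and sample values are made up for the example.

```python
import math


def max_star(a: float, b: float) -> float:
    """Exact Jacobian logarithm: log(exp(a) + exp(b))."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def max_log(a: float, b: float) -> float:
    """Max-Log-MAP approximation: drop the correction term."""
    return max(a, b)


# Hypothetical metric pairs; the error is largest when a == b.
samples = [(0.0, 0.0), (2.0, -1.0), (5.0, 4.5), (-3.0, 7.0)]
for a, b in samples:
    error = max_star(a, b) - max_log(a, b)
    print(f"a={a:5.1f} b={b:5.1f} error={error:.4f}")
```

The dropped correction term is always between 0 and log(2), and it shrinks quickly as the two metrics move apart, which is why the approximation costs little decoding accuracy while removing the exp and log operations.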

Page 15: Ultra-Low-Power Software-Defined Radio for LTE Wireless Baseband

LTE Reference Architecture “Atlas”

Demonstration of best practices for low-cost/low-power multi-core-based digital PHY sub-system
• 100% processor-based: no RTL blocks
• Highlights agile configuration of ConnX processors

Proves efficiency of digital-PHY solution with processors
• Reference architecture implements all blocks, memories and interconnect
• Port of complete PHY software solution: mimoOn mi!MobilePHY™ 3GPP Fully Compliant LTE PHY

7 core solution for Cat 4 UE (150Mbps DL/50Mbps UL, 20MHz):
• 3 x BBE16
• 2 x SSP16
• 1 x BSP3
• 1 x Turbo16

Page 16: Ultra-Low-Power Software-Defined Radio for LTE Wireless Baseband

ATLAS-LTE UE

[Figure: ATLAS-LTE UE block diagram. Transmit path (50Mbps) and receive path (150Mbps), with connections labeled "Low Bandwidth Bus" and "High Bandwidth".]

Typical utilization of cores: 60%

Page 17: Ultra-Low-Power Software-Defined Radio for LTE Wireless Baseband

Wrap-up

1. Data-plane processors combine three characteristics
• Adaptable: interface, instruction set, memory system automatically fit need
• Programmable: proven instruction sets, compilers, libraries, SW stacks
• Efficient: Rivals hardware area and power dissipation on complex problems
2. LTE handset challenge shows need … and possibility … for better solutions
3. Even bigger challenges ahead: SOC complexity in 2010 = …

[Figure: Relative SOC Complexity (M1 features/mm2), scale 0 to 1, 1995 to 2025.]