New Solutions for Wireless Infrastructure Applications
May 2, 2012
Moshe Anschel, DSP System & Architecture Manager, Freescale


Page 1: New solutions for wireless infrastructure applications

New Solutions for Wireless Infrastructure Applications

May 2, 2012
Moshe Anschel
DSP System & Architecture Manager
Freescale

Page 2: New solutions for wireless infrastructure applications

Agenda

• The wireless baseband market trends and requirements

• Freescale Approach: QorIQ Qonverge B4860 overview

• StarCore SC3900 Flexible Vector Processor architecture

Page 3: New solutions for wireless infrastructure applications

Macro Base Station Challenges

Cost
• Space: Miniaturization and consolidation of equipment
• Low Impact: Power & Cost
• Future Proof: Easy upgrades, SDR
• Complete solutions: Ease of development, faster time to market

Capacity
• Users: Hundreds of active users
• Throughputs: Over 1Gbps data rate
• Scalable/Modular: Sectors, antennas, users…
• Active Antenna, MIMO: Improved QoS

Connectivity
• Coverage: Urban, highways and rural
• Spectral efficiency: Radio and network performance
• Multi-standard: Supports variety of users
• Reliability: Zero down time

[Diagram labels: High Throughputs & Coverage; Lowering Costs, Energy Efficiency; Multi-Standard & SDR; Many Active Users]

Page 4: New solutions for wireless infrastructure applications

Introducing the New QorIQ Qonverge B4860

Industry Flagship for Performance, Power and Cost

Optimal System Cost – industry-leading levels of integration, drastically reducing chip count and component cost

Delivers on Scalability – a common architecture from femto to macro providing vertical and horizontal scalability; allows customers to leverage both software and hardware architectures

Performance Optimized – a leap in performance from the efficient, high-performance next generation of our field-proven DSP & MPU cores, together with enhanced application-specific accelerators

Power Efficiency – SoC solution allows for intelligent load balancing and power management

B4860 delivers the highest performance in the industry through intelligent, balanced integration with a focus on cost and power efficiency

Page 5: New solutions for wireless infrastructure applications

Benefit of Intelligent Integration

• 3 sector, 20 MHz LTE with 5 major components
• 3 sector, 20 MHz LTE on a single SoC (B4860)
• 4X Cost Reduction, 3X Power Reduction

[Figure: block diagrams of the multi-component design and the B4860 SoC design. Labels in the figure: Multicore MPU, sRIO Switch, DSP, Layer-1, Layer-2/3 Transport & Control, CPRI, sRIO, GE, I2C, UART, SPI, Flash, DDR1, DDR2, DDR3, PHY, Antenna, Back Haul, Maint., 10 Gbps, 1Gbps, B4860 SoC]

Page 6: New solutions for wireless infrastructure applications

QorIQ Qonverge B4860 – Block Diagram & Benefits

• Next generation, e6500 Dual-Thread Power Architecture® cores offer highest CoreMark/Watt with AltiVec technology for dramatic L2 scheduling acceleration

• Next generation, SC3900 StarCore™ provides 2x DSP performance compared to competitive offerings

• Above 21GHz of Programmable Performance

• Smart hardware acceleration for Layer 1, 2, Control and Transport allows for best in class performance, power and cost

• Large scale SoC integration allows for simpler programming models and easier load balancing

• Integrated, Rich I/O including backhaul & antenna interfaces provides flexibility, interoperability and reduces overall system cost

Page 7: New solutions for wireless infrastructure applications

StarCore SC3900 - Flexible Vector Processors

• StarCore SC3850 DSP is used in many base stations powered by the MSC815x family

• StarCore SC3900 is targeted to handle future base station requirements and challenges

• SC3900 architecture is presented next

Page 8: New solutions for wireless infrastructure applications

SC3900 Core & Clusters

StarCore SC3900 FVP Clusters

• Six SC3900 Cores
• Clustering two SC3900 under a 2MB, multi-banked L2 cache
• High bandwidth accelerator ports (up to 1Tbps per cluster)
• Hardware support for memory coherency between L1, L2 caches and the main memory

BDTI recently benchmarked the SC3900 core included in the Freescale B4860. Running at 1.2 GHz, the SC3900 core received a BDTIsimMark2000™ score of 37,460 – the highest speed score recorded. See www.BDTI.com for details.

[Figure: SC3900 cluster block diagram. Labels: two SC3900 FVP Cores (each marked 32K/32K), High Speed Baseband Accelerators Interface, CoreNet Coherent Fabric, 2MB 16-way Shared L2 Cache, 4 Banks]

[Figure: BDTI Highest Speed Score chart (BDTIsimMark2000™ / BDTImark2000™). Texas Instruments C66x 1.5GHz: 20,030. Freescale SC3900 1.2GHz: 37,460]

Page 9: New solutions for wireless infrastructure applications

SC3900 Optimized for Baseband L1 Processing

• SC3900 is optimized to efficiently handle Baseband PHY Layer processing

• PHY layer processing can be divided into three categories:
  – Computation intensive DSP code (mainly MAC intensive)
  – Data manipulation and less intensive DSP code
  – Control code

• Each of the categories is non-negligible in processing requirements

• There is no clear boundary between the categories

• SC3900 accelerates all types of Baseband L1 processing

Page 10: New solutions for wireless infrastructure applications

Computation Intensive DSP Code Acceleration

• SC3900 provides vector processor capability by increasing the number of execution units and optimizing the whole datapath accordingly
  – Up to 32 MACs per cycle (4x versus SC3850)
  – Optimized register file and memory throughput

• The SC3900 optimized datapath leads to high MAC utilization

• Performance:
  – SC3900 is 3.5x-4x better than SC3850 in intensive DSP code
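The figures on this slide can be turned into a rough peak-throughput estimate. The sketch below is only back-of-the-envelope arithmetic: the 32 MACs per cycle, the 4x ratio, the 1.2 GHz clock and the six cores come from the slides, while the FIR kernel size and the helper names (`peak_macs_per_second`, `kernel_cycles`) are hypothetical and chosen purely for illustration.

```python
import math

# Figures quoted in the slides.
SC3900_MACS_PER_CYCLE = 32        # "Up to 32 MACs per cycle"
SC3850_MACS_PER_CYCLE = 32 // 4   # implied by "4x versus SC3850"
CLOCK_HZ = 1_200_000_000          # 1.2 GHz, the clock quoted for the BDTI benchmark
CORES = 6                         # "Six SC3900 Cores"


def peak_macs_per_second(macs_per_cycle, clock_hz, cores=1):
    """Upper bound on MAC throughput; real kernels reach less."""
    return macs_per_cycle * clock_hz * cores


def kernel_cycles(total_macs, macs_per_cycle, utilization=1.0):
    """Cycles for a kernel that keeps `utilization` (0 < u <= 1) of the MACs busy."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return math.ceil(total_macs / (macs_per_cycle * utilization))


# Hypothetical kernel: 64-tap FIR over 2048 samples (sizes are not from the slides).
fir_macs = 64 * 2048

print(peak_macs_per_second(SC3900_MACS_PER_CYCLE, CLOCK_HZ))         # one core
print(peak_macs_per_second(SC3900_MACS_PER_CYCLE, CLOCK_HZ, CORES))  # six cores
print(kernel_cycles(fir_macs, SC3850_MACS_PER_CYCLE))                # SC3850, ideal
print(kernel_cycles(fir_macs, SC3900_MACS_PER_CYCLE))                # SC3900, ideal
```

With full utilization on both cores the cycle ratio equals the 4x peak ratio. The 3.5x-4x range quoted above sits at or just below that bound, which is what one would expect when a kernel cannot keep every MAC busy on every cycle.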

Page 11: New solutions for wireless infrastructure applications

L1 Processing - Data Manipulation Acceleration

• “Data manipulation” stands for many different functions in Baseband Layer 1, for example:
  – Data preparation before/after intensive kernels
    • Ex: data re-ordering, matrix transpose, pack/unpack
  – Less regular kernels or serial/cyclic kernels with low parallelism
    • Ex: QR Decomposition, Interleaver, encoder

• SC3900 architecture addresses “Data manipulation” by different means:
  – Datapath flexibility: This is the “Flexible Vector Processor” essence
    • Register file flexibility: Each unit can read/write any register
    • Execution unit flexibility: Each unit can run different and independent instructions
  – Rich and flexible instruction set
    • Efficient instruction set with broad support for different data types and sizes
    • New, powerful instructions specific to data manipulation

• Performance:
  – SC3900 is 2x-3x better than SC3850 in “Data Manipulation”
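To make the "data manipulation" category concrete, here is a minimal sketch of three of the operations named above: pack/unpack, matrix transpose and a simple block interleaver. It is plain Python written for illustration only. The function names, the 16-bit I/Q packing layout and the row-in/column-out interleaver are assumptions, not SC3900 instructions or any standard's interleaver.

```python
def pack_iq(i, q):
    """Pack two signed 16-bit samples into one 32-bit word (I in the high half)."""
    return ((i & 0xFFFF) << 16) | (q & 0xFFFF)


def unpack_iq(word):
    """Inverse of pack_iq: recover the signed 16-bit I and Q samples."""
    def to_signed16(value):
        return value - 0x10000 if value & 0x8000 else value
    return to_signed16((word >> 16) & 0xFFFF), to_signed16(word & 0xFFFF)


def transpose(matrix):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(row) for row in zip(*matrix)]


def block_interleave(data, rows, cols):
    """Write the data row by row, then read it out column by column."""
    if len(data) != rows * cols:
        raise ValueError("data length must equal rows * cols")
    return [data[r * cols + c] for c in range(cols) for r in range(rows)]


print(unpack_iq(pack_iq(-3, 7)))
print(transpose([[1, 2, 3], [4, 5, 6]]))
print(block_interleave(list(range(6)), rows=2, cols=3))
```

None of these kernels performs much arithmetic. Their cost is dominated by moving and rearranging data, which is why the slide treats them separately from MAC-intensive code.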

Page 12: New solutions for wireless infrastructure applications

Data Manipulation Acceleration - Flexible Datapath

[Figure: two datapath models. Labels: execution units MAC, ADD, SHIFT, CMP; registers A0-A3, B0-B3, C0-C3.
Traditional vector processor model: Exec Unit #n can only read/write registers #n.
SC3900 flexible model: every execution unit can read/write every register.]

• Unlike a traditional vector processor, the SC3900 datapath is flexible:
  – Flexible execution units:
    • 4 independent units, each capable of 8-way SIMD
    • Each unit can run different and independent instructions
  – Flexible register files:
    • Registers are not defined as long vectors of 100’s of bits, but as scalars which can be accessed by any execution unit (read and write)
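The difference between the two register models can be illustrated with a toy count of cross-lane register accesses. This is a deliberately simplified sketch, not a model of the SC3900 pipeline: the register names follow the A0-A3, B0-B3, C0-C3 labels in the figure, the `(unit, dest, src1, src2)` operation format and both helper functions are hypothetical, and real vector ISAs typically handle cross-lane data with permute or shuffle instructions that may move several elements at once.

```python
def lane_of(reg):
    """Lane index encoded in a register name, e.g. "B3" -> 3."""
    return int(reg[1:])


def cross_lane_accesses(ops):
    """Count register accesses a lane-locked machine could not do directly.

    ops is an iterable of (unit, dest, src1, src2) tuples. In the lane-locked
    model, unit n may only touch registers of lane n, so each access to another
    lane would need extra data movement first. In the flexible model every unit
    can read/write every register, so this count is simply irrelevant.
    """
    return sum(
        1
        for unit, *regs in ops
        for reg in regs
        if lane_of(reg) != unit
    )


# C[n] = A[n] * B[n]: every operand already sits in the unit's own lane.
aligned = [(n, f"C{n}", f"A{n}", f"B{n}") for n in range(4)]

# C[n] = A[n] * B[3 - n]: B is consumed in reversed order.
reversed_b = [(n, f"C{n}", f"A{n}", f"B{3 - n}") for n in range(4)]

print(cross_lane_accesses(aligned))
print(cross_lane_accesses(reversed_b))
```

In this toy model the aligned pattern needs no cross-lane access while the reversed pattern needs one per operation. Re-ordering kernels resemble the second case, which is where a register file readable by any unit would avoid the extra movement. Separately, 4 units times 8-way SIMD gives 32, which matches the "up to 32 MACs per cycle" figure quoted earlier, although the deck does not spell out that derivation.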

Page 13: New solutions for wireless infrastructure applications

L1 Processing - Control Code Efficiency

• One of the SC3900 goals is to improve control code efficiency
  – L1 control functions are tightly integrated with the arithmetic intensive SW
  – Useful for running scheduling functions that are control intensive

• Control code performance is affected by two main aspects:
  – Core and compiler efficiency in typical control code constructs
  – Memory system efficiency

• Both have been addressed on the SC3900, e.g.:
  – Ability to flatten decision trees using multiple predicates
  – Full support for non-aligned memory access without penalty
  – Larger, clustered 2MB L2 cache to keep the program close to the core

• Performance:
  – SC3900 is up to 1.5x better than SC3850 in control processing
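"Flattening a decision tree using multiple predicates" can be sketched conceptually as follows. Python has no predicated execution, so this only shows the shape of the transformation (often called if-conversion), not SC3900 code: the nested branches are replaced by independently computed predicates, and each leaf is guarded by a combination of them. The function names, the SNR threshold and the mode numbers are all hypothetical.

```python
SNR_THRESHOLD_DB = 20  # hypothetical threshold, for illustration only


def select_mode_branchy(snr_db, is_retransmission):
    """Nested if/else: up to two data-dependent branches per call."""
    if snr_db > SNR_THRESHOLD_DB:
        if is_retransmission:
            return 2
        return 3
    if is_retransmission:
        return 0
    return 1


def select_mode_predicated(snr_db, is_retransmission):
    """Same decision tree, flattened: compute predicates once, guard each leaf."""
    p_high = snr_db > SNR_THRESHOLD_DB
    p_retx = bool(is_retransmission)
    # Exactly one guard is true, so the guarded leaf values can simply be summed.
    return (3 * (p_high and not p_retx)
            + 2 * (p_high and p_retx)
            + 1 * (not p_high and not p_retx)
            + 0 * (not p_high and p_retx))


for snr in (5, 30):
    for retx in (False, True):
        print(snr, retx, select_mode_branchy(snr, retx), select_mode_predicated(snr, retx))
```

On hardware with several predicate registers, the guarded leaves can be issued as straight-line code without branches. That is presumably the benefit the slide refers to, though the deck does not describe the mechanism in detail.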

Page 14: New solutions for wireless infrastructure applications

Summary & Conclusion

• Three 20 MHz LTE base station sectors on a single SoC, supporting multiple standards and multimode operation for macro base stations

• Complete baseband solution that integrates L1, L2, Control and Transport baseband processing, from the backhaul network to the antenna interface

• StarCore SC3900 is a key technology for the B4860 SoC, providing processing efficiency and flexibility for PHY layer processing (computation intensive DSP code, data manipulation and less intensive DSP code, and control code)