Moshe Anschel, Freescale
May 2, 2012 1
New Solutions for Wireless Infrastructure Applications
May 2, 2012
Moshe Anschel
DSP System & Architecture Manager
Freescale
Agenda
• The wireless baseband market trends and requirements
• Freescale Approach: QorIQ Qonverge B4860 overview
• StarCore SC3900 Flexible Vector Processor architecture
Macro Base Station Challenges
Cost
• Space: Miniaturization and consolidation of equipment
• Low Impact: Power & Cost
• Future Proof: Easy upgrades, SDR
• Complete solutions: Ease of development, faster time to market
Capacity
• Users: Hundreds of active users
• Throughputs: Over 1 Gbps data rate
• Scalable/Modular: Sectors, antennas, users…
• Active Antenna, MIMO: Improved QoS
Connectivity
• Coverage: Urban, highways and rural
• Spectral efficiency: Radio and network performance
• Multi-standard: Supports variety of users
• Reliability: Zero down time
[Figure labels: High Throughputs & Coverage; Lowering Costs; Energy Efficiency; Multi-Standard & SDR; Many Active Users]
Introducing the New QorIQ Qonverge B4860
Industry Flagship for Performance, Power and Cost
Optimal System Cost – industry-leading levels of integration, drastically reducing chip count and component cost
Delivers on Scalability – a common architecture from femto to macro providing vertical and horizontal scalability; allows customers to leverage both software and hardware architectures
Performance Optimized – offering a leap in performance with efficient, high-performance next generation of our field proven DSP & MPU cores as well as enhanced application specific accelerators
Power Efficiency – SoC solution allows for intelligent load balancing and power management
B4860 delivers the highest performance in the industry through intelligent, balanced integration with a focus on cost and power efficiency
Benefit of Intelligent Integration
• 3 sector, 20 MHz LTE with 5 major components
• 3 sector, 20 MHz LTE on a single SoC (B4860)
• 4X Cost Reduction, 3X Power Reduction
[Figure: block diagrams of the multi-chip design and the single B4860 SoC design. Labels include: Multicore MPU (Layer-2/3, Transport & Control), sRIO Switch, DSP (Layer-1), CPRI, sRIO, GE, I2C, UART, SPI, Flash, DDR1, DDR2, DDR3, PHY, Antenna, Back Haul, Maint., 10 Gbps, 1 Gbps.]
QorIQ Qonverge B4860 – Block Diagram & Benefits
• Next generation, e6500 Dual-Thread Power Architecture® cores offer highest CoreMark/Watt with AltiVec technology for dramatic L2 scheduling acceleration
• Next generation, SC3900 StarCore™ provides 2x DSP performance compared to competitive offerings
• Above 21GHz of Programmable Performance
• Smart hardware acceleration for Layer 1, 2, Control and Transport allows for best in class performance, power and cost
• Large scale SoC integration allows for simpler programming models and easier load balancing
• Integrated, Rich I/O including backhaul & antenna interfaces provides flexibility, interoperability and reduces overall system cost
StarCore SC3900 - Flexible Vector Processors
• StarCore SC3850 DSP is used in many base stations powered by the MSC815x family
• StarCore SC3900 is targeted to handle future base station requirements and challenges
• SC3900 architecture is presented next
SC3900 Core & Clusters
StarCore SC3900 FVP Clusters
• Six SC3900 cores
• Clustering two SC3900 cores under a 2MB, multi-banked L2 cache
• High bandwidth accelerator ports (up to 1 Tbps per cluster)
• Hardware support for memory coherency between L1, L2 caches and the main memory
BDTI recently benchmarked the SC3900 core included in the Freescale B4860. Running at 1.2 GHz, the SC3900 core received a BDTIsimMark2000™ score of 37,460 – the highest speed score recorded. See www.BDTI.com for details
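[Figure: SC3900 FVP cluster. Two SC3900 FVP cores, each shown with two 32K blocks, share a 2MB 16-way L2 cache with 4 banks. The cluster also shows a High Speed Baseband Accelerators Interface and a connection to the CoreNet Coherent Fabric.]
[Chart: BDTIsimMark2000™ / BDTImark2000™ speed scores. Freescale SC3900 at 1.2 GHz: 37,460 (BDTI highest speed score). Texas Instruments C66x at 1.5 GHz: 20,030.]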
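As a rough check on the chart, the two scores can be compared directly and per clock. The sketch below uses only the figures quoted on this slide. Normalizing by clock frequency is an assumption made here for illustration, and the chart carries both BDTIsimMark2000™ and BDTImark2000™ labels, so the two scores may not have been measured the same way.

```python
# Scores and clock rates as quoted on the slide.
scores = {
    "Freescale SC3900": {"score": 37460, "clock_ghz": 1.2},
    "Texas Instruments C66x": {"score": 20030, "clock_ghz": 1.5},
}


def score_per_ghz(entry):
    """Benchmark score divided by clock frequency (illustrative metric only)."""
    return entry["score"] / entry["clock_ghz"]


sc3900 = scores["Freescale SC3900"]
c66x = scores["Texas Instruments C66x"]

# Ratio of the raw scores, and ratio after normalizing each score by its clock.
raw_ratio = sc3900["score"] / c66x["score"]
per_ghz_ratio = score_per_ghz(sc3900) / score_per_ghz(c66x)

print(f"Raw score ratio:     {raw_ratio:.2f}x")
print(f"Per-GHz score ratio: {per_ghz_ratio:.2f}x")
```

On these figures the SC3900 scores a little under 2x the C66x on the raw number, and somewhat more than 2x once the lower clock is taken into account.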
SC3900 Optimized for Baseband L1 Processing
• SC3900 is optimized to efficiently handle Baseband PHY Layer processing
• PHY layer processing can be divided into three categories:
  – Computation intensive DSP code (mainly MAC intensive)
  – Data manipulation and less intensive DSP code
  – Control code
• Each one of the categories is non-negligible in processing requirements
• There is no clear boundary between the categories
• SC3900 accelerates all types of Baseband L1 processing
Computation Intensive DSP Code Acceleration
• SC3900 provides vector processor capability by increasing the number of execution units and optimizing the whole datapath accordingly
  – Up to 32 MACs per cycle (4x versus SC3850)
  – Optimized register file and memory throughput
• The SC3900 optimized datapath leads to high MAC utilization
• Performance:
  – SC3900 is 3.5x-4x better than SC3850 in intensive DSP code
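To make "MAC intensive" concrete, here is a minimal sketch of a direct-form FIR filter that counts its multiply-accumulate operations and derives a best-case cycle count from the MACs-per-cycle figure. The filter, the tap values, and the helper names are hypothetical illustrations, not Freescale code. The 8 MACs per cycle for SC3850 is inferred from "4x versus SC3850", and the 1.2 GHz clock is taken from the BDTI benchmark slide.

```python
def fir_filter(samples, taps):
    """Direct-form FIR; each output sample costs len(taps) multiply-accumulates."""
    outputs = []
    mac_count = 0
    for n in range(len(taps) - 1, len(samples)):
        acc = 0
        for k, tap in enumerate(taps):
            acc += tap * samples[n - k]  # one MAC
            mac_count += 1
        outputs.append(acc)
    return outputs, mac_count


def min_cycles(mac_count, macs_per_cycle):
    """Lower bound on cycles if every MAC unit is busy on every cycle."""
    return -(-mac_count // macs_per_cycle)  # ceiling division


MACS_PER_CYCLE_SC3900 = 32                          # quoted on the slide
MACS_PER_CYCLE_SC3850 = MACS_PER_CYCLE_SC3900 // 4  # inferred from "4x versus SC3850"
CLOCK_GHZ = 1.2                                     # assumption: clock from the BDTI slide

samples = list(range(1024))           # hypothetical input block
taps = [1, 2, 3, 4, 3, 2, 1, 0] * 4   # 32 hypothetical filter taps
outputs, macs = fir_filter(samples, taps)

print("MACs executed:", macs)
print("Best-case cycles at 32 MACs/cycle:", min_cycles(macs, MACS_PER_CYCLE_SC3900))
print("Best-case cycles at 8 MACs/cycle: ", min_cycles(macs, MACS_PER_CYCLE_SC3850))
print("Peak GMAC/s per core:", MACS_PER_CYCLE_SC3900 * CLOCK_GHZ)
```

The cycle counts are theoretical lower bounds. The quoted 3.5x-4x range suggests that real kernels get close to, but do not always reach, the 4x ratio in MAC units.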
L1 Processing - Data Manipulation Acceleration
• “Data manipulation” stands for many different functions in Baseband Layer 1, for example:
  – Data preparation before/after intensive kernels
    • Ex: data re-ordering, matrix transpose, pack/unpack
  – Less regular kernels or serial/cyclic kernels with low parallelism
    • Ex: QR decomposition, interleaver, encoder
• The SC3900 architecture addresses “Data manipulation” by different means:
  – Datapath flexibility: this is the essence of the “Flexible Vector Processor”
    • Register file flexibility: each unit can read/write any register
    • Execution unit flexibility: each unit can run different and independent instructions
  – Rich and flexible instruction set
    • Efficient instruction set with broad support for different data types and sizes
    • New, powerful instructions specific to data manipulation
• Performance:
  – SC3900 is 2x-3x better than SC3850 in “Data Manipulation”
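The kinds of data manipulation named above can be sketched in a few lines. The functions below are generic textbook versions of a matrix transpose, a block interleaver, and 16-bit I/Q packing, written for illustration only. They are assumptions about what such kernels look like, not SC3900 instructions or Freescale library calls. Note how little arithmetic they contain: almost all the work is moving and re-ordering data.

```python
def transpose(matrix):
    """Matrix transpose: rows become columns."""
    return [list(column) for column in zip(*matrix)]


def block_interleave(data, rows, cols):
    """Write row by row into a rows x cols block, then read column by column."""
    if len(data) != rows * cols:
        raise ValueError("data length must equal rows * cols")
    block = [data[r * cols:(r + 1) * cols] for r in range(rows)]
    return [value for column in transpose(block) for value in column]


def block_deinterleave(data, rows, cols):
    """Inverse of block_interleave for the same rows and cols."""
    return block_interleave(data, cols, rows)


def pack_iq(i, q):
    """Pack two signed 16-bit values into one 32-bit word (I in the high half)."""
    return ((i & 0xFFFF) << 16) | (q & 0xFFFF)


def unpack_iq(word):
    """Recover the signed 16-bit I and Q values from a packed 32-bit word."""
    def to_signed(value):
        return value - 0x10000 if value & 0x8000 else value
    return to_signed((word >> 16) & 0xFFFF), to_signed(word & 0xFFFF)


interleaved = block_interleave(list(range(6)), rows=2, cols=3)
print(interleaved)
print(block_deinterleave(interleaved, rows=2, cols=3))
print(hex(pack_iq(-1, 2)), unpack_iq(pack_iq(-1, 2)))
```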
Data Manipulation Acceleration - Flexible Datapath
[Figure: two datapath models with execution units (MAC, ADD, SHIFT, CMP) and registers A0-A3, B0-B3, C0-C3. SC3900 flexible model: every execution unit can read/write every register. Traditional vector processor model: execution unit #n can only read/write registers #n.]
• Unlike a traditional vector processor, the SC3900 datapath is flexible:
  – Flexible execution units:
    • 4 independent units, each capable of 8-way SIMD
    • Each unit can run different and independent instructions
  – Flexible register files:
    • Registers are not defined as long vectors of hundreds of bits, but as scalars that can be accessed (read and write) by any execution unit
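The difference between the two register models can be illustrated with a toy simulation. This is a simplified sketch under assumptions made here (four lanes, integer data, hypothetical function names) and it does not model the real SC3900 pipeline or instruction set. The task is to multiply A[n] by B[3 - n], a reversed operand order of the kind that appears in convolution-style kernels.

```python
UNITS = 4       # independent execution units, from the slide
SIMD_WAYS = 8   # SIMD width per unit, from the slide
LANES = 4       # toy lane count used in this sketch only

# 4 units x 8-way SIMD gives 32 operations per cycle.
TOTAL_LANES = UNITS * SIMD_WAYS


def lane_locked_multiply(a, b):
    """Traditional model: unit n may only touch element n of each vector register."""
    return [a[n] * b[n] for n in range(LANES)]


def permute(vec, order):
    """Extra data-movement step the lane-locked model needs to realign operands."""
    return [vec[i] for i in order]


def flexible_multiply(regs, operand_pairs):
    """Flexible model: each unit names any two scalar registers as its operands."""
    return [regs[x] * regs[y] for x, y in operand_pairs]


a = [1, 2, 3, 4]
b = [10, 20, 30, 40]

# Lane-locked: B has to be reversed by a separate permute before the multiply.
locked = lane_locked_multiply(a, permute(b, [3, 2, 1, 0]))

# Flexible: the units read A0..A3 and B3..B0 directly from one register file.
regs = {f"A{n}": a[n] for n in range(LANES)}
regs.update({f"B{n}": b[n] for n in range(LANES)})
pairs = [(f"A{n}", f"B{LANES - 1 - n}") for n in range(LANES)]
flexible = flexible_multiply(regs, pairs)

print(locked, flexible, TOTAL_LANES)
```

Both paths produce the same products. The point of the sketch is that the lane-locked path needs a separate permute step, which is the kind of overhead a flexible register file is meant to avoid. The product of 4 units and 8-way SIMD is also consistent with the 32 MACs per cycle quoted for the core.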
L1 Processing - Control Code Efficiency
• One of the SC3900 goals is to improve control code efficiency
  – L1 control functions are tightly integrated with the arithmetic-intensive SW
  – Useful for running scheduling functions that are control intensive
• Control code performance is affected by two main aspects:
  – Core and compiler efficiency in typical control code constructs
  – Memory system efficiency
• Both have been addressed on the SC3900, e.g.:
  – Ability to flatten decision trees using multiple predicates
  – Full support for non-aligned memory access without penalty
  – Larger, clustered 2MB L2 cache to keep the program close to the core
• Performance:
  – SC3900 is up to 1.5x better than SC3850 in control processing
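Flattening a decision tree with predicates can be shown as a source-level transformation. The sketch below is an illustration only: Python has no predicated execution, so the hypothetical `select` helper stands in for a predicated move, and the clamp function is an example chosen here rather than one taken from the presentation.

```python
def clamp_branchy(x, lo, hi):
    """Nested decision tree: each test is a potential branch."""
    if x < lo:
        return lo
    else:
        if x > hi:
            return hi
        else:
            return x


def select(pred, if_true, if_false):
    """Stand-in for a predicated move: both inputs are ready, pred picks one."""
    return if_true if pred else if_false


def clamp_predicated(x, lo, hi):
    """Flattened form: compute all predicates up front, then apply selects."""
    p_low = x < lo    # predicate 1
    p_high = x > hi   # predicate 2
    result = x
    result = select(p_high, hi, result)
    result = select(p_low, lo, result)  # applied last, so it takes priority
    return result


for value in (-3, 0, 5, 9, 12):
    print(value, clamp_branchy(value, 0, 9), clamp_predicated(value, 0, 9))
```

In the flattened form the two comparisons are independent of each other, so a core with several predicate registers could evaluate them together and avoid the nested branches.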
Summary & Conclusion
• Three 20 MHz LTE sectors of a base station in a single SoC, supporting multiple standards and multimode operation for macro base stations
• Complete baseband solution that integrates L1, L2, Control and Transport processing from the backhaul network to the antenna interface
• StarCore SC3900 is a key technology, providing processing efficiency and flexibility for PHY layer processing (computation intensive DSP code, data manipulation and less intensive DSP code, and control code) in the B4860 SoC
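The claim that all three categories matter can be checked with an Amdahl-style calculation. The per-category gains below are the figures quoted on the earlier slides (lower end of each range, and the "up to" figure for control code). The workload mix is a hypothetical assumption made here for illustration, not data from the presentation.

```python
def overall_speedup(mix, speedups):
    """Combine per-category speedups, weighting by time share on the old core."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("workload fractions must sum to 1")
    new_time = sum(mix[name] / speedups[name] for name in mix)
    return 1.0 / new_time


# Gains versus SC3850 as quoted on the slides.
speedups = {"intensive_dsp": 3.5, "data_manipulation": 2.0, "control": 1.5}

# Hypothetical share of processing time per category (assumption).
mix = {"intensive_dsp": 0.5, "data_manipulation": 0.3, "control": 0.2}

# For comparison: only the MAC-intensive code is accelerated.
only_dsp = {"intensive_dsp": 3.5, "data_manipulation": 1.0, "control": 1.0}

all_categories = overall_speedup(mix, speedups)
dsp_only = overall_speedup(mix, only_dsp)

print(f"All categories accelerated: {all_categories:.2f}x")
print(f"Only intensive DSP code:    {dsp_only:.2f}x")
```

With this assumed mix, accelerating only the MAC-intensive code gives about 1.6x overall, while accelerating all three categories gives about 2.3x.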