SOFTWARE IMPLEMENTATION OF THE IEEE 802 11A/P … · SOFTWARE IMPLEMENTATION OF THE IEEE 802 11A/P PHYSICAL LAYERIEEE 802.11A/P PHYSICAL LAYER SDRSDR 12`12 – WInnComm EuropeWInnComm

SOFTWARE IMPLEMENTATION OF THE IEEE 802 11A/P PHYSICAL LAYERIEEE 802.11A/P PHYSICAL LAYER

SDR`12 – WInnComm EuropeSDR 12 WInnComm Europe27 29 June, 2012 Brussels, BelgiumT. Cupaiuolo, D. Lo Iacono, M. Siti and M. Odoni

Advanced System TechnologiesSTMicroelectronics, Agrate Brianza, Italy

i lDaniele Lo Iacono

Outline

The systemWireless Access in Vehicular Environments (WAVE): IEEE 802 11p Wireless Access in Vehicular Environments (WAVE): IEEE 802.11p

Comparison with IEEE 802.11a/g A Software Defined Radio (SDR) implementation approach: the BPEA Software Defined Radio (SDR) implementation approach: the BPE

baseband communication platform

Digital baseband implementation Reference system model 802.11p: Data-aided channel estimation

Customization and code profiling Customization and code profiling

Results

2

IEEE 802.11p WAVE

Requirements Fast access as a priority (latency <50 ms)p y ( y ) Mobility (>60Km/h) and Range (~1Km) Robustness and reliability Securityy

Applications Vehicle safety (emergency warning systems, Intersection collision avoidance,

forward collision warning)g) Tolling Infotainment Traffic managementg Cooperative Adaptive Cruise Control

Comparison with 802.11a/g 10 MHz OFDM bandwidth (vs 20 MHz): max PHY data rate 27 Mbit/s (vs 54)10 MHz OFDM bandwidth (vs 20 MHz): max PHY data rate 27 Mbit/s (vs. 54) 5.9 GHz carrier frequency digital baseband: added support for mobility → Data-aided channel estimation

3

The BPE baseband communication platform

customizable distributed

reconfigurabledata-path

d‐unit mesh

d‐memory

customizablecoarse-grain hardwareoperators

embedded memory

d‐unitbank

routing d memory

bank instructionmemory

d‐instructionscheduler

memorymanagementscheduler and

dispatcherfetch &

decodingb‐instructionexecution

data‐port

registers space

dispatcher

system bus interface

data port

4

Flow-control: b-instruction

out = opcode(in0,in1,in2)

d‐unit mesh

d‐memoryd‐unitbank

routing

ybank

instructionmemory

b-instructionare also used


memorymanagement

are also used to set the way d-instruction will access the memory bankb-instruction

fetch &


data‐port

registers spacememory bank execution unit


data port

register file

5

Vector processing: d-instruction

out = unit1.opcode(unit0,in0,in1)

d‐unit mesh

d‐memory

bank of static memories for vector allocation

processing units performing parallel and

d‐unitbank

routing d memory

bank instructionmemory

routing mesh

pipelined vector processing


memorymanagement

d instruction

routing mesh dynamically configuringunit-memoryand unit-unit

fetch &


data‐port

registers spaced-instruction scheduling unit

a d u t u tconnections


data port

6

Algorithm mapping: macros

arith0.mulv0 comm0.ed v7

v1 v3

arith1.mul arith2.sub comm1.qt arith3.mul

v8

v9macro made by t ll l

v2 v4 v5 v6

two parallelbranches each performing pipelined

i

9 ith3 l( 1 6)

arith0.mul

comm0 ed

processing among different units parallel and

pipelined processingto reduce execution time and memory v9 = arith3.mul(comm1,v6)

comm1.qt(arith2,v5)v8 = arith2.sub(v4,arith1)v7 = comm0.ed(arith0,v3)arith0.mul(v0,v1);

arith1.mul

i h2 b

comm0.ed yaccesses

arith1.mul(v0,v2) arith2.sub

comm1.qt

arith3.mul

7

Pipeline of macros

v0 macro #0 v2 macro #1 v3 macro #2 v1

macro #0 macro #0

macro #1

macro #2

macro #1

macro #2conflicts on shared memory access f limemory access inhibit a full pipelined processing

use of memory alias(ping-pong mechanism)at intermediate stage of processing

v0 macro #0 r0 macro #1 r1 macro #2 v1

macro #0

macro #1

macro #0

macro #1

macro #0

macro #1

8

macro #2 macro #2 macro #2

Multi-thread

macro #0

macro #1

macro #0

macro #1

macro #0

macro #1

function

macro #1

macro #2

macro #1

macro #2

macro #1

macro #2

Single‐thread executionOFDM symbol #1 OFDM symbol #2

function #1 function #2 function #3 function #1 function #2 function #3

Multi‐thread (3) executionOFDM #1 OFDM #2 OFDM #3 OFDM #4function #1function #1

OFDM #1

OFDM #1

OFDM #2

OFDM #2

OFDM #3

OFDM #3

OFDM #4

OFDM #4function #3function #3

function #2function #2

9

IEEE 802.11a/p reference system model

encoder puncturer+interleaver mapper upsampling/

filtering D/Asourcebits IFFT

channel

FFT equalizer de‐map de‐intd t

Viterbid difilter decoded

bA/D q p de‐punct decoding

channelsynchronizer

bits/

discard pilot and virtual sub‐carriers

estimationsynchronizer

802.11p: data -aided channel estimation

10

channel estimation

Data-aided channel estimation (1/2)

Data-aided channel estimation basic idea:1 d t d t ti f th t i d OFDM b l i h l1. data detection of the current received OFDM symbol using channel

estimation corresponding to the previous OFDM symbol2. the channel corresponding to the current OFDM symbol is estimated p g y

by using the estimated data QAM symbols

Data detection through simple hard decision detection (HDD) low extra complexity

l l t d t 802 11 / low latency compared to 802.11a/g

11

Data-aided channel estimation (2/2)

Initial CE based on the LTS field

Time DomainLeast Square CE

initial CEreceived sequence (FFT)

LTS based freq. domain CE

IFFT time‐domainfiltering FFT

For successive OFDM symbols (SIG and DATA) CE tracked exploiting both pilot and the estimated data symbols

Data‐aidedprevious CE

and the estimated data symbols

Time DomainData‐aided freq. domain CE updated CEHDD

previous CE Time DomainLeast Square CEreceived

sequence (FFT)

Multimode 11a/p receiver data pipeline

FFT equalizer de‐map de‐intde‐punct

Viterbidecodingfilter

channelestimationsynchronizer recursive update: processing

bottleneck which does not allowestimation

N N 1

11a

bottleneck which does not allow to build a pipeline as for 802.11a

N‐1

N

N‐2

N+1

N

N‐1

symbol processing time11p

N N N N+1 N+1 N+1

N N+1

13

symbol processing time

Multimode 11a/p receiver code profiling

FFT equalizer de‐map de‐intde‐punct

Viterbidecodingfilter

channelestimationsynchronizer

function 11a/g (clock cycles) 11p (clock cycles)

filter 162

h i 1536 (l t )synchronizer 1536 (latency)

FFT (iFFT) 200 (radix-4)

channel estimation 64 (@LTS) 759 (data-aided, hard-detection)

equalizer 64

de-mapper 348 (MCS #7)

de-interleaver / de-puncturer 408 (MCS #7)

OFDM symbol single-thread(@250MHz) 1432 (5.7s) 2150 (8.6s)

OFDM symbol multi-thread (@250MH ) 596 (2.4s) 1490 (5.9s)

14

(@250MHz) ( ) ( )

Conclusions

The BPE software programmable architecture has support for: macro building (macro-) instructions pipelining emulate memory ping-pong access Multi-threading

Al ith fili th BPE Algorithm profiling on the BPE Translate the algorithm steps into macros Build the macro-pipeline

PHY profiling on the BPE (MCS #7, @250 MHz) 802.11a/g: 5.7 s (single thread) 2.4 s (three threads) (i.e. 54 Mbit/s) 802 11p: 8 6 s (single thread) 5 9 s (two threads) (i e 27 Mbit/s) 802.11p: 8.6 s (single thread) 5.9 s (two threads) (i.e. 27 Mbit/s)

Future steps 802 11p 20 MHz optional mode 802.11p 20 MHz optional mode Soft decision directed DA CE (FEC based, i.e. Viterbi decoding) to address these and other issues: investigating architectural enhancements

(including the idea of a “cluster of BPE”)( g )

15

Documents

SOFTWARE IMPLEMENTATION OF THE IEEE 802 11A/P … · SOFTWARE IMPLEMENTATION OF THE IEEE 802 11A/P PHYSICAL LAYERIEEE 802.11A/P PHYSICAL LAYER SDRSDR 12`12 – WInnComm EuropeWInnComm