ParCast: Soft Video Delivery in MIMO-OFDM WLANsqifan/papers/parcast.pdf · ParCast: Soft Video Delivery in MIMO-OFDM WLANs Xiao Lin Liu∗, Wenjun Hu†, Qifan Pu†∗, Feng Wu†,

ParCast: Soft Video Delivery in MIMO-OFDM WLANsXiao Lin Liu∗, Wenjun Hu†, Qifan Pu†∗, Feng Wu†, and Yongguang Zhang†

Microsoft Research Asia† University of Science and Technology of China∗

{wenjun, fengwu, ygz}@microsoft.com {lin717, puqf}@mail.ustc.edu.cn

ABSTRACT

We observe two trends, growing wireless capability at thephysical layer powered by MIMO-OFDM, and growing videotraffic as the dominant application traffic. Both the sourceand MIMO-OFDM channel components exhibit non-uniformenergy distribution. This motivates us to leverage the sourcedata redundancy at the channel to achieve high video re-covery performance. We propose ParCast that first sepa-rates the source and channel into independent components,matches the more important source components with higher-gain channel components, allocates power weights with jointconsideration to the source and the channel, and uses analogmodulation for transmission. Such a scheme achieves fine-grained unequal error protection across source components.We implemented ParCast in Matlab and on Sora. Extensiveevaluation shows that our scheme outperforms competitiveschemes by notable margins, sometimes up to 5 dB in PSNRfor challenging scenarios.

Categories and Subject Descriptors

C.2 [Computer-Communication Networks]: Miscella-neous

General Terms

Algorithms, Design, Performance

Keywords

MIMO-OFDM, Video, Joint source-channel design

1. INTRODUCTIONMIMO-OFDM technologies have become the default build-

ing blocks for next generation wireless networks. As wire-less capability continues to grow, so does the application de-mand. According to the Cisco Visual Networking Index [7],traffic from wireless devices will exceed traffic from wireddevices by 2015. Internet video is a major source of suchtraffic growth, which accounted for 40% of consumer Inter-net traffic in 2010, and will reach 62% by the end of 2015.Video-on-demand traffic is expected to triple by 2015, andvideo unicast constitutes the bulk of the volume. As a result,considerable efforts have been devoted to improving video

∗The work was done while Xiao Lin Liu and Qifan Pu wereresearch interns with Microsoft Research Asia.

Permission to make digital or hard copies of all or part of this work forpersonal or classroom use is granted without fee provided that copies arenot made or distributed for profit or commercial advantage and that copiesbear this notice and the full citation on the first page. To copy otherwise, torepublish, to post on servers or to redistribute to lists, requires prior specificpermission and/or a fee.MobiCom’12, August 22–26, 2012, Istanbul, Turkey.Copyright 2012 ACM 978-1-4503-1159-5/12/08 ...$15.00.

delivery quality over wireless links, especially unicast. Sup-porting in-home high-definition video streaming is preciselyone of the motivations for 802.11ac.

Traditionally, video sources are first compressed, repre-sented digitally as a bit stream, and then transmitted in thesame way as other binary data, as shown by MPEG [19].Conventional video coding schemes normally include a Dis-crete Cosine Transform (DCT) to compact energy across allpixels and remove correlation between them. Non-uniformlydistributed DCT coefficient energy levels imply statisticalfeatures, or redundancy, in the source data, and suitableentropy coding could remove such redundancy. Less redun-dancy means more power per unit compressed data, andhence better error resilience and overall decoding perfor-mance. Given the expected channel condition, we can thenseparately determine the compression rate at the source andthe transmission rate at the channel. However, if the actualchannel condition turns out worse than expected, the pre-determined source and channel rates would have been tooaggressive and cause bit errors in decoding. Since digitalrates do not have a graceful fallback behavior in the face ofbit errors, glitches will occur. This can happen frequentlyfor wireless links, due to unpredictable channel conditions.

While wireless links are loss prone, the original video neednot be received in its entirety to ensure a good visual qual-ity. A synergy between the two sides could therefore be moreeffective than separately optimizing the source and the chan-nel coding. This has motivated an industry standard effort,Wireless High Definition Interface (WHDI) [4], which sendsuncompressed high definition videos over wireless links. Nev-ertheless, this is still inefficient due to the sheer volume ofuncompressed video data for transmission, since the redun-dancy within the data is not utilized at either the sourceencoder or the decoder.

A few recent cross-layer approaches aim to better exploitthe redundancy in the source to improve video delivery qual-ity over wireless. They are motivated by the observationthat such source redundancy could be used at the channelto protect against errors. SoftCast [13] starts with a sim-ilar DCT transform to that in MPEG, but then allocatespower weights based on the redundancy level of the sourcecomponents before transmission. The entire codec uses aseries of linear transforms to ensure the mean squared er-ror of the reconstructed pixel luminance is proportional tothe channel noise throughout. While conventional systemsperform lossy compression over lossless digital communica-tion, SoftCast performs lossless compression over lossy ana-log communication. FlexCast [8] retains much of the MPEGencoding process, but replaces the entropy coding stage witha rateless code. The source blocks that contribute more tothe overall distortion, i.e., the high-entropy components, arerepresented with more bits. Note that allocating differentamounts of power or bits to different source components isa form of unequal error protection (UEP). Both SoftCast

233

and FlexCast use a single code to simultaneously compressand protect the source. Their performance shows that it isunnecessary to optimally compress the source, provided theamount of residual source redundancy matches the requirederror protection over the channel.

However, these approaches were designed with single-antennalinks in mind or channel oblivious for broadcast. OFDM(Orthogonal Frequency Division Multiplexing) decomposesa wideband channel into a set of mutually orthogonal sub-carriers, and the channel gains across these subcarriers areusually different [11], sometimes by as much as 20 dB. WithMIMO (Multiple-Input Multiple-Output), each subcarrier isfurther divided into a set of spatial subchannels, again withdifferent channel gains. Furthermore, a channel dependentprecoding operation is often necessary to make the spatialsubchannels on the subcarrier mutually orthogonal, or inde-pendent, so that signals do not interfere with one anotheralong different subchannels. As a result, even for unicast,there are several issues with running SoftCast or FlexCastdirectly over a MIMO-OFDM link. In particular, error be-havior differs across subchannels. If a one-size-fits-all coderate is used for a few or all subchannels, it generally needsto be conservative, and hence suboptimal, to accommodatethe worst subchannels. In this sense, unicast over a MIMO-OFDM link resembles a broadcast channel. The need forprecoding makes it difficult to ensure that a single channeloblivious error protection scheme can perform well acrossMIMO-OFDM links.

Given that both the source DCT component energy andsubchannel gains are non-uniformly distributed, if we en-code the video source such that the more important, high-energy DCT components will be transmitted in high-gainsubchannels, and less important parts in lower-gain sub-channels, we may better utilize the overall channel. Note,however, that the DCT and the precoding steps are essen-tial at the source and channel respectively to avoid interfer-ence between source components and between subchannels.This suggests that we should match DCT components tosubchannels based on the respective sorted order of energylevel, and then perform joint source-channel power alloca-tion to optimize per DCT component and subchannel errorperformance. This is unequal error protection operating ata finer granularity than conventional layered video coding.

With these in mind, we present ParCast (Parallel videouniCast), which tailors the unicast video delivery qualityto the MIMO-OFDM channel. The key features are: (1)obtaining independent source components and subchannels,(2) matching important source components to high-gain sub-channels, (3) scaling the source data with power weightscomputed with joint source-channel considerations, and (4)transmission using analog modulation.

We implement the video codec in Matlab, and the channeldependent modules of ParCast both as a Matlab simulationand on Sora [28]. We run experiments on Sora to validate theMatlab simulation, and then run channel trace driven sim-ulations to compare ParCast against alternatives. Resultsshow that ParCast outperforms the best conventional digi-tal scheme, sometimes by 5 dB, for videos with fast motionand a large energy spread over links with a large subchan-nel gain spread. This is a challenging case for conventionalschemes, but favors ParCast.

To summarize, our main contributions are: First, we high-light the analogy between the energy distributions for video

10-1

100

101

102

103

104

105

106

0 10 20 30 40 50 60 70

Ave

rag

e p

ow

er

of

ea

ch

ba

nd

Band index

foremanmobilenewsbusfootball

Figure 1: 2D 8× 8 block DCT coefficient energy.

10-2

10-1

100

101

102

103

0 50 100 150 200 250 300 350

Sq

ua

re o

f sin

gu

lar

va

lue

Subchannel index

Link (2 12)Link (3 13)Link (6 13)Link (9 11)Link (9 12)

Figure 2: 3× 3 MIMO-OFDM subchannel gains.

sources and MIMO-OFDM channels, and show how imper-fect distributions for both can work in synergy. Second, weshow the importance of decorrelating both the source andthe channel and matching them accordingly at a fine gran-ularity. Third, we show that a simple scheme could achievesignificant benefit by implementing the above steps. Exten-sive evaluation shows that this is indeed an effective videodelivery mechanism over MIMO-OFDM links.

2. BACKGROUND AND MOTIVATION

2.1 Source and channel characteristics

Source. Video sources exhibit spatial correlation withina frame and temporal correlation across frames. A DCTtransform is often a key step of video coding to remove re-dundancy. If we divide a frame into 8 × 8 blocks, and plotthe energy distribution of the block DCT coefficients, wetypically see very non-uniform distributions (Figure 1). Theenergy often spreads across 5 or 6 orders of magnitude, andthe high energy end drops very quickly.

The non-uniform distribution implies statistical features,and hence remaining redundancy, in the DCT coefficients.If left as is, such residual source redundancy is consideredharmful, because there would be more information to trans-mit, with less power per unit of information. Therefore, tra-ditional video compression schemes follow DCT with entropycoding to further compress the DCT coefficients. The high-entropy coefficients naturally require more bits to be fullyrepresented. The end result is that the energy distributionover each bit is the same and the encoded source appearsBernoulli distributed across bits.

Unequal error protection for the source is often a key ele-ment of video coding. Each DCT coefficient has a differentcontribution to the total distortion and should be protectedaccordingly. Entropy coding could be viewed as implicit un-equal error protection, precisely because more bits are allo-cated to more complex, high-entropy signals. Since all bits

234

are equal, more bits imply more power needed for transmit-ting the complex signal, which means more protection.

Channel. A MIMO-OFDM link can be viewed as a set ofnarrowband MIMO channels for each subcarrier of OFDM.Each subcarrier channel can be represented with a channelmatrix H . We can obtain its singular value decomposition(SVD), H = U S V H , where U and V are two unitary ma-trices, S is a diagonal matrix with the singular values ofH on the diagonal, and (·)H means conjugate transpose, orHermitian. Each non-zero singular value si corresponds to asubchannel i with channel gain s2i . The diagonal form of Smeans the subchannels are mutually orthogonal. In generalV is not diagonal, and neither is H , which means we cannotdirectly leverage the mutually orthogonal subchannels whensending a signal vector x unless all si are the same. If, in-stead, we always transmit V x, or precode the signal vector xwith a precoding matrix V , then the received signals wouldbe mutually orthogonal and can be decoded after a rotation,i.e., multiplying by UH . Therefore, with SVD based precod-ing for each subcarrier, a MIMO-OFDM link can simply beviewed as a set of independent subchannels.

Figure 2 shows sorted subchannel gains for a few 3 × 3MIMO-OFDM links, with 3 antennas on either end of thelink. Each link is labeled (<source ID> <destination ID>).We can also see a large spread between the strongest and theweakest subchannels, though faster dropoff at the low gainend. Having more antennas increases the spread but flattensthe high end slightly.

The diverse subchannels naturally merit different quan-tities of information each. This is the overarching goal ofchannel coding, so that the energy per bit sent on any sub-channel would appear the same. Consequently, rate increaseacross a MIMO-OFDM channel is no longer monotonic withthe overall channel SNR, unlike for single-antenna channels,which are scalar.

Source-channel separation and challenges. With fullyrandom source bits and fully random channel bits, Shannon’ssource-channel separation theorem then shows we get theoptimal result.

However, a capacity-achieving channel code must havean infinite block length. Given wireless channels are er-ror prone, optimally loading source bits to each subchannel,along with sufficient protection bits, is impractical. Further-more, the transmitter must know the exact subchannel gainsand also notify the receiver how to decode on each subchan-nel. The former is susceptible to channel variation and thelatter step would incur a high overhead.

2.2 Issues with existing approachesNaturally, jointly performing source and channel coding

would resolve some of the issues above, by retaining somesource redundancy as channel protection. Such schemes typ-ically still design separate, but matching codes for the sourceand the channel.

Instead, two recent systems, SoftCast and FlexCast, eachdesigns a single code that compresses and protects simul-taneously. SoftCast builds on an analog code that adjuststhe compression-protection tradeoff with power allocation,whereas FlexCast achieves a similar goal with bit allocation.

For single-antenna channels, both schemes were shown toachieve almost optimal unicast performance, although Soft-Cast was designed for broadcast. However, directly running

these over MIMO, without precoding, cannot avoid interfer-ence between received signals on different subchannels.

Even with precoding, neither distinguishes between differ-ent subchannel gains. Note also that FlexCast is currentlybuilt without knowledge of the channel, whereas precodingrequires channel information. Once there is channel infor-mation, the source could narrow down the transmission rateto within a few choices, and the rateless code in FlexCastwould be largely unnecessary.

Although there have been other digital unequal error pro-tection schemes over MIMO or MIMO-OFDM (e.g., [25]),they are typically based on the scalable video coding (SVC)extension of H.264 [22]. These schemes divide the sourceinto layers of different importance levels and allocate themost important layer to the best spatial subchannel. Thereis strong dependency between the layers. If the most impor-tant layer is not correctively decoded, any further layers areuseless. Furthermore, there are usually more subchannelsthan layers. Limited digital modulation choices could con-strain the performance when a layer needs to be transmittedover several heterogeneous subchannels.

2.3 Source-channel similaritiesWe observe a few similarities between the operations of

source and channel coding, and therefore a joint source-channel scheme that exploits the synergy between the two islikely to work well.

First, SVD based precoding for a MIMO channel can beviewed as compacting energy to diagonalize the channel ma-trix, whereas DCT at the source is also expected to diago-nalize the correlation matrix between source pixels. Withoutprecoding at the channel, the received spatial signals wouldinterfere with one another and hurt decoding performance,as some signal strength is used to cancel out the inter-signalinterference before decoding. With correlation between thesource components, power or bit allocation to them would besuboptimal. Therefore, it is important to avoid correlationat both the source and the channel.

Second, Figures 1 and 2 show a similar spread betweenthe highest- and lowest-energy components, although thedropoff rate behavior differs. Therefore, it seems natural tomatch both sides, so that the high-energy DCT componentsare transmitted on the high-gain subchannels to avoid themacting against each other. Furthermore, the large number ofsubchannels allows fine-grained error protection levels.

Third, power weights can be allocated to transmitted sourcecomponents to minimize the overall distortion. On the otherhand, power allocation between different subchannels wouldalso affect the achieved channel rates, which would eventu-ally translate to the reconstructed source distortion based ona suitable rate-distortion function. Therefore, a joint source-channel power allocation strategy could optimize the overallrecovery performance.

3. ParCast DESIGN

3.1 OverviewParCast encodes and transmits a video sequence via a se-

ries of linear transforms.The source encoder first computes 3D-DCT [10] coeffi-

cients for each 4-frame group of pictures (GOP). These DCTcoefficients are grouped into chunks, the total number ofwhich matches that of the subchannels in the MIMO-OFDM

235

link, and a variance is computed for the energy of eachchunk. The chunks are assigned to subchannels such thathigh-energy DCT coefficients would be sent on high-gainsubchannels in successive packets, which is a permutationtransform. Power whitening is then performed per chunk toeven out power distribution.

The power allocation stage formulates an optimizationproblem to minimize the total distortion, and scales the mag-nitude of the DCT coefficients in each chunk by a weight.These power weights are computed by taking into accountboth the variance of the DCT coefficients and the corre-sponding subchannel gain.

Pairs of coded values are combined into complex symbols.Those for different spatial subchannels on each subcarrierare then precoded, so that they will be transmitted on or-thogonal spatial channels and arrive at the receiver unin-terfered by other spatial streams. The precoded values arethen transmitted without further channel coding over theMIMO-OFDM channel.

The encoding and transmission process can be written as:

Y = H Q G M R X = C X (1)

where H is the overall channel matrix of all the subcarriersstacked into one, Q is the precoding matrix, G is a diagonalmatrix with the power weights on the diagonal, M representsper-stream whitening, R implements the mapping betweenDCT coefficient streams to subchannels, and X are the DCTcoefficients.

The decoder does not perform regular MIMO processing tofirst decode the received signals on each antenna. Instead,it forms an aggregate matrix with the received signals onall the spatial and frequency domain subchannels, beforeinvoking a Linear Least Square Estimator (LLSE) decoderto recover the DCT coefficients. Then, it reconstructs theoriginal frames by taking the inverse of the 3D-DCT.

3.2 Effects of a MIMO-OFDM channel

Per-subcarrier view. Let us consider a 2×2 MIMO-OFDMlink as an example. By design, the signals on differentOFDM subcarriers are independent from one another. Ona particular subcarrier, each transmit antenna can send acomplex symbol composed of two real values, and therefore4 real values can be sent at a time. Let Hcomplex be the 2×2complex channel matrix for the subcarrier, with each matrixelement hij to denote the path gain from transmit antennaj to receive antenna i. Let xi = mi1 + j mi2 be the complexsymbol on the ith transmit antenna, and yi be the complexsymbol received on the ith receive antenna. Disregardingchannel noise for now, we have

(

y1y2

)

=

(

h11 h12

h21 h22

)(

m11 + j m12

m21 + j m22

)

Separating the real and imaginary parts on both sides:

ℜ(y1)ℑ(y1)ℜ(y2)ℑ(y2)

=

ℜ(h11) −ℑ(h11) ℜ(h12) −ℑ(h12)ℑ(h11) ℜ(h11) ℑ(h12) ℜ(h12)ℜ(h21) −ℑ(h21) ℜ(h22) −ℑ(h22)ℑ(h21) ℜ(h21) ℑ(h22) ℜ(h22)

m11

m12

m21

m22

Essentially, the vector (m11,m12,m21,m22)T has traversed

an equivalent channel represented by the real matrix, Hreal,that can be derived from the complex channel, Hcomplex.

Precoding at the channel. Recall that precoding basedon the Singular Value Decomposition (SVD) can decorrelatethe spatial subchannels to be mutually orthogonal. Sincewe can now view the channel as either a real or a complex

matrix, there are two options for calculating the precodingmatrix. The SVD of both, Hreal = UrealSrealV

Hreal and

Hcomplex = UcomplexScomplexVHcomplex, are closely related.

There are twice as many singular values in Sreal, but just thesame numbers from Scomplex and each with a duplicate copy.Since a singular value represents a subchannel, each complexsubchannel of Hcomplex corresponds to two equal gain realsubchannels of Hreal. Vreal and Vcomplex are related similarto Hreal and Hcomplex.

Therefore, we simply compute Scomplex and Vcomplex, andconvert them to Sreal and Vreal for later steps.

The overall link. For the overall link, we can stack to-gether all the individual subcarrier matrices to form a largeblock diagonal matrix. Since each chunk is mapped to a sub-channel on a subcarrier, we can still decode on each subcar-rier separately before combining the chunks to reconstructthe video frame. However, the per-chunk whitening createsdependency between the symbols on the same subchannel indifferent packets (time slots), and therefore we should firststack the matrices across time on the same subcarrier.

3.3 Managing bandwidth requirementWe determine the number of source chunks per video frame

based on the number of real subchannels and the availablechannel time, and then compute the required number of timeslots to send all data in the source chunk. For each chunk,we calculate the variance of the energy of all data, λ. The λis the metadata per chunk to be used at the decoder. Morechunks would mean more metadata, which could help im-prove the reconstruction performance though at a higheroverhead. That said, we need only enough chunks so thatthe distribution of λ has the desirable fast dropoff behavioras shown in Figure 1.

The product of the number of OFDM symbols, the numberof packets (i.e., time slots), and the number of chunks perGOP reflects the bandwidth requirement of sending a GOP.

Insufficient bandwidth. When there are insufficient timeslots, we should discard the least important chunks, sincethey contribute least to the overall reconstruction quality.

Bad subchannels. If some subchannels are very bad, itwould be better to skip those. This way, the power budgetcan be expended on the remaining, more important chunksin higher gain subchannels. If this causes bandwidth short-age, we simply discard the least important source chunks.

Redundant bandwidth. If there is more bandwidth thanwe need to transmit the video sequence, we could also skipthe bad subchannels and allocate the power to the remaininggood channels. With each skipped bad subchannel, ParCastcan benefit from a diversity gain.

3.4 Error protection and power allocationThe distribution of per source chunk energy variance, λ, is

analogous to that of the 8× 8 block DCT coefficients in Fig-ure 1. If the chunk data are sent verbatim, their energy helpsguard against the channel noise and serves as implicit chan-nel coding. However, it is likely that the high-energy sourcecomponents would consume too much transmission powerbudget and leave the remaining components more sensitiveto the channel noise. Therefore, we need to re-distribute thepower among source components before transmission.

Power weights. With the precoding operation at the en-coder using Vreal and equalization based on the matrix UH

236

at the receiver, the MIMO-OFDM channel is separated intoa set of orthogonal subchannels. The optimization problemin ParCast is to minimize the expected mean square error(MSE) under the transmission power constraint with budgetP . Assuming inverse decoding, i.e., using C−1 from Equa-tion (1), we can formulate the problem as:

min MSE =∑

i

σ2

s2i g2

i

(2)

subject to: trace E[(QGx)(QGx)T ]

=trace E[(Gx)(Gx)T ]

=∑

i

g2i λi ≤ P

where σ2 is the average noise power, si is the singular valueon subchannel i, Q is the channel precoding matrix, and G isthe diagonal, power weight matrix, with gi as its i

th diagonalentry. By using Lagrange multipliers, the solution is:

gi =

√

√

√

√

P√

λis2i∑

j

√

λj/s2j

(3)

and the derived MSE isσ2

P

(

∑

i

√

λi/s2i

)2

(4)

The MSE (4) is determined by the mapping of λ to s,i.e., the matrix R in Equation (1). By the RearrangementInequality [12], the best R lets the λi follow the same sortedorder as si. Hence, the chunk with the largest variance isassigned to the highest-gain subchannel.

Strictly speaking, the above is an approximation for whenthe optimal LLSE decoder is used. This is because inversedecoding could magnify the channel noise. When the channelcondition is good, however, the two decoders have compa-rable performance, and the optimization of MSE of inversedecoding is much simpler to solve.

If we re-formulate the optimization using LLSE instead,the derived gi depend on the channel noise σ2 [15]. If λi/s

2

i

is too low, the corresponding gi would be zero in the newoptimization result. This means the subchannel and thecorresponding chunk should be discarded, and their powerreturned to the overall power budget for others. This is anal-ogous to the water-filling power allocation strategy in MIMOsystems to achieve the channel capacity.

Per-chunk whitening. The data within a chunk normallycarry different amounts of energy. If these are assigned toOFDM symbols directly, there could be a large power dif-ference between symbols and cause issues at the receiver.Therefore, we first mix data within a chunk with a randomorthogonal matrix to achieve the same effect as a Hadamardtransform1, such that the average power per OFDM sym-bol is the same. This also ensures the same average poweracross transmit antennas and eases the requirement on thedynamic range of the power amplifier at the transmitter.

3.5 DecoderGiven the linear encoder, we use the LLSE decoder at the

receiver, which is the optimal decoder [15].

DLLSE = ΛCT (CΛCT +Σ)−1 (5)

where C is the overall encoder from Equation (1), and Σ and

1Hadamard matrices can only have certain dimensions dueto their orthogonal property, whereas our data can be ofarbitrary dimensions.

Λ are both diagonal matrices. The ith diagonal element of Σis the channel noise experienced by the packet carrying theith row of the received Y. The ith diagonal element of Λ isλi/s

2

i , where λi is the variance of the ith chunk and si is thegain on the corresponding subchannel.

After decoding the DCT coefficients, ParCast reconstructsthe original frames by taking the inverse 3D-DCT.

3.6 Managing metadata

Source metadata. The λ per chunk is the metadata fromthe video source, which should be sent to the video decoder.The video sender transmits this metadata per GOP in a reg-ular packet at the lowest rate before sending encoded videodata. The overhead is less than 0.005 bits per pixel.

Channel state information (CSI). The video senderneeds to know the channel detail to perform the chunk-subchannel mapping, power allocation, and precoding. There-fore, the channel state information can be viewed as meta-data from the channel. Since the upcoming WiFi standard802.11ac and LTE both expect to use channel dependentprecoding, it is reasonable to assume the availability of CSI.

The CSI updates are cumulative. Since later packets wouldbe precoded, the video receiver would then measure a pre-coded channel, Hmeasured = Hactual Vprevious. Hmeasured

has the same singular values as the actual channel Hactual,so that we can assign chunks to subchannels and calcu-late power weights directly. However, typically Vactual 6=Vmeasured, so Vmeasured alone cannot be used as the precod-ing matrix. We have two options: (1) the sender computesthe actual channel as Hactual = Hmeasured · V H

previous, getVactual as the new precoding matrix; or (2) the sender getsVnew from Hmeasured, and use Vprevious×Vnew as the precod-ing matrix. Although Vactual 6= Vprevious × Vnew in general,the two sides differ by a multiplication by a unitary blockdiagonal matrix, and achieve the same precoding effect. Wehave verified both options and currently use the first one.

Currently, the video receiver would measure the channelusing the regular channel estimation technique, e.g., that for802.11n MIMO-OFDM packets. It then feeds the CSI backto the sender by sending a regular packet. A CSI updateevery 100 ms is normally considered acceptable overhead. Ifthere is also traffic from the receiver to the sender, we couldleverage channel reciprocity by letting the sender estimatethe channel. Therefore, the CSI update overhead can beeasily managed.

3.7 ParCast vs SoftCastParCast is inspired by SoftCast and follows a similar linear

transform process, but with several modifications and addi-tions. Although the same techniques are adopted in severalsteps, they serve different goals.

The main stages in SoftCast include 3D-DCT, power weight-ing, and per-frame whitening via a Hadamard matrix, trans-mission over an uncoded OFDM channel, and decoding usingLLSE. The matrix view of the overall encoding process is:

Y = H Ha GX = CSoftCastX (6)

where H is the overall channel matrix, Ha is the Hadamardmatrix, G is the diagonal power weight matrix, and X is theDCT coefficients.

ParCast also starts with 3D-DCT, not only because itachieves compression, but it allows us to generate equal-sized, independent chunks. Otherwise, different numbers ofOFDM symbols would be required to transmit the chunks,

237

necessitating a further power re-distribution. The indepen-dence between chunks allows us to gain performance fromany successfully delivered and decoded chunk. We will re-turn to this point in Section 5.

SoftCast treats each channel as a whole, whereas ParCastspecifically optimizes for each subchannel individually. Thepower allocation formulation and the LLSE decoder in Par-Cast incorporate channel details, whereas those in SoftCastconsider the source property only. At low SNR, ParCast nat-urally discards the worst subchannels and the correspondingsource chunks. The diversity gain thus achieved mitigatesthe lack of error correction capability of analog modulation.This is unlike SoftCast, which can only send duplicate datato improve reliability if there are extra time slots.

SoftCast performs Hadamard across DCT chunks for eachGOP to guard against packet loss, whereas the correspond-ing transform in ParCast operates within a chunk and mainlybalances power across OFDM symbols for the same chunkand power across transmit antennas. Notice that dividingthe source into chunks turns X in Equations (6) and (1)into a diagonal matrix. The overall encoding matrix, C, inEquation (1) is diagonal after a rotation, but CSoftCast isnot even if H is. Ha specifically mixes X. Therefore, anysource correlation will manifest as channel correlation.

SoftCast uses analog modulation to achieve graceful degra-dation across different broadcast receivers, whereas ParCastuses this per subchannel of a unicast link to harness thechannel capacity with a single modulation. ParCast effec-tively turns a unicast session into many parallel sessions, asif over a broadcast channel.

4. IMPLEMENTATION

4.1 ParCast implementationWe divide the whole system into the video codec at the

application layer and physical layer operations.At the video sender side, the video encoder performs the

3D-DCT transform, generates a whitened video data streamand the variance per chunk, and passes these to the PHYmodule. The number of chunks is determined based onthe bandwidth availability. The PHY module performs thechunk-subchannel mapping, power allocation, and precod-ing, before grouping enough OFDM symbols into individualpackets for transmission. In addition, the PHY module onthe video sender sends the source metadata and processesthe channel state feedback from the video receiver to calcu-late the SVD of each subcarrier channel matrix.

Note that although we currently implement the entire sourceencoder at the transmitter, this is not strictly necessary. The3D-DCT transform is independent of the channel and couldeasily take place on a remote video server connected with thewireless transmitter over the wire. If the server is notified ofthe number of chunks, the division into chunks, chunk vari-ance calculation, and the per-chunk power whitening couldalso be performed remotely, leaving only the PHY operationsto be performed at the wireless transmitter.

At the video receiver, the PHY module performs chan-nel estimation, compensation for carrier frequency offsets(CFO), and OFDM pilot phase tracking without compen-sating for the channel, before passing the received complexsamples to the video decoder. It also decodes the controlpacket with the source metadata and feeds back the esti-mated channel matrices to the video sender. The video de-

coder then finds the LLSE solution to the received signalvector, decodes video data in each chunk, and finally per-forms the inverse DCT to reconstruct the video source.

We use Sora [28] as our experimental platform. Sincethe current Sora platform can support real-time 20 MHzwideband packet transmission and reception, but not quitesingle-device MIMO, we need to emulate MIMO with mul-tiple single-antenna boxes. Due to our analog modulation,this means that we would need to send raw signal samples tothe sending nodes separately, and copy the received signalsfrom the receiving nodes to one box, both via Ethernet. Thelatency from moving the samples across the network makesit currently infeasible to run the whole system in real time.However, since CSI updates can be sent using a single an-tenna, we can still run the channel dependent componentsin real time to assess the latency of PHY processing.

Therefore, we currently implement the PHY modules us-ing Sora SDK 1.5 to be run in real time. The actual videocodec is implemented in Matlab and interfaces with the PHYmodule via the generated video data as real values and thereceived complex samples. The video data packets follow802.11n like packet structure, so that we can leverage tech-niques for CFO compensation, channel estimation and pre-coding in 802.11n. Unlike in 802.11n, however, the pilotsubcarriers for each antenna are used alternately across dif-ferent OFDM symbols, similar to the approach in [18], sothat we can update all the estimated channel coefficients forone transmit antenna per symbol. Noise is estimated fromthe error vector magnitude of the signal field in the packetpreamble, and passed to the video decoder to be pluggedinto the LLSE decoder.

We also implement a channel simulation that could act inplace of the Sora based PHY modules but replay measuredchannel traces. This allows us to study a larger variety ofchannel instances and compare different schemes more easilyand fairly.

4.2 Schemes for comparisonThe implementation of ParCast can be viewed as a super-

set of SoftCast. In addition, we consider two schemes forcomparison: (1) a version of omniscient H.264/MPEG-4 [19]over 802.11n like channel rates (Omni-MPEG) as an opti-mal case, and (2) Layered Scalable Video Coding (LayeredSVC) [22] with unequal error protection as a practical case.Both are based on MPEG-4.

MPEG-4 takes a sequence of raw frames and encodes itinto three types of frames. The I-frame is an independentlyencoded still image, used as a reference frame. A P-frame en-codes changes in the image from the previous frame, whereasa B-frame includes changes from both the previous and thefuture frames.

The raw I-frame is first divided into blocks. The blocks inconsecutive raw frames are then compared to each other toidentify the closest matches and the corresponding motionvector. The differences between the matching blocks arethen stored in the P- and B-frames. The blocks for all framesthen go through DCT transform, quantization, and entropycoding for further compression.

Omni-MPEG. Typically, the source encoder needs to knowthe expected channel throughput in order to select an appro-priate quantization levels, and the physical layer transmittershould determine the most suitable bit rate accordingly. Wealter the standard MPEG-4 encoding and transmission pro-

238

cess to optimally choose the source and channel rates basedon the given CSI.

We implement MPEG based on the reference implemen-tation JM16.1 [3]. The source first generates encoded dataunder different Quantization Parameter (QP) values. We de-rive a table of PSNRs under various QPs, the correspondingdata stream size and the channel throughput required.

Data is transmitted using 802.11n like modulation andcoding, and we select the channel rate based on the effectiveSNR of the given channel [11]. Depending on the numberof antennas on both sides, a sender could choose betweensending 1 to 4 concurrent spatial streams (i.e., packet frag-ments) at a time. To use a good rate, we precode the trans-mitted streams and allow a different modulation and codingcombination per transmitted spatial stream. This allows ahigher overall rate than those provided by unequal modula-tion across streams in 802.11n [5]. If the worst spatial streamis too weak to support even BPSK, we allocate the powerto a stronger spatial stream instead, using a high rate wherepossible. If any stream is decoded incorrectly, the entirepacket is dropped at the receiver and retransmitted.

To compare Omni-MPEG with ParCast under the samebandwidth constraint, we select an encoded data size whichrequires a transmission time closest to ParCast, look up theappropriate QP, and select the corresponding PSNR as theperformance of Omni-MPEG.

Layered SVC. In order to adapt to varying channel con-ditions, the H.264 scalable video coding extension providesspatial, temporal, and quality scalability. For SVC, the en-tropy coded bits are separated into layers with varying im-portance levels. The base layer data is always expected tobe correctly received to achieve an acceptable video qual-ity, while a higher quality is derived from the enhancementlayers. There have been several proposals for unequal errorprotection between layers.

The SVD-based video transmission scheme [25] is an ex-ample design for a narrowband MIMO channel. The baselayer would normally be assigned to the best spatial sub-channel and sent using QPSK. The enhancement layers canuse denser modulations. Power is carefully allocated to thesubchannels to ensure a target error rate for the base layerand the highest possible rates for other layers. We extend itto run over 802.11n like OFDM for our comparison.

The source codec is implemented based on JSVM [6]. Weuse quality scalability, which adopts a large QP for the baselayer, and smaller QPs for enhancement layers. Each en-hancement layer is generated by using a smaller QP to en-code the quantized residue from the previous layer. There-fore, there are strong dependencies between layers. Whenone transmitted packet can not be correctly decoded for thebase layer data, the whole frame and any other frame thatis dependent on the current frame has to be discarded.

The original proposal in [25] aimed to achieve an uncodedBER of 10−5 for the base layer and 10−3 for the enhance-ment layers. Our initial trials suggest these settings are toooptimistic and would produce bad video quality. Therefore,we set the target coded BER for all layers to 10−5. A Reed-Solomon code (239, 255) is used for all layers at the appli-cation layer, along with a 3

4channel code to provide bet-

ter channel protection. We also skip the antenna selectionsuggested, since it often over-aggressively improves the en-hancement layer quality at the expense of the base layer.

FlexCast. A comparison against FlexCast would be inter-

esting but unfair. First, FlexCast is built to achieve a dif-ferent goal. For our setting, the equivalent would be H.264along with a capacity achieving channel code over MIMO-OFDM. Second, FlexCast was designed for single-antennasystems. Extending it to MIMO is beyond the scope of thispaper. In particular, we are not aware of a rateless code fornon-ideal MIMO channels, since MIMO rate increase is notmonotonic with the overall channel SNR, as shown by the802.11n rates [11].

5. EVALUATION

5.1 Experimental setup

Performance metric. We evaluate video delivery qualitywith the standard Peak Signal-to-Noise Ratio (PSNR) met-

ric. PSNR = 20 log10

2L−1√MSE

, where L is the number of bits

used to encode pixel luminance, typically 8, and MSE is themean squared error between all pixels of the decoded videoand the original. We average the PSNR across frames toproduce a single value for each video sequence.

Test videos. We select a few representative reference videosequences [1, 2] with different characteristics: foreman (lit-tle motion, large energy spread), news (even more station-ary, large energy spread), bus (more motion, smaller en-ergy spread), mobile (faster motion, larger motion area, evensmaller energy spread), and football (significant motion, largeenergy spread). Figure 1 shows the energy distribution of thefirst frame of each sequence. The distribution of the varianceof the chunk energy after the 3D-DCT is similar.

We use a video frame size of 352 × 288 pixels for all ex-periments. For a 2 × 2 link, we could send up to 2 valuesper complex symbol, 2 symbols per subcarrier, and 52 datasubcarriers, or 208 values total in a single 802.11n OFDMsymbol. We currently send 192 values (i.e., 48 chunks pervideo frame) in each OFDM symbol and skip the worst fewsubchannels, so that an entire video frame takes 528 OFDMsymbols. For a fair comparison with other schemes, we definepacket size in terms of number of OFDM symbols. Similarly,we use 72 chunks per video frame for a 3× 3 link, and send288 real values in one OFDM symbols. Calculations showthat, if all subchannels within and between the correspond-ing 2 × 2 and 3 × 3 links have the same gain, the PSNRwill increase by 0.5 dB from 192 chunks to 288 chunks, and0.8 dB from 192 to 396 chunks.

We take 32 frames per sequence for microbenchmarks and200 frames per sequence for the mobile experiments.

Sora setup. We use four single-antenna Sora boxes to em-ulate a 2×2 MIMO link, and move the antennas to differentlocations to get different channel instances. The carrier fre-quencies of all four radios are configured to be within a fewhundred Hz of one another. This is the precision we canachieve given the configuration tool, and we have verifiedthat pilot tracking can correct residual carrier frequency off-sets between each transmit-receive pair sufficiently well.

We calibrate the CPU clocks of all boxes offline and use aSourceSync [18] like approach to achieve synchronized trans-missions from the two transmit antennas.

Channel traces. We use two sets of channel traces of3 × 3 802.11n MIMO-OFDM links from previous work [11]for trace-driven simulations, one including around 120 sta-tionary links and the other a mobile trace. These were col-lected on commodity Intel iwl5300 NICs. Figure 2 shows

239

25

30

35

40

45

50

55

foreman,(9 11) mobile,(9 11) foreman,(9 12) mobile,(9 12)

PS

NR

(dB

)w/o Hadamard, w/o precodingw/ Hadamard, w/o precoding (SoftCast)w/o Hadamard, w/ precodingw/ Hadamard, w/ precoding

Figure 3: Effects of Hadamard and precoding.

the squared singular value distributions for example links.Each link is identified by (<source ID> <destination ID>).

For each stationary trace, there is one packet approxi-mately every 6 ms. Experiments show that results differmainly for whether there is CSI feedback delay, and is notsensitive to the delay period. Therefore, we simply presentthe results for one-packet delay, or around 6 ms. For themobile trace, we select a CSI sample approximately every10 ms to ensure channel variation.

5.2 Microbenchmarks from Sora experimentsThe main purposes of the Sora experiments are to validate

the simulation implementation and assess the complexityof several operations. Subsequent experiments are all doneusing trace-driven simulation, since that makes comparisonfairer and the mobility experiment more manageable.

Minimum CSI delay. In our current implementation, thetime cost for an explicit CSI update is less than 1 ms, whichis much smaller than the typical update period.

SVD calculation latency. Given all subcarriers are in-dependent, we only need to calculate SVD for 52 4 × 4 realmatrices. This takes around 300µs for a box with a 3.40 GHzCPU frequency.

Residual CFO. Since the two numbers in a complex symbolare correlated, any residual CFO after pilot tracking couldinduce an error in the decoding process. However, our ex-periments show that the error is insignificant at low channelSNR, and within 0.5 dB even for a 200 Hz residual CFO athigh SNR.

Limited sample precision. Although the power alloca-tion step results in a large energy spread in the frequencydomain, the IFFT of OFDM has a mixing effect so the timedomain signal energy is better evened out. Therefore, lim-ited precision of transmitted time-domain samples causes anerror of around 0.2 dB to the final PSNR.

Computational complexity. The computational over-head of 3D-DCT in ParCast is the same as for SoftCast,and was evaluated previously [13]. The main overhead inthe remaining steps of ParCast is in computing the precod-ing matrix (SVD), evaluated above. Current wireless trans-mitters need to do this anyway in order to apply channel-specific precoding. Precoding in a broad sense (i.e., channelindependent spatial expansion or mapping) is in the 802.11nstandard and already performed on commercial cards (e.g.,by default on Intel iwl5300), which shows that the overheadis acceptable.

5.3 ParCast microbenchmarksWe run a few benchmarks to assess the individual contri-

25

30

35

40

45

50

55

foreman,(9 11) mobile,(9 11) foreman,(9 12) mobile,(9 12)

PS

NR

(dB

)

w/o precoding, w/ matchw/o precoding, w/o matchw/ precoding, w/ matchw/ precoding, w/o match

Figure 4: Effects of matching important chunks withhigh gain subchannels.

30

35

40

45

50

55

foreman,(9 11) mobile,(9 11) foreman,(9 12) mobile,(9 12)P

SN

R (

dB

)

Joint, SNR weightedJointSourceVS

-1, source

Figure 5: Effects of joint source-channel power allo-cation.

butions of the operations of ParCast, one of which also servesas a comparison against SoftCast. We use two representa-tive video sequences, foreman and mobile, which feature alarge and small spread of energy distribution respectively.We also choose two representative 3×3 channel traces, links(9 11) and (9 12), which feature a very small and large spreadof squared singular values respectively. The CSI is delayedunless otherwise stated.

Spatial correlation over MIMO. SoftCast includes aHadamard transform to even out energy between packetsand provide resilience to packet loss. However, applying itnaively over a MIMO channel would cause interference be-tween the signals on different spatial subchannels of the samesubcarrier and degrade the decoding performance. Instead,a subcarrier-specific precoding matrix should be used.

Figure 3 shows that Hadamard does not help when we donot precode the channel, but hurts when we do precode2.Without precoding, the PSNR without Hadamard is on av-erage 2.8 dB higher than with. With precoding, the PSNRwithout Hadamard is a further 5.4 dB higher. The reason isthat Hadamard acts as random precoding, which is statisti-cally equivalent to not precoding at all [29]. Note that thesource-channel matching is implicit for ‘without Hadamard’,but there is no matching for ‘with Hadamard’. All are with-out the joint source-channel power allocation but use thesource aware power allocation as in SoftCast.

The importance of matching. We next show the im-portance of sending the more important source componentson high gain subchannels. Since this can be affected bywhether there is precoding, we compare between ParCastwith different combinations of with and without precodingand matching. Without matching means that chunks aresimply allocated to subchannels on each subcarrier in a ran-

2We let SoftCast divide each video frame into 72 chunks.

240

0

20

40

60

80

100

120

140

6 10 14 18 22

Num

ber

of dis

card

ed c

hunks p

er

GO

P

Overall channel SNR (dB)

foreman, (9 11)mobile, (9 11)foreman, (9 12)mobile, (9 12)

Figure 6: ParCast discards source chunks if needed.

20

25

30

35

40

45

0 2 4 6 8 10 12 14 16 18 20

PS

NR

(dB

)

Overall channel SNR (dB)

ParCast, double bandwidthParCast + 3dBSoftCast, double bandwidthSoftCast + 3dB

Figure 7: Effects of extra bandwidth.

dom order. Again, all variants use the source aware powerallocation only.

Figure 4 shows that matching alone could improve theperformance by around 3.5 dB even without precoding, andby 8.1 dB with precoding. Note that the bars for ‘withoutmatching’ should not be compared directly across videos orchannels due to their random nature.

There are some interesting similarities between Figure 3and Figure 4. The without precoding, with Hadamard bar inFigure 3 is similar to the without precoding, without matchbar in Figure 4. The former is slightly better due to Hadamardmixing the data for different subcarriers. The with precod-ing, with Hadamard bar in Figure 3 and the with precoding,without match bar in Figure 4 are also similar, but the lattercould be better if the data chunks happen to match subchan-nels on the same subcarrier.

Joint source-channel power allocation. With precod-ing and matching, we next compare ParCast with sourceaware power allocation only and with joint source-channelallocation. Note that, in effect, the source-only allocationin SoftCast assumes all subchannels are equivalent, whichwould correspond to precoding with V S−1, where V and Sare from the SVD of H = U S V H . Therefore, we also con-sider this case, i.e., precoding with V S−1 combined withsource-only allocation. We use the ground-truth CSI forthese experiments, and defer the discussion on the effectsof CSI inaccuracy to the next subsection.

Figure 5 shows that, depending on the channel character-istics, the non-weighted joint power allocation outperformsthe source-only approach by 0.6 to 0.9 dB. However, precod-ing with V S−1 is similar to using H−1, and is especially badfor a link with a large singular value spread. As expected,the trend is the same for both videos for the same channel,since this comparison highlights the benefit from leveragingchannel details.

The non-weighted joint power allocation is most effectiveunder three conditions: (1) the source chunks exhibit a large

energy variation, (2) the MIMO-OFDM channel is very spa-tially correlated and frequency selective, and (3) CSI is fairlyaccurate. The first condition is often met. The latter tworesult in a large variation between subchannel gains, whichis a challenging case for MIMO-OFDM to work well. Thejoint power allocation strategy is thus a way to take advan-tage of otherwise challenging conditions, although the secondcondition is a double-edged sword when CSI is inaccurate.

The SNR weighted approach is most helpful at low SNR byskipping the worst subchannels and the corresponding sourcechunks. Figure 6 shows the number of chunks discarded per4-frame GOP as the channel SNR lowers, assuming there isno spare bandwidth. Recall that a chunk-subchannel pair isdiscarded if its λ/s2 is too low (Section 3.4). This is moreaffected by the chunk energy at high channel SNR, and bythe subchannel gain at low channel SNR.

Extra bandwidth. Finally, we study the PSNR of ParCastand SoftCast when given extra bandwidth. Figure 7 showsthe results for 16-frame mobile and Link (9 12). The trendsfor other videos and links are similar and hence omitted.

The ‘+3dB’ lines show the performance of both schemeswith exact bandwidth but twice the power. SoftCast canonly repeat the same data when given double the bandwidth,which is equivalent to using twice the power but no extrabandwidth. In contrast, ParCast benefits from a subchan-nel diversity gain when given extra bandwidth, although thegain is less pronounced at low SNR when some bad subchan-nels (and source chunks) would be skipped anyway.

5.4 Comparison against alternatives

Overall. We first compare ParCast, Omni-MPEG, and Lay-ered SVC over all stationary 3× 3 links, with precoding anddelayed CSI feedback. This represents the most common sce-nario. Figure 8 shows that ParCast often achieves a muchhigher PSNR than the other schemes, especially for football,which is a very challenging case for traditional video com-pression techniques. We next zoom into individual tracesand video sequences.

Channel dimension. For ideal channels, with equal gainon all subchannels, an N × N MIMO system would scalethe SISO capacity by approximately N times, or reduce thebandwidth requirement by a factor of N . However, this Nfactor is hard to achieve for real channels. In fact, moreantenna may not help if not used optimally. We thereforecompare the performance of all three schemes for 2× 2 and3 × 3 configurations for the same source-destination pair,both with delayed CSI. Note that the bandwidth require-ment for 3× 3 is only 2

3of that for 2× 2 links.

We see in Figure 9 that ParCast normally achieves com-parable performance for both 2× 2 and 3× 3, whether witha large subchannel gain spread or not. This shows that Par-Cast can harness the capacity scaling potential of the ad-ditional channel dimensions. The slight drop from 3 × 3 to2 × 2 is due to different numbers of chunks and how goodthe 2 × 2 part compares with its 3 × 3 counterpart. Lay-ered SVC often benefits from 3 × 3 by being able to use adenser modulation for the enhancement layer than in 2× 2.Omni-MPEG is more sensitive to the channel detail. The2 × 2 channel for (9 11) is nicer than its 3 × 3 counterpartby allowing high rates to be selected at a higher accuracy,and hence the PSNR is much higher for 2× 2 than for 3× 3.

It is interesting to note that ParCast often outperforms

241

0

0.2

0.4

0.6

0.8

1

20 30 40 50 60

CD

F o

ver

links

PSNR (dB)

foreman

SVCMPEGParCast

0

0.2

0.4

0.6

0.8

1

20 30 40 50 60

CD

F o

ver

links

PSNR (dB)

news

SVCMPEGParCast

0

0.2

0.4

0.6

0.8

1

20 30 40 50 60

CD

F o

ver

links

PSNR (dB)

football

SVCMPEGParCast

Figure 8: PSNR for H.264, SVC, and ParCast for all stationary traces.

25

30

35

40

45

50

55

60

65

SVC MPEG ParCast

PS

NR

(dB

)

foreman, (9 11)

2x2

3x3

3x3,w/o precoding

3x3,Ins

25

30

35

40

45

50

55

60

65

SVC MPEG ParCast

PS

NR

(dB

)

news, (9 11)

2x2

3x3

3x3,w/o precoding

3x3,Ins

25

30

35

40

45

50

55

60

65

SVC MPEG ParCast

PS

NR

(dB

)

football, (9 11)

2x2

3x3

3x3,w/o precoding

3x3,Ins

25

30

35

40

45

50

55

60

65

SVC MPEG ParCast

PS

NR

(dB

)

foreman, (9 12)

2x2

3x3

3x3,w/o precoding

3x3,Ins

25

30

35

40

45

50

55

60

65

SVC MPEG ParCast

PS

NR

(dB

)

news, (9 12)

2x2

3x3

3x3,w/o precoding

3x3,Ins

25

30

35

40

45

50

55

60

65

SVC MPEG ParCast

PS

NR

(dB

)

football, (9 12)

2x2

3x3

3x3,w/o precoding

3x3,Ins

Figure 9: Effects of channel dimension and CSI accuracy.

Omni-MPEG by a few dBs, except for the video news. Thisis because the digital modulations used for Omni-MPEGcannot easily achieve the channel capacity. In contrast, theanalog modulation approach in ParCast is very helpful inMIMO-OFDM situations to fully utilize the diverse channelgains without much overhead. Otherwise, the transmitterwould also need to signal to the receiver how to demodulateeach subchannel, and even a FARA [17] like scheme may notwork well. news is fairly compressible and favors traditionalcompression approaches. Even the non-capacity-achieving802.11n like rates suffice for such compressible videos.

CSI accuracy. The main steps of ParCast, precoding,matching, and power allocation, all depend on the SVD ofthe channel matrix, which is sensitive to the CSI accuracy.Therefore, we next assess the effects of CSI accuracy on theoverall performance. We first study the 3 × 3 stationarytraces and then consider the mobile trace in the next part.Figure 9 compares the performance with delayed (the 3× 3bar) and instantaneous CSI (labeled 3× 3, Ins).

For the stationary traces, CSI inaccuracy is mainly causedby estimation noise and precision errors. The inaccuracy inour trace indicates the amount of error margin manageableon commodity NICs.

The issue with delayed CSI is that precoding would notbe able to entirely decorrelate spatial streams. Our data

show that such spatial correlation generally weakens thestrongest spatial subchannel, but strengthens the weakestslightly. The larger the singular value spread, the larger theeffects of inaccurate CSI. This could cause modulation selec-tion to be too optimistic, which affects both Omni-MPEGand Layered SVC. Inaccurate modulation selection couldcause decoding errors. Therefore, the Omni-MPEG perfor-mance is about the same with or without precoding (3 × 3vs 3× 3, w/o precoding in Figure 9).

In contrast, ParCast does not adapt the encoding of thechunk after selecting a subchannel, and the effective sub-channel gain variation would just translate to slight varia-tion in the mean squared error for the pixel. Hence, ParCastcan degrade gracefully at a subchannel level. In particular,since Link (9 11) has a smaller singular value spread, thePSNR difference between ParCast with instantaneous CSI(the ground truth) and delayed CSI is only about 0.1 dBregardless of the video. For Link (9 12), the PSNR drop forParCast due to inaccurate CSI is between 1.4 and 1.8 dB forthe three videos. The bulk of that is due to precoding. Ifwe use the ground-truth CSI for matching and power allo-cation, then precoding with delayed CSI causes ParCast tolose 1.1 to 1.4 dB in PSNR. The matching step alone losesabout 0.2 dB in a similar comparison. SNR weighted powerallocation alone is hardly affected, but can interact with the

242

VideoNo delay 10ms 100ms 1s 5s

Ours MPEG Ours MPEG Ours MPEG Ours MPEG Ours MPEG

foreman 48.9 46.8 43.6 43.5 42.2 43.3 41.8 43.2 41.4 42.7mobile 42.8 40.5 37.4 35.6 36.0 35.3 35.6 35.2 35.2 34.6news 50.5 52.2 44.8 48.4 43.2 48.2 42.7 48.1 42.3 47.5bus 45.9 42.7 40.1 37.7 38.7 37.3 38.3 37.1 37.7 37.1football 50.6 43.7 44.7 38.4 43.2 38.1 42.8 37.9 42.3 37.1

Table 1: PSNR (dB) for videos given different CSI delay, ParCast vs MPEG.

precoding step adversely to cause more PSNR degradation.In particular, the non-weighted joint power allocation evalu-ated previously can be worse than the source-only allocation.

As we saw in the previous subsection, matching chunkwith subchannel is more important than additional channel-aware power allocation. Therefore, even without very ac-curate CSI, matching alone could still help as long as therelative subchannel gain order is not affected. Similarly,matching the base layer with a good subchannel also helpsLayered SVC, but different layers carry different number ofbits and require different modulations. Omni-MPEG over802.11n means that the implicit unequal protection achievedin the entropy coding stage (Section 2) is not aligned withthe channel, even with MIMO precoding. What does help isthat precoding reduces the difference between the SNRs onthe corresponding spatial subchannel across subcarriers.

Even though delayed CSI makes precoding less effective,it is still better than not precoding at all for both LayeredSVC and ParCast. Without precoding, each received signalon an antenna has interference from other spatial streams.Reducing such interference could significantly improve thedecoding performance.

Mobility. We next use ParCast and Omni-MPEG to de-liver 200-frame video sequences over the mobile trace3. In-accurate CSI is a main feature of the trace, but likely dueto changing channel as opposed to estimation noise. If thechannel itself changed too much, the matching would bewrong in ParCast, and power allocation based on the wrongmatching would hurt further. We consider CSI delay peri-ods of 0 (No delay), 1 packet (10 ms), 10 packets (100 ms),100 packets (around 1 s), and 500 packets (around 5 s). Tri-als suggest that the joint power allocation in ParCast andprecoding for Omni-MPEG are sometimes too aggressive forthe mobility case. Therefore, we only use the source-awarepower allocation for ParCast4 and no precoding for MPEG.

Table 1 summarizes the results. The error margin is about±0.6 dB for MPEG and ±0.9 dB for ParCast for any delayperiod. We see that the main difference is again betweenwhether there is delay. Omni-MPEG degrades less with CSIdelay, though it cannot benefit from more accurate CSI in-formation. ParCast performance degrades slightly more.

If we compare the ParCast PSNRs for 1s delay with nodelay, matching alone incurs about 1 dB loss, and the re-maining loss can be attributed to inaccurate precoding. Thebulk of the drop is likely caused by the estimation error onthe Intel NICs. Sora based experiments show PSNR degra-dation of 1 to 2 dB with delayed CSI.

6. DISCUSSION AND FUTURE WORK

Limitation of analog modulation. Analog modulation

3Only 148 frames for bus, because the sequence is shorter.4PSNRs with the SNR weighted joint power allocation areabout 1 dB lower.

lacks error-detection or correction capabilities. However,several design decisions in ParCast can mitigate these.

First, ParCast is designed for heterogeneous MIMO-OFDMchannels. Our results suggest analog modulation can workwith MIMO-OFDM in complementary ways. At low SNR,MIMO-OFDM provides better reliability and error resiliencevia spatial and frequency diversity. At high SNR, analogmodulation is an easy mechanism to harness the channel ca-pacity. The latter is more useful a setting for MIMO forgetting higher rates, which motivates HD video streaming.

Second, ParCast encodes video frames independently, sothat the distortion for any GOP does not affect other GOPs.Distributed video coding has a similar property and couldbe an alternative. If instead, the video frames are encodedbased on one another as the B- and P-frames in MPEG-4,errors might accumulate over frames due to the distortion inreconstructing the reference frame.

Third, the DCT and precoding steps in ParCast aim toremove correlation between any source chunks and betweensubchannels. Without either, there would interference be-tween received signals. The LLSE decoder would need tofirst cancel out the inter-signal interference during decodingand incur performance loss.

Source division. There are several other ways of dividingthe source into layers to assign to subchannels, such as usingthe I-, B-, and P-frames in MPEG. The number of framesof each type is very source dependent and the size of differ-ent frame types can differ based on channel conditions. Incontrast, the source coding stage of ParCast and SoftCast di-vides source independent of the source detail. Furthermore,ParCast allocates the same number of symbols to all sub-channels, while the amount of decodable information natu-rally scales with the subchannel SNR. Therefore, the symbolmodulation is also independent of the channel.

Source-channel dissimilarities. Since there is still dis-parity between the distribution of DCT coefficient energyand the subchannel gains as shown in Figures 1 and 2, theremight still be more residual redundancy at the source than iseffectively used at the channel. Moreover, 3D-DCT does notfully capture motion in the video. Further refinement to the3D-DCT or LLSE could improve the overall performance.However, the former should be designed with the subchan-nel characteristics in mind similar to the joint source-channelpower allocation step, whereas the latter could be more flex-ible based on actual channel conditions.

7. RELATED WORKParCast is related to work on unequal error protection,

joint source-channel coding, and graceful degradation.

Unequal error protection (UEP) overMIMO-OFDM.As mentioned earlier, unequal error protection is often im-plicit in entropy coding. There have also been many explicitUEP schemes specifically for MIMO or MIMO-OFDM chan-

243

nels, for example, involving more forward error correction forthe worst subchannels [20].

The fine-grained scalable (FGS) bit-stream extension ofMPEG-4 can be used to generate many layers for differentMIMO-OFDM subchannels [14]. However, FGS is rarelyused in practice, and it assumes unequal digital modulationacross subchannels, which is hard to implement in practice.Another proposal [26] is similar in principle, though with farfewer, coarse-grained layers and only the subchannels of anarrowband MIMO channel. Together with the scheme westudied earlier [25], these schemes mainly differ in how theydivide the source into layers and the granularity of layers andsubchannels used. Nevertheless, common to all, the layersare not independent, and an incorrectly decoded base layerwould render all other layers useless.

ParCast is similar in matching importance source datawith high-gain subchannels, but does so at a fine granularityand for independent source components. Unlike in any SVCbased schemes, the importance and implicit redundancy lev-els of the source components in ParCast are aligned, i.e., themost important component also embeds the highest level ofredundancy. Therefore, the power allocation process actu-ally tries to better protect the less important components.

Graceful degradation. Hierarchical modulation is a com-mon digital approach to mitigate glitches and reduce the cliffeffects [21, 9]. It can be combined with multiple antennas,as in [16, 31]. These work typically focus on how to correctlydecode part of the complex symbol.

Apex [23] recognizes that the bits in a regular QAM sym-bol have different error probability, and allocates importantdata to the bits with lower error probability.

Although ParCast is not explicitly optimized for grace-ful degradation, its linear codec performance degrades withnoise in a manner similar to SoftCast.

Joint source-channel coding. Besides SoftCast and Flex-Cast, there have been a number of designs for lossy single-antenna channels (e.g., [24, 30, 27]), though few for MIMO.ParCast follows the same motivation to jointly consider sourcecompression and error resilience at the channel, but specifi-cally designs a scheme to leverage the properties of both thesource and the MIMO-OFDM channel with a single code.

8. CONCLUSIONParCast is motivated by non-uniform energy distribution

at both the source and the channel. It sends independentsource components on independent subchannels, with themore important components on higher-gain subchannels, jointsource-channel power allocation, and analog modulation tooptimally leverage source redundancy for error protectionat the channel. We show that such a design can signifi-cantly outperform conventional digital schemes by turningchallenging MIMO-OFDM links into opportunities.

9. ACKNOWLEDGMENTSWe sincerely thank Chong Luo, Ji-Zheng Xu, and the

anonymous reviewers for invaluable feedback.

10. REFERENCES[1] http://trace.eas.asu.edu/yuv/.

[2] ftp://ftp.tnt.uni-hannover.de/pub/svc/testsequences/.

[3] H.264/AVC JM Reference Software. http://iphome.hhi.de/suehring/tml/.

[4] High-Definition Wireless. AMIMON-WHDI TechnologyOverview. http://www.amimon.com/technology.shtml.

[5] IEEE Std. 802.11n-2009: Enhancements for Higher Throughput.

[6] SVC Reference Software. http://ip.hhi.de/imagecom_G1/savce/downloads/.

[7] Cisco visual networking index: Forecast and methodology2010-2015. http://www.cisco.com/en/US/solutions/collateral/ns341/ns525/ns537/ns705/ns827/white_paper_c11-481360.pdf,June 2011.

[8] S. Aditya and S. Katti. FlexCast: Graceful wireless videostreaming. In ACM MobiCom, 2011.

[9] B. Barmada, M. Ghandi, E. Jones, and M. Ghanbari. Prioritizedtransmission of data partitioned H. 264 video with hierarchicalQAM. IEEE Signal Processing Letters, 12(8):577–580, 2005.

[10] R. K. W. Chan and M. C. Lee. 3D-DCT quantization as acompression technique for video sequences. In International

Conference on Virtual Systems and MultiMedia, 1997.

[11] D. Halperin, W. Hu, A. Sheth, and D. Wetherall. Predictable802.11 packet delivery from wireless channel measurements. InACM SIGCOMM, 2010.

[12] G. Hardy, J. Littlewood, and G. Polya. Inequalities, (2nd ed.),Cambridge Mathematical Library, 1989.

[13] S. Jakubczak and D. Katabi. A cross-layer design for scalablemobile video. In ACM MobiCom, 2011.

[14] Z. Ji, Q. Zhang, W. Zhu, Z. Guo, and J. Lu. Power-EfficientMPEG-4 FGS Video Transmission over MIMO-OFDM Systems.In IEEE ICC, 2003.

[15] K. Lee and D. Petersen. Optimal linear coding for vectorchannels. IEEE Trans. on Communications, 24(12):1283–1290,1976.

[16] Y. Noh, H. Lee, and I. Lee. Design of Unequal error protectionfor MIMO-OFDM systems. In VTC, 2005.

[17] H. Rahul, F. Edalat, D. Katabi, and C. G. Sodini.Frequency-aware rate adaptation and MAC protocols. In ACMMobiCom, 2009.

[18] H. Rahul, H. Hassanieh, and D. Katabi. SourceSync: ADistributed Wireless Architecture for Exploiting SenderDiversity. In ACM SIGCOMM, 2010.

[19] I. Richardson. H. 264 and MPEG-4 video compression,volume 20. Wiley Online Library, 2003.

[20] M. F. Sabir, R. Heath, and A. Bovik. An unequal errorprotection scheme for Multiple Input Multiple Output systems .In Asilomar, 2002.

[21] A. Schertz and C. Weck. Hierarchical modulation-thetransmission of two independent DVB-T multiplexes on a singlefrequency. EBU Technical review, 2003.

[22] H. Schwarz, D. Marpe, and T. Wiegand. Overview of thescalable video coding extension of the H. 264/AVC standard.IEEE Trans. on Circuits and Systems for Video Technology,17(9):1103–1120, 2007.

[23] S. Sen, S. Gilani, S. Srinath, S. Schmitt, and S. Banerjee.Design and Implementation of an “Approximate”Communication System for Wireless Media Applications. InACM SIGCOMM, 2010.

[24] M. Skoglund, N. Phamdo, and F. Alajaji. HybridDigital-Analog Source-Channel Coding for BandwidthCompression/Expansion. IEEE Trans. on Information Theory,52(8):3757–3763, 2006.

[25] D. Song and C. Chen. Maximum-throughput delivery ofSVC-based video over MIMO systems with time-varyingchannel capacity. Journal of Visual Communication andImage Representation, 19(8):520–528, 2008.

[26] D. Song and C. W. Chen. Scalable H.264/AVC VideoTransmission Over MIMO Wireless Systems With AdaptiveChannel Selection Based on Partial Channel Information. IEEETrans. on Circuit Systems and Video Techonology, 17(9),2007.

[27] M. Stoufs, A. Munteanu, J. Barbarien, J. Cornelis, andP. Schelkens. Optimized scalable multiple-description codingand FEC-based joint source-channel coding: A performancecomparison. In IEEE 10th Workshop on Image Analysis for

Multimedia Interactive Services, 2009.

[28] K. Tan, J. Zhang, J. Fang, H. Liu, Y. Ye, S. Wang, Y. Zhang,H. Wu, W. Wang, and G. M. Voelker. Sora: High PerformanceSoftware Radio Using General Purpose Multi-core Processors.In NSDI, 2009.

[29] D. Tse and P. Viswanath. Fundamentals of Wireless

Communication. Cambridge University Press, 2005.

[30] Q. Xu, V. Stankovic, and Z. Xiong. Distributed jointsource-channel coding of video using Raptor codes. IEEEJSAC, 25(4):851–861, 2007.

[31] G.-H. Yang, D. Shen, and V. Li. Unequal Error Protection forMIMO Systems with a Hybrid Structure. In ISCAS, 2006.

244

http://trace.eas.asu.edu/yuv/

ftp://ftp.tnt.uni-hannover.de/pub/svc/testsequences/

http://iphome.hhi.de/suehring/tml/

http://iphome.hhi.de/suehring/tml/

http://www.amimon.com/technology.shtml

http://ip.hhi.de/imagecom_G1/savce/downloads/

http://ip.hhi.de/imagecom_G1/savce/downloads/

http://www.cisco.com/en/US/solutions/collateral/ns341/ns525/ns537/ns705/ns827/white_paper_c11-481360.pdf

http://www.cisco.com/en/US/solutions/collateral/ns341/ns525/ns537/ns705/ns827/white_paper_c11-481360.pdf

Documents

ParCast: Soft Video Delivery in MIMO-OFDM WLANsqifan/papers/parcast.pdf · ParCast: Soft Video Delivery in MIMO-OFDM WLANs Xiao Lin Liu∗, Wenjun Hu†, Qifan Pu†∗, Feng Wu†,