13
Robust mobile video streaming in a peer-to-peer system Jeonghun Noh a,n , Bernd Girod b a ASSIA Inc., 333 Twin Dolphin Drive, Redwood City, CA, USA b Stanford University, 350 Serra Mall, Stanford, CA, USA article info Available online 16 February 2012 Keywords: Peer-to-peer streaming Mobile users Robust transmission Transcoding Distributed multimedia abstract In a peer-to-peer (P2P) video streaming system, peers not only consume video, but also route it to other peers in the system, where ordinary peers are assumed to have sufficient downlink speed and media capability. This assumption often fails when the P2P system consists of peers that are heterogeneous in their computing power, hardware, and media capability. In this paper, we address a problem of streaming video to mobile devices, which are less capable than ordinary peers. In order to stream video to mobile devices, transcod- ing is often required to render video suitable for their small display, limited downlink speed, and limited video decoding capability. However, performing transcoding at a single peer is vulnerable to peer churn, which leads to video disruption. We propose interleaved distributed transcoding (IDT), a robust video encoding scheme that allows peers more capable than mobile devices to perform transcoding in a collaborative fashion. IDT is designed in such a way that transcoded substreams are assembled into a single video stream, which can be decoded by any H.264/AVC baseline profile compliant decoder. Extensive simulations and its implementation in a real P2P system demon- strate that the proposed scheme not only reduces computational load at a peer, but also achieves robust streaming in the case of peer failure or packet loss due to adverse wireless channel conditions. We confirm this finding by analyzing the effect of distributed transcoding under peer failure. & 2012 Elsevier B.V. All rights reserved. 1. Introduction The peer-to-peer (P2P) architecture is a promising cost- effective solution to live video streaming to a large popula- tion. The P2P systems are self-scalable as peers not only consume video contents, but also relay them to other peers who want the same contents. Although peers may exhibit heterogenous downlink and uplink bandwidths due to the access networks that peers connect to, it has been often assumed that every peer in the system wants and/or is able to consume the original video transmitted from the video source. In the last decade, mobile devices, such as smart phones and Personal Digital Assistants (PDAs), have become ubiquitous in our daily lives, expediting the heterogeneity among multimedia-capable devices. As most of the mobile devices are equipped with small displays or have access to the Internet with limited download speed, live media adaptation is performed to meet the streaming requirements of heterogeneous mobile users. For video, media adaptation is often achieved by video transcoding [1–4]. For mobile devices, the original video is converted to a new bitstream for a different video standard, smaller spatial resolution, reduced frame rate, or reduced quality (due to coarser quantization). However, transcoding poses a considerable computational burden on the streaming server because mobile devices may require individually customized transcoding. Contents lists available at SciVerse ScienceDirect journal homepage: www.elsevier.com/locate/image Signal Processing: Image Communication 0923-5965/$ - see front matter & 2012 Elsevier B.V. All rights reserved. doi:10.1016/j.image.2012.02.014 n Corresponding author. E-mail addresses: [email protected] (J. Noh), [email protected] (B. Girod). Signal Processing: Image Communication 27 (2012) 532–544

Robust mobile video streaming in a peer-to-peer system

Embed Size (px)

Citation preview

Page 1: Robust mobile video streaming in a peer-to-peer system

Contents lists available at SciVerse ScienceDirect

Signal Processing: Image Communication

Signal Processing: Image Communication 27 (2012) 532–544

0923-59

doi:10.1

n Corr

E-m

bgirod@

journal homepage: www.elsevier.com/locate/image

Robust mobile video streaming in a peer-to-peer system

Jeonghun Noh a,n, Bernd Girod b

a ASSIA Inc., 333 Twin Dolphin Drive, Redwood City, CA, USAb Stanford University, 350 Serra Mall, Stanford, CA, USA

a r t i c l e i n f o

Available online 16 February 2012

Keywords:

Peer-to-peer streaming

Mobile users

Robust transmission

Transcoding

Distributed multimedia

65/$ - see front matter & 2012 Elsevier B.V. A

016/j.image.2012.02.014

esponding author.

ail addresses: [email protected] (J. Noh),

stanford.edu (B. Girod).

a b s t r a c t

In a peer-to-peer (P2P) video streaming system, peers not only consume video, but also

route it to other peers in the system, where ordinary peers are assumed to have

sufficient downlink speed and media capability. This assumption often fails when the

P2P system consists of peers that are heterogeneous in their computing power,

hardware, and media capability.

In this paper, we address a problem of streaming video to mobile devices, which are

less capable than ordinary peers. In order to stream video to mobile devices, transcod-

ing is often required to render video suitable for their small display, limited downlink

speed, and limited video decoding capability. However, performing transcoding at a

single peer is vulnerable to peer churn, which leads to video disruption. We propose

interleaved distributed transcoding (IDT), a robust video encoding scheme that allows

peers more capable than mobile devices to perform transcoding in a collaborative

fashion. IDT is designed in such a way that transcoded substreams are assembled into a

single video stream, which can be decoded by any H.264/AVC baseline profile compliant

decoder. Extensive simulations and its implementation in a real P2P system demon-

strate that the proposed scheme not only reduces computational load at a peer, but also

achieves robust streaming in the case of peer failure or packet loss due to adverse

wireless channel conditions. We confirm this finding by analyzing the effect of

distributed transcoding under peer failure.

& 2012 Elsevier B.V. All rights reserved.

1. Introduction

The peer-to-peer (P2P) architecture is a promising cost-effective solution to live video streaming to a large popula-tion. The P2P systems are self-scalable as peers not onlyconsume video contents, but also relay them to other peerswho want the same contents. Although peers may exhibitheterogenous downlink and uplink bandwidths due to theaccess networks that peers connect to, it has been oftenassumed that every peer in the system wants and/or is ableto consume the original video transmitted from the video

ll rights reserved.

source. In the last decade, mobile devices, such as smartphones and Personal Digital Assistants (PDAs), havebecome ubiquitous in our daily lives, expediting theheterogeneity among multimedia-capable devices. As mostof the mobile devices are equipped with small displays orhave access to the Internet with limited download speed,live media adaptation is performed to meet the streamingrequirements of heterogeneous mobile users. For video,media adaptation is often achieved by video transcoding

[1–4]. For mobile devices, the original video is converted toa new bitstream for a different video standard, smallerspatial resolution, reduced frame rate, or reduced quality(due to coarser quantization). However, transcoding posesa considerable computational burden on the streamingserver because mobile devices may require individuallycustomized transcoding.

Page 2: Robust mobile video streaming in a peer-to-peer system

J. Noh, B. Girod / Signal Processing: Image Communication 27 (2012) 532–544 533

In this paper, we propose a P2P-based mobile stream-ing that leverages the computing resources contributedby peers. Henceforth, we refer to personal computers orset-top boxes as fixed nodes, as opposed to mobile nodes.Fixed nodes usually have continuous power supply, sothere is no risk of exhausting their power source. Byharnessing the processing power of the fixed nodes, thetranscoding burden of the servers can be reduced oreliminated. Due to their limited resources (e.g., battery,uplink speed), mobile nodes in the proposed architectureare treated as leeches, i.e., peers that only receive packetsbut do not relay the packets to other peers. Moreover,videos customized for individual mobile devices makerelaying video less appealing.

However, P2P-based streaming systems often sufferfrom peer churn: when peers leave the system withoutprior notice, other peers who are connected to thedeparting peers may experience temporary video disrup-tion and/or disconnection from the system. To addressadversarial effects due to peer churn, the interleaveddistributed transcoding scheme introduced in our pre-vious work [5] allows more than one fixed node toperform transcoding for a mobile node. Even if a mobiledevice loses some of its parents, it may still receivesubstreams from the other parents. Then, the proposedscheme allows the mobile device to decode the incomingvideo partially, thereby achieving robust video transmis-sion. In addition, our scheme distributes transcodingoverhead to multiple fixed nodes. We design the distrib-uted transcoding scheme in a way that conforms to theH.264/AVC baseline profile. This allows any H.264/AVCdecoder to play back the transcoded video. The proposedvideo codec was integrated with the Stanford Peer-to-Peer Multicast system (SPPM) [6].

The remainder of the paper is structured as follows.Section 2 surveys the prior work in mobile streamingsystems, transcoding, and robust video transmission.Section 3 describes the proposed distributed transcodingand its implementation. Section 4 introduces SPPM anddiscusses the integration of IDT with the original SPPM. InSection 5, we analyze the effect of peer churn on dis-tributed transcoding. In Section 6, we provide simulationresults and compare the performance of IDT with MDC(Multiple Description Coding).

2. Related work

2.1. Mobile streaming systems

In typical streaming systems, video transcoding isperformed at servers to adapt video to mobile users. In[7,8], server-based adaptive video transcoding systems formobile users are proposed. A video proxy, located at theedge of two or more networks, adaptively transcodesvideo, considering the network conditions and constraintsof mobile users in the GPRS network. The authors of [9]also propose a server-based video transcoding scheme formobile users. The client proxy collects mobile clientprofiles and requests transcoding to the video proxy.Transcoded videos are then sent back to the client proxy,and the client proxy relays them to each client. The

authors of [10] compare solutions that extend IP multicastto support mobile users. Since IP multicast trees areconstructed at the network layer, these solutions oftenfocus on the reduction of the overhead associated withthe reconfiguration of the trees due to the mobility ofmobile users.

Although server-based systems work reasonably for amoderate number of users, transcoding servers can easilybecome a bottleneck because transcoding poses a con-siderable computational burden. Thus, P2P-based mobilestreaming has been actively studied in order to reduceserver load and increase service capacity. The authors in[11] investigated P2P-based video-on-demand (VoD)mobile streaming. In the system they proposed, themobile user establishes a one-to-one connection to aproxy peer chosen in the P2P network. Through the proxypeer, the mobile user searches for a pre-encoded videostored somewhere in the P2P network and downloads it.However, there is no discussion in [11] on the needs forvideo transcoding and error resilience in the case of proxypeer failure. The contribution [12] also studied P2P-basedVoD streaming. When a video is downloaded from severalsource peers, media transcoding is performed by multiplepeers to meet the requirements of the destination peer. Tocombat peer churn, the video quality is adapted by thetranscoders based on the feedback from the destinationpeer. However, no discussion is provided regarding theway video transcoding is performed at multiple peers andhow distributed transcoding and peer churn affects thevideo quality quantitatively. In [13], a fully collaborativeP2P streaming system is proposed. Instead of all users,only a few mobile users pull a video from the video serverthrough base stations. The pulled video is then sharedwith other neighboring users via a free broadcast channel,such as Wi-Fi or Bluetooth. The proposed system is shownto scale with the number of users, thereby reducingcellular bandwidth usage. However, it works well onlywhen a sufficient number of users watching the samevideo belong to the same base station, and the neighbor-ing users are physically close to each other to lessenpower consumption incurred while the video is beingrelayed among peers.

2.2. Video transcoding

Video transcoding converts an original video bitstreamto a new bitstream for a target video player [1–4]. In [14],temporal transcoding, by which video frames are skippedand the motion vectors in the following frame aremodified in order to reduce the bitrate, is conductedaccording to a channel condition. For rate control, rate-distortion-based frame transcoder [15] and frame skip-ping transcoders with motion vector update [16,8] havebeen proposed. A centralized transcoding algorithm isproposed in [17,18] for multi-user video streaming over awireless LAN. In [17,18], the rate of transcoded videobitstream is dynamically adjusted by the central server inorder to maximize the rate-distortion performance byconsidering the video characteristics and network condi-tions. Scalable video coding has its obvious merits whenan encoded video signal needs to be transmitted to mobile

Page 3: Robust mobile video streaming in a peer-to-peer system

J. Noh, B. Girod / Signal Processing: Image Communication 27 (2012) 532–544534

users with different bitrates or reception capabilities and/or the encoding cannot be done or is not economicallyviable for each and every receiver [19,20].

2.3. Robust video transmission

Feedback-based error control is a very effectiveapproach to robust video transmission when a feedbackchannel is available and the extra delay incurred by suchschemes is permissible. In this approach, a sender retrans-mits a lost or damaged packet when the sender receives afeedback message from the receiver [21–23]. In a moresophisticated framework known as RaDiO (Rate-Distor-tion Optimization) [24], proactive packet scheduling isperformed to determine when and which packet to sendto maximize the expected video quality at the receiver. Ina related approach named CoDiO (Congestion-DistortionOptimization) [25], self-congestion due to retransmissionis taken into account for packet scheduling. Our priorwork in [26] extended CoDiO to tree-based P2P videostreaming.

If the delay required for the feedback-based schemes isprohibitive, forward error correction (FEC) may be per-formed, in which a portion of the total transmitted bitrateis dedicated to parity information which enables thedecoder to correct transmission errors. The most popularchannel code used to provide FEC is the systematic Reed–Solomon code [27], in which packets of parity symbols areappended to the video packets prior to transmission. FEC,however, may suffer from the rapid reduction in picturequality, called the ‘‘cliff effect,’’ when the number ofsymbol errors is too high. To alleviate the cliff effect, theauthors of [28] introduce Systematic Lossy Error Protec-tion (SLEP), which transmits an additional bitstreamgenerated by Wyner-Ziv coding.

Multiple Description Coding (MDC) exploits the avail-ability of multiple paths between the source and thereceiver [29–31]. In MDC, a signal is encoded into multi-ple descriptions that are independently decodable, whichmay lead to better quality if more descriptions areavailable in decoding. The performance of MDC is exam-ined in [31] when applied to CDN, or in the context of P2P[32]. Layered coding and MDC are used together toachieve better system performance in [33–35]. Theauthors of [36] survey adaptive techniques used in videomulticast.

When some packets miss the playback deadline, thereceiver may not be able to decode video frames. Then,the receiver conceals the lost portion of the video based

1 2 3 4 DecodingInterproc

Fig. 1. A cascaded transcoder. An original video stream flows into the decoding

performing frame rate conversion, downsampling, cropping, or any other prep

bitstream.

on several methods. Error-concealment minimizes thenegative impact of late or missing frames as well as thevisual quality degradation caused by the missing frame.Some error concealment schemes use estimation of cod-ing modes and motion vectors [37,38] and spatial orspatio-temporal interpolation [39,40].

3. Interleaved distributed transcoding

In this section, we describe how video transcoding isperformed at one or multiple locations for a mobile nodein the proposed system. We consider a P2P system thatconsists of fixed nodes and mobile nodes; fixed nodes arepeers that receive and consume the original video ema-nating from the video source. Mobile nodes are peers thatcannot receive the original video due to limited downlinkbandwidth, or/and cannot consume the original video dueto limited video decoding capabilities. Thus, it is desirableto perform video transcoding to adapt the original videoto mobile nodes. Fixed nodes perform transcoding toadapt the original video according to the individualrequirements of each mobile node. In this study, we adoptthe cascaded transcoding scheme shown in Fig. 1.

3.1. Interleaved distributed encoding

The multiple parent approach is a popular solution toprovide robustness to peer churn [41,42]. In thisapproach, a peer has multiple peers as parents. When aparent disappears, only a subset of video packets are lost,which allows for graceful video degradation. Thus, wedevise a distributed transcoding scheme that allows multi-ple fixed nodes to perform transcoding for a mobiledevice. A mobile node selects multiple fixed nodes asparents and the parents perform transcoding collabora-tively. A straightforward scheme would be that eachparent transcodes the entire original video and deliversa disjoint substream of it to a mobile node. When one ofits parents disconnects, the mobile node asks for retrans-missions of the missing substream from other parents.However, such a straightforward scheme needlesslywastes computing power at the parents.

To reduce processing redundancy, yet achieve robust-ness with multiple parents, we propose interleaved dis-

tributed transcoding (IDT), which is depicted in Fig. 2. Inthe illustration, K¼4 parents are generating transcodedsubstreams with a Group of Pictures (GOP) of 12 frames(the original GOP size is assumed to be larger than 14 inFig. 2). This illustration demonstrates that the GOP size of

mediateessing Encoding 1 3

unit. The intermediate processing unit transforms the decoded stream by

rocessing before encoding. The output of the encoder is the transcoded

Page 4: Robust mobile video streaming in a peer-to-peer system

1 2 3 4 5 6 7 8 9 10 11 12 14

I1 B2 B3 P4 B5 B6 P7 B8 B9 P10 B11 B12 P13 B14

13

C CC C C C C C

C C C C C C C C C C

C C C C C C C C C

Original video

C CC C C C C C C

I1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12 I1 P2

5 9

6 10

8 12 4

2

Substream 1

Substream 2

1

1

1

1

13

13

13

14

137 113Substream 3

Substream 4

Fig. 2. Interleaved distributed transcoding: an example of four parents (IDT encoders) with a GOP of 12 frames. The original video stream is divided into

four substreams. The first frame of a GOP is encoded as an I frame by all parents. To avoid duplicate transmission, only Parent 1 transmits I frames. The I

frames that are not transmitted are depicted with dotted boundaries. The solid arrows represent coding dependency. The dotted arrows depict the frame

copy operation. The original stream is downsampled before encoding.

J. Noh, B. Girod / Signal Processing: Image Communication 27 (2012) 532–544 535

the transcoded stream can be selected independently ofthe one of the original stream. In transcoding, the originalvideo frames are first decoded. In this illustration, thedecoded bitstream is downsampled to smaller frames inthe spatial domain. The first frame in a GOP is coded as anI frame, and each following frame is coded as a P framepredicted from the frame immediately preceding it in thesubstream. Parent i codes Substream i, which includesFrame i, Kþ i,2Kþ i, . . ., and each parent transmits everyKth frame in a disjoint manner. Parent i then codes theframes other than Frame i, Kþ i,2Kþ i, . . ., as frame copybits. Frame copy bits are independent of the actual videocontent as they are control bits embedded in a com-pressed video signal that instructs a decoder to copy theprevious frame decoded and available in the video buffer.1

This allows error concealment to be done at the bitstreamlevel and ensures that any standard-compliant decoderscan decode the assembled bitstream without having todetect a parent or packet loss by itself. This feature isvideo compression technology dependent, which is avail-able in the H.264/AVC standard and allows content-independent frame copy. For any other video technologiesthat have no such features, similar error concealment canbe implemented accordingly. Note that the I frames areencoded and used in prediction by all parents, yettransmitted by only Parent 1 to avoid duplicate transmis-sion. B frames can be employed within each substream toachieve higher coding gain.

The proposed distributed transcoding scheme achievesrobustness against peer churn and distributes transcodingworkload among multiple fixed nodes. The incurred costis the redundancy in the transcoding bitstream due to

1 As the frame copy bits are independent of contents, it is possible to

reconstruct the control bits at the decoder. When additional complexity

at the decoder is acceptable, removing those control bits can eliminate

redundancies in the substreams.

lower temporal correlation between video frames, whichwill be examined in Section 6 in detail.

3.2. Decoding transcoded video

As substreams generated by multiple parents aretransmitted to the mobile peer, the peer starts an assem-bling and decoding process. When there are no lostframes and every parent is available, the assemblingprocess takes frames from each substream according totheir interleaving order and places them in the final datastream. In particular, the peer interleaves frames accord-ing to their positions in the GOP structure. As the sub-streams are interleaved, the frame copy bits contained inthe other substreams are discarded for frames that aresuccessfully received. For example, the Frames 1, 5, 9 and13 are taken from Substream 1 and used as Frames 1, 5, 9,13 in the final data stream. Other frames that are not usedin the final data stream are then discarded. The assembledbitstream is passed to the decoder for playback.

When a parent disconnects, the corresponding sub-stream becomes unavailable at the mobile node. If Parent1 disconnects, then the mobile node requests missing Iframes from one of the other parents (recall that I framesare encoded at every parent). For the frames of themissing substream, the frame copy bits from the availablesubstream preceding the missing substream are used as areplacement. Fig. 3 depicts one example, where Parent 3disconnects and Substream 3 becomes unavailable. There-fore, Frames 3, 7, and 11 become unavailable to themobile peer. In assembling the final data stream, theframe copies (Frames 2, 6, and 10) in Substream 2, whichcorrespond to Frames 3, 7, and 11 in the missing sub-stream (i.e., Substream 3), are used as Frames 3, 7, and 11in the final data stream. Alternatively, the frame copiesfrom other substreams (e.g., Substream 1 or 4) can also beused to substitute the missing fames in final data stream.

Page 5: Robust mobile video streaming in a peer-to-peer system

1 C C 5C C C C 9 C C C C13

1 2 C CC 6 C C C 10 C C

1 C 3 CC C 7 C C C 11 C

1 C C C4 C C 8 C C C 12

13

13

13

14

C

C

I1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12 I1 P2

Substream 1

Substream 2

Substream 3(not available)

Substream 4

1 2 C 4 5 6 C 8 9 10 C 12 13 14Assembled stream

Fig. 3. Decoding at a mobile node. In this example, one of four substreams, Substream 3, is assumed to be missing due to parent loss. The frame copy bits

from Substream 2 are used in substituting the missing P frames.

J. Noh, B. Girod / Signal Processing: Image Communication 27 (2012) 532–544536

Note that the redundancy of frame copy bits in multi-ple substreams allows the assembled bitstream to becorrectly played back even when more than one bitstreamis missing. Since the assembly of substreams and theselective insertion of frame copy bits are performed atthe bitstream level, we can avoid any modification to thedecoder.

3.3. Implementing IDT

We implement the interleaved distributed transcoding(IDT) scheme in such a way that no decoder modificationis required.2 The IDT generates no B frames and utilizesmultiple reference frames for encoding P frames. Thisensures that any decoder conforming to the H.264/AVCbaseline profile can decode transcoded bitstreams. Werefer to the H.264/AVC encoder that implements theproposed interleaved distributed transcoding as the IDT

encoder. Suppose that K parents are involved in transcod-ing. The IDT encoders at the parents encode the first framein a GOP as an I frame, which is identical across all theencoders. The remaining frames in a GOP are encoded as Pframes. To encode Frame n as a P frame, Frame n�K , thepreviously encoded frame in the same substream, is usedas a reference frame for motion-compensated prediction.Therefore, the IDT encoder is required to store K pre-viously encoded frames. For this, we take advantage of themultiple reference picture motion compensation specified inthe H.264/AVC baseline profile. It allows the short-termreference picture buffer to hold multiple reference pic-tures, in our case, K previously encoded frames.

We also employ the reference picture reordering speci-fied in the H.264/AVC baseline profile to ensure thecorrect frames are used as a reference picture for motionprediction. The H.264/AVC standard provides the P-SKIP

mode. For this coding type, neither a quantized predictionerror signal, nor a motion vector or reference index

2 The IDT encoder based on the x264 encoder is available for

download at http://www.stanford.edu/� jhnoh/IDT_SoftwareDownload.

html.

parameter is transmitted [43]. The reconstructed signalis obtained similar to the prediction signal of a P-16�16macroblock type that references the picture which islocated at index 0 in the reference picture buffer. Themotion vector used for reconstructing the P-SKIP macro-block is similar to the motion vector predictor (MVp) forthe 16�16 block, where MVp is computed as the medianof the motion vectors of the macroblock partitions orsub-partitions immediately above, diagonally above andto the right, and immediately left of the current partitionor sub-partition [44]. Since the P-SKIP mode always refersto the frame located at index 0 in the buffer, we move theprevious frame in a substream to the index 0 location bythe reference picture reordering. As a result, the encoderrefers to only the frame at the same location for predic-tion, although there may be up to K pictures available inthe buffer.

When the IDT encoder encodes every Kth frame, theremaining frames are encoded as an exact copy of thepreviously encoded frame. In Fig. 2, the IDT encoder atParent 1 encodes Frames P2, P3 and P4 as a copy of Frame1. Frame copy encodes frames with negligible computa-tional complexity at the cost of about 1–2% of controlbits3 added to the transcoded video.

4. Stanford Peer-to-Peer Multicast with mobileextension

We implemented the real-time IDT encoder and deco-der, and integrated them into our P2P system, calledStanford Peer-to-Peer Multicast (SPPM) [6]. SPPM is apeer-to-peer system designed for low latency and robusttransmission of live media. Similar to CoopNet [42] andSplitStream [41], SPPM organizes peers in an overlay ofmultiple complementary trees. Every tree is rooted at thevideo source. The media stream, originating from thevideo source, is packetized and distributed on different

3 The length of the control bits is 627 Bytes for a QCIF-resolution

video. The control bits are about 4 kbps for four parents, GOP size¼12

and 30 fps.

Page 6: Robust mobile video streaming in a peer-to-peer system

S

P4

P5

P8

P3

P6

Tree 1

Tree 2

P2

P1

P7

… … Video stream

Fig. 4. A peer-to-peer overlay consisting of two complementary dis-

tribution trees. The encoded video is packetized and split into disjoint

substreams, one for each distribution tree.

J. Noh, B. Girod / Signal Processing: Image Communication 27 (2012) 532–544 537

trees such that there are no duplicate packets across thetrees. Peers subscribe to every tree in order to receive themedia stream contiguously.

4.1. SPPM

Fig. 4 illustrates an example overlay in which twocomplementary trees are constructed among a smallgroup of peers. Typically, we use 4–8 trees, and peergroups might be much larger. We have run experimentswith up to 10,000 peers. As peers join the system, thetrees are incrementally constructed in a distributed man-ner. When a new peer contacts the video source, the videosource replies with session information, such as thenumber of multicast trees and the video bit rate. It alsosends a list of candidate parents randomly chosen fromthe table of participating peers it maintains. The new peerthen probes each candidate parent to know about theircurrent status. After receiving probe replies, the bestcandidate parent is selected and contacted for each treeby minimizing the height of the distribution tree. Oncethe selected candidate parent accepts the attachmentrequest, a data connection is established between theparent and the new peer. After data transmission starts,each child peer periodically sends hello messages to theirparents. When a peer leaves the system ungracefully, itsparents detect it by observing consecutive missing hello

messages, and stop forwarding video to the child. Thedeparting peer’s children notice that neither video pack-ets nor a response to hello messages arrive. Each aban-doned child then initiates procedure to connect to adifferent parent node in the tree.

To provide the best video quality despite congestionand peer churn, SPPM employs sender-driven packetscheduling [26] in conjunction with receiver-drivenretransmission requests [45]. Acting as a relay, each peer

schedules the next packet to forward by comparing theimportance of packets in its output queue. As a receiver,each peer evaluates the importance of missing packets bycomputing the expected video quality improvement if thecorresponding frame is received before its playout dead-line. Retransmission requests of selected missing packetsare then sent to alternative parents. Since packet lossestypically occur in rare bursts, such feedback error controlis generally superior to forward error control. Feedbackimplosion, prohibiting feedback error control in network-layer multicast, is not an issue for a P2P overlay becauseretransmissions are handled locally between a parent anda child peer.

4.2. Mobile extension

We implemented the IDT encoder for the SPPM fixedpeer, which performs distributed transcoding. The SPPMmobile peer was developed on the Nokia N96 platform[46]. The original SPPM protocol was extended to allowthe fixed and mobile peers to communicate with eachother as well as with the video source.

4.2.1. SPPM fixed peer

As a fixed peer, the SPPM peer is extended to include amobile child manager, a decoder, an intermediate proces-sing unit, a set of encoders, a packetizer, and a queue fortranscoded video, as shown in Fig. 5(a). The mobile childmanager creates, maintains, and terminates a service for amobile peer. When a new mobile peer requests a connec-tion, it creates a dedicated encoder. The video unitdecodes the original video stream. The decompressedvideo is stored in the Raw Video Buffer. Due to packet lossor internal errors in decoding, some frames may bemissing in the buffer. Each frame in the buffer is taggedwith both the global Group-of-Picture (GOP) ID and localframe ID. These metadata are tagged in the video packetsand referred to by the interleaver at the mobile peer. Theintermediate processing unit is used for modifying rawframes before they are passed into the encoder. Resizing(e.g., down-sampling or changing the ratio of the numberhorizontal and vertical pixels), cropping, and/or framerate reduction are performed at this unit. For each mobilepeer, a different intermediate processing may be per-formed. The encoder encodes the modified raw videosignal. For each mobile peer, a different set of encodingparameters may be used, such as the GOP size, encodingstandard (e.g., H.263 or H.264/AVC), and the total numberof parents. The packetizer generates video packets fornetwork transmission. It also adds metadata to packets,such as packet ID, GOP ID, Substream ID, and a timestamp.The queue for transcoded video is used to store videopackets. Packets in the queue are passed to the NIC(Network Interface Card) for transmission.

4.2.2. SPPM mobile peer

The mobile peer consists of the input/output NIC, acontrol unit, a video unit, and an interleaver. The controlunit processes incoming control messages and generatesresponse messages or new control messages to triggerevents, such as parent coordination or rejoin after

Page 7: Robust mobile video streaming in a peer-to-peer system

Fig. 5. (a) Extension for SPPM fixed peer. (b) SPPM mobile peer.

J. Noh, B. Girod / Signal Processing: Image Communication 27 (2012) 532–544538

detecting a missing parent. The component diagram isshown in Fig. 5(b).

The video unit requests the next video packets andprocesses incoming video packets. Each substream is abitstream of an H.264/AVC video signal. The video unitdivides a substream into frames by detecting their bound-aries (e.g., NAL unites). It also marks each frame withrelevant metadata, such as POC (Picture Order Count),GOP ID, Frame ID, and the Substream ID. The video unitsignals the control unit for retransmissions of missing Iframes. It also detects parent disconnect or evaluates thedownload channel status, and signals the control unit. Theinterleaver assembles the frames of the substreams into asingle bitstream. When there is a missing frame due to amissing packet or missing substream, corresponding copyframe control bits are substituted in order to concealmissing frames. The interleaver discards old frames orunnecessary frames in the queues for the substreams. Itstarts assembling after a time-out. The time-out interval

can be adjusted so that the initial buffering time and thepacket reception ratio (excluding late or missing packets)are balanced.

4.2.3. Protocol extension

In order to join the system, a mobile peer first contactsthe video source and sends the video source information,such as its device type, properties (e.g., display size,multimedia codecs), maximum downlink speed. Thevideo source replies with information, such as the bitrateof the original video and peer list. The mobile peer thencollects information from parent candidates, such as theavailable bandwidth, depth in the overlay tree, Frame IDof the recently decoded frame. Based on the collectedinformation, the peer selects several parent candidates toconnect to. When the mobile peer sends a connectionrequest to the parent candidates it selects, it assigns theSubstream ID for them. Note that the parent candidateshave no need to directly communicate with the others.

Page 8: Robust mobile video streaming in a peer-to-peer system

Table 1The average fraction of time during which a particular number of

parents of a mobile node is missing. The total number of parents is set

to 4. l=m is the ratio of the average parent lifetime to the average

recovery time.

l=m Number of missing parents

0 1 2 3 4

10 0.683 0.273 0.041 0.003 0

30 0.877 0.117 0.006 0 0

50 0.924 0.074 0.002 0 0

Table 2The average fraction of time during which a particular number of

parents is missing given a different total number of parents (a¼ l=m is

J. Noh, B. Girod / Signal Processing: Image Communication 27 (2012) 532–544 539

The connection request message also contains the down-sampling ratio, total number of parents, video qualityparameter QP, GOP size, global frame ID used to synchro-nize parents. Once the mobile peer connects to its parents,it continues to exchange messages with its parents forstatus report or change in settings. When a parent failureis detected by noticing no video reception, the mobilepeer triggers the rejoin procedure. If Parent 1, who isresponsible for I-frames, fails, the mobile peer selects oneof the other parents as Parent 1. The rest of the rejoinprocedure is similar to the rejoin procedure of the basicSPPM protocol. When a mobile peer failure is detected bynoticing no hello messages, the fixed peer discards themobile peerOs information and terminates the dedicatedencoder.

5. Peer churn analysis

In this section, we discuss how distributed transcodingmitigates the effect of peer churn. For simplicity, we assumethat a parent node’s lifetime is exponentially distributed,with an average of 1=mseconds. We further assume that amissing parent node can be replaced with another node in arecovery time that is also exponentially distributed, with anaverage of 1=l seconds. Each parent node can fail indepen-dently from any other parent, and the mobile node’srecovery process of finding another parent is independentfrom the state of all other parents. We consider a Markovchain, where a state represents the number of live parents[47]. Fig. 6 illustrates a state-transition-rate diagram, whereK represents the total number of parents.

When a parent arrives, the system state jumps fromState i to iþ1. For a transition from State i to iþ1(0r irK�1), K�i concurrent recovery processes are per-formed; the transition rate is therefore ðK�iÞl. When aparent leaves, the system state jumps from State i to i�1.For a transition from State i to i�1 (1r irK), i parents arealive and have the same failure rate of m; the transitionrate is therefore im. We can now compute the stationarystate probability distribution of this Markov chain byusing the following relationship:

lpðK�1Þ ¼ KmpK , ð1Þ

XK

i ¼ 0

pi ¼ 1, ð2Þ

where probability flow conservation and probability conser-

vation apply. By expressing pk (ka0) with p0, we obtain

p0 ¼XK

i ¼ 0

K

i

� �lm

� �i( )�1

¼lmþ1

� ��K

, ð3Þ

Fig. 6. State-transition-rate diagram for a mobile node with K parents.

1=l is the average recovery time. 1=m is the average parent lifetime.

pi ¼lmþ1

� ��K K

i

� �lm

� �i

, 0r irK : ð4Þ

Note that the binomial theorem is applied in Eq. (3).We can relate the states depicted in Fig. 6 with theeffective frame rate of the decoded video. Suppose that f

is the frame rate of the transcoded video. When themobile node is in State K�1, then one substream ismissing. With copy error concealment, previouslydecoded frames are displayed in lieu of the frames ofthe missing substream. Thus, the effective frame ratebecomes ððK�1Þ=KÞf (no packet loss over the wirelesschannel is assumed for this discussion). For instance,when K¼2, if the mobile node is in State 1, then theeffective frame rate is 0.5f. When K¼4, then if the mobilenode is in State 3, the effective frame rate is 0.75f.

Table 1 shows the average fraction of time duringwhich a particular number of parents (substreams) of amobile node are missing under different peer churn rates.l=m, denoted by a, indicates the ratio of the parentaverage lifetime to the parent average recovery time. Asa increases, the state probability of State K (no parents aremissing) also increases and the state probability of all theother states decreases. This is obvious because the mobilenode is more likely to have all parents alive when therecovery time is shorter or the average lifetime is longer.

In Table 2, the effect of peer churn on the number ofmissing parents is shown with the total number ofparents K varied from 0 to 4. As a mobile node connectsto more parents, the probability that all parents aremissing approaches 0, whereas the probability that atleast one parent is missing increases (when a is large, theprobability of missing one parent (State K�1) increaseslinearly as K increases). Note that regardless of K, theaverage fraction of missing frames is a constant 1=ðaþ1Þ;

set to 30).

Total number

of parents

Number of missing parents

0 1 2 3 4

1 0.968 0.032 N/A N/A N/A

2 0.936 0.062 0.002 N/A N/A

3 0.906 0.091 0.003 0 N/A

4 0.877 0.117 0.006 0 0

Page 9: Robust mobile video streaming in a peer-to-peer system

J. Noh, B. Girod / Signal Processing: Image Communication 27 (2012) 532–544540

in State i, ðK�iÞ=K fraction of packets are lost due to K�i

missing parents. Then, CðKÞ, the average fraction ofpackets lost given K parents is expressed as

CðKÞ ¼XK

i ¼ 0

K�i

K

� �lm þ1

� ��K K

i

� �lm

� �i

¼lmþ1

� ��K XK�1

i ¼ 0

K�1

i

� �lm

� �i( )

¼lm þ1

� ��K lm þ1

� �K�1

¼1

aþ1: ð5Þ

This result shows that connecting to multiple parentsdoes not affect the average number of missing packets.Instead, it alleviates the degree of video degradation atthe expense of more frequent video degradation4 [48].

6. Experimental results

We conducted computer simulations to evaluate theproposed distributed transcoding scheme in a P2P envir-onment. We implemented the IDT algorithm by modifyingthe x264 encoder [49]. The x264 encoder is an open sourcelibrary for encoding videos in the H.264/AVC syntax. Sixvideo sequences in CIF resolution were used as the originalvideos, encoded by using the H.264/AVC main profile. TheGOP size was 15 frames, and two B frames were generatedbetween the anchor frames. The frame rate of the originalvideo was 30 fps. The original video was streamed live to apopulation of SPPM fixed peers from the video source.Peers incrementally constructed the overlay as they joinedthe system. When a fixed peer was connected to thesystem, it received the original video as long as its down-link capacity was higher than the video bitrate. Forsimplicity, we assumed that all fixed nodes had downlinkcapacity larger than the video bitrate.

When a mobile user joins the system, it searches for K

fixed nodes that have available uplink bandwidth andprocessing power. We assumed that the number of fixednodes exceeded K. After the mobile user finds K fixednodes as parents, it assigns them unique Parent IDs (from1 to K). Then, it requests them to transcode disjoint sets ofvideo frames (substreams). For the synchronization ofsubstreams, parents add metadata to substreams, suchas the time stamp of a GOP. During the parent-coordina-tion process, the mobile node examines its device-specificprofile, such as the media decoding capability, displaysize, and user’s preference. It also detects time-varyingparameters including the remaining battery capacity andthe maximum downlink bandwidth of the wireless chan-nel. Based on the collected information, the mobile nodedetermines the video quality (e.g., quantization para-meter), frame rate, and spatial resolution. In our simula-tions, we kept the frame rate of the original video. Spatialresolution was reduced from CIF to QCIF. For a transcodedvideo, the GOP size was set to 24. The quantizationparameter and the number of parents were varied acrossdifferent simulations.

4 We assumed no packet loss in the analysis. In reality, packet loss

may be affected by video bitrate, which is subject to K.

The fixed node’s lifetime is exponentially distributed withan average of 90 s. When a fixed node serving as a parentleaves the system, its child node finds a different fixed nodeto recover the missing substream. The recovery time isexponentially distributed with an average of 3 s. Althoughthe IDT provides error resilience on its own, it is sensitive to Iframe loss. When Parent 1 failure is detected, the mobilenode selects one of its available parents as the new Parent 1.When I frames are lost due to a lossy channel, retransmis-sion is requested for the missing I frames. To avoid self-congestion, retransmissions of P frames are not requested.

Fig. 7 shows the simulation results for single-parenttranscoding and the interleaved distributed transcoding(two and four parents). A distortion measured in PSNR iscomputed between the input stream to the IDT encoders(the bistream after the intermediate processing unit) andthe assembled stream. The solid curves show the rate-distortion performance without packet loss and withoutpeer churn, in a case called ‘‘no-packet-loss’’. The single-parent shows the highest coding efficiency in this casebecause temporal redundancy is removed most effec-tively. When more parents are involved with transcoding,the distance between frames in a substream increases andthis lowers inter-frame temporal correlation.

Now, we consider the lossy scenario, where videodegradation is caused by two different sources: packetloss over the wireless channel and peer churn. Thewireless channel is modeled by the Gilbert–Elliot model,as a discrete-time Markov chain with two states. Table 3shows the Gilbert model transition probabilities based onthe GSM network, presented in [50]. During the goodstate there is no packet loss, whereas during the bad statethe channel produces packet loss with probability 0.5.State transitions occur according to Table 3 at the trans-mission of each packet. For peer churn, a burst of packetsthat belongs to a substream is lost when the correspond-ing parent leaves. The dashed curves in Fig. 7 indicate thedegraded video quality in the lossy scenario.

Some sequences, such as Foreman sequence, containmore motion than other sequences, such as Mother &Daughter, and the concealment of lost frames is oftenmore difficult. Fig. 7 also shows that the single-parentcase suffers from a significant quality drop from the no-packet-loss case. This indicates a huge fluctuation in videoquality occurs when the mobile node loses its only parent.On the contrary, distributed transcoding (both two andfour parents) exhibit a lower quality degradation thansingle-parent, which implies a lower variance in videoquality. We also observed that the performance differencebetween two parents and four parents is negligible exceptfor the Mobile sequence. Recall that two parents out-performed four parents in the no-packet-loss case. Thisshows that the variance of the video quality is smaller bymore parents because the adverse impact of packet loss isalleviated with more substreams. This result is analogousto that from the analysis of peer churn. In Section 5, weshowed that the impact of parent disconnect, in terms ofthe effective frame rate, is smaller with more parents atthe cost of longer video degradation periods.

We compared the performance of IDT against Multiple

Description Coding (MDC) [31]. The MDC in [31] is similar

Page 10: Robust mobile video streaming in a peer-to-peer system

100 200 300 400 500 600 70031

32

33

34

35

36

37

38

39

40

41

bitrate [kbps]

PSN

R [

dB]

50 100 150 20032

33

34

35

36

37

38

39

40

41

bitrate [kbps]

PSN

R [

dB]

0 100 200 300 40030

32

34

36

38

40

42

Bitrate [kbps]

PSN

R [

dB]

500 1000 1500 2000

25

30

35

40

Bitrate [kbps]

PSN

R [

dB]

0 100 200 300 400 50030

32

34

36

38

40

42

44

Bitrate [kbps]

PSN

R [

dB]

100 200 300 400 500 600 70028

30

32

34

36

38

40

Bitrate [kbps]

PSN

R [

dB]

1 parent, no packet loss1 parent, lossy scenario2 parents, no packet loss2 parents, lossy scenario4 parents, no packet loss4 parents, lossy scenario

1 parent, no packet loss1 parent, lossy scenario2 parents, no packet loss2 parents, lossy scenario4 parents, no packet loss4 parents, lossy scenario

1 parent, no packet loss

1 parent, lossy scenario

2 parents, no packet loss

2 parents, lossy scenario

4 parents, no packet loss

4 parents, lossy scenario

1 parent, no packet loss1 parent, lossy scenario2 parents, no packet loss2 parents, lossy scenario4 parents, no packet loss4 parents, lossy scenario

1 parent, no packet loss1 parent, lossy scenario2 parents, no packet loss2 parents, lossy scenario4 parents, no packet loss4 parents, lossy scenario

1 parent, no packet loss

1 parent, lossy scenario

2 parents, no packet loss

2 parents, lossy scenario

4 parents, no packet loss

4 parents, lossy scenario

Fig. 7. Rate-distortion curves with and without packet loss for conventional transcoding (one parent) and interleaved distributed transcoding (two and

four parents). (a) Foreman sequence; (b) Mother & Daughter sequence; (c) Container sequence; (d) Mobile sequence; (e) News sequence; and (f) Paris

sequence.

Table 3Probabilities used in the Gilbert–Elliot model.

State i PrðiÞ Prð19iÞ Prð09iÞ

0 (Good) 0.9449 0.0087 0.9913

1 (Bad) 0.0551 0.8509 0.1491

J. Noh, B. Girod / Signal Processing: Image Communication 27 (2012) 532–544 541

to our IDT scheme; however, each interleaved videoframe sequence contains its own I frames, and thus eachsubstream can be decoded independently. For a faircomparison with IDT, the GOP size of each substream ismade identical to the GOP size used for IDT. This ensuresthe ratio of the number of I frames and the number of Pframes in the bitstream is identical for both MDC and

Page 11: Robust mobile video streaming in a peer-to-peer system

100 200 300 400 50030

31

32

33

34

35

36

37

38

bitrate [kbps]

PSN

R [

dB]

0 50 100 15033

34

35

36

37

38

39

40

bitrate [kbps]

PSN

R [

dB]

50 100 150 200 25031

32

33

34

35

36

37

38

39

bitrate [kbps]

PSN

R [

dB]

400 600 800 1000 1200 1400 1600

24

26

28

30

32

34

bitrate [kbps]

PSN

R [

dB]

50 100 150 200 250 300 35030

32

34

36

38

40

bitrate [kbps]

PSN

R [

dB]

100 200 300 400 500 60028

30

32

34

36

38

bitrate [kbps]

PSN

R [

dB]

IDT: 2 parents, no packet lossIDT: 2 parents, lossy scenarioIDT: 4 parents, no packet lossIDT: 4 parents, lossy scenarioMDC: 2 parents, no packet lossMDC: 2 parents, lossy scenarioMDC: 4 parents, no packet lossMDC: 4 parents, lossy scenario

IDT: 2 parents, no packet lossIDT: 2 parents, lossy scenarioIDT: 4 parents, no packet lossIDT: 4 parents, lossy scenarioMDC: 2 parents, no packet lossMDC: 2 parents, lossy scenarioMDC: 4 parents, no packet lossMDC: 4 parents, lossy scenario

IDT: 2 parents, no packet lossIDT: 2 parents, lossy scenarioIDT: 4 parents, no packet lossIDT: 4 parents, lossy scenarioMDC: 2 parents, no packet lossMDC: 2 parents, lossy scenarioMDC: 4 parents, no packet lossMDC: 4 parents, lossy scenario

IDT: 2 parents, no packet lossIDT: 2 parents, lossy scenarioIDT: 4 parents, no packet lossIDT: 4 parents, lossy scenarioMDC: 2 parents, no packet lossMDC: 2 parents, lossy scenarioMDC: 4 parents, no packet lossMDC: 4 parents, lossy scenario

IDT: 2 parents, no packet lossIDT: 2 parents, lossy scenarioIDT: 4 parents, no packet lossIDT: 4 parents, lossy scenarioMDC: 2 parents, no packet lossMDC: 2 parents, lossy scenarioMDC: 4 parents, no packet lossMDC: 4 parents, lossy scenario

IDT: 2 parents, no packet lossIDT: 2 parents, lossy scenarioIDT: 4 parents, no packet lossIDT: 4 parents, lossy scenarioMDC: 2 parents, no packet lossMDC: 2 parents, lossy scenarioMDC: 4 parents, no packet lossMDC: 4 parents, lossy scenario

Fig. 8. Video quality comparison of IDT and MDC. (a) Foreman sequence; (b) Mother & Daughter sequence; (c) Container sequence; (d) Mobile sequence;

(e) News sequence; and (f) Paris sequence.

J. Noh, B. Girod / Signal Processing: Image Communication 27 (2012) 532–544542

IDT. Fig. 8 shows the performances of MDC and IDT forthe six video sequences. IDT consistently outperformsMDC for both the lossy scenario and the scenario withoutpacket loss. A plausible explanation is that in MDC, aGOP in a substream spans a larger duration of theoriginal video than a GOP in the assembled stream. Forinstance, for K¼4 parents and a GOP of 24 frames, IDTencodes every 24th frame as an I frame. In contrast, MDCencodes every 96th frame as an I frame. Reducing the

GOP size may alleviate this slow refresh, but it results ina worse rate-distortion performance due to the higherratio of the number of I frames to the number of Pframes.

Next, we measured the time spent by the modifiedx264 encoder. Since a fixed node performs decoding forits own playback, we do not include decoding time. Wealso ignored the intermediate processing time, which ismuch smaller than the encoding time. We averaged the

Page 12: Robust mobile video streaming in a peer-to-peer system

Table 4Average CPU time required for encoding a frame at one of parents.

One parent

(ms)

Two

parents

(ms)

Three

parents (ms)

Four

parents

(ms)

Foreman 6.60 4.14 3.19 2.74

Mother &

Daughter

4.39 2.86 2.41 2.14

Container 4.38 2.70 2.20 1.96

Mobile 7.25 4.03 2.49 2.42

News 3.08 2.54 2.06 1.82

Paris 4.48 2.77 2.28 1.98

J. Noh, B. Girod / Signal Processing: Image Communication 27 (2012) 532–544 543

encoding times from 100 experiments5 (Table 4). Encod-ing a frame of the Foreman sequence using single-parenttranscoding required about 6.6 ms on average. When theframe rate of the transcoded bitstream is 30 fps, thenabout 20% of the CPU cycles are required for a mobileuser. As more parents are involved in transcoding, eachparent spends less time because the generation of framecopy bits has low complexity.

7. Conclusions

In this paper, we proposed a robust video streamingscheme for mobile devices by leveraging the peer-to-peerarchitecture. In the proposed scheme, while ordinarypeers consume the original video transmitted from thevideo source, less-capable peers, such as mobile phones,consume video adapted by the interleaved distributedtranscoding (IDT). IDT is shown to provide graceful videodegradation against packet loss due to peer churn and toadverse wireless channel conditions. It is also shown toreduce the amount of computation that each parent has toperform. Our theoretical analysis shows that IDT balancesthe degree of video degradation with the occurrence ofvideo degradation. We also implemented the proposedscheme in a real-time P2P streaming system. For a certainvideo sequence, it was shown that the IDT achieves 67%reduction in the CPU time required for a parent peer byusing four parents.

Much interesting research may be carried out toextend this study. Instead of treating mobile peers indi-vidually, they may be clustered into a limited number ofprofile groups, thus improving system scalability in somescenarios where a large population of mobile peers ispresent. At the writing of this paper, virtually no SVCdecoders for mobile devices are available. However, whenSVC for mobile devices becomes prevalent, it will beinteresting to compare on-demand transcoding, such asIDT, with SVC-based streaming in the P2P context.

References

[1] A. Vetro, C. Christopoulos, H. Sun, Video transcoding architecturesand techniques: an overview, Signal Processing Magazine (2003)18–29.

5 The benchmark computer used in the experiment runs Linux OS

and its CPU is an Intel Pentium 4 (single core) with a clock speed of

2.80 GHz.

[2] H. Sun, T. Chiang, X. Chen, Digital Video Transcoding for Transmis-sion and Storage, CRC Press, Inc., Boca Raton, FL, USA, 2004.

[3] J. Xin, C. Lin, M. Sun, Digital video transcoding, IEEE Proceedings 93(1) (2005) 84–97.

[4] I. Ahmad, X. Wei, Y. Sun, Y.Q. Zhang, Video transcoding: an over-view of various techniques and research issues, IEEE Transactionson Multimedia 7 (5) (2005) 793–804.

[5] J. Noh, M. Makar, B. Girod, Streaming to mobile users in a peer-to-peer network, in: Proceedings of the Mobile Multimedia Commu-nications Conference, Kingston, UK, September 2009.

[6] J. Noh, P. Baccichet, F. Hartung, A. Mavlankar, B. Girod, Stanfordpeer-to-peer multicast (SPPM)—overview and recent extensions,in: Proceedings of the International Picture Coding Symposium(PCS), Chicago, IL, USA, May 2009.

[7] S. Dogan, A. Cellatoglu, M. Uyguroglu, A.H. Sadka, A.M. Kondoz,Error-resilient video transcoding for robust internetwork commu-nications using GPRS, IEEE Transactions on Circuits and Systems forVideo Technology 12 (6) (2002) 453–464.

[8] R. Kumar, A protocol with transcoding to support QoS over Internetfor multimedia traffic, IEEE Proceedings of the International Con-ference on Multimedia and Expo, Baltimore, MD, vol. 1, 2003,pp. 465–468.

[9] A. Warabino, S. Ota, D. Morikawa, M. Ohashi, Video transcodingproxy for 3G wireless mobile internet access, IEEE CommunicationsMagazine (2000) 66–71.

[10] A. Dutta, J.M. Chennikara, W. Chen, O. Altintas, H. Schulzrinne,Multicasting streaming media to mobile users, IEEE Communica-tions Magazine 41 (10) (2003) 81–89.

[11] S. Venot, L. Yan, On-demand mobile peer-to-peer streaming overthe JXTA overlay, in: Mobile Ubiquitous Computing, Systems,Services and Technologies, French Polynesia, Tahiti, November2007, pp. 131–136.

[12] F. Chen, T. Repantis, V. Kalogeraki, Coordinated media streamingand transcoding in peer-to-peer systems, in: IEEE Proceedings ofthe International Parallel and Distributed Processing Symposium,French Polynesia, Tahiti, 2005, pp. 56b.

[13] M.F. Leung, S.H.G. Chan, Broadcast-based peer-to-peer collabora-tive video streaming among mobiles, IEEE Transactions on Broad-casting 53 (2007) 350–361.

[14] M.A. Bonuccellit, F. Lonetti, F. Martelli, Temporal transcoding for mobilevideo communication, in: International Conference on Mobile andUbiquitous Systems: Networking and Services, 2005, pp. 502–506.

[15] S. Liu, C.-C.J. Kuo, Joint temporal–spatial rate control for adaptivevideo transcoding, in: IEEE Proceedings of the International Con-ference on Multimedia and Expo, vol. 2, July 2003, pp. 225–228.

[16] V. Patil, R. Kumar, An arbitrary frame-skipping video transcoder, in:IEEE Proceedings of the International Conference on Multimediaand Expo, Amsterdam, Netherlands, July 2005, pp. 1456–1459.

[17] M. Kalman, B. Girod, P. van Beek, Optimized transcoding rateselection and packet scheduling for transmitting multiple videostreams over a shared channel, in: IEEE Proceedings of the Inter-national Conference on Image Processing, vol. 1, September 2005,165–168.

[18] M. Kalman, B. Girod, Optimal channel-time allocation for thetransmission of multiple video streams over a shared channel, in:IEEE Proceedings of the Multimedia Signal Processing Workshop,January 2005, pp. 1–4.

[19] T. Schierl, T. Stockhammer, T. Wiegand, Mobile video transmissionusing scalable video coding, IEEE Transactions on Circuits andSystems for Video Technology 17 (9) (2007) 1204–1217.

[20] S. Wenger, Y.K. Wang, T. Schierl, Transport and signaling of SVC inIP networks, IEEE Transactions on Circuits and Systems for VideoTechnology 17 (9) (2007) 1164–1173.

[21] J. Postel, Transmission control protocol, RFC 793, 1981.[22] A.B. Roach, A negative acknowledgement mechanism for signaling

compression, RFC 4077, 2005.[23] M. Mathis, J. Mahdavi, S. Floyd, A. Romanow, TCP selective

acknowledgement options, RFC 2018, 1996.[24] P.A. Chou, Z. Miao, Rate-distortion optimized streaming of pack-

etized media, IEEE Transactions on Multimedia 8 (2001) 390–404.[25] E. Setton, B. Girod, Congestion-distortion optimized scheduling of

video over a bottleneck link, in: IEEE Proceedings of the MultimediaSignal Processing Workshop, 2004, pp. 179–182.

[26] E. Setton, J. Noh, B. Girod, Congestion-distortion optimized peer-to-peer video streaming, in: International Conference on ImageProcessing (ICIP), Atlanta, GA, October 2006, pp. 721–724.

[27] I.S. Reed, G. Solomon, Polynomial codes over certain finite fields,Journal of the Society for Industrial and Applied Mathematics 8 (2)(1960) 300–304.

Page 13: Robust mobile video streaming in a peer-to-peer system

J. Noh, B. Girod / Signal Processing: Image Communication 27 (2012) 532–544544

[28] S. Rane, A. Aaron, B. Girod, Systematic lossy forward error protec-tion for error resilient digital video broadcasting, in: Proceedings of

the SPIE Visual Communications and Image Processing, San Jose,CA, January 2004.

[29] V. Goyal, Multiple description coding: compression meets thenetwork, IEEE Signal Processing Magazine 18 (5) (2001) 74–93.

[30] J.G. Apostolopoulos, Reliable video communication over lossypacket networks using multiple state encoding and path diversity,in: Proceedings of the Visual Communications and Image Proces-

sing, San Jose, CA, January 2001, pp. 392–409.[31] J. Apostolopoulos, T. Wong, W. Tan, S. Wee, On multiple description

streaming with content delivery networks, IEEE InternationalConference on Communications, New York, NY, vol. 3, 2002,

pp. 1736–1745.[32] X. Xu, Y. Wang, S. Panwar, K.W. Ross, A peer-to-peer video-on-

demand system using multiple description coding and serverdiversity, International Conference on Image Processing, vol. 3,2004, pp. 1759–1762.

[33] P.A. Chou, V.N. Padmanabhan, H. Wang, Layered multiple descrip-tion coding, in: IEEE Proceedings of the International Packet Video

Workshop, Nantes, France, April 2003, US Patent App. 11/787,387.[34] J. Chen, C. Cai, Layered multiple description video coding, in:

International Conference on Signal Processing, October 2008,pp. 1296–1299.

[35] V. Stankovic, R. Hamzaoui, Z. Xiong, Robust layered multipledescription coding of scalable media data for multicast, IEEE SignalProcessing Letters 12 (2) (2005) 154–157.

[36] J. Liu, B. Li, Y.-Q. Zhang, Adaptive video multicast over the Internet,IEEE Transactions on Multimedia 10 (1) (2003) 22–33.

[37] P. Baccichet, D. Bagni, A. Chimienti, L. Pezzoni, F.S. Rovati, Frameconcealment for H.264/AVCdecoders, IEEE Transactions on Consu-

mer Electronics 51 (1) (2005) 227–233.[38] S. Shirani, F. Kossentini, R. Ward, A concealment method for video

communications in an error-prone environment, IEEE Journal onSelected Areas in Communications 18 (6) (2000) 1122–1128.

[39] Z. Alkachouh, M.G. Bellanger, Fast DCT-based spatial domaininterpolation of blocks in images, IEEE Transactions on ImageProcessing 9 (4) (2000) 729–732.

[40] S. Belfiore, M. Grangetto, E. Magli, G. Olmo, Spatio-temporal videoerror concealment with perceptually optimized mode selection, in:IEEE International Conference on Multimedia and Expo, vol. 3,2003, pp. 169–172.

[41] M. Castro, P. Druschel, A.-M. Kermarrec, A. Nandi, A. Rowstron, A.Singh, SplitStream: high-bandwidth content distribution in a coop-erative environment, in: Proceedings of the International workshopon Peer-To-Peer Systems, February 2003, pp. 298–313.

[42] V.N. Padmanabhan, H.J. Wang, P.A. Chou, K. Sripanidkulchai, Dis-tributing streaming media content using cooperative networking,in: Proceedings of ACM NOSSDAV, Miami Beach, Miami, FL, May2002, pp. 177–186.

[43] H. Schwarz, D. Marpe, T. Wiegand, Overview of the scalable videocoding extension of the H.264/AVC standard, IEEE Transactions onCircuits and Systems for Video Technology 17 (9) (2007)1103–1120.

[44] ITU-T, Advanced video coding for generic audiovisual services,Recommendation H.264—ISO/IEC 14496-10(AVC), May 2003.

[45] E. Setton, J. Noh, B. Girod, Low-latency video streaming over peer-to-peer networks, IEEE International Conference on Multimediaand Expo, July 2006, pp. 569–572.

[46] Nokia, Nokia n96, /http://en.wikipedia.org/wiki/Nokia_N96S, June15, 2011.

[47] L. Kleinrock, Queueing Systems: Volume 1 Theory, John Wiley &Sons Inc., 1975.

[48] Y.J. Liang, J.G. Apostolopoulos, B. Girod, Analysis of packet loss forcompressed video: effect of burst losses and correlation betweenerror frames, IEEE Transactions on Circuits and Systems for VideoTechnology 18 (7) (2008) 861–874.

[49] VideoLAN, x264, /http://www.videolan.org/developers/x264.htmlS, June 10, 2010.

[50] A. Konrad, B.Y. Zhao, A.D. Joseph, R. Ludwig, A Markov-basedchannel model algorithm for wireless networks, Wireless Networks9 (3) (2003) 189–199.