Designing QoE experiments to evaluate Peer-to-Peer streaming applications


Tom Z.J. Fu, CUHK

Dah Ming Chiu, CUHK

Zhibin Lei, ASTRI

VCIP 2010, Huang Shan, China

Outline

– Introduction & motivation
– Chunk-level impairment model
– Experiment setting
– Result analysis and insights
– Future work & conclusion

Internet streaming services have become popular, delivered through the S/C mechanism or the P2P mechanism, the latter being the most widely implemented.

CDN, single/multiple tree-based application layer multicast, peer-to-peer streaming (live streaming / VoD).

The different mechanisms need to be evaluated with a proper methodology, e.g. to compare the different strategies used in a P2P system.

Two types of evaluation method:
– Objective: measurement of objective metrics (packet loss rate, transmission delay);
– Subjective: inviting subjects to give scores.

Introduction and motivation

1. Existing methods are not suitable

Only a packet-level impairment model for single-link network transmission (packet loss rate, end-to-end packet delay, etc.) is considered.

A chunk (much larger than one packet) has become the basic unit of almost all building blocks and design issues in most large-scale P2P streaming systems.

2. Various objective metrics are defined in different systems and analytical models

– Buffer count (UUSee measurement);
– Playback continuity (several different definitions: Coolstreaming, PPlive measurement, etc.);
– Subjective testing validation is necessary.

[Fig. 1: block diagram. Pipeline: source video (SRC), video encoder, network transmission, video decoder, processed video (PVS). Further labels: chunk-level impairment module, peer-to-peer mechanism (peers), chunk maker (chunk size), chunk-level distortion generator, chunk buffer manager, playback controller.]

The traditional hypothetical reference circuit (HRC) includes: source video (SRC), video encoder, network transmission, video decoder, processed video (PVS).

Chunk-level impairment model

Packet-level impairment for single link (e.g. plr, end-to-end delay)

Chunk-level impairment: for dynamic topologies and various strategies

Video encoder
– Different media codecs and transmission rates can be chosen at the video encoder component.

Network transmission
– Chunk-level impairment module.

Chunk maker
– Responsible for organizing video stream packets into chunks.

Chunk-level distortion generator
– Three different ways are designed to implement the chunk-level distortion generator.

Chunk buffer manager and playback controller
– Manages and keeps the received chunks in a local chunk-level buffer;
– Makes the playback decision for each chunk.

Video decoder
– After being decoded by the video decoder component, the processed videos (PVS) are displayed on monitors to the users.


Notations:
– $T_i^e$: the expected playback time of the $i$th chunk;
– $T_i^s$: the start download time of chunk $i$;
– $T_i^c$: the complete download time of chunk $i$.

1. Chunk-level delay. Chunk $i$ is delayed if $D_i = \{T_i^c - T_i^e\}^+ > 0$, where $\{x\}^+ = x$ when $x > 0$, otherwise 0.

2. Chunk delay distribution (CDD). The chunk delay distribution is the aggregate statistics for all delayed chunks. In the simplest case, it can be represented by a discrete random variable.

3. Chunk receiving pattern (CRP). It describes how a chunk $i$ is filled over the whole downloading process. If we denote by $f_i(t)$, $t \in [T_i^s, T_i^c]$, the download completion percentage of chunk $i$ at time $t$, then mathematically the CRP can be represented by any increasing curve $f_i(t)$ over $t \in [T_i^s, T_i^c]$ with constraints $f_i(T_i^s) = 0$ and $f_i(T_i^c) = 1$.
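The delay and CDD definitions above can be sketched in a few lines of Python. This is only an illustration: the function names and the `bin_width` parameter are assumptions introduced here, and binning delays into a histogram is one possible way to obtain the "discrete random variable" form of the CDD.

```python
import math
from collections import Counter


def chunk_delay(t_complete, t_expected):
    """D_i = {T_i^c - T_i^e}^+, the positive part of the lateness (seconds)."""
    return max(0.0, t_complete - t_expected)


def chunk_delay_distribution(delays, bin_width=1.0):
    """Empirical CDD over delayed chunks only (D_i > 0).

    Returns {bin_start_seconds: probability}. bin_width is a hypothetical
    discretization step, not a value taken from the slides.
    """
    delayed = [d for d in delays if d > 0]
    if not delayed:
        return {}
    counts = Counter(math.floor(d / bin_width) * bin_width for d in delayed)
    return {b: n / len(delayed) for b, n in sorted(counts.items())}
```

Non-delayed chunks are excluded from the distribution because the CDD is defined over delayed chunks only.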


Illustration of different CRPs

Curves A, B, C, D have the same start downloading time $T_i^s$ (1 second before $T_i^e$) and finish time $T_i^c$ (4 seconds after $T_i^e$).

A chunk generated by curve A will always have received more content than one generated by B, C or D.

At $t = T_i^e$, the expected playback time, A generates a chunk with 80% completeness, while B generates only 20%, C close to 0% and D 0%.

Note: in this work, we only apply the simplest pattern (Curve D, i.e. all contents arrive at the same time, $T_i^c$); the more complicated curves will be studied in future work.
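As a rough sketch of how such receiving patterns could be implemented, the Python below uses a power-law family for curves A to C and a step function for Curve D. The curve shapes in the original figure are not recoverable, so the power-law form and the three exponents are assumptions, chosen only so that the completeness at the expected playback time matches the values quoted above (80%, 20%, close to 0%).

```python
import math


def crp_power(t, t_start, t_complete, exponent):
    """f_i(t) = u**exponent with u the normalized download time in [0, 1].

    Increasing for any exponent > 0, with f_i(T_i^s) = 0 and f_i(T_i^c) = 1.
    """
    if t <= t_start:
        return 0.0
    if t >= t_complete:
        return 1.0
    u = (t - t_start) / (t_complete - t_start)
    return u ** exponent


def crp_step(t, t_start, t_complete):
    """Curve D: all content arrives at once at T_i^c."""
    return 1.0 if t >= t_complete else 0.0


# Hypothetical exponents (not from the slides), fitted to the quoted values
# at T_i^e, which lies 1/5 of the way through the 5-second download.
EXPONENT_A = math.log(0.8) / math.log(0.2)  # concave, 80% at T_i^e
EXPONENT_B = 1.0                            # linear, 20% at T_i^e
EXPONENT_C = 4.0                            # convex, 0.16% at T_i^e
```

With $T_i^e = 0$, the curves are evaluated as `crp_power(0.0, -1.0, 4.0, EXPONENT_A)` and so on.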

Live experiments
– The most detailed CRP for each chunk can be collected and recorded during a real-life experiment.

Simulation results
– It is possible to simulate a large network with a large number of users, and have the simulation repeatable. The same kinds of detailed CRP traces can be collected.

Artificial generation
– Manually create different possible chunk delays (following a certain distribution) or chunk-level receiving patterns (by implementing $f_i(t)$ with different increasing curves and parameters), for subjective testing purposes.

Chunk-level distortion generator

In a P2P streaming system, the playback controller (PC) plays an essential role.

Chunks fall into two cases:
– non-delayed chunk: the download completes on or before $T_i^e$;
– delayed chunk ($D_i > 0$): the download is not complete when $T_i^e$ is reached.

The PC deals with the two cases as follows:
– non-delayed chunk: move it out of the local buffer and send it to the decoder to be played back.
– delayed chunk: three possible actions, among others, might be taken:
a) Wait until the chunk is complete and then send it to the decoder;
b) Send the incomplete chunk directly to the decoder with no waiting;
c) Wait for at most the longest waiting time (LWT): when either the timer expires or the chunk is complete, the PC stops waiting and sends the chunk to the decoder immediately.

Simple playback controller

LWT = ∞, case (a)

LWT = 0, case (b)

LWT in between, case (c)


Note: the implementation of the PC can be more complicated; this will be studied in future work.
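A minimal sketch of the simple playback controller, assuming Curve D (a chunk is either entirely present or entirely missing). The function name and return convention are hypothetical. A chunk that completes exactly when the timer expires is treated as timed out here, which matches the $D_i \ge$ LWT rule used for the viewing effects in these slides.

```python
import math


def playback_decision(t_expected, t_complete, lwt):
    """Return (send_time, is_complete) for one chunk.

    lwt = math.inf gives case (a), lwt = 0 gives case (b),
    anything in between gives case (c).
    """
    if t_complete <= t_expected:
        # Non-delayed chunk: hand it to the decoder at its playback time.
        return t_expected, True
    deadline = t_expected + lwt
    if t_complete < deadline:
        # Delayed chunk completed while the PC was still waiting.
        return t_complete, True
    # Timer expired first: send whatever is there (nothing, under Curve D).
    return deadline, False
```

Expressing all three actions through a single LWT parameter keeps the controller to one code path, which is the point of the LWT formulation above.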

Experiment goal:
– To validate the effectiveness of some well-studied performance metrics, e.g. the average playback (dis)continuity.
– If a correlation between the objective and subjective metrics does exist, to find a simple mapping function between them.
– To explore the relationship between chunk delay distribution (CDD) and subjective QoE.
– To gain useful insights that help in the design of the streaming peer software.

Experiment settings:
– 50 source video clips with an average length of 30 seconds;
– 30 subjects (16 males and 14 females), aged 18 to 28;
– Assessment scheme: Absolute Category Rating (ACR) with hidden reference.

Experiment setting

Source videos (SRC)

Simply deployed decoder
– If the video chunk sent from the PC is incomplete, discard it; otherwise decode it and play it back (just for Curve D).
– If no chunk is received from the PC at the expected playback time, the decoder simply freezes on the last playable image until new content arrives.


Due to the implementation of the decoder, there are three possible viewing effects caused by chunk-level distortions:

– $D_i = 0$, no distortion. If chunk $i$ is completed on or before its expected playback time, it is decoded and played back normally.

– $0 < D_i < \mathrm{LWT}$, freeze-and-play viewing effect. If chunk $i$ is delayed but still completed before the LWT expires, the resulting effect in the PVS is first a freeze on an image for a duration of $D_i$ and then normal playback of chunk $i$.

– $D_i \ge \mathrm{LWT}$, freeze-and-discard viewing effect. If chunk $i$ is delayed and remains incomplete until the LWT expires, the effect in the PVS is a freeze on an image for the LWT and then a direct jump to chunk $i + 1$.
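The three viewing effects map directly onto a small classifier. The sketch below is illustrative only: the names are hypothetical, and the summary treats every chunk independently (a late chunk is assumed not to shift the expected playback time of later chunks), which is a simplification not stated in the slides.

```python
def viewing_effect(delay, lwt=3.0):
    """Map a chunk delay D_i (seconds) to (effect, freeze_duration)."""
    if delay <= 0:
        return "no distortion", 0.0
    if delay < lwt:
        return "freeze-and-play", delay
    return "freeze-and-discard", lwt


def summarize_effects(delays, lwt=3.0):
    """Aggregate freeze statistics over a sequence of chunk delays."""
    effects = [viewing_effect(d, lwt) for d in delays]
    return {
        "freeze_events": sum(1 for name, _ in effects if name != "no distortion"),
        "discarded_chunks": sum(1 for name, _ in effects if name == "freeze-and-discard"),
        "total_freeze_seconds": sum(duration for _, duration in effects),
    }
```

Counting freeze events separately from total freeze time makes it possible to compare sequences that freeze often but briefly with sequences that freeze rarely but for longer.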

Resulting viewing effects

A reduced set of 20 combinations of chunk-level distortions, composed of two factors:

Average discontinuity (d = 1 − c, where c is the average playback continuity):
– 0, 5%, 10%, 15%, 20%, 25%, 30%, 40%, 50%, 60%.

Two types of chunk delay distribution (CDD):
1. Short delay distribution: delays uniformly distributed in [0, 2] seconds; or
2. Long delay distribution: all delays equal to 3 seconds (= LWT; LWT is set to 3 seconds by default).
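The testing set can be enumerated as the cross product of the two factors. The sketch below only lists the 20 conditions and draws per-chunk delays from the two distributions; it deliberately does not decide which chunks are delayed for a given d, since the slides do not give that procedure. All identifiers are hypothetical.

```python
import itertools
import random

DISCONTINUITY_LEVELS = [0.0, 0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.40, 0.50, 0.60]
CDD_TYPES = ["short", "long"]
LWT_SECONDS = 3.0  # default LWT stated in the slides


def enumerate_conditions():
    """All (average discontinuity, CDD type) pairs: 10 x 2 = 20 conditions."""
    return list(itertools.product(DISCONTINUITY_LEVELS, CDD_TYPES))


def sample_delay(cdd_type, rng):
    """Draw one chunk delay in seconds from the given distribution."""
    if cdd_type == "short":
        return rng.uniform(0.0, 2.0)
    if cdd_type == "long":
        return LWT_SECONDS
    raise ValueError(f"unknown CDD type: {cdd_type}")
```

Passing an explicit `random.Random(seed)` instance as `rng` keeps the generated distortions repeatable across test sessions.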

Testing set

[Figure: subjective assessment results for each processed video sequence; MOS value (left), DMOS value (right).]

The meaning of Mean Opinion Score (MOS) and DMOS:

Result analysis and insights

Insights from the subjective assessment results:

1. The DMOS analysis (right) is consistent with the MOS analysis (left), which indicates that the experiment results are reasonable, where:

– DMOS is derived by subtracting the MOS of the PVS from the MOS of the reference video (of the same category and with no distortion).

– The DMOS metric removes the bias in the subjective scoring process caused by individuals' preferences for video content.

2. A correlation between the objective metric and subjective QoE exists.

3. The line derived by linear regression of the subjective score (MOS) on the discontinuity (d) can be used later to predict QoE from a measured discontinuity metric without conducting subjective testing, which saves cost.
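The DMOS definition and the regression line in point 3 can be sketched as below. This is a plain ordinary least-squares fit written out by hand; the slides do not say how the regression was computed, and no fitted coefficients are given in the source, so the function names are hypothetical and the code carries no experimental numbers.

```python
def mean(values):
    return sum(values) / len(values)


def dmos(reference_scores, pvs_scores):
    """DMOS = MOS(reference video) - MOS(PVS), per the definition above."""
    return mean(reference_scores) - mean(pvs_scores)


def fit_line(ds, mos_values):
    """Ordinary least squares fit of MOS = slope * d + intercept."""
    d_bar, m_bar = mean(ds), mean(mos_values)
    sxx = sum((d - d_bar) ** 2 for d in ds)
    sxy = sum((d - d_bar) * (m - m_bar) for d, m in zip(ds, mos_values))
    slope = sxy / sxx
    return slope, m_bar - slope * d_bar


def predict_mos(d, slope, intercept):
    """Predict MOS from a measured average discontinuity d."""
    return slope * d + intercept
```

Once `fit_line` has been run on the measured (d, MOS) pairs, `predict_mos` gives the mapping from an objective discontinuity measurement to an estimated subjective score.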


[Figure: comparison between short and long chunk delay distributions; MOS value (left), DMOS value (right).]

Insights from the comparison:

1. PVSes with the long delay distribution obtain higher MOSes than those with the short delay distribution when the average d is the same.

2. Subjects care more about the number of screen freezing events than about the duration of each freezing event.

Future work & conclusion

Future work
– Conduct more experiments with different parameter settings.
– Change the implementation of the decoder to support incomplete chunks and concealment algorithms.
– Based on this framework, study more complicated designs of the playback controller (how long to wait for a delayed chunk).
– Study different chunk receiving patterns.

Conclusion
– A chunk-level impairment model is proposed for the P2P mechanism.
– By applying this new model, we carry out subjective experiments.
– The results are preliminary but already yield some interesting insights.

The end

Thanks!

Q & A
