
Perceivable artifacts in compressed video and their relation to video quality

Jun Xia a,*, Yue Shi a, Kees Teunissen b, Ingrid Heynderickx c,d

a Dong Fei Display R&D Centre, School of Electronic Science and Engineering, Southeast University, Si Pai Lou 2, Nanjing, 210096, China
b Philips Consumer Lifestyle, Advanced Technology, Eindhoven, The Netherlands
c Philips Research Laboratories, Eindhoven, The Netherlands
d Technical University Delft, Delft, The Netherlands

Signal Processing: Image Communication 24 (2009) 548–556. doi:10.1016/j.image.2009.04.002

Article history: Received 14 April 2008; received in revised form 28 October 2008; accepted 17 April 2009.

Keywords: Image quality; Video compression; Objective metric; Compression artifact

* Corresponding author. Tel.: +86 025 83792449; fax: +86 025 83363222. E-mail address: [email protected] (J. Xia).

Abstract

Compressed video is degraded in quality due to the introduction of coding artifacts. A two-step subjective experiment was performed to evaluate the most visible artifacts and their relation to video quality for AVS and H.264 compressed video. In the first step, non-expert viewers were requested to score the image quality degradation as a function of compression ratio for various video sequences and to indicate which artifact was perceived during scoring. During the second step, eight trained viewers were asked to score the strength of three artifacts, i.e., blurring, blocking, and color distortion, which were reported as the most perceivable artifacts in the first step of the experiment. The quality performance of AVS and H.264 was also compared. The analysis of covariance indicated that the quality performance of AVS and H.264 was very close. A linear regression analysis showed that for the CIF videos 96% of the variance in quality degradation could be predicted by linearly combining the normalized strengths of the three most visible artifacts.

© 2009 Elsevier B.V. All rights reserved.

1. Introduction

Quality assessment is essential for reliably optimizing the total video chain of television systems. Important components that may contribute to image quality degradation in a video chain are acquisition of the original scene, compression of the video to reduce bandwidth, broadcasting of the captured video, format conversion and scaling, needed to prepare the broadcast signal for the display, and rendering of the signal on the display. Until now, the most reliable method for quality assessment is still based on subjective evaluation experiments. To this purpose, several methods are standardized by the ITU [1], and are widely used in the display and video processing community. However, subjective experiments need a controlled environment, a well-designed experimental setup and analysis, and consequently are time consuming and expensive. Objective quality metrics that automatically predict quality based on characteristics of the video signal only can be a helpful alternative, at least in case their predictive power is good enough for practical applications, such as the optimization of the video chain.

In general, objective metrics can be classified into three categories depending on the use of the original unprocessed video, i.e., full-reference metrics, reduced-reference metrics, and no-reference metrics. Full-reference metrics are based on the difference between the compressed video and its original version. They can be further classified depending on their technological approach. Metrics such as MSE and PSNR are purely based on signal characteristics. These metrics are widely used because they are simple to calculate, but their correlation with subjective data is relatively low (the correlation between PSNR and subjective data is normally around 0.8). Approaches based on a model of the human visual system (HVS) have been studied by many groups and significantly improve the prediction of the visibility of the artifacts. Examples are Sarnoff's just noticeable difference (JND) metric described by Lubin and Fibush [2], the perceptual distortion metric (PDM) introduced by Winkler [3], and the digital video quality (DVQ) metric defined by Watson et al. [4]. Recently, another approach based on natural scene statistics has been introduced by Sheikh et al., i.e., the information fidelity criterion (IFC) metric [5]. It shows a correlation higher than 0.94 with subjective data. Other approaches that also achieve a high correlation with subjective data for a certain range of applications are, e.g., the structural similarity approach (SSIM) proposed by Wang et al. [6] and the perceptual video quality measure (PVQM) defined by Hekstra et al. [7]. The video quality experts group (VQEG) has evaluated six full-reference objective quality metrics in the so-called Phase II test [8]. This study shows a correlation as high as 0.94 for the best full-reference objective quality metric with the subjectively measured quality scores of a large set of MPEG compressed videos. The results, however, also indicate that the objective metrics are still not fully reliable. A review in [9] indicates that some of the objective quality metrics correlate well with the subjective quality scores for one kind of artifact, but fail to predict another kind of artifact.
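As a concrete illustration of the signal-based metrics mentioned above, a minimal MSE/PSNR sketch is given below. This is not part of the original paper; it assumes frames stored as 8-bit NumPy arrays.

```python
import numpy as np

def psnr(reference: np.ndarray, distorted: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio (in dB) between an original and a distorted frame."""
    ref = reference.astype(np.float64)
    dis = distorted.astype(np.float64)
    mse = np.mean((ref - dis) ** 2)            # mean squared error over all pixels
    if mse == 0.0:
        return float("inf")                    # identical frames
    return 10.0 * np.log10(peak ** 2 / mse)

# For a sequence, PSNR is commonly reported as the average of the per-frame values:
# avg_psnr = np.mean([psnr(r, d) for r, d in zip(ref_frames, dist_frames)])
```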

In television broadcast applications, the reference or the original video is normally not available. As a consequence, only reduced-reference or no-reference metrics can be used. Without reference, it is much more difficult to obtain a high accuracy and reliability of the objective quality metric [10–13]. The complexity of the objective metric and its required processing power are additional concerns in the case of real-time applications. More particularly, for its application in a video chain a simple, yet relatively accurate metric is needed. One approach to develop such a metric is to evaluate the perceived artifacts in compressed video, to quantify them, and to model their relative weight to the overall image quality.

In the past two decades, research on objective metrics has mainly focused on the development of quality metrics for compressed video [14–16]. The artifacts perceived in lossy compressed video depend on the compression algorithm, compression ratio, and video content. The commonly perceived artifacts in hybrid motion compensated (MC) differential pulse code modulation (DPCM)/discrete cosine transform (DCT) video compression are extensively described in [17]. Although the basic structure of hybrid MC/DPCM/DCT compression is kept the same in recent standards, the visible artifacts may change as a consequence of the introduction of new algorithms. For videos compressed with MPEG-2, for example, the most visible artifacts are blocking, blurring, and ringing [17,18]. The introduction of a deblocking filter in the H.264 standard, on the other hand, has dramatically reduced the occurrence of blocking artifacts as a consequence of compression. Hence, for the H.264 baseline profile the most perceivable artifacts are blurring and flickering, apart from the remaining blocking [19]. Recently, a new video compression standard, AVS [20], with a structure similar to H.264, has been published in China. The most visible artifacts when applying AVS compression have not been fully studied yet.

The purpose of this paper is to investigate the most perceivable artifacts when compressing video according to the AVS standard, and to evaluate the relation between the video quality and the perceptual strength of these artifacts. In addition, the difference in artifacts and resulting quality between AVS and H.264 compression is also discussed. To achieve this goal, a two-step experiment was designed. Its approach was similar to that described in [21]. The first step aimed at assessing the video quality degradation as a function of compression ratio, and at deducing the most visible artifacts. The double-stimulus continuous quality-scale (DSCQS) method [1] was used to judge the quality of sequences generated with two video codecs, for three different video formats and based on twelve different original videos. A questionnaire was added to determine the most visible artifacts. During the second step, eight subjects were first trained to define and assess the most visible artifacts. They then scored the strength of each artifact with the DSCQS method. The results of both steps were combined to determine the relative contribution of each artifact to the overall image quality. The remainder of this paper is organized as follows. In Section 2, the setup of the two subjective experiments is introduced. The results and their statistical analysis are presented in Section 3. Finally, Section 4 discusses the obtained results, after which the conclusions are presented in Section 5.

2. Method

2.1. Experimental conditions

The settings used for compressing the original video sequences were in line with the common test conditions set by the AVS workshop [22]. In particular, two video codecs, i.e., H.264 and AVS, were used, each at eleven different bitrates. For the original video sequences, three different formats were selected, i.e., high-resolution (HD) video (1280×720, progressive), standard resolution (SD) video (720×576, interlaced), and low-resolution (CIF) video (352×288, progressive). The original video sequences were chosen from [23], and a snapshot of each of them is given in Fig. 1.

Per original video sequence, four compressed versions were generated by changing the quantization parameter (QP) in the algorithm, such that different bitrates were obtained. It should be noted that these bitrates (as given in Table 1) are approximate average bitrates over the sequences. The actual bitrates might be slightly different per sequence (due to the limited quantization of the QP factor of the algorithm).

Fig. 1. A snapshot of each original video sequence used in the experiments: (a) HD-1 'city', (b) HD-2 'crew', (c) HD-3 'harbour', (d) HD-4 'spincalendar', (e) SD-1 'basketball', (f) SD-2 'flowergarden', (g) SD-3 'horseriding', (h) SD-4 'src5', (i) CIF-1 'football', (j) CIF-2 'mobile', (k) CIF-3 'paris', (l) CIF-4 'tempete'.

A 42-inch PDP panel was selected to display the HD video sequences. The physical resolution of the panel was 1024×768, which was smaller than the resolution of the video sequences. To properly display the video sequences without introducing additional artifacts, scaling was avoided, and the video sequences were centered and cropped to fit the resolution of the screen. For the SD and the CIF video sequences, a 19-inch CRT was used as the display; its resolution was set to 800×600. A simple temporal interpolation algorithm was used to deinterlace the SD sequences. Since in the experiment subjects were requested to score both the compressed and original (uncompressed) video sequences, both deinterlaced in the same way, we assumed that the influence of deinterlacing on the quality scores was not significant. The white point of the displays was set to D65. The contrast setting of the display panel was set to 100 and the brightness setting to 50, such that the maximum luminance of the display was obtained, while avoiding clipping at both maximal and minimal luminance. The experimental conditions actually used during both experiments are summarized in Table 1. The ambient illumination was adjusted to around 15 lux on the background and 20 lux at the center of the screen, both in the direction of the viewer.

Table 1. Summary of the experimental conditions: the video sequences used, the specifications for compressing the sequences, the display settings, and the viewing distance.

Session 1 (HD): sequences 'spincalendar', 'city', 'crew', 'harbour'; codecs H.264 high profile and AVS zengqiang profile; bitrates 4, 8, 10 and 15 Mbit/s; display 42-inch PDP with 16:9 aspect ratio; display luminance max. 240 cd/m2, min. 0.2 cd/m2; viewing distance 1.5 m (three times the image height).

Session 2 (SD): sequences 'flowergarden', 'src5', 'basketball', 'horseriding'; codecs H.264 high profile and AVS zengqiang profile; bitrates 1, 1.5, 2.5 and 4 Mbit/s; display 19-inch CRT with 4:3 aspect ratio; display luminance max. 130 cd/m2, min. 0.2 cd/m2; viewing distance 1.2 m (six times the image height).

Session 3 (CIF): sequences 'football', 'paris', 'mobile', 'tempete'; codecs H.264 baseline profile and AVS P2; bitrates 256, 333 and 512 kbit/s; display 19-inch CRT with 4:3 aspect ratio; display luminance max. 130 cd/m2, min. 0.2 cd/m2; viewing distance 0.8 m (six times the image height).

Display luminance was measured on a 10 × 10 cm2 uniform block (R = G = B) in the center of the screen.

2.2. Protocol of the first experiment

The first experiment was split into three sessions, i.e., a separate session for each video format (HD, SD and CIF). The three sessions were performed in the order HD, SD, and CIF by all participants. Within each session, subjects were requested to score the overall quality of both video sequences of each displayed pair (i.e., the original and the compressed video). The order between and within the pairs was randomized for each subject. For the HD and the SD video formats, the two sequences of a pair were shown time-sequentially on the display. Since for the CIF format the required resolution was smaller, both sequences of a pair could be, and therefore were, displayed simultaneously next to each other. For each sequence, the subjects were provided with a continuous graphical scale as recommended in [1], with five adjectives added to the scale: excellent, good, fair, poor, and bad. Additionally, after the scoring experiment, there was an interview with each subject during which all compressed versions of a given sequence and codec were shown. For each sequence and codec, subjects were asked to answer the following two questions:

1. What kind of video artifacts were perceived when assessing the overall image quality?
2. In which area of the video content were these artifacts perceived?

The answers given by the subjects were recorded by the experimenter.

2.3. Protocol of the second experiment

From the interview results of the first experiment, the three most perceived artifacts were selected, and the strength of these three artifacts as a function of compression ratio was measured in the second experiment. Here, only the artifacts in the CIF sequences were evaluated, mainly based on the consideration that the zengqiang profile of the AVS standard is still under development for SD and HD video.

The three most perceived artifacts in CIF video were blurring, blocking, and color distortion. Therefore, the second experiment was split into three sessions, i.e., one for each artifact. The three sessions were performed in the order blurring, blocking, and color distortion by all subjects. For each session, these subjects were trained to become an expert in detecting and scoring the artifact. To make them familiar with each of the three artifacts, they were first trained together as a group. During this training, all eight participants were seated around one display. For each pair of video sequences (again, the compressed video against the original), the subjects freely discussed with each other in what area they perceived the artifacts and how strong the perceived artifacts were. They continued discussing until they reached a common understanding of how to score the strength of each artifact. After the group training, subjects were asked to score individually the specified artifact (i.e., blurring, blocking, or color distortion) for each video sequence of each pair. They did so again on a continuous graphical scale, but now with the adjectives: imperceptible, perceptible but not annoying, slightly annoying, annoying, and very annoying. Again, all pairs were shown to each subject in a random order. The experimental settings were the same as in the first experiment for the CIF videos (see Table 1).

2.4. Participants

During the first experiment, 20 subjects evaluated the HD and the SD video sequences, and 15 subjects evaluated the CIF video sequences. All subjects had a (corrected-to-normal) visual acuity ≥ 1.0 (as tested with a Tumbling E chart) and were not color blind (as tested with the Ishihara test). Nine female and eleven male subjects assessed the HD and the SD video sequences, with their age ranging from 22 to 29 years and an average age of 25.7. Six female and nine male subjects assessed the CIF video sequences, with their age ranging from 22 to 29 years and an average age of 25. For this experiment, we selected subjects who were not familiar with video compression or with typical compression artifacts. We used this selection criterion to ensure that the quality as assessed by these subjects was representative of the average consumer.

For the second experiment, eight subjects, three female and five male, were selected as experts. Their age ranged from 23 to 26 years with an average of 25. They were all master's degree students studying image processing. They had been taught video compression, but were not considered specialists.

3. Results and analysis

3.1. Results of the first experiment

The raw data obtained on the graphical scale were first linearly transformed into numerical data ranging from 0 to 5. Video quality degradation scores were then deduced from the transformed data by subtracting the quality score of the compressed video from that of the original video. The resulting data were then checked for reliability. Based on that check, the results of one subject were excluded for the HD video format (since this subject scored the quality of the compressed video higher than that of the original video in more than five cases). The remaining 608 data points (i.e., 19 subjects × 2 codecs × 4 sequences × 4 bitrates) were analyzed with the statistical package SPSS (version 13). For the SD video format, the results of three subjects were excluded for the same reason as given above. The remaining 544 data points (i.e., 17 subjects × 2 codecs × 4 sequences × 4 bitrates) were analyzed. For the CIF video format, all 360 data points (i.e., 15 subjects × 2 codecs × 4 sequences × 3 bitrates) could be included in the analysis.
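A minimal sketch of this pre-processing step is given below. It is not from the paper; the file and column names are hypothetical, and the graphical scale is assumed to have been digitized on a 0–100 scale.

```python
import pandas as pd

# Hypothetical raw data: one row per (subject, codec, sequence, bitrate) trial, with the
# marked positions on the continuous graphical scale for the original and compressed clip.
raw = pd.read_csv("dscqs_raw_scores.csv")      # columns: subject, format, codec, sequence,
                                               # bitrate, score_original, score_compressed

scale = 5.0 / 100.0                            # linear mapping of the 0-100 scale to 0-5
raw["degradation"] = (raw["score_original"] - raw["score_compressed"]) * scale

# Reliability check: exclude subjects who rated the compressed clip above the original
# in more than five of their trials.
inverted_counts = (raw["degradation"] < 0).groupby(raw["subject"]).sum()
reliable_subjects = inverted_counts[inverted_counts <= 5].index
clean = raw[raw["subject"].isin(reliable_subjects)]
```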

Video quality degradation scores were used in an analysis of covariance with bitrate (representing compression ratio) as a covariate, and codec (representing the compression algorithm) and video sequence (representing the video content) as fixed factors. The analysis was performed separately for each session (i.e., for each video format), the results of which are shown in Table 2. The level of significance of each factor (i.e., video codec and video sequence) on quality degradation differed between the video formats. The covariate bitrate was significant for all three video formats.

Table 2. Analysis of covariance for video quality degradation as a function of codec and sequence, taking the bitrates as a covariate.

(a) HD format
Bitrates: df = 1, F = 104.183, p < 0.001
Codec: df = 1, F = 2.209, p = 0.138
Video sequence: df = 3, F = 1.718, p = 0.162
Codec × video sequence: df = 3, F = 0.747, p = 0.524

(b) SD format
Bitrates: df = 1, F = 144.519, p < 0.001
Codec: df = 1, F = 15.729, p < 0.001
Video sequence: df = 3, F = 15.451, p < 0.001
Codec × video sequence: df = 3, F = 3.389, p = 0.018

(c) CIF format
Bitrates: df = 1, F = 50.569, p < 0.001
Codec: df = 1, F = 0.032, p = 0.859
Video sequence: df = 3, F = 15.613, p < 0.001
Codec × video sequence: df = 3, F = 1.865, p = 0.135

A content dependency was found for the SD and CIF videos, which means that the quality degradation differed depending on the sequence. The difference between the H.264 and AVS codecs was significant only for the SD videos. Table 2(b) also indicates that for the SD videos there was an interaction between codec and sequence. This interaction is illustrated in Fig. 2(a), where the quality degradation is averaged across all bitrates for each video sequence, and the error bars represent the standard error of the means. It shows that for the sequences 'flowergarden' and 'src5' the quality degradation of the AVS codec is comparable to that of the H.264 codec, whereas for the other sequences the quality degradation of the AVS codec is higher than that of the H.264 codec. In the latter case this tendency is independent of bitrate. For the sequences 'flowergarden' and 'src5', on the other hand, the AVS codec performs slightly better than the H.264 codec at higher bitrates, but worse at lower bitrates. This is illustrated in Figs. 2(b) and (c) for the sequences 'flowergarden' and 'src5', respectively.
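The analysis of covariance above was run in SPSS. A roughly equivalent analysis could be set up in Python with statsmodels, as sketched below; this is an illustration under assumptions, reusing the hypothetical `clean` table from the earlier sketch with a `format` column selecting the session.

```python
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Quality degradation modeled with bitrate as a covariate and codec and sequence as
# fixed factors, including the codec x sequence interaction; one model per video format.
sd_data = clean[clean["format"] == "SD"]
model = smf.ols(
    "degradation ~ bitrate + C(codec) + C(sequence) + C(codec):C(sequence)",
    data=sd_data,
).fit()
print(sm.stats.anova_lm(model, typ=3))          # ANCOVA table (Type III sums of squares)
```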

The answers to the questionnaire were carefully handled by the experimenter to classify the artifacts mentioned by the non-expert viewers into categories of artifacts known to occur in compression. This was done by first translating the answers given to the first question into known artifacts, and by checking the relevance of the translation against the answer given to the second question. The latter was done by evaluating whether the indicated artifact actually occurred in the indicated area of the video content. In other words, if a subject mentioned seeing blur, we checked whether there was indeed some blur in the area indicated as distorted by that subject. The resulting artifact categories are shown in Table 3.

For each video format, only the three artifacts mentioned most often are given. They are ranked according to the number of times the artifact was mentioned. For the SD video sequences, only two artifacts are listed, since all other artifacts were reported fewer than five times. Table 3 shows that blurring was the most perceived artifact for all three video formats. The second most perceived artifact differed per video format. The difference in artifact visibility between the formats indicates that the video format has an influence on the perceived artifacts.

Where the artifacts were most visible in each of the video sequences became clear from the answers to the second question of the questionnaire. For the HD format, blurring was most reported in the video sequence 'city' around the tower building, in the video sequence 'crew' along the clothes of the astronauts, and in the video sequence 'harbour' around the ship. Flickering and color distortion were most reported in the video sequence 'spincalendar', more particularly related to the numbers inside the calendar. For the SD format, blurring was reported for all four video sequences, i.e., for the sportsmen in 'basketball', the flowers and people in 'flowergarden', the horse in 'horseriding', and the people in 'src5'. Blocking was most reported in the background of the video sequences 'basketball' and 'horseriding'. For the CIF format, blurring was reported for all four video sequences, i.e., in the background for the sequence 'football', around the rolling ball for the sequence 'mobile', in the face for the sequence 'paris', and at the woods for the sequence 'tempete'. Color distortion was most reported in the grass for the sequence 'football' and around the numbers for the sequence 'mobile'. Blocking was most reported in the background of the sequence 'football'.

Table 3. The most obvious artifacts deduced from the questionnaire for each video format (the number of times each artifact was mentioned is given in parentheses).

Rank 1 — HD: blurring (144); SD: blurring (132); CIF: blurring (98)
Rank 2 — HD: flickering (32); SD: blocking (34); CIF: color distortion (15)
Rank 3 — HD: color distortion (28); SD: others (<5); CIF: blocking (8)
Rank 4 — HD: others (<5); CIF: others (<5)

Fig. 2. (a) Marginal means of video quality degradation as a function of SD video sequence ('basketball', 'flowergarden', 'horseriding', 'src5') for the AVS and H.264 codecs, where a higher score indicates a lower quality. The marginal means of video quality degradation as a function of bitrate (Mbit/s) for the SD video sequences 'flowergarden' and 'src5' are given in (b) and (c), respectively.

3.2. Results of the second experiment (CIF format)

Again, the raw data obtained on the graphical scale were linearly transformed into numerical values within a range from 0 to 5. From the resulting numerical scores, artifact strength scores were obtained by subtracting the artifact score of the compressed CIF video from the artifact score of the original video. Fig. 3 shows, for each of the artifacts separately (i.e., for blurring, blocking and color distortion), the mean artifact strength score versus the mean quality degradation score obtained in the first experiment. The error bars in Fig. 3 depict the standard error of the means in both directions.

To further explore the relation between the mean artifact strength scores and the mean quality degradation scores, a linear correlation analysis was conducted; the results are given in Table 4. It shows that the artifacts blurring and color distortion had the highest correlation with quality degradation. The correlation between the artifact blocking and quality degradation was considerably lower. To predict video quality degradation as a combination of the strengths of blurring, color distortion, and blocking, a linear regression analysis was performed. However, since the strengths of the artifacts were mutually highly correlated, a principal component analysis (PCA) was conducted to find independent components. In SPSS, the mean artifact strength scores were first normalized (mean = 0, standard deviation = 1) for each artifact. Two components were extracted from the PCA (shown in Table 5), which accounted for 97.6% of the normalized variance. Component one was mainly determined by the averaged strength of the three artifacts, whereas component two was dominated by the difference in strength between the blocking and blurring artifacts.
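A sketch of this normalization and PCA step with NumPy and scikit-learn is given below. It is not the authors' SPSS procedure; the input file and column ordering are assumptions, and the extracted loadings will only approximate the transformation coefficients reported in Table 5.

```python
import numpy as np
from sklearn.decomposition import PCA

# One row per compressed CIF sequence; columns: mean strength of blurring, blocking,
# and color distortion (this ordering is an assumption).
strengths = np.loadtxt("cif_artifact_strengths.csv", delimiter=",")

# z-score each artifact strength (mean = 0, standard deviation = 1)
z = (strengths - strengths.mean(axis=0)) / strengths.std(axis=0, ddof=1)

pca = PCA(n_components=2)
scores = pca.fit_transform(z)                   # component values C1 and C2 per sequence
print(pca.explained_variance_ratio_.sum())      # the paper reports ~97.6% for two components
print(pca.components_)                          # loadings relating the artifacts to C1 and C2
```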

Based on the component values (C1 and C2), a linear regression analysis was performed to determine the best fit for the relation between the mean video quality degradation scores (IQ) and the two sets of component values. This resulted in

IQ = 2.261 + 0.684 × C1 − 0.056 × C2    (1)

Fig. 3. Scatter plots of artifact strength score, i.e., blurring, blocking, and color distortion in (a), (b), and (c), respectively, against quality degradation score for CIF video.

Table 4. The Pearson correlation coefficient between artifact strength score and quality degradation score for the three most perceived artifacts, obtained with the CIF resolution videos.

Correlation with video quality degradation — blurring: 0.684; blocking: 0.503; color distortion: 0.805.

Table 5. The transformation coefficients between the strengths of the three most perceived artifacts and the two components deduced from the principal component analysis.

Artifact — C1, C2
Blurring — 0.365, −0.877
Blocking — 0.339, 1.255
Color distortion — 0.387, −0.272

Fig. 4. Scatter plot of the predicted video quality degradation (+ symbols) and the PSNR values (Δ symbols) against the mean of the subjective quality degradation scores for all compressed CIF videos used in the experiment.

Finally, Eq. (1) was transformed back into a relation between quality degradation and the strength of the three artifacts by using the transformation matrix in Table 5. The resulting function was

IQ = 2.261 + 0.299 × Zblur + 0.280 × Zcolor + 0.162 × Zblock    (2)

where Zblur, Zcolor, and Zblock are the normalized mean artifact strength scores of blurring, color distortion, and blocking, respectively. Fig. 4 shows the scatter plot of the quality degradation predicted by Eq. (2) versus the mean subjective quality degradation score. The Pearson correlation coefficient between the predicted values and the mean subjective quality degradation scores was 0.978, indicating that 96% of the variance in the data is explained. This correlation coefficient was substantially higher than those of the video quality degradation with each of the artifact difference scores separately (as given in Table 4). This illustrates that the prediction of video quality degradation as a consequence of compression was considerably improved by taking a linear combination of the strengths of three artifacts rather than considering only one artifact. For comparison, PSNR is also plotted against the video quality degradation scores in Fig. 4; the Pearson correlation coefficient between PSNR and the video quality degradation scores was −0.444.
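Eq. (2) is straightforward to apply once the normalized artifact strengths are available. A small sketch, using the coefficients from the paper but hypothetical input arrays, is:

```python
import numpy as np

def predicted_degradation(z_blur: np.ndarray, z_color: np.ndarray, z_block: np.ndarray) -> np.ndarray:
    """Video quality degradation predicted from normalized artifact strengths, Eq. (2)."""
    return 2.261 + 0.299 * z_blur + 0.280 * z_color + 0.162 * z_block

# z_blur, z_color, z_block: z-scored mean artifact strengths per compressed CIF sequence
# subjective: mean subjective quality degradation scores from the first experiment
# pred = predicted_degradation(z_blur, z_color, z_block)
# r = np.corrcoef(pred, subjective)[0, 1]       # the paper reports r = 0.978 (r^2 ~ 0.96)
```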

Fig. 5. Scatter plot of the predicted video quality degradation against the mean of the subjective quality degradation scores for two of the models obtained in the repetition analysis, i.e., for the best (Δ symbols; correlation coefficient 0.992) and worst (+ symbols; correlation coefficient 0.934) performing model.

To further evaluate the reliability and stability of our regression model, we repeated the analysis on a limited set of data points (i.e., 18 out of the 24 compressed CIF sequences) and checked the prediction ability of the resulting model on the remaining 6 compressed sequences. This process was repeated 8 times, each time randomly selecting the 18 sequences for establishing the model. The Pearson correlation between the subjective quality scores and the predicted quality for the six remaining sequences varied from 0.934 to 0.992. Fig. 5 shows the scatter plots for the best and worst case.
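A sketch of this repeated random split is given below; it assumes a design matrix Z with columns Zblur, Zcolor and Zblock and a vector of mean subjective degradation scores (both hypothetical names carried over from the earlier sketches).

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(seed=0)
correlations = []
for _ in range(8):                               # eight repetitions, as in the paper
    order = rng.permutation(len(subjective))     # 24 compressed CIF sequences in total
    train, test = order[:18], order[18:]
    model = LinearRegression().fit(Z[train], subjective[train])
    prediction = model.predict(Z[test])
    correlations.append(np.corrcoef(prediction, subjective[test])[0, 1])
print(min(correlations), max(correlations))      # the paper reports a range of 0.934-0.992
```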

4. Discussion

The results of the first experiment indicate that there is no difference in perceived quality between H.264 and AVS compression for the HD and the CIF videos. For the SD videos, H.264 compression yields a statistically significantly better quality than AVS compression, but also here the difference is small. Overall, subjects can hardly distinguish the difference in quality degradation between the two codecs at the same compression ratio. A similar statement was already made in [20] based on the objective metric PSNR.

The questionnaire reveals that the most perceivable artifacts in CIF videos, both for H.264 and AVS compression, are blurring, blocking, and color distortion. Blurring is the artifact most often reported for all three video formats. It also has a high, though not the highest, correlation with the quality degradation scores for the CIF videos. Color distortion is reported mostly for low-bitrate videos, and has the highest correlation coefficient with the quality degradation scores measured for the CIF videos. Apparently, whenever detected, color distortion strongly affects the overall video quality. The blocking artifact is most obvious for fast moving objects in videos at low bitrates. In [18], ringing is also mentioned as one of the most perceivable artifacts. However, our experiments do not confirm this. One possible reason may be that ringing, which mostly occurs at edges, was confused with blur by the participants of the first experiment.

The results of the second experiment reveal that for CIF video sequences the degradation of video quality as a consequence of currently popular video compression standards such as H.264 and AVS can be predicted by a linear combination of the (normalized) strengths of three artifacts, i.e., blurring, blocking, and color distortion. The resulting model predicts 96% of the variance in the subjectively measured quality degradation scores. Even when the model is fitted on a limited set of data points and its prediction ability is evaluated on the remaining data points, a correlation of at least 0.93 between the subjectively measured quality degradation and the quality predicted by the model is still obtained. It cannot be excluded that the prediction ability of the proposed model can be further improved by also including interaction terms between the artifacts. This, however, would make the model more complicated, while the necessity for a more complex model is limited given the already high correlation obtained with the simpler model (including only linear terms).

Although our results are based on a full-reference experimental setup, if reliable no-reference objective quality metrics are available for each artifact, the prediction of quality degradation can be done with a no-reference approach using Eq. (2). For the blurring and blocking artifacts, there are already many studies reporting no-reference objective metrics [24–27]. For color distortion, further research is needed to explore and quantify the perceivable strength in compressed videos.
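For illustration only, one very simple no-reference blur indicator (not one of the published metrics cited above) is the variance of a frame's Laplacian response, sketched below; a full metric would of course need pooling over frames and calibration against subjective scores.

```python
import numpy as np
from scipy import ndimage

def laplacian_sharpness(frame: np.ndarray) -> float:
    """Crude no-reference sharpness indicator: variance of the Laplacian response.
    Lower values suggest stronger blurring; illustration only, not a published metric."""
    response = ndimage.laplace(frame.astype(np.float64))
    return float(response.var())

# Example: track sharpness over a decoded sequence of grayscale frames
# blur_trace = [laplacian_sharpness(f) for f in decoded_frames]
```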

Finally, the results of the first experiment also show that video content has a significant effect on the quality degradation caused by compression, at least for the SD and CIF videos. This content dependency can be understood from differences in the visibility of each artifact depending on characteristics such as the amount of edges, details, contrast, and motion in the sequences. Developing content-independent objective quality metrics is still a challenging task for the video signal processing community. Most metrics monotonically and consistently predict the quality degradation as a function of compression ratio for one specific video, but fail to predict quality differences across various sequences. PSNR, for example, is sufficiently reliable to predict quality degradation for one video, but its consistency is reduced when several videos are combined (i.e., in Fig. 4 the Pearson correlation coefficient between PSNR and the subjective quality degradation scores is −0.444). To develop reliable objective quality metrics for blurring, blocking, and color distortion, the content dependency should be explicitly considered.

5. Conclusion

Two subjective experiments were conducted to quantify the perceived quality of compressed video. The results of the first experiment, with non-expert viewers, indicate that the three most perceivable artifacts, both for the AVS and the H.264 codec, are blurring, blocking, and color distortion. The results also illustrate that the performance, defined as the video quality degradation against compression ratio, of the AVS and the H.264 codec is comparable for the HD and the CIF formats, but that the H.264 codec performs better for the SD format. The results of the second experiment, with expert viewers and CIF resolution video, show that the video quality degradation as a consequence of compression can be predicted by linearly combining the normalized attribute strengths of the three most perceivable artifacts. The Pearson correlation coefficient between the predicted quality and the subjectively measured quality is as high as 0.978.

This suggests that a simple, yet reliable no-reference objective quality metric can be developed by assessing the strength of each artifact separately and then linearly combining them. Since many objective metrics for blurring and blocking are already available in the literature, the development of such an objective quality metric can mainly focus on the development of a metric for color distortion. In this approach, content independency of the metric should be of explicit concern.

References

[1] Recommendation ITU-R BT.500, Methodology for the subjective assessment of the quality of television pictures.
[2] J. Lubin, D. Fibush, Sarnoff JND vision model, T1A1.5 Working Group Document #97-612, ANSI T1 Standards Committee, 1997.
[3] S. Winkler, A perceptual distortion metric for digital color video, in: Proceedings of SPIE, Human Vision and Electronic Imaging IV, vol. 3644, 1999, pp. 175–184.
[4] A.B. Watson, J. Hu, J.F. McGowan III, DVQ: a digital video quality metric based on human vision, Journal of Electronic Imaging 10 (1) (2001) 20–29.
[5] H.R. Sheikh, A.C. Bovik, G. de Veciana, An information fidelity criterion for image quality assessment using natural scene statistics, IEEE Transactions on Image Processing 14 (12) (2005) 2117–2128.
[6] Z. Wang, A.C. Bovik, H.R. Sheikh, et al., Image quality assessment: from error visibility to structural similarity, IEEE Transactions on Image Processing 13 (4) (2004) 600–612.
[7] A.P. Hekstra, J.G. Beerends, D. Ledermann, et al., PVQM—a perceptual video quality measure, Signal Processing: Image Communication 17 (10) (2002) 781–798.
[8] VQEG, Final report from the Video Quality Experts Group on the validation of objective models of video quality assessment—Phase II, http://www.vqeg.org/, 2003.
[9] H.R. Sheikh, M.F. Sabir, A.C. Bovik, A statistical evaluation of recent full reference image quality assessment algorithms, IEEE Transactions on Image Processing 15 (11) (2006) 3441–3452.
[10] N. Montard, P. Bretillon, Objective quality monitoring issues in digital broadcasting networks, IEEE Transactions on Broadcasting 51 (3) (2005) 269–275.
[11] T. Brandao, M.P. Queluz, No-reference image quality assessment based on DCT domain statistics, Signal Processing 88 (4) (2008) 822–833.
[12] Y.W. Yu, Z.D. Lu, H.F. Ling, F.D. Zou, No-reference perceptual quality assessment of JPEG images using general regression neural network, in: Proceedings of ISNN'06, Lecture Notes in Computer Science, vol. 3972, 2006, pp. 638–645.
[13] A. Eden, No-reference estimation of the coding PSNR for H.264-coded sequences, IEEE Transactions on Consumer Electronics 53 (2) (2007) 667–674.
[14] S. Olsson, M. Stroppiana, J. Baina, Objective methods for assessment of video quality: state of the art, IEEE Transactions on Broadcasting 43 (4) (1997) 487–495.
[15] S. Winkler, Issues in vision modeling for perceptual video quality assessment, Signal Processing 78 (2) (1999) 231–252.
[16] U. Engelke, H.J. Zepernick, Perceptual-based quality metrics for image and video services: a survey, in: Proceedings of the 3rd EuroNGI, 2007, pp. 190–197.
[17] M. Yuen, H.R. Wu, A survey of hybrid MC/DPCM/DCT video coding distortions, Signal Processing 70 (1998) 247–278.
[18] C.C. Koh, S.K. Mitra, J.M. Foley, I.E.J. Heynderickx, Annoyance of individual artifacts in MPEG-2 compressed video and their relation to overall annoyance, in: Proceedings of SPIE, Human Vision and Electronic Imaging X, vol. 5666, 2005, pp. 595–606.
[19] T. Wolff, H. Ho, J.M. Foley, S.K. Mitra, H.264 coding artifacts and their relation to perceived annoyance, in: Proceedings of EUSIPCO, Florence, Italy, 2006.
[20] L. Yu, F. Yi, J. Dong, C. Zhang, Overview of AVS-Video: tools, performance and complexity, in: Proceedings of SPIE, Visual Communications and Image Processing '05, vol. 5960, 2005, pp. 679–690.
[21] E. Vicario, I. Heynderickx, G. Ferretti, P. Carrai, Design of a tool to benchmark scaling algorithms on LCD monitors, in: Proceedings of SID'02, 2002, pp. 704–707.
[22] ftp://159.226.42.57/public/avs_doc/0612_Zhuhai/avs/N1356.doc
[23] ftp://159.226.42.57/public/seqs/video
[24] F. Pan, X. Lin, S. Rahardja, et al., A locally adaptive algorithm for measuring blocking artifacts in images and videos, Signal Processing: Image Communication 19 (2004) 499–506.
[25] H. Liu, I. Heynderickx, A simplified human vision model applied to a blocking artifact metric, in: Proceedings of CAIP'07, vol. 4673, 2007, pp. 334–341.
[26] F. Rony, J.K. Lina, A no-reference objective image sharpness metric based on just-noticeable blur and probability summation, in: IEEE ICIP'07, vol. 3, 2007, pp. 445–448.
[27] H.R. Sheikh, A.C. Bovik, L. Cormack, No-reference quality assessment using natural scene statistics: JPEG2000, IEEE Transactions on Image Processing 14 (11) (2005) 1918–1927.