Adaptation and optimization of coding algorithms for mobile 3DTVsp.cs.tut.fi/mobile3dtv/results/tech/D2.2_Mobile3DTV_v1... · 2009-02-11 · MOBILE3DTV Project No. 216503 Adaptation

MOBILE3DTV project has received funding from the European Community’s ICT programme in the context of the Seventh Framework Programme (FP7/2007-2011) under grant agreement n° 216503. This document reflects only the authors’ views and the Community or other project partners are not liable for any use that may be made of the information contained therein.


MOBILE3DTV

Project No. 216503

Adaptation and optimization of coding algorithms for mobile 3DTV

Philipp Merkle, Heribert Brust, Kristina Dix, Yongzhe Wang, Aljoscha Smolic

Abstract: As a starting point and reference for further research, this deliverable specifies and evaluates available coding standards for 3D video under the specific conditions of the Mobile3DTV project. The following stereo video formats and codecs are specified and evaluated:

• H.264/AVC simulcast,
• H.264 Stereo SEI message,
• H.264/MVC,
• MPEG-C Part 3 using H.264 for both video and depth,
• H.264 auxiliary picture syntax for video plus depth.

Significant coding gains can be achieved with hierarchical B pictures for temporal prediction, but the gain differs largely for individual sequences. On the other hand, hierarchical B pictures also mean increased complexity and memory requirements. Inter-view prediction leads to a significant reduction of bitrate for some sequences. However, in other cases the gain is negligible. We achieved up to 35% bitrate savings from inter-view prediction compared to stereo simulcast. Inter-view prediction, whether as H.264 SEI or MVC, does not add substantial complexity. A representation as video plus depth is an interesting alternative for 3D video. It allows the stereo rendering to be adjusted at the decoder and the 3D impression to be optimally adapted to any given display. However, this extended functionality comes at the cost of increased complexity. MPEG-C Part 3 is suitable for encoding of video plus depth data. Good depth quality is essential for good overall quality.

Keywords: 3D video coding, MVC, H.264/AVC, MPEG-C Part 3

MOBILE3DTV D2.2


Executive Summary

The ultimate goal of research in WP2 is to develop the best possible 3D video representation and coding for the specific application of transmission over DVB-H and mobile terminals. To that end, different alternative approaches will be developed, implemented, optimized and compared. This will include approaches that go beyond the current state of the art as defined in available standards, such as mixed resolution stereo video coding.

As a starting point and reference for further research, this deliverable specifies and evaluates available coding standards for 3D video under the specific conditions of the Mobile3DTV project. Specifically, the optimizations target the demonstrator device to be used in the project, i.e. they address how to use the different 3D video codecs in this specific context.

The following 3D video formats and codecs are specified and evaluated:

H.264/AVC simulcast,

H.264 Stereo SEI message,

H.264/MVC,

MPEG-C Part 3 using H.264 for both video and depth,

H.264 auxiliary picture syntax for video plus depth.

Evaluation is done by simulations. The same test data are used in all experiments: professional stereo sequences in 16:9 format (matching the display of the demonstrator), kindly provided by KUK Filmproduktion. In total, the four sequences Horse, Car, Hands, and Snail, which span a range of different types of content and complexity, are used at a resolution of 480x270. The material is coded at different bitrates using optimum settings for each of the codecs. Quality is evaluated by means of PSNR over bitrate and informal subjective expert viewing.

As for any type of video coding, the same amount of raw input data leads to very different RD-performance. The required bitrate for achieving acceptable quality strongly depends on the properties of the sequence content, especially temporal variation and complexity of the scene. The coding gain from inter-view prediction (Stereo SEI & MVC) varies largely.

Significant coding gains can be achieved with hierarchical B pictures for temporal prediction. The gain from using hierarchical B pictures differs largely for individual sequences, depending on factors like scene content complexity and temporal variation. On the other hand hierarchical B pictures also mean increased complexity and memory requirements. It remains to be studied how far this can be implemented on a mobile terminal.

Inter-view prediction leads to a significant reduction of bitrate for some sequences. However, in some cases the gain is negligible. In our experiments we achieved up to 35% bitrate savings from inter-view prediction compared to stereo simulcast. Inter-view prediction, whether performed as H.264 SEI or MVC, does not add substantial complexity. However, a standard-conforming implementation of MVC requires that the decoder supports the H.264 High Profile, since MVC extends it. A standard-conforming implementation of the Stereo SEI Message requires that the decoder supports the H.264 interlaced tools.

A representation as video plus depth is an interesting alternative for 3D video. It allows the stereo rendering to be adjusted at the decoder and the 3D impression to be optimally adapted to any given display. However, this extended functionality comes at the cost of increased complexity, since one output view has to be rendered at the terminal device. Depth estimation is necessary on the sender side, which is an inherently error-prone task. Nevertheless, it has been shown that good general quality is achievable with the video plus depth approach. Please refer to D2.3 “Report on generation of video plus depth data base”.

MPEG-C Part 3 is suitable for encoding of video plus depth data. Different bitrate ratios (and thereby qualities) for video and depth can be adjusted. It was found that at a certain bitrate the QP for depth should be chosen lower than the QP for color to achieve best overall results, e.g. C30, D24. It can be concluded that good depth quality is essential for good overall quality. The numerical bitrate ratio between color and depth may vary largely depending on the sequence. We have found ratios between 1:1 and 6:1. In most cases for best overall quality a substantial portion of the bitrate has to be spent for depth.

A detailed comparison of “video plus video” approaches (simulcast, H.264 SEI, MVC) and video plus depth (MPEG-C Part 3) is still to be done. This will be done in close collaboration with WP4, which will perform formal subjective tests about these issues. These experiments will also include mixed resolution stereo video coding, as an extension of available 3D video formats.


Table of Contents

1. Introduction
H.264 Simulcast
1.1. Specification
1.2. Simulation
2. H.264 Stereo SEI Message
2.1. Specification
2.2. Simulation
3. Multiview Video Coding (MVC)
3.1. Specification
3.2. Simulation
4. MPEG-C Part 3
4.1. Specification
Syntax
Supplemental information message syntax
Supplemental information payload syntax
4.2. Simulation
5. H.264 Auxiliary Picture Syntax for video plus depth
5.1. Specification
5.2. Simulation
6. Comparative Analysis
7. Conclusions


1. Introduction

The ultimate goal of research in WP2 is to develop the best possible 3D video representation and coding for the specific application of transmission over DVB-H and mobile terminals. To that end, different alternative approaches will be developed, implemented, optimized and compared. This will include approaches that go beyond the current state of the art as defined in available standards, such as mixed resolution stereo video coding.

As a starting point and reference for further research, this deliverable specifies and evaluates available coding standards for stereo video under the specific conditions of the Mobile3DTV project. Specifically, the optimizations target the demonstrator device to be used in the project, i.e. they address how to use the different stereo video codecs in this specific context.

The following stereo video formats and codecs are specified and evaluated:

H.264/AVC simulcast,

H.264 Stereo SEI message,

H.264/MVC,

MPEG-C Part 3 using H.264 for both video and depth,

H.264 auxiliary picture syntax for video plus depth.

Evaluation is done by simulations. The same test data are used in all experiments: professional stereo sequences in 16:9 format (matching the display of the demonstrator), kindly provided by KUK Filmproduktion. In total, the four sequences Horse, Car, Hands, and Snail, which span a range of different types of content and complexity, are used at a resolution of 480x270. The material is coded at different bitrates using optimum settings for each of the codecs. Quality is evaluated by means of PSNR over bitrate and informal subjective expert viewing.

Finally, an initial comparison is done from all simulation results of different formats and codecs.
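Since PSNR is the objective quality measure throughout this deliverable, a minimal sketch of the computation may be useful. It assumes 8-bit samples (peak value 255) and frames given as lists of rows; the function name `psnr` and this frame layout are illustrative choices and are not taken from the reference software used for the simulations.

```python
import math


def psnr(original, decoded, peak=255):
    """PSNR in dB between two equally sized frames given as lists of rows.

    Assumes 8-bit samples by default (peak=255); this is an assumption,
    the deliverable does not state the sample bit depth.
    """
    squared_error = 0
    count = 0
    for row_o, row_d in zip(original, decoded):
        for a, b in zip(row_o, row_d):
            squared_error += (a - b) ** 2
            count += 1
    if squared_error == 0:
        return math.inf  # identical frames
    mse = squared_error / count
    return 10.0 * math.log10(peak ** 2 / mse)


# Tiny example: every sample is off by one, so the MSE is 1.
reference = [[10, 20], [30, 40]]
distorted = [[11, 19], [31, 41]]
print(round(psnr(reference, distorted), 2))
```

For a stereo pair, the figures in this document plot the average of the per-view PSNR values against the total bitrate of both views.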


H.264 Simulcast

Figure 1: Schematic block diagram for H.264 Simulcast coding with stereo video format data

As depicted in the overview diagram, H.264 Simulcast uses the stereo video format, consisting of the two input video sequences Video 1 and Video 2 for the left and right view of the stereo pair. The codec used for H.264 Simulcast is H.264/AVC, which is applied to each of the two input sequences independently, resulting in two encoded bit- or transport-streams BS/TS. After transmission over the Channel the two streams are decoded independently, resulting in the distorted sequences of the stereo pair Video 1 and Video 2.

1.1. Specification

According to the H.264/MPEG-4-AVC standard [1], “H.264 Simulcast” is specified as the individual application of an H.264/AVC conforming coder to several video sequences in a generic way.

H.264/MPEG4-AVC is the latest video coding standard of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). H.264/MPEG4-AVC has recently become the most widely accepted video coding standard and covers all common video applications ranging from mobile services and videoconferencing to IPTV, HDTV, and HD video storage.

[1] ITU-T Recommendation H.264, “Advanced video coding for generic audiovisual services”, November 2007.

1.2. Simulation

Test data

The simulations for H.264 Simulcast have been carried out with the following test data sets:

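| Parameter | Value |
|---|---|
| Producer | KUK |
| Sequences | Car, Hands, Horse, Snail |
| Length [frames] | 235 (Car), 251 (Hands), 140 (Horse), 189 (Snail) |
| Framerate [frames/second] | 30 |
| Resolution [pixel] | 480 x 270 |
| Data Format | VL + VR |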
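To put the coded bitrates reported in this section into perspective, the uncompressed data rate of the test material can be estimated from the parameters above. The sketch below assumes 4:2:0 chroma subsampling and 8 bits per sample; both are assumptions, since the test data description does not state the sample format, and the function name is illustrative.

```python
def raw_bitrate_kbps(width, height, fps, views=2,
                     bits_per_sample=8, chroma_factor=1.5):
    """Uncompressed bitrate in kbps.

    chroma_factor=1.5 models 4:2:0 (one luma plane plus two quarter-size
    chroma planes); bits_per_sample=8 is likewise an assumption.
    """
    samples_per_frame = width * height * chroma_factor
    return samples_per_frame * bits_per_sample * fps * views / 1000.0


raw = raw_bitrate_kbps(480, 270, 30)
print(f"raw stereo bitrate: {raw:.0f} kbps")
```

Under these assumptions the two views together amount to roughly 93 Mbps before compression.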

Setup

The simulations for H.264 Simulcast have been carried out with the following coding settings:

| Parameter | Value |
|---|---|
| Coder Implementation | JM 14.2 |
| Standard | H.264/AVC |
| Quantization Parameter | 24, 30, 36, 42 |
| GOP Size | 16 (hierarchical B pictures), 1 (no B pictures) |
| Intra Period | 16 |
| Search Range | 32 |
| Symbol Mode | CABAC |

Besides these settings typical configurations for H.264/AVC have been used.

Results


For H.264 Simulcast simulations the left and right view are encoded, transmitted and decoded independently. We achieved the following results:

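Car

| QP | GOP 1 VL Rate [kbps] | GOP 1 VL PSNR [dB] | GOP 1 VR Rate [kbps] | GOP 1 VR PSNR [dB] | GOP 16 VL Rate [kbps] | GOP 16 VL PSNR [dB] | GOP 16 VR Rate [kbps] | GOP 16 VR PSNR [dB] |
|---|---|---|---|---|---|---|---|---|
| 24 | 951.66 | 41.03 | 1079.74 | 40.75 | 400.83 | 38.48 | 439.78 | 38.08 |
| 30 | 335.50 | 36.99 | 372.00 | 36.65 | 164.19 | 35.12 | 178.66 | 34.75 |
| 36 | 124.16 | 33.74 | 134.21 | 33.41 | 68.06 | 32.37 | 73.20 | 32.00 |
| 42 | 52.32 | 30.98 | 55.84 | 30.67 | 27.76 | 29.93 | 29.54 | 29.59 |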

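Hands

| QP | GOP 1 VL Rate [kbps] | GOP 1 VL PSNR [dB] | GOP 1 VR Rate [kbps] | GOP 1 VR PSNR [dB] | GOP 16 VL Rate [kbps] | GOP 16 VL PSNR [dB] | GOP 16 VR Rate [kbps] | GOP 16 VR PSNR [dB] |
|---|---|---|---|---|---|---|---|---|
| 24 | 2873.82 | 41.78 | 2498.30 | 42.19 | 1565.86 | 37.40 | 1350.52 | 37.94 |
| 30 | 1586.37 | 36.94 | 1368.25 | 37.49 | 664.89 | 32.05 | 580.85 | 32.65 |
| 36 | 733.07 | 32.33 | 641.97 | 32.97 | 235.62 | 28.14 | 215.00 | 28.71 |
| 42 | 255.48 | 28.12 | 236.52 | 28.73 | 78.28 | 25.34 | 74.31 | 25.82 |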

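Horse

| QP | GOP 1 VL Rate [kbps] | GOP 1 VL PSNR [dB] | GOP 1 VR Rate [kbps] | GOP 1 VR PSNR [dB] | GOP 16 VL Rate [kbps] | GOP 16 VL PSNR [dB] | GOP 16 VR Rate [kbps] | GOP 16 VR PSNR [dB] |
|---|---|---|---|---|---|---|---|---|
| 24 | 1677.38 | 37.87 | 1683.29 | 37.75 | 736.45 | 36.61 | 749.10 | 36.46 |
| 30 | 564.73 | 32.75 | 566.33 | 32.61 | 367.76 | 32.65 | 374.37 | 32.55 |
| 36 | 196.96 | 28.81 | 192.89 | 28.69 | 151.08 | 28.73 | 151.67 | 28.61 |
| 42 | 70.58 | 26.13 | 68.11 | 26.10 | 52.28 | 25.92 | 50.55 | 25.88 |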


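Snail

| QP | GOP 1 VL Rate [kbps] | GOP 1 VL PSNR [dB] | GOP 1 VR Rate [kbps] | GOP 1 VR PSNR [dB] | GOP 16 VL Rate [kbps] | GOP 16 VL PSNR [dB] | GOP 16 VR Rate [kbps] | GOP 16 VR PSNR [dB] |
|---|---|---|---|---|---|---|---|---|
| 24 | 230.04 | 45.06 | 221.54 | 45.11 | 142.33 | 44.87 | 140.22 | 44.82 |
| 30 | 103.67 | 40.93 | 101.08 | 41.05 | 79.58 | 40.97 | 77.51 | 41.02 |
| 36 | 51.94 | 37.16 | 50.75 | 37.33 | 42.24 | 37.11 | 41.23 | 37.22 |
| 42 | 28.74 | 33.13 | 27.77 | 33.34 | 23.09 | 33.07 | 22.18 | 33.24 |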

Table 1: RD results for H.264 Simulcast coding simulations with stereo video format data

MOBILE3DTV D2.2

10

Figure 2: RD-comparison for H.264 Simulcast coding simulations: total bitrate for both views vs. average PSNR relative to the original sequences

These simulation results clearly indicate that the overall RD-performance for temporal prediction using hierarchical B pictures (GOP 16) is better than for simulcast coding without hierarchical B pictures (GOP 1). The gains that can be achieved for the individual sequences differ largely, depending on factors like scene content complexity and temporal variation. For the same quality, between almost zero and up to 50% of the bitrate can be saved with hierarchical B pictures (GOP 16). In addition, the RD-performance of the left and right view is the same when they are compressed under equal conditions.
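A bitrate saving "for the same quality" can be read off the RD curves by interpolating between the measured points. The following sketch does this for the Car sequence, using the total bitrate of both views and the average PSNR from Table 1. Linear interpolation of log(rate) over PSNR is a common approximation chosen here for illustration; the deliverable does not state how the savings were actually derived, and all function names are hypothetical.

```python
import math

# Table 1, sequence Car: (rate VL, PSNR VL, rate VR, PSNR VR) for QP 24, 30, 36, 42.
CAR_GOP1 = [(951.66, 41.03, 1079.74, 40.75), (335.50, 36.99, 372.00, 36.65),
            (124.16, 33.74, 134.21, 33.41), (52.32, 30.98, 55.84, 30.67)]
CAR_GOP16 = [(400.83, 38.48, 439.78, 38.08), (164.19, 35.12, 178.66, 34.75),
             (68.06, 32.37, 73.20, 32.00), (27.76, 29.93, 29.54, 29.59)]


def rd_points(rows):
    """Return (average PSNR, total rate) pairs, sorted by PSNR."""
    return sorted(((pl + pr) / 2.0, rl + rr) for rl, pl, rr, pr in rows)


def rate_at_psnr(points, psnr):
    """Interpolate log(rate) linearly in PSNR between neighbouring RD points."""
    for (p0, r0), (p1, r1) in zip(points, points[1:]):
        if p0 <= psnr <= p1:
            t = (psnr - p0) / (p1 - p0)
            return math.exp(math.log(r0) + t * (math.log(r1) - math.log(r0)))
    raise ValueError("PSNR outside the measured range")


def saving_at_psnr(reference_rows, test_rows, psnr):
    """Fraction of the reference bitrate saved by the test configuration."""
    reference_rate = rate_at_psnr(rd_points(reference_rows), psnr)
    test_rate = rate_at_psnr(rd_points(test_rows), psnr)
    return 1.0 - test_rate / reference_rate


saving = saving_at_psnr(CAR_GOP1, CAR_GOP16, 35.0)
print(f"Car at 35 dB: GOP 16 saves about {100 * saving:.0f}% of the total bitrate")
```

The result depends on the PSNR at which the curves are compared, which is one reason why the savings are quoted as a range.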

Informal subjective expert viewing has been carried out for the H.264 Simulcast simulation results on a stereoscopic display. This led to the conclusion that the objective RD results are confirmed: for the same bitrate, a higher quality is achieved by using hierarchical B pictures (GOP 16) for temporal prediction or, conversely, a lower bitrate is necessary to achieve the same subjective quality. Without hierarchical B pictures (GOP 1) the sequences are temporally more unsteady. For medium bitrates a tolerable to mostly acceptable subjective quality was observed.


2. H.264 Stereo SEI Message

Figure 3: Schematic block diagram for H.264 Stereo SEI Message coding with stereo video format data

As depicted in the overview diagram, H.264 Stereo SEI Message uses the stereo video format, consisting of the two input video sequences Video 1 and Video 2 for the left and right view of the stereo pair. For H.264 Stereo SEI Message compression these two sequences are interlaced line-by-line into one sequence, where the top field contains Video 1 and the bottom field Video 2. The codec used for H.264 Stereo SEI Message is H.264/AVC, which is applied to the interlaced sequence, resulting in one encoded bit- or transport-stream BS/TS. After transmission over the Channel this stream is decoded, resulting in the distorted interlaced sequence. For output this sequence is deinterlaced to the stereo pair Video 1 and Video 2.
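The line-by-line interlacing and the deinterlacing at the output can be sketched as follows. Frames are represented as plain lists of rows, and the function names are illustrative; this is a minimal sketch of the packing described above, not the pre-processing tool used in the project.

```python
def interleave_views(video1_frame, video2_frame):
    """Pack two views into one frame of double height.

    Even rows (top field) come from Video 1, odd rows (bottom field)
    from Video 2, as described in the text.
    """
    if len(video1_frame) != len(video2_frame):
        raise ValueError("both views must have the same number of rows")
    frame = []
    for row1, row2 in zip(video1_frame, video2_frame):
        frame.append(row1)
        frame.append(row2)
    return frame


def deinterleave_views(frame):
    """Split an interlaced frame back into (Video 1, Video 2)."""
    return frame[0::2], frame[1::2]


left = [[1, 1], [2, 2]]
right = [[9, 9], [8, 8]]
packed = interleave_views(left, right)
print(packed)
```

Deinterlacing the packed frame recovers both views at their original height, so the only loss in the chain comes from the codec itself.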

2.1. Specification

According to the H.264/AVC standard [2], the “H.264 Stereo SEI Message” is specified as follows:

Supplemental Enhancement Information

SEI (supplemental enhancement information) messages assist in processes related to decoding, display or other purposes. However, SEI messages are not required for constructing the luma or chroma samples by the decoding process. Conforming decoders are not required to process this information for output order conformance to H.264/SVC.

SEI payload syntax

| sei_payload( payloadType, payloadSize ) { | C | Descriptor |
|---|---|---|
| if( payloadType == 0 ) | | |
| buffering_period( payloadSize ) | 5 | |
| else if( payloadType == 1 ) | | |
| pic_timing( payloadSize ) | 5 | |
| … | | |
| else if( payloadType == 21 ) | | |
| stereo_video_info( payloadSize ) | 5 | |
| … | | |
| } | | |

Stereo video information SEI message syntax

| stereo_video_info( payloadSize ) { | C | Descriptor |
|---|---|---|
| field_views_flag | 5 | u(1) |
| if( field_views_flag ) | | |
| top_field_is_left_view_flag | 5 | u(1) |
| else { | | |
| current_frame_is_left_view_flag | 5 | u(1) |
| next_frame_is_second_view_flag | 5 | u(1) |
| } | | |
| left_view_self_contained_flag | 5 | u(1) |
| right_view_self_contained_flag | 5 | u(1) |
| } | | |

[2] ITU-T Recommendation H.264, “Advanced video coding for generic audiovisual services”, November 2007.

Stereo video information SEI message semantics

This SEI message provides the decoder with an indication that the entire coded video sequence consists of pairs of pictures forming stereo-view content.

The stereo video information SEI message shall not be present in any access unit of a coded video sequence unless a stereo video information SEI message is present in the first access unit of the coded video sequence.

field_views_flag equal to 1 indicates that all pictures in the current coded video sequence are fields and all fields of a particular parity are considered a left view and all fields of the opposite parity are considered a right view for stereo-view content. field_views_flag equal to 0 indicates that all pictures in the current coded video sequence are frames and alternating frames in output order represent a view of a stereo view. The value of field_views_flag shall be the same in all stereo video information SEI messages within a coded video sequence.

When the stereo video information SEI message is present and field_views_flag is equal to 1, the left view and right view of a stereo video pair shall be coded as a complementary field pair, the display time of the first field of the field pair in output order should be delayed to coincide with the display time of the second field of the field pair in output order, and the spatial locations of the samples in each individual field should be interpreted for display purposes as representing complete pictures as shown in Figure 4 (top) rather than as spatially-distinct fields within a frame as shown in Figure 4 (bottom).


Figure 4: Nominal vertical and horizontal sampling locations of 4:2:0 samples in a frame (top) and in top and bottom fields (bottom)

top_field_is_left_view_flag equal to 1 indicates that the top fields in the coded video sequence represent a left view and the bottom fields in the coded video sequence represent a right view. top_field_is_left_view_flag equal to 0 indicates that the bottom fields in the coded video sequence represent a left view and the top fields in the coded video sequence represent a right view. When present, the value of top_field_is_left_view_flag shall be the same in all stereo video information SEI messages within a coded video sequence.

current_frame_is_left_view_flag equal to 1 indicates that the current picture is the left view of a stereo-view pair. current_frame_is_left_view_flag equal to 0 indicates that the current picture is the right view of a stereo-view pair.

next_frame_is_second_view_flag equal to 1 indicates that the current picture and the next picture in output order form a stereo-view pair, and the display time of the current picture should be delayed to coincide with the display time of the next picture in output order. next_frame_is_second_view_flag equal to 0 indicates that the current picture and the previous picture in output order form a stereo-view pair, and the display time of the current picture should not be delayed for purposes of stereo-view pairing.

left_view_self_contained_flag equal to 1 indicates that no inter prediction operations within the decoding process for the left-view pictures of the coded video sequence refer to reference pictures that are right-view pictures. left_view_self_contained_flag equal to 0 indicates that some inter prediction operations within the decoding process for the left-view pictures of the coded video sequence may or may not refer to reference pictures that are right-view pictures. Within a coded video sequence, the value of left_view_self_contained_flag in all stereo video information SEI messages shall be the same.

right_view_self_contained_flag equal to 1 indicates that no inter prediction operations within the decoding process for the right-view pictures of the coded video sequence refer to reference pictures that are left-view pictures. right_view_self_contained_flag equal to 0 indicates that some inter prediction operations within the decoding process for the right-view pictures of the coded video sequence may or may not refer to reference pictures that are left-view pictures. Within a coded video sequence, the value of right_view_self_contained_flag in all stereo video information SEI messages shall be the same.
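The syntax and semantics above translate into a very small parser. The sketch below reads the flags from the bytes of a stereo_video_info payload. It assumes that the payload bytes have already been extracted from the SEI NAL unit (emulation prevention removed) and that u(1) fields are read most significant bit first; the function name and the dictionary output are illustrative and not part of any reference decoder.

```python
def parse_stereo_video_info(payload):
    """Parse the flags of a stereo_video_info SEI payload (payloadType 21).

    `payload` holds the raw payload bytes; bits are consumed MSB first.
    """
    bit_pos = 0

    def u1():
        # Read one unsigned bit, u(1) in the syntax table.
        nonlocal bit_pos
        byte = payload[bit_pos // 8]
        bit = (byte >> (7 - bit_pos % 8)) & 1
        bit_pos += 1
        return bit

    info = {"field_views_flag": u1()}
    if info["field_views_flag"]:
        info["top_field_is_left_view_flag"] = u1()
    else:
        info["current_frame_is_left_view_flag"] = u1()
        info["next_frame_is_second_view_flag"] = u1()
    info["left_view_self_contained_flag"] = u1()
    info["right_view_self_contained_flag"] = u1()
    return info


# Field-based stereo, top field is the left view, only the left view self-contained.
print(parse_stereo_video_info(bytes([0b11100000])))
```

With field_views_flag equal to 1, as in the simulations of this section, only four bits of the payload carry information; the rest of the byte is padding in this example.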


2.2. Simulation

Test data

The simulations for H.264 Stereo SEI Message have been carried out with the following test data sets:

| Parameter | Value |
|---|---|
| Producer | KUK |
| Length [frames] | 235 (Car), 251 (Hands), 140 (Horse), 189 (Snail) |
| Data Format | VLVR Interlaced |

Setup

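The simulations for H.264 Stereo SEI Message have been carried out with the following coding settings:

| Parameter | Value |
|---|---|
| Standard | H.264/AVC |
| Field Coding | Enabled |
| Quantization Parameter | 24, 30, 36, 42 |
| GOP Size | 16 (hierarchical B pictures), 1 (no B pictures) |
| Intra Period | 16 |
| Search Range | 32 |
| Symbol Mode | CABAC |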

Besides these settings typical configurations for H.264/AVC have been used.

Results

For H.264 Stereo SEI Message simulations the left and right view sequences are interlaced into one sequence of double height and encoded, transmitted and decoded as one stream consisting of top and bottom field. We achieved the following results:


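Car

| QP | GOP 1 VLVR (all) Rate [kbps] | GOP 1 VLVR (all) PSNR [dB] | GOP 1 VLVR (intra bottom) Rate [kbps] | GOP 1 VLVR (intra bottom) PSNR [dB] | GOP 16 VLVR (all) Rate [kbps] | GOP 16 VLVR (all) PSNR [dB] | GOP 16 VLVR (intra bottom) Rate [kbps] | GOP 16 VLVR (intra bottom) PSNR [dB] |
|---|---|---|---|---|---|---|---|---|
| 24 | 1675.31 | 40.87 | 1801.99 | 40.92 | 648.85 | 38.21 | 775.48 | 38.28 |
| 30 | 565.04 | 36.82 | 631.96 | 36.90 | 252.82 | 34.85 | 321.03 | 34.93 |
| 36 | 213.04 | 33.54 | 243.39 | 33.69 | 106.76 | 32.16 | 135.87 | 32.19 |
| 42 | 91.56 | 30.83 | 105.18 | 30.94 | 45.13 | 29.79 | 56.41 | 29.75 |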

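Hands

| QP | GOP 1 VLVR (all) Rate [kbps] | GOP 1 VLVR (all) PSNR [dB] | GOP 1 VLVR (intra bottom) Rate [kbps] | GOP 1 VLVR (intra bottom) PSNR [dB] | GOP 16 VLVR (all) Rate [kbps] | GOP 16 VLVR (all) PSNR [dB] | GOP 16 VLVR (intra bottom) Rate [kbps] | GOP 16 VLVR (intra bottom) PSNR [dB] |
|---|---|---|---|---|---|---|---|---|
| 24 | 5222.53 | 41.93 | 5297.24 | 41.98 | 2812.86 | 37.64 | 2864.36 | 37.66 |
| 30 | 2819.99 | 37.15 | 2872.02 | 37.22 | 1161.12 | 32.29 | 1197.42 | 32.31 |
| 36 | 1271.41 | 32.57 | 1305.07 | 32.65 | 401.86 | 28.35 | 425.28 | 28.38 |
| 42 | 440.65 | 28.32 | 461.94 | 28.41 | 132.58 | 25.56 | 147.53 | 25.59 |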

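Horse

| QP | GOP 1 VLVR (all) Rate [kbps] | GOP 1 VLVR (all) PSNR [dB] | GOP 1 VLVR (intra bottom) Rate [kbps] | GOP 1 VLVR (intra bottom) PSNR [dB] | GOP 16 VLVR (all) Rate [kbps] | GOP 16 VLVR (all) PSNR [dB] | GOP 16 VLVR (intra bottom) Rate [kbps] | GOP 16 VLVR (intra bottom) PSNR [dB] |
|---|---|---|---|---|---|---|---|---|
| 24 | 2931.68 | 37.76 | 3374.14 | 37.85 | 1046.62 | 36.23 | 1473.03 | 36.55 |
| 30 | 865.59 | 32.49 | 1109.81 | 32.68 | 469.69 | 32.30 | 729.65 | 32.57 |
| 36 | 276.03 | 28.54 | 372.31 | 28.74 | 193.24 | 28.56 | 296.06 | 28.61 |
| 42 | 103.82 | 26.03 | 135.12 | 26.15 | 69.66 | 25.92 | 101.20 | 25.90 |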

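Snail

| QP | GOP 1 VLVR (all) Rate [kbps] | GOP 1 VLVR (all) PSNR [dB] | GOP 1 VLVR (intra bottom) Rate [kbps] | GOP 1 VLVR (intra bottom) PSNR [dB] | GOP 16 VLVR (all) Rate [kbps] | GOP 16 VLVR (all) PSNR [dB] | GOP 16 VLVR (intra bottom) Rate [kbps] | GOP 16 VLVR (intra bottom) PSNR [dB] |
|---|---|---|---|---|---|---|---|---|
| 24 | 375.32 | 44.90 | 460.23 | 45.08 | 199.35 | 44.58 | 283.55 | 44.85 |
| 30 | 156.38 | 40.63 | 205.65 | 40.98 | 107.41 | 40.82 | 157.13 | 40.98 |
| 36 | 77.21 | 36.90 | 103.33 | 37.28 | 57.57 | 37.10 | 83.71 | 37.14 |
| 42 | 43.96 | 33.03 | 57.60 | 33.26 | 32.48 | 33.18 | 45.56 | 33.10 |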

Table 2: RD results for H.264 Stereo SEI Message coding simulations with stereo video format data


Figure 5: RD-comparison for H.264 Stereo SEI Message coding simulations: total bitrate for both views vs. average PSNR relative to the original sequences

These simulation results clearly indicate that the overall RD-performance for temporal prediction using hierarchical B pictures (GOP 16) is better than for H.264 Stereo SEI Message coding without hierarchical B pictures (GOP 1). The gains that can be achieved for the individual sequences differ largely, depending on factors like scene content complexity and temporal variation. For the same quality, between almost zero and up to 60% of the bitrate can be saved with hierarchical B pictures (GOP 16). In addition, the RD-performance of the left and right view is the same, since they are compressed jointly. Moreover, the effect of intra coded pictures in the bottom field has been tested, where inter prediction for the bottom field is disabled if the top field is intra coded; this results in a relevant decline of the RD-performance.
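The cost of intra coding the bottom field can be quantified roughly from Table 2 by comparing the two rate columns at the same QP. The sketch below does this for GOP 16 at QP 24. Note that the PSNR values of the two configurations differ slightly, so this is only an approximate overhead at similar quality, not a comparison at exactly equal quality; the names are illustrative.

```python
# Table 2, GOP 16, QP 24: total rate [kbps] for "all" and "intra bottom".
RATES_QP24_GOP16 = {
    "Car": (648.85, 775.48),
    "Hands": (2812.86, 2864.36),
    "Horse": (1046.62, 1473.03),
    "Snail": (199.35, 283.55),
}


def rate_overhead(rate_all, rate_intra_bottom):
    """Relative bitrate increase caused by intra coding the bottom field."""
    return rate_intra_bottom / rate_all - 1.0


overheads = {name: rate_overhead(*rates) for name, rates in RATES_QP24_GOP16.items()}
for name, value in overheads.items():
    print(f"{name}: +{100 * value:.1f}%")
```

The overhead is small for Hands, whose bitrate is dominated by temporal changes, and much larger for sequences with little motion, where the additional intra field weighs more heavily.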

Informal subjective expert viewing has been carried out for the H.264 Stereo SEI Message simulation results on a stereoscopic display. This led to the conclusion that the objective RD results are confirmed: for the same bitrate, a higher quality is achieved by using hierarchical B pictures (GOP 16) for temporal prediction or, conversely, a lower bitrate is necessary to achieve the same subjective quality. Without hierarchical B pictures (GOP 1) the sequences are temporally more unsteady. For medium bitrates a mostly acceptable subjective quality was observed.


3. Multiview Video Coding (MVC)

Figure 6: Schematic block diagram for H.264 Multiview Video Coding with stereo video format data

As depicted in the overview diagram, H.264 MVC uses the stereo video format, consisting of the two input video sequences Video 1 and Video 2 for the left and right view of the stereo pair. The codec used for H.264 Multiview Video Coding is H.264/MVC, which is applied to both sequences simultaneously for inter-view predictive coding, resulting in two dependent encoded bit-streams BS that may contain the camera parameters as auxiliary information. For transmission these two bit-streams are interleaved frame-by-frame in the multiplexer MUX, resulting in one MVC transport-stream TS. After transmission over the Channel this stream is decoded (and thereby demultiplexed), resulting in the distorted sequences of the stereo pair Video 1 and Video 2.


3.1. Specification

According to the H.264/MVC standard [3], “Multiview Video Coding” is specified as an extension to the family of H.264 standards. For MVC, the single-view concepts of H.264/AVC are extended, so that a current picture in the coding process can have temporal as well as inter-view reference pictures for motion-compensated prediction. MVC also includes a number of new techniques for improved coding efficiency, reduced decoding complexity, and new functionalities for multiview operations. MVC takes advantage of some of the interfaces and transport mechanisms introduced for the scalable video coding (SVC) extension of H.264/AVC. New requirements for 3D video related to interface, transport of the MVC bitstreams, and MVC decoder resource management led to new features that have been adopted for MVC, including marking of reference pictures, support for efficient view switching, structuring of the bitstream, signaling of view scalability supplemental enhancement information (SEI) and parallel decoding SEI.

Figure 7: MVC coding scheme with stereo video format data: inter-view prediction (red arrows) combined with hierarchical B pictures for temporal prediction (black arrows)

The figure above shows how H.264/MVC is applied to stereo video format data. Hierarchical B pictures for temporal prediction are used in combination with additional inter-view reference pictures for the second of the two stereo views.
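The prediction structure can be made concrete with a short sketch that lists, for each B picture of a GOP, its temporal reference pictures and, for the second view, the additional inter-view reference. It assumes a dyadic hierarchy (GOP size a power of two) in which each B picture references the nearest pictures of a lower temporal level on either side; this is the usual construction but an assumption here, and the key pictures at positions 0 and `gop_size` are omitted. All names are illustrative.

```python
def hierarchical_b_references(gop_size):
    """Temporal references of the B pictures inside one GOP.

    Returns {position: {"level": n, "temporal": (past, future)}} for
    positions 1 .. gop_size - 1. Assumes gop_size is a power of two.
    """
    refs = {}
    step = gop_size
    level = 1
    while step > 1:
        half = step // 2
        for pos in range(half, gop_size, step):
            refs[pos] = {"level": level, "temporal": (pos - half, pos + half)}
        step = half
        level += 1
    return refs


def stereo_references(gop_size):
    """Reference structure for both views of a stereo pair.

    The second view reuses the temporal structure and additionally
    references the picture of the first view at the same time instant.
    """
    first_view = hierarchical_b_references(gop_size)
    second_view = {pos: {**entry, "inter_view": pos}
                   for pos, entry in first_view.items()}
    return first_view, second_view


first, second = stereo_references(16)
print(second[8])
print(second[5])
```

With a GOP size of 16 this yields four temporal levels, which is the source of the additional memory and delay mentioned for hierarchical B pictures.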

3.2. Simulation

Test data

The simulations for H.264 MVC have been carried out with the following test data sets:

| Parameter | Value |
|---|---|
| Producer | KUK |
| Length [frames] | 235 (Car), 251 (Hands), 140 (Horse), 189 (Snail) |
| Data Format | VL + VR |

[3] ISO/IEC JTC1/SC29/WG11, “Text of ISO/IEC 14496-10:200X/FDAM 1 Multiview Video Coding”, Doc. N9978, Hannover, Germany, July 2008.

Setup

The simulations for H.264 MVC have been carried out with the following coding settings:

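| Parameter | Value |
|---|---|
| Coder Implementation | JMVM 7.0 |
| Standard | H.264/MVC |
| Inter-view Prediction | enabled (IP prediction structure) |
| Quantization Parameter | 24, 30, 36, 42 |
| GOP Size | 16 (hierarchical B pictures), 2 (only one consecutive B picture) |
| Intra Period | 16 |
| Search Range | 96 |
| Symbol Mode | CABAC |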

Besides these settings typical configurations for H.264/MVC have been used.

Results

For H.264 MVC simulations the left and right view are encoded, transmitted and decoded dependently. We achieved the following results:

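Car

| QP | GOP 1 VL Rate [kbps] | GOP 1 VL PSNR [dB] | GOP 1 VR Rate [kbps] | GOP 1 VR PSNR [dB] | GOP 16 VL Rate [kbps] | GOP 16 VL PSNR [dB] | GOP 16 VR Rate [kbps] | GOP 16 VR PSNR [dB] |
|---|---|---|---|---|---|---|---|---|
| 24 | 805.69 | 40.64 | 624.56 | 40.32 | 390.02 | 38.37 | 263.80 | 37.84 |
| 30 | 309.44 | 36.88 | 223.35 | 36.51 | 171.92 | 35.46 | 103.39 | 34.85 |
| 36 | 113.41 | 33.59 | 83.42 | 33.16 | 72.84 | 32.65 | 44.02 | 32.07 |
| 42 | 45.15 | 30.83 | 35.99 | 30.40 | 29.17 | 30.06 | 18.41 | 29.48 |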


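Hands

| QP | GOP 1 VL Rate [kbps] | GOP 1 VL PSNR [dB] | GOP 1 VR Rate [kbps] | GOP 1 VR PSNR [dB] | GOP 16 VL Rate [kbps] | GOP 16 VL PSNR [dB] | GOP 16 VR Rate [kbps] | GOP 16 VR PSNR [dB] |
|---|---|---|---|---|---|---|---|---|
| 24 | 2320.38 | 39.75 | 1860.98 | 40.26 | 1446.13 | 36.56 | 1118.55 | 37.21 |
| 30 | 1235.47 | 35.28 | 953.30 | 35.88 | 678.19 | 32.32 | 512.49 | 33.02 |
| 36 | 564.13 | 31.19 | 416.30 | 31.77 | 263.10 | 28.59 | 198.86 | 29.22 |
| 42 | 210.22 | 27.59 | 149.28 | 28.04 | 87.43 | 25.57 | 64.83 | 25.98 |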

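Horse

| QP | GOP 1 VL Rate [kbps] | GOP 1 VL PSNR [dB] | GOP 1 VR Rate [kbps] | GOP 1 VR PSNR [dB] | GOP 16 VL Rate [kbps] | GOP 16 VL PSNR [dB] | GOP 16 VR Rate [kbps] | GOP 16 VR PSNR [dB] |
|---|---|---|---|---|---|---|---|---|
| 24 | 1283.52 | 37.13 | 1014.70 | 37.14 | 661.67 | 36.16 | 394.98 | 35.95 |
| 30 | 482.43 | 32.47 | 309.21 | 32.38 | 329.65 | 32.38 | 167.73 | 32.18 |
| 36 | 176.41 | 28.69 | 89.98 | 28.33 | 133.39 | 28.55 | 54.54 | 28.05 |
| 42 | 62.43 | 25.98 | 32.18 | 25.71 | 48.04 | 25.78 | 19.51 | 25.36 |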

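Snail

| QP | GOP 1 VL Rate [kbps] | GOP 1 VL PSNR [dB] | GOP 1 VR Rate [kbps] | GOP 1 VR PSNR [dB] | GOP 16 VL Rate [kbps] | GOP 16 VL PSNR [dB] | GOP 16 VR Rate [kbps] | GOP 16 VR PSNR [dB] |
|---|---|---|---|---|---|---|---|---|
| 24 | 206.88 | 44.92 | 151.19 | 45.01 | 136.44 | 44.49 | 81.83 | 44.46 |
| 30 | 94.22 | 40.75 | 59.22 | 40.79 | 77.01 | 40.70 | 41.25 | 40.62 |
| 36 | 46.66 | 36.97 | 26.98 | 36.77 | 41.02 | 36.77 | 19.90 | 36.47 |
| 42 | 27.05 | 33.22 | 15.49 | 33.11 | 22.85 | 32.93 | 10.32 | 32.74 |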

Table 3: RD results for H.264 MVC coding simulations with stereo video format data


Figure 8: RD-comparison for H.264 MVC simulations: total bitrate for both views vs. average PSNR relative to the original sequences

These simulation results clearly indicate that the overall RD-performance for temporal prediction using hierarchical B pictures (GOP 16) is better than for H.264 MVC coding with an IBPBP… prediction structure (GOP 2). The gains that can be achieved for the individual sequences differ largely, depending on factors like scene content complexity and temporal variation. For the same quality, between almost zero and up to 50% of the bitrate can be saved with hierarchical B pictures (GOP 16). Additional simulations with equal coding conditions, except that inter-view prediction is disabled, identify the gain that is achieved by MVC compared to simulcast coding. The results show that inter-view prediction leads to a bitrate reduction for the right view. The gains that can be achieved for the individual sequences differ largely, so that for the same quality between almost zero and up to 25% of the total bitrate can be saved with MVC.
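As an illustration of how the points of the RD comparison can be derived from Table 3, the following minimal sketch sums the bitrates of both views and averages their PSNR values for the Car sequence with GOP 16. The names CAR_GOP16 and rd_points are hypothetical, the units (kbps, dB) are taken from the figure axes, and the use of a plain arithmetic mean for the average PSNR is an assumption.

```python
# Car, GOP 16 rows of Table 3: QP -> (VL kbps, VL dB, VR kbps, VR dB).
CAR_GOP16 = {
    24: (390.02, 38.37, 263.80, 37.84),
    30: (171.92, 35.46, 103.39, 34.85),
    36: (72.84, 32.65, 44.02, 32.07),
    42: (29.17, 30.06, 18.41, 29.48),
}


def rd_points(rows):
    """Return (total bitrate, mean PSNR) per QP, sorted by increasing bitrate."""
    points = []
    for rate_l, psnr_l, rate_r, psnr_r in rows.values():
        total_rate = rate_l + rate_r          # both views share one channel
        mean_psnr = (psnr_l + psnr_r) / 2.0   # assumption: arithmetic mean
        points.append((total_rate, mean_psnr))
    return sorted(points)


if __name__ == "__main__":
    for rate, quality in rd_points(CAR_GOP16):
        print(f"{rate:8.2f} kbps  {quality:6.2f} dB")
```

Note that a comparison at equal QP is not a comparison at equal quality; the savings quoted in the text refer to equal PSNR, which requires interpolating between such points.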

Informal subjective expert viewing has been carried out for the MVC simulation results on a stereoscopic display. This led to the conclusion that the objective RD results are confirmed: for the same bitrate a higher quality is achieved by using hierarchical B pictures (GOP 16) for temporal prediction, or conversely a lower bitrate is necessary to achieve the same subjective quality. With an IBPBP… prediction structure (GOP 2) the sequences are temporally more unsteady. For medium bitrates a tolerable to mostly acceptable subjective quality was observed. An additional comparison between MVC and simulcast results (using JMVM with and without inter-view prediction) confirms the objective RD results as well, leading to the conclusion that MVC requires a lower bitrate than simulcast coding to achieve the same objective and subjective quality.


4. MPEG-C Part 3

Figure 9: Schematic block diagram for MPEG-C Part 3 coding with video plus depth format data

As depicted in the overview diagram, MPEG-C Part 3 uses the video plus depth format, consisting of the input video sequence Video and the associated depth information Depth for one of the two views of a stereo pair. The codec used for MPEG-C Part 3 is H.264/AVC, which is applied to each of the two input sequences independently, resulting in two encoded bit-streams BS. For transmission these two bit-streams are interleaved frame-by-frame in the multiplexer MUX, resulting in one MVC transport-stream TS that may contain additional depth map properties as auxiliary information. After transmission over the Channel the demultiplexer DEMUX separates this stream into the two individually coded streams. These two streams are decoded independently, resulting in the distorted Video sequence and the distorted Depth sequence for one of the two views of a stereo pair.
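The frame-by-frame interleaving described above can be sketched as follows. This is only a toy model under stated assumptions: coded frames are represented as opaque byte strings, the tags "V" and "D" and the functions mux and demux are hypothetical, and no real transport-stream packetization is modelled.

```python
def mux(video_frames, depth_frames):
    """Interleave two equally long lists of coded frames: V0, D0, V1, D1, ..."""
    if len(video_frames) != len(depth_frames):
        raise ValueError("video and depth must have the same number of frames")
    stream = []
    for video, depth in zip(video_frames, depth_frames):
        stream.append(("V", video))  # coded video frame
        stream.append(("D", depth))  # coded depth frame of the same time instant
    return stream


def demux(stream):
    """Split the interleaved stream back into the two coded streams."""
    video = [payload for tag, payload in stream if tag == "V"]
    depth = [payload for tag, payload in stream if tag == "D"]
    return video, depth


if __name__ == "__main__":
    ts = mux([b"v0", b"v1"], [b"d0", b"d1"])
    print(ts)
    print(demux(ts))
```

Because both streams are coded independently, the demultiplexer only needs the tag of each unit to route it to the corresponding decoder.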

4.1. Specification

ISO/IEC 23002-3:2007 [4] defines auxiliary video streams as data coded as video sequences and supplementing a primary video sequence. Depth maps and parallax maps are the first specified types of auxiliary video streams, relating to stereoscopic-view video content. In this context, ISO/IEC 23002-3:2007 specifies syntax and semantics for conveying information describing the interpretation of auxiliary video streams.

Syntax for such information is specified in ISO/IEC 23002-3:2007 as a stream of data referred to as a supplemental information (SI) message stream. Provisions for extensibility have been included, so that additional types of data can be defined in future extensions of the current SI message stream syntax by ISO/IEC.

An SI message stream can contain several concatenated SI messages, hence conveying various types of information. The auxiliary video SI (AVSI) is the only currently-defined type of SI (other than reserved SI message types that are reserved for future specification by ISO/IEC and are to be ignored by decoders if present). An AVSI message characterizes the interpretation of an auxiliary video sequence that accompanies a primary video sequence. For instance, an AVSI can indicate that the auxiliary video represents depth map information, and can provide parameters for the proper interpretation of the auxiliary video as such depth information. The means for identifying the primary video stream and the auxiliary video stream to which these messages pertain is a system-level issue that is outside the scope of ISO/IEC 23002-3:2007.

Although the auxiliary video SI is the only type of SI that is currently specified in ISO/IEC 23002-3:2007, the SI message format has been defined in a generic fashion so that it can potentially be used for purposes other than aiding in the interpretation of auxiliary video sequences. Any kind of data could potentially be carried in the SI message format.

According to the standard specification, “MPEG-C Part 3” is specified as follows:

Auxiliary Video Stream

An auxiliary video stream is a coded representation of an auxiliary video and should be accompanied by a Supplemental Information (SI) RBSP containing at least one Auxiliary Video Supplemental Information (AVSI) message. If more than one AVSI message is present in the SI RBSP, then the first one shall be taken into account and the other ones shall be discarded. The sample values m of an auxiliary video picture shall be interpreted according to the payload type payloadType of the AVSI message. The following table lists the valid AVSI payload types, the corresponding type of auxiliary video and the number of channels.

[4] ISO/IEC JTC1/SC29/WG11, “ISO/IEC CD 23002-3: Representation of auxiliary video and supplemental information”, Doc. N8259, Klagenfurt, Austria, July 2007.

payloadType    Type of auxiliary video    Number of channels
0x10           Depth map                  1
0x11           Parallax map               1

The Supplemental Information (SI) RBSP is not part of the Auxiliary Video stream, and shall be conveyed by means that are beyond the scope of this International Standard.

The primary video and the auxiliary video might be spatially and/or temporally misaligned due to:

- interlaced/progressive mismatch,

- different spatial resolutions,

- different temporal resolutions.

Although the re-sampling process is voluntarily left open, the minimal constraints specified now should be met to ensure a correct matching of the primary and auxiliary samples.

Field/frame alignment is provided through the syntax elements aux_is_one_field, aux_is_bottom_field and aux_is_interlaced, which are part of AVSI messages.

Spatial alignment is provided through the two syntax elements position_offset_h and position_offset_v, which are part of AVSI messages.

The temporal synchronization between the primary and the auxiliary videos shall be conveyed by means beyond the scope of this Specification.

Supplemental Information (SI)

Syntax

si_rbsp( NumBytesInSI ) {                              Descriptor
    NumBytesInRBSP = 0
    while( NumBytesInRBSP < NumBytesInSI )
        si_message( )
}

Supplemental information message syntax

si_message( ) {                                        Descriptor
    payloadType = 0
    while( next_bits( 8 ) == 0xFF ) {
        ff_byte /* equal to 0xFF */                    f(8)
        NumBytesInRBSP++
        payloadType += 255
    }
    last_payload_type_byte                             u(8)
    NumBytesInRBSP++
    payloadType += last_payload_type_byte
    payloadSize = 0
    while( next_bits( 8 ) == 0xFF ) {
        ff_byte /* equal to 0xFF */                    f(8)
        NumBytesInRBSP++
        payloadSize += 255
    }
    last_payload_size_byte                             u(8)
    NumBytesInRBSP++
    payloadSize += last_payload_size_byte
    si_payload( payloadType, payloadSize )
    NumBytesInRBSP += payloadSize
}
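To illustrate the payloadType and payloadSize coding of the si_message( ) syntax, the following minimal sketch splits an SI RBSP, given as a byte string, into its messages. The function names read_ff_coded and parse_si_rbsp are hypothetical; the sketch assumes a byte-aligned RBSP and does not interpret the payload itself.

```python
def read_ff_coded(data, pos):
    """Read a value coded as a run of 0xFF bytes followed by one final byte.

    Each 0xFF byte adds 255; the final byte (0x00..0xFE) adds its own value.
    Returns the decoded value and the position after the final byte.
    """
    value = 0
    while data[pos] == 0xFF:
        value += 255
        pos += 1
    value += data[pos]
    return value, pos + 1


def parse_si_rbsp(data):
    """Split an SI RBSP into a list of (payloadType, payload bytes) tuples."""
    messages = []
    pos = 0
    while pos < len(data):
        payload_type, pos = read_ff_coded(data, pos)
        payload_size, pos = read_ff_coded(data, pos)
        if pos + payload_size > len(data):
            raise ValueError("payloadSize exceeds the remaining RBSP bytes")
        messages.append((payload_type, bytes(data[pos:pos + payload_size])))
        pos += payload_size
    return messages


if __name__ == "__main__":
    rbsp = bytes([0x10, 0x05, 0x00, 0x00, 0x00, 0x80, 0x20])
    print(parse_si_rbsp(rbsp))
```

A truncated header raises an IndexError in this sketch; a robust parser would report such a stream as malformed instead.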

Supplemental information payload syntax

si_payload( payloadType, payloadSize ) {               Descriptor
    is_avsi = FALSE
    if( payloadType == 0x10 || payloadType == 0x11 ) {
        is_avsi = TRUE
        generic_params()
    }
    if( payloadType == 0x10 )
        depth_params()
    else if( payloadType == 0x11 )
        parallax_params()
    else
        reserved_si_message( payloadSize )
}

Depth map parameters syntax

depth_params( ) {                                      Descriptor
    nkfar                                              u(8)
    nknear                                             u(8)
}

Parallax map parameters syntax

parallax_params( ) {                                   Descriptor
    parallax_zero                                      u(16)
    parallax_scale                                     u(16)
    dref                                               u(16)
    wref                                               u(16)
}

Generic parameters syntax

generic_params( ) {                                    Descriptor
    aux_is_one_field                                   f(1)
    if( aux_is_one_field ) {
        aux_is_bottom_field                            f(1)
    }
    else {
        aux_is_interlaced                              f(1)
    }
    reserved_generic_bits                              f(6)
    position_offset_h                                  u(8)
    position_offset_v                                  u(8)
}
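For a depth map AVSI message (payloadType 0x10), the syntax above implies a payload of generic_params( ) followed by depth_params( ), i.e. five bytes in total. The following sketch unpacks such a payload. The function name parse_depth_avsi is hypothetical, and the sketch assumes that bits are read most significant bit first, which is the usual convention for this kind of syntax but is not stated in the text above.

```python
def parse_depth_avsi(payload):
    """Unpack generic_params() followed by depth_params() from 5 payload bytes."""
    if len(payload) < 5:
        raise ValueError("depth AVSI payload needs at least 5 bytes")
    flags = payload[0]
    one_field = bool(flags & 0x80)    # aux_is_one_field, first bit (assumed MSB)
    second_flag = bool(flags & 0x40)  # meaning depends on aux_is_one_field
    return {
        "aux_is_one_field": one_field,
        # Only one of the next two elements is present in the bitstream.
        "aux_is_bottom_field": second_flag if one_field else None,
        "aux_is_interlaced": None if one_field else second_flag,
        # The remaining 6 bits of the first byte are reserved_generic_bits.
        "position_offset_h": payload[1],
        "position_offset_v": payload[2],
        "nkfar": payload[3],
        "nknear": payload[4],
    }


if __name__ == "__main__":
    print(parse_depth_avsi(bytes([0x40, 2, 3, 16, 4])))
```

A parallax map payload (payloadType 0x11) would share the first three bytes and continue with four 16-bit values instead of nkfar and nknear.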

Reserved SI message syntax

reserved_si_message( payloadSize ) {                   Descriptor
    for( i = 0; i < payloadSize; i++ )
        reserved_si_byte                               b(8)
}

4.2. Simulation

Test data

The simulations for MPEG-C Part 3 have been carried out with the following test data sets:

Producer KUK


Length [frames] 235 251 140 189



Data Format VL + DL

Setup

The simulations for MPEG-C Part 3 have been carried out with the following coding settings:


Standard        H.264/AVC
QP              24, 30, 36, 42
GOP Size        16; 1 (no B pictures)
Intra Period    16
Search Range    32
Symbol Mode     CABAC

Besides these settings, typical configurations for H.264/AVC have been used.

Results

For MPEG-C Part 3 simulations the left view video and depth are encoded, transmitted and decoded independently. We achieved the following results:

Car   GOP 1                                   GOP 16
      VL                  DL                  VL                  DL
QP    kbps      dB        kbps      dB        kbps      dB        kbps      dB
24    951,66    41,03     353,40    45,17     400,83    38,48     130,29    42,47
30    335,50    36,99     118,83    42,48     164,19    35,12     41,85     40,00
36    124,16    33,74     39,35     39,64     68,06     32,37     15,92     37,96
42    52,32     30,98     14,38     36,56     27,76     29,93     7,99      36,28

Hands GOP 1                                   GOP 16
      VL                  DL                  VL                  DL
QP    kbps      dB        kbps      dB        kbps      dB        kbps      dB
24    2873,82   41,78     929,08    43,85     1565,86   37,40     374,22    39,89
30    1586,37   36,94     370,00    40,23     664,89    32,05     129,64    36,45
36    733,07    32,33     131,29    36,89     235,62    28,14     48,31     33,70
42    255,48    28,12     44,98     33,80     78,28     25,34     17,93     31,18

Horse GOP 1                                   GOP 16
      VL                  DL                  VL                  DL
QP    kbps      dB        kbps      dB        kbps      dB        kbps      dB
24    1677,38   37,87     94,21     48,39     736,45    36,61     41,48     46,79
30    564,73    32,75     35,64     45,49     367,76    32,65     18,24     44,64
36    196,96    28,81     15,98     42,07     151,08    28,73     10,09     42,15
42    70,58     26,13     8,81      38,19     52,28     25,92     7,22      39,82

Snail GOP 1                                   GOP 16
      VL                  DL                  VL                  DL
QP    kbps      dB        kbps      dB        kbps      dB        kbps      dB
24    230,04    45,06     72,00     48,65     142,33    44,87     34,77     47,36
30    103,67    40,93     27,81     45,76     79,58     40,97     17,45     45,08
36    51,94     37,16     13,91     42,08     42,24     37,11     11,13     41,98
42    28,74     33,13     9,09      38,66     23,09     33,07     7,63      38,09

Table 4: RD results for MPEG-C Part 3 coding simulations with video plus depth format data

Figure 10: RD-comparison for MPEG-C Part 3 coding simulations using the same QP for VL and DL: total bitrate for video plus depth vs. average PSNR relative to the original VL sequence and the VR sequence rendered from original VL+DL, respectively

These simulation results clearly indicate that the overall RD-performance for temporal prediction using hierarchical B pictures (GOP 16) is better than for video plus depth coding without hierarchical B pictures (GOP 1). Note that for the evaluation of MPEG-C Part 3 simulations the right view has been rendered from video plus depth of the left view. Accordingly, the PSNR of the right view was calculated between the right view rendered from compressed video plus depth data of the left view and the right view rendered from the original video plus depth data of the left view. The gains that can be achieved for the individual sequences differ largely, depending on factors like scene content and depth complexity as well as temporal variation. For the same quality, between almost zero and up to 50% of the bitrate can be saved with hierarchical B pictures (GOP 16).
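The PSNR measure for the rendered right view can be sketched as follows. This is a minimal illustration, assuming 8-bit luma samples given as flat sequences; the function name psnr is hypothetical, and the view rendering itself is not modelled. The reference would be the right view rendered from original VL+DL and the distorted signal the right view rendered from decoded VL+DL.

```python
import math


def psnr(reference, distorted, peak=255):
    """PSNR in dB between two equally long sequences of sample values."""
    if len(reference) != len(distorted) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_error = sum((r - d) ** 2 for r, d in zip(reference, distorted))
    mse = squared_error / len(reference)
    if mse == 0:
        return math.inf  # identical pictures
    return 10.0 * math.log10(peak * peak / mse)


if __name__ == "__main__":
    rendered_from_original = [100, 120, 140, 160]
    rendered_from_decoded = [101, 119, 141, 159]
    print(round(psnr(rendered_from_original, rendered_from_decoded), 2))
```

Since the reference is itself a rendered view, this value measures only the degradation caused by coding, not the rendering errors relative to a captured right view.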

Since the video and the depth sequences are coded individually with MPEG-C Part 3, different bitrate ratios (and thereby qualities) for left view video and left view depth can be combined. The influence of such combinations on the RD-performance of the rendered right view has been evaluated. The following figure shows the results for all 16 possible combinations of color and depth quality in our experiments using 4 qualities (i.e. QP settings) for each. The curves combine points of constant color quality (C24, C30, …) and points of constant depth quality (D24, D30, …). Apparently curves of constant color quality are steeper. In most cases increasing depth bitrate has a stronger influence on overall quality than increasing color bitrate. At a certain bitrate the QP for depth should be chosen lower (i.e. better quality) than the QP for color to achieve best overall results, e.g. C30, D24. It can be concluded that good depth quality is essential for good overall quality.


The bitrate ratio between color and depth in such an optimum point may vary largely depending on the sequence. If we select D24 and C30, we get the following ratios for GOP 1 from the tables above:

Car: 335,50 : 353,40 ≈ 1 : 1

Hands: 1586,37 : 929,08 ≈ 1.7 : 1

Horse: 564,73 : 94,21 ≈ 6 : 1

Snail: 103,67 : 72,00 ≈ 1.5 : 1

For best overall quality, a substantial portion of the bitrate has to be spent for depth in most cases.
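The ratios can be recomputed from the GOP 1 columns of Table 4 with a few lines of code. The following sketch takes the color bitrate at QP 30 and the depth bitrate at QP 24 for each sequence; the names C30_D24_GOP1 and color_to_depth_ratio are hypothetical, and the unit kbps is taken from the figure axes.

```python
# Sequence -> (color bitrate at QP 30, depth bitrate at QP 24), GOP 1, Table 4.
C30_D24_GOP1 = {
    "Car": (335.50, 353.40),
    "Hands": (1586.37, 929.08),
    "Horse": (564.73, 94.21),
    "Snail": (103.67, 72.00),
}


def color_to_depth_ratio(color_kbps, depth_kbps):
    """Return x such that color : depth equals x : 1."""
    return color_kbps / depth_kbps


if __name__ == "__main__":
    for name, (color, depth) in C30_D24_GOP1.items():
        ratio = color_to_depth_ratio(color, depth)
        depth_share = 100.0 * depth / (color + depth)
        print(f"{name}: {ratio:.2f} : 1, depth share {depth_share:.0f}% of total")
```

Expressing the result as the depth share of the total bitrate makes the spread between sequences easy to compare.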

[Plot "Car, GOP1": Y-PSNR [dB] vs. total bitrate (V+D) [kbps]; curves C24, C30, C36, C42, D24, D30, D36, D42]

[Plot "Hands, GOP1": Y-PSNR [dB] vs. total bitrate; curves C24, C30, C36, C42, D24, D30, D36, D42]

[Plot "Horse, GOP1": Y-PSNR [dB] vs. total bitrate; curves C24, C30, C36, C42, D24, D30, D36, D42]

[Plot "Snail, GOP1": Y-PSNR vs. total bitrate; curves C24, C30, C36, C42, D24, D30, D36, D42]

Figure 11: RD details for the rendered right view VR for MPEG-C Part 3 coding simulations: total bitrate for video plus depth vs. average PSNR relative to the VR sequence rendered from original VL+DL; all combinations of color and depth quality; curves of constant color quality and curves of constant depth quality

Informal subjective expert viewing has been carried out for the MPEG-C Part 3 simulation results on a stereoscopic display, using the results with the same QP for video and depth. This led to the conclusion that the objective RD results are confirmed: for the same bitrate a higher quality is achieved by using hierarchical B pictures (GOP 16) for temporal prediction, or conversely a lower bitrate is necessary to achieve the same subjective quality. Without hierarchical B pictures (GOP 1) the sequences are temporally more unsteady.

5. H.264 Auxiliary Picture Syntax for video plus depth

Figure 12: Schematic block diagram for H.264 Auxiliary Picture Syntax coding with video plus depth format data

As depicted in the overview diagram, H.264 Auxiliary Picture Syntax uses the video plus depth format, consisting of the input video sequence Video and the associated depth information Depth for one of the two views of a stereo pair. The codec used for H.264 Auxiliary Picture Syntax is H.264/AVC, which is applied to both sequences simultaneously but independently (with Video being the primary coded picture and Depth the auxiliary coded picture), resulting in one encoded bit- or transport-stream BS/TS. After transmission over the Channel this stream is decoded, again simultaneously but independently for primary and auxiliary coded pictures, resulting in the distorted Video sequence and the distorted Depth sequence for one of the two views of a stereo pair.

5.1. Specification

In addition to basic coding tools, the H.264/AVC standard enables sending extra supplemental information along with the compressed video data. This often takes a form called "supplemental enhancement information" (SEI) or "video usability information" (VUI) in the standard. SEI data is specified in a backward-compatible way, so that new types of supplemental information can be used even with profiles of the standard that were specified before those types were defined. The first version of the standard includes the definition of a variety of such SEI data, which we will not specifically review herein. Instead we focus only on the new types of backward-compatible supplemental and auxiliary data that are defined in the FRExt amendment. One of these new types of data is auxiliary pictures, which are extra monochrome pictures sent along with the main video stream that can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI).

Definitions

Primary coded picture: The coded representation of a picture to be used by the decoding process for a bitstream conforming to H.264/AVC. The primary coded picture contains all macroblocks of the picture. The only pictures that have a normative effect on the decoding process are primary coded pictures.

Auxiliary coded picture: A picture that supplements the primary coded picture that may be used in combination with other data not specified by H.264/AVC in the display process. An auxiliary coded picture has the same syntactic and semantic restrictions as a monochrome redundant coded picture. An auxiliary coded picture must contain the same number of macroblocks as the primary coded picture. Auxiliary coded pictures have no normative effect on the decoding process.

Decoding

The decoding of auxiliary coded pictures is not required for conformance with H.264/AVC.

The (optional) decoding process for the decoding of auxiliary coded pictures is the same as if the auxiliary coded pictures were primary coded pictures in a separate coded video stream (with some minor constraints).

The syntax of each coded slice of an auxiliary coded picture shall obey the same constraints as a coded slice of a redundant picture, with the following differences of constraints.

– If the primary coded picture is an IDR picture, the auxiliary coded slice syntax shall correspond to that of a slice of an IDR picture;

– Otherwise (the primary coded picture is not an IDR picture), the auxiliary coded slice syntax shall correspond to that of a slice of a non-IDR picture.

– The slices of an auxiliary coded picture (when present) shall contain all macroblocks corresponding to those of the primary coded picture.


5.2. Simulation

Test data

The simulations for H.264 Auxiliary Picture Syntax have been carried out with the following test data sets:

Producer KUK


Length [frames] 235 251 140 189



Data Format VL + DL

Setup

The simulations for H.264 Auxiliary Picture Syntax have been carried out with the following coding settings:


Standard        H.264/AVC
QP              24, 30, 36, 42
GOP Size        16; 1 (no B pictures)
Intra Period    16
Search Range    32
Symbol Mode     CABAC

Besides these settings, typical configurations for H.264/AVC have been used.

Results

For H.264 Auxiliary Picture Syntax simulations the left view video and depth are encoded, transmitted and decoded independently. Based on the same simulations as for MPEG-C Part 3, we achieved the following results:

Figure 13: RD-comparison for H.264 Auxiliary Picture Syntax coding simulations: total bitrate for video plus depth vs. average PSNR relative to the original VL sequence and the VR sequence rendered from original VL+DL, respectively

As already described in section 4.2, these simulation results clearly indicate that the overall RD-performance for temporal prediction using hierarchical B pictures (GOP 16) is better than for video plus depth coding without hierarchical B pictures (GOP 1). Note that, as in section 4.2, the PSNR of the right view was calculated between the right view rendered from compressed video plus depth data of the left view and the right view rendered from the original video plus depth data of the left view. The gains that can be achieved for the individual sequences differ largely, depending on factors like scene content and depth complexity as well as temporal variation. For the same quality, between almost zero and up to 50% of the bitrate can be saved with hierarchical B pictures (GOP 16). In contrast to MPEG-C Part 3, no variations of different video and depth qualities are possible with H.264 Auxiliary Picture Syntax. Therefore the different contributions of the left view and the rendered right view to the average PSNR are analyzed here. For high bitrates the left view achieves a higher quality than the rendered right view, while for low bitrates the opposite can be observed in some cases. The differences between the two views are mostly small, but especially for high bitrates differences of several dB are possible.

Informal subjective expert viewing has been carried out for the H.264 Auxiliary Picture Syntax simulation results on a stereoscopic display. Due to the equivalent simulations the conclusions for the subjective quality evaluation are the same as for MPEG-C Part 3 simulations (see section 4.2 for details).


6. Comparative Analysis


Figure 14: RD-comparison for the three different simulations on stereo video coding approaches: total bitrate for both views vs. average PSNR relative to the original sequences

Objective results in terms of RD-performance have been compared for the three different simulations on stereo video coding approaches (without depth), namely H.264 Simulcast, H.264 Stereo SEI Message and H.264/MVC. This comparison clearly indicates that the overall RD-performance of both Stereo SEI Message and MVC coding is better than for Simulcast coding. The results for Stereo SEI Message and MVC cannot be compared directly, because different coding conditions had to be used for the simulations. However, since the corresponding simulcast coding experiments with JM (in section 2.2) and with JMVM (in section 4.2) achieve very similar results, it seems reasonable to conclude that Stereo SEI performs better than MVC in some cases. Comparison of the RD-performance gains between Simulcast and Stereo SEI (using equal coding conditions) shows that for the same quality up to 35% of the total bitrate can be saved with inter-view prediction. However, in some cases, as for the Hands sequence, the gain is negligible.

Informal subjective expert viewing has been carried out for the simulation results of the three different approaches for coding of stereo video format data on a stereoscopic display. This led to the conclusion that the objective RD results are confirmed, as for the same bitrate a lower quality is achieved by Simulcast coding than by Stereo SEI or MVC coding. Conversely, these two approaches need a lower bitrate to achieve the same subjective quality as simulcast.

From a complexity point of view all three approaches are comparable. They use the same basic operations. Using hierarchical B pictures with a GOP of 16 certainly means a tremendous increase of memory requirements and delay. A standard-conforming implementation of MVC requires that the decoder supports the H.264 High Profile, since MVC extends it. A standard-conforming implementation of the Stereo SEI Message requires that the decoder supports the H.264 interlaced tools.


Figure 15: RD-comparison for simulcast coding with stereo video and video plus depth format data: total bitrate vs. average PSNR for both views

Objective results in terms of RD-performance have been compared for the simulcast simulations with stereo video and video plus depth format data. However, when comparing stereo video and video plus depth coding results, it has to be considered that for stereo video the PSNR of left and right view is calculated between original and decoded pictures, while for video plus depth the PSNR of the right view is calculated between the right view rendered from compressed video plus depth data of the left view and the right view rendered from the original video plus depth data of the left view. Taking these differences into account, the results indicate that the overall RD-performance of video plus depth is better than for stereo video with simulcast coding. For both formats the RD-performance for temporal prediction using hierarchical B pictures (GOP 16) is better than for coding without hierarchical B pictures (GOP 1).

Initial and very informal subjective expert viewing has been carried out to compare simulcast with video plus depth coding. However, these comparisons were done using sequences with equal PSNR and not equal bitrate, since no data were available yet for equal bitrate. Note that the PSNR values are calculated differently, as stated above. In these tests simulcast stereo coding showed better subjective results than video plus depth, but the bitrate was higher as well. Therefore it is not possible to draw conclusions at this point.

A detailed comparison of the different approaches is the main work in WP2 in the coming months. Detailed subjective testing is already planned in collaboration with WP4, which will also answer the questions remaining open here at this point.


7. Conclusions

This deliverable investigated available 3D video representation formats and coding standards for mobile applications. Simulations were carried out with realistic coding settings, e.g. intra period of 16 for random access and error robustness. A typical set of test sequences was used targeting the display to be used in the demonstrator and covering different types of content (high and low scene content complexity, high and low temporal variation). As for any type of video coding, the same amount of raw input data leads to very different RD-performance. The required bitrate for achieving acceptable quality strongly depends on the properties of the sequence content, especially temporal variation and complexity of the scene. The coding gain from inter-view prediction (Stereo SEI & MVC) varies largely.

Significant coding gains can be achieved with hierarchical B pictures for temporal prediction. Not using hierarchical B pictures results not only in considerably higher bitrates for the same objective quality, but also in a worse subjective quality. However, the gain from using hierarchical B pictures differs largely for individual sequences, depending on factors like scene content complexity and temporal variation. In addition, hierarchical B pictures mean increased complexity and memory requirements. It remains to be studied how far this can be implemented on a mobile terminal.

Savings from GOP 16 vs. GOP 1 for 3D video coding:

H.264 Simulcast: up to 50% bitrate saving

H.264 Stereo SEI: up to 60% bitrate saving

MVC: up to 50% bitrate saving

MPEG-C Part 3: up to 50% bitrate saving

Inter-view prediction leads to a significant reduction of bitrate for some sequences. However, in some cases the gain is negligible. In our experiments we achieved up to 35% bitrate savings from inter-view prediction compared to stereo simulcast. Inter-view prediction, whether performed as H.264 SEI or MVC, does not add substantial complexity. It uses the same basic operations. However, a standard-conforming implementation of MVC requires that the decoder supports the H.264 High Profile, since MVC extends it. A standard-conforming implementation of the Stereo SEI Message requires that the decoder supports the H.264 interlaced tools.

A representation as video plus depth is an interesting alternative for 3D video. It allows the stereo rendering to be adjusted at the decoder and the 3D impression to be optimally adapted to any given display. However, this extended functionality comes at the cost of an increased complexity, since rendering of one output view has to be done at the terminal device. Depth estimation is necessary on the sender side, which is an inherently error-prone task. Nevertheless, it has been shown that good general quality is achievable by the video plus depth approach. Please refer to D2.3 “Report on generation of video plus depth data base”.

MPEG-C Part 3 is suitable for encoding of video plus depth data. Different bitrate ratios (and thereby qualities) for video and depth can be adjusted. It was found that at a certain bitrate the QP for depth should be chosen lower than the QP for color to achieve best overall results, e.g. C30, D24. It can be concluded that good depth quality is essential for good overall quality. The numerical bitrate ratio between color and depth may vary largely depending on the sequence. We have found ratios between 1:1 and 6:1. In most cases for best overall quality a substantial portion of the bitrate has to be spent for depth.

A detailed comparison of “video plus video” approaches (simulcast, H.264 SEI, MVC) and video plus depth (MPEG-C Part 3) is still to be done. This will be done in close collaboration with WP4, which will perform formal subjective tests about these issues. These experiments will also include mixed resolution stereo video coding, as an extension of available 3D video formats.

Mobile 3DTV Content Delivery Optimization over DVB-H System

MOBILE3DTV - Mobile 3DTV Content Delivery Optimization over DVB-H System - is a three-year project which started in January 2008. The project is partly funded by the European Union 7th RTD Framework Programme in the context of the Information & Communication Technology (ICT) Cooperation Theme.

The main objective of MOBILE3DTV is to demonstrate the viability of the new technology of mobile 3DTV. The project develops a technology demonstration system for the creation and coding of 3D video content, its delivery over DVB-H and display on a mobile device, equipped with an auto-stereoscopic display.

The MOBILE3DTV consortium is formed by three universities, a public research institute and two SMEs from Finland, Germany, Turkey, and Bulgaria. Partners span diverse yet complementary expertise in the areas of 3D content creation and coding, error resilient transmission, user studies, visual quality enhancement and project management.

For further information about the project, please visit www.mobile3dtv.eu.

Tuotekehitys Oy Tamlink: Project coordinator (FINLAND)

Tampereen Teknillinen Yliopisto: Visual quality enhancement, Scientific coordinator (FINLAND)

Fraunhofer Gesellschaft zur Förderung der angewandten Forschung e.V: Stereo video content creation and coding (GERMANY)

Middle East Technical University: Error resilient transmission (TURKEY)

Technische Universität Ilmenau: Design and execution of subjective tests (GERMANY)

MM Solutions Ltd.: Design of prototype terminal device (BULGARIA)
