
Overview of the Stereo and Multiview Video Coding Extensions of the H.264/MPEG-4 AVC Standard

By Anthony Vetro, Fellow IEEE, Thomas Wiegand, Fellow IEEE, and

Gary J. Sullivan, Fellow IEEE

PROCEEDINGS OF THE IEEE 2011

The Emerging MVC Standard for 3D Video Services

Ying Chen, Ye-Kui Wang, Kemal Ugur, Miska M. Hannuksela,

Jani Lainema, and Moncef Gabbouj

EURASIP Journal on Advances in Signal Processing 2009

Outline

Introduction

Multiview Scenarios and Applications

Standardization Requirements

H.264/MPEG-4 AVC Basics

Extending H.264/MPEG-4 AVC for Multiview

Frame-Compatible Stereo Encoding Formats

Conclusion and Future Work


Introduction

The Joint Video Team of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG) standardized an extension of H.264/MPEG-4 AVC that is referred to as Multiview Video Coding (MVC)

MVC provides a compact representation for multiple views of a video scene, including stereo-paired video for 3-D viewing

The Stereo High profile of the MVC extension was selected by the Blu-ray Disc Association as the coding format for 3-D video with high-definition resolution

The system-level integration of MVC is more challenging, as the decoder output may contain more than one view and can consist of any combination of the views at any temporal level

Various "frame-compatible" approaches for the support of stereo-view video, as an alternative to MVC, are also discussed



Multiview Scenarios and Applications

Free-viewpoint video: the viewpoint can be interactively changed
  There are several candidate views for the viewer, one of which is selected as the target view
  The decoder focuses on decoding the target view
  Efficient switching between different views is needed

3-D TV: more than one view is decoded and displayed simultaneously
  Stereoscopic video
    Classic stereo systems that require special-purpose glasses
    Auto-stereoscopic displays that do not require glasses
  3-D video
    Multiple actual or rendered views of the scene are presented to the viewer, e.g. using "virtual reality" glasses or an advanced auto-stereoscopic display, so that the view changes with head movements and the viewer has a feeling of immersion in the 3-D scene

Parallel processing of different views and flexible stream adaptation

Multiview Scenarios and Applications

Teleconference applications: both interactivity and virtual reality

Rendering of 3-D TV content or view synthesis: depth information is needed

2-D TV and HDTV applications still dominate the market
  MVC content should provide a way for those 2-D decoders to generate a display from an MVC bitstream


Standardization Requirements

High compression efficiency
  Huge amount of data in multiview video
  Enable inter-view prediction
  Efficient memory management of decoded pictures
  Hierarchical temporal scalability was found to be efficient for MVC
  Significant gain compared to independent compression of each view

Random access
  Ensure that any image can be accessed, decoded, and displayed by starting the decoder at a random access point and decoding a relatively small quantity of data on which that image may depend
  Insertion of intra-coded pictures
  View-switching random access


Standardization Requirements

[Figure: typical MVC prediction structure, with IDR and anchor access units marked]

Standardization Requirements

Scalability
  The ability of a decoder to access only a portion of a bitstream while still being able to generate effective video output, e.g. at reduced temporal or spatial resolution
  Temporal scalability and view scalability
  Adaptation to user preference, network bandwidth, and decoder complexity

Decoder resource consumption
  A number of views are to be decoded and displayed
  An optimal decoder in terms of memory and complexity is very important to make real-time decoding possible

Parallel processing
  In 3-D TV, multiple views need to be decoded simultaneously
  Reduces computation time to achieve real-time decoding


Standardization Requirements

Backward compatibility
  A subset of the MVC bitstream corresponding to the "base view" needs to be decodable by an ordinary H.264/MPEG-4 AVC decoder, and the data representing the other views should be encoded in a way that does not affect base view decoding

Achieving a desired degree of quality consistency among views, or selecting a preferential quality for encoding some views versus others

Conveying camera parameters along with the bitstream in order to support intermediate view interpolation at the decoder

MVC shares some design principles with SVC, such as backward compatibility with H.264/AVC, temporal scalability, and network-friendly adaptation

New mechanisms include view scalability, the inter-view prediction structure, coexistence of decoded pictures from multiple dimensions in the decoded picture buffer, multiple representations for display, and parallel decoding at the decoder


H.264/MPEG-4 AVC Basics

H.264/AVC covers
  Video coding layer (VCL): creates a coded representation of the source content
  Network abstraction layer (NAL): formats these data and provides header information

Network Abstraction Layer (NAL)
  A coded H.264/MPEG-4 AVC video data stream is organized into NAL units, which are packets that each contain an integer number of bytes (see the parsing sketch below)
  VCL NAL units
    Picture content (coded slices or slice data)
  Non-VCL NAL units
    Parameter sets
      Sequence Parameter Set (SPS): sequence-level header information
      Picture Parameter Set (PPS): infrequently changing picture-level header information
    SEI messages
      Do not affect the core decoding process
      Assist the decoding process or subsequent processing (bitstream manipulation or display)
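As a concrete illustration (not taken from the papers), here is a minimal sketch of how an Annex B byte stream could be split into NAL units and how the one-byte NAL unit header is read; the helper names are hypothetical, while the header fields (forbidden_zero_bit, nal_ref_idc, nal_unit_type) and the type values mentioned in the comments are the ones defined by H.264/MPEG-4 AVC:

```python
# Minimal sketch: split an H.264/MPEG-4 AVC Annex B byte stream on start codes
# and read the one-byte NAL unit header. Helper names are illustrative.

def split_annexb(stream: bytes):
    """Yield NAL unit payloads delimited by 0x000001 / 0x00000001 start codes."""
    starts, i, n = [], 0, len(stream)
    while i + 3 <= n:
        if stream[i:i + 3] == b"\x00\x00\x01":
            starts.append(i + 3)   # payload begins right after the start code
            i += 3
        else:
            i += 1
    starts.append(n + 3)           # sentinel so the last NAL unit is emitted too
    for s, e in zip(starts[:-1], starts[1:]):
        nal = stream[s:e - 3].rstrip(b"\x00")   # drop zeros belonging to the next start code
        if nal:
            yield nal

def parse_nal_header(nal: bytes):
    """Return (nal_ref_idc, nal_unit_type) from the first byte:
    forbidden_zero_bit(1) | nal_ref_idc(2) | nal_unit_type(5)."""
    nal_ref_idc = (nal[0] >> 5) & 0x03   # 0 means the picture is not used as a reference
    nal_unit_type = nal[0] & 0x1F        # e.g. 1/5 = coded slice, 6 = SEI, 7 = SPS, 8 = PPS,
                                         # 14 = prefix NAL unit, 20 = MVC coded slice extension
    return nal_ref_idc, nal_unit_type
```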


[Figure: NAL unit syntax — NAL unit type followed by its payload]

H.264/MPEG-4 AVC Basics

The set of consecutive NAL units associated with a single coded picture is referred to as an access unit

A set of consecutive access units with certain properties is referred to as a coded video sequence

A coded video sequence represents an independently decodable part of a video stream and always starts with an instantaneous decoding refresh (IDR) access unit

Video Coding Layer (VCL)
  Reference picture buffering and the associated memory control
    The behavior of the decoded picture buffer (DPB) can be adaptively controlled by memory management control operation (MMCO) commands
    The reference picture lists that are used for coding of P or B slices can be arbitrarily constructed from the pictures available in the DPB via reference picture list modification (RPLM) commands


Extending H.264/MPEG-4 AVC for Multiview

Bitstream Structure
  The compressed multiview stream includes a "base view" bitstream, which is coded independently from all other views in a manner compatible with decoders for a single-view profile of the standard
  Useful properties of the coded pictures in the H.264/AVC-compliant base view, such as the temporal level, are not indicated in the VCL NAL units of H.264/AVC; to indicate those properties, the prefix NAL unit has been introduced
  Coded pictures from different views may use different SPSs, which contain the view dependency information for inter-view prediction

New syntax elements (a parsing sketch follows below)
  view_id: identifier of each view
  temporal_id: temporal scalability hierarchy
  priority_id: used for the simple one-path bitstream adaptation process
  anchor_pic_flag: indicates whether a picture is an anchor picture or a non-anchor picture
  idr_flag: indicates whether or not a picture is an IDR picture
  inter_view_flag: indicates whether or not a decoded picture is used for inter-view reference
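As an illustrative sketch (not from the papers), the fields above can be read from the 3 bytes that follow the one-byte NAL unit header in prefix NAL units (type 14) and MVC coded slice extension NAL units (type 20). The function name is hypothetical, and the bit layout reflects my reading of the Annex H syntax (the IDR property is carried as non_idr_flag), so verify against the standard before relying on it:

```python
# Sketch: parse the 3-byte MVC NAL unit header extension.
# Assumed bit layout (MSB first): svc_extension_flag(1) | non_idr_flag(1) |
# priority_id(6) | view_id(10) | temporal_id(3) | anchor_pic_flag(1) |
# inter_view_flag(1) | reserved_one_bit(1)

def parse_mvc_nal_header_extension(ext3: bytes) -> dict:
    if len(ext3) < 3:
        raise ValueError("expected at least 3 header-extension bytes")
    bits = int.from_bytes(ext3[:3], "big")            # 24 bits, MSB first
    if (bits >> 23) & 0x1:                            # svc_extension_flag
        raise ValueError("SVC header extension, not MVC")
    return {
        "non_idr_flag":    (bits >> 22) & 0x1,        # 0 => IDR view component
        "priority_id":     (bits >> 16) & 0x3F,       # lower value = higher priority
        "view_id":         (bits >> 6)  & 0x3FF,      # identifier of the view
        "temporal_id":     (bits >> 3)  & 0x7,        # temporal scalability level
        "anchor_pic_flag": (bits >> 2)  & 0x1,        # anchor (view random access) picture
        "inter_view_flag": (bits >> 1)  & 0x1,        # usable as inter-view reference
    }
```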


Extending H.264/MPEG-4 AVC for Multiview

[Figure: NAL unit type extensions introduced for MVC]

Extending H.264/MPEG-4 AVC for Multiview

Enabling Inter-View Prediction
  Exploits both temporal and inter-view redundancy
  Builds on the flexible reference picture management capabilities that had already been designed into H.264/MPEG-4 AVC
    Decoded pictures from other views are made available in the reference picture lists for use by the inter-picture prediction process
  The MVC design does not allow the prediction of a picture in one view at a given time using a picture from another view at a different time (see the sketch below)
  Inter-view prediction may be used for encoding the non-base views of an IDR picture
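A conceptual sketch (not the normative list-construction process of the standard) of the constraint just stated: inter-view reference pictures are appended to a reference picture list only if they belong to the same access unit, i.e. the same time instant, as the current picture. The data structures and names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DecodedPicture:
    view_id: int
    poc: int                    # picture order count: the time instant / output order
    inter_view_ref: bool        # marked as usable for inter-view reference

def build_reference_list(current_view: int, current_poc: int,
                         temporal_refs: list,
                         dpb: list,
                         inter_view_ref_views: list) -> list:
    """Temporal references from the current view first, then inter-view references
    taken only from the SAME access unit (same poc), never from another time."""
    ref_list = [p for p in temporal_refs if p.view_id == current_view]
    for v in inter_view_ref_views:          # reference views signaled in the SPS extension
        for p in dpb:
            if p.view_id == v and p.poc == current_poc and p.inter_view_ref:
                ref_list.append(p)
    return ref_list
```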


Extending H.264/MPEG-4 AVC for Multiview

Additional picture type: the anchor picture
  Similar to an IDR picture in that it does not use temporal prediction, but it does allow inter-view prediction
  Any picture that follows an anchor picture is prohibited from using any picture that precedes the anchor picture as a reference for inter-picture prediction
  Provides a clean random access point for access to a given view

High-Level Syntax: three important pieces of information are carried in the SPS extension (sketched in code below)
  View identification
    Total number of views
    A listing of view identifiers (e.g., 0-2-1)
  View dependency information
    The number of inter-view reference pictures for list 0 / list 1
    The views that may be used for predicting a particular view (e.g., view 1 uses view 0 and view 2 as reference views)
    Signaled separately for anchor and non-anchor pictures
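To make the SPS-extension information concrete, here is a rough, hedged sketch of how it could be represented; the class and field names are illustrative (loosely echoing the syntax element names) and do not reproduce the exact bitstream syntax.

```python
from dataclasses import dataclass, field

@dataclass
class ViewDependency:
    # Views that may be referenced by this view, signaled separately for
    # anchor and non-anchor pictures and for each reference picture list.
    anchor_refs_l0: list = field(default_factory=list)
    anchor_refs_l1: list = field(default_factory=list)
    non_anchor_refs_l0: list = field(default_factory=list)
    non_anchor_refs_l1: list = field(default_factory=list)

@dataclass
class OperationPoint:
    level_idc: int          # e.g. 41 for Level 4.1
    temporal_id: int        # highest temporal layer included
    target_views: list      # views intended for output
    num_views_decoded: int  # views required for decoding (targets plus dependencies)

@dataclass
class SpsMvcExtensionInfo:
    view_ids: list          # view order index -> view_id, e.g. [0, 2, 1]
    dependencies: dict      # view_id -> ViewDependency
    operation_points: list  # one entry per signaled level / operation point

# Example: three views where view 1 is predicted from views 0 and 2
sps_ext = SpsMvcExtensionInfo(
    view_ids=[0, 2, 1],
    dependencies={
        0: ViewDependency(),
        2: ViewDependency(anchor_refs_l0=[0], non_anchor_refs_l0=[0]),
        1: ViewDependency(anchor_refs_l0=[0], anchor_refs_l1=[2],
                          non_anchor_refs_l0=[0], non_anchor_refs_l1=[2]),
    },
    operation_points=[OperationPoint(level_idc=41, temporal_id=3,
                                     target_views=[0, 1], num_views_decoded=3)],
)
```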


Extending H.264/MPEG-4 AVC for Multiview

Level index for an operation point
  An indicator of the resource requirements for a decoder that conforms to a particular level
  An operation point corresponds to a specific temporal subset and a set of views, including those intended for output and the views on which they depend
  Multiple level values can be signaled as part of the SPS extension, with each level being associated with a particular operation point
  The syntax indicates the number of views that are targeted for output as well as the number of views that would be required for decoding particular operation points (an extraction sketch follows below)
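A hedged sketch of how an operation point could be extracted from an MVC stream, building on the header fields parsed earlier: NAL units whose temporal_id is too high or whose view_id is not among the required views are discarded. This is a simplification (base-view NAL units are kept unconditionally here), and the dictionary keys are my own.

```python
# Sketch: keep only the NAL units needed for one operation point.
# 'required_views' must already include the inter-view dependencies of the
# target output views, as signaled in the SPS extension.

def extract_operation_point(nal_units, required_views: set, max_temporal_id: int):
    kept = []
    for nal in nal_units:                       # each nal: {'type', 'view_id', 'temporal_id', ...}
        if nal["type"] not in (14, 20):         # base view slices, SPS/PPS, SEI, ... are kept as-is
            kept.append(nal)
        elif (nal["view_id"] in required_views
              and nal["temporal_id"] <= max_temporal_id):
            kept.append(nal)                    # enhancement-view NAL unit within the operation point
    return kept
```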

Profiles and Levels
  Profiles determine the subset of coding tools that must be supported by conforming decoders; both MVC profiles are based on the High profile of H.264/MPEG-4 AVC
    Multiview High profile: supports multiple views, does not support interlace coding tools
    Stereo High profile: two views, supports interlace coding tools



Extending H.264/MPEG-4 AVC for Multiview

Levels
  Constraints on the bitstreams produced by MVC encoders, to establish bounds on the necessary decoder resources and complexity
    Limit on the amount of frame memory required for decoding a bitstream
    Maximum throughput in terms of macroblocks per second
    Maximum picture size
    Overall bit rate

Coding Performance
  Roughly 20%-30% bit rate savings (gains of 2-3 dB) compared to independent compression of each view


Extending H.264/MPEG-4 AVC for Multiview

SEI Messages for Multiview Video
  Parallel decoding information SEI message
  MVC scalable nesting SEI message
    Indicates the scope of views or temporal levels to which the nested message applies
    Reuses the syntax of H.264/AVC SEI messages for a specific set of views and temporal levels
  View scalability information SEI message
    The mapping between each operation point and the required NAL units
    Signals the profile and level for each operation point, which is identified by its view_id and temporal_id values
  Multiview scene information SEI message
  Multiview acquisition information SEI message
    Signals camera parameters, which are helpful for view interpolation by a renderer
  Non-required view component SEI message
    Indicates that a particular view component is not needed for decoding


Operation point: a sub-bitstream identified by the combination of the required view_id and temporal_id values

Extending H.264/MPEG-4 AVC for Multiview

Parallel decoding of multiple views
  In 3-D broadcasting use cases, the display needs to output many views simultaneously to support head-motion parallax
  The parallel decoding information SEI message indicates that the video is encoded in such a way that a macroblock in the view-1 picture may only use reconstruction values of macroblocks that belong to certain rows of the view-0 picture (see the sketch below)
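A hedged sketch of how a decoder could exploit such a constraint: assuming the SEI message amounts to a row delay per referenced view (the variable names here are illustrative and the exact SEI syntax is not reproduced), the view-1 thread only has to wait until view 0 has reconstructed the rows it is allowed to reference.

```python
# Conceptual sketch: two decoding threads synchronized by a signaled row delay.
import threading

class InterViewRowSync:
    def __init__(self, num_mb_rows: int, row_delay: int):
        self.num_mb_rows = num_mb_rows
        self.row_delay = row_delay          # constraint conveyed by the parallel decoding SEI
        self.view0_rows_done = -1           # last fully reconstructed MB row of view 0
        self.cond = threading.Condition()

    def view0_row_finished(self, row: int):
        with self.cond:
            self.view0_rows_done = row
            self.cond.notify_all()          # wake the view-1 decoding thread

    def wait_before_view1_row(self, view1_row: int):
        """Block until every view-0 row that macroblocks in 'view1_row' may
        reference (up to view1_row + row_delay) has been reconstructed."""
        needed = min(view1_row + self.row_delay, self.num_mb_rows - 1)
        with self.cond:
            self.cond.wait_for(lambda: self.view0_rows_done >= needed)
```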


Extending H.264/MPEG-4 AVC for Multiview

View dependency change SEI message
  Signals changes in the view dependency structure

Operation point not present SEI message
  Indicates operation points that are not present in the bitstream
  Useful in streaming and networking scenarios

Base view temporal HRD SEI message
  Associated with an IDR access unit
  Signals information relevant to the hypothetical reference decoder (HRD) parameters associated with the base view


HRD: a virtual buffering algorithm that can be used to test the behavior of the coded bitstream and its effect on a real decoder

Frame-Compatible Stereo Encoding Formats

Frame-compatible formats refer to a class of stereo video formats in which the two stereo views are essentially multiplexed into a single coded frame or sequence of frames

Other common names include stereo interleaving or spatial/temporal multiplexing formats


Frame-Compatible Stereo Encoding Formats

Basic Principles
  With a frame-compatible format, the left and right views are packed together in the samples of a single video frame
  Half of the coded samples represent the left view and the other half represent the right view, so each coded view has half the resolution of the full coded frame (see the packing sketch below)
  Temporal multiplexing
    The left and right views are interleaved as alternating frames or fields of a coded sequence

Frame-compatible formats have received considerable attention from the broadcast industry, since the coded video can be processed by encoders and decoders that were not specifically designed to handle stereo video

Only the final display stage requires some customization for recognizing and properly rendering the video to enable a 3-D viewing experience
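As an illustration (not from the papers), a minimal numpy sketch of the two most common spatial packings, side-by-side and top-bottom; it assumes the two views have already been downsampled to half horizontal or vertical resolution, and the function names are made up.

```python
import numpy as np

def pack_side_by_side(left_half: np.ndarray, right_half: np.ndarray) -> np.ndarray:
    """Views already downsampled to half WIDTH (H x W/2); left view goes in the left half."""
    return np.concatenate([left_half, right_half], axis=1)

def pack_top_bottom(left_half: np.ndarray, right_half: np.ndarray) -> np.ndarray:
    """Views already downsampled to half HEIGHT (H/2 x W); left view goes on top."""
    return np.concatenate([left_half, right_half], axis=0)

def unpack_side_by_side(frame: np.ndarray):
    """Display-side inverse: split the packed frame (upsampling would follow)."""
    w = frame.shape[1] // 2
    return frame[:, :w], frame[:, w:]
```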


Frame-Compatible Stereo Encoding Formats

The drawback of representing the stereo signal in this way is that spatial or temporal resolution would be only half of that used for 2-D video with the same encoded resolution

A key additional issue with frame-compatible formats is distinguishing the left and right views

Signaling
  The signaling for a complete set of frame-compatible formats has been standardized within the H.264/MPEG-4 AVC standard as SEI messages
  The frame packing arrangement (FPA) SEI message was specified in an amendment of the H.264/MPEG-4 AVC standard (the common arrangement types are listed in the sketch below)
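For orientation, a hedged sketch of the frame_packing_arrangement_type values as I recall them from the FPA SEI message (later amendments added further values), shown as a small enum; treat the exact numbering as something to verify against the standard text.

```python
from enum import IntEnum

class FramePackingArrangementType(IntEnum):
    """frame_packing_arrangement_type values of the FPA SEI message (to be verified)."""
    CHECKERBOARD = 0          # quincunx sampling: views interleaved sample by sample
    COLUMN_INTERLEAVED = 1    # alternating columns
    ROW_INTERLEAVED = 2       # alternating rows
    SIDE_BY_SIDE = 3          # left view in the left half of the frame
    TOP_BOTTOM = 4            # left view in the top half of the frame
    TEMPORAL_INTERLEAVED = 5  # alternating frames in time
```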


Conclusion and Future Work

Three-dimensional video has drawn significant attention recently; the efficient representation and compression of stereo and multiview video is a central component of any 3-D or multiview video system

This paper reviewed the recent extensions of the H.264/MPEG-4 AVC standard that support 3-D stereo and multiview video

The MVC extension supports stereo and multiview video by enabling inter-view prediction as well as temporal inter-picture prediction

Another important development has been the efficient representation, coding, and signaling of frame-compatible stereo video formats

As the market evolves and new types of displays and services are offered, additional new technologies and standards will need to be introduced; in particular, the generation of the large number of views required by auto-stereoscopic displays will be needed. Solutions that include depth map information for this purpose are a significant area of focus for future designs
