
Tile Based HEVC Video for Head Mounted Displays

Robert Skupin, Yago Sanchez, Cornelius Hellge, Thomas Schierl Video Coding & Analytics Department

Fraunhofer Heinrich-Hertz-Institute Berlin, Germany

{forename.surname}@hhi.fraunhofer.de

Abstract—Delivering 360° video services with resolutions of UHD and beyond to Virtual Reality head mounted displays is challenging due to the limits of video decoders in constrained end devices. Adaptivity to the current user viewport is a promising approach but incurs significant encoding overhead when encoding per user or per set of viewports. A more efficient way to achieve viewport adaptive streaming is to use motion-constrained HEVC tiles. Original content resolution within the user viewport is preserved while content currently not presented to the user is delivered in lower resolution. A lightweight aggregation of varying resolution tiles into a single HEVC bitstream can be carried out on-the-fly and allows usage of a single decoder instance on the end device.

Keywords— 360° video; virtual reality; tiles; HEVC.

I. INTRODUCTION

The emergence of consumer-grade virtual reality (VR) headsets is forecast to create a multibillion-dollar segment within the entertainment market during the coming years [1]. While gaming is expected to be the dominant use case for VR, head mounted displays (HMDs) can also serve as a novel immersive means of consuming high-resolution omnidirectional or 360° video. A major fraction of today's US and global Internet traffic is caused by video data [2]. While most of this data relates to traditional video services, major platforms such as YouTube and Facebook are already delivering 360° video to various devices.

However, current 360° video services offer a somewhat limited user experience, as the resolution in the user viewport, and hence the visual quality, is not on par with traditional video services. Covering the full 360° surroundings in sufficient resolution could easily lead to multiple times UHD resolution, which poses a major challenge to the established video streaming chain as well as to available end devices. VR relevant devices such as mobile phones contain hardware video decoders that are tailored to resolutions used in traditional video services, such as FHD or UHD. Therefore, it is important to limit the overall resolution to be transmitted and decoded. Taking into account the specifics of 360° video on HMDs, i.e. that only a relatively small subset of the complete video is presented to the user at once, adaptivity to the current user viewport is a promising approach to increase visual quality for the user at a given overall resolution. A simplistic approach for HMDs in this direction is to encode the exact or an over-provisioned viewport for each user. While adequately minimizing the number of outside-viewport samples to be decoded, such a service comes at the cost of a massive encoding overhead when considered for large-scale deployments.

Another approach is to encode the 360° video multiple times for a number of viewport orientations, e.g. by projection to a pyramid surface [3], which still comes at the cost of a significant encoding and storage overhead for several dozen orientations. In the past, there has been a notable amount of work on tiling variants to address viewport adaptivity for high resolution panorama streaming [4]. However, the end devices addressed were for the most part flat-panel screens and tablets, all of which have a relatively high tolerance towards constraining user interaction such as viewport changes. HMDs as end devices do not allow constraining viewport changes at all, and correct content must be rendered and presented instantaneously as the user's head turns.

In this demonstration paper, we propose an alternative approach to 360° video that uses HEVC tiles. This approach emphasizes the current user viewport by decreasing the resolution of outside-viewport video samples on-the-fly. Hence, the full 360° surroundings are always available on the end device. At the same time, the number of samples that lie outside the user viewport is reduced. The resolution adaptivity per video area is achieved without transcoding by merging motion-constrained HEVC tiles of varying resolution into a single common bitstream through lightweight tile aggregation [5]. The demonstration shows a 360° video system for HMDs that is capable of adapting the resolution of video areas depending on the current user viewport and creates an individually tailored HEVC bitstream.

[Figure 1 panels: 25% high resolution, 33% high resolution, 42% high resolution. Legend: tile boundary, slice boundary.]

Figure 1: Three exemplary tiling configurations mixing high and low resolution motion-constrained HEVC tiles from a cubic 360° video sequence into common bitstreams with varying viewport provisioning.


II. DYNAMIC TILING

The key principle of the proposed technique is to dynamically adapt the resolution of the omnidirectional video content based on the current user viewport. For this purpose, the original 360° video is initially encoded at two resolutions making use of motion-constrained HEVC tiles at the desired tiling granularity. For instance, a cubic video¹ could be evenly divided into 24 tiles, i.e. at a granularity of 4 tiles per cube face. Likewise, a version of the cubic video is downsampled by a factor of two and encoded at the same tiling granularity. Figure 1 depicts three variants of user viewport dependent video bitstreams mixing the 24 tiles with varying ratio of the two resolutions. The user viewport provisioning, i.e. video area in which the original resolution is preserved, varies from 25% to 42% of the complete 360° video as illustrated.

Taking, for instance, the variant with 33% viewport provisioning, 8 tiles maintain the original high resolution (HR), while the remaining 16 tiles are of low resolution (LR). In this setup, the overall resolution of the 360° video to be transmitted and decoded is half the resolution of the original content. The ratio of 8 HR to 16 LR tiles corresponds to a high quality viewport of approximately 180° × 90°, which serves to compensate for user viewport changes before the tile selection can be dynamically adjusted to the latest user viewport. The presented tiling approach also gives services more flexibility to adapt to the characteristics of the client HMDs. For instance, as reported in [6], there is considerable variation in the field of view of HMDs available or in development. Varying the viewport provisioning of a 360° video service using the presented technique does not induce extra encoding or storage cost, while for viewport dependent encoding schemes such as the one presented in [3], all offered variants would have to be preprocessed, encoded and stored.
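The arithmetic behind these figures can be checked with a short sketch. It assumes that the factor-two downsampling applies to both picture dimensions, so that an LR tile carries a quarter of the samples of an HR tile; this assumption is consistent with the stated result of half the original resolution for 8 HR tiles. The HR tile counts 6 and 10 are inferred here from the 25% and 42% figures and are not stated in the text; all function names are illustrative.

```python
from fractions import Fraction

TOTAL_TILES = 24                  # 6 cube faces x 4 tiles per face, as in the text
LR_SAMPLE_SHARE = Fraction(1, 4)  # assumption: downsampling by two in each dimension


def provisioning(hr_tiles: int) -> Fraction:
    """Share of the 360° video area that keeps the original resolution."""
    return Fraction(hr_tiles, TOTAL_TILES)


def relative_samples(hr_tiles: int) -> Fraction:
    """Samples to transmit and decode, relative to the all-HR video."""
    lr_tiles = TOTAL_TILES - hr_tiles
    return (hr_tiles + lr_tiles * LR_SAMPLE_SHARE) / TOTAL_TILES


if __name__ == "__main__":
    # 8 HR tiles is the case discussed in the text; 6 and 10 are inferred.
    for hr in (6, 8, 10):
        print(f"{hr:2d} HR tiles: provisioning {float(provisioning(hr)):.0%}, "
              f"samples {float(relative_samples(hr)):.1%} of the original")
```

Under this assumption, the 8 HR plus 16 LR configuration amounts to 8 + 16/4 = 12 HR-tile equivalents out of 24, i.e. one half.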

The motion-constrained HEVC tiles are encoded with a set of encoder constraints that allows lightweight aggregation into a single common bitstream without transcoding. These constraints mostly concern the inter prediction on tile boundaries. For further details the reader is referred to [5]. As illustrated in Figure 1, encoded tiles are placed into a common HEVC slice wherever necessary to maintain a regular tile grid and correct scan order. In order to change the selection of tiles as the user viewport changes, the encoding has to provide a suitable rate of random access points within the bitstream.
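One of the inter prediction constraints mentioned above can be illustrated with a small sketch: a motion vector is only acceptable if every reference sample it needs lies inside the same tile. This is an illustration under stated assumptions, not the encoder logic of [5]. It assumes quarter-sample luma motion vectors and an 8-tap interpolation filter that reads 3 samples before and 4 samples after the block whenever the vector has a fractional part; `Rect`, `reference_area` and `stays_in_tile` are hypothetical names, and other constraints (for example on motion vector prediction candidates) are not modelled.

```python
from dataclasses import dataclass

# Assumption: 8-tap luma interpolation reads 3 samples before and 4 samples
# after the block along an axis whose motion vector component is fractional.
TAPS_BEFORE, TAPS_AFTER = 3, 4


@dataclass(frozen=True)
class Rect:
    x0: int  # left edge, inclusive
    y0: int  # top edge, inclusive
    x1: int  # right edge, exclusive
    y1: int  # bottom edge, exclusive


def _span(lo: int, hi: int, mv: int) -> tuple[int, int]:
    """Reference sample range along one axis for a quarter-sample vector."""
    full, frac = mv >> 2, mv & 3  # arithmetic shift floors negative vectors
    lo, hi = lo + full, hi + full
    if frac:
        lo, hi = lo - TAPS_BEFORE, hi + TAPS_AFTER
    return lo, hi


def reference_area(block: Rect, mv_x: int, mv_y: int) -> Rect:
    """Luma samples read when predicting `block` with vector (mv_x, mv_y)."""
    x0, x1 = _span(block.x0, block.x1, mv_x)
    y0, y1 = _span(block.y0, block.y1, mv_y)
    return Rect(x0, y0, x1, y1)


def stays_in_tile(block: Rect, mv_x: int, mv_y: int, tile: Rect) -> bool:
    """True if the prediction never reads samples outside `tile`."""
    ref = reference_area(block, mv_x, mv_y)
    return (tile.x0 <= ref.x0 and tile.y0 <= ref.y0
            and ref.x1 <= tile.x1 and ref.y1 <= tile.y1)


if __name__ == "__main__":
    tile = Rect(0, 0, 64, 64)
    block = Rect(48, 16, 64, 32)              # block touching the right tile edge
    print(stays_in_tile(block, 0, 0, tile))   # zero motion
    print(stays_in_tile(block, -2, 0, tile))  # half-sample shift to the left
```

In this sketch, a half-sample vector pointing away from the tile edge is still rejected for a block that touches the edge, because the interpolation filter reaches across the boundary.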

¹ Dataset generously provided by Deep Inc.

III. SYSTEM OVERVIEW

The individual components of the demonstrator system are illustrated in Figure 2. The tile bitstreams are available in two resolutions as independently decodable short-duration segments, as is typical for HTTP-based streaming delivery. The Oculus Rift Consumer Version 1 serves as the HMD for the demonstrator. Feedback about the current head orientation of the user from the OculusSDK is used to control the tile selection process, which selects HR tiles for content within the current user viewport and LR tiles for the remaining content. The selection of tiles for the upcoming time interval is passed to the tile aggregation component, which merges the individual tile bitstream segments into the viewport dependent bitstream. The prototype system aggregates tiles using a proprietary implementation, but it is worth noting that the ISO/IEC 14496-15 file format can also be used to carry out the tile aggregation [7]. A proprietary HEVC software decoder processes the user dependent HEVC bitstream and passes decoded pictures to an OpenGL-based renderer, which generates the user viewport to be output on the Oculus Rift HMD.
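The tile selection step could be sketched as follows. The paper does not describe the demonstrator's actual selection logic, so everything here is an assumption for illustration: the cube face axes, the yaw and pitch convention, and the rule of marking as HR the tiles whose centre direction is closest to the viewing direction. The orientation is taken as plain angles rather than through any Oculus SDK call, and `tile_centers`, `view_direction` and `select_tiles` are hypothetical names.

```python
import math
from itertools import product

# Hypothetical cube layout: face normal and two in-plane axes per face.
FACES = {
    "front":  ((0, 0, 1),  (1, 0, 0),  (0, 1, 0)),
    "back":   ((0, 0, -1), (-1, 0, 0), (0, 1, 0)),
    "right":  ((1, 0, 0),  (0, 0, -1), (0, 1, 0)),
    "left":   ((-1, 0, 0), (0, 0, 1),  (0, 1, 0)),
    "top":    ((0, 1, 0),  (1, 0, 0),  (0, 0, -1)),
    "bottom": ((0, -1, 0), (1, 0, 0),  (0, 0, 1)),
}


def tile_centers():
    """Unit vectors towards the centre of each of the 24 tiles (2x2 per face)."""
    centers = {}
    for name, (n, u, v) in FACES.items():
        for col, row in product((0, 1), repeat=2):
            su, sv = col - 0.5, row - 0.5  # tile centre offset on the face
            d = [n[i] + su * u[i] + sv * v[i] for i in range(3)]
            norm = math.sqrt(sum(c * c for c in d))
            centers[(name, col, row)] = tuple(c / norm for c in d)
    return centers


def view_direction(yaw_deg, pitch_deg):
    """Viewing direction; yaw 0 and pitch 0 look at the front face (assumed)."""
    yaw, pitch = math.radians(yaw_deg), math.radians(pitch_deg)
    return (math.sin(yaw) * math.cos(pitch),
            math.sin(pitch),
            math.cos(yaw) * math.cos(pitch))


def select_tiles(yaw_deg, pitch_deg, hr_tiles=8):
    """Mark the `hr_tiles` tiles closest to the viewing direction as HR."""
    view = view_direction(yaw_deg, pitch_deg)
    centers = tile_centers()
    ranked = sorted(
        centers,
        key=lambda t: -sum(a * b for a, b in zip(centers[t], view)),
    )
    hr = set(ranked[:hr_tiles])
    return {tile: ("HR" if tile in hr else "LR") for tile in centers}


if __name__ == "__main__":
    selection = select_tiles(yaw_deg=20.0, pitch_deg=10.0)
    for tile, resolution in sorted(selection.items()):
        print(tile, resolution)
```

In a system of the kind described, such a selection would be re-evaluated for each upcoming segment interval and handed to the tile aggregation component.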

REFERENCES

[1] Deloitte, “Technology, Media & Telecommunication Prediction 2016”, Retrieved from: http://www2.deloitte.com/content/dam/Deloitte/global/Documents/Technology-Media-Telecommunications/gx-tmt-prediction-2016-full-report.pdf, 2016

[2] Sandvine Intelligent Broadband Networks, "Internet Phenomena report", June 2016, Retrieved from https://www.sandvine.com/trends/global-internet-phenomena/, 2016

[3] E. Kuzyakov, D. Pio, “Next-generation video encoding techniques for 360 video and VR”, Retrieved from: https://code.facebook.com/posts/1126354007399553/next-generation-video-encoding-techniques-for-360-video-and-vr/, 2016

[4] Gaddam, V. R., Riegler, M., Eg, R., Griwodz, C., & Halvorsen, P., “Tiling in Interactive Panoramic Video: Approaches and Evaluation”, IEEE Transactions on Multimedia, 18(9), 1819-1831, 2016

[5] Sanchez, Y., Skupin, R., & Schierl, T., “Compressed Domain Video Processing for Tile based Panoramic Streaming using HEVC”, Proceedings of IEEE International Conference on Image Processing (ICIP), Quebec, Canada, September 2015

[6] Oscillada, J. M., “Comparison Chart of FOV (Field of View) of VR Headsets”, Retrieved from: http://www.virtualrealitytimes.com/2015/05/24/chart-fov-field-of-view-vr-headsets/, 2015

[7] M. M. Hannuksela, V. K. Malamal Vadakital (Nokia), K. Grüneberg, Y. Sanchez (Fraunhofer HHI), m38147, “ISO/IEC 14496-15: on extractor design for HEVC files (merging m37864 and m37873)”, 2016.

[Figure 2 components: per-tile bitstream segments, tile selection, tile aggregation, HEVC decoder, OpenGL rendering, Oculus SDK, orientation feedback.]

Figure 2: Demonstrator system overview: user orientation dependent tile aggregation for creation of an individual HEVC bitstream.