Parallel Scalability and Efficiency of HEVC Parallelization Approaches
Chi Ching Chi, Mauricio Alvarez-Mesa, Ben Juurlink, Gordon Clare, Félix Henry, Stéphane Pateux and Thomas Schierl
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY



Outline

• Introduction
• Video codec parallelization approaches
• Coding efficiency analysis
• Experimental evaluation
• Conclusions


Introduction

• While a single-core processor can decode 1080p H.264/AVC video in real time, it is very unlikely that single-core performance will suffice to decode 2160p50 HEVC video in real time.

• To obtain real-time HEVC decoding performance, parallelism is no longer an option but a necessity.


• H.264/AVC supports slice-level parallelization.

• However, slice parallelism may not achieve real-time performance when a video has only one or a few slices per frame.

• The main parallelization approaches currently included in the HEVC draft are Tiles and Wavefront Parallel Processing (WPP).

• This paper proposes an approach called Overlapped Wavefront (OWF).


Previous parallelization strategies

• Frame-level parallelism
• Slice-level parallelism
• Macroblock-level parallelism


Frame-level parallelism

• Frame-level parallelism consists of processing multiple frames at the same time.

• Frame-level parallelism is sufficient for multicore systems with just a few cores.

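• If motion vectors are long, for example due to fast motion, there is little parallelism, because a block can only be decoded after the reference area it points to has been decoded.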


Slice-level Parallelism

• Each frame can be partitioned into one or more slices.

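• Slices in a frame can be parsed and reconstructed independently of each other (only in-loop filtering may cross slice boundaries), so they can also be used for parallel processing.

• Slice-level parallelism is therefore useful for frames with several slices, but offers no parallelism when there is only one slice per frame.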
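The following minimal sketch shows the task structure of slice-level parallelism: a frame's CTB rows are split into contiguous slices and each slice becomes one independent task. `decode_slice` is a hypothetical placeholder for real slice decoding, and the even split of rows is an assumption for illustration. CPython threads do not speed up CPU-bound work, so this only illustrates the scheduling, not real performance.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_slices(num_ctb_rows: int, num_slices: int) -> list[range]:
    """Partition a frame's CTB rows into contiguous, nearly equal slices."""
    bounds = [i * num_ctb_rows // num_slices for i in range(num_slices + 1)]
    return [range(bounds[i], bounds[i + 1]) for i in range(num_slices)]


def decode_slice(rows: range) -> list[int]:
    """Hypothetical stand-in for decoding one slice.

    A real decoder would parse and reconstruct every CTB in `rows`. Slices do
    not share parsing or prediction state, so no locking is needed here.
    """
    return list(rows)


def decode_frame_by_slices(num_ctb_rows: int, num_slices: int,
                           num_threads: int) -> list[list[int]]:
    slices = split_into_slices(num_ctb_rows, num_slices)
    # At most min(num_slices, num_threads) slices are decoded at the same time.
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(decode_slice, slices))


if __name__ == "__main__":
    print(decode_frame_by_slices(17, 4, 8))
    print(decode_frame_by_slices(17, 1, 8))
```

With a single slice per frame, `pool.map` receives exactly one task, so only one of the eight threads ever does work, which is the limitation described above.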


Macroblock-level Parallelism
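In H.264, intra prediction and motion-vector prediction of a macroblock use its left, top-left, top and top-right neighbours, so macroblocks can be decoded in a diagonal wavefront (often called the 2D-wave). The sketch below computes how many macroblocks can be in flight at once, assuming every macroblock takes the same time to decode (real content does not satisfy this); the function names are illustrative.

```python
from collections import Counter


def wavefront_step(x: int, y: int) -> int:
    """Earliest time step of MB (x, y) when every MB takes one step.

    MB (x, y) waits for its left neighbour (x-1, y) and its top-right
    neighbour (x+1, y-1). The top and top-left neighbours finish before the
    top-right one, so these two dependencies cover all four neighbours.
    """
    return x + 2 * y


def max_parallel_mbs(width_mbs: int, height_mbs: int) -> int:
    """Largest number of MBs that share the same wavefront step."""
    per_step = Counter(
        wavefront_step(x, y) for y in range(height_mbs) for x in range(width_mbs)
    )
    return max(per_step.values())


if __name__ == "__main__":
    # 1080p H.264 is coded as 1920x1088 luma samples: 120 x 68 macroblocks.
    print(max_parallel_mbs(120, 68))       # 60 MBs at most in parallel
    print(wavefront_step(119, 67) + 1)     # 254 steps instead of 8160
```

Each step down one row delays the start by two macroblocks, so at most about half a macroblock row's width of blocks can run concurrently, and the ramp-up and ramp-down at the start and end of a frame leave many cores idle.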


Parallelization Strategies in HEVC

• Tiles
• Wavefront Parallel Processing (WPP)
• Overlapped Wavefront (OWF)


Tiles

• The number of tiles and the location of their boundaries can be defined for the entire sequence or changed from picture to picture.

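• Tiles have better coding efficiency than slices.

• The rate-distortion loss increases with the number of tiles, because intra prediction, motion-vector prediction and CABAC context adaptation cannot cross tile boundaries.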
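With uniform spacing, tile boundaries follow from integer division of the picture size in CTBs, which is the rule HEVC uses when uniform tile spacing is signalled (HEVC also allows explicit column widths and row heights). A minimal sketch, assuming 64×64 CTBs and a uniform grid; the helper names are hypothetical:

```python
def uniform_tile_boundaries(size_in_ctbs: int, num_tiles: int) -> list[int]:
    """Tile boundaries along one dimension, in CTB units.

    Tile i covers [i * size // n, (i + 1) * size // n), so tile sizes
    differ by at most one CTB.
    """
    return [i * size_in_ctbs // num_tiles for i in range(num_tiles + 1)]


def tile_rectangles(width_ctbs: int, height_ctbs: int,
                    tile_cols: int, tile_rows: int) -> list[tuple[int, int, int, int]]:
    """(x0, y0, x1, y1) of every tile in raster order, in CTB units."""
    xs = uniform_tile_boundaries(width_ctbs, tile_cols)
    ys = uniform_tile_boundaries(height_ctbs, tile_rows)
    return [(xs[c], ys[r], xs[c + 1], ys[r + 1])
            for r in range(tile_rows) for c in range(tile_cols)]


if __name__ == "__main__":
    # 1080p with 64x64 CTBs: ceil(1920/64) = 30 columns, ceil(1080/64) = 17 rows.
    for rect in tile_rectangles(30, 17, 4, 3):
        print(rect)
```

A 4×3 grid yields 12 rectangles that can be handed to independent threads; adding tiles increases parallelism but also the rate-distortion loss described above.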


Wavefront Parallel Processing (WPP)
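In HEVC's WPP, a CTB row starts once the row above is two CTBs ahead: the CABAC contexts are inherited after the second CTB of the row above, and prediction needs the top-right CTB. The sketch below computes the resulting decoding time of one picture, assuming every CTB takes one time unit and rows are handed to threads in order; the function name is illustrative.

```python
def wpp_completion_time(width_ctbs: int, height_ctbs: int, threads: int) -> int:
    """Time steps needed to decode one picture with WPP.

    CTB (x, y) needs its left neighbour (x-1, y) and its top-right neighbour
    (x+1, y-1); in the last column the CTB above replaces the missing
    top-right one. Rows finish in order, so with `threads` threads row y can
    only start once row y - threads has finished.
    """
    finish = [[0] * width_ctbs for _ in range(height_ctbs)]
    for y in range(height_ctbs):
        # Time at which a thread becomes free to take this row.
        t = finish[y - threads][width_ctbs - 1] if y >= threads else 0
        for x in range(width_ctbs):
            start = t  # left neighbour done (or thread free, for x == 0)
            if y > 0:
                start = max(start, finish[y - 1][min(x + 1, width_ctbs - 1)])
            finish[y][x] = start + 1
            t = finish[y][x]
    return finish[height_ctbs - 1][width_ctbs - 1]


if __name__ == "__main__":
    # 1080p with 64x64 CTBs: 30 x 17 CTBs.
    print(wpp_completion_time(30, 17, 1))   # sequential
    print(wpp_completion_time(30, 17, 17))  # one thread per CTB row
```

On one thread the picture takes 510 steps; with one thread per row it takes 62, a speedup of only about 8.2× on 17 threads. Threads sit idle while the wavefront ramps up at the top of the picture and ramps down at the bottom, which is the idle time that OWF tries to fill.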


Overlapped Wavefront (OWF)

• When a thread has finished a CTB row in the current picture and no more rows of that picture are available, it can start processing the next picture instead of waiting for the current picture to finish.

• To support this approach, motion vectors are constrained vertically to ¼ of the picture height, which bounds how far down the reference picture a CTB row of the next picture can point.
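The ¼-height constraint translates into a concrete start condition: a CTB row of the next picture may begin once every reference row it can touch has been decoded. A minimal sketch, assuming 64×64 CTBs, integer-pel vertical motion, a 4-sample margin below the block for HEVC's 8-tap luma interpolation filter, a single reference picture, and no extra wait for deblocking or SAO of the reference rows; the function name is hypothetical.

```python
def ref_rows_needed(row: int, ctb_size: int, max_mv_down_px: int,
                    pic_height_px: int, filter_margin_px: int = 4) -> int:
    """Last CTB row of the reference picture that must be fully decoded
    before CTB row `row` of the current picture can start.

    Illustrative assumptions: integer-pel motion, a 4-sample margin for the
    8-tap luma interpolation filter, and no loop-filter delay.
    """
    last_row = (pic_height_px + ctb_size - 1) // ctb_size - 1
    # Lowest luma sample the row may read: its own bottom edge, moved down by
    # the largest allowed motion vector, plus the filter margin.
    lowest_px = (row + 1) * ctb_size - 1 + max_mv_down_px + filter_margin_px
    return min(lowest_px // ctb_size, last_row)


if __name__ == "__main__":
    height = 1080
    # OWF constraint: vertical motion limited to 1/4 of the picture height.
    print(ref_rows_needed(0, 64, height // 4, height))  # 5
    # Unconstrained motion: the first row may need the whole reference picture.
    print(ref_rows_needed(0, 64, height, height))       # 16
```

With the constraint, the first row of the next picture only waits for rows 0 to 5 (6 of 17) of the current picture instead of all of them, so a thread that runs out of rows can move ahead while the rest of the current picture is still being decoded.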



Coding efficiency analysis



Experimental evaluation

• Environment


Conclusions

• We present a detailed performance comparison of the main parallelization approaches: WPP, Tiles and OWF.

• At 12 cores, Tiles performs on average 7% better than WPP.

• The proposed OWF performs on average 28% better than Tiles.

• Real-time performance is achieved for 1080p50 video, but “only” 25.4 fps for 2160p.