Slides of a presentation at 3DTV-CON 2014.
Non-essentiality of Correlation between Image and Depth Map in Free Viewpoint Image Coding: Accurate Depth Map Case
Tomohiko Inoue, Norishige Fukushima, Yutaka Ishibashi
Graduate School of Engineering, Nagoya Institute of Technology, Japan
Outline
Background / Related Works / Purpose / Experimental Environment / Experimental Results / Conclusions and Future Works
Background (1/2)
Original images + depth maps → Depth Image Based Rendering (DIBR) → free viewpoint images
Images and their depth maps are huge; thus, effective coding is necessary.
Background (2/2)
Coding: depth map → 3D warping → free viewpoint image
Coding distortions deteriorate the quality of the view synthesis.
Goal: remove the distortions from the coded depth map.
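The 3D warping step can be sketched as a toy horizontal disparity shift. This is an illustrative assumption, not the authors' implementation: real DIBR uses camera parameters and hole filling, and the function name `warp_view` and the `shift_scale` factor are hypothetical stand-ins for the baseline/focal-length conversion.

```python
import numpy as np

def warp_view(img, depth, shift_scale=0.1):
    """Toy forward 3D warping: shift each pixel horizontally by a
    disparity proportional to its stored depth value (Middlebury-style
    depth maps store disparity-like values).  Near (large) values are
    drawn last so they occlude far ones; unfilled pixels are holes."""
    h, w = img.shape
    out = np.zeros_like(img)
    filled = np.zeros((h, w), dtype=bool)
    for idx in np.argsort(depth.ravel()):      # far-to-near drawing order
        y, x = divmod(int(idx), w)
        nx = x + int(round(shift_scale * depth[y, x]))
        if 0 <= nx < w:
            out[y, nx] = img[y, x]
            filled[y, nx] = True
    return out, filled

img = np.arange(12).reshape(3, 4)
flat, f1 = warp_view(img, np.zeros((3, 4)))        # zero disparity: identity
shifted, f2 = warp_view(img, np.full((3, 4), 10))  # uniform 1-px shift
```

The `filled` mask marks disocclusion holes, which is exactly where coding distortions in the depth map become visible in the synthesized view.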
Related Works (1/7)
Encoding flow:
Input → Pre-processing → Transformation → Quantization → Encode → Bit-stream → Decode → Inverse quantization → Inverse transformation → Post-processing → Output
Related Works (2/7)
Post filters
Non-joint filters: bilateral filter, boundary reconstruction filter, post filter set
Joint filters: joint bilateral filter, trilateral filter, weighted mode filter
Related Works (3/7) Post filter set
1. Median filter
- Removes spike noises.
- Intermediate values at the boundaries are left.
Coded depth map (spike noises, intermediate values) → median filter removes the spikes, but the intermediate values remain.
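A minimal NumPy sketch of this first step; `median3x3` is a hypothetical name, not the authors' code:

```python
import numpy as np

def median3x3(img):
    """Naive 3x3 median filter with edge replication (illustration only)."""
    padded = np.pad(img, 1, mode='edge')
    out = np.empty_like(img)
    for y in range(img.shape[0]):
        for x in range(img.shape[1]):
            out[y, x] = np.median(padded[y:y + 3, x:x + 3])
    return out

depth = np.full((7, 7), 100, dtype=np.uint8)
depth[3, 3] = 255                  # spike noise introduced by coding
filtered = median3x3(depth)        # spike replaced by the local median

# An intermediate boundary value is NOT removed by the median:
blurred = np.tile(np.array([0, 0, 0, 128, 255, 255, 255], dtype=np.uint8),
                  (5, 1))
still_blurred = median3x3(blurred)  # the 128 column survives
```

The second example shows why the next stage of the post filter set is needed.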
Related Works (4/7) Post filter set
2. Min-max blur remove filter
- Replaces blurred pixels with the min- or max-filtered value.
Coded depth map after the median filter: intermediate values are left → the min-max blur remove filter snaps them back to a side.
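A sketch of this second step, assuming the replacement rule is "the nearer of the local min and max" (our reading of the filter, not the authors' code; `minmax_blur_remove` and `r` are hypothetical):

```python
import numpy as np

def minmax_blur_remove(img, r=1):
    """Snap each pixel to the nearer of the local min and max, removing
    intermediate (blurred) values left on depth edges.  The 'nearer of
    min/max' rule is an assumption about the filter's selection step."""
    padded = np.pad(img.astype(np.int32), r, mode='edge')
    out = np.empty_like(img)
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            win = padded[y:y + 2 * r + 1, x:x + 2 * r + 1]
            lo, hi = int(win.min()), int(win.max())
            v = int(img[y, x])
            out[y, x] = lo if v - lo <= hi - v else hi
    return out

depth = np.tile(np.array([0, 0, 0, 128, 255, 255, 255]), (5, 1))
sharp = minmax_blur_remove(depth)   # 128 snaps to the nearer extreme (255)
```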
Related Works (5/7) Post filter set
3. Binary weighted range filter
- A simplified version of the bilateral filter.
- A hard-thresholding filter: the kernel weight is binarized to 0/1 by a threshold on the range (depth) difference, e.g.
1 1 1 1 1
1 1 1 1 1
1 1 1 1 0
1 1 1 0 0
1 1 0 0 0
Adaptive 0/1 kernel weights by thresholding, then filtering.
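A minimal sketch of this third step, assuming the weight-1 neighbors are averaged; the aggregation rule, `binary_range_filter` name, and parameter values are illustrative assumptions:

```python
import numpy as np

def binary_range_filter(img, r=2, thresh=30):
    """Hard-thresholding simplification of the bilateral filter: the
    range kernel is binarized to 0/1 by |I(p) - I(q)| < thresh, and
    only weight-1 neighbors are averaged (mean aggregation is an
    assumption here)."""
    padded = np.pad(img.astype(np.float64), r, mode='edge')
    out = np.empty_like(img)
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            win = padded[y:y + 2 * r + 1, x:x + 2 * r + 1]
            mask = np.abs(win - img[y, x]) < thresh   # binary 0/1 kernel
            out[y, x] = win[mask].mean()
    return out

depth = np.tile(np.array([0, 0, 0, 255, 255]), (5, 1))
out = binary_range_filter(depth)   # smooths within sides, keeps the edge
```

Because the kernel is binary, pixels across the depth edge get weight 0 and the sharp boundary is preserved.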
Related Works (6/7) Weighted mode filter
Weighted mode filter (joint filter): the filter accumulates weighted depth values into a localized histogram, where each weight is defined by the spatial distance and by the nearness of the depth and color values, and then obtains the global mode value of the histogram (frequency over depth value).
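The description above can be sketched as follows, assuming Gaussian spatial and color weights; the function name, sigma values, and weight forms are illustrative assumptions, not the published filter:

```python
import numpy as np

def weighted_mode_filter(depth, guide, r=2, sigma_s=2.0, sigma_c=10.0):
    """Accumulate neighbors' depth values into a per-pixel histogram,
    weighted by spatial distance and guide-image (color) similarity,
    and output the histogram's global mode.  Gaussian weights and the
    sigma values are illustrative assumptions."""
    h, w = depth.shape
    pad_d = np.pad(depth, r, mode='edge')
    pad_g = np.pad(guide.astype(np.float64), r, mode='edge')
    dy, dx = np.mgrid[-r:r + 1, -r:r + 1]
    w_s = np.exp(-(dy**2 + dx**2) / (2 * sigma_s**2))   # spatial weight
    out = np.empty_like(depth)
    for y in range(h):
        for x in range(w):
            d_win = pad_d[y:y + 2 * r + 1, x:x + 2 * r + 1]
            g_win = pad_g[y:y + 2 * r + 1, x:x + 2 * r + 1]
            w_c = np.exp(-(g_win - guide[y, x])**2 / (2 * sigma_c**2))
            hist = np.bincount(d_win.ravel(),
                               weights=(w_s * w_c).ravel(), minlength=256)
            out[y, x] = hist.argmax()   # global mode of weighted histogram
    return out

guide = np.tile(np.array([0, 0, 0, 255, 255]), (5, 1))   # edge-aligned RGB
depth = np.tile(np.array([10, 10, 10, 200, 200]), (5, 1))
depth[2, 2] = 100                        # coding error near the boundary
restored = weighted_mode_filter(depth, guide)
```

Because the color term down-weights neighbors across the guide edge, the erroneous pixel is pulled back to the mode of its own side.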
Related Works (7/7)
[Figure: signals of depth and image (input, coded, post filtered) for the non-edge-aligned case (conventional); joint vs. non-joint filtering of the RGB image and depth map.]
Purpose
Edge-aligned case: this presentation.
Which type of filter should we use?
[Figure: signals of depth and image (input, coded) for the RGB image and depth map.]
Experimental Environment (1/3)
Input datasets: Art, Bowling1, Cloth1, Books, Reindeer, Wood1*1 (left-right images and depth maps). We tested 30 sequences and picked representatives.
View synthesis: an approximate version of the alpha-matting-based view synthesis*2.
Pipeline: Image / Depth map → Max filter (3×3) → Encode & Decode → Post filtering → View synthesis
*1 D. Scharstein and C. Pal, in Proc. CVPR, June 2007.
*2 X. Xu et al., SPIC, vol. 28, issue 9, pp. 1023-1045, Oct. 2013.
Experimental Environment (2/3)
Encode, Decode: JPEG, JPEG 2000, H.264/AVC, using the same codec for image-and-depth pairs.
Post filters: post filter set; weighted mode filter; post filter set + weighted mode filter; weighted mode filter (reference: the depth map itself).
Experimental Environment (3/3)
View synthesis: synthesized view at the center viewpoint between the two reference views.
For evaluation, we compare the synthesized views using the Peak Signal-to-Noise Ratio (PSNR) of the Y channel.
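The evaluation metric is standard; a self-contained sketch (`psnr_y` is a hypothetical name):

```python
import numpy as np

def psnr_y(ref, syn, peak=255.0):
    """PSNR in dB between the Y channels of a reference view and a
    synthesized view (higher is better)."""
    mse = np.mean((ref.astype(np.float64) - syn.astype(np.float64)) ** 2)
    return float('inf') if mse == 0.0 else 10.0 * np.log10(peak**2 / mse)

ref = np.zeros((4, 4), dtype=np.uint8)
p_same = psnr_y(ref, ref)                                    # identical views
p_worst = psnr_y(ref, np.full((4, 4), 255, dtype=np.uint8))  # maximal error
```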
Experimental results (1/5)
Post filter set Weighted mode filter
Art, H.264/AVC(RGB-image QP=32, Depth map QP=32)
[Chart: bit per pixel (bpp) vs. PSNR of the synthesized view (dB), 25-35 dB range]
Experimental results (2/5)
bpp vs PSNR of synthesized view (Art, H.264/AVC)
RGB-image QP=32
RGB-image QP=26
RGB-image QP=20
RGB-image QP =41
Depth map QP=41,32,26,20
[Chart: bit per pixel (bpp) vs. PSNR of the synthesized view (dB), 30-40 dB range]
Experimental results (3/5)
bpp vs PSNR of synthesized view (Bowling1, H.264/AVC)
RGB-image QP=32
RGB-image QP=26
RGB-image QP=20
RGB-image QP =41
[Chart: bit per pixel (bpp) vs. PSNR of the synthesized view (dB), 26-42 dB range]
Experimental results (4/5)
bpp vs PSNR of synthesized view (Cloth1, H.264/AVC)
Experimental results (5/5)
[Chart: bit per pixel (bpp) vs. PSNR of the synthesized view (dB), 25-35 dB range]
bpp vs PSNR of synthesized view (Art, JPEG)
RGB-image QP=10, QP=35, QP=60
Conclusions
We showed the non-essentiality of using the correlation between an image and a depth map for DIBR, in particular for the case of a highly accurate depth map.
Across various image codecs (JPEG, JPEG 2000, H.264/AVC), the post filter set is the best.
Future works
We will use estimated depth maps of high accuracy to verify the result.
We will apply R-D optimization to improve the coding performance of actual codecs and to reveal the optimal bit allocation between images and depth maps.