UNSW – EE&T
Scalable Media Coding: Applications and New Directions
David Taubman
School of Electrical Engineering & Telecommunications, UNSW Australia
Taubman; PCS’13, San Jose 2
Overview of Talk
• Backdrop and Perspective
  – interactive remote browsing of media
  – JPEG2000, JPIP and emerging applications
• Approaches to scalable video
  – fully embedded approaches based on wavelet lifting
  – partially embedded approaches based on inter-layer prediction
  – important things to appreciate
• Key challenges for scalable video and other media
  – focus on motion, depth and natural models
• Early work on motion for scalable coding
  – focus: reduce artificial boundaries
• Current focus: everything is imagery
  – depth, motion, boundary geometry
  – scalable compression of breakpoints (sparse innovations)
  – estimation algorithms for joint discovery of depth, motion and geometry
  – temporal flows based on breakpoints
• Summary
Emerging Trends
• Video formats
  – QCIF (25 Kpel), CIF (100 Kpel), 4CIF/SDTV (½ Mpel), HDTV (2 Mpel), UHDTV 4K (10 Mpel) and 8K (32 Mpel)
  – Cinema: 24/48/60 fps; UHDTV: potentially up to 120 fps
  – HFR: helmet cams & professional cameras now do 240 fps
• Displays
  – “retina” resolutions (200 to 400 pixels/inch)
  – what resolution video do I need for an iPad? (2048x1536?)
• Internet and mobile devices
  – vast majority of internet traffic is video
  – 100’s of separate non-scalable encodings of a video
    • http://gigaom.com/2012/12/18/netflix-encoding
• New media: multi-view video, depth, …
Embedded media
• One code-stream, many subsets of interest
[Figure: a single compressed bit-stream containing embedded subsets at low, medium and high resolution; low, medium and high quality; and low, medium and high frame rate]
• Heterogeneous clients (classic application)
  – match client bandwidth, display, computation, …
• Graceful degradation (another classic)
  – more important subsets protected more heavily, …
• Robust offloading of sensitive media
  – backup most important elements first
Our focus: Interactive Browsing
• Image quality progressively improves over time
• Video quality improves each time we go back
• Region (window) of interest accessibility
  – in space, in time, across views, …
  – can yield enormous effective compression gains
  – prioritized streaming based on “degree of relevance”
    • some elements contribute only partially to the window of interest
• Related application: Early retrieval of surveillance
  – can’t send everything over the air
  – can’t wait until the plane lands
Scalable images – things that work well
• Multi-resolution transforms
  – 2D wavelet transforms work well
• Accessibility through partitioned coding of subbands
  – Region of interest access without any blocking artefacts
• Embedded coding
  – Successive refinement through bit-plane coding
  – Multiple coding passes/bit-plane improve embedding
[Figure: rate-distortion plot (axes L and D) comparing the ECDZQ R-D curve (step size modulation) with bit-plane coding (truncation), using 2 coding passes per bit-plane]
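The successive-refinement idea behind bit-plane coding can be sketched in a few lines. This is a minimal illustration only, assuming integer coefficients, raw (uncoded) bit-planes and mid-point reconstruction; it omits the entropy coding and the multiple coding passes per bit-plane that EBCOT uses, and the function names are hypothetical.

```python
def bitplane_encode(coeffs, num_planes):
    """Split integer coefficients into signs and magnitude bit-planes, MSB first."""
    signs = [c < 0 for c in coeffs]
    mags = [abs(c) for c in coeffs]
    planes = [[(m >> p) & 1 for m in mags]
              for p in range(num_planes - 1, -1, -1)]
    return signs, planes


def bitplane_decode(signs, planes, num_planes, kept):
    """Reconstruct from only the first `kept` bit-planes (a truncated stream)."""
    mags = [0] * len(signs)
    for i, plane in enumerate(planes[:kept]):
        p = num_planes - 1 - i          # bit position carried by this plane
        for j, bit in enumerate(plane):
            mags[j] |= bit << p
    dropped = num_planes - kept
    out = []
    for neg, m in zip(signs, mags):
        if m != 0 and dropped > 0:
            m += 1 << (dropped - 1)     # mid-point of the remaining uncertainty interval
        out.append(-m if neg else m)
    return out
```

Decoding with more planes can only tighten the error bound: with `d` planes dropped, every reconstructed value lies within `2**d` of the original, so truncating the stream earlier gives a coarser but still valid picture.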
JPEG2000 – more than compression
Decoupling and embedding
[Figure: subbands LL2, HL2, LH2, HH2, HL1, LH1 and HH1 partitioned into code-blocks, each with its own embedded code-block bit-stream]

JPEG2000 – more than compression
Spatial random access

JPEG2000 – more than compression
Quality and resolution scalability
[Figure: subbands LL2, HL2, LH2, HH2, HL1, LH1 and HH1 with code-block contributions grouped into quality layers (layer 1, layer 2, layer 3)]
JPEG2000 – JPIP interactivity (IS15444-9)
• Client sends “window requests”
  – spatial region, resolution, components, …
• Server sends “JPIP stream” messages
  – self-describing, arbitrarily ordered
  – pre-emptable, server optimized data stream
• Server typically models client cache
  – avoids redundant transmission
• JPIP also does metadata
  – scalability & accessibility for text, regions, XML, …
[Figure: JPIP architecture. The application passes a window to the JPIP client and receives window status and decompressed/rendered imagery from the client cache; the JPIP client sends a window request to the JPIP server, which holds the target (file or code-stream) and a cache model, and returns the JPIP stream plus response headers]
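The cache-model behaviour described above can be illustrated with a toy server-side record of what each client already holds. This is a sketch of the idea only, not the JPIP wire format: the class name `ToyCacheModel`, the bin identifiers, the byte budget and the message tuple layout are all hypothetical.

```python
class ToyCacheModel:
    """Hypothetical server-side model of how many bytes of each data-bin a client holds."""

    def __init__(self):
        self.sent = {}  # bin id -> number of bytes already delivered

    def increments(self, relevant_bins, available, budget):
        """Return (bin_id, offset, length) messages for the bins relevant to a window.

        Each message is self-describing, so messages could be sent in any order.
        Only bytes the client does not yet hold are scheduled.
        """
        messages = []
        for bin_id in relevant_bins:
            have = self.sent.get(bin_id, 0)
            want = available[bin_id] - have
            if want <= 0 or budget <= 0:
                continue                      # nothing new to send, or budget exhausted
            n = min(want, budget)
            messages.append((bin_id, have, n))
            self.sent[bin_id] = have + n      # update the model of the client cache
            budget -= n
        return messages
```

A second request whose window overlaps the first then receives only the missing increments, which is where the "avoids redundant transmission" property comes from.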
What can you do with JPIP?
• Highly efficient interactive navigation within
  – large images (giga-pixel, even tera-pixel)
  – medical volumes
  – virtual microscopy
  – window of interest access, progressive to lossless
  – interactive metadata
• Interactive video
  – frame of interest
  – region of interest
  – frame rate and resolution of interest
  – quality improves each time we go back over content
Demos: Panoramic Video, Aerial, Album, Catscan, Campus
Wavelet-like approaches
• Lifting factorization exists for any FIR temporal wavelet transform
• Warp frames prior to each lifting filter j
  – motion compensation aligns frame features
  – invertibility unaffected (any motion model)
[Figure: Motion Compensated Temporal Lifting. Even frames and odd frames pass through lifting steps 1, 2, …, L-1, L, with frames warped before each lifting filter, producing low-pass frames and high-pass frames]
Motion Compensated Temporal Lifting
• Equiv. to filtering along motion trajectories
  – subject to good MC interpolation filters
  – subject to existence of 2D motion trajectories
• Handles expansive/contractive motion flows
  – motion trajectory filtering interpretation still valid for all but highest spatial freqs
  – subject to disciplined warping to minimize aliasing effects
• Need good motion!
  – smooth, bijective mappings wherever possible
  – avoid unnecessary motion discontinuities (e.g., blocks)
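The claim that invertibility is unaffected by the motion model can be checked with a small sketch. The assumptions here are illustrative: frames are 1D integer lists, there is a single predict step and a single update step, and `warp` is a deliberately non-bijective stand-in (a clamped shift) rather than a real motion-compensated interpolator. All function names are hypothetical.

```python
def warp(frame, shift):
    """Stand-in warp: shift with edge clamping (not a bijective mapping)."""
    n = len(frame)
    return [frame[min(max(i + shift, 0), n - 1)] for i in range(n)]


def mc_lift_forward(even, odd, shift):
    """One predict step and one update step of motion-compensated lifting."""
    pred = warp(even, shift)
    high = [o - p for o, p in zip(odd, pred)]        # predict: residual of the odd frame
    upd = warp(high, -shift)
    low = [e + u // 2 for e, u in zip(even, upd)]    # update: integer-rounded half of residual
    return low, high


def mc_lift_inverse(low, high, shift):
    """Undo the lifting steps in reverse order, reusing exactly the same warps."""
    upd = warp(high, -shift)
    even = [l - u // 2 for l, u in zip(low, upd)]
    pred = warp(even, shift)
    odd = [h + p for h, p in zip(high, pred)]
    return even, odd
```

Because each lifting step only adds a function of the other channel, the inverse subtracts the same quantity, so reconstruction is exact even though the warp itself cannot be inverted. How well the transform compacts energy, by contrast, depends heavily on the quality of the motion.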
t+2D Structure with MC Lifting
[Figure: t+2D analysis and t+2D synthesis. Input frames pass through MC lifting stages to give temporal subbands H1, H2 and L2; each is followed by a spatial DWT to form spatio-temporal subbands; synthesis mirrors the analysis]
• Dimensions of scalability: spatial resolution, temporal resolution, quality
• NB: other interesting structures exist
Inter-layer prediction (SVC/SHEVC)
[Figure: a high resolution layer and a low resolution layer, each showing frames 0 to 4 at three temporal levels (0 1 2 3 4; 0 2 4; 0 4)]
Multiple predictors
• pick one?
• blend?
Implications for embedding …
Important Principles
• Scalability is a multi-dimensional phenomenon
  – not just a sequential list of layers
• Quality scalability critical for embedded schemes
  – low res subset needs much higher quality at high res
• Physical motion scales naturally
  – not generally true for block-based approximations
  – but, 2D motion discontinuous at boundaries
• Prediction alone is sub-optimal
  – full spatio-temporal transforms preferred
    • improve the quality of embedded resolutions and frame rates
    • noise orthogonalisation
  – ideally orthogonal transforms
    • note body of work by Flierl et al. (MMSP’06 thru PCS’13)
Temporal transforms: Why prediction alone is sub-optimal
[Figure: forward transform with bi-directional prediction of odd frames f[2k+1] from even frames f[2k] and f[2k+2] (weights -½, -½), giving the residual y[2k+1]; quantization of the low and high channels; the reverse transform; the equivalent filter bank H0, H1, G0, G1 with factor-2 sampling; and synthesis responses ĝ0(ω) and ĝ1(ω)]
• Redundant spanning of low-pass content by both channels
• High-pass quantization noise has unnecessarily high energy gain.
Reduced noise power through lifting
• Inject –ve fraction of high band into low band synthesis path
  – removes low freq. noise power from synthesized high band
• Add compensating step in the forward transform
  – does not affect energy compacting properties of prediction
[Figure: forward and reverse lifting structures on even frames f[2k], f[2k+2] and odd frames f[2k+1], with predict weights ½ and update weights ¼ producing y[2k] and y[2k+1]; quantization noise enters the low and high bands; synthesis responses ĝ0(ω) and ĝ1(ω)]
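Ignoring motion warping, the predict and update steps described here can be written out explicitly. The form below assumes the standard 5/3 lifting weights, which the ½ and ¼ labels in the slide's figure appear to indicate; the indexing of the update neighbours is that assumption rather than something legible in the source.

```latex
\begin{aligned}
y_{2k+1} &= f_{2k+1} - \tfrac{1}{2}\left(f_{2k} + f_{2k+2}\right) && \text{(predict)}\\
y_{2k}   &= f_{2k} + \tfrac{1}{4}\left(y_{2k-1} + y_{2k+1}\right) && \text{(update: compensating forward step)}\\
f_{2k}   &= y_{2k} - \tfrac{1}{4}\left(y_{2k-1} + y_{2k+1}\right) && \text{(synthesis: negative fraction of high band)}\\
f_{2k+1} &= y_{2k+1} + \tfrac{1}{2}\left(f_{2k} + f_{2k+2}\right) && \text{(synthesis: undo predict)}
\end{aligned}
```

The third line is the "negative fraction of the high band" injected into the low-band synthesis path; the second line is the compensating step that keeps the transform invertible.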
Motion for Scalable Video
• Fully scalable video requires scalable motion
  – reduce motion bit-rate as video quality reduces
  – reduce motion resolution as video resolution reduces
• First demonstration (Taubman & Secker, 2003)
  – 16x16 triangular mesh motion model
  – Wavelet transform of mesh node vectors
  – EBCOT coding of mesh subbands
  – Model-based allocation of motion bits to quality layers
  – Pure t+2D motion-compensated temporal lifting
Scalable motion – very early results
[Figure: luminance PSNR (dB) versus bit-rate (kbit/s) for CIF Bus and for CIF Bus at QCIF resolution; curves: non-scalable, brute-force, model-based, lossless motion]
[Figure: luminance PSNR (dB) versus bit-rate (Mbit/s) for CIF Mobile; curves: 5/3 temporal lifting, 1/3 (hierarchical B-frames), H.264 high complexity]
H.264 results
• CABAC
• 5 prev, 3 future ref frames
• multi-hypothesis testing
• (courtesy of Marcus Flierl)
Motion challenges
• Issues:
  – smooth motion fields scale well
    • mesh is guaranteed to be smooth and invertible everywhere
  – but, real motion fields have discontinuities
• Hierarchical block-based schemes
  – produce a massive number of artificial discontinuities
  – not invertible – i.e., there are no motion trajectories
  – non-physical – hence, not easy to scale
  – but, easy to optimize for energy compaction
    • particularly effective at lower bit-rates
• Depth/disparity has all the same issues as motion
  – both tend to be piecewise smooth media
  – NB: bandlimited sampling considerations may not apply
    • boundary discontinuity modeling more important for these media
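Aliasing challenges
Analysis filter responses of the popular 9/7 wavelet transform
Fundamental constraint (for perfect reconstruction):
[Equation: a constraint on products of the analysis responses ĥ0 and ĥ1, equal to 1, labelled "half-band filter"; the exact form is not recoverable from the source text]
[Figure: responses ĥ0(ω) and ĥ1(ω), with π/2 marked on the frequency axis; extracting the LL subband introduces spatial aliasing]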
Spatial scalability – t+2D
• Temporal transform uses full spatial resolution
[Figure: temporal predict and temporal update steps]
• At reduced spatial resolution
  – Temporal synthesis steps missing high-resolution info
  – If motion trajectories wrong/non-physical → ghosting
  – If trajectories valid → temporal synthesis reduces aliasing
    • less aliasing than regular spatial DWT at reduced resolution
Block-based schemes with merging
• Linear & affine models
  – encourages larger blocks
• Merging of quad-tree nodes
  – encourages larger regions and improves efficiency
  – merging approach later picked up by the HEVC standard
• Hierarchical coding
  – works very well with merging; provides resolution scalability
[Figure: quad-tree partitions with leaf merging for pure translation, linear motion model and affine motion model (Mathew and Taubman, 2006)]
Boundary geometry and merging
• Model motion & boundary
• No merging
  – Hung et al. (2006)
  – Escoda et al. (2007)
• With merging
  – Mathew & Taubman (2007)
  – separate quad-trees (2008)
[Figure: a block divided by a boundary into regions with motions M1 and M2; motion compensation with M1 and with M2 gives the combined result]
[Figure: examples for motion only, motion + boundary, and separate quad-trees]
Indicative Performance
• Things that reduce artificial discontinuities:
  – modeling geometry as well as motion
  – separately pruned trees for geometry and motion
  – merging nodes from the pruned quad-trees
• These schemes are practical and resolution scalable
  – readily optimized across the hierarchy
[Figure: comparison of single quad-tree (motion only); single quad-tree (motion + merging); single quad-tree (motion, geometry, merging); two quad-trees (motion, geometry, merging)]
Geometry from arc breakpoints
• Breaks can represent segmentation contours
  – but don’t need a segmentation
  – breaks are all we need for adaptive transforms
  – direct RD optimisation is possible
• Natural resolution scalability
  – finer resolution arcs embedded in coarser arcs
[Figure: coarse grid and finer grid, showing grid points and arcs]
Breakpoint induction and scalability
• Explicitly identify subset of breakpoints – “vertices”
  – remaining breakpoints get induced
  – position of break on its arc impacts induction
• Representation is fully scalable
  – resolution improves as we add vertices at finer scales
  – quality improves as we add precision to breakpoint positions
• Breakpoint representation is image-like
  – vertex density closely related to image resolution
  – vertex accuracy like sample accuracy in an image hierarchy
[Figure: coarse grid and finer grid, showing direct induction onto sub-arcs and spatial induction onto root-arcs]
Breakpoints drive an adaptive DWT
• Basis functions do not cross discontinuity along an arc
• Max of one breakpoint per arc; adaptive transform well defined
• Breakpoint Adaptive Transforms (BPA-DWT) – sequence of non-separable 2D lifting steps
[Figure: arc breakpoint pyramid and field sample pyramid, built from the original field samples through the P1, U1, P2 and U2 lifting steps]
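The idea that basis functions should not cross a discontinuity can be illustrated with a 1D predict step. This is a simplified sketch only: the transform described in the talk is a sequence of non-separable 2D lifting steps and uses the position of each break on its arc, whereas here a break is just a flag on the arc between two neighbouring samples. The function name and the `breaks` set are hypothetical.

```python
def bpa_predict(x, breaks):
    """Predict odd-indexed samples from even neighbours without crossing breaks.

    `breaks` holds indices i such that the arc joining samples i and i+1
    contains a breakpoint. Returns {odd index: prediction residual}.
    """
    detail = {}
    for i in range(1, len(x), 2):
        nbrs = []
        if (i - 1) not in breaks:                 # arc to the left neighbour is unbroken
            nbrs.append(x[i - 1])
        if i + 1 < len(x) and i not in breaks:    # arc to the right neighbour is unbroken
            nbrs.append(x[i + 1])
        pred = sum(nbrs) / len(nbrs) if nbrs else 0.0
        detail[i] = x[i] - pred
    return detail
```

For a step edge such as `[10, 10, 10, 50, 50, 50, 50]` with a break on arc 2, all residuals are zero, whereas the non-adaptive predictor leaves a residual of 20 at the edge. This is the kind of saving that makes the approach attractive for piecewise-smooth depth and motion fields.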
Arc-bands and sub-bands
[Figure: arc-bands (vertices) and sub-bands (BPA-transformed data) at levels 0, 1 and 2, divided into code-blocks; root arcs and non-root arcs are marked]
Embedded Block Coding
– for scalability and ROI accessibility
• Sub-band stream (field samples)
  – Sub-bands divided into code blocks
  – Coded using EBCOT (JPEG2000)
  – Bitplanes assigned to quality layers
• Vertex Stream
  – Arc-bands divided into code blocks
  – Coding scheme similar to EBCOT
  – Bitplanes refine vertex locations
  – Bitplanes assigned to quality layers
[Figure: bit-planes (down to the LSB) of code blocks in the vertex stream and the sub-band stream, with quality layer boundaries labelled λ1, λ2, λ3]
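Assigning bit-planes to quality layers can be viewed as picking, for each code block, a truncation point for each of a decreasing sequence of Lagrangian thresholds. The sketch below assumes each block exposes a list of (rate, distortion) truncation points and treats λ1, λ2, λ3 as such thresholds; that reading of the figure, the function names and the data are all illustrative, and this is not the rate-control code of any particular codec.

```python
def truncation_point(points, lam):
    """Index of the (rate, distortion) pair minimising D + lam * R.

    Ties are broken in favour of the lower rate.
    """
    return min(range(len(points)),
               key=lambda i: (points[i][1] + lam * points[i][0], points[i][0]))


def form_layers(blocks, lambdas):
    """For each threshold (largest first), record every block's truncation index."""
    return [[truncation_point(points, lam) for points in blocks]
            for lam in lambdas]
```

With thresholds listed from largest to smallest, each block's truncation index never decreases from one layer to the next, so every layer adds an increment to what the previous layers already contained. The same mechanism applies independently to blocks of the vertex stream and of the sub-band stream.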
Fully Scalable Depth Coding
– field samples are depth; breaks are depth discontinuities
JPEG 2000, 50 k bits
• Resolution scalable
• Quality scalable
• No blocks
• Poorly suited to discontinuities in depth/motion fields
Proposed, 50 k bits
• Resolution scalable
• Quality scalable
• No blocks
• Well suited to discontinuous depth/motion fields
R-D Results for Depth Coding
Breakpoint-adaptive: vertex stream is truncated at a quality level; sub-band stream decoded progressively
Model-based quality layering
• Scaled by discarding sub-band and arc-band quality layers
  – fully automatic model-based quality layer formation
  – model-based interleaving of all quality layers for optimal embedding

Model-based quality layering
• Compared with segmentation based approach (Zanuttigh & Cortelazzo, 2009)
  – not scalable; sensitive to initial choice of segmentation complexity
Fully scalable motion coding – preliminary
• Field samples are motion vectors
  – motion coded using EBCOT after BPA-DWT
• Breakpoints and motion jointly estimated
  – compression-regularised optical flow: [equation not preserved in the source text]
  – Coded length L provides the prior (regularisation)
    • reflects cost of scalably coding arc breakpoints
    • plus cost of coding motion wavelet coeffs after BPA-DWT
  – Distortion D provides the observation model
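The roles given to L (prior) and D (observation model) are consistent with a Lagrangian objective. One plausible way to write it is shown below. The form is an assumption, not the equation from the talk: B denotes the breakpoint field, M the motion field, and λ a hypothetical trade-off weight.

```latex
(\hat{B}, \hat{M}) \;=\; \arg\min_{B,\,M} \; D(M) \;+\; \lambda \, L(B, M)
```

Read this way, L(B, M) would collect the cost of scalably coding the arc breakpoints plus the cost of coding the motion wavelet coefficients after the BPA-DWT, and D(M) the motion-compensated prediction error.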
Optimisation framework
• Approach based on loopy belief propagation
• Breakpoints/vertices connected via inducing rules
  – turns out to be very efficient
    • complexity grows only linearly with image size
• Breakpoints/motion connected via BPA-DWT
  – initial simplification: motion at each pixel location drawn from a small finite set (cardinality 5)
  – initial motion candidates generated by block search with varying block sizes
• System always converges rapidly
• Use modes of the marginal beliefs at each node
Initial scalable motion results
• 2 frames, inter-predict only; occlusions removed
  – newer work eliminates restriction to finite motion set
[Figure: results for JPEG2000, H.264, truncated motion bits, and truncated vertex and motion bits]

Visualising the breaks & motion
Temporal induction of motion – preliminary
• Given:
  – motion from f1 to f2
  – plus breakpoint fields in f1 and f2
• Induce:
  – motion from f2 to f1
  – resolve ambiguities, find occlusions
[Figure: frames f1 and f2]
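The basic mechanics of inducing reverse motion can be sketched in 1D with integer displacements. This sketch covers only the bookkeeping: it flags locations where different sources collide (ambiguities) and locations with no source (candidates for occlusion handling), but it does not use breakpoint fields to resolve them, which is the substance of the approach described here. The function name and the keep-first rule are hypothetical.

```python
def invert_motion(forward):
    """Map motion f1 -> f2 onto the f2 grid, giving candidate motion f2 -> f1.

    Returns (backward, ambiguous). backward[y] is None where no f1 sample
    lands on y; ambiguous[y] is True where sources with different motion collide.
    """
    n = len(forward)
    backward = [None] * n
    ambiguous = [False] * n
    for x, m in enumerate(forward):
        y = x + m                       # where sample x of f1 lands in f2
        if not 0 <= y < n:
            continue                    # moved out of the frame
        if backward[y] is None:
            backward[y] = -m
        elif backward[y] != -m:
            ambiguous[y] = True         # conflicting sources; first one is kept here
    return backward, ambiguous
```

For `forward = [0, 0, 2, 2, 0, 0]`, an object two samples wide moves right over a static background: locations 2 and 3 of f2 receive no source, and locations 4 and 5 are flagged as ambiguous.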
Synthetic example:
[Figure: inferred horizontal motion, inferred vertical motion, inferred occlusion, resolved ambiguities]
Temporal induction of breaks – preliminary
• Given:
  – motion from frame f1 to frame f3
  – plus breakpoint fields in f1 and f3
  – plus motion from f1 to f2 (real or estimated)
• Induce:
  – breakpoint field in frame f2
  – hence estimate all other motion fields
  – note: induced occlusions are temporal breaks
[Figure: frames f1, f2 and f3]
Synthetic example:
[Figure: horizontal breaks in frame 1, induced horizontal breaks in frame 2, horizontal breaks in frame 3]
Things we are working on
• Message approximation strategies for joint inference of breakpoint fields with motion
  – compression regularized optical flow
  – continuous motion, notionally at infinite precision
• Advanced breakpoint coding techniques
  – including differential coding schemes
    • i.e., geometry transforms
  – including inter-block conditional coding schemes that do not break the EBCOT paradigm
• Breakpoint adaptive spatio-temporal transforms
  – breakpoints potentially address most of the open issues surrounding motion compensated lifting schemes
• Integration within our JPEG2000 SDK (Kakadu)
  – interactive access, JPIP for remote browsing, etc.
Demo placeholder
• Interactive browsing of breakpoint media over JPIP
Arc breakpoints vs graph edges
• arcs = possible edges on spatial graph
  – breaks = missing edges
  – main point of divergence
    • arc breakpoints have locations (key to induction & scalability)
    • graph edges may be weighted (equivalent to “soft” breaks)
• Related approaches:
  “Graph based transforms for depth video coding”, Kim, Narang and Ortega, ICASSP’12
  “Video coder based on lifting transforms on graphs”, Martinez-Enriquez, Diaz-de-Maria & Ortega, ICIP’11
  – Edges (binary breaks) from segmentation (JBIG encoded)
  – Motion (block based) produces temporal edges
  – Spatio-temporal video transform adapts to graph
  – hierarchical, but not scalable in the usual sense
Summary
• Interactive image browsing benefits from
  – embedded transforms and embedded block coding
  – intelligent servers send only a fraction of total content
  – getting only what you want ~ huge effective coding gain
• Challenges for interactive video
  – fully embedded motion fields
  – bijective motion fields wherever this is physical
• Key: fully scalable model of motion discontinuities
  – allows motion to be invertible almost everywhere
  – ideal for motion compensated 3D wavelet xforms
  – allows motion to scale naturally in space and time
• Numerous research challenges …
Important Acknowledgements
• Reji Mathew
  – scalable motion, scalable depth & breakpoint estimation
• Sean Young
  – scalable motion coding results
• Dominic Rüfenacht
  – temporal induction of motion and breakpoints
• Pietro Zanuttigh
  – breakpoint estimation for depth coding
• Australian Research Council
  – much of the presented research was partially funded by the ARC Discovery Projects scheme