Upload
jane-parks
View
215
Download
0
Embed Size (px)
Citation preview
Unwrap Mosaics: A new representation for video editing
Alex Rav-Acha et al.In SIGGRAPH 2008
발표 이성호2009 년 1 월 22 일
2
Abstract
• A new representation for video• Modeling the image-formation process
– From an object’s texture map to the image• Modulated by an object-space occlusion mask
– Recover “unwrap mosaic”
• Editing• Re-composition
– Into the original space– Resizing objects– Repainting textures– Copying/cutting/pasting objects– Attaching effects layers to deforming objects
3
Introduction
• Unwrap mosaics– A wide range of Editing– Deforming surface– Self-occlusion– Provides references for feature-point
tracking
4
Reconstructiong 3D model
• Not easy• Software packages– [2d3 Ltd. 2008; Thorm¨ahlen and
Broszio 2008]
• Extensions to nonrigid scenes– [Bregler et al. 2000; Brand 2001;
Torresani et al. 2008]
• Less reliable under occlusion
5
Dense reconstruction from video
• Restricted to rigid scenes – [Seitz et al. 2006]
• Using interactive tools– [Debevec et al. 1996; van den Hengel et al. 2007]– Also for rigid scenes
• Triangulation of the sparse points from nonrigid structure– Surprisingly troublesome
6
Unwrap mosaics
• Ro recover the object’s texture map– Rather than its 3D shape– Directly from video
• recovered texture map will be a– 2D-to-2D mapping– Sequence of binary masks modeling oc-
clusion– Unwrap mosaic
7
• A video will typically be represented– by an assembly of several unwrap mosaics
• one per object,• and one for the background.
• Edits– performed on mosaic itself– Without converting to 3D
• Main contribution– Recover the unwrap mosaic from images
• Energy minimization procedure
8
• Figure 2: Reconstruction overview. Steps in constructing an unwrap mosaic rep-resentation of a video sequence. Steps 1 to 3a form an initial estimate of the model parameters, and step 3 is an energy minimization procedure which re-fines the model to subpixel accuracy.
9
Segmentation
• Segment the sequence into – independently moving objects
• “video cut and paste” [Li et al. 2005]
• Allow for user interaction
10
Tracking
• Recover the texturemap of – a deforming 3D object – from a sequence of 2D images
• texture map may be assumed to be con-stant– although the model is changing its shape
• Interestpoint detection and tracking – [Sand and Teller 2006, for example]
11
Embedding
• view the sparse point tracks – as a high dimensional projection of the
2D surface parameters
12
Mosaic stitching
• a map from the tracked points in each image– to the (u; v) parameter space.
• A variation of [Agarwala et al. 2004]– emerges naturally from the energy formulation.
13
Track refinement
• the mosaic is good enough – to create a reference template to match• against the original frames
• reduces any drift that may have been present after the
• original tracking phase
14
Using the model for video edit-ing
• Edit the texture map– For example by drawing on it– Warp it via the recovered mapping– Combine with the other layers of the original sequence
• Re-rendered mosaic will not exactly match the original sequence– warped by the 2D–2D mapping– masked by the occlusion masks– alpha-blended with the original image
• Remove layers– The removed area is filled in
• because the mapping is defined even in occluded areas.
15
Limitations
• Textured surfaces are required for point tracking– Low texture– One dimensional textures– motion blur
• The assumption of a smoothly varying smooth 3D sur-face– objects with significant protrusions
• the dinosaur in figure 11
• The assumption of smoothly varying lighting– strong shadows will disrupt tracking
• limited to disc-topology objects– rotating cylinder will be reconstructed as a long tapestry
• see figure 11
16
The unwrap mosaic model
• Image generation model– how an image sequence is constructed
• from a collection of unwrap mosaics
– a fitting problem
• Energy minimization– via nonlinear optimization
• Initial estimate for the– 2D-2D mapping– Texture map– Can be obtained from sparse 2D tracking data.
17
18
19
Point-spread functions
20
21
22
23
24
25
Discrete energy formula-tion
26
27
Data cost
28
29
Constraints
30
Mapping smoothness
31
Visibility smoothness
32
Minimizing the energy
33
Minimizing over C: stitch-ing
34
35
36
Reparametrization and embed-ding
37
38
39
Minimizing over w: dense mapping
• Simply use MATLAB’s griddata to in-terpolate– No guarantee of minimizing the original
energy
40
Minimizing over w and b: dense mapping with occlusion
41
42
Lighting
43
User interaction and hinting
• Segmentatinon– the user selects objects in a small num-
ber of frames, – and the segmentation is propagated us-
ing optical flow or sparse tracks.
• mosaic coverage
44
45
Tuning parameters
• The robust kernel width τ – is set to match an estimate of image noise.– Set to 5/255 gray-levels
• Scale parameter τ3– 40 pixels
• Except the face, which had many outlier tracks
• spatial smoothness λwl– controls the amount of deformation of the map-
ping– Constant for all
• but the “boy” sequence
46
Results
• Synthetic sequence
47
Synthetic sequence
• There is no concept of a “ground truth”– visually evaluate the recovered mosaic
(figure 7)
• About 30% of mosaic pixels visible – in any one frame
• Self occlusion near the nose
48
Face sequence
• Few high-contrast points coupled – with strong lighting
49
Giraffe sequence: fore-ground
• A logo – is placed on the fore-
ground giraffe’s back and head
• With optical flow,– the annotation drifts
by about 10 pixels in 30 frames, while the
• Unwrap mosaic – shows no visible drift.
50
Boy sequence
– both sides of his head and torso are shown
– variable focus– motion blur– considerable fore-
ground occlusion
• Ear is doubled– could be fixed
• by editing as in section 4
51
Dinosaur sequence
• Not a smooth reparametrization – of the object’s true “texture map”
• iterative automatic segmentation – Might separate mosaics
• for each model component: arms, torso, tail
• Unwrap mosaic technique is – deformation agnostic
52
Related work
• Wang and Adelson’s paper [1994]– about how layers might apply to self-occludng ob-
jects
• Irani et al. [1995]– more elaborate parametric transformations
• Layered Depth Images [Shade et al. 1998]• Still photos to mosaics [Bhat et al. 2007]
– for rigid scenes
• Optic flow– energy-based formulation [Bruhn et al. 2005]
53
Related work
• Point tracking– Sand and Teller [2006]
• Fitting motion models – corresponding to multiple 3D motions
• Bhat et al. [2007]
• Discover texture coordinates – for predefined 3D surface models
• Zigelman et al. [2002]
• Use of energy formulations– to regularize models – and to make modelling assumptions coherent and ex-
plicit• [Fleet et al. 2002, Brox et al. 2004, Bruhn et al. 2005]
54
Discussion
• Without 3D shape recovery– manipulations of the video
• which respect the three-dimensionaliy of the scene
• Viewing reconstruction – as an embedding of tracks into 2D
• Embedding would be unstable– For more constrained models– Large amounts of missing data
• that self-occlusion causes
• We use approximate algorithms– No guarantees of globally minimizing the overall en-
ergy
55
Generalizations
• Automatically segment the layers– Many existing methods could be used– Sparse point tracks are clustered into independent
motions
• Optimizing each layer simultaneously• To allow non-boolean masks b• Apply matrix factorization to the input tracks
– to recover a deformable 3D shape– can place constraints
• on the mapping which would allow outlier removal
– allow exact visibility to be computed