Lecture 12 - Efficient & Robust LK for Mobile Vision

Instructor - Simon Lucey

16-623 - Designing Computer Vision Apps

16623.courses.cs.cmu.edu/slides/Lecture_12.pdf

Page 1:

Efficient & Robust LK for Mobile Vision

Instructor - Simon Lucey

16-623 - Designing Computer Vision Apps

Page 2:

Direct Method (ours)

Indirect Method (ORB+RANSAC)

H. Alismail, B. Browning, S. Lucey “Bit-Planes: Dense Subpixel Alignment of Binary Descriptors.” ACCV 2016.

Page 5:

Today

• Review - LK Algorithm

• Inverse Composition

• Robust Extensions

Page 6:

Review - LK Algorithm

• Lucas & Kanade (1981) realized this and proposed a method for estimating warp displacement using the principles of gradients and spatial coherence.

• Technique applies a Taylor series approximation to any spatially coherent area governed by the warp W(x;p).

I(p + Δp) ≈ I(p) + ∂I(p)/∂p^T Δp
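This first-order expansion can be sanity-checked numerically on a toy 1-D signal (a sketch, not from the slides; the function `I` below is a hypothetical stand-in for image intensity along one dimension):

```python
import numpy as np

# Hypothetical 1-D "image" intensity and its analytic gradient.
def I(x):
    return np.sin(x)

def dI(x):
    return np.cos(x)

x0, dx = 1.0, 1e-3
linearized = I(x0) + dI(x0) * dx   # I(p) + dI(p)/dp * dp
exact = I(x0 + dx)                 # I(p + dp)
err = abs(exact - linearized)
# The first-order Taylor error shrinks quadratically with the step size.
assert err < dx ** 2
```

The quadratic decay of the error is exactly why LK needs a small displacement (or a good initial guess) for the linearization to be trustworthy.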


Page 8:

Review - LK Algorithm

• Lucas & Kanade (1981) realized this and proposed a method for estimating warp displacement using the principles of gradients and spatial coherence.

• Technique applies a Taylor series approximation to any spatially coherent area governed by the warp W(x;p).

“We consider this image to always be static....”

I(p + Δp) ≈ I(p) + ∂I(p)/∂p^T Δp


Page 10:

Review - LK Algorithm

• Lucas & Kanade (1981) realized this and proposed a method for estimating warp displacement using the principles of gradients and spatial coherence.

• Technique applies a Taylor series approximation to any spatially coherent area governed by the warp W(x;p).

I(p + Δp) ≈ I(p) + ∂I(p)/∂p^T Δp

∂I(p)/∂p^T = blockdiag( ∂I(x′1)/∂x′1^T, …, ∂I(x′N)/∂x′N^T ) [ ∂W(x1;p)/∂p^T ; … ; ∂W(xN;p)/∂p^T ],  where x′ = W(x;p)

Page 11:

Reminder - Template Coordinates

“Source Image” I: warped coordinate W(x1;p)

“Template” T: coordinate W(x1;0)

Page 12:

Reminder - Template Coordinates

“Source Image” I: x′1

“Template” T: x1

Page 13:

Review - LK Algorithm

(gradient images ∇x I, ∇y I)

∂I(p)/∂p^T = blockdiag( ∂I(x′1)/∂x′1^T, …, ∂I(x′N)/∂x′N^T ) [ ∂W(x1;p)/∂p^T ; … ; ∂W(xN;p)/∂p^T ]

Page 14:

Which Way?

Step 1: Warp Image, I → I(p)

Step 2: Estimate Gradients: “Horizontal” ∇x I(p), “Vertical” ∇y I(p)

Page 15:

Which Way?

Step 1: Warp Image, I → I(p)

Step 2: Estimate Gradients (“Horizontal”, “Vertical”): ∂I(x′)/∂x


Page 17:

Which Way?

Step 1: Estimate Gradients: “Horizontal” ∇x I, “Vertical” ∇y I

Step 2: Warp Gradients: ∇x I(p), ∇y I(p)

Page 18:

Which Way?

Step 1: Estimate Gradients: “Horizontal” ∇x I, “Vertical” ∇y I

Step 2: Warp Gradients: ∂I(x′)/∂x′

Page 19:

Review - LK Algorithm

∂I(p)/∂p^T = blockdiag( ∂I(x′1)/∂x′1^T, …, ∂I(x′N)/∂x′N^T ) [ ∂W(x1;p)/∂p^T ; … ; ∂W(xN;p)/∂p^T ]

Page 20:

Deriving the ∂W(x;p)/∂p^T

• For an affine warp,

W(x;p) = [ 1+p1  p2  p3 ; p4  1+p5  p6 ] [ x ; y ; 1 ]

∂W(x;p)/∂p^T = [ x  y  1  0  0  0 ; 0  0  0  x  y  1 ]
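As a sketch, the 2 × 6 per-pixel block and the stacked warp Jacobian can be written out in NumPy (the coordinate values below are illustrative, not from the slides):

```python
import numpy as np

def affine_warp_jacobian(x, y):
    """dW(x;p)/dp^T for W(x;p) = [1+p1, p2, p3; p4, 1+p5, p6] [x, y, 1]^T.
    One 2 x 6 block per pixel coordinate (x, y)."""
    return np.array([[x, y, 1.0, 0.0, 0.0, 0.0],
                     [0.0, 0.0, 0.0, x, y, 1.0]])

# Stacking the per-pixel blocks gives the 2N x 6 Jacobian used by LK.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
J_W = np.vstack([affine_warp_jacobian(x, y) for x, y in coords])
print(J_W.shape)  # (6, 6): three pixels, two rows each
```

Note this block depends only on the template coordinate, not on p, which is what the inverse compositional trick later exploits.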

Page 21:

Deriving the ∂W(x;p)/∂p^T

• For a homography warp,

W(x;p) = 1/(p7 x + p8 y + p9) [ 1+p1  p2  p3 ; p4  1+p5  p6 ] [ x ; y ; 1 ]

There are other ways to parameterize homographies; however, this way is perhaps the most common. (To apply the Newton inverse compositional algorithm, the Hessian of the homography is also needed.)

Page 22:

Review - LK Algorithm

• Lucas & Kanade (1981) realized this and proposed a method for estimating warp displacement using the principles of gradients and spatial coherence.

• Technique applies a Taylor series approximation to any spatially coherent area governed by the warp W(x;p).

I(p + Δp) ≈ I(p) + ∂I(p)/∂p^T Δp

I(p + Δp), I(p): N × 1 (N = number of pixels); ∂I(p)/∂p^T: N × P (P = number of warp parameters)

Page 23:

Review - LK Algorithm

• Often just refer to J_I = ∂I(p)/∂p^T as the “Jacobian” matrix.

• Also refer to H_I = J_I^T J_I as the “pseudo-Hessian”.

• Finally, we can refer to T(0) = I(p + Δp) as the “template”.

Page 24:

LK Algorithm

• Actual algorithm is just the application of the following steps,

Step 1: Δp = H_I⁻¹ J_I^T [T(0) − I(p)]

Step 2: p ← p + Δp

keep applying steps until Δp converges.
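On synthetic data the two steps reduce to a few lines of linear algebra (a sketch; `J_I`, `T0`, and the warped image below are randomly generated stand-ins, and on this purely linear problem a single iteration recovers the displacement exactly):

```python
import numpy as np

rng = np.random.default_rng(0)
N, P = 100, 6                        # N pixels, P warp parameters
J_I = rng.standard_normal((N, P))    # stand-in Jacobian dI(p)/dp^T
T0 = rng.standard_normal(N)          # stand-in template pixels T(0)
dp_true = 0.1 * rng.standard_normal(P)
I_p = T0 - J_I @ dp_true             # synthetic warped image consistent with dp_true

H_I = J_I.T @ J_I                              # pseudo-Hessian
dp = np.linalg.solve(H_I, J_I.T @ (T0 - I_p))  # Step 1
p = np.zeros(P) + dp                           # Step 2: p <- p + dp
assert np.allclose(dp, dp_true)
```

On real images I(p) is nonlinear in p, which is why Steps 1 and 2 must be iterated.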

Page 25:

Examples of LK Alignment


Page 27:

Gauss-Newton Algorithm

• LK is essentially an application of the Gauss-Newton algorithm,

argmin_x ||y − F(x)||₂²   s.t. F : R^N → R^M

Step 1: argmin_Δx ||y − F(x) − ∂F(x)/∂x^T Δx||₂²

Step 2: x ← x + Δx

keep applying steps until Δx converges.

“Carl Friedrich Gauss”   “Isaac Newton”
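The two steps can be sketched on a toy scalar model (the exponential model below is an illustrative choice, not from the slides):

```python
import numpy as np

# Gauss-Newton on a toy 1-D model F(x) = exp(x * t): argmin_x ||y - F(x)||^2.
t = np.linspace(0.0, 1.0, 50)
x_true = 0.7
y = np.exp(x_true * t)

x = 0.0
for _ in range(20):
    F = np.exp(x * t)
    J = t * F                        # dF(x)/dx
    dx = (J @ (y - F)) / (J @ J)     # Step 1: linearized least squares
    x = x + dx                       # Step 2
assert abs(x - x_true) < 1e-6
```

With a zero-residual model like this, Gauss-Newton converges quadratically once inside the basin of attraction, which is the "event horizon" idea on the next slides.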

Page 28:

Optimization Interpretation

• The optimization employed with the LK algorithm can be interpreted as Gauss-Newton optimization.

• Other non-linear least-squares optimization strategies have been investigated (Baker et al. 2003):
  • Levenberg-Marquardt
  • Newton
  • Steepest-Descent

• Gauss-Newton in empirical evaluations has appeared to be the most robust (Baker et al.).

Page 29:

LK Event Horizon

• Initial warp estimate has to be suitably close to the ground truth for gradient-search methods to work.

• Kind of like a black hole’s event horizon.

• You’ve got to be inside it to be sucked in!

Page 30:

Expanding the Event Horizon

• Best strategy is to expand the neighborhood N across which gradients are estimated, i.e. I(x + Δx) − I(x) ≈ (∂I(x)/∂x) Δx for x ∈ N.

• Simplest to do this in practice with a blur.

• Often apply what is known as “coarse-to-fine” alignment.
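The blur-and-subsample idea behind coarse-to-fine alignment can be sketched as follows (a minimal NumPy version; a box blur stands in for the Gaussian blur typically used in practice):

```python
import numpy as np

def blur(img, k=5):
    """Separable box blur: a simple stand-in for a Gaussian blur."""
    pad = k // 2
    padded = np.pad(img, pad, mode='edge')
    kern = np.ones(k) / k
    rows = np.apply_along_axis(lambda r: np.convolve(r, kern, mode='valid'), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, kern, mode='valid'), 0, rows)

def pyramid(img, levels=3):
    """Coarse-to-fine pyramid: blur, then decimate by two, at each level."""
    out = [img]
    for _ in range(levels - 1):
        out.append(blur(out[-1])[::2, ::2])
    return out

shapes = [l.shape for l in pyramid(np.random.default_rng(0).random((64, 64)))]
print(shapes)  # [(64, 64), (32, 32), (16, 16)]
```

Alignment starts at the coarsest level (widest effective event horizon) and the estimate is propagated down to the finest level.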

Page 31:

Questions

• Why is LK sensitive to the initial guess?

• Why can’t you perform the optimization in a single shot?

Page 32:

Today

• Review - LK Algorithm

• Inverse Composition

• Robust Extensions

Page 33:

Efficient search is essential on mobile and desktop!!

Page 34:

Computation Concerns

• Unfortunately, the LK algorithm can be computationally expensive.

• Requires the re-computation of the Jacobian matrix at each iteration,

J_I = blockdiag( ∂I(x′1)/∂x′1^T, …, ∂I(x′N)/∂x′N^T ) [ ∂W(x1;p)/∂p^T ; … ; ∂W(xN;p)/∂p^T ]

• With the additional inversion of the pseudo-Hessian H_I = J_I^T J_I at each iteration.

“Is there any way we can pre-compute any of this?”

Page 35:

Linearizing the Template?

I(p + Δp) ≈ I(p) + ∂I(p)/∂p^T Δp

Page 36:

Linearizing the Template?

T(0) ≈ I(p) + ∂I(p)/∂p^T Δp   (“template”)

Page 37:

Linearizing the Template?

T(0) ≈ I(p) + ∂I(p)/∂p^T Δp   (“template”)

T(0) + ∂T(0)/∂p^T Δp ≈ I(p)

“Why is this useful if the template must be static?”

Page 38:

Simple Example

argmin_Δp Σ_{n=1}^{N} || W(xn;p) + ∂W(xn;p)/∂p^T Δp − W(xn;0) ||₂²

Page 39:

Simple Example

argmin_Δp Σ_{n=1}^{N} || W(xn;p) + ∂W(xn;p)/∂p^T Δp − W(xn;0) ||₂²

where: x′ = W(x;p), x = W(x;0), update p + Δp

Page 40:

Simple Example

argmin_Δp* Σ_{n=1}^{N} || W(xn;p) − W(xn;0) − ∂W(xn;0)/∂p^T Δp* ||₂²

Page 41:

Simple Example

argmin_Δp* Σ_{n=1}^{N} || W(xn;p) − W(xn;0) − ∂W(xn;0)/∂p^T Δp* ||₂²

where: x′ = W(x;p), x = W(x;0), update 0 + Δp*

Page 42:

Simple Example

argmin_Δp* Σ_{n=1}^{N} || W(xn;p) − W(xn;0) − ∂W(xn;0)/∂p^T Δp* ||₂²   (“Static”)

where: x′ = W(x;p), x = W(x;0), update 0 + Δp*

Page 43:

Relating Δp and Δp*

• In general one can show,

W(x; p + Δp) = W(x; p ∘⁻¹ Δp*)

where: x′ = W(x;p), x = W(x;0), update 0 + Δp*

Page 44:

Relating Δp and Δp*

• In general one can show,

W(x; p + Δp) ≈ W(x; p ∘⁻¹ Δp*)   (“for non-linear warp”)

where: x′ = W(x;p), x = W(x;0), update 0 + Δp*

Page 45:

Relating Δp and Δp*

• Specifically for an affine warp one can show,

W(x; p + Δp) = W(x; p ∘⁻¹ Δp*)

Page 46:

Relating Δp and Δp*

• Specifically for an affine warp one can show,

M̃(p) x̃ = M̃(Δp*) x̃

where,

M̃(p) = [ 1+p1  p2  p3 ; p4  1+p5  p6 ; 0  0  1 ],   x̃ = [ x ; 1 ]
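The homogeneous-matrix form makes the compositional update easy to sketch in code (parameter values below are illustrative; `M_tilde`/`params` are hypothetical helper names):

```python
import numpy as np

def M_tilde(p):
    """Homogeneous 3x3 matrix for affine warp parameters p = (p1..p6)."""
    p1, p2, p3, p4, p5, p6 = p
    return np.array([[1 + p1, p2, p3],
                     [p4, 1 + p5, p6],
                     [0.0, 0.0, 1.0]])

def params(M):
    """Recover the six affine parameters from a homogeneous matrix."""
    return np.array([M[0, 0] - 1, M[0, 1], M[0, 2],
                     M[1, 0], M[1, 1] - 1, M[1, 2]])

# Compositional update p <- p o^-1 dp*, i.e. params( inv(M(dp*)) @ M(p) ).
p = np.array([0.05, 0.01, 2.0, -0.02, 0.03, -1.0])
dp_star = np.array([0.01, 0.0, 0.5, 0.0, -0.01, 0.2])
p_new = params(np.linalg.inv(M_tilde(dp_star)) @ M_tilde(p))

# Check on a test point: the updated warp equals inv-warp(dp*) after warp(p).
x = np.array([3.0, 4.0, 1.0])
assert np.allclose(M_tilde(p_new) @ x,
                   np.linalg.inv(M_tilde(dp_star)) @ (M_tilde(p) @ x))
```

Because affine warps in homogeneous form are closed under multiplication and inversion, the composed parameters always exist.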

Page 47:

Relating Δp and Δp*

• Specifically for an affine warp one can show,

M̃(Δp*)⁻¹ M̃(p) x̃ = x̃

where: x′ = W(x;p), x = W(x;0), update 0 + Δp*

Page 48:

Relating Δp and Δp*

• Specifically for an affine warp one can show,

M̃(Δp*)⁻¹ M̃(p) x̃ = x̃

where: x′ = W(x;p), x = W(x;0), update p ∘⁻¹ Δp*

Page 49:

Relating Δp and Δp*

• Specifically for an affine warp one can show,

M̃(Δp*)⁻¹ M̃(p) x̃ = x̃

also, M̃(p + Δp) x̃ = x̃

where: x′ = W(x;p), x = W(x;0), updates p ∘⁻¹ Δp* and p + Δp

Page 50:

Relating Δp and Δp*

• Specifically for an affine warp one can show,

M̃(Δp*)⁻¹ M̃(p) x̃ = x̃

also, M̃(p + Δp) x̃ = x̃

therefore, M̃(p + Δp) = M̃(Δp*)⁻¹ M̃(p)

where: x′ = W(x;p), x = W(x;0)

Page 51:

Relating Δp and Δp*

• Extending the affine warp to the general form,

M̃(Δp*)⁻¹ M̃(p) x̃ = W(x; p ∘⁻¹ Δp*)

where: x′ = W(x;p), x = W(x;0), update p ∘⁻¹ Δp*

Page 52:

Inverse Composition Algorithm

• Actual algorithm is just the application of the following steps,

Step 1: argmin_Δp || T(0) + ∂T(0)/∂p^T Δp − I(p) ||₂²

Step 2: p ← p ∘⁻¹ Δp

keep applying steps until Δp converges.

Page 53:

Inverse Composition Algorithm

• Actual algorithm is just the application of the following steps,

Step 1: Δp = H_T⁻¹ J_T^T [I(p) − T(0)]

Step 2: p ← p ∘⁻¹ Δp

keep applying steps until Δp converges.

where J_T = ∂T(0)/∂p^T,  H_T = J_T^T J_T

Page 54:

Inverse Composition Algorithm

• Actual algorithm is just the application of the following steps,

Step 1: Δp = H_T⁻¹ J_T^T [I(p) − T(0)]   (“Static”)

Step 2: p ← p ∘⁻¹ Δp   (“Inverse Composition”)

keep applying steps until Δp converges.

where J_T = ∂T(0)/∂p^T,  H_T = J_T^T J_T
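The efficiency gain over the forward algorithm can be sketched with synthetic stand-ins for J_T and T(0): the template Jacobian, pseudo-Hessian, and its inverse are computed once, and only the warped image changes per iteration:

```python
import numpy as np

rng = np.random.default_rng(1)
N, P = 200, 6
J_T = rng.standard_normal((N, P))   # dT(0)/dp^T depends only on the template,
H_T = J_T.T @ J_T                   # so J_T, H_T, and inv(H_T) are computed ONCE
H_T_inv = np.linalg.inv(H_T)
T0 = rng.standard_normal(N)         # stand-in template pixels T(0)

def ic_step(I_p):
    """One inverse-compositional iteration: only I(p) changes per iteration."""
    return H_T_inv @ (J_T.T @ (I_p - T0))

# On a synthetic image displaced by 0.01 in every parameter, one step recovers it.
dp = ic_step(T0 + J_T @ np.full(P, 0.01))
assert np.allclose(dp, 0.01)
```

Each iteration is then just two matrix-vector products, with all O(N P²) work moved outside the loop.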

Page 55:

Linearizing the Template

∂T(0)/∂p^T = blockdiag( ∂T(x1)/∂x1^T, …, ∂T(xN)/∂xN^T ) [ ∂W(x1;0)/∂p^T ; … ; ∂W(xN;0)/∂p^T ]   (“Static”)

∂I(p)/∂p^T = blockdiag( ∂I{W(x1;p)}/∂W(x1;p)^T, …, ∂I{W(xN;p)}/∂W(xN;p)^T ) [ ∂W(x1;p)/∂p^T ; … ; ∂W(xN;p)/∂p^T ]   (“Constantly changing”)


Page 57:

Rules for Compositional Warps

• Not just applicable to affine warps; can be applied to any set of warps that:

  • have an identity warp (i.e. W(x;0) = x)

  • are closed under composition, W(W(x;p1);p2) → W(x;p2 ∘ p1)

for example a homography, H3 = H2 H1, where det(H) = 1

Page 58:

Inverse Compositional Performance

[Excerpt from the Bit-Planes paper:]

Figure 9. LBP/Census transform computation ‘C’ pseudo-code at a 3 × 3 neighborhood, where p is a pointer to the desired pixel location and step is the number of elements per image row:

((*(p-1-step) >= *p) & 0x01) |
((*(p  -step) >= *p) & 0x02) |
((*(p+1-step) >= *p) & 0x04) |
((*(p-1     ) >= *p) & 0x08) |
((*(p+1     ) >= *p) & 0x10) |
((*(p-1+step) >= *p) & 0x20) |
((*(p  +step) >= *p) & 0x40) |
((*(p+1+step) >= *p) & 0x80) ;

Table 2. Planar template tracking runtime in frames per second (fps) on a single core Intel Core i7 2.8 GHz:

Template area | Intensity | BitPlanes | DF
75 × 57       | 650       | 460       | 370
150 × 115     | 360       | 170       | 160
300 × 230     | 140       | 90        | 60
640 × 460     | 45        | 35        | 20

H. Alismail, B. Browning, S. Lucey “Bit-Planes: Dense Subpixel Alignment of Binary Descriptors.” ACCV 2016.

Page 59:

Today

• Review - LK Algorithm

• Inverse Composition

• Robust Extensions

Page 60:

Robust Error Functions

• Least squares criterion is not robust to outliers (e.g. occlusion, illumination)

• For example, the two outliers here cause the fitted line to be quite wrong.

• Robust error functions can be helpful here.


Page 62:

Robust Error Functions

argmin_Δp η{ T(0) + ∂T(0)/∂p^T Δp − I(p) }

(plot: robust error η{x} vs. x, with L2 and L1 for comparison)

for example,

η{x} = ||x||₁ → L1 error
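One standard way to minimize an L1-style error with least-squares machinery is iteratively reweighted least squares (IRLS); a sketch on a robust line fit with two gross outliers (all data values below are illustrative, and this is not the specific scheme used in the lecture's tracker):

```python
import numpy as np

# IRLS sketch: an L1-style robust line fit on data with two gross outliers.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 30)
y = 2.0 * t + 1.0 + 0.01 * rng.standard_normal(30)
y[3], y[17] = 10.0, -8.0                      # outliers

A = np.stack([t, np.ones_like(t)], axis=1)
w = np.ones(30)
for _ in range(50):
    sol = np.linalg.lstsq(A * w[:, None], y * w, rcond=None)[0]
    r = y - A @ sol
    w = 1.0 / np.sqrt(np.maximum(np.abs(r), 1e-6))  # weights w_i^2 = 1/|r_i|

# The robust fit stays near slope 2, intercept 1 despite the outliers.
assert abs(sol[0] - 2.0) < 0.1 and abs(sol[1] - 1.0) < 0.1
```

A plain least-squares fit on the same data is pulled toward the outliers, which is exactly the failure mode pictured on the previous slides.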

Page 63:

Robust Error Functions

• Computational inefficiencies creep into inverse composition when one departs from the classical L2 distance.

• Much better results for fast tracking have been achieved recently using robust representations (e.g. Bristow et al.).

Page 64:

Robust Representations

1. Compute image gradients
2. Pool into local histograms
3. Concatenate histograms
4. Normalize histograms

φ : R^N → R^{N×K}
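The histogram pipeline above is one choice of φ; another, in the spirit of the Bit-Planes descriptor discussed earlier, maps each pixel to K = 8 binary channels via neighbor comparisons. A sketch (this NumPy version is illustrative, not the paper's implementation):

```python
import numpy as np

def bitplanes(img):
    """phi: map an image to K = 8 binary channels by comparing each interior
    pixel with its eight neighbors (Census/LBP-style)."""
    h, w = img.shape
    core = img[1:-1, 1:-1]
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]
    chans = [(img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx] >= core)
             for dy, dx in shifts]
    return np.stack(chans, axis=-1).astype(np.float32)

phi = bitplanes(np.random.default_rng(0).random((32, 32)))
print(phi.shape)  # (30, 30, 8): one binary channel per neighbor comparison
```

Because each channel is a local ordinal comparison, the representation is invariant to monotonic illumination changes, which is what buys the robustness.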

Page 65:

Robust IC Algorithm

• Actual algorithm is just the application of the following steps,

Step 1: argmin_Δp || φ{T(0)} + ∂φ{T(0)}/∂p^T Δp − φ{I(p)} ||₂²

Step 2: p ← p ∘⁻¹ Δp

keep applying steps until Δp converges.

Page 66: Lecture 12 - Efficient & Robust LK for Mobile Vision16623.courses.cs.cmu.edu/slides/Lecture_12.pdf000x y 1 W (x; p)= 1 p 7 x + p 8 y + p 9 1 p 1 p 2 p 3 p 4 1 p 5 p 6 2 4 x y 1 3 5

Descriptor Performance648649650651652653654655656657658659660661662663664665666667668669670671672673674675676677678679680681682683684685686687688689690691692693694695696697698699700701

702703704705706707708709710711712713714715716717718719720721722723724725726727728729730731732733734735736737738739740741742743744745746747748749750751752753754755

CVPR#****

CVPR#****

CVPR 2016 Submission #****. CONFIDENTIAL REVIEW COPY. DO NOT DISTRIBUTE.

(a) Lighting change.

(b) Out-of-plane rotation.Figure 10. Tracking results using the Bricks dataset [19]. The top row of each figure shows the performance of BitPlanes, while the bottomrow shows classical intensity-based LK. Additional results and videos are provided in the supplementary materials.

[Figure 11 plot: success fraction (y-axis, 0-1) vs. overlap fraction (x-axis, 0.5-1); curves for BP, ECC, DIC-1, DIC-2, DF, GC, LK]

Figure 11. Fraction of successfully tracked frames as a function of the overlap area given the ground truth. BitPlanes and DF perform better than other methods. However, in Table 3 we see that BitPlanes performs better with difficult textures and perspective distortion.

multi-channel representations sharing our goal of improved robustness to illumination change. DF's performance is reported to be superior to Distribution Fields [13]. Based on our experiments in Section 3, BitPlanes' performance improves on DF, especially under out-of-plane rotations that cause perspective distortion of the image.

Gradient-based alignment in a multi-channel representation has also been applied to matching across object categories using SIFT [9]. The authors estimate the gradient of the objective using linear regression, which, when applied to each channel separately, reduces to a standard convolution. Xiong and de la Torre [48] tackle the problem of descriptor alignment differently. Using SDM, the authors learn the descent maps offline and demonstrate tracking results with HOG and SIFT, which are expensive to compute densely. In contrast, our approach does not require an offline learning step and is shown to work well with simple binary descriptors. We note that Ren et al. [38] have recently applied the SDM with LBP to the problem of face alignment, which results in a fast algorithm during testing. However, the offline training step remains present.

The work of Antonakos et al. [1] is closest to ours. The authors use a multi-channel representation with various feature descriptors to improve the robustness of face alignment using AAMs. The authors experiment with the histogrammed version of LBP [37], which is not binary. In addition to the use of bi-


“Raw Pixels”

“BitPlanes”


    descriptor = ((*(p - 1 - step) >= *p) << 0) |
                 ((*(p     - step) >= *p) << 1) |
                 ((*(p + 1 - step) >= *p) << 2) |
                 ((*(p - 1       ) >= *p) << 3) |
                 ((*(p + 1       ) >= *p) << 4) |
                 ((*(p - 1 + step) >= *p) << 5) |
                 ((*(p     + step) >= *p) << 6) |
                 ((*(p + 1 + step) >= *p) << 7);

Figure 9. LBP/Census transform computation 'C' pseudo-code at a 3×3 neighborhood, where p is a pointer to the desired pixel location and step is the number of elements per image row.
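Expanded into a self-contained routine over a whole 8-bit image: the bit layout follows Fig. 9, while skipping the one-pixel border is an implementation choice of this sketch, not something the paper specifies.

```c
#include <stdint.h>

/* Census/LBP transform: each interior pixel of an 8-bit image is
   replaced by an 8-bit code, one bit per 3x3 neighbour that compares
   >= the centre. Bit order matches Fig. 9 (bit 0 = top-left, ...,
   bit 7 = bottom-right). `step` is the number of elements per row. */
void census_transform(const uint8_t *src, uint8_t *dst,
                      int w, int h, int step) {
    for (int y = 1; y < h - 1; ++y) {
        for (int x = 1; x < w - 1; ++x) {
            const uint8_t *p = src + y * step + x;
            uint8_t c = 0;
            c |= (uint8_t)((*(p - 1 - step) >= *p) << 0);
            c |= (uint8_t)((*(p     - step) >= *p) << 1);
            c |= (uint8_t)((*(p + 1 - step) >= *p) << 2);
            c |= (uint8_t)((*(p - 1       ) >= *p) << 3);
            c |= (uint8_t)((*(p + 1       ) >= *p) << 4);
            c |= (uint8_t)((*(p - 1 + step) >= *p) << 5);
            c |= (uint8_t)((*(p     + step) >= *p) << 6);
            c |= (uint8_t)((*(p + 1 + step) >= *p) << 7);
            dst[y * step + x] = c;
        }
    }
}
```

Each of the eight bits, taken over the whole image, is one of the binary "bit-plane" channels that the paper aligns; the per-pixel comparisons are branch-free and map directly onto SIMD compare instructions.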

Algorithm    # parameters   # channels
BP (ours)          8             8
ECC [17]           8             1
DIC-1 [5]         10             1
DIC-2 [5]         20             3
DF [13]            8             5
GC [10]            8             3
GM                 8             2
LK                 8             1

Table 1. Algorithms compared in this work. The number of parameters indicates the DOF of the state vector, which is 8 for a plane, in addition to any photometric parameters. We use the authors' code for ECC and DIC.

latency. Pseudo-code to compute the LBP is shown in Fig. 9, which can be easily translated to SIMD instructions on the CPU or to other parallel computing platforms such as GPUs and FPGAs [6, 24, 7].

We compare the performance of our algorithm against a variety of template tracking algorithms. The first is the enhanced correlation coefficient (ECC) by Evangelidis and Psarakis [17]; ECC is an example of an intrinsically robust cost function, normalized correlation, which is invariant up to an affine illumination change. An example of estimating illumination parameters alongside the motion parameters is the Dual Inverse Compositional (DIC) algorithm by Bartoli [5]. We use two variations of DIC: the first uses the gain+bias model on grayscale images, denoted DIC-1; the second uses a full affine lighting model on the RGB image, denoted DIC-2. We also compare against a recently published descriptor-based method by Crivellaro and Lepetit [13] called Descriptor Fields (DF). Finally, we include baseline results from raw intensity (LK), improved LK with the Gradient Constraint (GC) [10], and the Gradient Magnitude (GM). We report two numbers in the evaluation. The first is the percentage of successful tracking, measured by the overlap between the warped image given each algorithm's estimate, denoted A, and the warped image given the ground truth, denoted B. The overlap is computed as o = (A ∩ B)/(A ∪ B), where a value of o > 1/2 indicates successful tracking. Secondly, since we are interested in sub-pixel accuracy, we also show the mean percentage of overlap

Template area   Intensity   BitPlanes   DF [13]
75×57              650         460        370
150×115            360         170        160
300×230            140          90         60
640×460             45          35         20

Table 2. Planar template tracking runtime in frames per second on a single-core Intel Core i7 @ 2.8 GHz. DF is implemented with first-order derivatives [13] (five channels). BitPlanes timings are shown with eight binary channels. All algorithms have been optimized equally by us. Using the authors' code for DF, we are unable to run faster than 2 Hz with a densely sampled grid of points at resolution 300×230.

across all frames, given by m = (1/n) Σ_{i=1}^{n} o_i, where n is the number of frames in each sequence.

Results are compared for three types of geometric and

photometric variations. The first is an out-of-plane rotation, which induces perspective change, shown in Fig. 10b. The second is a dynamic lighting change, where the image is stationary but illuminated with nonlinearly varying illumination. The third is a static lighting change, which uses the same data as the dynamically changing light, but with the lighting transition phase removed.

A summary of the compared algorithms is provided in Table 1, and evaluation results are shown in Table 3 and Fig. 11. Based on our experimentation, the top-performing methods are the ones that employ a Descriptor Constancy assumption, namely BitPlanes and DF. However, BitPlanes is more efficient than DF, and it performed significantly better on the out-of-plane rotation, which induces perspective change in the image; example out-of-plane rotation data is shown in Fig. 10b. In fact, all tested algorithms, with the exception of BitPlanes, performed poorly on this data. Algorithms that use a robust function (ECC) and those that also attempt to estimate illumination (DIC) performed well, but fell behind Descriptor Constancy and the simple gradient constraint.

Finally, the wood texture under dynamic lighting change presented a challenge for all algorithms with the exception of BitPlanes, which succeeds on all frames with a high overlap accuracy of ≈ 99% given the ground truth.

4. Related Work

The simplest form of multi-channel LK is RGB data registration [4], which usually provides better accuracy in comparison to using grayscale [26]. Lucey et al. [33] used a multi-channel representation with Gabor features, but Gabor features are not invariant to multiplicative illumination change. Distribution Fields [41] (where each channel is a bin from a local intensity histogram) and Descriptor Fields (DF) [13] (where each channel is a nonlinear function applied to first- and second-order image gradients) are other examples of


Planar template tracking runtime in frames per second on a single-core Intel Core i7 @ 2.8 GHz.

Page 67: Lecture 12 - Efficient & Robust LK for Mobile Vision (16623.courses.cs.cmu.edu/slides/Lecture_12.pdf)

More to read…

• Baker & Matthews, “Lucas-Kanade 20 Years On: A Unifying Framework”, IJCV 2004.

• Bristow & Lucey, “In Defense of Gradient-Based Alignment on Densely Sampled Sparse Features”, Springer Book on Dense Registration Methods 2015.

• Baker, Datta & Kanade “Parameterizing Homographies”, CMU Tech Report 06-11.

In Defense of Gradient-Based Alignment

on Densely Sampled Sparse Features

Hilton Bristow and Simon Lucey

Abstract. In this chapter, we explore the surprising result that gradient-based continuous optimization methods perform well for the alignment of image/object models when using densely sampled sparse features (HOG, dense SIFT, etc.). Gradient-based approaches for image/object alignment have many desirable properties: inference is typically fast and exact, and diverse constraints can be imposed on the motion of points. However, the presumption that gradients predicted on sparse features would be poor estimators of the true descent direction has meant that gradient-based optimization is often overlooked in favour of graph-based optimization. We show that this intuition is only partly true: sparse features are indeed poor predictors of the error surface, but this has no impact on the actual alignment performance. In fact, for general object categories that exhibit large geometric and appearance variation, sparse features are integral to achieving any convergence whatsoever. How the descent directions are predicted becomes an important consideration for these descriptors. We explore a number of strategies for estimating gradients, and show that estimating gradients via regression in a manner that explicitly handles outliers improves alignment performance substantially. To illustrate the general applicability of gradient-based methods to the alignment of challenging object categories, we perform unsupervised ensemble alignment on a series of non-rigid animal classes from ImageNet.

Hilton Bristow, Queensland University of Technology, Australia. e-mail: [email protected]

Simon Lucey, The Robotics Institute, Carnegie Mellon University, USA. e-mail: [email protected]


Parameterizing Homographies

CMU-RI-TR-06-11

Simon Baker, Ankur Datta, and Takeo Kanade

The Robotics Institute, Carnegie Mellon University
5000 Forbes Avenue, Pittsburgh, PA 15213

The motion of a plane can be described by a homography. We study how to parameterize homographies to maximize plane estimation performance. We compare the usual 3×3 matrix parameterization with a parameterization that combines 4 fixed points in one of the images with 4 variable points in the other image. We empirically show that this 4-point parameterization is far superior. We also compare both parameterizations with a variety of direct parameterizations. In the case of unknown relative orientation, we compare with a direct parameterization of the plane equation, and the rotation and translation of the camera(s). We show that the direct parameterization is both less accurate and far less robust than the 4-point parameterization. We explain the poor performance using a measure of independence of the Jacobian images. In the fully calibrated setting, the direct parameterization just consists of the 3 parameters of the plane equation. We show that this parameterization is far more robust than the 4-point parameterization, but only approximately as accurate. In the case of a moving stereo rig, we find that the direct parameterization of the plane equation, camera rotation, and translation performs very well, both in terms of accuracy and robustness. This is in contrast to the corresponding direct parameterization in the case of unknown relative orientation. Finally, we illustrate the use of plane estimation in 2 automotive applications.