Lecture 12 - Efficient & Robust LK for Mobile Vision

Instructor - Simon Lucey

16-623 - Designing Computer Vision Apps

16623.courses.cs.cmu.edu/slides/Lecture_12.pdf

Page 1:

Efficient & Robust LK for Mobile Vision

Instructor - Simon Lucey

16-623 - Designing Computer Vision Apps

Page 2:

Direct Method (ours)

Indirect Method (ORB+RANSAC)

H. Alismail, B. Browning, S. Lucey “Bit-Planes: Dense Subpixel Alignment of Binary Descriptors.” ACCV 2016.

Page 5:

Today

• Review - LK Algorithm

• Inverse Composition

• Robust Extensions

Page 6:

Review - LK Algorithm

• Lucas & Kanade (1981) realized this and proposed a method for estimating warp displacement using the principles of gradients and spatial coherence.

• Technique applies a Taylor series approximation to any spatially coherent area governed by the warp W(x;p).

I(p + Δp) ≈ I(p) + ∂I(p)/∂p^T Δp
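This first-order expansion can be sanity-checked numerically on a toy 1-D signal (a sketch, not from the slides; the function `I` below is a hypothetical stand-in for image intensity along one dimension):

```python
import numpy as np

# Hypothetical 1-D "image" intensity and its analytic gradient.
def I(x):
    return np.sin(x)

def dI(x):
    return np.cos(x)

x0, dx = 1.0, 1e-3
linearized = I(x0) + dI(x0) * dx   # I(p) + dI(p)/dp * dp
exact = I(x0 + dx)                 # I(p + dp)
err = abs(exact - linearized)
# The first-order Taylor error shrinks quadratically with the step size.
assert err < dx ** 2
```

The quadratic decay of the error is exactly why LK needs a small displacement (or a good initial guess) for the linearization to be trustworthy.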


Page 8:

Review - LK Algorithm

• Lucas & Kanade (1981) realized this and proposed a method for estimating warp displacement using the principles of gradients and spatial coherence.

• Technique applies a Taylor series approximation to any spatially coherent area governed by the warp W(x;p).

“We consider this image to always be static....”

I(p + Δp) ≈ I(p) + ∂I(p)/∂p^T Δp


Page 10:

Review - LK Algorithm

• Lucas & Kanade (1981) realized this and proposed a method for estimating warp displacement using the principles of gradients and spatial coherence.

• Technique applies a Taylor series approximation to any spatially coherent area governed by the warp W(x;p).

I(p + Δp) ≈ I(p) + ∂I(p)/∂p^T Δp

∂I(p)/∂p^T = blockdiag( ∂I(x′1)/∂x′1^T, …, ∂I(x′N)/∂x′N^T ) [ ∂W(x1;p)/∂p^T ; … ; ∂W(xN;p)/∂p^T ],  where x′ = W(x;p)

Page 11:

Reminder - Template Coordinates

“Source Image” I: warped coordinate W(x1;p)

“Template” T: coordinate W(x1;0)

Page 12:

Reminder - Template Coordinates

“Source Image” I: x′1

“Template” T: x1

Page 13:

Review - LK Algorithm

(gradient images ∇x I, ∇y I)

∂I(p)/∂p^T = blockdiag( ∂I(x′1)/∂x′1^T, …, ∂I(x′N)/∂x′N^T ) [ ∂W(x1;p)/∂p^T ; … ; ∂W(xN;p)/∂p^T ]

Page 14:

Which Way?

Step 1: Warp Image, I → I(p)

Step 2: Estimate Gradients: “Horizontal” ∇x I(p), “Vertical” ∇y I(p)

Page 15:

Which Way?

Step 1: Warp Image, I → I(p)

Step 2: Estimate Gradients (“Horizontal”, “Vertical”): ∂I(x′)/∂x


Page 17:

Which Way?

Step 1: Estimate Gradients: “Horizontal” ∇x I, “Vertical” ∇y I

Step 2: Warp Gradients: ∇x I(p), ∇y I(p)

Page 18:

Which Way?

Step 1: Estimate Gradients: “Horizontal” ∇x I, “Vertical” ∇y I

Step 2: Warp Gradients: ∂I(x′)/∂x′

Page 19:

Review - LK Algorithm

∂I(p)/∂p^T = blockdiag( ∂I(x′1)/∂x′1^T, …, ∂I(x′N)/∂x′N^T ) [ ∂W(x1;p)/∂p^T ; … ; ∂W(xN;p)/∂p^T ]

Page 20:

Deriving the ∂W(x;p)/∂p^T

• For an affine warp,

W(x;p) = [ 1+p1  p2  p3 ; p4  1+p5  p6 ] [ x ; y ; 1 ]

∂W(x;p)/∂p^T = [ x  y  1  0  0  0 ; 0  0  0  x  y  1 ]
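As a sketch, the 2 × 6 per-pixel block and the stacked warp Jacobian can be written out in NumPy (the coordinate values below are illustrative, not from the slides):

```python
import numpy as np

def affine_warp_jacobian(x, y):
    """dW(x;p)/dp^T for W(x;p) = [1+p1, p2, p3; p4, 1+p5, p6] [x, y, 1]^T.
    One 2 x 6 block per pixel coordinate (x, y)."""
    return np.array([[x, y, 1.0, 0.0, 0.0, 0.0],
                     [0.0, 0.0, 0.0, x, y, 1.0]])

# Stacking the per-pixel blocks gives the 2N x 6 Jacobian used by LK.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
J_W = np.vstack([affine_warp_jacobian(x, y) for x, y in coords])
print(J_W.shape)  # (6, 6): three pixels, two rows each
```

Note this block depends only on the template coordinate, not on p, which is what the inverse compositional trick later exploits.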

Page 21:

Deriving the ∂W(x;p)/∂p^T

• For a homography warp,

W(x;p) = 1/(p7 x + p8 y + p9) [ 1+p1  p2  p3 ; p4  1+p5  p6 ] [ x ; y ; 1 ]

There are other ways to parameterize homographies; however, this way is perhaps the most common. (To apply the Newton inverse compositional algorithm, the Hessian of the homography is also needed.)

Page 22:

Review - LK Algorithm

• Lucas & Kanade (1981) realized this and proposed a method for estimating warp displacement using the principles of gradients and spatial coherence.

• Technique applies a Taylor series approximation to any spatially coherent area governed by the warp W(x;p).

I(p + Δp) ≈ I(p) + ∂I(p)/∂p^T Δp

I(p + Δp), I(p): N × 1 (N = number of pixels); ∂I(p)/∂p^T: N × P (P = number of warp parameters)

Page 23:

Review - LK Algorithm

• Often just refer to J_I = ∂I(p)/∂p^T as the “Jacobian” matrix.

• Also refer to H_I = J_I^T J_I as the “pseudo-Hessian”.

• Finally, we can refer to T(0) = I(p + Δp) as the “template”.

Page 24:

LK Algorithm

• Actual algorithm is just the application of the following steps,

Step 1: Δp = H_I⁻¹ J_I^T [T(0) − I(p)]

Step 2: p ← p + Δp

keep applying steps until Δp converges.
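On synthetic data the two steps reduce to a few lines of linear algebra (a sketch; `J_I`, `T0`, and the warped image below are randomly generated stand-ins, and on this purely linear problem a single iteration recovers the displacement exactly):

```python
import numpy as np

rng = np.random.default_rng(0)
N, P = 100, 6                        # N pixels, P warp parameters
J_I = rng.standard_normal((N, P))    # stand-in Jacobian dI(p)/dp^T
T0 = rng.standard_normal(N)          # stand-in template pixels T(0)
dp_true = 0.1 * rng.standard_normal(P)
I_p = T0 - J_I @ dp_true             # synthetic warped image consistent with dp_true

H_I = J_I.T @ J_I                              # pseudo-Hessian
dp = np.linalg.solve(H_I, J_I.T @ (T0 - I_p))  # Step 1
p = np.zeros(P) + dp                           # Step 2: p <- p + dp
assert np.allclose(dp, dp_true)
```

On real images I(p) is nonlinear in p, which is why Steps 1 and 2 must be iterated.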

Page 25:

Examples of LK Alignment


Page 27:

Gauss-Newton Algorithm

• LK is essentially an application of the Gauss-Newton algorithm,

argmin_x ||y − F(x)||₂²   s.t. F : R^N → R^M

Step 1: argmin_Δx ||y − F(x) − ∂F(x)/∂x^T Δx||₂²

Step 2: x ← x + Δx

keep applying steps until Δx converges.

“Carl Friedrich Gauss”   “Isaac Newton”
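The two steps can be sketched on a toy scalar model (the exponential model below is an illustrative choice, not from the slides):

```python
import numpy as np

# Gauss-Newton on a toy 1-D model F(x) = exp(x * t): argmin_x ||y - F(x)||^2.
t = np.linspace(0.0, 1.0, 50)
x_true = 0.7
y = np.exp(x_true * t)

x = 0.0
for _ in range(20):
    F = np.exp(x * t)
    J = t * F                        # dF(x)/dx
    dx = (J @ (y - F)) / (J @ J)     # Step 1: linearized least squares
    x = x + dx                       # Step 2
assert abs(x - x_true) < 1e-6
```

With a zero-residual model like this, Gauss-Newton converges quadratically once inside the basin of attraction, which is the "event horizon" idea on the next slides.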

Page 28:

Optimization Interpretation

• The optimization employed with the LK algorithm can be interpreted as Gauss-Newton optimization.

• Other non-linear least-squares optimization strategies have been investigated (Baker et al. 2003):
  • Levenberg-Marquardt
  • Newton
  • Steepest-Descent

• Gauss-Newton in empirical evaluations has appeared to be the most robust (Baker et al.).

Page 29:

LK Event Horizon

• Initial warp estimate has to be suitably close to the ground truth for gradient-search methods to work.

• Kind of like a black hole’s event horizon.

• You’ve got to be inside it to be sucked in!

Page 30:

Expanding the Event Horizon

• Best strategy is to expand the neighborhood N across which gradients are estimated, i.e. I(x + Δx) − I(x) ≈ (∂I(x)/∂x) Δx for x ∈ N.

• Simplest to do this in practice with a blur.

• Often apply what is known as “coarse-to-fine” alignment.
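The blur-and-subsample idea behind coarse-to-fine alignment can be sketched as follows (a minimal NumPy version; a box blur stands in for the Gaussian blur typically used in practice):

```python
import numpy as np

def blur(img, k=5):
    """Separable box blur: a simple stand-in for a Gaussian blur."""
    pad = k // 2
    padded = np.pad(img, pad, mode='edge')
    kern = np.ones(k) / k
    rows = np.apply_along_axis(lambda r: np.convolve(r, kern, mode='valid'), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, kern, mode='valid'), 0, rows)

def pyramid(img, levels=3):
    """Coarse-to-fine pyramid: blur, then decimate by two, at each level."""
    out = [img]
    for _ in range(levels - 1):
        out.append(blur(out[-1])[::2, ::2])
    return out

shapes = [l.shape for l in pyramid(np.random.default_rng(0).random((64, 64)))]
print(shapes)  # [(64, 64), (32, 32), (16, 16)]
```

Alignment starts at the coarsest level (widest effective event horizon) and the estimate is propagated down to the finest level.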

Page 31:

Questions

• Why is LK sensitive to the initial guess?

• Why can’t you perform the optimization in a single shot?

Page 32:

Today

• Review - LK Algorithm

• Inverse Composition

• Robust Extensions

Page 33:

Efficient search is essential on mobile and desktop!!

Page 34:

Computation Concerns

• Unfortunately, the LK algorithm can be computationally expensive.

• Requires the re-computation of the Jacobian matrix at each iteration,

J_I = blockdiag( ∂I(x′1)/∂x′1^T, …, ∂I(x′N)/∂x′N^T ) [ ∂W(x1;p)/∂p^T ; … ; ∂W(xN;p)/∂p^T ]

• With the additional inversion of the pseudo-Hessian H_I = J_I^T J_I at each iteration.

“Is there any way we can pre-compute any of this?”

Page 35:

Linearizing the Template?

I(p + Δp) ≈ I(p) + ∂I(p)/∂p^T Δp

Page 36:

Linearizing the Template?

T(0) ≈ I(p) + ∂I(p)/∂p^T Δp   (“template”)

Page 37:

Linearizing the Template?

T(0) ≈ I(p) + ∂I(p)/∂p^T Δp   (“template”)

T(0) + ∂T(0)/∂p^T Δp ≈ I(p)

“Why is this useful if the template must be static?”

Page 38:

Simple Example

argmin_Δp Σ_{n=1}^{N} || W(xn;p) + ∂W(xn;p)/∂p^T Δp − W(xn;0) ||₂²

Page 39:

Simple Example

argmin_Δp Σ_{n=1}^{N} || W(xn;p) + ∂W(xn;p)/∂p^T Δp − W(xn;0) ||₂²

where: x′ = W(x;p), x = W(x;0), update p + Δp

Page 40:

Simple Example

argmin_Δp* Σ_{n=1}^{N} || W(xn;p) − W(xn;0) − ∂W(xn;0)/∂p^T Δp* ||₂²

Page 41:

Simple Example

argmin_Δp* Σ_{n=1}^{N} || W(xn;p) − W(xn;0) − ∂W(xn;0)/∂p^T Δp* ||₂²

where: x′ = W(x;p), x = W(x;0), update 0 + Δp*

Page 42:

Simple Example

argmin_Δp* Σ_{n=1}^{N} || W(xn;p) − W(xn;0) − ∂W(xn;0)/∂p^T Δp* ||₂²   (“Static”)

where: x′ = W(x;p), x = W(x;0), update 0 + Δp*

Page 43:

Relating Δp and Δp*

• In general one can show,

W(x; p + Δp) = W(x; p ∘⁻¹ Δp*)

where: x′ = W(x;p), x = W(x;0), update 0 + Δp*

Page 44:

Relating Δp and Δp*

• In general one can show,

W(x; p + Δp) ≈ W(x; p ∘⁻¹ Δp*)   (“for non-linear warp”)

where: x′ = W(x;p), x = W(x;0), update 0 + Δp*

Page 45:

Relating Δp and Δp*

• Specifically for an affine warp one can show,

W(x; p + Δp) = W(x; p ∘⁻¹ Δp*)

Page 46:

Relating Δp and Δp*

• Specifically for an affine warp one can show,

M̃(p) x̃ = M̃(Δp*) x̃

where,

M̃(p) = [ 1+p1  p2  p3 ; p4  1+p5  p6 ; 0  0  1 ],   x̃ = [ x ; 1 ]
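The homogeneous-matrix form makes the compositional update easy to sketch in code (parameter values below are illustrative; `M_tilde`/`params` are hypothetical helper names):

```python
import numpy as np

def M_tilde(p):
    """Homogeneous 3x3 matrix for affine warp parameters p = (p1..p6)."""
    p1, p2, p3, p4, p5, p6 = p
    return np.array([[1 + p1, p2, p3],
                     [p4, 1 + p5, p6],
                     [0.0, 0.0, 1.0]])

def params(M):
    """Recover the six affine parameters from a homogeneous matrix."""
    return np.array([M[0, 0] - 1, M[0, 1], M[0, 2],
                     M[1, 0], M[1, 1] - 1, M[1, 2]])

# Compositional update p <- p o^-1 dp*, i.e. params( inv(M(dp*)) @ M(p) ).
p = np.array([0.05, 0.01, 2.0, -0.02, 0.03, -1.0])
dp_star = np.array([0.01, 0.0, 0.5, 0.0, -0.01, 0.2])
p_new = params(np.linalg.inv(M_tilde(dp_star)) @ M_tilde(p))

# Check on a test point: the updated warp equals inv-warp(dp*) after warp(p).
x = np.array([3.0, 4.0, 1.0])
assert np.allclose(M_tilde(p_new) @ x,
                   np.linalg.inv(M_tilde(dp_star)) @ (M_tilde(p) @ x))
```

Because affine warps in homogeneous form are closed under multiplication and inversion, the composed parameters always exist.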

Page 47:

Relating Δp and Δp*

• Specifically for an affine warp one can show,

M̃(Δp*)⁻¹ M̃(p) x̃ = x̃

where: x′ = W(x;p), x = W(x;0), update 0 + Δp*

Page 48:

Relating Δp and Δp*

• Specifically for an affine warp one can show,

M̃(Δp*)⁻¹ M̃(p) x̃ = x̃

where: x′ = W(x;p), x = W(x;0), update p ∘⁻¹ Δp*

Page 49:

Relating Δp and Δp*

• Specifically for an affine warp one can show,

M̃(Δp*)⁻¹ M̃(p) x̃ = x̃

also, M̃(p + Δp) x̃ = x̃

where: x′ = W(x;p), x = W(x;0), updates p ∘⁻¹ Δp* and p + Δp

Page 50:

Relating Δp and Δp*

• Specifically for an affine warp one can show,

M̃(Δp*)⁻¹ M̃(p) x̃ = x̃

also, M̃(p + Δp) x̃ = x̃

therefore, M̃(p + Δp) = M̃(Δp*)⁻¹ M̃(p)

where: x′ = W(x;p), x = W(x;0)

Page 51:

Relating Δp and Δp*

• Extending the affine warp to the general form,

M̃(Δp*)⁻¹ M̃(p) x̃ = W(x; p ∘⁻¹ Δp*)

where: x′ = W(x;p), x = W(x;0), update p ∘⁻¹ Δp*

Page 52:

Inverse Composition Algorithm

• Actual algorithm is just the application of the following steps,

Step 1: argmin_Δp || T(0) + ∂T(0)/∂p^T Δp − I(p) ||₂²

Step 2: p ← p ∘⁻¹ Δp

keep applying steps until Δp converges.

Page 53:

Inverse Composition Algorithm

• Actual algorithm is just the application of the following steps,

Step 1: Δp = H_T⁻¹ J_T^T [I(p) − T(0)]

Step 2: p ← p ∘⁻¹ Δp

keep applying steps until Δp converges.

where J_T = ∂T(0)/∂p^T,  H_T = J_T^T J_T

Page 54:

Inverse Composition Algorithm

• Actual algorithm is just the application of the following steps,

Step 1: Δp = H_T⁻¹ J_T^T [I(p) − T(0)]   (“Static”)

Step 2: p ← p ∘⁻¹ Δp   (“Inverse Composition”)

keep applying steps until Δp converges.

where J_T = ∂T(0)/∂p^T,  H_T = J_T^T J_T
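The efficiency gain over the forward algorithm can be sketched with synthetic stand-ins for J_T and T(0): the template Jacobian, pseudo-Hessian, and its inverse are computed once, and only the warped image changes per iteration:

```python
import numpy as np

rng = np.random.default_rng(1)
N, P = 200, 6
J_T = rng.standard_normal((N, P))   # dT(0)/dp^T depends only on the template,
H_T = J_T.T @ J_T                   # so J_T, H_T, and inv(H_T) are computed ONCE
H_T_inv = np.linalg.inv(H_T)
T0 = rng.standard_normal(N)         # stand-in template pixels T(0)

def ic_step(I_p):
    """One inverse-compositional iteration: only I(p) changes per iteration."""
    return H_T_inv @ (J_T.T @ (I_p - T0))

# On a synthetic image displaced by 0.01 in every parameter, one step recovers it.
dp = ic_step(T0 + J_T @ np.full(P, 0.01))
assert np.allclose(dp, 0.01)
```

Each iteration is then just two matrix-vector products, with all O(N P²) work moved outside the loop.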

Page 55:

Linearizing the Template

∂T(0)/∂p^T = blockdiag( ∂T(x1)/∂x1^T, …, ∂T(xN)/∂xN^T ) [ ∂W(x1;0)/∂p^T ; … ; ∂W(xN;0)/∂p^T ]   (“Static”)

∂I(p)/∂p^T = blockdiag( ∂I{W(x1;p)}/∂W(x1;p)^T, …, ∂I{W(xN;p)}/∂W(xN;p)^T ) [ ∂W(x1;p)/∂p^T ; … ; ∂W(xN;p)/∂p^T ]   (“Constantly changing”)


Page 57:

Rules for Compositional Warps

• Not just applicable to affine warps; can be applied to any set of warps that:

  • have an identity warp (i.e. W(x;0) = x)

  • are closed under composition, W(W(x;p1);p2) → W(x;p2 ∘ p1)

for example a homography, H3 = H2 H1, where det(H) = 1

Page 58:

Inverse Compositional Performance

[Excerpt from the Bit-Planes paper:]

Figure 9. LBP/Census transform computation ‘C’ pseudo-code at a 3 × 3 neighborhood, where p is a pointer to the desired pixel location and step is the number of elements per image row:

((*(p-1-step) >= *p) & 0x01) |
((*(p  -step) >= *p) & 0x02) |
((*(p+1-step) >= *p) & 0x04) |
((*(p-1     ) >= *p) & 0x08) |
((*(p+1     ) >= *p) & 0x10) |
((*(p-1+step) >= *p) & 0x20) |
((*(p  +step) >= *p) & 0x40) |
((*(p+1+step) >= *p) & 0x80) ;

Table 2. Planar template tracking runtime in frames per second (fps) on a single core Intel Core i7 2.8 GHz:

Template area | Intensity | BitPlanes | DF
75 × 57       | 650       | 460       | 370
150 × 115     | 360       | 170       | 160
300 × 230     | 140       | 90        | 60
640 × 460     | 45        | 35        | 20

H. Alismail, B. Browning, S. Lucey “Bit-Planes: Dense Subpixel Alignment of Binary Descriptors.” ACCV 2016.

Page 59:

Today

• Review - LK Algorithm

• Inverse Composition

• Robust Extensions

Page 60:

Robust Error Functions

• Least squares criterion is not robust to outliers (e.g. occlusion, illumination)

• For example, the two outliers here cause the fitted line to be quite wrong.

• Robust error functions can be helpful here.


Page 62:

Robust Error Functions

argmin_Δp η{ T(0) + ∂T(0)/∂p^T Δp − I(p) }

(plot: robust error η{x} vs. x, with L2 and L1 for comparison)

for example,

η{x} = ||x||₁ → L1 error
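One standard way to minimize an L1-style error with least-squares machinery is iteratively reweighted least squares (IRLS); a sketch on a robust line fit with two gross outliers (all data values below are illustrative, and this is not the specific scheme used in the lecture's tracker):

```python
import numpy as np

# IRLS sketch: an L1-style robust line fit on data with two gross outliers.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 30)
y = 2.0 * t + 1.0 + 0.01 * rng.standard_normal(30)
y[3], y[17] = 10.0, -8.0                      # outliers

A = np.stack([t, np.ones_like(t)], axis=1)
w = np.ones(30)
for _ in range(50):
    sol = np.linalg.lstsq(A * w[:, None], y * w, rcond=None)[0]
    r = y - A @ sol
    w = 1.0 / np.sqrt(np.maximum(np.abs(r), 1e-6))  # weights w_i^2 = 1/|r_i|

# The robust fit stays near slope 2, intercept 1 despite the outliers.
assert abs(sol[0] - 2.0) < 0.1 and abs(sol[1] - 1.0) < 0.1
```

A plain least-squares fit on the same data is pulled toward the outliers, which is exactly the failure mode pictured on the previous slides.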

Page 63:

Robust Error Functions

• Computational inefficiencies creep into inverse composition when one departs from the classical L2 distance.

• Much better results for fast tracking have been achieved recently using robust representations (e.g. Bristow et al.).

Page 64:

Robust Representations

1. Compute image gradients
2. Pool into local histograms
3. Concatenate histograms
4. Normalize histograms

φ : R^N → R^{N×K}
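The histogram pipeline above is one choice of φ; another, in the spirit of the Bit-Planes descriptor discussed earlier, maps each pixel to K = 8 binary channels via neighbor comparisons. A sketch (this NumPy version is illustrative, not the paper's implementation):

```python
import numpy as np

def bitplanes(img):
    """phi: map an image to K = 8 binary channels by comparing each interior
    pixel with its eight neighbors (Census/LBP-style)."""
    h, w = img.shape
    core = img[1:-1, 1:-1]
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]
    chans = [(img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx] >= core)
             for dy, dx in shifts]
    return np.stack(chans, axis=-1).astype(np.float32)

phi = bitplanes(np.random.default_rng(0).random((32, 32)))
print(phi.shape)  # (30, 30, 8): one binary channel per neighbor comparison
```

Because each channel is a local ordinal comparison, the representation is invariant to monotonic illumination changes, which is what buys the robustness.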

Page 65:

Robust IC Algorithm

• Actual algorithm is just the application of the following steps,

Step 1: argmin_Δp || φ{T(0)} + ∂φ{T(0)}/∂p^T Δp − φ{I(p)} ||₂²

Step 2: p ← p ∘⁻¹ Δp

keep applying steps until Δp converges.

Page 66: Lecture 12 - Efficient & Robust LK for Mobile Vision16623.courses.cs.cmu.edu/slides/Lecture_12.pdf000x y 1 W (x; p)= 1 p 7 x + p 8 y + p 9 1 p 1 p 2 p 3 p 4 1 p 5 p 6 2 4 x y 1 3 5

Descriptor Performance648649650651652653654655656657658659660661662663664665666667668669670671672673674675676677678679680681682683684685686687688689690691692693694695696697698699700701

702703704705706707708709710711712713714715716717718719720721722723724725726727728729730731732733734735736737738739740741742743744745746747748749750751752753754755

CVPR#****

CVPR#****

CVPR 2016 Submission #****. CONFIDENTIAL REVIEW COPY. DO NOT DISTRIBUTE.

(a) Lighting change.

(b) Out-of-plane rotation.Figure 10. Tracking results using the Bricks dataset [19]. The top row of each figure shows the performance of BitPlanes, while the bottomrow shows classical intensity-based LK. Additional results and videos are provided in the supplementary materials.

[Figure 11 plot: success fraction (y-axis, 0-1) vs. overlap fraction (x-axis, 0.5-1); curves for BP, ECC, DIC-1, DIC-2, DF, GC, LK]

Figure 11. Fraction of successfully tracked frames as a function of the overlap area given the ground truth. BitPlanes and DF perform better than other methods. However, in Table 3 we see that BitPlanes performs better with difficult textures and perspective distortion.

multi-channel representations sharing our goal of improved robustness to illumination change. DF's performance is reported to be superior to Distribution Fields [13]. Based on our experiments in Section 3, BitPlanes' performance improves on DF, especially under out-of-plane rotations that cause perspective distortion of the image.

Gradient-based alignment in a multi-channel representation has also been applied to matching across object categories using SIFT [9]. The authors estimate the gradient of the objective using linear regression, which, when applied to each channel separately, reduces to a standard convolution. Xiong and de la Torre [48] tackle the problem of descriptor alignment differently. Using SDM, the authors learn the descent maps offline and demonstrate tracking results with HOG and SIFT, which are expensive to compute densely. In contrast, our approach does not require an offline learning step and is shown to work well with simple binary descriptors. We note that Ren et al. [38] have recently applied the SDM with LBP to the problem of face alignment, which results in a fast algorithm during testing. However, the offline training step remains present.

The work of Antonakos et al. [1] is closest to ours. The authors use a multi-channel representation with various feature descriptors to improve the robustness of face alignment using AAMs. The authors experiment with the histogrammed version of LBP [37], which is not binary. In addition to the use of bi-


“Raw Pixels”

“BitPlanes”


    descriptor = ((*(p - 1 - step) >= *p) << 0) |
                 ((*(p     - step) >= *p) << 1) |
                 ((*(p + 1 - step) >= *p) << 2) |
                 ((*(p - 1       ) >= *p) << 3) |
                 ((*(p + 1       ) >= *p) << 4) |
                 ((*(p - 1 + step) >= *p) << 5) |
                 ((*(p     + step) >= *p) << 6) |
                 ((*(p + 1 + step) >= *p) << 7);

Figure 9. LBP/Census transform computation 'C' pseudo-code at a 3×3 neighborhood, where p is a pointer to the desired pixel location and step is the number of elements per image row.
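Expanded into a self-contained routine over a whole 8-bit image: the bit layout follows Fig. 9, while skipping the one-pixel border is an implementation choice of this sketch, not something the paper specifies.

```c
#include <stdint.h>

/* Census/LBP transform: each interior pixel of an 8-bit image is
   replaced by an 8-bit code, one bit per 3x3 neighbour that compares
   >= the centre. Bit order matches Fig. 9 (bit 0 = top-left, ...,
   bit 7 = bottom-right). `step` is the number of elements per row. */
void census_transform(const uint8_t *src, uint8_t *dst,
                      int w, int h, int step) {
    for (int y = 1; y < h - 1; ++y) {
        for (int x = 1; x < w - 1; ++x) {
            const uint8_t *p = src + y * step + x;
            uint8_t c = 0;
            c |= (uint8_t)((*(p - 1 - step) >= *p) << 0);
            c |= (uint8_t)((*(p     - step) >= *p) << 1);
            c |= (uint8_t)((*(p + 1 - step) >= *p) << 2);
            c |= (uint8_t)((*(p - 1       ) >= *p) << 3);
            c |= (uint8_t)((*(p + 1       ) >= *p) << 4);
            c |= (uint8_t)((*(p - 1 + step) >= *p) << 5);
            c |= (uint8_t)((*(p     + step) >= *p) << 6);
            c |= (uint8_t)((*(p + 1 + step) >= *p) << 7);
            dst[y * step + x] = c;
        }
    }
}
```

Each of the eight bits, taken over the whole image, is one of the binary "bit-plane" channels that the paper aligns; the per-pixel comparisons are branch-free and map directly onto SIMD compare instructions.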

Algorithm    # parameters   # channels
BP (ours)          8             8
ECC [17]           8             1
DIC-1 [5]         10             1
DIC-2 [5]         20             3
DF [13]            8             5
GC [10]            8             3
GM                 8             2
LK                 8             1

Table 1. Algorithms compared in this work. The number of parameters indicates the DOF of the state vector, which is 8 for a plane, in addition to any photometric parameters. We use the authors' code for ECC and DIC.

latency. Pseudo-code to compute the LBP is shown in Fig. 9, which can be easily translated to SIMD instructions on the CPU or to other parallel computing platforms such as GPUs and FPGAs [6, 24, 7].

We compare the performance of our algorithm against a variety of template tracking algorithms. The first is the enhanced correlation coefficient (ECC) by Evangelidis and Psarakis [17]; ECC is an example of an intrinsically robust cost function, normalized correlation, which is invariant up to an affine illumination change. An example of estimating illumination parameters alongside the motion parameters is the Dual Inverse Compositional (DIC) algorithm by Bartoli [5]. We use two variations of DIC: the first uses the gain+bias model on grayscale images, denoted DIC-1; the second uses a full affine lighting model on the RGB image, denoted DIC-2. We also compare against a recently published descriptor-based method by Crivellaro and Lepetit [13] called Descriptor Fields (DF). Finally, we include baseline results from raw intensity (LK), improved LK with the Gradient Constraint (GC) [10], and the Gradient Magnitude (GM). We report two numbers in the evaluation. The first is the percentage of successful tracking, measured by the overlap between the warped image given each algorithm's estimate, denoted A, and the warped image given the ground truth, denoted B. The overlap is computed as o = (A ∩ B)/(A ∪ B), where a value of o > 1/2 indicates successful tracking. Secondly, since we are interested in sub-pixel accuracy, we also show the mean percentage of overlap

Template area   Intensity   BitPlanes   DF [13]
75×57              650         460        370
150×115            360         170        160
300×230            140          90         60
640×460             45          35         20

Table 2. Planar template tracking runtime in frames per second on a single-core Intel Core i7 @ 2.8 GHz. DF is implemented with first-order derivatives [13] (five channels). BitPlanes timings are shown with eight binary channels. All algorithms have been optimized equally by us. Using the authors' code for DF, we are unable to run faster than 2 Hz with a densely sampled grid of points at resolution 300×230.

across all frames, given by m = (1/n) Σ_{i=1}^{n} o_i, where n is the number of frames in each sequence.

Results are compared for three types of geometric and

photometric variations. The first is an out-of-plane rotation, which induces perspective change, shown in Fig. 10b. The second is a dynamic lighting change, where the image is stationary but illuminated with nonlinearly varying illumination. The third is a static lighting change, which uses the same data as the dynamically changing light, but with the lighting transition phase removed.

A summary of the compared algorithms is provided in Table 1, and evaluation results are shown in Table 3 and Fig. 11. Based on our experimentation, the top-performing methods are the ones that employ a Descriptor Constancy assumption, namely BitPlanes and DF. However, BitPlanes is more efficient than DF, and it performed significantly better on the out-of-plane rotation, which induces perspective change in the image; example out-of-plane rotation data is shown in Fig. 10b. In fact, all tested algorithms, with the exception of BitPlanes, performed poorly on this data. Algorithms that use a robust function (ECC) and those that also attempt to estimate illumination (DIC) performed well, but fell behind Descriptor Constancy and the simple gradient constraint.

Finally, the wood texture under dynamic lighting change presented a challenge for all algorithms with the exception of BitPlanes, which succeeds on all frames with a high overlap accuracy of ≈ 99% given the ground truth.

4. Related Work

The simplest form of multi-channel LK is RGB data registration [4], which usually provides better accuracy in comparison to using grayscale [26]. Lucey et al. [33] used a multi-channel representation with Gabor features, but Gabor features are not invariant to multiplicative illumination change. Distribution Fields [41] (where each channel is a bin from a local intensity histogram) and Descriptor Fields (DF) [13] (where each channel is a nonlinear function applied to first- and second-order image gradients) are other examples of


Planar template tracking runtime in frames per second on a single-core Intel Core i7 @ 2.8 GHz.

Page 67: Lecture 12 - Efficient & Robust LK for Mobile Vision (16623.courses.cs.cmu.edu/slides/Lecture_12.pdf)

More to read…

• Baker & Matthews, “Lucas-Kanade 20 Years On: A Unifying Framework”, IJCV 2004.

• Bristow & Lucey, “In Defense of Gradient-Based Alignment on Densely Sampled Sparse Features”, Springer Book on Dense Registration Methods 2015.

• Baker, Datta & Kanade “Parameterizing Homographies”, CMU Tech Report 06-11.

In Defense of Gradient-Based Alignment

on Densely Sampled Sparse Features

Hilton Bristow and Simon Lucey

Abstract. In this chapter, we explore the surprising result that gradient-based continuous optimization methods perform well for the alignment of image/object models when using densely sampled sparse features (HOG, dense SIFT, etc.). Gradient-based approaches for image/object alignment have many desirable properties: inference is typically fast and exact, and diverse constraints can be imposed on the motion of points. However, the presumption that gradients predicted on sparse features would be poor estimators of the true descent direction has meant that gradient-based optimization is often overlooked in favour of graph-based optimization. We show that this intuition is only partly true: sparse features are indeed poor predictors of the error surface, but this has no impact on the actual alignment performance. In fact, for general object categories that exhibit large geometric and appearance variation, sparse features are integral to achieving any convergence whatsoever. How the descent directions are predicted becomes an important consideration for these descriptors. We explore a number of strategies for estimating gradients, and show that estimating gradients via regression in a manner that explicitly handles outliers improves alignment performance substantially. To illustrate the general applicability of gradient-based methods to the alignment of challenging object categories, we perform unsupervised ensemble alignment on a series of non-rigid animal classes from ImageNet.

Hilton Bristow, Queensland University of Technology, Australia. e-mail: [email protected]

Simon Lucey, The Robotics Institute, Carnegie Mellon University, USA. e-mail: [email protected]


Parameterizing Homographies

CMU-RI-TR-06-11

Simon Baker, Ankur Datta, and Takeo Kanade

The Robotics Institute, Carnegie Mellon University
5000 Forbes Avenue, Pittsburgh, PA 15213

The motion of a plane can be described by a homography. We study how to parameterize homographies to maximize plane estimation performance. We compare the usual 3×3 matrix parameterization with a parameterization that combines 4 fixed points in one of the images with 4 variable points in the other image. We empirically show that this 4-point parameterization is far superior. We also compare both parameterizations with a variety of direct parameterizations. In the case of unknown relative orientation, we compare with a direct parameterization of the plane equation, and the rotation and translation of the camera(s). We show that the direct parameterization is both less accurate and far less robust than the 4-point parameterization. We explain the poor performance using a measure of independence of the Jacobian images. In the fully calibrated setting, the direct parameterization just consists of the 3 parameters of the plane equation. We show that this parameterization is far more robust than the 4-point parameterization, but only approximately as accurate. In the case of a moving stereo rig, we find that the direct parameterization of the plane equation, camera rotation, and translation performs very well, both in terms of accuracy and robustness. This is in contrast to the corresponding direct parameterization in the case of unknown relative orientation. Finally, we illustrate the use of plane estimation in 2 automotive applications.