Supplementary material for the paper: Discriminative ...openaccess.thecvf.com/content_cvpr_2017/supplemental/Lukezic...c p DFMh)+^lH(^h c p DFMh) + kh^ c p ... Rank 0.0 0.1 0.2 0.3

Supplementary material for the paper:Discriminative Correlation Filter with Channel and Spatial Reliability

DCF-CSR

Alan Lukezic1, Tomas Vojır2, Luka Cehovin Zajc1, Jirı Matas2 and Matej Kristan1

1Faculty of Computer and Information Science, University of Ljubljana, Slovenia2Faculty of Electrical Engineering, Czech Technical University in Prague, Czech Republic

{alan.lukezic, luka.cehovin, matej.kristan}@fri.uni-lj.si{vojirtom, matas}@cmp.felk.cvut.cz

Abstract

This is the supplementary material for the paper ”Dis-criminative Correlation Filter with Channel and Spatial Re-liability” submitted to the CVPR 2017. Due to spatial con-straints, parts not crucial for understanding the DCF-CSRtracker formulation, but helpful for gaining insights, weremoved here.

1. Derivation of the augmented Lagrangianminimizer

This section provides the complete derivation of theclosed-form solutions for the relations (9,10) in the submit-ted paper [3]. The augmented Lagrangian from Equation(5) in [3] is

L(hc,h, l) = ‖hHc diag(f)− g‖2 + λ

2‖hm‖2 + (1)

[lH(hc − hm) + lH(hc − hm)] + µ‖hc − hm‖2,

with hm = (m�h). For the purposes of derivation we willrewrite (1) into a fully vectorized form

L(hc,h, l) = ‖hHc diag(f)− g‖2 + λ

2‖hm‖2+ (2)[

lH(hc −√DFMh) + lH(hc −

√DFMh)

]+

µ‖hc −√DFMh‖2,

where F denotes D × D orthonormal matrix of Fouriercoefficients, such that the Fourier transform is defined asx = F(x) =

√DFx and M = diag(m). For clearer rep-

resentation we denote the four terms in the summation (2)as

L(hc,h, l) = L1 + L2 + L3 + L4, (3)

where

L1 =(hHc diag(f)− g

)(hHc diag(f)− g

)T, (4)

L2 =λ

2‖hm‖2, (5)

L3 = lH(hc −√DFMh) + lH(hc −

√DFMh), (6)

L4 = µ‖hc −√DFMh‖2. (7)

Minimization of (Equation 5 in [3]) is an iterative processat which the following minimizations are required:

hoptc = argmin

hc

L(hc,h, l), (8)

hopt = argminh

L(hoptc ,h, l). (9)

Minimization w.r.t. to hc is derived by finding hc at whichthe complex gradient of the augmented Lagrangian van-ishes, i.e.,

∇hcL ≡ 0, (10)

∇hcL1 +∇hc

L2 +∇hcL3 +∇hc

L4 ≡ 0. (11)

The partial complex gradients are:

∇hcL1 = (12)

=∂

∂hc

[(hHc diag(f)− g

)(hHc diag(f)− g

)T]=

=∂

∂hc

[hHc diag(f)diag(f)H hc − hHc diag(f)gH−

gdiag(f)H hc + ggH]=

= diag(f)diag(f)H hc − diag(f)gH ,

1

∇hcL2 = 0, (13)

∇hcL3 = (14)

=∂

∂hc

[lH

(hc −

√DFMh

)+ lH

(hc −

√DFMh

)]=

=∂

∂hc

[lH hc − lH

√DFMh+ lT hc − l

H√DFMh

]=

= l,

∇hcL4 = (15)

=∂

∂hc

[µ(hc −

√DFMh

)(hc −

√DFMh

)T]=

=∂

∂hc

[µ(hch

Hc − hc

√DhHMFH−

√DFMhhHc +DFMhhHMFH

)]=

= µhc − µ√DFMh.

Note that√DFMh = hm according to our original defini-

tion of hm. Plugging (12-15) into (11) yields

diag(f)diag(f)H hc − diag(f)gH + lµhc − µhm = 0,(16)

hc =diag(f)gH + µhm − l

diag(f)diag(f)H + µ,

which can be rewritten into

hc =f � g + µhm − l

f � f + µ. (17)

Next we derive the closed-form solution of (9). The optimalh is obtained when the complex gradient w.r.t. h vanishes,i.e.,

∇hL ≡ 0 (18)∇hL1 +∇hL2 +∇hL3 +∇hL4 ≡ 0. (19)

The partial gradients are

∇hL1 = 0, (20)

∇hL2 = (21)

=∂

∂h

[λ

2(Mh)

T(Mh)

]=

∂

∂h

[λ

2hHMMh

].

Since we defined mask m as real-valued binary mask, theproduct MM can be simplified into M and the result for∇hL2 is

∇hL2 =λ

2Mh. (22)

The remaining gradients are as follows:

∇hL3 = (23)

=∂

∂h

[lH

(hc −

√DFMh

)+ lH

(hc −

√DFMh

)]=

=∂

∂h

[lH hc − lH

√DFMh+ lT hc − l

H√DFMh

]=

= −√DMFH l,

∇hL4 = (24)

=∂

∂h

[µ(hc −

√DFMh

)T(hc −

√DFMh

)]=

=∂

∂h

[µ(hHc hc − hHc

√DFMh−

√DhHMFH hc +DhHMh

)]=

= −µ√DMFH hc + µDMh.

Plugging (20-24) into (19) yields

λ

2Mh−

√DMFH l− µ

√DMFH hc + µDMh = 0,

(25)

Mh = M

√DFH (l+ µhc)

λ2 + µD

.

Using the definition of the inverse Fourier transform, i.e.,F−1(x) = 1√

DFH x, (25) can be rewritten into

m� h = m� F−1(l+ µhc)λ2D + µ

. (26)

The values in m are either zero or one. Elements in hthat correspond to the zeros in m can in principle not berecovered from (26) since this would result in division byzero. But our initial definition of the problem was to seeksolutions for the filter that satisfies the following relationh ≡ h �m. This means the values corresponding to ze-ros in m should be zero in h. Thus the proximal solutionto (26) is

h = m� F−1(l+ µhc)λ2D + µ

. (27)

2. Convergence of the derived optimizationThe optimization of augmented Lagrangian (Algorithm

1 in the paper) is central to the proposed tracking algorithm.At each tracking iteration, we constrain the optimization torun for four iterations since the major drop in filter erroris achieved within the first few steps. For convenience, wevisualize here the convergence profile.

Figure 1 shows the average squared difference betweenresult of the correlation of the the filter constrained by themask function and the ideal output. This graph was ob-tained by averaging 60 examples of initializing a filter on atarget (one per VOT2015 sequence) and scaling each to aninterval between zero and one. It is clear that the error dropsby 80% within the first few iterations. The reduction fromthe remaining iterations is due to reduction of the errors inthe sidelobes of the filter.

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 10 20 30 40Number of iterations

Cost

Figure 1: Convergence of our optimization method. Thegraph is averaged over the initializations in 60 sequences,while for each sequence it is normalized to have maximumcost at 1 and minimum at 0. Due to the different absolutecost values for each optimization, normalization is used todemonstrate only how cost changes during the optimization.

3. Time analysisThe average speed of our tracker measured on the

VOT2016 dataset is approximately 13 frames-per-second or77 milliseconds per-frame. Figure 2 shows the processingtime required by each step of the SCR-DCF. A tracking it-eration is divided into two steps: (i) target localization and(ii) the visual model update. Target localization takes inaverage 35 milliseconds at each frame and is composed oftwo sub-steps: estimation of object translation (23ms) andscale change estimation (12ms). The visual model updatestep takes on average 42 milliseconds. It consists of threesub-steps: spatial reliability map construction (16ms), filterupdate (12ms) and scale model update (14ms). Filter op-timization, which is part of the filter update step, takes onaverage 7 milliseconds.

Localization

Update

Translation23ms

Scale estimation12ms

Reliability map 16ms

Filter update 12ms

Scale update14ms

Figure 2: A single iteration processing time decomposedacross the main steps of the SCR-DCF.

4. VOT benchmarksIn the paper [3] Figure 7 shows the results of ex-

pected average overlap measure for VOT2015 [2] andVOT2016 [1] challenge. For better clarity we showed onlytop-performing trackers. Full results are presented here.The VOT2015 [2] challenge results with all 61 trackers andthe CSR-DCF are shown in Figure 3. Figure 4 shows the re-sults for 70 trackers and CSR-DCF on VOT 2016 [1] chal-lenge.

References[1] M. Kristan, A. Leonardis, J. Matas, M. Felsberg,

R. Pflugfelder, L. Cehovin, T. Vojir, G. Hager, A. Lukezic,and G. et al. Fernandez. The visual object tracking vot2016challenge results. In Proc. European Conf. Computer Vision,2016. 3, 4

[2] M. Kristan, J. Matas, A. Leonardis, M. Felsberg, L. Cehovin,G. Fernandez, T. Vojir, G. Hager, G. Nebehay, and R. et al.Pflugfelder. The visual object tracking vot2015 challenge re-sults. In Int. Conf. Computer Vision, 2015. 3, 4

[3] A. Lukezic, T. Vojır, L. Cehovin Zajc, J. Matas, and M. Kris-tan. Discriminative correlation filter with channel and spatialreliability. In Comp. Vis. Patt. Recognition, 2017. 1, 3

1102030405060Rank

0.0

0.1

0.2

0.3

0.4

Expe

cted

Ave

rage

Ove

rlapqwe

r

t

y

uio

a

s

d

f

ghj

k

l;

2)2!

2@

2#

2$

2%

2^

2&2*

2(3)

3!3@

3#

3$

3%

3^3&3*

3(4)

4!

4@4#

4$

4%4^4&

4*

4(

5)5!

5@

5#5$5%

5^5&5*

5(

6)

6!6@

ACT3%amt6@AOGTracker2#ASMS2)baseline3@bdf4@cmil3$CMT5^CSR-DCFq

CT5&DATgDeepSRDCFwDFT4(DSST3*dtracker2&EBTefct4^FoT5)

FragTrack5*ggt2(HMMTxD;HT4%IVT5$KCF24)kcfmtsa3^kcfdp4#kcfv23#

L1APG5@LDPtLGT3&loftlite6)LT_FLO5!matflow4&MCTlMEEMhMIL3(

mkcfplus2!muster3)mvcft2%ncc6!nsamfiOAB5#OACFkPKLTF4$rajssca

RobStruckjs3trackerssamf2*SCBT4*scebtusKCF4!sme2$SODLTfsPSTy

srat2^srdcfrSTC5%struckosumshiftdTGPR3!tric2@zhang5(

Figure 3: Expected average overlap plot for full VOT2015 [2] (left) benchmark with the proposed CSR-DCF tracker. Legendis shown on the right side.

110203040506070Rank

0.0

0.1

0.2

0.3

0.4

Expe

cted

Ave

rage

Ove

rlapqw

e

rt

yu

io

a

s

d

fg

h

jk

l

;2)

2!2@

2#

2$

2%

2^

2&2*

2(

3)3!

3@

3#

3$3%

3^3&

3*3(

4)

4!

4@

4#

4$

4%4^

4&

4*

4(5)

5!

5@

5#5$

5%

5^5&5*

5(6)

6!

6@

6#

6$

6%6^6&

6*

6(7)7!

ACT4&ANT3$ARTDSST4(ASMS3#BDF6@BST3^CCCT2&CCOTwCDTT5!CMT7)ColorKCF2^CSR-DCFq

CTF6*DAT3@DDCudeepMKCF2#DeepSRDCFgDFST5$DFT6!DNTsDPT2!DPTG4^DSST20144$EBTi

FCFkFCT6)FoT5*FRT6&GCF3)GGTv22)HCF2(HMMTxD2$HT5%IVT6$KCF20143*KCFSMXPC3!

LGT4*LoFTLite6(LT_FLO6#MAD3%MatFlow5@Matrioska6%MDNetNjMIL5)MLDFtMWCF4#ncc7!NSAMF2%

OEST3(PKLTF5^RFDCF2;SAMF20144!SCT44)SHCThSiamAN2@SiamRNfsKCF5#SMPR5&SODLT2*SRBTo

SRDCFlSSATrSSKCFdStapleySTAPLEpaSTC6^STRUCK20145(SWCF4@TCNNeTGPR4%TricTRACK3&

Figure 4: Expected average overlap plot for full VOT2016 [1] (left) benchmark with the proposed CSR-DCF tracker. Legendis shown on the right side.

Documents

Supplementary material for the paper: Discriminative ...openaccess.thecvf.com/content_cvpr_2017/supplemental/Lukezic...c p DFMh)+^lH(^h c p DFMh) + kh^ c p ... Rank 0.0 0.1 0.2 0.3