Upload
habao
View
219
Download
1
Embed Size (px)
Citation preview
Supplementary material for the paper:Discriminative Correlation Filter with Channel and Spatial Reliability
DCF-CSR
Alan Lukezic1, Tomas Vojır2, Luka Cehovin Zajc1, Jirı Matas2 and Matej Kristan1
1Faculty of Computer and Information Science, University of Ljubljana, Slovenia2Faculty of Electrical Engineering, Czech Technical University in Prague, Czech Republic
{alan.lukezic, luka.cehovin, matej.kristan}@fri.uni-lj.si{vojirtom, matas}@cmp.felk.cvut.cz
Abstract
This is the supplementary material for the paper ”Dis-criminative Correlation Filter with Channel and Spatial Re-liability” submitted to the CVPR 2017. Due to spatial con-straints, parts not crucial for understanding the DCF-CSRtracker formulation, but helpful for gaining insights, weremoved here.
1. Derivation of the augmented Lagrangianminimizer
This section provides the complete derivation of theclosed-form solutions for the relations (9,10) in the submit-ted paper [3]. The augmented Lagrangian from Equation(5) in [3] is
L(hc,h, l) = ‖hHc diag(f)− g‖2 + λ
2‖hm‖2 + (1)
[lH(hc − hm) + lH(hc − hm)] + µ‖hc − hm‖2,
with hm = (m�h). For the purposes of derivation we willrewrite (1) into a fully vectorized form
L(hc,h, l) = ‖hHc diag(f)− g‖2 + λ
2‖hm‖2+ (2)[
lH(hc −√DFMh) + lH(hc −
√DFMh)
]+
µ‖hc −√DFMh‖2,
where F denotes D × D orthonormal matrix of Fouriercoefficients, such that the Fourier transform is defined asx = F(x) =
√DFx and M = diag(m). For clearer rep-
resentation we denote the four terms in the summation (2)as
L(hc,h, l) = L1 + L2 + L3 + L4, (3)
where
L1 =(hHc diag(f)− g
)(hHc diag(f)− g
)T, (4)
L2 =λ
2‖hm‖2, (5)
L3 = lH(hc −√DFMh) + lH(hc −
√DFMh), (6)
L4 = µ‖hc −√DFMh‖2. (7)
Minimization of (Equation 5 in [3]) is an iterative processat which the following minimizations are required:
hoptc = argmin
hc
L(hc,h, l), (8)
hopt = argminh
L(hoptc ,h, l). (9)
Minimization w.r.t. to hc is derived by finding hc at whichthe complex gradient of the augmented Lagrangian van-ishes, i.e.,
∇hcL ≡ 0, (10)
∇hcL1 +∇hc
L2 +∇hcL3 +∇hc
L4 ≡ 0. (11)
The partial complex gradients are:
∇hcL1 = (12)
=∂
∂hc
[(hHc diag(f)− g
)(hHc diag(f)− g
)T]=
=∂
∂hc
[hHc diag(f)diag(f)H hc − hHc diag(f)gH−
gdiag(f)H hc + ggH]=
= diag(f)diag(f)H hc − diag(f)gH ,
1
∇hcL2 = 0, (13)
∇hcL3 = (14)
=∂
∂hc
[lH
(hc −
√DFMh
)+ lH
(hc −
√DFMh
)]=
=∂
∂hc
[lH hc − lH
√DFMh+ lT hc − l
H√DFMh
]=
= l,
∇hcL4 = (15)
=∂
∂hc
[µ(hc −
√DFMh
)(hc −
√DFMh
)T]=
=∂
∂hc
[µ(hch
Hc − hc
√DhHMFH−
√DFMhhHc +DFMhhHMFH
)]=
= µhc − µ√DFMh.
Note that√DFMh = hm according to our original defini-
tion of hm. Plugging (12-15) into (11) yields
diag(f)diag(f)H hc − diag(f)gH + lµhc − µhm = 0,(16)
hc =diag(f)gH + µhm − l
diag(f)diag(f)H + µ,
which can be rewritten into
hc =f � g + µhm − l
f � f + µ. (17)
Next we derive the closed-form solution of (9). The optimalh is obtained when the complex gradient w.r.t. h vanishes,i.e.,
∇hL ≡ 0 (18)∇hL1 +∇hL2 +∇hL3 +∇hL4 ≡ 0. (19)
The partial gradients are
∇hL1 = 0, (20)
∇hL2 = (21)
=∂
∂h
[λ
2(Mh)
T(Mh)
]=
∂
∂h
[λ
2hHMMh
].
Since we defined mask m as real-valued binary mask, theproduct MM can be simplified into M and the result for∇hL2 is
∇hL2 =λ
2Mh. (22)
The remaining gradients are as follows:
∇hL3 = (23)
=∂
∂h
[lH
(hc −
√DFMh
)+ lH
(hc −
√DFMh
)]=
=∂
∂h
[lH hc − lH
√DFMh+ lT hc − l
H√DFMh
]=
= −√DMFH l,
∇hL4 = (24)
=∂
∂h
[µ(hc −
√DFMh
)T(hc −
√DFMh
)]=
=∂
∂h
[µ(hHc hc − hHc
√DFMh−
√DhHMFH hc +DhHMh
)]=
= −µ√DMFH hc + µDMh.
Plugging (20-24) into (19) yields
λ
2Mh−
√DMFH l− µ
√DMFH hc + µDMh = 0,
(25)
Mh = M
√DFH (l+ µhc)
λ2 + µD
.
Using the definition of the inverse Fourier transform, i.e.,F−1(x) = 1√
DFH x, (25) can be rewritten into
m� h = m� F−1(l+ µhc)λ2D + µ
. (26)
The values in m are either zero or one. Elements in hthat correspond to the zeros in m can in principle not berecovered from (26) since this would result in division byzero. But our initial definition of the problem was to seeksolutions for the filter that satisfies the following relationh ≡ h �m. This means the values corresponding to ze-ros in m should be zero in h. Thus the proximal solutionto (26) is
h = m� F−1(l+ µhc)λ2D + µ
. (27)
2. Convergence of the derived optimizationThe optimization of augmented Lagrangian (Algorithm
1 in the paper) is central to the proposed tracking algorithm.At each tracking iteration, we constrain the optimization torun for four iterations since the major drop in filter erroris achieved within the first few steps. For convenience, wevisualize here the convergence profile.
Figure 1 shows the average squared difference betweenresult of the correlation of the the filter constrained by themask function and the ideal output. This graph was ob-tained by averaging 60 examples of initializing a filter on atarget (one per VOT2015 sequence) and scaling each to aninterval between zero and one. It is clear that the error dropsby 80% within the first few iterations. The reduction fromthe remaining iterations is due to reduction of the errors inthe sidelobes of the filter.
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 10 20 30 40Number of iterations
Cost
Figure 1: Convergence of our optimization method. Thegraph is averaged over the initializations in 60 sequences,while for each sequence it is normalized to have maximumcost at 1 and minimum at 0. Due to the different absolutecost values for each optimization, normalization is used todemonstrate only how cost changes during the optimization.
3. Time analysisThe average speed of our tracker measured on the
VOT2016 dataset is approximately 13 frames-per-second or77 milliseconds per-frame. Figure 2 shows the processingtime required by each step of the SCR-DCF. A tracking it-eration is divided into two steps: (i) target localization and(ii) the visual model update. Target localization takes inaverage 35 milliseconds at each frame and is composed oftwo sub-steps: estimation of object translation (23ms) andscale change estimation (12ms). The visual model updatestep takes on average 42 milliseconds. It consists of threesub-steps: spatial reliability map construction (16ms), filterupdate (12ms) and scale model update (14ms). Filter op-timization, which is part of the filter update step, takes onaverage 7 milliseconds.
Localization
Update
Translation23ms
Scale estimation12ms
Reliability map 16ms
Filter update 12ms
Scale update14ms
Figure 2: A single iteration processing time decomposedacross the main steps of the SCR-DCF.
4. VOT benchmarksIn the paper [3] Figure 7 shows the results of ex-
pected average overlap measure for VOT2015 [2] andVOT2016 [1] challenge. For better clarity we showed onlytop-performing trackers. Full results are presented here.The VOT2015 [2] challenge results with all 61 trackers andthe CSR-DCF are shown in Figure 3. Figure 4 shows the re-sults for 70 trackers and CSR-DCF on VOT 2016 [1] chal-lenge.
References[1] M. Kristan, A. Leonardis, J. Matas, M. Felsberg,
R. Pflugfelder, L. Cehovin, T. Vojir, G. Hager, A. Lukezic,and G. et al. Fernandez. The visual object tracking vot2016challenge results. In Proc. European Conf. Computer Vision,2016. 3, 4
[2] M. Kristan, J. Matas, A. Leonardis, M. Felsberg, L. Cehovin,G. Fernandez, T. Vojir, G. Hager, G. Nebehay, and R. et al.Pflugfelder. The visual object tracking vot2015 challenge re-sults. In Int. Conf. Computer Vision, 2015. 3, 4
[3] A. Lukezic, T. Vojır, L. Cehovin Zajc, J. Matas, and M. Kris-tan. Discriminative correlation filter with channel and spatialreliability. In Comp. Vis. Patt. Recognition, 2017. 1, 3
1102030405060Rank
0.0
0.1
0.2
0.3
0.4
Expe
cted
Ave
rage
Ove
rlapqwe
r
t
y
uio
a
s
d
f
ghj
k
l;
2)2!
2@
2#
2$
2%
2^
2&2*
2(3)
3!3@
3#
3$
3%
3^3&3*
3(4)
4!
4@4#
4$
4%4^4&
4*
4(
5)5!
5@
5#5$5%
5^5&5*
5(
6)
6!6@
ACT3%amt6@AOGTracker2#ASMS2)baseline3@bdf4@cmil3$CMT5^CSR-DCFq
CT5&DATgDeepSRDCFwDFT4(DSST3*dtracker2&EBTefct4^FoT5)
FragTrack5*ggt2(HMMTxD;HT4%IVT5$KCF24)kcfmtsa3^kcfdp4#kcfv23#
L1APG5@LDPtLGT3&loftlite6)LT_FLO5!matflow4&MCTlMEEMhMIL3(
mkcfplus2!muster3)mvcft2%ncc6!nsamfiOAB5#OACFkPKLTF4$rajssca
RobStruckjs3trackerssamf2*SCBT4*scebtusKCF4!sme2$SODLTfsPSTy
srat2^srdcfrSTC5%struckosumshiftdTGPR3!tric2@zhang5(
Figure 3: Expected average overlap plot for full VOT2015 [2] (left) benchmark with the proposed CSR-DCF tracker. Legendis shown on the right side.
110203040506070Rank
0.0
0.1
0.2
0.3
0.4
Expe
cted
Ave
rage
Ove
rlapqw
e
rt
yu
io
a
s
d
fg
h
jk
l
;2)
2!2@
2#
2$
2%
2^
2&2*
2(
3)3!
3@
3#
3$3%
3^3&
3*3(
4)
4!
4@
4#
4$
4%4^
4&
4*
4(5)
5!
5@
5#5$
5%
5^5&5*
5(6)
6!
6@
6#
6$
6%6^6&
6*
6(7)7!
ACT4&ANT3$ARTDSST4(ASMS3#BDF6@BST3^CCCT2&CCOTwCDTT5!CMT7)ColorKCF2^CSR-DCFq
CTF6*DAT3@DDCudeepMKCF2#DeepSRDCFgDFST5$DFT6!DNTsDPT2!DPTG4^DSST20144$EBTi
FCFkFCT6)FoT5*FRT6&GCF3)GGTv22)HCF2(HMMTxD2$HT5%IVT6$KCF20143*KCFSMXPC3!
LGT4*LoFTLite6(LT_FLO6#MAD3%MatFlow5@Matrioska6%MDNetNjMIL5)MLDFtMWCF4#ncc7!NSAMF2%
OEST3(PKLTF5^RFDCF2;SAMF20144!SCT44)SHCThSiamAN2@SiamRNfsKCF5#SMPR5&SODLT2*SRBTo
SRDCFlSSATrSSKCFdStapleySTAPLEpaSTC6^STRUCK20145(SWCF4@TCNNeTGPR4%TricTRACK3&
Figure 4: Expected average overlap plot for full VOT2016 [1] (left) benchmark with the proposed CSR-DCF tracker. Legendis shown on the right side.