

Combining complementary kernels in complex visual categorization
{nicolas.thome, matthieu.cord}@lip6.fr (1), [email protected] (2)

(1) LIP6, UPMC Paris 6, France
(2) ETIS, ENSEA/CNRS/Université de Cergy-Pontoise, France

Contribution

Two proposed kernel-learning algorithms:

1. Hybrid strategy published in [1]: new MKL algorithm ⇒ non-sparse combination between different image modalities

[Figure: per-class plots (x-axis 0–1) for bicycle, car, chair, motorbike, person, plant, sofa, tv-monitor]

2. Unpublished work: learning a powered product of kernels, denoted Product Kernel Learning (PKL).

Hybrid MKL scheme

Non-sparse combination between different image modalities, still using an ℓ1 optimisation scheme

Idea: each descriptor ⇒ numerous kernels with varying parameters (e.g. σ for a Gaussian kernel)
• Each channel c: a set of M kernels K_{c,σ}

• ℓ1 MKL strategy to select the relevant σ parameter (SimpleMKL [2])

Adapted MKL problem formulation:

f(x) = Σ_{i=1}^{N_e} α_i y_i Σ_{c=1}^{N_c} Σ_{σ=σ_1}^{σ_M} β_{c,σ} k_{c,σ}(x, x_i) − b

Joint optimization is performed on the α_i (N_e parameters) and the β_{c,σ} (N_c × M parameters).

• Kernel parameter tuning and learning at the same time: an alternative to cross-validation (unlike [3]).
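The hybrid combination above can be sketched numerically as follows. This is a minimal sketch of the weighted kernel sum only, not the SimpleMKL solver itself; the function names and the two-channel toy setup are hypothetical, and the β_{c,σ} weights, learned by ℓ1-MKL in the actual method, are simply supplied here:

```python
import numpy as np

def gaussian_kernel(D2, sigma):
    """Gaussian kernel computed from a matrix of squared distances."""
    return np.exp(-D2 / (2.0 * sigma ** 2))

def combined_gram(D2_per_channel, sigmas, beta):
    """Hybrid MKL kernel: weighted sum over channels c and bandwidths sigma,
    K = sum_{c,sigma} beta[c][j] * k_{c,sigma_j}, i.e. N_c * M base kernels."""
    K = np.zeros_like(next(iter(D2_per_channel.values())))
    for c, D2 in D2_per_channel.items():
        for j, sigma in enumerate(sigmas):
            K += beta[c][j] * gaussian_kernel(D2, sigma)
    return K
```

With ℓ1-regularised weights, most entries of β end up at zero, so the sum effectively selects one (or a few) σ per channel.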

Product Kernel Learning: PKL

Geometric combination of kernels:

K(x_1, x_2) = ∏_c k_c(x_1, x_2)^{β_c}

Adapted PKL problem formulation:

f(x) = Σ_i α_i y_i ∏_c k_c(x_i, x)^{β_c} − b

As in MKL: jointly learning the α_i and β_c
• Algorithm for exponential kernels: k_c(x_1, x_2) = e^{−β_c d_c²(x_1, x_2)}
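For exponential kernels the powered product collapses into a single exponential of a weighted squared distance, ∏_c e^{−β_c d_c²} = e^{−Σ_c β_c d_c²}, so PKL can be read as learning one metric over channels. A minimal numerical check of this identity (function names are hypothetical):

```python
import numpy as np

def pkl_gram(D2_list, beta):
    """Powered product of exponential kernels, prod_c exp(-beta_c * D2_c),
    over a list of per-channel squared-distance matrices."""
    K = np.ones_like(D2_list[0])
    for D2, b in zip(D2_list, beta):
        K *= np.exp(-b * D2)
    return K
```

The product form and the single-exponential form agree to machine precision, which is what makes the gradient of K with respect to each β_c simple to write down.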

• Alternate optimization scheme:
  1. Classic SVM solver on α
  2. Approximate second-order gradient descent on β
• Step 1 convex, Step 2 not ⇒ the overall problem is not convex.
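The alternation can be sketched as below. This is a toy illustration under stated substitutions, not the authors' implementation: it replaces the SVM solver of step 1 with a kernel ridge solve, and the approximate second-order descent of step 2 with a plain first-order gradient step on the squared training error; all names and hyperparameters are hypothetical:

```python
import numpy as np

def pkl_gram(D2_list, beta):
    # K = exp(-sum_c beta_c * D2_c): the product of exponential kernels
    return np.exp(-sum(b * D2 for b, D2 in zip(beta, D2_list)))

def alternate_fit(D2_list, y, n_outer=20, lam=1e-2, lr=1e-3):
    """Alternate between solving for alpha with beta fixed (step 1)
    and a gradient step on beta with alpha fixed (step 2)."""
    beta = np.ones(len(D2_list))
    n = len(y)
    for _ in range(n_outer):
        K = pkl_gram(D2_list, beta)
        # Step 1: ridge stand-in for the SVM solver (convex in alpha)
        alpha = np.linalg.solve(K + lam * np.eye(n), y)
        # Step 2: descend the squared training error in beta,
        # using dK/dbeta_c = -D2_c * K (elementwise)
        r = K @ alpha - y
        grad = np.array([2.0 * r @ ((-D2 * K) @ alpha) for D2 in D2_list])
        beta = np.maximum(beta - lr * grad, 0.0)  # keep exponents non-negative
    return alpha, beta
```

Because step 2 optimises a non-convex objective, the scheme only reaches a local optimum, which matches the non-convexity remark above.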

Results

UCI toy-like datasets for algorithm validation. Combination of Gaussian kernels on each axis.

Dataset      ℓ1-MKL (%)    PKL (%)
Ionosphere   89.0 ± 2.1    94.2 ± 1.4
Sonar        83.8 ± 3.8    86.2 ± 4.5

⇒ PKL is competitive with existing MKL algorithms: more accurate, sparser, faster

VOC 2009: categorization with multiple visual features (15 kernels; 150 for the hybrid strategy).

[Figure: example images for the 20 VOC categories: aeroplane, bicycle, bird, boat, bottle, bus, car, cat, chair, cow, dining-table, dog, horse, motorbike, person, potted-plant, sheep, sofa, train, tv-monitor]

category       SIFT   Prod   Avg    ℓ1-MKL [2]  ℓ2-MKL [3]  Hybrid-MKL [1]  PKL
aeroplane      79.5   78.3   77.9   79.7        79.3        79.7            79.7
bicycle        46.9   45.9   46.0   47.8        47.9        48.3            47.0
bird           55.9   53.0   54.4   56.5        57.5        57.4            57.0
boat           61.4   56.9   56.4   62.3        60.1        62.8            62.2
bottle         17.6   18.7   19.1   19.5        19.8        20.1            19.2
bus            71.4   69.2   69.8   72.3        72.0        72.3            71.5
car            49.7   49.5   49.1   50.4        50.2        51.2            51.7
cat            54.8   54.4   54.1   56.8        57.2        57.0            56.8
chair          43.3   41.2   41.5   42.3        42.8        43.6            43.4
cow            21.1   24.3   24.7   21.7        25.1        24.9            26.5
dining-table   35.9   30.1   31.2   35.5        34.4        35.6            36.0
dog            39.1   35.8   35.2   37.4        37.4        38.2            39.4
horse          47.5   40.1   40.8   46.0        43.8        45.1            47.3
motorbike      46.3   54.9   55.3   53.2        56.0        55.8            55.0
person         82.0   81.8   81.7   82.5        82.8        82.9            82.8
potted-plant   23.0   29.9   30.9   30.7        31.8        31.3            29.4
sheep          33.0   24.8   26.7   30.1        31.7        30.7            32.9
sofa           32.6   25.9   25.3   32.5        29.9        32.0            33.2
train          68.2   67.1   67.5   69.9        69.5        69.8            69.4
tv-monitor     51.6   51.0   50.4   54.0        53.6        53.5            52.5
mean           48.0   46.7   46.9   49.0        49.1        49.6            49.6

• Learned kernel combinations outperform the best-performing single kernel (SIFT)
  – False for uniform weighting (averaging / product), in contrast to [4]
  – Uniform weighting is sub-optimal as soon as there is a large performance variation between kernels

• Sparse vs. dense combination: task-dependent (learning the ℓp norm, cf. [5])
  – Experimentally, hybrid ℓ1-MKL is a good compromise between ℓ1 and ℓ2

• Globally, hybrid ℓ1-MKL and PKL offer the best mAP, but the improvement is slight

Discussion

• Unsuccessful experiment: PKL for discriminative dictionary learning, see [6]

• Unsuccessful experiment: PKL for detector/descriptor combination, see [7]

• Sum or Product Kernel Learning?

[Figure: (a) complementary kernels ⇒ sum; (b) redundant kernels ⇒ product. Each panel shows K1, K2, Ks (sum), Kp (product).]

• Complementarity/redundancy: which metric? Kernel correlation, Q-Stat, ρ-Stat?

• Not effective on real image databases (VOC)

References

[1] David Picard, Nicolas Thome, and Matthieu Cord, "An efficient system for combining complementary kernels in complex visual categorization tasks," in ICIP, 2010, pp. 3877–3880.

[2] Alain Rakotomamonjy, Francis Bach, Stéphane Canu, and Yves Grandvalet, "SimpleMKL," JMLR, vol. 9, pp. 2491–2521, 2008.

[3] Marius Kloft, Ulf Brefeld, Sören Sonnenburg, Pavel Laskov, Klaus-Robert Müller, and Alexander Zien, "Efficient and accurate lp-norm multiple kernel learning," in NIPS, 2009, pp. 997–1005.

[4] Peter V. Gehler and Sebastian Nowozin, "On feature combination for multiclass object classification," in IEEE ICCV, 2009.

[5] Fei Yan, Krystian Mikolajczyk, Mark Barnard, Hongping Cai, and Josef Kittler, "Lp norm multiple kernel Fisher discriminant analysis for object and image categorisation," in IEEE CVPR, 2010.

[6] H. Cai, F. Yan, and K. Mikolajczyk, "Learning weights for codebook in image classification and retrieval," in IEEE CVPR, 2010.

[7] Marcin Marszałek, Cordelia Schmid, Hedi Harzallah, and Joost van de Weijer, "Learning object representations for visual object class recognition," Visual Recognition Challenge workshop, in conjunction with ICCV, October 2007.