Learning for Bridging the Semantic Gap in Image Retrieval
Steven Chu-Hong HOI
RM 1021
11:15 a.m. – 12:30, 28 April 2005
Ph.D. Qualify Presentation, CSE Dept, CUHK
Supervisor: Prof. Michael R. LYU
Committee: Prof. Tien-Tsin WONG, Prof. Leo Jiaya JIA
2 of 60
Outline
• Introduction
  – Image Retrieval: TBIR, CBIR
  – The Semantic Gap: Relevance Feedback
  – Motivation
  – Scope of Our Work
• A Unified Learning Framework
  – Overview
  – Semi-Supervised Active Learning
  – Log-based Relevance Feedback
  – Regularized Distance Metric Learning
  – Discussions
• Conclusions
3 of 60
Introduction
• Image Retrieval
  – Importance
    • Imperative due to the explosive growth of image and video data
    • Far from mature compared with text information retrieval
  – Applications
    • Visual information management in various disciplines, such as digital media libraries, medical image retrieval, photo album management, and Web image search
  – Text-based vs. content-based
    • TBIR: attractive, but needs automatic annotation, which is not yet ready; accuracy is low, e.g. about 20% to 30% on 263 words and 5k images
    • CBIR: more popular and feasible; queries use low-level features
4 of 60
Introduction
• Content-based Image Retrieval (CBIR)
  – Has received a considerable amount of research effort
  – Uses query-by-example (QBE) with low-level visual features such as color, edge, and texture
  – Challenge: the semantic gap between extracted low-level features and high-level human concepts
[Figure: query-by-example (QBE) illustration]
5 of 60
Introduction
• What is “the semantic gap”?
  – “The semantic gap is the lack of coincidence between the information that one can extract from the visual data and the interpretation that the same data have for a user in a given situation.” (Smeulders et al. 2000)
  – In short, the semantic gap is the difference between a human's perception and a machine's understanding of the visual data.
6 of 60
Introduction
• Possible Solutions
  – “One way to resolve the semantic gap comes from sources outside the image by integrating other sources of information about the image in the query. Information about an image can come from a number of different sources: the image content, labels attached to the image, images embedded in a text, and so on.” (Smeulders et al. 2000)
  – A popular, state-of-the-art solution:
    • Relevance Feedback
7 of 60
Introduction
• Relevance Feedback (RF)
  – A powerful tool for attacking the semantic gap
  – Uses an interactive mechanism to solicit users’ feedback
  – Can greatly boost retrieval performance in CBIR
  – Many techniques already exist
• Limitations of Traditional Relevance Feedback
  – Results after the initial round of feedback are typically unsatisfactory
  – Many rounds of feedback are needed for satisfactory results
  – Too many interactions bore users
8 of 60
Introduction
• Motivation
  – What resources are available for bridging the semantic gap?
    • Labeled data (from the user’s online feedback)
    • Unlabeled data (image contents in the database)
    • Users’ log data (available from a long-term perspective)
  – How can these resources be used?
    • Short-term learning vs. long-term learning
    • Offline learning vs. online learning
    • Supervised learning vs. semi-supervised learning
  – Which techniques can solve the problem effectively?
    • Statistical learning techniques, such as Support Vector Machines, semi-supervised learning, and regularization techniques
  – Note: “online” and “offline” here mean with and without the user’s interaction, respectively.
9 of 60
Introduction
• Our Proposed Solution: a Unified Learning Framework
  – Semi-Supervised Active Learning (SSAL)
    • Utilizes the unlabeled data online
  – Log-based Relevance Feedback (LRF)
    • Utilizes the users’ log data online
  – Regularized Distance Metric Learning (RDML)
    • Utilizes the users’ log data offline
10 of 60
Introduction
• Scope of Our Work
[Figure: scope diagram placing each method along three axes: short-term (non-log based) vs. long-term (log based) perspective, offline vs. online mode, and supervised vs. semi-supervised learning. A region is labeled “Our Work”.]
  – TRF: Traditional Relevance Feedback
  – AIA: Automatic Image Annotation
  – SSAL: Semi-Supervised Active Learning
  – LRF: Log-based Relevance Feedback
  – RML: Regularized Metric Learning
11 of 60
Introduction
• Completed or Ongoing Work
  – SSAL: Semi-Supervised Active Learning
    • Techniques: SVM, and harmonic functions using Gaussian fields
    • Publication: “A Semi-Supervised Active Learning Framework for Image Retrieval”, to appear in Proc. IEEE CVPR, 2005
  – RDML: Regularized Distance Metric Learning
    • Technique: a regularization scheme via the Min/Max principle using log data
    • Publication: “Collaborative Image Retrieval via Regularized Metric Learning”, under review
12 of 60
Introduction
• LRF: Log-based Relevance Feedback
  – Technique I: Soft Label SVM approach
    • “A Novel Log-based Relevance Feedback in Content-based Image Retrieval”, Proc. ACM Multimedia, pp. 24-31, Oct. 2004
    • “A Unified Log-based Relevance Feedback Scheme for Image Retrieval”, under review
  – Technique II: Co-training approach via Coupled SVM
    • “Integrating User Feedback Log into Relevance Feedback for Content-based Image Retrieval”, invited to the IEEE EMMA Workshop in conjunction with IEEE ICDE, April 2005
    • “Coupled Support Vector Machine: A Co-training Approach Toward Log-based Relevance Feedback”, under review
13 of 60
A Unified Learning Framework
• Overview
  – The framework integrates offline and online learning from both short-term and long-term perspectives, using online labeled and unlabeled data as well as users’ log data.
  – Three main components:
    • Semi-Supervised Active Learning (SSAL)
    • Log-based Relevance Feedback (LRF)
    • Regularized Distance Metric Learning (RDML)
  – The architecture of the proposed framework is as follows:
14 of 60
A Unified Learning Framework
[Figure: architecture flowchart of the framework. Node labels: Start, User Query, Feature Extraction (twice), Image DB, Auto-Image Annotation, Log DB, RDML, DM, Distance Measure, Retrieval Results, Relevance Feedback (init, SSAL, LRF), Collect User’s Feedback, Log Session, End of Task? (YES/NO), Final Results]
16 of 60
Semi-Supervised Active Learning
• Motivation and Overview
  – Background and challenges
    • Active learning vs. passive learning
    • Supervised learning vs. semi-supervised learning
    • Are unlabeled data informative?
    • Is the computation cost prohibitive for image retrieval?
  – A two-stage solution
    • Building an SVM classifier
      – Coarsely selecting informative unlabeled data
    • Semi-supervised learning enhanced by the SVM results
      – Fine-tuning on the selected unlabeled data
SSAL LRF RDML
17 of 60
• Architecture of SSAL Scheme
[Figure: architecture of the SSAL scheme]
18 of 60
• Related Work
  – Support Vector Machines for image retrieval
    • About a dozen works, most notably “Support vector machine active learning for image retrieval”, S. Tong and E. Y. Chang, ACM Multimedia 2001 (SAL)
  – Learning from unlabeled data for image retrieval
    • “Bootstrapping SVM active learning by incorporating unlabelled images for image retrieval”, Lei Wang et al., IEEE CVPR 2003 (TSVM-SAL)
  – Semi-supervised learning techniques
    • “Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions”, Xiaojin Zhu et al., ICML 2003
19 of 60
Formulation and Risk Analysis
• Formulation
  – Training an SVM classifier f
  – Fitting the probability via a sigmoid function
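The sigmoid fitting step can be illustrated with a minimal sketch. It assumes the common Platt-style form P(relevant | x) = 1 / (1 + exp(a·f(x) + b)); the slide's own equation was not recovered, so the exact form, the function name `sigmoid_probability`, and the default values of `a` and `b` are assumptions. In practice `a` and `b` would be fitted on labeled data.

```python
import math


def sigmoid_probability(f_value, a=-1.0, b=0.0):
    """Map an SVM decision value f(x) to a probability in (0, 1).

    Assumed form: P = 1 / (1 + exp(a * f + b)); a and b are hypothetical,
    normally fitted on held-out labeled data.
    """
    z = a * f_value + b
    # Evaluate in a numerically stable way for large |z|.
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))
```

With a negative `a`, larger positive decision values map to probabilities closer to 1, and a sample on the decision boundary maps to 0.5 when `b` is 0.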
20 of 60
Formulation and Risk Analysis
• Formulation
  – Semi-Supervised Learning via Harmonic Functions
  – F: fused relevance function incorporating the SVM classifier f (the SVM results)
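As a rough illustration of the harmonic-function idea, the sketch below uses the defining property that each unlabeled sample's score equals the similarity-weighted average of its neighbors' scores, with labeled scores held fixed. It solves this by simple in-place iteration rather than the closed-form matrix solution, and it does not reproduce the slide's fusion with the SVM output. The function name and the toy graph are hypothetical.

```python
def harmonic_scores(weights, labeled, n_iter=200):
    """Iteratively compute harmonic scores on a similarity graph.

    weights: symmetric n x n list of lists of non-negative similarities.
    labeled: dict mapping sample index -> fixed score (1.0 relevant, 0.0 irrelevant).
    Returns a list of n scores; unlabeled entries converge toward the
    weighted average of their neighbors.
    """
    n = len(weights)
    scores = [labeled.get(i, 0.5) for i in range(n)]
    for _ in range(n_iter):
        for i in range(n):
            if i in labeled:
                continue  # labeled scores stay clamped
            total = sum(weights[i][j] for j in range(n) if j != i)
            if total > 0:
                scores[i] = sum(
                    weights[i][j] * scores[j] for j in range(n) if j != i
                ) / total
    return scores


# Toy chain graph 0 - 1 - 2 with the two end points labeled.
chain = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
chain_scores = harmonic_scores(chain, {0: 1.0, 2: 0.0})
```

In the toy chain, the middle sample sits halfway between a relevant and an irrelevant neighbor, so its score settles midway between their labels.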
21 of 60
Formulation and Risk Analysis
• Risk Analysis of Active Learning
22 of 60
Formulation and Risk Analysis
• Two Risk Components
  – SVM risk term:
    • There is no efficient way to avoid retraining.
    • A simple approximation can be very effective.
    • Strategy: choose the instances closest to the decision boundary.
  – HF risk term:
    • An efficient retraining method is available.
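The SVM-side strategy above (choose the instances closest to the decision boundary) can be sketched as follows. The function name and the sample decision values are hypothetical; the decision values would come from whatever SVM has been trained on the current feedback.

```python
def closest_to_boundary(decision_values, k):
    """Return the indices of the k samples with the smallest |f(x)|.

    decision_values: list of signed SVM outputs f(x) for unlabeled samples.
    """
    order = sorted(range(len(decision_values)),
                   key=lambda i: abs(decision_values[i]))
    return order[:k]


# Hypothetical decision values for four unlabeled images.
picked = closest_to_boundary([2.0, -0.1, 0.5, -3.0], 2)
```

Samples with small absolute decision values are the ones the classifier is least certain about, which is why they are treated as the most informative candidates.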
23 of 60
A Practical Algorithm
24 of 60
Experimental Results
• Datasets
  – Images selected from COREL image CDs
  – Two ground-truth datasets
    • 20-Category: 100 images per category, 2,000 in total
    • 50-Category: 100 images per category, 5,000 in total
• Image Representation
  – Color moment
    • 9 dimensions
  – Edge direction histogram
    • 18 dimensions
    • Canny detector, 18 bins of 20 degrees each
  – Wavelet-based texture
    • 9 dimensions
    • Daubechies-4 wavelet, 3-level DWT
    • The entropies of 9 subimages form the texture feature
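To make the feature dimensions concrete, here is a minimal sketch of two of the descriptors. It assumes the usual color-moment definition (mean, standard deviation, and cube root of the third central moment for each of three channels, giving 9 values) and a plain 20-degree binning of edge directions into 18 bins. The slides do not state the color space or the exact moment definitions, so those details and both function names are assumptions; edge detection itself (the Canny step) is not shown.

```python
import math


def color_moments(pixels):
    """Return a 9-dimensional color-moment vector.

    pixels: non-empty list of (c1, c2, c3) channel tuples.
    For each channel: mean, standard deviation, and signed cube root of
    the third central moment (an assumed, commonly used definition).
    """
    n = len(pixels)
    features = []
    for channel in range(3):
        values = [p[channel] for p in pixels]
        mean = sum(values) / n
        variance = sum((v - mean) ** 2 for v in values) / n
        third = sum((v - mean) ** 3 for v in values) / n
        skew = math.copysign(abs(third) ** (1.0 / 3.0), third)
        features.extend([mean, math.sqrt(variance), skew])
    return features


def edge_direction_bin(angle_degrees, bin_width=20):
    """Map an edge direction in degrees to one of 360 / bin_width bins."""
    return int((angle_degrees % 360) // bin_width)
```

Concatenating the 9 color values, the 18 edge-histogram values, and the 9 texture values would give a 36-dimensional descriptor per image.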
25 of 60
Experimental Results
[Figures: experimental result plots, slides 25 to 28]
29 of 60
Summary
• Main contributions of this work
  – Proposed an efficient scheme that uses an SVM to select the most informative unlabeled data for learning in image retrieval
  – Suggested a novel Semi-Supervised Active Learning scheme that fuses harmonic function learning with SVM active learning
  – Implemented a practical active learning algorithm for image retrieval, with impressive empirical results
30 of 60
Log-based Relevance Feedback
• Motivation and Overview
  – Users’ log data contain semantic information.
  – How can users’ log data be used to boost retrieval performance?
  – Hypothesis: two images tend to be similar in content when a large number of users have judged them similarly.
  – Log-based Relevance Feedback combines users’ log data and low-level image contents in relevance feedback learning tasks.
31 of 60
Related Work
• Only a few works in the literature
  – X. He et al., “Learning a semantic space from user’s relevance feedback for image retrieval”, IEEE Trans. CSVT, 13(1):39–48, January 2003. Only considers positive feedback in the log data.
  – X. He et al., “Learning an image manifold for retrieval”, in ACM Multimedia, pp. 17–23, New York, US, 2004. Uses simulated log data.
  – Hoi and Lyu, “A Novel Log-based Relevance Feedback Technique for Content-based Image Retrieval”, in ACM Multimedia, pp. 24-31, New York, US, 2004.
32 of 60
Representation of Log Data
• Relevance Matrix (R)
  – Each RF round forms a log session in which N_l images are marked
  – Elements: relevant (1), irrelevant (-1), unknown (0)
[Figure: example relevance matrix R, indexed by log sessions and image samples, with entries 1, -1, and 0]
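A small sketch of how such a relevance matrix could be assembled from log sessions. It assumes one row per log session and one column per image, which the recovered slide text does not confirm; the function names and the toy sessions are hypothetical.

```python
def build_relevance_matrix(sessions, n_images):
    """Build R with one row per log session and one column per image.

    sessions: list of dicts mapping image index -> +1 (relevant) or -1 (irrelevant).
    Images not judged in a session keep the value 0 (unknown).
    """
    matrix = []
    for judged in sessions:
        row = [0] * n_images
        for image_index, label in judged.items():
            row[image_index] = 1 if label > 0 else -1
        matrix.append(row)
    return matrix


def log_vector(matrix, image_index):
    """Return the log-based representation r_i of one image (a column of R)."""
    return [row[image_index] for row in matrix]


# Two hypothetical log sessions over a database of four images.
R = build_relevance_matrix([{0: 1, 1: -1}, {1: 1, 3: -1}], 4)
r_1 = log_vector(R, 1)
```

Under this layout, the column for an image collects how users judged it across all sessions, which is the per-image log representation used alongside the low-level features.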
33 of 60
Setting of the LRF Learning
• Two data representations in the learning task
  – Low-level image content: $X = \{x_1, x_2, \ldots, x_N\}$
  – Users’ feedback log data: $R = \{r_1, r_2, \ldots, r_N\}$
  – Considered as a multi-modal learning problem
    • How to solve it?
34 of 60
Coupled SVM for LRF
• Motivation
  – How to attack the learning problem on the two modalities?
    • Low-level image content: X
    • User relevance feedback log: R
  – Support Vector Machines offer superior classification performance
• A Straightforward Solution
  – Learn an SVM classifier on each modality separately
    • For image content X, learn an optimal weight vector w
    • For log content R, learn an optimal weight vector u
  – Combine their results linearly
35 of 60
Coupled SVM for LRF
• A Straightforward Solution
  – For the image content modality: $w^T x$
  – For the user feedback log modality: $u^T r$
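The straightforward solution can be sketched as a weighted sum of the two linear scores. The mixing weight `alpha` and the bias terms are assumptions: the slides only say the two results are combined linearly, without giving the weights.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def combined_score(x, r, w, b_w, u, b_u, alpha=0.5):
    """Linearly combine the content score w^T x + b_w and the log score u^T r + b_u.

    alpha is a hypothetical mixing weight in [0, 1].
    """
    content_score = dot(w, x) + b_w
    log_score = dot(u, r) + b_u
    return alpha * content_score + (1.0 - alpha) * log_score


# Toy two-dimensional example with hand-picked weights.
score = combined_score([1, 2], [1, -1], [0.5, 0.5], 0.0, [1, 0], 0.0)
```

Because the two classifiers are trained independently here, nothing forces their predictions to agree on the same image.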
36 of 60
Coupled SVM for LRF
• Disadvantages of the straightforward solution
  – Relies on a simple linear combination
  – Does not enforce consistency between the modalities
• Our better solution: Coupled SVM
  – Learn the two modalities in a unified formulation
  – Enforce the predictions on the two types of information to be consistent
37 of 60
Coupled Support Vector Machine
• Formulation: Coupled SVM
38 of 60
Coupled Support Vector Machine
• Optimization of Coupled SVM
  – Hard to solve directly
  – Alternating Optimization (AO)
• AO: two-step optimization
  – Fix Y’, then find (u, b_u) and (w, b_w)
  – Fix (u, b_u) and (w, b_w), then find Y’
39 of 60
Coupled Support Vector Machine
• Alternating Optimization (AO)
  – With Y’ fixed, the primal optimization is equivalent to solving two optimization subproblems:
40 of 60
Coupled Support Vector Machine
• Alternating Optimization (AO)
  – By introducing non-negative Lagrange multipliers, the two subproblems above can be solved
41 of 60
Coupled Support Vector Machine
• Alternating Optimization (AO)
  – After solving for (u, b_u) and (w, b_w) and fixing them, the optimal Y’ can be found to fit the data as follows:
42 of 60
Coupled Support Vector Machine
• Summary of AO procedure
  1) Begin with a small value of ρ
  2) Perform the two-step AO procedure
  3) Repeat step 2 while increasing ρ until it reaches the set threshold
• Comments on the Coupled SVM
  – Can be a general approach for multi-modal learning problems
  – The convergence of Alternating Optimization needs investigation
  – Better methods for solving the optimization problem need study
  – Practical considerations are required when fitting it to specific problems
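The control flow of the AO procedure can be sketched as below. This shows only the loop structure: the two solver steps are passed in as callables because the Coupled SVM objective itself was not recovered from the slides. The names `fit_classifiers` and `update_labels`, the multiplicative growth of ρ, and the inner iteration cap are all assumptions.

```python
def alternating_optimization(fit_classifiers, update_labels, labels,
                             rho_start=0.01, rho_max=1.0, growth=2.0,
                             inner_steps=5):
    """Run a two-step alternating optimization with an increasing rho.

    fit_classifiers(labels, rho) -> models   (step 1: Y' fixed, solve for the classifiers)
    update_labels(models, labels, rho) -> new labels  (step 2: classifiers fixed, solve for Y')
    Both callables are hypothetical placeholders for the actual subproblem solvers.
    """
    rho = rho_start
    models = None
    while rho <= rho_max:
        for _ in range(inner_steps):
            models = fit_classifiers(labels, rho)
            new_labels = update_labels(models, labels, rho)
            if new_labels == labels:
                break  # labels stopped changing for this rho
            labels = new_labels
        rho *= growth  # assumed schedule: multiply rho until the threshold
    return models, labels
```

Starting with a small ρ and increasing it gradually is a common way to keep early, unreliable label estimates from dominating the solution, although the slides do not state the exact role of ρ.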
43 of 60
Coupled Support Vector Machine
• A Practical Algorithm
  – Practical considerations
    • Not all unlabeled samples can be engaged, because relevance feedback must respond quickly
    • Strategy for choosing unlabeled samples
      – Closest to the SVM decision boundary: most informative according to active learning
      – Closest to the labeled samples: avoids too much effort in learning the label information
    • Introduce a parameter ∆ to control the error for label correction, avoiding overly large changes in the labeled set
44 of 60
Coupled Support Vector Machine
• A Practical Algorithm (cont’d)
45 of 60
Coupled Support Vector Machine
• A Practical Algorithm (cont’d)
46 of 60
Experimental Results
• Dataset
  – Images selected from COREL image CDs
  – Two ground-truth datasets
    • 20-Category: 100 images per category, 2,000 in total
    • 50-Category: 100 images per category, 5,000 in total
47 of 60
Experimental Results (cont’d)
• Low-level Image Representation
  – Color moment
    • 9 dimensions
  – Edge direction histogram
    • 18 dimensions
    • Canny detector, 18 bins of 20 degrees each
  – Wavelet-based texture
    • 9 dimensions
    • Daubechies-4 wavelet, 3-level DWT
    • The entropies of 9 subimages form the texture feature
48 of 60
Experimental Results (cont’d)
• Collection of Users’ Log Data
  – Log format
    • A log session (LS) corresponds to one relevance feedback round
    • Each log session contains 20 images labeled by users
  – Log data
    • Collected from real-world users
    • On 20-Category: 150 log sessions
    • On 50-Category: 150 log sessions
    • Not simulated; the log data contain subjective noise
49 of 60
Experimental Results (cont’d)
• CBIR GUI for collecting feedback data
[Figure: screenshot of the CBIR GUI]
50 of 60
Experimental Results (cont’d)
• Performance Evaluation
  – Measurement metric
    • Average Precision = # relevant images / # returned images
  – Experimental setting
    • 200 queries
    • 20 initially labeled images
    • SVM: RBF kernel, parameters set via training data
  – Comparison schemes
    • RF-SVM
      – traditional relevance feedback by SVM
    • LRF-2SVM
      – log-based relevance feedback by learning two SVMs separately
    • LRF-CSVM
      – log-based relevance feedback by Coupled SVM
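The measurement metric above can be sketched in a few lines: precision is the fraction of returned images that are relevant, averaged over queries. The function names and the toy ranking are hypothetical.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k returned images that are relevant."""
    top = ranked_ids[:k]
    if not top:
        return 0.0
    return sum(1 for image_id in top if image_id in relevant_ids) / len(top)


def average_precision_over_queries(results, k):
    """Mean precision at k over queries.

    results: list of (ranked_ids, relevant_ids) pairs, one per query.
    """
    if not results:
        return 0.0
    return sum(precision_at_k(r, rel, k) for r, rel in results) / len(results)


# Toy query: images 1 and 2 are relevant, the system returns 3, 1, 4, 2.
p2 = precision_at_k([3, 1, 4, 2], {1, 2}, 2)
```

Evaluating at several cutoffs (for example 20 to 100 returned images) gives one precision value per cutoff, which is how a precision-versus-returned-images curve is formed.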
51 of 60
Experimental Results (cont’d)
• Performance Evaluation: on 20-Category Dataset
[Figure: average precision (y-axis, 0.2 to 0.7) vs. number of images returned (x-axis, 20 to 100) for Euclidean, RF-SVM, LRF-2SVMs, and LRF-CSVM]
52 of 60
Experimental Results (cont’d)
• Performance Evaluation: on 50-Category Dataset
[Figure: average precision (y-axis, 0.15 to 0.55) vs. number of images returned (x-axis, 20 to 100) for Euclidean, RF-SVM, LRF-2SVMs, and LRF-CSVM]
53 of 60
Experimental Results (cont’d)
54 of 60
Experimental Results (cont’d)
55 of 60
Experimental Results (cont’d)
• Performance over different amounts of log data
56 of 60
Experimental Results (cont’d)
• Performance over different amounts of log data
57 of 60
Experimental Results (cont’d)
• Evaluation of Time Efficiency
58 of 60
Summary
• A log-based relevance feedback scheme was studied that integrates the user feedback log into the content learning of low-level visual features in content-based image retrieval.
• A co-training approach for multi-modal learning, the Coupled Support Vector Machine, was proposed for learning from data with multiple representations.
• A practical algorithm using Coupled SVM was presented to attack the log-based relevance feedback problem in CBIR.
• Experimental results show that the proposed scheme is effective for the log-based relevance feedback problem.
59 of 60
Discussions
• Log-based relevance feedback vs. traditional relevance feedback
  – It may sometimes perform poorly when a user’s specific concept does not match the general semantic concept
• Log-based relevance feedback using Coupled SVM
  – Efficiency for large applications
• Coupled Support Vector Machine for multi-modal learning
  – Other applications: Web image retrieval, video retrieval
60 of 60
Conclusions
• A general learning framework was proposed toward bridging the semantic gap in content-based image retrieval.
• Three key components were studied in our framework: Semi-Supervised Active Learning, Log-based Relevance Feedback, and Regularized Distance Metric Learning.
• We proposed techniques for solving the SSAL and LRF problems and presented extensive empirical results.
• Future work will consider using users’ log data to annotate images automatically for long-term learning.
61 of 60
Q&A
62 of 60
APPENDIX
*Regularized Distance Metric Learning
• Motivation
  – Our goal is to find an appropriate distance metric for the low-level features such that distances between low-level features are consistent with the user relevance judgments in the log data.
  – We propose the “Min/Max” principle, which minimizes the distance between similar images while maximizing the distance between dissimilar images.
* This is joint work with Luo SI from CMU and Rong JIN from MSU.
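For orientation, a learned distance metric of this kind is commonly parameterized by a positive semi-definite matrix A, with squared distance (x - y)^T A (x - y). The sketch below assumes that standard form; the slides refer to a distance expression (2) that was not recovered, so treating it as this form is an assumption, and the function name is hypothetical.

```python
def squared_metric_distance(x, y, A):
    """Squared distance (x - y)^T A (x - y) under a metric matrix A.

    A is assumed to be a symmetric positive semi-definite matrix given as
    a list of lists; the identity matrix recovers the squared Euclidean distance.
    """
    diff = [a - b for a, b in zip(x, y)]
    n = len(diff)
    return sum(diff[i] * A[i][j] * diff[j] for i in range(n) for j in range(n))


identity = [[1, 0], [0, 1]]
d_euclid = squared_metric_distance([1, 2], [3, 5], identity)
```

Learning A then amounts to reshaping the feature space so that images judged similar in the logs end up close together and images judged dissimilar end up far apart.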
63 of 60
Related Work
• Log-based Relevance Feedback
  – Refer to the LRF section
• Distance Metric Learning
64 of 60
Regularized Distance Metric Learning
• Overview
  – The basic idea of this work is to learn a distance metric in the space of low-level image features that effectively bridges the semantic gap.
  – The metric is learned from the log data of user relevance feedback based on the Min/Max principle, i.e., minimizing the distance between similar images and maximizing the distance between dissimilar images.
65 of 60
Formulation
• We first apply the metric learning algorithm in (3) to the log data, where Q is the number of log sessions in the log data.
• This formulation tells us:
  – When two images are judged as relevant in the same log session, they could be similar to each other.
  – When one image is judged as relevant and another as irrelevant in the same log session, they must be dissimilar to each other.
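The two rules above translate directly into pair constraints extracted from each log session. The sketch below is illustrative only; the function name and the toy session are hypothetical.

```python
from itertools import combinations


def session_pairs(judged):
    """Extract similar and dissimilar image pairs from one log session.

    judged: dict mapping image index -> +1 (relevant) or -1 (irrelevant).
    Similar pairs: both images relevant in this session.
    Dissimilar pairs: one relevant and one irrelevant image.
    """
    relevant = sorted(i for i, label in judged.items() if label > 0)
    irrelevant = sorted(i for i, label in judged.items() if label < 0)
    similar = list(combinations(relevant, 2))
    dissimilar = [(p, n) for p in relevant for n in irrelevant]
    return similar, dissimilar


# Toy session: images 0 and 1 judged relevant, image 2 judged irrelevant.
similar_pairs, dissimilar_pairs = session_pairs({0: 1, 1: 1, 2: -1})
```

Note that two images both judged irrelevant yield no constraint, since they may be irrelevant for unrelated reasons.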
66 of 60
Formulation
• The formulation in (4) may not be robust to noise, so we form a new objective function for distance metric learning that takes into account both discrimination and robustness, formally as:
67 of 60
Formulation
• Using the distance expression in (2), both the second and the third terms of the objective function in (5) can be expanded into the following forms:
68 of 60
Formulation
• Putting Eqns. (5), (7), and (8) together, we have the final formulation for regularized metric learning:
69 of 60
Formulation
• To convert the above problem into the standard form, we introduce a slack variable t that upper-bounds the Frobenius norm of matrix A, which leads to an equivalent form of (9), i.e.,
  – The first constraint is a second-order cone constraint.
  – The second constraint is a positive semi-definite constraint.
  – This is a special form of convex optimization problem, for which very efficient polynomial-time solvers exist.
70 of 60
Experimental Results
• Datasets
  – 20-Category
  – 50-Category
• Image Representation
  – 9-dimensional color histogram
  – 18-dimensional edge histogram
  – 9-dimensional texture
71 of 60
Experimental Results• Collection of Users’ Log Data
72 of 60
Experimental Results
• Compared Schemes
  1) A baseline CBIR system that uses the Euclidean distance metric and does not utilize users’ log data. We refer to this algorithm as “Euclidean”.
  2) A CBIR system that uses the semantic representation learned from the manifold learning algorithm in [8]. We refer to this algorithm as “IML”.
  3) A CBIR system that uses the distance metric learned by the algorithm in [34]. We refer to this algorithm as “DML”.
  4) A CBIR system that uses the distance metric learned by the proposed regularized metric learning algorithm. We refer to this algorithm as “RDML”.
73 of 60
Experimental Results
[Figures: experimental result plots, slides 73 to 76]
77 of 60
• Time Efficiency
78 of 60
Summary
• This paper proposes a novel algorithm for distance metric learning, which boosts the retrieval accuracy of CBIR by taking advantage of the log data of users’ relevance judgments.
• A regularization mechanism is used in the proposed algorithm to improve the robustness of solutions when the log data are small and noisy.
• It is formulated as a positive semi-definite programming problem, which can be solved efficiently.
• Experimental results show that the proposed algorithm for regularized distance metric learning substantially improves the retrieval accuracy of the baseline CBIR system.
79 of 60
References
• Chu-Hong Hoi, Michael R. Lyu, and Rong Jin, “Integrating User Feedback Log into Relevance Feedback via Coupled SVM for Content-based Image Retrieval”, IEEE EMMA Workshop, April 2005
• Chu-Hong Hoi and Michael R. Lyu, “A Novel Log-based Relevance Feedback Technique in Content-based Image Retrieval”, in Proc. ACM Multimedia, New York, USA, 10-16 October 2004, pp. 24-31
• Chu-Hong Hoi and Michael R. Lyu, “Web Image Learning for Searching Semantic Concepts in Image Databases”, in Poster Proceedings of the 13th International World Wide Web Conference (WWW2004), New York, USA, 17-22 May 2004
• Chu-Hong Hoi and Michael R. Lyu, “Group-based Relevance Feedback with Support Vector Machine Ensembles”, in Proceedings of the 17th International Conference on Pattern Recognition (ICPR 2004), Cambridge, UK, 23-26 August 2004, vol. 3, pp. 874-877
• Chu-Hong Hoi et al., “Biased Support Vector Machine for Relevance Feedback in Image Retrieval”, in Proceedings of the International Joint Conference on Neural Networks (IJCNN2004), Budapest, Hungary, 25-29 July 2004, pp. 3189-3194
80 of 60
References (Cont.)
• Steven Chu-Hong Hoi, Michael R. Lyu, and Rong Jin, “A Unified Log-based Relevance Feedback Scheme for Image Retrieval” (extended from ACM Multimedia 2004), Technical Report, CUHK, March 2005
• S. Tong and E. Chang, “Support vector machine active learning for image retrieval”, in Proc. ACM Multimedia, 2001, pp. 107-118
• A.W.M. Smeulders et al., “Content-Based Image Retrieval at the End of the Early Years”, IEEE PAMI, 22(12), 2000, pp. 1349-1380
• Xiaojin Zhu et al., “Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions”, ICML 2003
• Lei Wang et al., “Bootstrapping SVM active learning by incorporating unlabelled images for image retrieval”, IEEE CVPR 2003 (TSVM-SAL)