Ph.D. Qualify Presentation, CSE Dept, CUHK Outlinelyu/student/phd/steven/hoi_ppt_2005.pdf · 2007-01-11 · 1 Learning for Bridging the Semantic Gap in Image Retrieval Steven Chu-Hong


Learning for Bridging the Semantic Gap in Image Retrieval

Steven Chu-Hong HOI

RM 1021

11:15 a.m. – 12:30, 28 April 2005

Ph.D. Qualify Presentation, CSE Dept, CUHK

Supervisor: Prof. Michael R. LYU

Committee: Prof. Tien-Tsin WONG, Prof. Leo Jiaya JIA

2 of 60

Outline
• Introduction
  Image Retrieval: TBIR, CBIR
  The Semantic Gap: Relevance Feedback
  Motivation
  Scope of Our Work
• A Unified Learning Framework
  Overview
  Semi-Supervised Active Learning
  Log-based Relevance Feedback
  Regularized Distance Metric Learning
  Discussions
• Conclusions

3 of 60

Introduction
• Image Retrieval
  Importance
    • Imperative due to the explosive growth of image and video data
    • Far from mature compared with text information retrieval
  Applications
    • Visual information management in various disciplines, such as digital media libraries, medical image retrieval, photo album management, and Web image search
  Text-based vs. Content-based
    • TBIR: attractive, but needs automatic annotation, which is not yet ready
        – Low accuracy, e.g. roughly 20%-30% on 263 words and 5k images
    • CBIR: more popular and feasible; queries use low-level features

4 of 60

Introduction
• Content-based Image Retrieval (CBIR)
  Receiving a considerable amount of research effort
  Uses query-by-example (QBE) with low-level visual features, such as color, edge, and texture
  Challenge: the semantic gap between extracted low-level features and high-level human concepts

QBE

5 of 60

Introduction
• What is “the semantic gap”?

“The semantic gap is the lack of coincidence between the information that one can extract from the visual data and the interpretation that the same data have for a user in a given situation.” –– Smeulders et al. 2000

The semantic gap is the difference between the human’s perception and the machine’s understanding of the visual data.

6 of 60

Introduction
• Possible Solutions:

“One way to resolve the semantic gap comes from sources outside the image by integrating other sources of information about the image in the query. Information about an image can come from a number of different sources: the image content, labels attached to the image, images embedded in a text, and so on.” –– Smeulders et al. 2000

A popular and state-of-the-art solution:
  • Relevance Feedback


7 of 60

Introduction
• Relevance Feedback (RF)
  A powerful tool for attacking the semantic gap
  Uses an interactive mechanism to solicit users’ feedback
  Can greatly boost retrieval performance in CBIR
  Many techniques already exist
• Limitations of Traditional Relevance Feedback
  The initial round of feedback is typically unsatisfactory
  Many rounds of feedback are needed for satisfactory results
  Too many interactions bore users

8 of 60

Introduction
• Motivation
  What resources are available for bridging the semantic gap?
    • Labeled data (from the user’s online feedback)
    • Unlabeled data (image contents in the database)
    • Users’ log data (available from a long-term perspective)
  How can these resources be used in different ways?
    • Short-term learning vs. long-term learning
    • Offline learning vs. online learning
    • Supervised learning vs. semi-supervised learning
  Which techniques can solve the problem effectively?
    • Statistical learning techniques, such as Support Vector Machines, semi-supervised learning, and regularization techniques
  Note: “online” and “offline” here mean WITH or WITHOUT the user’s interaction

9 of 60

Introduction
• Our Proposed Solution: Unified Learning Framework
  Semi-Supervised Active Learning (SSAL)
    • Utilizing the unlabeled data online
  Log-based Relevance Feedback (LRF)
    • Utilizing the users’ log data online
  Regularized Distance Metric Learning (RDML)
    • Utilizing the users’ log data offline

10 of 60

Introduction
• Scope of Our Work
[Figure: scope diagram. Dimensions: Short-term Perspective (non-log based) vs. Long-term Perspective (log based); Offline Mode vs. Online Mode; Supervised vs. Semi-Supervised. Methods placed in the diagram: TRF, AIA, SSAL, LRF, RML; one region is marked “Our Work”.]
  TRF: Traditional Relevance Feedback
  AIA: Automatic Image Annotation
  SSAL: Semi-Supervised Active Learning
  LRF: Log-based Relevance Feedback
  RML: Regularized Metric Learning

11 of 60

Introduction
Completed or Ongoing Work
• SSAL: Semi-Supervised Active Learning
  Techniques: SVM, and Harmonic Functions using Gaussian Fields
  Publication
    • “A Semi-Supervised Active Learning Framework for Image Retrieval”, to appear in Proc. IEEE CVPR, 2005
• RDML: Regularized Distance Metric Learning
  A Regularization Scheme via Min/Max Principle using Log Data
  Publication
    • “Collaborative Image Retrieval via Regularized Metric Learning”, Under Review

12 of 60

Introduction
• LRF: Log-based Relevance Feedback
  Technique I: Soft Label SVM approach
  Publications:
    • “A Novel Log-based Relevance Feedback in Content-based Image Retrieval”, Proc. ACM Multimedia, pp. 24-31, Oct. 2004
    • “A Unified Log-based Relevance Feedback Scheme for Image Retrieval”, Under Review
  Technique II: Co-training Approach via Coupled SVM
  Publications:
    • “Integrating User Feedback Log into Relevance Feedback for Content-based Image Retrieval”, Invited to IEEE EMMA Workshop in conjunction with IEEE ICDE, April 2005
    • “Coupled Support Vector Machine: A Co-training Approach Toward Log-based Relevance Feedback”, Under Review


13 of 60

A Unified Learning Framework
• Overview
  The framework integrates offline and online learning from both short-term and long-term perspectives, using online labeled and unlabeled data as well as users’ log data.
  Three main components:
    • Semi-Supervised Active Learning (SSAL)
    • Log-based Relevance Feedback (LRF)
    • Regularized Distance Metric Learning (RDML)
  The architecture of the proposed framework is as follows:

14 of 60

A Unified Learning Framework
[Figure: architecture flowchart of the framework. Recoverable labels: Start, User Query, Feature Extraction, Distance Measure, DM, init, Retrieval Results, Collect User’s Feedback, Relevance Feedback, SSAL, LRF, RDML, Log Session, Log DB, Image DB, Auto-Image Annotation, End of Task? (YES/NO), Final Results]


16 of 60

Semi-Supervised Active Learning
• Motivation and Overview
  Background and Challenges
    • Active Learning vs. Passive Learning
    • Supervised Learning vs. Semi-Supervised Learning
    • Are unlabeled data informative?
    • Is the huge computation cost prohibitive for image retrieval?
  A two-stage solution:
    • Building an SVM classifier
        – Selecting informative unlabeled data coarsely
    • Semi-supervised learning enhanced by SVM results
        – Tuning on selected unlabeled data finely

SSAL LRF RDML

17 of 60

• Architecture of SSAL Scheme

SSAL LRF RDML

18 of 60

• Related Work
  Support Vector Machines for Image Retrieval
    • About a dozen works; a well-known one is “Support vector machine active learning for image retrieval”, S. Tong and E. Y. Chang, ACM Multimedia 2001 (SAL)
  Learning from unlabeled data for Image Retrieval
    • “Bootstrapping SVM active learning by incorporating unlabelled images for image retrieval”, Lei Wang et al., IEEE CVPR 2003 (TSVM-SAL)
  Semi-Supervised Learning Techniques
    • “Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions”, Xiaojin Zhu et al., ICML 2003

SSAL LRF RDML


19 of 60

Formulation and Risk Analysis
• Formulation

Training an SVM Classifier f

Fitting the probability via a Sigmoid Function
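The sigmoid fitting step can be illustrated with a minimal sketch. It assumes the common Platt-style form P(relevant | f) = 1 / (1 + exp(a·f + b)); the slide's own equation did not survive extraction, so the parameter names `a` and `b`, the 0/1 label coding, and the plain gradient-descent fit are assumptions for illustration only.

```python
import math

def platt_probability(f, a, b):
    """Map a raw SVM output f to a probability 1 / (1 + exp(a*f + b))."""
    z = a * f + b
    # Two algebraically equal branches keep exp() from overflowing.
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))

def fit_sigmoid(scores, labels, lr=0.1, steps=2000):
    """Fit (a, b) by gradient descent on the negative log-likelihood.

    scores: raw SVM decision values; labels: 1 (relevant) or 0 (irrelevant).
    This is an assumed, simplified fitting procedure, not the slide's method.
    """
    a, b = -1.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for f, y in zip(scores, labels):
            p = platt_probability(f, a, b)
            # d(NLL)/dz = y - p for z = a*f + b under this parameterisation.
            grad_a += (y - p) * f
            grad_b += (y - p)
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b
```

With this parameterisation a well-fitted `a` is negative, so larger SVM outputs map to higher relevance probabilities.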

SSAL LRF RDML

20 of 60

Formulation and Risk Analysis
• Formulation
  Semi-Supervised Learning via Harmonic Functions
  F: Fused Relevance Function with the SVM classifier f
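As a rough illustration of the harmonic-function idea (Zhu et al., ICML 2003), the sketch below keeps labeled scores fixed and repeatedly sets each unlabeled score to the weighted average of its neighbours, which converges to the harmonic solution on a connected graph. The fusion with the SVM classifier f is not reproduced here because its equation is missing from the extracted slide; the function name, the 0.5 starting value, and the fixed iteration count are assumptions.

```python
def harmonic_scores(weights, labels, iterations=200):
    """Approximate the harmonic function on a similarity graph.

    weights: symmetric n x n list of non-negative similarities.
    labels: dict mapping a labeled index to its score (1.0 relevant, 0.0 irrelevant).
    Returns a list of n scores; labeled entries stay clamped to their labels.
    """
    n = len(weights)
    scores = [labels.get(i, 0.5) for i in range(n)]
    for _ in range(iterations):
        updated = list(scores)
        for i in range(n):
            if i in labels:
                continue  # labeled points are boundary conditions
            degree = sum(weights[i][j] for j in range(n) if j != i)
            if degree > 0:
                # Harmonic property: value equals the weighted mean of neighbours.
                updated[i] = sum(weights[i][j] * scores[j]
                                 for j in range(n) if j != i) / degree
        scores = updated
    return scores
```

For example, on a four-node chain with the two ends labeled 1.0 and 0.0, the two interior nodes settle at about 2/3 and 1/3.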

SSAL LRF RDML

SVM results

21 of 60

• Risk Analysis of Active Learning


SSAL LRF RDML

22 of 60

Two Risk Components
• SVM risk term:
    – No efficient way to avoid the retraining problem
    – A simple approximation can be very effective
    – The strategy: choose instances closest to the decision boundary
• HF risk term:
    – An efficient way of retraining is available
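The SVM-side strategy of choosing instances closest to the decision boundary can be sketched as follows, assuming the SVM decision values for the unlabeled pool are already available as a list (the function name and argument layout are illustrative).

```python
def closest_to_boundary(decision_values, k):
    """Return indices of the k samples whose SVM output is nearest to zero.

    decision_values: list of signed decision values f(x) for unlabeled samples.
    Samples near the boundary (small |f(x)|) are the ones the classifier is
    least certain about, so they are the candidates to query or refine.
    """
    ranked = sorted(range(len(decision_values)),
                    key=lambda i: abs(decision_values[i]))
    return ranked[:k]
```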


SSAL LRF RDML

23 of 60

A Practical Algorithm

SSAL LRF RDML

24 of 60

Experimental Results
• Datasets
  Images selected from COREL image CDs
  Two ground-truth datasets
    • 20-Category: each category contains 100 images, 2,000 in total
    • 50-Category: each category contains 100 images, 5,000 in total
• Image Representation
  Color Moment
    • 9-dimension
  Edge Direction Histogram
    • 18-dimension
    • Canny detector, 18 bins of 20 degrees each
  Wavelet-based texture
    • 9-dimension
    • Daubechies-4 wavelet, 3-level DWT
    • Entropies of 9 subimages are generated for the texture feature
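A 9-dimensional color moment feature is commonly built from three moments (mean, standard deviation, skewness) for each of three color channels. The slides give only the dimensionality, so the choice of moments and the pixel layout below are assumptions; this is a sketch of one plausible construction.

```python
import math

def color_moments(pixels):
    """Compute a 9-dimensional color moment vector.

    pixels: non-empty list of 3-tuples, one value per color channel.
    Returns [mean, std, skew] for channel 0, then channel 1, then channel 2.
    Skewness here is the signed cube root of the third central moment
    (an assumed convention).
    """
    n = len(pixels)
    features = []
    for channel in range(3):
        values = [p[channel] for p in pixels]
        mean = sum(values) / n
        variance = sum((v - mean) ** 2 for v in values) / n
        third = sum((v - mean) ** 3 for v in values) / n
        std = math.sqrt(variance)
        skew = math.copysign(abs(third) ** (1.0 / 3.0), third)
        features.extend([mean, std, skew])
    return features
```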

SSAL LRF RDML


25 of 60


SSAL LRF RDML

26 of 60


SSAL LRF RDML

27 of 60


SSAL LRF RDML

28 of 60


SSAL LRF RDML

29 of 60

Summary
• Main contributions of this work
  Proposed an efficient scheme that uses an SVM to select the most informative unlabeled data for learning in image retrieval
  Suggested a novel Semi-Supervised Active Learning approach that fuses harmonic function learning and SVM active learning techniques
  Implemented a practical active learning algorithm for image retrieval, for which impressive empirical results are shown

SSAL LRF RDML

30 of 60

Log-based Relevance Feedback
• Motivation and Overview
  Users’ log data contain semantic information.
  How to utilize users’ log data to boost retrieval performance?
  Hypothesis: two images tend to be similar in content when they have been judged similarly by a large number of users.
  Log-based Relevance Feedback: combine users’ log data and low-level image contents in relevance feedback learning tasks.

SSAL LRF RDML


31 of 60

Related Work
• Only a few works in the literature
  X. He et al., “Learning a semantic space from user’s relevance feedback for image retrieval”, IEEE Trans. CSVT, 13(1):39–48, January 2003.
    Comment: only considers positive feedback in the log data
  X. He et al., “Learning an image manifold for retrieval”, in ACM Multimedia, pp. 17–23, New York, US, 2004.
    Comment: uses simulated log data
  Hoi and Lyu, “A Novel Log-based Relevance Feedback Technique for Content-based Image Retrieval”, in ACM Multimedia, pp. 24-31, New York, US, 2004.

SSAL LRF RDML

32 of 60

Representation of Log Data
• Relevance Matrix (R)
  RF round / Log session: $N_l$ images are marked
  Elements: relevant (1), irrelevant (-1), unknown (0)
[Figure: example relevance matrix with log sessions along one axis and image samples along the other; entries are 1, -1, or 0]
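The relevance matrix described on this slide can be assembled from raw log sessions as in the sketch below. The orientation (one row per log session, one column per image) and the dict-based session format are assumptions; the slide only fixes the element coding of 1, -1, and 0.

```python
def build_relevance_matrix(sessions, num_images):
    """Build R with one row per log session and one column per image.

    sessions: list of dicts mapping image index -> +1 (relevant) or -1 (irrelevant).
    Images not marked in a session keep the value 0 (unknown).
    """
    matrix = []
    for marks in sessions:
        row = [0] * num_images
        for image_index, judgment in marks.items():
            row[image_index] = judgment
        matrix.append(row)
    return matrix

def log_vector(matrix, image_index):
    """Return the judgments one image received across all sessions."""
    return [row[image_index] for row in matrix]
```

Under this layout, the per-image log vector is simply a column of R.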

SSAL LRF RDML

33 of 60

Setting of the LRF Learning
• Two data representations in the learning task
  Low-level image content: $X = \{x_1, x_2, \ldots, x_N\}$
  Users’ feedback log data: $R = \{r_1, r_2, \ldots, r_N\}$
  Considered as a Multi-Modal Learning Problem
    • How to solve?

SSAL LRF RDML

34 of 60

Coupled SVM for LRF
• Motivation
  How to attack the learning problem on the two modalities?
    • Low-level image content: X
    • User relevance feedback log: R
  Support Vector Machines: superior classification performance
• A Straightforward Solution:
  Learn an SVM classifier on each modality separately
    • For image content X, we learn an optimal weighting vector w;
    • For log content R, we learn an optimal weighting vector u;
  Combine their results linearly

SSAL LRF RDML

35 of 60

Coupled SVM for LRF
• A Straightforward Solution
  For the image content modality: $w^T x$
  For the user feedback log modality: $u^T r$
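The straightforward solution, scoring each modality with its own linear classifier and combining the two outputs linearly, can be sketched as below. The weight vectors are assumed to be already trained; the bias terms and the mixing weight `alpha` are assumptions, since the slide's combination formula was not recovered.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def combined_score(x, r, w, b_w, u, b_u, alpha=0.5):
    """Linearly combine the content score and the log score.

    x: low-level feature vector of an image; r: its log vector.
    (w, b_w): linear classifier on image content; (u, b_u): on log data.
    alpha: hypothetical mixing weight between the two modalities.
    """
    content_score = dot(w, x) + b_w
    log_score = dot(u, r) + b_u
    return alpha * content_score + (1.0 - alpha) * log_score
```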

SSAL LRF RDML

36 of 60

Coupled SVM for LRF
• Disadvantages of the straightforward solution
  Linear combination
  Modality consistency
• Our better solution: Coupled SVM
  Learn the two modalities in a unified formulation
  Enforce consistent predictions on the two types of information

SSAL LRF RDML


37 of 60

• Formulation: Coupled SVM

Coupled Support Vector Machine

SSAL LRF RDML

38 of 60

• Optimization of Coupled SVM
  Hard to solve directly
  Alternating Optimization (AO)
• AO: two-step optimization
  Fix Y’, then find (u, b_u) and (w, b_w)
  Fix (u, b_u) and (w, b_w), then find Y’


SSAL LRF RDML

39 of 60

• Alternating Optimization
  With Y’ fixed, the primal optimization is equivalent to solving two optimization subproblems:


SSAL LRF RDML

40 of 60

• Alternating Optimization (AO)
  By introducing non-negative Lagrange multipliers, the above two subproblems can be solved


SSAL LRF RDML

41 of 60

• Alternating Optimization (AO)
  After solving for (u, b_u) and (w, b_w) and fixing them, the optimal Y’ can be found to fit the data as follows:


SSAL LRF RDML

42 of 60

• Summary of AO procedure
  1) Begin with a small value of ρ
  2) Perform the two-step AO procedure
  3) Repeat 2) while increasing ρ until it reaches the set threshold
• Comments on the Coupled SVM
  Can be a general approach for multi-modal learning problems
  The convergence of Alternating Optimization needs to be investigated
  Better methods for solving the optimization problem need to be studied
  Practical considerations are required when fitting it to specific problems
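The control flow of the AO procedure, starting from a small ρ and repeating the two-step update while ρ grows to a threshold, might look like the sketch below. The two callables `fit_classifiers` and `update_labels` are hypothetical stand-ins for the subproblem solvers, and the multiplicative growth schedule is an assumption; the slides do not say how ρ is increased.

```python
def alternating_optimization(fit_classifiers, update_labels, labels,
                             rho_start=0.01, rho_max=1.0, growth=2.0):
    """Run the two-step AO loop under an increasing rho schedule.

    fit_classifiers(labels, rho) -> models   (step 1: labels Y' fixed)
    update_labels(models, labels, rho) -> labels   (step 2: classifiers fixed)
    Both callables are hypothetical placeholders supplied by the caller.
    """
    rho = rho_start
    models = None
    while True:
        models = fit_classifiers(labels, rho)        # solve for (u, b_u), (w, b_w)
        labels = update_labels(models, labels, rho)  # solve for Y'
        if rho >= rho_max:
            break
        rho = min(rho * growth, rho_max)             # assumed growth rule
    return models, labels
```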


SSAL LRF RDML


43 of 60

• A Practical Algorithm
  Practical considerations
    • Cannot engage all unlabeled samples because relevance feedback must respond quickly
    • Strategy for choosing unlabeled samples
        – Closest to the decision boundary of the SVM: most informative according to active learning
        – Closest to the labeled samples: to avoid too much effort in learning the label information
    • Introducing a parameter ∆ to control the error for label correction, to avoid overly large changes in the labeled set


SSAL LRF RDML

44 of 60

• A Practical Algorithm (cont’d)


SSAL LRF RDML

45 of 60

• A Practical Algorithm (cont’d)Coupled Support Vector Machine

SSAL LRF RDML

46 of 60

Experimental Results
• Dataset
  Images selected from COREL image CDs
  Two ground-truth datasets
    • 20-Category: each category contains 100 images, 2,000 in total
    • 50-Category: each category contains 100 images, 5,000 in total

SSAL LRF RDML

47 of 60

Experimental Results (cont’d)
• Low-level Image Representation
  Color Moment
    • 9-dimension
  Edge Direction Histogram
    • 18-dimension
    • Canny detector, 18 bins of 20 degrees each
  Wavelet-based texture
    • 9-dimension
    • Daubechies-4 wavelet, 3-level DWT
    • Entropies of 9 subimages are generated for the texture feature

SSAL LRF RDML

48 of 60

Experimental Results (cont’d)
• Collection of Users’ Log Data
  Log format
    • A log session (LS) corresponds to one relevance feedback round
    • Each log session contains 20 images labeled by users
  Log data
    • Collected from real-world users
    • On 20-Category: 150 log sessions
    • On 50-Category: 150 log sessions
    • Not simulated log data; contains subjective noise

SSAL LRF RDML


49 of 60

Experimental Results (cont’d)
• CBIR GUI for collecting feedback data

SSAL LRF RDML

50 of 60

Experimental Results (cont’d)
• Performance Evaluation
  Measurement Metric
    • Average Precision = # relevant images / # returned images
  Experimental Setting
    • 200 queries
    • 20 initially labeled images
    • SVM: RBF kernel, parameters set via training data
  Comparison Schemes
    • RF-SVM
        – traditional relevance feedback by SVM
    • LRF-2SVM
        – log-based relevance feedback by learning two SVMs separately
    • LRF-CSVM
        – log-based relevance feedback by Coupled SVM
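The measurement metric above, the fraction of returned images that are relevant, averaged over queries, can be computed as in this small sketch. The function names and the list/set data layout are illustrative assumptions.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Precision over the top-k returned images: # relevant / # returned."""
    returned = ranked_ids[:k]
    if not returned:
        return 0.0
    hits = sum(1 for image_id in returned if image_id in relevant_ids)
    return hits / len(returned)

def average_precision_over_queries(results, k):
    """Average precision_at_k over a list of (ranked_ids, relevant_ids) pairs."""
    if not results:
        return 0.0
    return sum(precision_at_k(ranked, relevant, k)
               for ranked, relevant in results) / len(results)
```

Evaluating this for k = 20, 30, ..., 100 would give one point per value of "Number of Images Returned".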

SSAL LRF RDML

51 of 60

Experimental Results (cont’d)
• Performance Evaluation: on 20-Category Dataset
[Figure: Average Precision (y-axis, 0.2 to 0.7) vs. Number of Images Returned (x-axis, 20 to 100) for Euclidean, RF-SVM, LRF-2SVMs, and LRF-CSVM]

SSAL LRF RDML

52 of 60

Experimental Results (cont’d)
• Performance Evaluation: on 50-Category Dataset
[Figure: Average Precision (y-axis, 0.15 to 0.55) vs. Number of Images Returned (x-axis, 20 to 100) for Euclidean, RF-SVM, LRF-2SVMs, and LRF-CSVM]

SSAL LRF RDML

53 of 60

Experimental Results (cont’d)

SSAL LRF RDML

54 of 60

Experimental Results (cont’d)

SSAL LRF RDML


55 of 60

Experimental Results (cont’d)
• Performance over different amounts of log data

SSAL LRF RDML

56 of 60

Experimental Results (cont’d)
• Performance over different amounts of log data

SSAL LRF RDML

57 of 60

Experimental Results (cont’d)
• Evaluation of Time Efficiency

SSAL LRF RDML

58 of 60

Summary
• A log-based relevance feedback scheme was studied by integrating the user feedback log into the content learning of low-level visual features in content-based image retrieval.
• A co-training approach for multimodal learning, i.e. the Coupled Support Vector Machine, was proposed for studying data with multiple representations.
• A practical algorithm using Coupled SVM was presented to attack the log-based relevance feedback problem in CBIR.
• Experimental results show that our proposed scheme is effective for the log-based relevance feedback problem.

SSAL LRF RDML

59 of 60

Discussions
• Log-based relevance feedback vs. traditional relevance feedback
  It may sometimes perform poorly when a user’s specific concept does not match the general semantic concept
• Log-based relevance feedback using Coupled SVM
  Efficiency for large applications
• Coupled Support Vector Machine for multi-modal learning
  Other applications: Web image retrieval, video retrieval

SSAL LRF RDML

60 of 60

Conclusions
• A general learning framework was proposed toward bridging the semantic gap in content-based image retrieval.
• Three key components were studied in our framework: Semi-Supervised Active Learning, Log-based Relevance Feedback, and Regularized Distance Metric Learning.
• We proposed techniques for solving the SSAL and LRF problems, for which extensive empirical results were presented.
• Future work will consider employing users’ log data to annotate images automatically for long-term learning.

SSAL LRF RDML


61 of 60

Q&A

62 of 60

*Regularized Distance Metric Learning
• Motivation
  Our goal is to search for an appropriate distance metric for the low-level features such that the distance in low-level features is consistent with the user relevance judgments in log data.
  We propose the “Min/Max” principle, which tries to minimize the distance between similar images while maximizing the distance between dissimilar images.

SSAL LRF RDML APPENDIX

* This is a joint work with Luo SI from CMU and Rong JIN from MSU.

63 of 60

Related Work
• Log-based Relevance Feedback
  Refer to the LRF section

• Distance Metric Learning

SSAL LRF RDML

64 of 60

Regularized Distance Metric Learning
• Overview
  The basic idea of this work is to learn a desired distance metric in the space of low-level image features that effectively bridges the semantic gap.
  It is learned from the log data of user relevance feedback based on the Min/Max principle, i.e., minimize/maximize the distance between similar/dissimilar images.

SSAL LRF RDML

65 of 60

Formulation
• We first exploit the metric learning algorithm in (3) for log data
  This formulation tells us:
    When two images are judged as relevant in the same log session, they could be similar to each other;
    When one image is judged as relevant and another is judged as irrelevant in the same log session, they must be dissimilar to each other.

SSAL LRF RDML

Where Q stands for number of log sessions in the log data.
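To make the Min/Max principle concrete, the sketch below sums, over all log sessions, the distances between pairs of relevant images and subtracts the distances between relevant and irrelevant images. This is only an illustration of the principle, not a reproduction of the slide's equations (which did not survive extraction); the squared Mahalanobis-style distance with matrix `A` and the session format are assumptions.

```python
def squared_distance(x, y, A):
    """Assumed distance form: (x - y)^T A (x - y) for a square matrix A."""
    diff = [a - b for a, b in zip(x, y)]
    size = len(diff)
    return sum(diff[i] * A[i][j] * diff[j]
               for i in range(size) for j in range(size))

def min_max_objective(features, sessions, A):
    """Illustrative Min/Max objective over Q log sessions (lower is better).

    features: list of feature vectors indexed by image.
    sessions: list of dicts mapping image index -> +1 (relevant) or -1 (irrelevant).
    Adds distances between relevant pairs (to be minimized) and subtracts
    distances between relevant/irrelevant pairs (to be maximized).
    """
    total = 0.0
    for marks in sessions:
        relevant = [i for i, v in marks.items() if v == 1]
        irrelevant = [i for i, v in marks.items() if v == -1]
        for pos, i in enumerate(relevant):
            for j in relevant[pos + 1:]:
                total += squared_distance(features[i], features[j], A)
            for j in irrelevant:
                total -= squared_distance(features[i], features[j], A)
    return total
```

With the identity matrix for `A` this reduces to squared Euclidean distances; learning would search for an `A` that lowers the objective.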

66 of 60

Formulation
• The formulation in (4) may not be robust to noise, so we form a new objective function for distance metric learning that takes into account both discriminative power and robustness, formally:

SSAL LRF RDML


67 of 60

Formulation
• Using the distance expression in (2), both the second and the third terms of the objective function in (5) can be expanded into the following forms:

SSAL LRF RDML

68 of 60

Formulation
• Combining Eqs. (5), (7), and (8), we obtain the final formulation of regularized metric learning:

SSAL LRF RDML

69 of 60

Formulation
• To convert the above problem into standard form, we introduce a slack variable t that upper-bounds the Frobenius norm of matrix A, which leads to an equivalent form of (9), i.e.,
  The first constraint is a second-order cone constraint.
  The second constraint is a positive semi-definite constraint.
  This is a special form of convex optimization problem, for which very efficient polynomial-time solvers exist.

70 of 60

Experimental Results
• Datasets
  20-Category
  50-Category
• Image Representation
  9-dimensional Color Histogram
  18-dimensional Edge Histogram
  9-dimensional texture

SSAL LRF RDML

71 of 60

Experimental Results
• Collection of Users’ Log Data

SSAL LRF RDML

72 of 60

Experimental Results
• Compared Schemes:
  1) A baseline CBIR system that uses the Euclidean distance metric and does not utilize users’ log data. We refer to this algorithm as “Euclidean”.
  2) A CBIR system that uses the semantic representation learned from the manifold learning algorithm in [8]. We refer to this algorithm as “IML”.
  3) A CBIR system that uses the distance metric learned by the algorithm in [34]. We refer to this algorithm as “DML”.
  4) A CBIR system that uses the distance metric learned by the proposed regularized metric learning algorithm. We refer to this algorithm as “RDML”.

SSAL LRF RDML


73 of 60


SSAL LRF RDML

74 of 60


SSAL LRF RDML

75 of 60

SSAL LRF RDML

76 of 60

SSAL LRF RDML

77 of 60

• Time Efficiency

SSAL LRF RDML

78 of 60

Summary
• This paper proposes a novel algorithm for distance metric learning, which boosts the retrieval accuracy of CBIR by taking advantage of the log data of users’ relevance judgments.
• A regularization mechanism is used in the proposed algorithm to improve the robustness of solutions when the log data is small and noisy.
• It is formulated as a positive semi-definite programming problem, which can be solved efficiently.
• Experimental results have shown that the proposed algorithm for regularized distance metric learning substantially improves the retrieval accuracy of the baseline CBIR system.

SSAL LRF RDML


79 of 60

References
• Chu-Hong Hoi, Michael R. Lyu, and Rong Jin, “Integrating User Feedback Log into Relevance Feedback via Coupled SVM for Content-based Image Retrieval”, IEEE EMMA Workshop, April 2005
• Chu-Hong Hoi and Michael R. Lyu, “A Novel Log-based Relevance Feedback Technique in Content-based Image Retrieval”, in Proc. ACM Multimedia, New York, USA, 10-16 October 2004, pp. 24-31
• Chu-Hong Hoi and Michael R. Lyu, “Web Image Learning for Searching Semantic Concepts in Image Databases”, in Poster Proceedings of the 13th International World Wide Web Conference (WWW2004), New York, USA, 17-22 May 2004
• Chu-Hong Hoi and Michael R. Lyu, “Group-based Relevance Feedback with Support Vector Machine Ensembles”, in Proceedings of the 17th International Conference on Pattern Recognition (ICPR 2004), Cambridge, UK, 23-26 August 2004, vol. 3, pp. 874-877
• Chu-Hong Hoi et al., “Biased Support Vector Machine for Relevance Feedback in Image Retrieval”, in Proceedings of the International Joint Conference on Neural Networks (IJCNN2004), Budapest, Hungary, 25-29 July 2004, pp. 3189-3194

80 of 60

References (Cont.)

• Steven Chu-Hong Hoi, Michael R. Lyu, and Rong Jin, “A Unified Log-based Relevance Feedback Scheme for Image Retrieval” (extended from ACM Multimedia 2004), Technical Report, CUHK, March 2005
• S. Tong and E. Chang, “Support vector machine active learning for image retrieval”, in Proc. ACM Multimedia, pp. 107-118, 2001
• A.W.M. Smeulders et al., “Content-Based Image Retrieval at the End of the Early Years”, IEEE PAMI, 22(12), pp. 1349-1380, 2000
• Xiaojin Zhu et al., “Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions”, ICML 2003
• Lei Wang et al., “Bootstrapping SVM active learning by incorporating unlabelled images for image retrieval”, IEEE CVPR 2003 (TSVM-SAL)
