Thursday, May 27, 2004
A System for Large-scale, Content-based Web Image Retrieval - and the Semantics within
Till Quack
Task
Create a content-based image retrieval system for the WWW
  Large-scale: one order of magnitude larger than existing systems, i.e. O(10^6) items
  Relevance feedback
Explore and exploit the semantics within
Take large-scale, content-based image retrieval one step closer to commercial applications
Outline
Content-based Image Retrieval on the WWW
PART I: A System for Image Retrieval on the WWW
  Features
  Retrieval
  Relevance Feedback
  Software Design
PART II: The Semantics within
  Identifying a Method to find Semantics
  Data Mining for Semantic Clues
  Frequent Itemset Mining and Association Rules
  The Visual Link
Discussion & Demonstration
Conclusions & Outlook
Content-based Image Retrieval on the WWW
Characteristics of the data repository
  Size: 4.2 billion documents in Google’s index
  Diversity: documents in any context and language
  Control: anybody can publish anything
  Dynamics: ever changing
System requirements
  Fast
  Scalable
  Make use of all the information available
Motivation for a new system
  Existing systems
  • Either pure text (Google)
  • Or pure content-based
  Large-scale
PART I: A System for Large-scale, Content-based Image Retrieval on the WWW
Ullrich Moenich, Till Quack, Lars Thiele
System Overview
[Diagram: system architecture. Offline part: Image Spider over the World Wide Web, DMOZ data, keyword extraction and feature extraction producing the image description (keywords, visual features) and the images (binaries); keyword indexing into an inverted index (keyid | imageid) and clustering of the visual features into clusters 1..n, stored in MySQL. Retrieval part: keyword request, matching images, user picks relevant images, nearest neighbor search, matching images.]
Visual Features describe the Images
Global features from the MPEG-7 standard
Currently no segmentation
  • Reasons: scalability and the diversity of the data
Texture features
  Edge Histogram Descriptor (EHD)
  • Histogram of quantized edge directions. 80 dimensions
  Homogeneous Texture Descriptor (HTD)
  • Output of a Gabor filter bank. 62 dimensions
Color features
  Scalable Color Descriptor (SCD)
  • Color histogram. 256, 128, 64 or 32 dimensions
  Dominant Color Descriptor (DCD)
  • Up to 8 dominant colors (3D color space) and their percentages
    – 32 “dimensions”
  • “Bins” defined for each image
Collateral Text as an additional Feature
ALT tag and collateral text around images
  Very uncontrolled annotation
Stemming: Porter Stemmer
  Example: training -> train
  More matching terms for boolean queries
  But also some new ambiguities
  • train: to train [verb] / the train [noun]
Retrieval in 2 Steps
[Diagram: the System Overview architecture again, with the two retrieval steps marked]
1. Text Retrieval
2. Visual Nearest Neighbor Search
Retrieval: Text
Options
  Boolean query on inverted index
  Vector Space Model
  LSI etc.
Choice
  Ranked boolean queries on inverted index
  Ranking: tf*idf
Reasons
  Speed
  Sparsity of data:
  • 600 000 keywords in total
  • 1 document: 10-50 words

Keyword  ImageId  tf
shoe     1233     1
sport    1233     1
red      1233     1
banana   1234     1
fruit    1234     2
order    1234     1

Keyid  ImageId  tf
124    1233     1
341    1233     1
345    1233     1
445    1234     1
75     1234     2
875    1234     1
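The ranked boolean query on the inverted index can be sketched as follows. This is a minimal illustration, not the system's implementation: the postings are the rows of the example table above, while the function names, the OR semantics of the query and the plain log(N/df) form of idf are assumptions.

```python
import math
from collections import defaultdict

# Toy postings copied from the example table: (keyword, image_id, tf).
POSTINGS = [
    ("shoe", 1233, 1), ("sport", 1233, 1), ("red", 1233, 1),
    ("banana", 1234, 1), ("fruit", 1234, 2), ("order", 1234, 1),
]

def build_index(postings):
    """Inverted index: keyword -> {image_id: tf}."""
    index = defaultdict(dict)
    for keyword, image_id, tf in postings:
        index[keyword][image_id] = tf
    return index

def ranked_boolean_or(index, terms, n_docs):
    """OR query (assumed); each hit is scored by the sum of tf * idf."""
    scores = defaultdict(float)
    for term in terms:
        docs = index.get(term, {})
        if not docs:
            continue  # unknown keyword contributes nothing
        idf = math.log(n_docs / len(docs))
        for image_id, tf in docs.items():
            scores[image_id] += tf * idf
    # Highest score first; ties broken by image id for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

index = build_index(POSTINGS)
results = ranked_boolean_or(index, ["fruit", "shoe"], n_docs=2)
```

Only the posting lists of the query terms are touched, which is why the sparsity noted above (10-50 words per document) keeps this fast.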
Retrieval – Visual Features (MPEG-7)
K-Nearest Neighbor search (K-NN)
  Find the K closest candidates c_i to the query image q in a vector space
  Distance: Minkowski metrics for the distance d(c_i, q), namely the L1 and L2 norms
Most MPEG-7 descriptors are high-dimensional vectors
  The “dimensionality curse” applies
  High-dimensional spaces behave “weirdly”
  In particular, the distances are not too meaningful
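A linear-scan K-NN search with a Minkowski distance can be sketched as below. The function names and the toy 2-D vectors are illustrative assumptions; real MPEG-7 descriptors have between 32 and 256 dimensions.

```python
def minkowski(a, b, p):
    """Minkowski distance of order p; p=1 is the L1 norm, p=2 the L2 norm."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def knn(query, candidates, k, p=1):
    """Indices of the k candidates closest to the query (exhaustive scan)."""
    order = sorted(range(len(candidates)),
                   key=lambda i: minkowski(query, candidates[i], p))
    return order[:k]

candidates = [(0.0, 0.0), (1.0, 1.0), (3.0, 4.0), (0.5, 0.0)]
nearest = knn((0.0, 0.0), candidates, k=2, p=2)
```

This scan compares the query with every database entry, which is exactly the cost the clustering step later in the talk tries to avoid.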
Retrieval – Challenges for Visual Features
We have several (visual) feature types
  How can we combine them?
Our database is very large
  How can we search it fast enough?
  i.e. how can we avoid comparing the query vector with each database entry?
A Combined Distance for the MPEG-7 Features
We use a combined distance of all the visual feature types
  The individual distances occupy different ranges in different distributions
  The distributions were transformed to a normal distribution in the range [0,1]
  The distances are then combined linearly
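One way to realize such a combination is sketched below. The slides do not give the exact transformation, so the 3-sigma Gaussian normalization, the per-feature statistics and the equal weights are all assumptions made for illustration.

```python
def normalize(d, mean, std):
    """Map a raw distance into [0, 1] with a 3-sigma Gaussian normalization.

    Assumed scheme: (d - mean) / (3 * std) lies in [-1, 1] for ~99% of
    normally distributed values; it is then shifted and clipped to [0, 1].
    """
    z = (d - mean) / (3.0 * std)
    return min(1.0, max(0.0, (z + 1.0) / 2.0))

def combined_distance(raw, stats, weights):
    """Weighted linear combination of the normalized per-feature distances."""
    total = sum(weights[f] for f in raw)
    return sum(weights[f] * normalize(raw[f], *stats[f]) for f in raw) / total

# Hypothetical (mean, std) of the distance distribution of each feature type.
stats = {"ehd": (40.0, 10.0), "htd": (5.0, 1.0), "scd": (200.0, 50.0)}
weights = {"ehd": 1.0, "htd": 1.0, "scd": 1.0}
raw = {"ehd": 40.0, "htd": 8.0, "scd": 50.0}
d = combined_distance(raw, stats, weights)
```

After normalization the three distances are 0.5, 1.0 and 0.0, so no single feature type dominates the sum merely because its raw range is larger.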
Clustering speeds up the search
Problem
  Millions of items in the DB
  Linear search over the whole dataset is too slow
  We are only looking for the K nearest neighbors anyway
(One) Solution
  Partition the data into clusters, each identified by a representative, the centroid
  Only search the cluster whose centroid is closest to the query q
  K-Means clustering algorithm
  • Not the best, in particular in high-dimensional spaces
  • But fast!
Problem with clustering
  A query at the border of a cell does not find all the nearest neighbors
Simple solution
  Overlapping clusters
  Problem: scalability
  • Original data: 7 GB
  • Overlapping data: 50 GB

Imageid  Primary Descriptor  Secondary Descriptor 1  Secondary Descriptor 2  Secondary Descriptor 3
122      ehd                 htd                     scd                     dcd
45233    ehd                 htd                     scd                     dcd
6688     ehd                 htd                     scd                     dcd
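The cluster-restricted search, and the border problem it causes, can be sketched as follows. The centroids, the extra image id 77 and all vectors are made up for the example; the centroids are taken as given rather than computed by K-Means.

```python
def l2(a, b):
    """Euclidean (L2) distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def nearest_centroid(query, centroids):
    """Index of the centroid closest to the query."""
    return min(range(len(centroids)), key=lambda c: l2(query, centroids[c]))

def cluster_knn(query, centroids, clusters, k):
    """K-NN restricted to the single cluster whose centroid is closest."""
    members = clusters[nearest_centroid(query, centroids)]
    ranked = sorted(members, key=lambda m: l2(query, m[1]))
    return [image_id for image_id, _ in ranked[:k]]

centroids = [(0.0, 0.0), (10.0, 10.0)]
clusters = {
    0: [(122, (0.0, 1.0)), (6688, (1.0, 1.0))],
    1: [(45233, (9.0, 9.0)), (77, (5.5, 5.5))],
}
hits = cluster_knn((4.0, 4.0), centroids, clusters, k=2)
```

The query (4, 4) falls into cluster 0, so image 77 at (5.5, 5.5) is never examined although it is the true nearest neighbor. This is the border effect that overlapping clusters were meant to fix.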
Relevance Feedback Improves the Results
Relevance feedback: user input to improve search results, iteration by iteration
  i.e. the user selects “good matches”
We obtain the following information:
  1. A new query vector which is a combination of the relevant images = Query Vector Movement
  2. The ratios for the combination of the feature types
Relevance Feedback: Query Vector Movement
Construct the query vector q_n from the images selected in iteration n
  Vector component k
  Feature type f (EHD, SCD, HTD)
  i = 1...M relevant images
The final, new query vector is
  q = 0.75 * q_n + 0.25 * q_(n-1)
i.e. move from the old query vector towards the new vector
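The update can be sketched as below for one feature type. The 0.75 / 0.25 weighting is taken from the slide; constructing q_n as the component-wise mean of the M relevant images is an assumption, since the slide's formula did not survive in the text.

```python
def mean_vector(vectors):
    """Component-wise mean of the selected images' feature vectors (assumed form of q_n)."""
    m = len(vectors)
    return [sum(v[k] for v in vectors) / m for k in range(len(vectors[0]))]

def move_query(prev_query, relevant, alpha=0.75):
    """q = alpha * q_n + (1 - alpha) * q_(n-1), with alpha = 0.75 as on the slide."""
    q_n = mean_vector(relevant)
    return [alpha * new + (1.0 - alpha) * old
            for new, old in zip(q_n, prev_query)]

q_prev = [0.0, 0.0]
relevant = [[4.0, 0.0], [4.0, 8.0]]   # feature vectors of the images the user picked
q = move_query(q_prev, relevant)
```

In the full system this would be done once per feature type f, each with its own vector length.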
Relevance Feedback: Weight Adaptation
Which feature is most important for the given query?
  The one for which all the relevant images are closest
Determine the ratios for the combination based on the average distance, e.g. for the EHD
and set
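The slide's formulas for the ratios are not preserved in the text. The sketch below shows one plausible scheme under that caveat: each feature type gets a weight inversely proportional to the average distance of the relevant images to the query, normalized so the weights sum to 1. The function name and the example distances are hypothetical.

```python
def adapt_weights(avg_dist):
    """Assumed scheme: weight_f proportional to 1 / (average distance for feature f).

    avg_dist maps a feature type to the mean normalized distance between the
    query and the images the user marked as relevant (must be > 0).
    """
    inverse = {f: 1.0 / d for f, d in avg_dist.items()}
    total = sum(inverse.values())
    return {f: v / total for f, v in inverse.items()}

weights = adapt_weights({"ehd": 0.1, "scd": 0.2, "htd": 0.4})
```

Here the relevant images are closest in EHD space, so EHD receives the largest share of the combined distance in the next iteration.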
Implementation – Software and Hardware
Languages: C++ and Perl
  Inline::CPP to connect layers
WWW: Apache and CGI
Relational DB: MySQL
Operating system: OS X
Hardware
  Dual 2 GHz Apple G5, 2 GB RAM
  Teran Terabyte Disk Array
Part II: The Semantics Within
Semantics: Combining Text and Visual Features
Our dataset is multi-modal
  Keywords and several visual features
  Not only valid for WWW data
  • Video: image + speech
  • Bio-imagery: image + microscope setting, cell coloring fluid
Goal: try to jointly use the different modes
  Do semantic relations between the modes exist?
  Learn something about these semantic relations
  Improve the retrieval precision based on them
Challenges in our project
  Large-scale
  Noisy and uncontrolled data
  Only global visual features
Identifying a Method to find the Semantics
Related work
  Latent Semantic Indexing (LSI) [Westerveld 2000]
  • Problem: O(N^2 m^3), N = documents + terms, m = concept space
  Statistical models [Barnard, Forsyth 2001-2004]
  • Problem (runtime): “several hours for several thousand images”
  • Problem: it is a (rather strict, hierarchical) model
  Others
  • Neural networks (SOM etc.)
  • Hidden Markov Models
Often: classification
  We don’t know our classes, or: there are just too many
  We can’t train them either (data too diverse and noisy)
Most of the methods above were only tested on relatively small, supervised datasets
There is one more option …
Method: Data Mining for Semantic Clues
Mine the data for patterns
  Find them only where they exist
  Deduce rules from them
  Scalable methods available
Frequent Itemset Mining and Association Rules
  Classic application: market baskets, census data …
  Some works on multimedia data
  • [Zaïane 98]: Datacubes with appended keywords
  • [Tešić et al. 03]: Perceptual associations (texture) within images
Frequent Itemsets and Association Rules
Itemset I
Transaction T
Database D
Support of itemset A
  A is called frequent if its support is > minsupp
Rule A -> B
Support of a rule
  Statistical significance
Confidence of a rule
  Strength of implication
  Maximum likelihood estimate that B is true given that A is true
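The formulas belonging to these terms are not preserved in the text. The block below gives the standard textbook definitions in the notation of the slide; it is a reconstruction under that assumption, with the strict threshold chosen to match the "support > minsupp" used elsewhere in the talk.

```latex
\begin{align*}
I &= \{i_1, \dots, i_m\} && \text{(set of all items)} \\
T &\subseteq I && \text{(transaction)} \\
D &= \{T_1, \dots, T_n\} && \text{(database of transactions)} \\
\mathrm{supp}(A) &= \frac{|\{T \in D : A \subseteq T\}|}{|D|} && \text{(support of itemset } A \subseteq I\text{)} \\
A \text{ is frequent} &\iff \mathrm{supp}(A) > \mathit{minsupp} \\
A \rightarrow B &\quad \text{with } A, B \subseteq I,\ A \cap B = \emptyset && \text{(rule)} \\
\mathrm{supp}(A \rightarrow B) &= \mathrm{supp}(A \cup B) \\
\mathrm{conf}(A \rightarrow B) &= \frac{\mathrm{supp}(A \cup B)}{\mathrm{supp}(A)}
\end{align*}
```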
Example & Advantages
Example: Market Baskets
  Rule {Diaper, Milk} -> {Beer}
Advantages
  Human readable
  Can be edited
  Fast algorithms available
Note: associations are not correlations
  The same concept, just simpler
  Associations and correlations: [Brin, Motwani, Silverstein 98]

TID  Items
1    Bread, Milk
2    Beer, Diaper, Bread, Eggs
3    Beer, Coke, Diaper, Milk
4    Beer, Bread, Diaper, Milk
5    Coke, Bread, Diaper, Milk
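Support and confidence of the example rule can be checked directly on the five baskets. The helper names below are illustrative; the definitions used are the standard ones (fraction of transactions containing the itemset, and supp(A ∪ B) / supp(A)).

```python
# The five market baskets from the table above.
BASKETS = [
    {"Bread", "Milk"},
    {"Beer", "Diaper", "Bread", "Eggs"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Bread", "Diaper", "Milk"},
]

def support(itemset, db):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(1 for t in db if itemset <= t) / len(db)

def confidence(antecedent, consequent, db):
    """supp(A u B) / supp(A) for the rule A -> B."""
    return support(antecedent | consequent, db) / support(antecedent, db)

s = support({"Diaper", "Milk", "Beer"}, BASKETS)
c = confidence({"Diaper", "Milk"}, {"Beer"}, BASKETS)
```

Baskets 3 and 4 contain all three items, so the rule has support 2/5; baskets 3, 4 and 5 contain {Diaper, Milk}, so its confidence is 2/3.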
Using FIMI to find the itemsets
Frequent Itemset Mining (FIMI)
  Find frequent itemsets with support > minsupp
  Minimal support minsupp given by “an expert”
First algorithm: APriori [Agrawal et al. 93]
  Basic idea: if an itemset is frequent, all its subsets must be frequent (monotonicity)
  k passes over the dataset for itemsets of length k
  ~O(knp) for n transactions, p items, itemsets of length k
Today’s algorithms
  Rely on the same basic principle
  But much faster (main reason: data structures)
  • Usually only 2 database passes
  • ~linear runtime
  State-of-the-art algorithm overview: FIMI’03
  We used: fpmax* [Grahne, Zhu: Nov 03]
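The level-wise idea behind APriori can be sketched in a few lines. This is a didactic version that recounts support by scanning all transactions; it is not fpmax*, which the system actually used, and the function name and toy data are assumptions.

```python
from itertools import combinations

def apriori(db, minsupp):
    """Return {itemset: support} for all itemsets with support > minsupp."""
    n = len(db)

    def supp(itemset):
        return sum(1 for t in db if itemset <= t) / n

    level = {frozenset([item]) for t in db for item in t}  # candidate 1-itemsets
    frequent = {}
    k = 1
    while level:
        # One pass over the data per level: keep the frequent k-itemsets.
        current = {s: supp(s) for s in level}
        current = {s: v for s, v in current.items() if v > minsupp}
        frequent.update(current)
        # Join step, then prune by monotonicity: every k-subset of a
        # (k+1)-candidate must itself be frequent.
        k += 1
        joined = {a | b for a in current for b in current if len(a | b) == k}
        level = {c for c in joined
                 if all(frozenset(sub) in current
                        for sub in combinations(c, k - 1))}
    return frequent

baskets = [
    {"Bread", "Milk"},
    {"Beer", "Diaper", "Bread", "Eggs"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Bread", "Diaper", "Milk"},
]
frequent = apriori(baskets, minsupp=0.5)
```

With minsupp = 0.5 on the market-basket example, the pruning step discards {Bread, Diaper, Beer} without counting it, because its subset {Bread, Beer} is already known to be infrequent.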
Diapers and Beer !!?
Application to the domain of multimedia data:
  1. Formulate images as transactions
  2. Low-level clusters serve as a dimensionality reduction for the visual features
  3. We find associations of visual features (clusters) and keywords
  4. From these associations we deduce semantic rules
Advantages
  Comparatively low computational complexity
  Other data sources can be integrated in the same manner (e.g. long-term relevance feedback)
Challenges
  Noisy, uncontrolled data
  Associations within keywords much stronger than associations between keywords and visual features
  Uneven distribution of cluster sizes (K-Means problem)
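Step 1, formulating an image as a transaction, might look like the sketch below. The item encoding ("kw:" prefix for keywords, "feature:cluster id" for visual clusters) is a hypothetical convention chosen for the example, not the system's actual format.

```python
def image_to_transaction(keywords, cluster_ids):
    """Turn one image into a set of items.

    keywords:    stemmed keywords extracted for the image
    cluster_ids: {feature type: id of the low-level cluster the image falls in}
    """
    items = {f"kw:{word}" for word in keywords}
    items |= {f"{feature}:{cid}" for feature, cid in cluster_ids.items()}
    return frozenset(items)

t = image_to_transaction(["shoe", "sport", "red"], {"ehd": 14, "scd": 12})
```

Replacing each high-dimensional feature vector by a single cluster id is what makes the visual features usable as items: a rule such as {kw:shoe} -> {ehd:14} then connects a keyword to a region of the EHD feature space.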
Characteristics of the Itemsets and Rules
There are associations
  Within text: {shoe} -> {walk}
  Within visual clusters: {EHD 14} -> {SCD 12}
  Between text and visual clusters: {shoe} -> {EHD 14}
Measure for interestingness or choice of rules from FI
  Confidence?
  Statistical criteria?
  Background knowledge? (Example: pregnant -> woman: 100% confidence)
Our “background knowledge”: rules that connect keywords and low-level features are more interesting
  Since this is known, the mining can be adapted and made even faster
Exploiting the Itemsets and Rules
Selecting Interesting Low-Level Clusters based on Rules
Clusters were introduced to partition the visual feature vector data and search only on certain clusters
  Problem: we miss certain nearest neighbors if images for a concept are spread over several clusters
  Unsatisfactory solution: overlapping clusters
But association rules might find and solve this situation
  Clusters are re-united if the number of images for the concept in both clusters is > minsupp
  Example: {shirt} -> {ehd249, ehd310} reunites these clusters for the initial keyword query “shirt”!
This is scalable, unlike overlapping clusters
Another benefit is that more images labeled with the original keyword are “injected” into the results of the K-NN search
Currently: one keyword as high-level semantic concept
Future: find high-level semantic concepts by mining associations within text first
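How a mined rule could widen the set of clusters searched is sketched below. The rule is the slide's {shirt} -> {ehd249, ehd310} example; the rule table layout and the function name are assumptions, and the slides do not spell out how the rules are applied at query time.

```python
# Mined rules for one feature type: keyword -> EHD cluster ids it is associated with.
RULES = {"shirt": {249, 310}}

def clusters_to_search(keyword, nearest_cluster, rules):
    """Union of the cluster closest to the query and the clusters the rule re-unites."""
    return {nearest_cluster} | rules.get(keyword, set())

selected = clusters_to_search("shirt", 249, RULES)
fallback = clusters_to_search("shoe", 7, RULES)
```

Only the clusters named by a rule are added, so the stored data does not grow the way it does with overlapping clusters.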
The Visual Link
Another contribution, NOT related to Frequent Itemset Mining and Association Rules…
Since the search concept suggests visual nearest neighbor search with relevance feedback after the initial keyword search:
  It would be nice to have a diverse selection of images for a given keyword on the first page of results
  Images sorted not only by keyword ranking, but also based on visual feature information
Basic idea: for a given keyword query, build groups of images that are visually close
  Larger groups are more important
  Show only one representative per group
The Visual Link: A Graph-Based Approach
Let I(Q) be a set of images matching a keyword query Q
Define a graph G(V,E)
  i.e. images are visually linked if the distance between them is lower than a given threshold
Do a connected component analysis to find connected components C
For each component C find the “best” representative r_C
Re-rank results based on the representatives r_C
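The graph construction and connected component analysis can be sketched with a union-find structure. The L1 distance, the threshold and the toy vectors are assumptions; choosing the “best” representative is left out because the slides do not state the criterion.

```python
def l1(a, b):
    """L1 distance between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def visual_groups(vectors, threshold, dist=l1):
    """Connected components of the graph whose edges join images closer than threshold.

    vectors: {image_id: feature vector} for the images matching the keyword query.
    Returns the components as lists of image ids, largest component first.
    """
    ids = list(vectors)
    parent = {i: i for i in ids}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for pos, a in enumerate(ids):
        for b in ids[pos + 1:]:
            if dist(vectors[a], vectors[b]) < threshold:
                parent[find(a)] = find(b)  # link the two components

    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values(), key=len, reverse=True)

vectors = {1: (0.0, 0.0), 2: (0.5, 0.0), 3: (1.0, 0.0), 4: (10.0, 10.0)}
groups = visual_groups(vectors, threshold=0.6)
```

Images 1 and 3 are not directly linked but end up in the same component through image 2. The pairwise loop is quadratic in the number of matching images, which motivates the approximation on the next slide.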
The Visual Link: An Example
The Visual Link: An Approximation
Problem: distance calculations for the graph take too long
  Clusters cannot be used
  Loading individual vectors takes a lot of time
Solution: approximate distance
  Idea: if images are in the same cluster and in the same distance range to the centroid, the probability that they are “close” is high
New definition for visually linked
  If in the same cluster and in the same range of relative distance to its centroid
Can be encoded in a relational DB!
  And comes at nearly no extra cost in creation

Imageid  Clusterid  2ndClusterid  Reldist
1        221        122           0.6
2        342        345           0.8
3        223        42            0.2
4        12         126           0.4
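The approximate link can be computed from the table columns alone, as sketched below. The first four rows are from the table above; row 5, the number of distance ranges (4) and the assumption that Reldist lies in [0, 1] are hypothetical additions for the example.

```python
from collections import defaultdict

# (imageid, clusterid, second clusterid, reldist); row 5 is a made-up addition.
ROWS = [
    (1, 221, 122, 0.6),
    (2, 342, 345, 0.8),
    (3, 223, 42, 0.2),
    (4, 12, 126, 0.4),
    (5, 221, 90, 0.65),
]

def approx_groups(rows, n_bins=4):
    """Group images that share a cluster and a relative-distance range."""
    groups = defaultdict(list)
    for image_id, cluster_id, _second_cluster, reldist in rows:
        # Index of the distance range; clamp so reldist == 1.0 stays in the last bin.
        bin_idx = min(int(reldist * n_bins), n_bins - 1)
        groups[(cluster_id, bin_idx)].append(image_id)
    return dict(groups)

groups = approx_groups(ROWS)
```

Images 1 and 5 share cluster 221 and the same distance range, so they count as visually linked without loading a single feature vector. In a relational database the same grouping is a GROUP BY over the cluster id and the binned distance.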
Discussion & Demo
Discussion: Precision
Measuring the quality of such a large-scale system is difficult
  Precision/recall measure not possible: ground truth not known
  • C: correct results
  • D: desired results
  • A: actual results
We measure the precision based on user questioning
Before we continue … some numbers
Number of images: 3 006 660
Size of image data: 111 GB
Feature extraction: 15 days (dual 2 GHz CPU, 2 GB RAM)
Number of distinct keywords: 680 256
Size of inverted keyword index table: 50 260 345 lines
MySQL database size: 23 GB
And now … the moment you’ve all been waiting for …
The Demo of Cortina
Conclusions
A system with over 3 million items was implemented
  Probably the largest CBIR system to date?
A retrieval concept was introduced
  A keyword query followed by relevance feedback and visual nearest neighbour search
  Superior to existing retrieval concepts (query by keyword or query by example)
Data mining to explore and exploit semantics in large-scale systems was introduced
Questions
Outlook
Many extensions and improvements possible
  Segmentation
  • Or maybe rather some simple tiling
  Indexing
  • K-Means should be replaced
  • Suggestion: VA-File based approach [Manjunath, Tesic 03]
  Association Rule Mining
  • Multilevel approach
  • First keywords for high-level semantic concepts
  • Then visual features
Thanks Ullrich Moenich and Lars Thiele
Which Rules are of Interest?
There are associations
  Within text: {shoe} -> {walk}
  Within visual clusters: {EHD 14} -> {SCD 12}
  Between text and visual clusters: {shoe} -> {EHD 14, SCD 12}
There are long and short rules
  Short rules have higher support by the nature of the problem
  Long rules contain more (precise) information about the semantics
Measure for interestingness or choice of rules from FI
  Confidence?
  Statistical criteria?
  Background knowledge? (Example: pregnant -> woman)
Characteristics and Challenges
Chosen criteria
  Mainly interested in rules {keywords} -> {visual feature clusters} (our “background knowledge”)
  Support, confidence
  Mine long and short rules
Restriction of the problem: mine for frequent itemsets per keyword, i.e. all images = transactions for a given keyword
  This means
  • We avoid being distracted by associations within keywords
  • The method is made even more scalable
The keyword as a placeholder for a semantic concept
  A keyword does not always stand for a single semantic concept
  Proposal for future versions: multi-level approach
  • First {keywords} -> {keywords} rules to identify “real” semantic concepts
  • Then itemset mining per identified concept
Characteristics of the Itemsets and Rules - Overall
Why keyword filtering of the results does not work
Proposal: Semantic Clusters
Ultimate goal: search some kind of “semantic clusters” instead of visual feature clusters
Proposal based on the approach from Ester et al. 2002, 2003
  Clustering based on frequent itemsets, originally for text
  Clustering criterion: minimize overlap