View
219
Download
0
Tags:
Embed Size (px)
Citation preview
Reading Assignment
Guojun Lu, "Multimedia Database Management Systems", Artech House, Publishers, 1999. Chapter 9: Techniques and Data Structures for
Efficient Multimedia Similarity Search
3
Introduction
Feature space is multidimensional. Objective: divide the multidimensional space
into subspaces, so that only a few subspaces need to be searched for each query.
Operations on Data Structures
Search Most important as it is done “on-line” Must be efficient
Insertion and Deletion Less important as they can be performed “off-line” Need not be efficient
Efficient Search Techniques
Filtering B and B+-trees Clustering MB+-trees k-d trees Grid files R trees TV trees
Filtering
Reduce the search space by selecting specific items satisfying certain criteria.
Search on the multidimensional feature vector is carried out on the selected set of items only.
Filtering with Structured Attributes/Classification
Structured attributes associated with multimedia data. e.g. date
Subject classification e.g. image-content type=“Natural Scene”
Filtering using the triangle inequality
d is a distance metric and i,q, and k are feature
vectors for three objects. Most feature distance measures are “metrics”
d(i,k)=0 iff i=k d(i,k)= d(k,i) Triangle inequality above
),(),(),( qidkqdkid
Application of Triangle Inequality to Multimedia Retrieval MDB Representation
Select m feature vectors as a comparison base F, where m << |MDB|, the size of the database
iMDB fjF, calculate d(i,fj) and store in MDB.
MDB retrieval for query q Calculate d(q,fj) fjF.
Compute l(i) = max 1jm |d(i,fj)- d(q,fj)| Find d(i,q) iMDB l(i) T, where T is a
specified threshold.
Example Find database items whose distance to query q < 3 where the distances between q
and the 2 vectors f1 and f2 forming the base are 3 and 4, respectively
Database
Items id(i,f1) d(i,f2) |d(i,f1)- d(q,f1)| |d(i,f2)- d(q,f2)| l(i)
i1 2 5 1 1 1
i2 7 0
i3 10 4
i4 8 8
i5 9 7
i6 1 6
i7 5 4
i8 4 4
Filtering in the Color Histogram-Based Retrieval Space reduction can take place by
Reducing the number of bins Selecting only a subset of the database images for
calculating the distances from the query image. Space reduction is achieved through the following
process: Select potential image candidates. Use full histogram comparison to calculate the distance
between the query and the candidates.
Selecting Potential Image Candidates Use very few bins in comparing color
histograms. Use the average color of images.
3
1
2),(j
jjavg yxyxd
))(
,)(
,)(
(x , image ofColor Average 111
P
iB
P
iG
P
iRx
P
i
P
i
P
i
B-Trees (1)
1. B-Trees are always balanced.2. B-Trees keep similar-valued records
together on a disk page, which takes advantage of locality of reference.
3. B-Trees guarantee that every node in the tree will be full at least to a certain minimum percentage. This improves space efficiency while reducing the typical number of disk fetches necessary during a search or update operation.
B-Tree Definition
A B-Tree of order m has these properties: The root is either a leaf or has at least two children. Each node, except for the root and the leaves, has
between m/2 and m children. All leaves are at the same level in the tree, so the tree is
always height balanced.
A B-Tree node is usually selected to match the size of a disk block.
A B-Tree node could have hundreds of children.
B-Tree SearchSearch in a B-Tree is a generalization of search
in a 2-3 Tree.1. Do binary search on keys in current node. If
search key is found, then return record. If current node is a leaf node and key is not found, then report an unsuccessful search.
2. Otherwise, follow the proper branch and repeat the process.
50
20 30 70
10 12 13 21 23 35 37 40 55 59 60 73 80
B+-Trees
The most commonly implemented form of the B-Tree is the B+-Tree.
Internal nodes of the B+-Tree neither store records nor pointers to records. They only store key values to guide the search.
Leaf nodes store records or pointers to records.
A leaf node may store more or less records than an internal node stores keys.
Q: What is the advantage of using a B+-tree over a B-tree implementation?
B+-Tree InsertionInsert the following keys into an initially
empty B+ tree of order 4 where each leaf node can hold a maximum of 5 records:
12 , 23 , 10 , 48 , 33 , 50 , 15 , 18 , 20 , 45 , 47 , 31 , 52 , 21 , 30
What is the cost of direct access to a B+ tree of order n with N database items?
Clustering
Similar information items are grouped together to form a cluster based on a certain similarity measurement.
Each cluster is represented by the centroid of the feature vectors of that cluster.
How is search carried out?
2D Example
x x
xx
xx
xx x
xx
xx
xx x
xx
xx
x
x x
xx
xx
x
x x
xx
xx
x
x x
xx
xx
x
x x
xx
xx
xx x
xx
xx
x
x x
xx
xx
x
x x
xx
xx
xx x
xx
xx
x
x x
xx
xx
x
x x
xx
xx
xx x
xx
xx
x
x x
xx
xx
x
x x
xx
xx
xx x
xx
xx
x
x
x
x
x
x
x
x
x
x
x
x
xx
x
x
x
xx
xx
x
xx
xx
x
xx
x
x
x
x
x
xx
x
x
x x
xx
xx
x
x x
xx
xx
x
x x
x
x
x
x
x
x x
xx
xx
xx x
xx
xx
x
x
x
xx
x
x
x x
x
xx
x
x
x x
xx
xx
x
x
x x
xx
xx
x
x x
xx
xx
x
x x
xx
xx
x
x x
xx
xx
xx x
xx
xx
x x
xx
xx
x
x xx x
xx
xx x
xx
xx
x
x