E.V. Myasnikov
2007
Digital image collection navigation based on automatic classification methods
Samara State Aerospace University
RCDL 2007, Internet Mathematics 2007
Navigation in collections of digital images
A navigation system can serve as:
• an alternative to an image retrieval system
• a complement to an image retrieval system
• a convenient browsing system
Approaches to constructing a navigation system
• construct a projection of the whole image collection into a 2-D navigation space
• cluster the image collection into a set (hierarchy) of clusters, then construct a 2-D projection of each cluster
• construct a tree-like structure using an optimization rule
Clustering methods
• Hierarchical clustering (agglomerative): single link, complete link, average link
• Nonhierarchical clustering: K-means, Kohonen neural networks (SOM)
• Fuzzy clustering
Projection methods
• Linear: principal component analysis (PCA)
• Nonlinear: classical Kruskal MDS (multidimensional scaling), Sammon projection, force-directed placement
• Types of solution: discrete lattice solution, continuous solution
Requirements for the navigation system
• The collection is represented as 2D vectors (icons or points on the screen).
• Zooming in on a region displays a set of images with a higher level of similarity.
• Zooming out from the current region displays a set of images with a lower level of similarity.
• Reversibility.
Operations on the navigation “map”
• Scrolling (up, down, left, right)
• Scaling (up, down)
Main phases of the proposed approach
Digital images → Feature extraction → Cluster hierarchy construction → Mapping into the 2-D navigation space using restrictions imposed by the cluster hierarchy → Navigation space
Clustering Phase: Analyzed Methods
Hierarchical clustering scheme
1. Compute the adjacency (distance) matrix
2. Assign each object to its own cluster
3. Merge the two clusters with the minimal distance between them
4. Eliminate the row and column of the absorbed cluster and recalculate the matrix
5. Test the stopping criterion and go to step 3
Inter-cluster distance
• single link: minimal distance between objects of the two clusters
• complete link: maximal distance between objects of the two clusters
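The five-step scheme and the two inter-cluster distances can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the Euclidean metric, the stopping criterion "stop when `n_clusters` clusters remain", and the helper names `agglomerative` and `euclidean` are hypothetical choices. After a merge, the distance from the merged cluster to any other cluster is the minimum (single link) or maximum (complete link) of the two old distances, which equals the minimal or maximal object-to-object distance without recomputing it.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerative(points, n_clusters, linkage="single"):
    """Agglomerative clustering; stops when n_clusters remain (assumed criterion)."""
    n = len(points)
    # Step 1: distance matrix, stored as {(i, j): d} with i < j.
    dist = {(i, j): euclidean(points[i], points[j])
            for i in range(n) for j in range(i + 1, n)}
    # Step 2: every object starts in its own cluster.
    clusters = {i: [i] for i in range(n)}
    combine = min if linkage == "single" else max  # single link vs complete link
    # Step 5: test the stopping criterion before each merge.
    while len(clusters) > n_clusters:
        # Step 3: the pair of clusters with the minimal distance (a < b).
        a, b = min(dist, key=dist.get)
        # Step 4: cluster b is absorbed by a; drop b's row/column, update a's.
        for c in clusters:
            if c in (a, b):
                continue
            d_ac = dist.pop((min(a, c), max(a, c)))
            d_bc = dist.pop((min(b, c), max(b, c)))
            dist[(min(a, c), max(a, c))] = combine(d_ac, d_bc)
        del dist[(a, b)]
        clusters[a].extend(clusters.pop(b))
    return list(clusters.values())
```

Each merge scans the whole matrix, so this naive version needs O(N^3) time overall and is only meant for small samples.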
Kohonen neural network
WTA correction rule:
$$w_j(t+1) = w_j(t) + \eta(t)\,[x(t) - w_j(t)]$$
where $\eta(t)$ is the learning rate and the winning neuron $j$ satisfies
$$d(x(t), w_j(t)) = \min_{1 \le i \le K} d(x(t), w_i(t))$$
To construct the hierarchy of clusters, the Kohonen neural network is applied recursively.
[Figure: Kohonen network with inputs x1, ..., xn, weights w11, ..., wkn, a competition mechanism, and outputs y1, ..., yk]
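A sketch of the WTA correction rule and of its recursive use for building a cluster hierarchy. The linearly decaying learning rate η(t), the initialization of the weights from random samples, and the stopping rule of `build_hierarchy` (minimum node size, maximum depth) are assumptions, as are all function names; the slides do not specify them. Plain WTA can leave "dead" neurons that never win; this sketch simply drops empty groups.

```python
import math
import random


def winner(x, weights):
    # Winning neuron j: d(x, w_j) = min over i of d(x, w_i).
    return min(range(len(weights)), key=lambda i: math.dist(x, weights[i]))


def train_wta(data, k, epochs=20, eta0=0.5, seed=0):
    rng = random.Random(seed)
    weights = [list(p) for p in rng.sample(data, k)]  # init from samples (assumed)
    total, t = epochs * len(data), 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # one shuffled pass over the data
            eta = eta0 * (1 - t / total)  # linearly decaying eta(t) (assumed)
            j = winner(x, weights)
            # WTA rule: w_j(t+1) = w_j(t) + eta(t) * [x(t) - w_j(t)]
            weights[j] = [w + eta * (xi - w) for w, xi in zip(weights[j], x)]
            t += 1
    return weights


def build_hierarchy(data, k=2, min_size=4, depth=0, max_depth=3):
    """Recursively split data with a k-neuron WTA network (hypothetical stopping rule)."""
    node = {"center": [sum(col) / len(data) for col in zip(*data)],
            "items": data, "children": []}
    if len(data) < max(k, min_size) or depth >= max_depth:
        return node
    weights = train_wta(data, k)
    groups = [[] for _ in weights]
    for x in data:
        groups[winner(x, weights)].append(x)
    for g in groups:
        if g:  # empty groups come from neurons that never won
            node["children"].append(build_hierarchy(g, k, min_size, depth + 1, max_depth))
    return node
```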
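Clustering: Experimental results
Average quantization error* by number of clusters:
Number of clusters | Single link | Complete link | WTA
25 | 0.387 | 0.193 | 0.169
50 | 0.359 | 0.150 | 0.139
100 | 0.293 | 0.116 | 0.112
*The experiment was conducted on samples of size 1000.
Quantization error:
$$E_q = \frac{1}{U} \sum_{j=1}^{U} \left\| x_j - w_{q(j)} \right\|^2$$
where $U$ is the number of objects, $x_j$ is the feature vector of object $j$, and $w_{q(j)}$ is the center of the cluster $q(j)$ that contains it.
Examples of clusters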
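Assuming the quantization error is the average squared Euclidean distance between each object and the center of its cluster (the cluster mean for single and complete link, the neuron weight vector for WTA), it can be computed as below. The function name and the way labels and centers are passed are hypothetical.

```python
import math


def average_quantization_error(points, labels, centers=None):
    """Mean squared distance from each object to the center of its cluster."""
    if centers is None:
        # No centers given (e.g. for single/complete link): use cluster means.
        groups = {}
        for p, lab in zip(points, labels):
            groups.setdefault(lab, []).append(p)
        centers = {lab: [sum(col) / len(g) for col in zip(*g)]
                   for lab, g in groups.items()}
    # For WTA, `centers` may instead map each label to its neuron weight vector.
    total = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return total / len(points)
```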
Mapping Phase: Sammon projection
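Error:
$$E = \frac{1}{\sum_{i<j}^{N} d_{ij}} \sum_{i<j}^{N} \frac{\left(d_{ij} - d^*_{ij}\right)^2}{d_{ij}}$$
Iterative formula:
$$y_{ik}(t+1) = y_{ik}(t) + \frac{2\lambda}{\sum_{i<j}^{N} d_{ij}} \sum_{j=1,\, j \ne i}^{N} \frac{d_{ij} - d^*_{ij}}{d_{ij}\, d^*_{ij}} \left(y_{ik}(t) - y_{jk}(t)\right)$$
Notation:
$d_{ij}$: distance between objects $i$ and $j$ in the multidimensional space
$d^*_{ij}$: distance between objects $i$ and $j$ in the two-dimensional space
$y_{jk}$: coordinates of object $j$ in the 2D space ($k = 1, 2$)
$\lambda$: step size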
Running time: $O[N^3]$ (assuming the number of iterations is of the same order as the number of objects)
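The error and the iterative formula translate almost line by line into the sketch below. It is a plain gradient version of the update in which every point is moved using the positions at step t; the step size `lam`, the random initialization in the unit square, and the `1e-12` guard against zero 2D distances are assumptions, and `lam` has to be tuned to the scale of the data. Each iteration touches all N^2 pairs, which gives the O[N^3] estimate when the number of iterations grows like N.

```python
import math
import random


def sammon(X, iters=300, lam=1.0, init=None, seed=0):
    """Gradient iterations for Sammon mapping of vectors X into 2D."""
    n = len(X)
    D = [[math.dist(X[i], X[j]) for j in range(n)] for i in range(n)]
    c = sum(D[i][j] for i in range(n) for j in range(i + 1, n))  # sum_{i<j} d_ij
    rng = random.Random(seed)
    Y = [list(p) for p in init] if init else [[rng.random(), rng.random()] for _ in range(n)]
    for _ in range(iters):  # each iteration costs O(N^2)
        new_Y = []
        for i in range(n):
            step = [0.0, 0.0]
            for j in range(n):
                if j == i or D[i][j] == 0.0:
                    continue  # skip self and duplicate objects
                ds = max(math.dist(Y[i], Y[j]), 1e-12)  # d*_ij, guarded against 0
                coef = (D[i][j] - ds) / (D[i][j] * ds)
                for k in range(2):
                    step[k] += coef * (Y[i][k] - Y[j][k])
            new_Y.append([Y[i][k] + 2 * lam / c * step[k] for k in range(2)])
        Y = new_Y  # all points updated from y(t), as in the iterative formula
    return Y


def sammon_error(X, Y):
    num = c = 0.0
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            d = math.dist(X[i], X[j])
            if d > 0.0:
                c += d
                num += (d - math.dist(Y[i], Y[j])) ** 2 / d
    return num / c
```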
Construction of the initial configuration for Sammon mapping
Average error value* by number of iterations:
Method | 100 | 200 | 300
Sammon mapping with random initialization | 0.126 | 0.078 | 0.056
Best Sammon mapping over 10 runs with random initialization | 0.093 | 0.051 | 0.035
PCA (non-iterative) | 0.139
Two-phase method (PCA as initial configuration for Sammon mapping) | 0.044 | 0.041 | 0.039
*Samples of 100 images from a dataset of 10 000 images were used in the experiment.
Two-phase method example
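The two-phase method only changes the starting point: the PCA projection replaces the random initial configuration. The sketch below computes the two leading principal components by power iteration with deflation (chosen here only to stay within the Python standard library) and then runs gradient iterations of the Sammon formula from that start; the function names, iteration counts and step size `lam` are assumptions.

```python
import math
import random


def pca_2d(X, power_iters=200, seed=0):
    """Project X onto its two leading principal components (power iteration)."""
    n, m = len(X), len(X[0])
    mean = [sum(col) / n for col in zip(*X)]
    Z = [[x - mu for x, mu in zip(row, mean)] for row in X]
    C = [[sum(z[a] * z[b] for z in Z) / n for b in range(m)] for a in range(m)]
    rng = random.Random(seed)
    comps = []
    for _ in range(2):
        v = [rng.random() - 0.5 for _ in range(m)]
        s = math.sqrt(sum(x * x for x in v)) or 1.0
        v = [x / s for x in v]
        for _ in range(power_iters):
            w = [sum(C[a][b] * v[b] for b in range(m)) for a in range(m)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # no variance left in the remaining directions
            v = [x / norm for x in w]
        eig = sum(v[a] * sum(C[a][b] * v[b] for b in range(m)) for a in range(m))
        # Deflation: remove the found component before searching for the next one.
        C = [[C[a][b] - eig * v[a] * v[b] for b in range(m)] for a in range(m)]
        comps.append(v)
    return [[sum(z[a] * v[a] for a in range(m)) for v in comps] for z in Z]


def two_phase(X, iters=100, lam=1.0):
    """Two-phase method: PCA projection refined by Sammon gradient iterations."""
    Y = pca_2d(X)
    n = len(X)
    D = [[math.dist(X[i], X[j]) for j in range(n)] for i in range(n)]
    c = sum(D[i][j] for i in range(n) for j in range(i + 1, n))
    for _ in range(iters):
        new_Y = []
        for i in range(n):
            step = [0.0, 0.0]
            for j in range(n):
                if j != i and D[i][j] > 0.0:
                    ds = max(math.dist(Y[i], Y[j]), 1e-12)
                    coef = (D[i][j] - ds) / (D[i][j] * ds)
                    step = [s + coef * (Y[i][k] - Y[j][k]) for k, s in enumerate(step)]
            new_Y.append([Y[i][k] + 2 * lam / c * step[k] for k in range(2)])
        Y = new_Y
    return Y


def sammon_error(X, Y):
    num = c = 0.0
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            d = math.dist(X[i], X[j])
            if d > 0.0:
                c += d
                num += (d - math.dist(Y[i], Y[j])) ** 2 / d
    return num / c
```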
Methods of speeding up the Sammon projection
1. Triangulation
2. Neural network
3. Approximation using random sets
Chalmers ’96 adaptation of the Sammon projection (CS)
On each iteration, two sets are constructed for each object:
• a set of k1 close objects
• a set of k2 random objects
Running time: $O[N^2]$ (assuming the number of iterations is of the same order as the number of objects and $k_1 + k_2 \ll N$)
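A sketch of the CS idea: on every iteration each object draws k2 random objects, keeps the k1 closest objects seen so far, and updates its 2D position from those k1 + k2 pairs only, so one iteration costs O(N(k1 + k2)) instead of O(N^2). The per-pair update in Sammon form, the immediate (in-place) updates, and the step size `lam` are assumptions, not details taken from the slides or from Chalmers ’96; the function names are hypothetical.

```python
import math
import random


def chalmers_sammon(X, k1=4, k2=3, iters=200, lam=0.05, init=None, seed=0):
    """Sammon-style mapping where each object only interacts with k1 + k2 others."""
    n = len(X)
    rng = random.Random(seed)
    Y = [list(p) for p in init] if init else [[rng.random(), rng.random()] for _ in range(n)]
    neigh = [set() for _ in range(n)]  # best k1 close objects found so far
    for _ in range(iters):
        for i in range(n):
            # Set of k2 random objects other than i (requires k2 <= n - 1).
            rand = set()
            while len(rand) < k2:
                j = rng.randrange(n)
                if j != i:
                    rand.add(j)
            # Keep the k1 closest objects (in feature space) among old and new candidates.
            cand = neigh[i] | rand
            neigh[i] = set(sorted(cand, key=lambda j: math.dist(X[i], X[j]))[:k1])
            # Sammon-type correction of y_i using only the sampled pairs.
            for j in neigh[i] | rand:
                d = math.dist(X[i], X[j])
                if d == 0.0:
                    continue
                ds = max(math.dist(Y[i], Y[j]), 1e-12)
                coef = lam * (d - ds) / (d * ds)
                Y[i] = [Y[i][k] + coef * (Y[i][k] - Y[j][k]) for k in range(2)]
    return Y


def sammon_error(X, Y):
    num = c = 0.0
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            d = math.dist(X[i], X[j])
            if d > 0.0:
                c += d
                num += (d - math.dist(Y[i], Y[j])) ** 2 / d
    return num / c
```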
Proposed Methods: Combined Method (CM)
Idea: use hierarchical clustering to build the projection.
Method description
1. Build the Sammon projection for the top level of the cluster tree.
2. Build the Sammon projection for each subcluster at level 2, using the 2D coordinates of the superclusters as fixed points.
3. Repeat the process for each subcluster (or object) at level 3, and so on.
Running time: $O\left[N^{1+2/L}\right]$ for a balanced tree of depth $L$
Special case: $O[N^2]$ for a balanced tree of depth 2
Modification of the method (MCM): use the 2D coordinates of the top-level clusters as fixed points for each subcluster (or object) at any level.
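The combined method can be sketched as a recursive walk over the cluster tree: each call runs Sammon gradient iterations on the children of one node while the already placed clusters of the level above stay fixed. Here the fixed points are assumed to be the parent and its siblings; the tree format (dicts with `center` and `children`), the initialization near the parent's 2D position, the step size, and the function names are also assumptions.

```python
import math
import random


def sammon_fixed(free_X, fixed_X, fixed_Y, init, iters=200, lam=0.5):
    """Sammon iterations in which only the free points move; fixed points stay put."""
    X = list(free_X) + list(fixed_X)
    Y = [list(p) for p in init] + [list(p) for p in fixed_Y]
    nf, n = len(free_X), len(X)
    D = [[math.dist(X[i], X[j]) for j in range(n)] for i in range(n)]
    c = sum(D[i][j] for i in range(n) for j in range(i + 1, n)) or 1.0
    for _ in range(iters):
        new = []
        for i in range(nf):
            step = [0.0, 0.0]
            for j in range(n):
                if j != i and D[i][j] > 0.0:
                    ds = max(math.dist(Y[i], Y[j]), 1e-12)
                    coef = (D[i][j] - ds) / (D[i][j] * ds)
                    step = [s + coef * (Y[i][k] - Y[j][k]) for k, s in enumerate(step)]
            new.append([Y[i][k] + 2 * lam / c * step[k] for k in range(2)])
        Y[:nf] = new  # fixed points are never updated
    return Y[:nf]


def combined_method(node, rng, fixed_X=(), fixed_Y=(), origin=(0.5, 0.5), spread=0.5):
    """CM sketch: map the node's children, then recurse with them as fixed points."""
    children = node.get("children", [])
    if not children:
        return
    X = [ch["center"] for ch in children]
    # Start the children near the parent's 2D position (assumed initialization).
    init = [[o + spread * (rng.random() - 0.5) for o in origin] for _ in X]
    Y = sammon_fixed(X, fixed_X, fixed_Y, init)
    for ch, y in zip(children, Y):
        ch["y"] = y
    for ch in children:
        # Clusters placed at this level become the fixed points one level down.
        combined_method(ch, rng, X, Y, ch["y"], spread * 0.1)
```

Passing the top-level centers and coordinates as fixed points at every depth, instead of the current level's, would give an MCM-style variant.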
Proposed Methods: Restrictive Combined Method (CMR)
1. Map the centers $x_u^1$ of the top-level clusters $C_u^1 \in C^0$ to 2D vectors $y_u^1$ using a dimensionality reduction method (Sammon or the two-phase method). Set the boundary $\Omega^0$ of the whole displayed area.
2. For each cluster $C_u^k \in C_v^{k-1}$ of the current level $k$, carry out steps 3-6.
3. Construct the boundary $\Omega_u^k$ of the cluster $C_u^k$ in 2D space using the 2D center coordinates $y_m^k$ of the clusters $C_m^k$, $m = 1..|C_v^{k-1}|$, of the current level $k$.
4. Complete the cluster boundary $\Omega_u^k$ using the boundary $\Omega_v^{k-1}$ of the parent cluster $C_v^{k-1}$ at the previous level: $\Omega_u^k = \Omega_u^k \cap \Omega_v^{k-1}$.
5. Map the centers $x_i^{k+1}$ of all subclusters $C_i^{k+1} \in C_u^k$ (or directly the images $O_i$) at level $k+1$ to 2D vectors $y_i^{k+1}$, respecting the boundary $\Omega_u^k$ of the cluster, by applying the recurrence relation
$$y_i^{k+1}(t+1) = y_i^{k+1}(t) + \frac{2\lambda}{\sum_{i<j}^{N} d_{ij}} \sum_{j=1,\, j \ne i}^{N} \frac{d_{ij} - d^*_{ij}}{d_{ij}\, d^*_{ij}} \left(y_i^{k+1}(t) - y_j^{k+1}(t)\right) + \Phi\left(y_i^{k+1}(t), \Omega_u^k\right)$$
where $\lambda$ is the step size and $\Phi$ is a boundary correction function.
6. Recursively apply the procedure of steps 3-5 to map the child clusters $C_i^{k+1}$.
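Steps 3 and 4 do not fix the shape of the boundary Ω_u^k. One plausible reading, assumed here and not confirmed by the slides, is the Voronoi cell of y_u^k among the sibling centers y_m^k, intersected with the parent region Ω_v^{k-1}. Regions are represented as membership predicates; `rectangle` and `cluster_region` are hypothetical helpers.

```python
import math


def rectangle(x0, y0, x1, y1):
    """Omega^0: the whole displayed area as an axis-aligned rectangle (assumed)."""
    return lambda p: x0 <= p[0] <= x1 and y0 <= p[1] <= y1


def cluster_region(centers, u, parent_region):
    """Omega_u^k: Voronoi cell of centers[u] among its siblings, cut by the parent region."""
    def inside(p):
        du = math.dist(p, centers[u])
        in_cell = all(du <= math.dist(p, c) for m, c in enumerate(centers) if m != u)
        return in_cell and parent_region(p)  # intersection with Omega_v^{k-1}
    return inside
```

Because each level's predicate calls the parent's, the nested regions automatically satisfy the restriction of step 4 at every depth.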
Proposed Methods: Modifications for CMR
The correction function $\Phi(y, \Omega)$ can be selected by minimizing a functional consisting of the Sammon error and a boundary function. Two models were considered:
1. Full correction rule (CMR-1): if $y_i$ exceeds the bounds of the cluster, the correction value ensures that $y_i$ lies on the boundary at the next step.
2. Piecewise linear rule (CMR-2): the correction value ensures “attraction” toward the center of the cluster or toward the boundary when $y_i$ comes near or exceeds the boundary.
Example of CMR-1
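For CMR-1, a minimal sketch of the correction Φ: if the proposed new position of y_i leaves the cluster region, Φ moves it back onto the boundary. The choice of boundary point (on the segment toward the cluster center, found by bisection) is an assumption, as is treating Φ as a function of the proposed position; the approach works for regions that contain the center and are convex, such as intersections of Voronoi cells. CMR-2 is not sketched because the exact form of its piecewise linear attraction is not given.

```python
def cmr1_correction(y_new, center, inside, tol=1e-9):
    """Phi for CMR-1: bring a point that left the region back onto its boundary.

    Assumes the region contains `center` and is convex, so the boundary
    crossing on the segment from `center` to `y_new` is found by bisection.
    """
    if inside(y_new):
        return (0.0, 0.0)  # no correction inside the cluster region
    lo, hi = 0.0, 1.0  # fractions along the segment: lo is inside, hi is outside
    while hi - lo > tol:
        mid = (lo + hi) / 2
        p = [c + mid * (y - c) for c, y in zip(center, y_new)]
        if inside(p):
            lo = mid
        else:
            hi = mid
    target = [c + lo * (y - c) for c, y in zip(center, y_new)]
    return tuple(t - y for t, y in zip(target, y_new))
```

For example, with the unit square as the region and center (0.5, 0.5), a proposed position (2.0, 0.5) receives a correction of about (-1.0, 0.0), which places it on the right edge.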
Experimental Research
Method | 1000 images: average error | 1000 images: standard deviation of error | 1000 images: average running time | 5000 images: average error | 5000 images: standard deviation of error | 5000 images: average running time
PCA | 0.1171 | 0.01449 | 2 | 0.1148 | 0.009690 | 11
CS | 0.02880 | 0.005668 | 62 | 0.02592 | 0.001657 | 1880
CM | 0.03407 | 0.002638 | 17 | 0.06220 | 0.028756 | 31
MCM | 0.02767 | 0.002047 | 19 | 0.02840 | 0.001962 | 67
CMR-1 | 0.02972 | 0.002218 | 13 | 0.03143 | 0.001890 | 31
CMR-2 | 0.03494 | 0.002590 | 60 | 0.03723 | 0.002076 | 276
Example of MCM
sample size: 10 000 images
Example of CMR
sample size: 10 000 images
Example of navigation (CMR)
region “a” region “b”
region “c” region “d”
Selection of features
Note: “Measures that are more effective for retrieval tend to be more complex, and thus lose their advantage over the simpler measures when forced into two dimensions” (K. Rodden, W. Basalaj, D. Sinclair, K. Wood. A comparison of measures for visualising image similarity. In: The Challenge of Image Retrieval, British Computer Society Electronic Workshops in Computing, 2000)
Features: color histograms in the CIE L*a*b* color space
Metric: Euclidean
Main requirement for the feature system: the configuration of images in the navigation space must be understandable to the user.
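The slides name the features (color histograms in CIE L*a*b*) and the metric (Euclidean) but not the binning. The sketch below assumes 8-bit sRGB input pixels, the standard sRGB-to-XYZ conversion with a D65 white point, 4 bins per channel, and a*, b* clipped to [-100, 100]; the function names are hypothetical.

```python
import math


def srgb_to_lab(rgb):
    """Convert an 8-bit sRGB triple to CIE L*a*b* (D65 white point)."""
    def lin(c):
        c /= 255.0  # undo the sRGB transfer curve
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (lin(c) for c in rgb)
    x = 0.4124 * r + 0.3576 * g + 0.1805 * b
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    z = 0.0193 * r + 0.1192 * g + 0.9505 * b

    def f(t):
        d = 6 / 29
        return t ** (1 / 3) if t > d ** 3 else t / (3 * d * d) + 4 / 29

    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)


def lab_histogram(pixels, bins=4):
    """Normalized bins**3 color histogram in L*a*b* (bin layout is an assumption)."""
    hist = [0.0] * bins ** 3
    for rgb in pixels:
        L, a, b = srgb_to_lab(rgb)
        # L* in [0, 100]; a* and b* clipped to [-100, 100] before binning.
        iL = min(max(int(L / 100 * bins), 0), bins - 1)
        ia = min(max(int((a + 100) / 200 * bins), 0), bins - 1)
        ib = min(max(int((b + 100) / 200 * bins), 0), bins - 1)
        hist[(iL * bins + ia) * bins + ib] += 1.0 / len(pixels)
    return hist


def euclidean(h1, h2):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(h1, h2)))
```

Coarse bins keep the feature vectors short, which is in line with the quoted observation that simpler measures survive the projection into two dimensions better.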
Conclusions
• The requirements for the navigation method are considered.
• A novel navigation method is proposed.
• A novel combined method and its modifications for dimensionality reduction are proposed.
• The proposed methods are compared with known methods.
• The results of an experimental analysis of the methods used are presented.
Future plans
• Exploring new feature systems
• Improving the method
• Estimating the effectiveness of the navigation method, including expert evaluation
Acknowledgements
This work was financially supported by Yandex (www.yandex.ru).
The “Image database” dataset was provided by Yandex.
THANK YOU FOR YOUR ATTENTION