Web-Based Image Similarity Search
Vinay Menon, Uday Babbar and Ujjwal Dasgupta
Delhi College of Engineering, Delhi
Email: {[email protected], [email protected], [email protected]}
Abstract—This paper proposes an implementation of a web-based image similarity search built on a robust image matching algorithm and a user-definable matching criterion. Matching is done using image signatures, which are generated by decomposing images with the discrete cosine transform. A database of image signatures is populated by running the signature generator in conjunction with a web crawler. Search results are found by calculating the Euclidean distance between the signature of the subject image and those in the database. The proposed architecture is time efficient, robust, and uses a rotation-, scale- and perspective-invariant matching method.
Keywords: image matching, similarity, reverse image search, image signature
I. INTRODUCTION
As the Internet becomes increasingly multimedia intensive, there is a growing need for tools that manipulate large collections of media and make searching for media more intuitive. Traditionally, search engines use manual captions, text on the surrounding web page, and the size and dimensions of images for image retrieval. This method is highly subjective and may lead to unrelated search results. The challenge is to bridge the gap between the physical characteristics of digital images (e.g. color, texture) that are used for comparison and the semantic meaning of the images that humans use to query the database. A focus of significant research is an algorithmic process for determining the perceptual similarity of images. The comparison algorithm has to judge differences between images in much the way a human would. This is easier said than done, because of some special properties of the human eye and brain.
What makes humans perceive two images as similar? This is a difficult question for many reasons. Two images can be perceived as similar because they contain similar objects (e.g. a ball, a girl, a Mona Lisa portrait) or because they have a similar look (the same colors, shapes, textures). Recognition of objects, and hence content-based image retrieval, is extremely difficult. This algorithm therefore analyzes image composition in terms of colors, shapes and textures. Another problem in implementing a search engine is that comparing images in a large-scale collection is a processing-power and bandwidth intensive task, so an image signature/fingerprint based solution has to be used. The algorithm should extract features from an image and use them to calculate a signature. Signature generation should be such that size-, rotation- and segmentation-insensitive matching is possible, and, in addition to allowing for these comparisons, it should be fast enough for a viable implementation.
II. IMPLEMENTATION
The implementation structure consists of three main blocks: the crawler, the signature generator and the similarity calculator.
A. WEB CRAWLER
The web crawler feeds the signature calculator with images from the internet, which in turn populates the database. The crawler is implemented in PHP; a code snippet is given below:
class Crawler {
    protected $markup = '';

    public function __construct($uri) {
        $this->markup = $this->getMarkup($uri);
    }

    // Fetch the raw HTML of the given URI
    public function getMarkup($uri) {
        return file_get_contents($uri);
    }

    // Dispatch to _get_images() / _get_links()
    public function get($type) {
        $method = "_get_{$type}";
        if (method_exists($this, $method)) {
            // call_user_method() is deprecated (removed in PHP 7);
            // invoke the method directly instead
            return $this->$method();
        }
    }

    protected function _get_images() {
        if (!empty($this->markup)) {
            // match <img ...> tags whether or not they are self-closing
            preg_match_all('/<img([^>]+)\/?>/i', $this->markup, $images);
            return !empty($images[1]) ? $images[1] : FALSE;
        }
    }

    protected function _get_links() {
        if (!empty($this->markup)) {
            preg_match_all('/<a([^>]+)>(.*?)<\/a>/i', $this->markup, $links);
            return !empty($links[1]) ? $links[1] : FALSE;
        }
    }
}
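A minimal usage sketch (the target URI is hypothetical):

$crawler = new Crawler('http://www.example.com/');
$images  = $crawler->get('images'); // array of <img> attribute strings, or FALSE
$links   = $crawler->get('links');  // array of <a> attribute strings, or FALSE

Each extracted link can be queued for further crawling, while each image URI is handed to the signature generator to populate the database.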
B. SIGNATURE (FEATURE VECTOR) GENERATOR
This is the crux of the implementation where all of
the image processing is done. The fingerprint has to
be calculated only once for each image and then can
be stored in a database for fast searching and
comparing.
Before the fingerprint is calculated, the image is first resized to a standard size. Since color comparisons are difficult in most color spaces, especially the popularly used RGB color space, the image is transformed into the CIE Luv color space. CIE Luv has components L*, u* and v*: the L* component defines the luminance, while u* and v* define the chrominance. CIE Luv is widely used for measuring color differences, especially with additive colors. The CIE Luv color space is defined from CIE XYZ.
CIE XYZ -> CIE Luv:

\[
L^* = \begin{cases} 116\,(Y/Y_n)^{1/3} - 16 & \text{if } Y/Y_n > 0.008856 \\ 903.3\,(Y/Y_n) & \text{if } Y/Y_n \le 0.008856 \end{cases}
\]
\[
u^* = 13\,L^*\,(u' - u'_n), \qquad v^* = 13\,L^*\,(v' - v'_n)
\]

where \(u' = 4X/(X + 15Y + 3Z)\) and \(v' = 9Y/(X + 15Y + 3Z)\), and \(u'_n\) and \(v'_n\) have the same definitions as \(u'\) and \(v'\) but are applied to the white point reference.
The CIE XYZ is itself defined from the RGB using the following transformation:
\[
\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} =
\begin{bmatrix} 0.430574 & 0.341550 & 0.178325 \\ 0.222015 & 0.706655 & 0.071330 \\ 0.020183 & 0.129553 & 0.939180 \end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix}
\]
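As a minimal sketch of the conversion chain described above (assuming linear RGB inputs in [0, 1] and a D65 white point, neither of which is fixed by the paper):

// Illustrative sketch: RGB -> XYZ -> CIE Luv for one pixel.
// Assumes linear RGB in [0, 1]; the white point is assumed to be D65.
function rgbToLuv($r, $g, $b) {
    // RGB -> XYZ using the matrix given above
    $x = 0.430574*$r + 0.341550*$g + 0.178325*$b;
    $y = 0.222015*$r + 0.706655*$g + 0.071330*$b;
    $z = 0.020183*$r + 0.129553*$g + 0.939180*$b;

    $xn = 0.950456; $yn = 1.0; $zn = 1.088754; // assumed D65 white point

    // L* from the relative luminance Y/Yn
    $yr = $y / $yn;
    $L = ($yr > 0.008856) ? 116.0 * pow($yr, 1.0/3.0) - 16.0 : 903.3 * $yr;

    // u', v' chromaticity coordinates (guard against division by zero)
    $d  = $x + 15.0*$y + 3.0*$z;
    $dn = $xn + 15.0*$yn + 3.0*$zn;
    $u  = ($d != 0) ? 4.0*$x / $d : 0.0;
    $v  = ($d != 0) ? 9.0*$y / $d : 0.0;
    $un = 4.0*$xn / $dn;
    $vn = 9.0*$yn / $dn;

    return array($L, 13.0*$L*($u - $un), 13.0*$L*($v - $vn));
}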
The fingerprint, or signature vector, is composed of three parts: a grayscale DCT part, a color DCT part and an edge direction histogram (EDH) part.
The discrete cosine transform (DCT) helps separate the image into parts (or spectral sub-bands) of differing importance (with respect to the image's visual quality). The DCT transforms a signal from the spatial domain into the frequency domain. A signal in the frequency domain contains the same information as that in the spatial domain. The order of values obtained by applying the DCT is coincidentally from lowest to highest frequency. This feature and the psychological observation that the human eye is less sensitive to recognizing the higher-order frequencies leads to the possibility of compressing a spatial signal by transforming it to the frequency domain and dropping high-order values and keeping low-order ones.
The general equation for the 2D DCT of an N by M image is:

\[
F(u,v) = \sqrt{\frac{2}{N}}\,\sqrt{\frac{2}{M}}\,\Lambda(u)\,\Lambda(v)\sum_{i=0}^{N-1}\sum_{j=0}^{M-1}\cos\!\left[\frac{\pi u}{2N}(2i+1)\right]\cos\!\left[\frac{\pi v}{2M}(2j+1)\right] f(i,j)
\]

where

\[
\Lambda(\xi) = \begin{cases} \frac{1}{\sqrt{2}} & \text{for } \xi = 0 \\ 1 & \text{otherwise} \end{cases}
\]
The basic operation of the DCT is as follows:
• The input image is N by M.
• f(i,j) is the intensity of the pixel in row i and column j.
• F(u,v) is the DCT coefficient in row u and column v of the DCT matrix.
• For most images, much of the signal energy lies at low frequencies; these appear in the upper left corner of the DCT.
• In the common 8 by 8 case, the DCT input is an 8 by 8 array of integers containing each pixel's grayscale level; 8-bit pixels have levels from 0 to 255.
• The output array of DCT coefficients contains integers, which can range from -1024 to 1023.
[Figure: the 64 (8 × 8) DCT basis functions.]
A 2D DCT is calculated over the L-channel (luminosity) of the image. The first coefficient is the DC value, i.e. the average luminosity of the image; the following coefficients represent progressively higher frequencies. The first ten coefficients are taken and normalized to form the grayscale fingerprint part. This part of the fingerprint represents the basic composition of the image.
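A minimal sketch of this step, assuming the L-channel is held in a 2D array; the naive O(N²M²) transform and the zig-zag choice of the first ten coefficients are assumptions, as the paper specifies neither:

// Naive 2D DCT-II over a 2D array $img of N rows by M columns.
// A sketch only; a production system would use a fast, separable DCT.
function dct2d($img) {
    $n = count($img); $m = count($img[0]);
    $out = array();
    for ($u = 0; $u < $n; $u++) {
        for ($v = 0; $v < $m; $v++) {
            $sum = 0.0;
            for ($i = 0; $i < $n; $i++) {
                for ($j = 0; $j < $m; $j++) {
                    $sum += $img[$i][$j]
                          * cos(M_PI * $u * (2*$i + 1) / (2*$n))
                          * cos(M_PI * $v * (2*$j + 1) / (2*$m));
                }
            }
            $cu = ($u == 0) ? 1/sqrt(2) : 1.0;
            $cv = ($v == 0) ? 1/sqrt(2) : 1.0;
            $out[$u][$v] = sqrt(2.0/$n) * sqrt(2.0/$m) * $cu * $cv * $sum;
        }
    }
    return $out;
}

// Grayscale fingerprint part: the first 10 low-order coefficients,
// read here in zig-zag order (an assumed ordering). Normalization of
// these values is omitted in this sketch.
function grayscaleFingerprint($dct) {
    $zigzag = array(array(0,0), array(0,1), array(1,0), array(2,0), array(1,1),
                    array(0,2), array(0,3), array(1,2), array(2,1), array(3,0));
    $part = array();
    foreach ($zigzag as $uv) {
        $part[] = $dct[$uv[0]][$uv[1]];
    }
    return $part;
}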
Next, a 2D DCT of each of the two color components is calculated and used for the color part of the fingerprint. A two-dimensional DCT-II of an image or a matrix is simply the one-dimensional DCT-II performed along the rows and then along the columns (or vice versa). That is, the 2D DCT-II is given by the formula (omitting normalization and other scale factors):

\[
X_{k_1,k_2} = \sum_{n_1=0}^{N_1-1} \sum_{n_2=0}^{N_2-1} x_{n_1,n_2} \cos\!\left[\frac{\pi}{N_1}\Bigl(n_1+\tfrac{1}{2}\Bigr)k_1\right] \cos\!\left[\frac{\pi}{N_2}\Bigl(n_2+\tfrac{1}{2}\Bigr)k_2\right]
\]
Here only the first three coefficients of each color component are taken, since the human eye is much more sensitive to luminosity than to color. This part of the fingerprint represents the color composition of the image, with reduced spatial resolution compared to the grayscale part.
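A corresponding sketch for the color part, reusing the dct2d() helper above; treating the u* and v* planes separately and picking the three lowest zig-zag coefficients of each is an assumption:

// Sketch: color fingerprint part. Takes the first three low-order
// coefficients of each chroma plane; the zig-zag choice is an assumption.
function colorFingerprint($uPlane, $vPlane) {
    $part = array();
    foreach (array($uPlane, $vPlane) as $plane) {
        $dct = dct2d($plane);
        $part[] = $dct[0][0]; // DC: average chrominance of the plane
        $part[] = $dct[0][1];
        $part[] = $dct[1][0];
    }
    return $part;
}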
On the luminosity channel of the image an EDH is calculated using the Sobel kernels. The Sobel kernels rely on central differences, but give greater weight to the central pixels when averaging:

\[
G_x = \begin{bmatrix} -1 & 0 & +1 \\ -2 & 0 & +2 \\ -1 & 0 & +1 \end{bmatrix}, \qquad
G_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ +1 & +2 & +1 \end{bmatrix}
\]
This is equivalent to first blurring the image using a 3 × 3 approximation to the Gaussian and then calculating the first derivatives, because convolutions (and derivatives) are commutative and associative. These kernels are designed to respond maximally to edges running vertically and horizontally relative to the pixel grid, one kernel for each of the two perpendicular orientations. The kernels can be applied separately to the input image to produce separate measurements of the gradient component in each orientation (call these Gx and Gy). These can then be combined to find the absolute magnitude and the orientation of the gradient at each point:

\[
|G| = \sqrt{G_x^2 + G_y^2}, \qquad \theta = \arctan\!\left(\frac{G_y}{G_x}\right)
\]
The histogram consists of eight equal bins: N, NE, E, SE, S, SW, W and NW. This part of the fingerprint represents the "flow directions" of the image.
[Figure: result after applying the Sobel operator.]
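A minimal sketch of the EDH computation over a 2D luminosity array; the magnitude threshold for ignoring near-flat pixels and the exact bin labeling are assumptions:

// Sketch: 8-bin edge direction histogram over a 2D luminosity array $lum.
// Pixels whose gradient magnitude falls below $threshold are ignored;
// the threshold value is an assumption, not specified by the paper.
function edgeDirectionHistogram($lum, $threshold = 32.0) {
    $h = count($lum); $w = count($lum[0]);
    // eight 45-degree orientation bins; mapping angles to the compass
    // labels N, NE, ..., NW depends on the image coordinate convention
    $hist = array_fill(0, 8, 0);
    for ($y = 1; $y < $h - 1; $y++) {
        for ($x = 1; $x < $w - 1; $x++) {
            // Sobel kernels applied at (x, y)
            $gx = -$lum[$y-1][$x-1] + $lum[$y-1][$x+1]
                - 2*$lum[$y][$x-1]  + 2*$lum[$y][$x+1]
                -  $lum[$y+1][$x-1] + $lum[$y+1][$x+1];
            $gy = -$lum[$y-1][$x-1] - 2*$lum[$y-1][$x] - $lum[$y-1][$x+1]
                +  $lum[$y+1][$x-1] + 2*$lum[$y+1][$x] + $lum[$y+1][$x+1];
            $mag = sqrt($gx*$gx + $gy*$gy);
            if ($mag < $threshold) continue; // skip near-flat pixels
            // Orientation in [0, 360); map to one of the eight bins
            $theta = atan2($gy, $gx) * 180.0 / M_PI;
            if ($theta < 0) $theta += 360.0;
            $bin = (int) floor(fmod($theta + 22.5, 360.0) / 45.0);
            $hist[$bin]++;
        }
    }
    return $hist;
}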
C. SIMILARITY CHECKER
The similarity checker calculates how closely two images match by computing a distance value. Identical images yield the same fingerprint, similar images yield fingerprints that are close to each other (according to a distance function), and dissimilar images yield very different fingerprints. The distance function compares the fingerprints by weighting the coefficients and calculating the Euclidean distance.
The user can input various comparison considerations. For example, if the general brightness of the image should not be considered when comparing, the weight of the DC component can simply be set to zero. For less detail and a more tolerant search, the higher coefficients can be scaled down or set to zero. For grayscale searching, the weights of the color components are simply set to zero. For rotation- or mirror-invariant searching, the components are shuffled accordingly and compared again. This weight vector thus allows a great deal of tuning.
The Euclidean distance between points p and q is the length of the line segment \(\overline{pq}\) connecting them. In Cartesian coordinates, if p = (p1, p2, ..., pn) and q = (q1, q2, ..., qn) are two points in Euclidean n-space, then the distance from p to q is given by:

\[
d(p,q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}
\]
The smaller the distance value, the more similar the images.
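A minimal sketch of the weighted comparison, assuming fingerprints and the weight vector are plain arrays of equal length:

// Sketch: weighted Euclidean distance between two fingerprints $p and $q.
// $w is the user-tunable weight vector described above.
function fingerprintDistance($p, $q, $w) {
    $sum = 0.0;
    foreach ($p as $i => $pi) {
        $d = $w[$i] * ($pi - $q[$i]);
        $sum += $d * $d;
    }
    return sqrt($sum);
}

Setting an entry of $w to zero removes the corresponding coefficient from the comparison, which is exactly how the grayscale-only and brightness-insensitive modes described above are obtained.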
III. EVALUATION
[Figure: some results from the implementation.]
There exists no clearly defined benchmark for
assessing a proposed similarity measure on images.
Indeed, human notions of similarity may vary, and
different definitions are appropriate for different
tasks.
The approach specified here has specific strengths and weaknesses. Since this is a web-based implementation, it entails large-scale image matching, and speed is an important criterion. Scaling this implementation to a full-fledged search engine would require further improvements in the efficiency of the fingerprint calculation, which is central to this application.
A more efficient retrieval system could be constructed by implementing hierarchical image classification, using a set of tagged images and extrapolating the groups with this application.
IV. CONCLUSION
We have presented a search engine capable of finding similar images using an algorithm for measuring perceptual similarity. In this system, a web-based image crawler is first used in conjunction with the signature generator to create a database of image fingerprints. The fingerprints are constructed using processes that are fast and not CPU intensive. During a search, the fingerprint of the subject image is calculated and compared with those in the database, and similarity results are obtained by calculating Euclidean distances. The comparison is flexible: if desired, images can be compared independently of rotation or of vertical and horizontal mirroring, or in grayscale only, depending on which difference function is chosen.