Web-Based Image Similarity Search
Vinay Menon, Uday Babbar and Ujjwal Dasgupta
Delhi College of Engineering, Delhi
Email: {[email protected], [email protected], [email protected]}
Abstract—This paper proposes an implementation of a web-based image similarity search built on a robust image matching algorithm and a user-definable matching criterion. Matching is done using image signatures, which are generated by decomposing images with the discrete cosine transform. A database of image signatures is populated by running the signature generator in conjunction with a web crawler. Search results are found by calculating the Euclidean distance between the signature of the subject image and those in the database. The proposed architecture is time efficient, robust, and uses a rotation-, scale- and perspective-invariant matching method.
Keywords: image matching, similarity, reverse image search, image signature
I. INTRODUCTION
As the Internet becomes increasingly multimedia intensive, there is a growing need for tools that manipulate large collections of media and make searching for media more intuitive. Traditionally, search engines use manual captions, text on the surrounding web page, and the size and dimensions of images for image retrieval. This method is highly subjective and may lead to unrelated search results. The challenge is to bridge the gap between the physical characteristics of digital images (e.g. color, texture) that are used for comparison and the semantic meaning of the images that humans use to query the database. A focus of significant research is an algorithmic process for determining the perceptual similarity of images. The comparison algorithm has to judge differences between images in much the way a human would. This is easier said than done, because of some special properties of the human eye and brain.
What makes humans perceive two images as similar? This is a difficult question for many reasons. Two images can be perceived as similar because they contain similar objects (e.g. a ball, a girl, a Mona Lisa portrait) or because they have a similar look (the same colors, shapes, textures). Recognition of objects, and hence content-based image retrieval, is extremely difficult. This algorithm therefore analyzes image composition in terms of colors, shapes and textures. Another problem in implementing a search engine is that comparing images in a large-scale collection is a processing-power and bandwidth intensive task, so an image signature/fingerprint based solution has to be used. The algorithm should extract features from an image and use them to calculate a signature. Signature generation should be such that size-, rotation- and segmentation-insensitive matching is possible, and, in addition to allowing for these comparisons, it should be fast enough for a viable implementation.
II. IMPLEMENTATION
The implementation structure consists of three main blocks: the crawler, the signature generator and the similarity calculator.
A. WEB CRAWLER
The web crawler feeds the signature calculator with images from the internet, which in turn populates the database. The crawler is implemented in PHP; a code snippet is given below:
class Crawler {
    protected $markup = '';

    public function __construct($uri) {
        $this->markup = $this->getMarkup($uri);
    }

    // Fetch the raw HTML of the given URI
    public function getMarkup($uri) {
        return file_get_contents($uri);
    }

    // Dispatch to _get_images() / _get_links()
    public function get($type) {
        $method = "_get_{$type}";
        if (method_exists($this, $method)) {
            // call_user_method() is deprecated (removed in PHP 7);
            // invoke the method directly instead
            return $this->$method();
        }
    }

    protected function _get_images() {
        if (!empty($this->markup)) {
            // match <img ...> tags whether or not they are self-closing
            preg_match_all('/<img([^>]+)\/?>/i', $this->markup, $images);
            return !empty($images[1]) ? $images[1] : FALSE;
        }
    }

    protected function _get_links() {
        if (!empty($this->markup)) {
            preg_match_all('/<a([^>]+)>(.*?)<\/a>/i', $this->markup, $links);
            return !empty($links[1]) ? $links[1] : FALSE;
        }
    }
}
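A minimal usage sketch (the target URI is hypothetical):

$crawler = new Crawler('http://www.example.com/');
$images  = $crawler->get('images'); // array of <img> attribute strings, or FALSE
$links   = $crawler->get('links');  // array of <a> attribute strings, or FALSE

Each extracted link can be queued for further crawling, while each image URI is handed to the signature generator to populate the database.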
B. SIGNATURE (FEATURE VECTOR) GENERATOR
This is the crux of the implementation where all of
the image processing is done. The fingerprint has to
be calculated only once for each image and then can
be stored in a database for fast searching and
comparing.
Before the fingerprint is calculated, the image is first resized to a standard size. Since color comparisons are difficult in most color spaces, especially the popularly used RGB color space, the image is transformed into the CIE Luv color space. CIE Luv has components L*, u* and v*: the L* component defines the luminance, while u* and v* define the chrominance. CIE Luv is widely used for measuring color differences, especially with additive colors. The CIE Luv color space is defined from CIE XYZ.
CIE XYZ -> CIE Luv:

\[
L^* = \begin{cases} 116\,(Y/Y_n)^{1/3} - 16 & \text{if } Y/Y_n > 0.008856 \\ 903.3\,(Y/Y_n) & \text{if } Y/Y_n \le 0.008856 \end{cases}
\]
\[
u^* = 13\,L^*\,(u' - u'_n), \qquad v^* = 13\,L^*\,(v' - v'_n)
\]

where \(u' = 4X/(X + 15Y + 3Z)\) and \(v' = 9Y/(X + 15Y + 3Z)\), and \(u'_n\) and \(v'_n\) have the same definitions as \(u'\) and \(v'\) but are applied to the white point reference.
The CIE XYZ is itself defined from the RGB using the following transformation:
\[
\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} =
\begin{bmatrix} 0.430574 & 0.341550 & 0.178325 \\ 0.222015 & 0.706655 & 0.071330 \\ 0.020183 & 0.129553 & 0.939180 \end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix}
\]
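As a minimal sketch of the conversion chain described above (assuming linear RGB inputs in [0, 1] and a D65 white point, neither of which is fixed by the paper):

// Illustrative sketch: RGB -> XYZ -> CIE Luv for one pixel.
// Assumes linear RGB in [0, 1]; the white point is assumed to be D65.
function rgbToLuv($r, $g, $b) {
    // RGB -> XYZ using the matrix given above
    $x = 0.430574*$r + 0.341550*$g + 0.178325*$b;
    $y = 0.222015*$r + 0.706655*$g + 0.071330*$b;
    $z = 0.020183*$r + 0.129553*$g + 0.939180*$b;

    $xn = 0.950456; $yn = 1.0; $zn = 1.088754; // assumed D65 white point

    // L* from the relative luminance Y/Yn
    $yr = $y / $yn;
    $L = ($yr > 0.008856) ? 116.0 * pow($yr, 1.0/3.0) - 16.0 : 903.3 * $yr;

    // u', v' chromaticity coordinates (guard against division by zero)
    $d  = $x + 15.0*$y + 3.0*$z;
    $dn = $xn + 15.0*$yn + 3.0*$zn;
    $u  = ($d != 0) ? 4.0*$x / $d : 0.0;
    $v  = ($d != 0) ? 9.0*$y / $d : 0.0;
    $un = 4.0*$xn / $dn;
    $vn = 9.0*$yn / $dn;

    return array($L, 13.0*$L*($u - $un), 13.0*$L*($v - $vn));
}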
The fingerprint, or signature vector, is composed of three parts: a grayscale DCT part, a color DCT part and an edge direction histogram (EDH) part.
The discrete cosine transform (DCT) helps separate the image into parts (or spectral sub-bands) of differing importance (with respect to the image's visual quality). The DCT transforms a signal from the spatial domain into the frequency domain. A signal in the frequency domain contains the same information as that in the spatial domain. The order of values obtained by applying the DCT is coincidentally from lowest to highest frequency. This feature and the psychological observation that the human eye is less sensitive to recognizing the higher-order frequencies leads to the possibility of compressing a spatial signal by transforming it to the frequency domain and dropping high-order values and keeping low-order ones.
The general equation for the 2D DCT of an N by M image is:

\[
F(u,v) = \sqrt{\frac{2}{N}}\,\sqrt{\frac{2}{M}}\,\Lambda(u)\,\Lambda(v)\sum_{i=0}^{N-1}\sum_{j=0}^{M-1}\cos\!\left[\frac{\pi u}{2N}(2i+1)\right]\cos\!\left[\frac{\pi v}{2M}(2j+1)\right] f(i,j)
\]

where

\[
\Lambda(\xi) = \begin{cases} \frac{1}{\sqrt{2}} & \text{for } \xi = 0 \\ 1 & \text{otherwise} \end{cases}
\]
The basic operation of the DCT is as follows:
• The input image is N by M.
• f(i,j) is the intensity of the pixel in row i and column j.
• F(u,v) is the DCT coefficient in row u and column v of the DCT matrix.
• For most images, much of the signal energy lies at low frequencies; these appear in the upper left corner of the DCT.
• In the common 8 by 8 case, the DCT input is an 8 by 8 array of integers containing each pixel's grayscale level; 8-bit pixels have levels from 0 to 255.
• The output array of DCT coefficients contains integers, which can range from -1024 to 1023.
[Figure: the 64 (8 × 8) DCT basis functions.]
A 2D DCT is calculated over the L-channel (luminosity) of the image. The first coefficient is the DC value, i.e. the average luminosity of the image; the following coefficients represent progressively higher frequencies. The first ten coefficients are taken and normalized to form the grayscale fingerprint part. This part of the fingerprint represents the basic composition of the image.
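A minimal sketch of this step, assuming the L-channel is held in a 2D array; the naive O(N²M²) transform and the zig-zag choice of the first ten coefficients are assumptions, as the paper specifies neither:

// Naive 2D DCT-II over a 2D array $img of N rows by M columns.
// A sketch only; a production system would use a fast, separable DCT.
function dct2d($img) {
    $n = count($img); $m = count($img[0]);
    $out = array();
    for ($u = 0; $u < $n; $u++) {
        for ($v = 0; $v < $m; $v++) {
            $sum = 0.0;
            for ($i = 0; $i < $n; $i++) {
                for ($j = 0; $j < $m; $j++) {
                    $sum += $img[$i][$j]
                          * cos(M_PI * $u * (2*$i + 1) / (2*$n))
                          * cos(M_PI * $v * (2*$j + 1) / (2*$m));
                }
            }
            $cu = ($u == 0) ? 1/sqrt(2) : 1.0;
            $cv = ($v == 0) ? 1/sqrt(2) : 1.0;
            $out[$u][$v] = sqrt(2.0/$n) * sqrt(2.0/$m) * $cu * $cv * $sum;
        }
    }
    return $out;
}

// Grayscale fingerprint part: the first 10 low-order coefficients,
// read here in zig-zag order (an assumed ordering). Normalization of
// these values is omitted in this sketch.
function grayscaleFingerprint($dct) {
    $zigzag = array(array(0,0), array(0,1), array(1,0), array(2,0), array(1,1),
                    array(0,2), array(0,3), array(1,2), array(2,1), array(3,0));
    $part = array();
    foreach ($zigzag as $uv) {
        $part[] = $dct[$uv[0]][$uv[1]];
    }
    return $part;
}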
Next, a 2D DCT of each of the two color components is calculated and used for the color part of the fingerprint. A two-dimensional DCT-II of an image or a matrix is simply the one-dimensional DCT-II performed along the rows and then along the columns (or vice versa). That is, the 2D DCT-II is given by the formula (omitting normalization and other scale factors):

\[
X_{k_1,k_2} = \sum_{n_1=0}^{N_1-1} \sum_{n_2=0}^{N_2-1} x_{n_1,n_2} \cos\!\left[\frac{\pi}{N_1}\Bigl(n_1+\tfrac{1}{2}\Bigr)k_1\right] \cos\!\left[\frac{\pi}{N_2}\Bigl(n_2+\tfrac{1}{2}\Bigr)k_2\right]
\]
Here only the first three coefficients of each color component are taken, since the human eye is much more sensitive to luminosity than to color. This part of the fingerprint represents the color composition of the image, with reduced spatial resolution compared to the grayscale part.
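A corresponding sketch for the color part, reusing the dct2d() helper above; treating the u* and v* planes separately and picking the three lowest zig-zag coefficients of each is an assumption:

// Sketch: color fingerprint part. Takes the first three low-order
// coefficients of each chroma plane; the zig-zag choice is an assumption.
function colorFingerprint($uPlane, $vPlane) {
    $part = array();
    foreach (array($uPlane, $vPlane) as $plane) {
        $dct = dct2d($plane);
        $part[] = $dct[0][0]; // DC: average chrominance of the plane
        $part[] = $dct[0][1];
        $part[] = $dct[1][0];
    }
    return $part;
}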
On the luminosity channel of the image an EDH is calculated using the Sobel kernels. The Sobel kernels rely on central differences, but give greater weight to the central pixels when averaging:

\[
G_x = \begin{bmatrix} -1 & 0 & +1 \\ -2 & 0 & +2 \\ -1 & 0 & +1 \end{bmatrix}, \qquad
G_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ +1 & +2 & +1 \end{bmatrix}
\]
This is equivalent to first blurring the image using a 3 × 3 approximation to the Gaussian and then calculating the first derivatives, because convolutions (and derivatives) are commutative and associative. These kernels are designed to respond maximally to edges running vertically and horizontally relative to the pixel grid, one kernel for each of the two perpendicular orientations. The kernels can be applied separately to the input image to produce separate measurements of the gradient component in each orientation (call these Gx and Gy). These can then be combined to find the absolute magnitude and the orientation of the gradient at each point:

\[
|G| = \sqrt{G_x^2 + G_y^2}, \qquad \theta = \arctan\!\left(\frac{G_y}{G_x}\right)
\]
The histogram consists of eight equal bins: N, NE, E, SE, S, SW, W and NW. This part of the fingerprint represents the "flow directions" of the image.
[Figure: result after applying the Sobel operator.]
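A minimal sketch of the EDH computation over a 2D luminosity array; the magnitude threshold for ignoring near-flat pixels and the exact bin labeling are assumptions:

// Sketch: 8-bin edge direction histogram over a 2D luminosity array $lum.
// Pixels whose gradient magnitude falls below $threshold are ignored;
// the threshold value is an assumption, not specified by the paper.
function edgeDirectionHistogram($lum, $threshold = 32.0) {
    $h = count($lum); $w = count($lum[0]);
    // eight 45-degree orientation bins; mapping angles to the compass
    // labels N, NE, ..., NW depends on the image coordinate convention
    $hist = array_fill(0, 8, 0);
    for ($y = 1; $y < $h - 1; $y++) {
        for ($x = 1; $x < $w - 1; $x++) {
            // Sobel kernels applied at (x, y)
            $gx = -$lum[$y-1][$x-1] + $lum[$y-1][$x+1]
                - 2*$lum[$y][$x-1]  + 2*$lum[$y][$x+1]
                -  $lum[$y+1][$x-1] + $lum[$y+1][$x+1];
            $gy = -$lum[$y-1][$x-1] - 2*$lum[$y-1][$x] - $lum[$y-1][$x+1]
                +  $lum[$y+1][$x-1] + 2*$lum[$y+1][$x] + $lum[$y+1][$x+1];
            $mag = sqrt($gx*$gx + $gy*$gy);
            if ($mag < $threshold) continue; // skip near-flat pixels
            // Orientation in [0, 360); map to one of the eight bins
            $theta = atan2($gy, $gx) * 180.0 / M_PI;
            if ($theta < 0) $theta += 360.0;
            $bin = (int) floor(fmod($theta + 22.5, 360.0) / 45.0);
            $hist[$bin]++;
        }
    }
    return $hist;
}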
C. SIMILARITY CHECKER
The similarity checker calculates how closely two images match by computing a distance value. Identical images yield the same fingerprint, similar images yield fingerprints that are close to each other (according to a distance function), and dissimilar images yield very different fingerprints. The distance function compares the fingerprints by weighting the coefficients and calculating the Euclidean distance.
The user can input various comparison considerations. For example, if the general brightness of the image should not be considered when comparing, the weight of the DC component can simply be set to zero. For less detail and a more tolerant search, the higher coefficients can be scaled down or set to zero. For grayscale searching, the weights of the color components are simply set to zero. For rotation- or mirror-invariant searching, the components are shuffled accordingly and compared again. This weight vector thus allows a great deal of tuning.
The Euclidean distance between points p and q is the length of the line segment \(\overline{pq}\) connecting them. In Cartesian coordinates, if p = (p1, p2, ..., pn) and q = (q1, q2, ..., qn) are two points in Euclidean n-space, then the distance from p to q is given by:

\[
d(p,q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}
\]
The smaller the distance value, the more similar the images.
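A minimal sketch of the weighted comparison, assuming fingerprints and the weight vector are plain arrays of equal length:

// Sketch: weighted Euclidean distance between two fingerprints $p and $q.
// $w is the user-tunable weight vector described above.
function fingerprintDistance($p, $q, $w) {
    $sum = 0.0;
    foreach ($p as $i => $pi) {
        $d = $w[$i] * ($pi - $q[$i]);
        $sum += $d * $d;
    }
    return sqrt($sum);
}

Setting an entry of $w to zero removes the corresponding coefficient from the comparison, which is exactly how the grayscale-only and brightness-insensitive modes described above are obtained.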
III. EVALUATION
[Figure: some results from the implementation.]
There exists no clearly defined benchmark for
assessing a proposed similarity measure on images.
Indeed, human notions of similarity may vary, and
different definitions are appropriate for different
tasks.
The approach specified here has specific strengths and weaknesses. Since this is a web-based implementation, it entails large-scale image matching, and speed is an important criterion. Scaling this implementation to a full-fledged search engine would require further improvements in the efficiency of the fingerprint calculation, which is central to this application.
A more efficient retrieval system could be constructed by implementing hierarchical image classification, using a set of tagged images and extrapolating the groups with this application.
IV. CONCLUSION
We have presented a search engine capable of finding similar images using an algorithm for measuring perceptual similarity. In this system, a web-based image crawler is first used in conjunction with the signature generator to create a database of image fingerprints. The fingerprints are constructed using processes that are fast and not CPU intensive. During a search, the fingerprint of the subject image is calculated and compared with those in the database, and similarity results are obtained by calculating Euclidean distances. The comparison is flexible: if desired, images can be compared independently of rotation or of vertical and horizontal mirroring, or in grayscale only, depending on which difference function is chosen.