FAST QUERY IMAGE RETRIEVAL BY USING FEATURE … · are then used as retrieval keys during image search. But, ... MULTIVIEW ALIGNMENT HASHING FOR EFFICIENT IMAGE SEARCH BY LI ... Hashing

International Journal of Computer Engineering and Applications, Volume XII, Issue II, Feb. 18, www.ijcea.com ISSN 2321-3469

Dr. S. Siamala devi, Deepa, Indira Sneka and Jenita Nancy 114

FAST QUERY IMAGE RETRIEVAL BY USING FEATURE

EXTRACTION METHOD FOR BIGDATA

Dr. S. Siamala devi 1, Deepa 2, Indira Sneka 3, Jenita Nancy 4

Department of Computer Science and Engineering , Sri Krishna College of Technology

ABSTRACT:

Today, fast-moving and evolving trends in image processing leads to radical change in the digital

environment. Digital images have huge applications in different fields like weather forecasting, space research, medical and diagnostics, military, cybercrime etc. In an advance growth of

technology, an explosive number of images were generated. These wide varieties of images were

increasingly acquired, stored and captured continuously which causes inefficiency in image retrieval. Due to alarming growth of internet images and high volume of data, the technique of

Content Based Image Retrieval (CBIR) were emphasized. As a result, significance of image

retrieval algorithms has been considerably increased. In this paper, a fast image retrieval algorithm by feature extraction methods is proposed. Here low level feature such as color,

texture and shape were extracted from images and certain similarity matching algorithm was used to compare query image with database images. The system retrieves images from the

associated database. The database is re-scripted after each level according to Distance Matching

Algorithm.

Keywords: Big data, Image retrieval, feature extraction, image color analysis, distance metrics, feature ranking.

[1] INTRODUCTION

With the advancement in data storage and image acquisition, large image dataset were created

which otherwise called as Big data. Big data are the datasets that are voluminous and complex to

process data that the traditional data processing application software are inadequate to deal with

them. It includes capturing data, data storage, data analysis, data searching, sharing, transferring,

querying, and updating data and information privacy.

There are five dimensions in big data, known as volume, variety, velocity, and recently added

FAST QUERY IMAGE RETRIEVAL BY USING FEATURE EXTRACTION METHOD FOR BIGDATA


veracity and value. Big data provides an infrastructure for transparency in manufacturing industry,

medical fields, space research, and military etc.

The data which were collected from different source such as emails, applications database

servers ,mobiles and other electronic devices such as iPod ,tablet ,laptop etc. were captured,

assembled, formatted, manipulated, stored and then analysed. These data can help many

organizations and institutions to improve its industrial operation and make faster, more intelligent

decisions.

It organize and manage semi structured and even unstructured data efficiently. In recent years,

database indexing, database re-ranking, text retrieval techniques has become a general pattern

.However the present text base image retrieval cannot meet the requirement of expectation

compared with content based retrieval. So, text based image retrieval is in incomplete stage.

[1.2] IMAGE RETRIEVAL PROBLEM

Nowadays, all fields of human life including commerce, government academics, crime

investigation, surveillance, engineering, architecture journalism, fashion and historical research

use image as information. In last decade, there was a rapid development in social media,

computers, and image capturing devices which collect large number of digital images and store

them to computer readable format. These images are classified, categorized and stored on

computers as a huge collection referred as image database. It includes the raw data, images and

information captured by computer assisted image analysis. To retrieve query image, the user have

to search for it among the image database using some search engine. The search engine will

compare query images to the related one. Here the user encounters the main problem of locating

user relevant image in the large and varied collection of resulted images. This problem is known as

image retrieval problem.

To solve this difficulty, two types of image retrieval method were adopted for searching i)

Text Based Image Retrieval (TBIR) ii) Content Based Image Retrieval (CBIR).

[1.3] TEXT BASED IMAGE RETRIEVAL

In TBIR, images are indexed using keywords, subject headings or classification codes which

are then used as retrieval keys during image search. But, for large database text based retrieval

became more difficult and the process becomes more laborious and time consuming task. Second

problem is that keyword must be unique, standardized and subjective .To overcome these problem,

contents of image such as color, texture, shapes were automatically extracted from images.

[1.4] CONTENT BASED IMAGE RETRIEVAL

To overcome the shortcomings of TBIR methods, researches put forward the CBIR. CBIR

directly gets visual vectors of the images to find out the similar characteristics such as color,

shape, texture etc. Because of the difference in the feature scale, CBIR method is divided into

global features and local features. According to the distance measurement between the feature

vectors, images in database are matched with query image.

CBIR is divided into two steps i) indexing ii) searching. In indexing step, image contents such as

color, shape, texture are extracted and stored as feature vector in the feature database.

In the second step of CBIR, for every query image, feature vector is constructed and compared

with all feature database images for similarity. The storage problem will occur in many devices

and computer hardware. To overcome the space complexity and manipulation time, all images are

represented in compressed format (JPEG) Joint Photographic expert group and Moving



Photographic Expert Group (MPEG). From all these compressed images, the low level features

(color, shape, texture) are extracted. First the images are decoded from the compressed domain to

pixel domain. For all the images in pixel domain, image processing and analysis methods are

applied .This process is inefficient because it require more time and space, processing time.

[2] LITERATURE SURVEY

[2.1] BEGINNERS TO CONTENT BASED IMAGE RETRIEVAL BY SPATTANAIK,

D.G.BHALKE AT (MAY 2012):

S-Pattanaik, D.G.Bhalke gives an overview idea of retrieving images from a large database.

CBIR is used for automatic indexing and retrieval of images depending upon contents of images

known as features. The features may be low level or High level. The low level features include

color, texture and shape. The high level feature describes the concept of human brain. The

difference between low level features extracted from images and the high level information need

of the user known as semantic gap. A Single feature can represent only part of the image property.

So multiple features are used to enhance the image retrieval process. It has used color histogram,

color mean, color structure descriptor and texture for feature extraction. The feature matching

procedure is based on their Euclidean distance.

[2.2] IMAGE RETRIEVAL WITH INTRACTIVE QUERY DESCRIPTION AND

DATABASE REVISION BY S.-S., SEBASTIAN-S AT (2011):

Sebastian-S has a further exploration and study of visual feature extraction. According to the

HSV (Hue, Saturation, Value) color space, the work of color feature extraction is finished, the

process is as follows: quantifying the color space in non-equal intervals, constructing one

dimension feature vector and representing the color feature by cumulative histogram. Similarly,

the work of texture feature extraction is obtained by using grey-level co-occurrence matrix

(GLCM) or color co-occurrence matrix (CCM).

Through the quantification of HSV color space, color features were combined. Depending on the

former, image retrieval based on multi-feature fusion is achieved by using normalized Euclidean

distance classifier. Through the image retrieval experiment, indicate that the use of color features

and texture based on CCM has obvious advantage.

[2.3] ANALYSIS OF DISTANCE METRICS IN CONTENT BASED IMAGE RETRIEVAL

USING STATISTICAL QUANTISED HISTOGRAM TEXTURE FEATURE IN DCT

DOMAIN BY FAZIL MALIK ,BAHARUM BAHARUDIN

Features in images are extracted in form of pixel and compressed domains. From the DCT

blocks of the image using the significant energy of the DC component and the first three AC

coefficients of the blocks, the quantized histogram statistical texture features were extracted. For

the effective matching of the query image with database images, various distance metrics are used

to measure similarities between texture features. On the basis of various distance metrics such as

Euclidean distance, Manhattan distance. The analysis of the effective CBIR is performed in

different number of quantization bins. This method is tested by using Corel image database using

various distance metrics with different histogram quantization in a compressed domain.

[2.4] IMAGE COMPRESSION USING BLOCK TRUNCATION CODING BY DOAA

MOHAMMED, FATMA ABOU-CHADI AT (2011):



The present work investigates image compression using Block Truncation Coding (BTC).

Two algorithms were selected namely, the original BTC and Absolute Moment block truncation

coding (AMBTC) and a comparative study was performed. Both of two techniques rely on

applying divided image into non overlapping blocks. They differ in the way of selecting the

quantization level in order to remove redundancy. Objectives measures were used to evaluate the

image quality such as: Peak Signal to Noise Ratio (PSNR), Weighted Peak Signal to Noise Ratio

(WPSNR), Bit Rate (BR) and Structural Similarity Index (SSIM).The results have shown that the

ATBTC algorithm outperforms the BTC. It has been show that the image compression using

AMBTC provides better image quality than image compression using BTC at the same bit rate.

Moreover, the AMBTC is quite faster compared to BTC.

[2.5] MULTIVIEW ALIGNMENT HASHING FOR EFFICIENT IMAGE SEARCH BY LI

LIU, MENGYANG YU, LING SHAO

Hashing is used for searching nearest neighbour in large – scale database, by including high

dimensional feature like background and noise into similarity– preserving hamming space in a low

dimension For most hashing methods ,space the performance of retrieval heavily depends on the

choice of the high dimension feature descriptor. Furthermore, a single type of feature cannot be

descriptive enough for different images when it is used for hashing Thus, how to combine multiple

representations for learning effective hashing function in an imminent task.

[2.6] FAST COLOR-SPATIAL FEATURE BASED IMAGE RETRIEVAL METHODS BY

C.H.LIN, D.C HUANG AND Y.K.CHAN

All image pixels are classified into several clusters depending on its colors. Three types of

color spatial distribution (CSD) features of the image are obtained by measuring the pixel spatial

distance in a same cluster. Based on these features, new image retrieval methods are also

introduced. A filter is also used to delete the most undesired and unwanted images, to change the

image retrieval methods. A new genetic algorithm is also used to decide the most parameters

which are used in the retrieval methods.

[2.7] IMAGE RETRIEVAL BY EXAMPLES BY ROBERTO BRUNELLI AND ORNELLA

MICH

Two key issues are solved by an efficient content-based query by a fast response time.

Example retrieval by presenting the architecture of a distributed image retrieval system. It also

quantifies the effectiveness of low level visual descriptors in database. It also improves the system

response time, while querying very large databases. A new mechanism is introduced, to adapt

system query strategies, to improve the relevance feedback effectiveness. Finally, a solution for

the issue of browsing multiple distributed databases is proposed using multidimensional scaling

techniques.

[2.8] EDGE HISTOGRAM DESCRIPTOR, GEOMETRIC MOMENT AND SOBEL EDGE

DETECTOR COMBINED FEATURES BASED OBJECT RECOGNITION AND

RETRIEVAL SYSTEM BY] NEETESH PRAJAPATI, AMIT KUMAR, G.S. PRAJAPAT

Three feature descriptor such as geometric moment, edge histogram descriptor and Sobel edge

detector techniques were combined to recognize the objects in the images are invariant with the

changes in scaling, orientation, and rotation with respect to angle. In image perception, edges are

used as a feature descriptor. Edge Histogram Descriptor (EHD) as a feature vector which



represents the spatial distribution of five edge types. Another shape feature vector, geometric

moment invariant is used to extort global features.

Due to Sobel operator’s smoothing effect, noise present in Images are less sensitive.

Sobel edge detection is chosen as a third feature vector to extract the shape features.

[2.9] COLOUR IMAGE CLUSTERING USING K-MEANS BY SNEHA SILVIA,

VAMSIDHAR, SUDHAKAR

Images are grouped into meaningful categories to reveal useful information which is otherwise

known as clustering. To extract features for image dataset, Block Truncation Coding is used and

K-Means clustering algorithm is introduced to group the image dataset into various clusters. The

dimension of interest is defined based on the application. In Images dataset, color, texture and

shape are taken as dimensions. In clustering, two methods were involved. First part is extraction

and second part is grouping. In image database, a feature vector are captured and stored in

database.

[2.10] TEXTURE ANALYSIS BASED ON THE GRAY-LEVEL CO-OCCURRENCE

MATRIX CONSIDERING POSSIBLE ORIENTATIONS BY BISWAJIT PATHAK,

DEBAJYOTI BAROOAH

By using a grey-level co-occurrence matrix (GLCM), texture features were commonly

extracted. It contains second order statistical information of image neighbouring pixels. In the

present work, a sample image of 8 bit grey scale image pattern acts as a non-destructive to

describe the surface texture. When the information of the image is not present in higher frequency

domain, the use of a contemporary method, known as absolute value of differences (AVD) were

used. It can be used as an alternative to the classical inertia of moment (IM) computing and

directions do matter while GLCM processing on image pattern.

[2.11] IMAGE TEXTURE FEATURE EXTRACTION USING GLCM APPROACH BY P.

MOHANAIAH, P. SATHYANARAYANA, L. GURUKUMAR

Low level image features such as extraction of color, texture and shape or domain specific

features were used as key feature. An application of grey level co-occurrence matrix (GLCM) is

used to extract second order statistical texture features for image motion estimation. The Four

features namely, Inverse Difference Moment, Angular Second Moment, Correlation and Entropy

are computed. These texture features have high discrimination accuracy, requires less response

time and high efficiently used for Pattern recognition applications.

[3] ALGORITHM

[3.1] EXISTING DATA REVISION ALGORITHM

In database revision method [2], images is the feature database are arranged in the order of

similarity with the given query image. In the number of selected images, a cut off is established.

Thus, the feature database is now reduced to n-numbers of images with high similarity. The top k

number of images is then displayed to the viewer, where k represents any reasonable number

manageable by the user. Database images are re-ranked based on the user satisfaction level, which

ranges from 0 to 1.



Images with lower user-satisfaction values are deleted from the feature database while higher

satisfied image will remain in the database. The database revision process continues as many times

as user gets satisfied with displayed images. The process must also delete the semantic gap

occurred in retrieval process. It also modify the database with respect to user opinion .Database

Revision prevent the contiguous appearance of irrelevant images by eliminating then from

database.

[3.2] PROPOSED DISTANCE MATCHING ALGORITHM

Due to the rapid development and improvement of the .The storage problem will occur in

many devices and computer hardware. To overcome the space complexity and manipulation time,

all images are represented in compressed format JPEG and MPEG.

From all these compressed images, the low level features (color, shape, texture) are extracted as

shown in the [Figure-4.1]. First the images are decoded from the compressed domain to pixel

domain. For all the images in pixel domain, image processing and analysis methods were applied.

This process is inefficient become it require more time and space, processing time.

Feature Ranking Image database

Distance matching

Figure: 4.1. Typical Architecture of CBIR System by using Distance Matching Algorithm

[3.3] FEATURE EXTRACTION

The first issue in CBIR is to extract image feature efficiently and represent these features in a

particular form that can be used efficiently in image matching. The texture statistical feature is

considered as important one that is very useful for the classification and similar image retrieval.

Some feature provides the information about intensity level distribution properties in the image

like uniformity, flatness, contrast and brightness. These statistical features are extracted in the

proposed distance matching algorithm. Some of the features are [3]standard deviation, skewness,

energy, entropy, and brightness are calculated by using the probability distribution.

[4] MODULE DESCRIPTION



The proposed method consists of [4]three modules. They are color search, texture search as

shown in the [Figure-4].

[4.1] MODULE 1: COLOR

In the entire image, Color is an important aspect which is the most easily noticeable

characteristic. It is because, color feature possess higher human attention than other feature vector.

It is the most commonly used in all the CBIR systems. Generally, there are two groups in this

feature: global color descriptors and local color descriptors. [5]Local colour descriptors represent

image color with respect to its spatial location of image. As the pixel-level color information is not

represented by local descriptor, they are more advantageous than global color descriptors. In this

paper, local color feature and some of global color features such as Mean, Standard deviation are

utilized. Local descriptors are Binary bitmap using truncation coding and color histogram. For

color histogram [7,9] RGB (Red, Green, and Blue) color space is used and for global color

descriptor, HSV (Hue, Saturation, Value) color space is used. The work of color histogram as

follows ,First, to concentrate on the localized color feature of image, image is cropped to find the

histogram of only central part of the given image while eliminating the surroundings. Then,

Histogram of the given cropped portion is extracted. It represents distribution of color intensity in

image.

Level 1 Feature Colour

Search

Da data Image Level 2 Feature

Texture Search

Level 3 Feature

Shape Search

Query Image

Image

retrieval

Figure: 4. Block diagram of the content based image retrieval by using the distance matching algorithm

[4.2] MODULE 2: TEXTURE

Texture feature of an image is derived from a combination of pixels that reoccur several times

in the image. The significance of extracting the texture is that it differentiates between objects with

same backgrounds. Grey Level Co-occurrence Matrix (GLCM)[11] is used in this system to



represent the texture feature. GLCM is a matrix. Each value of GLCM shows the number of

reoccurrences of two pixels and separated by a distance and at an angle in the image.

[4.3] MODULE 3: SHAPE

Another distinguishing feature of images is their shapes. Shape is an important descriptor. An

important shape feature is Edge Histogram Descriptor (EHD)[8,10]. It represents the relative

frequency of occurrence of the four types of edges, vertical horizontal, diagonal and anti-diagonal,

in the corresponding 4x4 sub-image blocks of the image. The normalized representation of edges

produces EHD. The shape search is carried out only for the 100 images that qualified the level 1

and level 2 searches. This considerably increased the speed of retrieval without any compromise

on the results. The database is again revised to 50 top images based on similarity qualifying the

shape search. Theses 50 images form the final database after the feature extraction.

[4.4] DESCRIPTION

The stepwise view of the proposed system is shown in the [Figure-6].The query image is

given to the browser which will give the processed image using Distance Matching Algorithm and

Dimension Reduction. The RGB components will be processed and form a cluster of images. After

clustering, the texture calculation for image and clustering will be followed. Finally, the images

are sorted out and provide the targeted image.

The query image is the source which is given as the input. It can be of any kind like living

thing and even non –living thing. The query image can be given as an image or as a path. The

processing will be done only by observing the database.

The Distance Matching algorithm can be used for the fast matching of various features such as

luminance histograms, edge histograms and local binary partition textures. By obtaining a set of

principal variables, the number of random variables is reduced.

This process is known as Dimension Reduction .It can be divided into feature selection and

feature extraction. The main purpose of the RGB color model which is a device – dependent color

model, are used for the sensing, representation and display of images. Different devices in it will

produce RGB value differently, since the color elements and their response to the each individual

RGB levels vary from one manufacture to other manufacture.

Clustering or data grouping is a key initial procedure in image processing. In present

scenario the size of database of companies has increased dramatically, these database contain large

amount of text, image. These huge databases were taken and accurate decisions in short durations

were made, in order to gain marketing advantage. A texture is a set of metrics calculated in image

processing designed to quantify the perceived texture of an image. Information about the spatial

arrangement

of image color or image intensities is given by image texture. The targeted image gives us the

similar output based on the input given. The process completes on both the algorithms and the

final result is generated.



Texture Calculation For Image and Clustering

RGB Components Clustering Based

Processing RGB Components

Sort Out The Processing Result Image Using Database Select Target Image

Figure: 6. The Stepwise block diagram of the proposed system

[7] DETERMINATION OF RATINGS

Probablity Distribution

Let P (b) be the probability distribution of b bins in each histogram with L bins then it is can

be evaluated as:

Where M is the total number of blocks in the image I.

Mean

It is defined as average intensity value of all bins of histograms. It describes the brightness of

the image and can be calculated as:

Standard deviation

It measures intensity distribution values about the mean in all blocks of histograms. It shows

high and low contrast of histogram in images with high or low values



Skewness

It measures the unequal distribution of intensity of all histogram blocks about the mean

values

Kurtosis

It calculates the peak of the distribution of intensity values about the mean value.

Energy

It is measured as a texture feature to calculate the uniformity of the intensity level

distribution in all histogram bins

Entropy

It measures the randomness of the intensity level distribution in bins

Smoothness

It is used to measure the image surface property by using standard deviation value of all

histogram bins

After texture feature calculation, these values are combined to get a feature vector such that:

Similarity Measurement

Euclidean distance

It measure the distance between two image vectors by calculating the square root of

the sum of the squared absolute difference and it can be calculated as:



City block distance

It is otherwise known as Manhattan distance. It is computed by the sum of absolute difference

between two image feature vectors and can be calculated as

[8] EVALUATION MEASURE

The performance of image retrieval is based on the performance of the feature extraction and

similarity distance measurement. It describes the performance metrics, effectiveness of image

retrieval and also stability of expected results.

Two measurements are used to evaluate image retrieval performance. They are precision, recall.

Precision

It is defined as ratio of retrieved relevant images to total query retrieved images.

Precision=A/B

Where, A is ‘‘the retrieved relevant images’’

B is ‘‘the total query retrieved images”

Recall

It is defined as ratio of the retrieved relevant images to total database images.

Recall = A/C

Where, A is ‘‘the retrieved relevant images’’

C is ‘‘the total number of relevant images in the database’’

[9] EXPERIMENTS AND RESULTS

In this paper, efficiency of existing system have been increased. The WANG dataset is used

for system evaluation. It is an image database where the images are manually selected from the

Corel database which is the collection of 1000 images. In WANG dataset, the images are

categorized into 10 classes such as

roses,elephants,horses,building,beaches,Africans,buses,fruits,dinosaurs as shown in [Figure-

9.2,9.3,9.4,9.5] Each class contains 100 unique images. It is widely used for testing CBIR systems

for image retrieval. Image Classification in the database into 10 classes makes the evaluation of

the image retrieval system easy.

The proposed system are implemented using Distance Matching Algorithm These selected

images went through the implemented system to extract the low level features and stored them in

feature database. The extracted features are clustered and indexed. The evaluating the CBIR

proposed system is shown in the [Figure-9.1]



Figure: 9.1 Comparison of precision of the existing and proposed system

Figure: 9.2 top 10 retrieved flower images Figure: 9.3 top 10 retrieved building images

Figure: 9.4 top 10 retrieved horse images Figure: 9.5 top 10 retrieved beach images

[10] CONCLUSION

Nowadays, content-based image retrieval system is a hot research topic. Many researches have

been done to develop some algorithms that solve problems to achieve accuracy in retrieving

images. This paper presents an improved system by introducing a new algorithm based on feature

categorized into level.

Here, each image from all the image classes is compared. Both conventional and proposed

methods are executed for retrieval. The significance of feature levels based search is verified. It is

much faster than conventional method and as precise as the existing methods. This will give

expected results without time wastage.



REFERENCES

[1] Swapnalini Pattanaik, D.G.Bhalke, “Beginners to Content Based Image Retrieval”. International

Journal of Scientific Research Engineering &Technology, vol. 1, no. 2, pp. 040–044, 2012.

[2] S.Sreedevi, S.Sebastian, “Image retrieval with Intractive Query Description and Database Revision”.

International Journal of Engineering, vol. 3, no. 3, 2014.

[3] Fazil malik, Baharum Baharudin, “Analysis of distance metrics in content based image retrieval using

statistical quantised histogram texture feature in DCT domain”. Journel of King Saud university

Computer and information science,vol -25,issue-2,207-218.

[4] R. Brunelli and O. Mich, “Image retrieval by examples,”.IEEE Transactions on Multimedia, vol. 2,

no. 3, pp. 164–171, 2000.

[5] Li Liu, Mengyang Yu, Ling Shao, “Multiview Alignment Hashing for Efficient Image Search”. IEEE

Transactions on image Processing, vol. 24, no. 3, pp. 956–966, Jan 2015.

[6] Doaa Mohammed, Fatma Abou-chadi, “Image Compression Using Block Truncation Coding”.

Multidisciplinary Journals in Science and Technology, Journal of Selected Areas in

Telecommunications (JSAT), February Edition, 2011.

[7] C. H. Lin, D. C. Huang and Y. K. Chan, “Fast color-spatial feature based image retrieval methods”.

Expert Systems with Applications, vol. 38, no. 9, pp. 11 412–11 420, 2011.

[8] Neetesh Prajapati, Amit Kumar, G.S. Prajapati, “Edge Histogram Descriptor, Geometric Moment and

Sobel Edge Detector Combined Features Based Object Recognition and Retrieval System”.

International Journal of Computer Science and Information Technologies, Vol. 7 (1), 2016, 407-412.

[9] Sneha Silvia, Vamsidhar, Sudhakar, “Colour Image Clustering using K-Means” International Journal

of Computer Science Technology Vol. 2, December 2011.

[10] Biswajit Pathak, Debajyoti Barooah, “Texture Analysis Based On the Grey-Level Co-Occurrence

Matrix Considering Possible Orientations”. International Journal of Advanced Research in Electrical,

Electronics and Instrumentation Engineering, Vol. 2, Issue 9, September 2013.

[11] P. Mohanaiah, P. Sathyanarayana, L. GuruKumar, “Image Texture Feature Extraction Using GLCM

Approach”. International Journal of Scientific and Research Publications, Volume 3, Issue 5, May

2013

Documents

FAST QUERY IMAGE RETRIEVAL BY USING FEATURE … · are then used as retrieval keys during image search. But, ... MULTIVIEW ALIGNMENT HASHING FOR EFFICIENT IMAGE SEARCH BY LI ... Hashing