
Analysis of Color Feature Extraction Techniques for Pathology Image Retrieval System

M. Sheerin Banu¹, Krishnan Nallaperumal²

¹ Professor and Head, Department of Computer Science and Engineering, Sethu Institute of Technology, Virudhunagar District, India

² Professor and Head, Centre for Information Technology and Engineering, Manonmaniam Sundaranar University, Tirunelveli, India

978-1-4244-5967-4/10/$26.00 ©2010 IEEE

Abstract - Medical imaging has become an important tool not only for documenting patient presentation and clinical findings but also for understanding and managing various diseases. Image data provide tangible visual evidence of disease manifestation. With advances in medical imaging technology, the number of digital images that must be acquired, analyzed, classified, stored, and retrieved in medical centers is growing exponentially. The goal of this work is to develop a medical image retrieval system for pathology images that incorporates recent improvements in feature representation, efficient indexing, and similarity matching. Feature representation is analyzed using several color feature extraction techniques in the HSV, CIE L*u*v*, and CIE L*a*b* color spaces. Image indexing is based on the resulting color descriptors. The effectiveness of similarity matching is analyzed using Euclidean distance, histogram intersection, and Hamming distance, and the experimental results of the color feature extraction techniques discussed in this paper are compared. The image database used for retrieval consists of pathology images from dermatology, dental, hematology, gastroscopic, and cervical cancer cases. This paper suggests an optimal color feature extraction technique for the pathology image categories chosen for the study.

Keywords - Pathology images, color features, image indexing, similarity matching, image retrieval

I. INTRODUCTION

In recent years, an enormous quantity of digital medical images has been generated in hospitals worldwide, and the size of such digital image databases is expected to keep growing exponentially. Digitally produced medical images are used for therapy, diagnosis, research, and education [1]. This has generated additional interest in methods and tools for the management, analysis, and communication of medical images. Many diagnostic imaging modalities are routinely used to support clinical decision making, and it is important to extend such applications by supporting the retrieval of medical images by content. Traditionally, medical images are indexed and retrieved using text only. Text-based image retrieval describes image content in textual language and therefore has significant limitations, since image data cannot be fully described textually. Content-Based Medical Image Retrieval (CBMIR), in contrast, directly uses visual characteristics such as color, texture, and shape to represent image content: it describes each image by its features and then matches the query image to the most similar images in the database according to the resemblance of their features. An efficient, automatic system is therefore required for indexing and retrieving images from medical image databases.

The problem of retrieving information based on image content has been researched by various groups since the early 1990s, resulting in tools such as QBIC, Virage, and Blobworld [3]. Content-based image retrieval (CBIR) in medicine has been an active area of research [4], but only a small number of proposed systems, such as ASSERT [5] and IRMA [6], have been demonstrated in the clinical environment. In addition, although many large image databases exist, such as the National Cancer Imaging Archive (NCIA) and the Lung Imaging Database Consortium (LIDC), created under the aegis of the Cancer Imaging Program at the U.S. National Cancer Institute (NCI), these efforts have concentrated on data collection and transmission and have left the development of applications to the research community. The lack of CBIR adoption is attributed partly to the difficulty of integrating current implementations with existing healthcare systems [4]. In a typical CBMIR system, low-level visual features (e.g., color, texture, shape, and edges) are extracted in vector form and stored to represent the query and target images in the database [2]. When a user submits a query, retrieval is performed by computing similarity in the feature space, and the images most similar to the query are returned to the user, ranked by their similarity values. In general, the similarity comparison is performed either globally, using visual content descriptors computed from the entire image, or locally, using descriptors derived from decomposed regions of the image. The prototype system developed in this work retrieves images based on their color content.

The images in the database are pathology images, which are rich in color information. Pathology is a medical discipline that inherently relies heavily on images and image analysis. A good deal of research on managing pathology image databases has been conducted in the last decade [16]. Nearly all of these systems rely on text-based retrieval algorithms as their fundamental operating mechanism, which has proved effective for managing small image collections with a predetermined domain context. Digital pathology has been attracting many researchers because of its high storage and computational requirements. Pathology image databases deal with two major image types: gross and microscopic. In this work, the image database consists of gross pathology images. Gross pathology image content is especially important because it is the foundation upon which pathologists make their diagnoses. Content-based image retrieval systems for pathology images have wide applicability in computer-aided diagnosis, since they allow the pathologist to retrieve cases similar to a new case together with their diagnosis information. CBIR of gross pathology images is therefore becoming an increasingly important and necessary methodology.

The objective of this work was to develop a Pathology Image Retrieval System (PIRS) that facilitates the automated retrieval of similar pathology images based on three color features: (i) color histogram, (ii) color moments, and (iii) color correlogram. Our pathology image database consists of dermatology, gastroscopic, dental, hematology, and cervicographic images; a total of 222 pathology images were analyzed. In clinical decision making, PIRS can supply physicians or other experts with images whose visual appearance is similar to that of a query image, and similar images with proven pathology retrieved by the system can assist physicians in their decisions [4]. PIRS can also be an effective tool for web-based biomedical education, since students can browse large image repositories by visual content and find important or interesting cases. In research and clinical trials, images can be found for publications, and the progression of diseases can be monitored effectively.
In this paper, we propose a pathology image retrieval system that retrieves similar digital pathology images from a medical image database. The rest of this paper is organized as follows. Section II describes feature extraction and color analysis. Section III presents the color feature extraction techniques adopted in this work. Section IV explains the retrieval system and the similarity measures used for the analysis, and Section V presents the experimental results. Finally, Section VI presents the conclusion and future work.

II. FEATURE EXTRACTION

An image feature can be defined as the value generated by a predefined image-processing algorithm applied to the image of interest. Global features are extracted from the content of the entire image, whereas local features are extracted from local regions of an image that correspond to objects within it. Image content can thus be defined as the set of all possible features, or combinations of basic features, over a target image. Indexing an image database is often referred to as feature extraction. Mathematically, a feature is an n-dimensional vector whose components are computed by some image analysis. The most commonly used visual cues are color, texture, shape, spatial information, and, in video, motion. Feature selection is a critical issue in retrieval system design. Although various criteria and techniques are available in the literature, it is still hard to tell which features are necessary and sufficient for a high-performance retrieval system, because system performance depends not only on the features but also on the type of retrieval system.

A. Color Analysis

Color is perhaps the most expressive of all the visual features and has been extensively studied in image retrieval research during the last decade. Color features are among the most important and widely used low-level features in image database retrieval. They are usually robust to noise, resolution, orientation, and resizing. Because they carry little semantic meaning and have a compact representation, color features tend to be more domain-independent than other features, and their three-dimensional values give them greater discriminating power than the one-dimensional gray values of images. The first step in extracting color features is to select an appropriate color space. Several color spaces are available, such as RGB, CMYK, HSV, CIE L*u*v*, and CIE L*a*b*. However, the RGB color space is not perceptually uniform: two colors that are far apart in RGB space can be perceptually more similar than two colors that are close together. Simply put, color distance in RGB space does not represent perceptual color distance [7].

In view of this drawback, the HSV color space, which is closer to human color perception, and the approximately perceptually uniform CIE L*u*v* and CIE L*a*b* color spaces are chosen for a comparative analysis. A three-dimensional representation of the HSV color space is a hexacone whose central vertical axis represents intensity [9]. Hue is defined as an angle in the range [0, 2π] relative to the red axis, with red at angle 0, green at 2π/3, blue at 4π/3, and red again at 2π. Saturation is the depth or purity of the color and is measured as a radial distance from the central axis, ranging from 0 at the center to 1 at the outer surface. Gonzalez and Woods [8] use the following equations for RGB to HSV conversion:

$$\theta = \cos^{-1}\left\{\frac{\frac{1}{2}\left[(R-G)+(R-B)\right]}{\left[(R-G)^2+(R-B)(G-B)\right]^{1/2}}\right\}, \qquad H = \begin{cases}\theta, & B \le G\\ 2\pi-\theta, & B > G\end{cases} \qquad (1)$$

$$S = 1 - \frac{3}{R+G+B}\,\min(R, G, B) \qquad (2)$$

$$V = \frac{1}{3}\,(R+G+B) \qquad (3)$$
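Equations (1) to (3) translate directly into code. The following minimal Python sketch converts a single pixel. It assumes 8-bit inputs that are normalized to [0, 1], and it applies the rule H = 2π − θ when B > G, as in Gonzalez and Woods [8], so that hue covers [0, 2π). The function name `rgb_to_hsv` is our own, and returning 0 for the undefined hue of gray and black pixels is a convention we chose, not something the paper specifies.

```python
import math

def rgb_to_hsv(r, g, b):
    """Convert one 8-bit RGB pixel with equations (1)-(3).

    Returns (H, S, V): H in radians [0, 2*pi), S in [0, 1], and
    V = (R + G + B) / 3 on the normalized [0, 1] scale.
    """
    # Normalize to [0, 1] (assumption: the paper does not state the scale).
    r, g, b = r / 255.0, g / 255.0, b / 255.0
    total = r + g + b
    v = total / 3.0                                   # equation (3)
    if total == 0:                                    # black: H and S undefined
        return 0.0, 0.0, 0.0
    s = 1.0 - 3.0 * min(r, g, b) / total              # equation (2)
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt(max(0.0, (r - g) ** 2 + (r - b) * (g - b)))
    if den == 0:                                      # gray pixel: hue undefined
        return 0.0, s, v
    # Clamp the ratio to [-1, 1] to guard against floating-point rounding.
    theta = math.acos(max(-1.0, min(1.0, num / den)))  # equation (1)
    h = 2.0 * math.pi - theta if b > g else theta
    return h, s, v
```

Pure red, green, and blue map to hue angles 0, 2π/3, and 4π/3, matching the description of the hexacone above.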

The CIE L*u*v* color space is composed of three components: L*, u*, and v*. L* defines the luminance, while u* and v* define the chrominance. To use the L*u*v* space, the color values are first converted from RGB space into CIE XYZ space with a linear transform and then from CIE XYZ space into L*u*v* space using the following transform [7, 10]:

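$$L^* = \begin{cases} 116\,(y')^{1/3} - 16, & y' > 0.008856\\ 903.3\,y', & \text{otherwise}\end{cases} \qquad (4)$$

$$u^* = 13\,L^*\,(u' - u'_n), \qquad v^* = 13\,L^*\,(v' - v'_n) \qquad (5)$$

where

$$y' = \frac{Y}{Y_n}, \quad u' = \frac{4X}{X+15Y+3Z}, \quad v' = \frac{9Y}{X+15Y+3Z}, \quad u'_n = \frac{4X_n}{X_n+15Y_n+3Z_n}, \quad v'_n = \frac{9Y_n}{X_n+15Y_n+3Z_n} \qquad (6)$$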

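$$\begin{bmatrix} X\\ Y\\ Z\end{bmatrix} = \begin{bmatrix} 0.607 & 0.174 & 0.200\\ 0.299 & 0.587 & 0.114\\ 0.000 & 0.066 & 1.116\end{bmatrix}\begin{bmatrix} R\\ G\\ B\end{bmatrix} \qquad (7)$$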

where $R, G, B \in [0, 255]$ are RGB values, $X, Y, Z$ are the tristimulus values of the object, and $X_n, Y_n, Z_n$ are the tristimulus values of the reference white. In this paper, we adopt the D50 illuminant: $X_n = 0.964296$, $Y_n = 1.0$, and $Z_n = 0.82105$.
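A minimal sketch of equations (4) to (7) for a single pixel follows. Two choices here are our assumptions rather than the paper's: RGB values are divided by 255 before the matrix of equation (7) is applied (so that Y of white is about 1), and the default reference white is the XYZ of RGB white under that same matrix, so that neutral grays map to u* = v* = 0. The paper instead quotes D50 values, which can be passed through the `white` parameter. The Z row of the matrix is written as (0.000, 0.066, 1.116), the standard NTSC coefficients whose row sums reproduce the illuminant C white point.

```python
# RGB -> XYZ matrix of equation (7).
M_LUV = ((0.607, 0.174, 0.200),
         (0.299, 0.587, 0.114),
         (0.000, 0.066, 1.116))

def rgb_to_xyz(rgb, m=M_LUV):
    """Apply the 3x3 matrix to an 8-bit RGB triple normalized to [0, 1]."""
    r, g, b = (c / 255.0 for c in rgb)
    return tuple(row[0] * r + row[1] * g + row[2] * b for row in m)

# Reference white = XYZ of RGB white (assumption; the paper quotes D50).
WHITE = rgb_to_xyz((255, 255, 255))

def _uv_prime(x, y, z):
    """Chromaticities u', v' of equation (6); 0 for black (assumption)."""
    d = x + 15.0 * y + 3.0 * z
    if d == 0:
        return 0.0, 0.0
    return 4.0 * x / d, 9.0 * y / d

def rgb_to_luv(rgb, white=WHITE):
    x, y, z = rgb_to_xyz(rgb)
    xn, yn, zn = white
    yr = y / yn                                    # y' in equation (6)
    if yr > 0.008856:                              # equation (4)
        l = 116.0 * yr ** (1.0 / 3.0) - 16.0
    else:
        l = 903.3 * yr
    u, v = _uv_prime(x, y, z)
    un, vn = _uv_prime(xn, yn, zn)
    return l, 13.0 * l * (u - un), 13.0 * l * (v - vn)   # equation (5)
```

Deriving the white point from the matrix keeps equations (5) to (7) mutually consistent; mixing a matrix built for one illuminant with the white point of another shifts every gray away from the L* axis.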

An L*a*b* color space is a color-opponent space with dimension L* for luminance and a* and b* as the color-opponent dimensions for chrominance. The a* axis ranges from green to red and the b* axis from blue to yellow; both are based on nonlinearly compressed CIE XYZ coordinates. The nonlinear relations for L*, a*, and b* are intended to mimic the logarithmic response of the eye. The transformation from RGB through CIE XYZ to CIE L*a*b* is performed with the following equations:

$$\begin{bmatrix} X\\ Y\\ Z\end{bmatrix} = \begin{bmatrix} 0.412453 & 0.357580 & 0.180423\\ 0.212671 & 0.715160 & 0.072169\\ 0.019334 & 0.119193 & 0.950227\end{bmatrix}\begin{bmatrix} R\\ G\\ B\end{bmatrix}$$

$$L^* = 116\,f\!\left(\frac{Y}{Y_n}\right) - 16, \qquad a^* = 500\left[f\!\left(\frac{X}{X_n}\right) - f\!\left(\frac{Y}{Y_n}\right)\right], \qquad b^* = 200\left[f\!\left(\frac{Y}{Y_n}\right) - f\!\left(\frac{Z}{Z_n}\right)\right]$$

$$f(r) = \begin{cases} r^{1/3}, & r > 0.008856\\ 7.787\,r + 16/116, & \text{otherwise}\end{cases} \qquad (8)$$

where $R, G, B \in [0, 255]$ are RGB values, $X, Y, Z$ are the tristimulus values of the object, and $X_n, Y_n, Z_n$ are the tristimulus values of the reference white. In this paper, we adopt the D50 illuminant: $X_n = 0.964296$, $Y_n = 1.0$, and $Z_n = 0.82105$.

B. Color Descriptors

Color is an important attribute for image retrieval: it is an intuitive feature that admits an effective and compact representation. Color spaces provide the means to manipulate colors, and many color models have been proposed in the fields of image processing and computer graphics [11]. In this paper, we work with color histogram, color moment, and color correlogram methods. The main method of representing the color information of images in CBIR systems is the color histogram. A color histogram is a type of bar graph in which each bar represents a particular color of the color space being used. These bars are referred to as bins and lie along the x-axis; the number of bins depends on the number of (quantized) colors considered. The y-axis denotes the number of pixels in each bin, i.e., how many pixels in the image are of a particular color. Both global and local color histograms are used here to extract the color features of images.
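Global and local color histograms can be sketched as follows. The parameters are illustrative assumptions, not values from the paper: pixels are 8-bit three-channel tuples, each channel is uniformly quantized into 4 levels (giving 4³ = 64 bins), and local histograms are taken over a 4 × 4 grid of regions. The helper names are hypothetical.

```python
def quantize(pixel, levels=4, max_value=256):
    """Map an integer 3-channel pixel to one of levels**3 color bins."""
    idx = 0
    for c in pixel:
        idx = idx * levels + min(levels - 1, c * levels // max_value)
    return idx

def color_histogram(image, levels=4, normalize=True):
    """Global color histogram of an image given as rows of pixel tuples."""
    hist = [0] * levels ** 3
    count = 0
    for row in image:
        for pixel in row:
            hist[quantize(pixel, levels)] += 1
            count += 1
    if normalize and count:
        hist = [h / count for h in hist]    # fraction of pixels per bin
    return hist

def local_histograms(image, grid=4, levels=4):
    """Local histograms: one per cell of a grid x grid partition."""
    rows, cols = len(image), len(image[0])
    cells = []
    for gi in range(grid):
        for gj in range(grid):
            block = [row[gj * cols // grid:(gj + 1) * cols // grid]
                     for row in image[gi * rows // grid:(gi + 1) * rows // grid]]
            cells.append(color_histogram(block, levels))
    return cells
```

Normalizing by the pixel count makes histograms of differently sized images directly comparable.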

Fig. 1. Sample gastroscopic images and their color histograms in RGB color space


Color moments can be used effectively in many medical image retrieval systems. They offer computational simplicity, fast retrieval, and minimal storage. The mathematical basis of this approach is that any color distribution can be characterized by its moments. In this paper, the mean (first order), variance (second order), skewness (third order), standard deviation, and kurtosis form the feature vector. The mean, standard deviation, and skewness of the i-th color component are defined as:

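$$\mu_i = \frac{1}{N}\sum_{j=1}^{N} f_{ij}, \qquad \sigma_i = \left(\frac{1}{N}\sum_{j=1}^{N}\left(f_{ij}-\mu_i\right)^2\right)^{1/2}, \qquad s_i = \left(\frac{1}{N}\sum_{j=1}^{N}\left(f_{ij}-\mu_i\right)^3\right)^{1/3} \qquad (9)$$

where $f_{ij}$ is the value of the i-th color component of image pixel j, and N is the number of pixels in the image.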

Fig. 2. (a) Sample dermatology image. (b) Pixel color moments and histogram color moments for the red channel of (a).
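The moments of equation (9), together with the variance and kurtosis listed in the text, can be computed per channel as in the sketch below. Two details are our assumptions: skewness uses the signed real cube root so that negative third moments are handled, and kurtosis is taken as the fourth central moment divided by the squared variance, since the paper does not define it. `pixel_color_moments` is a hypothetical helper that concatenates the five moments of each channel into a PCMD-style feature vector.

```python
import math

def color_moments(values):
    """Return (mean, variance, std, skewness, kurtosis) of one channel."""
    n = len(values)
    mean = sum(values) / n
    var = sum((f - mean) ** 2 for f in values) / n
    std = math.sqrt(var)                              # sigma_i in (9)
    m3 = sum((f - mean) ** 3 for f in values) / n
    skew = math.copysign(abs(m3) ** (1.0 / 3.0), m3)  # s_i in (9), real root
    m4 = sum((f - mean) ** 4 for f in values) / n
    kurt = m4 / var ** 2 if var > 0 else 0.0          # assumed definition
    return mean, var, std, skew, kurt

def pixel_color_moments(image):
    """Five moments per channel over all pixels, concatenated."""
    pixels = [p for row in image for p in row]
    feature = []
    for ch in range(len(pixels[0])):
        feature.extend(color_moments([p[ch] for p in pixels]))
    return feature
```

The image is expected to be already converted into the chosen color space, so the same helper serves HSV, CIE L*u*v*, and CIE L*a*b* inputs.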

A color correlogram expresses how the spatial correlation of pairs of colors changes with distance. It is a table indexed by color pairs, where the k-th entry for (i, j) specifies the probability of finding a pixel of color j at distance k from a pixel of color i in the image.

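Let T be an n × n image whose colors are quantized into m colors $c_1, \ldots, c_m$. For a pixel $p = (x, y) \in T$, let $T(p)$ denote its color, and let $T_c \triangleq \{p \mid T(p) = c\}$. Thus, the notation $p \in T_c$ is synonymous with $p \in T,\ T(p) = c$. Given any pixel of color $c_i$ in the image, the color correlogram gives the probability that a pixel at distance k from the given pixel is of color $c_j$:

$$\gamma^{(k)}_{c_i, c_j}(T) \triangleq \Pr_{p_1 \in T_{c_i},\, p_2 \in T}\left[\, p_2 \in T_{c_j} \;\middle|\; |p_1 - p_2| = k \,\right] \qquad (10)$$

where $|p_1 - p_2| \triangleq \max\{|x_1 - x_2|, |y_1 - y_2|\}$ is the chessboard distance between pixels $p_1 = (x_1, y_1)$ and $p_2 = (x_2, y_2)$.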
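A direct, unoptimized sketch of equation (10) follows. For every pixel of color c_i it visits the pixels at chessboard distance exactly k and records the fraction that have color c_j. Pairs whose second pixel would fall outside the image are skipped; this boundary handling and the default distance set are assumptions, not taken from the paper.

```python
from collections import defaultdict

def ring(x, y, k):
    """All positions at chessboard distance exactly k from (x, y)."""
    for dx in range(-k, k + 1):
        for dy in range(-k, k + 1):
            if max(abs(dx), abs(dy)) == k:
                yield x + dx, y + dy

def correlogram(img, distances=(1, 3)):
    """Map (k, ci, cj) -> Pr[p2 has color cj | p1 has color ci, |p1 - p2| = k].

    img is a 2-D list of quantized color indices.
    """
    rows, cols = len(img), len(img[0])
    hits = defaultdict(int)     # (k, ci, cj) -> number of pixel pairs
    totals = defaultdict(int)   # (k, ci)     -> number of pixel pairs
    for k in distances:
        for x in range(rows):
            for y in range(cols):
                ci = img[x][y]
                for x2, y2 in ring(x, y, k):
                    if 0 <= x2 < rows and 0 <= y2 < cols:
                        hits[(k, ci, img[x2][y2])] += 1
                        totals[(k, ci)] += 1
    return {key: n / totals[key[:2]] for key, n in hits.items()}
```

This brute-force version visits every pixel pair at each distance, so it suits only small images; it is meant to make the definition concrete rather than to be fast.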

III. COLOR FEATURE EXTRACTION TECHNIQUES

Feature extraction is a form of dimensionality reduction: the input data are reduced to a set of features called a feature vector, and the transformation of the input data into this feature vector is known as feature extraction. The color feature extraction techniques analyzed in this paper are the Pixel-based Color Moments Descriptor (PCMD), Color Histogram Moments Descriptor (CHMD), Single Channel Histogram Moments Descriptor (SCHMD), Maximum Frequency Symmetrical Color Spatial Feature (MFSCSF), Symmetrical Color Spatial Histogram (SCSH), and Binary Haar Color Descriptor (BHCD).

A. PCMD

The pixel values of an image are the simplest form of image feature, and comparing the pixel values of the query image with those of the database images is the most direct approach to image retrieval. In the Pixel Color Moments Descriptor, the original RGB image is converted into the HSV, CIE L*u*v*, or CIE L*a*b* color space. The color moments (mean, variance, standard deviation, skewness, and kurtosis) are computed from the pixel values of the image, and these pixel-based color moments are stored as the feature vector. The similarity between the query image and the database images is measured by a distance metric, and the most similar images are displayed to the user in rank order.

B. CHMD

The color histogram describes the proportion of pixels of each color in an image in a simple and computationally efficient manner. In the Color Histogram Moments Descriptor, we first obtain the histogram of the image and then compute its moments. The RGB image is converted into an HSV, CIE L*u*v*, or CIE L*a*b* image, and the color moments (mean, variance, standard deviation, skewness, and kurtosis) are computed from the histogram values of the image. These histogram-based color moments are stored as feature vectors. The similarity between the query image and the database images is measured by a distance metric, and the most similar images are displayed to the user in rank order.

C. SCHMD

Regardless of the color space, the color information in an image can be represented by a single 3-D histogram or by three separate 1-D histograms. In the latter case, features are extracted from the histogram of each color component, i.e., the color histogram consists of three histograms. These color representations are invariant under rotation and translation of the image.

Given an image f of M × N pixels, characterized by the color c at location (i, j), i.e., $c = f(i, j)$, the first-order color distribution, or histogram, of the color set C is given by

$$h_f(c) = \frac{1}{MN}\sum_{i=0}^{M-1}\sum_{j=0}^{N-1}\delta\big(f(i,j) - c\big), \qquad \forall c \in C \qquad (11)$$

where $\delta(\cdot)$ is the unit impulse function. The value of each bin is thus the fraction of image pixels having the color c. If the cardinality of C is n (n is the number of bins), the histograms can be represented as feature vectors in an n-dimensional space.

In this technique, the color moments (mean, variance, standard deviation, skewness, and kurtosis) are computed from the histogram values of the H and L channels of the image. These single-channel histogram moments are stored as feature vectors. The similarity between the query image and the database images is measured by a distance metric, and the most similar images are displayed to the user in rank order.
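The single-channel histogram of equation (11) and its moments can be sketched as follows. We assume that the H channel is stored in radians in [0, 2π), that 64 bins are used, and that the same five moments are computed over the bin values; `schmd_feature` and the bin count are illustrative choices. Applying `histogram_moments` to a full color histogram gives the analogous CHMD feature.

```python
import math

def channel_histogram(image, channel, n_bins, max_value):
    """Normalized 1-D histogram of one channel, as in equation (11).

    Each bin holds the fraction of the M*N pixels whose value falls in it;
    channel values are assumed to lie in [0, max_value).
    """
    pixels = [p for row in image for p in row]
    hist = [0.0] * n_bins
    for p in pixels:
        hist[min(n_bins - 1, int(p[channel] * n_bins / max_value))] += 1.0
    return [h / len(pixels) for h in hist]

def histogram_moments(hist):
    """Mean, variance, std, skewness, and kurtosis of the bin values."""
    n = len(hist)
    mean = sum(hist) / n
    var = sum((h - mean) ** 2 for h in hist) / n
    m3 = sum((h - mean) ** 3 for h in hist) / n
    m4 = sum((h - mean) ** 4 for h in hist) / n
    skew = math.copysign(abs(m3) ** (1.0 / 3.0), m3)
    kurt = m4 / var ** 2 if var > 0 else 0.0
    return [mean, var, math.sqrt(var), skew, kurt]

def schmd_feature(hsv_image, n_bins=64):
    """Hypothetical SCHMD feature: moments of the H-channel histogram."""
    return histogram_moments(channel_histogram(hsv_image, 0, n_bins, 2 * math.pi))
```

Because a normalized histogram sums to 1, its mean over n bins is always 1/n in this reading; the variance, skewness, and kurtosis carry the information that distinguishes images.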

D. MFSCSF

Gong et al. [12] proposed dividing the image into a 3 × 3 grid of equally weighted regions. An improvement of this partitioning divides the image into a 4 × 4 grid of equally weighted regions, as shown in Fig. 3. In the Maximum Frequency Symmetrical Color Spatial Feature, the RGB image is converted into an HSV, CIE L*u*v*, or CIE L*a*b* image, which is then divided into 16 equal regions of the same weight. The maximum color frequency in each region is calculated and stored as the feature vector. The similarity between each region of the query image and each region of a database image is measured by a distance metric. If the distances of the ten most similar regions are below a threshold value, the most similar images are displayed to the user according to their rank.

E. SCSH

In the Symmetrical Color Spatial Histogram, the RGB image is converted into an HSV, CIE L*u*v*, or CIE L*a*b* image, which is divided into 16 equal regions of the same weight. The histogram values of each region are stored as the feature vector. The similarity between each region of the query image and each region of a database image is measured by a distance metric. If the distances of the ten most similar regions are below a threshold value, the most similar images are displayed to the user according to their rank.
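MFSCSF and SCSH share a 4 × 4 partition and a threshold rule on region distances, which the sketch below illustrates. Several readings here are assumptions: the image is already quantized to integer color indices, regions are compared position by position with an L1 distance, "maximum color frequency" is taken as the largest normalized bin count in a region, and the matching rule is read as "the ten smallest region distances all fall below the threshold". All function names are hypothetical.

```python
def split_grid(image, grid=4):
    """Partition an image (list of rows) into grid*grid equally weighted regions."""
    rows, cols = len(image), len(image[0])
    return [[row[j * cols // grid:(j + 1) * cols // grid]
             for row in image[i * rows // grid:(i + 1) * rows // grid]]
            for i in range(grid) for j in range(grid)]

def region_histogram(region, n_colors):
    """SCSH region feature: normalized histogram of quantized color indices."""
    pixels = [c for row in region for c in row]
    hist = [0.0] * n_colors
    for c in pixels:
        hist[c] += 1.0
    return [h / len(pixels) for h in hist]

def max_frequency_feature(region, n_colors):
    """MFSCSF region feature (assumed reading): the largest color frequency."""
    return [max(region_histogram(region, n_colors))]

def l1(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def regions_match(query_feats, db_feats, threshold, needed=10, dist=l1):
    """True if the `needed` smallest region distances are all below threshold."""
    d = sorted(dist(q, t) for q, t in zip(query_feats, db_feats))
    return len(d) >= needed and d[needed - 1] < threshold
```

Images that pass the rule would then be ranked by, for example, the sum of their region distances; that final ordering is likewise our assumption.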

F. BHCD

The Haar transform coefficients of the 64-bin color histogram are obtained using the Haar wavelet function. Because the Haar wavelet takes only the values +1 and -1 on its support [11], the computation does not involve any multiplication:

$$\psi(x) = \begin{cases} 1, & 0 \le x < 0.5\\ -1, & 0.5 \le x < 1\\ 0, & \text{elsewhere}\end{cases} \qquad (12)$$

The wavelet function ψ(x), together with its integer translates and binary scalings, spans the difference between any two adjacent scaling subspaces [13]. The Haar transform coefficients are obtained by taking the inner product of the basis functions with the given histogram, and each coefficient is quantized to a single bit. The coefficients are computed hierarchically. At the first level, the 64 histogram bins are divided into two halves; if the sum of the histogram values in the left half is greater than the sum in the right half, the first bit of the descriptor is '1', otherwise it is '0'. This is repeated recursively on each half at the second, third, fourth, fifth, and sixth levels, yielding 2, 4, 8, 16, and 32 bits, respectively. The binary Haar descriptor therefore has 63 (1 + 2 + 4 + 8 + 16 + 32) bits. This 63-bit binary Haar descriptor is computed in the HSV and CIE L*u*v* color spaces for the analysis.
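The hierarchical bit extraction described above can be sketched in a few lines. Each bit is the sign of the Haar coefficient of one segment (left-half sum minus right-half sum), so only additions and comparisons are needed. The function name is ours, and the sketch assumes the input is a 64-bin histogram computed in the chosen color space.

```python
def binary_haar_descriptor(hist):
    """63-bit binary Haar descriptor of a 64-bin histogram (BHCD sketch).

    At each level every segment is split in half; the bit is 1 when the
    left half's sum exceeds the right half's, i.e. when the Haar
    coefficient (+1 on the left, -1 on the right) is positive.
    """
    if len(hist) != 64:
        raise ValueError("this sketch expects a 64-bin histogram")
    bits = []
    size = 64
    while size >= 2:                    # segments per level: 1, 2, 4, 8, 16, 32
        half = size // 2
        for start in range(0, 64, size):
            left = sum(hist[start:start + half])
            right = sum(hist[start + half:start + size])
            bits.append(1 if left > right else 0)
        size = half
    return bits
```

A flat histogram yields all zeros, while concentrating mass in the lower half of the bins sets only the first bit.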

IV. IMAGE RETRIEVAL

A query image's content is extracted at run time and matched against the features stored in the database. The result of a query is a set of images similar to the query image rather than an exact match. In PCMD, the Euclidean distance is used as the distance metric. In the CHMD, SCHMD, MFSCSF, and SCSH techniques, the similarity between the query image and the database images is measured by the following distance metrics:

A. Normalized Histogram Intersection

When both histograms are normalized to the same total, the normalized histogram intersection is equivalent to using the sum of absolute differences, or city-block metric. The normalized histogram intersection distance is defined by

$$d_{\cap}(h_Q, h_I) = 1 - \frac{\sum_{i=0}^{n-1}\min\big(h_Q(i), h_I(i)\big)}{\min\left(\sum_{i=0}^{n-1} h_Q(i),\ \sum_{i=0}^{n-1} h_I(i)\right)} \qquad (13)$$

B. Histogram Euclidean Distance

The classical histogram Euclidean distance is defined as

$$d_E(h_Q, h_I) = \sum_{i=0}^{n-1}\big(h_Q(i) - h_I(i)\big)^2 \qquad (14)$$

C. Histogram χ² Metric Distance

$$d_{\chi^2}(h_Q, h_I) = \sum_{i=0}^{n-1}\frac{\big(h_Q(i) - h_I(i)\big)^2}{h_Q(i) + h_I(i)} \qquad (15)$$

D. Histogram Mahalanobis Distance

The Mahalanobis distance is given as

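$$d_M(h_Q, h_I) = \sum_{i=0}^{n-1}\frac{\big(h_Q(i) - h_I(i)\big)^2}{\sigma_i^2} \qquad (17)$$

where $\sigma_i^2$ is the variance of the i-th histogram bin over the database images.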

E. Histogram Quadratic Distance

The quadratic form distance is defined as

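$$D_{qad}(x, y) = \left(\sum_{i=0}^{n-1}\sum_{j=0}^{n-1} a_{ij}\,x_i x_j + \sum_{i=0}^{n-1}\sum_{j=0}^{n-1} a_{ij}\,y_i y_j - 2\sum_{i=0}^{n-1}\sum_{j=0}^{n-1} a_{ij}\,x_i y_j\right)^{1/2} \qquad (20)$$

where x and y are the two histograms being compared and $A = [a_{ij}]$ is a symmetric matrix whose entries give the similarity between bins i and j.

In each case, the most similar images are displayed to the user in rank order.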
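The histogram distances of equations (13) to (20) can be written as small functions over histograms of equal length, as in the following sketch. Assumptions on our part: bins that are empty in both histograms are skipped in the χ² sum to avoid division by zero, the Mahalanobis version receives per-bin standard deviations σ_i, and the quadratic form uses a symmetric positive semidefinite bin-similarity matrix A. The helper `rank` is hypothetical.

```python
import math

def intersection_distance(hq, hi):
    """Normalized histogram intersection distance, equation (13)."""
    overlap = sum(min(q, i) for q, i in zip(hq, hi))
    return 1.0 - overlap / min(sum(hq), sum(hi))

def euclidean_distance(hq, hi):
    """Equation (14) as printed: sum of squared bin differences."""
    return sum((q - i) ** 2 for q, i in zip(hq, hi))

def chi_square_distance(hq, hi):
    """Equation (15); bins empty in both histograms contribute nothing."""
    return sum((q - i) ** 2 / (q + i) for q, i in zip(hq, hi) if q + i > 0)

def mahalanobis_distance(hq, hi, sigma):
    """Diagonal Mahalanobis distance, equation (17)."""
    return sum((q - i) ** 2 / s ** 2 for q, i, s in zip(hq, hi, sigma))

def quadratic_distance(x, y, a):
    """Equation (20), computed as sqrt((x - y)^T A (x - y)) for symmetric A."""
    d = [xi - yi for xi, yi in zip(x, y)]
    n = len(d)
    quad = sum(a[i][j] * d[i] * d[j] for i in range(n) for j in range(n))
    return math.sqrt(max(0.0, quad))    # guard tiny negative rounding errors

def rank(query, database, metric):
    """Database keys ordered from most to least similar under `metric`."""
    return sorted(database, key=lambda key: metric(query, database[key]))
```

Leaving out the square root in `euclidean_distance` does not change the ranking. With histograms that each sum to 1, `intersection_distance` equals half the L1 distance, which is the city-block equivalence noted in subsection A.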

In BHCD, when an image is selected as the query image, its 64-bin color histogram is calculated, and this single-channel color histogram is transformed into the binary Haar descriptor. The color descriptors of the database images are obtained and stored in the same way. The descriptor of the query image is compared with the descriptors of the candidate images using the Hamming distance, and the retrieval results are presented to the user in rank order.
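Ranking by Hamming distance is straightforward. The sketch below shows a list-based version and a packed-integer variant in which XOR followed by a count of 1 bits compares two descriptors in one step. The dictionary layout of the database and all names are assumptions made for illustration.

```python
def hamming(a, b):
    """Number of positions at which two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))

def rank_by_hamming(query_bits, database):
    """Image ids ordered by Hamming distance to the query, most similar first.

    `database` maps a (hypothetical) image id to its stored descriptor bits;
    sorted() is stable, so ties keep the database's insertion order.
    """
    return sorted(database, key=lambda img_id: hamming(query_bits, database[img_id]))

def pack(bits):
    """Pack a bit list, most significant bit first, into one integer."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value

def hamming_packed(a, b):
    """Hamming distance between packed descriptors: XOR, then count 1 bits."""
    return bin(a ^ b).count("1")
```

A 63-bit descriptor fits in a single 64-bit machine word, which is why the packed form is attractive when many database descriptors must be scanned.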

V. EXPERIMENTAL RESULTS

The database consists of 222 pathology images [15][16][17][18]. The Average Normalized Modified Retrieval Rank (ANMRR) is an overall performance measure calculated by averaging the results over all queries. The average rank AVR(q) for query q is computed as follows:

$$AVR(q) = \sum_{k=1}^{NG(q)}\frac{Rank(k)}{NG(q)} \qquad (21)$$

where NG(q) is the size of the ground-truth set for query image q, and Rank(k) is the rank at which the k-th ground-truth image is retrieved by the retrieval algorithm. The modified retrieval rank is computed as follows:

$$MRR(q) = AVR(q) - 0.5 - \frac{NG(q)}{2} \qquad (22)$$

The Normalized Modified Retrieval Rank (NMRR) is used to measure the performance of each query. NMRR is defined by

$$NMRR(q) = \frac{MRR(q)}{K(q) + 0.5 - 0.5\cdot NG(q)} \qquad (23)$$

where K(q) is the cut-off rank for query q. The NMRR lies in the range [0, 1], and smaller values represent better retrieval performance. ANMRR is defined as the average NMRR over a set of queries:

$$ANMRR = \frac{1}{NQ}\sum_{q=1}^{NQ} NMRR(q) \qquad (24)$$

where NQ is the number of query images.
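Equations (21) to (24) can be computed as below. The paper does not state how K(q) is chosen or how ground-truth images missing from the top K(q) results are ranked, so this sketch assumes the MPEG-7-style choice K(q) = min(4·NG(q), 2·GTM), where GTM is the largest ground-truth set over all queries, and assigns such missing images the rank K(q) + 1. That penalty is our choice; with the denominator of equation (23) it keeps NMRR within [0, 1].

```python
def nmrr(retrieved, ground_truth, k):
    """NMRR of one query, equations (21)-(23).

    retrieved: ranked list of image ids returned for the query, best first.
    ground_truth: set of ids relevant to the query (size NG(q)).
    k: cut-off rank K(q); ground-truth ids beyond it get rank k + 1.
    """
    ng = len(ground_truth)
    position = {img: r for r, img in enumerate(retrieved[:k], start=1)}
    ranks = [position.get(img, k + 1) for img in ground_truth]
    avr = sum(ranks) / ng                       # equation (21)
    mrr = avr - 0.5 - ng / 2.0                  # equation (22)
    return mrr / (k + 0.5 - 0.5 * ng)           # equation (23)

def anmrr(queries):
    """ANMRR, equation (24), over a list of (retrieved, ground_truth) pairs."""
    gtm = max(len(gt) for _, gt in queries)
    total = 0.0
    for retrieved, ground_truth in queries:
        k = min(4 * len(ground_truth), 2 * gtm)  # assumed K(q)
        total += nmrr(retrieved, ground_truth, k)
    return total / len(queries)
```

A query whose ground-truth images occupy the top NG(q) positions scores 0, and one that retrieves none of them within K(q) scores 1.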

Fig. 3. Sample Query Images


TABLE I
RETRIEVAL PERFORMANCE BASED ON ANMRR

Technique        ANMRR (HSV)   ANMRR (CIE L*u*v*)   ANMRR (CIE L*a*b*)
PCMD             0.28          0.03                 0.08
CHMD             0.11          0.12                 0.02
SCHMD            0.04          0.15                 0.05
MFSCSF           0.02          0.12                 0.07
SCSH             0.07          0.07                 0.06
BHCD (63-bit)    0.01          0.04                 0.10

VI. CONCLUSION AND FUTURE WORK

In this paper, various color feature extraction techniques are analyzed in the HSV, CIE L*u*v*, and CIE L*a*b* color spaces. The experimental results indicate that the Binary Haar Color Descriptor in the HSV color space yields the best retrieval performance (the lowest ANMRR, 0.01) among the techniques discussed in this paper. Using this color feature extraction technique, a Pathology Image Retrieval System was developed to retrieve similar pathology images from the database. A system based on color features alone may not be suitable for all pathology images; hence, texture features could be combined with color features to improve the retrieval efficiency of the Pathology Image Retrieval System.

ACKNOWLEDGMENT

The corresponding author would like to acknowledge the Sierra Atlantic - ITT IDL Image Processing Research Lab, Centre for Information Technology and Engineering, Manonmaniam Sundaranar University, for providing access to this research work on a pathology-based content-based image retrieval system, and the research colleagues working in the lab for their cooperation and suggestions.

REFERENCES

[1] H. D. Tagare, C. Jaffe, and J. Duncan, "Medical image databases: A content-based retrieval approach," J. Amer. Med. Informat. Assoc., vol. 4, no. 3, pp. 184–198, 1997.

[2] A. Smeulders et al., "Content-based image retrieval at the end of the early years," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 12, pp. 1349–1380, 2000.

[3] S. Antani, R. Kasturi, and R. Jain, "A survey on the use of pattern recognition methods for abstraction, indexing and retrieval of images and video," Pattern Recognit., vol. 35, no. 4, pp. 945–965, 2002.

[4] H. Müller, N. Michoux, D. Bandon, and A. Geissbuhler, "A review of content-based image retrieval systems in medical applications: clinical benefits and future directions," Int. J. Med. Inform., vol. 73, no. 1, pp. 1–23, 2004.

[5] C. R. Shyu, C. E. Brodley, A. C. Kak, A. Kosaka, A. M. Aisen, and L. S. Broderick, "ASSERT: A physician-in-the-loop content-based retrieval system for HRCT image databases," Comput. Vis. Image Underst., vol. 75, no. 1–2, pp. 111–132, 1999.

[6] C. Thies, M. O. Güld, B. Fischer, and T. M. Lehmann, "Content-based queries on the CasImage database within the IRMA framework," Lect. Notes Comput. Sci., vol. 3491, pp. 781–792, 2005.

[7] D. Zhang, "Improving image retrieval performance by using both color and texture features," in Proc. Third Int. Conf. Image and Graphics, IEEE, 2004.

[8] R. C. Gonzalez and R. E. Woods, Digital Image Processing. Prentice Hall, 2001.

[9] G. Stockman and L. Shapiro, Computer Vision. Prentice Hall, 2001.

[10] M. Tkalcic and J. F. Tasic, "Color spaces: perceptual, historical and applicational background," in Proc. EUROCON 2003: Computer as a Tool, IEEE Region 8, vol. 1, Sept. 2003, pp. 304–308.

[11] G. Sharma and H. J. Trussell, "Digital color imaging," IEEE Trans. Image Process., vol. 6, no. 7, pp. 901–932, 1997.

[12] Y. Gong, H. Zhang, H. C. Chuan, et al., "An image database system with content capturing and fast image indexing abilities," in Proc. IEEE Int. Conf. Multimedia Computing and Systems, Boston, 1994, pp. 121–130.

[13] F. W. Lancaster, Information Retrieval Systems: Characteristics, Testing and Evaluation. New York: Wiley, 1968.

[14] J. R. Smith, "Integrated spatial and feature image system: Retrieval, analysis and compression," Ph.D. thesis, Columbia University, 1997.

[15] www.oralcancerfoundation.org

[16] www.gastrolab.net

[17] http://dermatlas.med.jhmi.edu

[18] www.dermis.net