
Performance Analysis of Distance Measures for Computer tomography Image Segmentation

V.V.Gomathi
Research Scholar
Research and Development Centre
Bharathiar University
Coimbatore, India
[email protected]

Dr. S.Karthikeyan
Assistant Professor
Department of Information Technology
College of Applied Sciences
Sohar, Oman
[email protected]

Abstract

This paper presents a comparative evaluation of different distance metrics for clustering data points in organ segmentation. Selecting an appropriate distance measure is a challenging problem in clustering. In this research work, we compare the Euclidean, Manhattan, Minkowski, Chebyshev and Signature Quadratic Form distance measures. The main aim of this work is to identify the best distance measure for accurate clustering-based segmentation of images while minimizing the fragmentation issue. Real-time datasets are used to evaluate the distance measures.

Keywords: Euclidean distance, Manhattan distance, Minkowski distance, Chebyshev distance, Signature Quadratic Form distance, clustering, segmentation, classification.

1. Introduction

Image segmentation is the process of partitioning a digital image into segments. Segmentation simplifies and/or changes the representation of an image into something more meaningful and easier to analyze [11]. Image segmentation is one of the most interesting and challenging problems in computer vision generally, and in medical imaging applications especially. Accurate, fast and reproducible image segmentation techniques are required for various applications. The available segmentation algorithms vary widely depending on the specific application, image modality and other factors.

Medical image segmentation is the process of outlining relevant anatomical structures in an image dataset. It is a problem that is central to a variety of medical applications, including image enhancement and reconstruction, surgical planning, disease classification, data storage and compression, and 3D visualization [21] [9].

In cluster-based medical image segmentation algorithms, a large number of unwanted fragments appear, and the fragments are not consistent across runs: when the same image is processed several times, the results do not contain the same number, position and size of fragments. To diminish these drawbacks, a distance-based segmentation algorithm has been proposed.

A central problem in image recognition and computer vision is determining the distance between images [8][12]. Clustering is the process of organizing objects into groups. The aim of clustering is to find the intrinsic grouping in a set of unlabeled data. An important component of a clustering algorithm is the distance measure between data points. Any segmentation or classification of images involves combining or identifying objects that are close or similar to each other.

The choice of distance is very important and should not be made carelessly. Generally, experience or subject-matter knowledge is helpful in selecting a suitable distance for any clustering-based application. Distance metrics play a very important role in the clustering process. In image processing, distance metrics are used to segment objects by region growing and to classify image pixels by cluster analysis [1]. Traditionally, most image segmentation techniques are based on a classical metric such as the Euclidean metric.

V V Gomathi et al, Int.J.Computer Technology & Applications,Vol 5 (2),400-405
IJCTA | March-April 2014 Available [email protected]
ISSN:2229-6093

In this paper, the Euclidean, Manhattan, Minkowski, Chebyshev and Signature Quadratic Form distances are analyzed. The main contribution of this paper is to demonstrate the performance of these distance metrics on computer tomography images. To the best of our knowledge, such a performance comparison has not been done on real-time medical images, especially computer tomography images.

The rest of the paper is organized as follows: Section 2 describes the related works. Section 3 discusses materials and methods. Section 4 presents experimental results and discussion. Section 5 concludes the paper.

2. Related Works

Archana Singh et al. implemented K-means with different distance measures and found that the Euclidean distance metric gives the best result and the Manhattan distance metric performs worst [2]. Vadivel et al. used the Manhattan distance, Euclidean distance, Vector Cosine Angle distance and Histogram Intersection distance for a number of color histograms on a large database of images; their experimental results show that the Manhattan distance performs better than the other distance metrics for all five types of histograms [22]. N. Selvarasu et al. proposed a Euclidean distance based color image segmentation algorithm for abnormality extraction in thermographs [17]. Sourav Paul et al. integrated a self-organizing map with the Mahalanobis distance to determine the winner unit: the distance between the input vector and each weight vector is determined by the Mahalanobis distance, and the unit whose weight vector has the smallest Mahalanobis distance from the input vector is chosen [19]. Hsiang-Chuan Liu et al. proposed an improved Fuzzy C-Means algorithm based on a standard Mahalanobis distance (FCM-SM) [10]. O. A. Mohamed Jafar et al. made a comparative study of the K-Means and FCM algorithms with the Chebyshev and Chi-square distance measures and found that FCM based on the Chi-square distance measure gave a better result than with the Chebyshev distance measure [15]. Luh Yen et al. proposed a new distance metric called the Euclidean Commute Time (ECT) distance, based on a random walk model on a graph derived from the data, which allows retrieving well-separated clusters of arbitrary shapes [13]. Modh Jigar S. et al. used the L*a*b color space with cosine distance instead of the squared Euclidean (sqeuclidean) distance in a K-means clustering based segmentation technique [14].

3. Materials and Methods

3.1. Data set

Datasets from different kinds of tumour patients were collected with a SIEMENS SOMATOM EMOTION SPIRAL CT scanner located at Multi Speciality Hospital, Coimbatore, Tamilnadu, India. Besides a normal scan performed at a routine clinical dosage (130 mA), an additional scan of the same patient was acquired at a much lower tube current, i.e. 20 mA. The 3D image data consisted of consecutive DICOM (Digital Imaging and Communications in Medicine) slices, each slice being of size 512 by 512 and having 16-bit grey level resolution. Each of the organs of interest in this research was manually contoured by the expert so that the auto-segmented output could be compared with the manually contoured image.

3.2. Methods

A. An Overview of Distance Measures in Clustering

The distance metric is a key issue in many machine learning algorithms [19]. The distance measure plays an important role in obtaining correct clusters. Clustering techniques measure the similarity and dissimilarity between data objects by calculating the distance between each pair. The choice of distance measure has a large effect on the shape of the resulting clusters. Euclidean distance is generally used in many clustering techniques. In this work we consider the Euclidean, Manhattan, Minkowski, Chebyshev and Signature Quadratic Form distances.

1) Euclidean distance

This distance is the most commonly used in applications, especially in clustering problems. The Euclidean distance is the square root of the sum of squared differences between the coordinates of a pair of objects. In medical image segmentation, it is calculated for every image pixel from the average intensities. It is computed as:

$$d_{x,y} = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

2) Manhattan Distance

The Manhattan distance is also called the city block distance. It represents the distance between points in a city road grid. The Manhattan distance is the sum of the absolute differences between the coordinates of a pair of objects:

$$d_{x,y} = \sum_{i=1}^{n} |x_i - y_i|$$

3) Minkowski Distance

The Minkowski distance is a generalized distance metric that generalizes the distance between points in Euclidean space. It is defined as:

$$d_{x,y} = \left( \sum_{i=1}^{n} |x_i - y_i|^{p} \right)^{1/p}$$
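As a brief worked note (not part of the original text), the familiar special cases follow by substituting values of p into the Minkowski definition. The restriction p >= 1 is an assumption made here so that the expression is a metric:

```latex
\begin{align*}
p = 1:&\quad d_{x,y} = \sum_{i=1}^{n} |x_i - y_i| && \text{(Manhattan distance)} \\
p = 2:&\quad d_{x,y} = \Big( \sum_{i=1}^{n} |x_i - y_i|^{2} \Big)^{1/2} && \text{(Euclidean distance)} \\
p \to \infty:&\quad d_{x,y} \to \max_{i = 1, \ldots, n} |x_i - y_i| && \text{(Chebyshev distance)}
\end{align*}
```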

4) Chebyshev distance

The Chebyshev distance is also called the maximum value distance or chessboard distance. It is the largest absolute difference between the values of the clustering variables. It is calculated by the following formula:

$$d_{x,y} = \max_{i = 1, 2, \ldots, n} |x_i - y_i|$$
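The four coordinate-wise measures described so far can be sketched in a few lines of Python. This is only an illustration of the formulas, not the authors' Matlab code; the function names and the sample vectors are hypothetical.

```python
import math

def euclidean(x, y):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

def minkowski(x, y, p):
    """Generalized distance of order p (p = 1 Manhattan, p = 2 Euclidean)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

def chebyshev(x, y):
    """Largest absolute coordinate difference."""
    return max(abs(a - b) for a, b in zip(x, y))

# Hypothetical sample points chosen so the results are easy to check by hand.
x = [0.0, 3.0]
y = [4.0, 0.0]
print(euclidean(x, y))     # 5.0
print(manhattan(x, y))     # 7.0
print(minkowski(x, y, 3))  # lies between the Chebyshev and Euclidean values
print(chebyshev(x, y))     # 4.0
```

For the same pair of points the values are ordered Chebyshev <= Euclidean <= Manhattan, which is why the choice of measure can change which cluster centre is nearest when the data have more than one coordinate.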

5) Signature Quadratic form distance

The Signature Quadratic Form Distance (SQFD) [4] is a generalization of the Quadratic Form Distance. It is an adaptive distance-based similarity measure that allows efficient similarity computations based on flexible feature representations. This approach bridges the gap between the well-known concept of Quadratic Form Distances and feature signatures. The SQFD is a recently introduced distance measure for content-based similarity. It makes use of feature signatures, a flexible way to summarize the features of a multimedia object, and has shown good retrieval performance for various multimedia databases [5]. The SQFD works on feature signatures consisting of sets of points, where each point has a weight and a set of coordinates. If the points are generated by clustering, they are also called weighted centroids.

Calculate the similarity matrix A for P and Q with the similarity function f, by using the following formula:

$$f(Q, P) = \frac{1}{1 + \sqrt{(Q_x - P_x)^2 + (Q_y - P_y)^2}}$$

where
P - Intensity Vector Pixel Values (signatures)
Q - Input Image Pixel (signatures)
Q_x and Q_y - X and Y position
P_x and P_y - X and Y position

Signature Quadratic form distance [3] [4] is defined as

$$SQFD_A(Q, P) = \sqrt{(Q \mid -P) \cdot A \cdot (Q \mid -P)^{T}}$$

Here we have adapted the parameters of the Signature Quadratic Form Distance for medical image segmentation.
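To make the SQFD definition more concrete, the following Python sketch computes it for two small feature signatures. It is a minimal example under assumptions that this paper does not fix: each signature is represented as a list of (weight, (x, y)) pairs, the similarity function is taken as f = 1 / (1 + Euclidean distance between positions), and the function names `similarity` and `sqfd` are hypothetical.

```python
import math

def similarity(c1, c2):
    """Assumed similarity function: 1 / (1 + Euclidean distance)."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))
    return 1.0 / (1.0 + dist)

def sqfd(sig_q, sig_p):
    """SQFD between two signatures given as lists of (weight, position)."""
    # Concatenated weight vector (w_q | -w_p) and matching positions.
    weights = [w for w, _ in sig_q] + [-w for w, _ in sig_p]
    positions = [c for _, c in sig_q] + [c for _, c in sig_p]
    # Quadratic form w * A * w^T with A[i][j] = similarity(c_i, c_j).
    total = 0.0
    for w_i, c_i in zip(weights, positions):
        for w_j, c_j in zip(weights, positions):
            total += w_i * similarity(c_i, c_j) * w_j
    # Clamp tiny negative rounding errors before taking the square root.
    return math.sqrt(max(total, 0.0))

# Hypothetical one-point signatures, 5 units apart.
q = [(1.0, (0.0, 0.0))]
p = [(1.0, (3.0, 4.0))]
print(sqfd(q, p))  # sqrt(1 - 1/6 - 1/6 + 1) = sqrt(5/3)
print(sqfd(q, q))  # 0.0 for identical signatures
```

Unlike the coordinate-wise measures, the two signatures may contain different numbers of points, since the similarity matrix is built over the concatenation of both.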

B. Major Algorithm for Organ Segmentation using Distance measures

Step 1: Read Radiotherapy Structure Set (RTSS) file
Step 2: Extract manual segmented contour Data from RTSS
Step 3: Construct Manual segmented organs from extracted contour data
Step 4: Enhance the contrast of the input image using dicom contrast
Step 5: Clone the input image as output image
Step 6: Initialize the cluster step value
Step 7: Generate cluster elements for cluster vector based on the cluster step.
Step 8: Calculate the distance between all cluster elements and every pixel value individually.
Step 9: Replace the pixel value in the output image with the best cluster element by evaluating the minimum distance.

Various distance measures have been evaluated for calculating the minimum distance for our application (Computer tomography Image Segmentation) as follows:

1) Euclidean Distance

$$M_k = \sqrt{(X_k - Y_{ij})^2}, \qquad Z_{ij} = \min(M)$$

2) Manhattan Distance

$$M_k = |X_k - Y_{ij}|, \qquad Z_{ij} = \min(M)$$

3) Minkowski Distance

$$M_k = \left( |X_k - Y_{ij}|^{2} \right)^{1/2}, \qquad Z_{ij} = \min(M)$$

4) Chebyshev Distance

$$M_k = \max |X_k - Y_{ij}|, \qquad Z_{ij} = \min(M)$$

where
X - Intensity vector
X_k - kth value of Intensity vector
Y_ij - Input Image Pixel
M - Output vector
M_k - kth value of output vector
Z_ij - Output image pixel

5) Signature Quadratic form distance has been calculated by the following steps:

a) Extract the feature signatures from the intensity vector and the input image pixel, i.e. P = X_k, Q = Y_ij

where
X - Intensity vector
X_k - kth value of Intensity vector
Y_ij - Input Image Pixel
P, Q - Feature Signatures

b) Calculate the Similarity Matrix

$$A = \frac{1}{1 + \sqrt{(Q_m - P_m)^2 + (Q_n - P_n)^2}}$$

where
A - Similarity Matrix
Q_m and Q_n - X and Y position of Q elements respectively
P_m and P_n - X and Y position of P elements respectively

c) Signature Quadratic form distance is defined as

$$M_k = \sqrt{(Y_{ij} \mid -X_k) \cdot A \cdot (Y_{ij} \mid -X_k)^{T}}$$

where
A - Similarity Matrix
X - Intensity vector
X_k - kth value of Intensity vector
Y_ij - Input Image Pixel
M - Output vector
M_k - kth value of output vector (the SQFD value)

d) $$Z_{ij} = \min(M)$$

where
M - Output Vector
Z_ij - Output image pixel

Initially, the Radiotherapy Structure Set (RTSS) file is read from the set of DICOM images. All the necessary organs have already been contoured by the medical expert, and the manually segmented organs are generated from the extracted contour data. The contrast of the input image is enhanced using the dicom contrast technique. In this algorithm, the initial cluster step value is chosen either manually or randomly. Cluster elements are generated for the cluster vector based on the cluster step. The distance between all cluster elements and every pixel value is then calculated individually, and each pixel value is replaced with the best cluster element by evaluating the minimum distance. Different distance methods are used to calculate the distance between the cluster elements and every pixel.
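Steps 5 to 9 of the algorithm can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' Matlab implementation: the helper names `cluster_elements` and `segment`, the evenly spaced intensity levels used as the cluster vector, the Minkowski order p = 3 and the toy image are all hypothetical choices that the paper does not specify. Because each pixel intensity and each cluster element is a single number, the four coordinate-wise distances all reduce to the absolute difference between them, so they select the same cluster element.

```python
import math

def cluster_elements(lowest, highest, step):
    """Assumed reading of Step 7: evenly spaced intensity levels."""
    return list(range(lowest, highest + 1, step))

# Scalar versions of the four distances between a cluster element x_k
# and a pixel intensity y.
def euclidean(x_k, y):
    return math.sqrt((x_k - y) ** 2)

def manhattan(x_k, y):
    return abs(x_k - y)

def minkowski(x_k, y, p=3):  # p = 3 is an arbitrary illustrative order
    return (abs(x_k - y) ** p) ** (1.0 / p)

def chebyshev(x_k, y):
    return max([abs(x_k - y)])  # maximum over a single coordinate

def segment(image, elements, distance):
    """Steps 8-9: replace each pixel by the cluster element at minimum distance."""
    output = [row[:] for row in image]  # Step 5: clone the input image
    for i, row in enumerate(image):
        for j, y in enumerate(row):
            output[i][j] = min(elements, key=lambda x_k: distance(x_k, y))
    return output

image = [[0, 12, 30], [49, 51, 100]]     # toy 2 x 3 "slice"
elements = cluster_elements(0, 100, 50)  # [0, 50, 100]
results = [segment(image, elements, d)
           for d in (euclidean, manhattan, minkowski, chebyshev)]
print(results[0])                             # [[0, 0, 50], [50, 50, 100]]
print(all(r == results[0] for r in results))  # True
```

On this toy image all four measures give the same output. The sketch makes no claim about the authors' data, but it shows why scalar intensities alone leave little room for these measures to disagree.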

4. Experimental Results and Discussion

Experiments were carried out on 100 different tumour patients, with 100 to 1000 slices of computer tomography images, using the different segmentation algorithms. The image format is DICOM (Digital Imaging and Communications in Medicine). The algorithm was implemented in the Matlab environment. Manual segmentation was done by the experts. Experimental results for the images are illustrated here.
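The quantitative parameters used in the evaluation (sensitivity, specificity, accuracy and number of fragments) are not defined explicitly in the text. The Python sketch below shows one conventional way to compute them by comparing an auto-segmented binary mask with the manually contoured mask. The definitions (sensitivity = TP / (TP + FN), specificity = TN / (TN + FP), accuracy = (TP + TN) / total), the reading of a fragment as a 4-connected foreground region, and all function names are assumptions for illustration. The functions return fractions between 0 and 1.

```python
def confusion_counts(auto_mask, manual_mask):
    """Count true/false positives and negatives pixel by pixel."""
    tp = fp = tn = fn = 0
    for a_row, m_row in zip(auto_mask, manual_mask):
        for a, m in zip(a_row, m_row):
            if a and m:
                tp += 1
            elif a and not m:
                fp += 1
            elif m:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn

def sensitivity(tp, fp, tn, fn):
    return tp / (tp + fn)

def specificity(tp, fp, tn, fn):
    return tn / (tn + fp)

def accuracy(tp, fp, tn, fn):
    return (tp + tn) / (tp + fp + tn + fn)

def count_fragments(mask):
    """Number of 4-connected foreground regions (assumed meaning of 'fragment')."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                count += 1
                seen[r][c] = True
                stack = [(r, c)]
                while stack:  # flood fill one region
                    i, j = stack.pop()
                    for ni, nj in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                        if (0 <= ni < rows and 0 <= nj < cols
                                and mask[ni][nj] and not seen[ni][nj]):
                            seen[ni][nj] = True
                            stack.append((ni, nj))
    return count

# Hypothetical 3 x 3 masks: 1 = organ pixel, 0 = background.
manual = [[1, 1, 0], [0, 0, 0], [0, 0, 1]]
auto = [[1, 0, 0], [0, 0, 0], [0, 1, 1]]
counts = confusion_counts(auto, manual)
print(counts)                 # (2, 1, 5, 1)
print(count_fragments(auto))  # 2
```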

Figure 1. Segmentation output using different distance measures: (a) input CT image, (b) Euclidean segmentation, (c) Manhattan segmentation, (d) Minkowski segmentation, (e) Chebychev segmentation, (f) SQFD segmentation.

Table 1. Performance analysis of Euclidean Distance, Manhattan Distance, Minkowski Distance, Chebychev Distance, Signature Quadratic form Distance Segmentation algorithms

Distance measures (one shared set of values): Euclidean Distance, Manhattan Distance, Minkowski Distance, Chebychev Distance, Signature Quadratic form distance

Organs         Sensitivity    Specificity    Accuracy    No of Fragments
Lung           93.04          99.93          89.68       9
Liver          97.30          99.31          99.25       39
Heart          88.45          99.93          99.67       15
Spinal Cord    98.12          99.33          99.33       11
Bones          67.32          99.98          99.92       3

The main objective of this paper is to study the performance of different distance metrics. We carried out the experiments by applying all the distance measures to segment computer tomography images. Table 1 and Figure 1 show the results of all experiments. In our experimental evaluation we found that the sensitivity, specificity, accuracy and number of fragments are the same for all distance measures for all organs. This also shows that no single distance measure is best for segmenting the image.

5. Conclusion

Selecting a distance metric plays a very important role in clustering and is also a very challenging task. The main aim of our study was to determine good distance metrics for clustering the images. Traditionally, Euclidean distance is used in clustering algorithms. In this paper we have implemented the Euclidean, Manhattan, Minkowski, Chebychev and Signature Quadratic Form distance measures on a real-time data set of computer tomography images. The results are exactly the same, and the segmentation output is also the same, for all 5 metrics. The result can vary based on the task, the amount of data and the complexity of the task. There is no universal distance measure that is best suited to all clustering applications, and in our observation, too, none of the metrics is the best metric for medical image segmentation. Researchers can use any distance measure based on their application with respect to clustering.

Acknowledgement

The authors would like to thank Dr. M. Hemalatha, Professor, Department of Computer Science, Karpagam University, for her valuable suggestions, comments and words of encouragement, which helped to make this research work successful.

References

[1] Andras Hajdu, Janos Kormos, Benedek Nagy and Zoltan Zorgo, "Choosing appropriate distance measurement in digital image segmentation", 2004.

[2] Archana Singh, Avantika Yadav, Ajay Rana, "K-means with Three different Distance Metrics", International Journal of Computer Applications, Volume 67, No. 10, April 2013.

[3] Beecks C., Uysal M.S., Seidl T., "Signature Quadratic Form Distances for Content-Based Similarity", in Proceedings of the ACM International Conference on Multimedia, 2009, pp. 697-700.

[4] Beecks C., Uysal M.S., Seidl T., "Signature Quadratic Form Distances for Content-based Similarity", ACM CVIR 2010.

[5] Beecks C., Uysal M.S., Seidl T., "A Comparative Study of Similarity Measures for Content-Based Multimedia Retrieval", in Proceedings of the IEEE International Conference on Multimedia & Expo, pp. 1552-1557, 2010.

[6] Christian Beecks, Anca Maria Ivanescu, Steffen Kirchhoff and Thomas Seidl, "Modeling Image Similarity by Gaussian Mixture Models and the Signature Quadratic Form Distance", IEEE International Conference on Computer Vision, 2011.

[7] Christian Beecks, Jakub Lokoc, Thomas Seidl, Tomas Skopal, "Indexing the Signature Quadratic Form Distance for Efficient Content-Based Multimedia Retrieval", ICMR '11, April 17-20, 2011.

[8] Daniel P. Huttenlocher, Gregory A. Klanderman and William J. Rucklidge, "Comparing Images Using the Hausdorff Distance", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 15, No. 9, September 1993.

[9] Gomathi V.V., Karthikeyan S., "A Proposed Hybrid Medoid Shift with K-Means (HMSK) Segmentation Algorithm to Detect Tumor and Organs for Effective Radiotherapy", International Conference on Mining Intelligence and Knowledge Exploration, Lecture Notes in Computer Science (Springer), Vol. 8284, Dec 18-20, 2013, pp. 139-147.

[10] Hsiang-Chuan Liu, Bai-Cheng Jeng, Jeng-Ming Yih and Yen-Kuei Yu, "Fuzzy C-Means Algorithm Based on Standard Mahalanobis Distances", Proceedings of the International Symposium on Information Processing (ISIP'09), August 21-23, 2009, pp. 422-427.

[11] Linda G. Shapiro and George C. Stockman, "Computer Vision", New Jersey, Prentice-Hall, ISBN 0-13-030796-3, pp. 279-325.

[12] Liwei Wang, Yan Zhang, Jufu Feng, "On the Euclidean Distance of Images", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 27, No. 8, Aug. 2005, pp. 1334-1339.

[13] Luh Yen, Denis Vanvyve, Fabien Wouters, Francois Fouss, "Clustering using a random walk based distance measure", European Symposium on Artificial Neural Networks, Bruges, ISBN 2-930307-05-6, April 2005.

[14] Modh Jigar S., Shah Brijesh, Shah Satish K., "A New K-mean Color Image Segmentation with Cosine Distance for Satellite Images", International Journal of Engineering and Advanced Technology (IJEAT), ISSN: 2249-8958, Volume 1, Issue 5, June 2012.

[15] Mohamed Jafar O.A., Sivakumar R., "A Comparative Study of Hard and Fuzzy Data Clustering Algorithms with Cluster Validity Indices", Proceedings of the International Conference on Emerging Research in Computing, Information, Communication and Application, Elsevier Publication, 2013.

[16] Peter Grabusts, "The Choice of Metrics for Clustering Algorithms", Proceedings of the 8th International Scientific and Practical Conference, Volume 2, 2011.

[17] Selvarasu N., Alamelu Nachiappan and Nandhitha N.M., "Euclidean Distance Based Color Image Segmentation of Abnormality Detection from Pseudo Color Thermographs", International Journal of Computer Theory and Engineering, Vol. 2, No. 4, August 2010.

[18] Soni Madhulatha T., "An Overview on Clustering Methods", IOSR Journal of Engineering, Apr. 2012, Vol. 2(4), pp. 719-725.

[19] Sourav Paul, Mousumi Gupta, "Image Segmentation by Self Organizing Map with Mahalanobis Distance", International Journal of Emerging Technology and Advanced Engineering, Volume 3, Issue 2, February 2013.

[20] Sung-Hyuk Cha, "Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions", International Journal of Mathematical Models and Methods in Applied Sciences, Issue 4, Volume 1, 2007.

[21] Tsai A., Wells W., Tempany C., Grimson E., Willsky A., "Mutual information in coupled multi-shape model for medical image segmentation", Medical Image Analysis (Elsevier), 2004, pp. 429-445.

[22] Vadivel A., Majumdar A.K., Shamik Sural, "Performance comparison of distance metrics in content-based Image retrieval applications".