
K-Means Clustering and Affinity Clustering based on Heterogeneous Transfer Learning

    Shailendra Kumar Shrivastava, Dr. J. L. Rana, and Dr. R.C. Jain

Abstract - Heterogeneous transfer learning aims to extract knowledge from one or more tasks in one feature space and apply this knowledge to a target task in another feature space. In this paper two clustering algorithms, K-Means clustering and affinity clustering, both based on heterogeneous transfer learning (HTL), are proposed. Both algorithms use annotated image data sets. K-Means based on HTL first finds the cluster centroids of the text (annotations) by K-Means; these text centroids are then used to initialize the centroids in image clustering by K-Means. The second algorithm, affinity clustering based on HTL, first finds the exemplars of the annotations, and these exemplars are then used to initialize the similarity matrix of the image data set before finding the clusters. F-Measure and Purity scores increase and Entropy scores decrease in both algorithms, and the clustering accuracy of affinity clustering based on HTL is better than that of K-Means based on HTL.

Key words - Heterogeneous transfer learning, clustering, affinity propagation, K-Means, feature space.

1 INTRODUCTION

In the literature [1], machine learning is defined as: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."

However, many machine learning methods work well only under the assumption that the training data and testing data are drawn from the same feature space. If the feature space differs between training and testing data, most statistical models will not work. In this case one needs to recollect training and testing data in the same feature space and rebuild the model, but this is expensive and difficult. In such cases transfer learning [3] between task domains is desirable. Transfer learning allows the domains, tasks, and distributions used in training and testing to be different. In heterogeneous transfer learning, knowledge is transferred across domains or tasks that have different feature spaces, e.g., classifying web pages in Chinese using training documents in English [4]. Probabilistic latent semantic analysis (PLSA) [5] has been used to cluster images with the help of their annotations (text). Transfer learning in machine learning [2] has already achieved significant success in many knowledge engineering areas, including classification, regression, and clustering.

Clustering is a fundamental task in computerized data analysis. It is concerned with the problem of partitioning a collection of data points into groups/categories using unsupervised learning techniques. Data points in a group are similar to one another; such groups are called clusters [6][7][8].

In this paper two algorithms, K-Means [8][9] based on heterogeneous transfer learning and affinity clustering based on heterogeneous transfer learning, are proposed. Affinity propagation (AP) [6] is a clustering algorithm that, for a given set of similarities (also denoted affinities) between pairs of data points, partitions the data by passing messages among the data points. Each partition is associated with a prototypical point (exemplar) that best describes that cluster, and AP associates each data point with one such prototype. Thus, the objective of AP is to maximize the overall sum of similarities between data points and their representatives. K-Means starts with a random initial partition and keeps reassigning patterns to clusters, based on the similarity between each pattern and the cluster centroids, until a convergence criterion is met.

An annotated image data set has two feature spaces: the first is the text feature space, the other is the image feature space. To transfer knowledge of the text feature space into the image feature space, we first find the centroids of the annotations by K-Means. Corresponding to the text (annotation) centroids, image centroids become available. Next we take the complete image data set, assign each image to a centroid on the basis of minimum Euclidean distance, and finally apply K-Means to generate the image clusters. In affinity clustering based on HTL we use the text annotations of the images to find exemplars by affinity propagation clustering. To transfer the knowledge from the text feature space to the image feature space, we initialize the diagonal of the image similarity matrix using the exemplars of the text clustering and then generate the image clusters from the image similarity matrix by affinity propagation clustering.

The remainder of this paper is organized as follows. Section 2 gives a brief overview of transfer learning, the K-Means clustering algorithm, the original affinity propagation algorithm, and the vector space model. Section 3 describes the main idea and details of our proposed algorithms. Section 4 discusses the experimental results and evaluations. Section 5 provides concluding remarks and future directions.

Shailendra Kumar Shrivastava is with the Department of Information Technology, Samrat Ashok Technological Institute, Vidisha, M.P. 464001, India.

Dr. J. L. Rana, Ex-Head of the Department of Computer Science & Engineering, was with M.A.N.I.T., Bhopal, India.

Dr. R. C. Jain, Director, is with the Samrat Ashok Technological Institute, Vidisha, M.P. 464001, India.





    2 RELATED WORKS

Before going into the details of the proposed K-Means based on heterogeneous transfer learning and affinity clustering based on heterogeneous transfer learning algorithms, some works closely related to this paper are briefly reviewed: transfer learning, the K-Means clustering algorithm, the affinity propagation algorithm, and the vector space model.

    2.1 Transfer Learning

Machine learning methods work well only under the common assumption that the training and test data come from the same feature space and the same distribution. When the distribution changes, most statistical models need to be rebuilt from scratch using newly collected training data. In many real-world applications it is expensive or impossible to re-collect the needed training data and rebuild the models, so it would be desirable to reduce the need and effort to re-collect training data. In such cases knowledge transfer, or transfer learning [3], between task domains is desirable. Transfer learning has three main research issues: (1) what to transfer, (2) how to transfer, and (3) when to transfer. In the inductive transfer learning setting, the target task is different from the source task, no matter whether the source and target domains are the same or not. In the transductive transfer learning setting, the source and target tasks are the same, while the source and target domains are different. In the unsupervised transfer learning setting, similar to the inductive setting, the target task is different from but related to the source task. In heterogeneous transfer learning, knowledge is transferred across domains or tasks that have different feature spaces.

    2.2 K-Means Clustering Algorithm

K-Means [8][9] is one of the best known and most popular clustering algorithms. It seeks an optimal partition of the data by minimizing the sum-of-squared-error criterion with an iterative optimization procedure. The K-Means clustering procedure is as follows.

1. Initialize a K-partition randomly or based on some prior knowledge, and calculate the cluster prototype matrix $M = [m_1, \dots, m_K]$.

2. Assign each object in the data set to the nearest cluster $C_k$.

3. Recalculate the cluster prototype matrix based on the current partition:

$$m_k = \frac{1}{N_k} \sum_{x_i \in C_k} x_i \qquad (1)$$

where $N_k$ is the number of objects in cluster $C_k$.

4. Repeat steps 2 and 3 until there is no change in any cluster.

The major problem with this algorithm is that it is sensitive to the selection of the initial partition.
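As a concrete illustration, the procedure above can be sketched in a few lines of Python with NumPy. This is a minimal sketch of the standard algorithm, not the authors' implementation; the data matrix X, the random seed, and the iteration cap are assumptions of the example.

```python
import numpy as np

def kmeans(X, K, max_iter=100, seed=0):
    """Minimal K-Means sketch; X is an (n, d) data matrix, K the number of clusters."""
    rng = np.random.default_rng(seed)
    # Step 1: initialize the prototype matrix M = [m_1, ..., m_K] with random points.
    M = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(max_iter):
        # Step 2: assign each object to the nearest prototype (squared Euclidean distance).
        labels = np.argmin(((X[:, None, :] - M[None, :, :]) ** 2).sum(-1), axis=1)
        # Step 3: recalculate the prototypes from the current partition (equation 1).
        new_M = np.array([X[labels == k].mean(axis=0) if np.any(labels == k) else M[k]
                          for k in range(K)])
        # Step 4: stop when the prototypes no longer change.
        if np.allclose(new_M, M):
            break
        M = new_M
    return labels, M

# Example: 100 random 2-D points into K = 3 clusters.
labels, centroids = kmeans(np.random.rand(100, 2), K=3)
```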

    2.3 Affinity Clustering Algorithm

The affinity clustering algorithm [10][11][12] is based on message passing among data points. Each data point receives availability messages from candidate exemplars and sends responsibility messages to them; the sum of the responsibilities and availabilities for each data point identifies the exemplars. After the exemplars have been identified, the data points are assigned to them to form the clusters. The steps of the affinity clustering algorithm are as follows.

1. Initialize the availabilities to zero: $a(i,k) = 0$.

2. Update the responsibilities by the equation

$$r(i,k) \leftarrow s(i,k) - \max_{k' \neq k} \{ a(i,k') + s(i,k') \} \qquad (2)$$

where $s(i,k)$ is the similarity of data point $i$ and candidate exemplar $k$.

3. Update the availabilities by the equation

$$a(i,k) \leftarrow \min \Big\{ 0,\; r(k,k) + \sum_{i' \notin \{i,k\}} \max\{0, r(i',k)\} \Big\} \qquad (3)$$

and update the self-availabilities by the equation

$$a(k,k) \leftarrow \sum_{i' \neq k} \max\{0, r(i',k)\} \qquad (4)$$

4. For each data point $i$, compute the sum $a(i,k) + r(i,k)$ and find the value of $k$ that maximizes it to identify its exemplar.

5. If the exemplars do not change for a fixed number of iterations, go to step 6; otherwise go to step 2.

6. Assign the data points to the exemplars on the basis of maximum similarity to find the clusters.
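The updates (2)-(4) vectorize naturally. The sketch below is a bare-bones version of affinity propagation with the usual damping factor; the similarity matrix S (with the preferences already on its diagonal), the damping value, and the iteration cap are assumptions of the example, not choices made in the paper.

```python
import numpy as np

def affinity_propagation(S, max_iter=200, damping=0.5):
    """Bare-bones affinity propagation; S is an (n, n) similarity matrix
    whose diagonal already holds the preferences."""
    n = S.shape[0]
    A = np.zeros((n, n))  # availabilities a(i, k), initialized to zero
    R = np.zeros((n, n))  # responsibilities r(i, k)
    for _ in range(max_iter):
        # Responsibility update (equation 2): subtract, for each point i, the
        # largest a(i,k') + s(i,k') over k' != k (first/second-max trick).
        AS = A + S
        idx = np.argmax(AS, axis=1)
        first_max = AS[np.arange(n), idx]
        AS[np.arange(n), idx] = -np.inf
        second_max = AS.max(axis=1)
        Rnew = S - first_max[:, None]
        Rnew[np.arange(n), idx] = S[np.arange(n), idx] - second_max
        R = damping * R + (1 - damping) * Rnew
        # Availability update (equations 3 and 4).
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, R.diagonal())            # keep r(k,k) itself
        Anew = Rp.sum(axis=0)[None, :] - Rp           # r(k,k) + sum of positive r(i',k)
        dA = Anew.diagonal().copy()                   # a(k,k): sum of positive r(i',k), i' != k
        Anew = np.minimum(Anew, 0)
        np.fill_diagonal(Anew, dA)
        A = damping * A + (1 - damping) * Anew
    # The exemplar of point i is the k maximizing a(i,k) + r(i,k).
    return np.argmax(A + R, axis=1)
```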

    2.4 Vector Space Model

The vector space model (VSD) [13] is used to represent text documents. In the VSD, each document $d$ is considered as a vector in the $M$-dimensional term (word) space, and the tf-idf weighting scheme is used, so each document is represented by the following equation:

$$\vec{d} = [w(d,1), w(d,2), \dots, w(d,M)] \qquad (5)$$

where $M$ is the number of distinct terms (words), and

$$w(d,i) = (1 + \log tf(d,i)) \cdot \log(1 + N/df(i)) \qquad (6)$$

where $tf(d,i)$ is the frequency of the $i$-th term in document $d$ and $df(i)$ is the number of documents containing the $i$-th term. The inverse document frequency (idf) is defined as the logarithm of the ratio of the number of documents $N$ to the number of documents containing the given word, $df(i)$.
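As an illustration of equations (5) and (6), the sketch below builds tf-idf document vectors. It works at the level of single words for brevity, whereas the algorithms in Section 3 use phrases mined from a suffix tree; the toy documents are assumptions of the example.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build tf-idf vectors per equations (5) and (6).
    docs: list of token lists; returns one {term: weight} dict per document."""
    N = len(docs)   # number of documents
    df = Counter()  # df(i): number of documents containing term i
    for doc in docs:
        df.update(set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # tf(d, i): frequency of term i in this document
        # w(d, i) = (1 + log tf(d, i)) * log(1 + N / df(i))
        vectors.append({t: (1 + math.log(c)) * math.log(1 + N / df[t])
                        for t, c in tf.items()})
    return vectors

# Example: three tiny "documents", already tokenized.
vecs = tfidf_vectors([["sky", "sea"], ["sky", "sun"], ["sun", "sand"]])
```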

3 CLUSTERING BASED ON HETEROGENEOUS TRANSFER LEARNING

In this section, two clustering algorithms based on heterogeneous transfer learning are proposed. The first is



K-Means clustering based on heterogeneous transfer learning, and the second is affinity propagation clustering based on heterogeneous transfer learning.

3.1 K-Means Clustering based on Heterogeneous Transfer Learning

K-Means clustering based on heterogeneous transfer learning extends K-Means. An annotated image data set is used in the simulation studies, from which both annotation features (text feature space) and image features (image feature space) are computed. K-Means clustering is applied to the text annotations of the images to find centroids; to transfer knowledge from one task to the other, the centroids in image clustering are then initialized from the centroids obtained in text clustering. For text clustering a phrase-based VSD [13] is used: in the vector space model the weights w(d, i), term frequencies, and document frequencies are normally calculated per word, but here phrases are used instead of words, so the model may be called a phrase-based vector space model. Phrase (term) frequencies and document frequencies can be calculated with a suffix tree; here the document frequency of a phrase is the number of documents that contain the phrase. The centroids of the annotations are generated by the K-Means algorithm with the VSD as input, and K-Means clustering is then applied to the image data set with the centroids initialized to those obtained in text clustering. The proposed K-Means clustering algorithm based on heterogeneous transfer learning can be written as follows (a code sketch follows the listing).

1. Input annotations (text) for clustering.
2. Text preprocessing: remove all stop words and perform word stemming.
3. Find the words and assign a unique number to each word.
4. Convert the text into sequences of numbers.
5. Construct a suffix tree using Ukkonen's algorithm.
6. Calculate the phrase (term) frequencies from the suffix tree.
7. Calculate the document frequencies of the phrases from the suffix tree.
8. Construct the vector space model of the text using phrases.
9. Apply K-Means to the VSD.
10. Initialize the centroids in the image domain from the centroids obtained in text clustering.
11. Apply K-Means to the image data set to find the clusters.
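A compact sketch of steps 9-11 is given below, reusing the kmeans function from the sketch in Section 2.2. The paper does not spell out how a text centroid becomes an image centroid, so the mapping used here, averaging the image vectors of the documents assigned to each text cluster, is an assumption of the sketch.

```python
import numpy as np

def kmeans_htl(text_vsd, image_feats, K):
    """K-Means based on HTL (sketch): cluster the annotations, then seed the
    image-space centroids from the resulting partition.
    text_vsd:    (n, t) phrase-based VSD matrix of the annotations
    image_feats: (n, d) feature matrix of the corresponding images
    """
    # Step 9: cluster the text side (kmeans is the function from the Section 2.2 sketch).
    text_labels, _ = kmeans(text_vsd, K)
    # Step 10 (assumed mapping): the seed centroid of each image cluster is the
    # mean image vector of the images whose annotations fell in that text cluster.
    seeds = np.array([image_feats[text_labels == k].mean(axis=0)
                      if np.any(text_labels == k) else image_feats.mean(axis=0)
                      for k in range(K)])
    # Step 11: run K-Means on the image features starting from these seeds
    # (nearest-seed assignment by Euclidean distance, then the usual iteration).
    labels = np.argmin(((image_feats[:, None, :] - seeds[None, :, :]) ** 2).sum(-1), axis=1)
    for _ in range(100):
        seeds = np.array([image_feats[labels == k].mean(axis=0)
                          if np.any(labels == k) else seeds[k] for k in range(K)])
        new = np.argmin(((image_feats[:, None, :] - seeds[None, :, :]) ** 2).sum(-1), axis=1)
        if np.array_equal(new, labels):
            break
        labels = new
    return labels
```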

3.2 Affinity Clustering based on Heterogeneous Transfer Learning

Affinity clustering based on heterogeneous transfer learning extends affinity propagation clustering. An annotated image data set is used, in which the annotations (text feature space) and the images (image feature space) form the starting point. Affinity clustering is applied to the text annotations of the images to find exemplars; to transfer knowledge from one task to the other, the diagonal values of the similarity matrix of the image data set are then assigned on the basis of the exemplars of the text clustering. For text clustering the same phrase-based VSD as in Section 3.1 is used, and this model is used to compute the cosine similarity [15]. The similarity of two documents $d_i$ and $d_j$, each represented by equation (5), is calculated by equation (7):

$$sim(d_i, d_j) = \frac{\vec{d_i} \cdot \vec{d_j}}{|\vec{d_i}|\,|\vec{d_j}|} \qquad (7)$$

The self-similarity (preference) [9] is found from equation (8), which sets a common value, the median of the pairwise similarities, for every document:

$$sim(d_k, d_k) = \operatorname*{median}_{i \neq j} \, sim(d_i, d_j), \quad 1 \le k \le N \qquad (8)$$

The affinity propagation clustering algorithm is applied to generate the exemplars. The features of the image data set are then extracted to make its feature vector space, and the similarity matrix is computed from the image vectors. The diagonal values of the image-domain similarity matrix are assigned on the basis of the exemplars of the text clustering, which transfers the knowledge from one domain to the other, and the exemplars/clusters are generated by the affinity propagation clustering algorithm. The proposed algorithm can be written as follows (a code sketch follows the listing).

1. Input annotations (text) for clustering.
2. Text preprocessing: remove all stop words and perform word stemming.
3. Find the words and assign a unique number to each word.
4. Convert the text into sequences of numbers.
5. Construct a suffix tree using Ukkonen's algorithm.
6. Calculate the phrase (term) frequencies from the suffix tree.
7. Calculate the document frequencies of the phrases from the suffix tree.
8. Construct the vector space model of the text using phrases.
9. Find the phrase-based similarity matrix of the documents from the vector space model by equation (7).
10. Assign the preferences in the similarity matrix by equation (8).
11. Initialize the availabilities to zero: a(i,k) = 0.
12. Update the responsibilities by equation (2).
13. Update the availabilities by equation (3).
14. Update the self-availabilities by equation (4).
15. For each data point i, compute the sum a(i,k) + r(i,k) and find the value of k that maximizes it to identify the exemplars.



16. If the exemplars do not change for a fixed number of iterations, go to step 17; otherwise go to step 12.
17. Extract feature vectors from the image data set.
18. Find the similarity matrix from the image feature vectors.
19. Transfer the knowledge from the text feature space to the image feature space by initializing the diagonal of the image similarity matrix from the text exemplars.
20. Initialize the availabilities to zero: a(i,k) = 0.
21. Update the responsibilities by equation (2).
22. Update the availabilities by equation (3).
23. Update the self-availabilities by equation (4).
24. For each data point i, compute the sum a(i,k) + r(i,k) and find the value of k that maximizes it to identify the exemplars.
25. If the exemplars do not change for a fixed number of iterations, go to step 26; otherwise go to step 21.
26. Assign the data points to the exemplars on the basis of maximum similarity to find the clusters.
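A sketch of the transfer step (steps 16-26) is given below, reusing the affinity_propagation function from the sketch in Section 2.3. The paper does not spell out the exact rule for seeding the diagonal, so the rule used here, a high preference for images whose annotations were chosen as text exemplars and the median similarity for the rest, is an assumption of the sketch, as is the use of negative squared Euclidean distance for image similarities.

```python
import numpy as np

def ap_htl(text_sim, image_feats):
    """Affinity clustering based on HTL (sketch).
    text_sim:    (n, n) phrase-based similarity matrix of the annotations,
                 with preferences already on its diagonal (equation 8)
    image_feats: (n, d) feature matrix of the corresponding images
    """
    # Steps 10-16: run AP on the text side to find the exemplar annotations
    # (affinity_propagation is the function from the Section 2.3 sketch).
    text_exemplars = affinity_propagation(text_sim)
    exemplar_idx = np.unique(text_exemplars)
    # Steps 17-18: image similarities as negative squared Euclidean distance.
    diff = image_feats[:, None, :] - image_feats[None, :, :]
    S = -(diff ** 2).sum(-1)
    # Step 19 (assumed rule): seed the diagonal from the text exemplars --
    # text-exemplar images get a high preference, the rest the median similarity.
    off_diag = S[~np.eye(len(S), dtype=bool)]
    np.fill_diagonal(S, np.median(off_diag))
    S[exemplar_idx, exemplar_idx] = off_diag.max()
    # Steps 20-26: run AP on the seeded image similarity matrix.
    return affinity_propagation(S)
```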

4 EXPERIMENTAL RESULTS AND EVALUATION

In this section, the results and evaluation of a set of experiments are presented to verify the effectiveness and efficiency of the proposed clustering algorithms. The evaluation parameters are F-Measure, Purity, and Entropy. Experiments were performed on data sets constructed from the Caltech-256 corpus [14]. The evaluation parameters, the data sets, and the results are discussed in turn.

4.1 Evaluation Parameters [15]

For ready reference, the definitions and formulas of F-Measure, Purity, and Entropy are given below.

4.1.1 F-Measure

F-Measure combines precision and recall. Let $C = \{c_1, \dots, c_{|C|}\}$ be the clusters of a data set $D$ of $N$ documents, and let $L = \{l_1, \dots, l_{|L|}\}$ be the correct classes of $D$. The recall of cluster $j$ with respect to class $i$ is defined as

$$Recall(i,j) = \frac{|c_j \cap l_i|}{|l_i|}$$

and the precision of cluster $j$ with respect to class $i$ is defined as

$$Precision(i,j) = \frac{|c_j \cap l_i|}{|c_j|}$$

The F-Measure of cluster $j$ and class $i$ combines precision and recall as follows:

$$F(i,j) = \frac{2 \cdot Precision(i,j) \cdot Recall(i,j)}{Precision(i,j) + Recall(i,j)}$$

The F-Measure for the overall quality of the cluster set $C$ is defined by

$$F = \sum_{i} \frac{|l_i|}{N} \max_{j=1,\dots,|C|} F(i,j)$$

4.1.2 Purity

Purity indicates the percentage of dominant-class members in a given cluster. For measuring the overall clustering purity, the weighted average purity is used, given by the following equation:

$$Purity = \sum_{j} \frac{|c_j|}{N} \max_{i=1,\dots,|L|} \frac{|c_j \cap l_i|}{|c_j|}$$

4.1.3 Entropy

Entropy indicates the homogeneity of a cluster: the higher the homogeneity of a cluster, the lower its entropy should be, and vice versa. Like the weighted F-Measure and weighted Purity, a weighted entropy is used, given by the following equation:

$$Entropy = \sum_{j} \frac{|c_j|}{N} \Big( -\frac{1}{\log |L|} \sum_{i} p_{ij} \log p_{ij} \Big)$$

where $p_{ij}$ is the probability that a member of cluster $c_j$ belongs to class $l_i$.

To sum up, we would like to maximize the F-Measure and Purity scores and minimize the Entropy score of a clustering to achieve high quality.
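For reference, the three scores can be computed directly from the class/cluster contingency table. The sketch below is a straightforward reading of the formulas above; the integer label arrays labels_true and labels_pred are assumptions of the example.

```python
import numpy as np

def clustering_scores(labels_true, labels_pred):
    """Weighted F-Measure, Purity, and normalized Entropy of a clustering."""
    classes, clusters = np.unique(labels_true), np.unique(labels_pred)
    N = len(labels_true)
    # n[i, j] = number of documents of class i that landed in cluster j
    n = np.array([[np.sum((labels_true == c) & (labels_pred == k))
                   for k in clusters] for c in classes], dtype=float)
    class_sz, cluster_sz = n.sum(axis=1), n.sum(axis=0)
    recall = n / class_sz[:, None]        # Recall(i, j)
    precision = n / cluster_sz[None, :]   # Precision(i, j)
    with np.errstate(divide="ignore", invalid="ignore"):
        f = np.nan_to_num(2 * precision * recall / (precision + recall))
    f_measure = np.sum(class_sz / N * f.max(axis=1))  # class-size-weighted max F(i, j)
    purity = np.sum(n.max(axis=0)) / N                # dominant class per cluster
    p = n / cluster_sz[None, :]                       # p_ij within each cluster
    plogp = np.where(p > 0, p * np.log(np.where(p > 0, p, 1)), 0.0)
    entropy = np.sum(cluster_sz / N * (-plogp.sum(axis=0) / np.log(len(classes))))
    return f_measure, purity, entropy

# Example: 3 true classes against 3 predicted clusters.
print(clustering_scores(np.array([0, 0, 1, 1, 2, 2]), np.array([0, 0, 1, 2, 2, 2])))
```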

4.2 Data Set Preparation

Image data sets of 100, 300, 500, and 800 images have been constructed, with the images randomly chosen from Caltech-256. Manually annotated text files were created for each data set.

4.3 Experimental Results Discussion

Extensive experiments were carried out to show the effectiveness of the proposed algorithms, combining annotations and images as follows: no annotations for each image-set size; 100 annotations with 100, 300, 500, and 800 images; 300 annotations with 300, 500, and 800 images; and 500 annotations with 500 and 800 images. The results of the experiments are given in Table 1, Table 2, and Table 3. It can be observed from Figs. 1-6 that in both algorithms the F-Measure, Purity, and Entropy scores vary with the number of annotations and the number of images; in both algorithms the F-Measure and Purity scores are at their maximum, and the Entropy score at its minimum, at the optimum number of annotations. For comparison, K-Means clustering based on HTL and affinity clustering based on HTL are plotted together in Figs. 7-9, from which it is observed that the F-Measure and Purity scores are larger, and the Entropy score smaller, for affinity clustering based on HTL.



Fig 1: Variation of F-Measure scores with annotations (text) in K-Means clustering based on heterogeneous transfer learning

Fig 2: Variation of Purity scores with annotations (text) in K-Means clustering based on heterogeneous transfer learning

Fig 3: Variation of Entropy scores with annotations (text) in K-Means clustering based on heterogeneous transfer learning

Fig 4: Variation of F-Measure scores with annotations (text) in affinity clustering based on heterogeneous transfer learning

Fig 5: Variation of Purity scores with annotations (text) in affinity clustering based on heterogeneous transfer learning

Fig 6: Variation of Entropy scores with annotations (text) in affinity clustering based on heterogeneous transfer learning



Number of     No. of Images   F-Measure,        F-Measure,
Annotations   in Data Set     AP based on HTL   K-Means based on HTL
0             100             0.30711           0.26873
100           100             0.43563           0.35208
0             300             0.25227           0.24254
100           300             0.42308           0.35364
300           300             0.24565           0.10109
0             500             0.18273           0.18823
100           500             0.41944           0.30234
300           500             0.28443           0.19764
500           500             0.19175           0.18912
0             800             0.18928           0.18064
100           800             0.40586           0.32492
300           800             0.35365           0.26969
500           800             0.16184           0.12764

Table 1: Comparison of F-Measure scores

Number of     No. of Images   Purity,           Purity,
Annotations   in Data Set     AP based on HTL   K-Means based on HTL
0             100             0.3700            0.2900
100           100             0.4800            0.3600
0             300             0.2800            0.2000
100           300             0.3907            0.2966
300           300             0.2700            0.2015
0             500             0.1980            0.1680
100           500             0.3362            0.2480
300           500             0.2000            0.1175
500           500             0.1287            0.1060
0             800             0.1900            0.1062
100           800             0.2875            0.2537
300           800             0.2025            0.1200
500           800             0.1912            0.1175

Table 2: Comparison of Purity scores

Number of     No. of Images   Entropy,          Entropy,
Annotations   in Data Set     AP based on HTL   K-Means based on HTL
0             100             0.75162           0.85679
100           100             0.60888           0.70140
0             300             0.80327           0.89764
100           300             0.68969           0.79095
300           300             0.78225           0.80882
0             500             0.80658           0.93903
100           500             0.69742           0.83917
300           500             0.77842           0.88506
500           500             0.79886           0.93907
0             800             0.87716           0.95362
100           800             0.74227           0.78226
300           800             0.78725           0.88942
500           800             0.86091           0.97506

Table 3: Comparison of Entropy scores

Fig 7: Comparison of F-Measure scores with annotations (text) for K-Means clustering based on HTL and AP based on HTL (800 images in the data set)

Fig 8: Comparison of Purity scores with annotations (text) for K-Means clustering based on HTL and AP based on HTL (800 images in the data set)

Fig 9: Comparison of Entropy scores with annotations (text) for K-Means clustering based on HTL and AP based on HTL (800 images in the data set)



5 CONCLUDING REMARKS AND FUTURE DIRECTIONS

In this paper two algorithms for clustering, K-Means clustering based on HTL and affinity clustering based on HTL, have been proposed. The clustering accuracy of K-Means based on HTL is better than that of plain K-Means, while affinity clustering based on HTL gives far better clustering accuracy than simple affinity propagation clustering. It is also concluded that the clustering accuracy of affinity clustering based on HTL is much better than that of K-Means based on HTL. Extensive experiments on many data sets show that the proposed affinity clustering based on HTL produces better clustering accuracy with less computational complexity.

There are a number of interesting potential avenues for future research. Affinity clustering based on HTL can be made hierarchical, the results of FAPML can be improved by redesigning it on the basis of HTL, and both algorithms can be applied to information retrieval.

REFERENCES

[1] T. M. Mitchell, Machine Learning, McGraw-Hill, 1997, pp. 1-414.
[2] E. Alpaydin, Introduction to Machine Learning, Prentice Hall of India, New Delhi, 2006, pp. 133-150.
[3] S. J. Pan and Q. Yang, "A Survey on Transfer Learning," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, October 2010, pp. 1345-1359.
[4] X. Ling, G.-R. Xue, W. Dai, Y. Jiang, Q. Yang, and Y. Yu, "Can Chinese Web Pages Be Classified with English Data Source?," Proceedings of the 17th International Conference on World Wide Web, Beijing, China, ACM, April 2008, pp. 969-978.
[5] Q. Yang, Y. Chen, G.-R. Xue, W. Dai, and Y. Yu, "Heterogeneous Transfer Learning for Image Clustering via the Social Web," ACL-IJCNLP 2009, pp. 1-9.
[6] R. Xu and D. C. Wunsch, Clustering, IEEE Press, 2009, pp. 1-282.
[7] A. Jain and R. Dubes, Algorithms for Clustering Data, Englewood Cliffs, NJ: Prentice Hall, 1988.
[8] A. K. Jain, M. N. Murty, and P. J. Flynn, "Data Clustering: A Review," ACM Computing Surveys, vol. 31, no. 3, September 1999, pp. 264-323.
[9] R. Xu and D. Wunsch, "Survey of Clustering Algorithms," IEEE Transactions on Neural Networks, vol. 16, no. 3, 2005, pp. 645-678.
[10] B. J. Frey and D. Dueck, "Clustering by Passing Messages Between Data Points," Science, vol. 315, 2007, pp. 972-976.
[11] K. Wang, J. Zhang, D. Li, X. Zhang, and T. Guo, "Adaptive Affinity Propagation Clustering," Acta Automatica Sinica, 2007, pp. 1242-1246.
[12] I. E. Givoni and B. J. Frey, "A Binary Variable Model for Affinity Propagation," Neural Computation, vol. 21, no. 6, June 2009, pp. 1589-1600.
[13] G. Salton, A. Wong, and C. S. Yang, "A Vector Space Model for Automatic Indexing," Communications of the ACM, vol. 18, no. 11, 1975, pp. 613-620.
[14] http://www.vision.caltech.edu/Image_Datasets/Caltech256/
[15] H. Chim and X. Deng, "Efficient Phrase-Based Document Similarity for Clustering," IEEE Transactions on Knowledge and Data Engineering, vol. 20, no. 9, 2008.

Shailendra Kumar Shrivastava, B.E. (C.T.), M.E. (CSE), is an Associate Professor in the Department of Information Technology, Samrat Ashok Technological Institute, Vidisha. He has more than 23 years of teaching experience and has published more than 50 research papers in national/international conferences and journals. His areas of interest are machine learning and data mining. He is a Ph.D. scholar at R.G.P.V. Bhopal.

Dr. J. L. Rana, B.E., M.E. (CSE), Ph.D. (CSE), was formerly Head of the Department of Computer Science and Engineering, M.A.N.I.T. Bhopal, M.P., India. He has more than 40 years of teaching experience. His areas of interest include data mining, image processing, and ad-hoc networks. He has many publications in international journals and conferences.

Dr. R. C. Jain, Ph.D., is the Director of Samrat Ashok Technological Institute, Vidisha, M.P., India. He has more than 35 years of teaching experience. His research interests include data mining, computer graphics, and image processing. He has published more than 250 research papers in international journals and conferences.
