
    SIMULATION
    http://sim.sagepub.com/

    Generalized Transport Mean Shift algorithm for ubiquitous intelligence
    Khamron Sunat, Panida Padungweang and Sirapat Chiewchanwattana
    SIMULATION 2012 88: 1202, originally published online 22 May 2012
    DOI: 10.1177/0037549712445233

    The online version of this article can be found at:
    http://sim.sagepub.com/content/88/10/1202

    Published by:

    http://www.sagepublications.com

    On behalf of:

    Society for Modeling and Simulation International (SCS)


    OnlineFirst Version of Record: May 22, 2012
    Version of Record: Oct 8, 2012

    Downloaded from sim.sagepub.com at Bibliotheques de l'Universite Lumiere Lyon 2 on November 4, 2012


    Simulation: Transactions of the Society for Modeling and Simulation International
    88(10) 1202–1215
    © 2012 The Society for Modeling and Simulation International
    DOI: 10.1177/0037549712445233
    sim.sagepub.com

    Generalized Transport Mean Shift algorithm for ubiquitous intelligence

    Khamron Sunat¹, Panida Padungweang² and Sirapat Chiewchanwattana¹

    Abstract

    Much research has recently been conducted on ubiquitous intelligent computing. Ubiquitous intelligence-enabled techniques, such as clustering and image segmentation, have focused on the development of intelligence methodologies. In this paper, a simultaneous mode-seeking and clustering algorithm called the Generalized Transport Mean Shift (GTMS) algorithm is introduced. The data points are assigned transporter–trailer roles, and this concept of transportation is used to eliminate the redundant computations of mode-seeking algorithms. The time complexity of the GTMS algorithm is much lower than that of the Mean Shift (MS) algorithm, so it can be used on problems with a very large number of data points, in particular the segmentation of images containing green vegetation. The proposed algorithm was tested on clustering and image-segmentation problems. The experimental results showed that the GTMS algorithm improves upon the existing algorithms in terms of both accuracy and time consumption. At its best, the GTMS algorithm is 333.98 times faster than the standard MS algorithm. The redundant computation can be reduced by omitting more than 90% of the data points at the third iteration of the mode-seeking process, because the GTMS algorithm mainly reduces the data during mode seeking. Thus, use of the GTMS algorithm would allow for the building of an intelligent portable device for surveying green vegetables in a ubiquitous environment.

    Keywords

    Mean Shift algorithm, agglomerative mean shift clustering, Generalized Transport Mean Shift algorithm, image segmentation, clustering, mode seeking, ubiquitous intelligence

    1. Introduction

    Research in ubiquitous intelligence computing is largely dedicated to technologies that improve the intelligence capability of multimedia devices. In this setting, the capacity to elaborate and extract information from the environment, and to improve the quality of the extracted information, is crucial. The development of ubiquitous intelligence therefore also covers data analysis, image analysis, pattern analysis, and computer vision. Clustering is an important process in data analysis. Computer vision problems, such as video and motion estimation, require an appropriate area of support for correspondence operations, and this area of support can be identified using segmentation techniques. Pattern recognition problems can also make use of segmentation results in matching. Consequently, image segmentation and clustering are important processes in ubiquitous intelligence computing and are used for analyzing and investigating the nature of the given data. A powerful technique for solving this problem is to automatically find the modes of the density of the given data. Such an algorithm is guided by the data density: the local maxima of the density surface are assumed to be the modes, or the centers of clusters. The mode of every data point is computed, and data points that have the same mode are assigned to the same cluster. The algorithm is useful for clustering [1,2], image segmentation [3,4], and tracking [5]. However, finding the mode of all of the data points requires a repeated process, which is very time consuming. The Generalized Transport Mean Shift (GTMS) algorithm and its variation are proposed to overcome this difficulty.

    The Mean Shift (MS) algorithm is a powerful technique for seeking the modes of any given data. The standard MS algorithm is an iterative procedure that can automatically find the mode of the density for a data point. It begins by

    ¹ Department of Computer Science, Khon Kaen University, Thailand
    ² Department of Mathematics, Statistic and Computer, Ubon Ratchathani University, Thailand

    Corresponding author:
    Khamron Sunat, Department of Computer Science, Khon Kaen University, Khon Kaen, 40002, Thailand.
    Email: [email protected]


    computing the weight of every point as a function of its distance from the point under consideration. The next step is to compute the weighted mean of the points, which gives the shifted position. The position is shifted iteratively toward higher density until it no longer changes or changes by less than an acceptable distance. This position is assumed to be the mode of the point under consideration, and all data points that converge to the same mode are assumed to belong to the same cluster. However, the standard MS algorithm is very slow because its time complexity is O(kn^2 m) [6]. This makes it unsuitable for use, especially in image-segmentation applications, such as the segmentation of images containing green vegetation, which involve a very large number of data points. Improving the speed of the MS algorithm by reducing the computational complexity is, therefore, very important.

    Techniques to speed up the MS algorithm have frequently been proposed. A speed-up technique from neural network learning was applied to the MS algorithm by Padungweing et al. [7]; this allowed it to run faster than the previous version while retaining the accuracy of the original. Following this, a speed-up of the Gaussian Blurring Mean Shift (GBMS) algorithm was proposed [6] that clusters the data in each iteration and removes any cluster whose data have converged to its mode. Thus, the next iteration has less data, allowing for faster computing. However, the GBMS algorithm can produce results that differ from those of the standard MS algorithm. This is because its density estimate is computed using the current positions in each iteration, whereas the density estimate of the standard algorithm is computed using the initial positions of the given data set.
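    The difference between the two update rules can be illustrated with a minimal sketch. It assumes a Gaussian weight with bandwidth `sigma`; the function names (`ms_sweep`, `gbms_sweep`) and the toy data are hypothetical and are not taken from the cited implementations. After one sweep the two schemes agree, because the current positions still equal the initial data; from the second sweep onward they diverge.

```python
import math

def gaussian_weight(a, b, sigma):
    """Unnormalised Gaussian weight exp(-||a - b||^2 / (2 sigma^2))."""
    sq = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-0.5 * sq / sigma ** 2)

def weighted_mean(query, centres, sigma):
    """Weighted mean of the kernel centres as seen from `query`."""
    weights = [gaussian_weight(query, c, sigma) for c in centres]
    total = sum(weights)
    return [sum(w * c[d] for w, c in zip(weights, centres)) / total
            for d in range(len(query))]

def ms_sweep(positions, data, sigma):
    """Standard MS: positions move, kernel centres stay at the initial data."""
    return [weighted_mean(y, data, sigma) for y in positions]

def gbms_sweep(positions, sigma):
    """Blurring MS: the current positions are also the kernel centres."""
    return [weighted_mean(y, positions, sigma) for y in positions]

# Toy one-dimensional data set (an assumption for illustration only).
data = [[0.0], [1.0], [4.0]]

ms1 = ms_sweep(data, data, sigma=1.0)
gbms1 = gbms_sweep(data, sigma=1.0)

ms2 = ms_sweep(ms1, data, sigma=1.0)   # density still built from `data`
gbms2 = gbms_sweep(gbms1, sigma=1.0)   # density built from moved points
```

    Comparing `ms2` with `gbms2` shows why a blurring scheme can end in a different clustering: its density surface changes at every iteration.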

    Several methods that can improve the speed of the MS algorithm for image segmentation have been proposed [8]. In the first method, neighboring pixels are grouped into the same cell and the MS algorithm is then applied. A cell that is shifted into a cell already shifted in a previous iteration stops being computed and is assumed to belong to the same cluster. In the second method, neighboring pixels in the spatial domain, identified by a specified distance, are grouped into the same cluster and the process continues as in the first method. Both methods produced excellent speed-ups. However, a wrong clustering result can occur, even in the first step of clustering the group of points. The remaining two methods approximate the E (Expectation) and M (Maximization) steps in each iteration using a subset of the data and a quadratic convergence technique, which helps to decrease the number of iterations. However, this requires heavy computation, so the speed-up is smaller than hoped. The Improved Fast Gaussian Transform Mean Shift (IFGT-MS) algorithm [9] adopts the improved Gaussian transform for numerical approximation. It is very efficient for large-scale and high-dimensional data sets. However, the IFGT-MS algorithm is not only limited to the Gaussian kernel function but also fails on moderate-scale data [10,11]. To the best of the authors' knowledge, the most recently proposed algorithm is Agglomerative Mean Shift (Agglo-MS) clustering [10,11]. Covering hyper-ellipsoids are used to cluster the data iteratively, which leads to hierarchical clustering via the MS process. The covering hyper-ellipsoids require the inverse of the covariance matrix, which is an extra cost and is sensitive to the data dimension. The algorithm can also produce a poor result if its parameter is not properly selected. However, this algorithm inspired us by demonstrating the benefit of exploiting a property of hill climbing, namely that many data points are shifted along the same direction.

    In this paper, the GTMS algorithm is proposed for intelligence modeling. It retains the basic idea of the MS algorithm, which is the shift process. However, instead of finding the mode of all points, the GTMS algorithm requires only a few points, called transporters, which represent their trailers. Moreover, finding the transporters does not incur any extra cost, because the GTMS algorithm uses the distance values that must be computed in the shifting process anyway. The trailers are the data points that are shifted to the same mode as a transporter. The transporter–trailer relationship is established by comparing the direction of the transporter's trajectory with the trailer's shift direction. The trailers are excluded from the next iteration and only the transporters are computed. In addition, a transporter can be assigned as a trailer in a later iteration; this not only reduces the number of transporters to be computed, but also performs a simultaneous hierarchical clustering.

    In Section 2, we briefly explain the nature of the MS algorithm and the Agglo-MS algorithm. The GTMS algorithm is introduced in Section 3. The experimental results on real-world clustering and image-segmentation problems, together with a discussion, are described in Section 4. Section 5 is the conclusion.

    2. Standard Mean Shift algorithm and Agglomerative Mean Shift algorithm

    Let $X \subset \mathbb{R}^m$ be a data set of $n$ data points in an $m$-dimensional Euclidean space, with $X = (x_1, x_2, \ldots, x_n)$ and $x_i = [x_1, x_2, \ldots, x_m]^T$. A probability density estimate at a given point $x$ is defined by

    \[ p(x) = \frac{1}{n} \sum_{i=1}^{n} K\left( \left\| \frac{x - x_i}{\sigma} \right\|^2 \right), \tag{1} \]

    where $K(t)$ is a kernel function and $\sigma$ is a constant bandwidth such that $\sigma > 0$. A mode of the density is a position $x$ having zero gradient, $\nabla p(x) = 0$. The MS algorithm is an iterative procedure for seeking a mode of the density estimate by repeatedly shifting the position $x$ towards higher density, and is written as


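    \[ x^{(\tau+1)} = f\left(x^{(\tau)}\right), \tag{2} \]

    with

    \[ f(x) = \frac{\sum_{i=1}^{n} K'\left( \left\| (x - x_i)/\sigma \right\|^2 \right) x_i}{\sum_{j=1}^{n} K'\left( \left\| (x - x_j)/\sigma \right\|^2 \right)}, \tag{3} \]

    where $K'(t) = dK/dt$ and $\tau$ is the iteration index. Using the Gaussian function $K(t) = e^{-t/2}$, (2) and (3) can be reduced [12] to

    \[ x^{(\tau+1)} = \sum_{i=1}^{n} p\left(i \mid x^{(\tau)}\right) x_i \tag{4} \]

    and

    \[ p\left(i \mid x^{(\tau)}\right) = \frac{\exp\left( -\frac{1}{2} \left\| (x^{(\tau)} - x_i)/\sigma \right\|^2 \right)}{\sum_{j=1}^{n} \exp\left( -\frac{1}{2} \left\| (x^{(\tau)} - x_j)/\sigma \right\|^2 \right)}. \tag{5} \]

    The algorithm is terminated when the shift distance is zero or is less than a tolerance threshold, as follows:

    \[ \left\| x^{(\tau)} - x^{(\tau-1)} \right\| \le \text{threshold}. \tag{6} \]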
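    The Gaussian update and its stopping rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `posterior` and `seek_mode`, the default `threshold`, the `max_iter` safeguard, and the toy data are all hypothetical, and the bandwidth `sigma` is assumed to enter through the scaled distance as in the density estimate.

```python
import math

def posterior(x, data, sigma):
    """Normalised Gaussian weights p(i | x) of all data points, as in Eq. (5)."""
    logits = [-0.5 * sum((a - b) ** 2 for a, b in zip(x, xi)) / sigma ** 2
              for xi in data]
    top = max(logits)  # subtracting the maximum avoids underflow to 0/0
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]

def seek_mode(x, data, sigma, threshold=1e-6, max_iter=500):
    """Iterate Eq. (4) until the shift distance satisfies Eq. (6)."""
    x = list(x)
    for _ in range(max_iter):  # max_iter is a safeguard assumed here
        p = posterior(x, data, sigma)
        new_x = [sum(pi * xi[d] for pi, xi in zip(p, data))
                 for d in range(len(x))]
        shift = math.sqrt(sum((a - b) ** 2 for a, b in zip(new_x, x)))
        x = new_x
        if shift <= threshold:
            break
    return x

# Two well-separated toy clusters (an assumption for illustration only).
data = [[0.0, 0.0], [0.2, 0.0], [0.0, 0.2],
        [5.0, 5.0], [5.2, 5.0], [5.0, 5.2]]
weights = posterior([0.0, 0.0], data, sigma=0.5)
mode_a = seek_mode([0.0, 0.0], data, sigma=0.5)
mode_b = seek_mode([5.0, 5.0], data, sigma=0.5)
```

    Each call to `seek_mode` evaluates all n distances at every iteration, so running it for all n points is where the quadratic cost of the standard algorithm comes from.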

    Clustering is performed by taking each mode of the kernel density estimate as a cluster, with the data points converging to their corresponding modes. An example with two clusters is plotted in Figure 1(a). Figure 1(b) shows the data points being shifted uphill toward their modes. The solid black lines represent the trajectory of each data point.
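    As a small sketch of this labelling step, the following groups points by their converged positions. In floating-point arithmetic two points rarely reach exactly the same mode, so the sketch merges modes that lie within a tolerance `merge_tol`; this tolerance and the function name `label_by_mode` are assumptions made for illustration and are not part of the original description.

```python
import math

def label_by_mode(modes, merge_tol):
    """Give the same label to points whose converged positions coincide.

    `modes[i]` is the position at which point i stopped shifting.
    Two positions count as the same mode if they are within `merge_tol`.
    """
    centres = []  # one representative position per cluster found so far
    labels = []
    for mode in modes:
        for k, centre in enumerate(centres):
            if math.dist(mode, centre) <= merge_tol:
                labels.append(k)
                break
        else:
            # No existing centre is close enough: start a new cluster.
            centres.append(list(mode))
            labels.append(len(centres) - 1)
    return labels, centres

# Converged positions of five points (toy values for illustration).
modes = [[0.0, 0.0], [0.001, 0.0], [5.0, 5.0], [0.0, 0.001], [5.0, 5.001]]
labels, centres = label_by_mode(modes, merge_tol=0.01)
```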

    Agglo-MS [10,11] is an agglomerative MS clustering algorithm. It is built upon an iterative query set compression mechanism motivated by the quadratic bounding optimization characteristic of the MS algorithm. It performs well on segmentation of images and clustering of moderate-scale data sets. Owing to space limitations, the interested reader is directed to Yuan et al. [10,11].

    3. Generalized Transport Mean Shift algorithm

    In general, many positions shift along the same trajectory while moving toward their mode, as the example in Figure 1 shows. Consider Figure 2: at iteration $\tau$, the $i$th data point is shifted to a position close to the original position of the $k$th data point. Moreover, the shift vector of the $i$th data point is parallel to the trajectory vector of the $k$th data point. Therefore, the $i$th data point should be considered a trailer of the $k$th data point, which is taken to be the transporter of the $i$th data point. Hence, the shift of the $i$th data point need not be computed in the next iteration.

    Even though the $j$th data point at iteration $\tau$ is also shifted to a position near the original position of the $k$th data point, its mode is different from the mode of the $k$th data point. One of the main ideas of this work is that the nearest point assigned as the transporter should have a trajectory vector pointing in the same direction as the shift vector of the trailer.
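    One way to picture this test is the sketch below. It is only an illustration of the idea: the closeness bound `radius`, the cosine bound `min_cos`, and the function `find_transporter` are hypothetical choices made here, and the exact rule used by the GTMS algorithm may differ. A candidate transporter must be close to the shifted position of the point and its unit trajectory vector must be nearly parallel to the point's own shift direction.

```python
import math

def unit(v):
    """Return v scaled to unit length (zero vector if v has no length)."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v] if norm > 0 else [0.0] * len(v)

def find_transporter(i, shifted, originals, trajectories, radius, min_cos):
    """Return the index of a candidate transporter for point i, or None."""
    shift_dir = unit([a - b for a, b in zip(shifted[i], originals[i])])
    best, best_dist = None, float("inf")
    for k, (x_k, u_k) in enumerate(zip(originals, trajectories)):
        if k == i:
            continue
        dist = math.dist(shifted[i], x_k)  # closeness to k's original position
        cos = sum(a * b for a, b in zip(shift_dir, u_k))  # direction agreement
        if dist <= radius and cos >= min_cos and dist < best_dist:
            best, best_dist = k, dist
    return best

# Toy configuration: point 0 follows point 1; point 2 lands near point 1
# but moves in a perpendicular direction, so it is not a trailer of point 1.
originals = [[0.0, 0.0], [1.1, 0.0], [1.0, 1.0]]
shifted = [[1.0, 0.0], [2.0, 0.0], [1.0, 0.1]]
trajectories = [unit([s - o for s, o in zip(sh, og)])
                for sh, og in zip(shifted, originals)]

t0 = find_transporter(0, shifted, originals, trajectories, radius=0.5, min_cos=0.9)
t2 = find_transporter(2, shifted, originals, trajectories, radius=0.5, min_cos=0.9)
```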

    To obtain the solution, four matrices are introduced. The first is a matrix of the trajectory vectors of all the data points. The second stores the indexes of the transporters. The last two are logical matrices that indicate the convergence status and the present status of the data points. The details of each matrix are as follows.

    Let $U \in \mathbb{R}^{m \times n}$. The $i$th column of $U$, denoted by $u_i$, is the unit trajectory vector of the $i$th data point at the first iteration and can be computed as

    Figure 1. (a) Plot of two data clusters. (b) Trajectories of the data points obtained by applying the Mean Shift algorithm to a two-dimensional data set. The third axis denotes the density of the data.


    \[ u_i = \frac{x_i^{1} - x_i^{0}}{\left\| x_i^{1} - x_i^{0} \right\|}. \tag{7} \]
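    As a minimal sketch, the columns of $U$ can be computed from the initial positions and the positions after the first shift. The function name `unit_trajectories` is hypothetical, and returning a zero vector for a point that did not move is an assumption made here to avoid division by zero; the source does not state how that case is handled.

```python
import math

def unit_trajectories(x0, x1):
    """Unit trajectory vector of each point from x0[i] to x1[i], as in Eq. (7).

    The result is a list of m-dimensional vectors, one per data point,
    corresponding to the columns of U.
    """
    result = []
    for start, end in zip(x0, x1):
        diff = [e - s for s, e in zip(start, end)]
        norm = math.sqrt(sum(c * c for c in diff))
        # Assumption: a point with zero shift gets a zero trajectory vector.
        result.append([c / norm for c in diff] if norm > 0 else [0.0] * len(diff))
    return result

x0 = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]   # initial positions
x1 = [[3.0, 4.0], [1.0, 3.0], [2.0, 2.0]]   # positions after the first shift
U_columns = unit_trajectories(x0, x1)
```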

    Let $T \in \mathbb{R}^{1 \times n}$ be a transporter matrix, where the $i$th column of $T$, denoted by $t_i$, is the index of the transporter of the $i$th data point, such that

    \[ t_i = \begin{cases} \arg\min_{j} \left\| x_i - x_j \right\|^2 & \text{if } i \neq j, \\ i & \text{otherwise}, \end{cases} \tag{8} \]