
Hierarchical image representation using 3D camera geometry for content-based image retrieval

Sang Min Yoon a,*, Holger Graf b, Arjan Kuijper b,c

a School of Computer Science, Kookmin University, 77 Jeongneung-ro, Seongbuk-gu, Seoul 136-702, Republic of Korea
b Fraunhofer IGD, Fraunhoferstrasse 5, Darmstadt 64283, Germany
c Graphical Interactive Systems, Computer Science, TU Darmstadt, Fraunhoferstrasse 5, Darmstadt 64283, Germany

Article info

Article history:
Received 10 August 2013
Received in revised form 21 January 2014
Accepted 21 January 2014
Available online 10 February 2014

Keywords:
Hierarchical image representation
Constrained agglomerative clustering
Camera geometry

Abstract

In this paper we present a hierarchical image representation methodology that clusters images by their 3D camera geometry in order to efficiently retrieve images according to the user's viewpoint.

The framework of our proposed technique is composed of two steps: first, the visual correlation between images in a large database is determined from the estimated 3D camera geometry; second, the images are classified using a constrained agglomerative hierarchical image clustering method to retrieve the images the users search for.

The constrained agglomerative hierarchical image clustering method provides balanced hierarchical layers, independent of the number of images within each cluster. It also provides a convenient way to browse, navigate, and categorize images with varying viewpoints, illumination, and partial occlusion.

© 2014 Elsevier Ltd. All rights reserved.

1. Introduction

With the rapid development of digital cameras, cell phones, and PDAs with embedded cameras, the efficient representation of photos from a personal image library or on the web is becoming more and more important. Image retrieval and browsing applications have been developed to help people efficiently search and retrieve the set of images they want to see in a large database (Goesele et al., 2010; Singhai and Shandilya, 2010; Vasconcelos, 2005; Vassilieva, 2009; Yoon and Kuijper, 2011). Retrieving images that share visual elements with a query image (Ilbeygi and Shah-Hosseini, 2012; Yang et al., 2012) has been a challenging topic in computer vision and machine learning, and has been coined “Content-Based Image Retrieval” (CBIR).

CBIR attempts to search images by their content in a multimedia database, deriving meaningful features (Amayri and Bouguila, 2013) and measuring the dissimilarity of visual objects with distance functions. The performance of typical CBIR methods relies heavily on both the definition of the similarity measure and the configuration of the database. The common Euclidean distance similarity measure and naive database configuration methodologies are often too generic, so that CBIR systems take too much time to search for images and fail to retrieve the desired ones.
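To make the generic baseline just described concrete (this is an illustration of the naive approach being criticized, not the method proposed in this paper; the names and toy histograms are hypothetical), Euclidean-distance retrieval reduces to a nearest-neighbour sort over feature vectors:

```python
import math

def euclidean(a, b):
    # Plain Euclidean distance between two equal-length feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_by_similarity(query, database):
    # Return image names sorted from most to least similar to the query.
    return sorted(database, key=lambda name: euclidean(query, database[name]))

# Toy 3-bin color histograms standing in for real visual features.
db = {"img_a": [0.9, 0.1, 0.0],
      "img_b": [0.2, 0.7, 0.1],
      "img_c": [0.8, 0.2, 0.0]}
ranking = rank_by_similarity([1.0, 0.0, 0.0], db)
```

Because such a measure compares raw features globally, it degrades exactly under the viewpoint and illumination changes discussed here, which motivates a geometry-aware database structure.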

The most popular CBIR systems rely on annotated text information to search for images in a given set or on the web (Chai et al., 2007; Ahmad, 2008; Liu and Özsu, 2009). However, these systems sometimes fail to find the correct images because of a lack of text information, false annotations, or wrong image titles. Visual features such as color, texture, or shape information are important for an accurate CBIR system, but a visual content based image retrieval system remains highly sensitive to changes in illumination, pose, color, and so on.

In this paper, we present an effective CBIR methodology based on hierarchical clustering of the images in the database by their extrinsic camera parameters, together with a similarity measure built on a visual codebook (cf. Qiao et al., 2012 for web social networks). To achieve this, we first label the images with a constrained agglomerative hierarchical image classification in which the feature space consists of the 3D camera positions. The constrained agglomerative hierarchical clustering method balances the hierarchical layers, independent of the number of images in each cluster. A visual codebook is then constructed from common visual features in the layers and clusters.

Contents lists available at ScienceDirect. Journal homepage: www.elsevier.com/locate/engappai

Engineering Applications of Artificial Intelligence 30 (2014) 235–241

0952-1976/$ - see front matter © 2014 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.engappai.2014.01.012

* Corresponding author. E-mail address: [email protected] (S.M. Yoon).

Fig. 1 shows our proposed methodology for hierarchical clustering and similarity measurement. The hierarchical clustering of images and the visual codebook are represented in the left image of Fig. 1, and the similarity measure between a query image's visual features and the codebook is shown in the right image of Fig. 1. With the images hierarchically summarized by their extracted 3D camera parameters and visual features, we can efficiently visualize representative images in a balanced layer within a hierarchical structure.

The main contributions of this paper are as follows:

1. Our proposed method provides efficient image retrieval according to the user's viewpoint.

2. It provides a visualization of representative images according to a geographical zoom in/out. As we zoom out of the geographical viewpoint, we show only the representative images of a site or building. When we zoom in on the map, we browse the individual images according to their 3D camera parameters.

After a discussion of the background in Section 2, we present our new hierarchical database representation methodology, with recovery of the 3D camera parameters and the hierarchical clustering, in Section 3. In our experiments in Section 4 we show the effectiveness of our method. We conclude with a discussion in Section 5.

2. Background

Research on geographic location based image retrieval and browsing has received much attention during the last few years (Naaman et al., 2005; Heesch, 2008). The organization of image collections has been accomplished by several classification criteria, such as detecting significant events, geographical characteristics of a specific location, or tags in the titles of photographs (Das et al., 2008; Naaman et al., 2005; Reitz and Kuijper, 2009; Lux et al., 2010; Liu et al., 2009). Current research efforts for image retrieval based on common context and visual features within image repositories aim to summarize the collection of images. Hierarchical image representation tasks (Kuijper and Florack, 2003) combine several technologies: Image Based Modeling using 3D camera geometry, feature classification for image clustering, and image similarity measures. Related approaches exist for image representation, interactive browsing and exploration of photograph collections, and image summarization to represent the visual contents of a given set.

2.1. Image Based Modeling

Image Based Modeling is one of the most prominent tasks in the development of computer vision and graphics. In particular, advanced computer vision methods address the development of algorithms to derive different sorts of information from digital images. The process of recovering 3D structure from multiple 2D photographs has been one of its central endeavors (Sainz et al., 2002). If the spatial relationship between multiple images is known, automatic feature detection and matching across a collection of images establishes a one-to-one correspondence between image planes, determining their relative orientation and the depth relation of spatial points with respect to the plane.

There are several approaches to the relative orientation determination problem. Structure from Motion (SfM) – which refers to the process of building a 3D model from multiple images – is a very popular way to estimate 3D camera parameters (Mayer, 2008; Snavely, 2009; Goesele et al., 2010). Correspondence approaches in SfM take advantage of image displacements induced by ego-motion. Most of these methods match a large number of points or features in two temporally separated images and quantitatively measure the image displacements (Wientapper et al., 2011a, 2011b).

2.2. Image clustering

Image clustering is the unsupervised classification of patterns into groups. Geographical tags or photograph titles have commonly been used in image retrieval and browsing applications. Nevertheless, the goal of feature classification and clustering in image processing and computer vision is to separate images by low-level features such as color, texture, and shape, by high-level semantics, or by a combination of those features (Kuijper and Florack, 2005). The similarity measure between images built on these features is one of the critical issues, because it remains weak under partial occlusion, view translation, orientation changes, and noise (Kuijper and Florack, 2002). Most classification approaches fall into three main categories: partition, division, and aggregation. One of the most popular partition methods is the k-Means method (Murthy et al., 2010), while division methods typically follow the kd-tree approach (Gao et al., 2008).

Combining these two approaches is still a challenging research area. In the next section we present our method, which adds hierarchical layers exploiting 3D camera parameters and related poses.

Fig. 1. Our proposed hierarchical image clustering of image data and the similarity measure of visual features between a new image and the image database. Left: the structure of the database constructed with various categories; each category is clustered by estimated 3D camera parameters. Right: an example of the similarity measure of codebooks between a query image and the database.

3. Hierarchical representation of database

Our hierarchical image representation method is composed of multiple layers. In the lowest layer, we extract the 3D extrinsic camera parameters of the images, which build the foundation of our hierarchical image clustering. The upper layers of the hierarchical structure are separated by a clustering algorithm whose feature space, established within the lowest layer, consists of the cameras' 3D positions and orientations. In the following sections, we explain how we extract the 3D extrinsic camera parameters from multiple images, establish the relationships between the 3D positions of multiple images, and classify the images at the upper layers based on the similarity measure derived from the cameras' 3D extrinsic parameters.

3.1. Recovery of relevant 3D camera parameters

Given n images in the database, the extrinsic camera parameters of each image, E_i(r, t), i = 1, …, n, where r is a 3×3 rotation matrix and t is a 1×3 translation vector, are recovered by using available multiple view geometry methods (Harltey and Zisserman, 2006; Yoon and Graf, 2009), combining

1. an adequate feature detection mechanism in each camera,
2. feature matching between multiple images,
3. the calculation of the epipolar geometry, and
4. the 3D position estimation within the world coordinate system.

Fig. 2 highlights this process of recovering E_i(r, t) from multiple images. Fig. 2 (left) shows example images within a collection. We have no prior knowledge, such as image resolution, tags, or titles. It also shows the matching features after the Scale Invariant Feature Transform (SIFT) and Random Sample Consensus (RANSAC) (Lowe, 1999; Wei et al., 2008), as well as the derived epipolar lines between two example images. Applying this process, we derive the propagation of errors from a camera motion to an epipolar constraint over two corresponding matching points. The error in the epipolar constraint is simply characterized as the distance from an image point to the epipolar line derived from the corresponding image point in the other image. Fig. 2 (right) displays the relative 3D position and rotation of the cameras within the world coordinate system.
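The point-to-epipolar-line error just described can be written down directly. The sketch below is ours, not the authors' code: given a fundamental matrix F and a correspondence (x, x′), it computes the perpendicular distance from x′ to the epipolar line l′ = F x. The example F assumes a purely horizontal camera translation, for which the epipolar constraint is simply v′ = v:

```python
import math

def epipolar_error(F, x, xp):
    # F: 3x3 fundamental matrix (nested lists); x, xp: a corresponding
    # point pair (u, v), treated in homogeneous coordinates (u, v, 1).
    xh = (x[0], x[1], 1.0)
    # Epipolar line in the second image induced by x: l' = F x = (a, b, c).
    a, b, c = (sum(F[i][j] * xh[j] for j in range(3)) for i in range(3))
    # Perpendicular distance from x' to the line a*u + b*v + c = 0.
    return abs(a * xp[0] + b * xp[1] + c) / math.hypot(a, b)

# Hypothetical F for a pure horizontal translation: epipolar lines are
# horizontal scanlines, so the error is simply |v - v'|.
F_trans = [[0.0, 0.0, 0.0],
           [0.0, 0.0, -1.0],
           [0.0, 1.0, 0.0]]
```

In practice this error would be accumulated over all matches and used by RANSAC to reject outliers, as in the pipeline above.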

Visual features are detected automatically with SIFT (Lowe, 1999) until a sufficient number of 3D feature positions are estimated. These detected feature points are candidates for feature matching and for estimating the 3D positions of the images. In order to remove outliers and thereby reduce the errors in the 3D extrinsic camera parameters, we employ the RANSAC approach. By design, the features obtained by SIFT for detection and matching are scale invariant and partially invariant to changes in viewpoint and illumination.

From the epipolar geometry and the matching points, we extract the rotation and orientation of the cameras. By calculating the epipolar geometry and extracting the extrinsic camera parameters of multiple images, we sketch the relationships between the images. With SIFT and RANSAC, the epipolar geometry and the 3D camera positions of a set of images are estimated as shown in Fig. 3. In this figure, the recovered 3D camera positions are shown within the world coordinate system for images of Casa Mila, Barcelona, Spain.

The lowest layer of our hierarchical structure is constructed using these extrinsic camera parameters of the images. In the next step, we describe the clustering of images based on the distance between the 3D camera positions within the world coordinate system.

3.2. Hierarchical image clustering

Image clustering and categorization are means for high-level description of image content. Most content based image clustering and summarization algorithms rely on feature extraction and on similarity measures that compare visual features such as color, texture, and shape, or the text information of images. In this paper, however, we analyze the extrinsic camera parameters instead, in order to cluster and classify images by their geographical characteristics.

In image clustering and retrieval applications, unsupervised image clustering can be separated into non-hierarchical and hierarchical clustering algorithms. Among the numerous non-hierarchical clustering methods, which are extensively used for data classification and data mining in various areas, k-Means clustering is an algorithm that clusters n images based on their attributes into k partitions, where k < n. It aims to form a k-block set partition of the data and to find a good local minimum, with complexity linear in the number of instances (Murthy et al., 2010). However, the algorithm is sensitive to the initial starting conditions and hence must be repeated many times with random restarts.
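The sensitivity to initialization mentioned above is commonly mitigated by random restarts. The following sketch is a generic textbook k-Means, not code from the systems cited here; it keeps the run with the lowest within-cluster cost:

```python
import math
import random

def kmeans(points, k, iters=20):
    # One k-Means run: random initial centers, then alternate
    # assignment and center-update steps.
    centers = random.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[nearest].append(p)
        centers = [tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centers[i]
                   for i, c in enumerate(clusters)]
    # Within-cluster sum of squared distances: lower is better.
    cost = sum(min(math.dist(p, c) for c in centers) ** 2 for p in points)
    return centers, cost

def kmeans_restarts(points, k, restarts=10):
    # The algorithm is sensitive to its random start, so repeat it and
    # keep the best (lowest-cost) clustering.
    return min((kmeans(points, k) for _ in range(restarts)), key=lambda run: run[1])
```

Even with restarts, k must still be chosen a priori, which is the drawback exploited in the comparison of Section 4.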

Conversely, hierarchical clustering algorithms (Lee and Crawford, 2005; Takeuchi et al., 2007) are run once and create a dendrogram: a tree structure containing a k-block set partition for each value of k between 1 and n, where n is the number of images at the lowest level; this allows the user to choose a particular clustering granularity. Hierarchical clustering algorithms are further divided into variants of the single-link, complete-link, and minimum-variance algorithms. The single-link and complete-link algorithms are the most popular; they differ in how they characterize the similarity between a pair of clusters.

Fig. 2. Recovering the 3D camera parameters in the world coordinate system by automatic feature detection and matching. Left: epipolar geometry and matching points with automatic feature detection and matching. Right: recovered 3D camera position with SIFT and RANSAC.

In the single-link method we apply (Yoon and Graf, 2008), the distance between two clusters is the minimum of the distances between all pairs of cameras' 3D extrinsic parameters drawn from the two clusters. The basic agglomerative hierarchical clustering algorithm begins with one image per cluster.

Let

S = {E_1, E_2, …, E_{n−1}, E_n}    (1)

be the set of 3D extrinsic parameters, E_i, to be clustered. At the initial state, the number of clusters is equal to the number of images, n, and each cluster C_i is represented by E_i for every i. We then progressively join the closest clusters pairwise through the following equations until k = 1 (all clusters are hierarchically paired):

s(i, j) = D(C_i, C_j),  ∀ i, j    (2)

(l, m) = argmin_{a, b} s(a, b)    (3)

C_l = Join(C_l, C_m)    (4)

Remove(C_m)    (5)

Here s(i, j) is the similarity measure between clusters C_i and C_j. In this paper, the similarity measure between clusters is calculated as the Euclidean distance, D, between the cameras' 3D extrinsic parameters.
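Eqs. (1)–(5) translate almost line-for-line into code. In this sketch (an illustration under the simplifying assumption that each E_i is reduced to its 3D camera position, not the paper's implementation) D is the single-link Euclidean distance:

```python
import math

def D(Ci, Cj):
    # Single-link distance: minimum Euclidean distance over all
    # cross-cluster pairs of camera positions.
    return min(math.dist(x, y) for x in Ci for y in Cj)

def agglomerate(E):
    # Eq. (1): start with one cluster per camera position.
    clusters = [[e] for e in E]
    merges = []  # record of the dendrogram, bottom-up
    while len(clusters) > 1:
        # Eqs. (2)-(3): pick the pair (l, m) minimizing s(i, j) = D(Ci, Cj).
        l, m = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: D(clusters[ij[0]], clusters[ij[1]]))
        merges.append((tuple(clusters[l]), tuple(clusters[m])))
        clusters[l] = clusters[l] + clusters[m]  # Eq. (4): Join(Cl, Cm)
        del clusters[m]                          # Eq. (5): Remove(Cm); m > l
    return clusters[0], merges
```

Each entry of merges corresponds to one level of the dendrogram; the loop runs until k = 1, exactly as in the equations above.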

The objective of our hierarchical clustering algorithm is to extract a multi-level partitioning of images based on 3D camera parameters, i.e. a partitioning that groups images into a set of clusters and then, recursively, partitions them into smaller sub-clusters until some stopping criteria are satisfied. Agglomerative hierarchical clustering algorithms start with singleton clusters, each containing one object, and iteratively choose two clusters and merge them into one larger cluster. This process is repeated until only one large cluster is left, which contains all objects. Divisive algorithms work in the symmetric, top-down way.

Fig. 4 (left) shows the original agglomerative hierarchical image clustering. The similarity between multiple images is computed as the Euclidean distance between the 3D camera positions. From the multiple images of Casa Mila, the images are automatically clustered into 12 layers. This number of layers differs from site to site and with changes of viewpoint.

The unconstrained version of agglomerative hierarchical image clustering builds a dendrogram for all values of k. If there are many images of a public site, the dendrogram will be high; however, many places have only a few uploaded photographs on the web. To balance the hierarchical layers of each site, we impose constraints on the hierarchical clustering. When building the dendrogram, we constrain its size using the W-constraint and B-constraint algorithms (Davidson and Ravi, 2005):

• The B-constraint requires the distance between any pair of images in two different clusters to be at least B_min.
• The W-constraint requires that for each point x in C_i there is another point y in C_i such that the distance between x and y is at most W_max.

We thus prune the dendrogram by starting to build clusters at k_max and stop building the tree when k_min clusters are reached. At the start of the unconstrained hierarchical clustering approach, the number of clusters was equal to the number of images, n; here, instead, we construct the initial clusters from the B- and W-constraints. The procedure of this constrained agglomerative hierarchical clustering algorithm is as follows:

(k_max, k_min) = calculateBound(W_max, B_min)    (6)

s(C_i, C_j) ≥ B_min,  ∀ i, j    (7)

s(x, y) ≤ W_max,  ∀ x, y ∈ C_i    (8)

Here, the distance is bounded both between clusters and within clusters. Within these bounds, we join the closest clusters until the dendrogram reaches k_min. Fig. 4 shows the constrained agglomerative hierarchical on-line image clustering method with the constrained number of clusters k and the Euclidean distance constraint, based on the recovered 3D camera positions for the Casa Mila images.
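Under our reading of the two constraints (the calculateBound routine is not spelled out here, so the following is an interpretation rather than the authors' implementation): the W-constraint determines the initial clusters as groups of cameras connected by distances of at most W_max, giving k_max, and single-link merging then proceeds only while some pair of clusters is still closer than B_min, leaving k_min clusters:

```python
import math

def D(Ci, Cj):
    # Single-link Euclidean distance between two clusters of camera positions.
    return min(math.dist(x, y) for x in Ci for y in Cj)

def initial_clusters(E, w_max):
    # W-constraint: every point must have a neighbor within w_max inside
    # its own cluster, i.e. clusters are connected components under w_max.
    clusters = []
    for e in E:
        hits = [c for c in clusters if any(math.dist(e, y) <= w_max for y in c)]
        if not hits:
            clusters.append([e])
        else:
            for c in hits[1:]:   # e may bridge previously separate clusters
                hits[0].extend(c)
                clusters.remove(c)
            hits[0].append(e)
    return clusters

def constrained_agglomerate(E, w_max, b_min):
    clusters = initial_clusters(E, w_max)  # k_max initial clusters
    while len(clusters) > 1:
        l, m = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: D(clusters[ij[0]], clusters[ij[1]]))
        # B-constraint: once all clusters are at least b_min apart, stop.
        if D(clusters[l], clusters[m]) >= b_min:
            break
        clusters[l] = clusters[l] + clusters[m]
        del clusters[m]
    return clusters                        # k_min clusters remain
```

The pruning effect is visible in the example: merging stops as soon as the B-constraint holds, so the dendrogram never grows past the bounded range.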

4. Experiments

We conducted our experiments on various on-line images downloaded from the Internet. Here we discuss the results on two representative sets in more detail. In order to show the hierarchical structure, we used a limited number of images in these two presented sets, although the method is obviously able to handle larger sets. Since the images were obtained by searching on keywords (indeed, an annotated text-based pre-filtering!) for a touristic location, we assume that these locations are visible in the images. Images with a fake or erroneous annotation would be classified as outliers and placed in clusters that can easily be recognized.

4.1. Casa Mila, Spain

In total, 90 on-line images of Casa Mila, Barcelona, collected from the web, are used in this experiment. The images in the Casa Mila set are roughly divided into 3 categories with respect to the region of interest: main building, near view of the front, and roof. The number of images in these categories is 55, 21, and 14, respectively.

Fig. 3. 3D camera position and orientation of the multiple images, extracted with SIFT and RANSAC within a category, and some example images in the cluster.

In the previous section, we already presented the recovered 3D camera positions and the unconstrained and constrained on-line image representation methodology. Fig. 5 shows the automatic hierarchical clustering of the on-line images in the near view of the front and of the images on the roof, together with representative images in the layers of each cluster.

Our next experiment is the comparison with non-hierarchical clustering algorithms such as the k-Means and Mean-Shift clustering algorithms (Comaniciu and Meer, 2002; Xu et al., 2005), to check the efficiency of the hierarchical structure, as shown in Fig. 6. The top row of Fig. 6 shows on the left the images in the computed 3D camera coordinates and on the right the result of k-Means clustering for k = 5 and k = 7. Clearly, it is difficult to determine a priori the desired number of clusters appropriate for the set of images. The bottom row of Fig. 6 shows the Mean-Shift clustering method, which automatically separates the cloud into 6 clusters; here, too, one can argue whether this is the desired number of clusters. The representative images of the building, also shown in Fig. 6, lie at the center of gravity of each cluster. Here we can see that the constrained hierarchical on-line image clustering method produces better results in automatically constructing a structure and balancing the hierarchy for areas with an unbalanced density of on-line photographs per cluster.

4.2. Blue Mosque, Turkey

In Fig. 7 we show the extracted 3D camera positions and the unconstrained and constrained hierarchical on-line image clustering methods applied to 28 images of the Blue Mosque in Istanbul, Turkey, obtained on-line. Here we see the same effect: the constrained clustering results in a more balanced dendrogram, representing the locations where tourists prefer to take a picture.

4.3. Namdaemun, Korea

Fig. 8 shows the comparison of the hierarchical representations of images of Namdaemun (one of the Eight Gates in the Fortress Wall of Seoul), which were downloaded from the web. As shown in Fig. 8, these images exhibit various illumination and view changes. By clustering the images according to the cameras' viewpoints, the hierarchy is reduced from 11 layers to 8 layers when the constrained agglomerative hierarchical clustering method is applied.

Fig. 4. Comparison of unconstrained (left) and constrained (right) agglomerative hierarchical image clustering methods from the recovered 3D camera parameters using the Casa Mila images.

Fig. 5. Constrained hierarchical clustering in the other categories of Casa Mila and some representative images of the cluster in each category.

4.4. Hierarchical layers

Table 1 shows the number of hierarchical layers per category when we tested the unconstrained and constrained hierarchical image clustering methods. As shown in Table 1, the constrained hierarchical image clustering method reduces the number of layers while balancing the hierarchical layers of the categories.

5. Discussion

In this paper, we have presented a new hierarchical image representation method for efficiently clustering on-line images using their geographic characteristics. We also presented a new approach to estimate the relationships within a collection of images in a database and to construct a hierarchy of images with a constrained agglomerative clustering methodology.

The 3D camera positions are extracted by automatic feature detection using SIFT, epipolar geometry estimation, and outlier removal with RANSAC. This new constrained hierarchical clustering of the images allows us to efficiently browse, navigate, and summarize photographs in a large repository. The hierarchical tree presented in this paper can be useful for many applications involving large collections of digital photographs.

Fig. 6. Non-hierarchical on-line image clustering, such as the k-Means and Mean-Shift methods, on the cameras' 3D extrinsic parameters. Top: k-Means clustering of on-line images for k = 5 and k = 7. Bottom: Mean-Shift automatically finds six clusters, with their sizes proportional to the number of images in the cluster.

Fig. 7. Examples using on-line images of the Blue Mosque, Istanbul, Turkey. Left: some representative images in the hierarchical structure of images and the recovery of the 3D cameras' extrinsic parameters. Right: unconstrained and constrained agglomerative hierarchical clustering.

Fig. 8. Examples using on-line images of Namdaemun, Seoul, Korea. Left: some representative images in the hierarchical structure of images downloaded from the web. Right: unconstrained and constrained agglomerative hierarchical clustering.


We are able to sort and view the images that are geographically close to the 3D camera position that users want to view. This offers convenience and immersion in applications involving large image collections on the web.

Once the clusters are computed, users can easily select a representative image that corresponds to their desired viewpoint using the 3D camera position and orientation.

Our future work aims at improving this system for industrial applications such as hierarchical 3D reconstruction from the collected images. We will focus on advanced interaction with the user, where our hierarchical structure is needed for immersive navigation or viewing of the images.

Acknowledgments

S.M. Yoon was funded by the Korea Meteorological Administration Research and Development Program under the Weather Information Service Engine (WISE) project, Grant 153-3100-3133-302-350.

References

Ahmad, I.S., 2008. Text-based image indexing and retrieval using formal concept analysis. Trans. Intern. Inf. Syst. 2 (3), 150–170.

Amayri, O., Bouguila, N., 2013. On online high-dimensional spherical data clustering and feature selection. Eng. Appl. Artif. Intell. 26 (4), 1386–1398.

Chai, J.Y., Zhang, C., Jin, R., 2007. An empirical investigation of user term feedback in text-based targeted image search. ACM Trans. Inf. Syst. 25 (1), 1–25.

Comaniciu, D., Meer, P., 2002. Mean shift: a robust approach toward feature space analysis. IEEE Trans. Pattern Anal. Mach. Intell. 24 (5), 603–619.

Das, M., Farmer, J., Gallagher, A.C., Loui, A.C., 2008. Event-based location matching for consumer image collections. In: Luo, J., Guan, L., Hanjalic, A., Kankanhalli, M.S., Lee, I. (Eds.), Proceedings of the 7th ACM International Conference on Image and Video Retrieval, CIVR 2008, Niagara Falls, Canada, July 7–9. ACM, pp. 339–348.

Davidson, I., Ravi, S.S., 2005. Agglomerative hierarchical clustering with constraints: theoretical and empirical results. In: Jorge, A., Torgo, L., Brazdil, P., Camacho, R., Gama, J. (Eds.), Knowledge Discovery in Databases: PKDD 2005, 9th European Conference on Principles and Practice of Knowledge Discovery in Databases, Porto, Portugal, October 3–7, 2005, Proceedings. Lecture Notes in Computer Science, vol. 3721. Springer, pp. 59–70.

Gao, L., Li, Z., Katsaggelos, A.K., 2008. A kd-tree based dynamic indexing scheme for video retrieval and geometry matching. In: ICCCN. IEEE, pp. 940–944.

Goesele, M., Ackermann, J., Fuhrmann, S., Klowsky, R., Langguth, F., Mücke, P., Ritz, M., 2010. Scene reconstruction from community photo collections. IEEE Comput. 43 (6), 48–53.

Hartley, R., Zisserman, A., 2006. Multiple View Geometry in Computer Vision, 2nd ed. Cambridge University Press, Cambridge, UK.

Heesch, D., 2008. A survey of browsing models for content based image retrieval. Multim. Tools Appl. 40 (2), 261–284.

Ilbeygi, M., Shah-Hosseini, H., 2012. A novel fuzzy facial expression recognition system based on facial feature extraction from color face images. Eng. Appl. Artif. Intell. 25 (1), 130–146.

Kuijper, A., Florack, L., 2003. The hierarchical structure of images. IEEE Trans. Image Process. 12 (9), 1067–1079.

Kuijper, A., Florack, L.M.J., 2002. Understanding and modeling the evolution of critical points under Gaussian blurring. In: Proceedings of the 7th European Conference on Computer Vision (Copenhagen, Denmark, May 28–31, 2002), LNCS 2350, pp. 143–157.

Kuijper, A., Florack, L.M.J., 2005. Using catastrophe theory to derive trees from images. J. Math. Imaging Vis. 23 (3), 219–238.

Lee, S., Crawford, M.M., 2005. Unsupervised multistage image classification using hierarchical clustering with a Bayesian similarity measure. IEEE Trans. Image Process. 14 (3), 312–320.

Liu, D., Hua, X.-S., Wang, M., Zhang, H., 2009. Boost search relevance for tag-based social image retrieval. In: ICME. IEEE, pp. 1636–1639.

Liu, L., Özsu, M.T. (Eds.), 2009. Encyclopedia of Database Systems. Springer, US.

Lowe, D.G., 1999. Object recognition from local scale-invariant features. In: ICCV, pp. 1150–1157.

Lux, M., Pitman, A., Marques, O., 2010. Can global visual features improve tag recommendation for image annotation? Future Internet.

Mayer, H., 2008. Issues for image matching in structure from motion. In: ISPRS Congress, p. B3a: 21 ff.

Murthy, V.S.V.S., Vamsidhar, E., Kumar, J.N.V.R.S., Rao, P.S., 2010. Content based image retrieval using hierarchical and K-means clustering techniques. Int. J. Eng. Sci. Technol. 2 (3), 209–212.

Naaman, M., Song, Y.J., Paepcke, A., Garcia-Molina, H., 2006. Assigning textual names to sets of geographic coordinates. Comput. Environ. Urban Syst. J. 30 (4), 418–435.

Qiao, S., Li, T.-R., Li, H., Peng, J., Chen, H., 2012. A new blockmodeling based hierarchical clustering algorithm for web social networks. Eng. Appl. Artif. Intell. 25 (3), 640–647.

Reitz, T., Kuijper, A., 2009. Applying instance visualisation and conceptual schema mapping for geodata harmonisation. In: Advances in GIScience, Proceedings of the 12th AGILE Conference, Hannover, Germany, 2–5 June 2009. Lecture Notes in Geoinformation and Cartography, Springer, pp. 173–194.

Sainz, M., Bagherzadeh, N., Susin, A., 2002. Recovering 3D metric structure and motion from multiple uncalibrated cameras. In: ITCC. IEEE Computer Society, pp. 268–273.

Singhai, N., Shandilya, S.K., 2010. A survey on: "content based image retrieval systems". Int. J. Comput. Appl. 4 (2).

Snavely, N., 2009. Bundler: structure from motion for unordered image collections. Online.

Takeuchi, A., Saito, T., Yadohisa, H., 2007. Asymmetric agglomerative hierarchical clustering algorithms and their evaluations. J. Classif. 24 (1), 123–143.

Vasconcelos, N., 2005. Content-based image and video retrieval. Sig. Process. 85 (2), 231–232.

Vassilieva, N.S., 2009. Content-based image retrieval methods. Programm. Comput. Softw. 35 (3), 158–180.

Wei, W., Jun, H., Yiping, T., 2008. Image matching for geomorphic measurement based on SIFT and RANSAC methods. In: CSSE (2). IEEE Computer Society, pp. 317–320.

Wientapper, F., Wuest, H., Kuijper, A., 2011a. Composing the feature map retrieval process for robust and ready-to-use monocular tracking. Comput. Graph. 35 (4), 778–788.

Wientapper, F., Wuest, H., Kuijper, A., 2011b. Reconstruction and accurate alignment of feature maps for augmented reality. In: 3DIMPVT 2011: The First Joint 3DIM/3DPVT Conference (Hangzhou, China, May 16–19, 2011). IEEE, pp. 140–147.

Xu, D., Wang, Y., An, J., 2005. Applying a new spatial color histogram in mean-shift based tracking algorithm. In: Proceedings of the Image and Vision Computing New Zealand (IVCNZ '05), pp. 1–6.

Yang, H.-Y., Wang, X., Zhang, X.-Y., Bu, J., 2012. Color texture segmentation based on image pixel classification. Eng. Appl. Artif. Intell. 25 (8), 1656–1669.

Yoon, S.M., Graf, H., 2008. Similarity measure of the visual features using the constrained hierarchical clustering for content based image retrieval. In: ISVC (2), Lecture Notes in Computer Science, vol. 5359, pp. 860–868.

Yoon, S.M., Graf, H., 2009. Hierarchical online image representation based on 3D camera geometry. In: VISAPP (2), pp. 54–59.

Yoon, S.M., Kuijper, A., 2011. View-based 3D model retrieval using compressive sensing based classification. In: 7th International Symposium on Image and Signal Processing and Analysis, ISPA 2011 (September 4–6, 2011, Dubrovnik, Croatia). IEEE, pp. 437–442.

Table 1. Comparison of the number of hierarchical layers between unconstrained and constrained hierarchical clustering, showing the balance of the hierarchical layers for various object images.

Site         Number of images   Unconstrained layers   Constrained layers
Building     55                 12                     8
Front        21                 7                      4
Roof         14                 6                      3
Mosque       28                 13                     7
Namdaemun    74                 11                     8
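The table's pattern, fewer and more balanced layers under constrained merging, can be illustrated with a toy experiment. The size-based merge rule below is a deliberately simplified stand-in for the paper's camera-geometry constraint, chosen only to show why constraining which clusters may merge flattens the dendrogram compared with unconstrained single linkage.

```python
import itertools

def agglomerate(points, pick_pair):
    """Greedy agglomerative clustering over 1D points.
    Each cluster is (members, dendrogram_depth)."""
    clusters = [([p], 0) for p in points]
    while len(clusters) > 1:
        i, j = pick_pair(clusters)
        b = clusters.pop(max(i, j))
        a = clusters.pop(min(i, j))
        clusters.append((a[0] + b[0], max(a[1], b[1]) + 1))
    return clusters[0]

def nearest_pair(clusters):
    # Unconstrained single linkage: always merge the two closest clusters.
    return min(itertools.combinations(range(len(clusters)), 2),
               key=lambda ij: min(abs(x - y)
                                  for x in clusters[ij[0]][0]
                                  for y in clusters[ij[1]][0]))

def smallest_pair(clusters):
    # Toy constraint: always merge the two smallest clusters,
    # which keeps the hierarchy balanced.
    i, j = sorted(range(len(clusters)), key=lambda k: len(clusters[k][0]))[:2]
    return i, j

points = [2 ** k for k in range(8)]  # increasing gaps force chaining
depth_unconstrained = agglomerate(points, nearest_pair)[1]
depth_constrained = agglomerate(points, smallest_pair)[1]
```

On these eight points, single linkage chains one element at a time (depth 7, one layer per image), while the constrained variant merges pairwise (depth 3), mirroring the reduction from unconstrained to constrained layer counts in Table 1.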
