Asset identification using image descriptors

Asset identification using image descriptors

Reena Friedel & Oscar Figuerola & Hari Kalva & Borko Furht

# Springer Science+Business Media New York 2013

Abstract Managing Information Technology (IT) assets in data centers is a time consumingand error prone process. IT personnel typically identify misplaced assets manually by crosschecking and visually inspecting assets. An automated way of keeping track of assets usingportable devices reduces human error and improves productivity. The proposed assetmanagement application on the tablet captures images of assets and searches an annotateddatabase to identify the asset. Matching performance and response time of asset matching isevaluated using three different image feature descriptors. Methods to reduce feature extrac-tion and matching complexity were developed. Performance and accuracy tradeoffs werestudied, domain specific problems were identified, and optimizations for portable platformswere made. The results show that the proposed methods reduce complexity of assetmatching by 67 % when compared to the matching process using standard image featuredescriptors.

Keywords Assetmanagement . Imagedescriptors .Data center.Complexity reduction .SIFT.

SURF. FAST

1 Introduction

A data center is a facility that hosts computer systems, servers, power supplies, storagesystems, and other related computing equipment, referred to as assets. The size and numberof these data centers are continuously increasing to accommodate the need and demand fordata services. Assets are mounted in racks and a typical rack can accommodate up to 42assets depending on the asset size. Large data centers have thousands of racks and keepingtrack of these large numbers of assets manually makes it very tedious and highly prone toerrors. Google alone accounts for close to a million servers in their data center and that isestimated to be only 2 % of the servers currently used all over the world.

Multimed Tools ApplDOI 10.1007/s11042-013-1688-1

R. Friedel : O. Figuerola : H. Kalva (*) : B. FurhtDepartment of Computer & Electrical Engineering and Computer Science, Florida Atlantic University,Boca Raton, FL 33431, USAe-mail: [email protected]

Present Address:R. FriedelThe MathWorks, Inc., Natick, USA

With increasingly complex environments, mass virtualization to the cloud and disparatetools used by traditional facilities and IT managers, it is more costly and difficult than ever tomanage data centers. The result is expensive downtime, budget overruns, poorly plannedchanges and inability to meet business demands. Since most of the operations done in datacenters today are manual, it is expensive in terms of human labor and is highly error-prone.Data center personnel can no longer haphazardly place a server in any open rack, withoutunderstanding how it will affect the overall data center from a holistic perspective.

Overlooking this impact analysis can result in severe disruptions in service. Thus withmobile devices and tablets becoming common and inexpensive, image processing applicationscan be exploited to automate asset tracking and management in the data centers.

This paper presents an image processing application for identifying assets in a data centerusingmobile devices, which can be used to track various details about the servers easily. Factorssuch as differences in size, color, background, scale, low light conditions, and camera resolutionmake the task of identification more complex. The development of image matching algorithmsby using a set of local interest points has been done over many years but applying this to the datacenter problem with constraints posed by the use of mobile devices and the operating environ-ment makes this problem challenging. The key contributions of this paper include:

1) Identifying the challenges in asset tracking using image search.2) Performance evaluation of image descriptors in the problem domain.3) Complexity reduction methods and optimization for use on mobile devices.

2 Background and related work

Asset tracking and management is essential for the smooth and continuous operations for bigcompanies. It also helps big retailers like Wal-Mart to identify and deal with problems intheir supply chain, reduce overstocking and also locate goods. Several of these big compa-nies, including government and military organizations are always looking out for low costand efficient ways to track their assets and equipment.

2.1 Traditional methods

In earlier times most of the asset management and tracking was done manually by people(maintaining a manual paper inventory). This was later replaced by the barcode systems andRFIDs. Gathering information manually was time consuming as the information had to berecorded first, then transcribed and later fed into a computer system. Every stage increasesthe already high chance of errors.

Barcode systems need a person to affix a barcode to each asset and then use a barcodereader to retrieve information. The barcode and its reader require no prior knowledge to useit and are fairly accurate under controlled settings. Although it is much faster and less errorprone than a manual process, it is still a human intensive task which consumes time.

Additionally, barcodes require a line of sight technology which makes it necessary toplace labels so that they are clearly visible [19]. The limitations of barcodes of beingsusceptible to damage and other various factors make it unfavorable to use barcodes forasset tracking and identification.

Multimed Tools Appl

Radio Frequency IDentification (RFIDs) replaces barcodes altogether. An RFID systemconsists of a ”tag“ which has a small integrated circuit (Silicon chip), memory and anantenna onboard, and a reader which can interrogate the tag through the antenna to retrieveinformation contained in the tag memory [22].

RFID’s allow non-line-of-sight scanning, but they have a limited scanning range which isbased on the necessary frequencies required for the application. To deal with the range limitations,[23] proposed to use a robot with an attached RFID reader which periodically sweeps along someprogrammed routes within a warehouse. Due to their need to conserve power, RFID’s havelimited processing power and memory. A system to attach a more powerful and a more expensiveembedded system to each asset for accurate asset tracking was proposed [12]. All these systemsstill require the use of specialized hardware and the additional cost of manually attaching externaltags to all the assets. This leads to high cost of deployment and maintenance.

2.2 Image based methods

Due to the limitations of traditional asset tracking methods, summarized in Section 2.1, there isa demand for cheaper and more feasible solutions. Image processing based solutions have thepotential to meet these demands. Modern mobile devices such as phones and tablets are wellequipped with cameras, displays, and have good graphics capabilities that makes it possible toimplement complex image and video processing applications on these devices. This gives riseto different kinds of mobile visual search applications like identifying assets, paintings,landmarks, books, CDs, DVDs, printed documents etc. Some of the initial systems to deploysuch mobile visual search systems make use of the basic image processing methods and areused for searches based on pictures taken by handheld mobile devices [1, 15, 21, 28].

2.3 Keypoint based approaches

A large number of interest-point detectors have been proposed in the literature. HarrisCorners [18], scale-invariant feature transform (SIFT) key points [17], maximally stableextreme regions (MSERs) [18], Hessian affine features from accelerated segment test(FAST) [20] are some examples. The different interest-point detectors provide differenttradeoffs in repeatability and complexity. SIFT and other affine interest-point detectors areslow to compute but are highly repeatable. The speeded up robust feature (SURF) interest-point detector provides significant speed up over SIFT interest-point detectors by using boxfilters and integral images for fast computation. However, the box filter approximationcauses significant anisotropy, i.e., the matching performance varies with the relative orien-tation of query and database images [29]. The FAST corner detector is an extremely fastinterest-point detector that offers very low repeatability.

Kim et al. present a method for environment recognition which used multiple features tosegment regions [14]. Multiple features include colors, lines, edges, and geometrical infor-mation. Combining these features, a given image was segmented into several distinct areas.The segments are then used to recognize the environment.

Quoc and Choi considered a system with three components which included book regionextraction, book segmentation, and title extraction [25]. OCR was used to identify the booktitle. OCR may not be robust against photometric and geometric distortions often encoun-tered in photos taken by smartphones.

Multimed Tools Appl

A prototype system using Tablet PCs for interactive museum guidance was proposed in[2]. Here a variation of Lowe’s Swift method for object recognition was used along withBluetooth emitters for automatic room detection. Two rooms were differentiated by evalu-ating the two Bluetooth IDs. The average detection time of object recognition was 10 s on aTablet PC. Despite the application of high-end hardware, their performance and recognitionrate was rather poor.

The book recognition system developed by Chen et. al. makes use of a system whereedge detection is performed to identify the book spines and then identify longer edges todetermine the book boundaries [7]. Such a model cannot be used for asset identification, asassets commonly have lines on the face plate grille which make it difficult to identify assetboundaries. A mobile product recognition system, built by Chen et al., is based on a low bit-rate image retrieval system with a client–server architecture [6]. The system performsproduct recognition based on the features extracted by the mobile device. The CD, DVDand Book cover image retrieval system, developed by Chen et al., is another such applicationthat has been developed which uses the client server architecture [5].

Additionally, work has been done to identify logos on vehicles, where the SIFT descriptorwas ruled out, since variations in brightness, size, and color of the logos significantly reducethe robustness of feature detection [4]. Vehicle features in the foreground, and specularreflections hindered the process of identification. The authors use the Fourier shape descrip-tors, which naturally utilizes the varying shapes of the logos in determining the most likelymanufacturers. Two assumptions key to the system design and performance are: the logo isseen head-on (no perspective change or rotation), and that the logo is the largest object in theimage. In our system we cannot use such assumptions as the logo on the asset need not bethe largest object in the image and may not be in the center of the image.

Apostolos and Anagnostopoulos present a vehicle logo recognition system that detectsand extracts SIFT keypoints in the image of a logo and performs content-based imageretrieval from a logo-image database [28]. SIFT keypoints are known to be invariant to scaleand rotation and even partially invariant to illumination differences. SIFT recognition mightfail in cases with poor illumination conditions (excessive, non-uniform, or poor lighting,shadows), weather, dirt, high occlusion from other objects, and camera wide (perspective)view angle (low contrast), which results in poorly detected features.

Tunkpien reported a palmprint identification system where the feature extraction andrecognition phases were based on principle line [31]. The extraction method is applied toobtain the principle line feature vector by using a cascade of consecutive filters. The recognitionmethod consisting of shape matching, k-nearest neighbor, and cosine similarity was proposed.

In a landmark recognition system by Torralba et al., places are represented using lowdimensional global image representation and use a Hidden Markov Model to solve thelocalization problem [30]. However, global descriptors are not very robust to occlusions,clutter, and changes in viewpoint or orientation. Cummins and Newman, propose a proba-bilistic approach to the problem of recognizing places based on co-occurrences of visualwords and learn a generative model of place appearance [8].

Technologies for image based search have matured sufficiently enough and the MPEGcommittee has begun standardization efforts [13]. It is required that this standard forcompact descriptors will simplify descriptor extraction and matching for visual searchrelated applications, also provide hardware support for such applications on mobile devices,make sure that visual search applications and databases are inter operable, and thus enablehigh performance of applications conformant to the standard.

Most applications deal with the problems of large keypoint databases either by using a smallscale implementation (order of hundreds of objects) to demonstrate their approach [10], or by

Multimed Tools Appl

reducing the complexity of the keypoint itself. A common approach uses Vector Quantization andK-Means in order to reduce each keypoint to a single word in a relatively small dictionary [9].

Another approach, described by Lowe, uses a hash function to approximate nearest-neighbor lookup [17]. While this approach improves the time performance of the nearestneighbor search it does not reduce the memory required for a large database of keypoints.

Despite the large amount of literature on finding and describing keypoints little work has beendone on the problem of directly reducing the number of keypoints or to working with databasesthat contain thousands of images that map to millions of queries (many-to-one matches).

3 Problem description

Unnecessary data center costs include human errors in every aspect of asset managementfrom keeping track of assets, maintaining inventory and identifying assets which have beenmoved. An ideal asset management solution should eliminate human errors and minimizemaintenance costs. Human error can be minimized by minimizing the amount of inputprovided by IT personnel. An ideal asset management solution should be able to identify anasset and report the operating state of the asset with minimal user input.

The problem identified above can be addressed using image based solutions. A solutionusing a mobile devices such as a tablet can meet these requirements. An application on atablet can be used to visually select an asset to be monitored. The user input is limited toselecting the rack and or asset to be monitored. The application can then be used to visuallyidentify the asset in a rack and retrieve relevant information about the asset in real-time. Thisinformation is then overlaid and presented on the screen of the portable device so that the ITpersonnel can then take the necessary actions.

The asset tracking system presented in this paper has the following features which make itdesirable for use.

& Firstly, it can run on readily available smartphones and tablets.& Secondly, the system does not require any tags or markers, thus eliminating the time and

effort needed to physically attach them to individual assets.& Thirdly, since portable devices are being used, we can easily display detailed information

about the assets on the display of the device.

There are several possible client–server architectures that can meet the needs of theproposed solution. These include:

& The mobile client transmits a query image to the server. The image-retrieval algorithmsrun entirely on the server, including an analysis of the query image.

& The mobile client processes the query image, extracts features, and transmits featuredata. The image-retrieval algorithms run on the server using the feature data as query.

& The mobile client downloads data from the server, and all image matching is performedon the device.

A hybrid of the above approaches could also be used as situation demands.

3.1 Data center layout

A data center can occupy one room of a building, one or more floors, or an entire building.Most of the equipment is often in the form of servers mounted in 19 in. rack cabinets, which

Multimed Tools Appl

are usually placed in single rows forming corridors (so-called aisles) between them. Atypical data center rack is 19 in. and has a standardized frame to mount multiple equipmentmodules. The industry standard rack cabinet is 42U tall, as shown in Fig. 1.

The height of the electronic modules is also standardized as multiples of 1.75 in. (44.45 mm)or one rack unit (U). The size of a piece of rack-mounted equipment is frequently described as anumber in “U”. For example, one rack unit is often referred to as “1U”, 2 rack units as ”2U” etc.

3.2 Keypoint based approach for asset identification

Based on the application requirements and taking into consideration the data center condi-tions, the following goals were identified as essential:

& In the asset identification problem, our primary goal was to identify the best possiblematch fast, implying that the time taken for the descriptor extraction was to be the least.

& Search andmatching was performed on a server and the search speed was a secondary goal.& The asset identification system has to work within stringent memory and the computa-

tional power constraints of the portable device. The processing on these devices must befast and economical in terms of power consumption as well.

Fig. 1 A typical section of rack rail, showing rack unit distribution in comparison with a filled data center rack

Multimed Tools Appl

& The algorithms used for the asset identification should be capable of delivering accurateresults in the lowest time possible.

& The retrieval system should be robust to allow a reliable recognition of objects capturedunder various lighting conditions as data centers are not always well lit.

Figure 2 shows the key components of the proposed solution. Using an asset managementapplication on the iPad, pictures of a complete asset are taken. Once the picture is taken,keypoints in the image are detected. Once the keypoints are detected, feature vectors for eachkeypoint are extracted. The set of descriptors extracted for the query image are then used tocompare it against the feature descriptors of unique assets that are stored in the database.

A match is then performed on the basis of the kNN (k- Nearest Neighbors). Nearestneighbor matching identifies only the best match and rejects all the others below a giventhreshold value yielding fewer false matches and the overall precision is high. The distancebetween the descriptors is the main similarity criterion [20].

We perform this kNN match in two directions, i.e. for each point in the query image wefind the two best matches in the reference image (in the database), and then we do the samething for the feature points in the other direction from the reference image to the queryimage. Thus for each feature point, we have two possible matches. We then use a ratio test tofind their distances. If this calculated distance is very low for the best match, and much largerfor the second best match, we can safely accept the first match as a good one since it isunambiguously the best choice. If the two good matches are pretty close in distance, thenthere exists a possibility that the match is erroneous and hence, these matched get rejected.This is done by making sure that the ratio of the distance of the best match over the distanceof the second best match is not greater than a given threshold.

Later, after all the matches have been eliminated in the query match set and the referencematch set, we extract those matches that are symmetrically similar and agree with each other.This is the symmetry test which imposes that, for a match pair to be accepted, both the

Fig. 2 Asset identification procedure

Multimed Tools Appl

matching points have to be the best matching feature of the other. The matches that finallypass through the ratio test and symmetry test are then returned back to the user with detailsabout the asset. These details include model number, part number, slot in the server rack, andother operating information available in the database.

The most important requirement is to minimize the response time while maximizingaccuracy. The total response time includes the time of extracting keypoints (Tk), computingthe descriptors (Tx) and the time of matching the extracted descriptors with the descriptorsstored in the database (Tm), and this is computed as shown in Eq. 1.

T ¼ Tk þ Tx þ Tm ð1ÞThe time Tk depends on the size of the image and the descriptor used. The number of

keypoints extracted depends on the type of image and the time Tx is a function of the numberof keypoints and the type of descriptor. The final component of time Tm is a function of thematching algorithm and number of keypoints in the query image and the keypoints in eachof the images in the database. These dependencies are discussed further in Section 4.

4 Descriptor selection and performance evaluation

The performance of the descriptors and complexity tradeoffs are evaluated as a part of thedescriptor selection process for the proposed solution. The evaluation was done using theassets database created from a data center at the Florida Atlantic University.

4.1 Dataset

The data for this paper was collected from a local server room at Florida Atlantic University.The server room was dimly lit and had assets mounted on server racks with 15 distinct assetsamong all the racks. The unique asset images were annotated and populated in the databasewith the feature vectors. The challenges include images captured by low resolution cameraon a tablet, server room lighting conditions, and the speed of returning the matches. Figures 3and 4 show examples of asset images used in this study. Typical asset sizes were 1U and 2U.

4.2 Choice of descriptor

In this paper, the Scale-invariant feature transform (SIFT), Speeded Up Robust Feature (SURF)detectors along with detectors like Difference of Gaussians (DoG), Determinant of Hessian(DoH) and Features from Accelerated Segment Test (or FAST) were used and evaluated. Abrief description of these detectors and descriptors is seen in the following sub-sections. Thesedescriptors were considered keeping in mind the domain and application of asset identification.

Fig. 3 Image taken at full resolution with a Nikon D80 camera. Example of 1U asset

Multimed Tools Appl

4.2.1 Scale-Invariant Feature Transform (SIFT)

Lowe, introduced the Scale Invariant Feature Transform (SIFT) which transforms image data intoscale-invariant coordinates, relative to local features [16]. The SIFT algorithm is a robust methodwhich extracts distinctive features from images invariant to rotation, scale and distortion. In order toidentify such invariant keypoints that can be repeatedly found inmultiple views of varying scale androtation, local extreme are detected inGauss-filtered difference images [26]. SIFT recognitionmightfail in cases with poor illumination conditions (excessive, non-uniform, or poor lighting, shadows),weather, dirt, high occlusion from other objects, and camera wide (perspective) view angle (lowcontrast), which results in poorly detected features [24]. The SIFT descriptor has a size of 128.

4.2.2 Speeded Up Robust Feature (SURF)

Speeded Up Robust Features (SURF) uses integral images to compute an approximation ofthe Hessian matrix.

When rotation invariance is not considered, which results in a scale-invariant version ofthe descriptor, leads to the descriptor ‘upright SURF’ (U-SURF). In applications, like mobilerobot navigation or visual tourist guiding, the camera often only rotates about the verticalaxis. The benefit of avoiding the overkill of rotation invariance in such cases is not onlyincreased speed, but also increased discriminative power, as stated in [3].

The SURF algorithm is a variation of the SIFT algorithm [3]. Its major differences include aHessian matrix-based measure as an interest point detector and approximated Gaussian secondorder derivatives using box type convolution filters. Here, the use of integral images enablesrapid implementation. The SURF descriptor has a size of 64.

4.2.3 Features from Accelerated Segment Test (FAST)

Features from Accelerated Segment Test (FAST), introduced by Rosten and Drummond,evaluate a small number of individual pixel intensities using decision trees [31]. It computesthe fraction of pixels within a neighborhood which have similar intensity to the center pixel.FAST compares pixels only on a circle of fixed radius around the point.

Since SURF descriptors aremostly based on intensity differences, they are faster to compute.However, SIFT descriptors are generally considered to be more accurate in finding the rightmatching feature. SIFT detectors are slow to compute, because of the high dimensionality oftheir descriptors, but produces highly repeatable features. The SURF interest-point detectorprovides significant speed up over SIFT. The FASTcorner detector is an extremely fast interest-point detector that offers very low repeatability. Relative performance of these descriptors in theproblem domain was evaluated and a descriptor was selected for the problem at hand.

Performance of these descriptors for a single image is presented below. The image used isthat of a common 2U asset. A single asset image, taken with an iPAD with resolution of1280×100, was used as the query and one image of 2821×1053 resolution was saved in thedatabase. Keypoints were found for both the query image and the reference image.

Fig. 4 Image taken by an iPAD and used as a query image. Example of 1U asset

Multimed Tools Appl

Descriptors were extracted for the query image. Descriptors for the reference image wereextracted offline and stored in the database.

To evaluate a match, the query image descriptors, with default values for the parametersare compared against the descriptors stored in the database. Figure 5 shows the number ofkeypoints that were extracted by each descriptor for the query image.

As seen from the Fig. 5, SIFT extracts the most number of keypoints and thus the featuresare highly repeatable. SURF and USURF have approximately equal values and FASTextracts the least number of keypoints.

The complexity of different stages of image matching are shown in Table 1. The tablepresents time complexity of individual stages for both the query (Q) image and the reference(R) image. As seen from Table 1, SIFT takes the most time and FAST takes the least time.We also note from Table 1 that the matching performance of SURF is good with much lesstime complexity. FAST on the other hand has a very good time complexity but the accuracyof finding the correct asset match is relatively poor. Thus we selected SURF which will bethe descriptor discussed in the rest of the paper. Also, SURF was shown to outperform allother descriptors and maintain reasonable repeatability with lighting variations [11]. This isan essential feature for our application considering the lighting conditions of data centerswhich usually are dimly lit and lighting conditions vary in different parts of a data center.

Based on the results reported in Table 1, the following strategies were used to reduce timecomplexity:

& Changing the resolution of the images in the database to yield fewer descriptors.& Keypoint Reduction based on the importance of the points.& Removing overlapping keypoints by further reducing them using a distance based approach.

These approaches are discussed in the sections below.

4.3 Choice of resolution for images in the database

Time complexity is largely a function of the number of keypoints in the image which in turndepends on the resolution of the images. The higher the image resolution, the more the numberof keypoints extracted.

Fig. 5 Number of keypoints detected for the asset

Multimed Tools Appl

On plotting the number of keypoints versus the image resolution on a graph, we find thatthe number of keypoints increases linearly with the increase in the image resolution, as seenin Fig. 6. The dashed line indicates that this was the best resolution for which we were ableto get a 100 % accuracy of identifying the assets in the test data set of 10 images.

We also see that the keypoint extraction time (Tk) in equation 3.1 is a function of thenumber of keypoints, seen in Fig. 7. It increases linearly as the image size increases.

The dashed line in Fig. 7 indicates that this was the best resolution for which we wereable to get a 100 % accuracy of identifying the asset achieving a good time complexity usingthe test data set.

Since complexity is largely a function of keypoints and image size, complexity can bereduced by reducing the image resolution. A lower resolution image will reduce the keypointextraction time linearly and the number of keypoints detected will also be reduced as imagefeatures are lost because of sub-sampling. We have to tradeoff reduced complexity forreduced accuracy and this tradeoff has to be balanced. Moreover, SURF is known to performvery poorly with low resolution images [26].

To identify the correct resolution for the image in the database, the following experimentwas carried out. Images were taken using a high resolution Nikon D80 (10.2 mega pixel)

Table 1 Comparison of descriptors where time is reported in milliseconds

Descriptor No. ofkeypoints

Keypoint extractiontime (Tk)

Descriptor extractiontime (Tx)

Matchingtime (T m)

Matchedor not

Totaltime (T)

SIFT Q=1370 271.1098 145.9

R=2167 532.2285 287.4561 168.3196 Match 585.3294

USURF Q=762 24.69606 50.4129

R=1467 48.07026 87.93381 172.2474 Match 247.3564

Surf Q=762 65.55889 45.27735

R=1467 114.9554 111.7241 108.724 Match 219.5602

Fast Q=55 0.961546 4.179278

R=113 1.873061 4.873557 7.555001 No match 12.69583

Fig. 6 Impact of the number of keypoints on the image resolutions

Multimed Tools Appl

camera. The images were cropped to include just the asset in consideration. It was savedwith four different resolutions i.e. full resolution, three quarter, half and quarter resolutions.

We then ran the program using one query image against all the four images in thedatabase. We found that it was able to find matches of the queried asset with the fullresolution asset image (taking more time), the three quarter resolution asset image (in lessertime) and the half resolution asset image (with a still further decrease in time). The quarterresolution image (which resembles the iPAD quality resolution) on the other hand took theleast time in executing but there was no match identified. At this step we identified that it isbest to save the image at half resolution because there is a gain in time and the accuracy ofidentifying the asset is maintained.

Table 2 shows the matching performance and total time at different resolutions forreference images in the database. As shown in Table 2, computational complexity can bereduced by approximately 50 % without affecting the matching performance.

4.4 Keypoint reduction

The next consideration was to minimize the total time of retrieval by reducing the number ofkeypoints used to identify a match. Reducing the keypoints also reduces the descriptorextraction time and the matching time. Table 3 shows the matching performance when thenumber of descriptors in the query image are reduced by selecting only a portion of the

0

100

200

300

400

500

Key

po

int e

xtra

ctio

n ti

me

Tk

835*160 1670*320 2505*480 3340*640Image Resolution

Keypoint extraction time vs Image Resolution

Fig. 7 Impact of the keypoint extraction time on the image resolutions

Table 2 Comparison of the query image against different resolutions of the same asset in the database

Images No. ofkeypoints

Keypointextractiontime (Tk)

Descriptorextractiontime (Tx)

Matchingtime (Tm)

Matchingaccuracy

Query image [1280*200] 478 54.667227 28.698912

Database images:

Full resolution 3340*640 1602 428.902017 155.538674 99.62859 Match

Three quarter resolution 2505*480 1219 241.517794 101.802712 80.22565 Match

Half resolution 1670*320 605 123.035711 53.380485 51.65225 Match

Quarter resolution 835*160 244 27.912033 13.420825 33.4768 No match

Multimed Tools Appl

keypoints. As shown in Table 3, as the number of keypoints is reduced, total time reducesbut the matching performance also drops.

4.4.1 Assessing the keypoints based on feature strength

Not all keypoints in an image are equally significant for identifying matches. We prioritizethe keypoints based on the feature response of the keypoints computed during keypointextraction in OpenCV.

Feature response or feature strength is a characteristic of the OpenCV keypoint. Thisvalue gets populated during the feature extraction phase which means it is dependent on thetype of feature detector used. SURF populates the feature responses from two first-orderHAAR wavelet filters (1, -1), and are collected on each feature point.

From Table 3 it is seen that random elimination of keypoints reduces the possibility offinding accurate matches of an asset. For this we came up with the following method:

1) Sort the keypoints based on the feature response.2) Keypoints with smaller feature strengths are removed keeping only the more significant

set of keypoints.

The results are summarized in Table 4. Prioritizing keypoints reduces total timewithout affectingmatching performance. Only 50 % of the most significant keypoints can be used in descriptorextraction without affecting matching performance while reducing total time by almost 50 %.

As we see in Table 4, if we use only 50 % of most significant keypoints, the complexityreduces by almost 50 % without affecting the matching performance.

4.4.2 Removing overlaps among keypoints

Once the number of keypoints are reduced using the technique in Section 4.4.1, we still tryto identify a possibility for further reduction in time complexity.

If many keypoints lie close to each other, they convey redundant information andinformation content of each keypoint is low. On the other hand if the descriptors are spreadout, information content is high and retaining such keypoints will likely increase the success

Table 3 Impact of usingreduced keypoints Method Total time

(T) in msMatch

SURF_100% keypoints 2415.12 all assets matched

SURF_75% keypoints 2061.16 80 % of assets matched



Table 4 Impact of usingreduced keypoints Method Total time

(T) in msMatch





Multimed Tools Appl

of matching. Schmid, Mohr, and Bauckhage proposed this information theoretic approach tokeypoint selection [27]. Using this, we focus on reducing the keypoints in a given region.We developed a method for minimizing keypoint overlap and reduce redundancies. Wemake use of the size of the keypoint, i.e. its diameter, to identify overlapping keypoints. Themain steps in keypoint reduction are:

1) Retain only the most significant keypoints as discussed in Section 4.4.1 – keypoints aresorted in decreasing order of their feature response values and 50 % of those strongkeypoints are stored in a vector K

2) ∀ keypoints in K, compare the keypoints and if the distance is lesser than the threshold,eliminate those keypoints:

if Distance Ki;Kj

� �< Threshold; i≠ j; eliminateK j:

This process leads to the elimination of closely placed keypoints. Figure 8 shows acropped asset image illustrating the keypoint reduction approach. Figure 8a shows mostsignificant keypoints before overlapped circle elimination. Figure 8b shows keypoints withthe overlapped circle removed at a distance of 0.15*Size, where Size is the diameter of thekeypoint and Figure 8c shows keypoints after eliminating with a threshold of 1*Size.

From Fig. 8, it is seen that when the threshold is the size of the keypoint (1* size), a lot ofkeypoints get eliminated. This is not desirable since a lot of information content is lost. Assets withvery small difference, e.g., assets that differ in a single digit of themodel numberwill suffer with sucha large threshold value. Example, Poweredge 1750 and Poweredge 1950were not identified properly.

Fig. 8 Images to show the elimination of the keypoints using the image of a Dell poweredge 1950 asset

Multimed Tools Appl

To identify a good steady threshold value, a series of experiments were carried out withthe test data set. Threshold values which were 1 times the size, 0.5 times the size of thekeypoint, etc. were selected. The results of these experiments are as seen in the Table 5below.

From Table 5 it can be seen that, as the threshold increases, the number of retained keypointsdescreases, the matching time decreases, and the accuracy of finding a perfect match also decreases.Using the value of 0.15 as the threshold,we can achieve 100%accuracy in identifying all the images.

4.5 Evaluation of the proposed methods

Using resolution reduction and keypoint reduction approaches, we were able to reduce thetime complexity significantly and have made it possible to have a responsive iPad assetdetection application.

Table 5 Results of different threshold values chosen

Method No. ofkeypoints

KP extractiontime (Tk)

Descriptor extractiontime (Tx)

Matchingtime (T)

Percentageof matches

1*size 61 56.21 4.7 236.7989 47 %

0.5* size 108 56.7 7.11 479.2166 60 %

0.25* size 164 55.03 10.94 613.9084 87 %

0.15* size 180 57.01 10.67 669.8754 100 %

Fig. 9 Different stages of the keypoint reduction steps

Multimed Tools Appl

Figure 9 above shows:

(a) Entire query image with all its keypoints(b) Query image with keypoints reduced by 50 % based on keypoint strength(c) Query image with keypoints reduced by eliminating overlapping keypoints(d) Matched output with lines indicating the matches from the query image to the image in

the database

The final evaluation which compares the original method to the methods described in thepaper is shown in Table 6. The same application was run, with 1 query image tested againstthe 15 images in the database. Total time (T) was noted on both the PC and the iPAD. We seefrom the results obtained that we were able to achieve approximately a 67 % reduction intime without any effect on the accuracy of identifying the correct asset.

4.6 Limitations of the proposed solution

Matching performance is sensitive to the quality of the camera. If the resolution is very poor,the camera has to be moved as close as possible to the asset to capture the image. The iPADresolution is low (approx., 1280×100 pixel per 1U asset) which gives rise to poor qualityimages that cannot be used as reference images in the database.

The performance also drops when the lighting conditions of the data center are ratherpoor. This problem can be overcome using a camera with a good flash. When the queryimages were taken with a powerful camera and flash, and the application being locally runon a PC, it was noticed that there were perfect matching results at all times.

Table 6 Comparison of implemented methods (all time in milliseconds)

Method On PC On iPAD % Reductionin time

Tk Tx Tm Total time Tk Tx Tm Total time

SURF 58 60 1597 1715 1745 680 15713 18138

With resolution reduction 55 30 852 937 1740 360 7810 9910 45 % decrease

With feature strengthkeypoint reduction

55 19 647 721 1735 326 5459 7520 57 % decrease

With removal of closelyplaced keypoints

55 12 491 558 1740 285 3755 5780 67 % decrease

Fig. 10 Example of an asset obscured by wires

Multimed Tools Appl

Assets whose logos are obscured with wires and cannot be noticed even normally by thehuman eye would be difficult to identify by the image processing application. Figure 10 is anexample of such a situation where the asset is almost completely obscured with wireshanding down in front of the asset and matching fails.

4.7 Evaluation of the proposed method using the stanford dataset

Using the methods described above, we tested the application on a publicly available Stanford’sMobile Visual Search dataset (SMVS) which includes 8 categories of objects like CDs, DVDs,book covers, printed matters, video clips, business cards, museum paintings. The SMVS querydata set has the following key characteristics that are lacking in other data sets: rigid objects,widely varying lighting conditions, perspective distortion, foreground and background clutter,realistic ground-truth reference data, and query images from heterogeneous low and high-endcamera phones [5]. Samples of these images are seen in Fig. 11.

The query images were taken with query images with different camera phones, includingsome digital cameras, some of which include Apple (iPhone4), Palm (Pre), Nokia (N95,N97, N900, E63, N5800, N86), Motorola (Droid), Canon (G11). The resolution of the queryimages varies for each camera phone. Product categories like CDs, DVDs, books werecaptured indoors under widely varying lighting conditions over several days.

The references are clean versions of images obtained from the product websites. Allreference images are high quality JPEG compressed color images. The resolution ofreference images varies for each category.

Fig. 11 Sample images from the Stanford Mobile Visual Dataset

Table 7 Comparison of time taken for the SMVS dataset

Method Average search time (in ms) % Time

Book covers dataset with original SURF 37503

Book covers dataset with our SURF 16314 56 % decrease

CD covers dataset with original SURF 15802

CD covers dataset with our SURF 8071 48 % decrease

DVD covers dataset with original SURF 24359

DVD covers dataset with our SURF 11475 52 % decrease

Multimed Tools Appl

For our experiment we used three categories i.e. books, CD covers and DVD covers. Inall three categories there were approximately 300 query images which were comparedagainst 200 reference images.

We conducted a performance evaluation experiment to compare the origianl SURFimplementation and our implementation. The results obtained were summarized in Table 7.It is seen, from Table 7 that the average search time for each image shows a significantdecrease of approximately 50 % for the identification of the books, CD and DVD covers.

5 Conclusion

In this paper we presented a computer vision based approach for asset management in datacenters, wherein a user can take a photo of an asset with a tablet camera and the assets in theimage are identified. Leveraging tablets which are portable, affordable, and ubiquitous enablesus to build a cost-effective asset identification system. Most of the challenges arise out of thelow resolution of the assets captured with iPAD camera, the low lighting conditions in serverrooms, speed and accuracy of providing a good user experience. Common descriptors wereevaluated and found that for this particular application SURF performs well both in terms ofexecution time as well as finding accurate matches. Since most of the mobile devices require afast execution time, complexity reductions and optimizations in the form of reduced resolutionand keypoint reduction were developed. The proposed approaches reduced the complexity byabout 67 % without affecting the accuracy.

Acknowledgments This project was funded by the NSF Industry/University Cooperative Research Centerfor Advanced Knowledge Enablement at Florida Atlantic University, NSF award No. 0934339.

References

1. 10 Google goggles. (2011). Retrieved from http://www.google.com/mobile/goggles/.2. Bay H, Fasel B, Gool. LV (2005). Interactive museum guide. In Proc. ubiquitous computing (UbiComp),

ACM Press3. Bay H, Tuytelaars, T, Gool, LJV (2006). SURF: Speeded Up Robust Features. In ECCV, Lecture Notes in

Computer Science, vol. 3951. Springer, pp. 404–4174. Burkhard T, Minich AJ , Christopher Li. (2011). Vehicle logo recognition and classification: feature

descriptors vs. shape descriptors. EE368 FINAL PROJECT Stanford University5. Chen D, Tsai, SS, Chandrasekhar V, Takacs G, Chen H, Cheung NM, Reznik Y, Vedantham R,

Grzeszczuk, R, Bach J, Girod B. (February 23–25, 2011). The Stanford mobile visual search data set.In MMSys’11. San Jose, California, USA

6. Chen D, Tsai, SS, Chandrasekhar V, Takacs G, Cheung NM, Vedantham R, Grzeszczuk R, Girod B(2010b). Mobile product recognition. In Proc. ACM Multimedia 2010

7. Chen D, Tsai SS, Hsu CH, Kim K, Singh JP, Girod B (2010a). Building book inventories usingsmartphones. In Proc. ACM Multimedia

8. Cummins M, Newman P (2008) FAB-MAP: probabilistic localization and mapping in the space ofappearance. Int J Robot Res 27(6):647–665

9. Dance C, Willamowski J, Fan L, Bray C, Csurka G (2004). Visual categorization with bags of keypoints.In ECCV International Workshop on Statistical Learning in Computer Vision

10. Fergus R, Perona P, Zisserman A (2003). Object class recognition by unsupervised scale-invariantlearning. In CVPR, IEEE Computer Society 264–271

11. Gauglitz S, Höllerer T, Turk, M (2011). Evaluation of interest point detectors and feature descriptors forvisual tracking

12. Ibach P, Stantchev V, Lederer F, Weiss A, Herbst T, Kunze T. (2005, December). WLAN-based assettracking for warehouse management. In Proc. IADIS International Conference e-Commerce, 1–8.

Multimed Tools Appl

http://www.google.com/mobile/goggles/

13. ISO/MPEG. (July 2011). Compact Descriptors for Visual Search: Call for Proposals. MPEG outputdocument N12201.

14. Kim D-N, Kang H-D, Kim, T, Jo K-H (October 2006).Object recognition using segmented region andmultiple features on outdoor environments. In: 1st Int. Strategic Techonol. Forum, pp 305–308

15. Kooaba. (2011). Retrieved from http://www.kooaba.com/.16. Lowe D (1999) Object recognition from local scale-invariant features. Proc Seventh IEEE Int Conf Comp

Vision 2:1150–115717. Lowe D (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vis 60(2):91–11018. Matas J, Chum O, Urban M, Pajdla.T (September 2002). Robust wide baseline stereo from maximally

stable extremal regions. In Proc. British Machine Vision Conf. (BMVC), Cardiff, Wales19. McCathie L ,Michael K (2005, October). Is it the end of barcodes in supply chain management? In Proc.

Collaborative Electronic Commerce Technology and Research Conference, LatAm, 1–1920. Mikolajczyk K, Schmid C. (2003). A performance evaluation of local descriptors. In International

conference on computer vision & pattern recognition.21. Nokia point and find. (2012). Retrieved from http://pointandfind.nokia.com/.22. Ouertani MZ, Parlikad, AK, Mcfarlane D (2008). Towards an approach to select an asset information

management strategy. International Journal of Computer Science and Applications, 5(3b), 25–44.Technomathematics Research Foundation

23. Patil A, Munson J, Wood D, Cole A (2008) Bluebot: asset tracking via robotic location crawling. ComputCommun 31(6):1067–1077

24. Psyllos AP, Anagnostopoulos CN, Kayafas E (2010) Vehicle logo recognition using a SIFT-basedenhanced matching scheme. IEEE Trans Intell Transp Syst 11(2):322–328

25. Quoc N, Choi W (2009) A framework for recognition books on bookshelves. In: Proc. InternationalConference on Intelligent Computing (ICIC’09), pages. Ulsan, Korea, pp 386–395

26. Ruf B, Detyniecki M (2009). Identifying paintings in museum galleries using camera mobile phones.Paper presented at Proceedings of the Singaporean French Ipal Symposium - sinfra′09, Singapore

27. Schmid C, Mohr R, Bauckhage C (2000) Evaluation of interest point detectors. Int J Comput Vis37(2):151–172

28. Snaptell. (2007–2009). Retrieved from http://www.snaptell.com/.29. Takacs G, Chandrasekhar V, Chen, DM, Tsai SS, Grzeszczuk R, Girod B. Unified real-time tracking and

recognition with rotation invariant fast features. In Proc. IEEE Conf. Computer Vision and PatternRecognition (CVPR), SFO, CA, to be published.

30. Torralba A, Murphy KP, Freeman, WT, Rubin MA (2003). Context-based Vision System for Place andObject Recognition. In ICCV ’03: Proceedings of the Ninth IEEE International Conference on ComputerVision, page 273

31. Tuytelaars T, Mikolajczyk K (2008) Local invariant feature detectors: a survey. Found Trends 3(3):177–280

Reena Friedel completed her Masters in Computer and Information Sciences from Florida Atlantic University(FAU) in 2012. She also received the Bachelor in Computer Science Engineering degree from Dr. AmbedkarInstitute of Technology, Visvesvaraya Technological University, India in 2008, prior to working for a majorFortune 500 company back in India.

Multimed Tools Appl

http://www.kooaba.com/

http://pointandfind.nokia.com/

http://www.snaptell.com/

During her Masters at FAU she was a researcher in various fields which included “Object Recognition forAsset Identification”, “Interactive 3D Data Visualization”, and “Modeling a Java 3D framework”. Her work asa Research Assistant includes object recognition, data retrieval, creating graphical user interfaces, imageprocessing, and volume rendering on interactive 3D displays such that a user has ability to both visualize aswell as interact with the 3D data to facilitate rotation and cross sectional analysis. She was also responsible forthe 3D rendering of data within a flat-file based climate data store and this article has been published in theNSF Compendium of Industry- Nominated Technology Breakthroughs of NSF Industry/University Cooper-ative Research Centers, 2012. She was also involved in building a 3D model for the setup of cameras,mounted on a plane to monitor the presence of sea turtles off the coast of Florida.

Oscar Figuerola Salas is a Computer Science Ph.D. Candidate in the Department of Computer & ElectricalEngineering and Computer Science at Florida Atlantic University (FAU) in Boca Raton, Florida. He alsoworks as a Research and Teaching Assistant at FAU. His research activity is focused on Social TV.Oscar Figuerola received his B.Sc. from the University of Castile-La Mancha (UCLM) in Albacete in 2011.Since graduation, he has worked on different research projects including: context understanding withMicrosoft’s Kinect, asset recognition using image descriptors, healthcare videoconferencing, and subjectivevideo quality testing.

Hari Kalva is an Associate Professor and the Director of the Multimedia Lab in the Department of Computer& Electrical Engineering and Computer Science at Florida Atlantic University (FAU).Dr. Kalva has over 20 years of experience in multimedia research, development, and standardization. He has

Multimed Tools Appl

made key contributions to the MPEG-4 Systems standard and also contributed to the DAVIC standardsdevelopment. His current research activities include video compression and video communication with recentemphasis on perceptual coding, HEVC, and SocialTV. His publication record includes 2 books, 8 bookchapters, 30 journal papers, 88 conference papers, and 10 granted patents. He is a recipient of the 2008 FAUResearcher of the Year Award and the 2009 ASEE Southeast New Faculty Research Award.Dr. Kalva received a Ph.D. and an M.Phil. in Electrical Engineering from Columbia University in 2000 and1999 respectively. He received an M.S. in Computer Engineering from Florida Atlantic University in 1994,and a B. Tech. in Electronics and Communications Engineering from N.B.K.R. Institute of Science andTechnology, S.V. University, Tirupati, India in 1991.

Borko Furht is a professor and chairman of the Department of Computer & Electrical Engineering andComputer Science at Florida Atlantic University (FAU) in Boca Raton, Florida. He is also Director of theNSF-sponsored Industry/University Cooperative Research Center for Advanced Knowledge Enablement.Before joining FAU, he was a vice president of research and a senior director of development at Modcomp(Ft. Lauderdale), a computer company of Daimler Benz, Germany, a professor at University of Miami in CoralGables, Florida, and a senior scientist in the Institute Boris Kidric-Vinca, Yugoslavia. Professor Furht receivedPh.D. degree in electrical and computer engineering from the University of Belgrade. His current research is inmultimedia systems, video coding and compression, 3D video and image systems, video databases, wirelessmultimedia, and Internet computing. He has been Principal Investigator and Co-PI of several multiyear,multimillion dollar projects—on Coastline Security Technologies, funded by the Department of Navy, I/UCRC Center funded by NSF, One Pass to Production funded by Motorola, NSF PIRE project on Global LivingLaboratory for Cyber Infrastructure Application Enablement, and High-Performance Computing grant fromNSF. He is the author of numerous books and articles in the areas of multimedia, Internet engineering,computer architecture, real-time computing, and operating systems. He is a founder and editor-in-chief of theJournal of Multimedia Tools and Applications (Springer, 1993). He recently founded and serves as Editor-in-Chief (jointly with Dr. Khoshgoftaar) of the International Journal of Big Data (Springer, 2013). His latestbooks include “Handbook of Data Intensive Computing” (Springer 2011), “Handbook of Augmented Reality”(Springer, 2011), “Handbook of Cloud Computing” (Springer 2010), “Handbook of Social Network Tech-nologies and Applications” (Springer 2010), “Handbook of Multimedia for Digital Entertainment and Arts”(Springer, 2009), Long Term Evolution: 3GPP LTE Radio and Cellular Technology“ (CRC Press, 2009),”Encyclopedia of Wireless and Mobile Communications, 2nd Edition“ (CRC Press, 2012), ”Handbook ofMedia Broadcasting“ (CRC Press, 2008), and ”Encyclopedia of Multimedia“ (Springer 2008). He hasreceived several technical and publishing awards, and has consulted for many high-tech companies includingIBM, Hewlett-Packard, Xerox, General Electric, JPL, NASA, Honeywell, and RCA, and has been an expertwetness for Cisco and Qualcomm. He has also served as a consultant to various colleges and universities. Hehas given many invited talks, keynote lectures, seminars, and tutorials. He serves on the Board of Directors ofseveral high-tech companies.

Multimed Tools Appl

Documents

Asset identification using image descriptors