

Cloud-based Object Recognition for Robots

Daniel LORENCIK¹, Jaroslav ONDO¹, Peter SINCAK¹, Hiroaki WAGATSUMA²

¹ Department of Cybernetics and Artificial Intelligence, Technical University of Kosice
{daniel.lorencik, jaroslav.ondo, peter.sincak}@tuke.sk

² Department of Human Intelligence Systems, Kyushu Institute of Technology, Kitakyushu, Japan
[email protected]

Abstract. This paper deals with the Cloud-based Robotics approach, which is strongly supported by new technologies in the area of cloud computing. We present an early implementation of a system for cloud-based object recognition. The primary purpose of the system is to provide object recognition as a service to a wide range of devices. The main benefits of using the cloud as a platform are easy future scalability and, above all, the sharing of already collected knowledge among all devices using the system. The system consists of a feature extraction part and a classification part: SIFT and SURF are used for feature extraction, and MF ArtMap for classification. The implementation of both parts is presented in more detail, together with preliminary results. We expect that Cloud Robotics and brain research for robots will merge into a functional system able to share and utilize common knowledge, as well as personalization, in the near future.

Keywords: cloud computing, cloud robotics, SIFT, SURF, MF ArtMap, brain-like systems

1 Introduction

Cloud computing was introduced to the IT domain many years ago; its impact on intelligent robotics came only recently, when the concept of Cloud Robotics entered the domain [1], [2]. We believe that Cloud Robotics should include the implementation of Artificial Intelligence on the cloud, and that this technology can bring major changes to core AI tasks such as pattern recognition, moving toward a continuously changing representation set for learning. The learning approach is expected to be incremental, and brain-like inspirations can play an important role in the resulting system. Crowdsourcing and multi-source information about brain functioning can improve the resulting accuracy of robotic intelligence.

2 Cloud-based Framework for Cloud Robotics

We have discussed the system proposal in greater detail in [3]. The proposed system is based on the notion of an AI Brick [4] – a well-defined system suited for one task, in this case object recognition. Since the system uses Microsoft Azure as the cloud platform, the inherited cloud capabilities will allow easy scaling in case of increased demand on the service, easy deployment of new versions (as the main logic is provided as a cloud service) and, most importantly, knowledge acquisition from and sharing among all connected clients.

An important feature of the system is that it places no special requirements on the devices that use it. The only requirements are the ability to capture images and to send them over an internet connection to the service.
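For illustration, a device-side client can be as small as the following sketch. This is a minimal example assuming a hypothetical HTTP endpoint; the implemented service is driven through an ASP.NET web page, and a REST-like API is only planned (see Section 2.1).

```python
# Minimal sketch of a client device; the endpoint URL and response format
# are assumptions for illustration, not the deployed service's actual API.
import requests

SERVICE_URL = "https://example-cfe.cloudapp.net/api/recognize"  # hypothetical

def recognize(image_path: str, extractor: str = "both") -> dict:
    """Upload a captured image and return the service's classification result."""
    with open(image_path, "rb") as f:
        response = requests.post(
            SERVICE_URL,
            files={"image": f},
            data={"extractor": extractor},  # "sift", "surf" or "both"
            timeout=60,
        )
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    print(recognize("capture.jpg"))
```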

2.1 Cloud Computing Platform and Technological Aspects

As already mentioned, the system is based on the PaaS (Platform as a Service) model [5] provided by Microsoft Azure. Since our system is intended to be a cloud service, we adopted the modular architecture of Azure cloud services: user interfaces are created as web roles hosted on virtual machines of variable computing power using ASP.NET, and background jobs are created as worker roles hosted on dedicated virtual servers of variable computing power. These interact through the Service Bus and queues.

The image data, as well as the descriptors extracted from them, are stored in blob storage.

To create a truly cloud-based service, we use No-SQL Azure Tables [6] instead of SQL-like databases for cross-referencing the image data, the extracted descriptors, and the classification data.
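A minimal sketch of this storage flow follows, written against today's Python Azure SDKs (azure-storage-blob, azure-storage-queue, azure-data-tables) rather than the .NET SDK the service is built on; the container, table, and queue names are assumptions made for illustration.

```python
# Sketch of the blob/table/queue flow; names like "images" and "imageindex"
# are assumed, and the containers/tables/queues are assumed to already exist.
import uuid
from azure.storage.blob import BlobServiceClient
from azure.storage.queue import QueueClient
from azure.data.tables import TableClient

CONN = "<storage connection string>"

def submit_image(image_bytes: bytes, extractor: str) -> str:
    image_id = str(uuid.uuid4())

    # 1. Store the raw image as a blob.
    blobs = BlobServiceClient.from_connection_string(CONN)
    blobs.get_blob_client("images", image_id).upload_blob(image_bytes)

    # 2. Cross-reference it in a No-SQL Azure Table.
    table = TableClient.from_connection_string(CONN, "imageindex")
    table.create_entity({"PartitionKey": extractor, "RowKey": image_id})

    # 3. Enqueue only the unique Id; worker roles fetch the blob themselves.
    queue = QueueClient.from_connection_string(CONN, f"{extractor}-queue")
    queue.send_message(image_id)
    return image_id
```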

From the high-level architecture proposal in Fig. 1, it can be seen that only the image is sent over the Internet to the cloud service as input data. The required preprocessing and feature extraction are done on the cloud. This approach creates a problem in terms of speed, as uploading an image is a time-consuming operation.

However, it is necessary in order to achieve the normalized feature space required for object classification, and it also makes the resulting service more widely available, as we do not require any special software on the device for communication. In the final stage of service development, we will implement a REST-like API for use in other scenarios (in line with the AI Brick notion).

Such a service can then be utilized in many applications, most notably in cloud robotics. An example is the RoboEarth project [7], which is able to use existing cloud image recognition services such as Google Goggles [8].

Fig. 1. High level architecture of the proposed system [3]

2.2 Research Approaches Used in the Proposal

Image processing is an important part of information acquisition for robots. In image processing, a feature space can take many forms; we have chosen spectral and derived descriptors as features for the pattern recognition procedure. We use SIFT (Scale-Invariant Feature Transform) [9] and SURF (Speeded-Up Robust Features) [10] for feature extraction, and Membership Function ArtMap [11]–[13] and a Gaussian classifier for object classification. The main research goal of our work is to adapt these approaches to the cloud environment and to find out which extractor-classifier combination provides the best results.
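As a concrete illustration of the two extractors, the following sketch uses OpenCV's Python bindings (this is illustrative, not the service code). Note that SURF is patent-encumbered and requires an opencv-contrib build compiled with non-free modules enabled.

```python
# Minimal sketch of SIFT/SURF descriptor extraction with OpenCV.
import cv2

def extract_descriptors(image_path: str, method: str = "sift"):
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    if method == "sift":
        detector = cv2.SIFT_create()
    else:  # "surf": available only in opencv-contrib non-free builds
        detector = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
    keypoints, descriptors = detector.detectAndCompute(gray, None)
    # SIFT yields 128-dimensional descriptors, SURF 64-dimensional; the number
    # of keypoints varies per image (the issue addressed in Section 4).
    return keypoints, descriptors
```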

We chose these two classifiers because MF ArtMap represents a model-free classifier, whereas the Gaussian represents a model-dependent classifier. One of our research goals is to compare the two methods.

We anticipate a challenge in adapting the classifier methods to the cloud environment. The goal of the proposed system is to provide a stable service for all connected devices regardless of their actual number; in other words, the service has to be scalable. In terms of cloud computing, this means that the virtual machines forming the underlying infrastructure of the service can be rebooted, shut down, or started at any time, so the system itself has to be built to reflect these conditions.

We also compare these classification methods to simple matching, which can prove faster under certain conditions (up to a certain number of entries in the table storage).

Another anticipated challenge is how to work effectively with the large sets of data we expect to amass during the course of the experiments and the eventual publication of the service for public use.

As the classifier to be used in the proof-of-concept experiment, we considered, based on previous experience, one of the ART family of neural networks, more precisely the ArtMap [14], [15] subgroup. These networks can be trained using supervised learning. Finally, the MF (Membership Function) ArtMap [13], [16] neural network was chosen as the classifier. This type of neural network combines the theory of fuzzy sets with ART theory. The consequence of this combination is a structured output consisting of the computed membership function values of every found fuzzy cluster of every known class for the given input. This way, it is possible to compute how much the input belongs to every class. The input is classified into the class represented by the winner fuzzy cluster, i.e., the cluster in the output vector whose membership function value is maximal.
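The following toy sketch (not the authors' implementation) shows how such a structured output is read: the input's degree of belonging to a class is taken as the maximal membership over that class's fuzzy clusters, and the class of the overall winner cluster is returned.

```python
# Illustrative readout of MF ArtMap's structured output: one membership value
# per fuzzy cluster per class; the winner is the cluster of maximal membership.
import numpy as np

def classify(memberships: dict[str, np.ndarray]) -> tuple[str, float]:
    """memberships maps class name -> memberships of each of its fuzzy clusters."""
    # Degree of belonging to a class = its best (maximal) cluster membership.
    per_class = {cls: float(m.max()) for cls, m in memberships.items()}
    winner = max(per_class, key=per_class.get)
    return winner, per_class[winner]

# Example: the input belongs 0.9 to the best cluster of "cup", 0.4 to "logo".
print(classify({"cup": np.array([0.2, 0.9]), "logo": np.array([0.4, 0.1])}))
```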

3 Cloud-based Image Classification – Software as a Service

We divided the service into two parts: the Cloud-based Feature Extraction (CFE) and the Cloud-based Classification (CCL). In this section we describe the feature extraction part, which we have already implemented as a service; in the following text we will refer to it by the abbreviation CFE.

Fig. 2. CFE architectures overview. On the left (a) is architecture version 1, on the right (b) architecture version 2.

3.1 CFE Architecture Version 1

Our first architecture design used dedicated roles for extraction and for image preprocessing. Since the image preprocessing is the same for both extractors, it seemed only fitting to have it scale automatically based on the actual load; for the same reason, a separate worker role was created for each extractor. Communication with the user was handled by another web role.

Inter-role communication was implemented with Azure Queues, which have less overhead and are faster than Service Bus Queues. In each queue message, we send only the unique identification of the image.

The workflow in this architecture was as follows:

1. The user uploads the image via the web page and chooses the extractor (SIFT, SURF, or both).
2. The image is stored in blob storage, and the unique Id of the image is put into the queue for image preprocessing.
3. The image preprocessing role accesses the image and overwrites it with a normalized version (scaled down if too big, and converted to shades of gray). The Id of the preprocessed image is then put into the queues of the selected extractor services.
4. The extractor role accesses the image in storage by its unique Id and extracts the local features, which are stored in blob storage under the extractor prefix and image Id. The image and its extracted features are also written to the Azure Table, in which the relations between objects are kept.
5. The web page with the result table is updated and shows the uploaded preprocessed image along with the extracted features (available as an XML document).

The schema of the architecture can be seen on the left side of Fig. 2.
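A worker role in this pipeline is essentially a poll loop over its queue. The sketch below shows the shape of such a role in Python (the real roles are .NET); the 2-second sleep matches the setting reported in Section 3.3, and the container names are assumptions.

```python
# Hedged sketch of an extractor worker role's poll loop (Python Azure SDK).
import time
from azure.storage.blob import BlobServiceClient
from azure.storage.queue import QueueClient

CONN = "<storage connection string>"
SLEEP_SECONDS = 2  # sleep cycle used in the experiments (Section 3.3)

def run_worker(queue_name: str, extract):
    """extract(image_bytes) -> serialized feature bytes (extractor-specific)."""
    queue = QueueClient.from_connection_string(CONN, queue_name)
    blobs = BlobServiceClient.from_connection_string(CONN)
    while True:
        msg = queue.receive_message()
        if msg is None:
            time.sleep(SLEEP_SECONDS)  # the idle wait visible in Tables 1-2
            continue
        image_id = msg.content
        image = blobs.get_blob_client("images", image_id).download_blob().readall()
        features = extract(image)
        # Store features under the extractor prefix and image Id.
        blobs.get_blob_client("features", f"{queue_name}-{image_id}") \
             .upload_blob(features)
        queue.delete_message(msg)
```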

This architecture had a drawback in terms of speed, as can be seen in Table 1 and Table 2.

3.2 CFE Architecture Version 2

In the second architecture design, we made changes to speed up the feature extraction process. As can be seen from Table 1 and Table 2, there is a significant period when the service is literally doing nothing: it just waits for the sleep cycle to complete before checking the queue for new messages. Since architecture 1 used three queues (one feeding the other two through the image preprocessing role), we decided to merge the image preprocessing into the extraction roles, thereby eliminating the first queue and one worker role. This decision was also supported by the fact that image preprocessing was the least time-consuming operation in the cycle.
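The inlined normalization step can be sketched as follows; the paper specifies only "scaled down if too big, and set to shades of gray", so the size cap used here is an assumed value.

```python
# Sketch of the normalization step inlined into each extractor role.
import cv2

MAX_PIXELS = 2_000_000  # assumed cap; the paper does not state the threshold

def preprocess(image):
    # Convert to shades of gray.
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Scale down if too big, preserving the aspect ratio.
    h, w = gray.shape
    if h * w > MAX_PIXELS:
        scale = (MAX_PIXELS / (h * w)) ** 0.5
        gray = cv2.resize(gray, (int(w * scale), int(h * scale)))
    return gray
```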

With the elimination of one worker role, the workflow in architecture 2 changed:

1. The user uploads the image via the web page and chooses the extractor (SIFT, SURF, or both).
2. The image is stored in blob storage, and the unique Id of the image is put into the queue of the selected extractor.
3. The extractor role accesses the image in storage by its unique Id, preprocesses it, and extracts the local features, which are stored in blob storage under the extractor prefix and image Id. The image and its extracted features are also written to the Azure Table, in which the relations between objects are kept.
4. The web page with the result table is updated and shows the uploaded preprocessed image along with the extracted features (available as an XML document).

The schema of the architecture can be seen on the right side of Fig. 2.

This architecture was quicker than the first. The measurement results can be seen in Table 3 and Table 4; the speed-up is between 18 and 32%. We are currently optimizing the code to further speed up the extraction process.
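One way to recover the quoted range (our reading; the paper does not spell out the calculation) is from the average cloud extraction times in Tables 2 and 4, taking the relative speed-up as $1 - t_2/t_1$:

```latex
\text{SIFT: } 1 - \frac{686.7257}{1007.3908} \approx 31.8\,\%,
\qquad
\text{SURF: } 1 - \frac{1358.1596}{1690.6716} \approx 19.7\,\%
```

Both values fall inside the stated 18-32% interval.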

3.3 Measured Speed Results for the CFE Architectures

For testing, we used 20 images of varying size and complexity, the smallest with a resolution of 0.16 MPx (megapixels) and the biggest with 10.84 MPx. Five of the images were above Full HD resolution. The batch of images can be considered small, but at this stage we use it only to validate the design and roughly tune the speed of the service. After deployment, the testing will be more rigorous, with a bigger sample size.

We also measured the cloud service running in the local emulator so that we could compare the two environments. However, even in the local emulator we used live (unemulated) cloud storage; only the roles were run locally.

The infrastructure used Small compute instances for all roles, and the sleep cycle for the worker roles was set to 2 seconds. We will experiment with these settings in later stages of the research.

The following tables show the measured values of the time taken by the service. The "Time for user" column shows the time between clicking the upload button and the result appearing on the page. The "Sum of time taken by tasks" column shows the total time actually consumed by the roles to compute the result. The last two columns show the time for extracting the local features and storing them.

Table 1. Measurements of the CFE architecture 1 - speed on the local emulator

         Time for user   Sum of time taken by tasks [s]   SIFT extraction [ms]   SURF extraction [ms]
min      0:00:02                2.0450                         435.1242               710.4513
max      0:04:39               21.4839                        8196.7643             12731.7996
Average  0:00:20                5.0472                        1860.4650              2287.6808
Median   0:00:05                3.3764                         896.0543              1229.4251

Table 2. Measurements of the CFE architecture 1 - speed on the cloud environment

         Time for user   Sum of time taken by tasks [s]   SIFT extraction [ms]   SURF extraction [ms]
min      0:00:01                0.9759                         197.9281               354.9850
max      0:00:15               11.9751                        5967.6276              8374.6194
Average  0:00:04                2.6114                        1007.3908              1690.6716
Median   0:00:03                1.7334                         473.1524              1074.7474

Table 3. Measurements of the CFE architecture 2 - speed on the local emulator

         Time for user   Sum of time taken by tasks [s]   SIFT extraction [ms]   SURF extraction [ms]
min      0:00:00                1.3733                         170.0301               211.0119
max      0:00:10               12.2926                        3121.1854              5058.4078
Average  0:00:03                3.0056                         632.8390               942.6149
Median   0:00:02                2.3446                         369.7710               586.2928

Table 4. Measurements of the CFE architecture 2 - speed on the cloud environment

         Time for user   Sum of time taken by tasks [s]   SIFT extraction [ms]   SURF extraction [ms]
min      0:00:00                0.7403                         156.2778               169.7089
max      0:00:11               10.0929                        3578.1217              7811.9474
Average  0:00:03                1.9533                         686.7257              1358.1596
Median   0:00:02                1.3111                         349.4731               788.2077

4 Cloud-based MF ArtMap Classifier

The second part of the proposed system is the classifier (CCL), implemented as Software as a Service. Once the image's descriptors are extracted, they are propagated into the classifier, which classifies the object in the picture into one of the known classes, or creates a new class if the object fits none of them.

As described in Section 2.2, from the group of ART neural networks we chose the ArtMap [14], [15] subgroup, which can be trained using supervised learning, and within it the MF (Membership Function) ArtMap [13], [16] neural network as the classifier. It combines the theory of fuzzy sets with ART theory, producing a structured output of the membership function values of every found fuzzy cluster of every known class, so that it is possible to compute how much the input belongs to every class.

We implemented the MF ArtMap neural network classifier as a separate cloud service. This makes the proposed system more modular and allows any combination of classifier and image descriptor extractor to reach the best results. The MF ArtMap neural network is implemented as a data structure: all values of the MF ArtMap classifier, together with the trained classes, the relevant clusters, and their settings, are stored in a cloud table in the cloud data store.
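As an illustration of this data-structure representation, a trained fuzzy cluster could be persisted as an Azure Table entity along the following lines (Python azure-data-tables SDK; the entity schema and table name are our assumptions, not the authors' actual layout):

```python
# Illustrative persistence of one fuzzy cluster; fields are assumed.
import json
from azure.data.tables import TableClient

CONN = "<storage connection string>"

def save_cluster(class_name: str, cluster_id: str, center, spread):
    table = TableClient.from_connection_string(CONN, "mfartmap")
    table.upsert_entity({
        "PartitionKey": class_name,          # one partition per trained class
        "RowKey": cluster_id,                # one row per fuzzy cluster
        "Center": json.dumps(list(center)),  # cluster prototype vector
        "Spread": json.dumps(list(spread)),  # membership-function widths
    })
```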

Fig. 3. Graphical description of the training problem

During the experiments, we encountered a problem with training on new images (Fig. 3). The extractor service (CFE) extracts a different number of descriptors for every input image; this number depends on factors such as the size of the input image, the number of detected key points, etc. At the same time, the MF ArtMap neural network expects an input vector of constant dimension. We therefore decided to train the MF ArtMap network sequentially, treating every descriptor as a separate input. Once all descriptors of the input image have been propagated through the network, we obtain a vector of membership values of the input descriptors to all clusters and all classes. At this point we can statistically classify the input image into one of the known classes, or create a new class if no match is found.
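A toy sketch of this sequential scheme is given below. Each descriptor is propagated separately and its winner cluster's class is collected; the final decision here is taken by majority vote, which is one plausible reading of "statistically classify", and the acceptance threshold is an assumed parameter.

```python
# Illustrative per-descriptor classification with a majority-vote readout.
from collections import Counter

NEW_CLASS_THRESHOLD = 0.5  # assumed minimal membership to accept a match

def classify_image(descriptors, propagate):
    """propagate(d) -> (winner_class, membership) for a single descriptor."""
    winners = []
    for d in descriptors:
        cls, membership = propagate(d)
        if membership >= NEW_CLASS_THRESHOLD:
            winners.append(cls)
    if not winners:
        return None  # no match: the caller creates a new class
    return Counter(winners).most_common(1)[0][0]
```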

Fig. 4. Modification of MF ArtMap topology for sequential input

The described solution required a modification of the MF ArtMap topology, shown in Fig. 4: a layer called the stack of winner fuzzy clusters has been added. As a consequence, the output of the neural network is no longer a single winning fuzzy cluster determining the input class, but a set of winner fuzzy clusters. After all descriptors have been propagated through the first three layers, the content of the stack is propagated to the output layer, where the winner fuzzy clusters are statistically evaluated and the class of the input set of descriptors is determined.

4.1 Proof of Concept Experiment

In our experiments with MF ArtMap on the cloud, we created the architecture shown in Fig. 5. The NAO robot captures an image and sends it to the control application on a computer. This Windows Forms application relays the image to the cloud service for processing: the features are extracted by the CFE service and passed to the MF ArtMap classifier. The classification result is then sent back to the control application, which relays it to the robot. After successful classification, the robot says the resulting class of the object in the captured image.

Fig. 5. High-level architecture of the system used as a proof of concept
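The relaying control application can be sketched as follows. The NAOqi calls (ALProxy, ALTextToSpeech.say) are the real Python API of the NAO platform; the service URL and the response field are assumptions, and image capture from the robot is omitted for brevity.

```python
# Sketch of the relay step: forward the NAO's image to the cloud service and
# speak the returned class name. Endpoint and response field are hypothetical.
import requests
from naoqi import ALProxy  # NAOqi Python SDK

SERVICE_URL = "https://example-cfe.cloudapp.net/api/recognize"  # hypothetical
NAO_IP, NAO_PORT = "192.168.1.10", 9559  # example robot address

def relay(image_bytes):
    result = requests.post(SERVICE_URL, files={"image": image_bytes}, timeout=60)
    result.raise_for_status()
    class_name = result.json()["class"]  # assumed response field
    tts = ALProxy("ALTextToSpeech", NAO_IP, NAO_PORT)
    tts.say("I see a " + class_name)
```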

The experiments were done on two sets of images. Set 1 consisted of logos and simple objects; set 2 contained images of more complex objects. Both sets were divided 60/40 between the learning and testing phases. For comparison, we used different types of features: SIFT, SURF, and spectral RGB features of the image. The results of the experiments are shown in Table 5. The basic intention was to observe the behavior of the CFE on different types of images and how the classification accuracy could be influenced by the number of features identified on them. The number of clusters and the generalization ability of the MF ArtMap classifier were also observed and taken into consideration. The incrementality of the MF ArtMap classification approach is a significant advantage, since additional classes do not require retraining of the neural network but are simply processed incrementally in the feature space.

Table 5. Proof of concept - results of the classification using two sets of data

                                        SET 1                      SET 2
                               SIFT     SURF     RGB      SIFT     SURF     RGB
Classification precision
  Training set                100.0%   100.0%   90.0%    100.0%   100.0%   91.2%
  Testing set                  70.0%    65.0%   70.0%     65.2%    65.2%   56.5%
  Representative set           85.0%    82.5%   80.0%     82.6%    82.6%   73.9%
Number of found clusters       2161     3075     798      2165     2895     681
Generalization of Neural Net   0.223    0.491    0.999    0.149    0.423    0.998

The classification results above represent the average classification rate, previously evaluated in more detail using contingency tables. The number of clusters and the generalization are correlated, since the ideal case is to have few clusters in the classification, although this also depends on the processed data.

5 Cloud-based Robotics with Brain-like Approaches

Our intermediate goal is to use the gained knowledge to implement MF ArtMap as a service. The proof of concept presented in this paper used the basic MF ArtMap structure, which was not modified for cloud infrastructure. For the cloud version, the synaptic weights of MF ArtMap will have to be stored separately. This will allow easy duplication of the trained neural network, or moving the application to a more powerful cloud server should demand require it. Scaling will then be done by the platform without human intervention, providing robustness to the object recognition service. MF ArtMap will also need to be adapted further for the task of object recognition using feature descriptors, as the number of descriptors varies per object. The proof of concept used batch learning, which produced rather unsatisfying results; we are therefore working on a modification of the MF ArtMap input layer to allow inputting all descriptors at once.

In the near future, we plan to add to this framework brain-like approaches, mainly from the repository maintained by the PhysioDesigner project [17]. We believe that implementing hybrid approaches, combining selected methods of Artificial or Computational Intelligence with brain-like and more biologically inspired systems, can lead to more accurate results in the cloud-based framework for robots. The current testing platform is the NAO humanoid robot, and we expect to extend this activity to the Pepper humanoid platform next year.

6 Conclusion

We have presented results of a cloud-based system for object recognition usable by the NAO humanoid robot. We believe that further work on this approach can be useful for multi-robot platforms, and we expect hybridization of classical AI approaches with brain-like approaches for the benefit of cloud-based robotic intelligence. We expect problems with the standardization of databases for intelligence, including the fact that domain-oriented knowledge will be preferable and easier to implement than universal knowledge. The learning procedure is also expected to be incremental and domain-oriented; we do not think a universal learning approach will succeed in the near future.

Acknowledgment: This paper is the result of the project implementation: University Science Park TECHNICOM for Innovation Applications Supported by Knowledge Technology, ITMS: 26220220182, supported by the Research & Development Operational Programme funded by the ERDF.

References

[1] J. J. Kuffner, "Cloud-enabled robots," in IEEE-RAS International Conference on Humanoid Robotics, 2010.
[2] G. Mohanarajah, D. Hunziker, R. D'Andrea, and M. Waibel, "Rapyuta: A Cloud Robotics Platform," IEEE Trans. Autom. Sci. Eng., pp. 1-13, 2014.
[3] D. Lorenčík, M. Tarhaničová, and P. Sinčák, "Cloud-Based Object Recognition: A System Proposal," in Robot Intelligence Technology and Applications 2, vol. 274, J.-H. Kim, E. T. Matson, H. Myung, P. Xu, and F. Karray, Eds. Cham: Springer International Publishing, 2014, pp. 707-715.
[4] T. Ferraté, "Cloud Robotics - new paradigm is near," Robotica Educativa y Personal, 20-Jan-2013.
[5] P. Mell and T. Grance, "The NIST Definition of Cloud Computing: Recommendations of the National Institute of Standards and Technology," NIST Spec. Publ., vol. 145, p. 7, 2011.
[6] J. Giardino, J. Haridas, and B. Calder, "How to get most out of Windows Azure Tables." [Online]. Available: http://blogs.msdn.com/b/windowsazurestorage/archive/2010/11/06/how-to-get-most-out-of-windows-azure-tables.aspx.
[7] "RoboEarth Project." [Online]. Available: http://www.roboearth.org/. [Accessed: 20-Mar-2014].
[8] "Google Goggles." [Online]. Available: http://www.google.com/mobile/goggles/#text. [Accessed: 20-Mar-2014].
[9] D. G. Lowe, "Object recognition from local scale-invariant features," in Proceedings of the Seventh IEEE International Conference on Computer Vision, 1999, pp. 1150-1157, vol. 2.
[10] H. Bay, T. Tuytelaars, and L. Van Gool, "SURF: Speeded Up Robust Features," in European Conference on Computer Vision, 2006, pp. 404-417.
[11] A. Bodnárová, "The MF-ARTMAP neural network," in Latest Trends in Applied Informatics and Computing, 2012, pp. 264-269.
[12] P. Smolár, "Object Categorization using ART Neural Networks," Technical University of Kosice, 2012.
[13] P. Sinčák, M. Hric, and J. Vaščák, "Membership Function-ARTMAP Neural Networks," TASK Q., vol. 7, no. 1, pp. 43-52, 2003.
[14] G. A. Carpenter, "Default ARTMAP," Boston, 2003.
[15] N. Kopco, P. Sincak, and S. Kaleta, "ARTMAP Neural Networks for Multispectral Image Classification," J. Adv. Comput. Intell., vol. 4, no. 4, pp. 240-245, 2000.
[16] P. Sincak, M. Hric, and J. Vascak, "Neural Network Classifiers based on Membership Function ARTMAP," in Systematic Organisation of Information in Fuzzy Systems, P. Melo-Pinto, H.-N. Teodorescu, and T. Fukuda, Eds. IOS Press, 2003, pp. 321-333.
[17] "PhysioDesigner." [Online]. Available: http://physiodesigner.org/.