13
Personalized QoS-Aware Web Service Recommendation and Visualization Xi Chen, Member, IEEE, Zibin Zheng, Member, IEEE, Xudong Liu, Zicheng Huang, and Hailong Sun, Member, IEEE Abstract—With the proliferation of web services, effective QoS-based approach to service recommendation is becoming more and more important. Although service recommendation has been studied in the recent literature, the performance of existing ones is not satisfactory, since 1) previous approaches fail to consider the QoS variance according to users’ locations; and 2) previous recommender systems are all black boxes providing limited information on the performance of the service candidates. In this paper, we propose a novel collaborative filtering algorithm designed for large-scale web service recommendation. Different from previous work, our approach employs the characteristic of QoS and achieves considerable improvement on the recommendation accuracy. To help service users better understand the rationale of the recommendation and remove some of the mystery, we use a recommendation visualization technique to show how a recommendation is grouped with other choices. Comprehensive experiments are conducted using more than 1.5 million QoS records of real-world web service invocations. The experimental results show the efficiency and effectiveness of our approach. Index Terms—Service recommendation, QoS, collaborative filtering, self-organizing map, visualization Ç 1 INTRODUCTION W EB services are software components designed to support interoperable machine-to-machine interac- tion over a network. The adoption of web services as a delivery mode in business has fostered a new paradigm shift from the development of monolithic applications to the dynamic setup of business process. In recent years, web services have attracted wide attentions from both industry and academia, and the number of public web services is steadily increasing. When implementing service-oriented applications, ser- vice engineers (also called service users) usually get a list of web services from service brokers or search engines that meet the specific functional requirements. They need to identify the optimal one from the functionally equivalent candidates. However, it is difficult to select the best performing one, since service users usually have limited knowledge of their performance. Effective approaches to service selection and recommendation are urgently needed. Quality-of-Service (QoS) is widely employed to represent the nonfunctional performance of web services and has been considered as the key factor in service selection [33], [34], [35]. QoS is defined as a set of user-perceived properties including response time, availability, reputation, etc. Currently, it’s not practical for users to acquire QoS information by evaluating all the service candidates, since conducting real-world web service invocations is time- consuming and resource-consuming. Moreover, some QoS properties (e.g., reputation and reliability) are difficult to be evaluated, since long-duration observation and a number of invocations are required. Besides client-side evaluation, it’s impractical to acquire QoS information from service providers or third-party communities, because service QoS performance is susceptible to the uncertain Internet envir- onment and user context (e.g., user location, user network condition, etc.). Therefore, different users may observe quite different QoS performance of the same web service, and QoS values evaluated by one user cannot be used directly by another in service selection and recommendation. The objective of this paper is to make personalized QoS- based web service recommendations for different users and thus help them select the optimal one among the functional equivalents. Several previous work [19], [20], [22], [29] has applied collaborative filtering (CF) to web service recom- mendation. These CF-based web service recommender systems work by collecting user observed QoS records of different web services and matching together users who share the same information needs or same tastes. Users of a CF system share their judgments and opinions on web services, and in return, the system provides useful personalized recommendations. However, three unsolved problems of the previous work affect the performance of current service recommender systems. The first problem is that the existing approaches fail to recognize the QoS variation with users’ physical locations. After the analysis of a real-world web service data set, 1 which contains 1.5 million service invocation results of 100 public services evaluated by users from more than 20 countries, we IEEE TRANSACTIONS ON SERVICES COMPUTING, VOL. 6, NO. 1, JANUARY-MARCH 2013 35 . X. Chen is with the School of Computer Science and Engineering, Beihang University, 1 Qu 2-806 Anli Road No. 58, ZhongCanYuan, Chaoyang District, Beijing 100012, China. E-mail: [email protected]. . Z. Zheng is with the Department of Computer Science and Engineering, The Chinese University of HongKong, Shatin, HSB 101, Hong Kong, China. E-mail: [email protected]. . X. Liu, Z. Huang, and H. Sun are with the School of Computer Science and Engineering, Beihang University, ACT, Xueyuan Road 37, Haidian District, Beijing 100191, China. E-mail: {liuxd, huangzc, sunhl}@act.buaa.edu.cn. Manuscript received 16 Aug. 2010; revised 24 Jan. 2011; accepted 4 May 2011; published online 14 June 2011. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TSC-2010-08-0115. Digital Object Identifier no. 10.1109/TSC.2011.35. 1. http://www.wsdream.net. 1939-1374/13/$31.00 ß 2013 IEEE Published by the IEEE Computer Society

IEEE TRANSACTIONS ON SERVICES COMPUTING, VOL. 6, NO. 1, …act.buaa.edu.cn › hsun › papers › tsc13-recommend.pdf · Personalized QoS-Aware Web Service Recommendation and Visualization

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: IEEE TRANSACTIONS ON SERVICES COMPUTING, VOL. 6, NO. 1, …act.buaa.edu.cn › hsun › papers › tsc13-recommend.pdf · Personalized QoS-Aware Web Service Recommendation and Visualization

Personalized QoS-Aware Web ServiceRecommendation and Visualization

Xi Chen, Member, IEEE, Zibin Zheng, Member, IEEE, Xudong Liu,

Zicheng Huang, and Hailong Sun, Member, IEEE

Abstract—With the proliferation of web services, effective QoS-based approach to service recommendation is becoming more and

more important. Although service recommendation has been studied in the recent literature, the performance of existing ones is not

satisfactory, since 1) previous approaches fail to consider the QoS variance according to users’ locations; and 2) previous

recommender systems are all black boxes providing limited information on the performance of the service candidates. In this paper, we

propose a novel collaborative filtering algorithm designed for large-scale web service recommendation. Different from previous work,

our approach employs the characteristic of QoS and achieves considerable improvement on the recommendation accuracy. To help

service users better understand the rationale of the recommendation and remove some of the mystery, we use a recommendation

visualization technique to show how a recommendation is grouped with other choices. Comprehensive experiments are conducted

using more than 1.5 million QoS records of real-world web service invocations. The experimental results show the efficiency and

effectiveness of our approach.

Index Terms—Service recommendation, QoS, collaborative filtering, self-organizing map, visualization

Ç

1 INTRODUCTION

WEB services are software components designed tosupport interoperable machine-to-machine interac-

tion over a network. The adoption of web services as adelivery mode in business has fostered a new paradigmshift from the development of monolithic applications tothe dynamic setup of business process. In recent years, webservices have attracted wide attentions from both industryand academia, and the number of public web services issteadily increasing.

When implementing service-oriented applications, ser-vice engineers (also called service users) usually get a list ofweb services from service brokers or search engines thatmeet the specific functional requirements. They need toidentify the optimal one from the functionally equivalentcandidates. However, it is difficult to select the bestperforming one, since service users usually have limitedknowledge of their performance. Effective approaches toservice selection and recommendation are urgently needed.

Quality-of-Service (QoS) is widely employed to representthe nonfunctional performance of web services and hasbeen considered as the key factor in service selection [33],[34], [35]. QoS is defined as a set of user-perceived

properties including response time, availability, reputation,etc. Currently, it’s not practical for users to acquire QoSinformation by evaluating all the service candidates, sinceconducting real-world web service invocations is time-consuming and resource-consuming. Moreover, some QoSproperties (e.g., reputation and reliability) are difficult to beevaluated, since long-duration observation and a number ofinvocations are required. Besides client-side evaluation, it’simpractical to acquire QoS information from serviceproviders or third-party communities, because service QoSperformance is susceptible to the uncertain Internet envir-onment and user context (e.g., user location, user networkcondition, etc.). Therefore, different users may observe quitedifferent QoS performance of the same web service, andQoS values evaluated by one user cannot be used directlyby another in service selection and recommendation.

The objective of this paper is to make personalized QoS-based web service recommendations for different users andthus help them select the optimal one among the functionalequivalents. Several previous work [19], [20], [22], [29] hasapplied collaborative filtering (CF) to web service recom-mendation. These CF-based web service recommendersystems work by collecting user observed QoS records ofdifferent web services and matching together users whoshare the same information needs or same tastes. Users of aCF system share their judgments and opinions on webservices, and in return, the system provides usefulpersonalized recommendations. However, three unsolvedproblems of the previous work affect the performance ofcurrent service recommender systems. The first problemis that the existing approaches fail to recognize the QoSvariation with users’ physical locations. After the analysisof a real-world web service data set,1 which contains1.5 million service invocation results of 100 public servicesevaluated by users from more than 20 countries, we

IEEE TRANSACTIONS ON SERVICES COMPUTING, VOL. 6, NO. 1, JANUARY-MARCH 2013 35

. X. Chen is with the School of Computer Science and Engineering, BeihangUniversity, 1 Qu 2-806 Anli Road No. 58, ZhongCanYuan, ChaoyangDistrict, Beijing 100012, China. E-mail: [email protected].

. Z. Zheng is with the Department of Computer Science and Engineering,The Chinese University of HongKong, Shatin, HSB 101, Hong Kong,China. E-mail: [email protected].

. X. Liu, Z. Huang, and H. Sun are with the School of Computer Science andEngineering, Beihang University, ACT, Xueyuan Road 37, HaidianDistrict, Beijing 100191, China.E-mail: {liuxd, huangzc, sunhl}@act.buaa.edu.cn.

Manuscript received 16 Aug. 2010; revised 24 Jan. 2011; accepted 4 May2011; published online 14 June 2011.For information on obtaining reprints of this article, please send e-mail to:[email protected], and reference IEEECS Log Number TSC-2010-08-0115.Digital Object Identifier no. 10.1109/TSC.2011.35. 1. http://www.wsdream.net.

1939-1374/13/$31.00 � 2013 IEEE Published by the IEEE Computer Society

Page 2: IEEE TRANSACTIONS ON SERVICES COMPUTING, VOL. 6, NO. 1, …act.buaa.edu.cn › hsun › papers › tsc13-recommend.pdf · Personalized QoS-Aware Web Service Recommendation and Visualization

discover that some QoS properties like response time andavailability highly relate to the users’ physical locations. Forexample, the response time of a service observed by userswho are closely located with each other usually fluctuatesmildly around a certain value, while it sometimes variessignificantly between users far away from each other.

The second problem is the online time complexity ofmemory-based CF recommender systems [20], [29]. Theincreasing number of web services and users will pose agreat challenge to current systems. With O(mn) timecomplexity where m is the number of services and n thenumber of users, existing systems cannot generate recom-mendations for tens of thousands users in real time.

The last problem is that current web service recommen-der systems are all black boxes, providing a list of rankedweb services with no transparency into the reasoningbehind the recommendation results [19], [20], [22], [29], [36].It is less likely for users to trust a recommendation whenthey have no knowledge of the underlying rationale. Theopaque recommendation approaches prevent the accep-tance of the recommended services. Herlocker et al. [5]mention that explanation capabilities is an important wayof building trust in recommender systems, since users aremore likely to trust a recommendation when they know thereason behind it.

To address the first two problems, we propose aninnovative CF algorithm for QoS-based web service recom-mendation. To address the third problem and enable animproved understanding of the web service recommenda-tion rationale, we provide a personalized map for browsingthe recommendation results. The map explicitly shows theQoS relationships of the recommended web services as wellas the underlying structure of the QoS space by using mapmetaphor such as dots, areas, and spatial arrangement.

The main contributions of this work are threefold:

. First, we combine the model-based and memory-based CF algorithms for web service recommenda-tion, which significantly improves the recommenda-tion accuracy and time complexity compared withprevious service recommendation algorithms.

. Second, we design a visually rich interface to browsethe recommended web services, which enables abetter understanding of the service performance.

. Finally, we conduct comprehensive experiments toevaluate our approach by employing real-world webservice QoS data set. More than 1.5 millions real-world web service QoS records from more than20 countries are used in our experiments.

The remainder of this paper is organized as follows:Section 2 describes the recommendation approach in detail.Section 3 presents the method for recommendation visua-lization. Sections 4 and 5 show the experiments of therecommendation and visualization, respectively. Section 6discusses the related work, and Section 7 concludes thepaper with a summary and a description of future work.

2 THE RECOMMENDATION APPROACH

2.1 A Motivating Scenario

In this section, we present an online service searchingscenario to show the research problem of this paper. As Fig. 1

depicts, Alice is a software engineer working in India. Sheneeds an email validation service to filter emails. Aftersearching a service registry located in US, she gets a list ofrecommended services in ascending order of the serviceaverage response time. Alice tries the first two servicesprovided by a Canadian company and finds that theresponse time is much higher than her expectation. She thenrealizes that the service ranking is based on the evaluationconducted by the registry in US, and the response time of thesame service may vary greatly due to the different usercontext, such as user location, user network conditions, etc.Alice then turns to her colleagues in India for suggestion.They suggest her try service k provided by a local companythough ranked lower in the previous recommendation list.After trying it, Alice thinks that service k has a goodperformance and meets her requirements.

The problem that Alice faces is to find a service thatmeets both functional and nonfunctional requirements. Thecurrent way of finding a suitable web service is ratherinefficient, since Alice needs to try the recommendedservices one by one. To address this challenge, we proposea more accurate approach to service recommendation withconsideration of the region factor. Moreover, we try toprovide a more informative and user-friendly interface forbrowsing the recommendation results rather than a rankedlist. By this way, users are able to know more about theoverall performance of the recommended services, and thustrust the recommendations.

The basic idea of our approach is that users closelylocated with each other are more likely to have similarservice experience than those who live far away from eachother. Inspired by the success of Web 2.0 websites thatemphasize information sharing, collaboration, and interac-tion, we employ the idea of user-collaboration in our webservice recommender system. Different from sharing in-formation or knowledge on blogs or wikis, users areencouraged to share their observed web service QoSperformance with others in our recommender system. Themore QoS information the user contributes, the moreaccurate service recommendations the user can obtain,since more user characteristics can be analyzed from theuser contributed information.

Based on the collected QoS records, our recommendationapproach is designed as a two-phase process. In the firstphase, we divide the users into different regions based ontheir physical locations and historical QoS experience onweb services. In the second phase, we find similar users for

36 IEEE TRANSACTIONS ON SERVICES COMPUTING, VOL. 6, NO. 1, JANUARY-MARCH 2013

Fig. 1. Alice’s situational problem.

Page 3: IEEE TRANSACTIONS ON SERVICES COMPUTING, VOL. 6, NO. 1, …act.buaa.edu.cn › hsun › papers › tsc13-recommend.pdf · Personalized QoS-Aware Web Service Recommendation and Visualization

the current user and make QoS prediction for the unusedservices. Services with the best predicted QoS will berecommended to the current user. These two phasesare presented in Sections 2.2 and 2.3, respectively, and thetime complexity analysis of the proposed algorithm ispresented in Section 2.4.

2.2 Phase 1: Region Creation

In web service recommender system, users usually provideQoS values on a small number of web services. Traditionalmemory-based CF algorithms suffer from the sparse user-contributed data set, since it’s hard to find similar userswithout enough knowledge of their service experience.Different from existing methods, we employ the correlationbetween users’ physical locations and QoS properties tosolve this problem. In this paper, we focus on the QoSproperties that are prone to change and can be easilyobtained and objectively measured by individual users,such as response time and availability. To simplify thedescription of our approach, we use response time (alsocalled round-trip time (RTT)) to describe our approach.

We assume that there are n users and m services. Therelationship between users and services is denoted by ann�m matrix R. Each entry Ri;j of the matrix represents theRTT of service j observed by user i and ? is the symbol ofno RTT value. Each user i (i 2 f1; 2; . . . ; ng) is associatedwith a row vector Ri representing his/her observed RTTvalues on different web services. The user aða 2f1; 2; . . . ; ngÞ is called the active user or current user ifhe/she has provided some RTT records and needs servicerecommendations.

We define a region as a group of users who are closelylocated with each other and likely to have similar QoSprofiles. Each user is a member of exactly one region.Regions need to be internally coherent, but clearly differentfrom each other. The region creation phase is designed as athree-step process. In the first step, we put users withsimilar IP addresses into a small region and extract regionfeatures. In the second step, we calculate the similaritybetween different regions. In the last step, we aggregatehighly correlated regions to form a certain number of largeregions. Details of these three steps are presented inSections 2.2.1 to 2.2.3, respectively.

2.2.1 Region Feature Extraction

For each region, we use region center as the main feature toreflect the average performance of web services observedby region users. Region center is defined as the medianvector of all the RTT vectors associated with the regionusers. The element i of the center is the median RTT valueof service i observed by users from the region. Median isthe numeric value separating the higher half of a samplefrom the lower half.

Besides the average web service quality observed by theregion users, we also pay attention to the fluctuation ofthe service performance. From large number of QoSrecords, we discover that the service response time usuallyvaries from region to region. Some services have unex-pected long response time or even unavailable to someregions. Inspired by the three-sigma rule [40] which is oftenused to test outlier, we use similar method to distinguish

services with unstable performance to different regions andregard them as region-sensitive services, which is anotherimportant region feature besides the region center.

The set of nonzero RTTs of service s, R:s ¼ fR1;s;R2;s; . . . ; Rk;sg; 1 � k � n, collected from users of all regionsis a sample from the population of service s response time.To estimate the mean � and the standard deviation � of thepopulation, we use two robust measures: median andmedian absolute deviation (MAD) [32]. MAD is defined asthe median of the absolute deviations from the sample’smedian

MAD ¼ mediani�jRi;s �medianjðRj;sÞj

�i ¼ 1; . . . ; k; j ¼ 1; . . . ; k:

ð1Þ

Based on the median and MAD, the two estimators can becalculated by

� ¼ medianiðRi;SÞ i ¼ 1; . . . ; k; ð2Þ

� ¼MADiðRi;SÞ i ¼ 1; . . . ; k: ð3Þ

Definition 1 (Region-Sensitive Service). Let R:s ¼ fR1;s;R2;s; . . . ; Rk;sg; 1 � k � n, be the set of RTTs of service sprovided by users from all regions. Service s is a sensitiveservice to region M iff 9Rj;s 2 R:s > �þ 3� ^ regionðjÞ ¼MÞ, where � ¼ medianiðR:sÞ,� ¼MADðR:sÞ and region(u)function defines the region of user u.

Definition 2 (Region Sensitivity). The sensitivity of region Mis the fraction between the number of sensitive services inregion M over the total number of services.

Definition 3 (Sensitive Region). RegionM is a sensitive regioniff its region sensitivity exceeds the sensitivity threshold �.

By the above definitions, we can identify services withdrastically fluctuating response time and those regionswhere the fluctuation occurs, which is an important featurefor service QoS prediction and recommendation. We detailhow to set � in Section 4.3.

2.2.2 Region Similarity Computation

Determining whether two regions are similar is a key stepbefore region aggregation. The similarity of two regions Mand N is measured by the similarity of their region centersm and n. Pearson Correlation Coefficient (PCC) is widelyused in recommender systems to calculate the similarity oftwo users [2]. PCC value ranges from �1 to 1. Positive PCCvalue indicates that the two users have similar preferences,while negative PCC value means that the two userpreferences are opposite. PCC computes the similaritybetween two regions M and N based on following:

Simðm;nÞ ¼Ps2SðnÞ\SðmÞ ðRm;s� �Rm:Þ � ðRn;s� �Rn:ÞffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiP

s2SðnÞ\SðmÞ ðRm;s� �Rm:Þ2q ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiP

s2SðnÞ\SðmÞ ðRn;s� �Rn:Þ2q ; ð4Þ

where SðnÞ \ SðmÞ is the set of coinvoked services by usersfrom region M and N , Rm;s is the RTT value of service sprovided by region center m. �Rm: and �Rn:represent theaverage RTT of all the services of center m and n,

CHEN ET AL.: PERSONALIZED QOS-AWARE WEB SERVICE RECOMMENDATION AND VISUALIZATION 37

Page 4: IEEE TRANSACTIONS ON SERVICES COMPUTING, VOL. 6, NO. 1, …act.buaa.edu.cn › hsun › papers › tsc13-recommend.pdf · Personalized QoS-Aware Web Service Recommendation and Visualization

respectively. Because PCC only considers the RTT differ-ence of the coinvoked services by both regions, it oftenoverestimates the similarity of the two regions that are notsimilar, but happen to have a few coinvoked services withvery similar RTTs [13]. Intuitively, we hypothesize that theaccuracy of prediction can be improved if we add acorrelation significance weighting factor that can devaluethe overestimated similarity. We use the following adjustedPCC equation to calculate the similarity between tworegions:

Sim0ðm;nÞ ¼ jSðmÞ \ SðnÞjjSðmÞ [ SðnÞjSimðm;nÞ; ð5Þ

where jSðmÞ [ SðnÞj is the number of web services invokedby users either in region M or region N . The experimentwhich compares the result with (5) and without (4) thesignificance weighting is shown in Section 4.6.

2.2.3 Region Aggregation

Each region formed by users’ physical locations at theoutset always has a very sparse QoS data set, since usersonly use a small number of web services and providelimited QoS records. In this case, it is difficult to findsimilar users and predict the QoS values of the unused webservices for the active user. To solve this problem, wepropose a region aggregation method based on the regionfeatures. As shown in Algorithm 1, the region aggregationapproach is a bottom-up hierarchical clustering algorithm[16]. The input is a set of small regions r1; . . . ; rl. Eachregion consists of users with similar locations. The algo-rithm successively aggregates pairs of the most similarnonsensitive regions until the stopping criterion (line 16) ismet. The result is stored as a list of aggregates in A.

Algorithm 1. Region Aggregation

in: regions r1; . . . ; rlout: result list A

1: for n 1 to l� 1

2: for i nþ 1 to N

3: C½n�½i�:sim SIMðrn;riÞ4: C½n�½i�:index i

5: end for

6: I½n�:sensitivity ISSENSITIV Eðrn)

7: if I½n�:sensitivity ¼ 0

8: then I½n�:aggregate 1

9: else I½n�:aggregate 0

10: P ½n� priority queue for C[n] sorted on sim

11: end for

12: calculate the sensitivity and aggregate of I½l�13: A []

14: while true

15: k1 argmaxfk:I½k�:aggregate¼1gP ½k�:MAXðÞ:sim16: if k1 ¼ null or sim< �

17: then return A

18: k2 P ½k1�:MAXðÞ:index19: A.APPEND(<k1; k2>) and comput k1 center

20: I½k2�.aggregate 0

21: P ½k1� ½�22: I½k1�:sensitivity ISSENSITIV Eðk1Þ23: if I½k1�:sensitivity ¼ 1

24: then I½k1�:aggregate 0

25: for each i with I½i�:aggregate ¼ 1

26: P ½i�:DELETEðC½i�½k1�Þ27: P ½i�:DELETEðC½i�½k2�Þ28: end for

29: else

30: for each i with i½i�:aggregate ¼ 1�i 6¼ k1

31: P ½i�:DELETEðC½i�½k1�Þ32: P ½i�:DELETEðC½i�½k2�Þ33: C½i�½k1�:sim SIMði; k1Þ34: P ½i�:INSERT ðC½i�½k1�Þ35: C½k1�½i�:sim SIMði; k1Þ36: P ½k1�:INSERT ðC½k1�½i�Þ37: end for

38: end while

. Step 1. Initialization (lines 1-12):

- Compute the similarity between each tworegions using (5), store the similarity and thesimilar region index in the similarity matrix C.

- Calculate the sensitivity of each region andidentify whether it can be aggregated. Store theresult in the indicator vector I:I½k�:sensitivityindicates whether region k is sensitive, andI½k�:aggregate indicates whether region k can beaggregated.

- Use a set of priority queues P to sort the rows ofC in decreasing order of the similarity. FunctionP ½k�:MAXðÞ returns the index of the region thatis most similar to region k.

. Step 2. Aggregation (lines 13-38):

- In each iteration, select the two most similar andnonsensitive regions from the priority queues iftheir similarity exceeds threshold �, otherwisereturn A.

- Aggregate the selected two regions and store theirregion index in result list A. Use the smallerregion index of the two as the new region indexand compute the new region center. Mark theindicator vector I of the aggregated region.

- Calculate the sensitivity of the new region and setindicator I. If it is sensitive and cannot beaggregated, remove this region from otherregions’ priority queues. Otherwise, update theelements of both priority queues and similaritymatrix related to the aggregated two regions.Repeat the above three steps.

2.3 Phase 2: QoS Value Prediction

After the phase of region aggregation, thousands of usersare clustered into a certain number of regions based ontheir physical locations and historical QoS similarities. Theservice experience of users in a region is represented by theregion center. With the compressed QoS data, searchingneighbors and making predictions for an active user can becomputed quickly. Traditionally, the QoS prediction meth-ods need to search the entire data set [20], [29], which israther inefficient. In our approach, similarity between theactive user and users of a region is computed by the

38 IEEE TRANSACTIONS ON SERVICES COMPUTING, VOL. 6, NO. 1, JANUARY-MARCH 2013

Page 5: IEEE TRANSACTIONS ON SERVICES COMPUTING, VOL. 6, NO. 1, …act.buaa.edu.cn › hsun › papers › tsc13-recommend.pdf · Personalized QoS-Aware Web Service Recommendation and Visualization

similarity between the active user and the region center.Moreover, it is more reasonable to predict the QoS value foractive users based on their regions, for users in the sameregion are more likely to have similar QoS experience onthe same web service, especially on those region-sensitiveones. To predict the RTT value for the active user a on anunused service s, we take the following steps:

. Find the region of user a by IP address. If noappropriate region is found, the active user will betreated as a member of a new region.

. Identify whether service s is sensitive to the specificregion. If it is region sensitive, then the prediction isgenerated from the region center, because the serviceperformance observed by users from this region issignificantly different from others

Ra;s ¼ Rcenter;s: ð6Þ

. Otherwise, use (5) to compute the similarity betweenthe active user and each region center that hasevaluated service s, and find up to k most similarcenters fc1; c2; :::; ckg. We discuss how to choose k(also called top-k) in Section 4.4.

. If the active user’s region center has the RTT value ofs, the prediction is computed using below equation:

Ra;s ¼ Rcenter;s þPk

j¼1ðRcj;s � �RcjÞSim0ða; cjÞPkj¼1 Sim

0ða; cjÞ; ð7Þ

where Rcj;s is the RTT of service s provided by center cj,and �Rcj is the average RTT of center cj. The prediction iscomposed of two parts. One is the RTT value of the regioncenter of the active user Rcenter;s, which denotes the averageQoS observed by this region users. The other part is thenormalized weighted sum of the deviations of the service sRTT from the average RTT observed by the k most similarneighbors.

. Otherwise, we use the service s RTTs observed bythe k neighbors to compute the prediction as (8)shows. The more similar the active user a and theneighbor cj are, the more weighting the RTT of cjwill carry in the prediction.

Ra;s ¼Pk

j¼1 Rcj;sSim0ða; cjÞPk

j¼1 Sim0ða; cjÞ

: ð8Þ

Note that previous CF-based web service recommenda-tion algorithms [20], [29] use (9), a rating aggregate methodcommonly adopted in recommender systems [13], [15], topredict the missing QoS value.

Ra;s ¼ �Ra þPk

j¼1ðRcj;s � �RcjÞSim0ða; cjÞPkj¼1 Sim

0ða; cjÞ: ð9Þ

However, it is not applicable in our context, since thisequation is based on the idea that each user’s rating range issubjective and comparatively fixed (e.g., critical usersalways rate items with lower ratings), whereas the range

of RTT varies largely from service to service. The averageRTT of all services provided by user a cannot reveal theperformance of a specific web service. Instead, we turn tothe RTT profile of the region center and use its RTT ofservice s to predict the missing value (7).

2.4 Time Complexity Analysis

We discuss the worst-case time complexity of the proposedalgorithm. Since there are two phases in our algorithm: theoffline phase for region creation and the online phase for theQoS value prediction, we analyze their time complexityseparately. We assume the input is a full matrix with n usersand m services.

2.4.1 Offline Time Complexity

In Section 2.2.1, the time complexity of calculating themedian and MAD of each service is OðnlognÞ. Form services,the time complexity is OðmnlognÞ. With MAD and median,we identify the region-sensitive services from the serviceperspective. Since there are at mostn records for each service,the time complexity of each service is OðnÞ using definition 1.Therefore, the total time complexity of region-sensitiveservice identification is OðmnlognþmnÞ ¼ OðmnlognÞ.

The time complexity of region aggregation (see Algo-rithm 1) is analyzed as follows:

We assume there are l0 regions at the beginning. Sincethere are at most m intersecting services for two regions, thetime complexity of the region similarity is OðmÞ using (5),and the complexity for computing similarity matrix C isOðl20mÞ (lines 1-10 of Algorithm 1).

The aggregation of two regions will execute at most l0�1times (lines 14-38), in case that all regions are nonsensitive,extremely correlate to each other and finally aggregate intoone region. In each iteration, we first compare at most l0 � 1heads of the priority queues to find the most similar pairs(line 15). Since the number of regions that can beaggregated decreases with iteration, the real search timewill be less than l0 � 1 in the following iterations. For theselected pair of regions, we calculate the new center andupdate their similar regions. Because the number of usersinvolved in the two regions are uncertain, we use thenumber of all users as the upper bound and the complexityis OðmnlognÞ. The insertion and deletion of a similar regionis Oðlogl0Þ, since we employ the priority queue to sortsimilar regions. Thus, the time complexity of Algorithm 1 isOðl20ðlogl0 þmnlognÞÞ ¼ Oðl20mnlognÞ.

As the above steps are linearly combined, the total timecomplexity of the offline part is Oðl20mnlognÞ.

2.4.2 Online Time Complexity

Let l1 be the number of regions after the phase of regioncreation. To predict the QoS value for an active user, Oðl1Þsimilarity calculations between the active user and regioncenters are needed, each of which takes OðmÞ time.Therefore, the time complexity of similarity computationis Oðl1mÞ.

For each service the active user has not evaluated, theQoS value prediction complexity is Oðl1Þ, because at most l1centers are employed in the prediction as stated by (7) and(8). There are at most m services without QoS values, so thetime complexity of the prediction for an active user is

CHEN ET AL.: PERSONALIZED QOS-AWARE WEB SERVICE RECOMMENDATION AND VISUALIZATION 39

Page 6: IEEE TRANSACTIONS ON SERVICES COMPUTING, VOL. 6, NO. 1, …act.buaa.edu.cn › hsun › papers › tsc13-recommend.pdf · Personalized QoS-Aware Web Service Recommendation and Visualization

Oðl1mÞ. Therefore, the time complexity for the online phaseincluding similarity computation and missing value pre-diction is Oðl1mÞ � OðmÞ (l1 is rather small compared to mor n). Compared to the memory-based CF algorithm used inprevious work with O(mn) online time-complexity, ourapproach is more efficient and well suited for large data set.

3 RECOMMENDATION VISUALIZATION

Conventionally, CF-based web service recommender sys-tems employ the predicted QoS mainly in two ways. 1) Whenusers query a service with specific functionality, the one withthe best predicted QoS is recommended to them. 2) Top-kbest-performing services are recommended to help usersdiscover potential services. While this kind of recommenda-tion is useful, it is not obvious to users why certain servicesare recommended. More than a service list ranked bypredicted QoS as recommendation, we need to develop anexploratory recommendation tool that provides valuableinsight into the QoS space and enables an improvedunderstanding of the overall performance of web services.

The QoS space visualization of all web services on a mapwill reveal the rationale behind QoS-based service recom-mendations. QoS space visualization is more than a picture ormethod of computing. It transforms the information of high-dimensional QoS data into a visual form enabling serviceusers to observe, browse, and understand the information.

We draw the QoS map by two steps: dimensionreduction step and map creation step. In the first step, wecreate a 2D representation of the high-dimensional QoSspace by using self-organizing map (SOM), and each webservice is mapped to a unique 2D coordinates. In the secondstep, we create a geographic-like QoS map based on theSOM training results. We detail the two drawing steps inSections 3.1 and 3.2, respectively.

3.1 SOM Training

The SOM [8] is a popular unsupervised artificial neuralnetwork that has been successfully applied to a broad rangeof areas, such as medical engineering, document organiza-tion, and speech recognition. When SOM is used forinformation visualization, it can be viewed as a mappingof a high-dimensional input space to a lower dimensionaloutput space (usually one or two dimensions).

The output space of SOM is a network of neurons locatedon a regular, usually 2D grid. Each neuron is equipped witha prototype vector which has the same dimension of theinput space. The neurons are connected to adjacent ones bya neighborhood relation indicating the structure of the mapsuch as a rectangular or hexagonal lattice. After the trainingphase, data points close to each other in the input space aremapped onto the nearby neurons. With this topologypreserving projection property, SOM is frequently em-ployed in data survey applications to help visualize theinherent structure of high-dimensional complex data.

The principal goal of using SOM in our context is totransform the QoS data employed by CF into a 2D discretemap in a topologically ordered fashion. In this context, theQoS map is to show the similarity of RTT variance ofdifferent web services. Intuitively, the input of the SOM isthe QoS matrix containing all the services (rows) and their

respective QoS values provided by all users (columns).However, since the data set is rather sparse and the numberof QoS values varies from service to service, the originaldata set cannot reveal the underlying structure of the QoSspace. To address this problem, the QoS set derived fromregion centers is employed to train the SOM.

Let l denote the dimension of the input space (QoS data)and q ¼ ½q1; q2; . . . ; ql�T denote an input pattern (QoS vectorof a service). The prototype of neuron j is denoted by

!j ¼ ½!j1; !j2; . . . ; !jl�T ; j ¼ 1; 2; . . . ; k; ð10Þ

where k is the total number of neurons in the network.The SOM is trained iteratively. In each training step, a

QoS vector q is randomly chosen from the input data set.The euclidean distance between q and all prototypes iscalculated. The neuron with the prototype closest to q ischosen as the best-matching unit (BMU) as Fig. 2 depicts.Let iðxÞ be the index of BMU, we determine iðxÞ byapplying

iðxÞ ¼ arg minjjjq � !jjj; j ¼ 1; 2; . . . ; k: ð11Þ

Then the prototype of BMU and those of BMU’sneighbors are adapted to move closer to the input vector qin the input space. Given the prototype !jðnÞ of neuron j attime n, the updated vector !jðnþ 1Þ of this neuron at timenþ 1 is defined by

!jðnþ 1Þ ¼ !jðnÞ þ �ðnÞhj;iðnÞðq � !jðnÞÞ; ð12Þ

where �ðnÞ is the learning rate function, and hj;iðnÞ is theneighborhood function which determines how strongly theneurons are connected with each other. The commonly usedGaussian neighborhood function is defined as:

hj;iðnÞ ¼ exp �rj � rik��2�2ðnÞ

� �; ð13Þ

where rj is the location of neuron j, and �ðnÞ is theneighborhood radius at time n. Both �ðnÞ and hj;iðnÞdecrease monotonically with time.

The map is usually trained in two phases [4]: a roughtraining phase that neurons are topologically ordered withrelatively large initial neighborhood radius and learningrate; a convergence phase to fine tune the map with small

40 IEEE TRANSACTIONS ON SERVICES COMPUTING, VOL. 6, NO. 1, JANUARY-MARCH 2013

Fig. 2. Mapping QoS space to the 2D output space of SOM. Each QoSvector is mapped to the BMU with closest euclidean distance.

Page 7: IEEE TRANSACTIONS ON SERVICES COMPUTING, VOL. 6, NO. 1, …act.buaa.edu.cn › hsun › papers › tsc13-recommend.pdf · Personalized QoS-Aware Web Service Recommendation and Visualization

initial neighborhood radius and learning rate. After thetraining of SOM, services with similar QoS are mapped ontothe same neuron or nearby neurons. The mapping result ofthe QoS data reflects the QoS similarities between services.

3.2 Map Creation

The direct approach to web service QoS map is to assigneach web service a distinct portion of the 2D display area,and put services with similar QoS performance next to eachother. With the training result, we first assign each serviceunique coordinates by randomly distributing them withinthe cell boundary of the corresponding neuron. Then theVoronoi diagram [21], [30] is used to form a base map inwhich each service corresponds to a unique polygon.

When applied to a large set of services, base map alone isinsufficient and will quickly become too complex to revealthe underlying data relationships. A generalized mapexplicitly telling the cluster information is needed. Weapply hierarchical clustering method [16] to cluster webservices based on their QoS similarity and form a general-ized map by merging service polygons belonging to thesame cluster. The topological preserving feature of SOMguarantees that services belong to the same cluster areusually neighboring polygons on the map.

We put web service recommendations on the map byusing the predicted QoS values. For those functionallyequivalent services, the one with the best predicted QoS willbe marked on the map. We also highlight the top-k bestperforming services to help users find potential ones.Section 5 provides the detail of the map implementation.

4 EXPERIMENTS

In this section, we give a comprehensive study on the QoSprediction performance of our proposed algorithm.

4.1 Experimental Setup

We adopt a real-world web service QoS performance dataset2 for the experiment. The data set contains about1.5 million web service invocation records of 100 webservices from more than 20 countries. The RTT records arecollected by 150 computer nodes from the Planet-Lab,3

which are distributed over 20 countries. For each computernode, there are 100 RTT profiles, and each profile containsthe RTT records of 100 services. We randomly extract 20profiles from each node, and obtain 3,000 users with RTTsranging from 2 to 31,407 milliseconds.

We divide the 3,000 users into two groups, one astraining users and the rest as active (test) users. To simulatethe real situation, we randomly remove a certain number ofRTT records of the training users to obtain a sparse trainingmatrix. We also remove some records of the active users,since active users usually only have invoked a smallnumber of web services in reality.

To evaluate the prediction performance, we compare ourapproach with user-based CF algorithm using PCC (UPCC)[20], item-based CF algorithm using PCC (IPCC) [11], andWSRec [29] which combines UPCC and IPCC.

We use Mean Absolute Error (MAE), the well-knownstatistical accuracy metric, to measure the predictionaccuracy. MAE is the average absolute deviation ofpredictions to the ground truth data. For all test servicesand test users:

MAE ¼

Xu;s

jRu;s � Ru;sj

L; ð14Þ

where Ru;s denotes the actual RTT of web service sobserved by user u; Ru;s denotes the predicted RTT value,and L denotes the number of predicted values. SmallerMAE indicates better prediction accuracy.

4.2 Prediction Evaluation

In this experiment, we randomly remove 90 and 80 percentRTTs of the initial training matrix to generate two sparsematrices with density 0.1 and 0.2, respectively. We vary thenumber of RTT values given by active users from 10, 20 to30, and name them given 10, 20, and 30, respectively. Theremoved records of active users are used to study theprediction accuracy. In this experiment, we set � ¼ 0:3,� ¼ 0:8, top� k ¼ 10. To get a reliable error estimate, weuse 10 times 10-fold cross-validation [31] to evaluate theprediction performance and report the average MAE value.

Table 1 shows the prediction performance of differentmethods employing the 0.1 and 0.2 density training matrix.We observe that our method significantly improves theprediction accuracy, and outperforms others consistently.The performance of UPCC, WSRec, and our approachenhances significantly with the increase of matrix density aswell as the number of QoS values provided by active users(given number). On the contrary, there is only a slightimprovement of IPCC. The original idea of IPCC is to matchitems with similar user ratings and combine them torecommendations. Apparently, it is not appropriate toapply this idea to our context, because even servicesprovided by the same company are hardly to have similarresponse times to different users.

CHEN ET AL.: PERSONALIZED QOS-AWARE WEB SERVICE RECOMMENDATION AND VISUALIZATION 41

TABLE 1Prediction Performance Comparison

2. http://www.wsdream.net.3. http://www.planet-lab.org.

Page 8: IEEE TRANSACTIONS ON SERVICES COMPUTING, VOL. 6, NO. 1, …act.buaa.edu.cn › hsun › papers › tsc13-recommend.pdf · Personalized QoS-Aware Web Service Recommendation and Visualization

4.3 Impact of �� and ��

The two thresholds � and � in the phase of region creation

play a very important role in determining the number of

regions and thus impact the final performance of our

approach. As shown in Algorithm 1, only those regions

with similarity higher than � and sensitivity less than � areable to be aggregated. In this experiment, we study the

impact of � and � on a sparse matrix with 2,700 training

users and 300 active users. We set density ¼ 0:2, given ¼ 10

and employ all the neighbors with positive PCC for QoS

prediction. We vary the two thresholds � and � both from0.1 to 0.9 with a step of 0.1. Fig. 3 shows how � and � affect

the number of regions and the final performance. It shows

that lower � and higher � result in fewer regions, but fewer

regions does not necessarily mean better predictionaccuracy. For this data set, better prediction accuracy is

achieved with higher � and �.Note that the optimal value of � is related to the

sensitivity of the original regions at the outset. Fig. 4 shows

the distribution of the region sensitivity before aggregation.

It shows that the sensitivity of most regions (81.3 percent) isless than 0.1, while the sensitivity of a few regions

(4.67 percent) is around 0.8. Higher � and � allow very

similar regions with high sensitivity to be aggregated and

achieve good performance in this experiment.

Fig. 5a shows the relation between � and predictionaccuracy with training matrix density 0.2, 0.5, and 1. Weemploy all the neighbors with positive PCC values for QoSprediction and set � ¼ 1, so that we do not consider thefactor of sensitivity in region aggregation. Similaritybecomes the single factor. Obviously, for denser matrix,with higher � we obtain a set of coherent regions, and betterprediction is obtained.

4.4 Impact of Top-k

Top-k determines how many neighbors are employed inthe phase of QoS prediction which relates to the predictionaccuracy. We employ a training matrix of density 0.3, andset � ¼ 0:2; � ¼ 0:8. After the model building phase, weobtain 39 regions. To study the impact of neighborhoodsize, we vary top-k from 5 to 40 with a step of 5. Fig. 5bshows the result with given number from 10 to 30. Thetrends of the three curves are the same that MAE decreasessharply with an increasing neighborhood size at thebeginning, and then stays around a certain value. Astop-k grows, more regions are selected in the QoSprediction, while those later added regions are usuallyless similar and make little contribution to the final result.

4.5 Data Sparseness

This experiment investigates the impact of data sparsenesson the prediction accuracy. We examine the impact fromtwo aspects: the density of training matrix and the numberof QoS values given by active users (given number). Wedivide the experiment into two parts and use 10 times 10-fold cross-validation to assess the prediction results andreport the average MAE.

We first study the impact of training matrix density. Wevary the density of the training matrix from 0.1 to 0.5 with astep of 0.1, and set given ¼ 10. Fig. 5c shows theexperimental results. It shows that: 1) with the increase ofthe training matrix density, the performance of IPCC, UPCC,and our method enhances indicating that better prediction isachieved with more QoS data. WSRec is not sensitive to thedata sparseness, and it stays around a certain value. 2) Ourmethod outperforms others consistently.

To study the impact of given number on the predictionresults, we employ the training matrix with density 0.3 andvary the given number from 10 to 50 with a step of 10.

42 IEEE TRANSACTIONS ON SERVICES COMPUTING, VOL. 6, NO. 1, JANUARY-MARCH 2013

Fig. 3. Impact of thresholds � and �. (a) Impact on the number of regions. (b) Impact on the prediction performance (MAE).

Fig. 4. The distribution of region sensitivity.

Page 9: IEEE TRANSACTIONS ON SERVICES COMPUTING, VOL. 6, NO. 1, …act.buaa.edu.cn › hsun › papers › tsc13-recommend.pdf · Personalized QoS-Aware Web Service Recommendation and Visualization

Fig. 5d shows the experimental results. It shows that the

prediction performance of IPCC, UPCC, and WSRec

improves greatly with the growth of the given number,

while our method enhances slightly. This observation

indicates that our method is not sensitive to the value of

given number. It can achieve good prediction result even

when the given number is rather small.

4.6 Significance Weighting

Significance weighting factor is added to devalue similarity

weights that are based on a small number of coinvoked web

services. To study the impact of this factor, we implement

two versions of the algorithm, one with the significance

weighting (5) and the other without (4). Since the over-

estimation of the similarity occurs when the active user and

training users have few coinvoked services, we study the

impact by varying the number of QoS provided by both the

training users (training matrix density) and active users

(given number) with two experiments.In the first experiment, we employ 2,700 training users,

300 active users, set given ¼ 5, � ¼ 0:2, � ¼ 0:3, top-k ¼ 20.

We vary the density of the training matrix from 0.1 to 0.5

CHEN ET AL.: PERSONALIZED QOS-AWARE WEB SERVICE RECOMMENDATION AND VISUALIZATION 43

Fig. 5. Parameters impact on prediction performance (MAE). (a) Impact of region similarity �. (b) Impact of Top-K. (c) Impact of the training matrixdensity. (d) Impact of the given number. (e) Significance weighting Impact with changing density. (f) Significance weighting Impact with changinggiven number.

Page 10: IEEE TRANSACTIONS ON SERVICES COMPUTING, VOL. 6, NO. 1, …act.buaa.edu.cn › hsun › papers › tsc13-recommend.pdf · Personalized QoS-Aware Web Service Recommendation and Visualization

with a step of 0.1. As Fig. 5e shows, applying thesignificance weighting increases the accuracy of the pre-diction algorithm in most cases.

The other experiment is carried out with the samenumber of training users and active users. We set thedensity of the training matrix 0.05, � ¼ 0:2, � ¼ 0:3, top-k¼ 10. We vary the given number from 10 to 50 by a step of10. Fig. 5f shows that the algorithm with the significanceweighting consistently increases the accuracy of theprediction by a relatively large amount. As the givennumber increases, the gap between the two becomes moreobvious. This is because with a sparse training matrixðdensity ¼ 0:05Þ and high given number, it is common forthe active user to have neighbors that were based on tinysamples (two or three coinvoked services with similar QoS).In this case, overestimation of the similarity by (4)frequently leads to poor prediction accuracy.

5 A MAP DISPLAY FOR RECOMMENDATION

In this section, we demonstrate how to create a mapshowing the similarity of RTT variance of web services, andhow to put personalized web service recommendations onthe map for an active user. We use 2,700 training users andset given ¼ 10, density ¼ 0:5. After the region aggregationphase, 17 regions are formed. The input of SOM is a 100�17 RTT matrix containing 100 services (rows) and theirrespective performance on 17 regions (columns). Each webservice’s QoS (row) is an input vector. We train an SOMwith neurons arranged in a 60� 80 hexagonal lattice. Theprototypes of SOM are randomly initialized, and Gaussianfunction is adopted as the neighborhood function (see (13)).We train the SOM in two phases: a rough training phasewith initial neighborhood width 15 and learning rate 0.05; afine tuning phase with initial neighborhood width 2 andlearning rate 0.01. The training lengths of the two phasesare 500 and 3,000 epochs, respectively. The learning ratedecreases linearly to zero during the training.

When the training process is completed, each service ismapped on to a neuron. We assign unique coordinates toeach service by randomly distributing them within theboundaries of the corresponding neuron cell (see Fig. 6a).To create a geographic map, each point is assigned to adistinct portion of the map display by forming a Voronoidiagram (see Fig. 6b). After that, we adopt the hierarchicalclustering to the services based on their QoS similarities and

obtained 42 clusters. We simplified the base map bymerging the neighboring polygons if they are in the samecluster. By this way, we form a generalized map high-lighting the underlying structure of the QoS space.

Labeling individual service is an integral part of the mapcreation. The goal is to help users identify the potentialservices with optimal QoS values. We use different labelstyles to mark services showing how strongly we recom-mend them. For example, the top 10 best performingservices are labeled with 12-point boldface; services withgood predicted QoS are labeled with 8-point nonboldface;while dots indicate those services with poor predictedQoS values. Fig. 7 shows the final map for web servicerecommendations.

The map can also be created for one region to show thesimilarity of web services based on a set of QoS properties.For example, we can employ three QoS properties: RTT,cost, and reputation. RTT and cost are quantitative proper-ties, while reputation is qualitative (usually 1-5 starts). Eachregion center is a matrix composed of three column vector(RTT, Cost, Reputation) as Fig. 8 shows. Since differentproperties have different ranges, we first normalize each ofthem to [0, 1] by the following steps: 1) find the minimumRmin and maximum Rmax value of the property; 2) for eachoriginal value R submitted by the user, the normalizedvalue R’ is calculated by

R0 ¼ R�Rmin

Rmax �Rmin: ð15Þ

With the normalized values, each property will have equal

weights in the SOM training. The map creation process is

the same, and we can obtain a map reflecting the web

service QoS similarity of a specific region.

6 RELATED WORK

In this section, we discuss related work regarding CF andweb service recommendation.

6.1 Collaborative Filtering

Collaborative Filtering is first proposed by Rich [18] andwidely used in commercial recommender systems, such asAmazon.com [11]. The basic idea of CF is to predict andrecommend the potential favorite items for a particularuser by leveraging rating data collected from similar users.Formally, a CF domain consists of n users fu1; u2; . . . ; ung,m items fi1; i2; . . . ; img, and users’ ratings on items, whichis often denoted by a user-item matrix. Each entry rx;yð1 �x � n; 1 � y � mÞ in this matrix represents user x’ s ratingon item y. The rating score usually has a fixed range, like1-5. Since users only express their preference on a smallnumber of items, the matrix is very sparse in reality.

Essentially, CF is based on processing the user-itemmatrix. Breese et al. [2] divided the CF algorithms into twobroad classes: memory-based algorithms and model-basedalgorithms.

Memory-based algorithms such as user-based KNN [13],[15] use the entire user-item matrix when computingrecommendations. These algorithms are easy to implement,require little or no training cost, and can easily take new

44 IEEE TRANSACTIONS ON SERVICES COMPUTING, VOL. 6, NO. 1, JANUARY-MARCH 2013

Fig. 6. Map creation. (a) 2D locations of services derived from SOMtraining. (b) Voronoi diagram of services based on the 2D locations.

Page 11: IEEE TRANSACTIONS ON SERVICES COMPUTING, VOL. 6, NO. 1, …act.buaa.edu.cn › hsun › papers › tsc13-recommend.pdf · Personalized QoS-Aware Web Service Recommendation and Visualization

users’ ratings into account. However, memory-based algo-rithms cannot cope well with large number of users anditems, since their online performance is often slow.

Alternatively, model-based algorithms, such as K-meansclustering [26], Bayesian model [3], etc., learn the modelfrom the data set using statistical and machine learningtechniques. These model-based algorithms can quicklygenerate recommendations and achieve good online per-formance. However, the model must be performed anewwhen new users or items are added to the matrix.

6.2 Web Service Recommendation

Web service discovery is a hot topic which plays a crucialrole in the area of services computing [33]. Some syntacticand semantic-based web service search engines have beenproposed in the recent literature. Dong et al. [37] found thatthe traditional key word-based web service search wasinsufficient, and they provided a similarity search algo-rithm for web services underlying the Woogle searchengine. Liu et al. [38] investigated the similarity measure-ment of web services and designed a graph-based searchmodel to find web services with similar operations.Recommendation techniques have been used in recentresearch projects to enhance web service discovery. Mehtaet al. [14] found that semantics and syntax were inadequateto discover a service that meets user requirements. They

added two more dimensions of service description: qualityand usage pattern. Based on this service description, theypropose the service mediation architecture. Blake andNowlan [1] computed a web service recommendation scoreby matching strings collected from the user’s operationalsessions and the description of the web services. Based onthis score, they judged whether a user is interested in theservice. Maamar et al. [12] proposed a model for the contextof web service interactions and highlighted the resource onwhich the web service performed. Zhao et al. [17] provideda way to model services and their linkages by semanticalgorithm. Based on the input keywords, users can get a setof recommendations with linkages to the query. Previouswork mainly focused on providing a mechanism toformalize users’ preference, resource, and the descriptionof web services, and recommendations are generated basedon the predefined semantic models. Different from thesemethods, our recommendations are generated by miningthe QoS records that are automatically collected frominteractions between users and services.

Limited work has been done to apply CF to web servicerecommendation. Shao et al. [20] proposed a user-based CFalgorithm to predict QoS values. Zheng et al. [29] combinedthe user-based and item-based CF algorithm to recommendweb services. However, since neither of the two approachesrecognized the different characteristic between web serviceQoS and user ratings, the prediction accuracy of thesemethods was unsatisfactory. Sreenath and Singh [22] andRong et al. [19] applied the idea of CF in their systems, andused MovieLens data [15] for experimental analysis.However, using the movie data set to study web servicerecommendation is not convincing.

Different from these existing methods, which suffer fromlow prediction accuracy, we propose an effective CF

CHEN ET AL.: PERSONALIZED QOS-AWARE WEB SERVICE RECOMMENDATION AND VISUALIZATION 45

Fig. 8. Region center matrix.

Fig. 7. Final map with service recommendations.

Page 12: IEEE TRANSACTIONS ON SERVICES COMPUTING, VOL. 6, NO. 1, …act.buaa.edu.cn › hsun › papers › tsc13-recommend.pdf · Personalized QoS-Aware Web Service Recommendation and Visualization

algorithm for web service recommendation with considera-tion of the region factor. Comprehensive experimentsconducted with real QoS records show that our methodoutperforms others consistently.

There are several SOM-based methods to visualize datastructure. U-Matrix [25] is the most popular one thatdisplays the local distance structures of the input vectors.U*-Matrix [24], an enhancement of U-Matrix, combines thedensity and distance information for visualization. Colorassignment is also employed to show the approximatecluster structures [6], [9]. To exploit data topology invisualization, Tasdemir and Merenyi [23] introduced aweighted Delaunay triangulation. Different from ourapproach that directly clusters the web services to form ageneralized map, data clusters can be computed byapplying clustering techniques to the trained prototypes,and clusters can be visualized on top of the map [21], [27].

7 CONCLUSION AND FUTURE WORK

In this paper, we have presented an innovative approach toweb service recommendation and visualization. Differentfrom previous work, our algorithm employs the character-istic of QoS by clustering users into different regions. Basedon the region feature, a refined nearest-neighbor algorithmis proposed to generate QoS prediction. The final servicerecommendations are put on a map to reveal the under-lying structure of QoS space and help users accept therecommendations. Experimental results show that ourapproach significantly improves the prediction accuracythan the existing methods regardless of the sparseness ofthe training matrix. We also demonstrate that the onlinetime complexity of our approach is better than thetraditional CF algorithms.

In this paper, our recommendation approach consideredthe correlation between QoS records and users’ physicallocations by using IP addresses, which has achieved goodprediction performance. In some cases, however, users inthe same physical locations may observe different QoSperformance of the same web service. Besides the userphysical location, we will investigate more contextualinformation that influences the client-side QoS perfor-mance, such as the workload of the servers, networkconditions, and the activities that users carry out with webservices (e.g., web services are used alone or in composi-tion). More investigations on the distribution of RTT andthe correlation between different QoS properties such asRTT and availability will be conducted in our web servicesearch engine project ServiceXchange.4

For the visualization of the recommendation results, weplan to add more user interactions such as searching webservices on the QoS map, zooming in and zooming out.Graphic map like google map will be combined to helpusers navigate their similar users and web service providerson the map.

User acceptance rate of the recommendation is a keyindicator of the effectiveness of the recommender system.We will collect more user feedbacks of the recommendationto help improve the prediction accuracy of our web servicerecommendation algorithm.

ACKNOWLEDGMENTS

This research work is partly supported by China 863program MOST (Ministry of Science and Technology)under grant No. 2007AA010301 and the State Key Lab forSoftware Development Environment under grant SKLSDE-2010-ZX-03.

REFERENCES

[1] M.B. Blake and M.F. Nowlan, “A Web Service RecommenderSystem Using Enhanced Syntactical Matching,” Proc. Int’l Conf.Web Services, pp. 575-582, 2007.

[2] J.S. Breese, D. Heckerman, and C. Kadie, “Empirical Analysis ofPredictive Algorithms for Collaborative Filtering,” Proc. 14th Conf.Uncertainty in Artificial Intelligence (UAI ’98), pp. 43-52, 1998.

[3] Y.H. Chen and E.I. George, “A Bayesian Model for CollaborativeFiltering,” Proc. Seventh Int’l Workshop Artificial Intelligence andStatistics, http://www.stat.wharton.upenn.edu/~edgeorge/Research_papers/Bcollab.pdf, 1999.

[4] S. Haykin, Neural Networks: A Comprehensive Foundation, seconded. Prentice-Hall, 1999.

[5] J.L. Herlocker, J.A. Konstan, and J. Riedl, “Explaining Collabora-tive Filtering Recommendations,” Proc. ACM Conf. ComputerSupported Cooperative Work, pp. 241-250, 2000.

[6] J. Himberg, “A SOM Based Cluster Visualization and ItsApplication for False Coloring,” Proc. IEEE-INNS-ENNS Int’l JointConf. Neural Networks, pp. 587-592, 2000, vol. 3, doi:10.1109/IJCNN.2000.861379.

[7] Hsu, and S.K. Halgamuge, “Class Structure Visualization withSemi-Supervised Growing Self-Organizing Maps,” Neurocomput-ing, vol. 71, pp. 3124-3130, 2008.

[8] T. Kohonen, “The Self-Organizing Map,” Proc. IEEE, vol. 78, no. 9,pp. 1464-1480, Sept. 1990.

[9] S. Kaski, J. Venna, and T. Kohonen, “Coloring that Reveals High-Dimensional Structures in Data,” Proc. Sixth Int’l Conf. NeuralInformation Processing, vol. 2, pp. 729-734, 1999.

[10] J.A. Konstan, B.N. Miller, D. Maltz, J.L. Herlocker, L.R. Gordan,and J. Riedl, “GroupLens: Applying Collaborative Filtering toUsenet News,” Comm. ACM, vol. 40, no. 3, pp. 77-87, 1997.

[11] G. Linden, B. Smith, and J. York, “Amazon.com Recommenda-tions: Item-to-Item Collaborative Filtering,” IEEE Internet Comput-ing, vol. 7, no. 1, pp. 76-80, Jan./Feb. 2003.

[12] Z. Maamar, S.K. Mostefaoui, and Q.H. Mahmoud, “Context forPersonalized Web Services,” Proc. 38th Ann. Hawaii Int’l Conf.,pp. 166b-166b, 2005.

[13] M.R. McLaughlin and J.L. Herlocker, “A Collaborative FilteringAlgorithm and Evaluation Metric That Accurately Model the UserExperience,” Proc. Ann. Int’l ACM SIGIR Conf., pp. 329-336, 2004.

[14] B. Mehta, C. Niederee, A. Stewart, C. Muscogiuri, and E.J.Neuhold, “An Architecture for Recommendation Based ServiceMediation,” Semantics of a Networked World, vol. 3226, pp. 250-262,2004.

[15] B.N. Miller, I. Albert, S.K. Lam, J.A. Konstan, and J. Riedl,“MovieLens Unplugged: Experiences with an Occasionally Con-nected Recommender System,” Proc. ACM Int’l Conf. IntelligentUser Interfaces, pp. 263-266, 2003.

[16] C.D. Mining, P. Raghavan, and H. Schutze, An Introduction toInformation Retrieval. Cambridge Univ., 2009.

[17] C. Zhao, C. Ma, J. Zhang, J. Zhang, L. Yi, and X. Mao,“HyperService: Linking and Exploring Services on the Web,”Proc. Int’l Conf. Web Services, pp. 17-24, 2010.

[18] E. Rich, “User Modeling via Stereotypes,” Cognitive Science, vol. 3,no. 4, pp. 329-354, 1979.

[19] W. Rong, K. Liu, and L. Liang, “Personalized Web ServiceRanking via User Group Combining Association Rule,” Proc. Int’lConf. Web Services, pp. 445-452, 2009.

[20] L. Shao, J. Zhang, Y. Wei, J. Zhao, B. Xie, and H. Mei,“Personalized QoS Prediction for Web Services via CollaborativeFiltering,” Proc. Int’l Conf. Web Services, pp. 439-446, 2007.

[21] A. Skupin, “A Cartographic Approach to Visualizing ConferenceAbstracts,” Computer Graphics and Applications, vol. 22, no. 1,pp. 50-58, 2002.

[22] R.M. Sreenath and M.P. Singh, “Agent-Based Service Selection,”J. Web Semantics. vol. 1, no. 3, pp. 261-279, 2003, doi:10.1016/j.websem.2003.11.006.

46 IEEE TRANSACTIONS ON SERVICES COMPUTING, VOL. 6, NO. 1, JANUARY-MARCH 2013

4. http://www.servicexchange.cn/repository/.

Page 13: IEEE TRANSACTIONS ON SERVICES COMPUTING, VOL. 6, NO. 1, …act.buaa.edu.cn › hsun › papers › tsc13-recommend.pdf · Personalized QoS-Aware Web Service Recommendation and Visualization

[23] K. Tasdemir and E. Merenyi, “Exploiting Data Topology inVisualization and Clustering of Self-Organizing Maps,” IEEETrans. Neural Networks, vol. 20, no. 4, pp. 549-562, Apr. 2009.

[24] A. Ultsch, “U*-Matrix: A Tool to Visualize Clusters in HighDimensional Data,” Technical Report 36, CS Department, Phi-lipps-Univ., 2004.

[25] A. Ultsch and H.P. Siemon, “Kohonen’s Self-Organizing FeatureMaps for Exploratory Data Analysis,” Proc. Int’l Neural NetworksConf., pp. 305-308, 1990.

[26] L.H. Ungar and D.P. Foster, “Clustering Methods for Collabora-tive Filtering,” Proc. AAAI Workshop Recommendation Systems, 1998.

[27] J. Vesanto and E. Alhoniemi, “Clustering of the Self-OrganizingMap,” IEEE Trans. Neural Networks, vol. 11, no. 3, pp. 586-600,May 2000.

[28] J. Zhang, H. Shi, Y. Zhang, “Self-Organizing Map Methodologyand Google Maps Services for Geographical EpidemiologyMapping,” Proc. Digital Image Computing: Techniques and Applica-tions, pp. 229-235, 2009, doi:10.1109/DICTA.2009.46.

[29] Z. Zheng, H. Ma, M.R. Lyu, and I. King, “WSRec: A CollaborativeFiltering Based Web Service Recommendation System,” Proc. Int’lConf. Web Services, pp. 437-444, 2009.

[30] F. Aurenhammer, “Voronoi Diagrams—A Survey of a Funda-mental Geometric Data Structure,” ACM Computing Surveys,vol. 23, no. 3, pp. 345-405, 1991.

[31] I.H. Witten and E. Frank, Data Mining: Practical Machine LearningTools and Techniques, second ed. Elsevier, 2005.

[32] P.J. Rousseeuw and C. Croux, “Alternatives to the MedianAbsolute Deviation,” J. Am. Statistical. Assoc., vol. 88, no. 424,pp. 1273-1283, 1993.

[33] L.-J. Zhang, J. Zhang, and H. Cai, Services Computing. Springer andTsinghua Univ., 2007.

[34] T. Yu, Y. Zhang, and K.-J. Lin, “Efficient Algorithms for WebServices Selection with End-to-End QoS Constraints,” ACM Trans.Web, vol. 1, no. 1, pp. 1-26, 2007.

[35] S. Rosario, A. Benveniste, S. Haar, and C. Jard, “Probabilistic QoSand Soft Contracts for Transaction-Based Web Services Orchestra-tions,” IEEE Trans. Services Computing, vol. 1, no. 4, pp. 187-200,Oct. 2008.

[36] X. Chen, X. Liu, Z. huang, and H. Sun, “RegionKNN: A ScalableHybrid Collaborative Filtering Algorithm for Personalized WebService Recommendation,” Proc. Int’l Conf. Web Services, pp. 9-16,2010.

[37] X. Dong, A. Halevy, J. Madhavan, E. Nemes, and J. Zhang,“Similarity Search for Web Services,” Proc. 30th Int’l Conf. VeryLarge Data Bases, pp. 372-383, 2004.

[38] X. Liu, G. Huang, and H. Mei, “Discovering Homogeneous WebService Community in the User-Centric Web Environment,” IEEETrans. Services Computing, vol. 2, no. 2, pp. 167-181, Apr.-June 2009.

[39] E.M. Maximilien and M.P. Singh, “A Framework and Ontologyfor Dynamic Web Services Selection,” IEEE Internet Computing,vol. 8, no. 5, pp. 84-93, Sept. 2004.

[40] 68-95-99.7 Rule, http://en.wikipedia.org/wiki/68-95-99.7_rule,2012.

Xi Chen received the BS degree in informationand computing science from the Beijing Informa-tion Technology Institute, China, in 2008. Cur-rently, she is working toward the master’sdegree in the School of Computer Science andEngineering, Beihang University, Beijing, China.She received the Google Excellence Scholar-ship in 2010. Her research interests includeservices computing, cloud computing, and datamining. She is a member of the IEEE.

Zibin Zheng received the BEng degree andMPhil degree in computer science from Sun Yat-Sen University, Guangzhou, China, in 2005 and2007, respectively. He is currently workingtoward the PhD degree in the Department ofComputer Science and Engineering, The Chi-nese University of Hong Kong. He receivedSIGSOFT Distringuish Paper Award at ICSE2010, Best Student Paper Award at ICWS 2010,and IBM PhD Fellowship Award 2010-2011. He

served as program committee member of IEEE CLOUD 2009 andCLOUDCOMPUTING 2010 and has served as a reviewer for interna-tional journal and conferences, including the TSE, TPDS, TSC, JSS,DSN, ICEBE, ISSRE, KDD, SCC, WSDM, and WWW. His researchinterests include service computing, software reliability engineering, andcloud computing. He is a member of the IEEE.

Xudong Liu received the PhD degree incomputer application technology from BeihangUniversity, Beijing, China. He is a professor anddoctoral supervisor at Beihang University. Hisresearch interests mainly include middlewaretechnology and applications, service-orientedcomputing, trusted network computing, and net-work software development.

Zicheng Huang received the BS degree fromBeihang University in 2004. He is currentlyworking toward the PhD degree in the Schoolof Computer Science and Engineering, Bei-hang University, Beijing, China. His researchinterests include the areas of services comput-ing, business process management, and socialcomputing.

Hailong Sun received the BS degree incomputer science from Beijing Jiaotong Univer-sity in 2001. He received the PhD degree incomputer software and theory from BeihangUniversity in 2008. He is an assistant professorin the School of Computer Science and En-gineering, Beihang University, Beijing, China.His research interests include services comput-ing, peer-to-peer computing, grid/cloud comput-ing, and distributed systems. He is a member of

the IEEE and the ACM.

CHEN ET AL.: PERSONALIZED QOS-AWARE WEB SERVICE RECOMMENDATION AND VISUALIZATION 47