12
Research Article Attribute-Sentiment Pair Correlation Model Based on Online User Reviews Xiang Ling Fu, 1,2 Ji Wu, 3 Jinpeng Chen , 1 and Shaohui Liu 1 1 School of Software Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China 2 Key Laboratory of Trustworthy Distributed Computing and Service, Ministry of Education, Beijing University of Posts and Telecommunications, Beijing 100876, China 3 Department of Electronic Engineering, Tsinghua University, Beijing 100084, China Correspondence should be addressed to Jinpeng Chen; [email protected] Received 7 August 2018; Revised 6 December 2018; Accepted 23 January 2019; Published 17 March 2019 Academic Editor: Guiyun Tian Copyright © 2019 Xiang Ling Fu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. With the popularization of Internet applications and the rapid development of e-commerce, online shopping has become a widespread and important pattern of consumption. Online user comments are an important data asset on e-commerce sites and have a great potential value for online users and merchants. However, accurate and eective extraction of the characteristics of products and userssentiment evaluation from a tremendous amount of comments is a signicant challenge. Based on the concept of the LinLog energy model, this paper proposes an online review attribute-sentiment pair correlation model that evaluates user comments. After preprocessing the comment data of mobile phones and constructing an attribute dictionary, the proposed model conducts a clustering analysis of attributes and sentiment pairs to gain accurate assessment of attributes in order to explore potential information from user comments. Experiments conducted on one real-world dataset with comprehensive measurements verify the ecacy of the proposed model. 1. Introduction With the popularization of Internet applications and the rapid development of e-commerce, online shopping has become a widespread and important pattern of consumption. Online user comments are an important data asset for e-commerce sites and text collection [1] for product subject to userspersonal subjective or objective attitudes after shop- ping online. These data have a great potential value for online shoppers and merchants. From the perspective of consumers, the comment data aect their purchasing choices [2]. From the manufacturerspoint of view [3], by analyzing the com- ment data, they can improve existing products and develop new product attributes. Most online consumer product review platforms currently do not consider the needs and preferences of individual consumers. Few online review platforms organize and present comments in a personalized, product-oriented manner. Consequently, it is virtually impossible for consumers to identify comments targeted at specic product features through hundreds or even thou- sands of comments [4]. According to statistics, on average, only 6.5% of new prod- uct programs in the market can be transformed into products. Of that number, less than 15% of new products can be success- fully commercialized and 37% of new products coming into the market are a failure in business [5]. When analyzing the factors restricting product development, it is dicult to cap- ture the real user demand from only the overall performance of the products. For example, in the mobile phone industry, users are satised with the overall performance of iPhone 6 but give low feedback for its battery performance [6]. This paper connects the specic characteristics of mobile phones to the sentiments of usersevaluation by tapping their com- ments to deeply explore the comment value. In previous studies on user comments, researchers tended to analyze user sentiment implied in the comments and proposed numerous models to judge sentiment polarity. However, the models developed only analyze the supercial Hindawi Journal of Sensors Volume 2019, Article ID 2456752, 11 pages https://doi.org/10.1155/2019/2456752

Attribute-Sentiment Pair Correlation Model Based on …downloads.hindawi.com/journals/js/2019/2456752.pdfResearch Article Attribute-Sentiment Pair Correlation Model Based on Online

  • Upload
    others

  • View
    10

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Attribute-Sentiment Pair Correlation Model Based on …downloads.hindawi.com/journals/js/2019/2456752.pdfResearch Article Attribute-Sentiment Pair Correlation Model Based on Online

Research ArticleAttribute-Sentiment Pair Correlation Model Based on OnlineUser Reviews

Xiang Ling Fu,1,2 Ji Wu,3 Jinpeng Chen ,1 and Shaohui Liu1

1School of Software Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China2Key Laboratory of Trustworthy Distributed Computing and Service, Ministry of Education, Beijing University of Postsand Telecommunications, Beijing 100876, China3Department of Electronic Engineering, Tsinghua University, Beijing 100084, China

Correspondence should be addressed to Jinpeng Chen; [email protected]

Received 7 August 2018; Revised 6 December 2018; Accepted 23 January 2019; Published 17 March 2019

Academic Editor: Guiyun Tian

Copyright © 2019 Xiang Ling Fu et al. This is an open access article distributed under the Creative Commons Attribution License,which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

With the popularization of Internet applications and the rapid development of e-commerce, online shopping has become awidespread and important pattern of consumption. Online user comments are an important data asset on e-commerce sites andhave a great potential value for online users and merchants. However, accurate and effective extraction of the characteristics ofproducts and users’ sentiment evaluation from a tremendous amount of comments is a significant challenge. Based on theconcept of the LinLog energy model, this paper proposes an online review attribute-sentiment pair correlation model thatevaluates user comments. After preprocessing the comment data of mobile phones and constructing an attribute dictionary, theproposed model conducts a clustering analysis of attributes and sentiment pairs to gain accurate assessment of attributes inorder to explore potential information from user comments. Experiments conducted on one real-world dataset withcomprehensive measurements verify the efficacy of the proposed model.

1. Introduction

With the popularization of Internet applications and therapid development of e-commerce, online shopping hasbecome a widespread and important pattern of consumption.Online user comments are an important data asset fore-commerce sites and text collection [1] for product subjectto users’ personal subjective or objective attitudes after shop-ping online. These data have a great potential value for onlineshoppers and merchants. From the perspective of consumers,the comment data affect their purchasing choices [2]. Fromthe manufacturers’ point of view [3], by analyzing the com-ment data, they can improve existing products and developnew product attributes. Most online consumer productreview platforms currently do not consider the needs andpreferences of individual consumers. Few online reviewplatforms organize and present comments in a personalized,product-oriented manner. Consequently, it is virtuallyimpossible for consumers to identify comments targeted at

specific product features through hundreds or even thou-sands of comments [4].

According to statistics, on average, only 6.5% of new prod-uct programs in the market can be transformed into products.Of that number, less than 15% of new products can be success-fully commercialized and 37% of new products coming intothe market are a failure in business [5]. When analyzing thefactors restricting product development, it is difficult to cap-ture the real user demand from only the overall performanceof the products. For example, in the mobile phone industry,users are satisfied with the overall performance of iPhone 6but give low feedback for its battery performance [6]. Thispaper connects the specific characteristics of mobile phonesto the sentiments of users’ evaluation by tapping their com-ments to deeply explore the comment value.

In previous studies on user comments, researcherstended to analyze user sentiment implied in the commentsand proposed numerous models to judge sentiment polarity.However, the models developed only analyze the superficial

HindawiJournal of SensorsVolume 2019, Article ID 2456752, 11 pageshttps://doi.org/10.1155/2019/2456752

Page 2: Attribute-Sentiment Pair Correlation Model Based on …downloads.hindawi.com/journals/js/2019/2456752.pdfResearch Article Attribute-Sentiment Pair Correlation Model Based on Online

meaning of comments and fail to connect the sentiments andattributes accurately. Rob et al. [7] utilized the LinLog energyclustering analysis method to cluster music and concludedthat the method achieves a better effect than other clusteringalgorithms, as it distinguishes the edge of each category andspecifies the central attributes of the connection. Conse-quently, in this study, the same method was applied todevelop an online review attribute-sentiment pair correlationmodel based on user comments that mines potential infor-mation in the comments to learn user demands and subse-quently provides references for shoppers and formanufacturers to improve their goods.

The major contributions of this paper are as follows:

(i) The correlation of mobile phone attributes anduser sentiment on a user comment dataset isdemonstrated

(ii) An integrated method for constructing a dictionaryof mobile phone attributes is presented

(iii) A novel approach for matching user sentiments withmobile phone attributes is proposed

(iv) Results of experiments conducted on a real datasetthat demonstrate that the proposed method effec-tively provides purchasing advice to users are pre-sented and discussed

2. Literature Review

Since Hatzivassiloglou and McKeown [8] proposed the ideamining technology in 1997, the opinion mining has graduallybecome one of the most important research areas in datamining. The current research focuses on the extraction of fea-ture words and opinion words. The opinion words are theuser’s evaluation of features. The feature words and opinionwords constitute a two-group, fine-grained view. The goalof mining is to obtain the <character word, opinion word>two-group to represent the user’s evaluation of a feature.Because of the difficulty in obtaining a corpus, the study ofunsupervised methods accounts for the majority.

Yi et al. [9] used the similarity test (Likelihood Test)method to identify the explicit aspect according to the gram-matical structural features of the noun phrase, but thismethod cannot effectively solve the coverage problem of ter-minology. Smaller, you can build a dictionary of aspects formatching extractions in the text of comments. Samha et al.[10] use WordNet words (http://wordnet.princeton.edu) forexplicit extraction, by querying the names of related fieldsin the dictionary. Synonym information was used to identifyexplicit aspects of Zhu et al. [11]. In 2011, another unsuper-vised MAB (multiaspect bootstrapping) model was proposedfor explicit mining of Chinese restaurant reviews. Bagheriet al. [12] used the bootstrapping model, extracting aspectsfrom the POS mechanism, firstly tagging the text with thepart of speech, then using the heuristic rules to filter out theseed vocabulary set required by the conforming aspect com-position model, and finally using the bootstrapping methodto extract the aspects from the data. Quan and Ren [13] used

point mutual information and word frequency-inverse docu-ment frequency (TF-IDF) to discover explicit aspects andrelated entity classes and the relationship between the parties,from which the aspect belongs to the entity. For example, the“photo quality” is closer to the entity class and the “digitalcamera” than the entity class “MP3.” Zhou et al. [14] devel-oped an implementation aspect in 2016. Kaur and Bansal[15] proposed that in this paper, a mechanism for opinionmining of text comment data has been proposed for generat-ing a product review report based on multiple features andwhich can reveal several products: positive and negativepoints. Kumar et al. [16] proposed a method to automaticallyextract comments from websites and use the Naïve Bayesclassifier, logistic regression, and SentiWordNet algorithmsto classify comments into positive and negative commentsand use quality metrics to measure each of the performanceof the algorithm. The CMiner system extracted from theviewpoint summary was first applied to the microblog topiccomment data. They used dynamic programming-basedalgorithms to perform named entity tag segmentation in Chi-nese microblogs, based on “microblogs” in the same topicwhich may focus on the same or similar hypothesis, whichimplements an unsupervised label propagation algorithm,thus generating aspect candidate sets. Hu and Liu [17] pro-vide feature-based customer review summaries such as digi-tal cameras, mobile phones, and Mp3 players. The popularsemantic assessment workshop (SEMVAL) provides anexclusive tracking of aspect-based sentiment detection, someof which have written heuristic techniques to mine aspectsand emotions [18]. Recently, the authors used prior knowl-edge from several other product areas (e.g., comments onproducts from electronic categories) to extract aspects ofthe target product [19]. Prameswari et al. [20] used textmining methods and aspect-based sentiment analysis toobtain hotel user opinions in the form of emotions by apply-ing the recursive neural tensor network (RNTN) algorithm.Aspect-based sentiment analysis can provide typical senti-ment analysis. Information was provided. Li and Yang [21]have obvious advantages in comparing the sentiment miningmodel with the novel dictionary embedding module with thelogistic regression and support vector machine model basedon its performance. In 2018, Rakesh et al. [22] proposed theLDA variant APSUM model to simultaneously extract fea-ture words and viewpoint words. Cheng et al. [23] proposeda novel aspect-aware potential factor model that effectivelycombines reviews and ratings for rating prediction. Zvare-vashe and Olugbara [24] designed an emotional analysisframework to handle hotel customer feedback through opin-ion mining. Recently, Jiang et al. [25] applied natural lan-guage information opinions and emotion mining to thefield of health monitoring and achieved good results in pri-vacy protection and judgment of users’ psychological state.

Obtaining annotated data for evaluation in e-commercereview is difficult, which poses a limitation for supervisedmodels. Thus, it is more reasonable to use unsupervisedmodels in practical e-commerce applications. In this study, afeature dictionary, a perspective dictionary generation frame-work, and a window method were developed to first roughlyconstruct feature words and viewpoint word pairs. Then, the

2 Journal of Sensors

Page 3: Attribute-Sentiment Pair Correlation Model Based on …downloads.hindawi.com/journals/js/2019/2456752.pdfResearch Article Attribute-Sentiment Pair Correlation Model Based on Online

LinLog clustering algorithm was used to optimize the relation-ship between feature words and viewpoint words, and theresults mined to ascertain users’ actual evaluation of products.

3. Constructing Datasets

3.1. Constructing a Dictionary of Mobile Phone Attributes. Itis well known that Internet commentary differs from officiallanguage and colloquialization is a characteristic of the net-work data. In order to accurately extract the mobile phoneattributes and user sentiments in the comments, we first con-structed a complete mobile phone attribute dictionary andthen extracted the noun collections and adjective collectionsaccording to the dictionary.

The study was conducted on all user comments related tosmartphones running Android or IOS in self-operated storeson http://JD.com, one of the biggest e-commerce websites.For the smartphones preloaded with IOS, the comments oniPhone 4, iPhone 5, and iPhone 6 were collected, resultingin 41,202 pieces of data in total. For smartphones runningAndroid, the mainstream brands in the market were selected:Huawei, ZTE, Xiaomi, and Samsung. A total of 67,885 com-ments were collected. The cosine similarity algorithm [26]was adopted to remove duplicates and then to construct theattribute dictionary.

LDA, PageRank, and the Conditional Entropy integratedmodel were used to construct the dictionary. As shown inFigure 1, the available “NLPIR” tokenizer was first usedto segment words and identify word property for the givencorpus [27]. Then, nouns and noun phrases were extractedas a set, with no need for ordering in the set. Meanwhile,nouns, noun phrases, and adjectives were extracted asanother set, but with the adjectives and nouns sequencedaccording to the original order for future reference. A frac-tion of the data in the noun set was slated to be modelledand input into the LDA model. According to Ma et al.[28], LDA performs well in Chinese e-commerce commentmining. Following the evaluation of the model, we foundthat words get the best results when the number of topicswith the LDA model is set to 50 and the words from thetop 500 weights under the relevant topic are selected. The

meaning of the expression between themes may be crossed.For the same topic, the higher the weight of the words it con-tains, the more relevant it is to the topic [29, 30].

Taking all the candidate words obtained in the previ-ous step from the corpus, Yan et al. [4] proposed anextended PageRank algorithm that achieved excellent resultsin e-commerce comment feature extraction. In our study, weemployed this algorithm. After the LDA algorithm extractedthe candidate keyword set, all data in the set were sequencedin a priority order based on the PageRank-based algorithm[31] to produce candidate evaluation object set I. The PageR-ank model results in a stable weight after iteration, regardlessof the initial value [32]. The PageRank model can get theweight of each word, and the higher the weight is, the moreimportant it is.

For the ordered set of nouns and adjectives, priorityordering of all nouns in the set was realized using the condi-tional entropy filtering algorithm based on the cooccurrenceprobability [33], having candidate evaluation object set II.After the conditional entropy model, the opinion wordsand the characteristic words with weights can be obtainedat the same time. The opinion words can be used as the lex-icon for the viewpoint dictionary. On the basis of the abovetwo steps, the weighted value of the two candidate evalua-tion object sets was calculated for reordering. By weightingand averaging the results of the two models according tothe candidate feature words, the scores of each candidatefeature word were obtained, and the top 1000 candidatefeature words with the highest weight were sorted to con-struct a feature dictionary.

We had two word libraries after data preprocessing:the corpus and the adjective library of weighted sorting.The distribution of the corpus and the adjective library isshown in Figure 2, and the examples of the library areshown in Tables 1 and 2. The categories are artificiallydefined for easy reading, whereas the content is the actualresult obtained. Table 1 is the feature vocabulary, andTable 2 is the adjective library.

3.2. Matching User Sentiments with Mobile Phone Attributes.After the investigation and analysis of user comment data

Onlinecomments

Noun set

Word segmentation andproperty identity

Candidateevaluationobject set

Noun & adjectiveset

LDA model

PageRank algorithm

Conditional entropyalgorithm

Weightedordering

Extractionresult set

Candidateevaluationobject set

Figure 1: Data preprocessing.

3Journal of Sensors

Page 4: Attribute-Sentiment Pair Correlation Model Based on …downloads.hindawi.com/journals/js/2019/2456752.pdfResearch Article Attribute-Sentiment Pair Correlation Model Based on Online

on http://JD.com, it was found that about 84% of usersdescribed the attribute information of mobile phones with“mobile phone attributes-adjectives” such as “quality-good,” “price-affordable” and “cost-effective.” Thus, wetheorized that the adjectives in the comments could beextracted as user sentiments.

Kim and Hovy [34] proposed a method that searchesevaluation objects in a fixed-length window of evaluationwords. Their proposed method starts matching with the firstcharacter of a user comment according to the corpus ofmobile phone attributes, to judge whether the terms begin-ning with the character belong to the dictionary. When themobile phone attribute matches, the search continues forthe adjectives from the location of the matched attribute,with the length being within the X-wide window. If thereis only one such adjective, the matching is successful.When there are several adjectives that fit, the adjectiveclosest to the mobile phone attribute is selected as the sen-timent word. Finally, the mobile phone attributes and adjec-tives are extracted from the user comments to record as<attribute, sentiment>.

Based on the dictionary of mobile phone attributesconstructed in the previous stage, in this study, we adoptedthe fixed-length window method. The method traverses thesentence after word segmentation. When a feature word isencountered, it takes the word as the center, takes the clauseof the window size, and ascertains whether there is a view-point word that can constitute a <character word, viewword> pair in the clause. All the viewpoint words that canbe formed into a word pair are combined with the featurewords to form a word pair. If you are looking for a point ofview, look for a feature word that can form a word pair. Forexample, the phone has a high pixel size and good quality.If the window size is three, then <pixel, high> <quality,good> can be extracted from it. The window size settingwas experimentally compared with the result of manuallabeling (used as ground truth) and precision, recall, and F1(used as metrics). The results are shown in Table 3.

Following preliminary evaluation, the window size wasset to three. In this way, we found the collocation of “attributewords-adjectives” in the sentences. An example of matchingresults obtained from user comments on iPhone 6 is shownin Table 4. In this step, we obtained 118 pairs of user com-ments on iPhone 6, 1,158 on Huawei Honor, 470 on XiaomiNote, and 410 on Samsung Galaxy.

Vision CPU &accessories

Battery Functions Music & video Price Compatibility &extension

Screen Hand feel Appearance Logistics & after-sales

System Signal & heatingphenomenon

Camera Others

350

300

250

200

150

100

50

0

Figure 2: Distribution of the corpus.

Table 1: Examples from the corpus.

Category Content

VersionCustom version, bare phone, high

configuration, low configuration, etc.

CPU and accessoriesMemory, storage card, earphone,

charger, etc.

Battery Battery terms, recharge speed, etc.

Function Music, alarm clock, E-book, etc.

PricePrice, price-performance ratio,

discount, etc.

Screen Size, resolution ratio, sensitivity, etc.

Hand feel Weight, texture, hardness, etc.

AppearanceColor, quality, packaging,

appearance, etc.

Logistics and after-saleExpress, customer service, changing

and refunding, etc.

SystemInterface, response speed,

operation, etc.

Signal and heatingSignal, network, quality of

connection, etc.

Camera Camera, pixel, etc.

OthersStartup and shutdown, regular

update, etc.

Table 2: Examples from the adjective library.

Category Example

Positive sentiment wordsDelicate, remarkable, artistic, cheap,

complete, rare, ideal, etc.

Negative sentiment wordsSevere, insufficient, depressed, bad,regrettable, afflictive, exception, etc.

4 Journal of Sensors

Page 5: Attribute-Sentiment Pair Correlation Model Based on …downloads.hindawi.com/journals/js/2019/2456752.pdfResearch Article Attribute-Sentiment Pair Correlation Model Based on Online

4. Correlation Model of Mobile PhoneAttribute-Sentiment Pairs

4.1. Overall Framework. This study was conducted to evalu-ate the efficacy of our proposed attribute-sentiment pair cor-relation model by using it to determine users’ evaluation ofeach attribute of a mobile phone so as to discover the advan-tages and disadvantages of the mobile phone. As shown inFigure 3, we firstly extracted attributes from user comments,built an attribute dictionary, and then matched the user sen-timents with the attributes of the mobile phone according tothe attribute dictionary. Following the association of the fea-ture and attribute, the <character word, opinion word> pairwas obtained. Finally, the correlation between the attributesand the sentiments was calculated to conduct clustering.

Clustering using the LinLog model is conducted for twopurposes: (1) to classify the attributes and sentiments intodifferent categories so as to explore the sentiments matchedby attribute words. Because the word accuracy obtained inthe previous step cannot be guaranteed and the resultsobtained by simple statistics cannot accurately measure theuser’s perception of the mobile phone, it is necessary tofurther mine the word pair. Using the LinLog clusteringalgorithm, the most relevant viewpoints with feature wordscan be mined as user evaluations. At the same time, relativeto frequent itemset mining algorithms such as Apriori andFP-growth, more low-frequency word pair information isretained. (2) The LinLog model maps feature words andviewpoints into vector space when clustering. The vector rep-resentation of feature words and viewpoint words not onlycontains the degree of association between feature wordsand viewpoints but also can be considered to contain certainsemantic information.

4.2. Attribute-Sentiment Pair Model. When the model ofmobile phone attributes and sentiment pairs is illustrated ina graph, all attribute and sentiment words can be seen as anode in the graph. The edge of the node refers to the cooccur-rence between the attribute word and the opinion word, andthe number of cooccurrences is the number of edges. Each

node has both attraction and repulsion to surroundingnodes, and those nodes cluster according to the attractionor repulsion among them. Nodes with stronger attractionswill cluster near to each other while those with strongerrepulsions will be in different clusters. In this case, the energyof the whole graph will fall to the minimum. Finally, therewill be many clusters of nodes in the graph, each of whichconsists of both attribute words and sentiment words. Webelieve that the sentiment words have a higher matching withthe attribute words in the same cluster. In other words, wecan use these sentiment words to accurately evaluate thecharacteristics of a mobile phone in the cluster.

The purpose of the model of mobile phone attributes andsentiment pairs is to cluster the closely connected nodes inthe graph and to separate the loosely connected nodes, thusreducing the graph’s energy to a minimum. The lower theenergy is, the more accurate the clustering effect is. In themodel, we defined the degree of a node as the number ofedges connected to that node. The greater the degree of anode is, the stronger its attractive and repulsive forces are.The model calculated the energy of the Graph P, as shown in

UEdgeLinLog p = 〠u,v ∈E

p u − p v

− 〠u,v ∈V 2

deg u deg v ln p u − p v ,

1

whereU is the energy of graph P, u and v are the two nodes ingraph P, p u and p v are the locations of the two nodes inthe graph, p u − p v is the Euclidean distance of the twonodes, deg u is the degree of node u (which is equal tothe number of edges connected to node u), and deg v isthe degree of node v (which is equal to the number of edgesconnected to node v).

Using this model for clustering, we finally obtained twooutputs: (1) a diagram of clustering results, from which thesize and distribution locations of each class can be visuallyobserved, and (2) the coordinates of each node in the graph,namely, the locations of the mobile phone attribute wordsand sentiment words. According to the coordinates, we cancalculate the distance between attribute and sentiment words.The closer the distance was, the more accurate the sentimentword’s description of the attribute was. We used the Euclid-ean distance to calculate the distance between the two nodesin the graph. The Euclidean distance calculation formula forthe two-dimensional coordinate system is shown in

dis u, v = x1 − x22 + y1 − y2

2, 2

where u and v represent two nodes in the graph, x1, y1 arethe coordinates of node u in the graph, and x2, y2 are thecoordinates of node v in the graph. The paper adopts a dic-tionary method to calculate the distance between nodes inthe graph, which is to separate attribute words from senti-ment words and then calculate the distance from each keyto each sentiment word and finally sequence according to

Table 4: Noun phrases.

Attributes Sentiments

Signal Good

Screen Size small

Gold Good-looking

Price Expensive

Table 3: Comparison of word pair extraction results with differentwindow sizes.

Window size Precision Recall F1

2 88.3 65.7 75.34

3 85.8 78.8 82.15

4 79.6 79.1 79.34

5 72.1 82.4 76.90

5Journal of Sensors

Page 6: Attribute-Sentiment Pair Correlation Model Based on …downloads.hindawi.com/journals/js/2019/2456752.pdfResearch Article Attribute-Sentiment Pair Correlation Model Based on Online

the distance from small to large. The smaller the distanceis, the more accurate is the sentiment word’s descriptionof the attribute.

5. Experiment

In this study, two experiments were conducted using thecorrelation model of mobile phone attribute-sentimentwords for performance verification: the first experiment wasconducted to verify the effect of the model only; the secondwas conducted to verify the overall effect of the whole systemfrom term extraction to clustering.

When testing the model separately, the pairs extractedfrom user comments on iPhone 6 were selected as the datasetand then manually marked with the combination of mobilephone attributes and user sentiment words. The same datawere then input into the model of mobile phone attribute-sentiment pairs for clustering. The artificially labeled cluster-ing results were manually matched with the model clusteringresults, such that the two results could be compared. Therates of precision and recall and F values were selected asthe evaluation criteria according to the manually markedresults. The specific formulas for accuracy, recall, and Fvalues are as follows:

Recall = TPTP + FN

, 3

Precision =TP

TP + FP, 4

F =2 ∗ precision ∗ recallprecision + recall

5

Table 5 explains the meanings of TP, FN, FP, and TN inequations (3) and (4).

Optimal matching in the model-based clustering resultswas conducted for each artificial cluster Ci. Finally, the aver-age value of evaluation marks of each cluster was calculated.The experimental results show that the precision, recall, andF values resulting from applying only the mobile phone attri-bute sentiment model to clustering were 91.64%, 90.83%, and91.46%, respectively.

The user comments on iPhone 6 were also used as a data-set to test the overall performance of the system. The mobile

phone attributes and sentiment pairs were extracted manu-ally and by machine. The manually extracted pairs weremanually marked, and the pairs extracted by machine wereinput into the online review attribute-sentiment word modelto get clustering results. According to the results with manualmarks, the accuracy, recall, and F values gained were 74.71%,84.29%, and 79.21%, respectively.

It was found that the results obtained by LinLog cluster-ing for manual labeling are more similar to the manual label-ing than the original word pair. In other words, the accuracyof the word after clustering was improved. This also showsthat after clustering, feature words and opinions are moreclosely linked, which better reflects the user’s overall evalua-tion of the features. It also shows that the LinLog model canmap feature words and viewpoint words to vector space to acertain extent, and their corresponding vectors have seman-tic information.

The verification results indicate that the mobile phoneattribute-sentiment model achieved better performance.Thus, better potential values can be found in user commentsif the model is applied.

5.1. Online Review Attribute-Sentiment Model. In this study,product analysis was conducted of four mobile phones—spe-cifically, iPhone 6, Samsung Galaxy Note 4, Huawei Honor4X, and Xiaomi Note 4—and all comments were extractedfrom http://JD.com. There were 569 comments on iPhone6, 892 on Samsung, 2,068 on Huawei, and 1,069 on Xiaomi.The correlative model of mobile phone attribute-sentimentpairs was applied to the comments on the four mobilephones. Owing to space limitations, only the results forXiaomi and Samsung are presented here. The clustering

Table 5: Confusion matrix of a cluster.

TP: at the same time, it belongsto the number of artificialannotation and modelclustering result elements

FN: number of elements thatare manually labeled but notpart of the model clustering

result

FP: a model clustering resultthat does not belong to thenumber of manual labelingelements

TN: neither the manualannotation nor the numberof model clustering results

Onlinecomments

Constructingdictionaries of featurewords and adjectives

Extracting nouns

Extracting adjectives

Short-textset

Features-emotion wordpair model clustering

Calculating the correlationbetween nouns and feature Result analysis

Figure 3: Overall framework.

6 Journal of Sensors

Page 7: Attribute-Sentiment Pair Correlation Model Based on …downloads.hindawi.com/journals/js/2019/2456752.pdfResearch Article Attribute-Sentiment Pair Correlation Model Based on Online

results and distance between attributes and sentiment wordsare shown in the ensuing figures.

Figures 4 and 5 are the clustering results for the Xiaomimobile phone and the Samsung mobile phone, respectively.A single node in the graph represents a mobile phone attri-bute or user sentiment. The size of the node reflects the fre-quency of the occurrence of the word in the comment, andthe distance between nodes indicates the degree of proximity.Figures 6 and 7 show the partial attribute-sentiment distancemaps of some Xiaomi mobile phones and Samsung mobilephones, respectively. Tables 6 and 7 compare the weights ofusers’ emotions on all aspects of the Xiaomi mobile phone

and the Samsung mobile phone, respectively. The two tableswere constructed in the following manner. First, a search wasconducted for each positive, negative, and neutral emotionfor each feature word. The emotional score is the average dis-tance between the feature word and the emotional word.Then, the feature words were divided into 10 categories bymanual labeling, and the average sentiment tendency scoresof each category were calculated. The fraction column repre-sents the emotional score in Tables 6 and 7. The smaller thevalue, the stronger the emotional tendency.

The cluster results show the users’ evaluation of the fourmobile phones.

Packaging

Touch-feeling

ReactionFlexible

RechargePristine

Practical Appearance

Shell PatternMany Pretty

SoftwareNetwork

Picture

Clear

Vivid

Slow

Cost-performance

Image

Black shell

ThinSize

White

EvidentBattery-charge

Big

Strong

Recharge-speed

System

Crack

Fast

Speed

PlasticHeavy

Screen

FreshFeeling

Not-badFunctionInclusiveColorful

High

Suitable GoodWorkmanship

Bad

Price

LowReasonable

Cheap

Not good

BatteryDurable

Fluent

Figure 4: Xiaomi 1 clustering results.

7Journal of Sensors

Page 8: Attribute-Sentiment Pair Correlation Model Based on …downloads.hindawi.com/journals/js/2019/2456752.pdfResearch Article Attribute-Sentiment Pair Correlation Model Based on Online

Power saving mode, considerate, 0.031681376453308985

Brightness, uniform, 0.03170217570035405

Touch, good, 0.03174634144108803

Navigation, slow, 0.03660201006868464

Back shell, elegant, 0.0366081132500405

Charger, rough, 0.07308895068152514

Keypads, soft, 0.08549087301821766

Data cable, bad, 0.0925007977026096

Material, uncomfortable, 0.09625988383246677

Touch screen, sensitive, 0.11842824928093372

Figure 6: Xiaomi 1 attribute-sentiment distance.

Battery, long, 0.020472795420774334

Charging time, short, 0.023877795633167807

Message, timely, 0.02390992528545117

Watching video, cool, 0.057226711394164936

Color, dazzling, 0.059355318065093074

Charger, bad, 0.06033876698506296

Signal, weak, 0.06404216780874511

Operation, smooth, 0.09112691102778182

Voice, low, 0.11989941454784374

Material, fine, 0.12887959285172357

Figure 7: Samsung Galaxy Note 4 attribute-sentiment distance.

Button

Easily-broken

PackagingSimple

Perfect

Pattern

Not-bad

Function

PrettyAppearance

Absolute

BlackscreenLarge

Small

Clear

Fast

On-timeSpeed

Reacting-speed

Speed

Consume electricity

Good

Feeling

Touch-feeling

VersionVoice-quality

One-hand manipulate

System

Workmanship

HardData-lineDisappointed

Golden

Fragile

Well-looked

Cool

IncredibleSignal

Cheap

BatteryIdeal

Reasonable

Cost-efficientCard

Cost-performanceHigh

Image-resolution

Suitable

PriceSurprisingLow

Cost expensive

Back-shellEasy

Working-hours

BadShell

Figure 5: Samsung Galaxy Note 4 clustering results.

8 Journal of Sensors

Page 9: Attribute-Sentiment Pair Correlation Model Based on …downloads.hindawi.com/journals/js/2019/2456752.pdfResearch Article Attribute-Sentiment Pair Correlation Model Based on Online

For the Xiaomi 1, users are satisfied with its feel, camera,and reaction speed, but are dissatisfied with its screen.

The statistical results for the Samsung Galaxy Note4 show that users are not satisfied with the screenand reaction speed, but it has a better result thanthe Xiaomi 1.

After the analysis, such results may be related to differentmobile phone prices. Consumers often have higher require-ments for high-priced mobile phones.

The above analysis results disclose users’ evaluation ofeach attribute of the mobile phones. These results can pro-vide a reference for users to buy mobile phones and alsoprovide a basis for mobile phone manufacturers to furtherimprove their products.

5.2. Comparative Experiment. Table 8 shows the results ofclustering Xiaomi and Samsung using the LinLog modeland the LDA model, respectively. The first line representsthe feature word, and the subsequent line represents theopinion word that belongs to the same class as the featureword after clustering. Among them, red indicates a viewpointword that is not related to the feature word. The choice of fea-ture words is representative, and the opinion words arearranged according to the correlation between the test wordsand the feature words from high to low. The viewpoint wordswith superscripts in the table indicate different Chineseexpressions but have the same English meanings due to dif-ferences in Chinese and English expressions. It can be clearlyseen from the table that the feature word-view word combi-nation can be effectively mined using the LinLog model.

6. Conclusion

In this paper, we proposed an online review attribute-sentiment pair correlation model. To utilize the model, we

Table 7: Emotional comparison of all aspects of Samsung GalaxyNote 4.

Classification Emotion Word pair Fraction

Feel

Positive

<feel, comfortable><feel, good><feel, light>

<feel, light, and thin>

6.224808

Neutral None 0

Negative None 0

System

Positive<system, smooth><system, smooth> 0.626022

Neutral None 0

Negative None 0

Camera

Positive<system, smooth>

<system, smooth>, etc. 1.826219

Neutral None 0

Negative None 0

Screen

Positive<screen, suitable>

<touch screen, stick> 7.625888

Neutral <screen, basic> 3.735861

Negative<screen, broken><resolution, low> 1.731892

Reaction speed

Positive<Reaction speed, fast>

<speed, fast> 6.366212

Neutral None 0

Negative <reaction speed, slow> 0.508527

Storage

Positive None 0

Neutral None 0

Negative None 0

Signal

Positive<signal, strong><signal, stable> 4.001634

Neutral <signal, OK> 0.786754

Negative<signal, difference>

<received signal, weak> 6.049424

Table 6: Emotional comparison of all aspects of Xiaomi 1.

Classification Emotion Word pair Fraction

Feel

Positive<feel, light and thin>

<feel, good><feel, cool>

2.943150245

Neutral None 0

Negative None 0

System

Positive

<machine system,smooth>

<system, smooth><system, complete><system, more><system, simple><system, exquisite>

38.66737706

Neutral <system, general> 1.795016

Negative None 0

Camera

Positive<camera, clear><camera, nice> 1.301265963

Neutral None 0

Negative None 0

Screen

Positive

<screen, big><screen, clear>

<screen, delicate><resolution, high>

3.020417222

Neutral <screen, dark> 1.795016

Negative<screen, bad><screen, small><screen, broken>

0.350402069

Reactionspeed

Positive<reaction speed, fast>

<speed, fast> 2.739988726

Neutral None 0

Negative None 0

Storage

Positive None 0

Neutral None 0

Negative None 0

Signal

Positive<signal, complete><signal, stable> 14.06477071

Neutral None 0

Negative<signal, difference>

<signal, no> 12.19108

9Journal of Sensors

Page 10: Attribute-Sentiment Pair Correlation Model Based on …downloads.hindawi.com/journals/js/2019/2456752.pdfResearch Article Attribute-Sentiment Pair Correlation Model Based on Online

first preprocessed user comment data, built a dictionary ofmobile phone attributes and a dictionary of sentimentwords, and then used it to match user sentiment with mobilephone attributes. Finally, we conducted experiments on onereal-world dataset. Our research results can be used to con-duct a user sentiment comparison of various characteristicsof mobile phones. This helps merchants to improve resultsand helps users make better decisions. This study is basedon mobile phone review information but can also beextended to other types of goods on the e-commerce plat-form. In the future, we intend to extend our work in the fol-lowing direction. First, we will consider syntactic analysisplus window-based collocation to extract word pairs andidentify negative words in the text in order to realize moreaccurate clustering results. Second, we will expand our workto other review datasets such as Amazon and Taobao.

Data Availability

The data used to support the findings of this study are avail-able from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Key Research andDevelopment Program (grant number 2017YFB0803300),the National Natural Science Foundation of China (grantnumbers 91546121 and 61702043), the National Social

Science Foundation of China (grant number 16ZDA055),and the Open Project Fund of the Key Laboratory ofTrustworthy Distributed Computing and Service (BUPT),Ministry of Education.

References

[1] T. L. Ngo-Ye and A. P. Sinha, “The influence of reviewerengagement characteristics on online review helpfulness: a textregression model,” Decision Support Systems, vol. 61, pp. 47–58, 2014.

[2] C. Katawetawaraks and C. Wang, “Online shopper behavior:Influences of online shopping decision,” Asian Journal of Busi-ness Research, vol. 1, no. 2, p. 2, 2011.

[3] W. Fan and M. D. Gordon, “The power of social media analyt-ics,” Communications of the ACM, vol. 57, no. 6, pp. 74–81,2014.

[4] Z. Yan, M. Xing, D. Zhang, and B. Ma, “EXPRS: an extendedpagerank method for product feature extraction from onlineconsumer reviews,” Information & Management, vol. 52,no. 7, pp. 850–858, 2015.

[5] S. Nemorin, “Neuromarketing and the “poor in world” con-sumer: how the animalization of thinking underpins contem-porary market research discourses,” Consumption Markets &Culture, vol. 20, no. 1, pp. 59–80, 2015.

[6] Y. Liu, F. Li, L. Guo, B. Shen, and S. Chen, “A comparativestudy of android and iOS for accessing internet streaming ser-vices,” in Passive and Active Measurement, M. Roughan and R.Chang, Eds., vol. 7799, Springer, Berlin, Heidelberg, 2013.

[7] K. D. Miller, “Organizational risk after modernism,” Organi-zation Studies, vol. 30, no. 2-3, pp. 157–180, 2009.

[8] V. Hatzivassiloglou and K. R. McKeown, “Predicting thesemantic orientation of adjectives,” in Proceedings of the ACL

Table 8: Comparison of LinLog model and LDA model clustering results.

LinLog LDA

Xiaomi

Exterior System Screen Exterior System Screen

Nice Smooth1 Delicate Very beautiful Good Good

Interesting Giant Serious Cheap Smooth1 Very large

Special Complete Individual Texture Affordable Beautiful1

Thin Simple Clear Beautiful The best Clear

Expensive Smooth2 Black Ultrathin Easy Suitable

Great Refined Obvious No response Smooth2 Exquisite

Average True Broken Depressed Simple Comfortable

Good Easy Big Novel Relatively stable Beautiful2

Samsung

Battery Price Screen Battery Price Screen

Long Reasonable Bright Good Cheap Clear

Short Ideal Transparent Durable Good Very large

Bad Cheap Beautiful Serious Rubbish Smooth

Normal Suitable Clear Stunning Accurate Stunning

Durable Expensive Realistic Very easy to use The essential Regret

Multi Master Delicate Too slow Favorable First class

Practical Cost-effective Great Easily Suitable Exquisite

Obvious Low Easy Too small Bright

10 Journal of Sensors

Page 11: Attribute-Sentiment Pair Correlation Model Based on …downloads.hindawi.com/journals/js/2019/2456752.pdfResearch Article Attribute-Sentiment Pair Correlation Model Based on Online

‘98/EACL ‘98 Proceedings of the 35th Annual Meeting of theAssociation for Computational Linguistics and Eighth Confer-ence of the European Chapter of the Association for Computa-tional Linguistics, pp. 174–181, Madrid, Spain, July 1997.

[9] J. Yi, T. Nasukawa, R. Bunescu, and W. Niblack, “Sentimentanalyzer: extracting sentiments about a given topic using nat-ural language processing techniques,” in Third IEEE Interna-tional Conference on Data Mining, pp. 427–434, Melbourne,FL, USA, November 2003.

[10] A. K. Samha, Y. Li, and J. Zhang, “Apect-based opinion extrac-tion from customer reviews,” in Computer Science & Informa-tion Technology ( CS & IT ), April 2014.

[11] J. Zhu, H. Wang, M. Zhu, B. K. Tsou, and M. Ma, “Aspect--Based Opinion Polling from Customer Reviews,” IEEE Trans-actions on Affective Computing, vol. 2, no. 1, pp. 37–49, 2011.

[12] A. Bagheri, M. Saraee, and F. de Jong, “An unsupervised aspectdetection model for sentiment analysis of reviews,” in Interna-tional Conference on Application of Natural Language to Infor-mation Systems, Springer, Berlin, Heidelberg, 2013.

[13] C. Quan and F. Ren, “Unsupervised product feature extractionfor feature-oriented opinion determination,” Information Sci-ences, vol. 272, pp. 16–28, 2014.

[14] X. Zhou, X.Wan, and J. Xiao, “Cminer: opinion extraction andsummarization for chinese microblogs,” IEEE Transactions onKnowledge and Data Engineering, vol. 28, no. 7, pp. 1650–1663, 2016.

[15] J. Kaur and M. Bansal, “Multi-layered sentiment analyticalmodel for product review mining,” in 2016 Fourth Interna-tional Conference on Parallel, Distributed and Grid Computing(PDGC), pp. 415–420, Waknaghat, India, December 2016.

[16] K. L. S. Kumar, J. Desai, and J. Majumdar, “Opinion Miningand Sentiment Analysis on Online Customer Review,” in2016 IEEE International Conference on Computational Intelli-gence and Computing Research (ICCIC), Chennai, India,December 2016.

[17] M. Hu and B. Liu, “Mining and summarizing customerreviews,” in Proceedings of the 2004 ACM SIGKDD interna-tional conference on Knowledge discovery and data mining -KDD '04, 2004.

[18] O. De Clercq, M. Van de Kauter, E. Lefever, and V. Hoste, “LT3:applying hybrid terminology extraction to aspect-based senti-ment analysis,” in Proceedings of the 9th International Work-shop on Semantic Evaluation (SemEval 2015), 2015.

[19] Q. Liu, B. Liu, Y. Zhang, D. S. Kim, and Z. Gao, “Improvingopinion aspect extraction using semantic similarity and aspectassociations,” in AAAI, February 2016.

[20] P. Prameswari, I. Surjandari, and E. Laoh, “Opinion miningfrom online reviews in Bali tourist area,” in 2017 3rd Interna-tional Conference on Science in Information Technology (ICSI-Tech), 2017.

[21] J.-B. Li and L.-B. Yang, “A rule-based Chinese sentiment min-ing system with self-expanding dictionary-taking TripAdvisoras an example,” in 2017 IEEE 14th International Conference one-Business Engineering (ICEBE), 2017.

[22] V. Rakesh, W. Ding, A. Ahuja, N. Rao, Y. Sun, and C. K.Reddy, “A sparse topic model for extracting aspect-specificsummaries from online reviews,” in Proceedings of the 2018World Wide Web Conference on World Wide Web - WWW'18, 2018.

[23] Z. Cheng, Y. Ding, L. Zhu, and M. Kankanhalli, “Aspect-awarelatent factor model: rating prediction with ratings and

reviews,” in Proceedings of the 2018 World Wide Web Confer-ence on World Wide Web - WWW '18, 2018.

[24] K. Zvarevashe and O. O. Olugbara, “A framework for senti-ment analysis with opinion mining of hotel reviews,” in 2018Conference on Information Communications Technology andSociety (ICTAS), March 2018.

[25] L. Jiang, B. Gao, J. Gu et al., “Wearable long-term social sens-ing for mental wellbeing,” IEEE Sensors Journal, pp. 1–1, 2018.

[26] B. Sarwar, G. Karypis, J. Konstan, and J. Reidl, “Item-basedcollaborative filtering recommendation algorithms,” in Pro-ceedings of the tenth international conference on World WideWeb - WWW '01, 2001.

[27] L. Zhou and D. Zhang, “NLPIR: a theoretical framework forapplying natural language processing to informationretrieval,” Journal of the American Society for Information Sci-ence and Technology, vol. 54, no. 2, pp. 115–123, 2003.

[28] B. Ma, D. Zhang, Z. Yan, and T. Kim, “An LDA and synonymlexicon based approach to product feature extraction fromonline consumer product reviews,” Journal of Electronic Com-merce Research, vol. 14, pp. 304–314, 2013.

[29] C. Huang, Q. Wang, D. Yang, and F. Xu, “Topic mining oftourist attractions based on a seasonal context aware LDAmodel,” Intelligent Data Analysis, vol. 22, no. 2, pp. 383–405,2018.

[30] G. B. Chen and H. Y. Kao, “Word co-occurrence augmentedtopic model in short text,” International Journal of Computa-tional Linguistics & Chinese Language Processing, vol. 20,no. 2, pp. 45–64, 2015.

[31] L. Page, S. Brin, R. Motwani, and T.Winograd, “The PageRankCitation Ranking: Bringing Order to the Web. TechnicalReport,” Stanford InfoLab, 1999.

[32] Y. Du, X. Tian, W. Liu et al., “A novel page ranking algorithmbased on triadic closure and hyperlink-induced topic search,”Intelligent Data Analysis, vol. 19, no. 5, pp. 1131–1149, 2015.

[33] C. Kaleli, “An entropy-based neighbor selection approach forcollaborative filtering,” Knowledge-Based Systems, vol. 56,pp. 273–280, 2014.

[34] S.-M. Kim and E. Hovy, “Automatic detection of opinion bear-ing words and sentences,” in Companion Volume to the Pro-ceedings of Conference including Posters/Demos and tutorialabstracts, 2005.

11Journal of Sensors

Page 12: Attribute-Sentiment Pair Correlation Model Based on …downloads.hindawi.com/journals/js/2019/2456752.pdfResearch Article Attribute-Sentiment Pair Correlation Model Based on Online

International Journal of

AerospaceEngineeringHindawiwww.hindawi.com Volume 2018

RoboticsJournal of

Hindawiwww.hindawi.com Volume 2018

Hindawiwww.hindawi.com Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwww.hindawi.com Volume 2018

Hindawiwww.hindawi.com Volume 2018

Shock and Vibration

Hindawiwww.hindawi.com Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwww.hindawi.com Volume 2018

Hindawiwww.hindawi.com Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwww.hindawi.com

Volume 2018

Hindawi Publishing Corporation http://www.hindawi.com Volume 2013Hindawiwww.hindawi.com

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwww.hindawi.com Volume 2018

Hindawiwww.hindawi.com

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwww.hindawi.com Volume 2018

International Journal of

RotatingMachinery

Hindawiwww.hindawi.com Volume 2018

Modelling &Simulationin EngineeringHindawiwww.hindawi.com Volume 2018

Hindawiwww.hindawi.com Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwww.hindawi.com Volume 2018

Hindawiwww.hindawi.com Volume 2018

Navigation and Observation

International Journal of

Hindawi

www.hindawi.com Volume 2018

Advances in

Multimedia

Submit your manuscripts atwww.hindawi.com