Master Thesis 2014
Cross-Domain Investigations of User Evaluations
under the Multi-cultural Backgrounds
Submission Date: July 21st 2014
Supervisor: Hayato Yamana
Department of Computer Science and Engineering,
Graduate School of Fundamental Science and Engineering,
Waseda University
Student ID: 5112BG10-8
Jiawen Le
Abstract
Twitter, one of the most popular social network services, is now widely used to survey public opinion. In this research, a large corpus of Twitter data, together with reviews collected from review websites, is used to carry out sentiment and culture-based analysis, in order to identify the cultural effects on user evaluations. Posts written in more than 30 languages from more than 30 countries are analyzed.
As the first step, global restaurants are taken as the research subject. Using a range of new and standard features, a series of high-performance classifiers is trained and applied in the later steps of sentiment analysis. A field expansion is then carried out to confirm that the same approach can be applied to world attractions. The experimental and analytical results show that the proposed methods are promising and transferable across fields for cross-lingual sentiment analysis. Furthermore, the analysis shows that cultural effects on user evaluations exist in both the restaurant and travel domains, and are pronounced for some countries and cultural backgrounds.
Contents
1. Introduction .......................................................................................................... 1
2. Background ........................................................................................................... 3
2.1 Twitter Data ............................................................................................................... 3
2.2 Review Data ............................................................................................................... 4
2.3 Machine Translation Tools......................................................................................... 5
3. Related Work ........................................................................................................ 7
3.1 Sentiment Analysis ..................................................................................................... 7
3.2 Cross-lingual Analysis ............................................................................................... 8
4. Methodology .......................................................................................................... 9
4.1 Data Collection ........................................................................................................... 9
4.2 Translation and Pre-filtering ...................................................................................... 9
4.3 Location Definition .................................................................................................. 10
4.4 Spam Filtering .......................................................................................................... 10
4.5 Features for Sentiment Classification ...................................................................... 12
4.5.1 Dictionary Construction ................................................................................................. 12
4.5.2 Syntax Features .............................................................................................................. 12
4.5.3 Modified Unigram ......................................................................................................... 12
4.5.4 Review Dataset-based Average Score ........................................................................... 13
4.5.5 Review Dataset-based CCA Score ................................................................................ 14
4.5.6 Window Co-occurrence-based Average Score .............................................................. 14
4.5.7 POS-based Feature ......................................................................................................... 15
5. Experiment .......................................................................................................... 16
5.1 Overview .................................................................................................................. 16
5.2 Preprocessing ........................................................................................................... 16
5.3 Sentiment Classification ........................................................................................... 17
6. Analysis ................................................................................................................ 20
6.1 Statistical Analysis ................................................................................................... 20
6.2 Basic Sentiment Analysis ......................................................................................... 21
6.2.1 Polarity Distribution ...................................................................................................... 21
6.2.2 Sentiment Map ............................................................................................................... 23
6.2.3 Sentiment Keywords Extraction .................................................................................... 25
6.2.4 (Attribute, Value) Pairs Extraction ................................................................................ 28
6.3 Culture-based Analysis ............................................................................................ 32
7. Field Expansion................................................................................................... 38
7.1 Data .......................................................................................................................... 38
7.2 Experiment ............................................................................................................... 38
7.3 Analysis .................................................................................................................... 39
7.3.1 Statistical Analysis ......................................................................................................... 39
7.3.2 Basic Sentiment Analysis .............................................................................................. 41
7.3.2.1 Polarity Distribution ...................................................................... 41
7.3.2.2 Sentiment Map .............................................................................. 44
7.3.2.3 Sentiment Keywords Extraction ................................................... 48
7.3.2.4 (Attribute, Value) Pairs Extraction ............................................... 53
7.3.3 Culture-based Analysis .................................................................................................. 57
7.3.4 Comparison of the Two Domains .................................................................. 58
8. Conclusion ........................................................................................................... 62
Acknowledgements .................................................................................................... 63
References ................................................................................................................... 64
Publications ................................................................................................................ 66
1. Introduction
In recent years, the social network service (SNS), a newcomer in the field of social media, has drawn much attention around the world. Twitter1, one of the most popular social network services, has a range of distinctive characteristics, including the tremendous number of posts, the great variety of tweet contents, and the rapid speed of information distribution. By posting tweets, people may discuss currently 'hot' topics, express their views on big events, or simply talk about trivial things in their daily life, such as an enjoyable trip, a delicious meal, or a satisfactory service they have received. The huge volume of tweets can therefore be used to survey public opinion: if many users post tweets containing complimentary words about a restaurant, it is likely that this restaurant is popular among customers.
Meanwhile, recent decades have also witnessed the remarkable progress of globalization. With the growing number of transnational enterprises and the development of transportation services, people from all over the world can use the same product, savor the same meal, and appreciate the same scenery. However, it is quite common for people from different countries to have totally different feelings about these experiences, probably due in part to their diverse cultural backgrounds. To identify the cultural effects on the evaluations of people with different cultural backgrounds, tweets, as well as website reviews, can serve as a good dataset for the analysis. By applying natural language processing techniques and sentiment analysis approaches to this dataset, conjectures about the possible relationship between user opinions and cultural backgrounds can be made, and further, instructive suggestions can be given.
However, this task poses several challenges. After retrieving the useful information from the large amount of varied data, the problem of how to correctly determine the sentiment of these short texts remains the main task for many researchers. On this task, a noted work by Liu [1] reviews the existing approaches and research in the field of sentiment analysis. Based on his survey and other recent research, the mainstream approaches for sentiment analysis include the 2-way method that classifies tweets as positive or negative [2][3], and the 3-way method that divides tweets into positive, neutral, and negative groups [4][5][6].
The language barrier is another challenge. Most previous research in this field focuses only on English-written tweets, and posts in other languages are simply discarded. Although English-written tweets predominate in number, Twitter also enjoys great popularity in many non-English-speaking countries, such as Japan, Indonesia, and Brazil. Ignoring these non-English tweets would inevitably lead to biased and incorrect results in cross-culture analysis. Several works have studied cross-lingual sentiment analysis, but their target languages and text formats are very limited. OpinionIt [7] is an opinion mining system comparing cross-lingual differences in opinions; in that paper, the authors take reviews written in English and Chinese as the main subject. Other related research includes the work of Bautin et al. [8] and the work of Nakasaki et al. [9], which study blogs or news in multiple languages.
One more challenge lies in field transferability. Most research in sentiment analysis considers only a single domain, and some of the most popular domains for this kind of study
1 http://twitter.com/
include the domain of films [6] and political issues [10]. Because texts in different domains may have different vocabularies and stylistic features, it is questionable whether a sentiment analysis approach developed in one field can be applied to another.
Facing these challenges, this research makes the following contributions:
- Considering the multi-cultural background, tweets written in more than 30 languages from more than 30 countries are taken into account in the sentiment analysis;
- A domain expansion is carried out: after applying the proposed sentiment analysis approach to the field of global restaurants, tweets about world attractions are analyzed with the same approach as a further step;
- A sequential three-step process of spam classification, subjectivity classification, and polarity classification is adopted, and in the sentiment classification steps, a series of combinations of new features is used to train high-performance classifiers;
- Through a range of analysis methods, insight into people's attitudes towards the target restaurants and attractions is given, and informative conclusions about the cultural effects are obtained.
As mentioned above, global restaurants are chosen as the starting point of this research. One of the firmest grounds for this choice is that food plays an important role in multi-culture comparison, and a culture's perception or standard of food may manifest itself in people's attitudes towards global restaurant chains. Tourism is selected as the expansion domain because an independent, distant field is preferred: verifying the transferability of the proposed approach is one of the objectives of this research. Among all the possible distant fields, the tourism domain provides a rich source of review data, which is essential to the proposed methodology and the relevant analysis.
The rest of this thesis is organized as follows. In the second section, basic background knowledge is introduced briefly. After listing the related works in the next section, the concrete methods and algorithms adopted in this research are discussed in detail. The process and steps of the experiment are then described, and the obtained results are discussed and analyzed, leading to the final conclusions and summary in the last section.
2. Background
In this section, basic background knowledge for this research is introduced.
2.1 Twitter Data
Twitter, the main data source of this research, is a popular online microblogging service that allows users to post messages of up to 140 characters. People use Twitter to read the 'tweets' sent by users whom they follow, and to post their own tweets to the group of users who follow them. Because of its enormous influence as a social medium, Twitter is widely used as a research tool for various investigations, which generally rely on the Twitter APIs2.
Twitter provides developers with a range of APIs to meet different needs, which fall into two groups: the REST APIs and the Streaming APIs. All simple functionality can be achieved with the first group, but for more powerful real-time applications, users usually resort to the Streaming APIs. In this research, both the search API (part of the REST APIs) and the public stream API (part of the Streaming APIs) are used to satisfy multiple requirements. For clarity, a comparison between the two APIs is presented in Table 2.1.
Table 2.1: Twitter API Examples

                 search/tweets                           statuses/filter
category         REST API                                Streaming API
parameters       q: query text (required)                follow: the users to return statuses for
(part of)        geocode: latitude/longitude             track: keywords to track
                   restriction                           locations: a set of bounding boxes
                 lang: language restriction                to track
                 since_id: returns results with an ID    (at least one of these three parameters
                   greater than the specified ID           must be specified)
                 until: returns tweets generated
                   before the given date
response object  tweets                                  tweets
HTTP methods     GET                                     GET, POST
resource URL     https://api.twitter.com/1.1/            https://stream.twitter.com/1.1/
                   search/tweets.json                      statuses/filter.json
rate limitation  per 15 minutes:                         limitations for long-lived connections
                   180/user, 450/app
After a request, both APIs return tweets as the response object, which constitute the basic data form of this research. A tweet object actually contains far richer content than what can be read from the ordinary Twitter user interface. Several important fields that are particularly useful in this research are picked out and listed in Table 2.2.
2 https://dev.twitter.com/docs/.
Table 2.2: Tweet Object (part of)

field                 description
id                    unique integer identifier for the Tweet
created_at            creation time
text                  text of the status update
lang                  machine-detected language of the text
coordinates           geographic location of the Tweet
retweet_count         number of times the Tweet has been retweeted
place.country         full name of the country
place.country_code    ISO code of the country
user.name             user name
user.id               user identification number
user.lang             the user-set language
user.location         the user-set location
user.time_zone        the user-set time zone
The Twitter dataset in this research contains millions of tweets, each constructed from the same fields and their corresponding values. In addition to the actual post text given by the crucial 'text' field, tweets also provide valuable language and location information through fields such as 'lang', 'coordinates', 'country', and 'location'.
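As a concrete illustration, the fields in Table 2.2 can be pulled out of a raw tweet JSON object as follows. This is a minimal sketch; the sample object and all its values are invented for illustration:

```python
import json

# A fabricated tweet object (values invented) restricted to the
# fields listed in Table 2.2.
raw = '''{
  "id": 123456789,
  "created_at": "Mon Oct 07 20:15:00 +0000 2013",
  "text": "Great coffee at Starbucks!",
  "lang": "en",
  "coordinates": null,
  "retweet_count": 3,
  "place": {"country": "Japan", "country_code": "JP"},
  "user": {"name": "alice", "id": 42, "lang": "ja",
           "location": "Tokyo", "time_zone": "Asia/Tokyo"}
}'''

def extract_fields(tweet_json):
    """Keep only the Table 2.2 fields, tolerating missing sub-objects."""
    t = json.loads(tweet_json)
    place = t.get("place") or {}
    user = t.get("user") or {}
    return {
        "id": t["id"],
        "text": t["text"],
        "lang": t.get("lang"),
        "coordinates": t.get("coordinates"),
        "country": place.get("country"),
        "country_code": place.get("country_code"),
        "user_lang": user.get("lang"),
        "user_location": user.get("location"),
    }

fields = extract_fields(raw)
print(fields["country"], fields["user_lang"])  # Japan ja
```

Guarding against null 'place' and 'coordinates' matters in practice, since most tweets carry no geotag (cf. the definition ratios in Table 4.1).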
2.2 Review Data
Another important data source is review websites. After a meal or a sightseeing tour, people often leave comments on gourmet or tourism sites to express their feelings and give others suggestions. Such comments are usually accompanied by a corresponding score given by the same user.
Figure 2.1: Example of Review Data
Figure 2.1 shows a screenshot of a review, with the typical components of a comment and a 5-point score. In this example, the user commented on the Eiffel Tower with several lines of text and scored it five circles, which means excellent.
In this research, this kind of review data is also collected as an auxiliary dataset for the experiment, which will be explained in detail in the later sections.
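For the later sentiment experiments, such 5-point scores can be folded into coarse polarity labels. The thresholds below (negative for 1-2, neutral for 3, positive for 4-5) are an illustrative assumption, not necessarily the exact scheme used in this research:

```python
def score_to_polarity(score):
    """Map a 1-5 review score to a coarse polarity label.
    Thresholds are illustrative: <=2 negative, 3 neutral, >=4 positive."""
    if not 1 <= score <= 5:
        raise ValueError("score must be in 1..5")
    if score <= 2:
        return "negative"
    if score == 3:
        return "neutral"
    return "positive"

labels = [score_to_polarity(s) for s in (5, 3, 1)]
print(labels)  # ['positive', 'neutral', 'negative']
```

Such a mapping lets the scored reviews serve as weakly labeled training data alongside the tweets.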
2.3 Machine Translation Tools
In research related to natural language processing and cross-lingual investigations, machine translation systems are commonly used to produce preprocessing or intermediate results. Examples of this kind of research include speech recognition and translation systems, webpage translation, and large-scale domain-specific document translation. Strictly speaking, machine translation is a sub-field of computational linguistics, but this research only considers the use of online machine translation tools, which serve as a critical step in the cross-lingual analysis.
A range of machine translation tools is available for research or commercial use. These tools differ in their language models, supported platforms, source availability, etc., and users may choose the appropriate one according to their objectives. Table 2.3 gives three examples of popular machine translation applications and their related information. A more complete and detailed table can be found under 'Comparison of machine translation applications' on Wikipedia3.
Table 2.3: Example Machine Translation Tools

name               platform            license     price                  source availability
Google Translate4  cross-platform      paid        free                   no
                   (web application)
Bing Translator5   cross-platform      commercial  free                   no
                   (web application)
Babylon6           Windows, Mac        paid        depends on license     no
                                                   (from $34 to $89
                                                   for one license)
When choosing a machine translation system, performance, i.e., translation accuracy, is a common criterion. Several previous reports may serve as references in this respect. In 'Comparison of online machine translation tools'7, the researchers asked professional and non-professional people to review the three tools mentioned above, and these reviews are regarded as user evaluations. The conclusions are that Google Translate outperforms the other two systems in most situations, and that the same translator performs quite differently under different language and text-form settings. In 'An analysis of Google Translate accuracy'8, the researchers arrive at the conclusion that although Google Translate provides translations
3 http://en.wikipedia.org/wiki/Comparison_of_machine_translation_applications/.
4 https://translate.google.com/.
5 http://www.bing.com/translator/.
6 http://www.babylon.com/.
7 http://82.165.192.89/initial/index.php?id=175.
8 http://www.translationdirectory.com/articles/article2320.php/.
among a wide range of languages, the accuracy varies greatly. The results suggest that translations between European languages are quite good, but performance becomes relatively poor when Asian languages are involved. Overall, although its performance is never likely to reach that of an expert human translator, it can provide quick, cheap translations even for unusual language pairs.
Based on these evaluations and conclusions, Google Translate is chosen as the tool for the translation tasks in this research.
Finally, the advantages and disadvantages of using a machine translation tool are summarized below.
Merits:
a) A wide range of existing applications to choose from, covering all platforms;
b) APIs provided by some popular online systems for easy access;
c) Relatively high performance for certain languages and certain forms of text.
Demerits:
a) Strict rate limits, or availability only for commercial use;
b) Not capable of handling all languages and all text forms;
c) Usually cannot be specialized for certain domains or fields.
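One common way to soften demerit a) is to cache translations so that each distinct text is sent to the service only once. The sketch below assumes a `translate` callable is supplied by the caller (e.g. a client for an online translation API); the `fake_translate` stub here is purely for illustration:

```python
def make_cached_translator(translate):
    """Wrap a translation function with a cache so that repeated
    texts do not consume extra API quota."""
    cache = {}
    calls = {"count": 0}  # how many real API calls were made

    def cached(text, target="en"):
        key = (text, target)
        if key not in cache:
            calls["count"] += 1
            cache[key] = translate(text, target)
        return cache[key]

    return cached, calls

# Illustrative stub standing in for a real machine translation API.
def fake_translate(text, target):
    return f"[{target}] {text}"

cached, calls = make_cached_translator(fake_translate)
cached("bonjour", "en")
cached("bonjour", "en")   # served from the cache; no second API call
print(calls["count"])     # 1
```

Since tweets about chain restaurants repeat heavily (retweets, templated check-ins), deduplicating before translation substantially reduces the number of rate-limited API calls.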
3. Related Work
3.1 Sentiment Analysis
The sentiment analysis of Twitter data has attracted many researchers in recent years, and a range of significant works have contributed to this field.
In the aspect of opinion mining, a noted work is presented by Pang and Lee [11], which gives a broad view of existing approaches for sentiment analysis and opinion retrieval. Early research that proposes new methods, or improves existing approaches for the particular subject of tweets, can be listed as follows.
Go et al. [12] use emoticons to query Twitter and take the returned data as their training set. They then divide these tweets into negative and positive ones according to the sentiment of the query emoticons. As for the applied models, they build Naive Bayes, MaxEnt, and Support Vector Machine (SVM) classifiers, and report that the SVM model outperforms the others. They also find that the unigram feature model achieves the best performance, which the bigram and part-of-speech feature models cannot match.
The work of Pak and Paroubek [4] is characterized by its method of collecting objective training data. The sources of this objective data include several popular newspapers, whose sentences are usually considered to carry no particular sentiment polarity. In contrast with the conclusion of Go et al., Pak and Paroubek report that the n-gram and POS strategies both contribute to performance.
On the other side, the research of Barbosa and Feng [13] mainly focuses on syntax features such as hashtags, URL links, and exclamations, combined with the POS model.
In the work of Kouloumpis et al. [14], the authors explore the utility of linguistic features for detecting the sentiment of tweets. They use a supervised approach to evaluate the usefulness of existing lexical resources and of the informal and creative features found on Twitter. Their experimental results show that POS features are not as useful as features from an existing sentiment lexicon when combined with the microblogging features.
Agarwal et al. [2] examine sentiment analysis on Twitter with tree kernel and feature-based models. They find that both models outperform the unigram baseline, and that the most important features are combinations of the prior polarity of words and their POS tags. They also tentatively conclude that sentiment analysis of Twitter data is not that different from sentiment analysis of other genres.
In the paper presented by Saif et al. [15], the authors introduce a novel approach that adds semantics as additional features to the training set for sentiment analysis. They compare the semantic features with the unigram, POS-sequence, and sentiment-topic features, and find that the semantic feature model outperforms the unigram and POS baselines for identifying negative and positive sentiment.
Inspired by findings in the social sciences, the work of Hu et al. [16] investigates the utility of social relations in sentiment analysis. The authors propose a high-performance framework for handling noisy and short tweets, and conclude that user-centric social relations are quite useful for sentiment classification of Twitter data.
All the above-mentioned works consider only ordinary English tweets, and have not touched upon cross-cultural backgrounds.
3.2 Cross-lingual Analysis
In the field of cross-lingual sentiment analysis, the noted opinion analysis system Oasys, proposed by Cesarano et al. [17], allows users to observe how the intensity of opinion varies across countries and news sources.
In the work of Abbasi et al. [18], sentiment analysis methodologies are proposed for the classification of web forum opinions in English and Arabic. A range of stylistic and syntactic features is evaluated, and the proposed sentiment classification methods are shown to be useful for document-level sentiment analysis.
Guo et al. [7] focus on extracting customers' opinions from reviews and predicting their sentiment orientation in multiple languages. They present an aspect-oriented opinion mining method based on a Cross-lingual Latent Semantic Association model, and report that the proposed method achieves better performance than existing approaches.
The work of Cui et al. [19] uses emotion tokens for cross-lingual sentiment analysis. The authors argue that emotion tokens are commonly used on Twitter and directly express one's emotion regardless of language. They compare their approach with a semantic-lexicon-based approach and several web services on the sentiment analysis task for multilingual tweets, and demonstrate the effectiveness of the proposed algorithms.
Gao et al. [20] study Twitter and its Chinese counterpart, Sina Weibo, and make a series of simple statistical comparisons in several aspects, such as the characteristics of user behavior and the content of messages.
Compared to all these works, this research focuses on the analysis of cross-lingual user evaluations, based on sentiment classification over a dataset of tweets and Web reviews. More than 30 languages and more than 30 countries are taken into account, so as to obtain more authentic and comprehensive results for culture-based analysis. The approach is then extended to the travel field, which verifies the transferability of the proposed cross-cultural sentiment analysis methods.
4. Methodology
In this section, the approaches and algorithms applied to the restaurant-domain datasets are described; the same pipeline is reused in the later field-expansion step.
The new points proposed in this section can be briefly summarized as follows:
- In the location definition step, both a manually constructed dictionary and an existing API are used, so as to resolve the location of a larger proportion of tweets despite the noisy location data in the original tweets;
- In the spam filtering step, a series of traditional and new features is applied to train high-performance spam classifiers;
- In the feature selection step of the sentiment classification, in addition to the commonly used syntax features, the traditional unigram method is modified to overcome word sparsity, and the review data is fully utilized to create novel features. A classical statistical method is also adapted to this task, and popular NLP tools are harnessed to produce candidate feature groups.
4.1 Data Collection
The data used in this research mainly comes from two sources: Twitter and restaurant review websites.
First, as for the Twitter data, 9,523,211 restaurant-related tweets were gathered over 4 months (Sep. 2013 to Dec. 2013) using the Twitter Streaming API and Search API introduced in Section 2.1. The data collection was restricted to the names of the target restaurants (i.e., McDonald's, KFC, Burger King, Pizza Hut, Subway, and Starbucks), translated into multiple languages.
Then, as an auxiliary dataset, the review dataset was constructed by collecting English-written reviews from some popular review websites9,10,11. The reviews include the text comments and their corresponding scores, as explained in Section 2.2. In total, 55,031 reviews were collected in this step.
4.2 Translation and Pre-filtering
In this research, 34 languages (i.e., en, es, id, ja, fr, pt, tl, ru, tr, zh, ar, th, et, nl, it, de, ko, bg, sv, pl, vi, sk, da, ht, lt, lv, sl, fi, is, no, fa, hu, el, uk12) are taken as the target languages. The selection is based on tweet volume, speaker population, and whether the language can be handled by machine translation tools. As mentioned in Section 2.1, the original Twitter data gives the machine-detected language of the text in the field 'lang', and this is simply used as the recognized language of the tweet in this research. Tweets that cannot be correctly translated into English are discarded.
The remaining data is then filtered by the pre-defined condition of being related to restaurants, as determined by a list of restaurant-related words with the highest frequencies in the
9 http://www.tripadvisor.com/.
10 http://www.yelp.com/.
11 http://www.zagat.com/.
12 http://en.wikipedia.org/wiki/ISO_639-1/.
review dataset. The 45 most frequent such words are selected as the restaurant-related words used to filter the original Twitter data.
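The pre-filtering step can be sketched as a simple keyword match over the translated text. The word list below is a stand-in; the actual list is the 45 most frequent restaurant-related words from the review dataset:

```python
# Stand-in keyword list; the real list is the 45 most frequent
# restaurant-related words from the review dataset.
RESTAURANT_WORDS = {"food", "menu", "burger", "coffee", "pizza", "service"}

def is_restaurant_related(translated_text):
    """Keep a tweet if its English translation contains any keyword."""
    tokens = translated_text.lower().split()
    return any(tok.strip(".,!?") in RESTAURANT_WORDS for tok in tokens)

tweets = [
    "The coffee at Starbucks was amazing!",
    "Stuck in traffic again...",
]
kept = [t for t in tweets if is_restaurant_related(t)]
print(len(kept))  # 1
```

Matching on the translated text rather than the original lets one keyword list cover all 34 target languages.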
4.3 Location Definition
A location name dictionary (with 1836 entries) is manually constructed by referring to online sta-
tistics13
, and is used to query the names of counties or cities appeared in the location-related items
of tweets (e.g. the ‘location’ item in user profile). Then Yahoo yql API14 is used to parse these items
of the remained undefined tweets again to obtain more definitions. After these two steps, the ratio
of the tweets that have been labeled with location names is 72.8%. This part of tweet data is further
used in the later steps.
To clarify the effectiveness of the proposed location definition method, Table 4.1 compares the definition ratios obtained by using only the direct information in the tweet fields with those obtained by also using the location name dictionary and the yql API. In the table, 'coordinates' in the reference column means using the direct information in the 'coordinates' field of a tweet, and 'coordinates' + 'place' means using the direct information in both the 'coordinates' and 'place' fields. By using the information in the 'coordinates' and 'place' fields together with the manually constructed dictionary and the yql API, a definition ratio of 72.8% is obtained, as presented in the last row of the table.
Table 4.1: Location Definition Ratio Comparison
reference definition ratio
‘coordinates’ 7.0%
‘place’ 6.1%
‘coordinates’ + ‘place’ 7.3%
‘coordinates’ + ‘place’ + location dictionary 51.4%
‘coordinates’ + ‘place’ + location dictionary + yql API 72.8%
As can be inferred from the above table, if only the direct information in the location-related fields were used to define locations, most (approximately 92.7%) of the Twitter data would have to be discarded at this very early stage, which would clearly lower the efficiency of the experiment. By referring to the manually constructed location dictionary, the proportion of usable tweets rises by 44.1 percentage points, and by additionally applying the yql API, the number of tweets usable in later steps increases further.
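The two-step resolution above can be sketched as follows. The dictionary contents and the fallback hook are illustrative stand-ins: the actual dictionary has 1,836 entries, and the fallback is the Yahoo yql API, whose call details are omitted here.

```python
# Hypothetical miniature location dictionary: surface form -> country code.
# The thesis uses a 1,836-entry hand-built dictionary.
LOCATION_DICT = {
    "tokyo": "JP",
    "new york": "US",
    "london": "GB",
}

def resolve_by_dictionary(location_field):
    """Step 1: look the free-text 'location' field up in the dictionary."""
    text = location_field.lower().strip()
    for name, country in LOCATION_DICT.items():
        if name in text:
            return country
    return None

def resolve_location(location_field, api_lookup=None):
    """Dictionary first; fall back to an external geocoding call
    (a stand-in for the yql API) only for still-undefined tweets."""
    country = resolve_by_dictionary(location_field)
    if country is None and api_lookup is not None:
        country = api_lookup(location_field)  # may also return None
    return country
```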
4.4 Spam Filtering
Strictly speaking, whether a tweet is spam in this research should depend on whether the tweet text contains useful information indicating subjective opinions towards the restaurants. In this research, however, a simpler spam filtering criterion is applied. First, advertisements and pure ‘check-in’ tweets are regarded as ‘spam’. In addition, tweets posted within a short time period that have exactly the same content are also considered ‘spam’.
13 http://en.wikipedia.org/wiki/Lists_of_cities_by_country 14 http://developer.yahoo.com/yql/.
A Bayesian classifier is used here, because Bayesian classification is usually robust to noisy information. The training features include the numbers of followers and friends of the user, the ratio of the follower count to the friend count, the registration date, the average numbers of new friends and followers per day, the latest 20 posted tweets, and some syntax characteristics such as at marks, hashtags, and URL links. Among the tweets whose location could not be defined in the last step, 1,200 tweets are randomly selected as the training set, and training and cross-validation are implemented over this set. Each tweet in this set is judged by three persons, and a majority vote decides whether the tweet is ‘spam’ or not. The resulting ‘spam’ classifier achieves an accuracy of 97.8% using all the proposed features. This trained classifier is then applied to the whole dataset to filter out the ‘spam’ tweets.
As further evidence of the effectiveness of the proposed features for spam classification, the accuracies of classifiers trained with different groups of features are compared in Table 4.2. Some of the feature groups are followed by bracketed enumerations of the concrete features actually used.
Table 4.2: Spam Classifier Performance Comparison
(each row adds the listed features to those of the row above)
features                                                           accuracy
syntax features (‘#’, ‘@’, URL, RT)                                89.3%
+ friend count, follower count                                     92.5%
+ time-period counts (new friends/day, new followers/day,
  tweets/day)                                                      94.4%
+ recent 20 tweets (‘#’, ‘@’, URL, RT counts in the 20 tweets,
  repeating tweets in the 20 tweets)                               97.8%
In addition to the traditional syntax features and friend/follower count features, which are commonly used in spam classification tasks, the time-based counts and the recent tweet stream prove important and helpful for identifying ‘spam’ tweets. As presented in Table 4.2, jointly considering the time period and the latest 20 tweets increases the accuracy of the spam classifier.
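A sketch of how such a feature vector might be assembled before training; the field names (`followers`, `friends`, `created`) and the helper are illustrative assumptions, not the actual Twitter API schema or the thesis implementation.

```python
import re
from datetime import date

def spam_features(tweet_text, user, recent_texts):
    """Assemble a spam-classifier feature vector of the kind described
    above.  `user` is assumed to be a dict with 'followers', 'friends',
    and 'created' (a datetime.date); these names are illustrative."""
    days_active = max((date.today() - user["created"]).days, 1)
    return {
        # syntax features of the tweet itself
        "hashtags": tweet_text.count("#"),
        "mentions": tweet_text.count("@"),
        "urls": len(re.findall(r"https?://\S+", tweet_text)),
        "retweet": int("RT" in tweet_text.split()),
        # account-level features
        "followers": user["followers"],
        "friends": user["friends"],
        "follower_friend_ratio": user["followers"] / max(user["friends"], 1),
        # time-normalized features
        "followers_per_day": user["followers"] / days_active,
        "friends_per_day": user["friends"] / days_active,
        # recent-tweet-stream feature: repeated contents are a spam signal
        "repeated_recent": len(recent_texts) - len(set(recent_texts)),
    }
```

A Bayesian classifier (as used in the thesis) would then be trained on these vectors over the 1,200 labeled tweets.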
Compared to a previous study15, the 9.8% proportion of ‘spam’ tweets in this research is relatively high. This may be explained by the specialized definition of a ‘spam’ tweet here, and by the special focus on the restaurant field. In this step, these 9.8% of ‘spam’ tweets are filtered out of the dataset.
15 http://www.pearanalytics.com/wp-content/uploads/2012/12/Twitter-Study-August-2009.pdf/.
4.5 Features for Sentiment Classification
4.5.1 Dictionary Construction
Before the feature selection step, two dictionaries are constructed.
First, a total word dictionary (tw_total_dict) records all the words that appear more than 3 times in the whole Twitter dataset, together with their occurrence frequencies. This dictionary turns out to contain 58,615 entries.
Then, an initial polarity dictionary (pol_dict_ini) is constructed by combining the entries of several popular, authoritative polarity dictionaries available on the Internet (Table 4.3). The entries in pol_dict_ini amount to 125,277 in total.
Table 4.3: The Structure of the Initial Polarity Dictionary
Label     Source
Positive  Positive Score > 0.75, or Positive Score − Negative Score > 0.5 (SentiWordNet16);
          Strong Positive (MPQA17);
          Positiv category (the General Inquirer18)
Negative  Negative Score > 0.75, or Negative Score − Positive Score > 0.5 (SentiWordNet);
          Strong Negative (MPQA);
          Negativ category (the General Inquirer)
Neutral   Positive Score = 0 and Negative Score = 0 (SentiWordNet)
4.5.2 Syntax Features
The special syntax characteristics of tweets cause inconvenience when preprocessing the tweet texts, but on the other hand they are quite informative for sentiment analysis.
In this research, 10 syntax characteristics in total (i.e. ‘!’, ‘?’, ‘#’, ‘@’, ‘RT’, upper-case words, capitalized words, URL links, emoticons, and slang words) are taken into consideration. Each characteristic is counted by its occurrences in one tweet, and the resulting 10-dimension vector is regarded as the ‘syn’ feature. A manually built emoticon dictionary (with 300 entries) and slang dictionary (with 200 entries) are referred to during the counting process.
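A minimal sketch of the ‘syn’ vector computation; the tiny emoticon and slang dictionaries below are illustrative stand-ins for the hand-built 300- and 200-entry ones.

```python
import re

# Hypothetical miniature emoticon and slang dictionaries.
EMOTICONS = {":)", ":(", ":D"}
SLANG = {"lol", "omg"}

def syn_feature(tweet):
    """Return the 10-dimension 'syn' vector: counts of '!', '?', '#', '@',
    'RT', upper-case words, capitalized words, URLs, emoticons, slang."""
    tokens = tweet.split()
    return [
        tweet.count("!"),
        tweet.count("?"),
        tweet.count("#"),
        tweet.count("@"),
        sum(t == "RT" for t in tokens),
        sum(t.isupper() and len(t) > 1 for t in tokens),
        sum(t[0].isupper() and not t.isupper() for t in tokens if t[0].isalpha()),
        len(re.findall(r"https?://\S+", tweet)),
        sum(t in EMOTICONS for t in tokens),
        sum(t.lower() in SLANG for t in tokens),
    ]
```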
4.5.3 Modified Unigram
Compared to the standard unigram model, an additional dimension reduction is applied when processing the modified unigram features, so as to alleviate the influence of word sparsity.
First, for each word in tw_total_dict, the polarity score is set to 2, -2, or 0 if it is labeled as Positive, Negative, or Neutral in pol_dict_ini respectively. Then all the tweets are parsed to calculate the PMI (Pointwise Mutual Information) values of all pairs of words in tw_total_dict. The PMI value of
16 http://sentiwordnet.isti.cnr.it/. 17 http://mpqa.cs.pitt.edu/. 18 http://www.wjh.harvard.edu/~inquirer/.
words 𝑤1 and 𝑤2 is given by

    PMI(𝑤1, 𝑤2) = log [ 𝑝(𝑤1, 𝑤2) / (𝑝(𝑤1) ∙ 𝑝(𝑤2)) ]

where 𝑝(𝑤1, 𝑤2) is the probability that words 𝑤1 and 𝑤2 co-occur in one tweet, and 𝑝(𝑤1) and 𝑝(𝑤2) are the probabilities that words 𝑤1 and 𝑤2 occur in one tweet respectively.
Then, for each word that does NOT appear in pol_dict_ini, its PMI values with the words in pol_dict_ini are sorted, and majority voting is carried out among the top 10 sorted items. A ‘positive-inclined’ word is then scored as 1, a ‘negative-inclined’ word as -1, and other words (i.e. those whose top 10 corresponding words are all from the Neutral category of pol_dict_ini) as 0. The output of this step is a new polarity dictionary (pol_dict) with the vocabulary of tw_total_dict, in which each word is mapped to a score on 5 scales (i.e. 2, 1, 0, -1, and -2). The comparison of the word counts for each scale before and after this step is shown in Table 4.4.
Table 4.4: The Comparison of Polarity Word Counts Between pol_dict_ini and pol_dict
score pol_dict_ini pol_dict
2 17,494 17,494
1 0 7,108
0 19,581 24,332
-1 0 4,957
-2 4,737 4,737
Total 58,615 58,615
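The PMI-and-vote step above can be sketched as follows. This is a simplified illustration: real runs operate over tw_total_dict and pol_dict_ini, here replaced by toy inputs, and the seed scores 2/0/-2 are reduced to a ±1 vote.

```python
import math
from collections import Counter
from itertools import combinations

def pmi_scores(tweets):
    """PMI(w1, w2) = log( p(w1, w2) / (p(w1) * p(w2)) ) over per-tweet
    co-occurrence.  `tweets` is a list of token lists."""
    n = len(tweets)
    word_count = Counter()
    pair_count = Counter()
    for words in tweets:
        ws = set(words)
        word_count.update(ws)
        pair_count.update(frozenset(p) for p in combinations(sorted(ws), 2))
    pmi = {}
    for pair, c in pair_count.items():
        w1, w2 = tuple(pair)
        pmi[pair] = math.log(
            (c / n) / ((word_count[w1] / n) * (word_count[w2] / n))
        )
    return pmi

def vote_polarity(word, pmi, seed_scores, top_k=10):
    """Score an out-of-dictionary word 1 / -1 / 0 by majority vote over
    the seed polarities of its top-k PMI neighbours that appear in the
    seed dictionary (`seed_scores` maps word -> 2 / 0 / -2)."""
    neigh = [
        (v, (set(p) - {word}).pop())
        for p, v in pmi.items()
        if word in p and (set(p) - {word}).pop() in seed_scores
    ]
    top = sorted(neigh, reverse=True)[:top_k]
    balance = sum(1 if seed_scores[w] > 0 else -1 if seed_scores[w] < 0 else 0
                  for _, w in top)
    return 1 if balance > 0 else -1 if balance < 0 else 0
```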
Based on pol_dict, each tweet can be projected to a 5-dimension vector, in which each dimension records the count of the tweet’s unigram words in the corresponding score category. This vector is named the ‘5s’ feature.
Specifically, to verify the effectiveness of the proposed modified unigram feature in comparison with the standard unigram feature, the performance of a classifier trained with only this one feature (i.e. the standard unigram feature, or the modified unigram feature) is measured. The training and test set consists of 1,000 manually labeled tweets; for each setting, the SVM (Linear, RBF, and Polynomial) methods and the Naïve Bayes (Gaussian, Multinomial, and Bernoulli) methods are applied, and the highest accuracy is recorded in Table 4.5. For the more detailed training and test settings, please refer to Section 5.3.
Table 4.5: Standard Unigram Feature and Modified Unigram Feature Effectiveness Comparison
feature subjectivity classification accuracy polarity classification accuracy
Standard Unigram 44.8% 66.7%
Modified Unigram 65.6% 76.5%
As the table shows, using the modified unigram feature increases the accuracy of the classifier by a wide margin in both the subjectivity classification and the polarity classification case, thus proving the effectiveness of the proposed ‘5s’ feature.
4.5.4 Review Dataset-based Average Score
While users often express their opinions in their tweets, they also give clear evaluations of products and services on dedicated review websites. These reviews describe the user experience in much more detail, and usually come with a concrete score, most commonly on a 5-point scale. This kind of information can be quite useful if taken full advantage of.
In the previously constructed review dataset, each entry has the tuple structure (𝑡𝑒𝑥𝑡, 𝑠𝑐𝑜𝑟𝑒). In this step, all the text parts are first processed into a BoW model, and the total vocabulary of the review dataset is denoted 𝑊𝑟𝑣. For each word 𝑤𝑖 in 𝑊𝑟𝑣, the review dataset-based polarity score is calculated by

    pol_wi = ( Σ_{text_j ∈ TX_wi} score_j ) / |TX_wi|

where TX_wi is the set of review texts in which the word 𝑤𝑖 occurs, text_j is a review text in TX_wi, and score_j is the corresponding score of text_j.
Then, for each tweet 𝑡𝑤𝑖 in the Twitter dataset, the review dataset-based average score is given by

    avg_twi = ( Σ_{w_j ∈ W_twi} pol_wj ) / |W_twi|

where W_twi is the word set of 𝑡𝑤𝑖, and pol_wj is the polarity score of 𝑤𝑗 given by the last step.
In these two calculation steps, length normalization is applied: the occurrence count of the most frequent word in a review or in a tweet is normalized to 1. The float average score calculated by the second formula is named the ‘rv’ feature.
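The two formulas above can be sketched as follows; length normalization is omitted for brevity, and the function names are illustrative.

```python
from collections import defaultdict

def review_polarity_dict(reviews):
    """pol_wi = mean score of all reviews containing word wi (the first
    formula above).  `reviews` is a list of (text_tokens, score)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for tokens, score in reviews:
        for w in set(tokens):          # one contribution per review text
            sums[w] += score
            counts[w] += 1
    return {w: sums[w] / counts[w] for w in sums}

def rv_feature(tweet_tokens, pol):
    """avg_twi = mean pol_wj over the tweet's words that have a
    review-based score (the 'rv' feature); 0.0 if none do."""
    scored = [pol[w] for w in tweet_tokens if w in pol]
    return sum(scored) / len(scored) if scored else 0.0
```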
4.5.5 Review Dataset-based CCA Score
Canonical correlation analysis (CCA) is a classical statistical method for discovering the latent relations among multiple variables. In previous work, CCA has been used in many fields, such as image retrieval [21], data clustering [22], and opinion mining [23].
In this case, each entry in the review dataset consists of a comment text and a 5-scale score, described by the format (𝑡𝑒𝑥𝑡, 𝑠𝑐𝑜𝑟𝑒). Since there must be some consistency between the comment text and the score given by the same person, it can be safely assumed that a latent relationship exists between them. Thus, the CCA method can be used here to capture the latent relationship between users’ sentiment and the polarity words. The first correlated variable is adopted as the measuring criterion: the review dataset is taken as the conditioning set, and the parameters of the first correlated variable are determined by the CCA process. Then, for each tweet in the Twitter dataset, the first correlated variable is calculated, and this float number is assigned to the tweet as the ’cca’ feature.
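Because the second view (the score) is one-dimensional here, the first canonical direction coincides with a regularized least-squares direction; the following numpy sketch exploits that simplification and is not the general CCA computation that would be needed for arbitrary views.

```python
import numpy as np

def cca_first_direction(X, y, reg=1e-6):
    """First canonical direction between a bag-of-words matrix X
    (n_reviews x n_words) and a 1-D score vector y.  With a one-
    dimensional second view this reduces to the (regularized)
    direction Sxx^-1 Sxy; the small ridge term keeps Sxx invertible
    for sparse word counts."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    n = X.shape[0]
    Sxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])
    Sxy = Xc.T @ yc / n
    w = np.linalg.solve(Sxx, Sxy)
    return w / np.linalg.norm(w)

def cca_feature(tweet_vec, w):
    """The 'cca' feature: projection of a tweet's word-count vector
    onto the first canonical direction."""
    return float(tweet_vec @ w)
```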
4.5.6 Window Co-occurrence-based Average Score
Since the neighboring relationship among words may contain indicative information for sentiment analysis, a score based on co-occurrence within a three-word window is calculated in this section.
Inspired by previous research [24], in which a propagation algorithm is applied to analyze the sentiment of online reviews, a modified graph-based propagation algorithm is adopted here to obtain the polarity score of each word in tw_total_dict based on the three-word-window neighboring relationship.
First, a co-occurrence dictionary is constructed by parsing all the tweets in the Twitter dataset. The key of each item in this dictionary is a word pair 𝑤𝑖_𝑤𝑗, and its value is the number of times 𝑡(𝑤𝑖, 𝑤𝑗) the two words appear together in the three-word window.
Then, as an initial propagation graph, all the words in tw_total_dict are taken as the nodes of the graph. The value of each node is initialized as 1 for the words in the Positive category of pol_dict_ini, and as -1 for those in the Negative category. For the other words, the initial node value is set as 0. Then, in each iteration, the value of each node is updated by

    v'_{n_i} = (1 − α) ∙ [ Σ_{n_j ∈ NEI_{n_i}} v_{n_j} ∙ (1 + log 𝑡(n_i, n_j)) ] / [ Σ_{n_j ∈ NEI_{n_i}} (1 + log 𝑡(n_i, n_j)) ] + α ∙ v_{n_i}

where NEI_{n_i} is the set of nodes neighboring node n_i, and 𝑡(n_i, n_j) is the co-occurrence count of the words of nodes n_i and n_j, according to the previously built co-occurrence dictionary. α is a tuning parameter, set as 0.6 in this step. In the converged final graph, each node has a float value indicating the polarity of its word. A polarity dictionary is obtained from this final graph, and the average score of each tweet is calculated based on this newly constructed polarity dictionary. This float score for each tweet is named the ‘win3’ feature.
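The update rule above can be sketched on a toy graph as follows; convergence testing is simplified to a fixed iteration count, and since the thesis does not specify the logarithm base, the natural log is assumed.

```python
import math

def propagate(neighbors, seeds, alpha=0.6, iters=50):
    """Graph propagation as in the formula above.  `neighbors` maps a
    word to {neighbor: co-occurrence count t}; `seeds` maps Positive /
    Negative seed words to +1 / -1.  Each iteration applies
    v' = (1 - a) * weighted-neighbor-average + a * v,
    with edge weight 1 + log t(ni, nj) (natural log assumed)."""
    v = {w: seeds.get(w, 0.0) for w in neighbors}
    for _ in range(iters):
        nv = {}
        for w, nbrs in neighbors.items():
            weights = {u: 1 + math.log(t) for u, t in nbrs.items()}
            total = sum(weights.values())
            if total == 0:                      # isolated node: keep value
                nv[w] = v[w]
                continue
            avg = sum(v[u] * wt for u, wt in weights.items()) / total
            nv[w] = (1 - alpha) * avg + alpha * v[w]
        v = nv
    return v
```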
4.5.7 POS-based Feature
POS (part-of-speech) information is commonly used in NLP analysis, and some part-of-speech pairs are especially sentiment-expressive. Here, all the tweets are first processed by the Stanford Parser19 to obtain dependency trees. Then the 10 most common and sentiment-expressive POS pairs (i.e. ‘acomp’, ‘advmod’, ‘amod’, ‘conj’, ‘dobj’, ‘neg’, ‘nsubj’, ‘purpcl’, ‘rcmod’, and ‘xcomp’) are chosen manually, and the sentiment expressed by these pairs is decided according to manually constructed rules (e.g. the sentiment expressed by a ‘neg’ pair is the opposite of the sentiment of the polarity word in the pair). For each tweet in the Twitter dataset, every above-mentioned POS pair that appears in the tweet is given a polarity label. Then, to decide the polarity of the tweet, a simple majority voting method is applied: the polarity label with the largest POS pair count passes its polarity to the tweet. This feature is called ‘pos’ in the later analysis steps.
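A sketch of the pair-polarity rules and the majority vote; the seed word list and the pre-parsed (relation, head, dependent) tuples are illustrative stand-ins for pol_dict_ini and the actual Stanford Parser output.

```python
from collections import Counter

# Illustrative polarity seed words; the thesis derives these from
# pol_dict_ini.
WORD_POLARITY = {"good": 1, "great": 1, "bad": -1, "slow": -1}

def pair_polarity(relation, head, dependent):
    """Polarity of one dependency pair under simple hand-written rules:
    a 'neg' relation flips the polarity of the polar word in the pair;
    otherwise the pair carries the polarity of whichever word is polar."""
    pol = WORD_POLARITY.get(head, 0) or WORD_POLARITY.get(dependent, 0)
    return -pol if relation == "neg" else pol

def pos_feature(pairs):
    """The 'pos' feature: majority vote over the tweet's sentiment-
    expressive dependency pairs, given as (relation, head, dependent)."""
    votes = Counter(pair_polarity(*p) for p in pairs)
    pos, neg = votes[1], votes[-1]
    return 1 if pos > neg else -1 if neg > pos else 0
```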
19 http://nlp.stanford.edu/software/lex-parser.shtml/.
5. Experiment
In this section, the experiment is carried out in the restaurant domain, but the basic steps and training methods are also applied in the field expansion section.
5.1 Overview
The main steps of the whole experiment are described in the flow chart in Figure 5.1. After collecting the Twitter and review data, the location definition step is carried out, and two dictionaries are manually constructed based on the datasets and the online dictionaries explained previously. Then three main classifiers are trained and used to classify the tweets. Finally, based on the classification results and the data analysis results, the cultural effect on evaluations is clarified.
Figure 5.1: The Main Flow of the Experiment
5.2 Preprocessing
For the Twitter dataset, the preprocessing basically consists of 12 steps: a) ‘RT’ and URL link deletion, b) Emoticon conversion, c) Lower-casing, d) HTML transcoding, e) Hashtag conversion, f) Punctuation deletion, g) Word segmentation, h) Deletion of non-alphabet words and single-alphabet words, i) Stop word discarding, j) Repeated alphabet reduction, k) Chat word conversion, l) Lemmatization. However, the processing is task-specific in some steps. For example, as input for the Stanford Parser, only a) to e) are carried out.
For the review dataset, the preprocessing is much simpler, containing only 6 steps: c) Lower-casing, g) Word segmentation, h) Deletion of non-alphabet words and single-alphabet words, i) Stop word discarding, k) Chat word conversion, l) Lemmatization.
Here, ‘RT’, URL links, hashtags, and repeated alphabets are recognized and processed by regular expressions. The emoticon conversion, stop word discarding, and chat word conversion steps are based on a manually constructed emoticon dictionary (with 300 entries), stop word list (with 145 terms), and chat word dictionary (with 150 entries) respectively. The lower-casing, HTML transcoding, word segmentation, and lemmatization steps are implemented with the NLTK20 tools.
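Steps a) to k) can be sketched with the standard library as follows. Lemmatization, done with NLTK in the thesis, is omitted here, and the miniature conversion tables are illustrative stand-ins for the hand-built dictionaries.

```python
import html
import re

# Hypothetical miniature conversion tables; the thesis uses hand-built
# dictionaries (300 emoticons, 145 stop words, 150 chat words).
EMOTICON = {":)": "emopos", ":(": "emoneg"}
STOP = {"the", "a", "is"}
CHAT = {"ur": "your", "thx": "thanks"}

def preprocess_tweet(text):
    """Steps a)-k) of the tweet preprocessing described above."""
    text = re.sub(r"\bRT\b|https?://\S+", " ", text)        # a) RT + URLs
    for emo, tok in EMOTICON.items():                       # b) emoticons
        text = text.replace(emo, " %s " % tok)
    text = html.unescape(text).lower()                      # c), d)
    text = re.sub(r"#(\w+)", r"\1", text)                   # e) hashtags
    text = re.sub(r"[^\w\s]", " ", text)                    # f) punctuation
    tokens = text.split()                                   # g) segmentation
    tokens = [t for t in tokens if t.isalpha() and len(t) > 1]  # h)
    tokens = [t for t in tokens if t not in STOP]           # i) stop words
    tokens = [re.sub(r"(.)\1{2,}", r"\1\1", t) for t in tokens]  # j) repeats
    tokens = [CHAT.get(t, t) for t in tokens]               # k) chat words
    return tokens
```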
5.3 Sentiment Classification
In this research, sentiment classification is divided into two steps. The first step, subjectivity classification, classifies the spam-filtered dataset into a subjective dataset and an objective dataset. The second step, polarity classification, further classifies the subjective dataset into a positive dataset and a negative dataset. In each of these two steps, a pre-trained classifier carries out the classification. The training process of these two classifiers is described in detail as follows.
Feature selection. In the previous sections, 6 groups of features were introduced: the ‘syn’, ‘5s’, ‘rv’, ‘cca’, ‘win3’, and ‘pos’ features. All combinations of these 6 feature groups are implemented in this experiment.
Training method. The SVM (Linear, RBF, and Polynomial) methods and the Naïve Bayes (Gaussian, Multinomial, and Bernoulli) methods are used in this experiment.
Training implementation. The total number of implementation variations turns out to be (2^6 − 1) ∙ 6 = 378.
Validation method. The standard 10-fold cross-validation is applied here.
Training set. For the subjectivity classifier, 1,000 tweets, half of them objective and half subjective, are selected from the manually labeled tweets.
For the polarity classifier, 1,000 tweets, half positive and half negative, are selected from the manually labeled subjective tweets.
Here, the polarity of each training tweet is judged by three persons, and a majority vote finally decides the polarity of the tweet.
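The enumeration of the 378 training runs can be sketched as follows; the method names are illustrative labels for the six SVM and Naïve Bayes variants.

```python
from itertools import combinations

FEATURES = ["syn", "5s", "rv", "cca", "win3", "pos"]
METHODS = ["svm_linear", "svm_rbf", "svm_poly",
           "nb_gauss", "nb_multinomial", "nb_bernoulli"]

def training_runs():
    """Enumerate every non-empty feature combination paired with every
    training method, giving the (2^6 - 1) * 6 = 378 runs above."""
    runs = []
    for k in range(1, len(FEATURES) + 1):
        for combo in combinations(FEATURES, k):
            for method in METHODS:
                runs.append((combo, method))
    return runs
```

Each run is then evaluated with standard 10-fold cross-validation over the 1,000 labeled tweets.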
Test results. Top-7 test results of the subjectivity classifiers and the polarity classifiers are shown in
Table 5.1 and Table 5.2.
Table 5.1: Subjectivity Classifiers Performance
syn 5s rv win3 cca pos accuracy
74.7%
74.9%
75.8%
76.4%
76.5%
77.5%
78.4%
20 http://www.nltk.org/.
Table 5.2: Polarity Classifiers Performance
syn 5s rv win3 cca pos accuracy
82.2%
85.3%
87.2%
89.6%
89.9%
90.6%
91.1%
In these tables, the left columns indicate the adopted features with black circle marks, and the right column shows the accuracy of the implementation. As shown, the best-performing subjectivity classifier is obtained with the feature combination of ‘syn’, ‘rv’, ‘win3’, and ‘pos’ and the SVM polynomial training method, while the best-performing polarity classifier is obtained with the feature combination of ‘rv’, ‘win3’, ‘cca’, and ‘pos’ and the SVM linear training method. These two classifiers are adopted in the classification step for the whole spam-filtered Twitter set.
Several failure cases from the two sentiment classification steps are given below, and these cases are further analyzed to figure out the reasons for the errors.
Subjectivity classification:
Case 1:
Tweet text: RT @t0shiba: Just a note from last night...Pizza Hut is good pizza...If you have dry
skin...Rub some Pizza Hut on it...Healed!
Manual label: subjective
Classification result: objective
Analysis: this text uses irony to express negative sentiment, but it is difficult for a machine to recognize such sarcasm in a sentence.
Case 2:
Tweet text: If only some people knew how KFC got the chicken to you, some would rather starve
than ever eat KFC again! #SpeakUp
Manual label: subjective
Classification result: objective
Analysis: there is a comparison expression in this tweet, which is quite confusing for a classifier.
Case 3:
Tweet text: @MenHumor: There's a special place in hell for murderers and the guy who decided
what time breakfast ends at McDonalds.”
Manual label: objective
Classification result: subjective
Analysis: even humans cannot fully understand the meaning of the text, let alone for a machine.
Polarity classification:
Case 1:
Tweet text: @SamayoaMarissa: "@UberFacts: Burger King uses approximately 1/2 million pounds of bacon every month." you pig killers )':” >:)
Manual label: negative
Classification result: positive
Analysis: the emoticons at the end of the text have clouded the judgment of the classifier.
Case 2:
Tweet text: Ewwwwwww! RT @DREADHEADNATI0N: Mcdonalds did me right… I could eat it
everyday! #fatgirlproblem
Manual label: positive
Classification result: negative
Analysis: because the proposed classifier also takes the commonly used hashtags into account, this may sometimes lead to incorrect results.
6. Analysis
6.1 Statistical Analysis
Based on the ‘list of restaurant chains’ on Wikipedia, 6 restaurant chains (i.e. McDonald’s, KFC, Burger King, Pizza Hut, Subway, and Starbucks) that operate worldwide are chosen as the research subjects. After filtering the Twitter dataset by these restaurants’ names, the location definition process is carried out, and 33 countries, shown in Table 6.1, are selected as target countries. That is, only tweets from these 33 countries and areas remain to be processed by the spam filtering step. While the original Twitter dataset amounts to 10 million tweets, the size of the pre-filtered and spam-filtered Twitter dataset is reduced to approximately 2 million. This dataset becomes the input of the later sentiment classification steps.
Table 6.1: Target Countries and Their ISO 3166-1 Codes (Restaurant Domain)
United States (US), United Kingdom (GB), Australia (AU), Indo-
nesia (ID), Malaysia (MY), Canada (CA), Philippines (PH), Singa-
pore (SG), Brazil (BR), India (IN), South Africa (ZA), Japan (JP),
Mexico (MX), France (FR), Netherlands (NL), Greece (GR), Thai-
land (TH), China (CN), Russia (RU), Spain (ES), Argentina (AR),
Chile (CL), South Korea (KR), Germany (DE), Italy (IT), Ireland
(IE), Venezuela (VE), Colombia (CO), Poland (PL), Egypt (EG),
Ukraine (UA), New Zealand (NZ), Viet Nam (VN)
In this section, basic statistical analysis is carried out to obtain a general overview of these restaurants in the 33 countries. Table 6.2 lists the number of preprocessed tweets for each target country. Figure 6.1 shows the distribution of tweets over the 6 restaurants in each country.
Table 6.2: Tweet Amount for Each Country (Restaurant Domain)
US 888,221 JP 119,721 KR 11,201
GB 114,513 MX 28,298 DE 10,187
AU 12,328 FR 51,124 IT 5,513
ID 105,397 NL 84,755 IE 8,683
MY 67,316 GR 72,084 VE 22,437
CA 118,117 TH 59,854 CO 6,718
PH 21,703 CN 59,903 PL 3,920
SG 23,864 RU 37,255 EG 4,069
BR 54,357 ES 39,424 UA 3,567
IN 11,306 AR 27,522 NZ 2,547
ZA 5,271 CL 24,512 VN 1,374
Figure 6.1: General Distribution of Tweets in Restaurant Domain
As shown in Table 6.2 and Figure 6.1, the following conclusions can be drawn.
a) There is a huge difference among the tweet amounts of the target countries, and tweets from the United States predominate in quantity;
b) The distribution of tweets over the 6 restaurants differs considerably from country to country;
c) The overall distributions over the 6 restaurants are also quite biased, in that tweets about some restaurants, such as Pizza Hut, are much fewer than those about others;
d) These distributions may indicate the popularity of each restaurant in each country.
6.2 Basic Sentiment Analysis
After applying the optimal subjectivity classifier and polarity classifier described in Section 5.3, the preprocessed Twitter dataset is divided into 3 polarity groups: positive, negative, and objective. Based on these 3-way classification results, a series of analysis approaches are implemented, as described in the following subsections.
6.2.1 Polarity Distribution
To figure out the proportions of positive, negative, and objective tweets for each country and for
each restaurant, the polarity distribution graphs are plotted, as shown in Figure 6.2~6.7. The rose
color stands for the positive tweets, the azure color stands for the negative sentiment, and the
lemon yellow stands for the objective tweets.
Figure 6.2: Polarity Distribution for Burger King
Figure 6.3: Polarity Distribution for KFC
Figure 6.4: Polarity Distribution for McDonald’s
Figure 6.5: Polarity Distribution for Pizza Hut
Figure 6.6: Polarity Distribution for Starbucks
Figure 6.7: Polarity Distribution for Subway
From the above polarity distribution graphs, it can be seen that:
a) For different restaurants, the general distributions of positive, negative, and objective tweets are fairly different. For instance, the proportion of positive tweets for McDonald’s obviously outstrips that of the other restaurants, which may suggest that McDonald’s enjoys a better reputation among people at the world level;
b) Objective tweets predominate in amount for all the target restaurants, while positive tweets outnumber negative tweets in general;
c) For the same restaurant, people from different countries seem to have quite different attitudes. For example, in the case of McDonald’s, Indonesian people seem to favor the restaurant less than American people do, since Indonesia has a larger percentage of negative tweets and a smaller percentage of positive tweets than the United States.
6.2.2 Sentiment Map
With the positive, negative, and objective tweets given polarity scores of 1, -1, and 0 respectively, the sentiment maps for the target restaurants are depicted in Figures 6.8~6.13. On the gradient color axis, green represents negative sentiment and red represents positive sentiment.
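The per-country score plotted on these maps can be sketched as:

```python
from collections import defaultdict

POLARITY_SCORE = {"positive": 1, "negative": -1, "objective": 0}

def country_scores(classified_tweets):
    """Average polarity score per country, the quantity shown on the
    sentiment maps: positive = 1, negative = -1, objective = 0.
    `classified_tweets` is a list of (country_code, label) pairs."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for country, label in classified_tweets:
        totals[country] += POLARITY_SCORE[label]
        counts[country] += 1
    return {c: totals[c] / counts[c] for c in totals}
```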
Figure 6.8: Sentiment Score Map of McDonald’s
Figure 6.9: Sentiment Score Map of KFC
Figure 6.10: Sentiment Score Map of Burger King
Figure 6.11: Sentiment Score Map of Pizza Hut
Figure 6.12: Sentiment Score Map of Subway
Figure 6.13: Sentiment Score Map of Starbucks
By representing sentiment with gradient color, the above sentiment maps demonstrate the overall distributions of people’s opinions about the targets in the restaurant domain. Compared to the polarity distribution graphs in the last subsection, these maps give a more intuitive presentation of the sentiment distribution and make it possible to take geographic elements into consideration as well.
6.2.3 Sentiment Keywords Extraction
The more specific reasons why people like or dislike a target, or the concrete characteristics of a target that shape people’s attitudes, remain unclear and need further exploration. To this end, the frequently occurring sentiment words, either positive or negative, are extracted with their frequencies for each target restaurant, and tag clouds are harnessed as a tool to describe these representative sentiment keywords. Figures 6.14~6.19 give the tag clouds for the targets. A white background indicates positive sentiment, and a black background indicates negative sentiment. The size of a word denotes its occurrence frequency, and the multiple colors of the words have no special significance.
Figure 6.14: Tag Cloud of Sentiment Keywords for McDonald’s
Figure 6.15: Tag Cloud of Sentiment Keywords for KFC
Figure 6.16: Tag Cloud of Sentiment Keywords for Burger King
Figure 6.17: Tag Cloud of Sentiment Keywords for Pizza Hut
Figure 6.18: Tag Cloud of Sentiment Keywords for Subway
Figure 6.19: Tag Cloud of Sentiment Keywords for Starbucks
From these tag clouds, we may obtain some clues about the reasons for people’s likes or dislikes of the target restaurants. However, as can be seen in both the positive and negative tag clouds, not much specific information can be acquired, due to the large overlap of vocabulary among the different targets. Thus, to compensate for this deficiency, (Attribute, Value) pairs are used to describe the target restaurants, as introduced in the next subsection.
6.2.4 (Attribute, Value) Pairs Extraction
Based on the Stanford dependency trees obtained in the sentiment classification step, the sentiment-expressive word pairs (explained in Section 4.5.7), each of which typically but not exclusively consists of one noun (attribute) and one adjective (value), are selected to construct the (Attribute, Value) list for each target. Tables 6.3~6.8 give parts of the (Attribute, Value) lists of the target restaurants. Red and green represent positive and negative sentiment respectively. The numbers following the value words denote frequencies.
Table 6.3: (Attribute, Value) list of McDonald’s
Attribute value
mcdonalds new 33377, good 3992, fat 3055, great 1533, best 1525, bad 1519, big 1210, commercial 913,
happy 907, large 661, better 743, delicious 637, fresh 485, nasty 384, american 379, healthy
363, nice 331, fast 326, yummy 315, bagged 307, expensive 298, different 297, small 280,
mcgorgeous 74, sonic 233, unhealthy 183, worst 162, funny 156, stupid 151, packaged
150, greatest 148, poor 148, favorite 147, perfect 145, beautiful 143, ill 132, regular 120,
romantic 119, terrible 116, slow 113, weird 102, greater 101, original 99, successful 99, quick
87, worse 75, greasy 69, horrible 69, instant 68, awful 66, biggest 65, huge 63, famous 61,
special 60, busy 59, international 57, wonderful 57, healthier 54, top 53, cheaper 51, lucky 49,
desperate 48, fantastic 46, hilarious 46, bigger 45, classic 45, tasty 45, normal 44, common
43, creative 41, scary 40, standard 39, acceptable 35, nastiest 34, daily 34, dirty 33, fatty 33,
ridiculous 32, slowest 31
food fast 1861, free 1263, chinese 482, great 451, good 359, best 302, worst 281, healthy 244,
favorite 216, new 151, leftover 133, unhealthy 118, delicious 104, bad 104, terrible 90, fat 82,
better 75, nasty 67, indigestible 60, nice 60, mexican 58, greasy 46, normal 45, regular 35,
asian 35, expensive 35, fresh 34, organic 34, lethargic 33, nutritious 31, awful 30, healthier
29, indian 27, filthy 27, healthiest 26, horrible 25
burger delicious 606, double 590, cheese 307, better 243, best 184, free 151, mcbusted 107, big 101,
good 92, large 45, fat 38, fish 33, nice 32, special 32, great 29, bad 28, disappointing 21,
small 17, expensive 17, huge 16, nasty 16
chicken real 153, good 143, large 115, fried 92, bad 61, best 60, cheese 58, grilled 52, big 37, french
31, fresh 29, better 22, crispy 21, small 19, garlic 19, hot 18, nasty 15, classic 14, delicious 14
meal happy 1574, free 475, big 456, large 323, whole 277, full 136, extra 115, unhappy 110, happier 97, best 89, traditional 73, favorite 67, good 67, romantic 66, healthy 54, cheeseburger 38,
nice 33, great 30, breakfast 28, bad 22, worst 16, despicable 15, regular 15, small 15, delicious
14, terrible 13
breakfast good 2476, best 1178, big 742, nice 734, bad 679,great 439, perfect 418, full 330, nasty 206,
early 170, delicious 143, yummy 109, favorite 99, english 91, hot 89, healthy 76, poor 75, fat
64, fabulous 62, happy 55, better 47, wonderful 44, worst 32, quick 31
coffee free 3223, bagged 468, hot 453, small 428, packaged 263, large 211, good 210, best 185,
breakfast 58, worse 58, great 49, iced 48, black 45, bad 44, nice 37, delicious 29, worst 29,
better 27, awful 23, horrible 17
fries large 627, fresh 369, french 314, good 184,cheese 136, best 94, small 71, hot 66, cold 58,
great 39, greasy 32, big 29, nasty 28, favorite 27, yummy 21, delicious 16, famous 15
burger unfit 476, double 145, cheese 118, best 104, good 71, large 58, popular 37, expensive 31,
monthly 32, better 32, special 30, different 23, nasty 21, fat 18, favorite 16, big 15, delicious
15, mcdouble 15
cheeseburger mcdouble 3704, double 1291, extra 154, large 126, 50cent 29, small 28, big 26, good 18, bad
16, gigantic 15, best 15
mcflurry best 37, m&m 26, yummy 26, chocolate 24, delicious 23, great 22, small 19, good 18, iced 18,
kitkat 15
pie sweet 231, apple 54, hot 53, delicious 46, good 33, chocolate 29, spinach 26, large 23, bad 22,
best 21, cherry 16
pancake breakfast 69, good 66, best 37, chocolate 27, delicious 25, bad 24, chipotle 23, better 21,
blueberry 18, dry 18, hot 15, nasty 15, nice 15
frappe chocolate 142, breakfast 36, good 25, large 16, delicious 15
service full 399, great 67, worst 27,good 26, horrible 26, slow 23, terrible 21, wonderful 17, bad 17,
nice 16, slowest 16, smile 16
mcmuffin cheese 76, delicious 23, french 20, bad 20, english 18, better 15, breakfast 15
Table 6.4: (Attribute, Value) list of KFC
Attribute Value
kfc fried 1361, healthy 1059, new 913, original 634, zinger 450, great 295, commercial 263, best
240, poor 221, delicious 192, fresh 189, bad 178, famous 151, fat 149, better 138, nice 115,
worst 105, nasty 73, special 73, greasy 55, terrible 53, yummy 52, busy 39, perfect 38,
favorite 31, happy 29, american 28, expensive 28, nastiest 27, stupid 26, unhealthy 26, fastest 22,
mediocre 22, classic 20
kentucky fried 1889, great 442, favorite 652, top 304, poor 302, good 284, best 228, ridiculous 99,
national 92, bad 88, classic 75, favorite 59, special 59, international 37, fresh 34, nice 33,
professional 28, crazy 24, greatest 24
chicken fried 3290, original 610, poor 236, best 224, good 219, hot 90, real 89, worst 74, bad 61,
delicious 58, great 56, dry 45, terrible 45, greasy 40, cheese 38, fat 35, fresh 32, cold 29, nice
29, small 27, nasty 25, buttered 19, clean 19, famous 18, healthy 18, artificial 18, favorite
16, finest 15, fry 15, crunchy 14
food great 1659, fast 287, good 136, chinese 101, favorite 90, best 88, healthy 74, unhealthy 43,
worst 40, delicious 34, bad 28, nastiest 25, fried 16, unusual 15, nice 15, yummy 15
dinner unhealthy 295, special 40, roast 31, nice 28, best 26, delicious 20, romantic 19, full 17, good
16, big 15, great 15, healthy 15
lunch good 42, healthy 27, big 25, special 20, best 18, great 16, happy 16
burger double 93, zinger 57, fish 28, good 27, best 27, cheese 20, bad 17, fat 17, hot 17
meal free 150, good 101, big 87, best 66, additional 41, large 32, zinger 27, hot 17, favorite 16,
romantic 16, full 16, gravy 16, great 16, delicious 15, fried 15, nice 15
wings hot 166, best 19, big 19, chicken 17, large 16, zinger 16, good 15, crispy 15, dry 15
fries lontong 356, cheese 73, hot 57, large 46, french 28, cheesy 26, best 25, gravy 22, healthy 21,
weird 21, bad 20, delicious 20, fat 16, good 16, tasty 15, yummy 15
chips fish 58, gravy 37, cheese 25, best 19, delicious 18
service public 54, national 35, terrible 26, early 23, bad 19, great 18, horrible 18, slow 17
Table 6.5: (Attribute, Value) list of Burger King
Attribute Value
burger breakfast 1029, new 735, good 696, better 450, big 304, free 294, commercial 281, nasty 188,
bad 167, double 157, best 156, cheese 105, fat 88, french 59, great 57, nice 54, original 44,
expensive 38, trash 36, delicious 34, giant 33, weird 31, funny 30, horrible 30, awful 29,
chipotle 28, fish 28, hot 27, huge 26, small 26, special 25, standard 22, worst 22, $5 19,
roasted 19, terrible 18, top 18, fresh 18
food fast 328, worst 152, chinese 88, best 75, great 63, favorite 58, good 45, nastiest 29, bad 28,
slowest 24, american 16, nasty 16
restaurant unhealthiest 61, fast 4, favorite 4, net 3, successful 3
breakfast best 103, good 56, full 26, bad 18, free 18, great 18, better 15, big 15, delicious 13, english
12, nasty 12, nice 12, sonic 12
chicken original 88, mad 67, good 56, large 30, best 28, commercial 26, pure 20, fried 16, new 16, $5
14, bad 14, cheese 12, horrible 12
burgerking good 72, new 42, cheese 26, nice 22, best 20, fresh 20, great 20, bad 18, better 18, delicious
17, favorite 17, silent 17, daily 15, funny 15, large 15, stupid 13, yummy 13
sandwich original 84, new 73, big 58, fish 46, cheese 27, double 21, good 20, authentic 19, breakfast 19,
$1 13, american 13, chicken 12, horrible 12
meal free 41, large 30, big 28, romantic 21, best 17, scrumptious 15, hot 14, delicious 13, bad 12,
dutch 12, good 12, happy 11, special 11, vegetarian 10
place first 42, best 30, good 27, favorite 26, slowest 24, nastiest 22, worst 20, fast 16, new 15, nice
15, great 12, grim 10, ridiculous 10
menu impactful 58, favorite 26, detailed 24, new 19, whole 15, best 12
bacon double 156, cheese 61, extra 15, great 15, large 14, fat 14, fresh 11, fried 11, greasy 11
service worst 43, terrible 26, slow 22, good 16, horrible 16, slowest 15, best 12
taste good 55, better 30, great 25, bad 23, horrible 22, weird 18, different 12, similar 12, wrong 10
Table 6.6: (Attribute, Value) list of Pizza Hut
Attribute Value
pizza large 1877, best 1228, good 1025, commercial 924, new 898, stuffed 734, cheese 465, bad
454, favorite 429, general 385, crust 375, fresh 300, big 265, hot 228, better 212, great 186,
viral 174, delicious 166, full 155, nasty 154, wrong 149, garlic 136, nice 129, classic 103,
worst 98, cheesy 77, fried 74, italian 60, hawaiian 59, greasy 58, cold 46, fat 46, special 46,
cheaper 44, different 44, chicken 42, regular 35, $25 32, cheesestuffed 31, biggest 30, terrible
30, trashy 30
pizzahut big 143, national 88, new 87, good 69, delicious 56, great 54, best 44, commercial 20, fat 20,
favorite 19, large 19, stupid 19, bad 18, happy 18, hawaiian 17, healthy 15, slow 15
dinner single 132, big 129, good 36, lovely 26, delicious 25, great 20, romantic 18, best 18, favorite
17, $10 17, happy 15
wings hot 442, good 141, best 55, garlic 40, chinese 29, chicken 28, cheese 25, 50cent 23, asian 23,
better 22, bad 22, boneless 19, delicious 19, hottest 16, small 15, traditional 15
crust stuffed 1020, cheese 836, thin 150, great 134, best 124, large 120, good 86, cheesy 74,
delicious 29, cheesestuffed 25, regular 22, bad 21, $12 18, fabulous 17, perfect 17, soft 16
delivery international 177, local 71, free 30, special 28, late 26, brilliant 24, available 21, fast 16, good
16
food great 255, chinese 143, international 103, free 90, good 52, terrible 32, favorite 30, fast 28,
best 26, cold 26, bad 20, delicious 19, italian 19, organic 18, worst 17, disgusting 15, fried 15,
hot 15
pepperoni stuffed 90, large 69, cheese 55, crust 27, double 25, thin 25, hot 23, italian 16, best 15
sticks cheese 165, cinnamon 29, best 23, bread 23, good 18, hot 17, garlic 16, yummy 16
service bad 83, worst 46, terrible 31, horrible 27, great 22, slow 21, awful 21, good 19, poor 17, best
15, fantastic 15
chicken fried 74, best 26, garlic 25, french 24, delicious 20, grilled 15, hawaiian 15
Table 6.7: (Attribute, Value) list of Subway
Attribute Value
subway great 2899, new 1771, good 1519, fresh 1068, best 983, delicious 709, breakfast 590, cheese
369, favorite 251, bad 227, commercial 200, healthy 193, crowded 149, yummy 146, fat 135,
nice 124, better 116, eatfresh 105, greatest 104, big 99, worst 90, sonic 87, scary 86, garlic 75,
chipotle 64, weird 60, nasty 59, top 56, daily 51, special 51, stupid 43, green 38, grumpy 36,
adorable 33, tame 31, expensive 30, funny 30, horrible 30, packed 29, $5 28, cold 28, fast 25,
perfect 24, terrible 24, wonderful 23
cookies best 747, good 434, chocolate 187, fresh 57, delicious 56, great 45, breakfast 35, nice 31, bad
29, perfect 29, famous 25, m&m 25, raspberry 24, fat 22, soft 21, hard 18, top 17, daily 16,
favorite 16, terrible 15, oatmeal 14, sweet 14, tasty 14, healthy 13, wonderful 13, yummy 13
sandwich best 391, new 299, delicious 243, national 181, good 167, favorite 165, victorious 87, cheese
56, great 48, nice 46, whole 45, big 43, bad 36, hot 33, worst 32, better 31, breakfast 30,
healthy 30, tuscan 30, vegetable 30, different 29, fat 27, italian 24, $5 19, finest 19, giant 19,
cold 18, huge 18, american 17, fresh 17, indian 16, nastiest 16, toasted 16, weird 15,
expensive 15, flatbread 13, garlic 13, greatest 13
lunch good 70, healthy 57, best 54, nice 40, delicious 29, fresh 20, romantic 17, full 15, perfect 13,
quick 13, special 12, wonderful 12
food chinese 361, healthy 182, great 168, fast 157, good 150, best 86, favorite 75, mexican 27,
greasy 22, fresh 21, indian 20, nice 20, asian 19, delicious 17, different 16, fatty 16, spanish
16, bad 15, better 14, expensive 13, horrible 12, overrated 12, unhealthy 12
chicken sweet 121, tuscan 109, delicious 103, cheese 88, footlong 74, double 45, fresh 42, good 30,
italian 27, garlic 19, flat 18, nice 16, fried 15, great 15, roast 15, steak 14
breakfast good 99, healthy 34, best 30, bad 27, better 20, great 20, awful 18, balanced 13, delicious 13,
english 12, favorite 11
bread cheese 146, garlic 114, flat 88, italian 86, fresh 47, white 43, american 34, hot 31, good 29,
stale 28, best 22, jalapeño 21, great 20, delicious 16, bacon 14, healthy 13, meat 12, salad 11,
soft 10
salad best 28, good 24, breakfast 20, healthy 19, italian 17, cheese 15, egg 15, fresh 14, chopped 12
meal stupid 106, best 46, good 31, full 28, healthy 20, romantic 17, great 15, bad 13, big 13,
miserable 12
cheese extra 88, swiss 35, fat 31, pepper 31, steak 28, italian 23, flat 19, best 15, white 15, jalapeño
14, yellow 13
service great 29, worst 24, bad 18, better 17, horrible 17, normal 14, rude 14
Table 6.8: (Attribute, Value) list of Starbucks
Attribute Value
starbucks great 3648, new 3319, good 3093, favorite 2536, better 1567, best 1503, delicious 1201,
yummy 749, poor 733, green 678, topshop 626, nice 601, bad 597, perfect 569, happy 434,
big 341, fresh 302, expensive 295, regular 270, sophisticated 267, noble 255, economical 243,
beautiful 224, different 194, special 192, daily 181, original 165, worst 149, cheaper 146,
horrible 141, global 126, nasty 102, greatest 91, wonderful 89, busy 78, reusable 77, super 75,
ridiculous 75, creative 74, fat 72, healthy 71, weird 67, popular 66
coffee good 1494, best 988, favorite 845, hot 639, expensive 431, black 295, great 245, bad 231,
breakfast 218, exploitative 201, delicious 200, nice 189, poor 151, iced 124, healthy 104, cold
99, fresh 93, instant 88, terrible 87, packaged 74, nasty 73, normal 68, yummy 65, daily 53,
different 52, special 52, worst 49, horrible 39, classic 34, overpriced 33
drink favorite 3870, free 3646, wrong 965, hot 729, best 456, good 369, seasonal 283, cold 182,
expensive 99, delicious 89, special 72, nice 61, cuddle 57, complimentary 42, mineral 36,
chocolate 30, great 24, popular 23
tea green 3398, hot 344, bubble 297, black 217, sweet 181, good 108, best 85, iced 56, great 42,
favorite 35, nice 27, breakfast 25, herbal 18, nonfat 18, red 17, poor 17, bad 17, chamomile
16, classic 16, daily 16, refresh 16
barista favorite 163, cute 152, best 74, temporary 62, friendly 38, good 26, happy 22, attractive 16,
beautiful 15, rude 15, certified 15
latte delicious 104, french 76, good 65, hot 58, brûlée 57, yummy 46, chocolate 45, best 44,
favorite 35, breakfast 28, great 27, nonfat 18, fat 18, iced 18, nice 16
mocha white 1502, crumble 324, chocolate 260, delicious 101, salted 88, hot 74, best 67, peppermint
66, good 57, favorite 44, great 29, yummy 20, bad 20, iced 18, nice 18, perfect 17, whitechocolate 17
menu new 124, whole 55, best 26, entire 24, seasonal 20, daily 19, winter 15
cake chocolate 70, cheese 59, marble 45, good 31, best 28, lemon 25, classic 24, new 24, bad 23,
complimentary 23, fetid 19, sweet 19, birthday 18, crumble 18, delicious 17, fat 16, favorite
16, festive 16, great 15, healthy 15, nice 15, obnoxious 15, truffle 15
place better 486, favorite 94, great 84, best 67, good 51, special 38, expensive 34, historic 24, nice
24, quiet 24, overrated 23, wonderful 23, exclusive 22, greatest 18
milk chocolate 82, nonfat 41, hot 30, almond 24, bad 22, fat 22, delicious 20, fresh 19, best 18,
classic 16, diabetic 16, good 16, latte 15, allergic 15, bubble 15
frappe green 464, chocolate 72, crumble 67, white 42, delicious 25, hot 25, whipped 20, brûlée 18,
caramel 18, cotton 17, good 17, great 17, berry 16, exclusive 16
cookie crumble 466, chocolate 68, big 39, dough 25, latte 25, cute 23, good 23, delicious 22,
favorite 22, frosted 16, ginger 15, great 15, perfect 15
gingerbread latte 282, good 41, delicious 34, favorite 26, yummy 20, best 18, seasonal 18
taste good 291, better 72, bitter 61, burnt 59, bad 48, great 43, different 29, heaven 27, wonderful
22, awful 21, nice 21, new 20, alien 20, delicious 20, horrible 16, best 15, rich 15, weird 15,
nasty 15, perfect 15, special 15, strong 15
donut waffle 157, chocolate 88, good 35, best 25, breakfast 25, cheese 24, fresh 24, great 24,
delicious 19, krispy 19, sweet 18, swiss 16
frappuccino crumble 53, chocolate 42, delicious 29, yummy 28, favorite 23, english 22, best 21, good 21,
white 20, caramel 18, wonderful 18, crème 15, strawberry 15
service active 149, great 101, best 25, terrible 21, full 21, good 20, horrible 19, greatest 19, slow 17,
awful 17, global 17, quick 16, slowest 16
cappuccino normal 137, good 28, better 27, hot 25, nice 18, best 18, french 16, nonfat 16, fat 15
wifi free 413, fast 74, unlimited 24, great 23, slow 19, poor 18
The above (Attribute, Value) lists contain abundant information on the specific characteristics of
the target restaurants, and from these lists it is relatively easy to figure out the detailed, concrete
reasons behind people’s opinions of the restaurants. For instance, the ‘Attribute’ column extracts a
particular menu item or service of a restaurant, while the ‘Value’ column gives the features of that
product together with the sentiment-bearing words used to describe it.
6.3 Culture-based Analysis
As one of the main objectives of this research, the relationship between the user evaluations for
global restaurants and cultural background is taken as the analysis subject in this section.
Based on each country’s scores for the 6 restaurants, the k-means method is applied to cluster the
33 target countries into several groups. Here, k is empirically set to 2~10, and Figures 6.20~6.28
show the world maps based on the corresponding clustering results. Countries filled with the same
color belong to the same cluster.
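The clustering step described above can be sketched as follows. This is a minimal, self-contained Lloyd's-algorithm stand-in for the k-means implementation actually used, and the country subset and score matrix are invented purely for illustration (each row holds one country's average sentiment scores for the 6 restaurants):

```python
def kmeans(points, k, iters=100):
    """Minimal Lloyd's algorithm; deterministic seed = first k points."""
    centers = [list(p) for p in points[:k]]
    assign = None
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        new_assign = [min(range(k),
                          key=lambda c: sum((a - b) ** 2
                                            for a, b in zip(p, centers[c])))
                      for p in points]
        if new_assign == assign:          # converged
            break
        assign = new_assign
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assign

countries = ["US", "GB", "JP", "KR", "IT"]   # subset for illustration
scores = [                                    # invented (country x restaurant) scores
    [0.41, 0.35, 0.38, 0.30, 0.44, 0.39],    # US
    [0.39, 0.33, 0.36, 0.28, 0.42, 0.37],    # GB
    [0.22, 0.18, 0.25, 0.20, 0.27, 0.21],    # JP
    [0.20, 0.17, 0.24, 0.19, 0.25, 0.22],    # KR
    [0.05, 0.02, 0.08, 0.04, 0.06, 0.03],    # IT
]
labels = kmeans(scores, k=2)
groups = {}
for country, lab in zip(countries, labels):
    groups.setdefault(lab, []).append(country)
print(sorted(groups.values()))   # countries with similar scores end up together
```

In the actual experiment the same partitioning is repeated over the full 33-country score matrix for each k in 2~10, and each resulting grouping is plotted on a world map.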
Figure 6.20: Clustering Result Map (k=2)
Figure 6.21: Clustering Result Map (k=3)
Figure 6.22: Clustering Result Map (k=4)
Figure 6.23: Clustering Result Map (k=5)
Figure 6.24: Clustering Result Map (k=6)
Figure 6.25: Clustering Result Map (k=7)
Figure 6.26: Clustering Result Map (k=8)
Figure 6.27: Clustering Result Map (k=9)
Figure 6.28: Clustering Result Map (k=10)
Upon observing how the clustering result changes with k, the following can be noted.
a) Most English-speaking countries, as well as most non-English-speaking European countries,
fall into the same cluster when k is set to 2;
b) When k is set to 3, most non-English-speaking Asian countries form a group;
c) When k is set to 4, Italy forms a separate group, which suggests that Italian people may hold
quite different opinions towards these restaurants, or that some limitations have contributed to
this result;
d) When k is set to 5, the main South American countries form a separate group, which possibly
reflects location-based cultural effects;
e) When k is set to 6, non-English-speaking East and Southeast Asian countries form a separate
group, which may also reflect location-based cultural effects;
f) When k is set to 7, Spain and Mexico form a separate group, which may reflect language-based
cultural effects;
g) When k is set to 8, a few European countries form a separate cluster, suggesting that they
share more similar attitudes towards the target restaurants, compared to North American
countries and English-speaking countries in other areas;
h) When k is set to 9, RU and UA, and TH and VN, become two separate clusters, which reflects
location-based cultural effects;
i) When k is set to 10, CO and VE, and EG and ZA, become two separate clusters, which may
also demonstrate location-based cultural effects.
Focusing only on the k=10 case, the 10 clusters turn out to be:
US, CA, PH, SG, DE, AU, IN, NZ;
JP, ID, KR;
ES, MX;
IT;
RU, UA;
GB, NL, FR, GR, CN, IE, PL;
MY, BR, AR, CL;
TH, VN;
EG, ZA;
CO, VE.
Based on this clustering result, the following conclusions can be drawn.
a) The location-based cultural effects are quite obvious. For example, the cluster of BR, CL, AR,
the cluster of RU, UA, the cluster of ZA, EG, the cluster of TH, VN, and the cluster containing
most of the Western European countries have each been grouped together according to their
location and basic cultural background;
b) Some English-speaking Asian countries are clustered into the same group as the North
American countries, which suggests that the language-based cultural background may have
some effect;
c) Compared to most of the European countries, some countries, such as ES and IT, seem to
hold quite different opinions of these restaurants, which may suggest special attitudes rooted
in their food culture.
d) However, some confusing results still exist. For example, CN is clustered into the Western
European group, and MY into the South American group. These results may be explained by
factors other than general cultural background, such as eating patterns, brand reputation,
marketing strategies, and locally specialized products and services.
e) Limitations of the experiment, such as the fact that only fast food restaurants are taken as
targets, may also contribute to the unexpected results.
7. Field Expansion
From the experiments and results of sentiment analysis in the restaurant domain, it can be seen that
the proposed approach is quite promising for this kind of analysis, and informative conclusions
concerning food culture have been drawn. However, because the analysis is restricted to a single
field, the representativeness and transferability of the approach remain unclear and should be
further verified. To this end, in this section the proposed sentiment analysis approach is applied to
the travel domain. The basic methods and experimental steps stay the same, except that the
dictionaries and datasets are reconstructed, and all the classifiers are retrained on travel-related data.
7.1 Data
2,113,624 travel-related tweets (from Sep. 2013 to Dec. 2013) and 42,769 travel-related reviews are
collected as the Twitter dataset and the review dataset respectively. For the collection of Twitter data,
the names of 12 world attractions (i.e. Great Wall of China, Mount Fuji, Matterhorn, Sydney Opera
House, Statue of Liberty, Colosseum, Louvre Museum, Grand Canyon, Machu Picchu, Angkor Wat,
Eiffel Tower, Taj Mahal), along with a list of travel-related keywords selected according to their
occurrence frequency in the review dataset, are taken as the filtering condition. The target
languages are the same as before, and the target countries and their corresponding codes are listed in
Table 7.1. In total, 34 languages and 50 countries are taken into consideration.
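The filtering condition can be sketched as a simple keyword match over the tweet text. The attraction names come from the list above; the travel keyword list and the sample tweets are hypothetical stand-ins (the real keyword list is derived from review-dataset frequencies):

```python
# Attraction names from the thesis; keyword list and tweets are invented.
attractions = ["great wall", "mount fuji", "matterhorn", "sydney opera house",
               "statue of liberty", "colosseum", "louvre", "grand canyon",
               "machu picchu", "angkor wat", "eiffel tower", "taj mahal"]
travel_keywords = ["trip", "travel", "tour", "visit", "vacation"]   # hypothetical

def is_travel_related(tweet):
    """Keep a tweet if it mentions a target attraction or a travel keyword."""
    text = tweet.lower()
    return (any(a in text for a in attractions)
            or any(k in text for k in travel_keywords))

tweets = ["Finally saw the Eiffel Tower at night!",
          "My dinner was terrible",
          "Planning a trip to Peru next spring"]
print([t for t in tweets if is_travel_related(t)])   # drops the middle tweet
```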
Table 7.1: Target Countries and Their ISO 3166-1 Codes (Tourism Domain)
United States (US), United Kingdom (GB), Australia (AU), Indonesia (ID),
Malaysia (MY), Canada (CA), Philippines (PH), Singapore (SG), Brazil
(BR), India (IN), South Africa (ZA), Japan (JP), Mexico (MX), France (FR),
Netherlands (NL), Greece (GR), Thailand (TH), China (CN), Russia (RU),
Spain (ES), Argentina (AR), Chile (CL), South Korea (KR), Germany (DE),
Italy (IT), Ireland (IE), Venezuela (VE), Colombia (CO), Poland (PL), Egypt
(EG), Viet Nam (VN), El Salvador (SV), Slovenia (SI), Sweden (SE), Panama
(PA), Norway (NO), Saudi Arabia (SA), Latvia (LV), Kazakhstan (KZ),
Kuwait (KW), Cambodia (KH), Greenland (GL), Estonia (EE), Ecuador
(EC), Denmark (DK), Czech Republic (CZ), Switzerland (CH), Bulgaria (BG),
Belgium (BE), Austria (AT)
7.2 Experiment
As for the spam filtering step, a spam classifier with performance of 92.5% accuracy is trained. It is
used to filter the original Twitter dataset and discard 7.8% ‘spam’ tweets.
In the sentiment classification step, all the combinations of the previously proposed 6 features are
applied to train the subjectivity classifier and the polarity classifier. After the 378 implementations
of the experiment, Top-7 test results of the subjectivity classifiers and the polarity classifiers are
shown in Table 7.2 and Table 7.3.
Table 7.2: Subjectivity Classifiers Performance
syn 5s rv win3 cca pos accuracy
79.7%
74.9%
81.7%
82.4%
83.0%
83.1%
84.3%
Table 7.3: Polarity Classifiers Performance
syn 5s rv win3 cca pos accuracy
82.2%
89.5%
91.2%
93.6%
94.3%
94.9%
96.4%
As shown in the above tables, the best-performing subjectivity classifier (with an accuracy of
84.3%) is obtained with the feature combination of ‘syn’, ‘5s’, and ‘rv’ and the SVM RBF training
method, while the best-performing polarity classifier (with an accuracy of 96.4%) is obtained with
the feature combination of ‘rv’, ‘5s’, ‘win3’, and ‘cca’ and the SVM polynomial training method.
These two classifiers are used to sequentially classify all the travel-related Twitter data into
positive, neutral, and negative groups, and to give each tweet a sentiment score of 1, 0, or -1.
Comparing the performance of the classifiers in the tourism and restaurant domains, it can be
found that, despite the same feature combinations, the best-performing spam classifier for the
restaurant domain achieves higher accuracy than that for the tourism domain, while both the
best-performing subjectivity classifier and polarity classifier for the tourism domain outperform
their counterparts in the restaurant domain. These disparities demonstrate the difference between
the data of the two domains, and support the necessity of using domain-exclusive data for training
and testing in both the basic and the expansion experiments.
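The two-stage scoring cascade described above can be sketched as follows. The two cue-word sets stand in for the trained SVM subjectivity and polarity models, which cannot be reproduced here; only the cascade logic (objective → 0, otherwise +1 or -1) follows the experiment:

```python
# Illustrative word lists; in the experiment these decisions are made by the
# trained SVM RBF (subjectivity) and SVM polynomial (polarity) classifiers.
SUBJECTIVE_CUES = {"love", "hate", "beautiful", "terrible", "amazing", "boring"}
POSITIVE_CUES = {"love", "beautiful", "amazing"}

def is_subjective(tokens):            # stand-in for the subjectivity classifier
    return any(t in SUBJECTIVE_CUES for t in tokens)

def is_positive(tokens):              # stand-in for the polarity classifier
    return (sum(t in POSITIVE_CUES for t in tokens)
            >= sum(t in SUBJECTIVE_CUES - POSITIVE_CUES for t in tokens))

def sentiment_score(tweet):
    """Cascade: objective tweets get 0, subjective ones get +1 or -1."""
    tokens = tweet.lower().split()
    if not is_subjective(tokens):
        return 0
    return 1 if is_positive(tokens) else -1

scores = [sentiment_score(t) for t in
          ["the eiffel tower is beautiful", "queues were terrible",
           "opened in 1889"]]
print(scores)   # [1, -1, 0]
```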
7.3 Analysis
7.3.1 Statistical Analysis
First, a basic statistical analysis is carried out to obtain a general overview of these attractions in the
50 countries. Table 7.4 lists the number of preprocessed tweets for each target country. Figure 7.1
shows the distribution of tweets over the 12 attractions in each country.
Table 7.4: Tweet Amount for Each Country (Tourism Domain)
US 155,150 CN 10,002 PA 254
GB 25,733 RU 10,727 NO 227
AU 6,661 ES 14,150 SA 2,360
ID 37,361 AR 8,387 LV 243
MY 8,369 CL 4,689 KZ 267
CA 16,023 KR 2,867 KW 1,237
PH 2,932 DE 2,686 KH 203
SG 4,639 IT 4,171 GL 5,489
BR 12,879 IE 1,798 EE 208
IN 8,016 VE 8,576 EC 11,955
ZA 1,737 CO 2,652 DK 305
JP 21,512 PL 445 CZ 266
MX 5,832 EG 1,527 CH 953
FR 29,012 VN 201 BG 279
NL 19,885 SK 378 BE 1,285
GR 21,021 SI 199 AT 396
TH 28,022 SE 855
Figure 7.1: General Distribution of tweets in Tourism Domain
From the above table and distribution graph, it can be concluded that:
a) As in the restaurant domain, there is a great difference among the tweet counts for the target
countries, and tweets from the United States predominate in quantity;
b) The distribution of tweets over the 12 attractions differs considerably from country to country;
c) The overall distribution across the 12 attractions is also quite biased, in that some attractions,
such as the Eiffel Tower, receive far more tweets than others;
d) These distributions may indicate the popularity of each attraction in each country. For example,
tweets about Angkor Wat are evidently more numerous in Cambodia (KH) and Viet Nam (VN)
than in other countries, which may indicate that Angkor Wat is more popular with Cambodians
and Vietnamese than with people from other parts of the world.
7.3.2 Basic Sentiment Analysis
7.3.2.1 Polarity Distribution
To figure out the proportions of positive, negative, and objective tweets for each country and for
each attraction, the polarity distribution graphs are plotted, as shown in Figures 7.2~7.13. The rose
color stands for the positive tweets, the azure color stands for the negative tweets, and the lemon
yellow stands for the objective tweets.
Figure 7.2: Polarity Distribution for Great Wall of China
Figure 7.3: Polarity Distribution for Mount Fuji
Figure 7.4: Polarity Distribution for Matterhorn
Figure 7.5: Polarity Distribution for Sydney Opera House
Figure 7.6: Polarity Distribution for Statue of Liberty
Figure 7.7: Polarity Distribution for Colosseum
Figure 7.8: Polarity Distribution for Louvre Museum
Figure 7.9: Polarity Distribution for Grand Canyon
Figure 7.10: Polarity Distribution for Machu Picchu
Figure 7.11: Polarity Distribution for Angkor Wat
Figure 7.12: Polarity Distribution for Eiffel Tower
Figure 7.13: Polarity Distribution for Taj Mahal
From the above polarity distribution graphs, the following conclusions can be obtained:
a) For different attractions, the general distributions of positive, negative, and objective tweets
are quite different. For example, Sydney Opera House by and large has more positive and
fewer negative tweets than the Statue of Liberty, which indicates an overall better attitude
towards Sydney Opera House;
b) Objective tweets outnumber positive and negative tweets in almost all cases, and for all the
target attractions, positive tweets outnumber negative tweets by a large margin;
c) For the same attraction, people from different countries seem to hold various opinions. For
example, as for the Eiffel Tower, Italian people seem to have more complaints than people
from other countries, since Italy has a larger share of negative tweets than the other countries.
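The per-country proportions behind these graphs can be computed directly from the scored tweets; the sample (country, attraction, score) triples below are invented for illustration:

```python
from collections import Counter

# Invented sample data; scores are the +1 / 0 / -1 values assigned earlier.
scored = [("IT", "eiffel tower", -1), ("IT", "eiffel tower", 0),
          ("US", "eiffel tower", 1), ("US", "eiffel tower", 1),
          ("US", "eiffel tower", 0)]

def polarity_distribution(rows, attraction):
    """Fraction of each polarity class per country, for one attraction."""
    by_country = {}
    for country, attr, score in rows:
        if attr == attraction:
            by_country.setdefault(country, Counter())[score] += 1
    return {country: {score: n / sum(counts.values())
                      for score, n in counts.items()}
            for country, counts in by_country.items()}

dist = polarity_distribution(scored, "eiffel tower")
print(dist)   # per-country fractions of positive / objective / negative tweets
```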
7.3.2.2 Sentiment Map
After calculating the average sentiment score for each country, the sentiment maps for the target
attractions are depicted in Figures 7.14~7.25. On the gradient color axis, green represents negative
sentiment and red represents positive sentiment.
Figure 7.14: Sentiment Score Map of Great Wall of China
Figure 7.15: Sentiment Score Map of Mount Fuji
Figure 7.16: Sentiment Score Map of Matterhorn
Figure 7.17: Sentiment Score Map of Sydney Opera House
Figure 7.18: Sentiment Score Map of Statue of Liberty
Figure 7.19: Sentiment Score Map of Colosseum
Figure 7.20: Sentiment Score Map of Louvre Museum
Figure 7.21: Sentiment Score Map of Grand Canyon
Figure 7.22: Sentiment Score Map of Machu Picchu
Figure 7.23: Sentiment Score Map of Angkor Wat
Figure 7.24: Sentiment Score Map of Eiffel Tower
Figure 7.25: Sentiment Score Map of Taj Mahal
The above sentiment maps show the overall distributions of people’s opinions by representing
sentiment as gradient color. Compared to the polarity distribution graphs, this form of presentation
is more intuitive and gives a picture of the geographical relationships among the target countries.
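Assuming the per-country averages are bounded by the tweet scores of ±1, the gradient coloring on the maps can be sketched as a linear interpolation between green (negative) and red (positive); the exact color scheme of the plotting tool is not reproduced here:

```python
def score_to_color(avg, lo=-1.0, hi=1.0):
    """Map an average sentiment score to an (R, G, B) value on the map's
    green-to-red gradient: green = negative, red = positive."""
    t = (avg - lo) / (hi - lo)                 # normalise score into [0, 1]
    return (int(255 * t), int(255 * (1 - t)), 0)

# Endpoints and midpoint of the gradient:
print(score_to_color(-1.0), score_to_color(0.0), score_to_color(1.0))
```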
7.3.2.3 Sentiment Keywords Extraction
As in the restaurant domain, beyond the overall distributions of people’s opinions of the target
attractions, we want more information about the specific reasons why people like or dislike a target,
and about the concrete characteristics of a target that shape people’s opinions. To this end, the
frequently occurring sentiment words, either positive or negative, are extracted together with their
frequencies for each target attraction, and tag clouds are used to present these representative
sentiment keywords. Figures 7.26~7.37 give the tag clouds for the targets. A white background
indicates positive sentiment, and a black background indicates negative sentiment. The size of a
word denotes its occurrence frequency, and the colors of the words have no special significance.
Figure 7.26: Tag Cloud of Sentiment Keywords for Great Wall of China
Figure 7.27: Tag Cloud of Sentiment Keywords for Mount Fuji
Figure 7.28: Tag Cloud of Sentiment Keywords for Matterhorn
Figure 7.29: Tag Cloud of Sentiment Keywords for Sydney Opera House
Figure 7.30: Tag Cloud of Sentiment Keywords for Statue of Liberty
Figure 7.31: Tag Cloud of Sentiment Keywords for Colosseum
Figure 7.32: Tag Cloud of Sentiment Keywords for Louvre Museum
Figure 7.33: Tag Cloud of Sentiment Keywords for Grand Canyon
Figure 7.34: Tag Cloud of Sentiment Keywords for Machu Picchu
Figure 7.35: Tag Cloud of Sentiment Keywords for Angkor Wat
Figure 7.36: Tag Cloud of Sentiment Keywords for Eiffel Tower
Figure 7.37: Tag Cloud of Sentiment Keywords for Taj Mahal
Compared with the tag clouds of the restaurant domain, the tag clouds for world attractions seem
to be more informative and meaningful. For instance, words like ‘famous’, ‘masterpiece’, ‘renais-
sance’, ‘worthy’, ‘treasure’, and ‘gorgeous’ are particular to or representative of the positive aspect
of the Louvre Museum, while words like ‘crowded’, ‘dirty’, ‘boring’, and ‘confusing’ may relate
to its negative aspect. Likewise, for Mount Fuji, the positive features can be described by
‘beautiful’, ‘clear’, ‘milky’, ‘blossom’, ‘picturesque’, and ‘fresh’, while the negative descriptions
include words like ‘suicide’, ‘cold’, ‘dangerous’, ‘frozen’, and ‘invisible’. Based on these special
keywords, we may easily obtain important hints about, or underlying facts behind, the pros and
cons of the target attractions. For example, words such as ‘battle’, ‘fight’, ‘blood’, ‘death’,
‘brutality’, ‘beast’, and ‘barbaric’ in the negative keyword set of the Colosseum lend sufficient
support to the inference that a trip to the Colosseum may remind tourists of its cruel history in
ancient Rome. Also, people think of the Eiffel Tower as a romantic, gorgeous place, so when
referring to the Tower, they use words like ‘romantic’, ‘lover’, ‘illuminating’, ‘kiss’, ‘dream’,
‘sparkling’, and ‘splendor’.
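The keyword-frequency step behind these tag clouds can be sketched with a simple lexicon count; the positive and negative lexicons and the tweets below are invented stand-ins for the real sentiment dictionaries:

```python
from collections import Counter

# Hypothetical mini-lexicons and tweets for one attraction.
positive_lexicon = {"romantic", "gorgeous", "sparkling"}
negative_lexicon = {"crowded", "overpriced"}

tweets = ["the eiffel tower is so romantic and sparkling",
          "romantic view, gorgeous at night",
          "too crowded and overpriced",
          "crowded again today"]

pos, neg = Counter(), Counter()
for tweet in tweets:
    for word in tweet.replace(",", " ").split():
        if word in positive_lexicon:
            pos[word] += 1
        elif word in negative_lexicon:
            neg[word] += 1

# The most frequent words get the largest font in the tag cloud.
print(pos.most_common(), neg.most_common())
```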
7.3.2.4 (Attribute, Value) Pairs Extraction
Beyond the above tag clouds, we still want more detailed information and a closer look at these
world attractions. As in the restaurant domain, the sentiment-expressive word pairs are extracted,
each of which typically, though not exclusively, consists of one noun (attribute) and one adjective
(value). Tables 7.5~7.16 give the (Attribute, Value) lists of the target attractions. Red color and
green color represent positive and negative sentiment respectively. The numbers following the
value words denote frequencies.
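The pair-extraction step can be sketched as a pattern match over POS-tagged tweets, pairing each noun (attribute) with an adjacent or copula-linked adjective (value). The tagged input below is hand-made, standing in for an upstream POS tagger:

```python
from collections import Counter

tagged_tweets = [  # (token, tag) with NN = noun, JJ = adjective; sample data
    [("the", "DT"), ("great", "JJ"), ("wall", "NN"),
     ("is", "VB"), ("famous", "JJ")],
    [("beautiful", "JJ"), ("scenery", "NN")],
    [("the", "DT"), ("scenery", "NN"), ("was", "VB"), ("beautiful", "JJ")],
]

def extract_pairs(tagged):
    """Count (noun, adjective) pairs: an adjective directly before a noun,
    or the first adjective following it (e.g. linked via a copula)."""
    pairs = Counter()
    for sent in tagged:
        for i, (tok, tag) in enumerate(sent):
            if tag == "NN":
                if i > 0 and sent[i - 1][1] == "JJ":      # "great wall"
                    pairs[(tok, sent[i - 1][0])] += 1
                for tok2, tag2 in sent[i + 1:]:           # "wall is famous"
                    if tag2 == "JJ":
                        pairs[(tok, tok2)] += 1
                        break
    return pairs

pairs = extract_pairs(tagged_tweets)
print(pairs)
```

Aggregating these counts per target attraction yields lists of the same shape as Tables 7.5~7.16.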
Table 7.5: (Attribute, Value) list of Great Wall of China
Attribute Value
greatwall first 61, real 53, white 38, chinese 33, old 27, long 26, famous 21, positive 14, free 13, high
13, large 13, artificial 11, good 11, visible 11, longest 10, useful 9, best 9, international 8,
largest 8, particular 8, northern 7, big 7, chángchéng 6, robust 6, ancient 6, beautiful 6,
enormous 5, greater 5, manual 5, north 5, fantastic 5, favorite 5, huge 5, original 4, technical 4
space visible 256, great 102, real 6, international 5
cemetery longest 413, earth 412, largest 23
wonder new 24, great 10, ancient 5, world 5
photos great 342, light 32, pile 14, famous 6, favorite 5
building great 190, visible 12, biggest 9, long 7, high 7, chinese 5, largest 5
length great 79, total 27, full 18, entire 10, conventional 9, central 6
heritage cultural 21, great 8, eternal 7, mutual 6, desirable 5, famous 5
information useful 17, give 12, private 10, chinese 9, great 7, important 6, interesting 6
dynasty old 19, great 7, boundary 6, successive 6
walk great 101, lazy 21, whole 15, confident 10, difficult 7, entire 7, long 5, toughest 5
scenery beautiful 52, great 12, incredible 7
Table 7.6: (Attribute, Value) list of Mount Fuji
Attribute Value
mountfuji beautiful 282, japanese 59, best 41, favorite 32, highest 28, high 27, good 22, famous 19,
flower 18, cultural 18, visible 17, top 17, big 17, eggplant 16, shizuoka 15, scenic 15, clear 13, rich
13, sunny 12, clean 11, dangerous 10, green 10, majestic 10, nice 10, powerful 9, available 9,
black 8, environmental 8, gorgeous 8, milky 7, prefectural 7, special 7, white 6, disappointing
5, distant 5, great 5, large 5, natural 5, spectacular 5
forest suicide 3128, high 26, thin 26, snowy 14, beautiful 8, haunted 7, scary 7, special 5
cloud lenticular 38, top 10, famous 8, beautiful 6
mountain highest 155, beautiful 15, famous 9, japanese 8, immortal 6, big 5, good 5, high 5, huge 5
view beautiful 31, great 11, clear 10, autumnal 9, good 9, panoramic 8, distant 6, beautiful 6,
different 5, wonderful 5, breathtaking 5
sky blue 11, beautiful 9, clear 8, special 6, clean 5
beauty majestic 12, natural 8, mystical 7, beautiful 6, great 6, fantastic 5
Table 7.7: (Attribute, Value) list of Matterhorn
Attribute Value
matterhorn italian 620, fliegner 146, zermattlive 36, thunder 34, national 26, beautiful 18, good 15,
gornegrat 12, best 11, famous 10, fantastic 10, mysterious 10, large 8, swiss 8, majestic 8,
wonderous 7, big 6, great 6, blue 6, classic 6, favorite 5, impressive 5, top 5
mountain magic 218, famous 19, everest 16, swiss 12, main 10, high 8, beautiful 6, big 6, fliegner 6
switzerland fantastic 17, beautiful 13, special 7
zermatt royal 16, beautiful 13, free 8, best 7, cardinal 7, glacier 7, glorious 5, good 5, great 4, marvelous 4, special 4
alps swiss 17, highest 10, glacier 5, italian 5, european 5, great 5
view nice 13, great 8, bad 6, beautiful 6, classic 6, spectacular 5
photo vurtual 12, wonderous 7, amazing 6, best 5, magical 5, majestic 5
Table 7.8: (Attribute, Value) list of Sydney Opera House
Attribute Value
Sydney opera house new 380, vivid 60, beautiful 19, mobile 12, famous 22, conceptual 20, initial 18, iconic 89, great 30, beautiful 11, new 9, open 9, spectacular 8, big 8, best 7, classic 6, epic 6, impressive 6, large 5, monumental 5, wonderful 5
fireworks new 386, spectacular 24, first 12, current 9, massive 8, open 8, anniversary 6, beautiful 6,
traditional 6
night beautiful 12, great 11, incredible 5, magical 5, special 5
concert famous 10, live 8, full 6, large 6
heritage cultural 13, immortal 5, unique 5
harbour beautiful 18, modern 9, new 6, iconic 5, light 5
building modern 19, classic 8, great 7, royal 7, shine 6, circular 5, iconic 5, important 5
show fantastic 12, final 10, new 8, beautiful 5, full 5, good 5, light 5
Table 7.9: (Attribute, Value) list of Statue of Liberty
Attribute Value
statue of liberty present 2476, new 589, original 302, high 179, snow 176, top 119, rain 85, available 77, black 65, major 64, european 44, big 42, small 39, beautiful 33, good 28, own 22, resemble 22, visible 22, old 21, italian 18, real 16, classic 14, national 13, american 11, green 9, greatest 8, nice 8, memorial 7, cultural 7, gorgeous 6, best 6, cute 6, great 6, famous 6, iconic 6, tall 6, visible 5, incredible 5, french 5, huge 5, solid 5, commemorative 5, giant 5, manhattan 5, open 5, contemporary 5, modern 5, creative 5
torch good 79, original 10, green 6, impressive 6
view beautiful 24, top 13, great 10, nice 7, clear 5, gorgeous 5, manhattan 5
photo famous 113, unique 113, rare 10, historic 5, great 5, original 5
history sad 70, natural 10, american 6, various 6, real 5
park central 46, national 16, new 11, main 7, cultural 5, interesting 5, small 5
tour new 23, incredible 12, finest 8, boat 6, great 6
place wrong 15, best 10, dangerous 9, mythical 7, better 7, biggest 6, favorite 6, great 6, interesting
5, memorial 5
Table 7.10: (Attribute, Value) list of Colosseum
Attribute Value
coloseum roman 271, vatican 223, beautiful 42, iconic 27, famous 25, ancient 24, modern 21, good 21,
great 19, big 19, popular 18, huge 18, legendary 18, magnificent 17, special 16, eternal 16,
favorite 15, largest 15, bad 14, gorgeous 14, greatest 14, historic 14, spectacular 13, awesome
13, biggest 13, classical 12, immortal 12, impressive 12, incredible 12, large 11, wonderful 10
rome beautiful 63, ancient 47, vatican 26, archaeological 20, famous 16, historic 15, iconic 14, classic 14, eternal 12, great 10, incredible 10
city vatican 238, magical 27, big 25, dangerous 19, italian 16, beautiful 15, eternal 13, major 11,
bad 10, gorgeous 10
time first 135, next 57, long 37, great 21, roman 16, greatest 15, free 12, possible 12, ancient 11,
considerable 10
heritage best 34, mutual 33, cultural 11, historical 11
place different 29, favorite 26, good 22, beautiful 18, classical 15, lucky 14, spectacular 14, bad 12,
big 12, interesting 11
emperor ancient 28, stupid 22, roman 16, ephemeral 11, pragmatic 10, rich 10, vespasian 8
monument famous 19, historic 15, iconic 15, european 13, architectural 11, beautiful 9
building dangerous 14, previous 13, impressive 12, mediterranean 12, roman 10, cathedral 9, favorite 8
Table 7.11: (Attribute, Value) list of Louvre Museum
Attribute Value
Louvre museum great 201, marble 104, famous 82, national 75, unusual 74, beautiful 66, major 27, good 26, pyramid 25, large 25, cultural 25, big 18, immersive 17, largest 16, best 15, wide 14, french 12, original 12, spectacular 12, cathedral 11, biggest 11, special 10, majestic 10, exclusive 9, favorite 9, iconic 8, perfect 8, wonderful 8, nice 8, imaginary 7, incredible 7, gorgeous 7, huge 6, interesting 6, majestic 5, important 5, natural 5, royal 5, bad 5
paris beautiful 29, famous 19, good 17, favorite 14, cultural 12, documentary 10, worth 9, cathedral
8, cool 7, incredible 6
photo beautiful 11, romantic 10, cute 10, classic 7, royal 7, incredible 6
art islamic 30, famous 16, important 15, beautiful 14, contemporary 14, modern 13, real 12, great 12, western 10, academic 10, asian 10, classic 10, conceptual 9, eastern 9, egyptian 9, incredible 9, national 8, religious 8, superior 6, worthy 6
painting famous 305, italian 25, european 15, royal 14, beautiful 8, favorite 6
place secure 177, beautiful 23, great 22, special 20, good 19, interesting 17, favorite 15, best 11,
famous 11, favorite 10
exhibition special 20, international 17, islamic 16, mediterranean 15, large 13, cool 12, french 10, great
10, japanese 9, modern 9, ancient 8, beautiful 8
masterpiece worthy 190, crazy 14, neoclassical 12, favorite 11, specific 10, various 8
Table 7.12: (Attribute, Value) list of Grand Canyon
Attribute Value
grand canyon national 1507, green 236, great 176, natural 154, beautiful 134, incentive 112, best 74, common 49, cool 37, big 37, good 33, large 30, huge 28, original 25, special 24, american 21,
gorgeous 20, largest 20, majestic 19, famous 17, vertical 16, wonderful 16, giant 16, epic 15,
bad 14, nice 14, fabulous 11, incredible 11, spectacular 10, top 10, catastrophic 9, celestial 8,
dangerous 8, quiet 8, rocky 8, worth 8, glorious 7, scary 6, breathtaking 6, scenic 5
fog heavy 1622, massive 16, breathtaking 8, great 6, rare 6
valley large 84, beautiful 35, national 9, majestic 6, best 6, big 5, good 5
trip spontaneous 15, best 14, great 13, recent 11, good 10, important 10, memorable 9, national 8
view nice 45, beautiful 13, spectacular 12, great 11, best 10, common 10, google 8, new 7, rustic 7, gorgeous 6, panoramic 6, bad 6, breathtaking 6
phenomenon rare 271, atmospheric 34, natural 20, beautiful 17, great 11, gorgeous 10, mysterious 7, special
6
place beautiful 25, exotic 23, great 20, best 15, good 14, public 13, dangerous 13, peaceful 12, nice
10, wonderful 10, exceptional 8, magical 7
heritage hidden 33, best 31, natural 24, majestic 15, world 14, mutual 13, visible 11, cultural 8, dangerous 6, national 6
Table 7.13: (Attribute, Value) list of Machu Picchu
Attribute Value
machu picchu best 837, historical 168, old 167, beautiful 103, ancient 65, top 46, important 38, incredible 32,
historic 31, centenary 27, great 26, cultural 25, botanical 22, famous 18, unforgettable 17, wild
17, good 15, favorite 12, wonderful 11, majestic 10, gorgeous 9, impressive 9, mysterious 7,
national 7, nice 6, popular 6, unique 6, agricultural 5, fantastic 5, healthy 5, indigenous 5,
interesting 5
travel best 65, expensive 9, extraordinary 7, important 7, cultural 6
city mysterious 185, lost 63, ancient 48, familiar 19, legendary 11, good 10, enigmatic 9, indigenous 7, large 6, beautiful 5
place best 953, historical 824, worldwide 156, mysterious 26, historic 17, beautiful 13, magical 11,
fantastic 11, wonderful 6
guide shamanic 178, andean 162, famous 72, spiritual 56, famed 46
heritage important 21, historic 16, natural 10, cultural 8, famous 5, popular 5
stairs vertical 191, dangerous 18, various 8
ruins major 19, full 18, historic 15, ancient 12, famous 10, fantastic 8, incredible 7
Table 7.14: (Attribute, Value) list of Angkor Wat
Attribute Value
angkorwat popular 602, famous 168, international 79, ancient 43, beautiful 46, largest 38, great 31, good 24, cultural 22, mysterious 22, nice 19, various 18, spiritual 16, rustic 15, religious 12, archeological 10, best 9, cambodian 9, spiritual 8, panoramic 7, impressive 7, top 6, traditional 6
temple largest 48, ancient 29, imperial 18, golden 15, big 13, huge 11, buddhist 8, beautiful 7, gorgeous 5, religious 5
heritage world 38, cultural 26, famous 8, fashionable 8, great 7, popular 6, bad 6, beautiful 5
travel overseas 27, domestic 14, popular 12, national 10, cultural 6
time private 78, spacious 68, long 16, tropical 12, limited 10, peaceful 9, closing 8, wonderful 6
ruins famous 16, particular 14, mysterious 13, great 10, spiritual 11, huge 10, ancient 8, good 5
city ancient 23, exotic 20, imperial 17, historic 8, magical 5, beautiful 5
place good 13, best 12, great 10, beautiful 6, religious 6, unbelievable 6
people local 116, architectural 20, reminiscent 18, rustic 13, special 10, materialistic 8, shy 6
Table 7.15: (Attribute, Value) list of Eiffel Tower
Attribute Value
eiffeltower beautiful 751, new 621, tallest 423, lighter 221, cute 165, wonderful 152, high 132, big 130,
good 117, top 106, romantic 99, famous 98, great 84, best 72, different 65, glamorous 63,
global 48, colorful 46, nice 45, visible 44, electric 43, gorgeous 43, large 41, old 37, bad 36,
highest 35, cool 33, original 33, french 27, perfect 26, giant 25, huge 25, favorite 23, fashionable 22, magical 22, special 22, spectacular 18, iconic 15, gorgeous 14, bright 14, incredible 13, fantastic 12, hilarious 11, majestic 10, magnificent 10
paris romantic 3122, beautiful 401, wonderful 174, top 49, best 45, famous 32, good 28, big 17,
french 15, large 10
france various 457, beautiful 193, romantic 35, honeymoon 14, good 10
view spectacular 880, beautiful 792, amazing 573, different 78, top 48, great 19, artistic 13, wonderful 12, incredible 12, google 10, good 8, nice 6, magnificent 5, fantastic 5, wonderful 5
picture beautiful 1296, wallpaper 36, good 19, phenomenal 11, wonderful 7, amazing 6
place popular 1733, romantic 68, exotic 63, favorite 43, happy 30, beautiful 27, best 10, famous 8,
good 6, special 5
photo wonderful 382, rare 234, instant 44, good 29, glamorous 24, beautiful 22, gorgeous 17, special 11, best 10, beautiful 8, great 7
fireworks beautiful 20, happy 12, romantic 12, special 10
Table 7.16: (Attribute, Value) list of Taj Mahal
Attribute Value
tajmahal beautiful 127, red 108, famous 70, private 51, magnificent 36, great 23, prestigious 23, big 22,
good 20, golden 19, classic 18, iconic 17, magical 17, modern 17, open 15, royal 15, expensive 14, indian 12, majestic 12, special 12, greatest 9, incredible 8, peaceful 8, architectural 7,
authentic 6, best 6, favorite 5, gorgeous 5, marble 5, symmetrical 5
india cultural 34, incredible 32, good 27, ancient 21, magical 15, greatest 13, awesome 10, delicious
8, famous 6, historical 6
story true 28, beautiful 23, sad 13, greatest 11, new 8, eternal 5
tour private 34, magical 12, perfect 12, special 10, indian 6, incredible 5
place divine 29, beautiful 14, best 10, favorite 8, great 5, terrible 5
building huge 25, marble 22, white 14, beautiful 10, impressive 8, funeral 6, original 6
city atlantic 73, famous 13, indian 10, blue 8, authentic 6, beautiful 5, industrial 5, interesting 5, romantic 5, expensive 4
architecture islamic 16, mughal 12, beautiful 8, marvellous 7, great 6, persian 5, historical 4
The above (Attribute, Value) lists contain rich information about the characteristics of the target
attractions. From these lists, it is relatively easy to identify the specific reasons behind people's
opinions of each attraction.
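The construction of such lists can be sketched as follows. This is a minimal illustration, not the thesis's actual implementation: it assumes the tweets have already been tokenized and POS-tagged by some standard tagger, and simply counts adjectives that immediately precede each noun.

```python
from collections import Counter, defaultdict

def attribute_value_pairs(tagged_sentences, min_count=5):
    """Count (attribute, value) pairs, where a 'value' is an adjective
    (tag 'ADJ') immediately preceding a noun (tag 'NOUN') 'attribute'.
    `tagged_sentences` is a list of [(token, tag), ...] lists."""
    counts = defaultdict(Counter)
    for sent in tagged_sentences:
        for (w1, t1), (w2, t2) in zip(sent, sent[1:]):
            if t1 == "ADJ" and t2 == "NOUN":
                counts[w2.lower()][w1.lower()] += 1
    # Keep values seen at least `min_count` times (the tables above cut
    # off at a count of 5), sorted by descending frequency.
    return {attr: [(v, c) for v, c in vals.most_common() if c >= min_count]
            for attr, vals in counts.items()}
```

Each returned entry then corresponds to one row of the tables above: a noun attribute mapped to its most frequent adjective values.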
7.3.3 Culture-based Analysis
Next, the k-means method is used to cluster the 50 countries into several groups according to their
sentiment scores over the 12 attractions. Here, k is empirically set from 4 to 10, and Figures 7.38 to
7.44 show world maps presenting the corresponding clustering results. Countries filled with the same
color belong to the same cluster.
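The clustering step can be sketched roughly as below. The country codes and the score matrix here are random placeholders standing in for the real 50-by-12 sentiment score matrix, and scikit-learn's KMeans stands in for whatever k-means implementation was actually used.

```python
import numpy as np
from sklearn.cluster import KMeans

# Placeholder data: rows are countries, columns are the 12 attractions.
countries = ["US", "GB", "AU", "CA", "JP", "CN", "KR", "EG", "SA", "KW", "MY", "PH"]
rng = np.random.default_rng(0)
scores = rng.random((len(countries), 12))

for k in range(4, 11):  # the thesis sweeps k from 4 to 10
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(scores)
    groups = {}
    for country, label in zip(countries, km.labels_):
        groups.setdefault(label, []).append(country)
    print(k, sorted(groups.values()))
```

The resulting label-to-country groupings are what the world maps in Figures 7.38 to 7.44 visualize, one map per value of k.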
Figure 7.38: Clustering Result Map (k=4)
Figure 7.39: Clustering Result Map (k=5)
Figure 7.40: Clustering Result Map (k=6)
Figure 7.41: Clustering Result Map (k=7)
Figure 7.42: Clustering Result Map (k=8)
Figure 7.43: Clustering Result Map (k=9)
Figure 7.44: Clustering Result Map (k=10)
Based on these clustering result maps, the following observations can be made:
a) When k is set to 4, EG, SA, and KW form a group, which supports both the location-based and
language-based cultural effects. Moreover, most European countries are clustered into the
same group, which also suggests the existence of location-based cultural effects;
b) When k is set to 5, the European countries are further split into two groups, roughly along the
boundary between Western and Eastern Europe. This phenomenon also, to some extent,
demonstrates the location-based cultural effects;
c) When k is set to 6, RU, KZ, and PL become a separate group, which indicates the
location-based cultural effect;
d) When k is set to 7, AR, PA, CO, and EC form a new group, manifesting the location-based
and language-based cultural effects;
e) When k is set to 8, the two English-speaking Southeast Asian countries, MY and PH, form a
separate cluster, which also suggests the location-based and language-based cultural effects;
f) When k is set to 9 and 10, the clusters are further subdivided into smaller groups, but little
additional insight is gained.
Focusing only on the k=8 case, the clustering result is:
US, GB, AU, CA, SG, BR, JP, NL, GR, CN, KR, VE, CL, IE, SE;
RU, PL, KZ;
MY, PH;
ID, MX, TH, VN, KH;
DE, SK, SI, EE, LV, BG;
IN, ZA, FR, ES, IT, NO, GL, DK, CZ, CH, BE, AT;
AR, PA, CO, EC;
EG, KW, SA.
Based on the above clustering result, several conclusions can be drawn.
a) The location-based cultural effects on user evaluations for world attractions are obvious
for some countries, such as the group of MY and PH, the group of EG, KW, and SA, and
neighboring countries in Europe, North America, and Southeast Asia;
b) The language-based cultural effects also exist: the most typical English-speaking countries,
including US, GB, AU, and CA, are in the same cluster;
c) When considering opinions towards world attractions, the boundary between North
America and South America becomes blurred, especially compared with the clustering result
in the restaurant domain;
d) It seems somewhat surprising that the three main East Asian countries, i.e., JP, CN, and KR,
are all clustered into the same group as the American countries. To explain this result, some
underlying factors should be further investigated.
7.3.4 Comparison of the Two Domains
Finally, by comparing the clustering results of the two domains, it can be concluded that:
a) The cultural effects on user evaluations differ across domains. While some countries
constantly belong to the same cluster, the groupings of other countries do not remain the
same;
b) The disparity of attitudes between North America and South America seems to be more
significant towards food than towards traveling;
c) From the results of both fields, it appears that the Asian countries have rather varied cultural
backgrounds in comparison with other areas;
d) As for European countries, despite their relatively small territorial area, people's evaluations
vary from culture to culture, which may be attributed to the diversity of languages in these
countries;
e) The main English-speaking countries seem to share more similarities in cultural effects on
user evaluations regardless of the field, which may indicate language-based cultural effects;
f) For both domains, the location-based cultural effects are quite obvious, which means that
countries in close geographical proximity tend to hold similar attitudes towards restaurants
or attractions;
g) The limitations of the experiments may have led to some unexpected results. For example, in
both cases, CN has been grouped with Western countries, which may be partly due to the
blocking of Twitter in mainland China.
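As a complement to this qualitative comparison, the agreement between the two domain-level groupings could also be quantified, for example with the adjusted Rand index. The sketch below uses hypothetical placeholder labels for a handful of countries, not the actual clustering output.

```python
from sklearn.metrics import adjusted_rand_score

# Hypothetical cluster labels for a few countries in each domain;
# these are illustrative placeholders, not the thesis results.
countries = ["US", "GB", "AU", "CA", "EG", "KW", "SA", "MY", "PH"]
restaurant_labels = [0, 0, 0, 0, 1, 1, 1, 2, 2]
tourism_labels    = [0, 0, 0, 0, 1, 1, 1, 0, 0]

ari = adjusted_rand_score(restaurant_labels, tourism_labels)
print(f"Adjusted Rand index between the two domains: {ari:.2f}")
# 1.0 means identical groupings; values near 0 mean chance-level agreement.
```

A single score of this kind would make statements such as "some countries constantly belong to the same cluster" directly measurable.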
8. Conclusion
In this research, the relationship between user evaluations and cultural backgrounds in the
restaurant domain was first investigated. The investigation covered more than 30 countries
around the world, and tweets written in more than 30 languages were analyzed. The main steps
included data preprocessing, spam filtering, subjectivity classification, polarity classification, and
a series of analyses. Three key classifiers (i.e., a spam classifier, a subjectivity classifier, and a
polarity classifier) were trained with a range of different implementations, achieving accuracies
of 97.8%, 78.4%, and 91.1%, respectively. The subsequent steps of statistical analysis, basic
sentiment analysis, and culture-based analysis produced instructive results concerning the cultural
effects on user evaluations for restaurants.
Then, the same sentiment analysis approach was applied to the tourism domain to demonstrate
the transferability of the proposed methods. The three main classifiers achieved accuracies of
92.5%, 84.3%, and 96.4%, respectively. By applying these classifiers sequentially, a series of
sentiment analyses were carried out for the tourism domain, and informative results were obtained.
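A rough sketch of how such a sequential pipeline fits together is shown below. The toy training texts and the TF-IDF/logistic-regression models are placeholders; the thesis's actual features, annotated training data, and classifier implementations are not reproduced here.

```python
# Minimal sketch of a three-stage filtering pipeline
# (spam -> subjectivity -> polarity), using generic scikit-learn models.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def train(texts, labels):
    clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
    clf.fit(texts, labels)
    return clf

# Toy training data (placeholders for the annotated tweets).
spam_clf = train(["win free money now", "the food here was great"],
                 ["spam", "ham"])
subj_clf = train(["the food here was great", "the restaurant opens at 9"],
                 ["subjective", "objective"])
pol_clf = train(["the food here was great", "the service was terrible"],
                ["positive", "negative"])

def classify(tweet):
    """Apply the three classifiers in sequence, discarding tweets
    that fail the earlier stages."""
    if spam_clf.predict([tweet])[0] == "spam":
        return "discarded: spam"
    if subj_clf.predict([tweet])[0] == "objective":
        return "discarded: objective"
    return pol_clf.predict([tweet])[0]
```

Only tweets that survive the spam and subjectivity stages reach the polarity classifier, which mirrors the sequential application described above.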
Through these cross-domain investigations of user evaluations, the conclusion can be reached that
the cultural effects on user evaluations in both the restaurant and tourism domains actually exist,
and are quite obvious for some countries and cultural backgrounds. The proposed approach has
also proved capable of cross-lingual sentiment analysis, and is transferable to other fields.
As the next steps, first, latent factors other than cultural background should be further
investigated, in order to uncover the underlying facts that can explain the unexpected results for
certain countries. Then, other possible expansions, including expansion to other domains, should
be considered.
Acknowledgements
Upon the completion of this thesis, I would like to extend my heartfelt gratitude to a number of
people.
First, my faithful gratitude goes to Prof. Yamana, my supervisor, who has taught and supported
me so much throughout my two years of graduate life. Owing to his insightful guidance and
comments on my research, as well as his patient revisions, this thesis has eventually come to
fruition.
My sincere acknowledgement also goes to all the professors and teachers who have taught me
during the master's course. It is precisely because of their careful and responsible teaching that I
was able to lay a solid foundation for my study and research.
I would also like to express my cordial thanks to the students in the Yamana laboratory for their
valuable advice and enthusiastic help, in both academic matters and daily life.
Finally, I give my heartiest gratitude to Ting Hsin Group and Waseda University for their
generous and constant support of my studies in Japan. The full-scholarship master's program has
provided me with precious opportunities to acquire advanced knowledge, to broaden my horizons
and mind, and to pursue my ambitions and visions.
References
[1] Liu, B. (2010). Sentiment analysis and subjectivity. Handbook of Natural Language Processing, 2, 627-666.
[2] Agarwal, A., Xie, B., Vovsha, I., Rambow, O., & Passonneau, R. (2011, June). Sentiment analysis of Twitter data. In Proceedings of the Workshop on Languages in Social Media (pp. 30-38). Association for Computational Linguistics.
[3] Brody, S., & Diakopoulos, N. (2011, July). Cooooooooooooooollllllllllllll!!!!!!!!!!!!!!: Using word lengthening to detect sentiment in microblogs. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (pp. 562-570). Association for Computational Linguistics.
[4] Pak, A., & Paroubek, P. (2010, May). Twitter as a corpus for sentiment analysis and opinion mining. In LREC.
[5] Meng, X., Wei, F., Liu, X., Zhou, M., Li, S., & Wang, H. (2012, August). Entity-centric topic-oriented opinion summarization in Twitter. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 379-387). ACM.
[6] Wong, F. M. F., Sen, S., & Chiang, M. (2012, August). Why watching movie tweets won't tell the whole story? In Proceedings of the 2012 ACM Workshop on Online Social Networks (pp. 61-66). ACM.
[7] Guo, H., Zhu, H., Guo, Z., Zhang, X., & Su, Z. (2010, October). OpinionIt: A text mining system for cross-lingual opinion analysis. In Proceedings of the 19th ACM International Conference on Information and Knowledge Management (pp. 1199-1208). ACM.
[8] Bautin, M., Vijayarenu, L., & Skiena, S. (2008, April). International sentiment analysis for news and blogs. In ICWSM.
[9] Nakasaki, H., Kawaba, M., Utsuro, T., & Fukuhara, T. (2009). Mining cross-lingual/cross-cultural differences in concerns and opinions in blogs. In Computer Processing of Oriental Languages: Language Technology for the Knowledge-based Economy (pp. 213-224). Springer Berlin Heidelberg.
[10] Tumasjan, A., Sprenger, T. O., Sandner, P. G., & Welpe, I. M. (2010). Predicting elections with Twitter: What 140 characters reveal about political sentiment. ICWSM, 10, 178-185.
[11] Pang, B., & Lee, L. (2008). Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1-2), 1-135.
[12] Go, A., Bhayani, R., & Huang, L. (2009). Twitter sentiment classification using distant supervision. CS224N Project Report, Stanford, 1-12.
[13] Barbosa, L., & Feng, J. (2010, August). Robust sentiment detection on Twitter from biased and noisy data. In Proceedings of the 23rd International Conference on Computational Linguistics: Posters (pp. 36-44). Association for Computational Linguistics.
[14] Kouloumpis, E., Wilson, T., & Moore, J. (2011). Twitter sentiment analysis: The good the bad and the OMG! ICWSM, 11, 538-541.
[15] Saif, H., He, Y., & Alani, H. (2012). Semantic sentiment analysis of Twitter. In The Semantic Web - ISWC 2012 (pp. 508-524). Springer Berlin Heidelberg.
[16] Hu, X., Tang, L., Tang, J., & Liu, H. (2013, February). Exploiting social relations for sentiment analysis in microblogging. In Proceedings of the Sixth ACM International Conference on Web Search and Data Mining (pp. 537-546). ACM.
[17] Cesarano, C., Picariello, A., Recupero, D. R., & Subrahmanian, V. S. (2007). The OASYS 2.0 opinion analysis system. ICWSM, 7, 313-314.
[18] Abbasi, A., Chen, H., & Salem, A. (2008). Sentiment analysis in multiple languages: Feature selection for opinion classification in Web forums. ACM Transactions on Information Systems (TOIS), 26(3), 12.
[19] Cui, A., Zhang, M., Liu, Y., & Ma, S. (2011). Emotion tokens: Bridging the gap among multilingual Twitter sentiment analysis. In Information Retrieval Technology (pp. 238-249). Springer Berlin Heidelberg.
[20] Gao, Q., Abel, F., Houben, G. J., & Yu, Y. (2012). A comparative study of users' microblogging behavior on Sina Weibo and Twitter. In User Modeling, Adaptation, and Personalization (pp. 88-101). Springer Berlin Heidelberg.
[21] Hardoon, D., Szedmak, S., & Shawe-Taylor, J. (2004). Canonical correlation analysis: An overview with application to learning methods. Neural Computation, 16(12), 2639-2664.
[22] Chaudhuri, K., Kakade, S. M., Livescu, K., & Sridharan, K. (2009, June). Multi-view clustering via canonical correlation analysis. In Proceedings of the 26th Annual International Conference on Machine Learning (pp. 129-136). ACM.
[23] Faridani, S., Bitton, E., Ryokai, K., & Goldberg, K. (2010, April). Opinion Space: A scalable tool for browsing online comments. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 1175-1184). ACM.
[24] Brody, S., & Elhadad, N. (2010, June). An unsupervised aspect-sentiment model for online reviews. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics (pp. 804-812). Association for Computational Linguistics.
Publications
Published:
Le, J., & Yamana, H. (2013, November). A comparative study of user evaluations of global res-
taurants under multi-cultural backgrounds. In WebDB Forum.
Le, J., & Yamana, H. (2014, March). Cross-lingual investigations of user evaluations for global
restaurants. In DEIM 2014 (B4).
To be published:
Le, J., & Yamana, H. (2014, August). Cross-domain investigations of user evaluations under the
multi-cultural backgrounds. In the 159th DBS Workshop.
Le, J., & Yamana, H. (2014, September). Cross-cultural investigations of user evaluations for mul-
tiple domains: using Twitter data. In SICSS 2014.